Disk reliability: FC and SCSI vs SATA

I have been pointing out for some time that disk manufacturing processes and hardware are pretty much identical across all types of disk. There is no significant difference between the hardware in a SCSI, FC, or SATA drive, outside the drive electronics package.

One of the side effects of this would be effectively indistinguishable failure rates between the hardware. Of course, all these vendors publish MTBF and other numbers which are "measured". They are really estimates, based upon large collections of drives and lots of statistical analysis.
If the statistical analysis is correct, the MTBF should correlate well with the observed failure rates. One is a theory effectively predicting the measurement of the other.
This means that if the FC and SCSI drives really are so much better than the SATA drives, as almost all of the vendors selling FC and SCSI storage systems claim, the observed signal should be that average failure rates are proportional, or nearly so, to the MTBF ratios. FC and SCSI should be showing 50-200% better (i.e. smaller) failure rates.
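To put a number on that expected signal: under the usual constant-failure-rate assumption, a datasheet MTBF implies an annualized failure rate (AFR) of 1 − exp(−8760/MTBF). Here is a minimal sketch of that conversion, using illustrative MTBF figures of my own choosing rather than any particular vendor's datasheet:

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours

def afr_from_mtbf(mtbf_hours):
    """Annualized failure rate implied by an MTBF figure,
    assuming a constant (exponential) failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# Illustrative datasheet-style MTBF figures, not vendor quotes:
for label, mtbf in [("enterprise FC/SCSI", 1_200_000),
                    ("desktop SATA", 600_000)]:
    print(f"{label}: MTBF {mtbf:,} h -> implied AFR {afr_from_mtbf(mtbf):.2%}")
```

With a 2x MTBF ratio like this, the enterprise drives should show roughly half the annual failure rate (about 0.73% vs 1.45%) — a signal large fleet studies would have no trouble detecting, if it were real.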
There is a little fly in this particular ointment.
Such a signal is not observed. On the contrary, as we have been indicating, and as others have noted, analyses of failure rates across large numbers of disks see no significant difference between FC, SCSI, and SATA. See this paper. Unlike the other presentation, of which I failed to make a copy, I plan to keep a copy of this one here.
If the disks were built with different processes and different materials, it would be quite reasonable to expect different failure rates. Different drive mechanisms would likely have differing failure rates, as would different head mechanisms. It is interesting to note one of the authors' conclusions:

In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk independent factors, such as operating conditions, usage and environmental factors, affect replacement rates more than component specific factors. However, the only evidence we have of a bad batch of disks was found in a collection of SATA disks experiencing high media error rates. We have too little data on bad batches to estimate the relative frequency of bad batches by type of disk, although there is plenty of anecdotal evidence that bad batches are not unique to SATA disks.

Basically, they are saying that factors other than disk components may impact replacement rates more strongly. Even with these other factors, they are not seeing higher replacement rates for SATA than for SCSI or FC.
This is interesting.

6 thoughts on “Disk reliability: FC and SCSI vs SATA”

  1. The only difference I have found is that SCSI drives tend to be heavier (perhaps 25%, sometimes more) than SATA drives.
    But maybe the weight difference points to different materials?
    May be worth checking out.
    I might add, I have SCSI disks from 1999 still chewing away 24/7, which is almost 10 years of heavy duty.
    The MTBF for those is 5 years, so I wouldn't count on MTBF for anything in a real environment at all.

  2. In a different blog, the statement is made:
    “The drive vendors have arbitrarily tied cheap components, slow motors, a poor media to the SATA interface, while the expensive, good parts are all saved for SAS and SCSI.”
    I have asked that poster for verification. Does anyone here know of the validity of that statement?

  3. The economics of this sort of thing don’t make any sense. You minimize your bill-of-materials variations, which allows you to buy greater quantities of goods and negotiate more favorable terms. This also reduces your support issues (fewer differences) and makes builds easier and lower cost (same hardware with different electronics packages).
    Weight differences I have seen are attributable to electronics package differences in most cases. I see 543g for one SATA drive and 530g for the SAS version of the same unit.
    The fact that simply altering an electronics package and a label allows you to charge more for a drive enables you to segment your market and get different margins from different segments. More power to these guys: they have a large number of people convinced that the differences are something more than this. If they were, would the largest consumer of storage in the world (Google) trust anything but the “highest” reliability units? Or is reliability not that different between the drives? Google found it really wasn’t.

  4. I have seen that. I think many people who have been putting SATA down for a long time have needed a bridge that gets them over to the technology and lets them save face.
    Nearline lets them do this. Note that there is no difference between nearline and desktop drives save the electronics package and, more usually, just the firmware.
    For example, let's compare the masses of the ST31000340NS nearline 1 TB Seagate drive, rated for 24×7 operation, and the ST31000340AS unit, rated for 8×5 operation. If the nearline drives were built out of different parts, you would expect the masses to be different.
    Both are 677 grams.
    OK, what about the average latency? Higher-quality components might have different masses on the heads, so we should expect to see different average latencies, as a different amount of energy and settling time would be required for differing materials.
    Both are 4.16 ms.
    Same number of guaranteed sectors. Same interface.
    The AFRs are different, though this could be (and likely is) a marketing thing. Segment your market; do it on nearline vs. desktop. Desktop has the higher AFR, spake the marketing department, thus our margins on nearline can be higher (i.e., charge you more).
    If it makes some people feel better to say this, thus allowing them to take advantage of the newer, better technologies, by all means, let's get them past this hurdle. Notice that in the same comment where you pulled the quote, the person points out the marketing dilemma of the storage vendors. I think he gets it mostly right, though not completely. By reducing production variations, the vendors could squeeze more cost out of their systems. Differentiation by the electronics package makes differentiable changes cost less to build and test (same platform), and allows you to turn around changes and bug fixes faster.

  5. Yes, I suppose you can define a failure rate however you like for your own product, if it simply reflects the extent to which you are standing behind the robustness of various models, and not their actual robustness.

Comments are closed.