Disk reliability: FC and SCSI vs SATA

I have been pointing out for some time that disk manufacturing processes and hardware are pretty much identical across all types of disk. There is nothing of significance different between the hardware in a SCSI, FC, or SATA drive, outside the drive electronics package.

One side effect of this would be effectively indistinguishable failure rates across the drive types. Of course, all these vendors publish MTBF and other numbers which are “measured”. Really they are estimates, based upon large collections of drives and lots of statistical analysis.

If the statistical analysis is correct, the MTBF should correlate well with the observed failure rates. The MTBF is effectively a theory that predicts the measured failure rate.

This means that if FC and SCSI drives really are so much better than SATA, as almost all of the vendors selling FC and SCSI storage systems claim, the observed average failure rates should be inversely proportional, or nearly so, to the MTBF figures. FC and SCSI should be showing 50-200% better (i.e. smaller) failure rates.
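To make the reasoning above concrete, here is a minimal sketch of how a quoted MTBF translates into an expected annualized failure rate (AFR). It assumes a constant failure rate (an exponential lifetime model), which is the usual simplification behind datasheet MTBF numbers and may not hold for real drives. The MTBF values are hypothetical placeholders, not figures from any vendor.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365, assuming the drive is powered on all year

def afr_from_mtbf(mtbf_hours):
    """Annualized failure rate implied by an MTBF.

    Assumes a constant failure rate (exponential lifetimes).
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# Hypothetical datasheet values, for illustration only.
sata_mtbf_hours = 1_000_000
fc_mtbf_hours = 1_500_000

sata_afr = afr_from_mtbf(sata_mtbf_hours)
fc_afr = afr_from_mtbf(fc_mtbf_hours)

print(f"SATA expected AFR: {sata_afr:.2%}")
print(f"FC expected AFR:   {fc_afr:.2%}")
print(f"Expected SATA/FC failure rate ratio: {sata_afr / fc_afr:.2f}")
```

For MTBF values this large, the AFR ratio is very close to the inverse of the MTBF ratio, so a 50% higher MTBF should show up as roughly a 1.5x gap in observed failure rates, if the MTBF numbers mean what the vendors say they mean.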

There is a little fly in this particular ointment.

Such a signal is not observed. On the contrary, as we have been indicating and others have been noting, analyses of the failure rates of large numbers of disks show no significant difference between FC, SCSI, and SATA. See this paper. Unlike the other presentation, which I failed to make a copy of, I plan to keep a copy of this one here.
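As a rough illustration of what "no significant difference" means for two populations of drives, here is a sketch of a two-proportion z-test. This is a textbook test swapped in for illustration, not the method used in the paper, and the drive counts are made-up numbers chosen only to show the mechanics.

```python
import math

def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Two-sided z-test for a difference between two failure proportions.

    Returns (z, p_value). Uses the pooled-proportion standard error and a
    normal approximation, which is reasonable for large drive populations.
    """
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p_value

# Hypothetical counts: replaced drives out of drives deployed over one year.
z, p = two_proportion_z(fail_a=300, n_a=10_000, fail_b=290, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With counts like these, the p-value is far above the usual 0.05 threshold, so the two populations would be statistically indistinguishable. A real 1.5x gap in failure rates over populations this size would be very hard to miss.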

If the disks were built with different processes or different materials, it would be quite reasonable to expect different failure rates. Different drive mechanisms would likely have differing failure rates. Different head mechanisms would have different failure rates. It is interesting to note one of the authors' conclusions:

In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk independent factors, such as operating conditions, usage and environmental factors, affect replacement rates more than component specific factors. However, the only evidence we have of a bad batch of disks was found in a collection of SATA disks experiencing high media error rates. We have too little data on bad batches to estimate the relative frequency of bad batches by type of disk, although there is plenty of anecdotal evidence that bad batches are not unique to SATA disks.

Basically they are saying that there may be factors other than disk components that impact replacement more strongly. Even with these other factors, they do not see higher replacement rates for SATA than for SCSI or FC.

This is interesting.

