Thoughts on SSDs, spinning rust, …

So SSDs are upon us with a vengeance. No one is actively predicting the death of spinning rust … yet. But it’s in the back of many folks’ minds, even if they aren’t saying it now. Similar to the death of tape. Yeah, I know, it’s still around.

Call that the long tail. Sequential storage mechanisms are going the way of the dodo bird. The issues everyone worries about are cost per data volume, and speed of access/recovery, not to mention longevity. Sure, tape could cost less than spinning rust, but it is serial, and while tapes can “last forever”, the drives certainly can’t. Inexpensive large-volume SATA drives, used as an integrated drive/media for backup, are rapidly supplanting tape at most of the non-diehard tape sites I am aware of.

Basically, tape is dying out, and being replaced by disks (yeah, there are “counter” examples of this, but they are growing fewer and further between, and actually lending strong support to the thesis that tape is in its long decline). There is an interesting concept coming in from the tape folks that is showing up in SSDs. I am not sure I like it, as it lends itself very easily to incorrect expectations.

But spinning rust itself is “under attack”. SSDs have great hype, and great hope.

SSDs provide “performance” (purposefully in scare quotes) for end users. If you read the hype, it looks like they provide tremendous performance deltas.

The Sandforce SF-1200 controllers are a case in point. Currently they are reporting 285 MB/s read, and 275 MB/s write. They are the brand new controllers for MLC based units, and most of the press is fairly breathless about this performance.

We use SSDs, and I need to understand how close the marketing numbers are to the actual numbers. We need to establish a ratio for this. Call this the Benchmark Significance Ratio, or BS Ratio for short. Define BS Ratio as

BS Ratio = (what they claim) / (what you measure)

A BS Ratio close to 1 is good. A BS Ratio much greater than 1 is bad. Of course, a BS Ratio much less than 1 is either an indicator of a failed test, or an accidentally released product.

So here I am with my nice shiny new SF-1200 based SSD. Actually 2 of them. We are looking at them for a product and an application.

This is not a bash on Sandforce BTW. Don’t read it as that, and it is not intended as that. The BS Ratio bit is more a bash at marketing numbers.

So I attach them to our JackRabbit system, create partitions, and set up an xfs file system (I also tried a number of others, such as ext4 and nilfs2).

Then I use a simple standard streaming write fio input file. And I get 65 MB/s for streaming writes (uncached).

Ok. Try streaming reads, also uncached. 200 MB/s.

I don’t mind the latter number, but I am worried about that former number.

So I tried a simple dd, which uses zeros. And I got the marketing rated speed.

Hmmm…. something doesn’t sound right.

So I tried bonnie++ (which I am not as fond of for real testing), and got the benchmark speed as reported by the media.

A quick strace (Strace Is Your Friend) on the dd confirmed it was writing zeros.

I went back to the fio documentation, and found a switch that fills the buffer with zeros.

And I got the rated speed.

Uh oh.

So I just added a -Z switch to io-bm (use zeros rather than random data), built a RAID0 out of my 2 units, and ran some tests. Same write, single thread, same file, same file name, same mount, file system, yadda yadda yadda.

Writing zeros:

[root@localhost ~]# mpirun -np 1 ./io-bm.exe -n 10 -f /data/d1/big.file -b 1 -w -d -Z
Thread=00000: host=localhost.localdomain time = 24.305 s IO bandwidth = 421.317 MB/s
Naive linear bandwidth summation = 421.317 MB/s
More precise calculation of Bandwidth = 421.317 MB/s

Writing random bits:

[root@localhost ~]# mpirun -np 1 ./io-bm.exe -n 10 -f /data/d1/big.file -b 1 -w -d 
Thread=00000: host=localhost.localdomain time = 88.818 s IO bandwidth = 115.292 MB/s
Naive linear bandwidth summation = 115.292 MB/s
More precise calculation of Bandwidth = 115.292 MB/s

This is a BS Ratio of about 3.7. Ugh.
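
The arithmetic, as a quick Python sketch. The function name bs_ratio is made up for illustration, and the zero-fill result stands in for "what they claim", since zeros are the pattern that reproduced the rated speed.

```python
def bs_ratio(claimed_mb_s: float, measured_mb_s: float) -> float:
    """BS Ratio = (what they claim) / (what you measure)."""
    if measured_mb_s <= 0:
        raise ValueError("measured bandwidth must be positive")
    return claimed_mb_s / measured_mb_s


# Numbers from the two io-bm runs above (RAID0 of two drives).
zeros_mb_s = 421.317    # zero-filled writes, standing in for the claim
random_mb_s = 115.292   # random-data writes, what you actually measure

ratio = bs_ratio(zeros_mb_s, random_mb_s)
print(f"BS Ratio = {ratio:.2f}")
```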

As my naive understanding of the situation grows gradually more sophisticated, this looks like something of a redux of what we see in the tape world. The tape vendors happily advertise compressed bandwidth of 2x native bandwidth. But that’s only true for compressible data … not all data is compressible.

It appears to be the same case with some of the SSDs. There are valid reasons for the compression. But the performance difference is huge. Almost 4x.
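
To see how different the two test patterns are from a compressor's point of view, here is a rough Python sketch. It uses zlib purely as a stand-in: what the controller actually does internally is not something I have measured, so treat this as an illustration of compressibility, not of the drive.

```python
import os
import zlib

CHUNK = 1 << 20  # 1 MiB test buffer


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(zlib.compress(data)) / len(data)


zeros = bytes(CHUNK)             # the pattern dd reads from /dev/zero
random_data = os.urandom(CHUNK)  # effectively incompressible

zeros_fraction = compressed_fraction(zeros)
random_fraction = compressed_fraction(random_data)

print(f"zeros:  {zeros_fraction:.4f} of original size")
print(f"random: {random_fraction:.4f} of original size")
```

The zero buffer shrinks to a tiny fraction of its size, while the random buffer does not shrink at all. Any device that compresses on the way in has far less to write in the first case.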

We’ve got more testing to do on these SSDs. Suffice it to say that most of our customers aren’t storing zero bytes everywhere.
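
If you want to repeat the zeros-versus-random comparison without io-bm, something like the following Python sketch is a starting point. It is not how io-bm works internally; the names timed_write and compare are hypothetical, it goes through the page cache and relies on fsync rather than direct IO, and the default sizes are far too small for a real measurement.

```python
import os
import tempfile
import time

MIB = 1 << 20


def timed_write(path: str, block: bytes, count: int) -> float:
    """Write `block` to `path` `count` times, fsync, and return MiB/s."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            view = memoryview(block)
            while view:                      # os.write may write only part
                written = os.write(fd, view)
                view = view[written:]
        os.fsync(fd)                         # include the flush in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return len(block) * count / elapsed / MIB


def compare(directory: str, block_size: int = MIB, count: int = 32) -> dict:
    """Time the same write with a zero block and with a random block."""
    patterns = {"zeros": bytes(block_size), "random": os.urandom(block_size)}
    results = {}
    for name, block in patterns.items():
        path = os.path.join(directory, f"bs-ratio-{name}.bin")
        try:
            results[name] = timed_write(path, block, count)
        finally:
            if os.path.exists(path):
                os.remove(path)
    return results


if __name__ == "__main__":
    # Point this at a directory on the drive under test; the default
    # temporary directory may not live on the SSD at all.
    with tempfile.TemporaryDirectory() as scratch:
        for name, mib_s in compare(scratch).items():
            print(f"{name:>6}: {mib_s:.1f} MiB/s")
```

Note that the random case repeats one random block. If a controller also deduplicates, you would want fresh random data for every block, at the cost of generating it outside the timed region.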
