Maximizing the minimum performance

Our little L. Flavigularis is shaping up nicely. IOzone tests are, well, quite respectable (bug me at SC06 about this if you are interested). I expect to see some serious FUD from competitors, especially if they get a look at the numbers. And that concerns me, as I am not at all convinced that IOzone and its ilk represent real measurements of meaningful items. I have a strong sense of a “herd” mentality/effect: this is what everyone has been using, so we need to keep using it, no matter how good (or bad) it may be at characterizing our workload. Just like HPL/HPCC. What exactly do they mean to HMMer users and BLAST users? IOzone isn’t bad, I just don’t grasp its relationship to real workloads. HPL/HPCC is completely irrelevant to informatics workloads. I wouldn’t build/architect an HPC solution for informatics based upon HPL scores, and neither would most of the other systems designers I know and have worked with. HPL is great at predicting HPL performance. How well it predicts workload performance is a whole other question.

Back to IOzone and throughput/workload. What worries me more than anything else are the real corner cases, the pathological cases that end users sometimes hit. How do you minimize their maximum pain, or put another way, how do you maximize their minimum performance?
This all goes back to what the real workloads are on such a system. You can’t architect for every possibility. This is systems engineering at its most basic level; you have to design compromises into the unit. Anyone arguing that compromise is not needed in design is arguably not operating in the market, or dealing with market and customer constraints.
So when we architected this, we assumed a particular workload. This doesn’t mean this is the workload that we will normally use or see, just that this is what we are anticipating the user will prefer to address.
With that in mind, we ran IOzone. A prospective customer asked us to run another IOzone. And we ran a few more. Our hope is that our runs are close to what they need to see. But when we ask, we get the email equivalent of a shrug.
What I noticed was a particular corner case giving what I would not consider good performance. Maybe I am being too critical. I dunno.
On investigating, it turns out that this is one of the limitations of a RAID6 system. RAID6 can tolerate at most two failed drives before you have issues; RAID5 tolerates one. We can try to tune for it a little better, to maximize the minimum performance in this corner case. In RAID6, you need to do 2 parity reads for each “read”, and at least 3 reads and 3 writes for each “write”. So if you do lots of small reads and writes, your “real” “reads” and “writes” are getting lost in a storm of RAID activity. If you are doing lots of long block “reads” and “writes”, you are amortizing this overhead over much larger blocks. With clever scheduling on the processing chips, you can “hide” the parity computation and test behind additional IO, which allows you to “stream” reads at high bandwidth. With small block “reads” and “writes” you have less of that as an option: you still need to read the parity block, and the next “read/write” may require a seek, which prevents you from amortizing these “reads” and “writes”.
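To make the write-side arithmetic concrete, here is a rough sketch of the bookkeeping. It is a toy model, not a description of any particular controller: it assumes writes start on a stripe boundary, that partial-stripe writes are handled by read-modify-write (read old data plus both parity blocks, then write all of them back), and that full-stripe writes need no reads at all. The function name, the parameters, and the 8-data-disk layout are all made up for illustration.

```python
import math


def raid6_write_ios(write_bytes, chunk_bytes, data_disks):
    """Count disk I/Os for one stripe-aligned write on a RAID6 array.

    Toy model (assumption, not any specific controller's behavior):
      - a full stripe is written with no reads: data chunks + 2 parity chunks
      - a partial stripe uses read-modify-write: read each touched data
        chunk plus both parity chunks, then write them all back
    Returns (disk_reads, disk_writes).
    """
    stripe_bytes = chunk_bytes * data_disks
    full_stripes, remainder = divmod(write_bytes, stripe_bytes)

    reads = 0
    writes = full_stripes * (data_disks + 2)

    if remainder:
        touched = math.ceil(remainder / chunk_bytes)
        reads += touched + 2   # old data + P + Q
        writes += touched + 2  # new data + P + Q

    return reads, writes


if __name__ == "__main__":
    KIB = 1024
    # Hypothetical array: 8 data disks + 2 parity disks.
    for write_kib, chunk_kib in [(4, 64), (64, 64), (64, 8), (512, 64)]:
        r, w = raid6_write_ios(write_kib * KIB, chunk_kib * KIB, data_disks=8)
        print(f"{write_kib:4d} KiB write, {chunk_kib:2d} KiB chunk: "
              f"{r} reads, {w} writes")
```

In this model a 4 KiB write into a 64 KiB chunk costs 3 reads and 3 writes, while a write that covers a whole stripe costs no reads. Shrinking the chunk so that a 64 KiB write spans the full stripe removes the reads for that case, which is the effect smaller blocks are after. The model says nothing about seeks or scheduling overhead, so it does not capture what smaller blocks do to large sequential streams.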
The way to fix this is to use much smaller blocks.
Of course, that badly hurts your large block sequential access, as you now have a flurry of reads/scheduling/computation that you may not have had before. Large block sequential throughput drops drastically.
Just like the laws of thermodynamics: 1) you can’t win, 2) you can’t break even, and 3) you cannot stop playing the game. You have to compromise your design parameters intelligently, or put another way, you have to engineer a solution.
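One way to phrase “maximize the minimum performance” as a selection rule is sketched below. Everything in it is hypothetical: the configuration names, the test names, and especially the numbers, which are invented placeholders and not measurements from our unit. Each score is meant as the fraction of a workload's target that a configuration delivers, so that sequential and small-block tests are comparable.

```python
def pick_maximin(results):
    """Return the configuration whose worst test score is the highest.

    results maps config name -> {test name -> score}, where a score is
    the fraction of the target met for that test (higher is better).
    """
    return max(results, key=lambda name: min(results[name].values()))


def worst_case(results):
    """Return {config name -> (worst test name, worst score)}."""
    return {
        name: min(tests.items(), key=lambda item: item[1])
        for name, tests in results.items()
    }


# Invented placeholder scores, NOT measured data.
results = {
    "large_chunk":  {"seq_read": 1.5, "seq_write": 1.4, "small_random_write": 0.2},
    "medium_chunk": {"seq_read": 1.1, "seq_write": 1.0, "small_random_write": 0.6},
    "small_chunk":  {"seq_read": 0.7, "seq_write": 0.7, "small_random_write": 0.9},
}

if __name__ == "__main__":
    for name, (test, score) in worst_case(results).items():
        print(f"{name}: worst case is {test} at {score:.1f} of target")
    print("maximin choice:", pick_maximin(results))
```

A pure maximin rule like this one ignores how often each case is actually hit, so it is only as good as the list of tests you feed it. Weighting the tests by how much of the real workload they represent would be the obvious refinement, if you know that mix.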
As always, the most important workloads and tests are real ones. Tuning to IOzone is like teaching to a test. Whether or not something real is created is secondary to whether or not your scores are good. Now if IOzone had a subtest that represented a real workload, that would be a different issue. I am not trying to be critical of IOzone here. I am just noting that if all you have in your arsenal are hammers, then every problem starts to look like a nail. And this detracts from the real problems requiring different tools.
My hope is that our numbers are useful to our customers. If there are specific tests you find useful, please let me know.