#SC11 benchmarketing gone horribly awry

OMFG … we were (and are) continuously inundated with benchmarketing numbers. These numbers purport to represent the performance of the system in question.

They don’t. We can derive their numbers by multiplying the number of drives by the theoretical best-case performance per drive, assuming everything else is perfect. Never mind that it never is perfect. The point is that the benchmarketing numbers haven’t been measured in a real context.

We do the measurements in a real context and report the results to end users. SATA SSDs whose spec sheets report 50k IOPS? Reality is 2k to 6k IOPS in most cases. At #SC11 we showed exactly this, with the same tests that we ran on flash (PCIe flash, that is).

Our results were compelling. Ten SSDs in a RAID5, optimized for erase block size among other things, delivered 1/10th the performance of the PCIe flash on random IO (4k random reads and writes, 70% of them reads).

This is tremendously compelling, as I’ve pointed out before. PCIe flash blows the doors off of SSD flash for IOPS in real tests.
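By “real tests” I mean something you can write down, hand to someone else, and have them reproduce. The fio job below is an illustrative sketch of that kind of 4k random, 70/30 read/write test; the target path, queue depth, and run length are placeholders, not our actual input deck:

    ; illustrative 4k random, 70% read / 30% write job -- not our actual input deck
    [global]
    ioengine=libaio
    ; direct IO, so the device (not the page cache) is what gets measured
    direct=1
    bs=4k
    rw=randrw
    rwmixread=70
    ; queue depth and run length are placeholders; vary them and watch the curve
    iodepth=32
    time_based
    runtime=600
    group_reporting

    [randrw-70-30]
    ; placeholder target -- point this at the device or a large preallocated file
    filename=/mnt/test/fio.data
    size=100g

Run that against a single SATA SSD and you will see why the spec-sheet 50k and the measured 2k-6k are not the same animal.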

The more-than-10x performance advantage comes at roughly a 2x price premium, which actually makes the PCIe flash a bargain in comparison with SSD flash. That is, you need 10x as many SSDs to reach performance parity on real tests (if you are writing zeros to flash, you are NOT doing real tests … massive hint to all the IOMeter and other Windows-based test programs we see out there), and ten SSDs at half the unit price of one PCIe card works out to a 5x price premium for the SSD route.

Yeah, we can build a massive SSD unit, and we do, and we sell such things … they are the siFlash-SSD units. They do have their use cases. But for absolute raw performance on heavy random read and write IOPS, you will need PCIe flash (and/or huge amounts of RAM).

So, imagine my surprise when someone came over to our booth and told us about their multi-MIOP (millions of IOPS) platform based upon SSDs. I won’t say who, as, despite this massive epic failure on their part on the actual performance measurement side, they do have some really good technology that we are likely going to deploy for some of our use cases (seriously good stuff, and no fake benchmarketing associated with it).

Went to their booth. Saw their system. I pointed out to them that their SSDs were actually doing 2k-6k IOPS, not 50k. I even promised them real input decks and test cases to measure it with. A rebuttal of “but IOmeter reports …” is not considered valid. When you read and write zeros to a device that discards those writes and compresses those reads, and that has been optimized for that particular benchmark? Yeah … not so valid.
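For the fio users in the audience, this particular flavor of gaming is easy to avoid. The fragment below is a sketch of the relevant job-file options (check the documentation for your fio version), showing the difference between feeding a device zeros and feeding it data it cannot cheat on:

    ; all-zero buffers: a compressing / deduplicating controller can make these nearly free
    ; zero_buffers

    ; instead, refill each IO buffer with fresh pseudo-random data the device cannot cheat on
    refill_buffers
    randrepeat=0

Same device, same block sizes, very different answers, and only one of them tells you anything about a real workload.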

Anyone else remember the good old days of compilers special-casing SPEC code and replacing it with hand-optimized assembler? Wasn’t so long ago … 10-15 years or so.

Anyone … anywhere, telling you that their SSDs get more than 2k-6k IOPS on real workloads, with the current crop of SSDs out there, is blowing air up your skirt.

On a related note, I did see a short article on The Register about a startup aggregating many of these SSDs into something similar to a siFlash unit. They noted that their 12+ SSDs only did 50k IOPS. They talked about how they hid the extra performance and used it for themselves.

Yeah. OK. Calling BS on them. And I’m disappointed in The Register for getting hoodwinked. 50k IOPS = 10 SSDs at 5k IOPS each, with 1 of the 12 drives going to RAID5 parity and 1 to a hot spare. That is performance exactly and precisely explained by my simple, testable theory, as compared to their untestable marketing speak.

We may take them to task publicly if needed, if we see them in our market. We need to start getting more aggressive on the benchmarketing side, and calling out all the massive steaming piles of BS we see.

At #SC11, I saw huge steaming mounds. We measure our numbers; our customers know what they are getting. Our competitors, in many cases, guess, and use spec sheets to benchmarket. I can’t tell you how many customers came up to us with these ultra-dense 60+ drive arrays and told us the real measured performance of those arrays. Let’s just say that our DeltaV would whup them, something fierce. And that’s not even our fastest box.

Sheesh.

Benchmarketing was out in force at #SC11. Don’t be fooled by the numbers you see. Ask for real tests, or for real remote access to a system for testing. If they can’t replicate the theoretical performance in a lab setting, chances are they can’t in the field.

Some of our competitors, again, took the bandwidth of each disk, multiplied it by the number of disks, and reported that as their bandwidth. Sad. Measure it, or go home. Enough of the BS.
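Measuring it is not hard, either. A bandwidth-oriented fio job along these lines (again a sketch; the directory, transfer size, and job count are placeholders) will tell you in a few minutes whether the drives-times-datasheet number survives contact with a real file system:

    ; illustrative streaming write job -- directory, sizes, and job count are placeholders
    [global]
    ioengine=libaio
    direct=1
    ; large sequential transfers from several writers, to approximate a streaming load
    bs=1m
    rw=write
    iodepth=16
    time_based
    runtime=300
    group_reporting

    [stream]
    ; placeholder directory on the file system under test
    directory=/mnt/array
    numjobs=8
    size=32g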


7 thoughts on “#SC11 benchmarketing gone horribly awry”

  1. One thing that would help for benchmarking is to have a bench to mark things on. Although I’ve recently started using fio to benchmark at your suggestion, I haven’t found a decent guide or baseline configuration to use, and I’ve gotten some weird results and errors that I don’t understand (such as fio exiting after 50ms even though it’s supposed to run for 100G or 1800 seconds). There doesn’t seem to be a community around this tool. Yes, everyone’s workload is different, but we can’t compare our benchmarks to yours unless we both post our fio configuration…

  2. @Karl

    Been thinking about this for a while. A centralized resource for real benchmarks, input decks, instructions, etc.

    Stay tuned. I will do something about this.
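    In the meantime, a cut-down version of the job file in the post above is a reasonable starting point (the path and sizes below are placeholders, not a blessed baseline). The time_based/runtime pair is what makes fio run for the full interval instead of stopping as soon as ‘size’ worth of IO has completed, and if fio exits almost immediately it is usually complaining about the target file or an option rather than actually finishing a run, so the first thing to check is its error output.

        ; minimal baseline sketch -- placeholders throughout
        [global]
        ioengine=libaio
        direct=1
        bs=4k
        rw=randrw
        rwmixread=70
        iodepth=16
        ; run for a fixed wall-clock interval rather than until 'size' bytes complete
        time_based
        runtime=1800

        [baseline]
        filename=/mnt/test/fio.baseline
        size=100g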

  3. Oftentimes the numbers listed or displayed are pure sequential large-IO numbers. The “actual” numbers change based upon the file system and application – how, then, can the vendors list all combinations of “real tests”? When someone actually buys the storage, they usually specify the test to be run in the RFP; on acceptance of the RFP, they try to reproduce this number.

  4. @Dave

    Sequential IOPS are effectively meaningless. My point about actual test cases is that these are the things that give us the real data. Do I want vendors to list all combinations of tests? No. I do want them to explain their test cases. 50k IOPS on a sequential set of tests is not useful information. Knowing that this is what they tested is very useful in figuring out whether or not you can get meaningful numbers from them.

    As I’ve said here and in many other locations, the best test case is your code, doing what it is supposed to be doing. Faux tests get gamed. SPEC was gamed many times in the early/mid ’90s. Others are gamed with great regularity. And what’s terrible is that people make purchase decisions based upon these faux numbers.

  5. @Erkki

    I’ve seen their suite, and like other suites it has pluses and minuses. The big issue is statistical relevance. I’ve seen far too many poor data plots coming from the use of these tools, where measurements are within the error bars* yet are still stated, without reservation, to indicate specific patterns.

    * the error bar thing is a big issue. I’ve seen wrong analyses from people who should know better, in everything from marketing literature through journal and conference submissions. It only gets sadder when they defend their incorrect analysis and cannot correctly explain why their data, which bounces around within its error window, isn’t as predictive as they might wish.

  6. Error bars require many runs, ideally scattered over a bit of time. Convincing people not to sell the box before finishing all those runs… I know folks in one large chip company who can’t keep hold of sufficiently large systems long enough before they’re sold.

    And then there’s the power issue for large sites. Piotr Luszczek gave a nice presentation at PPAM11 extrapolating the power cost per LINPACK run. Tuning will become infeasibly expensive. I imagine something similar for CERN-scale storage.
