IT storage … why it’s not HPC storage, and shouldn’t be used where you need HPC storage

In a previous article, I railed against the concept of IT designing clusters. I pointed out many flaws we have seen when this happens.

I’d like to do the same thing with storage.

This will be brief.

We recently had a consulting customer ask us, with deep incredulity, how one of our older 24-drive 7200 RPM SATA units could so thoroughly demolish (in benchmark testing) a brand-new 24-drive 15k RPM SAS unit.

It all comes down to design. A 15k RPM drive doesn’t guarantee fast IO. Nor does SAS. I could keep going, but the salient point is that this always comes down to IO design. A great IO design will almost always demolish a crappy IO design.

This is what happened here.

I have a longer post about this, and about the risk of making platform decisions before you make business decisions about the most appropriate platform to meet a business objective. It’s frankly wrong to purposefully ignore better/cheaper/faster solutions because they don’t carry a particular brand name on the outside. It’s wrong for the business, as you eliminate a priori any potential benefit of the platform you ignore, including cost savings, performance/productivity enhancement, etc.

There are many other reasons as well, but I will stick with these for the moment.

IT-designed storage all looks about the same. It’s not particularly high performance. Actually, the way it is usually designed, it is in all honesty quite low in performance.

And when you have hard performance targets to hit, this is where IT-designed storage misses the mark.

IT != HPC, despite the protestations of some OS vendors. Conflating the two means more customers constrained to bad designs for high performance IO. And as I observed recently, customers test alternative OSes and discover, rapidly, that there are performance advantages to switching.

Add in these hard targets, and high hardware costs, and you have a recipe for several of the conversations I’ve had over the last few weeks, with about 5 customers in one market vertical.

If your vendor can’t meet your performance targets, you need to expand which vendors you are talking to.

I’ll finish up the longer post later on, but the gist is that you agree to accept technological and business risk when you limit your vendors to a select few. They may not be able to meet your objectives. This is a bad thing. Is the cost of this risk worth the benefit? And what, precisely, is the benefit of using gear that can’t meet business performance objectives? How does this advance a mission?



3 thoughts on “IT storage … why it’s not HPC storage, and shouldn’t be used where you need HPC storage”

  1. Dead-in – fire for effect. 🙂

    This is another topic we always agree on (why can’t the rest of the world realize we should be in charge because we know what’s going on and we are always right? 🙂 )

    One of the key parts missing from your discussion is that customers are stating storage performance requirements, and almost without exception they have no data behind these numbers. They are totally “gut-feel”. I think we as an industry need to spend much more time on understanding the IO patterns and needs of applications and systems. Then and only then can we make intelligent and rational performance requirements. Perhaps even more importantly, we can use this information to make trade-offs in storage design.

    Let me give an example. I don’t want to toot my own horn, but I’ve been working on some tools to help examine IO patterns. I’ve applied them to a few applications. One of these is WRF. Normally people consider WRF to be a very IO intensive application needing lots of IO performance (i.e. lots of GB/s). However, in a couple of cases that I have examined, the time spent in IO is less than 5% of the total wall clock time. This means that if I could double the IO performance, I would only decrease the wall clock time by at most 2.5%. Is that worth it? If the doubling of IO performance costs less than 2.5% of the cluster, then the answer is perhaps yes. Otherwise, I would take the money and buy more nodes.

    Here’s another interesting aspect – for the examples I examined, WRF was more IOPS driven than throughput driven. So all of these people who said that WRF is throughput driven were mistaken, at least for the cases I have examined.

    With this kind of application level knowledge we (the community and vendors) can make much more intelligent decisions about the design of storage systems.

    Here’s an even cooler thing we can do. If we understand the IO patterns perhaps we can rewrite the application to “behave” better. For example, if an application does a huge number of lseeks killing throughput, perhaps we can rethink how IO is done in the application to reduce the number of lseeks. Or, as another example, perhaps the application does reads and writes with very small record sizes (1KB-8KB range). Perhaps we can rewrite the IO portion of the application to get this record size increased so that we can more efficiently read or write data.

    Perhaps an even cooler aspect we could consider is new storage technology (aka SSDs). While I’m not a big fan of them today, they have great potential, but you have to understand their limitations (a lot like GPUs 🙂 ). If we redesign the IO scheduler in the kernel so that it understands SSDs, if we design file systems so they understand SSDs better, and if we can design storage devices to better use SSDs as part of the integrated whole, perhaps we can greatly improve storage performance and cost!!! (BTW – just slapping in SSDs as a cache, while it sounds cool, is perhaps not the best way to use them – there are better approaches that allow SSDs to be better used as part of the greater whole). For example, perhaps we can separate read-only storage from read-write storage. A user could tell a file system that some set of data is now read-only (as I write this there is an ext3 question where, after a user does a huge number of writes, all they do after that point is read). Then the file system would know to migrate the data to SSDs (they have unbelievable read performance). Then when someone wants to read the data it will be served from the SSDs directly.

    I’m going to stop because I have way too many ideas to put here. 🙂 Thanks for letting me comment here, Joe.


  2. I also fully agree with you, Joe, as you well know. The problem rests with convincing others of this.

    As for WRF, I’d be curious to know what types of runs you’re doing, Jeff. I have a run here which, when run on 256 cores, spent approximately 85% of its time in I/O. Yet on another system, also on 256 cores, it spent a ‘measly’ 21%. Clearly, the I/O design of the first system was atrocious, but that 21% on the second system still leaves a fair amount of room for improvement.

    When broken down in terms of dollars, IF that was indicative of all codes (e.g., 20% of time in I/O), and I could halve the time for I/O by spending $X, and X is less than 10% of the total system cost, it seems to me to be a net win in terms of productivity. It kills me to see money thrown at storage at both ends of the spectrum with no thought to the needs of the applications. If your budget for a system is 5% for I/O when 10% would get you a net gain of 30% in performance, do it. Similarly, spending 80% of your budget on I/O when your needs are small is equally foolish. Ultimately, these sorts of calculations are FAR easier than the ones these systems are used for, yet very few people do the due diligence to find decent solutions.

    /rant 🙂

  3. Brian – I expect to see lots of difference in IO performance based on the data file. I’m seeing this in other applications.

    I absolutely agree with your rant. I see it every day with customers (I’m sure Joe does as well). When customers start slinging around performance numbers and I ask where they came from, up to now, 100% of the time they can’t tell me. When I tell them that I can help determine better requirements they get mad and insist that their first set of requirements is correct. So I shut up and move on 🙂

    IO is 90% of the problems on systems and about 10-15% of the budget. Sigh…
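
The back-of-the-envelope arithmetic in the comments above can be sketched in a few lines of Python. This is a minimal sketch, not anyone's actual tooling: the function names `wall_clock_saving` and `worth_buying` are made up for illustration. It assumes that only the IO portion of a run gets faster while everything else stays the same, and it uses the commenters' rule of thumb that an IO upgrade pays off when the fraction of wall clock time it saves exceeds the fraction of system cost it adds.

```python
def wall_clock_saving(io_fraction, io_speedup):
    """Fraction of total wall clock time saved when only the IO part speeds up.

    io_fraction: share of wall clock time spent in IO (0.05 means 5%).
    io_speedup:  factor by which IO gets faster (2 means IO time is halved).
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    # The IO time shrinks from io_fraction to io_fraction / io_speedup;
    # the difference is what comes off the total run time.
    return io_fraction * (1.0 - 1.0 / io_speedup)


def worth_buying(io_fraction, io_speedup, cost_fraction):
    """Rule of thumb (an assumption, taken from the comments): the upgrade is
    worth it if the time saved exceeds the extra cost, both as fractions."""
    return wall_clock_saving(io_fraction, io_speedup) > cost_fraction


# IO fractions mentioned in the comments, each with IO performance doubled.
for label, frac in [("5% in IO", 0.05), ("20% in IO", 0.20),
                    ("21% in IO", 0.21), ("85% in IO", 0.85)]:
    saved = wall_clock_saving(frac, 2)
    print(f"{label}: doubling IO speed saves {saved:.1%} of wall clock time")
```

With 5% of the time in IO, doubling IO performance saves 2.5% of the run; with 20% in IO it saves 10%. Those are the break-even points quoted in the comments. The rule of thumb is deliberately crude: it ignores how well the application would scale if the same money went into more nodes instead.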

