Henry Newman, CEO/CTO of Instrumental, has a great article on Enterprise Storage Forum.
Remember, what we call the storage bandwidth wall, i.e. the time in seconds to read or write your entire store, is your capacity divided by the bandwidth at which you can read or write that capacity. It's a height, measured in seconds: the time to take one pass through your data.
If you can read/write at 1 GB/s and have 1 TB of data, your wall height is 1000 GB / (1 GB/s) = 1000 s. That gives you a rough (best case) figure for a full pass over your data.
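The arithmetic is trivial, but it's worth having on hand. A minimal sketch (the helper name is ours, not anything from the article):

```python
def wall_height_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Storage bandwidth wall: seconds for one full pass over the data."""
    return capacity_bytes / bandwidth_bytes_per_s

TB = 1e12  # terabyte, decimal
GB = 1e9   # gigabyte, decimal

# 1 TB of data at 1 GB/s -> a 1000 second wall, matching the example above.
print(wall_height_seconds(1 * TB, 1 * GB))
```

Plug in your own capacity and measured (not datasheet) bandwidth to see your wall.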
Henry does a really good job describing the problems with large archives (in the multi-PB range) which must be bit accurate, and must not change. Ever.
Some of the things he calls out are, maybe, less of a problem (apart from some poorly designed data stores). Anyone not using ECC RAM in their units … yeah … well … some things can’t be helped. FWIW, we (and Google, etc.) haven’t seen amplified corruption on “consumer level” drives. We have seen some enterprise drives do very … very … bad things. So much so that there are now brands we will not give serious consideration to again for years (to give them time to work the kinks out of their systems), brands that our competitors with … well … a bit less concern, happily put in their systems.
There’s nothing magical about the other issues. But there is a big one which can’t really be addressed very well by many of the designs on the market.
Storage bandwidth is the long pole in the checksum validation tent given that storage performance has not kept pace with either PCIe bandwidth or memory bandwidth. Though flash technology has much higher bandwidth than rotating storage, it is not cost effective for large archives.
Storage resources must be able to read the data at a reasonable rate. Say you have a 10PB archive and want to validate checksums every 30 days. That would require just over 4GB/sec of bandwidth (10PB/(30*24*3600)), and that 4GB/sec of bandwidth does not include ingest and file recalls from users. This means that storage systems must be able to read at 4GB/sec from disk or tape into memory. Clearly, validation every 30 days is not practical given the high cost, but the validation requirements — and how often you want to validate your archive — must be designed into the architecture and should be a major architectural consideration.
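You can redo Henry's arithmetic for any archive size and validation window. A quick sketch (decimal units, our own helper, not from the article):

```python
PB = 1e15  # petabyte, decimal
GB = 1e9   # gigabyte, decimal

def validation_bandwidth(capacity_bytes: float, window_days: float) -> float:
    """Sustained read bandwidth (bytes/s) needed to scrub the whole archive
    once per validation window. Excludes ingest and user recall traffic."""
    window_seconds = window_days * 24 * 3600
    return capacity_bytes / window_seconds

# 10 PB scrubbed every 30 days: roughly the ~4 GB/s figure Henry cites.
bw = validation_bandwidth(10 * PB, 30)
print(f"{bw / GB:.2f} GB/s")
```

And remember that this is a floor: real systems also have to carry ingest and recall load on top of the scrub traffic.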
This is the very issue the storage bandwidth wall calls out. And it points to a fundamental issue with storage designs: if your architecture can’t sustain 4GB/s reads (it isn’t impractical with the right architecture and hardware), then validating an archive that size on any reasonable schedule is off the table.
More to the point, Henry points out that this is a very real point of pain for a few groups, and likely to be a much larger point of pain going forward. We agree.
Tiering storage won’t help this. That’s a band-aid for a different issue.
The issue is, at a fundamental level, if your architecture can’t handle the data rates you require to adequately service your mission objectives, then why on earth are you deploying it?
And there is another issue lurking in there, right underneath this. Computing and verifying checksums. Ignoring the computing portion for the moment, take a step back and ask if this mechanism will scale as your archives hit 1PB, 10PB, 100PB and beyond. With most of the architectures in use now … the answer is decidedly no. This is in part because they aren’t focused upon that bandwidth wall, and all of its implications. We are.
This is why our tightly coupled computing and storage platforms are perfect for this type of scenario. That and we have some seriously awesome stuff in the development pipeline (not just hardware, but some nice IP) that should help ameliorate these problems. Maybe later we’ll get a chance to talk about this.
The key is to have a balanced system that meets the requirements for checksum validation, ingest and access. Balancing CPU, memory, PCIe and storage bandwidth is often a difficult part of the architectural planning process.
Yes. This is why JackRabbit and DeltaV are such awesome systems. We sustain 1+ GB/s per unit doing writes while computing checksums. Less for DeltaV, but it’s not terrible at all. This is as measured by one of our burn-in tools (fio), where we write a couple of TB to each unit with checksums, read it back, and compare stored to calculated.
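The real burn-in is driven by fio, but the stored-versus-calculated pattern itself is simple. A toy sketch of that pattern (small sizes for illustration; not our actual tool):

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per chunk

def write_with_checksums(path: str, n_chunks: int) -> list[str]:
    """Write random chunks to path, recording a SHA-256 per chunk as we go."""
    sums = []
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            block = os.urandom(CHUNK)
            sums.append(hashlib.sha256(block).hexdigest())
            f.write(block)
    return sums

def verify(path: str, sums: list[str]) -> bool:
    """Read the file back and compare calculated checksums to stored ones."""
    with open(path, "rb") as f:
        for expected in sums:
            block = f.read(CHUNK)
            if hashlib.sha256(block).hexdigest() != expected:
                return False
    return True

with tempfile.NamedTemporaryFile(delete=False) as tf:
    path = tf.name
sums = write_with_checksums(path, 4)
ok = verify(path, sums)
os.unlink(path)
print(ok)  # True on healthy storage
```

The burn-in version of this does the same thing at couple-of-TB scale, which is exactly where the storage bandwidth wall determines how long the read-back pass takes.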
And this is why siCluster is such a good system to be used for archives. Everything is balanced. Computing power grows with storage capacity. Network bandwidth grows with storage capacity. The days of the filer heads backed by large FC or SAS links are numbered. This model doesn’t scale, and only gets worse over time with more capacity.
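A toy model makes the scaling difference concrete. Assume (numbers invented for the sketch) a filer-head design pinned at 4 GB/s aggregate by its FC/SAS links, versus a scale-out design where each 1 PB of capacity brings 2 GB/s of its own bandwidth:

```python
GB, PB = 1e9, 1e15

def wall_height_days(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Days for one full pass over the archive."""
    return capacity_bytes / bandwidth_bytes_per_s / 86400

for pb in (1, 10, 100):
    cap = pb * PB
    filer_head = wall_height_days(cap, 4 * GB)        # bandwidth fixed by the head
    scale_out = wall_height_days(cap, pb * 2 * GB)    # bandwidth grows with capacity
    print(f"{pb:>3} PB  filer head: {filer_head:8.1f} days   scale-out: {scale_out:.1f} days")
```

The filer head's wall height grows linearly with capacity, while the balanced scale-out design's stays flat: that is the whole argument in two lines of arithmetic.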
If you start out with a bad design, and try to scale from there, you are going to run head first, without a helmet, into the storage bandwidth wall.