The looming (storage) bandwidth wall

This has been bugging me for a while. Here is a simple measure of the height of the bandwidth wall: take the size of your storage and divide it by the maximum speed at which you can access the data. That ratio is the height of your wall, measured in seconds: the time it takes to read all of your data once. The higher the wall, the longer it takes to get at your data.

OK, let’s apply this in practice. Take a 160 GB drive that can read/write at 100 MB/s. Your wall height is 1600 s (= 160 GB / 0.1 GB/s).

Now take a large unit, like our 96 TB high performance storage and processing unit. You get ~70 TB available at 2 GB/s. Your bandwidth wall height is then 35,000 s (= 70 TB / 0.002 TB/s).

I also wonder if it makes more sense to view this logarithmically: measure the wall height as the log base 10 of this ratio, lopping off the units (what is a log(second), anyway?). A 1600 s wall height would be 3.2, and a 35,000 s wall height would be 4.5, sort of like the hurricane strength scales. A wall height of 1 second (say, a fast memory disk) would be a 0 on this log scale.
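To make the arithmetic concrete, here is a small sketch of the calculation in Python (the function names and unit constants are my own, purely for illustration):

```python
import math

GB = 1e9
TB = 1e12

def wall_height_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Bandwidth wall height: the time to read the whole store once, in seconds."""
    return capacity_bytes / bandwidth_bytes_per_s

def wall_height_log10(capacity_bytes, bandwidth_bytes_per_s):
    """Log-scale wall height, hurricane-category style; a 1 s wall scores 0."""
    return math.log10(wall_height_seconds(capacity_bytes, bandwidth_bytes_per_s))

# 160 GB drive at 100 MB/s: 1600 s, about 3.2 on the log scale
print(wall_height_seconds(160 * GB, 100e6), wall_height_log10(160 * GB, 100e6))

# ~70 TB available at 2 GB/s: 35000 s, about 4.5 on the log scale
print(wall_height_seconds(70 * TB, 2 * GB), wall_height_log10(70 * TB, 2 * GB))
```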

Using this, you could get a sense of where the design points for nearline and offline/archival storage are.

This is part of a longer train of thought on why current large-array designs, or Backblaze-like designs, are problematic at best for large storage systems.


If you cannot access your data in a reasonable period of time, why are you storing it? If your pipes are insufficient to allow you to perform replication or backup, why are you even trying?

Dedup is all the rage in IT circles these days. Dedup is basically dictionary compression at the block level, keeping a large hash-based dictionary of what has been stored to date. For each block, you place a digest hash key representing that block in the dictionary. As you ingest data, if you encounter a block whose digest hash is already in the dictionary, all you have to do is store a pointer to the previously stored block.
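Here is a minimal toy sketch of that idea, assuming fixed-size blocks and an in-memory dictionary (my own illustration, not any particular product’s implementation):

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks

def dedup_store(data: bytes):
    """Split data into blocks; keep each unique block once, duplicates become pointers."""
    store = {}      # digest -> block bytes (the hash-based dictionary)
    pointers = []   # per-block digest references, in logical order
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block   # first occurrence: store the block itself
        pointers.append(digest)     # every occurrence costs only a pointer
    return store, pointers

def dedup_restore(store, pointers) -> bytes:
    """Rebuild the original stream by following the pointers."""
    return b"".join(store[d] for d in pointers)

# Highly repetitive data dedups well; diverse data mostly does not.
data = b"A" * BLOCK_SIZE * 100 + b"one unique tail block"
store, ptrs = dedup_store(data)
assert dedup_restore(store, ptrs) == data
print(len(store), "unique blocks held for", len(ptrs), "logical blocks")
```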

This is great. It is also a band-aid, as it makes some specific assumptions which might not be correct, such as your data being basically bunches of replicated blocks. Sure, if you are backing up hundreds or thousands of laptops, it’s entirely possible that this will work out well. It won’t work well in HPC though, due to the diversity of the data, not to mention the performance requirements of most high performance storage systems.

The higher your bandwidth wall, the less accessible your data becomes, and the harder it gets to move that data around at all.

This impacts RAID rebuilds as much as it does data motion.
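A rough back-of-the-envelope illustration of the rebuild side of this, with made-up but plausible numbers (a rebuild has to re-read or re-write on the order of a full drive, so it runs into the same capacity-over-bandwidth ratio):

```python
TB = 1e12
MB = 1e6

def rebuild_floor_seconds(drive_capacity_bytes, sustained_rebuild_bw_bytes_per_s):
    """Lower bound on rebuild time: a full drive's worth of data at the sustained rebuild rate."""
    return drive_capacity_bytes / sustained_rebuild_bw_bytes_per_s

# e.g. a 1 TB member drive rebuilt at a sustained 50 MB/s
secs = rebuild_floor_seconds(1 * TB, 50 * MB)
print(secs, "seconds, or about", round(secs / 3600, 1), "hours")  # 20000 s, ~5.6 h
```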


4 thoughts on “The looming (storage) bandwidth wall”

  1. It’s good to see a discussion like this. I’ve been wondering about this recently from a real-time perspective. Not a bandwidth wall for the whole array, but a bandwidth wall for what needs to be stored in regular bursts. Let’s say I have a dynamic simulation running with 10 million variables solved at each time step. Say that the client wants every value from every variable stored every timestep (it happens). Say it’s 8 bytes per value, and another 8 bytes of ancillary information (tag, i.d., whatever). That’s about 160 megs per timestep. That’s 1.6 seconds for that 100MB/s drive. If you want real-time+ speed, and many clients do, you need that 160MB out and stored in much less than 1.6 seconds. I’ve seen much smaller simulations choke a good workstation and as simulations get bigger it’s only going to get worse.

  2. @Damien

    I’ve been talking about this problem for a while, but in other ways. It is getting worse, especially as more work gets moved to desktops more muscular than your standard workstation. We have put JackRabbit-like IO in our units, and it has helped people get good performance on their desktop. 600 MB/s to/from the disks can make everything seem peppy. You can of course now get multiple SSD units in desktops (and we are offering the same). This can help as well, though again, most desktop systems weren’t designed for large, fast IO, so their IO channel (where you plug in the drives) is quite weak and can’t really sustain the IO rates. Most motherboard SATA connections sit on a shared PCIe x1 or even x2 link. That provides a nice rate limiter for you right there. This is also part of the reason why motherboard RAID (usually fakeraid) does such a poor job at performance. It’s not designed for it.

    At the end of the day, you need good system design, good I/O channel design, and good storage design to have a fighting chance of overcoming the storage bandwidth wall. It doesn’t matter where you do your computing; the wall can be found whenever you have data to move.

  3. I just moved to a Mac Pro from my previous self-built box. The old box had 320 MB/s SCSI for data, which was pretty quick. I didn’t spec that in the new box, which is a decision I might regret. Those are Intel 5400 boards, so hopefully they’re better than your average gamer board. Then again, with any luck the next box might be one of yours if business goes well enough.

    The software plays a big part in the IO as well. It has to be designed to write lots of data quickly without chewing up processor time. That’s as important as getting the maths right.
