The looming (storage) bandwidth wall

This has been bugging me for a while. Here is a simple measure of the height of the bandwidth wall: take the size of your storage and divide it by the maximum speed at which you can access the data. That ratio is the height of your wall, measured in seconds: the time it takes to read all of your data once. The higher the wall, the longer it takes to get at your data.

OK, let’s apply this in practice. Take a 160 GB drive that can read/write at 100 MB/s. Your wall height is 1600 s (= 160 GB / 0.1 GB/s).

Now take a large unit, like our 96 TB high performance storage and processing unit. You get ~70 TB available at 2 GB/s. Your bandwidth wall height is then 35,000 s (= 70 TB / 0.002 TB/s).

I also wonder if it makes more sense to view this logarithmically: measure the wall height as the log base 10 of this ratio, lopping off the units (what is a log(second), anyway?). A 1600 s wall height would be 3.2, and a 35,000 s wall height would be 4.5, sort of like the hurricane strength scales. A wall height of 1 second (say, a fast memory disk) would be a 0 on this log scale.
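To make the arithmetic concrete, here is a small sketch of the calculation in Python (the function names and unit constants are my own, purely for illustration):

```python
import math

GB = 1e9
TB = 1e12

def wall_height_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Bandwidth wall height: the time to read the whole store once, in seconds."""
    return capacity_bytes / bandwidth_bytes_per_s

def wall_height_log10(capacity_bytes, bandwidth_bytes_per_s):
    """Log-scale wall height, hurricane-category style; a 1 s wall scores 0."""
    return math.log10(wall_height_seconds(capacity_bytes, bandwidth_bytes_per_s))

# 160 GB drive at 100 MB/s: 1600 s, about 3.2 on the log scale
print(wall_height_seconds(160 * GB, 100e6), wall_height_log10(160 * GB, 100e6))

# ~70 TB available at 2 GB/s: 35000 s, about 4.5 on the log scale
print(wall_height_seconds(70 * TB, 2 * GB), wall_height_log10(70 * TB, 2 * GB))
```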

Using this, you could get a sense of where the design points for nearline and offline/archival storage are.

This is part of a longer train of thought on why current large-array designs, or Backblaze-like designs, are problematic at best for large storage systems.


If you cannot access your data in a reasonable period of time, why are you storing it? If your pipes are insufficient to allow you to perform replication or backup, why are you even trying?

Dedup is all the rage in IT circles these days. Dedup is basically dictionary compression at the block level, keeping a large hash-based dictionary of what has been stored to date. For each block, you place a digest hash key representing that block in the dictionary. As you ingest data, if you encounter a block whose digest hash is already in the dictionary, all you have to do is store a pointer to the previously stored block.
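Here is a minimal toy sketch of that idea, assuming fixed-size blocks and an in-memory dictionary (my own illustration, not any particular product’s implementation):

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks

def dedup_store(data: bytes):
    """Split data into blocks; keep each unique block once, duplicates become pointers."""
    store = {}      # digest -> block bytes (the hash-based dictionary)
    pointers = []   # per-block digest references, in logical order
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block   # first occurrence: store the block itself
        pointers.append(digest)     # every occurrence costs only a pointer
    return store, pointers

def dedup_restore(store, pointers) -> bytes:
    """Rebuild the original stream by following the pointers."""
    return b"".join(store[d] for d in pointers)

# Highly repetitive data dedups well; diverse data mostly does not.
data = b"A" * BLOCK_SIZE * 100 + b"one unique tail block"
store, ptrs = dedup_store(data)
assert dedup_restore(store, ptrs) == data
print(len(store), "unique blocks held for", len(ptrs), "logical blocks")
```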

This is great. It is also a band-aid, as it makes some specific assumptions which might not be correct, such as your data being basically bunches of replicated blocks. Sure, if you are backing up hundreds or thousands of laptops, it’s entirely possible that this will work out well. It won’t work well in HPC though, due to the diversity of the data, not to mention the performance requirements of most high performance storage systems.

The higher your bandwidth wall, the less accessible your data becomes, and the harder it gets to move that data around at all.

This impacts RAID rebuilds as much as it does data motion.
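A rough back-of-the-envelope illustration of the rebuild side of this, with made-up but plausible numbers (a rebuild has to re-read or re-write on the order of a full drive, so it runs into the same capacity-over-bandwidth ratio):

```python
TB = 1e12
MB = 1e6

def rebuild_floor_seconds(drive_capacity_bytes, sustained_rebuild_bw_bytes_per_s):
    """Lower bound on rebuild time: a full drive's worth of data at the sustained rebuild rate."""
    return drive_capacity_bytes / sustained_rebuild_bw_bytes_per_s

# e.g. a 1 TB member drive rebuilt at a sustained 50 MB/s
secs = rebuild_floor_seconds(1 * TB, 50 * MB)
print(secs, "seconds, or about", round(secs / 3600, 1), "hours")  # 20000 s, ~5.6 h
```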


4 thoughts on “The looming (storage) bandwidth wall”

  1. It’s good to see a discussion like this. I’ve been wondering about this recently from a real-time perspective. Not a bandwidth wall for the whole array, but a bandwidth wall for what needs to be stored in regular bursts. Let’s say I have a dynamic simulation running with 10 million variables solved at each time step. Say that the client wants every value from every variable stored every timestep (it happens). Say it’s 8 bytes per value, and another 8 bytes of ancillary information (tag, i.d., whatever). That’s about 160 megs per timestep. That’s 1.6 seconds for that 100MB/s drive. If you want real-time+ speed, and many clients do, you need that 160MB out and stored in much less than 1.6 seconds. I’ve seen much smaller simulations choke a good workstation and as simulations get bigger it’s only going to get worse.

  2. @Damien

    I’ve been talking about this problem for a while, but in other ways. It is getting worse, especially as more work gets moved to desktops more muscular than your standard workstation. We have put JackRabbit-like IO in our units, and it has helped people get good performance on their desktop. 600 MB/s to/from the disks can make everything seem peppy. You can of course now get multiple SSD units in desktops (and we are offering the same). This can help as well, though again, most desktop systems weren’t designed for large, fast IO, so their IO channel (where you plug in the drives) is quite weak and can’t really sustain the IO rates. Most motherboard SATA connections sit on a shared PCIe x1 or even x2 link. That provides a nice rate limiter for you right there. This is also part of the reason why motherboard RAID (usually fakeraid) does such a poor job at performance. It’s not designed for it.

    At the end of the day, you need good system design, good I/O channel design, and good storage design to have a fighting chance of overcoming the storage bandwidth wall. It doesn’t matter where you do your computing; the wall can be found whenever you have data to move.

  3. I just moved to a Mac Pro from my previous self-built box. The old box had 320 MB/s SCSI for data, which was pretty quick. I didn’t spec that in the new box, which is a decision I might regret. Those are Intel 5400 boards, so hopefully they’re better than your average gamer board. Then again, with any luck the next box might be one of yours if business goes well enough.

    The software plays a big part in the IO as well. It has to be designed to write lots of data quickly without chewing up processor time. That’s as important as getting the maths right.
