About 6 years ago, I wrote this, about a benchmark test that did a 2TB write in 73s or so, on pure spinning disk. That result was just so far out there, compared to pretty much anything else available, in terms of performance density (single rack of storage units). The hardware was Scalable Informatics Unison storage, designed to be an IO monster in all respects. It was. Way ahead of its time.
The test was using my io-bm code. The command line to run the 2TB test was something like this:
mpirun [options] /path/to/io-bm.exe -n 2048 -b 16 -w -d -f /path/to/fast/storage
Here the size in gigabytes (-n 2048) means 2TB total. -b 16 means use an IO block size of 16 MB. -w means write (so -r, of course, means read). -d means use direct IO. That is, don’t cache it; see the performance of the underlying storage. The -f /path/to/fast/storage tells the code where to read from or write to.
I had written io-bm because, at the time, there weren’t great options for parallel IO benchmarks. This particular code is really simple, and it correlates well with more modern measurements. And unlike many of the “io benchmarking” codes of the day, this one generates real, actual IO load.
So I’m working on yet another supercomputer right now. Big HPC system, and it has fast IO. I need to make sure it works, and I do some burn-in using various integrated and imported tooling.
I can’t report the numbers. I wish I could.
Because this one system broke some very interesting wall clock times on this 2TB write and subsequent read. Using direct IO, so no cache. If I had left caching on … heh …
Years ago, at a customer site, they asked us for the best possible IO benchmark. I used io-bm again, and ran it across 8192 cores, and left caching on (they wanted big numbers). Hit more than 1TB/s to cache. Was amazing.
Useless, but amazing.