… well, I am trying to figure out what I am doing wrong in io-bm. I need a new method to defeat some of the smarter caching bits. My MPI_Send/MPI_Recv pairs are blocking, and this hurt performance. Not only that, the additional traffic over the InfiniBand fabric was definitely a source of contention on the wire.
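One common fix for that blocking-pair pattern is to post non-blocking sends and receives up front and wait on both later, so the transfer can overlap other work. A minimal sketch of that idea, assuming a hypothetical two-rank exchange (this is not io-bm's actual code):

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch: replacing a blocking MPI_Send/MPI_Recv pair with
 * MPI_Isend/MPI_Irecv + MPI_Waitall.  Assumes exactly two ranks. */
int main(int argc, char **argv)
{
    int rank, peer, buf_out = 42, buf_in = 0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = (rank == 0) ? 1 : 0;

    /* Post both operations immediately; neither call blocks,
     * so there is no send/recv ordering deadlock to worry about. */
    MPI_Irecv(&buf_in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&buf_out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... overlap useful work (e.g. the next I/O chunk) here ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d received %d\n", rank, buf_in);
    MPI_Finalize();
    return 0;
}
```

Posting the receive before the send also gives the MPI library a chance to land the data directly in the user buffer rather than staging it, which matters at these message rates.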
Doing some TB-sized writes at a good rate. The “naive” bandwidths (the way IOzone calculates them) are about where we predicted given the measured IB performance.
Will report more shortly; expect a white paper and other docs soon.
As a teaser, here is a 128 GB read. The longest-running of the 128 threads took 12.11 seconds.
Naive linear bandwidth summation = 15443.780 MB/s
More precise calculation of Bandwidth = 10818.051 MB/s
And a 128 GB write.
Naive linear bandwidth summation = 11794.979 MB/s
More precise calculation of Bandwidth = 6488.514 MB/s
I’ll post a discussion of the various methods of accounting for the measured time later on.