As you might know, we have been trying out a 10 GbE iSCSI connection to our JackRabbit server. We will be writing up a white paper about this later on.
The issue I keep running into is the lack of a realistic benchmark. Most of the benchmarks we have seen are, well, completely artificial: end user workloads look nothing like them. We want to test end user workloads whenever possible. IOzone basically spends most of its time in cache memory unless you go in and change the code (which I did). The bigger problem is that the code has a bunch of 2 GB limits, courtesy of int data types. This makes it hard to test modern IO systems such as JackRabbit, which hit the high-end performance and size limits of IOzone very quickly.
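To make the 2 GB limit concrete, here is a minimal sketch (not IOzone's actual code) of why a signed 32-bit int cannot describe a large file: the biggest value it holds is 2,147,483,647 bytes, one byte short of 2 GB. The function names are hypothetical, and the sizes in the usage note come from the run log further down.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a byte count can be held in a signed 32-bit integer. */
int fits_in_int32(int64_t nbytes)
{
    return nbytes >= 0 && nbytes <= INT32_MAX;
}

/* Total size of a file written as nbuffers buffers, in 64-bit arithmetic. */
int64_t total_bytes(int64_t nbuffers, int64_t buffer_bytes)
{
    assert(nbuffers >= 0 && buffer_bytes >= 0);
    return nbuffers * buffer_bytes;
}
```

With 160 buffers of 16777216 bytes each, total_bytes() gives 2684354560, and fits_in_int32() reports that this no longer fits. Any code that keeps sizes or offsets in a plain int runs out of room at that point.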
The user in this case told us their workload, so we crafted a benchmark to see what happens. It is an MPI code, and fairly easy to get working.
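For a rough idea of what one process in such a benchmark might do, here is a sketch under stated assumptions. It is not the io-bm source. It leaves out MPI entirely and only shows a single writer filling a buffer with random numbers and streaming it to a file while timing the writes. The function names, the use of doubles, and the use of stdio are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Fill one buffer of nelems random doubles, then write it to path
 * nbuffers times. Returns the elapsed write time in seconds,
 * or -1.0 on any failure.
 */
double write_buffers(const char *path, size_t nelems, size_t nbuffers)
{
    assert(path != NULL);

    double *buf = malloc(nelems * sizeof *buf);
    if (buf == NULL)
        return -1.0;

    /* Random data, so the payload is not trivially compressible. */
    for (size_t i = 0; i < nelems; i++)
        buf[i] = (double)rand() / RAND_MAX;

    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        free(buf);
        return -1.0;
    }

    double start = now_seconds();
    int ok = 1;
    for (size_t b = 0; b < nbuffers && ok; b++)
        ok = (fwrite(buf, sizeof *buf, nelems, fp) == nelems);
    if (fclose(fp) != 0)   /* fclose flushes stdio buffers */
        ok = 0;
    double elapsed = now_seconds() - start;

    free(buf);
    return ok ? elapsed : -1.0;
}
```

In an MPI version, each rank would presumably call something like this on its own file (the log below shows names such as /big/test.3) and report its own time. Note that fclose only hands the data to the operating system, so a sketch like this still measures the page cache unless the files are much larger than RAM or the writer also syncs to disk.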
Here is what happened when we built and ran the code:
[root@pegasus-a io-bm]# make -f Makefile.io-bm
mpicc -c -g io-bm.c
mpicc -g -g -o io-bm.exe io-bm.o -lm
[root@pegasus-a io-bm]# mpirun -np 4 ./io-bm.exe -n 10 -f /big/test -w
[tid=0] each thread will output 2.500 gigabytes
[tid=0] using buffered IO
[tid=0] page size ... 4096 bytes
[tid=0] number of elements per buffer ... 2097152
[tid=0] number of buffers per file ... 160
[tid=0] Allocating memory ... 16777216 bytes
[tid=0] Done allocating memory
[tid=0] storing random numbers ...
[tid=2] each thread will output 2.500 gigabytes
[tid=2] using buffered IO
[tid=3] file open for file=/big/test.3 is complete
Thread=1: time = 19.746s IO bandwidth = 129.647 MB/s
Thread=0: time = 19.950s IO bandwidth = 128.323 MB/s
Thread=2: time = 19.746s IO bandwidth = 129.646 MB/s
Thread=3: time = 20.181s IO bandwidth = 126.853 MB/s
Naive linear bandwidth summation = 514.468 MB/s
More precise calculation of Bandwidth = 507.410 MB/s
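The two summary lines can be reproduced from the per-thread times. The log does not say how io-bm computes the "more precise" figure. Dividing the total data written by the slowest thread's time matches it to within rounding, so that is the assumption in this sketch. It also assumes 2.5 GB per thread means 2560 MB, with 1 MB taken as 2^20 bytes, which is what the per-thread bandwidths imply. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of per-thread bandwidths: each thread's data over its own time. */
double naive_bandwidth(const double *seconds, size_t n, double mb_per_thread)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mb_per_thread / seconds[i];
    return sum;
}

/* Total data over the slowest thread's time (assumed "precise" method). */
double aggregate_bandwidth(const double *seconds, size_t n, double mb_per_thread)
{
    assert(n > 0);
    double slowest = seconds[0];
    for (size_t i = 1; i < n; i++)
        if (seconds[i] > slowest)
            slowest = seconds[i];
    return (double)n * mb_per_thread / slowest;
}
```

Fed the four times above (19.746, 19.950, 19.746 and 20.181 seconds) and 2560 MB per thread, these come out at roughly 514.47 and 507.41 MB/s. The naive sum is a little optimistic because the job is not finished until the slowest thread is done.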
This is interesting. All the reading I have done online suggests that most people have trouble breaking 30 MB/s on their iSCSI arrays. I have seen iSER and SRP numbers of 900 MB/s over InfiniBand, but those were to nullio (i.e. not a real backing store file system). So I am not displeased by these numbers, as they are going to real disk.
-rw------- 1 root root 5368709120 Feb 3 00:38 /big/test.0
-rw------- 1 root root 5368709120 Feb 3 00:38 /big/test.1
-rwx------ 1 root root 5368709120 Feb 3 00:38 /big/test.2*
-rwx------ 1 root root 5368709120 Feb 3 00:38 /big/test.3*