Scalable Informatics JackRabbit JR3 16TB storage system, 12.3TB usable.
[root@jr3 ~]# df -m /data
Filesystem  1M-blocks  Used    Available  Use%  Mounted on
/dev/sdc2   <strong>12382376</strong>   425990  11956387   4%    /data
[root@jr3 ~]# df -h /data
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdc2   <strong>12T</strong>   417G  12T    4%    /data
These tests are meant more to show the quite remarkable utility of the fio tool than anything else. With it you can probe real issues in your system, unlike a broad swath of ‘benchmark’ tools that don’t really provide a useful or meaningful measure of anything.
This is on a RAID6, so it’s not really optimal for seeks. The benchmark is 8k random reads with 16 threads, each reading 4GB of its own file (64GB in aggregate, well beyond cache, but we are using direct IO anyway). The array is a 16 drive RAID6 with 1 hot spare and 2 parity drives, leaving 13 data drives. Using a queue depth of 31 per drive, these 13 data drives have an aggregate queue depth of 403 (13 x 31). Of course, in RAID6 it’s really less than that, as you are doing 3 reads for every short read.
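The drive and queue arithmetic above can be checked in a few lines of Python. This is just a sketch of the numbers quoted in the text; the variable names are illustrative and are not part of the fio configuration.

```python
# Drive layout quoted in the text: 16 drives, 1 hot spare, 2 parity.
total_drives = 16
hot_spares = 1
parity_drives = 2
data_drives = total_drives - hot_spares - parity_drives

# Per-drive queue depth, as stated in the text.
queue_depth_per_drive = 31
aggregate_queue_depth = data_drives * queue_depth_per_drive

# Test size: 16 jobs, each reading its own 4 GB file.
jobs = 16
file_size_gb = 4
aggregate_gb = jobs * file_size_gb

print(data_drives, aggregate_queue_depth, aggregate_gb)  # 13 403 64
```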
We get asked often if customers can benchmark our units for databases, and we tell them yes, with the caveat that we need to make sure they are configured correctly for databases (SQL type, seek based). This configuration is quite important.
Here is the fio input file:
[random]
rw=randread
size=4g
directory=/data
iodepth=403
direct=1
blocksize=8k
numjobs=16
nrfiles=1
group_reporting
ioengine=sync
loops=1
And here are the results:
[root@jr3 ~]# fio random.fio
random: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=sync, iodepth=403
...
random: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=sync, iodepth=403
Starting 16 processes
^Cbs: 16 (f=16): [rrrrrrrrrrrrrrrr] [2.0% done] [8,061K/0K /s] [984/0 iops] [eta 02h:27m:36s]
fio: terminating on signal 2
random: (groupid=0, jobs=16): err= 0: pid=30405
  read : io=1,483MiB, bw=8,415KiB/s, iops=<strong>1,051</strong>, runt=180507msec
    clat (usec): min=38, max=191K, avg=17101.10, stdev=2927.27
    bw (KiB/s) : min= 257, max=11992, per=5.56%, avg=468.18, stdev=128.30
  cpu : usr=0.07%, sys=0.23%, ctx=203801, majf=0, minf=1821
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=189886/0, short=0/0
     lat (usec): 50=0.06%, 100=1.95%, 250=3.38%, 500=0.39%, 750=0.15%
     lat (usec): 1000=0.05%
     lat (msec): 2=0.06%, 4=1.10%, 10=23.37%, 20=48.04%, 50=19.95%
     lat (msec): 100=1.45%, 250=0.04%
Run status group 0 (all jobs):
   READ: io=1,483MiB, aggrb=8,415KiB/s, minb=8,415KiB/s, maxb=8,415KiB/s, mint=180507msec, maxt=180507msec
Disk stats (read/write):
  sdc: ios=189723/0, merge=0/0, ticks=2877670/0, in_queue=2877810, util=100.00%
So this is looking like about 1k IOPS for this test case, on a system not configured or designed for seek loads. In fact, if you look at the latency distribution, you can see a broad peak from 10 to 50 milliseconds. Seek time is ~8ms on these drives, and you need to do 3 drive reads. I’d expect this to put a request somewhere between 8ms and 3 x 8 = 24ms, but there is probably enough of a seek delay that if you miss one rotation, you might be forced to 3 x (8+8), or 48 milliseconds. That seems to be what the data shows.
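The latency estimate above is simple enough to write down as a sketch. The figures are the ones quoted in the text and in the direct IO run; the variable names are illustrative, and I am assuming that each fio latency bucket label is the upper bound of its range (so "20" covers 10 to 20 ms and "50" covers 20 to 50 ms).

```python
seek_ms = 8             # approximate seek time quoted for these drives
reads_per_request = 3   # drive reads per short read on this RAID6, per the text
missed_rotation_ms = 8  # extra delay assumed in the text for a missed rotation

best_case_ms = seek_ms                                               # one seek
serial_case_ms = reads_per_request * seek_ms                         # 3 x 8
worst_case_ms = reads_per_request * (seek_ms + missed_rotation_ms)   # 3 x (8+8)

# Latency buckets from the direct IO run, in percent of completed I/Os.
lat_msec_pct = {2: 0.06, 4: 1.10, 10: 23.37, 20: 48.04, 50: 19.95, 100: 1.45, 250: 0.04}

# Share of I/Os completing between 10 and 50 ms (the "20" and "50" buckets).
peak_share_pct = round(lat_msec_pct[20] + lat_msec_pct[50], 2)

print(best_case_ms, serial_case_ms, worst_case_ms, peak_share_pct)
```

Under these assumptions the model spans 8 to 48 ms, and roughly two thirds of the requests land in the 10 to 50 ms range, with most of the remainder in the 4 to 10 ms bucket.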
OK, let’s change this from direct IO (uncached) to regular IO (cached). Sometimes cache is a good thing. Sometimes it is not. For seek bound loads which are much larger than physical RAM, or local cache RAM, cache usage is problematic in that it is basically wasted. This is why we have fadvise and other POSIX-style mechanisms to help the system optimize its memory/cache usage. Don’t cache what you won’t reuse.
[root@jr3 ~]# fio random-cached.fio
random: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=sync, iodepth=403
...
random: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=sync, iodepth=403
Starting 16 processes
^Cbs: 16 (f=16): [rrrrrrrrrrrrrrrr] [2.5% done] [6,759K/0K /s] [825/0 iops] [eta 02h:54m:27s]
fio: terminating on signal 2
Jobs: 5 (f=5): [E_E_r_r____rrr_E] [2.8% done] [7,471K/0K /s] [912/0 iops] [eta 02h:40m:25s]
random: (groupid=0, jobs=16): err= 0: pid=30431
  read : io=1,860MiB, bw=6,966KiB/s, <strong>iops=870</strong>, runt=273425msec
    clat (usec): min=84, max=284K, avg=20382.77, stdev=3316.73
    bw (KiB/s) : min= 204, max= 638, per=5.64%, avg=392.56, stdev=13.36
  cpu : usr=0.06%, sys=0.33%, ctx=476943, majf=0, minf=2732
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=238101/0, short=0/0
     lat (usec): 100=0.01%, 250=0.62%, 500=0.01%, 750=0.01%, 1000=0.08%
     lat (msec): 2=0.31%, 4=0.22%, 10=15.48%, 20=54.51%, 50=26.09%
     lat (msec): 100=2.56%, 250=0.12%, 500=0.01%
Run status group 0 (all jobs):
   READ: io=1,860MiB, aggrb=6,966KiB/s, minb=6,966KiB/s, maxb=6,966KiB/s, mint=273425msec, maxt=273425msec
Disk stats (read/write):
  sdc: ios=476192/0, merge=0/0, ticks=4358100/0, in_queue=4358070, util=100.00%
And again, you can see the wide peak which represents disk latency for 3 reads. You don’t expect good IOPS on a RAID6: it is not designed for seek based loads. Streaming loads are what RAID6 is good at.
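One more cross-check worth making on both runs is Little's law: the mean number of requests in flight equals throughput times mean latency. The sketch below plugs in the iops and average clat values from the two fio reports above; the function name is illustrative. As far as I understand fio, the sync ioengine keeps only one request in flight per job regardless of the iodepth setting, which matches the "IO depths : 1=100.0%" line in both reports, so the expected concurrency here is about 16 (one per job) rather than 403.

```python
def outstanding_ios(iops, avg_latency_usec):
    """Little's law: mean requests in flight = throughput * mean latency."""
    return iops * avg_latency_usec / 1_000_000

# iops and average clat (usec) taken from the two fio reports above.
direct = outstanding_ios(1051, 17101.10)
cached = outstanding_ios(870, 20382.77)

# Bandwidth implied by the direct IO run's iops at an 8 KiB block size.
implied_bw_kib = 1051 * 8

print(f"direct: {direct:.1f} in flight, cached: {cached:.1f} in flight")
print(f"implied bandwidth: {implied_bw_kib} KiB/s (fio reported 8,415 KiB/s)")
```

Both runs come out near 18 requests in flight, close to the 16 jobs, and the implied bandwidth agrees with what fio reported to within rounding.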
Fio shows us why. That’s why we like using it.