More bonnie

Following Chris Samuel’s suggestion, I pulled down version 1.96 of bonnie and built it. The machine I am using now is a Scientific Linux based system with a Scalable Informatics kernel. Scientific Linux is yet another RHEL rebuild. This is a customer-requested distribution for this machine.

SL suffers from the RHEL kernel, which is IMO inappropriate for use as a high performance storage system kernel. Workload patterns our customers wish to test regularly crash the RHEL distro kernels. These kernels are missing features they need (like xfs), and have numerous misfeatures (4k stacks, backports, …) which do compromise stability under heavy load. I have been informed that Redhat has started supporting xfs, in large part due to customer demand, but for specific named accounts only. This is unfortunate. Then again, as SGI has been the primary developer of xfs, and SGI’s future had/has been in doubt, it would be prudent to look for alternatives. Ext4 and btrfs are obvious choices, but both are too early for serious consideration for large data storage. Pohmelfs and nilfs2 are also somewhat early.

This doesn’t stop people from running benchmarks, and most of the benchmarks we have seen here are … well … not doing a good job testing what they purport to test. This was/is my bone to pick with bonnie. More in a moment.

As a benchmarker, one thing you absolutely must do is understand your measurement tool, how it interacts with your system, and what, with precision, you are actually measuring. Far too often we see and read of ‘benchmarks’ which aren’t of the system the authors claim they are. We have seen people try to benchmark I/O using cache based read/write for I/O sizes far less than RAM, which only exercises the cache and the eventual file system flush code, not the file system or the I/O system. Yet these ‘benchmarks’ are taken at face value, with results reported, analyzed, and used to compare systems, when all you are really doing is comparing cache. The folks doing this are in good company. We have seen this on popular web sites, as well as at national labs. The latter doesn’t make the failed technique any better, it just makes it that much more important to educate about.
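A quick sanity check before trusting any I/O number is to compare the test file size against physical RAM. The sketch below is only illustrative: the helper names are hypothetical, and the factor of 2 is an assumed rule of thumb, not a measured threshold for any particular system.

```python
import os


def physical_ram_bytes():
    """Total physical RAM in bytes, read via sysconf (assumes Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def min_test_file_bytes(ram_bytes, factor=2):
    """Smallest test size that the page cache cannot simply absorb.

    factor=2 is an assumption for illustration, not a hard rule.
    """
    return factor * ram_bytes


def is_cache_bound(test_bytes, ram_bytes, factor=2):
    """True if a run of test_bytes is likely to measure cache, not disks."""
    return test_bytes < min_test_file_bytes(ram_bytes, factor)


if __name__ == "__main__":
    ram = physical_ram_bytes()
    print("RAM bytes:", ram)
    print("suggested minimum test size:", min_test_file_bytes(ram))
```

If `is_cache_bound` returns true for your run, the reported MB/s is probably telling you about memory, not storage.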

If I want to benchmark I/O, I want, curiously, I/O to occur. If I want to benchmark computation, I want, again curiously, computation to occur. It’s easy to see if I/O is occurring. Look for blinking lights on drives. If data is getting out to your drives, chances are you will visually see this.

Bonnie++ 1.96 doesn’t do cached writes very effectively. I can see this in the disk activity lights, and in the reported performance. In fact, what I am seeing looks something like this (according to bonnie)

| Machine | cached/direct | Sequential read (MB/s) | Sequential write (MB/s) | Sequential rewrite (MB/s) |
|---|---|---|---|---|

This says, generally, that sequential (streaming) reads are about the same between cached and direct IO. I expected this. Rewrite speed is quite similar too, also expected. One would expect streaming writes to be quite similar. But they aren’t. You can use fadvise to tell the OS not to cache a streaming write, so you make more effective use of the file system.
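For reference, here is a minimal sketch of a streaming write that uses fadvise so the page cache does not fill up with data that will never be re-read. It assumes Linux and Python’s `os.posix_fadvise`; the function names, chunk size, and flush interval are hypothetical choices for illustration, and this is not what bonnie itself does.

```python
import os


def _flush_and_drop(fd):
    """Flush dirty pages, then advise the kernel to drop them from cache."""
    # POSIX_FADV_DONTNEED only discards clean pages, so flush first.
    os.fsync(fd)
    if hasattr(os, "posix_fadvise"):
        # offset 0, length 0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)


def streaming_write(path, total_bytes, chunk_bytes=4 * 1024 * 1024,
                    flush_every=16):
    """Write total_bytes of zeros sequentially; return bytes written."""
    chunk = bytes(chunk_bytes)
    written = 0
    since_flush = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            view = memoryview(chunk)[:n]
            while view:
                # os.write may write fewer bytes than requested.
                done = os.write(fd, view)
                view = view[done:]
            written += n
            since_flush += 1
            if since_flush >= flush_every:
                _flush_and_drop(fd)
                since_flush = 0
        _flush_and_drop(fd)
    finally:
        os.close(fd)
    return written
```

The periodic flush keeps the amount of dirty data bounded, so the write stream goes to the drives steadily instead of in bursts when the cache fills.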

Hence my contention that the non-direct IO writes are borked in bonnie.

This is not the only place they are, BTW. It’s not just bonnie either.

But the point is, if you are trying to use the tool as a meaningful measuring device, it is leaving a great deal to be desired, as it does not appear to be effectively utilizing writes.

Ok, let’s use that bonnie.fio that I ran previously, run first cached and then with direct IO, using the same normal read/write calls used in bonnie. We can switch to other versions (mmap, vsync, …) for more testing, as well as using different buffering.

| Machine | cached/direct | io depth | r/w method | Sequential read (MB/s) | Sequential write (MB/s) | Sequential rewrite (MB/s) |
|---|---|---|---|---|---|---|

It doesn’t surprise me that the writes are better under direct IO, as cached has cache to manage. It doesn’t surprise me that at an IO depth of 1, the reads are better under cached IO, as it has a read-ahead function. The read-modify-write of rewrite also should be better under cached. Which it is. That is, these results are more in line with what you would expect.

Ok, now let’s do what bonnie doesn’t, and bump up the iodepth to what this system can handle. Each drive has a queue depth of 32, and there are 15 active data drives here, for a total queue depth of 480 across all drives.
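The arithmetic behind that number, as a trivial sketch (the function name is hypothetical):

```python
def aggregate_queue_depth(per_drive_depth, data_drives):
    """Total outstanding requests the drives can queue between them."""
    return per_drive_depth * data_drives


if __name__ == "__main__":
    print(aggregate_queue_depth(32, 15))  # 480
```

One caveat worth keeping in mind: as far as I can tell from the fio documentation, raising iodepth above 1 has no effect on synchronous engines such as sync and psync, so an asynchronous engine (libaio, for example) is likely needed to actually keep that many requests in flight.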

| Machine | cached/direct | io depth | r/w method | Sequential read (MB/s) | Sequential write (MB/s) | Sequential rewrite (MB/s) |
|---|---|---|---|---|---|---|
| jr4s | cached | 480 | read()/write() | 737 | 912 | 228 |

As you can see, direct IO doesn’t benefit from increased IO depth. Doesn’t do much for it at all.

Ok, let’s add in a better buffering scheme. Let’s use huge memory buffers (Linux huge pages) and 4096k buffers, using the shm interface.

| Machine | cached/direct | io depth | r/w method | Sequential read (MB/s) | Sequential write (MB/s) | Sequential rewrite (MB/s) |
|---|---|---|---|---|---|---|
| jr4s | cached | 480 | read()/write() | 729 | 929 | 729 |

Still we don’t see the sort of performance that a simple dd generates. I am not sure I understand the major differences, unless they are using different techniques to do the reads and writes. So instead of using read/write, what if you used pread/pwrite, or vsync, or mmap, or …
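To make the distinction between these access methods concrete, here is a small sketch that reads the same file four ways: read(), pread(), readv() (which I understand to be what fio’s vsync engine is built on for reads), and mmap. It assumes a Unix-like system; the function names are hypothetical, and this illustrates the system calls only, not how bonnie, dd, or fio are implemented.

```python
import mmap
import os


def read_with_read(path, bs):
    """Sequential read(); the kernel tracks the file offset."""
    out = bytearray()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, bs)
            if not buf:
                break
            out += buf
    finally:
        os.close(fd)
    return bytes(out)


def read_with_pread(path, bs):
    """pread(); the caller supplies the offset on every call."""
    out = bytearray()
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            buf = os.pread(fd, bs, offset)
            if not buf:
                break
            out += buf
            offset += len(buf)
    finally:
        os.close(fd)
    return bytes(out)


def read_with_readv(path, bs, nbufs=4):
    """readv(); one system call fills several buffers in order."""
    out = bytearray()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            bufs = [bytearray(bs) for _ in range(nbufs)]
            remaining = os.readv(fd, bufs)
            if remaining == 0:
                break
            for b in bufs:
                take = min(remaining, len(b))
                out += b[:take]
                remaining -= take
                if remaining == 0:
                    break
    finally:
        os.close(fd)
    return bytes(out)


def read_with_mmap(path):
    """mmap(); pages are faulted in as the mapping is touched."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # a zero-length file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]
```

All four return the same bytes. What differs is the number of system calls, who tracks the offset, and how the data gets into the process, and any of those could plausibly account for differences between tools.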

Let’s try vsync as an engine rather than sync. This could positively impact the read-write portion, though it might not impact the read or write portion that much. It may also add overhead for the cached IO.

| Machine | cached/direct | io depth | r/w method | Sequential read (MB/s) | Sequential write (MB/s) | Sequential rewrite (MB/s) |
|---|---|---|---|---|---|---|
| jr4s | direct | 480 | read()/write() | 566 | 1155 | 234 |
| jr4s | cached | 480 | read()/write() | 725 | 871 | 274 |

So I can’t really explain the bonnie numbers with fio modeling it. I am assuming bonnie does something very different in terms of io. I’ll have to look at this.
