JackRabbits are fast critters

We received our unit back from the testers. We had been interested in seeing them run the unit hard and compare it to others in similar configs. Sadly, that is not what happened. Regardless, we decided to take the unit out, play with it, understand the performance a little better, then take it to the test track and crack the throttle wide open. Let it run flat out for a bit. See what it can do.

But before we did that, we had some things to do. With our unit back in house, we pulled out the temporary motherboard and put in the design board. In the process I found a marginal memory stick, so I pulled it; will get it replaced soon. While I was at it, I updated the firmware on the motherboard and the RAID cards. I also made a slight implementation change relative to the demo unit: the OS RAID card was interacting, not positively, with the high-end RAID units, so I pulled it. This is closer to what we deliver anyway; the OS RAID card was there simply as a helper during the demo. We run all the drives off the higher-end RAID controllers.

We will find a happy home for the OS RAID card. Not an issue.

OK. With the system up to spec, it was time to install an OS and do some tuning studies.

We will install/test SuSE 10.2, RedHat 5, Caos3, OpenFiler 2.2, Ubuntu Edgy Eft, Wasabi’s iSCSI bits, Windows 2003 Server x64, and a Solaris 10 implementation (probably Nexenta, as the Solaris 10 distribution from Sun is so insanely hard and unforgiving to install).

Started with SuSE 10.2. The system took it right away, no pain. I pulled the RAID drivers and built updated RPMs for SuSE. These are better, as they also correctly build the initrd under SuSE. I had to craft two quick Perl scripts to insert/remove the driver into the initrd correctly, but apart from that (and they are part of the RPM), everything works nicely.
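The scripts themselves ship in the RPM, but the idea behind the insert half is simple enough to sketch. On SuSE the initrd module list lives in INITRD_MODULES in /etc/sysconfig/kernel, and mkinitrd rebuilds the initrd from it. Something roughly like this captures it (a minimal sketch, with the arcmsr driver name assumed purely for illustration):

# minimal sketch, not the actual RPM scripts; "arcmsr" is an assumed driver name
DRIVER=arcmsr

# add the driver to INITRD_MODULES in /etc/sysconfig/kernel if not already listed
grep -q "INITRD_MODULES=.*${DRIVER}" /etc/sysconfig/kernel || \
    sed -i "s/^INITRD_MODULES=\"/INITRD_MODULES=\"${DRIVER} /" /etc/sysconfig/kernel

# rebuild the initrd so the driver is available at boot
mkinitrd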

Alrighty. Start with some of the tuning studies. I found performance maxima at a stripe size that was a little surprising, and am also noticing a “ringing” or aliasing effect in the performance. The RAIDs were configured as 18 x 750 GB drives organized in a RAID6, plus 2 hot spares per controller. Each RAID card therefore provides 12 TB of RAID6 with 2 hot spares. With the two RAID controllers striped into a RAID0 (call it a RAID60), we get 24 TB raw of RAID6 storage. Sure enough:

jackrabbit1:~ # df -h /storage
Filesystem Size Used Avail Use% Mounted on
/dev/md0 22T 1.1M 22T 1% /storage
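For reference, a rough sketch of how the two controller LUNs get striped together with md. The device names match the per-device disk traffic in the dstat output further down, but the chunk size and the choice of xfs here are assumptions, not necessarily the production settings:

# stripe the two hardware RAID6 LUNs into a RAID0 (the "0" layer of the RAID60)
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sde /dev/sdf

# put a filesystem on it and mount it (xfs assumed for illustration)
mkfs.xfs /dev/md0
mkdir -p /storage
mount /dev/md0 /storage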

I noticed one of the drives failed and a hot spare kicked in. Ran tuning studies even while this was going on. The RAID is tuned for background job performance as compared to IO performance right now. Will change that later after the rebuild finishes.

# Name Raid# Level Capacity Ch/Id/Lun State
===========================================
1 jrvs1 1 Raid6 12000.0GB 00/01/00 Rebuilding(67.9%)
===========================================

An interesting thing I noticed was that a simple cached read speed benchmark maxed out at about 1.2 GB/s.

jackrabbit1:~ # hdparm -T /dev/md0
/dev/md0:
Timing cached reads: 2460 MB in 2.00 seconds = 1231.52 MB/sec
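As a quick cross-check (not part of the run above), hdparm can also time buffered device reads alongside the cached reads, which helps separate the memory/kernel path from actual reads off the array:

# -T: cached reads (memory/kernel path), -t: buffered device reads (real reads from the array)
hdparm -T -t /dev/md0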

I am not sure why the cached-read number is lower; it had been higher before. Likely a kernel oddity, as this is a 2.6.18 kernel. A reboot after the rebuild may clear whatever odd state exists. We are up to 2.6.21+ kernels now, so there may be some other things we can do. Firing off bonnie++ on the device, and watching the system with the excellent dstat tool, I see output that looks like this:

----total-cpu-usage---- --dsk/total----dsk/sda-----dsk/sde-----dsk/sdf-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ: read  writ: read  writ: read  writ| recv  send|  in   out | int   csw
0 29 67 1 0 2| 871M 0 : 0 0 : 432M 0 : 439M 0 | 126B 210B| 0 0 |5959 9181
0 32 63 2 0 3| 897M 0 : 0 0 : 448M 0 : 449M 0 | 126B 210B| 0 0 |6006 8521
0 26 68 3 0 2| 820M 0 : 0 0 : 415M 0 : 404M 0 | 126B 210B| 0 0 |5651 7563
0 23 70 4 0 3| 752M 17k: 0 0 : 373M 0 : 380M 0 | 186B 210B| 0 0 |5115 6952
0 25 69 3 0 2| 845M 0 : 0 0 : 427M 0 : 418M 0 | 126B 210B| 0 0 |5832 7688
0 24 69 4 0 2| 775M 0 : 0 0 : 384M 0 : 391M 0 | 126B 210B| 0 0 |5261 6662
0 24 70 2 0 3| 867M 0 : 0 0 : 432M 0 : 436M 0 | 126B 210B| 0 0 |5922 7648
0 20 72 6 0 2| 706M 0 : 0 0 : 358M 0 : 348M 0 | 66B 210B| 0 0 |4788 6143
0 19 71 8 0 2| 658M 4096B: 0 0 : 323M 0 : 336M 0 | 66B 210B| 0 0 |4626 5925
0 26 69 3 0 2| 844M 0 : 0 0 : 428M 0 : 416M 0 | 66B 210B| 0 0 |5684 6981
0 20 72 5 0 3| 752M 9216B: 0 0 : 376M 0 : 376M 0 | 231B 210B| 0 0 |5268 6569
0 23 70 4 0 2| 813M 0 : 0 0 : 404M 0 : 409M 0 | 126B 252B| 0 0 |5516 6771
0 25 68 4 0 3| 827M 0 : 0 0 : 416M 0 : 411M 0 | 66B 210B| 0 0 |5665 6976
0 24 70 4 0 2| 796M 17k: 0 0 : 392M 0 : 404M 0 | 270B 210B| 0 0 |5439 6609
0 25 71 1 0 2| 904M 16k: 0 0 : 456M 0 : 448M 0 | 271B 210B| 0 0 |6141 7359
0 17 72 8 0 3| 686M 16k: 0 0 : 344M 0 : 342M 0 | 353B 210B| 0 0 |4697 5341
0 24 67 5 0 3| 782M 0 : 0 0 : 388M 0 : 394M 0 | 66B 210B| 0 0 |5433 5867
0 23 69 3 0 4| 856M 0 : 0 0 : 432M 0 : 424M 0 | 66B 210B| 0 0 |5769 6287
0 26 68 2 0 2| 885M 6485k: 0 0 : 437M 3168k: 448M 3300k| 66B 210B| 0 0 |6137 6515
0 24 68 3 0 4| 897M 59M: 0 0 : 455M 28M: 442M 30M| 66B 210B| 0 0 |6569 6736
0 22 69 6 0 3| 766M 30M: 0 0 : 384M 14M: 382M 16M| 126B 210B| 0 0 |5527 5810
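For reference, the per-device disk columns above come from asking dstat to break out specific devices; an invocation along these lines produces that layout:

# CPU, total plus per-device disk, network, paging, and interrupt/context-switch stats
dstat -c -d -D total,sda,sde,sdf -n -g -y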

Just to make sure I note this: this is a single machine, a single system image, and these are SATA drives. During the runs, atop reports about 10-30% utilization of the I/O channel during writes, and about 40-60% utilization per RAID controller during reads. This makes sense given what I have been measuring. Also, the rebuilding RAID unit shows a noticeably higher load than the healthy one, so these performance measurements aren’t nominal numbers, but more of a lower bound.
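The utilization figures come from watching atop during the runs; a minimal way to do that is to run it with a short interval and watch the DSK lines, which report per-device busy percentages:

# refresh every 2 seconds; the DSK lines show busy% per device
atop 2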

bonnie++ -d /storage -u root -n 0 -f
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
jackrabbit1 24024M 591800 89 297600 53 785195 68 313.9 0
jackrabbit1,24024M,,,591800,89,297600,53,,,785195,68,313.9,0,,,,,,,,,,,,,

We do have headroom per RAID controller, and we can double the number of RAID controllers as well. Likely we would run out of some other bandwidth before hitting the limits of the cards in that config.

Also, since we are testing with data sets much larger than system memory, caching isn’t nearly as important. Some recent tests we have seen have been entirely cache bound, with little to no real I/O during the “I/O benchmark”. Here we are doing real I/O; data actually gets out to disk. Pretty much all data below 1-2 GB in IOzone tests is cache, so if you only care about that regime, you should likely aim for the best caching systems around.
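For the IOzone runs, keeping the file size well beyond installed memory is what keeps the cache out of the numbers; a rough example invocation (the 32 GB file size and 1 MB record size are assumptions, and the file size needs to comfortably exceed RAM):

# sequential write (-i 0) and read (-i 1) with 1 MB records on a 32 GB file
iozone -i 0 -i 1 -r 1m -s 32g -f /storage/iozone.tmp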

This is quite good. The tuning done so far shows that most of the reads and writes are well balanced over the range we are most interested in. We re-did the IOzone tests, and will redo them again after the rebuild finishes.

Overall, with the design motherboard, updated firmware and BIOS, and a reasonably constructed system, performance is even better than on previous units built with pre-release motherboards. I don’t have a quantification of this yet; we are working to gather that data so we can compare these systems in general, and the OSes on the JackRabbit in particular.

The IOzone data is also better. JackRabbit is fast. The throttle is still not fully open. But the data is great even though it is working through a rebuild.
