JackRabbits are fast critters

We received our unit back from the testers. We had been interested in seeing them run the unit hard and compare it to others in similar configs. Sadly, that is not what happened. Regardless, we decided to take the unit out, play with it, understand the performance a little better, then take it to the test track and crack the throttle wide open. Let it run flat out for a bit. See what it can do.

But before we did that, we had some things to do. With our unit back in house, we pulled out the temporary motherboard and put in the design board. In the process I found a marginal memory stick, so I pulled it; will get it replaced soon. While I was at it, I updated the firmware on the motherboard and the RAID cards. I also made a slight implementation change relative to the demo unit: the OS RAID card was interacting, not positively, with the high-end RAID units, so I pulled it. This is closer to what we deliver anyway; the OS RAID card was there simply as a helper during the demo. We run all the drives off the higher-end RAID controllers.

We will find a happy home for the OS RAID card. Not an issue.

OK. With the system up to spec, it was time to install an OS and do some tuning studies.

We will install/test SuSE 10.2, RedHat 5, Caos3, OpenFiler 2.2, Ubuntu Edgy Eft, Wasabi’s iSCSI bits, Windows 2003 Server x64, and a Solaris 10 implementation (probably Nexenta, as the Solaris 10 distribution from Sun is so insanely hard and unforgiving to install).

Started with SuSE 10.2. The system took it right away, no pain. I pulled the RAID drivers and built updated RPMs for SuSE. These are better, as they also correctly build the initrd under SuSE. I had to craft two quick Perl scripts to insert/remove the driver into the initrd correctly, but apart from that (and they are part of the RPM), everything works nicely.
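The scripts themselves ship in the RPM, but the idea behind the insert half is simple enough to sketch. On SuSE the initrd module list lives in INITRD_MODULES in /etc/sysconfig/kernel, and mkinitrd rebuilds the initrd from it. Something roughly like this captures it (a minimal sketch, with the arcmsr driver name assumed purely for illustration):

# minimal sketch, not the actual RPM scripts; "arcmsr" is an assumed driver name
DRIVER=arcmsr

# add the driver to INITRD_MODULES in /etc/sysconfig/kernel if not already listed
grep -q "INITRD_MODULES=.*${DRIVER}" /etc/sysconfig/kernel || \
    sed -i "s/^INITRD_MODULES=\"/INITRD_MODULES=\"${DRIVER} /" /etc/sysconfig/kernel

# rebuild the initrd so the driver is available at boot
mkinitrd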

Alrighty. Start with some of the tuning studies. I found performance maxima at a stripe size that was a little surprising, and am also noticing a “ringing” or aliasing effect in the performance. The RAIDs were configured as 18 x 750 GB drives organized in a RAID6, plus 2 hot spares per controller. Each RAID card therefore provides 12 TB of RAID6 with 2 hot spares. With the two RAID controllers striped into a RAID0 (call it a RAID60), we get 24 TB raw of RAID6 storage. Sure enough:

jackrabbit1:~ # df -h /storage
Filesystem Size Used Avail Use% Mounted on
/dev/md0 22T 1.1M 22T 1% /storage
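For reference, a rough sketch of how the two controller LUNs get striped together with md. The device names match the per-device disk traffic in the dstat output further down, but the chunk size and the choice of xfs here are assumptions, not necessarily the production settings:

# stripe the two hardware RAID6 LUNs into a RAID0 (the "0" layer of the RAID60)
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sde /dev/sdf

# put a filesystem on it and mount it (xfs assumed for illustration)
mkfs.xfs /dev/md0
mkdir -p /storage
mount /dev/md0 /storage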

I noticed one of the drives failed and a hot spare kicked in. Ran tuning studies even while this was going on. The RAID is tuned for background job performance as compared to IO performance right now. Will change that later after the rebuild finishes.

# Name Raid# Level Capacity Ch/Id/Lun State
===========================================
1 jrvs1 1 Raid6 12000.0GB 00/01/00 Rebuilding(67.9%)
===========================================

An interesting thing I noticed was that a simple cached read speed benchmark maxed out at about 1.2 GB/s.

jackrabbit1:~ # hdparm -T /dev/md0
/dev/md0:
Timing cached reads: 2460 MB in 2.00 seconds = 1231.52 MB/sec
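As a quick cross-check (not part of the run above), hdparm can also time buffered device reads alongside the cached reads, which helps separate the memory/kernel path from actual reads off the array:

# -T: cached reads (memory/kernel path), -t: buffered device reads (real reads from the array)
hdparm -T -t /dev/md0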

I am not sure why the cached-read number is lower; it had been higher before. Likely a kernel oddity, as this is a 2.6.18 kernel. A reboot after the rebuild may clear whatever odd state exists. We are up to 2.6.21+ kernels now, so there may be some other things we can do. Firing off bonnie++ on the device, and watching the system with the excellent dstat tool, I see output that looks like this:

----total-cpu-usage---- --dsk/total----dsk/sda-----dsk/sde-----dsk/sdf-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ: read  writ: read  writ: read  writ| recv  send|  in   out | int   csw
0 29 67 1 0 2| 871M 0 : 0 0 : 432M 0 : 439M 0 | 126B 210B| 0 0 |5959 9181
0 32 63 2 0 3| 897M 0 : 0 0 : 448M 0 : 449M 0 | 126B 210B| 0 0 |6006 8521
0 26 68 3 0 2| 820M 0 : 0 0 : 415M 0 : 404M 0 | 126B 210B| 0 0 |5651 7563
0 23 70 4 0 3| 752M 17k: 0 0 : 373M 0 : 380M 0 | 186B 210B| 0 0 |5115 6952
0 25 69 3 0 2| 845M 0 : 0 0 : 427M 0 : 418M 0 | 126B 210B| 0 0 |5832 7688
0 24 69 4 0 2| 775M 0 : 0 0 : 384M 0 : 391M 0 | 126B 210B| 0 0 |5261 6662
0 24 70 2 0 3| 867M 0 : 0 0 : 432M 0 : 436M 0 | 126B 210B| 0 0 |5922 7648
0 20 72 6 0 2| 706M 0 : 0 0 : 358M 0 : 348M 0 | 66B 210B| 0 0 |4788 6143
0 19 71 8 0 2| 658M 4096B: 0 0 : 323M 0 : 336M 0 | 66B 210B| 0 0 |4626 5925
0 26 69 3 0 2| 844M 0 : 0 0 : 428M 0 : 416M 0 | 66B 210B| 0 0 |5684 6981
0 20 72 5 0 3| 752M 9216B: 0 0 : 376M 0 : 376M 0 | 231B 210B| 0 0 |5268 6569
0 23 70 4 0 2| 813M 0 : 0 0 : 404M 0 : 409M 0 | 126B 252B| 0 0 |5516 6771
0 25 68 4 0 3| 827M 0 : 0 0 : 416M 0 : 411M 0 | 66B 210B| 0 0 |5665 6976
0 24 70 4 0 2| 796M 17k: 0 0 : 392M 0 : 404M 0 | 270B 210B| 0 0 |5439 6609
0 25 71 1 0 2| 904M 16k: 0 0 : 456M 0 : 448M 0 | 271B 210B| 0 0 |6141 7359
0 17 72 8 0 3| 686M 16k: 0 0 : 344M 0 : 342M 0 | 353B 210B| 0 0 |4697 5341
0 24 67 5 0 3| 782M 0 : 0 0 : 388M 0 : 394M 0 | 66B 210B| 0 0 |5433 5867
0 23 69 3 0 4| 856M 0 : 0 0 : 432M 0 : 424M 0 | 66B 210B| 0 0 |5769 6287
0 26 68 2 0 2| 885M 6485k: 0 0 : 437M 3168k: 448M 3300k| 66B 210B| 0 0 |6137 6515
0 24 68 3 0 4| 897M 59M: 0 0 : 455M 28M: 442M 30M| 66B 210B| 0 0 |6569 6736
0 22 69 6 0 3| 766M 30M: 0 0 : 384M 14M: 382M 16M| 126B 210B| 0 0 |5527 5810
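For reference, the per-device disk columns above come from asking dstat to break out specific devices; an invocation along these lines produces that layout:

# CPU, total plus per-device disk, network, paging, and interrupt/context-switch stats
dstat -c -d -D total,sda,sde,sdf -n -g -y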

Just to make sure I note this: this is a single machine, a single system image, and these are SATA drives. During the runs, atop reports about 10-30% utilization of the I/O channel during writes, and about 40-60% utilization per RAID controller during reads. This makes sense given what I have been measuring. Also, the rebuilding RAID unit shows a noticeably higher load than the healthy one, so these performance measurements aren’t nominal numbers, but more of a lower bound.
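The utilization figures come from watching atop during the runs; a minimal way to do that is to run it with a short interval and watch the DSK lines, which report per-device busy percentages:

# refresh every 2 seconds; the DSK lines show busy% per device
atop 2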

bonnie++ -d /storage -u root -n 0 -f
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
jackrabbit1 24024M 591800 89 297600 53 785195 68 313.9 0
jackrabbit1,24024M,,,591800,89,297600,53,,,785195,68,313.9,0,,,,,,,,,,,,,

We do have headroom per RAID controller, and we can double the number of RAID controllers as well. Likely we would run out of some other bandwidth before hitting the limits of the cards in that config.

Also, since we are testing with data sets much larger than system memory, caching isn’t nearly as important. Some recent tests we have seen have been entirely cache bound, with little to no real I/O during the “I/O benchmark”. Here we are doing real I/O; data actually gets out to disk. Pretty much all data below 1-2 GB in IOzone tests is cache, so if you only care about that regime, you should likely aim for the best caching systems around.
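For the IOzone runs, keeping the file size well beyond installed memory is what keeps the cache out of the numbers; a rough example invocation (the 32 GB file size and 1 MB record size are assumptions, and the file size needs to comfortably exceed RAM):

# sequential write (-i 0) and read (-i 1) with 1 MB records on a 32 GB file
iozone -i 0 -i 1 -r 1m -s 32g -f /storage/iozone.tmp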

This is quite good. The tuning done so far shows that most of the reads and writes are well balanced over the range we are most interested in. We re-did the IOzone tests, and will redo them again after the rebuild finishes.

Overall, with the design motherboard, updated firmware and BIOS, and a reasonably constructed system, performance is even better than on previous units built with pre-release motherboards. I don’t have a quantification of this yet; we are working to gather that data so we can compare these systems in general, and the OSes on the JackRabbit in particular.

The IOzone data is also better. JackRabbit is fast. The throttle is still not fully open. But the data is great even though it is working through a rebuild.
