Time trials: A new record

The JR5 is still building its RAID. 96TB of sweetness in that unit.

But it's the JR4 that is tearing up the records.

Did a little tuning, just to fix a problem with the OS drives. I’ll have a long diatribe on this at some point, but not now.

JR4, sitting on the bench in the lab. Pair of Chelsio 10GbE cards, 8 cores of Nehalem goodness, 48 GB of RAM.

Let's take her for a spin and light up the afterburners.

Really, you might like this.

[root@jr4s ~]# dd if=/dev/zero of=/data/big.file ...
2048+0 records in
2048+0 records out
34359738368 bytes (34 GB) copied, 17.4019 seconds, 2.0 GB/s
[root@jr4s ~]# dd if=/data/big.file of=/dev/null ...
2048+0 records in
2048+0 records out
34359738368 bytes (34 GB) copied, 21.1945 seconds, 1.6 GB/s

Yeah, the file is smaller than RAM, but this is direct I/O, so RAM caching doesn't matter. RAID caching does, though. So let's go to double RAM and see what happens (that reduces the effectiveness of the RAID cache as well).
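For reference, the record counts pin down the elided dd arguments: 2048 records of 16 MiB each is the 32 GiB file above, and the double-RAM run further down uses 6144 records for 96 GiB. Something like the following, with the direct I/O flags assumed from the description rather than copied from the actual command line:

# 2048 x 16 MiB = 32 GiB, written and read back with the page cache bypassed
dd if=/dev/zero of=/data/big.file bs=16M count=2048 oflag=direct
dd if=/data/big.file of=/dev/null bs=16M count=2048 iflag=direct
# the double-RAM run below simply bumps count to 6144 (96 GiB)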

And while I am doing that, here are a few lines from vmstat 1, so you can see what the machine is doing in a raw sense.


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0 0 0 49007856 19164 218048 0 0 0 0 65 44 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 41 33 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 37 28 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 40 36 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 53 35 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 48 50 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 38 30 0 0 100 0 0
0 0 0 49007856 19164 218048 0 0 0 0 38 36 0 0 100 0 0
0 1 0 48990872 19164 218048 0 0 12 1392640 5593 393 0 3 95 2 0
0 1 0 48991680 19164 218056 0 0 0 2523136 9901 661 0 6 87 7 0
0 1 0 48991216 19164 218056 0 0 0 2523392 10011 657 0 5 88 7 0
1 0 0 48991248 19164 218056 0 0 0 2506752 9924 651 0 5 88 7 0
0 1 0 48991136 19164 218056 0 0 0 2539776 9973 655 0 5 89 7 0
0 2 0 48991452 19172 218048 0 0 0 2261028 9032 594 0 5 88 7 0
1 0 0 48991308 19172 218056 0 0 0 1605632 6375 429 0 3 85 12 0

Those numbers that look like 2.2 to 2.5 million are the raw I/O rates: the bo column, which vmstat reports in 1 KiB blocks per second. I had estimated 2.2 GB/s as the theoretical max; I am assuming RAID caching has something to do with the overshoot as well.
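If you want to watch that column as a rate in friendlier units, a one-liner like this does it (a convenience sketch, not something captured from the run above; bo is the 10th field in this vmstat layout):

# print the block-out column as MiB/s, skipping the two header lines
vmstat 1 | awk 'NR > 2 { printf "%8.1f MiB/s out\n", $10/1024 }'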

[root@jr4s ~]# dd if=/dev/zero of=/data/big.file  ...
6144+0 records in
6144+0 records out
103079215104 bytes (103 GB) copied, 60.8338 seconds, 1.7 GB/s
[root@jr4s ~]# dd if=/data/big.file of=/dev/null ...
6144+0 records in
6144+0 records out
103079215104 bytes (103 GB) copied, 65.8244 seconds, 1.6 GB/s

This isn't bad at all. Indeed, if we look at the dstat output …

  0   0 100   0   0   0|   0     0 :   0     0 :   0     0 :   0     0 :   0     0 | 240B  808B|   0     0 |  43    34 
  0   0 100   0   0   0|   0     0 :   0     0 :   0     0 :   0     0 :   0     0 | 240B  808B|   0     0 |  44    35 
----total-cpu-usage---- --dsk/sdc-----dsk/sdd-----dsk/sde-----dsk/sdf-----dsk/sdg-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ: read  writ: read  writ: read  writ: read  writ| recv  send|  in   out | int   csw 
  0   0 100   0   0   0|   0     0 :   0     0 :   0     0 :   0     0 :   0     0 | 360B 1796B|   0     0 |  51    41 
  0   0 100   0   0   0|   0     0 :   0     0 :   0     0 :   0     0 :   0     0 | 300B  808B|   0     0 |  51    43 
  0   2  97   1   0   0|   0     0 :   0     0 :  12k  274M:   0   274M:   0   274M| 564B 1060B|   0     0 |3455   167 
  0   5  89   6   0   0|   0     0 :   0     0 :   0   856M:   0   856M:   0   857M| 180B  706B|   0     0 |  10k  359 
  0   6  87   7   0   0|   0     0 :   0     0 :   0   833M:   0   833M:   0   832M| 180B  706B|   0     0 |  10k  349 
  0   4  91   5   0   0|   0     0 :   0     0 :   0   480M:   0   480M:   0   480M| 180B  706B|   0     0 |5795   209 
  0   4  91   4   0   0|   0   128k:   0   128k:   0   834M:   0   834M:   0   834M| 180B  706B|   0     0 |  10k  364 
  0   6  88   6   0   0|   0     0 :   0     0 :   0   787M:   0   787M:   0   788M| 180B  706B|   0     0 |9540   336 
  0   6  87   6   0   0|   0     0 :   0     0 :   0   789M:   0   789M:   0   789M| 180B  706B|   0     0 |9562   336 
  0   3  88   9   0   0|   0     0 :   0     0 :   0   507M:   0   508M:   0   508M| 180B  706B|   0     0 |6197   223 
  0   5  87   8   0   0|   0     0 :   0     0 :   0   485M:   0   481M:   0   483M| 180B  706B|   0     0 |5914   213 
  0   5  88   7   0   0|   0   128k:   0   128k:   0   513M:   0   515M:   0   513M| 180B  706B|   0     0 |6323   243 
  0   5  87   8   0   0|   0     0 :   0     0 :   0   500M:   0   500M:   0   501M| 222B  760B|   0     0 |6137   225 

Each RAID settles down to 500+ MB/s, which makes sense in this design: three of them running at that rate is in line with the 1.6 to 1.7 GB/s that dd reports. It is operating right where we need it to operate.
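For reference, per-device numbers like those come from pointing dstat at the member devices; an invocation along these lines should reproduce that layout (this is my assumption of the command, not a capture of the one actually used):

# one column group per listed device, sampled every second
dstat -D sdc,sdd,sde,sdf,sdg 1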

Very cool. We have hit the 2 GB/s mark.

I should note, I expect that our JR5 is going to be a mite bit faster …


5 thoughts on “Time trials: A new record”

  1. BTW: I heard about some massive solution with 800+ disks that was having trouble with 5.4 TB/hour.

    [root@jr4s ~]# dd if=/dev/zero of=/data/big.file ...
    47788+0 records in
    47788+0 records out
    3206998392832 bytes (3.2 TB) copied, 2206.15 seconds, 1.5 GB/s
    

    Looks like we can write 5.2 TB in an hour (about 1.45 GB/s × 3600 s ≈ 5.2 TB) with 24 drives. I am guessing we cost much less as well.

  2. @Chris

    Let me know what arguments you want. This is xfs. Btrfs isn’t quite ready yet, and some of the others I want to play with (pohmelfs, nilfs2, …) aren’t appropriate for the workload. This is a 2.6.28.7 kernel; I am not sure I can do a good ext4 run on it.

    Machine is off to the customer no later than Tuesday next week, and I have lots of burn-in yet to do. Octobonnie is next, but I will run a unibonnie for you.

  3. I have the bonnie++ numbers, v1.94. They look strange, in that the system was mostly idle during the output phase. I suspect that bonnie isn’t doing its I/O very intelligently in this version. I will try 1.03c.

    Version 1.94       ------Sequential Output------ --Sequential Input- --Random-
    Concurrency   1    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine        Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    jr4s            96G:1m           98037   9 265634  29           898256  42 175.0  51
    Latency                          3683ms    1638ms              689ms    1654ms
    Version 1.94       ------Sequential Create------ --------Random Create--------
    jr4s               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                     16 20982  53 +++++ +++ 31271  62 13212  47 +++++ +++ 24097  64
    Latency             37732us    1327us    2336us   39354us       7us   90761us
    1.93c,1.94,jr4s,1,1247252813,96G,1m,,,98037,9,265634,29,,,898256,42,175.0,51,16,,,,,20982,53,+++++,+++,31271,62,13212,47,+++++,+++,24097,64,,3683ms,1638ms,,689ms,1654ms,37732us,1327us,2336us,39354us,7us,90761us

    I have found fio to be far superior to bonnie for this kind of testing. I'll come up with a bonnie-like experiment for it and post it; a rough sketch of a starting point is below.
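    Something along these lines would mirror the 96 GB streaming dd write (the file name, block size, and queue depth are my assumptions here, not the job that eventually gets posted):

    # hypothetical fio run: sequential 16 MiB writes, O_DIRECT, 96 GiB total
    fio --name=seq-write --filename=/data/fio.big.file --rw=write --bs=16m \
        --size=96g --direct=1 --ioengine=libaio --iodepth=4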
