times like this put a smile on my face ...
By joe
We are running some burn-in tests on the JackRabbit storage cluster. 6 of 8 nodes are up; 2 need to be looked at tomorrow. On one of the nodes, we have 3 RAID cards. Because of how the customer wants the unit configured, it is better for us to have 3 separate file systems, so that's what we have. They will all be aggregated shortly (hopefully tomorrow) with a nice cluster file system and some InfiniBand goodness.
Ok. I wanted to stream some writes and reads to each file system: 3 at a time, one to each file system. Make each stream larger than RAM, so there is no caching. Caching doesn't mix well with streaming, and it interferes with measuring the raw horsepower of the underlying system. So here I am with 3 writes. I lit off a vmstat 1 in another window, just to see what was happening. The bo column is the number of 1 KiB blocks written in the time interval (1 second), so a quick multiplication by 1024 gets you the aggregate byte output.
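The write workload itself can be sketched roughly like this. This is a hypothetical reconstruction, not the actual commands: the mount points, block size, and count are assumptions, though 4096 × 16 MiB = 64 GiB does match the dd totals shown later.

```python
# Hypothetical sketch of the write workload (the actual commands were
# not shown): one streaming dd write per filesystem, run in parallel,
# each file sized larger than RAM so the page cache can't hide the
# disks. Mount points, bs, and count are assumptions.
import subprocess

def stream_writes(mounts, bs="16M", count=4096):
    """Launch one dd write stream per mount point; wait for all of them.

    Returns the list of dd exit codes (0 == success).
    """
    procs = [
        subprocess.Popen(
            ["dd", "if=/dev/zero", f"of={m}/big.file",
             f"bs={bs}", f"count={count}"],
            stderr=subprocess.DEVNULL,
        )
        for m in mounts
    ]
    return [p.wait() for p in procs]

# e.g. stream_writes(["/data/brick-sdc2", "/data/brick-sdd2",
#                     "/data/brick-sde2"])
# Each stream would be 4096 x 16 MiB = 64 GiB, comfortably larger
# than this node's RAM.
```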
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 3 0 45741696 102976 3077232 0 0 0 2474240 13155 698 0 12 57 31 0
2 1 0 45741176 102976 3077236 0 0 0 2457600 13207 718 0 10 65 25 0
0 3 0 45741116 102984 3077236 0 0 0 1671212 9157 512 0 6 64 29 0
1 2 0 45741028 102984 3077236 0 0 0 2541648 14052 724 0 12 62 27 0
0 3 0 45741228 102984 3077236 0 0 0 2488496 13054 705 0 13 58 29 0
0 3 0 45741052 102984 3077236 0 0 0 2473984 13198 703 0 11 63 26 0
1 3 0 45741072 102984 3077236 0 0 0 2490880 13483 711 0 12 66 23 0
0 3 0 45741152 102984 3077236 0 0 0 2490368 13348 719 0 12 62 26 0
0 3 0 45741280 102992 3077236 0 0 0 2293792 12448 681 0 12 59 29 0
0 3 0 45741128 102992 3077236 0 0 0 2457600 12971 700 0 12 60 27 0
1 2 0 45740984 102992 3077236 0 0 0 2523392 13582 721 0 9 65 25 0
1 2 0 45740896 102992 3077236 0 0 0 2473984 13316 711 0 8 65 27 0
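As a quick sanity check, converting a few of those bo samples (1 KiB blocks per 1-second interval) to GB/s:

```python
# A few bo samples from the vmstat output above, converted to GB/s.
# bo counts 1 KiB blocks written during the 1-second interval.
bo_samples = [2474240, 2457600, 2541648, 2488496, 2473984]
rates = [bo * 1024 / 1e9 for bo in bo_samples]  # GB/s
print(f"write rate: {min(rates):.2f} to {max(rates):.2f} GB/s")
```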
It was sustaining around 2.5 GB/s writes. Each RAID is a RAID6 btw. What about reads?
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 3 0 45727080 102760 3077184 0 0 3026904 0 11896 831 0 3 64 32 0
0 3 0 45727200 102760 3077188 0 0 3031040 0 11922 848 0 4 63 33 0
1 3 0 45727128 102760 3077188 0 0 3014656 0 11944 830 0 3 67 30 0
0 3 0 45727016 102760 3077188 0 0 2981888 0 11859 824 0 4 56 40 0
0 3 0 45727136 102760 3077188 0 0 2949120 0 11753 815 0 3 68 28 0
0 3 0 45727208 102768 3077188 0 0 3031040 56 12011 854 0 4 56 39 0
2 1 0 45727128 102768 3077192 0 0 3000704 0 12022 827 0 4 58 38 0
1 2 0 45727464 102768 3077192 0 0 3033776 0 12074 825 0 5 51 45 0
1 2 0 45727432 102768 3077192 0 0 3014200 0 12033 841 0 3 76 22 0
2 2 0 45727344 102768 3077192 0 0 3049248 0 12019 830 0 5 57 39 0
2 1 0 45727448 102768 3077192 0 0 2985416 0 11826 824 0 4 58 38 0
0 3 0 45727336 102776 3077192 0 0 2906288 48 11599 821 0 4 50 46 0
2 2 0 45727168 102776 3077192 0 0 3014656 0 11970 833 0 2 76 22 0
Yup. We are sustaining 3 GB/s reads. The dd output tells the tale:
[root@jr5-1-1 ~]# dd if=/data/brick-sdd2/big.file of=/dev/null ...
4096+0 records in
4096+0 records out
68719476736 bytes (69 GB) copied, 67.3555 seconds, 1.0 GB/s
[root@jr5-1-1 ~]# dd if=/data/brick-sde2/big.file ...
4096+0 records in
4096+0 records out
68719476736 bytes (69 GB) copied, 67.3359 seconds, 1.0 GB/s
[root@jr5-1-1 ~]# dd if=/data/brick-sdc2/big.file ...
4096+0 records in
4096+0 records out
68719476736 bytes (69 GB) copied, 68.1395 seconds, 1.0 GB/s
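Those per-stream numbers line up with the vmstat view. A quick check of the arithmetic, with sizes and times taken from the dd output above:

```python
# Per-stream dd read rates from the output above, plus the aggregate
# across the three simultaneous streams.
size = 68719476736                       # bytes per file (64 GiB)
times = [67.3555, 67.3359, 68.1395]      # seconds, one per stream
rates = [size / t / 1e9 for t in times]  # GB/s per stream
aggregate = sum(rates)
for r in rates:
    print(f"per-stream: {r:.2f} GB/s")
print(f"aggregate:  {aggregate:.2f} GB/s")
```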
Nice :) Our target is to hit 8 GB/s with this cluster. With 24 RAID cards, each writing around 820 MB/s and reading around 1 GB/s, I am not so concerned about the storage.
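A rough back-of-the-envelope on that target, using the node and card counts and the per-card rates from above:

```python
# Headroom check for the 8 GB/s aggregate target, using the
# measurements above (8 nodes x 3 RAID cards = 24 cards).
cards = 8 * 3
target_mb_s = 8000           # 8 GB/s aggregate target
per_card_needed = target_mb_s / cards
write_per_card = 820         # measured MB/s per card, writes
read_per_card = 1000         # measured MB/s per card, reads
print(f"needed per card: {per_card_needed:.0f} MB/s")
print(f"write headroom:  {write_per_card / per_card_needed:.1f}x")
print(f"read headroom:   {read_per_card / per_card_needed:.1f}x")
```

Each card only needs to deliver about a third of what it is already measured doing, which is why the storage side looks comfortable.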