As the storage cluster builds …

Finally finished the Tiburon changes for the storage cluster config. Storage clusters are a bit different than computing clusters in a number of regards, not the least of those being the large RAID in the middle.
In this case, the storage cluster is 8 identical JackRabbit JR5 units, each with 24 TB of storage, 48 drives, 3 RAID cards, and dual-port QDR cards. For our testing we are using an SDR network, as we don’t have a nice 8-port QDR switch in house.
Tiburon is our cluster load and configuration system. It is designed to be as simple and as unobtrusive as we can make it … it does all the heavy lifting in our finishing scripts, taking a base OS install and configuring it in as much detail as we require.

To configure the cluster, we started with a base OS load. We are using CentOS 5.3 right now, but could just as easily use something different (SuSE, Ubuntu, …). We went with CentOS in large part because OFED usually builds cleanly on it.
The beauty of Tiburon is that we can change the base load underneath the configuration, and it won’t impact the configuration scripts much (though some of the differences in file placement and format need to be accounted for). We can do things to nodes that, well, other cluster load systems find simply impossible, as they usually impose a rigid worldview on how to configure them.
So now our bare-metal storage cluster nodes PXE boot on startup, and within oh … 15 minutes or so, we have a fully up node, with multiple RAID6s still building, yet storage is already available on the network.
With a little more work, we can automate specific target configuration in this, and I’ll be working on that tomorrow.
I guess I haven’t talked about our target configuration tool yet … I should at some point 🙂 . Think of iSCSI configuration made easy. Or NFS. Or (insert your favorite file/block storage mechanism). It’s still in alpha, but is showing great promise as a powerful tool to enable simple configuration of block and file targets.
Again, the idea is to make everything easy. Bring great power to bear, and get out of the way of the user if they want to do things on their own.
But I digress.
I just ran a basic performance test. Our simple little “let’s stream 3 simultaneous 64GB streams to and from disk”. Nothing special really. The machine has 48GB of RAM and the streams total 192GB, so cache isn’t in the picture.
Needless to say, I like the results. I suspect the customer will as well. The preliminary results … er … exceed my expectations by a bit on a per machine and per RAID basis.
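For the curious, here is a rough Python sketch of that kind of streaming test. It is not the tool we actually ran, and the names `write_stream`, `read_stream`, and `run_streams` are made up for illustration. The one thing that matters is that the total streamed (3 × 64GB = 192GB) is several times the 48GB of RAM, so the page cache can't hide the disks. The demo at the bottom uses tiny files so it runs anywhere; it writes zeros, which would flatter any storage that compresses data.

```python
import os
import tempfile
import threading
import time


def write_stream(path, total_bytes, block_size=1 << 20):
    """Sequentially write total_bytes of zeros to path, then force it to disk."""
    block = b"\0" * block_size
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(block_size, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually left the page cache


def read_stream(path, block_size=1 << 20):
    """Sequentially read path to the end and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def _run_parallel(target, args_list):
    """Run target once per argument tuple in its own thread; return wall time."""
    threads = [threading.Thread(target=target, args=a) for a in args_list]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_streams(directory, n_streams=3, bytes_per_stream=64 * 2**30):
    """Write, then read back, n_streams files at once; report aggregate MB/s."""
    paths = [os.path.join(directory, f"stream{i}.dat") for i in range(n_streams)]
    write_s = _run_parallel(write_stream, [(p, bytes_per_stream) for p in paths])
    read_s = _run_parallel(read_stream, [(p,) for p in paths])
    total = n_streams * bytes_per_stream
    return {
        "paths": paths,
        "write_MBps": total / write_s / 1e6,
        "read_MBps": total / read_s / 1e6,
    }


if __name__ == "__main__":
    # Tiny demo: 3 streams of 16 MiB each in a scratch directory.
    # At this size the reads come straight from cache, so the numbers mean nothing.
    with tempfile.TemporaryDirectory() as scratch:
        result = run_streams(scratch, n_streams=3, bytes_per_stream=16 * 2**20)
        print(f"write: {result['write_MBps']:.0f} MB/s, read: {result['read_MBps']:.0f} MB/s")
```

For a real run you would point `directory` at the RAID mount and leave `bytes_per_stream` at its 64GB default.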
We have some additional things to do to the nodes tomorrow and Tuesday.
Hoping to have raw performance benchies up around Thursday/Friday.