Archive for September, 2009

times like this put a smile on my face …

Tuesday, September 29th, 2009

We are running some burn-in tests on the JackRabbit storage cluster. 6 of 8 nodes are up, 2 need to be looked at tomorrow.

On one of the nodes, we have 3 RAID cards. Because of how the customer wants the unit, it is better for us to have 3 separate file systems. So that's what we have. They will all be aggregated shortly (hopefully tomorrow) with a nice cluster file system and some InfiniBand goodness.

OK. I wanted to stream some writes and reads to each file system: 3 at a time, one to each file system. Make each stream larger than RAM, so the cache cannot absorb it. Caching doesn’t mix well with streaming, and it interferes with measuring the raw horsepower of the underlying system.

So here I am with 3 writes. I lit off a vmstat 1 in another window, just to see what was happening.

The bo column is the number of 1k blocks output in the time interval (1 second). So do a quick multiplication by 1000 (1024, strictly) to get the aggregate output in bytes per second.
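The conversion above can be sketched in a few lines of Python. This is only an illustration: the 1024-byte block size is an assumption about the local vmstat build, `bo_to_mb_per_s` is a hypothetical helper name, and the sample reading is made up rather than taken from the test run.

```python
# vmstat reports "bo" in blocks per second. This sketch assumes 1 KiB blocks;
# check the vmstat man page on your system before trusting the numbers.
BLOCK_BYTES = 1024


def bo_to_mb_per_s(bo_blocks_per_s, block_bytes=BLOCK_BYTES):
    """Convert a vmstat 'bo' reading to megabytes per second (1 MB = 1e6 bytes)."""
    return bo_blocks_per_s * block_bytes / 1e6


if __name__ == "__main__":
    sample_bo = 500_000  # hypothetical reading, not a measurement from this cluster
    print(f"{bo_to_mb_per_s(sample_bo):.1f} MB/s")
```

With three streams running at once, the bo value already covers all of them, so a single conversion gives the aggregate rate.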

(more…)

As the storage cluster builds …

Sunday, September 27th, 2009

Finally finished the Tiburon changes for the storage cluster config. Storage clusters are a bit different than computing clusters in a number of regards, not the least of those being the large RAID in the middle.

In this case, the storage cluster is 8 identical JackRabbit JR5 units, each with 24 TB storage, 48 drives, 3 RAID cards, dual port QDR cards, and for our testing, we are using an SDR network (as we don’t have a nice 8 port QDR switch in house).

Tiburon is our cluster load and configuration system. It is designed to be as simple and as unobtrusive as we can make it. All the heavy lifting happens in our finishing scripts, which take a base OS install and configure it in as much detail as we require.

(more…)

Is RAID over?

Thursday, September 24th, 2009

Henry Newman and a few other people I know are talking about RAID as being on the way out. John West pointed at this article this morning on InsideHPC. Their points are quite interesting.

It boils down to this: if the time to rebuild a failed RAID is comparable to the mean time between uncorrectable errors (UCE) at the read/write volume involved, then RAID, as it is currently thought of, is going to need some serious rethinking.

Put another way, if you are more likely than not to suffer an uncorrectable error during a rebuild, then rebuilding is a bad thing … and since this is one of the central pillars of RAID …
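To put rough numbers on "more likely than not", here is a minimal sketch. It assumes errors are independent and occur at a constant per-bit rate; the default of 1e-14 per bit is an assumed spec-sheet style figure, not something taken from the article, and `p_uce_during_rebuild` is a hypothetical name.

```python
import math


def p_uce_during_rebuild(bytes_read, uce_per_bit=1e-14):
    """Probability of at least one uncorrectable error while reading bytes_read.

    Assumes independent errors at a constant per-bit rate. The default rate
    is an assumption for illustration, not a measured value.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed stably for tiny p and very large n
    return -math.expm1(bits * math.log1p(-uce_per_bit))


if __name__ == "__main__":
    for tb in (1, 10, 24):
        p = p_uce_during_rebuild(tb * 1e12)
        print(f"{tb:>3} TB read during rebuild: P(at least one UCE) = {p:.2f}")
```

Under these assumptions the probability passes 50% once a rebuild has to read somewhere around 8.7 TB. A different error rate moves that point in inverse proportion.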

So what are the options?

(more…)

Been horrifically busy … good busy … but busy

Thursday, September 24th, 2009

Will try to do updates soon, and I owe someone two articles (sorry!). Add to this fighting off a cold … not a happy camper.

Basically we are building an 8x JackRabbit JR5 storage cluster right now. I’ve caught a problem in Tiburon, our OS loader, in the process, and am fixing it. Tiburon is all about providing a very simple platform to enable PXE (and/or iSCSI) booting OSes to make installation/support simple. It uses our finishing scripts, which take a basic OS load and finish it, or polish it for the task at hand.

(more…)

M&A: Microsoft buys the *assets* of Interactive Supercomputing

Tuesday, September 22nd, 2009

As seen on InsideHPC, John West notes that the assets of Star-P were purchased by Microsoft today.

Parsing of words is important. The phrase “acquired the assets of X” means that the IP was purchased. John points to the blog post where Kyril Faenov mentions that some of the staff will work at the Microsoft Cambridge site.

This is, sadly, not a great exit for Star-P.

Acquiring assets usually means the choice has been to shut down the company, and auction the bits off, or find a buyer for the distressed assets and then wind down the rest of the organization that doesn’t go with the assets.

(more…)

The looming (storage) bandwidth wall

Monday, September 21st, 2009

This has been bugging me for a while. Here is a simple measure of the height of the bandwidth wall. Take the size of your storage, and divide it by the maximum speed of your access to the data. This is the height of your wall, as measured in seconds. The time to read your data. The higher the wall, the more time you need to read your data.

OK, let's apply this in practice. Take a 160 GB drive that can read/write at 100 MB/s. Your wall height is 1600s (= 160GB / 0.1GB/s).

Take a large unit, like our 96TB high performance storage and processing unit. You get ~70TB available at 2GB/s. Your bandwidth wall height is then 35000s (= 70TB / 2E-3 TB/s).

I also wonder if it makes more sense to view this logarithmically … measure the wall height as the log base 10 of this ratio, lopping off the units (what is a log(second)?). So a 1600s wall height would be 3.2. A 35000s wall height would be 4.5. Sort of like the hurricane strength measures. A wall height of 1 second (say a fast memory disk) would be a 0 on this log scale.
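The wall height and its log-scale version can be sketched as follows. The function names are hypothetical; the two examples are the 160 GB drive and the ~70TB unit from the text, using decimal units (1 GB = 1e9 bytes).

```python
import math


def wall_height_s(capacity_bytes, bandwidth_bytes_per_s):
    """Time in seconds to read the whole store once at full bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


def wall_log10(capacity_bytes, bandwidth_bytes_per_s):
    """Wall height on the log scale: log10 of the height in seconds."""
    return math.log10(wall_height_s(capacity_bytes, bandwidth_bytes_per_s))


if __name__ == "__main__":
    examples = {
        "160 GB drive at 100 MB/s": (160e9, 100e6),
        "70 TB unit at 2 GB/s": (70e12, 2e9),
    }
    for name, (capacity, bandwidth) in examples.items():
        height = wall_height_s(capacity, bandwidth)
        print(f"{name}: {height:.0f} s, log scale {wall_log10(capacity, bandwidth):.1f}")
```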

Using this, you could get a sense of where the design points are for nearline and offline/archival storage.

This is part of a longer set of thought processes on why current large array designs, or Backblaze-like designs, are problematic at best for large storage systems.

(more…)

M&A continues: Dell snarfs up PDS

Monday, September 21st, 2009

This is going to make a few Dell partners (Wipro et al) nervous. Sort of like the HP acquisition of EDS did. Is it possible that the service providers are going to be snapped up now to provide differentiated value in the face of declining revenues for hardware?

Does this mean anything for HPC or storage?

(more…)

Twitter Updates for 2009-09-16

Wednesday, September 16th, 2009

Powered by Twitter Tools

We’re Back!

Monday, September 14th, 2009

We were knocked off the air around 11pm on 13-September, by a machine finally deciding to give up its ghost. A partially retired machine which happened to run scalability.org decided, finally, that it no longer wished to correctly run grub.

Grub being the thing essential to booting.

Like the bootloader.

Yeah. It was one of those nights.

(more…)

Using fio to probe IOPs and detect internal system features

Saturday, September 12th, 2009

Scalable Informatics JackRabbit JR3 16TB storage system, 12.3TB usable.

[root@jr3 ~]# df -m /data
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/sdc2             12382376    425990  11956387   4% /data

[root@jr3 ~]# df -h /data
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdc2              12T  417G   12T   4% /data

These tests are more to show the quite remarkable utility of the fio tool than anything else. You can probe real issues in your system (as compared to a broad swath of ‘benchmark’ tools that don’t really provide a useful or meaningful measure of anything).

This is on a RAID6, so it's not really optimal for seeks. The benchmark is 8k random reads, with 16 threads, each reading 4GB of its own file (64GB in aggregate, well beyond cache, but we are using direct IO anyway). It is a 16 drive RAID6 with 1 hot spare and 2 parity drives, which leaves 13 data drives. Using a queue depth of 31 per drive, these 13 data drives have an aggregate queue depth of 403 (13 x 31). Of course, in RAID6, it's really less than that, as you are doing 3 reads for every short read.
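The sizing arithmetic in that paragraph can be checked with a short sketch. The helper name is hypothetical, and the code only restates the numbers above. It says nothing about how the array actually lays out data or parity.

```python
def data_drives(total_drives, hot_spares, parity_drives):
    """Drives left for data after setting aside hot spares and parity."""
    return total_drives - hot_spares - parity_drives


if __name__ == "__main__":
    drives = data_drives(total_drives=16, hot_spares=1, parity_drives=2)
    per_drive_queue_depth = 31
    threads, gb_per_thread = 16, 4

    print(f"data drives:           {drives}")
    print(f"aggregate queue depth: {drives * per_drive_queue_depth}")
    print(f"aggregate read size:   {threads * gb_per_thread} GB")
```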

We get asked often if customers can benchmark our units for databases, and we tell them yes, with the caveat that we need to make sure they are configured correctly for databases (SQL type, seek based). This configuration is quite important.

Here is the fio input file:

(more…)
