OT: been very busy …

Good version of busy; lots of quotes, orders, builds, …. A new market has emerged for us, one I wasn’t sure how to break into, that looks like it is going to do good things for us. Entrenched expensive and slow competitor, everyone looking for better systems. Should be interesting coupla months. I hope I …

Still struggling with half-open and otherwise broken drivers

We have a nice pair of Qlogic 7220 DDR HCAs in house. Direct connecting a pair of machines for a simple point to point bit.
Using our updated 2.6.32.39.scalable kernel.
Want to set up SRP target. So we have to get OFED compiled. Need 1.5.3+ due to their … er … issues tracking kernels.
Basically the OFED build process is an abuse … a very severe one … of the RPM process.
RPMs should build, correctly, by default with
rpmbuild -ba spec-file.spec

… but … OFED … doesn’t … quite …
If you read through the install.pl, you see all the special casing they do for various drivers/subsystems. That is, they haven’t really resynced the OFED stack and the kernel drivers in a while, so it’s possible to build a late model kernel, say something in the 2.6.3x (x >= 6) and get 1.4.2 era-ish drivers.
Which means, if you have new-ish hardware, like things that require the qib driver (Qlogic updated driver to replace ipath driver), you are SOL.
Yeah, we are sort of in that boat. So I hacked the install.pl to make sure that it actually builds the qib driver. And packages it.
Which it doesn’t do by default. Remember that broken RPM process? I’d honestly hold up this stack as exhibit “a” of what not to do with RPM.


Updated JackRabbit JR5 results

Lab machine, updated RAID system (to our current shipping specs).
We’ve got a 10GbE and an IB DDR card in there for some end user lab tests over the next 2 weeks.
We just finished rebuilding the RAID unit, and I wanted a baseline measurement. So a fast write then read (uncached of course).

[root@jr5-lab fio]# fio sw.fio
...
Run status group 0 (all jobs):
  WRITE: io=195864MB, aggrb=3789.1MB/s, minb=3880.1MB/s, maxb=3880.1MB/s, mint=51680msec, maxt=51680msec

That’s the write.
Here’s the read.

[root@jr5-lab fio]# fio sr.fio
...
Run status group 0 (all jobs):
   READ: io=195864MB, aggrb=4639.3MB/s, minb=4750.6MB/s, maxb=4750.6MB/s, mint=42219msec, maxt=42219msec

Yeah.
Streaming 196GB in 42.2s. 4.6GB/s sustained read.
System has 32GB ram, RAID cache is 512MB. No SSDs were used for caching this file system (reads/writes).
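
As a quick sanity check on the numbers above, here is a minimal sketch that recomputes average throughput from the io= and mint/maxt fields in the two fio summaries. The function name is made up for illustration, and it assumes fio's "MB" and "MB/s" use the same unit; depending on the fio version that unit may be binary (MiB), which does not change the ratio.

```python
# Figures copied from the two fio "Run status" lines above.
IO_MB = 195864       # io= field, same for the write and read runs
WRITE_MSEC = 51680   # mint/maxt for the write run
READ_MSEC = 42219    # mint/maxt for the read run


def throughput_mb_s(io_mb, msec):
    """Average throughput in MB/s for io_mb megabytes moved in msec milliseconds.

    Hypothetical helper for illustration, not part of fio.
    """
    return io_mb / (msec / 1000.0)


write_mb_s = throughput_mb_s(IO_MB, WRITE_MSEC)
read_mb_s = throughput_mb_s(IO_MB, READ_MSEC)

print(f"write: {write_mb_s:.1f} MB/s")
print(f"read:  {read_mb_s:.1f} MB/s")
```

Both results should land close to the aggrb values fio reports, which is all this is meant to confirm.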


IT storage

They see a shiny new storage chassis with 6G backplane. They fill it with “fast” drives, and build “raids” using integrated RAID platforms.
They insist it should be fast, showing calculations that suggest that it should sustain near theoretical max performance on IO.
Yet, the reality is that it’s 1/10th to 1/20th the theoretical max performance.
What’s going on?
In the past, I’ve railed against “IT clusters” … basically clusters designed, built, and operated by IT staff unfamiliar with how HPC systems work. They share a number of traits, all partially or mostly anathema to high performance computing. I won’t re-hash that post; you can search for it.


Unbelievable

A system designed to fail often will. Seen this a few times this past week. In one case, someone agrees that what we do and our machines have value, but wants our stuff without paying us for our stuff. They don’t want to buy them. They just want us to tell them how to build …