Are HPC cloud users' expectations realistic?

Several years ago, before clouds were all the rage, we were working with a large customer on an “on-demand” HPC computing service. This service predated Amazon’s offering, and was more in line with what Sabalcore, CRL, and others are doing.
I remember distinctly from my conversations with the customer that they had particular desires. Specifically, they wanted to always run on the latest/greatest/fastest possible hardware, and not pay any more for it. A new CPU from Intel or AMD available? Sure, we had to have it fast, in our systems, for them to run on. And it couldn’t cost more than the existing systems.
Fast forward to today. A customer leveraging our small internal cluster isn’t happy with some of the CPU performance. Again, I have a sense that their expectation, and frankly, most customers’ expectations, are that they will get the latest and greatest for real cheap, as soon as it is available.
Moreover, this current customer was having issues with the queuing system. Smaller runs ran fine, but the bigger run had issues. It’s obviously an environment interaction between their code and the scheduler. One that is resolvable.
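As an aside, one cheap way to chase down that sort of scheduler/environment interaction is to dump the environment from inside a small job and a big job, then diff the two. A minimal sketch (entirely illustrative; nothing here is from the customer's actual setup):

```python
#!/usr/bin/env python
# Illustrative helper: dump the job environment, sorted, so the output of a
# small run and a big run can be diffed to spot scheduler-injected
# differences (hostlists, scratch dirs, MPI settings, and so on).
import os
import sys

def dump_env(out=sys.stdout):
    # Sorted keys so two dumps line up cleanly under diff(1).
    for key in sorted(os.environ):
        out.write("%s=%s\n" % (key, os.environ[key]))

if __name__ == "__main__":
    dump_env()
```

Run it as the first step of each job script, capture the output per job size, and diff; the variables that change with job scale are usually where the interaction hides.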

Read more: Are HPC cloud users' expectations realistic?

Throwing signs

Too funny: [had to update, as the folks putting the image up started blocking our link back to them … I thought we did this correctly … wasn’t trying to steal bandwidth]

Oh what a day

No details, but this is the sort of day I can do without in terms of excitement. Tonight is fight night in karate. Maybe I can suit up and hit with my good hand. Yeah, it’s been one of those days.

As the high performance storage world evolves …

Last year, say in the July time frame, if you had asked me to name the top high performance computing file systems, and to prognosticate on the up-and-comers … well, you’d have gotten lists much like the ones I’ve given here in the past.
Lustre was the “king” and undisputed leader. pNFS was (sorry, Bruce and team) effectively perpetually in the future (yeah, sort of like Perl6 … though we intend to play with both sometime soon … I hope). FhGFS was really niche. GlusterFS was an up-and-comer. Ceph was interesting, but way too early to see what it could do. There were some dark horses … POHMELFS and others … that could make a case that they should have a seat at the table some time.
Now fast forward 8 months or so.
The Lustre community has multiple “leaders”, none of whom want to say the magic F-word (fork, that is). Its future under Oracle has always been a case of “fish or cut bait“. And, seemingly, they have cut bait. A valid business decision on their part, given the company, its direction, and its market set.
I am under NDA with a number of folks, so I can’t say much more about them. Let’s just say that over the last month or so, I’ve come to the realization that some of the up-and-comers are not, at least in the HPC space.

Read more: As the high performance storage world evolves …

My kingdom for good error messages … or something like that

I just spent too long tearing my (altogether far too few remaining) hair(s) out over a driver issue.
QLogic 7240 IB card. Decent DDR unit. Our 2.6.32.22 kernel. Very stable kernel. Rock solid under ridiculous load.
OFED 1.5.2 with all the nice bug fixes etc.
And inserting/removing the qib module would cause all manner of kernel hiccups. So much for stability.
Well, that is, as long as ib_ipath.ko from the kernel RPM was in there. Make that go away, and stuff works.
Yeah … stuff. Works.
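For anyone chasing something similar, here is a minimal, purely illustrative sketch (only the two module names come from this post; everything else is an assumption) that flags the conflicting pair of drivers being loaded at once:

```python
#!/usr/bin/env python
# Hypothetical diagnostic: warn if both the older ib_ipath driver and the
# newer ib_qib driver are loaded at the same time, since (per the post)
# having ib_ipath.ko present destabilized qib on this hardware.

def loaded_modules(path="/proc/modules"):
    """Return the set of currently loaded kernel module names."""
    with open(path) as f:
        return set(line.split()[0] for line in f)

if __name__ == "__main__":
    mods = loaded_modules()
    conflict = {"ib_ipath", "ib_qib"} & mods
    if conflict == {"ib_ipath", "ib_qib"}:
        print("WARNING: ib_ipath and ib_qib are both loaded;")
        print("consider keeping ib_ipath from loading (e.g. a blacklist")
        print("entry under /etc/modprobe.d/)")
    else:
        print("loaded IB drivers: %s" % (", ".join(sorted(conflict)) or "none"))
```

The actual fix here was simply keeping ib_ipath.ko out of the picture; on most distros that would mean a modprobe blacklist entry, but check your own setup.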
Meanwhile, the error messages are exceedingly helpful. Like this:

Read more: My kingdom for good error messages … or something like that