SLES 11 does not correctly support software RAID1 for boot disk

I’ve been chasing down a problem for a few days on a SLES 11 load. I’ve tried basic mdadm as well as the “Intel RAID”. I’ve also modified some of the mkinitrd scripts so that the build doesn’t error out and actually produces an initrd.
But the initrd never includes the mdadm binary or the /etc/mdadm.conf file. So a boot with the new initrd can’t assemble the RAID correctly, and can’t do a correct switchroot to the RAID device.
This is annoying.
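A quick way to see whether the pieces actually made it into the image, before rebooting on it, is to list the initrd’s contents and look for the mdadm binary and its config. Here is a minimal sketch, assuming a gzip-compressed cpio initrd and a cpio binary in the PATH; the initrd path is just an illustrative placeholder, not a specific SLES 11 image name.

```python
#!/usr/bin/env python
# Minimal sketch: list the contents of a gzip-compressed cpio initrd and
# report whether the mdadm binary and /etc/mdadm.conf made it in.
# INITRD is an illustrative placeholder, not a specific SLES 11 image name.
import gzip
import subprocess

INITRD = "/boot/initrd"  # point this at the image mkinitrd just built

def initrd_contents(path):
    """Return the file list of a gzip'd cpio initrd via `cpio -t`."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    proc = subprocess.Popen(
        ["cpio", "-t", "--quiet"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    out, _ = proc.communicate(data)
    return out.decode("utf-8", "replace").splitlines()

if __name__ == "__main__":
    names = initrd_contents(INITRD)
    for needle in ("sbin/mdadm", "etc/mdadm.conf"):
        status = "present" if any(needle in name for name in names) else "MISSING"
        print("%-16s %s" % (needle, status))
```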
I’ve pointed out here before that sometimes what distros do to differentiate themselves eventually winds up making things really bad, be it poor choices of utilities and tools, or bad configuration options. Ubuntu has suffered from this disease of late; 9.10 isn’t good.
SLES 11 has its upsides. Some of it is good. But not allowing software RAID1 on OS drives?


A tale of an RFP gone wrong

… sadly this appears to be true.
Specifications were given, and we met the requirements, which entailed a demonstration of a particular level of performance over NFS.
In case you aren’t sure, we demonstrated a sustained 1 GB/s over NFS between two boxes over 10GbE last year. There aren’t too many companies that can do this. Our results were with a RAID6 storage target and an NFS client with a small amount of RAM. The total read and write sizes were each much larger than either system’s memory. This wasn’t a cache test; data was going to and coming from disk.
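For reference, the general shape of such a test is simple: stream far more data than the client has RAM through the NFS mount and time it, so the page cache can’t flatter the result. Here is a minimal sketch of the write side, with the mount point and sizes as illustrative assumptions rather than our actual harness:

```python
#!/usr/bin/env python
# Minimal sketch of a sustained-write measurement over an NFS mount.
# Mount point and sizes are illustrative assumptions; the point is that the
# total far exceeds client RAM, so this is not a cache test.
import os
import time

TEST_FILE = "/mnt/nfs/streamtest.dat"   # hypothetical NFS mount
BLOCK     = 4 * 1024 * 1024             # 4 MiB per write
TOTAL     = 64 * 1024 ** 3              # 64 GiB, well beyond client RAM

def stream_write(path, total, block):
    """Write `total` bytes in `block`-sized chunks; return bytes/second."""
    buf = b"\0" * block
    written = 0
    start = time.time()
    with open(path, "wb") as f:
        while written < total:
            f.write(buf)
            written += block
        f.flush()
        os.fsync(f.fileno())            # make sure the data really left the client
    return written / (time.time() - start)

if __name__ == "__main__":
    rate = stream_write(TEST_FILE, TOTAL, BLOCK)
    print("sustained write: %.2f MB/s" % (rate / 1e6))
```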
With that stage set: a very high data rate over NFS was required for the RFP, which we met quite easily. Our competitors … not so much. They never performed the measurement we did.


On the difference between marketing numbers, and measured numbers …

I should define what I mean by marketing numbers. These are best-effort benchmark numbers assuming the best of all possible test cases, with equipment functioning solely for the purposes of the benchmark. These are not results you will normally achieve in practice; they represent an extreme in performance. Measured benchmark numbers are sensitive to …

On the importance of speed … part 1 of likely many

Raw, end-user-accessible performance for data motion and data storage is rapidly becoming one of the most important problems in any HPC system. We’ve been talking about it for years, but it’s getting far more important by the day. And not just in HPC.
I just spent a long time on the phone with someone from a government agency talking about their need for high-performance storage and analytical capability. We hear this refrain quite commonly: FC4/FC8 is simply too slow for their workloads, and they need to go faster. Can we help?
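To put rough numbers on “too slow”: FC4 and FC8 deliver on the order of 400 MB/s and 800 MB/s of usable bandwidth per link, so moving a large data set through them takes a long time. A back-of-the-envelope sketch, with the 50 TB data set size purely an assumed example:

```python
#!/usr/bin/env python
# Back-of-the-envelope: how long does it take to move an (assumed) 50 TB
# data set at typical per-link rates?  Rates are approximate usable
# throughput, not marketing line rates.
DATASET_TB = 50                          # assumed working-set size
RATES_MB_S = {
    "FC4":   400,                        # ~4 Gb/s link, usable
    "FC8":   800,                        # ~8 Gb/s link, usable
    "10GbE": 1000,                       # roughly 1 GB/s, as demonstrable over NFS
}

dataset_mb = DATASET_TB * 1000 * 1000    # decimal TB -> MB
for link, rate in sorted(RATES_MB_S.items()):
    hours = dataset_mb / float(rate) / 3600.0
    print("%-8s %7.1f hours" % (link, hours))
```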
Digging into this and listening to the problem, it’s self-evident that what this person is describing is very much an HPC problem, just couched outside of the HPC lingo.


I admit it, I am conflicted

We have been sent an RFP from a university we have some bid history with. Our history has been one of not winning the business. The winning bids sometimes (often) deviate wildly from the specifications as we read them. One thing I have learned from my experience with them is that the single most important aspect of any bid is the price we present to them.
You might think “well … duh” but it’s more subtle than that. That is, if we purposefully underspec’ed our bid, and offered the underspec’ed system, it would likely win. I was in a bid opening when I saw some nameless vendor do exactly this, at this university. And win.
If those were the only issues, we might not have any real conflicts.
Generally, university bids are hard. You know a priori that if you win, you will not make much money. We won’t bid on things that we will lose money on. And we are a business, so we can’t afford to pay a university to take our gear. That makes no sense.
But that’s still not why I am conflicted. It was the last thing we bid on … actually the process … that I thought was … well …


"Sustaining" strategies and startups

I read an article on InsideHPC.com that I can’t say I agree with.
The discussion of creative destruction is correct. You create a new market by destroying an old market. That has happened many times in HPC, by enabling better, cheaper, faster execution. If our SGI boxen of days gone by were 1/100th the cost of the Cray Y-MP of the time, at 20% of the performance, who won that battle? That works out to roughly 20× the price/performance. In all but a vanishingly small number of cases, SGI won. The same system, with an easy migration path (created by removing barriers to migration), eventually won the market from the more expensive platform.
There was nothing particularly new about the approach: it was a motherboard, RAM, and an IO channel. It was better and cheaper than the vector machines of the time. And they decimated the vector market.
Today, we see exactly this with GPUs. It took a while for NVidia to comprehend that end users will not port a large Fortran code to C/C++ just to use the accelerator technology. So they worked with PGI to get a Fortran compiler out. And that massively increased the size of their addressable market by removing a barrier to adoption. (I could go into a long series of posts on why it’s a really bad idea(TM) to insist that all a user has to do is port their code to your nice shiny new system … this is a guaranteed path to failure … you have to make it easy for them to move … drop-in replacement … at low cost.)

Read more"Sustaining" strategies and startups