Why … oh … why …

Dear Red Hat: You put out a good product in RHEL 6.x. Ignoring the (often massive) performance regressions, other things are better/more stable. Dracut is growing on me. Actually liking being able to debug startup. But, this said … I have to inquire … Why on earth did you include an End-Of-Lifed version of Perl …

RIP Kyril Faenov

Kyril Faenov of Microsoft passed away several days ago. He was one of the visionaries and leaders behind Microsoft’s HPC effort. He was also a nice guy, one whom I had a chance to talk with several times over the last few years. One of the bright folks you like to challenge. I respected him …

Stress analysis of a market … does this explain Facebook's IPO issues?

c.f. this post at ZeroHedge. The problem is we don’t transact in a normal world, but one dominated by central banks and algorithms – which is why the most pressing question for those who grasp the real new normal is how come in a market as controlled and manipulated as the central bank-dominated venue we …

Misalignment of performance expectations and reality

We are working on a project for a consulting customer. They’ve hired us to help them figure out where their performance is being “lost”.
Without naming names or revealing information, I note something interesting about this that I’ve alluded to many times before.
There is often a profound mismatch between expectations for a system and what it actually achieves.
This is, in large part, why we benchmark and test our systems in configurations that are as realistic as possible, and report real numbers, while many (most) of our competitors make WAGs at best case/best effort/best condition theoretical numbers.
This said, part of the problem with performance expectations is the assumptions underlying them. One of the things I rail on the current Gluster marketing efforts about is related to the same assumptions. And these assumptions are used as the basis of statements (and in some cases, marketing materials) that are … wrong … at best.
But it’s not just Gluster marketing that has this problem. These core assumptions are often (completely) wrong for a fairly wide range of things, and yet they are used as the basis for many marketing claims that are, at best, specious. Worse than this are benchmark tests that are fatally flawed, unrepeatable, or otherwise broken, and poorly representative of real workloads.
Add it all up and you have no real mechanism to predict performance based upon what is published.
Ok … here’s a simple example. You have a disk. Let’s call it a SATA 7.2kRPM drive. You can get 100 MB/s out of it (ok, I know it’s a little more today, just using easy numbers to make life simpler). If I have 10 of these, I get 1GB/s, right? And 20 will give me 2GB/s. #winning !
Not quite.
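The naive arithmetic above, next to one way it can fall short, might be sketched like this. It is a minimal sketch under stated assumptions: the 100 MB/s per-disk figure is the round number used above, while the idea of a single shared ceiling (a controller or a network link, say) and its 1200 MB/s value are hypothetical, chosen only for illustration and not measured on any system.

```python
def naive_aggregate_mb_s(n_disks, per_disk_mb_s=100.0):
    """Marketing-style estimate: bandwidth scales linearly with disk count."""
    return n_disks * per_disk_mb_s


def capped_aggregate_mb_s(n_disks, per_disk_mb_s=100.0, shared_ceiling_mb_s=1200.0):
    """Same estimate, limited by a shared bottleneck.

    shared_ceiling_mb_s is a hypothetical number for illustration only.
    """
    return min(naive_aggregate_mb_s(n_disks, per_disk_mb_s), shared_ceiling_mb_s)


if __name__ == "__main__":
    for n in (10, 20):
        print(n, "disks: naive", naive_aggregate_mb_s(n),
              "MB/s, capped", capped_aggregate_mb_s(n), "MB/s")
```

With these made-up numbers, 10 disks fit under the ceiling, while 20 disks deliver the ceiling rather than 2GB/s.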


siFlash tuning

We’ve been tuning our siFlash. Not done yet … not done, but look where we are. 24 simultaneous streaming (non-cached) reads.
    Run status group 0 (all jobs):
       READ: io=193632MB, aggrb=7781.4MB/s, minb=7781.4MB/s, maxb=7781.4MB/s, mint=24884msec, maxt=24884msec
Yeah. Baby. Added another almost GB/s to the read performance. Streaming write performance is hovering around 2.6GB/s. Remember, this is a …
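As a sanity check on the reported summary line, the aggregate bandwidth can be recomputed from the total I/O and the runtime. This is a small sketch: the three input numbers are taken from the output above, the variable and function names are made up for illustration, and dividing evenly across the 24 streams is an assumption (the per-job numbers are not shown here).

```python
# Values copied from the summary line above.
io_mb = 193632        # total data read, in MB as reported
runtime_msec = 24884  # mint/maxt from the same line
streams = 24          # simultaneous streaming reads


def throughput_mb_s(io_mb, runtime_msec):
    """Aggregate bandwidth = total I/O divided by elapsed seconds."""
    return io_mb / (runtime_msec / 1000.0)


aggregate = throughput_mb_s(io_mb, runtime_msec)
per_stream = aggregate / streams  # assumes the streams share bandwidth evenly

if __name__ == "__main__":
    print(f"aggregate:  {aggregate:.1f} MB/s")
    print(f"per stream: {per_stream:.1f} MB/s")
```

The recomputed aggregate agrees with the aggrb value in the summary line to one decimal place.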

What high performance isn't

We’ve had a number of interesting interactions with customers over the last few weeks. They all seem to center on, and around, how to get high performance out of gear which isn’t designed for high performance. Generally speaking, you can’t. High performance requires a mixture of design and implementation, with well designed and implemented parts. …

An NFS gotcha

As we rebuild our server infrastructure (aside from taking time to do things more intelligently), we run into some bumps. This one sorta threw me for a bit.
    [root@virtual ~]# mount -a
    mount.nfs: Stale NFS file handle
    mount.nfs: Stale NFS file handle
Checked all the usual suspects. No dice. The /etc/exports was correct, and visible …

When core assumptions that should never be wrong, do turn out to be wrong

So … where does this tale begin?
We had a nice backup system in place at the lab. Twice a week, all the important servers would happily sync their contents to this unit over Gigabit ethernet. It worked well, we were happy.
Place that snippet in the background, it will come up again.
I’ve told our customers for a long time that RAID is not a backup. RAID is RAID, it gives you time to recover from a failure. But it is not a backup in and of itself.
And it gives you time to recover from some failures. Not all failures.
RAID1 lets you survive a single disk failure.
RAID5 likewise lets you survive a single disk failure.
RAID6 lets you survive 2 disk failures.
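The failure tolerances just listed can be written down as a small lookup. This is only a sketch: it assumes a plain two-disk mirror for RAID1, the names are made up for illustration, and it says nothing about the failures RAID does not cover at all (which is the point of this story).

```python
# Disk failures each level tolerates before data is lost.
# RAID1 here is assumed to be a simple two-disk mirror.
FAILURES_TOLERATED = {"RAID1": 1, "RAID5": 1, "RAID6": 2}


def survives(level, failed_disks):
    """True if an array at this RAID level still has its data."""
    return failed_disks <= FAILURES_TOLERATED[level]


if __name__ == "__main__":
    for level in ("RAID1", "RAID5", "RAID6"):
        for failed in (1, 2, 3):
            print(level, failed, "failed ->", "ok" if survives(level, failed) else "data lost")
```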
Our primary company storage is RAID6. With RAID1 for home directories, OS, important files, etc.
It was being backed up 2x per week.
At our new place, last week, Tuesday morning I believe, I came in and found circuit breakers tripped. Odd, I thought. The UPSes were howling at me; one had already shut down.
Those among us who have managed hardware for any length of time know where this is going. And they are right. But, please do continue to read on, as there are some twists. And not happy ones.
