The coming bi(tri?)furcation in HPC, part 1

This will be short. It's mostly a hat tip to Doug Eadline, who, in a very recent article, discusses something we have been talking about privately for a while.

Read the article, and afterwards, ponder a point he was discussing:

If one assumes that cores counts will continue to increase, the 64 core workstation may not be that far off. Back in the day, a 64 processor cluster was something to behold. Many problems still do not scale beyond this limit. Could we see split in HPC?

I believe so. Doug cautions people not to read too much into his words. That said, we are building very muscular desktops sporting 24 cores, 256 GB of RAM, 1+ GB/s IO channels, and accelerators of several flavors. Each of these machines may be sufficient for work that used to require a cluster.

What I would like to point out is that the script HPC has followed in the past, in terms of market change and growth, is being followed again.

First, the idea of “good enough” and “inexpensive” rules. Why buy a Mercedes when a Honda will do the same job just as well, with similar levels of comfort? Maybe it has less cachet on the nameplate, but it is just as good (if not better) on the inside, where it counts.

Second, HPC as a market always … always … goes down market. Many companies that have not understood this have been destroyed. A fair number of others are likely to be destroyed because they don’t grasp it. As many of us said when we were at SGI, you can’t paint a box purple and charge 3x what your competitors charge for it.

Twenty years ago, vector supers began to see the glimmerings of a challenge from the killer supermicros (no, not Supermicro.com; I mean machines built from single core processors on shared memory buses, with ‘large’ memory systems … several gigabytes in size). I ran on both (the vector machines and the supermicros).

Fifteen years ago, the battle was over, and the supermicros had won. There were these new Pentium II systems that most in the supermicro world looked down on. I ran some tests on them and found that the cost-benefit analysis was going to favor them in the longer term: 1/3 the performance for 1/10 the price. I guesstimated in 1995 that SGI had 5 years to make a technology shift or get left behind.
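To put “1/3 the performance for 1/10 the price” in per-dollar terms, here is a small Python sketch. The function and variable names are hypothetical, and the “boxes needed” step assumes the work splits perfectly across the cheaper machines, which real codes rarely do.

```python
import math
from fractions import Fraction

def perf_per_dollar_gain(relative_perf, relative_price):
    """Performance per dollar of the new box relative to the incumbent.

    Both arguments are new/old ratios, e.g. Fraction(1, 3) means one third
    of the incumbent's performance.
    """
    return relative_perf / relative_price

relative_perf = Fraction(1, 3)    # one third the performance
relative_price = Fraction(1, 10)  # one tenth the price

gain = perf_per_dollar_gain(relative_perf, relative_price)

# Assumes perfect scaling: how many cheap boxes match one incumbent,
# and what they cost together as a fraction of the incumbent's price.
boxes_needed = math.ceil(1 / relative_perf)
relative_cost = boxes_needed * relative_price

print(f"perf/$ gain: {gain} (~{float(gain):.1f}x)")
print(f"{boxes_needed} boxes at {float(relative_cost):.0%} of the price")
```

That works out to roughly 3.3x the performance per dollar: three of the cheap boxes match one incumbent for about 30% of its price, under the perfect-scaling assumption.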

Ten years ago, clusters started emerging with a vengeance. I still remember (and recently found in an old sent-mail archive on a machine I am discarding) a benchmark I ran in the 1999/2000-ish time frame for informatics codes, comparing Pentiums against fast R10k/R12k processors. The Pentiums were faster. And much less expensive. A bunch of us pushed SGI internally to get into the Linux cluster market, because we believed it would be big. Some of us also wanted to make Irix cheap, so that our fans could buy a used O2 on eBay and get the Irix OS and compilers cheaply. That is a really … really good way to jumpstart application porting and development. But by then, I was also playing with Linux side by side with Irix. I could see the writing on the wall.

Five years ago, the last major supermicros finished their retreat to the very high end (the shrinking portion) of the market.

From the top500.org data, have a look at where the green abruptly terminates.



The supermicros were the SMPs.

One year ago, accelerators began their emergence in earnest.

In every case, the impact on the market and on the vendors was severe. Cray almost went under 15 years ago; they are doing well now. SGI went under. Twice. Many exited the market or were bought up by rivals. Convex was bought by HP, as was Compaq, which had bought DEC.

But the impact on consumers was profound.

Price for performance dropped, usually by order(s) of magnitude. While you might not be able to sustain something near peak performance, what you were able to get was “good enough”. Or, as often happened, the new stuff on the block was better, cheaper, and faster, and the older companies pretty much had to buy every piece of business they got, which drove them under or forced them to be sold off.

Not only that, the market grew much larger: about an order of magnitude over 1990-2000, and roughly another order of magnitude over 2000-2009. What was once a $200M market became a $2B market and is now a $15B market.
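Those market sizes can be turned into annual growth rates with the standard compound annual growth rate formula, (end/start)^(1/years) - 1. This Python sketch uses only the dollar figures and years from the paragraph above; the function name is just illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market sizes from the text, in US dollars.
growth_1990s = cagr(200e6, 2e9, 2000 - 1990)  # 10x over 10 years
growth_2000s = cagr(2e9, 15e9, 2009 - 2000)   # 7.5x over 9 years

print(f"1990-2000: {growth_1990s:.1%} per year")
print(f"2000-2009: {growth_2000s:.1%} per year")
```

Both periods work out to roughly 25% growth per year, sustained for two decades.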

What VCs want entrepreneurs to develop is an understanding of which technology is going to alter the face of the industry and cause disruption, and, in theory anyway, VCs will help build companies to cause that disruption. Unfortunately, many VCs are now busily distracted by failing web 2.0 social media companies with no revenue and no profit (aka black holes for capital), as well as by LPs who are unhappy with their returns. Couple that with a decidedly un-sexy market … and you have a recipe for very little capital, which makes things harder unless your company is self-bootstrapping.

And the technologies have emerged. In a little self-aggrandizement, I’ll note that I picked accelerators years ago, and was dead on. Just like with clusters. So we know one of the emergent technologies. What about the others?

A big issue with clusters is the up-front capital cost. What if the cost to stand up the Nth node (N = 1 … some large number) were a marginal, incremental fee? What if you didn’t need to bear the capital cost? This is where clouds sort of fit in. This is what they promise. The one missing piece for them to really take off in HPC is data motion. As I have pointed out, moving data over a network is non-trivial, and it is not cheap. But the costs on the compute side scale well, and if you leverage Linux as the OS, your TCO approaches zero. You don’t need to own or maintain the machines; the service provider does. You just need to install your own app, or pay them to. And off you go.
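One way to frame the own-versus-rent question is a break-even point: how many node-hours per year you must actually use before owning a node beats renting one. The Python sketch below is a simplified model under stated assumptions: straight-line amortization, a flat annual operating cost, a flat hourly cloud rate, and no data-motion charges (the piece flagged above as the real obstacle). All dollar figures in the example are hypothetical placeholders, not quotes from any provider.

```python
HOURS_PER_YEAR = 24 * 365

def owned_cost_per_year(capex_per_node, lifetime_years, opex_per_node_year):
    """Yearly cost of owning one node: straight-line amortization of the
    purchase price plus power, cooling, admin, etc. (lumped into opex)."""
    return capex_per_node / lifetime_years + opex_per_node_year

def break_even_hours(capex_per_node, lifetime_years, opex_per_node_year,
                     cloud_rate_per_node_hour):
    """Node-hours per year above which owning is cheaper than renting."""
    owned = owned_cost_per_year(capex_per_node, lifetime_years,
                                opex_per_node_year)
    return owned / cloud_rate_per_node_hour

# Hypothetical inputs: $4000 node, 3-year life, $1000/yr to run it,
# versus a $0.50 per node-hour cloud rate.
hours = break_even_hours(4000, 3, 1000, 0.50)
print(f"break-even: {hours:.0f} node-hours/year "
      f"({hours / HOURS_PER_YEAR:.0%} utilization)")
```

With these made-up numbers, owning only wins above roughly half-time utilization. Below that, the marginal-fee model is the cheaper way to get N nodes for the occasional big run.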

Another big issue is control of the resources … IT organizations with draconian support and deployment policies often get in the way of research, engineering, and HPC systems, making them too expensive to run. So we are seeing more users elect to buy a special desktop with many processors and lots of memory. They have control over it, and IT can be excluded. They run Linux on it, and run Windows on their laptop, or in a VM on the machine. We have customers who have built clusters of these to run their CFD rather than have IT control the machines. More to the point, end users can run their HPC apps on these machines, and as core counts and processor and system speeds increase, there will be less incentive to spend on the HPC infrastructure around clusters. The startup capital costs are far lower.

So what I see as the up-and-coming generation are these personal supers. They currently offer compute power once available only on small to moderate sized clusters. Back these up with a remote cluster in your machine room, or at Newservers, Amazon, or Tsunamic Technologies, and you have both local and remote power for your computing. The only remaining issue with the remote power is data motion, and this is solvable, if need be, with FedEx/UPS. That is, it is an eminently solvable problem, even if the solution is not elegant.
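The FedEx point is easy to check with back-of-the-envelope arithmetic: a box of disks in transit is a very high-latency, very high-bandwidth link. This Python sketch compares the two options. The 10 TB payload, 24-hour transit, 100 Mbit/s link, and 80% link efficiency are hypothetical numbers chosen for illustration, and the time to load and unload the disks is ignored.

```python
def shipped_bandwidth(payload_bytes, transit_hours):
    """Effective bytes/second of physically shipping the data."""
    return payload_bytes / (transit_hours * 3600)

def network_transfer_hours(payload_bytes, link_bits_per_s, efficiency=1.0):
    """Hours to push the payload over a link running at `efficiency`
    of its nominal bit rate."""
    return payload_bytes * 8 / (link_bits_per_s * efficiency) / 3600

payload = 10e12  # 10 TB (decimal), hypothetical data set

ship_bps = shipped_bandwidth(payload, transit_hours=24)
net_hours = network_transfer_hours(payload, link_bits_per_s=100e6,
                                   efficiency=0.8)

print(f"overnight shipping: {ship_bps / 1e6:.0f} MB/s effective")
print(f"100 Mbit/s link:    {net_hours:.0f} hours ({net_hours / 24:.1f} days)")
```

Under these assumptions the overnight box delivers the data in a day, while the link takes well over a week. A faster link shrinks the network time proportionally; the point is only that shipping is a workable, if inelegant, fallback.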

So when Doug postulates,

If one assumes that cores counts will continue to increase, the 64 core workstation may not be that far off. Back in the day, a 64 processor cluster was something to behold. Many problems still do not scale beyond this limit. Could we see split in HPC?

I think the answer is a resounding … yes. We will see a bifurcation, with purchased clusters occupying the higher end, and muscular desktops with ample computing, graphics, and IO power occupying the lower end, especially when coupled with a cloud HPC provider.

And as with the previous sea changes, I expect the addressable market to grow much larger. Interestingly, several months ago, a commenter on Storagemojo.com derided the coming shift of storage software to open source, suggesting it would turn a $30B market into a $3B market. An odd comment, as it flies in the face of what we have seen in HPC and in other markets where open source has been leveraged to great effect. Open source has been a boon to HPC, lowering the cost of scaling up, which has enabled more people to scale up. It won’t be different in storage. It will disrupt the old order. For new markets to be created, some must be destroyed. And that destruction is stressful, especially if you resist change.

Just my thoughts.
