Cloud computing for HPC

John West of Inside HPC wrote a great response to my response to Deepak of BBM. My argument was that for cloud computing to work economically, one has to consider all of the costs (infrastructure, pipes, computing, people, …). John’s response was yes, and that sometimes you need an act of Congress to get even moderately sized infrastructure.
I probably need to clarify my thoughts. I am a firm believer that this sort of computing will likely happen. I don’t think it will take over everything, but it will likely have a significant impact upon various markets. I tend to distrust marketing messages, and the “grid” has been a marketing mantra of so many companies that the real grid lost its real meaning somewhere down the line.

John’s point about the infrastructure is very interesting. I was unaware of the budgetary issues in building a facility for a government funded entity. In business it largely all comes from the same pot. In academe they sometimes ignore (or gloss over) the infrastructure.
Clouds are and will be important for HPC. But to get there, we need those fast pipes, and lots of them.
I have always been convinced that HPC is an enabling technology. You can do more in less time with it. It enables productivity and creativity. So you want more HPC to enable people to work better, faster, smarter.
And this is driving changes in the market. The argument I have been making for 14 years now is that HPC always (relentlessly) drives down-stream/down-market. It will destroy smaller higher margin markets in favor of larger lower margin markets. Not at a constant dollar volume over time, but a rapidly growing dollar volume.
Cloud computing could potentially be one of the destroyers of traditional cluster computing, which did a number on the traditional larger SMP market, which itself did a number on the vector market … But it is not the only destroyer of that market.
The issue with cluster computing right now is how to deliver the maximum number of cycles as inexpensively as possible to the end user. This isn’t a question of OS, or language (well a little, maybe). It is a question of how we can maximize the number of instructions per clock cycle available to a given user per dollar (or other denomination, per euro?).
This is where clouds (grids reborn, hopefully with less filling, better tasting marketing messages) can do good things. If I need 1 gigacycle for 100 kiloseconds for my problem, I can run this on my office machine in a day or two. If I need 100 teracycles for 1 megasecond for my problem, well, that is going to cost. Not just acquisition cost and maintenance, but data motion cost.
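To put rough numbers on those two jobs, here is a back-of-the-envelope sketch. It assumes "1 gigacycle" and "100 teracycles" mean sustained rates in cycles per second, which is my reading of the shorthand above; the function name is just for illustration.

```python
SECONDS_PER_DAY = 86_400

def job_size(rate_cycles_per_s, duration_s):
    """Return (total cycles consumed, wall-clock duration in days)."""
    total_cycles = rate_cycles_per_s * duration_s
    days = duration_s / SECONDS_PER_DAY
    return total_cycles, days

# Small job: 1 gigacycle/s sustained for 100 kiloseconds
small_cycles, small_days = job_size(10**9, 100 * 10**3)

# Large job: 100 teracycles/s sustained for 1 megasecond
large_cycles, large_days = job_size(100 * 10**12, 10**6)

# How many times more work the large job represents
ratio = large_cycles // small_cycles

print(f"small: {small_cycles:.0e} cycles over {small_days:.2f} days")
print(f"large: {large_cycles:.0e} cycles over {large_days:.2f} days")
print(f"large job is {ratio:,}x the small one")
```

Under that reading the small job is a bit over one day of wall-clock time on a single machine, while the large one is roughly a million times more work, which is why it cannot simply be left running in the office.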
The mere availability of some method to acquire the cycles independent of the other costs is intriguing, and potentially a significant enabler in and of itself.
My thesis is simply that the other elements really need to be there before we see this take off in earnest. The other elements are the really fast network pipes, and (for commercial software users), sanity w.r.t. licenses (not just pricing, but dealing with license rentals on large clouds).
FWIW, the day job built a business plan, found customers, and applied to the state of Michigan for funding 2 years ago to do this, anticipating that the time would come when this became the norm. We had potential investors lined up, we spoke to a few software vendors who were interested in testing it out, and we had one Fortune 500 company interested. Sadly, the state didn’t grasp it (as part of the “21st century fund” in Michigan).
Basically, this should hopefully dispel thoughts that I am not in favor of it. I am; we just need to see FIOS everywhere (hear this, Verizon?), 100 Mb to the home and gigabit to the office.
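Why the pipes matter so much is easy to see with a little arithmetic. The sketch below estimates how long it takes to move a data set over the two link speeds mentioned above. The 1 TB data set size is a made-up illustrative figure, not a number from any real workload, and the calculation ignores protocol overhead and contention, so real transfers would be slower.

```python
def transfer_hours(size_bytes, link_bits_per_s):
    """Idealized time, in hours, to push size_bytes over a link."""
    return size_bytes * 8 / link_bits_per_s / 3600

DATASET_BYTES = 10**12          # hypothetical 1 TB input data set

HOME_LINK = 100 * 10**6         # 100 Mb/s to the home
OFFICE_LINK = 10**9             # 1 Gb/s to the office

home_hours = transfer_hours(DATASET_BYTES, HOME_LINK)
office_hours = transfer_hours(DATASET_BYTES, OFFICE_LINK)

print(f"100 Mb/s: {home_hours:.1f} hours")
print(f"1 Gb/s:   {office_hours:.1f} hours")
```

Even in this best case, the hypothetical terabyte takes most of a day at 100 Mb/s and a couple of hours at gigabit speed, before a single cycle is spent on the actual problem.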
But this isn’t the only change going on. At the low end, these little desktops now have 8 cores and 16 GB RAM. They can scale to 32 cores and 128 GB RAM. So why get a small cluster when you can manage a single machine?
Now add accelerators into the mix.
Part of the reason that I am gung ho for specific technologies there is that some of the acceleration technology can get us 2 orders of magnitude or better performance. 1 order of magnitude (faster than a single core) is no longer really meaningful … I can throw ~10 cores at it easily. On my desktop.
Combine good multicore programming with excellent accelerator technology, and the low end of the cluster market, the largest and fastest growing segment (last I looked at the IDC data), sort of goes away (or is pushed aside in favor of an easier technology that will be more widely distributed). It’s easy to manage a desktop, and you can run the (64 bit) OS of your choice on it.
And it plugs into a wall. Is a single machine. Nothing special.
Get as much bandwidth to disk/memory as you can afford. No networks needed.
At the high end, tie it seamlessly into a back end cloud. So you can run your small and large programs.
But to get there we need the pipes. And the business models of the cloud providers need to work … running services at a loss is not a good way to stay in business.