The grid is dead … long live the … er … grid

I saw a post linked from Ken Farmer’s excellent site. In it the author leads with a title of “Grid computing being doomed.”

Ok …
Reading further, it seems that the conception of what Grid computing is has morphed a bit. With the rise of the SaaS fad (a long-term fad, unless it can show real, demonstrable ROI for everyday apps) and VCs pouring money into it willy-nilly, it turns out that “Grid” is no longer fashionable for VCs. Or companies.
Well, ok, it is more complex than that.
The idea behind the grid is fundamentally that computation is portable: it can move to where the data is, because data motion is often a limiting factor. Well, it is a limiting factor between HPC systems. If it is a limiting factor within your HPC system, you need to reconsider its design. But that is another thread.
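To put rough numbers on the "move the computation, not the data" argument, here is a minimal back-of-the-envelope sketch. The function names, the data and application sizes, and the link speed are all assumptions picked for illustration, and the estimate ignores latency, protocol overhead, and contention.

```python
def transfer_seconds(payload_bytes: float, link_bits_per_second: float) -> float:
    """Idealized time to push a payload over a link (no latency, no overhead)."""
    return payload_bytes * 8 / link_bits_per_second


def cheaper_to_move_computation(data_bytes: float,
                                app_bytes: float,
                                link_bits_per_second: float) -> bool:
    """True when shipping the application is faster than shipping the data."""
    app_time = transfer_seconds(app_bytes, link_bits_per_second)
    data_time = transfer_seconds(data_bytes, link_bits_per_second)
    return app_time < data_time


# Hypothetical sizes: a 1 TB data set, a 100 MB application, a 1 Gb/s link.
DATA_BYTES = 1e12
APP_BYTES = 1e8
LINK_BPS = 1e9
```

With these made-up numbers, moving the data takes about 8000 seconds (over two hours), while moving the application takes under a second. That gap is the whole case for portable computation between sites.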
With portable computation, the hard issues become, not so curiously, a) scheduling, b) licensing.
Scheduling is sorta kinda solved. Sorta. Kinda. Limited resource allocation is a thankless task. You know you are doing resource allocation right when everyone is equally pissed at you. Or, if you have lots of overcapacity, when everyone is happy. If one group is more pissed than the other, you have to rebalance the pain. Some tools work really well at this, some are overly complicated, and some are just hopelessly and irreparably broken. I won’t name names here; schedulers and resource managers have become a new emacs vs vi argument. If you don’t know what that is, think of a large expenditure of time for no particular purpose.
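For the curious, "rebalancing the pain" can be sketched with max-min fair sharing, one well-known allocation policy. This is an illustration only, not how any particular scheduler or resource manager works; the function name, the group names, and the numbers are made up.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair split of a limited resource.

    Groups asking for less than an equal share get everything they asked for;
    whatever is left is split evenly among the groups that asked for more.
    """
    allocation: dict[str, float] = {}
    remaining = capacity
    # Handle the smallest requests first.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        equal_share = remaining / len(pending)
        name, demand = pending[0]
        if demand <= equal_share:
            # This request fits inside an equal share: grant it in full.
            allocation[name] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining request exceeds the equal share: cap them all.
            for other_name, _ in pending:
                allocation[other_name] = equal_share
            break
    return allocation


# Hypothetical groups competing for 100 nodes.
DEMANDS = {"chem": 20, "bio": 50, "physics": 80}
```

With 100 nodes, `max_min_fair(100, DEMANDS)` gives chem its 20 and caps bio and physics at 40 each, so the two big groups share the shortfall. With 200 nodes (lots of overcapacity) every group gets exactly what it asked for, and everyone is happy.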
Licensing is “harder”. Well, this is not true. Licensing should be easier. Much easier. It’s that particular dominant licensing vendors haven’t really adapted to the new realities of the world, so the customers using their products are kind of SOL when it comes to adapting to the new paradigms.
The market is ripe for a competitor to emerge with some new ideas and designs, which could enable real SaaS behavior in a sane manner. We have some neat ideas on this, but neither the funding nor the time to pursue them.
At the end of the day SaaS is a rehash/respin of ASP. ASP was a bubble era phenomenon. The argument for ASP was that the overall costs of computation are lower when you outsource everything. The reality was rarely this. ASP vendors often had huge infrastructures that they had to pay for, which pretty much destroyed their ability to lower prices. They had to have rapid churn. Fast application turnover. All of this costs money. Their business models were typical bubble variants: build it, grow big, and hope for an IPO or buyout.
How is SaaS different? Both had concepts of run it elsewhere. Both argue that the cost models are lower.
The difference is in part due to the “grid”. You don’t need centralized systems (at high cost) anymore. You can have distributed machines all over running stuff.
How is this pertinent to HPC? ASP never really took off in HPC. Some folks wanted it, and gave us approximate costs we needed to be below in order to win business. The idea is that for an ASP to work, it cannot own hardware. Or applications. On the hardware side, it could buy reasonably good cycles from cycle shops (Sun, IBM, …). On the application side, your app had to install and work pretty much out of the box, with no licensing pain. This is where IBM / HP are doing it right and Sun is not. Most of the apps will run on Windows and Linux, out of the box w/o problems. Adding Solaris to the mix, as Sun does, and requiring it gives customers an incentive to seek other services. Remember, this is a service. If it doesn’t offer what you need, you simply select a competing one. The cost of selecting another one is low. Sun unfortunately, and to their detriment, makes the mistake of assuming that this will cause more Solaris ports to occur.
For SaaS to work, the OS choice cannot be pre-decided. Our clusters can select the OS “load” at run time. No re-imaging necessary. Just boot and run. Whichever OS your app needs. This is where customers appear to want to go.
At the end of the day, grid is not HPC, though with careful design, it can be used in this context. SaaS could work, but it requires replacing broken/non-adaptable licensing systems with more intelligent systems. This is not hard, but apparently no one wants to fund this effort. No one would do micropayments for using Excel or Word or …. when local and free alternatives exist. For services that make sense, such as software that is hard to install and maintain, or that requires resources that are hard to acquire, SaaS could make a great deal of sense. Look at Google as a huge SaaS provider. Search is a great service. How many people would pay for the privilege of not seeing ads on their Google searches?
Likewise, HPC can be a service for some group of apps. We have argued this for a while. And have a business plan with a costing model that we have run by prospective customers. They loved it, as they understood it. Could work real well. All it requires is, again, capital.
And that is something that VCs don’t want to hear about. HPC SaaS/ASP models that can work.
The grid isn’t “dead”, it is being properly subsumed. It is part of resource virtualization. It is more about what you can do with what you have as compared to being an end in and of itself. It is a vehicle for delivery of computing cycles. It is the means by which SaaS will work (and maybe even shed its fad beginnings).
Of course, all of this ignores the 800 kg elephant in the room. Data motion. This is the hard problem.