HPC in the first decade of a new millennium: a perspective, part 6

The recycling of an older business model using newer technology
ASPs began the decade promising to reduce OPEX and CAPEX for HPC systems. They flamed out, badly, as they really didn’t meet their promise, and you had all these nasty issues of data motion, security, jurisdiction, software licenses, utilization, and compatibility.
The concept itself wasn’t bad: create an external data center where you can run stuff, and pay for what you use. The implementation, atop expensive RISC hardware? Not so good.
This market largely died in the bubble. But it gave rise to a concept of selling remote access to software as a service. Don’t set up your own apps, let us deliver the functionality to you. This is different than “let us create a virtual remote data center for you.” CRM applications grew out of this, as did other types of apps.
The Web 2.0 hype and the soon to be “new” web 2.0 bubble began in earnest, providing a “near desktop experience” for users. The desktop OS became less important than the web application. Soon security issues associated with clients became more important than on the server side.

You see, security is concerned with maintaining a market, allowing people to exchange goods and services for compensation, without losing the value of this good, service, or medium of exchange. Attacks against infrastructure focused upon the clients, where nefarious malware soon infiltrated far and wide. Linux-based supercomputers, while never impervious to attack, were compromised thanks to weak front doors (keyloggers on Windows machines) happily passing information back to the bad guys.
This led to a hardening of practice, and some (considerable) security theatre, some of which is pretty scary (at my bank, for example). But the practices of the companies providing real, non-theatrical security diffused into the broader market, and solved one of the ASPs’ overriding issues.
Next was their cost of infrastructure. Inexpensive servers could provide significant processing power. This power was growing ever greater by the year. AMD’s Opteron proved to be a great chip for servers, and with the growth in virtualization, and virtualized machines, suddenly the door was open to not just fixing a problem that had existed, but to demolishing it. With virtual machines, you could run several complete systems per physical device. Which was great … to a degree. There is a cost to running these systems, specifically in terms of the performance cost of virtualization. This performance cost is not zero, or even in single digits of percentage. It can be quite significant, and sadly could result in a non-deterministic run time for code that previously was quite deterministic in run time.
Nevertheless, this opened the way for vendors to offer slices of remote machines for HPC. No longer were you constrained to one user/machine or one job/machine. Now you have N VMs per machine, and each VM can be dedicated per user or job. This allows you, the supplier of this hardware, to provide much finer-grained billing, and better filling of your resources. Which solved another very hard problem in this ASP model.
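To make the "better filling" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the 8-core machine, and the job sizes are hypothetical, and the packing is a simple first-fit, not anything a particular vendor is known to use. It also ignores the virtualization overhead discussed above.

```python
def machines_needed_whole(job_cores):
    """Old model: one job per physical machine, whatever its size."""
    return len(job_cores)


def machines_needed_sliced(job_cores, cores_per_machine):
    """VM model: first-fit packing of jobs onto shared machines.

    Each job gets a VM sized to its core count; a VM goes on the first
    machine that still has enough free cores.
    """
    free = []  # free cores remaining on each machine already in use
    for cores in job_cores:
        if cores > cores_per_machine:
            raise ValueError("job does not fit on a single machine")
        for i, available in enumerate(free):
            if available >= cores:
                free[i] -= cores
                break
        else:
            # No existing machine has room, so power on another one.
            free.append(cores_per_machine - cores)
    return len(free)


def utilization(job_cores, machines, cores_per_machine):
    """Fraction of the powered-on cores that are actually doing work."""
    return sum(job_cores) / (machines * cores_per_machine)


# Hypothetical workload: four jobs on 8-core machines.
jobs = [2, 1, 3, 4]
whole = machines_needed_whole(jobs)
sliced = machines_needed_sliced(jobs, cores_per_machine=8)
print(whole, utilization(jobs, whole, 8))
print(sliced, utilization(jobs, sliced, 8))
```

With these made-up numbers, slicing halves the machine count and doubles utilization. The supplier can then bill per VM rather than per box, which is the finer-grained billing described above.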
Ok, it didn’t solve it right away. Some of the early “cloud” or utility computing efforts reflected particular biases or ideologies of their vendors. Sun famously screwed up their own offering by insisting everyone run Solaris for $1/CPU-hour. Yeah. That worked out well.
IBM had an on-demand service that was sold as a service, so it was hard to figure out what the costs were. No one was making it easy to use these systems.
Until Amazon came along with EC2. They made it drop-dead easy to create/run VMs of many sizes. It’s not a perfect infrastructure, but it is pretty good. And it worked. Companies started experimenting with it. The billing made sense, the setup/utilization made sense. Especially compared with other offerings.
Now there are a slew of companies competing with Amazon, offering similar types of systems. Platform as a service (PaaS), hardware as a service (HaaS), and similar models are rapidly developing. The Cluster as a service (CaaS) model still does work for some vendors. Notably Sabalcore. CRL in India has the capability to do good things here, as does Newservers. All offer something slightly different. All are worthy of consideration (n.b. In full disclosure, we work with Sabalcore and Newservers).
Moreover, this amorphous cloud is altering the HPC and other business models profoundly. A question that many groups are asking is, do we really need big iron in house, with all the costs associated with this, or can we get by with smaller machines, and rent time on the bigger machines when needed?
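The own-versus-rent question comes down to a break-even calculation. A rough sketch, assuming straight-line depreciation and a flat hourly rental rate; the function names and every number below are hypothetical, not quotes from any vendor:

```python
def breakeven_hours_per_year(capex, lifetime_years, opex_per_year,
                             rental_rate_per_hour):
    """Machine-hours per year at which renting costs the same as owning.

    Ownership cost is modeled as purchase price spread evenly over the
    machine's lifetime, plus yearly operating cost (power, cooling, admin).
    """
    annual_ownership = capex / lifetime_years + opex_per_year
    return annual_ownership / rental_rate_per_hour


def cheaper_option(hours_needed_per_year, capex, lifetime_years,
                   opex_per_year, rental_rate_per_hour):
    """Return 'rent' or 'own' for a given yearly usage under this model."""
    threshold = breakeven_hours_per_year(
        capex, lifetime_years, opex_per_year, rental_rate_per_hour)
    return "rent" if hours_needed_per_year < threshold else "own"


# Hypothetical big-iron node: 30000 to buy, 3 year life, 2000/year to run,
# versus renting an equivalent machine at 2.00 per hour.
print(breakeven_hours_per_year(30000, 3, 2000, 2.0))
print(cheaper_option(1500, 30000, 3, 2000, 2.0))
print(cheaper_option(8000, 30000, 3, 2000, 2.0))
```

Under these assumptions, a group that only needs the big machine for occasional bursts sits well below the break-even point, which is exactly the group asking the question. The model leaves out data motion and license costs, both of which can move the threshold considerably.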
Customers are there, they are willing to do this. The software vendors? Most I have spoken to aren’t ready for this model. Many fear it. Most think it will do them in.
From what I have heard from customers, a fairly large fraction are getting sick and tired of waiting for these vendors to come around, and are starting to investigate and contribute to open source alternatives. OpenFOAM is the one I hear from CFD people most frequently.
And this gets to another critical area for this nascent market: the costs of moving some computing to the cloud. Here the issues are, again, time, effort, and license costs. The latter has driven many of the decisions for users of licensed code over the past few years … HPC systems costs are but a fraction of the license costs in many cases.
So if you adopt a cloud platform with a per instance license cost, you are going to do your best to minimize the number of instances to keep your costs down. Which flies in the face of the cloud model, which is to keep the per instance cost as low as possible, and allow you to scale up your computations as you need.
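A toy model shows the tension. The sketch below assumes a perfectly parallel job, a flat per-hour instance price, and a fixed license fee charged per instance; the function name and all figures are hypothetical, and real license terms vary widely.

```python
def job_cost(work_instance_hours, instances, price_per_instance_hour,
             license_fee_per_instance):
    """Return (wall_clock_hours, total_cost) for one job.

    Assumes perfect scaling: the work divides evenly across instances.
    Compute cost is then independent of the instance count, while the
    license cost grows linearly with it.
    """
    if instances < 1:
        raise ValueError("need at least one instance")
    hours = work_instance_hours / instances
    compute = instances * hours * price_per_instance_hour
    licenses = instances * license_fee_per_instance
    return hours, compute + licenses


# Hypothetical job: 100 instance-hours of work, 0.50 per instance-hour,
# and a 10.00 license fee for every instance you start.
for n in (1, 10, 50):
    hours, cost = job_cost(100, n, 0.5, 10.0)
    print(n, hours, cost)
```

In this model the compute bill is the same whether you use 1 instance or 50, so the cloud side rewards scaling out to get the answer sooner. The license term is what punishes it: every extra instance adds a fee, so the cheapest run is also the slowest one.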
Without reaching very far, I think it is fair to say that some commercial codes will be overtaken by open source codes in this next decade, simply due to customers wishing to lower their costs to compute.
Moreover, the ISVs have a strong incentive to differentiate and add value. One way they can do this is on accelerators. What if you could run on your desktop, with a few accelerator cards, and get the job done without ever going to a cluster, or a cloud? Reduce the platform requirements, help reduce the customer’s cost, and capture more of the customer’s money for themselves.
From what we have heard and seen (and been hired to do), there is a hard and fast rush to get code onto accelerators.
So now we have ASP 2.0, aka the cloud. But it is much more than ASP 2.0; there are some really interesting things afoot.