This could be game changing for lots of users

Amazon announced EC2 availability for HPC users. As per the article on InsideHPC, previous incarnations of EC2 didn’t really work well for low-latency jobs or large runs. They still have a storage issue (e.g. storage performance and parallel IO) that we’d be happy to help with.
Why is this potentially game changing for the market? A number of reasons.

First, you can exploit a complete pay-as-you-go view for whatever you want to boot up (minus accelerators). This is similar in some ways to the approach of NewServers, our partner in Florida. It differs from the approach of our partners Sabalcore and CRL, in that they have a pre-configured and operational system ready to use.
Second, what is interesting about the EC2 approach is that it really pushes the OS down into a detail of the system, not a major decision point apart from costs. So if you have the choice of booting up two clusters that are otherwise identical, and one has an OS that charges you extra per node to use/run, why would you choose it over the nodes that don’t have that extra tax on them?
Third, you pay for what you use. A capex just became an opex.
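To make the capex-versus-opex point concrete, here is a minimal sketch of the break-even arithmetic. Every number in it is hypothetical (these are not Amazon's prices or anyone's real quote), and the function name `break_even_hours` is made up for illustration. It assumes owning costs a fixed purchase price plus an hourly running cost, while renting costs only an hourly rate.

```python
def break_even_hours(cluster_capex, owned_cost_per_hour, rented_cost_per_hour):
    """Hours of use at which owning and renting cost the same.

    Owning:  cluster_capex + owned_cost_per_hour * hours
    Renting: rented_cost_per_hour * hours
    """
    if rented_cost_per_hour <= owned_cost_per_hour:
        # Renting never costs more per hour than running your own,
        # so the purchase price is never recovered.
        return float("inf")
    return cluster_capex / (rented_cost_per_hour - owned_cost_per_hour)


# Hypothetical figures, chosen only to show the arithmetic.
hours = break_even_hours(
    cluster_capex=100_000.0,     # assumed purchase price
    owned_cost_per_hour=5.0,     # assumed power, cooling, admin per hour
    rented_cost_per_hour=25.0,   # assumed rental rate for an equivalent cluster
)
print(hours)  # 5000.0
```

Below that many hours of actual use, paying as you go is the cheaper option under these assumptions; above it, buying wins.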
I think this EC2 system, the CRL machines, the Sabalcore machines, and others are going to dramatically alter HPC usage in general for larger-than-desktop-sized runs.
And that’s another point. Desktop/deskside units can have 48 processor cores and 512 GB RAM. Why would you buy a cluster of that size when you can have a single machine that does this? This is, IMO, part of the best set of arguments for ScaleMP’s vSMP offering (massive simplification of systems management, lowering of OS support costs, increase of memory available). Add in accelerators and high performance storage, and you can have a hellaciously fast desktop system that can out-compute clusters from 2 years ago or less. So why would you get a cluster for these smaller problems? Yeah … that’s the point.
I noted a little more than a year ago that HPC was about to fragment. This announcement is going to accelerate the process.
This is good news. Every major upheaval in the market (and this model is an upheaval) is accompanied by an at least partial destruction of the old order. When the costs start working out so that it makes more economic sense for universities to buy cycles, I expect to see a great deal more computing going this route. Teragrid is going away, and its replacement, XD, could start requiring people to commit actual $$ to support operations. This last bit is speculation, but fundamentally we need some sort of chargeback to ensure continuity, absent a permanent recurring grant of ever larger size.
What I don’t see going this way yet are specialty computing clusters, basically dense GPU machines, or more generally, dense accelerator machines. This could change though.
Again, great news for the HPC community!

1 thought on “This could be game changing for lots of users”

  1. One annoying problem is developing codes that scale to massive sizes. The commercial HPC approaches above are great for smaller cluster (new desktop) sizes and established codes, but development of new codes and scaling to larger sizes still needs the flatter structure.
    While it’s nice to say that grants can cover it, etc., having every test/debugging run charged as money puts a serious damper on the number of tests. Hours are bad enough, but often they can be shuffled around as needed. Perhaps it’s that last part that is important. People are more willing to share something abstract like charge hours than concrete like money.
    I suspect many people are worried about charge-oriented rather than hour-oriented computing on the development side, so I would be shocked to see XD go far along the path of extracting funding from users. Or perhaps someone has a wonderfully clever idea that will up-end the whole thing.
