# COTS supercomputing a danger?

An article on HPCwire suggests that we live in dangerous times. Specifically:

The HPC ecosystem is in perfect balance, with little investment and innovation in both hardware and software. We’re in a precarious position now. The community is able to benefit from the COTS market, but it’s anyone’s guess how long we’ll be able to thrive there.

hmmm …

We have limited choices due to economics and market evolution. Way back when RISC was still hot, many people ignored those pesky CISC machines coming up. When those pesky CISC machines started putting down benchmarks of 0.25 to 1.25 times the performance of the RISC machines, at 1/10th their cost, people started to take serious interest in using them. Anyone out there buying RISC anymore? Well, in a few markets, yes, but they are niches and aren’t HPC.

Looking forlornly at the past is no way to advance a market. We have cards (pun intended) that we have been dealt, and we can either play them, or walk away from the table.

Accelerators are not overhyped as this article claims. You want overhyped? Look at “The Grid”(TM). Accelerators provide us a way to exploit more efficient or many more instructions per clock cycle than existing systems. Multi-core is an attempt to do the same thing: more instructions per clock cycle. As is SIMD.

It’s all about more instructions, or more efficient instructions, per clock cycle. It always has been.

RISC was about making the internal CPU pathways easier and faster. Vectors were about hiding latency (after an initial large latency) and parallel execution of a single instruction (hey … SIMD!). Multi-core is all about how to increase the number of instructions per clock cycle.

Of course, when we do this, we run into a number of walls, some of them very hard and very difficult to overcome. Like the bandwidth wall. When each core can completely fill up the memory bandwidth, increasing the number of cores increases contention, and forces you to rewrite your algorithms with this contention in mind. That is, unless you like the Intel benchmarks showing you a 50% performance increase from doubling the number of cores.
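To make the bandwidth wall concrete, here is a toy sketch, not a measurement. It assumes a purely memory-bound kernel and a single shared memory bus; the function names and the 4 GB/s and 6 GB/s figures are hypothetical, picked only so the arithmetic is easy to follow.

```python
# Toy model of the "bandwidth wall": cores share one memory bus, so the
# data rate they collectively receive is capped by that bus.
# All numbers below are hypothetical, chosen only for illustration.

def delivered_bandwidth(cores, per_core_demand_gbs, bus_bandwidth_gbs):
    """GB/s the cores actually get: total demand, capped by the shared bus."""
    return min(cores * per_core_demand_gbs, bus_bandwidth_gbs)


def speedup(cores, per_core_demand_gbs, bus_bandwidth_gbs):
    """Speedup over one core for a purely memory-bound kernel."""
    one_core = delivered_bandwidth(1, per_core_demand_gbs, bus_bandwidth_gbs)
    n_cores = delivered_bandwidth(cores, per_core_demand_gbs, bus_bandwidth_gbs)
    return n_cores / one_core


# Each core wants 4 GB/s, the bus supplies 6 GB/s in total.
for cores in (1, 2, 4, 8):
    print(cores, speedup(cores, per_core_demand_gbs=4.0, bus_bandwidth_gbs=6.0))
```

With these made-up numbers, going from one core to two buys a 1.5x speedup, and every core after that buys nothing. If a single core can already saturate the bus (demand equal to or above the bus bandwidth), the model gives 1.0x no matter how many cores you add.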

Resource contention is the fruit that will be borne of the multi-core seeds. How we program with this in mind is critical.

The article also notes:

There is a great deal of hype and promise for accelerators. However, even here we depend on the commodity market to drive the technology and development, and hope to gain what benefit we can. We are in the dangerous position of depending on the scraps that fall off the PlayStation table — and if they take their picnic and go somewhere else, we’re in real trouble. If you think this is silly, try asking NVIDIA to add a feature to their graphics cards that will speed up your application but will hurt graphics performance. I can hear the laughter already.

I won’t fisk this. Just point out that it is in the interests of the various players building these commodity systems to make sure their commodity systems have as wide an applicability (i.e. range of market) as possible. Small runs yield high prices (cf. ClearSpeed, FPGA), large runs yield economies of scale and lower prices (cf. nVidia, Cell, …). Basic economics.

The issue isn’t whether or not nVidia will add a feature. The author assumes that nVidia isn’t committed to the HPC market, and therefore that feature requests would not be given significant attention. nVidia sees the writing on the wall, and knows that this is a large, fast-growing market for them to expand into. They know they have an economy of scale advantage relative to other accelerator vendors. I have found nVidia to be responsive and concerned about how to address issues with their units. I also think it is possible, given that the author works for a compiler company that doesn’t support nVidia products, that at least a little of that is seeping through.

What’s going on in the market now with accelerators is akin to what happened with clusters in the late 90s and early 2000s. Enough people are playing with them, and enough interest exists in them, that critical mass is rapidly approaching. The large leading edge systems all use accelerators of some sort or another. The writing is on the wall. Denying the commodity-based future won’t forestall it.

As for long-term government investment in a tool/market, this has proven itself, time and time again, to be a bad thing for the market. It badly biases the market towards that customer. And when political winds and funding change, well, a good company with a great idea, dependent upon a government customer, can go down the tubes. Cray almost did. As did many others. I argued elsewhere that T&C (terms and conditions) killed LNXI. This was a shame, as they were a good bunch. The T&C are imposed on these large government and university contracts. Sadly, some smaller university contracts seem to like them as well.

Create a small market with onerous T&C, very limited opportunities for revenue, and you get flameouts, lack of choice, and smaller vendors with real value add passing on opportunities to have thumbscrews put to them.

HPC has been moving, and continues to move, downstream. It ruthlessly destroys old time players and technology, and relentlessly adopts new technology. Vendors choose to adapt or not, based upon their business conditions and their market imperatives. As HPC moves downstream, the size of the market continues its exponential growth. With 16 cores in a deskside box, like some of the day job’s Pegasus workstations, if this meets the customer’s needs, why would they buy a cluster? The era of personal supercomputing is upon us (I have been talking about this since 1999 or so), in a very literal sense. HPC is moving to the desktop. As some have noted, we might need a new term to describe this. HPC sort of doesn’t fit. But it is HPC all the same.

Tool vendors, compiler vendors, etc. can play a huge role in this. We need to use these additional cores and limited memory resources more effectively. We need newer algorithms. I haven’t linked to Amir’s blog postings recently, but he has several recent articles well worth the read. Basically we need “simple” standardized tools to deal with accelerators, just as he argues we need standardized and simple tools to deal with reconfigurable computing. His thesis is quite correct … those tools are the critical aspect to lowering barriers to adoption by ISVs.

Look at it this way. The good folks at Ansys note that Fluent runs take a long time. If they can get it to run on the nVidia accelerator, then the customer’s need for more high priced gear (as compared to more Fluent licenses) to get higher performance goes down. The customer can buy more performance for the same price (assuming that Ansys and other ISVs approach the pricing model sensibly … they currently price by “core”, which is problematic … 128 cores in the nVidia chip … what would be the Fluent license cost?).

But the ISVs do note that by reducing the cost of the systems required to run their code, they may in fact attract more users, thus increasing their revenue. This is not lost on them.

I remember hearing about the inexorable march towards the economics of the grid. The concept, either screen saver cycle stealing or odd distributed computing, required rewriting codes to take advantage of this model, and yielded dubious performance gains, mostly for throughput dominated analysis. For high performance analysis, you need tightly coupled parallel/fast cores. Which is what accelerators give you. The economics of this are hard to beat. For 128 cores, I will pay $500 USD, and get ~10-20x my native PC speed on some apps. This is the elevator pitch for accelerators. This is why $5000 USD for 10x/core makes no sense. You need 10x per application compared to the host machine with all its cores. Since the nVidia chips are pretty much everywhere, this is likely to be the model most will prefer. As Amir pointed out, the incompatible and incredibly expensive FPGA tools make no sense in this market. You need inexpensive and portable.
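The arithmetic behind that pitch can be sketched in a few lines. The $500, $5000, 10x, and 20x figures come from the paragraph above; the 4-core host and the assumption that the application scales perfectly across the host's cores are mine, as are the function names, so treat this as a back-of-the-envelope illustration rather than a benchmark.

```python
# Back-of-the-envelope price/performance for accelerators.
# Prices and speedups are the ones quoted in the text; HOST_CORES and the
# perfect-scaling assumption on the host are hypothetical.

HOST_CORES = 4  # assumed host machine, not from the original post


def app_speedup_from_per_core(per_core_speedup, host_cores):
    """Convert an "Nx versus one core" claim into a speedup versus the
    whole host, assuming the application scales perfectly across host cores."""
    return per_core_speedup / host_cores


def dollars_per_x(price_usd, app_speedup):
    """Dollars paid per 1x of application-level speedup."""
    return price_usd / app_speedup


# Commodity GPU: $500 for 10x to 20x the native PC speed.
gpu_low = dollars_per_x(500, 10)
gpu_high = dollars_per_x(500, 20)

# Niche accelerator: $5000 for 10x measured against a single core.
niche_speedup = app_speedup_from_per_core(10, HOST_CORES)
niche = dollars_per_x(5000, niche_speedup)

print(gpu_low, gpu_high, niche_speedup, niche)
```

Under these assumptions the commodity part costs $25 to $50 per 1x of application speedup, while the "10x per core" part delivers only 2.5x against the full host and costs $2000 per 1x. Change `HOST_CORES` and the gap moves, but the point stands: the comparison that matters is against the host with all its cores.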

Note also that Intel is looking at adapting plain old CPUs to act as an array of processors to handle GPU (and other) tasks, with 80 cores on a chip. Programming it will be a bear, as you will hit resource contention issues gone wild. But it is coming, and I expect that technology to battle GPUs for a while.
