Current "fastest" supercomputer is … APU powered !

Go figure.
Between three and six years ago, when we were pitching HPC accelerators to VCs, trying to convince them that it was inevitable that supercomputing was going this route, we (optimistically) predicted that the world's fastest machine would be Accelerator Processing Unit (APU) based by 2012.
Well, we were wrong. November 2010 is the correct answer.
My expectation is that many HPC systems (probably most) will have some sort of APU technology (GPUs, vector extensions, Larrabee-like things, Tilera-like things). Programming them efficiently is going to be hard. Very hard. Happily, I know some groups working on this problem who have, IMO, exactly the right approach.
I’ll be back with more write-ups, including pieces on Lustre, Ceph, Bright Computing Manager, GlusterFS, and some bits on the state of the market. One particular article at InsideHPC needs some commentary … I disagree with a number of things in it, and I want to get into that in depth. Not a fisking, but a detailed analysis.

4 thoughts on “Current "fastest" supercomputer is … APU powered !”

  1. “Happily I know some groups working on this problem, who have IMO, exactly the right approach.”
    Could you provide some more details?

  2. Your scare quotes around “fastest” are appropriate as I’m quite fuzzy as to the performance of any of the top-n machines on non-linpack workloads. My fear is that sustained application performance is a mere fraction of the linpack numbers; especially for anything communication or bandwidth intensive.
    Any insights on where we really stand?

  3. @Bill
    The scare quotes are there because the meaning of “fastest” is dubious at best. Fastest at running a program that isn’t used as a day-to-day research tool, other than for ranking systems? That tells you how useful it is for real-world scenarios (it isn’t).
    Many folks argue that the code is a reasonable replacement for real tests (it isn’t), and some (quite blindly) use the numbers for comparison.
    Speed only matters on your app, and what you have to pay to double/treble/… that speed. If a machine 10x the price is “10x faster” on some benchmark, but 1.5x faster on yours, was it worth the extra cost?
    That’s the real question, and that’s why there are scare quotes.
    As for being able to use only a fraction of the performance, this is true in general. I’ve observed compilers generating not simply suboptimal code, but outright terrible code, on very simple cases (loops). I don’t anticipate that most of today’s languages are going to break this mold going forward … their design is one that enables abstract representation of a language and a machine, hiding the dirty secrets of NUMA, of multi-cores, of memory hierarchies in general, of networks with real bandwidths and latencies. Until this is addressed intelligently, our compilers are going to keep generating crappy code.
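    To make the price/performance arithmetic above concrete, here is a minimal sketch. The numbers are the hypothetical ones from the question (a machine at 10x the price that is “10x faster” on a benchmark but only 1.5x faster on your application); they are illustrative, not real benchmark data.

```python
def price_performance(cost, app_speedup):
    """Application speedup per unit cost, relative to the baseline machine."""
    return app_speedup / cost

# Machine A: the baseline, cost 1.0, application speedup 1.0.
base = price_performance(cost=1.0, app_speedup=1.0)

# Machine B: 10x the price, "10x faster" on the benchmark,
# but only 1.5x faster on *your* application.
shiny = price_performance(cost=10.0, app_speedup=1.5)

# Machine B delivers 0.15x the price/performance of machine A:
# you pay 10x to go 1.5x faster on the work you actually do.
print(base, shiny)  # 1.0 0.15
```

    The point of the sketch: the benchmark number never enters the calculation. Only your application’s speedup and the price do.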

Comments are closed.