“K” is atop the top500. What does this mean to us?

Not much.

No, I am not trying to be a downer. The relation of the top500 top-o-the-heap to mere mortals with hard problems to solve isn’t very strong. Actually, it’s quite weak.

There is only one K machine. It’s at RIKEN in Japan. There’s only one Jaguar, and only one Tianhe machine. All are, to some degree or other, unique in some aspects.

What matters to most people is “what can it do for me?”

Directly, not so much.

Indirectly … quite a bit.

Ignoring the marketing hype of how they will cure the common cold (hint: they won’t, but it makes politicians happy to hear this, as something they can point to during re-election time), these machines do point the way to designs that we will need to be using in the future to achieve good performance on some subset of problems.

Not IO-bound problems, mind you, but a small subset of problems well tackled by such architectures. This is fundamentally one of the problems in using the top500 ranking as a “world’s fastest” indicator. It’s world’s fastest … at running that code.

Before the graph500 folks chime in, it’s the same issue there. And the same issue with SPEC. And every other benchmark out there. There are codes for which all these world’s fastest machines will not be substantially faster than a desktop. No, really.

These are codes that don’t naturally scale well (99.999% of all codes).
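To put a rough number on that, here is a minimal sketch using Amdahl’s law, which bounds the speedup of a code by the fraction of it that has to run serially. The function name, the 90% parallel fraction, and the core counts are illustrative assumptions, not measurements of any real machine.

```python
def amdahl_speedup(parallel_fraction, ncpu):
    """Amdahl's law: best-case speedup when only part of the run time scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / ncpu)


# Hypothetical code that is 90% parallel, on a desktop and on a huge machine.
desktop = amdahl_speedup(0.90, 8)        # about 4.7x over one core
big_iron = amdahl_speedup(0.90, 500_000)  # just under 10x over one core

print(f"desktop: {desktop:.2f}x, big machine: {big_iron:.2f}x")
```

With 10% of the run time stuck in serial code, the speedup can never reach 10x no matter how many cores you add, so the half-million-core machine ends up only about twice as fast as the 8-core desktop on that code.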

These machines won’t cure the common cold or cancer, solve the US debt problem, or come up with a viable solution to Greece’s likely default. What they will do is provide an architecture against which people can write codes to attempt to efficiently use as many cycles as possible.

We as HPC folk then need to take the architecture of the machines and push knowledge of how to program them into the mainstream. No, not meaning Windows. I mean using the techniques of parallelization and parallel algorithm design, and pushing these into common use.

Yeah, it’s kind of amusing, but you can always look at a single processor core as having a parallelism of 1 CPU. When I taught a class on how to program these problems in the past, I gave the students the best advice I could on how to write such code. Assume as little communication as you can manage, keep everything local, and imagine that parallelism is simply a large loop around the major work-producing portion, with an index running from 1 to NCPU (hey, I’m not a computer scientist, nor were the students, and counting for the rest of us starts at 1!!!). If you can subdivide your work up as a loop like that, you can write your application as a parallel code.
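That loop picture can be sketched in a few lines of Python. This is a minimal sketch, not the code from the class: `chunk`, `work`, `run_serial`, and `run_parallel` are hypothetical names, the sum of squares stands in for real work, and a thread pool stands in for whatever actually runs the pieces (processes, MPI ranks, and so on).

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(n_items, ncpu, cpu):
    """Half-open range [lo, hi) of items owned by worker `cpu`, counted from 1."""
    base, extra = divmod(n_items, ncpu)
    lo = (cpu - 1) * base + min(cpu - 1, extra)
    hi = lo + base + (1 if cpu <= extra else 0)
    return lo, hi


def work(data, lo, hi):
    """The major work-producing portion: it only touches its own slice."""
    return sum(x * x for x in data[lo:hi])


def run_serial(data, ncpu):
    # The mental model: one big loop with an index running from 1 to NCPU.
    partials = []
    for cpu in range(1, ncpu + 1):
        lo, hi = chunk(len(data), ncpu, cpu)
        partials.append(work(data, lo, hi))
    return sum(partials)  # the only "communication" is this final reduction


def run_parallel(data, ncpu):
    # Same loop, but each iteration is handed to a worker instead of run in turn.
    with ThreadPoolExecutor(max_workers=ncpu) as pool:
        futures = [
            pool.submit(work, data, *chunk(len(data), ncpu, cpu))
            for cpu in range(1, ncpu + 1)
        ]
        return sum(f.result() for f in futures)


data = list(range(10))
print(run_serial(data, 3), run_parallel(data, 3))
```

Because no iteration depends on another, the serial and parallel versions give the same answer. For CPU-bound Python you would likely swap the thread pool for processes, but the decomposition stays the same.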

Yeah, it’s more complex with MPI, but not that much more so. Really, you can do everything you need in MPI, fairly efficiently, using just a few calls.

Aside from that, I don’t think most users would benefit from MPI. OpenMP was a great way to express the parallelism, and the PGAS languages now look like they are gaining interest. Would love to see people code real work in them. Run from desktop to supercomputer.

Well, ok, the dominant supercomputer of the future may be a humongous collection of iPhone/Android units. The former more likely, as the latter might run afoul of Oracle, and then there’s the Java issue in general (slow on every platform). So coding for that will be harder.

Think about it though: as the K machine and others continue to grow, they will force us to develop new technologies that handle portions of the computing lattice being unavailable. Currently we try to handle it at the job scheduler level. I think in the future, much of this will need to be in the app itself.
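As a rough sketch of what handling this in the app itself might look like, the loop below requeues a chunk of work when the node running it drops out, instead of letting the whole job die. Everything here is an assumption for illustration: `NodeUnavailable`, `run_tolerant`, and `flaky_sum` are hypothetical names, and the failure is simulated rather than detected on real hardware.

```python
from collections import deque


class NodeUnavailable(Exception):
    """Hypothetical signal that the node holding a chunk dropped out."""


def run_tolerant(chunks, do_chunk, max_attempts=3):
    """Run every chunk, requeueing any chunk whose node fails mid-run."""
    results = {}
    pending = deque((i, 1) for i in range(len(chunks)))  # (chunk index, attempt)
    while pending:
        i, attempt = pending.popleft()
        try:
            results[i] = do_chunk(chunks[i], attempt)
        except NodeUnavailable:
            if attempt >= max_attempts:
                raise  # give up only after repeated failures
            pending.append((i, attempt + 1))  # try again later, maybe elsewhere
    return [results[i] for i in range(len(chunks))]


def flaky_sum(chunk, attempt):
    # Simulated failure: the node holding [3, 4] is down on the first attempt.
    if chunk == [3, 4] and attempt == 1:
        raise NodeUnavailable("simulated node loss")
    return sum(chunk)


print(run_tolerant([[1, 2], [3, 4], [5]], flaky_sum))
```

The job finishes with all three partial sums even though one chunk had to be rerun. A real code would also need a way to notice the failure and to recover the lost chunk’s input, which this sketch skips.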

Same on the IO side … IO will need to be absolutely reliable under all cases. IO failures will need to be handled the same way.

So K itself won’t directly impact us. But the direction it’s pushing in, yeah, this will have a long-term and profound effect upon us.


6 thoughts on ““K” is atop the top500. What does this mean to us?”

  1. My first thought when seeing the list was “gee, I wonder if they screwed up I/O as badly as with the Earth Simulator…” But the interesting bit with K is that they hit a performance/watt target very near GPGPUs with a typical processor. It’s not just possible any more; someone’s done it, and in an area where GPGPUs are supposed to have a good advantage (dense linear algebra).

    BTW, as one of the Graph500 folks, we know that. The Graph500 is interesting for some non-floating-point problems and particularly for different architectures like Convey’s, but only a few of each. And the arguments about the next problem to include are ramping up.

  2. @Jason

    The Sparc chip Fujitsu uses is, from my understanding, vastly different from the Sparc chips Sun and now Oracle use. Fujitsu has been, in the past, a significant HPC player, and I don’t think they’ve ever really given that up. They know their stuff, real well.

    What K shows us in the chip design space is that there are mechanisms to wring more efficiency out of the silicon. In the power and performance efficiency world, this is wonderful, and lends further support to Intel’s MIC concept. I don’t think GPUs are going by the wayside, but this is definitely a shot fired across the bow of such units.

    It would be wild … positively awesome, if Fujitsu put out these procs in single, dual, and quad socket MBs that were priced about where x86-64 units are now. Couple that with a Debian or other open distro build, and they could do some serious damage in the market.

    But sadly, that isn’t likely to happen (though if you are from Fujitsu and you want to correct me in private or in public, feel free to post here, or email me at the day job).

    As for Graph500, I am glad to see this effort in general, but as noted, it’s really hard to get a good read on “world’s fastest.”

    As a research problem, it would be neat to compare rankings in the various metrics, and see if we can derive any sort of empirical formula with fitted values that accurately predicts ranking, and then compare this to other metrics. This way we could get a feel for the impact and predictive power of the various metrics. Anyone want to work on a grant? 🙂

  3. @James

    ROTFLMAO!!!

    Yeah, that about sums it up 🙂

    But they do have some neat tech, look pretty, and make politicians swoon …

  4. Given all the submitted information on Graph500 runs, it’s even *harder* to get a handle on the “fastest.” We don’t have a decent normalization yet. I propose number of pathways to memory, but that’s unfortunately still difficult to quantify uniformly. (BTW, Graph500’s not a funded effort. The first push was, but now it’s spare time.)

    The other fun bit with the Sparc is that the base plans all are out there, GPLed. I’m painfully aware that there’s a lot more to making a chip, but it’s a good start. Sparc’s hardly beautiful, but maybe this push will drive more interest. I’m not liking the Intel v. ARM assumption about the future. There should be more possibilities (and MIPS from China is one). Not terribly difficult to support the different architectures on free software, and generally you catch more bugs that way.
