“K” is atop the top500. What does this mean to us?

Not much.

No, I am not trying to be a downer. The relation of the top500 top-o-the-heap to mere mortals with hard problems to solve isn’t very strong. Actually, it’s quite weak.

There is only one K machine. It’s at RIKEN in Japan. There’s only one Jaguar, and only one Tianhe machine. All are, to some degree or other, unique in some respects.

What matters to most people is “what can it do for me?”

Directly, not so much.

Indirectly … quite a bit.

Ignore the marketing hype about how these machines will cure the common cold (hint: they won’t, but it makes politicians happy to hear this, as something they can point to at re-election time). What they do is point the way to designs that we will need to be using in the future to achieve good performance on some subset of problems.

Not IO-bound problems, mind you, but a small subset of problems well tackled by such architectures. This is fundamentally one of the problems in using the top500 ranking as a “world’s fastest” indicator. It’s the world’s fastest … at running that code.

Before the graph500 folks chime in, it’s the same issue there. And the same issue with SPEC. And every other benchmark out there. There are codes for which all these world’s fastest machines will not be substantially faster than a desktop. No, really.

These are codes that don’t naturally scale well (99.999% of all codes).

These machines won’t cure the common cold or cancer, solve the US debt problem, or come up with a viable solution to Greece’s likely default. What they will do is provide an architecture against which people can write codes that attempt to efficiently use as many cycles as possible.

We as HPC folk then need to take the architecture of the machines and push knowledge of how to program them into the mainstream. No, not meaning Windows. I mean taking the techniques of parallelization and parallel algorithm design, and pushing these into common use.

Yeah, it’s kind of amusing, but you can always look at a single processor core as having a parallelism of 1 CPU. When I taught a class on how to program these problems in the past, I gave the students the best advice I could on how to write such code. Assume as little communication as you can manage, keep everything local, and imagine that parallelism is simply a large loop around the major work-producing portion, with an index running from 1 to NCPU (hey, I’m not a computer scientist, nor were the students, and counting for the rest of us starts at 1!!!). If you can subdivide your work as a loop like that, you can write your application as a parallel code.
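Here is roughly what that loop looks like, as a minimal sketch in Python rather than anything I handed out in class. The names (chunk_bounds, partial_sum_of_squares, sum_of_squares) and the toy workload are made up for illustration. Each pass of the loop from 1 to NCPU owns its own slice of the work and talks to nobody until the final sum.

```python
def chunk_bounds(n_items, ncpu, cpu):
    """Half-open range [start, stop) of items owned by worker `cpu`, counted from 1."""
    base, extra = divmod(n_items, ncpu)
    # The first `extra` workers each take one additional item.
    start = (cpu - 1) * base + min(cpu - 1, extra)
    stop = start + base + (1 if cpu <= extra else 0)
    return start, stop


def partial_sum_of_squares(bounds):
    """The 'major work producing portion': purely local, no communication."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def sum_of_squares(n_items, ncpu, mapper=map):
    """Split the work into ncpu pieces, run each piece, combine once at the end."""
    pieces = [chunk_bounds(n_items, ncpu, cpu) for cpu in range(1, ncpu + 1)]
    # `mapper` decides where each piece runs; the built-in map runs them serially.
    return sum(mapper(partial_sum_of_squares, pieces))


print(sum_of_squares(1000, 4))
```

With the default mapper this runs on one core, which is the "parallelism of 1 CPU" case. Passing the map method of a concurrent.futures.ProcessPoolExecutor as mapper should spread the same pieces across cores without touching the work function, assuming the script is run as a normal module so the function can be pickled.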

Yeah, it’s more complex with MPI, but not that much more so. Really, you can do everything you need in MPI, fairly efficiently, using just a few calls.

Aside from that, I don’t think most users would benefit from MPI. OpenMP was a great way to express the parallelism, and the PGAS languages now look like they are gaining interest. Would love to see people code real work in them. Run from desktop to supercomputer.

Well, ok, the dominant supercomputer of the future may be a humongous collection of iPhone/Android units. The former more likely, as the latter might run afoul of Oracle, and then there’s the Java issue in general (slow on every platform). So coding for that will be harder.

Think about it, though: as the K machine and others continue to grow, they will force us to develop new technologies that handle portions of the computing lattice being unavailable. Currently we try to handle it at the job scheduler level. I think in the future, much of this will need to be in the app itself.
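To make "in the app itself" a bit more concrete, here is a toy sketch in Python. It is not how any real scheduler or runtime does it. NodeDown and run_with_failover are hypothetical names, and a "worker" here is just a callable standing in for a node. The point is only that the application notices a dead piece of the lattice and moves the work, instead of the whole job dying.

```python
class NodeDown(Exception):
    """Hypothetical signal that a worker (a stand-in for a node) is unavailable."""


def run_with_failover(chunks, workers):
    """Run every chunk, retrying on another worker when one reports NodeDown."""
    alive = list(workers)
    results = []
    for i, chunk in enumerate(chunks):
        while alive:
            worker = alive[i % len(alive)]  # simple round-robin placement
            try:
                results.append(worker(chunk))
                break
            except NodeDown:
                alive.remove(worker)  # stop scheduling onto the dead node
        else:
            # Only reached when every worker has failed.
            raise RuntimeError("no workers left to run chunk %d" % i)
    return results


def healthy(chunk):
    return sum(chunk)


def broken(chunk):
    raise NodeDown()


print(run_with_failover([[1, 2], [3], [4, 5]], [broken, healthy]))
```

A real code would also have to worry about partial results, state held on the lost node, and detecting failure in the first place, none of which this sketch attempts.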

Same on the IO side … IO will need to be absolutely reliable in all cases. IO failures will need to be handled the same way.

So K itself won’t directly impact us. But the direction it’s pushing in, yeah, this will have a long-term and profound effect upon us.
