Breaking mirror symmetry in HPC

If you are not already reading HPCWire on a regular basis, I do recommend it as one of the “must” weekly aggregation sites. They have an interesting article on the “coming” heterogeneous computing systems. Neat idea, but heterogeneous supercomputing systems are already here. Have been for a while. In massive numbers. Working on specialized HPC problems. More about this in a moment.

HPC has a concept built into it: symmetric multiprocessing, or SMP. In these systems, you have effectively identical elements working on problems. If you design your program well, you can have all of these elements work together.
In an SMP, you write one program, and the same binary executable can run on each processor. This is nice, as it allows you to partition the work as you require.
In physics, there is a theorem attributed to Emmy Noether (called Noether’s theorem) that a continuous symmetry in nature implies a conservation law. More about this in a moment.
In a heterogeneous system, you may not have the luxury of assuming binary (ABI) compatibility. That is, you have different, non-symmetric computing elements. You cannot simply swap one with the other and get identical results.
More to the point, we have multiple processing units, providing specific services and functions to the main processing system. It might be better to call them asymmetric multiprocessors (aSMP).
In such aSMP systems, one has to be able to pass data, as well as programs, back and forth, and act upon this data. While the computing elements may be heterogeneous, and the programming model is fairly one-sided, the elements all need to act together, efficiently, to provide reasonable results in short time periods. This means that the heterogeneous processing elements (HPEs) need to be able to understand each other. There is symmetry in terms of how they deal with data. And program flow.
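To make "understand each other" a bit more concrete, here is a minimal sketch of two unlike processing elements agreeing on how a work item is laid out in bytes. The record layout, the names WORK_FORMAT, pack_work, and unpack_work, and the choice of network byte order are all my own assumptions for illustration; no particular machine or vendor API works exactly this way.

```python
import struct

# Hypothetical wire format for one work item, fixed regardless of host CPU:
# "!" selects network (big-endian) byte order with standard sizes and no padding,
# "I" is a 32-bit unsigned task id, "d" is a 64-bit float operand.
WORK_FORMAT = "!Id"


def pack_work(task_id, value):
    """Encode a work item on the sending element (assumed layout above)."""
    return struct.pack(WORK_FORMAT, task_id, value)


def unpack_work(payload):
    """Decode a work item on the receiving element, whatever its native byte order."""
    return struct.unpack(WORK_FORMAT, payload)


if __name__ == "__main__":
    payload = pack_work(7, 2.5)
    print(len(payload), unpack_work(payload))
```

The point is only that the symmetry lives in the agreed data format, not in the hardware: either side can be swapped out as long as it packs and unpacks the same bytes.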
The programming models often used are not fundamentally symmetrical: the same code is not running on all the nodes, and often there are “master” nodes. Or some nodes are more equal than others.
So while this article noted with interest the advent of a large commercially produced supercomputer which was heterogeneous by design, it indicated that the programming model would be much harder, saying that we didn’t quite know how to run such systems.
I disagree with this point. We do know how to make such systems run, even run well on some problems.
We have been doing it for a long time.
An API by any other name …
The connection with Noether’s theorem might be through Amdahl’s law, which is, when it comes down to it, a conservation law for cycle count. If your program will take N(s) serial cycles, and N(p) parallel cycles to execute, the sum of N(s) and N(p) on an SMP or homogeneous machine will be constant, regardless of how many processors p you throw at the parallel section. Only the wall-clock time of the parallel section shrinks, ideally as N(p)/p.
Now that we break that symmetry, we might be able to have N(s) + N(p)[aSMP] << N(s) + N(p)[SMP] by exploiting the power of SPUs, or the ClearSpeed pipelines, … We can realize significant deltas by breaking this symmetry. We just need to do it productively for it to be meaningful.
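The cycle-count argument can be sketched in a few lines of Python. This is an idealized model, not a measurement: the function names, the sample cycle counts, and especially the cycle_reduction factor for the specialized unit are hypothetical values I picked for illustration.

```python
def smp_total_cycles(n_serial, n_parallel, processors):
    """Total cycles executed across an SMP: each of the identical processors
    runs n_parallel / processors cycles, so the sum does not depend on the count."""
    per_processor = n_parallel / processors
    return n_serial + per_processor * processors


def smp_wall_cycles(n_serial, n_parallel, processors):
    """Idealized elapsed cycles on an SMP (Amdahl's law, no overheads)."""
    return n_serial + n_parallel / processors


def asmp_total_cycles(n_serial, n_parallel, cycle_reduction):
    """Total cycles when a specialized unit needs fewer cycles for the same
    parallel work. cycle_reduction is an assumed factor, not a vendor figure."""
    return n_serial + n_parallel / cycle_reduction


if __name__ == "__main__":
    n_s, n_p = 100, 900  # made-up serial and parallel cycle counts
    for p in (1, 2, 4, 8):
        print(p, smp_total_cycles(n_s, n_p, p), smp_wall_cycles(n_s, n_p, p))
    print("aSMP total:", asmp_total_cycles(n_s, n_p, cycle_reduction=10))
```

On the SMP side the total stays put while only the elapsed cycles drop; on the aSMP side the total itself drops, which is the broken symmetry at work. Whether a real accelerator delivers anything like the assumed factor depends entirely on the problem.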