In a world of vector and intrinsically parallel machines ...

By joe

July 25, 2010

… why are we still programming them with serial languages? And more to the point, why are these language compilers so terrible at converting serial code to parallel code? No, seriously … I know the semantics of serial languages impose several constraints on how code can be processed. Debugging and exceptions, for one … you wouldn’t want to signal a floating point exception in code that had nothing to do with the FPE in the first place. But this may be due more to thinking about machines as big serial processing engines, rather than as hierarchically organized collections of parallel and asymmetric processing elements. Most programming languages encourage this way of thinking. Parallelism is either an explicit bolt-on system, or an intrinsic ‘directive’-driven system. Neither of these models works well at expressing a parallel algorithm.

Way back in grad school, in a course on special and general relativity, we were introduced to what was called ‘Einstein notation’. This was a shorthand to express a particular computational pattern. The notation was compact, logical, and easy to grasp. For example, this computation:

$$\mathrm{sum} = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$

which is little more than a dot product between vector a and vector x, could be represented as

$$\mathrm{sum} = a_i x_i$$

where the summation is over repeated indices. This is a contraction operation, or in CS parlance, a reduction operation. A matrix multiplication would be expressible as

$$C_{i,j} = A_{i,k} B_{k,j}$$

And so forth. I am not claiming that Einstein notation makes sense for parallel programming, but I am arguing that we need similarly compact and terse notations to enable us to express complex algorithms correctly and succinctly. I don’t think we have this today.

I know there were efforts along these lines with Fortress and others. The issue is that they are, again, serial languages, with parallel bolt-ons. There isn’t a great ability to express a simple parallel operation, such as a matrix multiplication, in a compact notation. Moreover, as these languages are inherently serial in nature, parallel operations have to be decomposed into serial operations. We have all of these functional units, symmetric and otherwise, hierarchical memory, processing, etc.

As the number of processing cores increases, debugging these serial + parallel bolt-on codes gets harder. Making sure they are giving correct answers is even more important. I suspect that unit and subsystem testing will be more integrated into codes going forward, especially sanity checks in parallel code. But I think it would be a shame if we continue to use the many resources we have available in an ever more inefficient manner.
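As a rough illustration of the dot product and matrix multiplication written in Einstein notation above, here is a minimal sketch using NumPy's `einsum` function. NumPy is not something the post discusses; it is swapped in here only because its subscript strings happen to mirror the repeated-index convention. The array values are made up for the example, and nothing here says anything about how well such an expression gets parallelized underneath.

```python
import numpy as np

# Made-up example data (assumption: any numeric arrays of matching shape work).
a = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

# sum = a_i x_i : the repeated index i is summed over (a contraction/reduction).
dot = np.einsum("i,i->", a, x)

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# C_ij = A_ik B_kj : the repeated index k is summed over, i and j are kept.
C = np.einsum("ik,kj->ij", A, B)

# Sanity checks against the conventional operators.
assert np.isclose(dot, a @ x)
assert np.allclose(C, A @ B)
```

The point of the sketch is only the notation: each operation is a single expression naming its indices, with no explicit loops to decompose by hand. The trailing assertions are the kind of built-in sanity check mentioned above.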