In a world of vector and intrinsically parallel machines …

… why are we still programming them with serial languages? And more to the point, why are these language compilers so terrible at converting serial code to parallel code? No, seriously … I know there are several constraints on the semantics of the serial language code processing. Debugging and exceptions for one … you wouldn’t want to signal a floating point exception in code that had nothing to do with the FPE in the first place.

But this may be more due to thinking about machines as big serial processing engines, rather than a hierarchically organized collections of parallel and asymmetric processing elements. Most programming languages encourage these thought processes. Parallelism is either an explicit bolt on system, or an intrinsic ‘directive’ driven system.

Neither of these models works well at expressing a parallel algorithm.

Way back in grad school, in a course on special and general relativity, we were introduced to what was called ‘Einstein notation‘ . This was a shorthand to express a particular computational pattern. The notation was compact, logical, and easy to grasp.

For example, this computation:

`sum = a1x1 + a2x2 + ... + aNxN`

which is little more than a dot product between vector a and vector x, could be represented as

```sum = aixi ```

where the summation is over repeated indices. This is a contraction operation, or in CS parlance, a reduction operation.

A matrix multiplication would be expressible as

```Ci,j = Ai,kBk,j ```

And so forth.

I am not claiming that Einstein notation makes sense for parallel programming, but I am arguing that we need similar compact and generally terse notations to enable us to express complex algorithms correctly and succinctly.

I don’t think we have this today. I know there were efforts along these lines with Fortress and others.

The issue is that they are, again, serial languages, with parallel bolt-ons. There isn’t a great ability to express a simple parallel operations such as a matrix multiplication, in a compact notation. Morever, and these languages are inherently serial in nature, parallel operations have to be decomposed into serial operations.

We have all of these functional units, symmetric and otherwise, heirarchical memory, processing, etc.

As the number of processing cores increases, debugging these serial + parallel bolt on codes gets harder. Making sure they are giving correct answers is even more important.

I suspect that unit and subsystem testing will be more integrated into codes going forward, especially sanity checks in parallel code. But I think it would be a shame if we continue to use the many resources we have available, in an ever more inefficient manner.

Viewed 28934 times by 4570 viewers

5 thoughts on “In a world of vector and intrinsically parallel machines …”

1. Well, changing semantics of existing serial languages, ala CUDA is a start.

2. What’s Fortran 90 array syntax, chopped liver?

3. @Rohit

Yeah, CUDA is a start, but its still intrinsically a serial language that the technology is based upon. I am not saying this as a language purist, rather noting that you still have to be quite explicit in how you lay out your algorithms on the hardware, and you are expressing this in terms of a set of serial language constructs. Its still serial at its core, if you will pardon the pun.

@Troy (good to see you here)

No, Fortran90/95/2003’s syntax is decidedly not bad. Actually its quite good. But Fortran’s syntax is great for expressing parallelism on a single shared memory machine. You need additional bolt-on extensions (MPI, Co-array, …) to express wider parallelism (this is true of every language as far as I know, BTW, not just Fortran), and you have to alter your code to reflect these extensions.

My point was that I’d like this level to be … well … hidable … from application developers. It would be nice to use a single notation to express an algorithm, regardless of whether it was running local on shared memory, or across a cluster, or a mixture of these.

For vector and parallel systems these days, the sense I get is that most new code developers aren’t looking seriously at Fortran. Which is a shame, as Fortran is still an excellent language (with a number of interesting anachronisms, and in some cases, failed design elements … the ‘kind’ specifiers are IMO annoying and not comprehensible without digging in deeper to the code, module implementations have ranged from ok to terrible, and aren’t even remotely portable).

If possible, I’d like to see something like Fortran’s array syntax and capability extended seamlessly across a cluster, or across a CPU + GPU system. The point is that you have to be quite explicit about how to do this, and my concern is that this gets in the way of people expressing what it is they wish to express.

4. Jeff Edwards says:

While I’m fairly certain there isn’t a programming language that actually does what you’re talking about (leveraging parallel processing), MATLAB does have that kind of syntax.