Interesting observations on performance focus

Again, on the excellent blog, John West points to a post on an Intel blog with an interesting observation:

I wonder if the academic computing universe is splitting into two camps: those where students deal directly with architecture, low-level languages, concurrency, and performance, and those where students stay at a higher level of abstraction (typically expressed with Java or Python)?

I would add that some research computing users prefer the expressiveness of languages such as Matlab, often ignoring the huge performance penalty that comes with them. For them, the value lies in how easy their “code” is to write and maintain. Tools such as ISC’s StarP attempt to build compiled code from the Matlab code. While this is good, there are issues.

Basically, the further you remove yourself from the underlying hardware, the larger the performance penalty you pay. Compilers map your code onto a relatively generic model of a processor, so you don’t have to write in a low-level language, which is both harder to write and harder to maintain.

The higher up you go in abstraction and ease of use, the lower your overall performance will be.
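
A small way to see this penalty without even leaving one language: the Python sketch below computes the same dot product twice, once as an explicit interpreted loop and once with the iteration handed to C-implemented builtins (sum, map, operator.mul). The function names are made up for illustration. It only shows the direction of the effect; actual timings depend on the machine and Python version, and it says nothing about the gap to compiled C or Fortran.

```python
import operator
import timeit


def dot_loop(xs, ys):
    """Dot product as an explicit loop: every step runs in the interpreter."""
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """Same arithmetic, with the iteration pushed down into C-implemented builtins."""
    return sum(map(operator.mul, xs, ys))


if __name__ == "__main__":
    n = 1_000_000
    # Small integer-valued floats, so both versions produce exactly the same sum.
    xs = [float(i % 7) for i in range(n)]
    ys = [float(i % 5) for i in range(n)]
    assert dot_loop(xs, ys) == dot_builtin(xs, ys)
    for fn in (dot_loop, dot_builtin):
        seconds = timeit.timeit(lambda: fn(xs, ys), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

Same answer, same algorithm; the only difference is how much of the work runs above the compiled layer.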

Or, as I have put it for (my gosh, has it been this long?) the past 20 years:

Portable codes are not fast, fast codes are not portable.

You can write very close to the metal (or silicon). Because this requires your code to take the model of the underlying hardware into account, you may use the hardware more effectively. The cost is that you cannot easily move your code, with minimal impact, to another machine with a somewhat different architecture (NUMA vs. non-NUMA, for example).

You can write completely abstractly, never even thinking about the architecture. This leads to all sorts of nice things like object-oriented designs, object factories, and other paradigms, none of which are very appropriate for HPC coding. Or you can consider the architecture issues, design for performance, and use tools that often let you get better overall optimization, which often means using Fortran or similar as the high-level language (HLL), and hand-coded assembly for the low-level versions. Or it means coding FPGAs, GPUs, etc. When performance is the most important thing, abstraction penalties are painful.

Is there a bifurcation? Yes, I believe so. One of the comments on the post noted:

The audience, mainly from industry, certainly picked it up; several identified themselves as hiring managers, and lamented the general ignorance of performance and architecture details. At least one of them said he prefers to interview only EE graduates – for software jobs – since CS students typically do not bring what his company needs (the industries represented here were quite varied: search engine, medical instruments, cluster consulting etc).

I personally prefer scientists and engineers who view computers and computing as a means to an end. There are some CS folks who are highly focused on performance; unfortunately, they are few and far between.

When it comes down to it, pure CS isn’t about high performance. Pure CS folks aren’t focused on the performance side, and don’t necessarily grasp everything it takes to get performance. An engineer or scientist who has worked hard to speed up their own code has looked into what that requires, and keeps some effort focused there.

Curiously, some scientific research fields that are heavily information-theoretic in nature do their best to embed CS people within them. In a number of cases, this has resulted in particularly poorly designed algorithms (from a performance/implementation point of view). I have to say I am amused when I hear of an implementation of some very computationally intensive algorithm in Java, Python, Ruby, or Perl, when it clearly belongs in C, Fortran, or similar. We run into this far too often.

Don’t get me wrong: I am a fan of Perl, Ruby, etc. They are great, very expressive languages, and I particularly like Perl for all sorts of work. But I also know enough to keep the performance-critical code out of Perl.

That said, Perl, Ruby, and, I believe, Python let you integrate code written in other languages fairly easily. Perl’s Inline:: modules (Inline::C, for example) make it ridiculously easy.

This way, you can tie the high-performance code together with high-productivity authoring.
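
As a rough Python analogue of what Inline:: does for Perl, the standard-library ctypes module can call straight into a compiled C library. This minimal sketch assumes a POSIX-like system (Linux or macOS) where the C math library can be found as "m"; the helper name load_libm is hypothetical.

```python
import ctypes
import ctypes.util
import math


def load_libm() -> ctypes.CDLL:
    """Load the C math library (assumes a POSIX-like system)."""
    path = ctypes.util.find_library("m")
    # If the library can't be located by name, fall back to symbols already
    # loaded into this process; on typical builds libm is among them.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libm = load_libm()
# Declare the C signature double cos(double) so ctypes converts arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

if __name__ == "__main__":
    for x in (0.0, 1.0, math.pi):
        print(f"x={x:.4f}  C cos={libm.cos(x):+.6f}  math.cos={math.cos(x):+.6f}")
```

One caution in the spirit of this post: each ctypes call has to cross the Python/C boundary, so calling a tiny function like cos once per element is slower than using math.cos. The payoff comes when the compiled routine does a lot of work per call, such as processing a whole array.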

This is critical going forward in HPC. End users won’t want to learn how to program, but they will want their Matlab, Octave, … code to run as fast as possible. If they can link accelerated libraries and high-performance accelerators into their environments, they will be happy.

This is why I have believed strongly in accelerators. Think of them as hardware-accelerated software libraries (if developed right): they should drop right in and make things run faster. This is why tools such as CUDA would be so interesting if they became multiplatform, with AMD and others adopting them. It is also why Amir’s tools are so interesting: ease of use tightly coupled with high performance, applied to use cases people understand and can quickly leverage.
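
The “drop right in” idea can be sketched as a stable function interface with interchangeable backends underneath. In this Python sketch, NumPy stands in for an accelerated library purely as an assumption (it is not CUDA or a hardware accelerator); if it is not installed, the same dot call falls back to plain Python. The names dot and BACKEND are hypothetical.

```python
from typing import Sequence

try:
    # Stand-in for an "accelerated library": NumPy, if installed, runs the
    # dot product in compiled code (typically BLAS-backed).
    import numpy as _np
except ImportError:
    _np = None

# Records which backend is in use, for logging or diagnostics.
BACKEND = "numpy" if _np is not None else "pure-python"


def dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Same interface regardless of backend, so calling code never changes."""
    if len(xs) != len(ys):
        raise ValueError("length mismatch")
    if _np is not None:
        a = _np.asarray(xs, dtype=float)
        b = _np.asarray(ys, dtype=float)
        return float(_np.dot(a, b))
    # Portable fallback: plain Python, slower but always available.
    return float(sum(x * y for x, y in zip(xs, ys)))


if __name__ == "__main__":
    print(BACKEND, dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The caller only ever sees dot(); swapping in a faster backend is a change to one module, not to the user’s code.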

About 13 years ago, at my first SC event (well, back then it was called Supercomputing 95, not SC), I told a colleague at SGI that I thought HPC programming would take off when it got easy. Well, I am still waiting for it to get easy. Even so, demand for it has kept growing.

There is a great deal more to understanding how to speed up code on a machine than most coders grasp. Ignoring the underlying architecture means ignoring valuable performance information, which in turn keeps you from selecting better algorithms that map more tightly onto the underlying hardware model.
