The IBM folks have turned the Blue Gene into what they claim is the world's fastest BLAST engine. Interesting read. They use our A. thaliana data in the Bioinformatics Benchmark System v3 (BBS) to perform their measurement, as well as data from Aaron Darling for mpiBLAST. Our data had been in a mislabeled file for years, and I never took the time to replace the S. lycopersicum label with the original Arabidopsis one. I grabbed a number of model organism data sets when working on BBS, and I thank them for pointing out that it is probably a good time to fix this.
What’s amusing about this paper? Well, they solved an already solved embarrassingly parallel problem on a large machine well designed to run large embarrassingly parallel jobs. The problem had been solved 6 years ago by your humble author and others, and many times since then. And they did so on one of the least cost-effective machines possible, in contrast to what was done 6+ years ago and to other recent parallel BLAST work, including the excellent mpiBLAST.
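To make "embarrassingly parallel" concrete: one common way to parallelize a BLAST run is to split the query set into independent chunks and search each chunk separately, since no query depends on another. The sketch below shows only that splitting step. It is a generic illustration, not the method used in the IBM paper or in mpiBLAST, and the function name `split_fasta` is hypothetical.

```python
def split_fasta(fasta_text, n_chunks):
    """Split FASTA-formatted text into n_chunks pieces, round-robin by record.

    Hypothetical helper for illustration only. Each returned chunk could be
    handed to a separate worker, because the searches are independent.
    """
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record.
            records.append([line])
        elif records and line.strip():
            # A sequence line belongs to the most recent record.
            records[-1].append(line)

    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append("\n".join(record))
    return ["\n".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    queries = ">a\nACGT\n>b\nGG\nCC\n>c\nTT\n"
    for n, chunk in enumerate(split_fasta(queries, 2)):
        print(f"--- chunk {n} ---")
        print(chunk)
```

The splitting itself is trivial, which is the point: the hard part at scale is not dividing the work but getting the data to and from the workers.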
What is interesting about this paper is that they point out that in order to make effective use of such a machine, you need to be really clever about how you move data, and about where and when you use things. They point out that at a massive scale, base assumptions about IO may not be correct.
The value in this paper is not the dubious claim of performance. The value is in showing that scaling a particular design up to huge levels requires carefully rethinking single points of information flow within codes, as they become bottlenecks. It is not a new concept; some of us have been saying this for quite a while. But it is valuable to see it demonstrated on a massive-scale machine such as the Blue Gene.
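A toy model shows why a single point of information flow hurts at scale. Assume, purely for illustration, that the compute time divides evenly across workers while one master hands each worker its data one after another. The numbers below (one hour of compute, half a second of serialized IO per worker) are made up and are not taken from the paper.

```python
def run_time(n_workers, compute_seconds=3600.0, io_seconds_per_worker=0.5):
    """Toy wall-clock model with a serialized IO step (assumed, not measured).

    The compute part shrinks as workers are added; the IO part grows,
    because a single master serves every worker in turn.
    """
    parallel_part = compute_seconds / n_workers
    serial_io_part = io_seconds_per_worker * n_workers
    return parallel_part + serial_io_part


def best_worker_count(max_workers, **model_params):
    """Return the worker count in 1..max_workers with the lowest modeled time."""
    return min(range(1, max_workers + 1),
               key=lambda n: run_time(n, **model_params))


if __name__ == "__main__":
    for n in (1, 16, 64, 128, 512):
        print(f"{n:4d} workers: {run_time(n):8.1f} s")
    print("best worker count:", best_worker_count(512))
```

With these made-up parameters the modeled run time stops improving well before 512 workers and then gets worse, which is the kind of behavior that forces a rethink of how data moves once the machine gets big.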
Blue Gene is a great machine. Don’t get me wrong. But the comparison of 512 Blue Gene PowerPC processors against 128 very weak Transmeta chips is somewhat lacking in context. These are chips from very different generations, with very different design points.
A more apt comparison would be against more modern processors, or at least ones from the same generation, which are 5-10x the performance of the Transmeta Crusoe. Then compare a Blue Gene against a similarly sized cluster of these on all aspects: technical, price, power consumption, heat generation, and so on. This would be interesting.
Regardless of these issues, good job IBM.