Demoing Accelerated Computing

By joe

February 2, 2007

So I flew to Eilat and demonstrated how a little accelerated computing performed relative to a cluster. What really got to me was how simple a demo it was. The fingers never left the hands, and all that.

We ran HMMer on the cluster, then Scalable HMMer on the identical data set. Next we ran an 8-way cluster job using MPI-HMMer, again with the same data and options. Finally we ran the accelerated computing version. Same input decks. Same options.

Scalable HMMer was 2x faster than regular HMMer. MPI-HMMer was about 8.2x faster than regular HMMer. Hardware-accelerated HMMer was about 10x faster. These figures are for total application run time, not just the core algorithm. This is important. No one cares if you make one bit of the code much faster; they only care if you reduce the overall run time significantly. The MPI-HMMer team is merging more of this work together and should be announcing additional things soon.

What if you could give multiple orders of magnitude of acceleration to your most time-consuming applications? Could this change your work? From 1990 to 2004, my little molecular dynamics code went from taking 1 week for 100 time steps on a “superworkstation” to taking about 3-4 seconds per time step on my laptop. Yeah, this could count as acceleration, though I did some code optimization along the way: 6048 seconds per time step down to 4. From Moore’s law, we expect an order of magnitude (OOM) every 6.6 years. 13.2 years gets us 2 OOM, which brings us to 60.5 seconds per time step. The rest comes from code optimization (roughly one more OOM).

Getting code onto accelerated computing platforms is still non-trivial, even with RapidMind, PeakStream and others, or Celoxica tools, or … The SDKs mostly cost too much (apart from CUDA). Someone isn’t thinking they want to sell many units when they price their SDK at a number comparable to the cost of the hardware. Worse, porting is non-trivial for FPGAs and for “stream” processing. This is not simply: take your C code and it will run 100x faster. No. It won’t. The major hurdles I see for accelerated computing are the application ports. Over time I expect the market to sort out the tools.
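The point that only overall application run time matters is essentially Amdahl’s law, which the post does not name. A minimal sketch, assuming a fraction p of the original run time is accelerated by a factor s; the p values below are hypothetical illustrations, not measurements from HMMer, Scalable HMMer, or MPI-HMMer.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the original run time
    is accelerated by a factor s (Amdahl's law)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # The unaccelerated part (1 - p) still runs at original speed.
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Hypothetical: the hot loop is made 100x faster; vary how much
    # of the total run time that loop represents.
    for p in (0.50, 0.90, 0.99):
        print(f"p={p:.2f}: {amdahl_speedup(p, 100.0):.1f}x overall")
```

With a 100x kernel, a 90% hot fraction still yields only about 9x overall, because the untouched 10% dominates what remains.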
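The molecular dynamics arithmetic can be checked in a few lines. This sketch simply reuses the post’s own figures: 100 time steps per week in 1990, one order of magnitude every 6.6 years, 13.2 years elapsed, and about 4 seconds per step at the end. The function names are illustrative.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 3600   # 604800
YEARS_PER_OOM = 6.6                # post's Moore's-law rate: 10x every 6.6 years


def per_step_seconds(total_seconds: float, steps: int) -> float:
    """Wall-clock seconds per time step."""
    return total_seconds / steps


def moore_factor(years: float, years_per_oom: float = YEARS_PER_OOM) -> float:
    """Expected hardware speedup after `years` at one OOM per years_per_oom."""
    return 10.0 ** (years / years_per_oom)


if __name__ == "__main__":
    t_1990 = per_step_seconds(SECONDS_PER_WEEK, 100)   # 6048 s per step
    t_hw_only = t_1990 / moore_factor(13.2)            # about 60.5 s per step
    t_2004 = 4.0                                       # observed, per the post
    residual = t_hw_only / t_2004                      # attributed to code optimization
    print(f"1990: {t_1990:.0f} s/step")
    print(f"hardware alone: {t_hw_only:.1f} s/step")
    print(f"remaining factor: {residual:.1f}x (~{math.log10(residual):.1f} OOM)")
```

The leftover factor comes out near 15x, roughly one order of magnitude, which is what the post attributes to code optimization.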
I expect (as do most end users) that the lower-cost tools will be the ones to thrive. The history of HPC is littered with the bones of companies that made the critical mistake of not understanding that a) HPC moves downstream, b) HPC moves toward the less expensive providers, and c) asking people to pay much higher costs for a small increase in value is a sure way to lose. You see, speed matters, and 10x better performance can be delivered today rather than in 6.6 years. But charging people 4-6x the node price for 10x the performance simply doesn’t work out economically. Just wait a year for better price-performance from Moore’s law, and voila, problem solved. 10x works today. Can we get the apps onto the accelerators? We’re working on it, though, as noted many times, VCs and other potential capital sources are not even remotely interested in accelerated computing or HPC, which means this will go slowly.
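The economic argument can be made concrete by comparing price-performance. This is a hypothetical sketch: it assumes the post’s 10x speedup, a node price multiple of 4-6x, and the same one-OOM-per-6.6-years rate, then asks how many years of ordinary hardware improvement would deliver the same price-performance gain.

```python
import math

YEARS_PER_OOM = 6.6  # the post's Moore's-law rate (assumed constant)


def price_performance_gain(speedup: float, price_multiple: float) -> float:
    """Performance per unit cost, relative to a plain node."""
    return speedup / price_multiple


def years_to_match(gain: float, years_per_oom: float = YEARS_PER_OOM) -> float:
    """Years of Moore's-law improvement needed to deliver the same gain."""
    if gain <= 1.0:
        return 0.0  # no price-performance advantage to wait for
    return years_per_oom * math.log10(gain)


if __name__ == "__main__":
    for price in (4.0, 5.0, 6.0):
        g = price_performance_gain(10.0, price)
        print(f"{price:.0f}x price: {g:.2f}x price-performance, "
              f"matched by commodity nodes in ~{years_to_match(g):.1f} years")
```

Under these assumptions the accelerator’s price-performance edge (about 1.7x to 2.5x) is matched by commodity hardware in roughly 1.5 to 2.6 years, which is the squeeze the paragraph describes.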