As I noted recently in the post on SGI, they are having a tough time of it, in large part because of who they are competing against and what they have to compete with.
Well, they aren’t the only company with issues. As noted on InsideHPC and elsewhere, ClearSpeed is not having a great time of it either.
Basically, ClearSpeed makes accelerated CPUs. Each CPU has 96 cores laid out in a systolic array. Programming it requires a port and rebuild of your code for these cores, not to mention handling any memory access that has to go over the PCI-e link.
Porting code is a barrier to adoption. So is price.
Now don’t get me wrong, I like the technology behind the ClearSpeed product. It is neat.
What I question is whether or not it is a viable business.
This is not questioning whether acceleration is a viable business. Acceleration is pretty much the next wave of HPC. The issue is whether or not this particular form of it is the next wave.
This is dedicated specialized silicon from one vendor, which costs in the $6k region per unit. You have to port your code, and when ported you can get ~50 GFLOPS double precision, as long as your code is vectorized.
The SDK is not cheap.
Compare that to nVidia.
This is dedicated specialized silicon from one vendor, which costs in the $0.6k region per unit. You have to port your code, and when ported you can get ~200 GFLOPS single precision, as long as your code is vectorizable and can be written in CUDA or RapidMind.
Note also that nVidia and ATI have been making rumblings about offering double precision at this speed.
Also, note that Cell-BE currently does ~210 GFLOPS single precision for ~$0.5k (PS3). Programming tools for this are available on the web.
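To put the price gap in one place, here is a quick back-of-the-envelope sketch of dollars per GFLOPS using only the rough figures quoted above. The dictionary layout and the `usd_per_gflops` helper are just illustrative names. Keep in mind that the ClearSpeed number is double precision while the other two are single precision, so this is not an apples-to-apples comparison.

```python
# Approximate figures quoted in this post; not measured benchmarks.
# ClearSpeed is double precision, the other two are single precision.
accelerators = {
    "ClearSpeed": {"price_usd": 6000, "gflops": 50, "precision": "double"},
    "nVidia GPU": {"price_usd": 600, "gflops": 200, "precision": "single"},
    "Cell-BE (PS3)": {"price_usd": 500, "gflops": 210, "precision": "single"},
}


def usd_per_gflops(price_usd, gflops):
    """Unit price divided by quoted throughput."""
    return price_usd / gflops


cost = {
    name: usd_per_gflops(a["price_usd"], a["gflops"])
    for name, a in accelerators.items()
}

for name, a in accelerators.items():
    print(f"{name:14s} ${cost[name]:7.2f} per GFLOPS ({a['precision']} precision)")
```

On these numbers ClearSpeed comes out at roughly $120 per GFLOPS, against about $3 for the nVidia part and a bit under $2.50 for the PS3.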
The ClearSpeed looks like it is going to remain a niche product. If it were available at $1k or so, with a free SDK, it could be very interesting. The problem with this, and yes, there is a problem with this, is that ClearSpeed would not likely be able to recoup its costs. Moreover, with 4 and 8 core processors able to hit 22 and 44 GFLOPS sustained on code without porting, what precisely is the advantage of the port to ClearSpeed?
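The same kind of rough arithmetic applies to the multicore comparison. The sketch below takes the ~50 GFLOPS ClearSpeed figure and the 22 and 44 GFLOPS sustained multicore figures from this post and computes the best-case gain from porting. It assumes the numbers are directly comparable (same precision, same kind of code), which may well not hold in practice; the `speedup` helper is only an illustrative name.

```python
# Approximate figures from this post; assumes they are directly comparable.
clearspeed_gflops = 50.0               # after porting, double precision
multicore_gflops = {4: 22.0, 8: 44.0}  # cores -> sustained GFLOPS, no port


def speedup(accelerated, baseline):
    """Ratio of accelerated throughput to baseline throughput."""
    return accelerated / baseline


gain = {
    cores: speedup(clearspeed_gflops, gflops)
    for cores, gflops in multicore_gflops.items()
}

for cores, ratio in gain.items():
    print(f"vs {cores}-core host: {ratio:.2f}x")
```

That works out to a bit under 2.3x against a 4 core machine and under 1.2x against an 8 core machine, before counting the cost of the port or of moving data over the PCI-e link.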
Note: if they made it so that the Intel/GCC/Portland Group compilers could simply emit code for them for critical routines (simple pragmas, or using vectorized loop generation), this would solve one half of their problem.
Unfortunately, this is not the case now. That means you must use the SDK, which (last I installed it) seems to work only on a particular version of RHEL4.
I don’t see a way out for ClearSpeed without a significant change in their price structure and go-to-market strategy. nVidia and ATI have millions of acceleration-capable processors out there, and an SDK to go with them. Tesla is comparable in cost to the ClearSpeed with SDK, and you can develop on your laptop.
Hopefully they will accept the need for that change, sooner rather than later.