# More company information: ClearSpeed

As I noted recently in the post on SGI, they are having a tough time of it, in large part due to who they are competing against, and what they have to use to compete with.
Well, they aren’t the only company with issues. As noted on InsideHPC and elsewhere, ClearSpeed is not having a great time of it either.

Basically ClearSpeed makes accelerated CPUs. Each CPU has 96 cores laid out in a systolic array. Programming it requires porting and rebuilding your code for these cores, not to mention handling any memory access that has to go over the PCI-e link.
Porting code is a barrier to adoption. So is price.
Now don’t get me wrong, I like the technology behind the ClearSpeed product. It is neat.
What I question is whether or not it is a viable business.
This is not questioning whether acceleration is a viable business. Acceleration is pretty much the next wave of HPC. The issue is whether or not this particular form of it is the next wave.
This is dedicated specialized silicon from one vendor, which costs in the $6k region per unit. You have to port your code, and when ported you can get ~50 GFLOP double precision, as long as your code is vectorized. The SDK is not cheap.
Compare that to nVidia. This is dedicated specialized silicon from one vendor, which costs in the $0.6k region per unit. You have to port your code, and when ported you can get ~200 GFLOP single precision, as long as your code is vectorizable, and can be used in CUDA or RapidMind.
Note also that nVidia and ATI have been making rumblings on offering double precision at this speed.
Also, note that Cell-BE currently does ~210 GFLOP single precision for ~$0.5k (PS3). Programming tools for this are available on the web.
The ClearSpeed looks like it is going to remain a niche product. If it were available at $1k or so, with a free SDK, it could be very interesting. The problem with this, and yes, there is a problem with this, is that ClearSpeed would not likely be able to recoup its costs. Moreover, with 4 and 8 core processors able to hit 22 and 44 GFLOPS sustained on code without porting, what precisely is the advantage of the port to ClearSpeed?
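To put rough numbers on that comparison, here is a minimal sketch that divides the approximate prices quoted above by the approximate peak GFLOP figures quoted above. It deliberately mixes double precision (ClearSpeed) with single precision (nVidia, Cell-BE), exactly as the text does, so treat the output as a ballpark only. The function name `usd_per_peak_gflop` is made up for illustration.

```python
# Approximate unit price (USD) and peak GFLOP figures as quoted in this post.
# Precision differs between parts (DP vs SP), so this is not like-for-like.
accelerators = {
    "ClearSpeed (DP)":    (6000, 50),
    "nVidia GPU (SP)":    (600, 200),
    "Cell-BE / PS3 (SP)": (500, 210),
}


def usd_per_peak_gflop(price_usd, peak_gflops):
    """Dollars paid per peak GFLOP (hypothetical helper, not a vendor metric)."""
    return price_usd / peak_gflops


for name, (price, gflops) in accelerators.items():
    cost = usd_per_peak_gflop(price, gflops)
    print(f"{name:20s} ${cost:7.2f} per peak GFLOP")
```

Even allowing a generous penalty for the single versus double precision difference, the gap in dollars per peak GFLOP is more than an order of magnitude.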
Note: if they made it so that the Intel/GCC/Portland group compilers could simply emit code for them for critical routines (simple pragmas, or using vectorized loop generation), this would solve one half of their problem.
Unfortunately, this is not the case now. That means you must use the SDK, which (last I installed it) seems to work only on a particular version of RHEL4.
I don’t see a way out for ClearSpeed without a significant change in their price structure and go-to-market strategy. nVidia and ATI have millions of acceleration-capable processors out there, and SDKs to go with them. Tesla is comparable in cost to the ClearSpeed with SDK, and you can develop on your laptop.
Hopefully they will accept this, sooner rather than later.

### 13 thoughts on “More company information: ClearSpeed”

1. hi joe,
accelerated computing is not suffering for want of hardware. in fact, the plethora of devices competing for your co-processor slot would indicate that a real competitive edge would be a good development environment to test and profile algorithms across the various options.
but that’s the $100B question of course: how do we program massively parallel architectures? you know my answer already: dynamically reconfigurable dataflow aka a spreadsheet.

2. @Amir: So when do we get to play with it? 🙂
I think we agree that easy to understand paradigms are absolutely needed. Non-specialists need to be able to program the advanced hardware. Computing made a huge leap in going from vacuum tubes and circuit design to assembly language. Then another huge leap to high level languages. There is at least a little bit of irony in the full circle nature of automatically generated circuits being built from a HLL representation.
Now if this tool can target GPU and Cell (as the obvious fixed function processor winners), and FPGA (ExtremeData/DRC) as the obvious reconfigurable winners, this could be interesting. Even better if we could model a new architecture easily, and have it emit “code” for that.
Please keep me apprised of how the company is going, and if you have funding, etc. I am quite interested.

3. I should note that, again, I like the idea/technology behind Clearspeed. I like the people in the company. Some are former colleagues. It’s the business model that I question. That’s the only real issue. It is hard to convince people to pay high prices for high performance when significant barriers exist to getting that performance.
As John pointed out in another note, the HPC business model has changed drastically. People like Amir are trying to make programming these types of systems easy. Making that programming inexpensive as well could go a long way to making them ubiquitous.
And that is what ClearSpeed needs. Ubiquity. And cheap programming tools. And an easy programming model.

4. I’m sorry but this doesn’t ring true. You say as regards Clearspeed’s product;
“This is dedicated specialized silicon from one vendor, which costs in the $6k region per unit. You have to port your code, and when ported you can get ~50 GFLOP double precision………”
which you compare to nVidia and say;
“This is dedicated specialized silicon from one vendor, which costs in the $0.6k region per unit. You have to port your code, and when ported you can get ~200 GFLOP single precision………….”
I don’t think you can be comparing like for like here. Ten times the cost for 25% of the speed (albeit double precision versus single precision)??? I’ve read Clearspeed’s latest results and they are expecting a significant increase in revenues this year. They are also targeting embedded processing and recently licensed their technology to BAE for use in satellites.

5. @Jim
You wrote: “I dont think you can be comparing like for like here. Ten times the cost for 25% of the speed (albeit double precision versus single precision)???”
I must be missing why you think these are somehow different. Both are accelerator technologies, both used in similar ways. The distinction on single versus double precision is, IMO, a temporary one. My view is that they are directly comparable.
I know the business folks will predict “hockey sticks” in demand and new revenue sources. I am not convinced that the hockey stick will occur. As for BAE and others, this shows the niche that they can fill that GPUs cannot, at this time, easily fill … specifically very low power consumption floating point processing.
The issue is not just that they are fighting against the GPUs, they are also fighting against the native CPUs. With 8 cores, each capable of 5 GFLOP, what advantage does the ClearSpeed offer over running the code on the 8 core unit? This is what I think is the important question. If they can’t demonstrate a clear value over the performance of a node or desktop (not a single core, but a whole system), then it will be hard to justify their value relative to their price.
The other issue is that you should assume at the lower end, that the $600 GPUs will drop in power consumption, and gain double precision capability.
ATI has been demonstrating this, and I expect others to publicly demonstrate DP soon as well. Couple these with their mobile silicon line, and you could have reasonable power consumption with very high computing power.
I think Clearspeed’s biggest market may be for groups that want DP processing and have been using FPGAs for single precision FFTs and other calculations. Prices would be comparable for the silicon, but the programming model would be easier. Then they will compete against the other embedded CPU builders (Freescale, et al).
My comments were and are not to bash Clearspeed. They are to point out that they have a very difficult route to market, and that it is hard to clearly articulate their value to the larger HPC community, as the rest of the vendors aren’t sitting still, and have products that can compete where the Clearspeed units live. Moreover, Clearspeed is in a difficult place relative to the GPU makers. The GPU makers can and do ship millions of CUDA-enabled GPUs, and enjoy not just economies of scale, but in a very realistic sense, a large and rapidly growing market for their product in an accelerator sense. This market is multiple orders of magnitude larger than Clearspeed’s market. With this market, it is easier to convince customers to get the SDK (free download), and start using it. More to the point, it is very easy for ISVs to start going down this route.

6. Joe
That’s interesting. You obviously know much more than me from a technical standpoint. What I can’t fathom is why Clearspeed are trying to sell a product at $6,000 when you say its competitors are selling basically the same product for one tenth of the price. Even a chimpanzee could tell you that business model just won’t work. And I’m assuming the people running Clearspeed have a higher IQ than that of a chimpanzee! So I’m a bit baffled to say the least.

7. @Jim
They aren’t chimpanzees. They are simply unlucky from a timing perspective. I am guessing that their plans did not include the rapid adoption of multi-core/many-core systems, or even that CUDA would become viable. My guess is that their business plan (which I am assuming is similar to business plans we created seeking funding for products in this market) assumed that they could sell these units for $5-10k USD (2500-5000 GBP), and deliver 10x or better native system performance.
With this model, the product is compelling. 3 years ago, this was a very good model.
The problem is that multi-core has effectively decimated the value of 10x acceleration. You can get (for threaded apps, or process based parallelism) 8x out of your desktop. That extra 20% … what is that worth? 3 years ago, it was not 20% but 5-6x (500-600%). Moore’s law in action.
GPUs have become easier to program. They were always the “dark horse”. They are ubiquitous, and will continue to be so. Intel is targeting this with Larrabee. Basically at $0.5k USD/unit, if you can get 10% efficiency on a 500 GFLOP part (usually not too hard), this is about the same performance you can get out of an optimally programmed 2 socket node. So for $500, you double your performance.
In reality, you rarely see 7 GFLOPs/core sustained performance on most codes, more typically 1-2 GFLOPs. So 8 cores gets you 8-16 GFLOPs. And with your Clearspeed accelerator, if you can get 25 GFLOPs out of 50 you are doing well. The demo codes demonstrate about 16-20 GFLOPs. We have one in our lab. With the GPU, if you can get 10% out, you are looking at 50 GFLOPs.
Now look at the cost per realistic gigaflop. CPU: $500/8 GFLOP, Clearspeed: $5000/20 GFLOP, GPU: $500/50 GFLOP.
Amir might point out you can do good things with FPGAs here, and you can. But it is the cost that is a killer.
No one (at planning time for Clearspeed) likely assumed the current state of the market. They need to adapt to this new market, and see where they can get traction. Sadly, while their technology is neat, it may be hard for them to get sustainable traction against competitors like GPUs given the great disparity in pricing per achievable gigaflop.
They didn’t plan this … I suspect that this falls under one of their corner cases of planning, in the heading of “bad scenarios”.
They are not chimpanzees, rather they are quite bright (lots of former colleagues are there). They are a victim of a rapidly changing market. One that will test the mettle of their leadership. Already we have seen them slash costs and jettison people. John Gustafson just left and joined another small company. I suspect there will be others. Management’s job is to find and hold a niche, one that is defensible, and one they can stabilize in and grow in.
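The "cost per realistic gigaflop" comparison in this comment can be sketched as below. Every price and GFLOP figure is the rough number quoted above; the 40% ClearSpeed efficiency is an assumption chosen so that 50 GFLOPs peak gives the 20 GFLOPs used in the comment, and the function names are hypothetical.

```python
def sustained_gflops(peak_gflops, efficiency):
    """GFLOPs actually achieved, as a fraction of peak."""
    return peak_gflops * efficiency


def usd_per_sustained_gflop(price_usd, peak_gflops, efficiency):
    """Dollars per GFLOP you can realistically expect to see."""
    return price_usd / sustained_gflops(peak_gflops, efficiency)


# name: (price in USD, GFLOPs, assumed fraction of that figure achieved)
systems = {
    # 8 cores at ~1 GFLOP each is already a sustained figure, so efficiency 1.0
    "CPU (8 cores)": (500, 8.0, 1.0),
    # assumed 40% of 50 GFLOPs peak, i.e. the ~20 GFLOPs quoted above
    "Clearspeed":    (5000, 50.0, 0.40),
    # 10% of a 500 GFLOP part, as quoted above
    "GPU":           (500, 500.0, 0.10),
}

gpu_cost = usd_per_sustained_gflop(*systems["GPU"])
for name, spec in systems.items():
    cost = usd_per_sustained_gflop(*spec)
    print(f"{name:14s} ${cost:6.1f}/GFLOP  ({cost / gpu_cost:4.1f}x the GPU)")
```

Changing the efficiency values is the quickest way to see how sensitive the conclusion is: the Clearspeed part would need to sustain far more than its peak rating to match the GPU on this measure.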

8. Food for thought there Joe. Very good reply. Clearspeed are heavily loss making and I note the current market cap is well below the value of the cash on the balance sheet. I suppose that says it all really. Reading historical news it seems Clearspeed have raised in the order of £80M over the years. Very little in the way of revenues to show for that investment. You wonder how it will all pan out.

9. @Jim
Well, I hope it pans out well. The technology is nice, the ideas interesting. The problem is one of market timing and competitive technologies.
Any well seasoned entrepreneur will tell you, sometimes you get lucky, other times, less so. Little of this is in their control, and where they do have control, they can’t react as fast as they might need to.

10. Clearspeed make a big play on the low wattage required to run their co-processors. Is this as big an issue as they make out?

11. @Jim
Well, if you look at the real savings (power and cooling) from that 150W difference over a year, it’s nowhere near the price difference. So while it is good to save power, you have to pay a premium to do so.
At the end of the day it comes down to scale-out economics.
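A back-of-the-envelope sketch of that power argument follows. The 150W difference comes from this comment and the $6k versus $0.6k unit prices from the post; the electricity price of $0.10 per kWh and the cooling overhead factor of 2.0 are assumptions picked purely for illustration, so substitute your own site's numbers.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(watts, usd_per_kwh, cooling_overhead=1.0):
    """Yearly cost of running `watts` continuously, including cooling overhead."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * usd_per_kwh * cooling_overhead


# From the thread: ~150W saved per accelerator, $6k vs $0.6k unit price.
watts_saved = 150
price_premium_usd = 6000 - 600

# ASSUMPTIONS (not from the post): electricity price and cooling overhead.
usd_per_kwh = 0.10
cooling_overhead = 2.0

saving = annual_power_cost(watts_saved, usd_per_kwh, cooling_overhead)
print(f"Yearly saving per unit: ${saving:.2f}")
print(f"Years to recover the price premium: {price_premium_usd / saving:.1f}")
```

Under these assumed rates the saving is a few hundred dollars a year against a premium of several thousand, which is the point being made above: the power advantage is real but does not come close to paying for itself over the useful life of the hardware.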

12. A technology primer. This is a good read from the Clearspeed website.
http://www.clearspeed.com/docs/resources/ClearSpeed_TechnologyPrimer_0611.pdf
They do state there are several alternatives to their technology. It looks like they are indeed targeting power hungry HPC systems and argue low power co-processors can result in considerable cost savings. Also mentioned is space-saving which I guess may be an issue in certain cases (city centres) but seems like a stretch.

13. Any comment on today’s news on Clearspeed’s new CSX700 chip? It seems to be a big improvement on the last one. The blog on the tomshardware website thinks it could represent a breakthrough.
Basically it says;
Clearspeed will give 96 GFlops out of 12 watts at double precision, which compares well with Nvidia’s chip, which gives 100 GFlops in double precision mode and consumes 170 watts.
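For reference, the performance-per-watt claim in this comment works out as in the short sketch below. The GFlops and wattage figures are the ones quoted in the comment, not independently verified, and the helper name is made up.

```python
def gflops_per_watt(gflops, watts):
    """Double precision GFlops delivered per watt consumed."""
    return gflops / watts


# Figures as quoted in the comment above (double precision).
csx700 = gflops_per_watt(96, 12)
nvidia = gflops_per_watt(100, 170)

print(f"CSX700: {csx700:.2f} GFlops/W")
print(f"Nvidia: {nvidia:.2f} GFlops/W")
print(f"Ratio:  {csx700 / nvidia:.1f}x in favour of the CSX700")
```

This measures efficiency per watt only; it says nothing about price per GFlop, which is the measure the rest of this thread is concerned with.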