Took a Cuda class, then installed Cuda on my laptop. Well, version 1.1. The laptop has a Cuda-capable GPU (one of the things I made sure of when I bought it). Version 2.0 is in beta, and I think I will move to that.
A few minor glitches getting it going.
That said, I have some simple impressions. Cuda is going to have significant market momentum by mid-year. Unlike most of the other accelerator platforms, the SDK is free and easy to use. There are no deployment costs beyond a new Cuda-enabled board.
Previously I indicated that I thought the accelerator winners would be GPUs and something else similar to them. At this point I am convinced that GPUs will be the winner, simply on the strength of market interest.
Cuda-capable GPUs are in 50M+ machines. If AMD wants to play in this market, their GPUs need to be Cuda-capable. CTM and other low-level approaches are over for all but niche apps. Cell, apart from the game consoles, doesn’t seem to be getting out of IBM/Sony/Toshiba that well. We need PCI cards with Cell chips and memory, just like GPU cards, at about the same price. Currently they are 7-10x the price.
FPGAs … well, you can get the Mitrionics virtual CPU. Deployment costs may be problematic. Put another way … we, as small software developers, can afford the Tesla and SDK. We cannot afford the virtual FPGA and tools. We expect our customers not to balk at buying a Tesla. Not so sure about the FPGAs.
Speaking with a customer about accelerators recently, we went over the list of options. This customer would only consider COTS-based gear. They are not alone.
One of my team will be focusing his efforts on Cuda, as will my partner and I. We think we can get good results very quickly.
Kind of sad that we couldn’t get VC money for accelerators 3+ years ago. We were dead-on correct with our assessment of the need. We were off on the assumed level of interest. I had estimated 5% market penetration by the 5th year. I would be surprised if we miss that target … many in HPC have come around to accepting that specialized heterogeneous processing is part of our HPC future.
Our initial bet was on low-cost, high-performance DSPs. We wanted to make sure the boards were purchasable for no more than $5k USD. Then the applications atop that had to be under $5k as well (or open source, even better). Our argument was that we could reduce pricing with economies of scale, and build the tools so that the accelerator system would be easier to use. About the only thing that has changed is that GPUs seem to be the winners, and Cuda will be the programming paradigm. Would be great if it worked on Cell too. But we can still layer our value on it. And still make it easy to use/program.
Of course, the Intel Larrabee is looming, and we are interested in that as well, but it is not here yet (though if Intel wants to send us a board, please, by all means, contact me).
Lower barriers. This is what Cuda and nVidia have done. Very good work. Hopefully Cell and Larrabee will catch up, but one programming interface … please.