# CUDA and acceleration

I took a CUDA class and installed CUDA on my laptop (version 1.1). The laptop has a CUDA-capable GPU, which was one of the things I made sure of when I bought it. CUDA 2.0 is in beta, and I think I will use that.
I hit a few minor glitches getting it going.

That said, here are some simple impressions. CUDA is going to have significant market momentum by mid-year. Unlike most of the other accelerator platforms, CUDA has a free SDK that is easy to use, and there are no deployment costs beyond a new CUDA-enabled board.
Previously I said I thought the accelerator winners would be GPUs and something similar to them. At this point I am convinced that GPUs will be the winner, simply on the strength of market interest.
CUDA-capable GPUs are in more than 50M machines. If AMD wants to play in this market, its GPUs need to be CUDA-capable. CTM and other low-level approaches are finished for all but niche apps. Cell, apart from the game consoles, doesn't seem to be getting out of IBM/Sony/Toshiba very well. We need PCI cards with these chips and memory, just like GPU cards, at about the same price. Currently they cost 7-10x as much.
As for FPGAs, well, you can get the Mitrionics virtual CPU, but deployment costs may be problematic. Put another way: we, as small software developers, can afford a Tesla and the SDK. We cannot afford the virtual FPGA processor and tools. We expect our customers not to balk at buying a Tesla. We are not so sure about FPGAs.
Speaking with a customer about accelerators recently, we went over the list of options. This customer would only consider COTS-based gear. They are not alone.
One member of my team will be focusing his efforts on CUDA, as will my partner and I. We think we can get good results very quickly.
It is kind of sad that we couldn't get VC money for accelerators 3+ years ago. We were dead on in our assessment of the need; we were off on the level of interest we assumed. I had estimated 5% market penetration by the 5th year, and I would be surprised if we miss that target: many in HPC have come around to accepting that specialized heterogeneous processing is part of our HPC future.
Our initial bet was on low-cost, high-performance DSPs. We wanted to make sure the boards could be purchased for no more than $5k USD, and the applications on top of them had to be under $5k as well (or open source, even better). Our argument was that we could reduce pricing through economies of scale, and build tools that made the accelerator system easier to use. About the only thing that has changed is that GPUs seem to be the winners, and CUDA will be the programming paradigm. It would be great if it worked on Cell too. But we can still layer our value on top of it, and still make it easy to use and program.
Of course, Intel's Larrabee is looming, and we are interested in that as well, but it is not here yet (though if Intel wants to send us a board, please, by all means, contact me).
Lower barriers: this is what CUDA and NVIDIA have done. Very good work. Hopefully Cell and Larrabee will catch up, but please, one programming interface.

### 3 thoughts on “CUDA and acceleration”

1. Perhaps you like CUDA because it feels like C? CUDA is a low-level device model for NVIDIA GPUs, but I expect that few programmers will ever deal directly with the API, and that high-level frameworks will use it to generate code for NVIDIA targets.
Why do you take a negative stance on FPGAs, though? What would board costs and the tool situation have to look like to make them more attractive?
If better tools will level the accelerator playing field, so that the choice turns on performance economics instead of learning curves, is it still worth building accelerated computing solutions now, or should we wait until the tools mature?
If investment in a development environment is a limiting factor, does a vendor-hosted development environment make an accelerator more attractive than a locally installed board? How about an open-source toolchain?
FPGAs have a number of architectural and economic benefits (power efficiency, I/O throughput, and fine-grained interconnect) that will make them increasingly attractive as the acceleration market grows. The FPGA toolset is actually more mature than the GPU toolset in terms of high-level development frameworks: it's probably easier to get from MATLAB to an FPGA than from MATLAB to a GPU right now.
The only real high-level framework compatible with NVIDIA GPUs is still just OpenGL. There is also no toolset that allows a design to target, or even be partitioned across, multiple types of accelerators. This is the sort of framework we are building on top of Excel.

2. CUDA is very much like OpenMP: a data-parallel version of OpenMP.
The FPGA issues are cost and development tools. Cost can't be worked around, because FPGA pricing is not set up for HPC: it is still organized around value rather than volume, and HPC is moving rapidly toward volume.
Also, by the end of the year about 100 million (1 x 10^8) CUDA-enabled GPUs will have shipped, likely more. I just don't see how FPGAs will be able to overcome a lead like this, or even reach 1/10th or 1/100th of it.
The architectural and other benefits won't matter much against this sort of head start. GPUs are winning designs because developing on them is not very hard, and your code can run right away on your laptop, your desktop, and your cluster (with CUDA GPUs). FPGAs have no concept of a portable bitfile, so I can't take something architected for an Alphadata card and move it to a Nallatech card or another vendor's card.
Don't get me wrong: GPUs aren't good for everything, but they are pretty good for a range of apps that are interesting to ISVs and end users.
The tools you are working on should be independent of the underlying accelerators, and we are looking forward to working with them. But among other coders, I am seeing (and have been for the past month or two) a massive surge in GPU interest. I am not convinced it is a fad.

3. The value economics for FPGAs form a positive feedback loop with unit volume: increased unit volume allows decreased per-unit cost, which increases per-unit value, which in turn increases ROI in more markets, leading to increased volume. This volume-to-value feedback is weaker in the GPU market because new applications do not appreciably change the market's size.
This effect is evidenced by FPGA expansion into new markets outpacing the industry's overall growth in existing markets, as increased value lets FPGAs cut into ASIC and embedded microcontroller (uC) markets. This means it is particularly difficult to predict how volume economics will play out for FPGAs over a few years: a few big market wins and the picture changes totally.
The market for 1-trillion-gate FPGA installations doesn't quite exist yet…
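The second comment's claim that CUDA is essentially a data-parallel OpenMP can be made concrete with a small sketch. It uses SAXPY (y = a*x + y), an example chosen here rather than taken from the post. The same per-element body is driven two ways. The first is an OpenMP `parallel for` loop. The second is the CUDA style, in which each (block, thread) pair computes its own global index and skips any index past the end of the arrays. To make it runnable without a GPU, the CUDA-style launch is emulated serially in plain C. The function names and the block size are hypothetical. The actual CUDA kernel and launch appear in a comment for reference and would need `nvcc`.

```c
#include <stddef.h>

/*
 * For reference only (needs nvcc and a CUDA-capable GPU), the equivalent
 * CUDA kernel and launch would look like:
 *
 *   __global__ void saxpy(int n, float a, const float *x, float *y) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) y[i] = a * x[i] + y[i];
 *   }
 *   saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);  // d_x, d_y: device pointers
 */

/* The per-element work: identical in both programming models. */
static void saxpy_element(size_t i, float a, const float *x, float *y) {
    y[i] = a * x[i] + y[i];
}

/* OpenMP: the loop's index range is divided among CPU threads.
   Compiled without -fopenmp, the pragma is ignored and the loop runs serially. */
void saxpy_openmp(size_t n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        saxpy_element((size_t)i, a, x, y);
}

/* CUDA style, emulated serially on the CPU (hypothetical helper): enough
   blocks of block_dim "threads" are launched to cover n; each thread derives
   its global index and does nothing if that index falls past the end. */
void saxpy_cuda_style(size_t n, float a, const float *x, float *y,
                      size_t block_dim) {
    size_t grid_dim = (n + block_dim - 1) / block_dim;  /* ceiling division */
    for (size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            size_t i = block_idx * block_dim + thread_idx;
            if (i < n)  /* bounds guard, as in the real kernel */
                saxpy_element(i, a, x, y);
        }
    }
}
```

In both versions the programmer writes only the body for one element. What differs is how indices are handed out: OpenMP splits a loop range across threads, while CUDA launches a grid of threads that each compute their own index. On a real GPU those threads run in parallel rather than in the nested loops used in this emulation.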