# The tipping point for APUs

This news item on InsideHPC made me smile.
In short, the HPC application vendors do see the value in decreasing the cost of hardware for their HPC users. It keeps more money available for end users to purchase licenses, even in the face of declining budgets. There are other problems, such as the software license cost now being substantially higher than the cost of the hardware to run the HPC codes on, but that is another problem.
So if you have a 128 core cluster that is about as fast on your code as a muscular desktop with 3-4 GPU cards, which is more expensive to procure and run over time?
I am not talking about “leadership class” HPC, where you have 10000 cores available to your jobs. I am talking about the emerging everyday HPC. This is the computationally intensive analyses being done on the aforementioned muscular desktops and smaller deskside clusters.

This doesn’t mean that leadership class systems are going away. But as we have been predicting for quite some time, and as our accelerator business plan had hypothesized (for next year and the year after, at that!), accelerator processing units (APUs) appear to be taking off with a vengeance.
What strikes me here is that once the ISVs determine that adding a platform makes sense (given all the investment and long term support costs this engenders), they themselves view it as a way to grow their market. Basically, for a port to occur, they have to project real increases in profit after subtracting off all the costs. This is also why, if a non-volume vendor stops paying the support/porting costs, the ISV tends to quickly drop that vendor’s platform.
We figured that, unlike the vast majority of social media type businesses out there (which VCs have been largely blowing their LPs’ money on), there was real hockey stick growth and opportunity for accelerators in HPC. It’s gratifying to see it evolve pretty much exactly as we had thought, and as we wrote in our PowerPoints several years ago.
Imagine this. You take a real $10B USD/year market, and say, I dunno, 5% goes APU. This is $500M USD/year. Despite the dollar’s recent declines, this is still a real market.
What APUs do is decrease the necessary size of expenditure for end users. Why buy a cluster if you don’t really need it? Running a single machine costs less, and is less complex, than running a cluster.
This is, curiously, a similar message to what ScaleMP offers for small/mid-sized units. They will aggregate smaller units into a simpler to manage larger unit. This lowers your management complexity.
This reduction in management complexity is driving other business ventures as well. Some are, well, not likely to survive long. The market for cluster management systems is between Rocks, Perceus, and other things that are free.
APUs offer a reduction in scaled hardware management complexity.
APUs offer a reduction in power consumed per unit time.
APUs offer a reduction in floor space.
APUs offer a reduction in cost per result obtained.
Put another way, it is a no-brainer that they would eventually take off.
And ISVs have noticed. The first we heard about serious ISV interest was 3 years ago in some discussions we had with a few ISVs on this. APUs offer the ability to expand their installed base. Not everyone needs a large cluster to run their code quickly. This lets the ISV increase their installed base by serving more customers, as the barrier to start using the expensive code has changed from an expensive machine to an inexpensive machine.
The smart ISVs are going to combine this with a pay per use model (aka “micropayments”). This opens up the wider adoption of their software, which increases their installed base. Some will argue that tokens provide this. I disagree. I think most users do as well. Ask your typical HPC user about their proprietary software, and specifically, ask them if the licensing has ever given them grief. Odds are that in most cases you will hear a few horror stories, and learn of the design decisions that users have made to minimize the pain that the licensing schemes in use generally cause.
This simultaneously allows their end users to buy time on demand from providers like Tsunamic Technologies, who will host the applications, and possibly help process usage statements, charging for application usage.
That is, this is, IMO, not simply another platform for the ISVs. This is part of a long term strategy on their part to increase their market size. The leading ISVs are going to be driving their applications there. The open source apps are already going. We have heard from a few folks starting to work on a number of apps.
Basically we have reached the point where ISVs and OSS app providers have seen the value of moving to an APU based platform going forward. This is a strategic move. Not one likely to be reversed.
In the APU race, GPUs currently have a long lead, and in the GPU space, NVidia has effectively won. I am not hearing of much in the way of GPU ports to ATI … I am sure there are some, but NVidia is pretty much dominating this space with a usable SDK available for free, partners delivering systems (we do this with our tightly coupled storage and processing units), and a growing list of ported applications and wins.
In the APU space there are also Cell units, and FPGAs. I know some of my fellow bloggers are firmly embedded in the FPGA space, as are some of our partners. I don’t expect this space to ever take off. I thought it might at some point if the tools became affordable. This has never happened. It doesn’t look like it ever will.
Cell, while a great technology, and one we resell on some of our desktops, does not look like it is a growth platform. The accelerators cost about 3x what GPUs do, and are somewhat harder to program. The upside is that it is a completely separate machine. The downside is that, unlike GPUs, the Cell as an APU trails by 2 or more orders of magnitude in shipped/deployed systems.
If you had asked me two years ago, I would have guessed Cell and GPU would be fighting it out for the lead, with similar cost hardware and development environments.
Specialized co-processors such as ClearSpeed never really had a chance in the general market. There are a few others. The issue that will kill any accelerator before it starts to get traction is economies of scale, cost to adopt, cost to deploy.
Right now, Nehalem cores can do 4 DP FP operations per clock cycle. At 3 GHz, this is 12 GFLOP per core in DP. 4 cores per chip puts this at 48 GFLOP. Two chips per server puts this at 96 GFLOP. Normal application code will use 1/10 to 1/4 of this capability, unless your code spends all of its time in hand tuned assembly language routines that make effective use of the resources.
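The peak numbers above are just a chain of multiplications, and it can help to see them spelled out. What follows is a minimal Python sketch of that arithmetic, not a benchmark. The function name `peak_gflops` and its parameter names are made up for illustration, and the 1/10 to 1/4 range is the rule of thumb from the paragraph above, not a measurement.

```python
def peak_gflops(flops_per_cycle, clock_ghz, cores_per_chip, chips_per_server):
    """Return theoretical peak DP GFLOP per core, per chip, and per server."""
    per_core = flops_per_cycle * clock_ghz        # ops/cycle * 10^9 cycles/s
    per_chip = per_core * cores_per_chip
    per_server = per_chip * chips_per_server
    return per_core, per_chip, per_server


# Figures quoted in the text: 4 DP ops/cycle, 3 GHz, 4 cores/chip, 2 chips/server.
per_core, per_chip, per_server = peak_gflops(4, 3.0, 4, 2)

# Rule-of-thumb range for what ordinary application code actually sustains
# (assumption taken from the text: 1/10 to 1/4 of peak).
sustained_low = per_server / 10
sustained_high = per_server / 4

print(f"peak: {per_core:.0f} / {per_chip:.0f} / {per_server:.0f} GFLOP "
      f"(core / chip / server)")
print(f"typical sustained: {sustained_low:.1f} to {sustained_high:.1f} GFLOP per server")
```

With the quoted figures this works out to 12, 48, and 96 GFLOP peak, and somewhere between roughly 10 and 24 GFLOP for typical code on the two-socket server.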
For an accelerator to be meaningful, it has to deliver ~10x the performance of the full platform (not just a single core), without correspondingly increasing its price by 10x. You will get 10x for free by waiting 5.5 years or so, just from the technological trajectory of Moore’s law.
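The "wait for 10x" figure depends on which doubling period you assume for Moore’s law. Here is a small Python sketch of that calculation. The helper name `years_to_speedup` is hypothetical, and the three doubling periods are common readings of Moore’s law chosen for illustration, not numbers from any vendor roadmap.

```python
import math


def years_to_speedup(factor, doubling_months):
    """Years needed to gain `factor` if performance doubles every `doubling_months`."""
    doublings_needed = math.log2(factor)          # e.g. log2(10) is about 3.32
    return doublings_needed * doubling_months / 12.0


# Assumed doubling periods: 18, 20, and 24 months.
for months in (18, 20, 24):
    years = years_to_speedup(10, months)
    print(f"doubling every {months} months: 10x in about {years:.1f} years")
```

This gives roughly 5.0, 5.5, and 6.6 years respectively, so the 5.5 year estimate corresponds to a doubling period of about 20 months.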
Current NVidia product delivers about 5x in single precision versus the underlying substrate platform they are plugged into. But after a little work, most folks get 10x fairly easily on the platforms. With some rearrangement of the code, you may be able to get to 100x or more. GPU-HMMer gets to 112x in some cases.
So why would anyone go back?
This is not lost on the ISVs. They see that adding computing power just got real cheap. It’s not lost on tool vendors like PGI, who are hedging their bets with their compiler platforms. Technologically, it is an excellent hedge against underlying APU changes, since it decouples the programming of the accelerator from vendor specific tools. This enables them to keep source code compatibility, even if the APU changes. And they tout that. And with some clever compilation and linking, they can even do unified binaries that will run correctly on APU-ful and APU-less systems, without recompilation.
APUs are very much in HPC’s future.
As Clayton Christensen suggested, APUs are going to destroy something, and create something even better in its place. John West has a story today about disruption in the HPC market. APUs are definitely providing this disruption.
We have reached the tipping point.

### 2 thoughts on “The tipping point for APUs”

1. I absolutely agree with everything you’ve said!! It’s amazing how much we think alike. 🙂
I’m actually seeing the beginning of the hockey stick around GPUs. I think 2010 is the year when we “change slopes” (from the lower slope to the higher slope) and 2011 will be the year that we are firmly on the sharper sloped part of the hockey stick. I’m already seeing this.
One of the cool things, I think, is what will happen during and after 2010-2011. What happens after we get on the sharper slope of the hockey stick? I’ve been spending a great deal of time lately thinking about this from a user’s perspective, an ISV’s perspective, and a vendor’s perspective.
We can start with the assumption that users are buying GPU powered systems. These consist of a small number of nodes, most likely desktops for the vast majority of users, running applications (either home-grown or ISV’s). They can now solve problems faster than ever – I mean really fast. As you point out, they don’t need large clusters to solve their problems because they can solve them so much faster. So they save money. In your comments you point out that they can either pocket the money or spend it on the ISV software (I think they should spend it on more/better storage but that’s another discussion). This means in the 2010-2011 time frame HPC users will be buying fewer traditional nodes but they will have GPUs in them.
But with these extremely fast small systems they can’t solve larger problems. This leads me to believe that after 2011 we are going to get back on the curve of people buying more nodes – but this time they will have large numbers of GPUs in them.
So from a vendor perspective we are likely to see a flattening in the number of server sales (i.e. nodes) because people are just buying them primarily to be hosts for GPUs. But the overall revenue for HPC will continue to grow because people are spending money on GPUs and/or ISV software. However, as I mentioned, they will need to solve bigger problems so they eventually will have to buy more nodes. I think we will see this starting 2011 but probably 2012 and after.
So, here’s my big finish:
– Users are going to start buying a smaller number of nodes but with lots of GPUs in them (most likely in the 2010-2011 time frame)
– If ISVs are smart they will invest now and take advantage of the extra money users have because they bought fewer nodes (great observation Joe. I think many people miss this). If the ISVs are _really_ smart they will start now so the application will be done in later 2010 when the market is on the sharper slope of the hockey stick.
– In the 2010-2011 time frame vendors will see a “blip” in the number of servers they sell but could still see good revenue because the servers will come stuffed with GPUs. But after 2011 I think we will get back on the server count growth curve but they will probably come with GPUs in them, keeping revenues on a good growth path.
The interesting part for vendors is that the smart ones will have servers that can handle lots of GPUs either internally or externally in the 2010-2011 time frame. Their revenues will continue to grow but their units will decrease. But since they are smart they recognize that after this “blip” users will start buying more nodes to solve larger problems. So their revenues will continue to grow through 2010-2011 and after.
I think you will be able to tell the weaker vendors because their revenues will either flatten or shrink during 2010-2011. The _smarter_ vendors will show good revenue growth through 2010-2011.
It’s a great time!!! I think HPC is in for a bit of a sea change in many respects. Great blog Joe – I really love it when you talk about GPUs since we think so much alike 🙂
Jeff

2. Obviously I’ve been doing the FPGA thing for a while, but I’m still honest about the market conditions: I sold off all of my XLNX and ALTR shares to buy NVDA a while ago.
I still think that the transition to accelerators benefits FPGAs in the long run. Let all the overhead associated with partitioning applications and off-loading work to a co-processor be handled now by the GPGPU rush. Then we’ll create FPGA emulations of GPGPUs with 10x I/O bandwidth.
