SC06 wrap up: thoughts on what I saw and heard

Well, SC06 is now history. Reno is the next venue. Maybe we will have a bit of a booth then.
So what happened, what was extraordinary, what was ordinary?

This is kind of hard. Last year there was so much cool stuff; this year, well, somewhat less. The exhibit did not seem as big this year, or as lively. It looked like lots of vendors talking to each other. I had a distinct sense that this was a high tech equivalent of a red light district …
Ok… let’s be more focused.
The good: Vendors were out with somewhat more focused booths … maybe the floor space issue is why it felt smaller. Technology is maturing, and end users are getting more discriminating in their approach. Software vendors are running into scaling problems (trust me, this is a good thing; I’ll tell you why in a minute). Accelerators were (sort of kind of) there. Sort of. Kind of.
The bad: somewhat darker undercurrents. Rumors about the health of multiple vendors are running rampant.
The ugly: Some people still don’t get that most users don’t buy a technology, they buy a product. Some vendors think you can paint it purple, blue, or green, and charge 4x as much. Most of the iSCSI vendors have been overhyping the technology, and when you ask the users what performance they are seeing from their nice new iSCSI array, one often hears the word “disappointing” in there somewhere.
We can’t fix everything, and there are snake oil vendors out there. Your job, should you choose to accept it, is to hack your way through the marketing until you find the nuggets of reality. This presumes that such nuggets exist. Call me an optimist, I would hope they do. If not, the company has a baser problem on its hands.
Ok. Let’s talk about accelerators and APUs. The accelerator world is neatly dividing into 3 camps. Well, ok, the research world is dividing into 2 camps, and there is this third camp with a working accelerator. Most researchers don’t want to work on a working accelerator; it takes some of the fun out of building one.
We have the research world: GPUs and FPGAs, and the commercial world: ClearSpeed (and related, though not many were visible). One of the two in the research world, FPGAs, is really not geared up for application enablement. The tools are out there in cost (hardware and software). They won’t likely attract many new customers. The barriers to entry are high all the way around. It makes for a great research tool, and after you get experience, possibly a reasonably good product, though your cost per chip is going to be far better in a non-FPGA mode. You don’t want to know what the cost for the development suite is. Really. You don’t want to know. It makes the $16k price of the Cluster OpenMP from Intel look downright cheap.
The positive thing about FPGAs is that, in the end, you get to build exactly the computing circuit you want (provided you have enough gates). This cannot be overlooked, as a well designed circuit can outperform quite a few machines, and do so at very low power.
This is tremendously compelling. But with the cost of tools, boards, and chips where they are, it looks like the nascent market may never grow up.
Another down side to FPGAs is that double precision math takes up quite a few gates if you want little things, like IEEE 754 compliance. Well, most folks don’t like NaN when they see her, so it would surprise me if they really wanted to have something implemented that they did not really want. Yet another down side is that the compilers haven’t been very good. I know, it’s a Simple Matter Of Programming. Just like VLIW/EPIC. That said, we are going to play with a few to see what, if anything, has changed with this.
Then there are the GPUs. GPUs do single precision very well. They do not, as of this writing, do double precision very well. I had some interesting GPU discussions, and I cannot talk about them.
Cell kind of straddles the GPU and the next category. More in a minute.
Commercial processors. ClearSpeed was there. Several years ago, I hadn’t thought they would survive. I was wrong. They appear to be thriving. In short, it appears that the chip can do lots of interesting things. About 25 GF/CPU double precision is what they are sustaining. This is far better than what Opteron and Woodcrest are sustaining. Not an order of magnitude, but close. ClearSpeed does ints quite well, too. We need to be looking more closely at this processor.
Then there is Cell-BE. Cell is the (over)hyped ultra-processing system that fuses a PPC core with 8 SPUs. As an APU it is quite interesting. People (including us) have started looking at how to adapt their algorithms to single precision (which Cell does well). This should also help GPUs, and likely also help ClearSpeed. I read some presentations Dongarra had put together on this, where he uses double precision in the correction step of an LU decomposition and in the update step. That is, single precision for the O(n**3) work and double precision for the O(n**2) work.
For those not blessed with a memory of Fortran, n**2 is read “n to the power 2” or n^2.
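To make the mixed precision idea a bit more concrete, here is a minimal sketch in Python. This is not Dongarra’s code. It is the textbook iterative refinement scheme as I understand it: factor and solve in single precision, then compute the residual and update the solution in double precision. Single precision is simulated by rounding through the standard `struct` module, and the names `f32`, `lu_factor_f32`, `lu_solve_f32`, and `solve_mixed`, along with the iteration count and the test matrix, are made up for illustration.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(a):
    """LU with partial pivoting, every result rounded to single. O(n**3) work."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k and swap it into row k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm

def lu_solve_f32(lu, perm, b):
    """Forward and back substitution in simulated single. O(n**2) work."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                      # L y = P b (unit diagonal L)
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):            # U x = y
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y

def solve_mixed(a, b, iters=5):
    """Single precision LU, refined with double precision residuals."""
    n = len(a)
    lu, perm = lu_factor_f32(a)             # the expensive O(n**3) part, single
    x = lu_solve_f32(lu, perm, b)
    for _ in range(iters):
        # Residual in double precision: O(n**2).
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_f32(lu, perm, r)       # correction from the single LU
        x = [x[i] + d[i] for i in range(n)] # update in double precision
    return x

if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    x_true = [0.1, 0.2, 0.3]
    b = [sum(a[i][j] * x_true[j] for j in range(3)) for i in range(3)]
    lu, perm = lu_factor_f32(a)
    x_single = lu_solve_f32(lu, perm, b)
    x_mixed = solve_mixed(a, b)
    print("single only error:", max(abs(u - v) for u, v in zip(x_single, x_true)))
    print("mixed error:      ", max(abs(u - v) for u, v in zip(x_mixed, x_true)))
```

The point of the arrangement is that the expensive factorization runs at the fast precision, while the cheap residual and update steps recover the accuracy. This only works when the matrix is reasonably well conditioned.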
Cell cranks out about 200 GF at peak theoretical, and about 60-80 GF in real world apps from what he has shown. Cell does integers too.
The most impressive APUs at the show really were the ClearSpeed and the Cell. The Cell was somewhat hidden, and the ClearSpeed was out there.
Ok, I am biased. What about real applications on the ClearSpeed or the Cell? Well, some are here now, and more are coming. This is a question about utility that Deepak, over at his ever-enjoyable blog, had asked. I can’t talk about most of the apps. Suffice it to say that, barring major problems, we should see some … soon.
Vendor health: The cluster market is a margin market. If you can make enough in gross margin to pay your bills, you are in good shape. Not everyone is, at least that’s what we are hearing. There are some vendors who consistently price their machines below the parts cost of others. We have spoken to most of their suppliers (as most cluster vendors share similar suppliers). Unless they are using suboptimal equipment (quite possible), or have struck some sweetheart deals, they are paying about the same as we are. And everyone else. So how do they come in under cost? Maybe they take the eBay model and charge more for shipping. Hide margin there. If they do that long enough they will be an ex-company, as this tends to burn through cash awfully fast. These are the sorts of fleeting conversations we had heard.
Economies of scale don’t work well when your scale isn’t large enough to result in significant deltas. Even if you have parts that cost 1/2 of the base price of your unit, and you can shave 5% off the price of those parts, the difference to you is 2.5% of the cost of the machine. Ok, so pass this on to your customer, and this is a 2.5% advantage to you in price. So now you also make 2.5% less in your margin. For a $4000 machine, 2.5% is $100. Unless you can drive your own efficiencies up so you can do far more work for less expenditure, you have, on net, lost by doing this.
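The arithmetic above is simple enough to put in a few lines of Python. This is just a sketch of the numbers in the paragraph; the function name `margin_hit` and its parameters are made up for illustration.

```python
def margin_hit(unit_price, parts_fraction, parts_discount):
    """Return (fraction of unit price saved, dollars saved per unit).

    parts_fraction: share of the unit price that is the discounted parts.
    parts_discount: fractional price cut obtained on those parts.
    """
    saved_fraction = parts_fraction * parts_discount
    return saved_fraction, unit_price * saved_fraction

# Numbers from the text: parts are 1/2 of a $4000 machine, 5% cheaper.
fraction, dollars = margin_hit(4000.0, 0.5, 0.05)
print(f"{fraction:.1%} of the unit price, ${dollars:.2f} per machine")
```

Pass that saving straight through to the customer and the same amount comes out of your margin on every machine.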
The customer happily gobbles this up; you trained them to respond to this and demand it. Customers love value as long as it doesn’t cost them anything more. Call this the Walmartization of a market. Now enough groups get together and do this, and you have a death spiral where each vendor tries to outdo and underbid the others.
The financially weaker vendors are going to get culled. Business happens. The survivors may wish they had been culled. It is hard to run a business on very low margin. You have all these extra things you like to pay for: I dunno, people … phones … next generation product research and development … internet access … buildings … travel to customers … marketing materials/web sites …
Customers will also be damaged in part by this, as such markets tend to drive the innovators out. Why spend time and effort working on a wonderful product that will add huge value to a customer base if they are not willing to pay for it? This is the dilemma that cluster vendors face. Heaven help them if they do something silly and add extra cost “features” of dubious value to their units, or, just as bad, fail to adequately explain how the customer gets more value out of their units as compared to the equally priced Foo-Bar Industries unit. Customers will be left with effectively fewer choices, as the innovators pursue the profitable markets.
Sort of like when telecommunication deregulation hit, and the phone companies leapt at the chance to work in a freer, more profitable market.
This is the market that Appro, LNXI, Penguin, … pick your favorite cluster company, find themselves in.
Scaling problems: Why this is good. Your program is great, you sell a million copies, and people use it. And they tell you that, ya know, if it were faster when they run larger things, they would buy more (remember that value thing? This is sort of an anti-Walmartization effect).
So you invest the effort to parallelize it. Wow… scaling is great, you pound on performance. And your customers keep increasing the size of the models they run.
Eventually you do some tests and notice something: your code doesn’t scale that well any more. It should, it was designed to, but that pesky Amdahl’s law serialization just gets in the way.
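For anyone who wants to see how hard Amdahl’s law bites, here is a minimal sketch of the usual formula, speedup = 1 / (s + (1 - s) / N), where s is the serial fraction of the run time and N is the number of processors. The 5% serial fraction is a made-up number for illustration, as is the function name `amdahl_speedup`.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Ideal speedup on n_procs when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# Hypothetical code with 5% serial work: the speedup flattens out quickly.
for n in (1, 4, 16, 64, 256, 1024):
    print(f"{n:5d} processors: {amdahl_speedup(0.05, n):6.2f}x")
```

With 5% serial work the speedup can never exceed 1/0.05 = 20x, no matter how many processors you throw at it. As customers grow their models and their processor counts, that small serial piece becomes the whole story.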
So you now start instrumenting your code, and lo and behold, you discover that one of the ways you had been doing things, one that worked well for the preceding 20 years, simply does not work today. It is the bottleneck.
You embark on a program to fix it, if you value your customers.
This is why programs not scaling is a good thing. Customers don’t buy computers, they buy platforms upon which to develop and run applications that they need for their jobs. Your customers are going to keep pushing the boundaries of what they can do with their code. You need to be a step ahead. And that means finding the use cases that don’t work well. And fixing them.
Minimize the maximum pain.