SC06 Day-1 part 2

Why 2 parts? It seems that hotel networks may somehow limit the amount you can upload to a web site when posting blogs from the room. Not sure why, but uploading fails while here, though it works great elsewhere.

The point about making more computing power open and accessible to wider groups of people is *critical*. The message about driving computing to the desktop is clear, even though there was, how shall I put this, spirited discussion about whether a “cluster under the desktop” makes sense. A $10B market is going to do the same thing it did in growing from $1B upwards: it is going to seek out the lowest cost producer of high performance cycles. In doing so, it will likely creatively destroy one market and build an entirely new one. Or if not destroy it, then really alter it. If I can get 10x application performance out of a GPU, then why on earth would I build a cluster for my user when I can deploy 100 GPU units and get much better performance? They can run the apps under Linux or Windows (the OS doesn’t matter for the most part). I am not talking about 10x on Smith-Waterman or other well-accelerated codes. I am specifically talking about 10x better wall clock time on the user’s own code.
Face it… right now, almost everything on the exhibition floor is a minor variation of the cluster to the left… With some exceptions. First, the accelerator vendors. None of them are solutions vendors (yet), though some are trying to get there (ClearSpeed). Some are product merchants (DRC, …). Some are FPGA vendors, some are GPU folks. The interesting products out of AMD and Intel are snapped up by the cluster vendors, and the boxes are largely the same apart from the fascia.
Clusters are dominating, as was seen in the BoF session I attended. One of the more interesting points was that the quad-core Clovertown chip has significant contention issues with memory-bound codes. I expected this, and it is interesting to see it confirmed. The ISV’s comment was that they need to re-explore how to program multi-core systems.
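To make the contention point concrete, here is a minimal back-of-the-envelope sketch. It assumes a purely memory-bound code where every core pulls on one shared memory path; the function name and the bandwidth numbers are hypothetical illustrations, not measurements of Clovertown or any other chip.

```python
def memory_bound_speedup(cores, per_core_demand_gbs, shared_bandwidth_gbs):
    """Estimate speedup over one core for a purely memory-bound code.

    Assumption (hypothetical model): all cores share one memory path,
    so delivered bandwidth is capped at shared_bandwidth_gbs.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Bandwidth a single core actually gets when running alone.
    single = min(per_core_demand_gbs, shared_bandwidth_gbs)
    # Bandwidth all cores get together, capped by the shared path.
    delivered = min(cores * per_core_demand_gbs, shared_bandwidth_gbs)
    return delivered / single


# Hypothetical numbers: each core wants 4 GB/s, the shared path gives 8 GB/s.
for n in (1, 2, 4):
    print(n, "cores ->", memory_bound_speedup(n, 4.0, 8.0), "x")
```

In this toy model the third and fourth cores add nothing once the shared path is saturated, which is the kind of behavior that pushes people to rethink how they program multi-core systems.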
Other things of note: we have been saying for a while that data motion will be the next big challenge, as it gets exponentially harder each year with that much more data to move. Sure enough, we are hearing people echo it.
Scalable file systems: Panasas is there (see the pictures after I upload them). They have by far one of the best and most scalable systems around. They are addressing a different market than we are with our JackRabbit unit, so there is no overlap. They are one of the few companies that can build file systems running at 10 GB/s (big B as in Bytes) and faster. They look like they are getting traction. One thing mentioned in the BoF meeting was that the large model sizes are now impacting IO capability and showing up as performance bottlenecks. The ISV again pointed to MPI-IO, though when quizzed about using it on a parallel file system (needed for any performance benefit), they indicated NFS only, which negates any benefit of MPI-IO. I have been arguing for years that the best IO performance is always local IO performance. Some things need to be re-thought in these applications as the data set sizes swell. This is one of them.
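A rough sketch of why MPI-IO over NFS buys you nothing: assuming aggregate bandwidth is simply the smaller of what the writers can push and what the storage servers can absorb, a single NFS server caps everything no matter how many ranks write in parallel. The function name and all of the numbers below are hypothetical, chosen only to show the shape of the argument.

```python
def aggregate_write_bandwidth(writers, per_writer_gbs, servers, per_server_gbs):
    """Toy model: aggregate write bandwidth in GB/s.

    Assumption (hypothetical): throughput is limited by whichever side,
    the writing clients or the storage servers, saturates first.
    """
    client_side = writers * per_writer_gbs
    server_side = servers * per_server_gbs
    return min(client_side, server_side)


# Hypothetical setup: 64 MPI ranks, each able to push 0.5 GB/s.
nfs = aggregate_write_bandwidth(64, 0.5, servers=1, per_server_gbs=1.0)
parallel_fs = aggregate_write_bandwidth(64, 0.5, servers=16, per_server_gbs=1.0)
print("single NFS server:", nfs, "GB/s")
print("16 storage servers:", parallel_fs, "GB/s")
```

Under these assumptions the single-server case stays pinned at one server's bandwidth, while the multi-server case scales with the number of storage servers until the clients become the limit.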
Infiniband: At the BoF we learned that 20+% of automotive customers’ clusters had IB, while 10GbE wasn’t anywhere. This may have been an oversight, or an agglomeration of 10GbE with GbE. If this is not the case, and IB has 20% of the market, it might be too late for 10GbE to catch up to IB. IB would have too much momentum. I didn’t get this from the IB folks, but from the relatively neutral BoF meeting. That said, I met the nice folks at Voltaire today.