Whoa. This one I did not see coming. It suggests that Google is serious about performance and, possibly, about providing performance to its customers, using tools such as PeakStream for acceleration. Google into acceleration. With a huge distributed supercomputer.


I wonder if the VCs out there can take time from the next Web 3.0 picture upload site or repackaged open source group to think about this.


Overall, if you can provide high performance tools for reasonable prices (marginal cost above existing system prices), you have value. From Google’s perspective, you can take a few thousand boxes, put nice fast GPU units in there, and with PeakStream’s tools, get better performance.

On what, though?

Their classifiers? Possibly. Media transformation, as in encoding/decoding/transcoding? Likely. Compression? Yes.

Ignoring Microsoft’s marketing, high performance computing has been going mainstream, as in being needed in all manner of day to day activities, for a long time. It gets more obvious as you scale up the size of the activity. What you might easily do on a small to midsize cluster with some amount of data X, massively breaks down as you go to 10X, or 100X the data size. Moreover, if your data is on an exponential growth curve (bioinformatics/genomics/proteomics, medical images, physical/engineering modeling, chemistry simulations, …) so that X is really a function of time X(t), and is approximately

X(t) = X_0 * exp(alpha*t)

then the problem of calculation gets progressively worse with time. So do the data motion problems, which are related to bandwidth.

But the issue is that if your data sets are growing massively, exponentially fast, your computing tools need to be able to handle this.
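
To put rough numbers on the growth curve above, here is a minimal sketch in Python. The function names and the choice of alpha are mine, purely for illustration: I assume data that doubles once per year, and a one-time 10x speedup from an accelerator. It computes the doubling time implied by alpha, and how long a fixed speedup lasts before the data growth eats it.

```python
import math


def data_size(x0: float, alpha: float, t: float) -> float:
    """Data size at time t for X(t) = X_0 * exp(alpha * t)."""
    return x0 * math.exp(alpha * t)


def doubling_time(alpha: float) -> float:
    """Time for the data to double: solve exp(alpha * t) = 2 for t."""
    return math.log(2.0) / alpha


def time_to_outgrow(speedup: float, alpha: float) -> float:
    """Time until the data has grown by the same factor as a one-time speedup.

    After this long, the accelerated system takes as long per run as the
    unaccelerated one did at t = 0 (assuming run time scales linearly
    with data size, which is itself an optimistic assumption).
    """
    return math.log(speedup) / alpha


if __name__ == "__main__":
    alpha = math.log(2.0)  # hypothetical: data doubles once per unit time (say, a year)
    print("doubling time:", doubling_time(alpha))
    print("growth factor after 3 units:", data_size(1.0, alpha, 3.0))
    print("a 10x speedup is consumed after:", time_to_outgrow(10.0, alpha))
```

Under those assumptions a 10x speedup buys a bit over three doubling periods, which is why a one-time hardware fix never ends the problem.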

This is why APUs and accelerators are so important to the future of supercomputing. Given the low cost per unit, it is not surprising to me that GPUs are rapidly becoming a most favored APU. They are harder to program, but curiously enough, that is what PeakStream, RapidMind and others handle for you. If you had asked me a year ago, I would have named ClearSpeed and FPGAs as the most favored. I no longer believe this to be the case.

My question is, who will snap up RapidMind now? I would guess it will be one of the majors, IBM, Cray, or similar. nVidia and its technologies might also be grabbed. This is wild speculation of course, so take it with a few kg of NaCl.

I expect the providers of real value (accelerated applications and tools) to be consumed in a feeding frenzy over the next several months. I also expect the higher cost/harder to program accelerators to be marginalized.

What is clear is that the time to spin a new revision of a code is becoming a critical factor. Who’d a thunk it? (We did, years ago, in our business plans for this market.)

There is real value being created in them-thar HPC hills (ignoring Microsoft’s marketing message) as HPC is incorporated into many aspects of day-to-day business operations. And there is little capital to jumpstart these efforts. Maybe we can convince the government to stop giving some grants and start giving startup seed funds. No, not SBIRs, which take 6 months to hear back about, which is forever in this market. Something faster, with enough money to have 3-4 people work for a few months, build the thing, and then go to the next level or be shut down. There are companies that live off of SBIRs. This is where they get their revenue from. I am not talking about that at all. I am talking about providing sorely needed seed capital to strategically important markets that could use it. Let them work for a little while, and then show what they can do.

Nah. Won’t happen. The private equity market should be sufficient.

  1. Two major issues, either of which could effectively sink the competition:

    1) Price of FPGA and the price of the development tools, as compared to the benefit one gets by using them (typically 10x overall application performance)

    2) Ubiquity of GPUs. There is a GPU (to some degree) in every PC and server. It wouldn’t take much effort to make those higher end GPGPUs. Nor would it add much cost. And you could have them share RAM, thus removing some of the data transfer bottleneck. This would be problematic on the Intel side until they develop a credible alternative to HyperTransport, or just finally knuckle under, admit it was a not-invented-here good idea, and start to use it. I hope the latter happens.

    The business case for accelerators starts with a “we will make some algorithm faster”, and concludes with a long discussion of the relative value of making the “application” (not the algorithm) faster. That is, even if I make a section of my code take absolutely zero time, I run into Amdahl’s law again, whereby the performance of the code is governed by the non-accelerated portion. The value of that acceleration is in part due to the opportunity cost of waiting, or of selecting alternative solutions. If the accelerated algorithm is not in a critical code for which loss of performance represents a loss of revenue, or lower profits, then you have limited ability to charge more for this acceleration. Add into this the cost/time/pain of adapting algorithms to less ubiquitous platforms, especially ones your customer might not have …

    I don’t necessarily think the GPGPUs are technologically superior to the others. Just more ubiquitous. And that counts for something.

  2. Given how horrible in terms of latency and bandwidth this would be, such a thing might only make sense for problems that were indistinguishable from Monte Carlo calculations (e.g. they are sampling a space). Add in the “high speed” of Javascript, and it might be better just to avoid this particular route.

    If you take this model and use a client written in a language that compiles to a fast binary, you can get good performance. This is what Folding@home and others do.

    That latency and lack of bandwidth, or even of a reliable connection, render pure web programmatic methods effectively useless for non-Monte Carlo applications.
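
To make the Amdahl's law point in the first comment concrete, here is a small Python sketch. The function name and the example fractions are my own illustrative choices, not measurements from any real code. It computes the overall application speedup when only a fraction of the run time is accelerated, including the limiting case where the accelerated section takes zero time.

```python
import math


def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall application speedup under Amdahl's law.

    accelerated_fraction: share of the original run time spent in the
        section being accelerated (between 0 and 1).
    kernel_speedup: how much faster that section runs on the accelerator;
        math.inf models a section that takes zero time.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Remaining run time, as a fraction of the original run time.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    if remaining == 0.0:
        return math.inf
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical application: 90% of the time is in the accelerated kernel.
    print(amdahl_speedup(0.9, 10.0))      # kernel runs 10x faster
    print(amdahl_speedup(0.9, math.inf))  # kernel takes zero time
    # Hypothetical application: only half the time is in the kernel.
    print(amdahl_speedup(0.5, math.inf))
```

With 90% of the run time in the kernel, even an infinitely fast accelerator tops out at about 10x for the application. With only half the time in the kernel, the ceiling is 2x, and that ceiling is what the customer is actually paying for.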

