Cloudy issues

By joe

April 25, 2009 - 3 minutes read - 600 words

I need to get this out first and foremost. I do believe that cloud computing, or something similar, is inevitable. It is coming. I am also a realist. I know perfectly well that there are some fairly significant impediments to it. The impediments are a mixture of technological deployment and business models. It’s not impossible to do this given sufficient money. But some of the dependencies are simply too pricey to enable rapid cloud adoption, and I don’t see this changing in the near term (next 3 years).

OK … this is the short version of things. I can go into it in much greater depth, and I may. But not now.

HPCwire reports on some work going on to use the cloud. There are some very important messages in there for potential cloud users, something the hype has largely covered up (and something we worry about all the time). Data motion. Or more precisely, the time and monetary cost of data motion.

I have been saying for the better part of a decade that data motion will be the major problem going forward. It is easy to move data quickly on a campus. It is hard to move data quickly between campuses, at a reasonable cost or a reasonable rate.

We don’t see enough of these cost-benefit analyses when people talk about cloud computing. Sure the remote resources are there and usable. But if you spend so much time or cost to move your data … is the low cost of the computing cycle still worth it?

Yes, precisely. The time to complete the calculation with the data is

T(total) = T(ingress) + T(compute) + T(egress)

T(compute) includes local data motion from local storage to RAM and back, as well as network data motion time. And computation, of course. If T(ingress) + T(egress) ≫ T(compute), then most of your cost is likely to be in the data motion.

This time cost is easy to set bounds on. Take the data volume and divide it by the best-case bandwidth. This gives you the lower bound on the ingress or egress time. Basically, if you are moving gigabytes and terabytes, you are going to be bound by the site-to-site bandwidth. And this costs money. 1 MB/s costs about a thousand dollars/month. 1 GB = 1000 s at 1 MB/s. 1 TB = 1,000,000 s (roughly 11.6 days) at 1 MB/s. A T3 still runs ~$5000/month and gives you 45 Mb/s, or about 6 MB/s.

So external clouds are being marketed at small as well as large companies. These only make sense if you can move the data once. That is, pay the data motion cost, store it at Tsunamic’s site, or Amazon, or CRL. Then do all your operations there as well. But this model … move the data there and let it rest there … isn’t what is being pushed.

Cloud computing can work. It is effectively ASP v2.0 (if you don’t know what ASP v1.0 was, don’t worry, you aren’t missing much). It’s mostly there. The one thing that is missing to make it really work, to uncork the bottle and really let the djinni out … is low-cost bandwidth. Which, curiously enough, would likely help create huge amounts of value, as you can have specialized clouds, and create markets for these specialized clouds. But you need that cheap internet.

Which also shows why some things are really not meant for the cloud. Data motion is the rate-limiting factor. It always will be. Solve the first-order problem, and the second-order problem becomes the problem. We are in that second-order problem set now.
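
For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the lower bound discussed above (data volume divided by best-case bandwidth) and of T(total). The function names and the decimal unit constants are hypothetical, chosen for illustration, and it assumes the same link speed for ingress and egress. The bandwidth figures are the ones quoted in this post. Real transfers will be slower than this best case.

```python
# Decimal units, to match the round numbers used in the post (an assumption).
MB = 10**6   # bytes
GB = 10**9
TB = 10**12

# A T3 is 45 Mb/s (megabits); divide by 8 for bytes: about 5.6 MB/s.
T3_BYTES_PER_S = 45e6 / 8


def transfer_seconds(data_bytes, bandwidth_bytes_per_s):
    """Best-case (lower bound) time to move data_bytes over a link."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_bytes / bandwidth_bytes_per_s


def total_seconds(ingress_bytes, egress_bytes, compute_s, bandwidth_bytes_per_s):
    """T(total) = T(ingress) + T(compute) + T(egress)."""
    t_ingress = transfer_seconds(ingress_bytes, bandwidth_bytes_per_s)
    t_egress = transfer_seconds(egress_bytes, bandwidth_bytes_per_s)
    return t_ingress + compute_s + t_egress


def data_motion_fraction(ingress_bytes, egress_bytes, compute_s, bandwidth_bytes_per_s):
    """Fraction of T(total) spent moving data rather than computing."""
    total = total_seconds(ingress_bytes, egress_bytes, compute_s, bandwidth_bytes_per_s)
    if total <= 0:
        raise ValueError("total time must be positive")
    return (total - compute_s) / total


if __name__ == "__main__":
    print(transfer_seconds(GB, MB))                      # seconds for 1 GB at 1 MB/s
    print(transfer_seconds(TB, MB))                      # seconds for 1 TB at 1 MB/s
    print(transfer_seconds(TB, T3_BYTES_PER_S) / 86400)  # days for 1 TB over a T3
    # Hypothetical job: 1 TB in, 1 GB out, one hour of compute, 1 MB/s link.
    print(data_motion_fraction(TB, GB, 3600, MB))
```

If the fraction that comes out is close to 1, you are paying almost entirely for data motion, and cheap compute cycles at the far end will not save you.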