“Top HPC trends” … or are they?

John West at InsideHPC.com links to an article I read last week and didn’t comment on. In this article David Driggers, CTO at Verari, points out what he believes to be the top 5 trends in HPC.

In no particular order, he points out that CAS (content addressable storage) is “breakthrough technology” for archiving. Which is odd, in that industry insiders appear to have a somewhat different opinion on the value of CAS for archiving. Moreover, others point out that CAS is, itself, somewhat of a misnomer.

Basically, object storage appears to be the direction everyone is really thinking about for scalability. CAS purports to be the archive layer for object storage. And you need a database and software stack (neither inexpensive) for finding the content references.
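To make the CAS idea concrete, here is a minimal sketch of the core concept: an object’s “address” is simply a cryptographic hash of its content, so identical content deduplicates for free. The names (`cas_put`, `cas_get`, the dict-backed store) are hypothetical illustrations, not any vendor’s actual API.

```python
# Minimal sketch of content-addressable storage: the address IS the hash.
# All names here are hypothetical, not a real product's API.
import hashlib

store = {}  # stand-in for the object store: address -> content


def cas_put(content: bytes) -> str:
    """Store content under its own SHA-256 hash and return the address."""
    address = hashlib.sha256(content).hexdigest()
    store[address] = content
    return address


def cas_get(address: str) -> bytes:
    """Retrieve content by its address."""
    return store[address]


addr = cas_put(b"simulation input deck v1")
assert cas_get(addr) == b"simulation input deck v1"
# Storing the same bytes again yields the same address (deduplication):
assert cas_put(b"simulation input deck v1") == addr
```

The catch, as noted above, is that content hashes are opaque: you still need an index (a database and software stack) mapping meaningful names and metadata to those addresses.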

Next, David suggests HPC is going mainstream. This is true. However, he also suggests that the Windows HPC Server 2008 product is helping to drive this. We aren’t seeing this. Speaking with many others (end users, vendors, …) at SC08, they didn’t see it either. And now with Microsoft chopping projects in a downward-turning economy, how long will this effort, not likely a billion dollar (oh heck… not likely even a $10 million) revenue stream for them, survive? A smart manager at Microsoft will be looking long and hard at all expenditures, and killing off those that don’t cut the mustard. They just chopped large chunks of their games … including their flight simulator. I would be hard pressed to believe that their HPC initiative generates more revenue than their flight simulator. I could be wrong here. But not by much.

We see HPC going mainstream. We see it being used in a wider array of things. We see customers asking how to lower the cost of their HPC. They aren’t looking to add much cost: they want return on their investment, but they want the investment as small as possible. It is hard to argue for large budgets these days.

This is driving a trend we have been seeing, and briefly talking about, for a long time: HPC gets driven downstream. To the desktop. A $10k machine that can run the same code as the $100k+ cluster, and do so reasonably quickly, is a win. Especially if you don’t have to host it in the data center. Put it by your desk, and use it. If you can run your office applications on it, even better. Which you can, with a nice VM session. What we are seeing is Linux being driven onto desktops, with office apps running in a VM running Windows. For customers that can deal with slight differences, we see significant adoption of OpenOffice 3.0. We use it, and it isn’t bad. (Microsoft Office willingly ignores the Linux market, so we don’t have the choice of buying it there … we would if we could, but we can’t … and aside from that, the Office 2003 -> Office 2007 changes are pretty maddening in and of themselves, with formats not completely compatible.)

What we see, and our customers confirm this, is a significant interest in powerful desktop systems for HPC. This is why we developed the Pegasus many-core systems: large memories, large processor counts, fast disks. We have numerous customers running them individually during the day and tying them together into a beowulf-at-night. This reduces their overall expenditure, and solves their problems. Not for everyone, but a good overall solution for a growing demographic in the HPC user community.

David further points out that green is good. Well, I would state it as “lower power consumption is good”. The less power consumed, the less wasted. But you need multiple technologies to make this work right … a “green” label isn’t meaningful in and of itself. You need low-power modes and load-aware power reduction, not to mention load-aware scheduling and many other things, to do this right. That is, adaptive load management coupled with adaptive job management. We talked about this last year with one of our partners. Most folks opt for the dubious label. It’s so much more than a label.
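The load-aware power reduction piece can be sketched very simply: pick a node power state from measured load. This is a toy illustration of the idea, with made-up thresholds and state names; real adaptive load management (DVFS governors, node parking, scheduler integration) is far more involved.

```python
# Toy sketch of load-aware power reduction: choose a power state from a
# node's load average. Thresholds and state names are illustrative
# assumptions, not any real scheduler's or governor's policy.
def power_state(load_avg: float) -> str:
    if load_avg < 0.10:
        return "sleep"       # idle node: park it entirely
    if load_avg < 0.50:
        return "low-power"   # light load: reduce clocks/voltage
    return "full-power"      # busy node: run flat out


assert power_state(0.02) == "sleep"
assert power_state(0.30) == "low-power"
assert power_state(0.95) == "full-power"
```

The hard part, per the paragraph above, is coupling this with job management, so the scheduler packs work onto fewer nodes and lets the rest sleep, rather than reacting node by node.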

Then there are two points in rapid succession: the first being that cloud computing is coming, the second being that data centers are losing customers’ interest. The latter is true in that data center space is expensive per square meter: expensive to build, to maintain, to rent. Removing or reducing your data center population is a good thing if you can do it. Better still if you don’t own the data center.

But cloud computing and SaaS…

Look, until we get widespread, inexpensive, on-demand availability of high bandwidth, all sorts of “cloud” based bits are going to be hard. The issue is data motion. This is why Amazon has S3 right next to EC2: because moving data is hard. HPC requires lots of data motion. Huge amounts. A 1 MB/s link (roughly 10 Mb/s, about 1/4 of a T3 circuit) isn’t going to cut it for small users with large models. We have small shops with 20-40 GB models. Over their T1’s, at 0.15 MB/s, this is very close to 2 hours per GB to transfer. 20 GB is close to a 40 hour data transfer.

Since the models change frequently, you have to move lots of data.

Heck, over 4.5 MB/s (a T3, at about $5k/month), 1 GB takes about 4 minutes. 20 GB would take about 80 minutes to transfer.
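The transfer-time arithmetic above reduces to one line; a tiny helper (hypothetical, just for illustration) makes it easy to check other link speeds against your own model sizes.

```python
# The transfer-time arithmetic from the text, as a small helper.
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours to move size_gb gigabytes at rate_mb_per_s megabytes/second."""
    return size_gb * 1000.0 / rate_mb_per_s / 3600.0


# T1 at ~0.15 MB/s effective: nearly 2 hours per GB, ~37 hours for 20 GB.
print(round(transfer_hours(1, 0.15), 1))   # ≈ 1.9
print(round(transfer_hours(20, 0.15), 1))  # ≈ 37.0
# T3 at ~4.5 MB/s effective: 20 GB in roughly 1.2 hours (~74 minutes).
print(round(transfer_hours(20, 4.5), 1))   # ≈ 1.2
```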

If you can have your 10-20 users each doing these transfers out of your facility to the remote site, sure, this is a good thing. But it’s not practical today in a fairly large number of cases, due to that bandwidth issue.

We have worked with customers on this with respect to remote data centers … it’s the same problem. The bandwidth you need costs more than you have. The budget you have won’t get you the bandwidth you need. Which means your data should avoid moving.

Which is anathema for cloud HPC.

Once we get gigabit to every demarc, yeah, generalized cloud HPC could get really interesting. But we can’t get there yet.

Right now, a few cloud providers are doing well by focusing upon apps that don’t require so much continuous data transfer. Tsunamic Technologies, Amazon, and CRL’s Eka machine are examples: you colocate fast local storage with the cluster, so you pay the data motion cost as infrequently as possible (as long as you don’t have to keep moving your data for your runs). If you have unlimited funds you can buy a dedicated high speed circuit, but most folks don’t.

Until we get that fast networking cheap and generally available, cloud is mostly marketing smoke. Reminiscent of “grid” from years ago. It meant something once, but was overused before all the stars aligned. And it lost its reasonable usage (and became somewhat of a joke in the process).

Let’s not go there yet. Let’s let the networking mature and get to each demarc at 1 Gb/s. Then we can start talking, realistically, about the wonderful new vistas of cloud HPC.



2 thoughts on “ “Top HPC trends” … or are they? ”

  1. I’m not sure your views of cloud storage and SaaS are required to be synonymous. Cloud storage can also mean privately or corporately deployed storage that can be accessed through a global namespace via multiple GigE and/or 10G connections using HTTP or NAS protocols. The storage is deployed as an internal service with local LAN and WAN accessibility.

    We’re seeing this coming into play as an archival solution today in bioinformatics, medical imaging and media and entertainment.

    I agree the BW available now limits the effectiveness of cloud/SaaS for large, frequently changing data sets. Heck, it took me something like 10 days to back up 18 GB of photos to an online service. Put that type of storage behind the firewall on a private cloud and it starts to become a nice solution for large scale content stores.

  2. @Eric:

    The problem with ?aaS is that there are 26 letters, and some things start with the same one (S == storage, software, … )

    More to the point, I was focusing on non-campus based cloud storage. High speed (gigabit etc) on a campus is possible and reasonably priced for large companies. For distributed small companies it is not. Nor for distributed large companies. We have already looked into long haul gigabit+ service for customers wanting to enable high bandwidth to their data centers in India, in Europe, to campuses here. The cost is prohibitive. So distributed enterprise remains effectively island based until we can get real bandwidth for reasonable prices.

    So imagine my day job with two sites, one here, one in India. Both behind a firewall, connected over a high performance VPN. The issue becomes the intervening network, regardless of what I can do locally.

    This is the killer for cloud (as it is being marketed now). What you are talking about, in terms of campus clouds and related, already exists and works well. It’s just the between-site stuff that’s a pain.

    I wonder if we should start characterizing cloud locality in terms of apparent bandwidth. Remote clouds are not usable as data stores, and you need alternative methods of dealing with them. Local clouds are (or should be) fast. Maybe something like time to transfer 1 GB: anything 10 seconds or less is local, 10-100 seconds is midrange, and greater than 100 seconds is far.
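That proposed locality metric is simple enough to write down directly. This sketch uses the time-to-transfer-1-GB thresholds from the comment above; the function name and boundary handling are illustrative assumptions.

```python
# Sketch of the proposed cloud-locality metric: classify an endpoint by
# its time to transfer 1 GB. Thresholds come from the comment above;
# the function name and exact boundary handling are assumptions.
def locality(seconds_per_gb: float) -> str:
    if seconds_per_gb <= 10:
        return "local"
    if seconds_per_gb <= 100:
        return "midrange"
    return "far"


# A gigabit link moves 1 GB in ~8 seconds -> local.
assert locality(8) == "local"
# A T3 at ~4.5 MB/s needs ~222 seconds for 1 GB -> far.
assert locality(1000 / 4.5) == "far"
assert locality(50) == "midrange"
```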
