Is a cluster a toaster?

By joe

January 5, 2006 - 13 minutes read - 2609 words

At the excellent Cluster Monkey, Doug Eadline mused on a number of topics of interest, specifically on why cluster HPC is hard. He made some excellent points.

The OSC is working on an initiative to increase access to high performance computing resources for end users. Their effort is in part about making access to HPC hardware easier, and in part about helping people (users and commercial entities) make better use of computational gear. IDC, as I and many others have pointed out time and again, has been showing measurements that HPC as a market is growing rapidly, with the largest growth in the smaller units. HPC for the masses. How much HPC can you do with 10-25k$US? Enough that it drives the market for HPC to double digit growth. Actually excellent growth. As I have pointed out here in the past, it is a shame that the capital markets have largely ignored this market, as there are some really good things developing there.

With this background, Doug talks about entry into the market and why it is hard. First he starts with HPC appliances. He points out that appliances are successful today as embedded supercomputing, something near and dear to my heart. He also points out an interesting dichotomy in that single box cluster appliances are hard. No, not the Orion bits, or the new Penguin machines. The Orion unit has been out for a year, and they are selling lots of the pizza box, and few of the desksides. I suspect that the Penguin unit will probably not sell in great volume either, as the cost is significant. Well above the market sweet spot.

Any cluster-in-a-box is going to run into the same issues. If your market sweet spot is 25k$ maximum, exactly how many $500 chips, $100 memory sticks, $500 motherboards, $1000 NICs (IB) and $10k switches can you fit in there? Interesting problem … Put another way, an appliance is generally a way to reduce cost by reducing choice and configuration options.
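To make the "how many parts fit in 25k$" question concrete, here is a rough back-of-the-envelope sketch. The part prices are the round numbers quoted above; everything else (two chips and four memory sticks per node, a single switch, a $500 Ethernet switch for the non-IB variant, and ignoring chassis, disks, and power) is an assumption made purely for illustration.

```python
def max_nodes(budget, node_parts, shared_cost):
    """Return how many whole nodes fit in the budget after shared costs.

    node_parts maps a part name to (unit_price, quantity_per_node).
    shared_cost covers items bought once, such as the switch.
    """
    per_node = sum(price * qty for price, qty in node_parts.values())
    remaining = budget - shared_cost
    if remaining < 0 or per_node <= 0:
        return 0
    return int(remaining // per_node)


# Prices from the text; quantities per node are assumptions.
IB_NODE = {
    "cpu": (500, 2),
    "memory": (100, 4),
    "motherboard": (500, 1),
    "ib_nic": (1000, 1),
}

# Hypothetical variant: drop the IB NIC and rely on onboard Ethernet.
ETH_NODE = {name: part for name, part in IB_NODE.items() if name != "ib_nic"}

BUDGET = 25_000
print(max_nodes(BUDGET, IB_NODE, shared_cost=10_000))  # 5 nodes with IB
print(max_nodes(BUDGET, ETH_NODE, shared_cost=500))    # 12 nodes without IB
```

Under these assumptions the interconnect alone decides whether the box holds 5 nodes or 12, which is the kind of tradeoff an appliance designer has to make up front.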
But you are still fundamentally limited by part cost, which places an economic constraint on the size of your appliance. This constraint will in turn place limits on what you can achieve performance-wise. Remember, systems engineering is in part an art of compromises. You make tradeoffs. You can eject the IB in favor of low latency Ethernet (probably not a bad idea for a cluster of this size). This will buy you other decisions, and alter the focus of this cluster. That is, the appliance (cluster in a box) significantly reduces your choices. This might not be a bad thing. Sometimes this is exactly what you need. But this is a digression.

Doug then talks about the Microsoft approach. His take is that they are going to level the playing field, and make it easy for everyone to build/run clusters. I think anything that increases the amount of simulation and data analysis we collectively do is a good thing. Ok, I am biased, and in full disclosure, my company would love to see more simulation and computing in the HPC world. I am just not sure that is the Microsoft message. I was guessing at what it was. For its large pathway-straddling booth, Microsoft did a singularly bad job of getting us (my company) to learn what it was all about, its directions, its goals. I liked Bill Gates’ talk, and I agreed with many of his points, but not necessarily with the pathway he described to get there. Doug’s assumption of the message is that a) they will make access to computing power easier, and b) they will make it more common. Laudable goals if true.

He then pointed out that the onset of clusters was effectively a force for creative destruction. That is, clusters upset the pre-existing natural order of things. I have pointed out many times that the super-micros ate the vector machines’ lunch in the early 90s. I called it an 80-20 rule: 80% of the performance for 20% of the price. Pretty much gutted a small market. And in doing so, it made it a much larger market. The market was elastic.
As supercomputing power became more affordable, more groups bought it. In the early 2000s, clusters started eating the lunches of the super-micros in earnest. Some of us were trying to convince our former employers at the time that this was a good thing. Sadly, some companies didn’t get it. Some did. There is pain and upheaval in creative destruction. And there is opportunity.

My thesis is that HPC hasn’t really taken off yet. It will. We are seeing greater and greater demand for computing power. IDC is seeing and measuring this. As are others. Such as Microsoft. Clusters have greatly expanded the total addressable market over and above what the super-micros could do. There is that 80-20 rule again.

But now something interesting happened. The chips in the PCs were, in a fair number of cases, faster than the super-micros. Not initially on most heavy floating point or memory intensive codes, but on integer bound tasks. And memory latency sensitive tasks. I remember showing some results from runs of chemistry and informatics apps to colleagues and hearing howls of derision at the fact that the 1k$ machine was 80% the speed of the 30k$ machine on the FP code. The PC was about 15% faster on the integer code than the 30k$ machine. If you think about it, that should have been the turning point for the company. It is hard to beat economies of scale. It is hard to beat good enough.

Doug used the cliché of paradigm shift. Yeah, you could call it that. I prefer phase transition. There was a structural change in this economy such that PCs became viable building blocks for supercomputing platforms.

And this takes us back to Microsoft and Linux for clusters. Linux has done an absolutely amazing thing for clusters. Remember, most Linux distributions are free (as in beer, though I have yet to meet anyone with a business model of freely available beer who is still in business as a beer seller). Some Linux distributions cost money. For the moment, I am going to ignore those.
Since Linux (the free ones) may be freely copied and installed, the cost of software installation on the clusters includes at most one acquisition cost. The subsequent costs are redistribution (easy with automatic installation tools) and maintenance. This requires some elbow grease, but not very much. Excellent cluster tools (Warewulf, Rocks, OSCAR) exist to almost completely automate these processes. A single person can administer several thousand cluster nodes without much pain. Moreover, the tools do not cost a dime.

And this is a very important point, which most FOSS advocates miss. The cost to install Linux on all nodes in a cluster is O(1). The cost to maintain them, thanks to the excellent tool sets, is also O(1). In contrast, the cost of per-seat systems such as all Windows variants is O(N) to install and O(N*(1 + r)) to maintain, where r is the ratio of admins to machines needed to maintain and run Windows boxen. Depending upon the environment, we typically see r being 1/25 to 1/100. The O(N) comes in from the yearly licensing costs for Norton/Symantec and other bits that you must have if you run Windows.

The reason why the free (as in beer) part is important to commercial consumers of cycles is that budgets are always shrinking, and there is always a drive to get more for less. If you don’t need to pay for Linux, why bother paying for it? Call this the dark side of the FOSS market. It makes it really hard to have a successful business model where revenue is a function of usage or installation or other aspects. What FOSS does is free you of the upfront costs of the purchase, and push the costs over to support on the back end. Don’t read too much into this: our experience and other people’s data show that these costs are still far lower than those of the competing OSes. This may be in part why Sun open sourced Solaris. They saw mass defections to Linux. While OpenSolaris might be nice, I think this genie is forever out of its bottle.
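The O(1) versus O(N*(1 + r)) comparison above can be sketched as a toy cost model. This is only an illustration of the scaling shapes; the function names and every dollar figure below are hypothetical placeholders, not measured data.

```python
def free_os_yearly_cost(nodes, fixed_admin_cost):
    """O(1) shape: with automated tooling, cost is modeled as flat in node count."""
    return fixed_admin_cost


def per_seat_yearly_cost(nodes, seat_fee, admin_salary, r):
    """O(N*(1 + r)) shape: per-seat fees plus r admins per machine."""
    licensing = nodes * seat_fee          # grows linearly with node count
    staffing = nodes * r * admin_salary   # r is admins per machine, e.g. 1/25 to 1/100
    return licensing + staffing


# Placeholder inputs, chosen only to show how the two curves diverge.
for n in (16, 64, 256, 1024):
    flat = free_os_yearly_cost(n, fixed_admin_cost=80_000)
    linear = per_seat_yearly_cost(n, seat_fee=50, admin_salary=80_000, r=1 / 50)
    print(f"{n:5d} nodes  flat={flat:>10,.0f}  per-seat={linear:>12,.0f}")
```

Whatever numbers you plug in, the first curve stays flat while the second grows with every node added, which is the point of the comparison.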
Which gets us to the second of the FOSS benefits. No vendor lock-in. Hardware vendors would like nothing more than to see people buy only from them. Many people have argued that blades have little to do with economics and management, and far more to do with lock-in. If the blade makers had interchangeable systems, that argument would be hard to support. But they don’t … End users tend to like the freedom to do what they want, the way they want it, when they want it. Which means freedom of choice. Freedom to design systems most appropriate for them. Most users who taste this freedom are rather hard to bring back into the vendor fold. Some who do make the switch.

This is also an important aspect of Doug’s article. Imagine that you have a problem which is perfectly well suited for a machine with some specifications. You can go to the one-stop-shop cluster vendors (usually not a good idea unless their stuff is dead on spec) and try to order such a thing. What we have seen these folks do is send back quotes for stuff they can deliver, not for what you need. Moreover, they rarely have experienced HPC people who can help you figure out what it is you really need. Do yourself a little test. Call up your favorite cheap cluster vendor and ask them what they would recommend for a Monte Carlo simulation, or an electronic structure calculation, or a fluid flow calculation, or a crash test. If they start asking how many processors you want, and would you like fries with that …

And that gets to the heart of Doug’s major point. Doing HPC right is hard. Yeah, you can get a pile of PCs, throw them together, call them a cluster. Lots of folks have done it. But then making this thing work well (coding, tuning, operations) is hard. The idea that I saw in Doug’s article is that Microsoft is aiming to make that gluing together easier and better. I am going to remain skeptical until I have a chance to look at it and think it through. The glue and pieces already exist.
And they are pretty easy to use. Which might explain why there are so many one-stop-shops for clusters.

Parallel programming is non-trivial. High performance computing applications development can be quite challenging. Multicore chips, NUMA systems, and non-uniform interconnects are not yet trivial to utilize. In the 15 years or so that I have been playing with parallel computing it has gotten a little easier with better tools, but it still is not trivial, and it is not likely to become so any time soon.

Will Microsoft make an impact? Probably, unless they get the costs wildly wrong. Will they make the market better? Possibly. More accessible? I think the fact that they are interested in it will at least make the software vendors more amenable to thinking about how to get more of their applications out there. Will Microsoft displace Linux? Unlikely. Costs work against Microsoft here. Per seat licensing and per seat extras like virus bits (would you run a Windows cluster without antivirus?) are going to drive the per node costs through the roof.

A smart way for Microsoft to innovate is to look at it from an applications view and not from a platform view. Let them write the layer that runs atop Mono on Linux, .NET on Windows, and Mono everywhere else. If they do that, and make stuff really simple atop that, this could be interesting.

Linux is not vulnerable to Windows viruses, and as a cluster person I would be strongly against creating a potentially huge virus growth medium in the form of a bunch of Windows machines in a cluster without virus protection. Of course virus protection massively decreases the performance of machines … so you need more machines to get the equivalent performance. Kind of a lose-lose proposition. Then again, our idea is to run Windows in a virtualized environment … VMware, Xen, etc. Isolate the Windows instance, and set it up so that if something does nuke the Windows VM, the fix is merely a file copy away.
Given how tightly you can lock down a Linux machine if you try, you might even be able to deal with Windows machines not running antivirus.

HPC is hard. I hope Microsoft does grace us with its vision someday so we can stop guessing.

And this gets us finally back to the question that Jim Lux raised on the Beowulf list after this. Is a cluster a toaster? A boat anchor? Yeah, these (toaster, boat anchor) are all appliances. No, they share very little in common with each other. An appliance is ostensibly a single point function system. It does what it does, and it does it well enough (most of the time) that you never have to think about it (most of the time). When it works, it works well.

Of course, we already have supercomputing appliances. Our iPods (or MP3 player in my case) have a high performance DSP in them. Our graphics cards are appliances, with a high performance massively pipelined processing unit. Our network switches in clusters are appliances, processing tremendous numbers of packets. Our disk drives are appliances, as are the new generation of NAS/SAN/iSCSI… These units are appliances, as they do their function and they do it well, usually without us thinking about them, until they break.

And that is a pragmatic definition of an appliance. Not a toaster or a boat anchor, but a single point function or function group. You don’t have to go through a complex boot scenario for your home firewall. You turn it on, go to a web page to configure it, and largely forget about it. That is, you use it just like an appliance. Same with the switch in your cluster, though some of us like to tweak and tune until we get “optimal” performance. Your disk systems are much the same. Once you set up your mount points in your cluster from your SAN/NAS, you largely forget about them until they demand your attention. Just like your washing machine if it becomes unbalanced. An appliance is something that you shouldn’t have to think about very often. It should just work.
Time after time.

As computers, software, OSes, and computational needs evolve on a much shorter time scale than Trinitrons, the expected lifetime of a cluster appliance should be comparably shorter. The cluster appliance also needs to be very inexpensive. I have been telling my customers for years that we now have disposable computing nodes. They are cheap enough that a 3 year 5x8 support contract largely does not make economic sense in most cases, compared to time and materials costs for repairing/replacing failures. What the cluster appliance does is make the management node also replaceable. When you think about this long enough, you realize this is a good thing.

Rocks is the prototype of the software stack that enables a cluster appliance. You can be up and productively going on a sizeable cluster system from bare metal in a very short time interval. The appliances are here. What’s left is for someone to wrap the GUI around them and make them accessible to all. Maybe that’s Microsoft’s strategy.