What he said!

By joe

June 13, 2008 - 5 minutes read - 1064 words

Up early on a Friday morning, working through today's issues and … found this article on Linux Magazine by the esteemed Doug Eadline. I was in on the discussion that he refers to, and pointed out that you do in fact get what you pay for, and that you will not get an engineered system in many cases. Worse, the configs will likely be those that minimize vendor costs, as that is the problem they are attempting to solve in a low margin business (clusters). You will not get an engineered or well designed system unless you, curiously enough, go with a group/shop that engineers/designs their clusters to fit your needs. This is not a minor point: a poorly designed system can be painful to work on. I know, I was working on such systems only yesterday.

Doug makes a point that I wish to emphasize, underscore, amplify …

Recently I ran head first into a poorly designed cluster. The end user wanted to do something on it which was frankly hard, given the way they had built it. There were many things broken with this system. We wanted to help them get to an operational state, something where the cluster really worked the way it is supposed to. They bought their hardware (improperly configured) from one of the “major vendors” (more in a moment), along the lines of the one Doug describes. They ran this hardware in a manner which wasted 50% or more of its potential right off the bat.

I am not sure if this is due to past experiences, or just general IT focus, but this system was little more than a pile-o-PCs. I call these, in general, IT clusters. They are not HPC clusters by any stretch of the imagination. They don’t really work well. Some things sorta-kinda work. Lots of things don’t or cannot. You have some interesting failure modes.

You can always tell an IT cluster: it has several features that cause it to stand out.

First, it has RHEL. Yes, that’s right, it has Red Hat as the OS. This is not sufficient in and of itself to guarantee an IT cluster. But it is a strong indicator that the people who put it together were not thinking HPC, or they don’t know/understand HPC enough to understand why this isn’t a good idea. In the simplest view, RHEL on every node often means a software support contract on every node, and a per-node cost for the OS. The RHEL kernel (in the 4.x series and now in the 5.x series) has a number of purposefully designed-in limitations. Red Hat does not have a meaningful file system offering for high performance/high capacity workloads; you have to explicitly add in/build xfs/jfs kernel modules to get what you need there. As of RHEL5 you now have 4k stacks foisted upon you. I could spend many words going through why this is a “Bad Idea”®. I need only one though. Drivers.
There are other issues as well (turning off SELinux to have a fighting chance of running a cluster, LVM for disks by default, …). Your standard IT shop will install RHEL X.y 32 bit on a 64 bit server with 4 GB of RAM and call it a cluster node. This is precisely why you should be working with a group that knows what the heck it is doing.

Second, it has a SAN or a cheap NAS attached. SAN is a low performance system for HPC. No, really. 4Gb is not fast. We can move data within a single box at 20Gb+ and expose this out to the nodes at nearly that rate. A good cluster design will be able to move data very fast. Data motion is rapidly becoming the most painful aspect of internal/external cluster and distributed system usage. A poorly designed system will not be using a reasonable data storage or data motion fabric within the cluster.

The cheap NAS is even worse than the SAN. Tell me if this sounds familiar. You have this large cluster served by a NAS unit with one or two gigabit ports. All those terabytes, effectively hidden behind a network bandwidth wall. This is one of the reasons we designed and built JackRabbit. It solves this problem without breaking the bank.

Third, it has a poorly architected network. Ignoring the usual complete lack of a data transport network separate from the command and control network, we often see high end IT switches strung together. Yes, that’s right, daisy chained hundred port gigabit switches. If your HPC vendor cannot tell you why this is a “Bad Idea”®, then you really, really need to be speaking to a different HPC vendor. If your HPC vendor has not diagnosed and solved these problems, then they are unaware of the symptoms, and will likely start suggesting you put some really expensive things together to mitigate a bad design. A few of you are cringing? Yes.

A corollary to this: the cheap switch users. There are very real differences, and not just in $$. Your switch choices have honest performance impacts.
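
To put rough numbers on the bandwidth wall and the daisy chained switches described above, here is a back-of-the-envelope sketch. It is only an illustration: the cluster size (64 nodes), the data set size (10 TB), and the 99-ports-to-one-uplink layout are made-up assumptions, the link figures are read as gigabits per second, and the function names are hypothetical. It assumes perfect fair sharing and ignores protocol overhead, so real systems will do worse.

```python
def per_node_mbytes_per_s(link_gbit_per_s, nodes):
    """Ideal fair share of one storage link per node, in MB/s (no overhead)."""
    link_mbytes_per_s = link_gbit_per_s * 1000 / 8  # gigabits/s -> megabytes/s
    return link_mbytes_per_s / nodes


def hours_to_move(terabytes, link_gbit_per_s):
    """Ideal time in hours to push `terabytes` of data through one link."""
    bits = terabytes * 1e12 * 8
    return bits / (link_gbit_per_s * 1e9) / 3600


def oversubscription(node_ports, port_gbit_per_s, uplink_gbit_per_s):
    """Ratio of what the node ports could offer to what the uplink can carry."""
    return node_ports * port_gbit_per_s / uplink_gbit_per_s


if __name__ == "__main__":
    nodes = 64        # hypothetical cluster size
    dataset_tb = 10   # hypothetical data set size

    links = [
        ("NAS, one gigabit port", 1),
        ("NAS, two gigabit ports", 2),
        ("4Gb SAN link", 4),
        ("20Gb storage box", 20),
    ]
    for label, gbit in links:
        share = per_node_mbytes_per_s(gbit, nodes)
        hours = hours_to_move(dataset_tb, gbit)
        print(f"{label:24s} {share:7.2f} MB/s per node, "
              f"{hours:5.1f} h for {dataset_tb} TB")

    # Hypothetical daisy chain: 99 node ports on one switch, with a single
    # gigabit port used as the uplink to the next switch.
    print(f"uplink oversubscription: {oversubscription(99, 1, 1):.0f}:1")
```

With these made-up numbers, 64 nodes sharing a single gigabit port get under 2 MB/s each, and just moving 10 TB through that port takes about 22 hours at best.
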
We have solved numerous customer problems by tracing them back to the cheap switching infrastructure people have built out.

There are other examples of the IT cluster, but these are top of mind right now (having experienced every single one of them in the last 30 days or less).

Remember, a pile-o-PCs is NOT a cluster. It is a pile-o-PCs. Putting a cluster distribution on a pile-o-PCs doesn’t make it a cluster. It makes it a pile-o-PCs with a cluster distribution. If you need a fleet of 18 wheelers (trucks) and you instead buy a bunch of VW bugs and argue that you have equivalent hauling capacity, you may have, I dunno, missed the point. You certainly shouldn’t expect to be able to haul what the fleet of 18 wheelers can haul. Same is true in HPC. Very true in HPC.

A good vendor will work with you to solve your problems, or help reduce the problem to a manageable situation. They will sit down with you to find solutions. Not impose them. They will help you achieve your goals. Not theirs. Most of the IT and rack-em-stack-em shops are shipping volumes. Many of those VWs. Few solutions. Doug’s company (Basement-supercomputing.com) does solutions. So does my company (Scalable Informatics). There are a few others. Decades of experience do matter.