Cargo cult HPC

By joe

October 18, 2008 - 4 minutes read - 773 words

This is a short thread of thought, triggered by a casual browse through Wikipedia on another topic (for an article I swear I am writing, right now, as I er … uh … write this). Way back in graduate school, we had all read Feynman’s book. Call it required reading at the academy. Good things came out of this, as we (a few friends and I) reverse engineered his discussions of differentiation under the integral sign and suddenly had a real powerful tool available to us (which seems to have pissed off a few profs in classes with homework, but that’s a story for another beer).

Feynman in his book discussed Cargo Cult science. He gave the analogy of what various Pacific Islanders did to bring the all-powerful planes back to their island: making headphones with shells, building faux runways, and then not grasping why it didn’t work. Their world view was constricted and constrained by their understanding. And changing that understanding would be strenuous and frankly shocking to them (questions about the ethics or morality of stripping people of their illusions, beliefs, and dogma are worth thinking about here).

So this got me thinking about HPC as well. There are a few practitioners of HPC out there, a few wannabes, and some cargo culters. No, I won’t name who I think goes into each group.

Look at it this way. Some companies put up the facade that they know or understand the market, and then do things that clearly demonstrate a lack of understanding of the fundamental forces in the market. We have seen this in software and hardware companies. Some folks don’t get it, or just don’t care, and like to try to label practitioners as “old school”. Yeah, ok. Whatever. Those are the cargo culters. They may be able to change, to get to the wannabe level.

The wannabes are companies that want to have a serious HPC footprint. This is their goal. They may not be quite there yet, but with the right investments and over time, they may get there. Or they may not. HPC isn’t easy, and it is easy to get distracted from high performance. Some of the wannabes have hands they have to play which, for various reasons, are unlikely to succeed. The successful ones will be able to influence their future product mix to make it more successful in the market.

The practitioners have been doing this stuff successfully for a while. The difference between the cargo culters and the wannabes is that the cargo culters will never quite grasp why what they have won’t work.
Moreover, they will have little to no say in changing the course of the company so that it offers a meaningful product in the space.

The one example that comes to mind right now is (without naming names) a storage vendor with a “scalable” product for large storage. I won’t get into specifics. I will say that hiding terabytes and more behind a gigabit bandwidth wall is not a wise use of resources. Yet we see exactly this in use. It’s even sadder when we see others ignore this issue, until someone complains that their new cluster is slow and they don’t understand why.

Not all codes are I/O bound. But for the ones that are, you need serious I/O firepower at the ready to bring the cluster up to its best performance levels. 1 GB takes about 10 seconds to move at wire speed over a gigabit link. 1 TB takes about 10,000 seconds (about 1/8th of a day) to move at wire speed over a gigabit link. 1 PB takes about 10,000,000 seconds (about 1/3rd of a year) to move at wire speed over a gigabit link. Yet we see groups propose and often request storage for their HPC with single or dual gigabit links and tens to hundreds of TB in size.

We see large clusters being architected with stacked switches which will be used for both NFS and MPI traffic. The last one we dealt with like this had a stacked switch, and they were running a 256 node job that almost always failed. They didn’t get why it failed. Their HPC resource was a stack of 128 desktop machines with 100 Mb network cards. Their storage was a head node with a popular RAID card and 1 disk platter devoted to storage (raid1 mirrored at least).

HPC practitioners will understand the issues in the above, as will some of the wannabes. The cargo culters won’t, and will probably make remarks that demonstrate their lack of understanding. This is the market we are in today.
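
For anyone who wants to check the gigabit arithmetic above, here is a minimal sketch in Python. It makes assumptions the post does not state: decimal units (1 GB = 10^9 bytes), a link that sustains its full nominal rate, and zero protocol overhead. The function name `transfer_seconds` is made up for illustration.

```python
# Idealized transfer times over a network link.
# Assumptions (not from the original post): decimal units (1 GB = 10**9 bytes),
# a link that sustains its full nominal rate, and no protocol overhead.

GIGABIT_PER_SECOND = 1_000_000_000  # nominal link rate in bits per second

SIZES_IN_BYTES = {
    "1 GB": 10**9,
    "1 TB": 10**12,
    "1 PB": 10**15,
}


def transfer_seconds(size_bytes: int, link_bits_per_second: int = GIGABIT_PER_SECOND) -> float:
    """Return the seconds needed to move size_bytes at the full link rate."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    for label, size in SIZES_IN_BYTES.items():
        seconds = transfer_seconds(size)
        # 86,400 seconds in a day
        print(f"{label}: {seconds:,.0f} s ({seconds / 86_400:.2f} days)")
```

Under those idealized assumptions the numbers come out to 8, 8,000, and 8,000,000 seconds, a little under the rounder figures quoted above, which is about what you would expect once real-world overhead is allowed for. Passing a different `link_bits_per_second` shows how much a faster pipe changes the picture.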