Cheapskates? Nah... really?

By joe

September 30, 2008 - 5 minutes read - 897 words

John West at InsideHPC.com points to an article on fault tolerant servers and the push to get them into HPC systems. One of the key soundbites is something John feeds up

Well, that is one way of looking at it …

It is arguably more correct to point out that cycles are cycles, and clusters offered an opportunity to massively expand lower cost cycles. Calling supercomputing folks “cheapskates” isn’t likely to win you friends there. It isn’t even really true. Price sensitive? Yes. Performance sensitive? Yes. If supercomputing folks were just cheapskates, we wouldn’t expect to see HPC users running Panasas file systems, on ECC motherboards, with big expensive Xeon and Opteron chips, when a simple pair of SATA drives, AMD Phenom/Intel Core 2 Celerons, and lower cost motherboards are possible.

That is, there is a floor, or a minimum if you prefer, below which SC folks won’t go. They won’t pay for things they don’t need, and such things shouldn’t even be offered. SC folks do engage in a very serious and … er … active … cost benefit analysis. SC customers ask “if I add this feature, what does it cost, and what do I get for this cost?” This is exactly, precisely the question they should be asking. Does that make them cheapskates? No.

There are customers who buy brands. That’s all they want, a tier 1 brand. And that’s all they will buy, even though (FUD aside) the gear is largely the same as you will get from the tier 2+ folks. Some folks go in the opposite direction, and buy the lowest cost stuff. They try to maximize the tonnage per dollar spent. The floor exists because, below it, systems are hard to manage and run due to “quality” problems (really more a matter of things that desktop users don’t care about and HPC users do … like ECC memory, large memories, larger bandwidths to memory, better networking options, built in EDAC support, remote control of machines, …). Some of the cheaper stuff pushes the boundaries of this, skirting both sides.

This also means that things of dubious value are not going to sell well. Anything with a price that scales per node had better solve a really important problem, or it simply will fail to sell.
We have seen vendors try to pass off software/hardware that didn’t address anything terribly critical, yet increased the cost per node by 25-100%. What’s more amazing than the pricing hubris and lack of understanding among the marketing folks doing that is the complete lack of understanding of where things went wrong when the products don’t sell and the company starts winding down its efforts in the market. I could give you a number of thoughts as to who will need to wind down operations due to absolutely insane(ly wrong) pricing models and a profound lack of understanding of the market in which they play.

Now we come to the meat of this. Stratus is selling fault tolerant servers. They come in at some huge price. After bullying the reader into believing SC types are just plain old cheap (where the evidence doesn’t support exactly this scenario, as Xeon and Opteron motherboards are being bought with ECC RAM, and not Phenom and Core 2 without …), they propose the solution to a (non-existing) problem. Buy these big expensive things you cheapskate SC types! It will help you as downtime is expensive.

Yes, I agree that downtime is expensive. Resiliency is critical. But resiliency also doesn’t have to be expensive. I don’t buy the argument they make. I don’t think most SC folks will either. Any scenario which starts with “add this which will massively increase acquisition costs and add something of dubious value” is likely doomed to failure in SC.

Yeah, I know. Some folks rail on acquisition costs as being a small fraction of TCO. Tell that to the CFO who signs that small fraction check. When you buy a large thing, you need to justify every component, and conversely, every component had better have a good reason for being in there. I still see customers rip InfiniBand out of large clusters, as it is, in some cases, of dubious value for their work. Currently 10GbE is struggling to get a foothold, in large part due to its cost.
SC won’t adopt it en masse until its cost comes down to reasonable levels. Even then it has to fight against a cheap IB DDR/QDR based solution.

Failure tolerance for SC is not priced on what the market will bear. There are competitive alternatives. With business models being squeezed on pricing, it is hard to see how Stratus would be successful. Unless you have large bags of money sitting around, and can fund unprofitable operations for years in a market which isn’t all that accepting of your solutions, you will likely exit the market. I am guessing that with an article like this, Stratus is headed in that direction.

SC consumers want an optimum: the best performance and features for a minimal cost. This requires compromise. But it also means that they reject solutions of dubious value out of hand. I am saying SC customers are not cheapskates, as they select obviously more expensive motherboards, memories, processors, and storage than they could select. But at the same time, they don’t blow money on solutions of little obvious value.