Cheapskates? Nah… really?

John West points to an article on fault tolerant servers and the push to get them into HPC systems.
One of the key soundbites is something John serves up:

Supercomputer customers are known for spending big bucks on exotic technology, but they’re also notorious cheapskates. That’s why Linux and the clustering of commodity x86 servers took off a decade ago, essentially wiping out the market for vector supercomputers and nearly knocking out RISC architectures.

Well, that is one way of looking at it …

It is arguably more correct to point out that cycles are cycles, and clusters offered an opportunity to massively expand lower cost cycles. Calling Supercomputing folks “cheapskates” isn’t likely to win you friends there.
It isn’t even really true.
Price sensitive? Yes. Performance sensitive? Yes.
If supercomputing folks were just cheapskates, we wouldn’t expect to see HPC users running Panasas file systems, on their ECC motherboards, with big expensive Xeon and Opteron chips, when two simple SATA drives, AMD Phenom/Intel Core 2 Celerons, and lower cost motherboards are possible.
That is, there is a floor, or a minimum if you prefer, below which SC folks won’t go. They won’t pay for things they don’t need, and such things shouldn’t even be offered.
SC folks do engage in a very serious and … er … active … cost benefit analysis. SC customers ask “if I add this feature, what does it cost, and what do I get for this cost?”
This is exactly, precisely the question they should be asking. Does that make them cheapskates?
There are customers who buy brands. That’s all they want, a tier 1 brand. And that’s all they will buy, even though (FUD aside) the gear is largely the same as you will get from the tier 2+ folks.
Some folks go in the opposite direction, and buy the lowest cost stuff. They try to maximize the tonnage per dollar spent.
The floor exists because, below it, systems are hard to manage and run due to “quality” problems (really more of things that desktop users don’t care about, and HPC users do … like ECC memory, large memories, larger bandwidths to memory, better networking options, built in EDAC support, remote control of machines, …). Some of the cheaper stuff pushes the boundaries of this, skirting both sides.
This also means that things of dubious value are not going to sell well. Anything whose price scales per node had better solve a really important problem, or it simply will fail to sell. We have seen vendors try to pass off software/hardware that didn’t address anything terribly critical, yet increased the cost per node by 25-100%. What’s more amazing than the pricing hubris and lack of understanding among the marketing folks doing that is the complete lack of understanding of where things went wrong when the products don’t sell and the company starts winding down its efforts in the market. I could give you a number of thoughts as to who will need to wind down operations due to absolutely insane(ly wrong) pricing models, and a profound lack of understanding of the market in which they play.
Now we come to the meat of this. Stratus is selling fault tolerant servers. They come in at some huge price. After bullying the reader into believing SC types are just plain old cheap (where the evidence doesn’t support exactly this scenario, as Xeon and Opteron motherboards are being bought with ECC RAM, and not Phenom and Core 2 without …), they propose the solution to a (non-existing) problem. Buy these big expensive things, you cheapskate SC types! It will help you, as downtime is expensive.
Yes, I agree that downtime is expensive. Resiliency is critical. But resiliency also doesn’t have to be expensive.
I don’t buy the argument they make. I don’t think most SC folks will either. Any scenario which starts with “add this which will massively increase acquisition costs and add something of dubious value” is likely doomed to failure in SC. Yeah, I know. Some folks rail on acquisition costs as being a small fraction of TCO.
Tell that to the CFO who signs that small fraction check.
When you buy a large thing, you need to justify every component, and conversely, every component had better have a good reason for being in there. I still see customers rip InfiniBand out of large clusters, as it is, in some cases, of dubious value for their work. Currently 10GbE is struggling to get a foothold, in large part due to its cost. SC won’t adopt it en masse until its cost comes down to reasonable levels. Even then it has to fight against a cheap IB DDR/QDR based solution.
Failure tolerance for SC is not priced on what the market will bear. There are competitive alternatives. With business models being squeezed on pricing, it is hard to see how Stratus would be successful.
Unless you have large bags of money sitting around, and can fund unprofitable operations for years, to a market which isn’t all that accepting of your solutions, you will likely exit the market. I am guessing that with an article like this, Stratus is headed in that direction.
SC consumers want an optimax of best performance and features for a minimal cost. This requires compromise. But it also means that they reject solutions of dubious value out of hand. I am saying SC customers are not cheapskates, as they select obviously more expensive motherboards, memories, processors, and storage than they could select. But at the same time, they don’t blow money on solutions of little obvious value.

7 thoughts on “Cheapskates? Nah… really?”

  1. I’m curious to get your thoughts on what a “reasonable level” of cost might be for 10GbE in HPC clusters. What is the sweet spot at which adoption kicks into high gear? Including switches, NICs, and cabling is it $500/node? $150/node?
    I’m interested in this topic since I develop software for Arastra, a 10GbE switch vendor.

  2. @Nathan
    The sweet spot is around or below the current InfiniBand costs. Right now, customers can buy 24 port SDR IB switches for less than $3k USD. That’s $125/port on the switch side. Customers can buy SDR IB cards for $125 or so. With a $60 cable, your per port complete cost basis is $310/node for up to 24 ports. Larger port counts will be more if you demand no oversubscription. All told, under $500/port total cost (HCA + cable + port) looks like the tipping point as it is 10% or less of the cost of the node.
    Currently, higher end IB (DDR IB which 10 GbE can’t effectively compete with on bandwidth) costs in the $500-600/HCA and similar per port/cable costs to the above (about $166 USD/port for smaller switches).
    Right now we are hard pressed to find 10 GbE under $600/HCA. For smaller port count switches, we are looking at $1000/port. Usually these involve a $60-100 cable + 1-2 *FP adapters (costs vary).
    10 GbE is not, as a technology, sufficiently demonstrably better than IB SDR to merit the significant price premium that it currently has. Some of us like 10 GbE as the stack is simpler, but the problem is that when you present it, customers shoot it down purely on cost. The super-fantastic feature sets never get heard. It’s the cost. Pure and simple.
    Arastra was developing some nice 48 port 10 GbE switches. I don’t remember the per port costs (saw it a year or two ago at SC). Much lower per port costs than Woven Systems, or Fujitsu, or … . Higher per port costs than Myricom. Required a fibre connection, CX-4 connectors wouldn’t fit on the front. So you had cable + *FP costs.
    I don’t remember the exact per port costs, but they were (ignoring HCAs, cables, and *FP adapters) a bit north of $500 USD per port.
    Currently the most reasonable 10 GbE HCAs in cost seem to come from Intel, SuperMicro, and Myricom, with $550+ per HCA.
    For a $5000 or less compute node (most are under $2500 these days), $1000 per 10GbE (HCA + port + cables/*FP) doesn’t scale as well as $250/port.
    When there is parity, I expect better scaling. Or when motherboards start having 10 GbE embedded (I know of 2 now), and the per port price of reasonable switch configs drops to SDR IB levels, I expect to see adoption increase rapidly.

  3. @Joe
    Thanks for your thoughts. Maybe I wasn’t clear about my relationship with Arastra: I am employed by Arastra so I know about our products.
    The 48-port switch (Arastra DCS-7148SX) that you saw at SC07 does work with copper cables, but you are correct that it’s not compatible with CX-4 cabling. The copper cables that do work with Arastra switches have SFP+ connectors on both ends, and are called “10GBASE-CR” by Arastra, and are referred to as “SFP+ direct attach” by some others. The cables are pre-terminated and available in a variety of lengths up to ~7m. Arastra also sells a 24-port switch that is nearly identical in feature set. One nice thing about these switches is that you can use a mixture of copper cables, optics, and 10GbE or 1GbE modules in any port.
    Either of these is available at less than $500 per port including 10GBASE-CR cables. Then you need to add the cost of a 10GbE NIC w/ SFP+ connector, and if that’s on the motherboard it is presumably going to be a lot less than the $550 price that you mention is the going rate. I don’t have much data on NIC pricing, so I’ll defer to you on that.

  4. Joe – I think your analysis may be right in terms of the eventual market viability of Stratus.
    We might have to agree to disagree on this, but in my experience, HPC people are absolutely cheapskates. This may be a function of WHERE my experience is: federal, very high end HPC.
    I’ve been part of spending $250M in HPC gear over the past 5 years at the high end in the DoD. The AMD and Intel high end stuff that ends up in our servers is deeply discounted because Intel and AMD each want ownership of the market and bragging rights. When we get other high end components (like a Panasas file system), they come bundled, and not considered as a separate cost – we buy systems, not parts. We do a price/performance on the whole solution, not the parts. As long as that comes in lower than the competitors for a reasonable absolute performance, the sale gets made.
    When you hear other folks talk about my end of the market, including Deb Goldfarb and every other vendor in the room at the Rhode Island conference two years ago when she spoke on this topic, the universal agreement is that high end HPC’ers (at least on the government side) ARE cheap. Just cheap…not value oriented…cheap. Buyers at my end will SAY they want better development tools, more resiliency, and so on, but when offered solutions that include them they don’t buy.
    If you can squeeze better storage, better development tools, better operational tools, and so on into a price/performance figure that is lower than the competitors then you’ll make a sale. But you’ll rarely find a large federal program willing to pay for ANYTHING other than FLOPS. If you can improve a programmer’s productivity by 50% but you are 10% more expensive, you won’t make the sale.
    And if you look in the large over the past ten years, we’ve moved from architectures where it was practical to get 15%-25% of peak on general applications to commodity-processor based systems where it’s normal to expect 1% of peak, based mostly on price and the misguided notion that more peak FLOPS are better (the tyranny of LINPACK).
    This behavior is rewarded by, and entirely driven by, the federal acquisition system that in practice makes it very very hard to buy on anything other than price. Although they’ll talk about “best value” procurements, in practice I’ve rarely seen purchases go for “best value” when that means more expensive.
    At the smaller end of HPC, things could be completely different. I don’t have direct market experience at that end.

  5. @John
    I seem to be outnumbered here … Jeff also thinks that HPCC users are cheap. At the high end, yeah, I see it. At the lower end, people want it turn-key simple, and they just want it to work without thinking much about it.
    It should be noted that the low-mid range is what is driving the market as well. Fastest growth area, largest fraction (last I looked).
    This is the area we focus most of our time/effort in. The higher end stuff doesn’t pay as well, and as Linux Networx noted, it is really hard to make money there.
    Actually with margins as thin as they are, it is, in general, really hard to make money in this portion of the market. It is as you noted, all about the higher end bragging rights. Not about making good money.
    As HPC continues to evolve, I expect that the lower end will drive to desktops via accelerators, and potentially laptops. The HPC market size will grow 2+ orders of magnitude overnight. At some point, the cost of owning your cycles versus buying them is going to tip in favor of buying cycles rather than owning them, in which case the large scale economics of huge systems won’t matter as much. This is a guess of course, could be wrong. Would be willing to bet a decent beer (not a case, just one) on it. As time goes on, might get to be a case sized bet 🙂

  6. At some point, the cost of owning your cycles versus buying them is going to tip in favor of buying cycles rather than owning them.

    I talk about this frequently over at the site – glad to know I’ve got a partner to share my point of view. Among insideHPC readers I’m pretty much alone. I think this shift has the potential to really open up both the mid-low end and the high end (facilities perspective on the high end). I keep trying to figure out how I can help make this transition happen…my million monkeys coding thesis needs testing.

  7. I side with Nathan here.
    10GbE switches are not that expensive. Quadrics’ 1U 24-port switches list at $14K (both CX4 and SFP+ variants). Street price is of course less – certainly under $500 per port.
    If we can make them at that price then I am sure others like Arastra et al can.
    There is no fundamental reason why IB (even DDR) should be cheaper than 10GE.
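
For readers who want to play with the per-port numbers from the thread above, here is a rough sketch in Python. It adds up switch port, adapter, and cable costs and expresses the total as a fraction of the node price. The SDR IB figures are the ones quoted in comment 2; the 10 GbE figures take the low end of the ranges quoted there and leave out the *FP adapters, which is an assumption. The `Interconnect` class and its field names are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interconnect:
    """Hypothetical helper: per-node cost pieces of an interconnect, in USD."""
    name: str
    switch_port: float  # switch price divided by port count
    adapter: float      # HCA / NIC price
    cable: float        # one cable per node

    def per_node(self) -> float:
        # Complete cost basis per node: switch port + adapter + cable.
        return self.switch_port + self.adapter + self.cable

    def share_of_node(self, node_cost: float) -> float:
        # Interconnect cost as a fraction of the compute node price.
        return self.per_node() / node_cost


# Figures from comment 2: $3k 24-port switch, $125 card, $60 cable.
sdr_ib = Interconnect("SDR IB", switch_port=3000 / 24, adapter=125.0, cable=60.0)

# Assumption: low end of the quoted 10 GbE ranges, *FP adapters ignored.
ten_gbe = Interconnect("10 GbE", switch_port=1000.0, adapter=600.0, cable=60.0)

if __name__ == "__main__":
    for node_cost in (2500.0, 5000.0):
        for link in (sdr_ib, ten_gbe):
            share = link.share_of_node(node_cost)
            print(f"{link.name}: ${link.per_node():.0f}/node, "
                  f"{share:.1%} of a ${node_cost:.0f} node")
```

Swapping in your own quotes for switch, adapter, and cable prices shows quickly where a given interconnect sits relative to the roughly 10% of node cost tipping point mentioned in comment 2.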
