“Sustaining” strategies and startups

I read an article on InsideHPC.com that I can’t say I agree with.

The discussion on creative destruction is correct. You create a new market by destroying an old market. That has happened many times in HPC, by enabling better, cheaper, faster execution. If our SGI boxen of days of old were 1/100th the cost of the Cray YMP at the time, and 20% of the performance, who won that battle? In all but a vanishingly small number of cases, SGI won. The cheaper system with an easy migration path (created by removing barriers to migration) eventually won the market from the more expensive platform.

There was nothing particularly new about the approach: it was a motherboard, RAM, and an IO channel. It was better/cheaper than the vector machines of the time, and these machines decimated the vector market.

Today, we see exactly this with GPUs. It took a while for NVidia to comprehend that end users will not port a large Fortran code to C/C++ just to use the accelerator technology. So they worked with PGI to get a Fortran compiler out. And that massively increased the size of their addressable market by removing a barrier to adoption. (I could go into a long series of posts on why it’s a really bad idea(TM) to insist that all a user has to do is port their code to your nice shiny new system … this is a guaranteed path to failure … you have to make it easy for them to move … drop in replacement … at low cost.)

The problem I have with the article is their definition of a “sustaining” technology. The author seems, despite giving the same example I did above, to have missed the lesson from this. HPC has seen quite a few better/cheaper/faster cycles, and they have resulted in a destruction of the existing market in favor of the replacement market, which was larger, more competitive, and more diverse. This is known to have happened, and in pretty much all the cases, there was no single massive proprietary innovation that did this destruction. From Vectors, to SMPs, to clusters, and starting now on GPU/APU/Accelerators, these changes have been incremental, and every one of them has been better/cheaper/faster, with somewhat different technology … nothing astoundingly new and innovative (people have been doing GPU computing for more than a decade).

That is, a correct definition of a “sustaining” technology is what I am arguing for. Better, cheaper, and faster is not sustaining. It is destructive. GPUs provide far lower cost per cycle than CPUs. The Intel/AMD CPUs provided far lower cost per cycle than the RISC CPUs. RISC CPUs provided far lower cost per cycle than Vector CPUs.

There is nothing magical about this. There were no great innovations that enabled people to see that this was inevitable. It became inevitable due to economic reasons.

The issue is, at the end of the day, purely an economic one. This (underlying) article misses that. It isn’t InsideHPC I am taking issue with, but rather the underlying article. InsideHPC is providing the text of the underlying article.

With this in mind, the correct definition of a sustaining technology is one that does not upset the status quo … which by definition is one that is not necessarily better/cheaper/faster. For disruption to occur, you need the economic argument, coupled with the cost/pain of adoption being as low as possible. That’s it.

This is why, for example, ASPs failed badly in their first go-around. They had a capex vs opex play … it sold well to CFOs, but not so well to technologists who saw increased costs for the same thing. And now, in ASP v2.0 (or v3.0 … not sure), which has been re-incarnated as “the cloud”(TM), there are nascent cycle markets forming, which show promise in creating an efficient market for cycles. Unfortunately, cost per cycle in this market is still fairly high relative to a capex scenario.

During the 3 year lifetime of a machine with two 2.0GHz CPUs (8 cores in total), we can use a maximum of 1.5 x 10^18 cycles. At roughly $5k USD/machine, we get 3 x 10^14 cycles per USD.

Using a $1 USD/hour per core metric (makes scaling easier later) for this same system with 8 cores, we get about 7 x 10^12 cycles per USD.

Ignoring the leading factor of roughly 2, there are two orders of magnitude difference in these prices. So even if you can get $0.10 USD/hour per core, the local machine still wins. If you can get $0.01 USD/hour per core, it is nearly a tie. That is where we need to get to in order to really see adoption over local machines. That is, there is a fundamental barrier in place that prevents this from being a real game changer (as many are hyping it to be, much as the grid was hyped). This doesn’t mean that specialist services can’t make use of this; they can. But the costs need to change by orders of magnitude before parity hits.
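The arithmetic behind these figures can be reproduced with a few lines of Python. This is only a sketch under the assumptions stated above: a 365-day year, every core busy for the full 3 years, and nothing counted on the local side beyond the $5k purchase price. The function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def local_cycles_per_usd(cores, clock_hz, price_usd, years=3):
    """Maximum cycles a purchased machine can deliver per dollar of capex."""
    total_cycles = cores * clock_hz * years * SECONDS_PER_YEAR
    return total_cycles / price_usd

def rented_cycles_per_usd(clock_hz, usd_per_core_hour):
    """Cycles bought per dollar when paying by the core-hour."""
    return clock_hz * 3600 / usd_per_core_hour

# Figures from the text: 8 cores at 2.0GHz, $5k, 3 year lifetime.
local = local_cycles_per_usd(cores=8, clock_hz=2.0e9, price_usd=5000)
print(f"local machine: {local:.1e} cycles/USD")

# Hourly rates discussed in the text.
for rate in (1.00, 0.10, 0.01):
    rented = rented_cycles_per_usd(2.0e9, rate)
    print(f"${rate:.2f}/core-hour: {rented:.1e} cycles/USD, "
          f"local/rented = {local / rented:.2f}")
```

With these inputs the local machine comes out roughly 40x ahead at $1/hour per core and roughly 4x ahead at $0.10, and the rented cycles only pull ahead (by a bit more than 2x) at $0.01.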

And now introduce GPUs. Add 1x $500 USD GPU card into our mix.

Our number of cycles over 3 years for 200 cores operating at 2GHz is now 3.8 x 10^19. Cycles per USD of cost is now 6.9 x 10^15.
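As a rough check of the GPU numbers, here is the same style of calculation in Python. It takes the figures in the text at face value: 200 cores at 2GHz, the $5k machine plus one $500 card, and a GPU cycle counted the same as a CPU cycle (a simplification carried over from the text, not a benchmark result). The helper name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
YEARS = 3

def cycles_per_usd(cores, clock_hz, cost_usd, years=YEARS):
    """Peak cycles over the machine lifetime divided by purchase cost."""
    return cores * clock_hz * years * SECONDS_PER_YEAR / cost_usd

# CPU-only box: 8 cores at 2.0GHz for $5k.
cpu_only = cycles_per_usd(cores=8, clock_hz=2.0e9, cost_usd=5000)

# Same box plus one $500 GPU card, counted as 200 cores at 2GHz.
with_gpu = cycles_per_usd(cores=200, clock_hz=2.0e9, cost_usd=5000 + 500)

print(f"CPU only: {cpu_only:.1e} cycles/USD")
print(f"with GPU: {with_gpu:.1e} cycles/USD")
print(f"improvement: {with_gpu / cpu_only:.1f}x")
```

Under these assumptions, a 10% increase in purchase price buys a bit more than a 20x improvement in cycles per USD.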

Which of these are disruptive?

Clouds aren’t cheaper, or faster. Better is possible. APUs in general, and GPUs in particular are cheaper and faster. Better will be getting there over time.

I’d argue that APUs (accelerator processor units) and GPUs in particular are disruptive, and are in fact disrupting the HPC market. I’d argue that clouds are sustaining technology, simply re-adjusting the same resources without making them cheaper/faster.

This isn’t a dig at clouds. Clouds are great when you need instant-on capability quickly. But are they viable as a long term utilization strategy versus purchasing? You have to look at how many of those cycles you will use over the 3 years. If you are only using 10% of your computer resources over those 3 years, yeah, clouds become viable.

I don’t know many HPC shops at that 10% utilization; most are running at or near capacity.
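One way to put numbers on the utilization question is to ask at what utilization buying and renting cost the same. The Python sketch below reuses the earlier figures ($5k, 8 cores, 3 years) and assumes the purchase price is the only local cost, with power, cooling, space, and staff left out. Both function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

def break_even_utilization(capex_usd, cores, usd_per_core_hour, years=3):
    """Fraction of available core-hours you must use before buying is cheaper.

    A result above 1.0 means renting is cheaper even at full utilization.
    """
    core_hours_available = cores * years * HOURS_PER_YEAR
    return capex_usd / (usd_per_core_hour * core_hours_available)

def break_even_price(capex_usd, cores, utilization, years=3):
    """Hourly per-core rate at which renting matches buying at a given utilization."""
    core_hours_used = utilization * cores * years * HOURS_PER_YEAR
    return capex_usd / core_hours_used

for rate in (1.00, 0.10, 0.01):
    u = break_even_utilization(5000, 8, rate)
    print(f"${rate:.2f}/core-hour: break-even utilization = {u:.1%}")

print(f"rate that breaks even at 10% utilization: "
      f"${break_even_price(5000, 8, 0.10):.2f}/core-hour")
```

With these inputs, renting at $1/hour per core only matches buying below roughly 2.4% utilization, and a shop at 10% utilization would need a rate of about $0.24/hour per core to break even. Adding infrastructure costs to the local side would move both numbers in favor of renting.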

Hence I take issue with the underlying article that InsideHPC printed. Specifically, the phrasing:

Reams of research have shown that incumbents are all too effective at getting rid of startups that opt for sustaining strategies. It is typically not a dramatic death – no showdown in the streets. No high noon. Rather, the incumbents pull in their roadmaps, lower their prices, negotiate better terms with vendors, stimulate their vast channels and sales networks, offer bundles, and otherwise mobilize their armies to gradually squeeze the life out of “sustaining” startups until they run out of cash. It is slow suffocation. The startup’s sales become lackluster. Investors lose patience. Eventually the startup is shuttered, picked apart, or its assets are sold off in a fire-sale.

We do see that for the makers of non-differentiated systems. The rack-em-stack-em cluster builders have taken a beating from the likes of Dell, HP, and others. Quite a few have gone away.

But so have real innovators, people with better, cheaper, faster all over them.

Part of the reason why this touched a nerve with me is that we are most definitely not doing a sustaining technology in JackRabbit or in siCluster. We are coming in hard and fast on better/cheaper/faster. I look at lots of what our competitors are doing and it isn’t focused upon this. They simply want to protect an existing market and continue to farm it and manage it. We want to grow our market. We don’t want to sustain an existing market.

We look at dedup and related technologies as not being terrifically innovative … they are there as a sustaining technology. Enable tiering to work better, and hopefully lessen the argument for the lower cost solutions.

But what if, in your tiering model, our lower cost units are faster than the fast units? Why tier then? Sure, you can look at tiering as caching (which is really what it is), reserving the fastest spinning disk and SSD for the most frequently accessed data. So rather than solve that hard problem with an expensive modality, why not just make everything fast? So now instead of a 4k byte cache, you have a 4MB cache?

That is disruptive.

Instead of using FC4 and FC8 with expensive interconnects and other bits around this, with large loops per drive, redundant controllers with cache mirroring, and other technologies of old … why not replace this with fast Infiniband connected servers that replicate in an HA pair? And then build large storage clusters (like siCluster) atop these sorts of units?

That is disruptive.

Nothing sustaining about this. But by the article quoted on InsideHPC, it looks to be a sustaining technology. It’s not.

When we can deliver 1.5GB/s per 4U unit, scale up network bandwidth with the number of units, as well as the capacity, redundancy, … and compare that to existing FC modalities … no … it is not sustaining.

It is better, cheaper, and faster. These have been the waves of change for a long time in HPC. I’ve watched companies (including those I have worked for) completely miss this. To miss this is to fail in HPC.

Don’t try to out-Dell Dell. This is in part what killed SGI and LNXI. Dell can always build the same non-differentiated gear cheaper than you can. That is, Dell is sustaining its market. It can suck the oxygen out of a room with other competitors in there. We’ve watched it happen.


6 thoughts on ““Sustaining” strategies and startups”

  1. I mostly agree with you, but your opex/capex comparison when you were mentioning ASPs is not quite right.

    First, from an accounting point-of-view, opex and capex are paid from different pots. Capex is usually good, Opex is bad. You don’t want to spend lots of money in the first place just to end up in a situation where running and maintaining a large-scale system eats all your money – accountants just don’t like that.

    Actually I liked your cycles/USD figure! Usually you see FLOPS/USD or FLOPS/Watt. However, especially since you’re comparing it to ASPs, I see another problem: if you run yer own system you still got to have space, HVAC, electricity, access-security, the whole shebang which comes with running your own datacenter. Just comparing a single capex-number with an ASP doesn’t take all the hassle of running your own stuff into account.

    But anyways: I liked your article!

    Cheers, Alex.

  2. @Alex

    Thanks! Actually, getting the costs of the infrastructure in there is a challenge. I’m thinking about how to do this. I think some sort of modification of the denominator to reflect acquisition + duration*infrastructure costs or something like that.

  3. I see where you’re going, but didn’t interpret the article the same way. Disruption theory is very empirical research (not philosophy) and is explicit that incremental and radical innovations (better/cheaper/faster) are both “sustaining.”

    The theory is also clear that startups almost always fail when they use sustaining innovations. Incumbents usually succeed when they use sustaining innovations, but not startups. From what I’ve read, that’s not an assertion of opinion, it’s just a fact (the research looks at thousands and thousands of companies to quantitatively identify patterns).

    But it also says if startups have a “disruptive” strategy that goes into new “non-mainstream” markets or are lower cost/lower performance, they usually win.

    So with GPUs, they are disruptive to CPUs (lower cost and lower performance in ‘mainstream’ CPU tasks like running Windows on a PC), but they were better at something non-mainstream (discrete graphics and parallelism). Using a foothold in this non-mainstream market over a decade ago, GPUs have slowly grown good enough to now threaten CPUs. So disruption is about where innovations start… and who they start with. I think you may have misinterpreted what it was saying.

    You should read “The Innovator’s Solution” by Clayton Christensen.

  4. @Mike

    I read “The Innovator’s Dilemma” some years ago.

    I do disagree with some of the assertions in the article. I am also a skeptic (of a scientific type) in that fact is something that is usually elusive for the non-hard sciences to define (they also usually mis-apply the word “theory”, and it gets amusing correcting those the first 20 or so times). I’ve read many a business school paper that is so completely divorced from reality… It’s been a while since I had a list of some of the more interesting ones …

    I’ll look up the “Solution”. Thanks.
