# tracking other companies

SGI and Clearspeed. SGI is now down to $4.90/share at close. It dropped 11% yesterday. Market cap is $57M.
Yow. Yeah, the market has been volatile, but I am not sure that explains this. With 1600 employees, this is a valuation of about $36k/person. They are rapidly getting to a place where their valuation and ours become comparable. They are getting wins, but maybe the wins are not as profitable as they need to be … or maybe the ones we hear about are the only ones, rather than a representative set.

It is hard to be a profitable HPC company. Most of the HPC companies out there resell other companies' products. Not a terrible thing, but it leaves less margin for you. And the market is brutally unforgiving on margins. You want 40% margins? You are not going to get them in this market.

I noted when they emerged from bankruptcy that they were competing with the HPs and Dells of the world. I cannot emphasize this enough: one should not try to out-Dell Dell. Dell can ship megatons. It makes far more sense to try to work with them. Given their loss of real differentiation, what makes them different from a Dell? This is the question they need to answer. Yeah, some folks have brand loyalty. Whether that makes sense in this market, versus getting real value (and understanding what real value represents in terms of features/performance), is a whole other discussion. As I said before, HPC is an unforgiving market. Focus where you can add real value.

Ok, now onto Clearspeed. Jim Black noted in an earlier comment:

Any comment on today's news on Clearspeed's new CSX700 chip? It seems to be a big improvement on the last one. The blog on the tomshardware website thinks it could represent a breakthrough. Basically it says Clearspeed will give 96 GFlops out of 12 watts at double precision, which compares well with nVidia's chip at 100 GFlops in double precision mode, consuming 170 watts.

The issues around acceleration tend to center around cost-benefit and effort-cost. How much does it cost, and what will its impact likely be? How much effort, and at what cost, will obtaining this benefit require?
If it takes you 3 months to get 20x performance, what is the value of that extra speed to you, versus the 10x you might be able to get with another choice? That is, what is the value of the opportunity cost/alternative choices? Put more simply, where is the price-performance knee, and what technologies sit below/above this knee? The knee represents something like an optimum: maximize value (e.g. return on money and time investment) at a minimum cost (in money and time). You can see such knees in computer parts prices … premium parts cost more than the benefit they give. Look at the Opteron quad core 8xxx series: a 5% increase in clock rate will cost you 20+% more.

Ok. Back to Clearspeed and Jim's comments.

96 GFLOP at 12W for the new CSD.L part.
100 GFLOP at 170W for the new nVidia part (according to Jim).

Great. Now look at cost ratios.

CSD.L: $5,000 / 96 GFLOP = $52.1/GFLOP
nVidia: $1,600 / 100 GFLOP = $16/GFLOP

Ok, so the nVidia costs less to acquire. What about power cost? There is a 158W difference between the two. In the US, power costs are about $0.10/kWh, so this 158W difference amounts to … $0.37/day of additional power cost. Over a 3 year life cycle, this adds $415 to the cost.
But wait, you say. What about the added cooling cost? Since you dump that extra 158W into the room, you have to remove it. This will cost at least as much as the power itself, if not more. So let's assume we should triple the 3 year life cycle power difference: 1x for the power difference, 2x for cooling costs. This would add $1246 to the cost of the nVidia unit relative to the CSD.L unit. So now we are looking at a $2846 cost for the nVidia part over 3 years.
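The back-of-the-envelope arithmetic above can be sketched in a few lines. The prices, wattages, and $0.10/kWh rate are the figures quoted in the text; the 3x cooling multiplier is the assumption just stated:

```python
# Cost comparison using the figures quoted above.
CSD_PRICE, CSD_GFLOPS, CSD_WATTS = 5000.0, 96.0, 12.0
NV_PRICE, NV_GFLOPS, NV_WATTS = 1600.0, 100.0, 170.0

USD_PER_KWH = 0.10   # rough US power cost
YEARS = 3            # assumed life cycle
COOLING_MULT = 3     # 1x power + 2x cooling, as assumed above

print(f"CSD.L: ${CSD_PRICE / CSD_GFLOPS:.1f}/GFLOP")   # ~$52.1/GFLOP
print(f"nVidia: ${NV_PRICE / NV_GFLOPS:.1f}/GFLOP")    # ~$16.0/GFLOP

delta_w = NV_WATTS - CSD_WATTS                         # 158 W
power_per_day = delta_w / 1000.0 * 24 * USD_PER_KWH    # ~$0.38/day
lifecycle_power = power_per_day * 365 * YEARS          # ~$415
lifecycle_total = lifecycle_power * COOLING_MULT       # ~$1246
print(f"3-year cost of the nVidia part: ${NV_PRICE + lifecycle_total:.0f}")
```

Run it and you land on the same ~$2846 figure; the point is that even tripling the power delta for cooling, the nVidia part still comes in well under the CSD.L acquisition price alone.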
Ok.
Now look at programming cost.
SDK for end users.
CSD.L: SDK cost ~$6,000
nVidia: SDK cost ~$0.00
Hmmmm.
Now look at the economies of scale: nVidia will be shipping 2E+06 – 4E+07 CUDA enabled GPUs to its customers. CSD.L will be shipping 1E+02 to 1E+04 parts to customers.
Basically, this speed uptick won’t matter much. Which of these two platforms will ISVs target en-masse? And why?
CUDA was a masterstroke. It lowered barriers to using accelerators right away. There are some valid criticisms of it (you should see some of the code we are playing with), but at the end of the day, it is possible for mere mortals to pull the SDK, compile code, and deliver applications at a very low relative cost.
The CSD.L SDK sorta kinda works on Red Hat. It didn't work on SuSE or Ubuntu.
It might help the gentle reader to know that we have a CSX600 PCI board in lab, as well as a CUDA card or 3. And an FPGA Bioboost.
Basically, unless the CSD.L SDK is free, I don't see CSD.L demand increasing. Add to this the limited size of the potential CSD.L platform base, and I don't see ISVs rushing to support it. I do see ISVs and customers giving the CUDA platform a serious set of kicks.
Some people argue that one technology platform is better than another. Unfortunately, those arguments don't matter to the market. The better mousetrap rarely ever wins. This is not to say CUDA is bad … it isn't. Nor is it to say CSD.L isn't good, or that FPGAs aren't good. It's just that they ran into the perfect storm of nVidia making some very wise, very strategic moves.
Sort of like Dell and HP battling out furiously in the cluster market. The smaller vendors are collateral damage.
If business is a contact sport, HPC is a bloodbath.
CSD.L and SGI, both, need to find a defensible niche. Right now, they aren’t there.

### 17 thoughts on “tracking other companies”

1. “Given their loss of real differentiation, what makes them different from a Dell? This is the question they need to answer.”
I think they have answered it. I've worked with Dell and SGI extensively, and they differentiate themselves greatly. SGI provides a full one-stop-shop HPC solution: you get the hardware, consulting, and expertise from SGI. They also have some great products, such as their ICE and XE systems. Dell also provides a lot of their own solutions for storage, whereas SGI provides the exotic hardware like Voltaire & Panasas solutions etc.
I'd say SGI is more comparable to Cray or IBM these days than anyone else in the market. They've built their own solutions from the ground up and do a great job of it.
Anyhow that’s my 2 cents, how do you see Dell & SGI being a similar HPC company then?

2. @ChrisV

how do you see Dell & SGI being a similar HPC company then?

Hardware products are basically the same, if you ignore the Itanium products.
Designed and sourced from outside both companies.
Voltaire is not exotic hardware … Infiniband, and generally speaking, high bandwidth connections are becoming quite common.
Panasas may be considered “exotic”, but it is an appliance version of a cluster file system design that has some nice properties.
Cray is niche focused and has their own proprietary hardware. Good stuff too. IBM has a mixture of internally developed stuff and externally sourced stuff. Neither is directly comparable to SGI or Dell. I believe SGI and Dell are directly comparable due to their processes.
The old SGI I worked for is long gone. The new SGI is what it is. Dell is one of the two leading systems providers, and can move more revenue/product in a month than SGI can in a year. Since they are fighting over the same space, this is a problem, more for SGI than for Dell.
SGI doesn’t make its own storage, it buys it. Same thing with the f1200 unit. And others. Dell just bought Equalogic. But other than that, they both buy their storage.

3. Well, I too am an EX-sgi employee …
I still can see SGI has value against DELL – sometimes just not valued enough:
Dell's approach is to ship a full rack of hardware and be ignorant of the software (hell, they have not even the slightest idea about it).
This approach works in cases, where the setup is small (1 rack or less), and all the sw-bits are already clear.
((like in automotive, where you just add “another cluster”))
Dell will INSTANTLY FAIL if they have to provide a solution, or if the RFP is not about "how many racks I want" but "how much speed I want".
SGI on the other side still has enough clue to provide you a complete and working solution – and guaranteed performance.
They can easily size/scale a system to provide “X jobs in Y time” etc.
Sure – there aren’t that many customers willing to pay for this value – and that’s probably their #1 problem.
If you asked Dell and SGI to size you a 20TFlop/s cluster, I think Dell will be waay off, and never capable of even demonstrating the speed, while SGI will have no problem sizing it right and demonstrating it.
Just by being able to deliver racks full of cheap 1U-servers doesn’t mean you’re a cluster-vendor or HPC company …
And that’s what Dell hasn’t understood yet, and that’s why they will NEVER EVER make a profit on those “hpc-wins”.
SGI is exactly the opposite – they will make some profit on those large deals – but they will never be able to make a profit on small deals – as there the clue and large-setup expertise isn’t valued.

4. @hrini
I really like your

Just by being able to deliver racks full of cheap 1U-servers doesn't mean you're a cluster-vendor or HPC company …

comment. We see much of this.
I agree that building and delivering something that works is valuable. This is what we do. The issue we run into, and SGI runs into, is that customers invariably want the cheapest possible hardware, first. It is rare that we hear a customer focus upon a solution, where the unit does what it says, out of the gate. Yeah, they want that. Rolled into the price of the cheapest boxes. Without adding cost.
This is the legacy of "a sale at any price" in this market. Customers' expectations have been set such that the things that really carry the value are free (as in zero cost), and hardware must be as cheap as possible.
This leaves very little margin to grow on.
A point I discussed with John West at Insidehpc.com was that we are living in the era of “good enough”. If a customer can get functional with a rack and a stack, then this might be “good enough” for them. Given that there are many more 1TF customers than 20TF customers, Dell can hardly go wrong with a volume strategy. This is a business it knows how to make money in, where value is measured in inverse price.
It's when the 'rubber meets the road' (that is, when you have to do real work with the gear you bought) that the nasty surprises often emerge.
If SGI is going down this route, then it shouldn’t need to supply hardware. Source it from Dell, they can do a cheaper job of it than SGI. But then SGI needs a very different customer profile, and marketing focus for that sort of model.

5. Ok. Back to Clearspeed and Jim's comments.
96 GFLOP at 12W for new CSD.L part.
100 GFLOP at 170W for new nVidia part (according to Jim).

Hi
The articles I've seen are quoting $3,500 for the new Clearspeed board, and $3,000 in volume.
I agree that 6 grand for the SDK is a real problem.
Clearspeed need to find a market where low power is key. BAE have recently licensed the technology for use in their satellites. This may sound really dumb, but will laptops be a potentially viable market in the near future?

6. @Jim
I agree, CSD.L needs to find a defensible market niche, quickly.
BAE is one possibility, though the mobile GPUs may be able to give it a run for its money there.
$3k for the board is good. Getting there.
The SDK needs to be free, or even better, have them hook into CUDA somehow. Use the same source at least, even if the binary side is hard.
Right now the issue we are seeing in this space is which set of design tools people want to use to target their apps. Most want the tools with the widest possible target base (e.g. most shipped units). Piggybacking upon that is a good idea. (That is, until Amir's company releases their spreadsheet tool and takes away the pain of multi/many core. Not being facetious, I do believe he is onto something good there.)
This said, we are seeing our customers, and numerous others, coalescing around CUDA. If we can get it standardized … this would be great. Let everything plug into this as a start. Build on it from there.

7. @Joe
I definitely agree with your low-end assessment of Dell versus SGI, which is your 1TF customer. At the 20TF end it's a different story, where you want the "Solution" and the "Services" provided by an HPC company. I think the best approach to compete with Dell for the 1TF customers is to create an "appliance" and target specific market sectors. The issue with 1TF customers is that they fall under the IT umbrella within their organizations and are not treated as HPC. They'll tell their IT manager that they want a cluster; he buys $100k of product from Dell a year and gets a good deal. Dell wines and dines him, so why would he look elsewhere for an SGI box when the Dell machine can do it cheaper?
I think it comes down to getting the message across and that’ll depend significantly upon price point and marketing. Apple has a similar problem themselves when trying to get into a similar space. They win on ease of use and applications specifically in the graphics markets. It’ll be interesting to see if SGI can grasp a large share of the 1TF market.
Another point to make is sales force size. Dell has a massive sales force, whereas SGI has a smaller one. Does it make sense for SGI to expand its low-end sales force to aggressively pursue the lower end of HPC?
Not sure, thanks for the explanation.

8. “That is, until Amir’s company releases their spreadsheet tool and takes away the pain of multi/many core). Not being facetious, I do believe he is onto something good there.”
I seriously hope so too. We don’t have any Harvard dropouts working for us though so we might have a little trouble building up hype.

9. I see Clearspeed are now highlighting the fact that GPUs are prone to generating computation errors, i.e. they:
Crash more often
Reduce the reliability and stability of any system
So the Clearspeed chip doesn't make errors. How big an issue is this?

10. Good point above. Are GPUs practical for anything other than graphics applications? I.e., can they be used for financial calculations? They don't appear to have error-correcting capability, which is surely important. Not so in graphics, where the odd pixel being the wrong colour will be unnoticeable when 30 frames per second are being displayed.

11. I am not sure how accurate the statements of them being “inaccurate” are.
Specifically, most of the software we have seen that does FP calculations doesn't handle NaN very well at all. Most of it crashes on NaN.
Pentium and others have had (and likely will have) bugs in their FP hardware. I’ll go back and look at the architectures, but I rather doubt that the Clearspeed has anything more than standard IEEE754/854 semantics.
This said, it is valid to ask whether or not the GPUs are compliant with IEEE 754/854. Arguably, for 30 FPS operation, they probably don't need to be. I suggest looking at the lecture here, specifically page 11, on deviations from IEEE 754.
Basically, MADs are not compliant (and this has been a problem with MADs across multiple hardware types, so this is nothing new).
Division is non-compliant, differing by 2 units in the last place. This can be an issue with interpolation codes (I've been bitten by that with the original Pentium bug) or spline codes. This isn't that serious, but it is important enough to be aware of. Most hardware implements some sort of unrolled Newton-Raphson approximation to division. If you don't unroll it enough, you get these sorts of issues. I remember an issue with someone's chip in the past related to this.
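To illustrate the unrolled Newton-Raphson point, here is a generic sketch (not any particular vendor's divider; the seed value is a stand-in for the small lookup table real hardware uses). The iteration converges quadratically, so unrolling one step too few leaves you a couple of ulps short of the correctly rounded result:

```python
def recip_newton(d, steps):
    """Approximate 1/d via Newton-Raphson: x <- x * (2 - d*x).

    Hardware dividers seed x from a small lookup table and unroll a
    fixed number of these steps; unrolling too few is exactly what
    produces results off by a couple of units in the last place.
    """
    x = 0.7  # crude seed for d in [1, 2); real units use a table lookup
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return x

d = 1.337
exact = 1.0 / d
for steps in range(1, 5):
    err = abs(recip_newton(d, steps) - exact)
    print(f"{steps} step(s): error = {err:.2e}")
```

The error roughly squares on each step, which is why a divider designer is tempted to stop early: one fewer iteration is nearly free in area and latency, and the cost is only a few last-place bits.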
Not all rounding modes being supported could also be an issue. You might need to handle this in your code if the HW doesn't quite support it.
The big one is the denormalized/FP exception (FPE).
You do need those, or you need to do some gymnastics to detect FPEs. For example, if I say
a = 1.0/0.0;
this should trigger an FPE. If FPEs are masked (lots of folks run this way), a would simply be INF (and 0.0/0.0 would quietly give NaN). The other issue is denormalized numbers: not supporting them technically decreases the dynamic range of the computation, which effectively reduces the number of bits of accuracy.
The latter is an issue for code very sensitive to roundoff error. Search this blog for an earlier discussion of this.
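A minimal sketch of the masked-versus-trapping behaviour described above, using NumPy's float64 as a stand-in for non-trapping hardware (plain Python division shows what a trap looks like):

```python
import numpy as np

# With FP exceptions masked, 1.0/0.0 does not trap; it silently yields
# IEEE 754 infinity, and 0.0/0.0 yields NaN. Garbage then flows onward
# through the rest of the computation.
with np.errstate(divide='ignore', invalid='ignore'):
    a = np.float64(1.0) / np.float64(0.0)
    b = np.float64(0.0) / np.float64(0.0)
print(np.isinf(a), np.isnan(b))

# Plain Python behaves like an unmasked FPE and traps immediately:
try:
    c = 1.0 / 0.0
except ZeroDivisionError:
    print("trapped")

# Denormal support is what keeps the smallest values from flushing to
# zero; a flush-to-zero unit would lose this bottom slice of the range.
tiny = np.float64(5e-324)   # smallest positive denormal for float64
print(tiny > 0)
```

The gymnastics mentioned above amount to checking for INF/NaN after the fact (isinf/isnan style tests) rather than being told at the instruction that raised them.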
I believe Clearspeed fully supports IEEE 754. However, it is … well … disingenuous to claim that GPUs are less accurate due to a lack of complete support for IEEE 754.
As for silent generation of errors (the FPE and denormalized number lack of support), this is more of a concern. It is one that can be worked around (people were doing computing long before IEEE 754).
Of course, none of this solves the platform problem that Clearspeed has. They have good kit, but it will be in orders of magnitude fewer machines than their competitors. I don’t see a good way for them to solve that.
Worse, with the forthcoming Nehalem (yes, they exist; yes, I have held one in my hands), there may no longer be a performance advantage for CSP. Even worse, with Larrabee, it is likely that both GPUs and anything similar to them for computing will feel a significant pinch. The chips will use the standard x86 ISA from what I have read (I will go back and re-read to be sure), and have lots of FP cores.
I don't know if CSP is involved in Larrabee. Maybe they sold Intel some IP. Who knows. Worth asking them. But CSP by itself won't be able to move the volumes of chips it needs to move to counter the likes of nVidia or AMD.

12. Clearspeed released their interim results yesterday. Another disastrous set of results, and it seems the penny has finally dropped. They are slashing running costs to under £3M, down from £16M in 2007. They are focussing on individual projects with customers. So they've obviously given up trying to compete with the likes of nVidia. Obviously there are going to be a lot of redundancies, with the majority of staff going. It's cost a lot of money for Clearspeed to finally realise they were fighting a losing battle.

13. @Doug
I just looked on Yahoo and didn't see an interim results release. I do see the formation of the Federal Systems group on the 23rd. Hmmm… maybe they are going to load all the IP into that group, so that it can service/pursue new contracts with the military.
Ok, found them here.
Highlights:
1) Revenue growth of 213% over last year. Unfortunately this is £0.5m (about $1M) today, versus £0.1m last year. Quick ballpark estimate: at £3k per card, this would be, oh, 167 cards sold. At £1.75k per card, this would be 286 cards. This assumes the SDK is free, by the way, but it isn't.

2) This one is a doozy:

R&D costs in the first half amounted to £3.2m; these will be specifically controlled and only incurred going forward to fulfil customer orders. The Group's cash position at 30 June 2008 was £13.9m (31 December 2007: £19.6m).

Take-home message: they burned through £3.2m of R&D in the first half alone, have £13.9m cash on hand at the end of June, and are effectively terminating (my word, complain to me if you disagree with this wording) R&D activities that are not already paid for as part of existing orders. I would think they would retain the ability to do custom engineering as well, but …

3) Restructuring (aka RIF, layoff, redundancy, … insert your favorite euphemism):

The annualised savings from the restructuring programme implemented in June 2008 are expected to exceed £3.0m. The proposed restructuring for September has commenced with a consultation process with the objective of reducing the Group's annual cost base to less than £3m.

They are cutting staff to get to £3m. Ignoring other costs, and assuming that salaries average £50k, this is 20 people per £1m, or 60 people total. Of course, adding in other costs (facilities, operations, etc.) it might be closer to 50 people.

If you read the outlook statements, they point to the difficulty in getting traction in the EU and US. They indicate some traction in APAC. They point to the slow growth of acceleration as the reason why revenue growth is slow. I disagree. Acceleration growth has been incredibly rapid.
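The ballpark estimates above, as a sketch. The card prices and the £50k average salary are the assumptions stated in the text, not figures from the results release:

```python
revenue_gbp = 500_000                        # £0.5m of first-half revenue
cost_base, avg_salary = 3_000_000, 50_000    # £3m target; assumed £50k salary

# Rough unit counts implied by the revenue, at the two assumed card prices
for price_gbp in (3000, 1750):
    print(f"at £{price_gbp}/card: ~{round(revenue_gbp / price_gbp)} cards")

# Headcount a £3m cost base supports on salaries alone
print(f"salary-only headcount: {cost_base // avg_salary}")
```

A few hundred cards in half a year, against millions of CUDA-capable GPUs shipped in the same period, is the whole story in two numbers.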
The issue impeding them is that they have to evangelize their model of acceleration, as well as their hardware, and justify their cost structures in the face of competitors using alternative technologies with similar/better performance on some things, worse on others, but much lower price points. That is, nVidia, and to an extent ATI, are seeing significant growth and interest. CUDA is rapidly becoming a de facto standard, and AMD/ATI need to address this (either make something source compatible or work with Intel on Larrabee source compatibility … something I think is … er … quite unlikely). CSD isn't CUDA compatible.

This gets to an important point of product market targeting. You have software X, and you want to get it out to the maximum number of people. To help make your program faster, you want to target an accelerator platform. You survey the market and see what you see. Do you write code for every accelerator? Or the dominant one(s)?

That last question is the critical one. Each additional platform you have to code for (and each accelerator is currently an island unto itself) adds cost to you as the developer. Cost to develop and to support. You as the business person are going to look at this and make a very simple determination on cost-benefit. If you are looking to grow your installed base, you are going to look at precisely how to move this application onto the largest number of machines. Being a savvy business person, you know for a fact that increasing your customers' costs will reduce adoption of your new tools (which is one of the things driving accelerators and multi-core, but that is a topic for another conversation). So you want to make sure you deploy your new tools so that they can run on the widest array of end users' machines, and allow you to minimize the cost of supporting those machines. Being a smart business person, you target ubiquity.
Since the coding models are very different for CSD, ATI, FPGA, …, the important thing is to enable the maximum number of users to use your code with the minimum expenditure of your precious developers' time. Which means you have to make decisions about what to support. Technological arguments notwithstanding, this almost always works against harder-to-program and less ubiquitous systems.

If you are designing a modern accelerator, you should either make it x86_64 instruction compatible, or CUDA compatible, IMO. Any other scenario is not one designed to succeed (again, IMO). If CSD made their accelerators handle CUDA code, they could compete for nVidia's business on nVidia's turf.

Understand that nVidia is having a tough time convincing customers to use the more expensive card versions, as the lower end ones have similar capability. The point being that people want "good enough", as this usually has a better price-performance and price advantage than "great", for their choice of accelerators, computers, Infiniband/10GbE, storage, etc. This means that if you have a $500 nVidia card and a $3000 CSD card, and they run at similar performance, it isn't hard to see which one people will purchase.

Additionally, with CSD lucky if it has shipped more than 2E+3 units (2000) since inception (we have one in the lab), and nVidia and ATI shipping 2E+6 (2,000,000) or more since inception, which of these do you think an ISV would rather target for a market? A fraction of the 2000 that might be in your particular market, or a fraction of the 2M that might be in your market? I would imagine the latter.
This is why I think their revenue growth is slow. It isn’t a case of the accelerator market revenue being low, it is a case of competitors not sitting still and demanding premiums for what will be effectively commodity parts.
I get the sense that the board actually agrees with this assessment:

The Board believes, despite its confidence that the Group is well positioned at the leading edge of high-performance processors for supercomputing and embedded systems, that it is prudent to focus on controlling costs and also to be cautious about the timing of future revenue.

It's one thing to build a better mousetrap. It is quite another to get people to buy it. Especially as other new mousetraps have come along, and work with more … standard cheese … in their traps. This is what CSD has needed to do, and hasn't done. Which is why they now have to slash costs, halt R&D, and focus on existing project deliveries.
It saddens me to see this, but this is business, and HPC is littered with the carcasses of companies that have tried and failed. Good ideas and products are great. HPC has now gone commodity. Which is why nVidia and others are going after it.

14. “They are cutting staff to get to £3m. Ignoring other costs, and assuming that salaries average £50k, this is 20 people per £1m, or 60 people total. Of course, adding in other costs (facilities, operations, etc.) it might be closer to 50 people.”
You're out by a factor of 3 there, Joe. It looks like nearer 180 will have to go.

15. Just checked the accounts. Employee costs were £7.8M for 2007, and that was for 105 employees. So £67,000 per employee. In addition, directors were paid £0.7M. Costs in 2007 were £16M. Depreciation and amortization account for about £0.5M, which leaves £7M. £7M spent on marketing?? Seems unbelievable given how little kit they've sold. Even last year's revenues of £1.2M included £0.5M from the BAE licensing contract.

16. @Doug
Ow… worse than I thought (£17k off). Are you sure £7M was spent on marketing? I thought there was about £3M on R&D.
It wouldn’t surprise me to hear that they had to have specific chip runs of several thousand (and thus about £1M) each which would seriously eat into this. This is basically the ASIC problem … you need a huge market to go after in order to amortize the production costs across many chips. This gives you economies of scale that are hard to beat. Which is why GPU/CPU makers appear to have an unfair advantage relative to specialty chip builders.
I could guess £2M for sales and marketing. This would be largely evangelism, plus a few early design wins (they needed to aim for motherboards). I am not sure they really got enough traction though.

17. Well, here's the 2007 (and 2006 for comparison) financials:
http://fool.uk-wire.com/cgi-bin/articles/200803180701443124Q.html
They've averaged well over 6M GBP in R&D expense in each of the last 2 years. And last year over 9M GBP was spent on "marketing and administrative expenses". It's a total disaster for any investor who partook in the placings (50M GBP raised in the last 4 years) because the share price has been demolished. A very painful lesson indeed for the management.