Oh whatta day: the fisking

Yesterday, I commented on a puff piece article on Windows CCS. Go ahead and read it, the article and the commentary. This morning, I saw a comment on this same article from John at InsideHpc.com. I disagreed with John’s premise, and wrote a long article discussing this. While I respect John, I do disagree with him. But I will do so respectfully.

The rest of this article will be … sarcastic … flippant … and I am going to fisk the fisking post that was derived from John’s on another site. So, gentle reader, if your stomach cannot take this, or you don’t like this stuff, do skip this article.


This afternoon, I got a pingback from John’s blog to the puff piece commentary. In it the article is characterized as “anti-CCS”.

He offers a thoughtful critique of the anti-CCS article posted by Joe Landman over at scalability.org. Joe was writing in his article about the SearchDataCenter.com piece I was referencing.

I disagree with the characterization of “anti-CCS”. The post was a detailed questioning of the costs and motivations. If they come out negative relative to the options, does that make the analysis “negative”? Are all cost-benefit analyses intrinsically negative to some choice then?

I won’t link to the other blog. Reading it over, well, you will see what I mean. Needless to say, John and I must have significantly differing opinions of what the definition of “thoughtful critique” is.

This individual (not John) has posted here before. And has tried similar tactics. “Thoughtful critique”? No. Not even close. But given our history, I guess I should expect this.

Note that John has commented there as well.

Thanks for the pointer. I like your point-by-point dissection of Joe’s article. I’m really interested in this topic because I think we have the potential for a fundamental shift in HPC that will make us much larger than we’ve ever been.

This is called a fisking.

Following the Wikipedia link:

A point-by-point refutation of a blog entry or (especially) news story. A really stylish fisking is witty, logical, sarcastic and ruthlessly factual; flaming or handwaving is considered poor form.

Now why on earth would I say that? From the “fisk”

However, when it comes to Microsoft, he also has a chip on his shoulder the size of the Upper Peninsula–and that lets some faulty logic into his many posts he devotes to ripping on CCS.

On his blog he has a generic picture of da-UP (Upper Peninsula) next to his point. Witty? Eh… doubtful. Ad hominem? Yeah. Poor form? Uh huh.

First, full disclosure: My family owns IRAs which have stock in Microsoft.

Second, my day job sells machines which can and do run Windows and Windows applications in addition to Linux. I am very interested in real markets, and highly skeptical of fake ones.

Third, I am a fan of Excel and Powerpoint. Word still grates on me every now and then, but for the most part I like it.

If I had a chip on my shoulder, as he implies, I certainly wouldn’t do two or three. And I would sell off the IRA shares in one.

He (Dan) works for a company that sells Windows-only software to build distributed processing environments. These are perfect for desktop grids and grids of other systems. Being Windows-only, he has a built-in bias towards that platform, and, as you will see, against competitive ones.

That said, let’s move on and see some of the faulty logic, shall we?

In his latest, he compares acquisition costs of clusters with Linux and CCS (note he doesn’t try to make any TCO comparisons, perhaps because he’s read the independent studies that show Windows is cheaper in the long run).

And of course, since it is printed, on paper, and “independent”, it must be true. There aren’t any critiques of this or similar studies. Are there. (hint: more at the end of this post)

An April 2004 report from the Yankee Group called “Linux, Unix and Windows TCO Report, Part 1″ surveyed 1000 IT managers across various types of organizations and found that most believed Windows offered better TCO. But Cybersource noted that it was later made clear that the sample group was taken from a mailing list aimed at Windows system administrators.

Um… uh… so you ask Windows systems admins whether it has a better TCO than Linux, you write a report, and 3 years later someone hawks it as if it is fact? Ok…..

There are no studies that say something else, cause that would knock the wind out of the sails here. Wouldn’t it. (see the bottom of the post as well)

The only major independent study to contrast Linux against Microsoft is a report from Germany’s Soreon Research, using data collected from interviews with 50 enterprises, Cybersource said. The report found that Linux had up to 30 percent lower TCO than Windows.

Can’t seem to find that company, but there are more links.

Oh, but this won’t do. This suggests the jury is out on this. There goes his first point. The TCO is known to be hard to know, and recent studies (as you will see in the links at the end) actually tend to show that Linux has a lower TCO.

But then he starts adding spurious costs…

As the size of the system scales, so do per unit costs. In the case of windows, the $469/unit cost means that a moderate 16 node system + head node + file server adds another $8500 to the purchase cost. Not to mention the yearly additional costs of the OS support, the necessary per node anti-virus, the necessary per node anti-spam ??? That would add in another about $1500 or so. So call it an addition $10,000 per 16 node cluster.

I’m not going to debate the OS costs, but anti-spam? On a cluster?

Hey, we agree! It is a lousy idea to do this. Unless you have to. Let’s ask the ultimate, and first, Windows cluster shop, the Cornell Theory Center, what it is they do.


Snapshot of CTC installed software on login, batch, collab nodes

For completeness, here is their link to linux software installed.

I’m not going to debate the OS costs, but anti-spam? On a cluster?

Yes. Look at their site. Might be out of date. I don’t know. But look at their site.

Oh, I guess that is two points knocked down. Those darn facts … they keep getting in the way of a good fisking of Joe, don’t they.

And, while I am at it, let me help with the math

GNU Octave, version 2.1.57 (x86_64-unknown-linux-gnu).
Copyright (C) 2004 John W. Eaton.
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTIBILITY or
FITNESS FOR A PARTICULAR PURPOSE. For details, type `warranty’.

Additional information about Octave is available at http://www.octave.org.

Please contribute if you find this software useful.
For more information, visit http://www.octave.org/help-wanted.html

Report bugs to (but first, please read
http://www.octave.org/bugs.html to learn how to write a helpful report).

octave:1> 469*18
ans = 8442
octave:2> 18*83
ans = 1494
octave:3> 8442 + 1494
ans = 9936

That’s 18 copies of CCS at $469/copy, with 18 copies of antivirus/antispam at $83/copy. Adding it all together, yup, comes out to right about $10k.

Ok, let’s go commando, and run without protection. Let’s just call it $8500 between friends, shall we?

Back to antivirus on Windows clusters. I asked this of a few Windows cluster folks. I got back “you don’t need it”. I asked them why, and never really got an answer. Now we have enterprise customers for whom it is mandated that firewalls go on and up, antivirus gets installed before that Windows machine is ever let on a network, and there are corporate edicts that include firing for removing or disabling such tools. Doesn’t matter, desktop or server. They are treated the same. I am not aware of many sites that don’t have similar policies, maybe some more or less draconian.

I note with distress that the agency charged with our defense is under a constant barrage of cyberattacks. To be expected I guess, sort of a low level virtual war against us. These hacks are enabled and facilitated by poor security. Unfortunately, if you start out with a platform which is hard to secure, you often wind up exposed in some manner to the attackers. If our defense machines are crippled, will we be able to defend ourselves?

In the corporate world, you have to worry about SOX. It is everywhere. Execs don’t want to pay the price for incorrect and bad corporate and fiduciary stewardship, so they are getting really serious about protecting assets. Draconian in some cases, but it’s their behind, as it were, on the line. You need to be secure. If you can’t tie down your platform, whaddya do?

That’s why firewalls, antivirus, and antispam are everywhere. The latest nefarious bits use keyloggers to steal login passwords for single-factor sign-ons. Multifactor authentication, with a time-based, limited-lifetime second factor, is needed.

Ok, what’s up next …

Our customers don’t let their clusters have a peep at the internet. Even within their networks, access is strictly controlled. No need to waste time and money on anti-spam on those nodes, and to pretend that they do need it is just silly. As I said before, Joe’s a really smart guy: he doesn’t need to fall back on specious arguments like this.

Specious? At least one Windows computing customer, the flagship one at that, believes otherwise. Anyone out there willing to run a Windows machine without antivirus, antispam? Anyone? At all?

As for letting clusters “peep” onto the internet, that isn’t needed. You just need one, precisely one, infected executable and you are toast. No network needed. How do you get such an infected executable? Well, some grad student or admin or whoever uploads it from a protected/secure server. The exe could have been compromised already, or possibly the protected secure server has been cracked. Again, this is sadly a common anecdote among admins and security folks.

From the above link:

Paranoia?
There are other solutions than unplugging the network permanently. It’s called defense in layers. You choose the least vulnerable, the least exposed, the least targeted, the least commonly used solution and you choose them in layers around you so that each layer protects you redundantly. And if all fails you are ready to mitigate the consequences, learn form the experience and rebuild.

But living with the illusion of security is the worst solution as far as security is concerned.

But you know this, so why am I telling you. I dunno, maybe more of that “specious” reasoning.

Ok, onto server shipments and marketshare. We have hashed this one before, with this person.

Another strange argument Joe makes is

Meanwhile we are left with the indelible impression of a small market segment (CCS) that is not growing as fast as the cluster market as a whole (which means it may be shrinking in relative terms).

It’s unclear exactly what he’s trying to say. Is it that Windows servers aren’t selling? Unlikely, since according to Linux-Watch.com, not only is Windows server revenue triple Linux server revenue, it’s also growing faster.

In the past I had asked him not to put words into my mouth. It seems my request has not been honored. He sets up this straw man to knock it down. Sadly.

Ok, onto the market analysis. Linux-Watch from May 2007 reports

The server market is back, and Linux is helping, IDC reports. Linux servers posted their second consecutive quarter of double-digit growth and now represent 12.7 percent of the overall server market, or $1.6 billion for the first quarter of 2007.

The latest quarter was a good one for servers in general. Factory revenue in the worldwide server market grew 4.9 percent year-over-year, to $12.4 billion for the latest quarter. This is the fourth consecutive quarter of positive revenue growth and the highest Q1 server revenue since 2001, IDC said.

Sadly I don’t have the exact figures for that second consecutive quarter of double digit growth, but I can do a lowball estimate. Let’s assume double digit growth is 10%, the lowest double digits we can do. Growing 10% per quarter means growing at least 40% per year, and closer to 46% once you compound it. That ain’t bad. But I don’t think this is what is meant by it.
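For anyone who wants to check the quarterly-to-annual arithmetic, here is a minimal Python sketch. It assumes a constant 10% growth in every quarter, which is the lowball assumption made in this post and not an IDC figure, and the function name `annualized_growth` is made up for illustration. Simply adding up four quarters gives 40%; compounding them gives a bit more.

```python
def annualized_growth(quarterly_rate: float, quarters: int = 4) -> float:
    """Compound a constant per-quarter growth rate over the given number of quarters."""
    return (1.0 + quarterly_rate) ** quarters - 1.0


quarterly = 0.10  # assumption: the smallest possible "double digit" rate
simple_sum = 4 * quarterly                  # naive: just add the four quarters
compounded = annualized_growth(quarterly)   # each quarter builds on the last

print(f"simple sum: {simple_sum:.1%}")
print(f"compounded: {compounded:.1%}")
```

Either way the conclusion is the same: if the growth really were 10% every quarter, the annual number would be far larger than anything the report describes.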

Ok, let’s assume that really, they didn’t mean second consecutive quarter of double digit growth but double digit growth relative to the previous year. Ok, this knocks Linux way down to … uh … 10% growth.

Oh. That’s still not bad.

So how did Windows do?

Microsoft’s Server 2003 showed surprising strength. Microsoft Windows server revenue was $4.8 billion in Q1, showing 10.4 percent year-over-year growth and gaining 1.9 points of revenue market share over Q1 of 2006.

10.4%? Darn it, that’s higher than the 10% I posited above. IDC continues

According to IDC, this was the first quarter since IDC began tracking Linux server spending in 1998 that Windows server revenue has grown faster than Linux server revenue.

Yup, indeed. 10.4% is indeed greater than 10%. I yield the point. For the first time since 1998, Windows has grown faster than Linux.

Point, Dan.

Making the statements easier to digest:

1) the data indicates that Linux is growing faster than the market
2) the data indicates that Windows is growing faster than the market
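A rough sanity check of those two statements, as a minimal Python sketch. The revenue and growth figures are the ones quoted from the Linux-Watch/IDC report; the 10% Linux growth rate is the lowball assumption from this post, not an IDC number, and the variable names are purely illustrative.

```python
# Q1 2007 figures as quoted from the Linux-Watch/IDC report in this post.
market_revenue_bn = 12.4    # worldwide server factory revenue, $ billions
market_growth = 0.049       # year-over-year
windows_revenue_bn = 4.8
windows_growth = 0.104      # year-over-year
linux_revenue_bn = 1.6

# Assumption: "double digit growth" taken at its lowest value, 10% year-over-year.
linux_growth = 0.10

windows_share = windows_revenue_bn / market_revenue_bn
linux_share = linux_revenue_bn / market_revenue_bn
revenue_ratio = windows_revenue_bn / linux_revenue_bn

print(f"Windows share of revenue: {windows_share:.1%}")
print(f"Linux share of revenue:   {linux_share:.1%}")
print(f"Windows/Linux revenue:    {revenue_ratio:.1f}x")
print(f"Linux beats market:       {linux_growth > market_growth}")
print(f"Windows beats market:     {windows_growth > market_growth}")
print(f"Windows minus Linux:      {(windows_growth - linux_growth) * 100:.1f} points")
```

The Linux share computed from the rounded $1.6 billion comes out slightly above the 12.7 percent IDC reports, which is just rounding in the revenue figure.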

I had assumed the January data in writing this note, where indeed, Linux was growing faster than the market, and Windows was, as I remember, treading water or growing slightly less fast. Call this an old or bad memory on my part.

As indicated, the new data shows a whopping 0.4 percentage point growth rate advantage this quarter (first time in 9 years) to Windows. Yes Dan, you were right. So right, and I was wrong, so, so very wrong. Oh wait, that’s sarcasm…

But here we go again. Remember, I have had trouble with him before trying to put words into my mouth. He seems to like to set up a straw man, tell the world about it, then go knock it down. Now he says …

Maybe he was saying that the growth in HPC isn’t in the small market segment…but that isn’t true, either. According to IDC, the greatest growth in the HPC market is in the capacity (under $1,000,000) segment.

Of course, if he read my blog he would know that he is the only one saying this. I have been talking about how the small systems are driving the market for years. $25k and below and $50k and below. Our own sales data indicate this. So do many others. This is part of the reason I make arguments about the cost of accelerator technology. Anyone thinking that accelerators will be selling for $20k en masse is fooling themselves. Accelerators are a disruptive technology. Highly disruptive. Expensive ones which are hard to program are going to fall by the wayside. From what I can see nVidia grasps this, and even if CUDA and the existing platform aren’t exactly where we need them to be, they are going to get there soon. And that means real, personal, supercomputing. This is the growth market I am interested in. I don’t see nVidia selling thousand cluster units. I see them selling millions of one/two off units. Same with Cell. A good Cell based machine will not run (hopefully) $10k. But I digress.

So Dan is trying to paint me as saying what he said, knowing full well that the blog history here indicates that I think the opposite of what he said. Well, such is life. And such are blogs. Facts be damned, full snarkiness ahead … or some such.

Let’s go on.

He goes one step further when he compares ease-of-use: he compares a one-line shell command (to list the tasks on a node) with a 30-line PowerShell script to point out how much easier Linux clusters are to manage.

Not to manage. To use. It was just one example. Is there a one-liner for PowerShell to do this?

It’s a faulty comparison, and Joe knows it.

Really? I am not aware that it is a faulty comparison. How is it faulty?

He chose a script that iterates through every node on a cluster, writing out all sorts of information about the node and the tasks currently running on it.

Ahhh…. so you want me to iterate over the nodes and gather data for you as compared to using the scheduler to gather that data for you. Remember, this was about getting what is running on the node. We could always do

ssh node_name "ps -ealf; cat /proc/cpuinfo; cat /proc/meminfo"

or something like that …

But regardless, I want to know how demonstrating significant power in the palm of your hand as it were, is somehow faulty. Patrick at Microsoft pointed me over to this site, and I looked for a common task. What is running on a node.

That said, I was specifically looking at how to get information out of a Windows cluster. It appears that it is not a simple process. The PowerShell script assumes you know quite a bit of Windows internals to do this. Yeah, I could write a similar thing for Linux, but why bother when you have tools that will do it for you now?

Of course, one could point out that if the site used LSF, or SGE for Windows (and I think they can), they ought to be able to get out “similar” information. But rather than point at the “easy” (but not invented here) solution, you have to use the Windows internals.

Ok, simple challenge. Suppose you wanted to report, for each node across your entire cluster, the ratio of swap in use to total memory as a number. This is useful to see if any of your nodes is thrashing under a load. How can you do this in Windows? I think you need to use the above script and modify it some.

[root@crunch-r ~]# cat swapratio.pl
#!/usr/bin/perl

while(<>){if($_=~/(Swap.*|Mem.*):\s+(\d+)/){$m{$1}=$2}}; printf "%s\n",($m{SwapTotal}-$m{SwapFree})/$m{MemTotal};

[root@crunch-r ~]# pdsh "cat /proc/meminfo | ~/swapratio.pl"
itanic: 0
autoinst: 0
minicc: 0
dualcore: 0

Here I assumed that the script was moved to each node. Could have put it on one line, but that gets ugly. The point being though that you have lots of power. Huge amounts of power available to you. You have a choice of tools you might wish to use. Every choice has a cost.
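For readers who don’t read Perl one-liners, here is roughly the same calculation spelled out as a minimal Python sketch. It assumes input in the Linux `/proc/meminfo` format, with `MemTotal`, `SwapTotal` and `SwapFree` lines. The function names `parse_meminfo` and `swap_ratio` and the sample numbers are made up for illustration and are not taken from the nodes above.

```python
import re


def parse_meminfo(text: str) -> dict:
    """Return {field: value} for lines such as 'MemTotal:   4096000 kB'."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def swap_ratio(text: str) -> float:
    """Swap currently in use divided by total physical memory."""
    m = parse_meminfo(text)
    return (m["SwapTotal"] - m["SwapFree"]) / m["MemTotal"]


# Made-up sample input in /proc/meminfo format (hypothetical values).
sample = """MemTotal:        4096000 kB
MemFree:         1024000 kB
SwapTotal:       2048000 kB
SwapFree:        1536000 kB
"""

print(swap_ratio(sample))  # (2048000 - 1536000) / 4096000 = 0.125
```

On a real Linux node you would feed it the live file instead of the sample, for example `swap_ratio(open("/proc/meminfo").read())`.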

Note that with the right libraries, we could make that swapratio.pl code work on Windows. Same as on Linux. Scary. This is what I had been hoping Microsoft was going to do. But they didn’t.

He’s comparing one apple to a bag of oranges, and complaining that the oranges are too heavy.

Actually Dan, I was comparing apples to apples here. Had they elected to use SGE/LSF tools, they could have had this as well. But they didn’t. This is their indicated script to do what we can easily do without it.

You know, I often get similar comments from Java programmers when they tell me about the team they put together to solve the problem that a simple module from CPAN handles for you. You make your choices on the tools, and you get all the costs and benefits of the choices. These days, people have finally been waking up to the real costs of Java systems, and are finally looking seriously at dynamic languages for their codes. About time. If you can do in one line what takes another code 100 lines, why on earth would you use the 100 line method? Same issue as above. Huge concentrated power, and the freedom to exploit it as you need. Eventually I expect Windows CCS to get there as well, if Microsoft keeps at it.

The fact that PowerShell is available for Compute Cluster Server is a very good thing–Windows has long lacked powerful scripting tools (I’m sure Joe will back me up on that),

The first part, PowerShell being available, is a good thing. But Windows already has powerful scripting language tools. Ask ActiveState. Exposing the internals to scripting via language linkages, yes, this is a good thing. Again, ActiveState, etc.

This is a point I have been making for a long time as well. A well designed cross platform scripting environment could go a long way towards alleviating real pain. I don’t think PowerShell is it, but it is a first pass. Pdsh isn’t quite it either, that is more of a transport. I am not advocating bash or tcsh for Windows. I think shell scripting should be kept small, and dynamic language scripting (Perl, Python, Ruby) should be used. In Perl, many of the modules are smart enough to “do the right thing” on Windows. Some modules will not work on Windows (and this frustrates me, I know of no equivalent to Unix sudo on Windows that I can run from a command line).

But that’s a post for another day. Though this is a serious topic, deserving of serious consideration.

Back to the “specious reasoning”

Ah, Microsoft expertise. That’s my final point.

In concentrating on acquisition cost and ignoring total cost of ownership, Joe is ignoring the thing that is selling the most CCS clusters: people who are already managing many Windows servers can easily manage CCS clusters.

We don’t see CCS clusters. And that has been my point. They appear to cost more to acquire, and I haven’t seen data that contradicts this to date. Feel free to point it out for identically configured systems.

On the other hand Dan implies that no one is managing many Linux servers, and therefore the TCO is higher. As noted, the jury is out on whether the TCO is a massive win or a massive loss. It appears to be a wash at first pass, though as you will see below, there are others who point out a strong win in Linux’s favor.

And given that Linux is a growing segment, 12+ percent of the market, it is quite likely that in the TCO analysis, there is no problem or significant additional cost in managing Linux clusters. Because when you manage a cluster, in large part (well at least in Linux), you are managing the infrastructure nodes. The head node, file servers, and so forth. Compute nodes are disposable, you don’t manage per node, this does not scale. Management has the same effort for 1 or 10 or 100 for an administrator, modulo hardware costs and system design. That is, your admin costs should not scale linearly with the number of compute nodes. Unless you are doing it wrong.

The people who are buying CCS clusters are people who don’t have Linux experience. They’ve got Windows desktops. They’ve got Windows servers. They’ve got Active Directory and SharePoint and loads of .NET developers. And they want to add a cluster.

Good. Linux will slide right in there. Centrify and others pre-installed/configured to handle AD. Mono handling .NET. Linux experience is not that hard to acquire if needed. Lots of admins out there for you to choose from. Lots of MCSEs looking for ways to differentiate themselves from the pack, and show their flexibility and value. Many of them are already running Linux, or talking about it, so it won’t be a stretch.

What’s the natural way for them to do that? Windows. I’m not going to try to pretend that CCS is objectively better than or easier than Linux–I think that depends on your personal experience and expertise. But, remember that outside of HPC, Linux’s market penetration is very, very small. And not everyone finds it easy.

12.7% of the server market, roughly one eighth of it, is small. Ok. Very very small? No. Is it growing? Yes. Faster than the market as a whole. This of course covers shipped servers, and doesn’t deal with servers that have been converted. We have seen quite a few of these.

Ok, back to the FUD fisk

So for a small to medium size business (or a department within a large business looking at a cluster), add this cost in to your Linux cluster acquisition: the $100,000 you’ll have to pay to hire a decent IT guy to run an OS you’ve never seen. Wow. TCO just got a lot higher for that cluster.

This is of course, not true. We see two major thrusts: 1) Linux training in house to build the expertise, 2) outsourcing support. The cost to manage Linux for some of our customers is about $2-5k/year. This is with no Linux knowledgeable staff in house. For some of our mission critical customers, we provide first/second line services. Even on-site services. So they don’t have to. Saves them money, lets them train staff to handle first line. OTOH, adding additional responsibilities to windows admins, already overworked, doesn’t save money.

Objective measures of TCO aren’t necessarily in Windows’ favor, even with an existing large body of Windows admins.

Moreover


Production results show Linux administrators can often manage more systems that Windows administrators in a given amount of time, resulting in reduced management costs and less overall complexity in management activities.

The study that reports that was commissioned by IBM. Obviously IBM sells lots of Linux. And Windows. And some Solaris. And AIX. And … Yeah, it’s old too. 2 years. Like the Yankee Group study.

There couldn’t be other studies that contradict the TCO bits you say, could there be? I mean, that would be wrong.

We can keep throwing TCO studies at each other all day long. The recent studies I have pointed to have demonstrated that in some use cases, the TCO for Linux is much better. In others not so good. Couple that with the horizontal scaling, the virus resistance, and well …

The facts remain in the end. Linux usage is growing, rapidly, faster than the growth of the server market as a whole. And that means that it can’t really be costing everyone more to do this, or they would not do this. That is, you can exclude some theories by observation of relevant facts. Fact is that its use is growing faster than the server market as a whole. Fact is that companies are very TCO/acquisition cost conscious. The only theories that fit that well are a TCO wash or a TCO advantage to Linux. I cannot fathom how a TCO advantage to Windows would be able to explain the Linux uptake. I can imagine that Windows growing faster than Linux by that whopping 0.4 percentage points (estimated number, could be lower or higher, but not likely by much) is a transient phenomenon, or marketing dollars driven. Microsoft does pay marketing dollars to some vendors to help them get into market with their products. Most companies with lots of marketing money do.

End users have to either request Linux on their servers, or wipe an existing windows server. I could come up with some guesstimates as to the latter, but I think it is simply best stated that the number of Linux servers indicated in the various reports is a fraction of the total number. Large or small fraction, who knows. That study has not been done yet.

Same with desktops. Few vendors supply Linux on desktops. We do. This number is growing though.

continuing

That’s why CCS is valuable. Not because it’s necessarily “better.” Not because it’s necessarily “easier.”

Because it’s familiar. Because it is very easy to integrate into a corporate infrastructure that already has so many Microsoft products.

Well, I won’t argue with better/easier. Dan was making the easier argument a minute ago.

That’s why Microsoft insists that CCS is bringing HPC to the masses.

I disagree that this is their reasoning. HPC has been coming to the masses since long before Microsoft got into the picture. It has been on this trajectory for 2 decades, and is showing no real signs of slowing down. Linux is just the latest (now 8 year old) HPC systems contender. It has been growing for a long time outside of HPC, and very rapidly within HPC. It is having some interesting side effects, such as end users finally evaluating Linux as a desktop, and these evaluations coming out strongly positive.

My thought is that Windows XP x64 is the closest thing to a reasonable Windows on a cluster node. It is (relatively) lightweight, and should be easy to set up. If we could just strip out the junk we don’t need it would be good. If we could only network boot it (diskless compute node), we would be happy. Patrick from Microsoft alluded to mixed Windows/Linux clusters in a previous post. We have been talking about them and working on building such things for a while now. Unfortunately XP x64 is going away when XP goes away, and all we will be able to work with is W2k3 CCS.

But again, that is a point for another time.

Back to the last bit

I know Joe understands this–he’s written well reasoned posts on the topic before. But when Joe starts making such specious arguments as he makes in this post, he’s no better than the “marketing types” he derides.

Specious? No.

Well reasoned (my “puff piece” post)? Simple number crunching, and I didn’t see serious holes in my numbers.

Opening a salvo with “poof piece” should set the tone for the rest of it. It was an attempt at a fisking, and not that good a one at that. Rather than argue facts and numbers (like I did), we saw hand waving, ad hominems and other weak argumentation. The straw men were particularly funny. I am sure he doesn’t have copyright usage allowances for the images, but I could be wrong. Copyright, the foundation of open source, is rather picky about these things.

I have to admit being saddened to see John indicate his pleasure in seeing this fisking attempt. There was little in Dan’s article that was well thought out, and there was lots of hand waving, straw men, and word insertion. Oh well.

Finally, I would like to note that I see two distinct camps emerging, with a few hearty skeptics like myself in the middle. First camp are what I would call Microsoft Fanboys (and girls). These are people for whom Microsoft can do no wrong, and everything else must be bad. Second camp are the Free software types. These are people for whom anything that is not shared is bad. There are very few of us (grizzled?) skeptics in the middle. The folks for whom hard numbers matter more than anything else. All I know is that when I point out problems with Microsoft and business models for CCS, we get these fisking attempts (and some emails from Microsoft). When I point out problems on the open source side, such as the business model issues, I get some harsh emails.

Yeah, reality is tough. I remain highly skeptical of Microsoft’s CCS. I see a company trying to do all things: out Google Google, out ipod Apple, out PS3 Sony, out Linux clusters, out netscape netscape (oh yeah, they did that already), out java Sun. I like all the Microsoft people I have met, even if I don’t agree with them.

There is a place for CCS. I remain unconvinced that the replacement paradigm will work; I think that will fall by the wayside. Patrick gave me some hope that Microsoft is seeing side by side as a better way, and I agree. Even better would be boot on demand, though we need diskless (iSCSI or CIFS/SMB/NFS) booting to enable it. We can do that today (I have an example worked out) but it is a VMware kludge using Linux booted diskless to fire up Windows (and Solaris for that matter). This will change. This is something I would like to see us work with them on. This would be useful.
