Irresistible force? meet Immovable object …

There is a strong push (well, at least the articles tell us so, and you know, it's not like they are ever wrong … nosiree) to move computing into a cloud. This is sometimes a good idea; there are specific profiles which fit the cloud paradigm.
Quite a few profiles actually.
But there are some speedbumps. Literally. Bandwidth has been, and will be, an issue for the foreseeable future. Clouds have limited bandwidth in and out. End users have limited bandwidth in and out. You can “solve” this with Fedex-Net or UPS-Net. Encrypt disks in transit, require multi-factor decryption keys, and ship a couple-o-terabytes back and forth in a handsome carrying case.
Clouds are, if you believe the press and articles out there … the greatest thing since sliced bread. They are the future. They are … an irresistible force.
This may be. But apart from bandwidth, they are running hard into immovable objects.

Software licenses.
If you are lucky enough to use open source for your application, your cost of elasticity is merely a marginal cost of adding the next chunk of hardware, which the cloud providers do a very good job of making as low cost to you as possible.
If you use ISV software, with per-seat licensing … adding the (N+1)th CPU will add a (usually very significant) cost. So your cost of elasticity is “merely” the marginal cost of adding additional permanent licenses. Which completely dwarfs the marginal cost of adding the next chunk of hardware. Or buying hardware for that matter.
This is in part because the current business models of the ISVs revolve around selling seats. That provides a very high barrier to usage of the tool, limits the tool usage, and provides some non-elastic revenue for the ISV.
Suppose they wanted to adopt a utilization cost model. Measure time in units of hours.
Start with 1 year being 8760 hours. Use a utilization fraction U, which represents the fraction of the year that the license is in use on average. Use L as the license cost per CPU per year. Your current costing model is then L/(8760*U) on a per utilized hour basis. Here are some examples.
U = 25% (1/4 of the time, the customers utilize the license)
L = $3000 (cost per CPU per year).
then your utilized cost per hour = $3000/(8760 × 0.25) ≈ $1.37.
If you want to charge a premium for this, apply a margin M to this so that your billing rate per CPU-hour is (1+M)* utilized cost per hour. For a 100% margin (being greedy here), this is $2.74/CPU-hour.
So a 16 CPU run lasting 12 hours (192 CPU-hours) would cost $526.08 + cost of hardware access. One of our partners provides bare metal systems with 4 CPUs at $0.50/hour per system. So 16 CPUs would be $2/hour. For 12 hours, this would be $24 of hardware rental cost.
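The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the model as described, not the spreadsheet itself: the function names are made up for this sketch, and the rates are rounded to whole cents at each step because that is how the figures in the text appear to have been computed.

```python
HOURS_PER_YEAR = 8760  # 1 year, measured in hours


def utilized_cost_per_hour(license_cost_per_year, utilization):
    """L / (8760 * U), rounded to whole cents."""
    return round(license_cost_per_year / (HOURS_PER_YEAR * utilization), 2)


def billing_rate(license_cost_per_year, utilization, margin):
    """(1 + M) * utilized cost per hour, rounded to whole cents."""
    base = utilized_cost_per_hour(license_cost_per_year, utilization)
    return round((1 + margin) * base, 2)


def run_cost(cpus, hours, software_rate, cpus_per_system, system_rate_per_hour):
    """Return (software cost, hardware cost) in dollars for one run."""
    software = round(cpus * hours * software_rate, 2)
    hardware = (cpus / cpus_per_system) * system_rate_per_hour * hours
    return software, hardware


# Numbers from the example: U = 25%, L = $3000, M = 100%
rate = billing_rate(3000, 0.25, 1.0)
software, hardware = run_cost(16, 12, rate, cpus_per_system=4,
                              system_rate_per_hour=0.50)
print(f"utilized cost per hour: ${utilized_cost_per_hour(3000, 0.25):.2f}")
print(f"billing rate:           ${rate:.2f}/CPU-hour")
print(f"software for the run:   ${software:.2f}")
print(f"hardware for the run:   ${hardware:.2f}")
```

Changing U, L, or M here is a quick way to see how sensitive the per-run price is to the utilization assumption.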
So if you sell 10000 CPU licenses per year at $3000 each, for $30M in license revenue, how many CPU-hours would you need to replace this? At the $2.74/CPU-hour rate in the above example, this would be 10.9M CPU hours.
Now assume that each small group does 3 runs per week of the size indicated above. That's 3 x 192 CPU hours = 576 CPU hours per group per week. Over 50 working weeks, that is 28800 CPU hours per group per year.
This is interesting as you only need about 380 customers of this size (10.9M / 28800) to completely replace this revenue. And you likely have more than 380 customers.
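The replacement calculation can be written out the same way. Again, this is a sketch with hypothetical function names; the 50 weeks per year is an assumption inferred from the 28800 figure (576 x 50), and the $2.74 rate is carried over from the earlier example.

```python
def cpu_hours_to_replace(license_revenue, rate_per_cpu_hour):
    """CPU-hours that must be billed to match a given license revenue."""
    return license_revenue / rate_per_cpu_hour


def cpu_hours_per_group_per_year(runs_per_week, cpu_hours_per_run,
                                 weeks_per_year=50):
    """Yearly CPU-hours for one group; 50 working weeks is an assumption."""
    return runs_per_week * cpu_hours_per_run * weeks_per_year


revenue = 10_000 * 3000                      # 10000 licenses at $3000 each
needed = cpu_hours_to_replace(revenue, 2.74)
per_group = cpu_hours_per_group_per_year(3, 16 * 12)
groups = needed / per_group

print(f"CPU-hours needed:       {needed / 1e6:.1f}M")
print(f"CPU-hours per group:    {per_group}")
print(f"groups needed (approx): {groups:.0f}")
```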
So you get the idea. I have a nice little spreadsheet that goes through this model. But replacement is not the only model. Augmentation is a good idea … enable people to elastically enhance their simulation capability. On the fly. Pay for what they use. This lowers the access barrier, enabling many more users to access the code. It enables users to occasionally scale usage up.
Even if they can’t afford that capital for the hardware, they can usually afford the software cost as an expense for a run.
But this requires a sea-change in the way companies bill for their product.
And one thing we see is that companies with business models that “work” are reluctant to change. Even if they don’t grasp that what they perceive to be “working” may not be what their customers perceive to be working.
They are the immovable object in this equation.
I do suspect that the smart organizations are going to figure out how to do the cloud model. And we would be happy to speak with anyone who wants to pilot test this on the ISV side, as well as on the customer side.
It's those organizations who adapt to this model who are going to grow their businesses something fierce. Likely at the expense of those who don’t. I am not playing cloud booster here; I am pointing out that cloud computing represents a new way for HPC users to acquire processing capability, in an elastic manner. This allows for easier decisions, and better control over their costs, with less up-front expenditure on capital equipment. The vendors who understand that this represents a huge opportunity, and who choose to act upon this, will be the ones to thrive.

2 thoughts on “Irresistible force? meet Immovable object …”

  1. Excellent analysis. The per seat license is one of those elephants in the room that everyone just walks around till you need to change the carpet….and no one can get it to move.
    When you start talking about Google or Microsoft numbers of servers (10K*X) even a $10 per server number starts growing pretty fast.

  2. We are getting started on the Cloud. First, I’m not sure what you mean by “per seat” licensing. Mostly that term, the way I’ve heard it used, refers to an entitlement consigned to a specific person or a specific CPU. Does anyone still license software that way? However, what you say also applies to floating licenses, which is what we’ve used exclusively for many years. Our pricing model is basically logarithmic, which would mean that for each doubling of the number of licenses, the user would pay an equal increment of $. Well, maybe it’s not quite logarithmic, but it’s certainly sublinear, because we understand that people have to size their licensing resources (like their internal computational resources) for peak demand, and they’re not going to be using all those licenses all the time.
    Enter Cloud. Yes, it is a sea change. My guess is that what it will settle down to in the long run is sending excess use to the Cloud, at least for “high-priority” jobs (whatever that may mean). Long run, this will likely be done automatically, which, to be done well, requires a metascheduler with reasonable predictive ability to guess when sufficient internal resources will be available to run a candidate Cloud job, in order to determine whether to send such a job to the Cloud or hold it for internal execution. However, once sent to the Cloud, software use, like Cloud-resource use, will probably be charged linearly, or nearly so. With this in place, users can look at utilization data over, say, six months and determine whether it makes sense to build out their internal resources.
    But this execution/billing model is not in place yet. What we are doing immediately for our lead Cloud adopters is quoting them for extra floating licenses good for, say, a month at a time to run large Cloud jobs with.
    Billing models aside, I have to say that I’ve seen a few surprises on the Cloud that we never noticed with internal resources. Some of them have to do with data flow, as you’ve pointed out before: inefficient data-provisioning models that you might never notice on your LAN but that you sure notice in a hurry when you’re going over a skinnier pipe. The maximum number of processors you can keep busy when running a distributed job is N=T/t, where T is the run-time of a job and t is the job submission interval. If a lot of data has to be uploaded to the Cloud for each subjob, you can’t take advantage of a large number of processors even if the Cloud can provide them. Some of our workflows uploaded large amounts of redundant data for each subjob. This didn’t matter for internal execution, both because t was small (high LAN bandwidth) and because most clusters aren’t huge anyway, so N is limited by cluster size. Both these things change on the Cloud.
    Another has to do with load balancing, which you might not realize is quite inefficient with your current workflow till you look at an execution profile for allocated but unused Cloud processors that you are paying real money for.
    Both these surprises are causing us to rethink some of our workflows.
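
The N = T/t bound from the second comment can be illustrated with a short sketch. Everything numeric here is hypothetical (the run time, upload size, bandwidths, and processor counts are invented for illustration), and the sketch assumes the submission interval t is dominated by the time to upload each subjob's data.

```python
def submission_interval(upload_megabytes, bandwidth_megabytes_per_s):
    """t: seconds between subjob submissions, assuming upload time dominates."""
    return upload_megabytes / bandwidth_megabytes_per_s


def max_busy_processors(run_time_s, interval_s, available_processors):
    """N = T / t, capped by the number of processors actually available."""
    return min(available_processors, int(run_time_s // interval_s))


T = 3600.0   # hypothetical subjob run time: one hour
DATA = 500   # hypothetical upload per subjob, in MB

# Internal cluster: fast LAN, but only 64 processors (both hypothetical)
lan = max_busy_processors(T, submission_interval(DATA, 100), 64)

# Cloud: many processors on offer, but a much thinner pipe (both hypothetical)
cloud = max_busy_processors(T, submission_interval(DATA, 2), 1000)

print(f"internal: {lan} busy processors (limited by cluster size)")
print(f"cloud:    {cloud} busy processors (limited by upload time)")
```

With these made-up numbers the internal cluster is capped by its size, while the cloud run is capped by T/t, which is the effect the comment describes: trimming redundant upload data shrinks t and raises the number of processors that can be kept busy.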
