"Unexpected" cloud storage retrieval charges, or "RTFM"

By joe

January 18, 2016 - 3 minutes read - 561 words

An article appeared on HN this morning. In it, the author noted that all was not well with the universe: their backup, using Amazon’s Glacier product, wound up being quite expensive for a small backup/restore. The OP discovered some of Glacier’s issues when they began the restore (I’m not commenting on performance here, merely the costing). Basically, to lure you in, they offer very low up-front costs. That is, until you try to pull the data back for some reason. Then it starts getting expensive.

There were many comments about this, including: that his use case wasn’t the target use case; that his example was a poor one, as he didn’t RTFM (or, in this case, the fine print) and just thought “gee, $0.05 USD/GB storage”; and that the pricing algorithm is convoluted/painful. There may be some truth to some aspects of these.

The target-market point is interesting, as is the pricing. We’ve had many customers talk to us about doing similar things in the cloud, and we’ve asked them what they would be willing to pay to recover their data. I wish I could capture the shocked expressions on their faces when we mention that. Pay to recover it? But it’s “$X USD/GB per month”. No, no it isn’t. And the DR/backup use case? Nope, not even close. Wrong tool. But people don’t pay attention to that. They pay attention to the “$X USD/GB per month” and figure they will adapt their use case to it.

So now, let’s have you recover 100TB of data because a data center went “boom”. How long will this take, and more importantly, how much will it cost? Well, retrieval is $0.011 USD/GB, so 10⁵ GB x 0.011 USD/GB = $1.1 x 10³ USD. Oh, and then there are the network fees on top of this.

My point should be fairly obvious. The “low low prices” are for very specific use cases, designed to pull you in and make it expensive for you to leave.
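The 100TB restore arithmetic is easy to get wrong on units, so here is a minimal Python sketch of it. It assumes decimal units (1 TB = 1000 GB, so 100TB = 10⁵ GB) and the $0.011 USD/GB retrieval price quoted in the post. The function name `restore_cost_usd` and the `egress_usd_per_gb` parameter are hypothetical, and the egress rate defaults to zero because the post only says network fees come on top, without giving a figure.

```python
# Back-of-the-envelope estimate of what a bulk restore costs, using the
# retrieval price quoted in the post ($0.011 USD/GB).
# ASSUMPTION: egress_usd_per_gb is a hypothetical placeholder; the post says
# network fees come on top of retrieval but does not give a rate.

GB_PER_TB = 1000  # decimal units, as in "100TB = 10^5 GB"


def restore_cost_usd(data_tb: float,
                     retrieval_usd_per_gb: float = 0.011,
                     egress_usd_per_gb: float = 0.0) -> float:
    """Estimated USD cost of pulling data_tb terabytes back out of storage."""
    data_gb = data_tb * GB_PER_TB
    return data_gb * (retrieval_usd_per_gb + egress_usd_per_gb)


if __name__ == "__main__":
    # Retrieval fee alone for the "data center went boom" scenario.
    print(f"100 TB, retrieval only: ${restore_cost_usd(100):,.2f}")
```

With the defaults this reproduces the post’s roughly $1.1 x 10³ USD for retrieval alone; plugging in a real egress rate from your provider’s current price list shows how much larger the total gets.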
For the various benefits of cloud computing to be as useful and utilitarian as possible, you need to be able to roam between providers of (commoditized) computing and storage capacity. Despite many protestations to the contrary, not only do you not have that today, but you are locked into these systems more firmly than in the past. If you are looking at de-risking, that means you have to contend not only with a massively larger attack surface, but also with possibly non-deterministic costs. This is superior … how?

Basically, cloud is about using someone else’s resources and paying them for the privilege, so you can reduce your capex and load up on opex, which you should be able to scale up and down as needed. That is the theory. The issue arises when you need to alter the workflow to adapt to a problem … any problem … and you suddenly discover that the opex can be very … very large. Balancing between these is going to be the game for many folks going forward. If you don’t have infrastructure in house, then, just as with outsourcing other things, you are far more dependent upon your supply lines. If you have widely variable business demand with a nearly constant data bolus, yeah, cloud is likely to be the most cost-efficient option, even with these other issues. Other use cases … not so much.