# This one hits it out of the park …

On James’ blog
Heh. I think we’ve had and seen others have this conversation before.
RAID is not a backup. Backup is very important.
Ok, I did burst out laughing. The low level scan of 1PB of data to find data on the “no_backup” folder …
Yeah.
Customer has a file system. We’ve asked them “is your data important?” and they’ve answered “no”. And we try really hard to get out of them whether or not it’s important, as they didn’t spend money on a backup, and there is the potential for a single failure to take down their data. Increase entropy in the universe and all that.
Sure enough, the not so important data becomes absolutely precious and must-recover-at-all-costs once there is an issue on this scratch file system.
I especially liked the Newegg parts. The “we can build it ourselves” parts, and the wanting of 10PB for $5k USD on a $50M USD grant. Yeah, I hate to say it, but we’ve seen and heard similar requests.
But any relation to faculty members of a prestigious east coast university? Nah… Not likely.
I lost it at “I’m kind of a big deal.” Got a good chuckle out of that.

### 20 thoughts on “This one hits it out of the park …”

1. I’m not worthy of this most awesome blog post.
I leave this comment for all of us soldiers and troopers at the front line, so I’ll take this one for the team.
p.s. Where is my data? I’m kinda a big deal after all!

2. Utterly agreed. Some people do not listen.

3. I highly second Joe’s comments and more importantly, James’ movie. I think I need to show that to all the customers I talk to from now on. It is so, so, so accurate.
On a more serious note, I always ask IT customers whether they need a true backup or just a copy of the data. I’m surprised when they look at me, puzzled. These IT experts really don’t know the difference, and the difference can cost you A LOT of money, time, and effort. I have to calmly explain the differences, but what I really have to do is describe how things work from an admin perspective and from a user perspective. Once I do that, they get it. However, sometimes they then ask for quotes for both a true backup and just a copy of the data. This wastes a huge amount of time on our part, and nothing is really ever learned except that a true backup can be really expensive at scale 🙂
Now that I’m off my soapbox, let me also say that I laughed very hard at the line, “My Postdoc says they can do it much cheaper.” All I have to say is Backblaze!! The ghetto storage for a stupid world.
Jeff

4. Yeah – seen this pattern – esp. in the BIO-IT space very often.
I make the analogy that those bio-IT guys are like “giving the Taliban a budget to buy a nuclear rocket”.
They just get lots of (government) money – and now want to transition from a couple of 1TB USB drives (storage stone age) STRAIGHT into the 21st century of multi-PETABYTE storage.
(And they really are convinced that, except for the bigger capacity, all things remain the same as for their 1TB USB drive…)
hahaha 🙂 – funny times.

5. @hrini
For the bio-IT folks who aren’t beholden to their corporate IT dictates, yeah, we see this as well. They make the hardware so non-functional as to be effectively useless. I am blown away by how many … bad, really bad, horrible … designs I run into on a daily basis.
@Jeff
[sigh] Yeah. The latest fads we see/hear are these massive front/back arrays coupled with expanders and a low-end HBA. We’ve had people tell us, to our face, “No, we don’t want to buy this from you for $X, as we want to pay (1 + r)*$X for this (highly buzzword compliant) set of parts (with 0.5 <= r <= 2.5) which may, with a good strong data wind, achieve half of your measured real-world performance. And yes, performance and performance density matter to us.” I swear, all you can do is start blinking rapidly.
I’ve seen some terrible designs, bad data loss, and horrible performance because people didn’t listen and wanted to “do it themselves”. The theory in operation was that they would save money by doing so. True only if they valued their time at zero. If that were the case, I am sure their employer would like to hear it, so they could adjust their paychecks accordingly.
On Backblaze … I don’t want to savage them. It’s a big stinking pile of bits, though. And I am aware of at least a few Bio-IT folks looking at it.

I enjoyed the comic, but it pains me to have you guys turn it into some kind of “my storage is holier than thine” thing. What exactly is wrong with Backblaze-like approaches? (To me, they’re simply price-optimizing for infrequently accessed data: PMs and “cheap” disk controllers are not necessarily a bad idea.) And exactly how does a “true backup” differ from a false one? (Are you just a member of the Church of Tape?)
Seriously, these comments have been uncomfortably close to BOFHisms.

7. @Mark
I don’t see anyone going there. There are quite a few folks who build their own kit for deployment, and some (small) fraction are actually successful at it.
Price optimizing is fine in context: a group of USB-connected SATA drives in a tree is a highly price-optimized system. Would you want to deploy this and use it? A fairly large number of these cheap disk controllers are horrible. Marvell chipsets in particular (mvsas and other similar code) have a well deserved (negative) reputation.
This is not to say that more traditional vendors have only good gear. Look at all the cheap Broadcom NICs in [insert the name of your favorite tier 1 vendor] gear. This isn’t a brand-name thing. It’s a design thing.
Ok, a simple analogy (yeah, suspect, I know). Suppose you are really cheap and you don’t want to pay hundreds of dollars for a 10GbE NIC. You don’t need it, so you decide to channel-bond 10 really cheap 1GbE NICs together. You get 10GbE. In aggregate. Best case. Assuming that the drivers work ok, and that the switch understands LACP/bonding correctly. Though you won’t get more than 1GbE per client, at least you get 10GbE in aggregate.
Sort of.
It’s doable; it will “work” for some definitions of the word “work”. But it’s not a well constructed/implemented design relative to the simpler and more powerful single card. It will take more slots, more management, more cabling. There’s more room for error, and there is more cost on the end user’s side (in their time/effort to act as an integrator; and yes, you can value your time at $0 USD or $0 CAD, but your employer doesn’t agree with this, and would likely prefer you to spend your time on the elements where you will generate a return for them).
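The bonding analogy above can be made concrete with a toy model (all names and numbers here are illustrative, not any particular vendor’s behavior): in LACP-style bonding, each flow is hashed onto exactly one member link, so the aggregate can approach 10GbE while any single client stays capped at 1GbE.

```python
# Toy model of LACP-style channel bonding: each flow is hashed onto one
# member link, so a single client never exceeds one link's bandwidth,
# even though the bundle's aggregate can approach N_LINKS * LINK_GBPS.
LINK_GBPS = 1.0   # capacity of each cheap 1GbE NIC
N_LINKS = 10      # ten of them bonded together

def member_link(flow_id, n_links=N_LINKS):
    # Stand-in for the switch's flow hash (real hashes use MAC/IP/port tuples).
    return hash(("flow", flow_id)) % n_links

def per_flow_rates(flows):
    """Gbps seen by each flow: flows hashed to the same member link
    split that link's capacity; no single flow can span two links."""
    by_link = {}
    for f in flows:
        by_link.setdefault(member_link(f), []).append(f)
    rates = {}
    for link_flows in by_link.values():
        for f in link_flows:
            rates[f] = LINK_GBPS / len(link_flows)
    return rates

# One client: pinned to a single member link, so 1 Gbps, not 10.
single = per_flow_rates(["client-A"])

# Many clients: the aggregate can use all ten links, but never more.
many = per_flow_rates([f"client-{i}" for i in range(200)])
aggregate = sum(many.values())
```

The model captures the point in the text: "10GbE in aggregate, best case" and "no more than 1GbE per client" fall straight out of per-flow hashing.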
I’ve been on both sides of this in the past. As a grad student, my advisor would be happy if I spent a day or two to hyperoptimize (e.g. squeeze every dollar I could out of a purchase), even though the marginal cost of that was a) a loss of a productive day (or more) of work, and b) often lost in purchasing/shipping cost inefficiencies and delays.
Backblaze looks to me just short of that hyperoptimization. Is this a good use of X’s time (for X being a person paid by someone else, for whom their mission is to do Y, and not build and support these units)?
This isn’t BOFH-ish at all.
As for tape, nah, I’ve been arguing against it for about a decade. In the “olden” days, my thesis advisor had a $20k USD Exabyte tape library. She had something like 300 GB of data to back up (and this was big when she had it). She used the Exabyte until it gave out. The drive, that is. Tapes last effectively forever. Drives don’t. Exabyte refused to fix or replace the drive, preferring to sell her a new one for … quite a bit more money … than fixing it.
Which led me to conclude that while tapes were possibly a good investment, unless the reader/writer came integrated with the tapes, they were/are largely a waste of time. Happily SATA drives come in good capacities, with excellent bandwidths, and last a pretty long time when stored properly. If you are clever on how you back up, you can even tolerate media/drive failure.
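As one sketch of what being “clever” about disk-based backup can mean (my illustration, not necessarily the author’s actual scheme): a simple XOR parity chunk stored alongside equal-sized pieces spread over several drives lets you rebuild any single lost piece, the same idea RAID parity and par2-style tools build on.

```python
# Toy sketch: spread equal-size chunks of a backup across three "drives"
# plus one XOR parity chunk; any single lost chunk is reconstructible.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(chunks):
    """XOR of all chunks; store this on its own drive."""
    parity = bytes(len(chunks[0]))  # all-zero start
    for c in chunks:
        parity = xor_bytes(parity, c)
    return parity

def rebuild(surviving, parity):
    """XOR the surviving chunks back into the parity to recover the lost one."""
    missing = parity
    for c in surviving:
        missing = xor_bytes(missing, c)
    return missing

chunks = [b"AAAA", b"BBBB", b"CCCC"]  # one chunk per drive
parity = make_parity(chunks)
# The drive holding "BBBB" dies; survivors plus parity bring it back.
recovered = rebuild([chunks[0], chunks[2]], parity)
```

This tolerates exactly one failed drive per parity group; tolerating more requires more parity, which is the usual capacity-vs-protection trade-off.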
It’s ok to have a different opinion on these things. Nothing wrong with that.

8. @Mark,
Remember that a backup is something that can go back in time to retrieve earlier data states. A copy of the data is just that – a copy of the data at the particular point in time when it was taken.
For example, if I want a file in its state from 3 weeks ago, chances are that a simple copy of the current state of the data at that moment won’t have it (and I may have changed it). That means I need a true backup system that allows me to go backward through incrementals and restore the data (if you have your backup schedule set up that way).
If I happen to erase a file and want to restore the file, then a simple copy of the data works fine. But if I want the file from 3 weeks ago, then I need a backup and not a copy.
I see IT people constantly confusing the two concepts (backups allow you to go back in time; copies are just a point in time). Even enterprise IT people who live and die by this stuff confuse the two – they refer to a copy of the data as a “backup” when in fact it’s not a backup, it’s just a copy. I see seasoned admins, including HPC admins, confuse the two every day, which is why I use the phrase “true backup”: so they ask the question, as you did, “what is a true backup?” Then we get to have the conversation about the difference between a backup and a copy (of course, this is quickly followed by questions about pricing, which means a much deeper conversation has to take place, involving policies, etc.). Rarely do they truly understand the difference between the two concepts, but using the phrase “true backup” causes them to ask the question (a conversation starter, if you will).
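The copy-versus-backup distinction above can be made concrete with a small sketch (the paths and snapshot naming are illustrative; tools like rsync with --link-dest do essentially this): a plain copy keeps only the latest state, while dated snapshots that hard-link unchanged files keep every state reachable without storing every file twice.

```python
import filecmp
import os
import shutil
import tempfile

def plain_copy(src, dst):
    """A copy: one point-in-time replica; each run overwrites the last."""
    if os.path.exists(dst):
        shutil.rmtree(dst)
    shutil.copytree(src, dst)

def snapshot_backup(src, backup_root, label):
    """A toy backup: every run adds a dated snapshot directory. Files
    unchanged since the previous snapshot are hard-linked (no extra
    space); every earlier state remains restorable."""
    os.makedirs(backup_root, exist_ok=True)
    snaps = sorted(os.listdir(backup_root))
    prev = os.path.join(backup_root, snaps[-1]) if snaps else None
    dst = os.path.join(backup_root, label)
    os.makedirs(dst)
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        p = os.path.join(prev, name) if prev else None
        if p and os.path.isfile(p) and filecmp.cmp(s, p, shallow=False):
            os.link(p, d)       # unchanged: hard link to prior snapshot
        else:
            shutil.copy2(s, d)  # new or modified: store fresh bytes

work = tempfile.mkdtemp()
src = os.path.join(work, "data")
os.makedirs(src)

with open(os.path.join(src, "paper.txt"), "w") as f:
    f.write("three weeks ago")
snapshot_backup(src, os.path.join(work, "backups"), "week-1")
plain_copy(src, os.path.join(work, "copy"))

with open(os.path.join(src, "paper.txt"), "w") as f:
    f.write("today")
snapshot_backup(src, os.path.join(work, "backups"), "week-4")
plain_copy(src, os.path.join(work, "copy"))

# The copy only knows "today"; the backup can still produce the old state.
with open(os.path.join(work, "copy", "paper.txt")) as f:
    copy_state = f.read()
with open(os.path.join(work, "backups", "week-1", "paper.txt")) as f:
    old_state = f.read()
```

Asking the copy for the file “from three weeks ago” fails by construction; asking the week-1 snapshot succeeds, which is the whole difference being argued here.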
Sorry I offended you with my comments.
Jeff

9. @Jeff Layton
I’m a Backblaze customer so I actually know how their system works.
If you knew how BB worked, then you’d realize that, according to your definition, they are a “true” backup. Backblaze will let you go back up to 30 days on any file that has been backed up. While you may not feel 30 days is enough (I’d be curious to know what the retention time of your “true” backup is), based on your tenuous definition, Backblaze is a true backup.

10. @Sam
I know Jeff pretty well, and he does know, in pretty intricate detail, what he is talking about. In my book, he’s up there with Henry Newman and a few others as experts in this field.
Jeff was making a very particular point about replication at a point in time versus a true backup. He pointed out there are some differences that people tend to gloss over.
Many of us who do these sorts of things for a living see and hear all sorts of … questionable … practices. James’ original post (really go see it, it places this discussion in context) shows, in somewhat comic relief, the same type of conversation we’ve all had.
We’ve had customers whose data is worth profoundly large amounts of time, effort, money, and future IP value, who don’t even have a point-in-time replica, a copy, of their data on another machine. If they don’t even have that, then their data must not be as valuable as they make it out to be. What’s humorous about this bad situation is the “I’m kind of a big deal” bit, an expression of self-importance that doesn’t factor in one iota when a disk drive goes away. Or a file system bug bites you. Or …
Jeff was specifically pointing out some nuances of backup versus copy. I have to admit that I’ve not always drawn that level of distinction for our customers, but it is a very important distinction to draw. Merely having a copy may not be good enough. If you need to recreate state, you might need to go back through previous backups, which a point-in-time snapshot can’t give you.
My point is about another sticking point we keep running into: RAID is not a backup. Many a time I see customers not making copies or backups because their data is on a RAID. They think this prevents data loss. It only delays loss; it buys the admin time to swap out bad hardware.
Jeff’s point is that you have to be intelligent about the replication, copies, and time-evolved state that you have preserved elsewhere (e.g. the backup).

That’s the big issue with so-called “comedy”: folks sometimes seem to get the wrong end of the stick. Even the folks at reddit are confused about the stick in the “moon on a stick” comment at the end ;-)) http://www.reddit.com/r/talesfromtechsupport/comments/nof0u/sooner_or_later_we_all_deal_with_this_guy/.
Plus the Backblaze facebook page is also probably going to get just as amusing… Gleb, who owns and runs Backblaze, fortunately also gets it: https://twitter.com/#!/jamesdotcuff/status/150398115210477568
Along with Joe and Jeff, whom I rate so high up on the “have a clue” scale it is not even funny. ;-))

• @James
I wrote about “moon on a stick” in a similar context, calling it “the pony scale”. Basically, it’s when someone says, effectively, “And I want a pony too.” You know this isn’t likely to be something doable, or reasonable.
In fact, directly relevant to this animation is this post from earlier this year, and another one.
Call it moon on a stick, pony scale, whatever. It all comes down to (effectively) impossible-to-satisfy requests. I shake my head when I see these.
Thanks for the vote of confidence, clues are hard to find these days 😀

We’ve been really lucky at MSI in that most of our users trust us. We have a couple of guys like the one in the video (and unfortunately they are the ones that go straight to the Dean!), but given the number of people we serve, it’s really a small fraction. Having said that, I’ve been there.
Jeff & Joe: I think either of you guys could easily build an effective Backblaze-style box that would fulfill its intended role (cheap, slowish, redundant storage). If you build it right, it probably wouldn’t even be all that slow. You certainly have to be careful about the hardware (we just inherited a storage “server” consisting of a Chenbro 4U case housing a Gigabyte microATX gaming board and a bunch of RocketRAID cards. Ugh). Having said that, I’ve been burned by enough highish-end RAID hardware that I’m ready for a paradigm shift.
Show me some JR5s (or a mythical Dell-branded SC847a C-series server) with a couple of LSI2008 (or H200) cards, SSD cache from a reliable vendor, and a thoroughly tested gluster+zfs or production-quality ceph setup, and you’d definitely have my interest.

13. And yes, I know gluster+zfs and/or production quality ceph are high on the Pony Scale 😉

14. Not a commercial:

Show me some JR5s (or a mythical Dell-branded SC847a C-series server) with a couple of LSI2008 (or H200) cards, SSD cache from a reliable vendor, and a thoroughly tested gluster+zfs or production-quality ceph setup, and you’d definitely have my interest.

Ignoring gluster+zfs/ceph for the moment, look at this.
Your comment on production-quality glusterfs+zfs and ceph … ok … There was a push a while ago to get glusterfs onto Solaris. I am not sure what ever happened with it, though there are some known issues hampering the port to *BSD. I don’t think we will see Red Hat pushing hard for a port to Solaris or BSD. They won’t discourage it; I am guessing they simply won’t commit resources to it.
On Ceph: you can have it today, although btrfs is still not fully ready. The lack of a real working fsck.btrfs has been a sticking/sore point for a while. Chris Mason is loath to put out a half-baked version that he then has to support alongside the working one. Sage Weil and group have been working very hard on Ceph, btrfs, and many associated things.
We’ve built a few Ceph clusters, should have another one up early next year with the 3.2 kernel drop.
The point is that we do have this capability. It even performs pretty well.

15. Hi Joe,
Do you have customers using Ceph in production, then? For me it sits in the same partially charted territory as things like OpenStack Swift and exotic glusterfs configurations (i.e. riding on top of ZFS or btrfs). Full of amazing potential, but there are few people (including myself!) I would trust to deploy a production system with it right now.
Back on topic: it’s all a matter of cost-benefit analysis. PIs may be computer-illiterate, but they probably aren’t stupid. Most seem to be extremely good at squeezing blood from turnips. In this case it’s just that you and I are the turnips. 😉

16. @Mark
Not yet. Would like to see some customers using Ceph.
Actually, with the purchase of Gluster by Red Hat, some of our development plans will require re-examination. Specifically, we were planning some additional translator layers for capability we wanted to develop (translators that interface to other tools we are developing). With Red Hat’s control over this, I need to see the roadmap before we continue down that path. We would prefer to do it at the file system layer rather than the block layer, but we could work there: using SCST’s block target layer, we can do a few things we’d like to do on a completely transparent basis. But much of what we want to do makes more sense at the file system layer. I guess worst case, we develop it on Gluster 3.2.5 and wait to see what happens before doing the next bits.
As for turnips … one of the harder aspects of being in business is convincing people to pay for value. Most people, like those in the animation, may wish to pay only for the sum of the parts, and will ask for volume discounts in the process even though they aren’t buying volume. That is, they will claim the value to them is the sum of the values of the parts, and nothing more. This is the Backblaze/Newegg approach, and it masks a number of critical aspects. Not the least of which is that it doesn’t work very well, doesn’t scale well, and is incredibly hard and expensive to support.
A business can’t remain in business if it makes losses, or cannot cover its costs and turn enough of a profit to keep investing in the business. Basic axiom of commerce there. Money doesn’t spontaneously appear, costs aren’t magically covered out of other accounts, and there are no wheelbarrows filled with money sitting around the corner that we can tap into.
As a general rule, when a customer demands a product at what they believe to be the sum of the cost of the constituent parts, we walk away if we cannot convince them that this valuation method is incorrect. It’s exactly analogous to walking into a car dealership and valuing the car by summing the current value of the metal, rubber, plastic, and cloth.
The folks James depicted in the animation would apply a valuation function like the above, and that would, arguably, provide a “cost of materials” measure: not what they need, but a hard lower bound on what they should spend.
Some PIs, during my time at SGI, actually asked for completely free gear. Never mind the entertainment value of the arguments for this (e.g. “training the next generation of scientists, who may buy your gear”). This is always a bad deal.
We have one proposal now where we didn’t want to give it to the customer, but our partner did, so we acquiesced. It values everything at (in some cases slightly below) our cost, and if we have to ship it, it would be cheaper if I drove it there myself. Our partner really wants to win this over a competitor that’s failing (loudly at that). This is business we’d normally walk away from, though, as we won’t recoup our full costs. The customer would need to give one helluva set of positive quotes, videos, etc. for us to recoup any value from this transaction, and as it’s a university, the probability of this is very low.
Part of the reason that the competitor is going out of business has to do with taking deals like this. We can’t include anything more than baseline warranty support, and even then, I sincerely hope nothing happens to the unit, as we will lose money if anything does. This is not business we’d pursue. The customer may think they are getting a great deal, but if they wind up driving competitors out of the market by demanding too cheap a solution, well, when that happens, their costs are going to rise and they will have less choice (and less bargaining power) going forward. Exactly how is this short-term thinking a win, when you lose in the long term?
Basically, what I am saying is that PIs can squeeze, and at some point they will drive out all the competitors offering something of value. Which … is actually not in their interest. I don’t see many people putting cars together from parts. The few who do, do it because they want to, not because they need a car; it’s not critical for them that everything (radio, signal lights, …) function the way it was intended in the vehicle. Yet for transportation, most of these folks will apply a different value equation. Same with what we do.
The next-generation HW we are working on will help us increase our density and lower our costs. This is nice. Add to this that it will make serviceability better. There is value in this. There’s value in the time/effort required to make it all work, and in the stacks, and in the support. Again, it’s all about the desire to get a working solution and fairly compensate for it, versus the desire to squeeze turnips. It’s my fervent desire to make sure our competitors win all the turnip-squeezing deals out there.

My concern with Ceph is the btrfs backend. Whilst I’ve been using btrfs for many years now, it does still seem to have good kernels and bad kernels – my work laptop (which has a btrfs /home and yes, I do have good backups of it, via TSM) is currently still on Kubuntu 11.04, as that has 2.6.38, which is a “good” btrfs kernel. Ever since then, though, I’ve seen people reporting issues, which has meant I’ve not upgraded to anything newer, because those kernels have had many odd problems underneath. Now it’s looking like 3.2 will knock a lot of those on the head, so I’ve got my fingers crossed that it’ll be another “good” kernel (for suitably subjective values of good).
Of course there are still other fundamental issues with btrfs going on underneath, like the issue with the limited number of links you can have in a directory and which will require an on-disk format change to fix:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=633062
Hopefully that’ll be an online upgrade format change, and not a backup, reformat, and restore-your-data change…

Well, I’ve been there, in the scenario the video represents, that this blog outlines, and that the following comments have added colorful spots to.
I felt the conversation was a bit agonizing to read, as much as the self-important, self-repeating character in the video was to watch. I’ve done the same, been there, done that, “I was kinda a big deal after all!”, and not long ago. I built and maintained solutions from bits and pieces the “Backblaze/Newegg” way, spent hours with “real backup” solutions adapting them and circumventing their lock-ins, and was finally replaced by a “final real backup”, “all-in-one”, “finely integrated” solution from a BIG vendor. From a BIG vendor, because Joe can’t offer the guarantees the BIG name brings, which puts Joe & the Rabbits in the same category as the Backblaze/Newegg solutions. To put it bluntly, the situation is that Joe & the Rabbits is, after all, just a value-added HW reseller with its own “assembled, tested, measured and guaranteed to work” brand. But if I had introduced Joe & the Rabbits as a provider in line with the BIG names, I would have gotten a good laugh from the others involved. I hope you all understand my point here.
As recommended reading for all participants: the recent blog post “Learning from customers” from Storage Mojo. It has a lot of food for thought for all of us.
To keep it short: I think the video showed one stinking BIG mistake by the research community IT service. Have written contracts, so that a service level has been agreed with the “kinda big deal” customer, rather than relying on some past conversations. My lesson was that as soon as technical staff are involved in producing an IT service, service descriptions and levels must all be in writing, and clearly marked as read, understood, and agreed by the recipient of the service. And if necessary, you had better think of the CEO as the customer to have an SLA with in the first place. From there on, we can start talking about technical solutions and the constraints we have to deal with, and involve “the solution” vendors, each with its own “pony scale”.

• @Erikki
I think I can summarize what you are saying as: this is a failure to communicate, and a failure of expectations and of setting expectations. As for the other elements, they are tangential to the discussion, since, at a base level, every organization selling in this market space is a “value-added reseller” (inclusive of the big names). Yeah, many resist that notion (customers and big-name employees alike), though pretty much all of the gear is designed/assembled by 3 specific companies, with different branding/fascia for each vendor.
The big issue is the expectations, and often how out of line with reality they are. Many customers demand the lowest possible price for systems, which requires giving up all the fancy support and service elements, until they suddenly decide they need those features and are shocked, shocked I say, to find they don’t have them immediately available for free. They then believe they can magically leverage whatever notoriety they have in their limited circles to effect a change in business on the part of those supplying the service/support. Which is … well … silly. One wouldn’t expect an intelligent person to rely on such an argument to get things done. It was a parody, and if anyone actually did that … well … that speaks volumes about them.
But that misalignment of expectations comes, in part, from the drive for the absolute lowest price: the customer sees the best support and service options, sees the price, and doesn’t connect what they are getting for that price. And that’s the issue.
The Backblaze and other build-it-yourself versions of things all suffer from a high localization of the support to the person who built it. And if they don’t know what they are doing (most don’t), you will get a badly underperforming system. Which is the norm.
Large storage systems need performance. There is no way around this. Which is where we play.