We're Back!

We were knocked off the air around 11pm on 13-September, by a machine finally deciding to give up its ghost. A partially retired machine which happened to run scalability.org decided, finally, that it no longer wished to correctly run grub.
Grub being the thing essential to booting.
Like the bootloader.
Yeah. It was one of those nights.

I haven’t finished figuring out why it died, and I am still working on restoring the services.
Happily, I had set up a nightly database backup, so once we had the chance to get the replacement unit in, it was a quick matter to make it work.
The site also happens to host scalableinformatics.com’s downloads and a number of other services.
Interestingly, we didn’t lose data. Just time.
The server that gave up its ghost (a 5 year old box) was replaced with a newer vintage box. Similar in many ways to our Pegasus Server (aka JackRabbit DeskSide), but with fewer disks. More cores, more RAM. Apart from the chassis/power supply, there are stories (amusing ones) around these CPUs, and the motherboard they were in. Long story short, we wound up in something akin to the Monty Python ‘argument’ skit with … er … a certain vendor over whether or not they would replace the motherboard or BIOS to support quad core chips.
More RAM, more cores, enough disk space (we were running out). I took the opportunity to migrate from CentOS 4.x to Ubuntu LTS. Just personal preference, both are good.
I can’t say enough good things about being highly focused on backups. This wasn’t a critical system, but it was important enough to warrant backup and replication of data. Data that is not important shouldn’t be replicated, or should be deduplicated. Data that is important, that you do not want to lose, should be replicated.
A RAID is *NOT* a backup system. You shouldn’t treat it as such. A RAID buys you time. It provides resiliency in the face of failure. Not recovery.
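To make the replicate-and-verify idea concrete, here is a minimal Python sketch. It is an illustration only, not the script used for this site: the function names `sha256_of` and `replicate_file` are hypothetical, and it assumes the replica directory lives on a different disk or machine than the source.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_file(source: Path, replica_dir: Path) -> Path:
    """Copy source into replica_dir, then verify the copy by checksum."""
    replica_dir.mkdir(parents=True, exist_ok=True)
    replica = replica_dir / source.name
    shutil.copy2(source, replica)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(replica):
        raise IOError(f"checksum mismatch for {replica}")
    return replica
```

Pointed at a nightly dump file, something along these lines gives you a second copy you have actually checked, which is the part a RAID alone does not provide.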
Once we had the apache system up, the rest went quickly.
Now I have to restart mpihmmer.org, and the other sites this machine hosted. Back to the grind …

8 thoughts on “We're Back!”

  1. Welcome back. 🙂
    Care to comment on (or avoid altogether…) the recent “Petabytes on a budget” Backblaze blog post and associated discussions?

    • I commented on the Beowulf mailing list … but worth repeating some of it here. I will have an article up soon on “The Bandwidth Wall” that these guys are doing a good job of helping me articulate.
      Ok … on the build side, I am worried about their vibration coupling. Nylon standoffs will provide very good conduction of bulk mode (low/mid) frequency vibrations. Imagine you have a box with a bunch of 120 Hz oscillating masses. Say 45 of them. Arranged as a regular array … you can see where I am going from here. Basically this unit has some good ideas, but it’s going to have issues with vibrations.
      I am very worried about their using the port multipliers as supports for the drives. These units weren’t designed with that in mind, and it will likely lead to some … er … interesting failure modes.
      Their use of cheap port multipliers and lower end PCI and lower end SATA cards means that they are badly oversubscribed for bandwidth. For their app, over https on a single gigabit, this is fine. For a serious high performance data storage system? Not so much fine.
      Their use of a very low end processor platform is interesting as well. What this thing looks like to me is a homebrew drobo.
      A better designed case is in order. As are better server class electronics and a better stack. But then that’s DeltaV.
      I should note that one of the day jobs’ fastest growing markets is cloud storage.

  2. Also – can’t seem to subscribe to the comments feed of this post, at least within Google Reader. Error message I’m getting:
    “The feed being requested cannot be found.”
    Site-wide comments and blog feed are fine…

  3. Ignore my last comment – RSS working fine now. Comment posts just took a little time to come through.

  4. Interesting thought re: vibration. At least the oscillation is primarily orthogonal to the “floppy” axis of the HDD.
    To their credit, they never said they were targeting a high performance data system. The comparison with commercially available systems could obviously be misleading just looking at the “Cost of a Petabyte” chart, though. Too bad there was no mention of DeltaV!
    We often acquire 50GB of data per day in our lab (directly to our network store), but with a bandwidth usually less than 1MB/s. Performance of retrieval (in Matlab) isn’t critical, either. We currently run a hodgepodge of home-brewed RAID6 systems (1 x Core 2 Quad P45 + 1 x Supermicro Xeon DP, each w/ Areca SATA controller), but we’re running out of room in our server closet just as our capacity needs are taking off. Something like this Backblaze build would address our (physical) storage needs quite nicely.

    • @David
      The oscillation is not orthogonal to the head. There is likely a coupling there that interferes with settling. You don’t care about the platters so much, they are rigid and designed to be so. But you have a long moment arm of a head assembly with a non-zero power coupling from a forced set of oscillators, with all the associated higher order harmonics. Not good.
      For 1PB of DeltaV, with an (IMO) saner component mix, we are looking at $290k list price using 2TB desktop drives. The enterprise version lists at $362k.
      50 GB/day -> 579 kB/s. You can do this most cost effectively on a desktop USB2 drive. Buy a 2TB unit, even a 2TB raided unit and you have 40 days of data. Makes for very good backup, as you can write a simple replication script, and do easy offsite backup.
      As for using the Backblaze build for lab storage … I’d be careful about that. Too many homebrew systems fall by the wayside when labs lose interest in them, the (under)grad student or postdoc moves on, etc. Some homebrew things, like the cluster of PS3s for using Cell processors, are born out of necessity (due to few other routes to getting Cell processors, though Fixstars now has them, and we sell/support them). But replicating what you can buy on the market, to save a few $$, means you are going to spend your time/effort on fabrication/build/support/debugging…
      I do believe a researcher’s and their students’ time is better spent on research than IT. Unless of course their research is IT, but that’s a whole other story. As someone tasked to price optimize purchases in the past … that is, I was missioned to spend a week to save a few dollars, I can’t begin to describe what a colossal waste of my time that was. The price for that savings was my salary/benefits during that interval. So yes, we spent $100 less, and it cost oh, about $500 to get that $100 savings.
      What I am trying to express is that, for most people, this is a Pyrrhic victory. Unless you can buy these units pre-made like this for this price, you aren’t going to be spending what they claim you will be spending. Factor in all the costs, and it’s quite a bit more.
      This isn’t a TCO argument. You can’t (as far as I am aware) actually buy the units. You can buy the parts. Quite a bit of assembly is required. And then debugging after the assembly. And then testing/burn in. And then … No company I am aware of will sell you these units at cost either. They need some mechanism to justify their pricing. The backblaze folks need to simply minimize their cost per GB, they are an IT shop, and they can (in theory) handle supporting these units. More power to them.
      I know, this argument is going to fall on deaf ears, that is, until you actually try it once.
      Which is something I encourage people to do. Do it once, understand the issues.
      Our units are designed to keep you from having to worry about them. No “did I orient the backplane just right so I make good connection to the drive”. No “I heard 480Hz harmonic off of a set of drives, will it impact my head placement”. No “why can’t I move data into/out of this multi-10TB unit fast enough … I have data I need to store and backup”.
      If you think this is the right direction for you, by all means. Go and build one.

  5. Really appreciate your thoughts! I AM actually the undergrad and grad student that “moved on”…I continue to collaborate with the group, however, hence the quotes. 😉
    I don’t think USB drive archival is going to cut it – we’d really like to keep a central place to access all data over a few years’ time. If I had known about DeltaV at the time, we might never have gone the DIY route with the original systems. And these days I don’t have the luxury of time that I used to – DeltaV’s at the top of my recommendation list. Thanks again!
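
For anyone who wants to check the numbers in the reply to comment 4 (50 GB/day, a 2TB drive), here is a minimal Python sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB) and a constant acquisition rate over the whole day; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate_kb_per_s(gb_per_day: float) -> float:
    """Average write rate in kB/s, assuming 1 GB = 1e9 bytes and 1 kB = 1e3 bytes."""
    return gb_per_day * 1e9 / SECONDS_PER_DAY / 1e3


def days_until_full(capacity_tb: float, gb_per_day: float) -> float:
    """Days of acquisition that fit on a drive, assuming 1 TB = 1000 GB."""
    return capacity_tb * 1000 / gb_per_day


print(round(sustained_rate_kb_per_s(50)))  # 579
print(days_until_full(2, 50))              # 40.0
```

Both figures match the ones quoted in the reply: roughly 579 kB/s sustained, and 40 days of data on a 2TB unit.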
