We’re Back!

We were knocked off the air around 11pm on 13-September, when a machine finally decided to give up the ghost. A partially retired machine which happened to run scalability.org decided that it no longer wished to run grub correctly.

Grub being the thing essential to booting.

Like the bootloader.

Yeah. It was one of those nights.

I haven’t finished figuring out why it died, and I am still working on restoring the services.

Happily, I had set up a nightly database backup, so once we had the chance to get the replacement unit in, it was a quick matter to make it work.
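For anyone wanting something similar, here is a minimal sketch of a nightly backup with rotation. It is not the script used on this machine. The function name `nightly_backup`, the `db-YYYY-MM-DD.sql` naming, and the seven-copy retention are all assumptions made for illustration, and it assumes the database has already been dumped to a file by whatever tool you use.

```python
import datetime
import pathlib
import shutil


def nightly_backup(dump_file, backup_dir, keep=7, today=None):
    """Copy dump_file into backup_dir under a dated name, then prune old copies.

    Hypothetical helper: the names and the retention policy are illustrative.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    today = today or datetime.date.today()
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # One file per day, e.g. db-2024-01-31.sql (assumed naming scheme).
    target = backup_dir / f"db-{today.isoformat()}.sql"
    shutil.copy2(dump_file, target)

    # ISO dates sort lexicographically in chronological order,
    # so everything before the last `keep` entries is the oldest.
    backups = sorted(backup_dir.glob("db-*.sql"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

Run something like this once a night from a scheduler, and copy the backup directory to a second machine. A dated copy sitting on the same disks as the database does not help much when the whole box dies.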

The site also happens to host scalableinformatics.com’s downloads and a number of other services.

Interestingly, we didn’t lose data. Just time.

The server that gave up the ghost (a 5-year-old box) was replaced with a newer-vintage box. It is similar in many ways to our Pegasus Server (aka JackRabbit DeskSide), but with fewer disks, more cores, and more RAM. Apart from the chassis/power supply, there are stories (amusing ones) around these CPUs and the motherboard they were in. Long story short, we wound up in something akin to the Monty Python ‘argument’ skit with … er … a certain vendor over whether or not they would replace the motherboard or BIOS to support quad-core chips.

More RAM, more cores, enough disk space (we were running out). I took the opportunity to migrate from CentOS 4.x to Ubuntu LTS. Just personal preference; both are good.

I can’t say enough good things about being highly focused on backups. This wasn’t a critical system, but it was important enough to warrant backup and replication of data. Data that is not important shouldn’t be replicated, or should be deduplicated. Data that is important, that you do not want to lose, should be replicated.

A RAID is *NOT* a backup system. You shouldn’t treat it as such. A RAID buys you time. It provides resiliency in the face of failure. Not recovery.

Once we had the Apache system up, the rest went quickly.

Now I have to restart mpihmmer.org, and the other sites this machine hosted. Back to the grind …

