Backing up decades of data
This is a tale of backing up data. At the moment, I have a small-ish 28TB server at home running ZFS (finally) on Debian 11.5. This is my local backup target, and I take snapshots of my other systems over to it.
Most of my other systems have 1-2TB of data at most, though I've got a few with 5TB, and some with a few 10s of GB (my Mac M1 Mini). These snapshots are current, give or take a month. On the large server, I have data/information going all the way back to the early 1990s.
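One way this kind of pull-and-snapshot flow can work is to rsync each machine into its own ZFS dataset on the server, then snapshot that dataset. A rough sketch, where the hostname, dataset name, and paths are just placeholders:

```
# Pull one machine's data into its own ZFS dataset, then snapshot it.
# "laptop" and "tank/backup/laptop" are placeholders.
rsync -aHAX --delete laptop:/home/ /tank/backup/laptop/
zfs snapshot tank/backup/laptop@$(date +%Y-%m-%d)
```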
I've got my old research codes and data, back before I decreased my physics efforts and increased my #HPC workload. I've got family photos. Emails. Videos. My Ph.D. thesis stuff. Everything.
This is the mother lode of historical data. A large data bolus. Something I'd be annoyed to lose.
So ...
How do I get this out of my server and onto an external backup? Something not at risk should my house suffer damage?
This data is important to me. I don't want to lose it. So I need to put a value on it, and figure out how best to keep it safe at an acceptable cost.
I looked at the options:
- a virtual server with 5-10TB of storage
- a physical server in a colo with the same amount
- an S3-like system
The first option is simply prohibitively expensive. Even the inexpensive cloud folks (DigitalOcean, and others) were north of $1000 USD/month for that service.
Do I value the data that much? Let's look at the next one.
I found some local (Michigan, US) colos. I could build a server ($1200-1500 USD) that could handle everything, but most of them offered very low maximum current for the basic space, 2A at most, unless I rented a half rack, which came with 20A. The server would typically draw 5-10A.
A half rack from a few different providers near me ran about $500/month. Even after sinking the cost of the server, that's still $6000/year. Steep.
Looking at the S3-like systems, I reviewed a few. I really wanted to keep monthly storage costs down and have reasonable egress fees. The latter requirement eliminated AWS, Azure, and GCP immediately. Well, so did the former.
I looked at Wasabi, Backblaze B2, and DigitalOcean Spaces. Wasabi was very interesting price-wise, but looked a bit more complex than my needs required. DigitalOcean Spaces was more costly than I liked. Backblaze B2, though, looked almost spot on. I know there is rsync.net, but its pricing was a bit higher than Backblaze B2 and Wasabi.
The intention for backup is write a few times, read very rarely. Basically a disaster-recovery-from-backup scenario. I don't necessarily need most of this data to function at home, and I don't mind spending a little money to pull down a terabyte or two if I lose something.
The economics of Backblaze B2 looked like they fit my model. So I set up an account and a bucket. I used rclone to mount(!!!) the encrypted volume on my local server. And now, I am backing it up with rsync.
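The rclone side of that is roughly the sketch below: a B2 remote wrapped in a crypt remote, both created beforehand with rclone config, then mounted and rsynced into. The remote names, bucket, mount point, and paths here are placeholders, not my exact setup.

```
# "b2crypt" is a crypt remote wrapping a B2 remote, both set up with
# `rclone config`; the names and paths are placeholders.
rclone mount b2crypt: /mnt/b2backup --vfs-cache-mode writes --daemon

# Then push the big datasets over with plain rsync.
rsync -aP /tank/archive/ /mnt/b2backup/archive/
```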
The next problem to solve is the asymmetric nature of our internet service. We can't seem to get fibre here. I'm on a cable modem, and while I have a 1G plan, it is asymmetrical. With my new OPNsense firewall/router, I'm getting much better sustained upload/download speeds than with the old pfSense system, but upload is still 60 Mb/s. That works out to 7.5 MB/s, so 1 GB takes about 137 seconds, 1 TB takes a day and 15 hours or so, and 10 TB takes more than two weeks of continuous uploading, realistically closer to a month.
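For anyone who wants to sanity-check those numbers, the back-of-envelope arithmetic looks like this:

```
# Rough upload times at a sustained 7.5 MB/s upstream.
for gib in 1 1024 10240; do
  awk -v g="$gib" 'BEGIN { s = g * 1024 / 7.5;
    printf "%6d GiB: %8.0f s = %6.1f h = %5.2f days\n", g, s, s/3600, s/86400 }'
done
# ~137 s for 1 GiB, ~1.6 days for 1 TiB, ~16 days for 10 TiB,
# before protocol overhead and the inevitable slowdowns.
```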
I really need symmetric fibre.
I can't seem to get it here.
I guess I could copy it all onto a fast USB 3.2 device and ship that to Backblaze to load onto their system (though since the encryption happens on my side, I'm sure this would be more complex).
It's 2023 and we still don't have great internet. I compress the heck out of the data in transit when possible. I might simply resort to copying over zstd-compressed files. VM images are sparse files, and rsync handles those correctly when given its --sparse option.
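Concretely, that looks something like the following (the paths and the rclone mount point are placeholders):

```
# Pre-compress cold archives before they go over the (slow) wire.
zstd -19 -T0 /tank/archive/old-research.tar -o /tank/staging/old-research.tar.zst

# For VM images, --sparse keeps the holes intact on the destination.
rsync -aP --sparse /tank/vms/ /mnt/b2backup/vms/
```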
Anyway, I now have an at least functional (albeit slow) backup to an external site, which won't cost me an arm and a leg per month.
I'll take the win.