On the retention of electronic data

One of the things I, and many other people, worry about is how to retain data for long periods of time. Retention means the data has to stay accessible, readable, and convertible, which suggests that only open formats and file systems should ever be considered for data storage and retention.
With this in mind, I read Robin Harris’ StorageMojo column this morning. Yeah, I would say he nailed it.

I have Fortran code I wrote (oh my gosh) 20+ years ago that still compiles and runs. ASCII text, a well-known and well-understood format.
I have reports I wrote in ChiWriter/T3/Ami Pro that are effectively wastes of bits. Can’t read them.
Some things are simply too important to trust to the perceived “goodwill” of a company running proprietary formats and systems. And I say that as the head of a company that builds high performance storage systems. We want our customers to be able to pull data off our devices (and put data on as well) far into the future. Could you imagine if we created our own file system which was superior, but closed source and available on our machines only? Then suppose we get bought by “Some-Large-Company”(tm) to work on optimizing the performance of the next great SaaS-social networking-web 2.0 site, and the file system is deep-sixed. So what just happened to your ability to pull data off the machine? Nothing, as long as you keep the last version hanging around. Which gets older and crustier as time goes on.
How precisely does this not increase your risk? Not from us; this is a hypothetical scenario. Our units happily run what you want them to, and we strongly urge open standards. But in that scenario your data sits on an aging, unfixable system.
Now invoke SOX. SOX and its fellow legislation mandate that businesses retain records, effectively in perpetuity. This is hard. Now add in a proprietary format, say like, I dunno, Microsoft Exchange.
And then a few years down the road, Microsoft decides that Exchange is last decade’s technology, so you are forced to move to the next software.
What becomes of your ability to extract/read old data?
Well, as Robin points out, Microsoft did something effectively like this with its latest Patch Tuesday.
FWIW: I carefully inspect each patch. I don’t allow patches that don’t actually fix things to get onto the machine. Yeah, so I miss out on great new features.
Like the bricking of old data files.
This is unfortunately relevant, as the ODF vs OOXML debate has raged for a while. Despite some, well, rather …. um …. what’s the word I am looking for …. practices on the part of the OOXML proponents and purveyors, ODF still looks like it is coming out ahead. This is good, as no single vendor controls ODF, and frankly, my data will stay readable because open source code exists to read/write it. All that is required is a build. I don’t expect C to go away any time soon (or C++, Fortran or similar). So though it might take a little work, we will always be able to read/write these formats.
This is good.
If Microsoft released OOXML with open source readers/writers, well, this wouldn’t be an issue … would it?
That noted, think of the huge amount of data going into SharePoint rather than wikis. All Microsoft has to do is shift the format and ….
This is vendor lock-in, and it represents a clear and present risk to ongoing business operations. Microsoft just demonstrated this. This is not theoretical: there are many angry people with files in old Office formats that won’t work anymore with new software.
How much time and money will they have to spend to fix this? Not Microsoft; I don’t expect a fix out of them. Moreover, given that these documents often must remain in pristine shape, unedited and unconverted as raw data, precisely why are more people signing on to the proprietary file formats? You can’t convert them, as the conversion may not preserve all the data, and SOX and related legislation require that you preserve all data.
Maybe this is why we are seeing mass migrations to open systems.
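The "pristine, unedited and unconverted" requirement above is something you can at least check mechanically, whatever format the files are in. Here is a minimal sketch in Python, assuming a plain directory of retained files; the function names (`sha256_of`, `build_manifest`, `changed_files`) are my own for illustration and not from any particular retention product. It records a SHA-256 digest per file and later reports anything that no longer matches.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or have different bytes."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

Build the manifest once when the records are archived, store it somewhere separate (as plain text, naturally), and rerun `changed_files` whenever you want to confirm nothing was edited or silently converted. It tells you the bits are intact; it cannot tell you whether anything can still open them.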
I have JPGs and BMPs and other formats from ~10 years ago that I can still read/view. I also have a Windows-based cell phone that refused to play a .wav file from my voicemail because it did not recognize the format. Of a .wav file. While it happily played other .wav files. Just not .wav files created with non-Microsoft software.
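One possible explanation, and I have not confirmed it for this phone, is that .wav is a RIFF container rather than a single encoding: the "fmt " chunk carries a format tag naming the codec actually used (1 is plain PCM, 6 is A-law, 7 is mu-law, 49 is GSM 6.10), and a player that only handles some tags will reject the rest. The sketch below, in Python, reads that tag from the raw bytes of a file; `wav_format_tag` is a name I made up for illustration.

```python
import struct


def wav_format_tag(data: bytes) -> int:
    """Return the codec format tag from the 'fmt ' chunk of a RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12  # first chunk starts right after the 12-byte RIFF header
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        if chunk_id == b"fmt ":
            if offset + 10 > len(data):
                raise ValueError("truncated fmt chunk")
            # The first 16-bit little-endian field of the chunk body is the tag.
            (format_tag,) = struct.unpack_from("<H", data, offset + 8)
            return format_tag
        # Chunk bodies are padded to an even number of bytes.
        offset += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")
```

Calling `wav_format_tag(open("message.wav", "rb").read())` on a voicemail file (the file name is hypothetical) shows which codec is inside. Because the container is documented, a few lines of code are enough to find out why a player is unhappy, which is rather the point of open formats.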
Arguably this Windows-based phone demonstrates beyond a shadow of a doubt why Windows should not be everywhere. But it is not the .wav file that is the issue. The phone may be the subject of a longer rant in the future. For now, suffice it to say that lots of Windows file formats sorta-kinda work on it. The IE on it sorta-kinda works. I especially like it when the sites I go to, like reuters.com, tell me it’s time to upgrade my browser.
I would love to. Only I can’t. It’s Windows-based.
Proprietary file formats are a risk. Even more so when the company that makes the proprietary format decides to no longer support it. And you are left dealing with their decisions. It’s your cost now. Your pain.
For the phone, I purchased a $30 piece of software that plays the media correctly. It does a far better job than the built-in player. Next phone will hopefully be an open platform. If only the Nokia N810 had an integrated phone. Would be perfect.