Friday 17 April 2009

Terabyte Terror

I knew I'd come to the right place for work one morning when I was talking with Renhart and Susan and mentioned that I was very excited at having discovered a Maplin just minutes' walk from the office. Instead of making a hasty retreat from the conversation, both of them gave me knowing smiles and agreed that Maplin was wonderful!

From my early days watching my Dad teach electronics I've loved the smell of soldering, the look of components and the idea that you can make your own set of LEDs flash just for the fun of it. Thumbing through the Maplin catalogue with a cup of tea was once one of my favourite pastimes. But these days, more and more, I get a sense of dread as I check out the special offers.

Why?

Let me give you an example: 1TB External Drive, £99.
Here is another: 1TB Internal Drive, £89.

You read that right - 1 terabyte of storage for under £100! Doesn't that make you quake? Probably not, but I can't help but wonder how long it will be before we have to accession a 1TB drive. What do we do with it? Do we even know what that amount of detritus represents, accumulated over, well, how long? A lifetime? A couple of evenings with iTunes? We don't know how long it'll take the average person to fill up a 1TB drive. Do we have the capacity to store 1TB of data and, even if we do, how sustainable is that?

You could argue that since storage like this is so cheap, we can rest assured that our own storage costs will fall too, so we will always keep up with the growth of consumer storage. It is a fair point, but how many preservation-grade storage devices can manage 10p a GB? None, I imagine, and for good reason. There is a whole lot more to a preservation system than a disk and a plastic case - it takes more than 1TB to keep 1TB safe for a start! (Mind you, I couldn't help but smile at Maplin's promise of "Peace of mind with 5 year limited warranty").

If we cannot keep up with the storage, then what do we do? A brute force method would be to compress the data, but then bit rot becomes a much more worrying issue (and it is pretty worrying already). We could look for duplicates - how many MP3 collections will include the same songs, for instance, and should we keep them all (if any)? What if it is the same song, with a different encoding/bitrate/whatever? What about copies of OSs - all those i386 directories? (Though arguably an external drive will not contain an OS, so we won't save space there).
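
To make the duplicate idea a little more concrete, here is a minimal sketch in Python of one way it might be done: walk a directory tree, take a SHA-256 checksum of each file and group the files whose checksums match. The function names (file_digest, find_duplicates) and the choice of SHA-256 are my own assumptions for illustration, not part of any particular preservation tool. Note that it only finds byte-for-byte identical copies - the same song at a different bitrate would sail straight past it.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never sit wholly in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content.

    Returns a dict mapping digest -> sorted list of paths, keeping only
    digests shared by two or more files (hypothetical helper, for illustration).
    """
    by_digest = defaultdict(list)
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling find_duplicates on the mount point of an accessioned drive would give a rough idea of how much space exact copies are taking up, though deciding which copy (if any) to keep is still a curatorial question rather than a technical one.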

We probably don't need or want to keep all of those 1000GBs, but how will we identify what to preserve? Susan and Renhart came up with some answers to this with their brilliant Paradigm project - which I'll paraphrase as "encourage the creators to curate their own data" - and I'm hopeful that will happen, but what if it doesn't? Will we see "personal data curation" and "managing information overload" added to the National Curriculum anytime soon? I hope so!

All of which finally gives me reason to stop worrying about cheap terabytes! Data is going to keep growing and someone is going to have to help manage all that stuff. I guess that is where we fit in.
