Monday, 21 June 2010

Par2ty

This is probably an old and battered hat for you good folks (seeing as the Web site's last "announcement" was in 2004!), but most days I still feel pretty new to this whole digital archiving business - not just the "archive" bit, but also the "digital preservation", um, bit - so it was news to me... ;-)

Perusing the latest Linux Format at the weekend, I chanced on an article by Ben Martin (I couldn't find a Web site for him...) about parchive and specifically par2cmdline.

Par-what, I hear you ask? (Or perhaps "oh yeah, that old thing" ;-))

Par2 files are what the article calls "error correcting files". A bit like checksums, only once created they can be used to repair the original file in the event of bit/byte level damage.
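To make the "error correcting file" idea a little more concrete, here is a toy sketch in Python. It is emphatically not how par2 itself works: par2 uses Reed-Solomon codes, which can rebuild several damaged blocks, whereas this sketch swaps in plain XOR parity, which can rebuild exactly one. The function names, the tiny block size and the dictionary standing in for a recovery file are all made up for illustration, and it assumes the damaged copy is still the same length as the original.

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut data into equal-sized blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_recovery(data: bytes) -> dict:
    """Build a stand-in 'recovery file': per-block checksums plus one parity block."""
    blocks = split_blocks(data)
    return {
        "length": len(data),
        "checksums": [hashlib.sha256(b).hexdigest() for b in blocks],
        "parity": xor_blocks(blocks),
    }


def repair(damaged: bytes, recovery: dict) -> bytes:
    """Locate a damaged block via the checksums and rebuild it from the parity."""
    blocks = split_blocks(damaged)
    bad = [i for i, b in enumerate(blocks)
           if hashlib.sha256(b).hexdigest() != recovery["checksums"][i]]
    if len(bad) > 1:
        raise ValueError("XOR parity can only rebuild one damaged block")
    if bad:
        # XOR of all the good blocks and the parity block gives back the lost block.
        good = [b for i, b in enumerate(blocks) if i != bad[0]]
        blocks[bad[0]] = xor_blocks(good + [recovery["parity"]])
    return b"".join(blocks)[:recovery["length"]]


original = b"Doctor Who disk image"
recovery = make_recovery(original)
damaged = original[:5] + b"X" + original[6:]   # flip one byte, dd-style
print(repair(damaged, recovery) == original)
```

The checksums say which block is damaged and the parity block supplies the missing information, so the recovery data only needs to be as big as the damage you want to survive, not as big as the file.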

Curious.

So I duly installed par2. Did I mention how wonderful Linux (Ubuntu in this case) is? The install was simple:

sudo apt-get install par2

Then I tried it out on a 300MB Mac disk image - the new Doctor Who game from the BBC - and guess what? It works! Create the par2 files, do some damage to the image with dd, run the verify again and it says "the file is damaged, but I can fix it" in a reassuring HAL-like way (that could be my imagination; it didn't really talk - and if it did, it'd probably be best not to trust it to fix the file right...).

The par2 files totalled around 9MB at "5% redundancy" - not quite sure what that means - which isn't much of an overhead for some extra data security... I think, though I've not tried, that it is integrated into KDE4 too for a little bit of personal file protection.

The interesting thing about par2 is that it comes from an age when bandwidth was limited. If you downloaded a large file and it was corrupt, rather than have to download it again, you simply downloaded the (much smaller) par2 file that had the power to fix your download.

This got me thinking. Is there then any scope for archives to share par2 files with each other? (Do they already?) We cannot exchange confidential data but perhaps we could share the par2 files, a little like a pseudo-mini-LOCKSS?

All that said, I'm not quite sure we will use parchive here, though it'd be pretty easy to create the par2 files on ingest. In theory our use of ZFS, RAID, etc. should be covering this level of data security for us, but I guess it remains an interesting question - would anything be gained by keeping par2 data alongside our disk images? And, after Dundee, would smaller archives be able to get some of the protection offered by things like ZFS, but in a smaller, lighter way?

Oh, and Happy Summer Solstice!
