Monday 15 June 2009

CAIRO Content Model: A noob's overview (Part 2) Accessions

I-E-O-2. It is a curious one and no mistake. The idea of an accession wasn't something I'd come across before (though alien concepts are no surprise to me now having worked in libraries for some time) - and if you really want to confuse yourself about what it is, try this Wikipedia article on 'accession'.

All clear now? Didn't think so! (Though the bit about the museum is probably closest). For the purpose of ingest, I'm not too worried about legal ownership (this must be assumed) so my working definition - and I make no claims that it is a good or accurate - is:

"an accession is a unit of stuff that arrives at a given time, at a given archive, and needs to be added to that archive"

An accession then is defined by its origin, date of arrival and the collection to which it belongs.

Like IE01, IE02 uses the METS header element to record agent data - who did what and when to this record. This will usually "created by ingest process", "updated by CAIRO tool by user X", etc.

In IE02, the descriptive metadata is kept deliberately Spartan. This is because the concept of the accession remains largely transparent to researchers. While it is important to the archivist to know the source of the collection's parts, the researcher really just needs the collection and the items. So, dmdSec gives us space for an accession identifier (unitId) and title (unitTitle) (using controlled formats - which may or may not be specified yet - I'll find out as I move through the model!) and then a minimal EAD/DC/MODS description. Since we're an archive, I focussed on the EAD, but cross-walking to the others would be possible.

The dmdSec EAD description is quite minimal, listing just origination information (pricipal creator), a physical description (extent, in MBs), a description of the formats and a description of the software/hardware environment used. All these are given as free-text, manual fields, but I wonder how far we can get offering automated "tips" for these elements.

There is also room for a pointer to some rights statement (amdSec) (as with IE01) and also a pointer to a log file that records activity over this accession (fileSec). The format of that log file remains undefined, but it'll record things like "checked all files for viruses", "identified 15 obsolete files", "transformed obsolete formats to ODT", etc. We'll have to think about how we implement this linkage. The model says use a URN to point at the log file, but I wonder if we use a URI that points to a canned search of a generally logging service - something along the lines of beam.ouls.ox.ac.uk/audit?accessionID=12345. That way we can dynamically generate log reports for each accession. Needs thinking about anyways.

Finally there is, in the structMap, a map to the other "main divisions" of the accession and the model suggests that these are either folders or files (different types are given - like subject folder or email directory) and so it is apparent that the accession structMap could be used to reflect the entire accession structure or just the top level directories. Which is better is unclear, but I suspect we will be adding a manifest to the accession that does list all the files (so the structure can be browsed without getting near the real data objects and, if we do it that way, without having to parse the IE02 too much).

We could put a pointer to the manifest alongside the log file pointer in fileSec.

I guess that is probably no clearer than the aforementioned wikipedia page but at least I never used the word "Inaedificatio"!

No comments: