Wednesday 10 November 2010

The as yet unpaved publication pathway...

It has been a while since we had a whiteboard post, so I thought it was high time we had one! This delightful picture is the result of trying to explain the "Publication Pathway" - Susan's term for making our content available - to a new member of staff at the Library...

Nothing too startling here really - take some disparate sources of metadata, add a sprinkling of auto-gen'd metadata (using the marvelous FITS and the equally marvelous tools it wraps), migrate the arcane input formats to something useful, normalise and publish! (I'm thinking I might get "Normalise and Publish!" printed on a t-shirt! :-))

The blue box, CollectionBuilder, is what does most of the work - it constructs an in-memory tree of "components" from the EAD, tags the items onto the right shelfmarks, augments the items with additional metadata, and writes the whole lot out in a tidy directory structure that even includes a FOXML file with DC, PREMIS and RDF datastreams (the RDF is used to maintain the hierarchical relationships in the EAD). That all sounds a lot neater than it currently is, but, like all computer software, it is a work in progress that works, rather than a perfect end result! :-)
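
For the curious, here is a rough sketch of the "tree of components" idea in Python. It is not the real CollectionBuilder: the EAD fragment, the shelfmarks, the file names and every class and function name below are made up for illustration, and it assumes a simplified EAD with no namespace and unnumbered <c> elements. It parses the EAD into a tree, hangs items on the component with the matching shelfmark, and lists the child/parent pairs that an "isPartOf" relationship would record.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical, heavily simplified EAD (no namespace, unnumbered <c> elements).
EAD = """
<ead>
  <archdesc level="collection">
    <did><unitid>MS. Example 1-3</unitid><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c level="series">
        <did><unitid>MS. Example 1-2</unitid><unittitle>Correspondence</unittitle></did>
        <c level="file">
          <did><unitid>MS. Example 1</unitid><unittitle>Letters A-M</unittitle></did>
        </c>
        <c level="file">
          <did><unitid>MS. Example 2</unitid><unittitle>Letters N-Z</unittitle></did>
        </c>
      </c>
      <c level="series">
        <did><unitid>MS. Example 3</unitid><unittitle>Diaries</unittitle></did>
      </c>
    </dsc>
  </archdesc>
</ead>
"""


@dataclass
class Component:
    """One node in the in-memory tree (collection, series, file, ...)."""
    shelfmark: str
    title: str
    level: str
    children: list = field(default_factory=list)
    items: list = field(default_factory=list)


def build(elem):
    """Recursively turn an <archdesc> or <c> element into a Component."""
    did = elem.find("did")
    node = Component(
        shelfmark=did.findtext("unitid", default="").strip(),
        title=did.findtext("unittitle", default="").strip(),
        level=elem.get("level", ""),
    )
    # <archdesc> keeps its children under <dsc>; a <c> nests them directly.
    dsc = elem.find("dsc")
    container = dsc if dsc is not None else elem
    for child in container.findall("c"):
        node.children.append(build(child))
    return node


def index_by_shelfmark(node, index=None):
    """Flatten the tree into a shelfmark -> Component lookup."""
    index = {} if index is None else index
    index[node.shelfmark] = node
    for child in node.children:
        index_by_shelfmark(child, index)
    return index


def attach_items(index, items):
    """Tag (shelfmark, filename) pairs onto components; return the unmatched."""
    unmatched = []
    for shelfmark, filename in items:
        if shelfmark in index:
            index[shelfmark].items.append(filename)
        else:
            unmatched.append((shelfmark, filename))
    return unmatched


def part_of_pairs(node):
    """Yield (child shelfmark, parent shelfmark) for every edge in the tree."""
    for child in node.children:
        yield (child.shelfmark, node.shelfmark)
        yield from part_of_pairs(child)


if __name__ == "__main__":
    root = build(ET.fromstring(EAD).find("archdesc"))
    lookup = index_by_shelfmark(root)
    attach_items(lookup, [("MS. Example 1", "letter-001.doc")])
    for child, parent in part_of_pairs(root):
        print(child, "isPartOf", parent)
```

Keeping a shelfmark lookup alongside the tree means tagging an item is a single dictionary hit rather than a tree walk, and anything that doesn't match a shelfmark gets handed back for a human to look at.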

After that, we (will, it ain't quite there yet) push the metadata parts into the Web interface and from there index it and present it to our lovely readers!

Hooray!

The four boxes at the bottom are the "vhysical" layout - it's a new word I made up to describe what is essentially a physical (machine) architecture, but is in fact a bunch of virtual machines...

For the really attentive among you, this shot is of the whiteboard in its new home on the 2nd floor of Osney One, where Renhart and I have moved following a fairly major building renovation. Clearly we were too naughty to remain with the archivists! ;-)

3 comments:

Seth said...

What RDF ontology/terms are you using? I recently migrated our electronic records data into a Sesame RDF repository to play with. I am mostly using a combination of Nepomuk's NIE & NFO ontologies with Dublin Core, although I created my own archival terms ontology/namespace for "Accession" and "Collection" classes.

pixelatedpete said...

Hey Seth!

As the RDF is (at present) solely to record the hierarchical relationship between the collection, series, sub(-sub, etc)-series & items I just use Fedora Commons' RELS-EXT ontology [http://www.fedora.info/definitions/1/0/fedora-relsext-ontology.rdfs] "isPartOf" and "hasPart" - draws a pretty crazy graph I can tell ye!

All the descriptive stuff is plain old XML...

If EAD & RDF float your boat you may be interested in the LOCAH project [http://blogs.ukoln.ac.uk/locah/] if you are not already!

Pete

Seth said...

I hadn't come across the LOCAH project before. Thanks for that pointer!