Wednesday, 18 November 2009

Data Liberation: Google's mission

This is the stated mission of a Google engineering team called the 'Data Liberation Front':
Users should be able to control the data they store in any of Google's products. Our team's goal is to make it easier to move data in and out.
Yay! Loyalty is best achieved through great products, not data lock-in. As an individual who uses online data services, this approach makes me very happy. As an archivist, I'm ecstatic. Thanks, guys.

More about how to get data in and out of Google's many services at the DLF's blog.

Tuesday, 17 November 2009

building castles 1: the problem

It has been an odd couple of days. You know how it is. A problem that needs solving. A seemingly bewildering array of possible solutions and lots of opinions and no clear place to start. In an attempt to bring some shape to the mist, I'm going to start at the start, with the basics.

The Raw Materials
  • A collection of things.
  • A set of born digital items - mostly documents in antique formats.
  • EAD for the collection - hierarchical according to local custom and ISAD(G).
  • A spreadsheet - providing additional information about the digital items, including digests.
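Since the spreadsheet carries digests for the digital items, one early sanity check is to confirm the files on disk still match it. The following is only a rough sketch: it assumes the spreadsheet has been exported to CSV with columns called `filename` and `digest`, and that the digests are SHA-256. Both the column names and the algorithm are guesses for illustration, not a description of our actual spreadsheet.

```python
import csv
import hashlib
import io
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_manifest(csv_text, root, algorithm="sha256"):
    """Compare each recorded digest with the file found under root.

    Assumes hypothetical CSV columns 'filename' and 'digest'.
    Returns a dict mapping filename to 'ok', 'mismatch' or 'missing'.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["filename"]
        path = Path(root) / name
        if not path.is_file():
            results[name] = "missing"
        elif file_digest(path, algorithm) == row["digest"].strip().lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Anything reported as 'mismatch' or 'missing' would need looking at before the item goes anywhere near a reader interface.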
The Desired Result

A browser-based reader interface to the digital items that maintains the connections to the analogue components, remains faithful to the structure of the finding aid, and presents that structure in a way that does not confuse the reader. Ideally the interface should also support aspects of a collaborative Web, where people can annotate and comment, as well as offer "basket"-like functionality ("basket" is the wrong term), perhaps requests for copies, and maybe even the ability to arrange the collection how they'd like to use it.

(I imagine you've all got similar issues! :-))

We put together a sketch for the interface to the collection for the Project Advisory Board and got some very useful feedback from that. Our Graduate Trainee Victoria has also done some great research on interfaces to existing archives and some commercial sites which provides some marvellous input on what we should and could build.

But this is where things get misty...

We have some raw materials, we have a vision of the thing we want to build (though that vision is in parts hazy and in parts aiming high! (why not eh?)), so where do we go from here?

(To put it another way, there are the foundations of a "model", a vision of a "view"; now we need to define the "controller" - the thing that brings the first two together).

  • We could build a database and put all the metadata into it and run the site off that.

  • We could build a set of resources (the items, the sub[0,*]series, the collection, the people), link all that data together and run the site off that.

  • We could build a bunch of flat pages which, while generated dynamically once, don't change once the collection is up.

There is a strong contender for how it'll be done (the middle one!) and in the next exciting episode I'll hopefully be able to tell you more about the first tentative steps, but for now I'm open to suggestions - either for alternatives or technologies that'll help and if you have already built what we're after then please get in touch... ;-)


Thursday, 5 November 2009

Note to self...

...don't play space invaders on a donor's Mac (which is a PC)!

http://www.stfj.net/art/2009/loselose/

(Needless to say we wouldn't anyway!)

This, from the site:

"As technology grows, our understanding of it diminishes, yet, at the same time, it becomes increasingly important in our lives. At what point does our virtual data become as important to us as physical possessions? If we have reached that point already, what real objects do we value less than our data? What implications does trusting something so important to something we understand so poorly have?"

Wednesday, 21 October 2009

Bendy ePaper

Could this be the answer to our reading room interface?

http://www.reghardware.co.uk/2009/10/20/auo_epaper/

Thursday, 8 October 2009

Investigating Terms of Service

During the project advisory board meeting we briefly discussed the legal issues involved in archiving web2.0 sites. I’ve been doing a bit of investigating already, looking at what the various service providers’ Terms of Service say and thought I’d share what I’ve found.

Each Terms of Service is basically the same, though a few are a bit more specific about what is and is not allowed. Here’s a basic table where you can see briefly what each ToS contains (sorry it's a bit small):


As you can see, all the ToS agree that the account holder owns their content, which is good news for archivists as well as account holders, but they also agree that the site provider owns all copyright, trademarks, logos and any other intellectual property. This means that if an archive wants to harvest a site interface, not just a user’s data, then the site provider’s permission needs obtaining.

A second problem is that most sites restrict data harvesting. Facebook bans it outright, whereas Twitter only prohibits scraping; crawling is allowed “if done in accordance with the provisions of the robots.txt file” (which aren't stated). Myspace, meanwhile, only prohibits automated data harvesting “for the purposes of sending unsolicited or unauthorised material”, which implies that harvesting data for archival purposes is allowed. However, this isn’t stated directly, and since some stipulations are quite specific I’d be inclined to check with the service provider rather than rely on assumptions.
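For the robots.txt condition, a crawler can at least test a URL against a site's rules before fetching anything. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `archive-bot` user agent and the example.org URLs are all made up for illustration; they are not any provider's real rules, and passing this check says nothing about what the Terms of Service permit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT taken from any real service.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_text, user_agent, url):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/public/page",
                "http://example.org/private/page"):
        print(url, allowed(SAMPLE_ROBOTS, "archive-bot", url))
```

In a real harvest the parser would be pointed at the live file with `set_url()` and `read()` rather than a pasted string.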

Interestingly, Twitter used to have a rather vague ToS which said nothing about other people using their logos and trademarks. However, they updated their terms on 18th September and now restrictions on using Twitter’s intellectual property are written in.

So altogether it looks like an archivist can’t do much with a web2.0 account without the service provider’s permission. Now it just depends how amenable they’d be to granting it.

Wednesday, 30 September 2009

Advisory board meeting, 24 Sept. 2009

Thanks to everyone who came along and contributed to the project's first advisory board meeting last Thursday.

Introductions
We started with some introductory discussions around the Library's hybrid collections and the futureArch project's aims and activities. This discussion was wide ranging, touching on a number of subjects including the potential content sources for 'digital manuscripts': from mobile phones, to digital media, to cloud materials.

Systems
In the past year, we've made progress on developing, and beginning to implement, the technical architecture for BEAM (Bodleian Electronic Archives & Manuscripts). Pete Cliff (futureArch Software Engineer) kicked off our session on 'systems' with an overview of the architecture, drawing on some particular highlights; it's worth a look at his slides if you're interested in finding out more.

Next, a demo of two ingest tools:
1. Renhart Gittens demonstrated the BEAM Ingester, our means of committing accessions (under a collection umbrella) to BEAM's preservation storage.


2. Dave Thompson (Wellcome Library Digital Curator) demonstrated the XIP creator. This tool does a similar job to the BEAM Ingester and forms part of the Tessella digital preservation system being implemented at the Wellcome Library.

Keeping with technical architecture, Neil Jefferies (OULS R&D Project Officer) introduced Oxford University Library Service's Digital Asset Management System (or DAMS, as we've taken to calling it). This is the resilient preservation store upon which BEAM, and other digital repositories, will sit.

How will researchers use hybrid archives?
Next we turned our attention to the needs of the researchers who will use the Library's hybrid archives. Matt Kirschenbaum (Assoc. Prof. of English & Assoc. Director of MITH at the University of Maryland) got us off to a great start with an overview of his work as a researcher working with born-digital materials. Matt's talk emphasised digital archives as 'material culture', an aspect of digital manuscripts that can be overlooked when the focus becomes overly content-driven. Some researchers want to explore the writer's writing environment; this includes seeing the writer's desktop, and looking at their MP3 playlist, as much as examining the word-processed files generated on a given computer. Look out for the paper Matt has co-authored for iPRES this year.

Next we broke into groups to critique the 'interim interface' which will serve as a temporary access mechanism for digital archives while a more sophisticated interface is developed for BEAM. Feedback from the advisory board critique session was helpful and we've come away with a to-do list of bug fixes and enhancements for the interim interface as well as ideas for developing BEAM's researcher interfaces. We expect to take work on researcher requirements further next year (2010) through workshops with researchers.

Finally, we heard from Helen Hockx-Yu (British Library's Web Archiving Programme Manager) on the state of the art in web archiving. Helen kindly agreed to give us an overview of web archiving processes and the range of web archiving solutions available. Her talk covered all the options, from implementing existing tool suites in-house to outsourcing some/all of the activity. This was enormously useful and should inform conversations about the desired scope of web archiving activity at the Bodleian and the most appropriate means by which this could be supported.

Some of us continued the conversation into a sunny autumn evening on the terrace of the Fellows' Garden of Exeter College, and then over dinner.

Monday, 14 September 2009

OS recovery tool

May be useful. Cross-platform and supports a few kinds of media and disk image. Also uses file signature analysis (which can be expanded to support further identification) and is capable of carving out files. http://www.cgsecurity.org/wiki/PhotoRec
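
For anyone unfamiliar with file signature analysis, the idea is to recognise a format from the "magic" bytes at the start of a file rather than from its name, and carving begins by scanning raw data for those same byte patterns. The sketch below is a toy illustration of the idea only and is not how PhotoRec itself is implemented; the function names and the short signature table are my own.

```python
# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data):
    """Return a label if data starts with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def find_candidates(blob):
    """Return sorted (offset, label) pairs for every signature found in blob.

    This is only the first step of carving: it locates possible file
    starts and does not work out where each file ends.
    """
    hits = []
    for magic, label in SIGNATURES:
        start = blob.find(magic)
        while start != -1:
            hits.append((start, label))
            start = blob.find(magic, start + 1)
    return sorted(hits)
```

Expanding the table with further signatures is the kind of extension the tool's documentation refers to, though real tools also check structure beyond the first few bytes.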