Tuesday, 13 November 2012
Transcribe at the arcHIVE
Friday, 19 October 2012
Atlas of digital damages
Saturday, 13 October 2012
DayOfDigitalArchives 2012
This 'Day' was initiated last year to encourage those working with digital archives to use social media to raise awareness of digital archives: "By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?" So in that spirit, here is a whizz through my week.
Coincidentally not only does this week include the Day of Digital Archives but it's also the week that the Digital Preservation Coalition (or DPC) celebrated its 10th birthday. On Monday afternoon I went to the reception at the House of Lords to celebrate that landmark anniversary. A lovely event, during which the shortlist for the three digital preservation awards was announced. It's great to see three award categories this time around, including one that takes a longer view: 'the most outstanding contribution to digital preservation in the last decade'. That's quite an accolade.
On the train journey home from the awards I found some quiet time to review a guidance document on the subject of acquiring born-digital materials. There is something about being on a train that puts my brain in the right mode for this kind of work. Nearing its final form, this guidance is the result of a collaboration between colleagues from a handful of archive repositories. The document will be out for further review before too long, and if we've been successful in our work it should prove helpful to creators, donors, dealers and repositories.
Part of Tuesday I spent reviewing oral history guidance drafted by a colleague to support the efforts of Oxford Medical Alumni in recording interviews with significant figures in the world of Oxford medicine. Oral histories come to us in both analogue and digital formats these days, and we try to digitise the former as and when we can. The development of the guidance is in the context of our Saving Oxford Medicine initiative to capture important sources for the recent history of medicine in Oxford. One of the core activities of this initiative is survey work, and it is notable that many archives surveyed include plenty of digital material. Web archiving is another element of the 'capturing' work that the Saving Oxford Medicine team has been doing, and you can see what has been archived to date via Archive-It, our web archiving service provider.
Much of Wednesday morning was given over to a meeting of our building committee, which had very little to do with digital archives! In the afternoon, however, we were pleased to welcome visitors from MIT - Nancy McGovern and Kari Smith. I find visits like these are one of the most important ways of sharing information, experiences and know-how, and as always I got a lot out of it. I hope Nancy and Kari did too! That same afternoon, colleagues returned from a trip to London to collect another tranche of a personal archive. I'm not sure if this instalment contains much in the way of digital material, but previous ones have included hundreds of floppies and optical media, some zip discs and two hard disks. Also arriving on Wednesday, some digital Library records courtesy of our newly retired Executive Secretary; these supplement materials uploaded to BEAM (our digital archives repository) last week.
On Thursday, I found some time to work with developer Carl Wilson on our SPRUCE-funded project. Becky Nielsen (our recent trainee, now studying at Glasgow) kicked off this short project with Carl, following on from her collaboration with Peter May at a SPRUCE mashup in Glasgow. I'm picking up some of the latter stages of testing and feedback work now Becky's started her studies. The development process has been an agile one with lots of chat and testing. I've found this very productive - it's motivating to see things evolving, and to be able to provide feedback early and often. For now you can follow progress on GitHub, though the location will likely change once we settle on a name that's more useful than 'spruce-beam' (doesn't tell you much, does it?! Something to do with trees...) One of the primary aims of this tool is to facilitate collection analysis, so we know better what our holdings are in terms of format and content. We expect that it will be useful to others, and there will be more info on it available soon.
Friday was more SPRUCE work with Carl, among other things. Also a few meetings today - one around funding and service models for digital archiving, and a meeting of the Bodleian's eLegal Deposit Group (where my special interest is web archiving). The curious can read more about e-legal deposit at the DCMS website. One fun thing that came out of the day was that the Saving Oxford Medicine team decided to participate in a Women in Science Wikipedia editathon. This will be hosted by the Radcliffe Science Library on 26 October as part of a series of 'Engage' events on social media organised by the Bodleian and the University's Computing Services. It's fascinating to contemplate how the range and content of Wikipedia articles change over time, something a web archive would facilitate perhaps.
For more on working with digital archives, go take a look at the great posts at the Day of Digital Archives blog!
Friday, 8 June 2012
Sprucing up the TikaFileIdentifier
Following the SPRUCE mashup I attended in April, we are very pleased to be one of the organizations granted a SPRUCE Project funding award, which will allow us to 'spruce' up the TikaFileIdentifier tool. (Paul has written more about these funding awards on the OPF site.)
TikaFileIdentifier is the tool which was developed at the mashup to address a problem several of us were having extracting metadata from batches of files, in our case within ISO images. Due to the nature of the mashup event the tool is still a bit rough around the edges, and this funding will allow us to improve on it. We aim to create a user interface and a simpler install process, and carry out performance improvements. Plus, if resources allow, we hope to scope some further functionality improvements.
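TikaFileIdentifier, as its name suggests, builds on Apache Tika; the sketch below does not use Tika at all. It is a minimal, standard-library-only Python illustration of the batch idea, assuming the ISO image has already been mounted or extracted to a local directory: read the first few bytes of every file, compare them against a handful of well-known signatures, and tally a format profile for the whole image. The names `SIGNATURES`, `sniff` and `profile` are hypothetical, and the signature list is a tiny illustrative subset.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny signature table: (leading bytes, MIME type).
# Real identification tools (Tika, DROID, file) use far larger registries.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),  # also the wrapper for .docx, .xlsx, .odt
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),  # legacy .doc/.xls
]

UNKNOWN = "application/octet-stream"


def sniff(path: Path, head_len: int = 16) -> str:
    """Guess a MIME type from a file's leading bytes; UNKNOWN if nothing matches."""
    with path.open("rb") as f:
        head = f.read(head_len)
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return UNKNOWN


def profile(root: str) -> Counter:
    """Walk every regular file under root and count the guessed types."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[sniff(path)] += 1
    return counts
```

Calling `profile("/mnt/iso")` on a mounted image (the mount point is only an example) returns a `Counter` whose `most_common()` method lists the formats found, most frequent first. Looking at file contents rather than trusting extensions is what makes this kind of batch check more reliable than a quick glance at filenames.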
This is really great news, as with the improvements that this funding allows us to make, the TikaFileIdentifier will provide us with better metadata for our digital files more efficiently than our current system of manually checking each file in a disk image. Hopefully the simpler user interface and other improvements mean that other repositories will want to make use of it as well; I certainly think it will be very useful!
Friday, 20 April 2012
SPRUCE Mashup: 16th-18th April 2012
Monday, 26 March 2012
Media Recognition: DV part 3
DVCAM
Type: | Digital videotape cassette encoding |
Introduced: | 1996 |
Active: | Yes, but few new camcorders are being produced. |
Cessation: | - |
Capacity: | 184 minutes (large), 40 minutes (MiniDV). |
Compatibility: | DVCAM is an enhancement of the widely adopted DV format, and uses the same encoding. Cassettes recorded in DVCAM format can be played back in DVCAM VTRs (Video Tape Recorders), newer DV VTRs (made after the introduction of DVCAM), and DVCPRO VTRs, as long as the correct settings are specified (this resamples the signal to 4:1:1). DVCAM can also be played back in compatible HDV players. |
Users: | Professional / Industrial. |
File Systems: | - |
Common Manufacturers: | Sony, Ikegami. |
HDV
Type: | Digital videotape cassette encoding |
Introduced: | 2003 |
Active: | Yes, although industry experts do not expect many new HDV products. |
Cessation: | - |
Capacity: | 1 hour (MiniDV), up to 4.5 hours (large). |
Compatibility: | Video is recorded in the popular MPEG-2 video format. Files can be transferred to computers without loss of quality using an IEEE 1394 connection. There are two types of HDV, HDV 720p and HDV 1080, which are not cross-compatible. HDV can be played back in HDV VTRs, which are often able to support other formats such as DV and DVCAM. |
Users: | Amateur / Professional. |
File Systems: | - |
Common Manufacturers: | Format developed by JVC, Sony, Canon and Sharp. |
Media Recognition: DV part 2
DV
Type: | Digital videotape cassette encoding |
Introduced: | 1995 |
Active: | Yes, but tapeless recording (using formats such as MPEG-1, MPEG-2 and MPEG-4) is becoming more popular. |
Cessation: | - |
Capacity: | MiniDV cassettes can hold up to 80/120 minutes SP/LP. Medium cassette size can hold up to 3.0/4.6 hrs SP/LP. File sizes can be up to 1GB per 4 minutes of recording. |
Compatibility: | DV format is widely adopted. Cassettes recorded in the DV format can be played back on DVCAM, DVCPRO and HDV replay devices. However, LP recordings cannot be played back in these machines. |
Users: | DV is aimed at a consumer market; it may also be used by ‘prosumer’ film makers. |
File Systems: | - |
Common Manufacturers: | A consortium of over 60 manufacturers including Sony, Panasonic, JVC, Canon, and Sharp. |
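The capacity row above gives a rule of thumb of up to 1GB per 4 minutes of DV recording. That figure can be checked with a little arithmetic, because DV stores every frame (video, audio and subcode together) in a fixed-size block. The sketch below assumes the standard 25 Mbit/s DV frame structure as we understand it: 10 DIF sequences of 12,000 bytes per frame for 525-line (NTSC) video and 12 for 625-line (PAL) video. It ignores any container overhead when the stream is wrapped as an AVI or QuickTime file, and the function name `dv_bytes` is hypothetical.

```python
# Bytes per DV frame: 10 DIF sequences x 12,000 bytes for NTSC, 12 for PAL.
FRAME_BYTES = {"PAL": 12 * 12_000, "NTSC": 10 * 12_000}
# Nominal frame rates in frames per second.
FRAME_RATE = {"PAL": 25.0, "NTSC": 30000 / 1001}


def dv_bytes(minutes: float, system: str = "PAL") -> int:
    """Approximate size in bytes of a raw DV stream of the given duration."""
    seconds = minutes * 60
    return round(FRAME_BYTES[system] * FRAME_RATE[system] * seconds)


for system in ("PAL", "NTSC"):
    print(f"{system}: {dv_bytes(4, system) / 1e9:.2f} GB per 4 minutes")
# PAL: 0.86 GB per 4 minutes
# NTSC: 0.86 GB per 4 minutes
```

Both systems work out at roughly 3.6 MB per second, or about 13 GB per hour, so "up to 1GB per 4 minutes" is a safe upper bound when planning storage for transfers.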
DVCPRO
Type: | Digital videotape cassette encoding |
Introduced: | 1995 (DVCPRO), 1997 (DVCPRO 50), 2000 (DVCPRO HD) |
Active: | Yes, but few new camcorders are being produced. |
Cessation: | - |
Capacity: | 126 minutes (large), 66 minutes (medium). |
Compatibility: | DVCPRO is an enhancement of the widely adopted DV format, and uses the same encoding. Cassettes recorded in DVCPRO format can be played back only in DVCPRO Video Tape Recorders (VTRs) and some DVCAM VTRs. |
Users: | Professional / Industrial; designed for electronic news gathering. |
File Systems: | - |
Common Manufacturers: | Panasonic, also Philips, Ikegami and Hitachi. |
DVCPRO 50 and DVCPRO HD are further developments of DVCPRO, which use the equivalent of 2 or 4 DV codecs in parallel to increase the video data rate.
Any DV cassette can contain DVCPRO format video, but some are sold with DVCPRO branding on them.
Recognition
DVCPRO branded cassettes come in medium (97.5 × 64.5 × 14.6mm) or large (125 × 78 × 14.6mm) cassette sizes. The medium size is for use in camcorders, and the large size in editing and recording decks. DVCPRO 50 and DVCPRO HD branded cassettes are extra-large cassettes (172 x 102 x 14.6mm). Tape width is ¼”.
DVCPRO labelled cassettes have different coloured tape doors depending on their type; DVCPRO has a yellow tape door, DVCPRO50 has a blue tape door, and DVCPRO HD has a red tape door.
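For DVCPRO-branded cassettes, the size and tape-door colour described above are enough for a first guess at what a cassette is and where it was meant to be used. The sketch below simply encodes those two rules as lookups; the function name and the wording it returns are hypothetical, and it says nothing about cassettes without DVCPRO branding, which may carry any encoding.

```python
from typing import Optional

# Tape-door colour on DVCPRO-branded cassettes -> DVCPRO variant.
DOOR_COLOUR_TO_VARIANT = {
    "yellow": "DVCPRO",
    "blue": "DVCPRO 50",
    "red": "DVCPRO HD",
}

# Intended use by cassette size, as described for DVCPRO cassettes.
SIZE_TO_USE = {
    "medium": "camcorders",
    "large": "editing and recording decks",
}


def describe_dvcpro_cassette(size: str, door_colour: str) -> Optional[str]:
    """Describe a DVCPRO-branded cassette, or return None for an unknown door colour."""
    variant = DOOR_COLOUR_TO_VARIANT.get(door_colour.strip().lower())
    if variant is None:
        return None
    size = size.strip().lower()
    use = SIZE_TO_USE.get(size)
    if use is None:
        return f"{variant}, {size} cassette"
    return f"{variant}, {size} cassette for {use}"
```

For example, a medium cassette with a yellow door comes back as a DVCPRO cassette intended for camcorders; the printed label, where present, remains the better guide to what is actually recorded.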
Images of DVCPRO cassettes are available at the Panasonic website.
Media Recognition: DV part 1
DV tape is ¼ inch (6.35mm) wide. DV cassettes come in four different sizes: Small, also known as MiniDV (66 x 48 x 12.2 mm), medium (97.5 × 64.5 × 14.6 mm), large (125.1 x 78 x 14.6 mm), and extra-large (172 x 102 x 14.6 mm). MiniDV is the most popular cassette size.
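Because the four sizes differ by tens of millimetres, a quick measurement is enough to tell them apart. A minimal sketch, assuming measurements in millimetres taken in any orientation and a hypothetical tolerance of 2 mm; the function name is also hypothetical, and it identifies the cassette size only, not the encoding recorded on it.

```python
from typing import Optional

# Nominal DV cassette dimensions in mm (longest side first), from the text above.
DV_CASSETTE_SIZES = {
    "small (MiniDV)": (66.0, 48.0, 12.2),
    "medium": (97.5, 64.5, 14.6),
    "large": (125.1, 78.0, 14.6),
    "extra-large": (172.0, 102.0, 14.6),
}


def classify_dv_cassette(a: float, b: float, c: float, tol_mm: float = 2.0) -> Optional[str]:
    """Return the size whose nominal dimensions all lie within tol_mm, else None."""
    measured = sorted((a, b, c), reverse=True)  # orientation no longer matters
    for name, nominal in DV_CASSETTE_SIZES.items():
        if all(abs(m - n) <= tol_mm for m, n in zip(measured, nominal)):
            return name
    return None
```

Sorting the measurements means a cassette measured on its side still matches; anything that fits none of the four sizes returns `None` and probably is not a DV cassette at all.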
DV cassettes can be encoded with one of four formats: DV, DVCAM, DVCPRO, or HDV. DV is the original encoding, and is used in consumer devices. DVCPRO and DVCAM were developed by Panasonic and Sony respectively as enhancements of DV, and are aimed at a professional market. The basic encoding algorithm is the same as with DV, but a wider track (18 and 15 microns respectively, versus DV’s 10 micron track width) and a faster tape speed mean that these formats are more robust and better suited to professional users. HDV is a high-definition variant, aimed at professionals and consumers, which uses MPEG-2 compression rather than the DV format.
Depending on the recording device, any of the four DV encodings can be recorded on any size DV cassette. However, due to different recording speeds, the formats are not always backwards compatible. A cassette recorded in an enhanced format, such as HDV, DVCAM or DVCPRO, will not play back on a standard DV player. Also, as they are supported by different companies, there are some issues with playing back a DVCPRO cassette on DVCAM equipment, and vice versa.
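The playback rules scattered through this series can be gathered into a single lookup, which may help when choosing a deck for a transfer. This is a minimal sketch that only summarises the notes in these posts (it is not a manufacturer compatibility chart); pairs it does not list are reported as unsupported, and the names `PLAYBACK` and `can_play` are hypothetical.

```python
from typing import Dict, Tuple

# (recording format, deck type) -> (verdict, caveat), summarising these notes.
# "maybe" marks cases that depend on the particular deck or its settings.
PLAYBACK: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("DV", "DV"): ("yes", ""),
    ("DV", "DVCAM"): ("yes", "SP recordings only; LP will not play"),
    ("DV", "DVCPRO"): ("yes", "SP recordings only; LP will not play"),
    ("DV", "HDV"): ("yes", "SP recordings only; LP will not play"),
    ("DVCAM", "DVCAM"): ("yes", ""),
    ("DVCAM", "DV"): ("maybe", "only DV decks made after DVCAM was introduced"),
    ("DVCAM", "DVCPRO"): ("maybe", "needs the correct settings; resamples to 4:1:1"),
    ("DVCAM", "HDV"): ("maybe", "compatible HDV players only"),
    ("DVCPRO", "DVCPRO"): ("yes", ""),
    ("DVCPRO", "DVCAM"): ("maybe", "some DVCAM decks only"),
    ("HDV", "HDV"): ("maybe", "HDV 720p and HDV 1080 are not cross-compatible"),
}


def can_play(recording: str, deck: str) -> Tuple[str, str]:
    """Look up whether a deck type can play a recording format (case-insensitive)."""
    key = (recording.upper(), deck.upper())
    return PLAYBACK.get(key, ("no", "not listed as supported in these notes"))
```

Asking `can_play("DVCPRO", "DV")` returns a "no" verdict, in line with the point that enhanced formats will not play back on a standard DV player.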
Although all DV cassette sizes can record any format of DV, some are marketed specifically as being of a certain type; e.g. DVCAM. The guide below looks at some of the most common varieties of DV cassette that might be encountered, and the encodings that may be used with them. It is important to remember that any type of encoding may be found on any kind of cassette, depending on what system the video was recorded on.
MiniDV (cassette)
Type: | Digital videotape cassette |
Introduced: | 1995 |
Active: | Yes, but it is being overtaken in popularity by hard disk and flash memory recording; no tape-based camcorders were presented at the 2011 International Consumer Electronics Show. |
Cessation: | - |
Capacity: | Up to 80 minutes SP / 120 minutes LP, depending on the tape used; 60/90 minutes SP/LP is standard. This can also depend on the encoding used (see further entries). File sizes can be up to 1GB per 4 minutes of recording. |
Compatibility: | DV file format is widely adopted. Requires a FireWire (IEEE 1394) port for best transfer. |
Users: | Consumer and ‘prosumer’ film makers, some professionals. |
File Systems: | - |
Common Manufacturers: | A consortium of over 60 manufacturers including Sony, Panasonic, JVC, Canon, and Sharp. |
MiniDV refers to the size of the cassette; as noted above, it can come with any encoding. As a consumer format, MiniDV cassettes generally use DV encoding. DVCAM and HDV cassettes also come in MiniDV size.
MiniDV is the most popular DV cassette, and is used for consumer and semi-professional (‘prosumer’) recordings due to its high quality.
Recognition
These cassettes are the small cassette size, measuring 66 x 48 x 12.2mm. Tape width is ¼”. They carry the MiniDV logo.
Monday, 30 January 2012
Digital Preservation: What I Wish I Knew Before I Started
Tuesday 24th January, 2012
Last week I attended a student conference, hosted by the Digital Preservation Coalition, on what digital preservation professionals wished they had known before they started. The event covered many of the challenges faced by those involved in digital preservation, and the skills required to deal with these challenges.
The similarities between traditional archiving and digital preservation were highlighted at the beginning of the afternoon, when Sarah Higgins translated terms from the OAIS model into more traditional ‘archive speak’. Dave Thompson also emphasized this connection, arguing that digital data “is just a new kind of paper”, and that trained archivists already have 85-90% of the skills needed for digital preservation.
Digital preservation was shown to be a human rather than a technical challenge. Adrian Brown argued that much of the preservation process (the "boring stuff") can be automated. Dave Thompson stated that many of the technical issues of digital preservation, such as migration, have been solved, and that the challenge we now face is to retain the context and significance of the data. The point made throughout the afternoon was that you don’t need to be a computer expert in order to carry out effective digital preservation.
The urgency of intervention was another key lesson for the afternoon. As William Kilbride put it: digital preservation won’t do itself, won’t go away, and we shouldn't wait for perfection before we begin to act. Access to data in the future is not guaranteed without input now, and digital data is particularly intolerant of gaps in preservation. Andrew Fetherstone added to this argument, noting that doing something is (usually) better than doing nothing, and that even if you are not in a position to carry out the whole preservation process, it is better to follow the guidelines as far as you can, rather than wait and create a backlog.
The scale of digital preservation was another point illustrated throughout the afternoon. William Kilbride suggested that the days of manual processing are over, due to the sheer amount of digital data being created (estimated to reach 35ZB by 2020!). He argued that the ability to process this data is more important to the future of digital preservation than the risks of obsolescence. The impossibility of preserving all of this data was illustrated by Helen Hockx-Yu, who offered the statistic that the UK Web Archive and National Archives Web Archive combined have archived less than 1% of UK websites. Adrian Brown also pointed out that as we move towards dynamic, individualised content on the web, we must decide exactly what the information is that we are trying to preserve. During the Q&A session, it was argued that the scale of digital data means that we have to accept that we can’t preserve everything, that not everything needs to be preserved, and that there will be data loss.
The importance of collaboration was another theme which was repeated by many speakers. Collaboration between institutions on a local, national and even international level was encouraged, as by sharing solutions to problems and implementing common standards we can make the task of digital preservation easier.
This is only a selection of the points covered in a very engaging afternoon of discussion. Overall, the event showed that, despite the scale of the task, digital preservation needn't be a frightening prospect, as archivists already have many of the necessary skills.
The DPC have uploaded the slides used during the event, and the event was also live-tweeted, using the hashtag #dpc_wiwik, if you are interested in finding out more.