Wednesday, 29 September 2010

(ZX) spectrums & stained glass

Only slightly off topic but I was pointed at this morning. I can't decide what is more impressive, the site itself, that Java can be used to emulate a ZX Spectrum or that there are games listed from 2010!

Slightly more on topic, I was curious to read this article about stained glass, or is it file formats? Skip over the rather flowery bit and start reading from:

"The archive problem is one of format" and concludes "Then in five hundred years time... pictures of the Auch windows will be stored and accessible in the cloud. But not one of them...will have the jaw-dropping impact of people seeing them for the very first time...and realising that humans could do wondrous things for themselves."

I read another article, this time from New View (not something I'd usually read, but an article about the "origins of computing" was recommended to me), which - in a different article - likened stained glass windows to computer screens - mostly because of the back lighting - and suggested, a little like Chris Mellor in the Register, that the cultural experience of seeing stained glass is a bit like experiencing computers for the first time.

(Another article berated digital technology, almost branding it evil, and certainly bad enough to make the author ill from using it. I was reading this at DrupalCon where 90% of the people there were welded to their MacBooks and wondered how bad using a plough might make these people feel! ;-))

I guess many folks don't have the "big bang" experience of the digital world. It crept up slowly - a bit like watching the cathedral and its windows being built would probably reduce its wonder - going from a ZX81 to a Pet to a BBC B to an Amstrad 1640 to a... well, you get the picture...

But does that make it any less astonishing? It shouldn't. If anything, it is all about helping people do "wondrous things for themselves", just like

Friday, 17 September 2010

Media players and the reader interface...

This is quite a long post, so I'm going to put the final line at the top too in case you don't read that far... ;-)

Your thoughts on media players would be most welcome!

The trouble, um, I mean, beauty of digital collections is that they redefine what a "manuscript" is. This is nothing new. Once upon a time someone somewhere probably upset the apple cart when they arrived at the hallowed doors with a basket full of photographs. Now we have video, audio and images, all of which can be encoded in any number of "standard" ways. (Not to mention a zillion different binary formats for just about any purpose you can imagine from sheet music to the latest car designs, which may well require more than just document-like presentation too - 3D models for example). These new manuscripts bring challenges for preservation, of course, but they also present challenges for presentation.

To address this, I've been learning more about media players in browsers with a view to picking one for the reader interface. I'm no expert in this field, so here is my layman's consideration of what I've found out and if you want to read more then this is great!

The traditional method to render audio/video in browsers, which pre-dates their ability to handle video themselves, is to use a browser plug-in, either directly (for example VLC plugin) or (more commonly) to build on top of Flash (eg. Flowplayer) or Java (eg. Cortado). The exact mark-up required to use these players varies. Some will simply use the "embed" tag and others have JavaScript libraries to simplify their usage and allow for graceful degradation in the event that the browser does not have the correct plug-in and/or understand/run JavaScript. (This may be an issue when we deploy the interface into a reading room with machines we do not control the configuration of).

But the times, they are a-changin'. Just as old browsers knew what to do when presented with an "img" tag, most modern browsers are beginning to support HTML5's "video" and "audio" tags, allowing the browser itself to handle the playback rather than farming this out to a plug-in. (For more on HTML5 generally see this presentation - the video tag is mentioned at about 58 minutes in). An added bonus of bringing video into the browser in this way is that it has inspired folks to build media players that manipulate the Web page to add the correct mark-up, be it a video tag, an embed, or whatever, to play the media. This is currently being used to generate some nice media players that'll use the browser, the Flash-plugin, or whatever is available (see OpenVideoPlayer and OSMPlayer).
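To make the idea of "adding the correct mark-up" a bit more concrete, here is a rough Python sketch that writes out a "video" tag with an "embed" tucked inside it as the fallback. It is only an illustration and not how any of the players above actually work: the function name video_markup and the file names clip.ogv, clip.mp4 and player.swf are made up for the example. The one thing it relies on is that a browser which does not understand the "video" tag ignores it and renders what is inside instead.

```python
from html import escape


def video_markup(sources, fallback_url, width=640, height=360):
    """Return HTML5 video mark-up with plug-in fallback content.

    sources is a list of (url, mime_type) pairs; fallback_url points at a
    hypothetical plug-in based player used when <video> is not understood.
    """
    lines = [f'<video controls width="{width}" height="{height}">']
    for url, mime in sources:
        # A browser that supports <video> plays the first source it can decode.
        lines.append(
            f'  <source src="{escape(url, quote=True)}" '
            f'type="{escape(mime, quote=True)}">'
        )
    # Browsers that do not know <video> skip the tag and render this instead.
    lines.append(
        f'  <embed src="{escape(fallback_url, quote=True)}" '
        f'width="{width}" height="{height}">'
    )
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical file names, purely for illustration.
markup = video_markup(
    [("clip.ogv", "video/ogg"), ("clip.mp4", "video/mp4")],
    "player.swf",
)
print(markup)
```

Listing an open format first and a second format after it lets each browser pick whichever it can play, which is roughly what the JavaScript-based players automate.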

So now we get to the crux of it. What should we do for the reader interface? Go old-school (and annoy Steve Jobs) and use a Flash-based player? Adopt the new ways of HTML5? Insist on an Open Source player? Buy something in?

To work out the answer I did a bit of investigating and have installed most of the players mentioned thus far in this post - Flowplayer, OSMPlayer, video-tag only, VLC and Cortado, as well as JWPlayer.

Flowplayer uses the Flash-plugin to play Flash video (and, with an additional plug-in, MP3 audio) - it does not support Ogg. It is very simple to use and very slick to look at. It is open source, released under GPL3 with an additional (and reasonable) "attribution clause" which basically means the Flowplayer logo must appear on the player unless you pay extra.

JWPlayer works much like Flowplayer (though there is also a beta HTML5 video player in the making) and seems pretty good. While the source code is available, it is not clear if this is an open source product or otherwise - the source files do not include a LICENSE.txt or any boilerplate. Probably I'm just missing something there though, and JWPlayer seems a good choice if you don't mind Flash.

OSMPlayer is also open source and has numerous options for installation including a Drupal module (untested), a PHP library and a "stand-alone" configuration. In theory it supports lots of different audio and video formats and uses several divs to create a nice browser based player. Unfortunately, following the guidelines for both PHP and stand-alone configurations, I could not get it to work on my test server.

Video-tag only works pretty well with Firefox 3.6 on Ubuntu 10.04 and is very easy to include in a Web-page. Unfortunately it isn't nearly as slick at playback as Flowplayer - there is a delay in starting the video and it is unclear what is going on.

The VLC plug-in is also open source and seems to work pretty well and should be able to handle many different formats, but it isn't nearly as refined as other players and the provided example code fails to stop the video or make it full-screen. The VLC desktop player is wonderful, but I'm not convinced by the Firefox plug-in.

Cortado is a Java-applet provided to play Ogg Theora among other things. Usage is very simple - you just add an applet tag to the page - but playback was jerky, slow and lacked sound. I do not know if my machine is to blame for this or if it is the player itself so will have to investigate further.

Were I sat on and forced to make a choice I think I'd struggle. Flowplayer is slick to use and easy to implement, but requires we convert everything to Flash video or MP3 (mind you, most media will arrive in suitable formats I imagine). JWPlayer is very similar in this regard. I'd like to adopt the video-tag as this supports a wide range of formats, including open ones, but currently the experience is not very smooth and refinements in this area provided by things like OSMPlayer are still in their early stages of development. JWPlayer's HTML5 offering is still beta for example.

I guess my feeling for now is to either go with Flowplayer (and swallow the conversions required - actually pretty easy with ffmpeg) or spend a bit of time with OpenVideoPlayer's HTML5 work and the video tag. At this stage I think we probably need both working in the interface and see where the better user experience is...
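For what it's worth, the conversion step might look something like the Python sketch below. It only builds the ffmpeg command line and does not run it. The helper name ffmpeg_command and the file name interview.avi are my own inventions, and it assumes the simplest possible invocation, where ffmpeg works out the output container from the file extension. Real material would almost certainly need codec and quality options on top.

```python
import shlex
from pathlib import Path


def ffmpeg_command(source, target_ext):
    """Build (but do not run) an ffmpeg command converting source to target_ext."""
    src = Path(source)
    target = src.with_suffix("." + target_ext.lstrip("."))
    # With no further options ffmpeg infers the output format from the extension.
    return ["ffmpeg", "-i", str(src), str(target)]


# Hypothetical input file, purely for illustration.
cmd = ffmpeg_command("interview.avi", "flv")
print(shlex.join(cmd))
```

Keeping the command as a list means it could later be handed to subprocess.run without any shell quoting worries.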

I should throw one more thing into the pot - the problem of formats. Video and audio files are complicated beasts consisting of containers and tracks and such - a bit like cassettes! The contents of these containers are encoded in a variety of ways, each requiring different software to decode and render their content. We have the same problem with documents and we solve that by converting all the text-based materials we get into PDFs (for presentation before anyone starts worrying about the preservation implications of PDF!) and use a PDF plug-in to display them.
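If the container-and-tracks idea is new to you, this little Python sketch may help. It is a toy model of my own making (the class names Track and MediaFile are not from any library) showing that one file can hold several tracks, each with its own encoding, and that a player needs a decoder for every one of them.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    kind: str   # "video" or "audio"
    codec: str  # how the track's content is encoded, e.g. "theora"


@dataclass
class MediaFile:
    container: str                      # the wrapper format, e.g. "ogg"
    tracks: list = field(default_factory=list)

    def codecs_needed(self):
        """Return the distinct codecs a player must support for this file."""
        return sorted({track.codec for track in self.tracks})


# An Ogg file carrying Theora video and Vorbis audio.
ogg = MediaFile("ogg", [Track("video", "theora"), Track("audio", "vorbis")])
print(ogg.codecs_needed())
```

The point is that knowing the container alone is not enough: two files with the same extension can need quite different decoders.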

Can we do the same with our audio/video material and if we can, what format (I'm using "format" as a general term to mean "container/encoding"!) do we use? (Victoria has already done some work along these lines, creating WAVs for storage and MP3s for presentation, from audio CDs). Are there any additional concerns given that most born-digital video/audio is likely to arrive at our doors in a compressed format? Should we uncompress it? Is such a thing even possible? Should we (and do we have the processing power to) convert all audio/video materials to open formats for both preservation and presentation purposes?

We're going to raise this final question at our next Library developer meeting and see what folks think. In theory we can delay the decision because most browsers and their plug-ins handle multiple formats, but perhaps we should have a standard delivery format much like we currently have PDF?

Oh dear. I started writing this post with the hope of finding all the answers! I have found out a lot about media players at least, which can only be a good thing, and I've also found out that the state of the art is not quite as far along as the proponents of HTML5 killing Flash would like us to believe - though there is good work going on here and this is the future. I'm also unclear just how much my experience of these things is hindered by using Ubuntu - I often wrestle with the playback of media files under Linux! :-)

Still, I think we're further along, nearer an answer and at least in a place to know where to start testing...

Your thoughts on media players would be most welcome! :-)

Friday, 3 September 2010

Wot I Lernd At DrupalCon

I spent last week in the lovely city of Copenhagen immersed in all things Drupal. It was a great experience, not just because of the city (so many happy cyclists!), but because I'd not seen a large scale Open Source project up close before and it is a very different and very interesting world!

I'm going to pick out some of my highlights here as to cover it all would take days, but if you want to know more I'd encourage you to check out the conference Web site and the presentation videos on

So, wot did I lernd?

Drupal Does RDF
OK, so I knew that already, but I didn't know that from Drupal 7 (release pending) RDF support will be part of the Drupal core, showing a fairly significant commitment in this area. Even better, there is an active Semantic Web Drupal group working on this stuff. While "linked data" remains something of an aside for us (99.9% of our materials will not make their way to the Web any time soon) the "x has relationship y with z" structure of RDF is still useful when building the BEAM interfaces - for example Item 10 is part of shelfmark MS Digital 01, etc. There is also no harm in trying to be future proof (assuming the Semantic Web is indeed the future of the Web! ;-)) for when the resources are released into the wild.
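As a very rough illustration of that "x has relationship y with z" shape, here is a Python sketch using plain tuples rather than a proper RDF library. The identifiers item:10, item:11 and shelfmark:MS-Digital-01 are made up for the example (item:11 is entirely hypothetical), and using the Dublin Core isPartOf term as the relationship is my assumption, not a decision we have made for BEAM.

```python
# Dublin Core Terms "isPartOf"; choosing this vocabulary is an assumption.
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Each statement is a (subject, predicate, object) triple.
# The identifiers below are illustrative, not real BEAM identifiers.
triples = [
    ("item:10", IS_PART_OF, "shelfmark:MS-Digital-01"),
    ("item:11", IS_PART_OF, "shelfmark:MS-Digital-01"),
]


def parts_of(collection, statements):
    """Return the subjects recorded as being part of the given collection."""
    return [
        subject
        for subject, predicate, obj in statements
        if predicate == IS_PART_OF and obj == collection
    ]


print(parts_of("shelfmark:MS-Digital-01", triples))
```

Even without publishing anything as linked data, the same simple structure answers the interface's basic question of which items sit under which shelfmark.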

Projects like Islandora and discussions like this suggest growing utility in the use of Drupal as an aspect of an institutional repository, archives or even Library catalogues (this last one my (pxuxp) experiment with Drupal 6 and RDF).

Speaking of IRs...

Drupal Does Publishing
During his keynote, Dries Buytaert (the creator of Drupal) mentioned "distributions". Much like Linux distributions, these are custom builds of Drupal for a particular market or function. (It is testament to the software's flexibility that this is possible!) Such distributions already exist and I attended a session on OpenPublish because I wondered what the interface would look like and also thought it might be handy if you wanted to build, for instance, an Open Access Journal over institutional repositories. Mix in the RDF mentioned above and you've a very attractive publishing platform indeed!

Another distro that might be of interest is OpenAtrium which bills itself as an Intranet in a Box.

Drupal Does Community
One of my motivations in attending the conference was to find out about Open Source development and communities. One of the talks was entitled "Come for the Software, Stay for the Community" and I think part of Drupal's success is its drive to create and maintain a sharing culture - the code is GPL'd for example. It was a curious thing to arrive into this community, an outsider, and feel completely on the edge of it all. That said, I met some wonderful people, spent a productive day finding my way around the code at the "sprint" and think that a little effort to contribute will go a long way. This is a good opportunity to engage with a real life Open Source community. All I need to do is work out what I have to offer!

Drupal Needs to Get Old School
There were three keynotes in total, and the middle one was by Rasmus Lerdorf of PHP fame, scaring the Web designers in the audience with a technical performance analysis of the core Drupal code. I scribbled down the names of various debugging tools, but what struck me the most was the almost bewildered look on Rasmus' face when considering that PHP had been used to build a full-scale Web platform. He even suggested at one point that parts of the Drupal core should be migrated to C rather than remain as "a PHP script". There is something very cool about C. I should dig my old books out! :-)

HTML5 is Here!
Jeremy Keith gave a wonderful keynote on HTML5, why it is like it is and what happened to XHTML 2.0. Parts were reminiscent of the RSS wars, but mostly I was impressed by the HTML 5 Design Principles which favour a working Web rather than a theoretically pure (XML-based) one. The talk is well worth a watch if you're interested in such things and I felt reassured and inspired by the practical and pragmatic approach outlined. I can't decide if I should start to implement HTML5 in our interface or not, but given that 5 is broadly compatible with the hotchpotch of HTMLs we all code in now, I suspect this migration will be gentle and as required rather than a brutal revolution.

Responsive Design
I often feel I'm a little slow at finding things out, but I don't think I was the only person in the audience to have never heard about responsive Web design, though when you know what it is, it seems the most obvious thing in the world! The problem with the Web has long been the variation in technology used to render the HTML. Different browsers react differently and things can look very different on different hardware - from large desktop monitors, through smaller screens to phones. Adherence to standards like HTML5 and CSS3 will go a long way to solving the browser problem, but what of screen size? One way would be to create a site for each screen size. Another way would be to make a single design that scales well, so things like images disappear on narrower screens, multiple columns become one, etc.

Though not without its problems, this is the essence of responsive design and CSS3 makes it all possible. Still not sure what I'm on about? dconstruct was given as a good example. Using a standards compliant browser (ie. not IE! (yet)) shrink the browser window so it is quite narrow. See what happens? This kind of design, along with the underlying technology and frameworks, will be very useful to our interface so I probably need to look more into it. Currently we're working with a screen size in mind (that of the reading room laptop) but being more flexible can only be a good thing!
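The mechanism underneath is the CSS3 media query. Purely as a sketch, the Python below prints a small stylesheet fragment of the kind I have in mind: below a chosen width, decorative images are hidden and columns stack into one. The 600 pixel breakpoint, the class names decorative and column, and the function name responsive_css are all assumptions of mine for illustration and are not taken from dconstruct or any framework.

```python
def responsive_css(breakpoint_px=600):
    """Return a CSS3 media query that simplifies the layout on narrow screens."""
    return (
        f"@media (max-width: {breakpoint_px}px) {{\n"
        # Hypothetical class names: hide decorative images on narrow screens...
        "  img.decorative { display: none; }\n"
        # ...and let each column take the full width, so columns stack.
        "  .column { float: none; width: 100%; }\n"
        "}\n"
    )


css = responsive_css()
print(css)
```

In practice the rules would simply live in the site's stylesheet. Generating them here is just a way of showing the shape of the thing.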

There were so many more interesting things but I hope this has given you a flavour of what was a grand conference.

Wednesday, 1 September 2010

The Case for Digital Preservation

Now, I'm pretty sure there is no need for me to make the case to the good readers of this blog, but if you're ever stuck for something to say about why your work is important - for example at parties - then the demise of the print edition of the OED seems a good candidate!

OK, so no one is about to ditch the Pocket version, or even the Shorter (I got one of those for a graduation present from my Grandma!), but even so...

The last print OED was published in 1989. I imagine, given the regular updates to the OED online, that there has been a substantial influx of words since 1989 and I guess (given how Chaucer looks now) English will undergo some significant changes in the future. Unless we (the DP community) decide to preserve the digital OED, we will condemn readers of 2489 to struggle on with an antique 1989 print copy and much will they wonder when they don't find things like "Internet"...

(Mind you, the electricity might have all run out by then so it won't really matter...)

On the flip side, and no doubt something someone at the party will point out, this is also a case for continuing to print the OED - at least a few copies, kept in safe places... ;-)