Monday, 6 April 2009

Validating normalised dates in XML

I had some fun (hmm..., maybe that's not the right word) a year or two ago with regular expressions, trying to come up with something that could validate the kinds of 'normalised' dates that archivists use. You know the ones. The fuzzy dates, the approximates, the uncertains, the 'it was in this decade, but I can't be more precise than that' date, the 'I can tell you the start-date, but not the end-date' (and vice-versa) date. To add to this, we now have the very precise dates associated with born-digital materials: down to the second, complete with timezone. In the event, my problem was dispatched by the folks working on PREMIS, who created a union type that brings together a number of regular expressions to provide a fix (not perfect, but that's regular expressions for you). Just recently the Library of Congress have mounted some pages in the Standards section of their website, where they have put together a nice statement of the problem, as well as publishing the union type and an XML document with some test dates. See their Extended Date Time Format page.
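
To give a flavour of the union idea, here is a minimal sketch in Python: a value is accepted if it matches any one of several patterns. The pattern names, the notations (such as 196x for a decade, or a trailing slash for an unknown end-date) and the functions classify and is_valid are my own assumptions for illustration; they are not the regular expressions from the PREMIS union type or the Library of Congress pages.

```python
import re

# Hypothetical patterns for illustration only. They are NOT the
# expressions used in the PREMIS union type.
_DATE = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
_TIME = r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d"
_ZONE = r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"

PATTERNS = {
    "year": r"\d{4}",                     # 2009
    "year_month": r"\d{4}-(0[1-9]|1[0-2])",  # 2009-04
    "full_date": _DATE,                   # 2009-04-06
    "decade": r"\d{3}x",                  # 196x (assumed notation)
    "uncertain_year": r"\d{4}\?",         # 1965? (assumed notation)
    "open_end": r"\d{4}/",                # 1950/ (assumed notation)
    "open_start": r"/\d{4}",              # /1950 (assumed notation)
    "date_time_tz": _DATE + "T" + _TIME + _ZONE,  # 2009-04-06T10:15:30+01:00
}

# Compile once; each member of the "union" is tried in turn.
COMPILED = {name: re.compile(rx) for name, rx in PATTERNS.items()}


def classify(value):
    """Return the name of the first pattern that matches the whole value, or None."""
    for name, rx in COMPILED.items():
        if rx.fullmatch(value):
            return name
    return None


def is_valid(value):
    """A value is valid if any member of the union accepts it."""
    return classify(value) is not None
```

With these assumed patterns, classify("196x") returns "decade" and classify("2009-13-01") returns None. The imperfection shows up quickly, though: is_valid("2009-02-31") is True, because the pattern checks the shape of the date and not the calendar.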

1 comment:

Sebastian Rahtz said...

The TEI has done a fair bit of work in the area of imprecise dates which you may be interested in. It's at
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDATTSda

We try to use a whole slew of attributes to explain what is happening in a date, rather than trying to apply regexps to a string.