Monday, 24 May 2010

#4n6 event, and CLIR report on digital forensics as applied to cultural materials

For a couple of days the week before last I was at a meeting which went by the name of Computer Forensics and Born-Digital Content in Cultural Heritage Collections. The meeting was in support of a report bearing the same name (for now at least) which is currently being written by Matt Kirschenbaum, Richard Ovenden and Gabby Redwine. The final day of the workshop was dedicated to reviewing the first draft of the report, and the finalised version should be published by CLIR later this year.

We've been adapting digital forensics tools and techniques within BEAM (Bodleian Electronic Archives and Manuscripts) for a few years now, and this meeting was a useful event to talk about how we do this, and some of the issues (process, technical and ethical) it raises.

It was a good meeting, and I very much enjoyed hearing from other digital archivists and *real* forensics practitioners (they have rather different objectives to ours, but their tools are still useful!). Another highlight for me was Stephen Ennis' framing thoughts, presented in the first session. Ennis grounded the discussion with three key, and very practical, points that should be important to any archivist:

1) What is the hard-cash value of born-digital archives?
Ennis contends that monetary value has been a preservation agent for literary manuscripts. If disks and digital data are of no value, their survival rate is likely to be poor. He cited the example of John Updike's archive (at Harvard), which contained software disks but no related data disks. It's worrying that dealers don't/won't appraise born-digital material, but this will surely change. Another issue is that we need dealers to be able to appraise digital archives without altering what they are appraising. Will they have to adopt digital forensic techniques too?

2) Are the steps that seem justified for celebrity authors justified for others?
This question is very important and equally applicable to 'papers', of course. In the digital domain, the obvious 'celebrity' example is the work Emory's MARBL have done to make one of Salman Rushdie's hard disks accessible to scholars through an emulator and a searchable database. We certainly won't be processing every digital archive submission at this level, and I suspect MARBL won't either. Where it's justified, I think it's a very good thing.

3) What is the researcher's object of study? Are we promoting new and different forms of enquiry?
This question, perhaps, gets closer to exploring our simultaneous excitement and concern when we consider the potential of combining scholarly enquiry and digital forensic tools in relation to born-digital archives. There's a good deal we need to learn about scholars' requirements and I'm looking forward to the day that we have more case studies so we can move this discussion beyond conjecture!

If you're interested in finding out more in advance of the report, you'll probably find that some of the slides will be published in due course at the event's website. You can also take a look at some photos and tweets.

I may extend this post with some of the more interesting tidbits if I find a moment.

Thursday, 29 April 2010

Passwords you never created and never knew

Every so often the more technically savvy members of a family are called on to help set up a new computer when an old one begins to fail. Experience tells you that there are a number of things you'll need to do as part of this process, but there's generally one or more things you forget to check for and have to fix later. It's seldom a single-session process.

Last weekend the main problem was an unknown password for an email account. In a scenario which can't be that uncommon, the email account had been set up by a friend, and its password was remembered by the email client but by no human being. Luckily we were able to salvage the password using one of these tools and restore access to the email via a new client on the new computer.

It seems all too possible that we will encounter this scenario with a depositor at some stage, so it's handy to have an easy fix for it. On the other hand, it's a little worrying how easy a fix it is...

Wednesday, 28 April 2010

So long floppy, hello retro cool!

If you've been following Victoria's rather brilliant posts about media, you'll be sad (or perhaps glad) to hear that the demise of the floppy draws ever closer now that Sony are discontinuing floppy disks. I suspect everyone has a story to tell that involves a floppy disk: the fear, the sheer agony of that lost essay, the relief at the kindness of the geek who saved the file. These stories will become a thing of the past.

To balance this bad news, I also wanted to flag up the Vintage Computer Festival up the road at Bletchley Park. Let's hope they raise a glass to deprecated storage devices and their tales!

Friday, 23 April 2010

Do you know the way to Dundee?

The Centre for Archive and Information Studies at the University of Dundee is putting on what looks to be a very interesting seminar entitled "Practical Approaches to Electronic Records: the Academy and Beyond". And I'm not just saying it'll be very interesting because we're talking at it either. Just take a look at the packed programme and I'm sure you'll agree.

I'll be covering the workflow we're adopting here at futureArch and hopefully demo part of it, as well as discussing our digital asset management system, the foundation for our archive and how those ideas may scale to smaller systems.

Hope to see you there and if not I'm sure we'll be reporting back right here so stay tuned!

(Also a bit (um, I mean big) thank you to Jennifer Johnstone for helping me find my way to Dundee! :-))

Wednesday, 14 April 2010

Using a D-Link DGE-530T Gigabit Network adapter in ESX 4.

For our developers' ESX testbed/playground I wanted to install two D-Link DGE-530T Gigabit PCI Desktop network adapters; unfortunately, they do not appear on the ESX supported list. These are the steps I took to get them recognised by ESX:

1. Acquire the skge.o driver, which supports the Marvell Yukon 88E0001 chipset

The discussion Using a Marvell LAN card with ESXi 4 contains a link to a tarball, sky2-and-skge-for-esxi4-0.02.tar.gz, containing both the sky2 and skge drivers.

2. Log in to ESX 4.0 as root and copy the skge.o driver to /usr/lib/vmware/vmkmod

2.1 download sky2-and-skge-for-esxi4-0.02.tar.gz

2.2 tar xvzf ../sky2-and-skge-for-esxi4-0.02.tar.gz

2.3 cp vmtest/usr/lib/vmware/vmkmod/skge.o /usr/lib/vmware/vmkmod

3. Run 'lspci' and identify the NIC's location (the xx:xx.x number in front of the description):

03:00.0 Ethernet controller: D-Link System Inc Unknown device 4b01 (rev 11)

4. Run 'lspci -n' and determine the vendor and device IDs (for D-Link the pair should read 1186:xxxx):

lspci -n
00:00.0 0600: 8086:29b0 (rev 02)
(snipped)
03:00.0 0200: 1186:4b01 (rev 11)
03:02.0 0200: 8086:1026 (rev 04)

5. Create the VMware pciid file '/etc/vmware/pciid/skge.xml'. Here's a listing of mine:

cat /etc/vmware/pciid/skge.xml

<?xml version='1.0' encoding='iso-8859-1'?>
<pcitable>
<vendor id="1186">
<short>D-Link System Inc</short>
<name>D-Link System Inc</name>
<device id="4b01">
<vmware label="nic">
<driver>skge</driver>
</vmware>
<name>DGE-530T Ethernet NIC</name>
<table file="pcitable" module="ignore" />
<table file="pcitable.Linux" module="skge">
<desc>D-Link System|DGE-530T Ethernet NIC</desc>
</table>
</device>
</vendor>
</pcitable>

6. Create the file /etc/vmware/init/manifests/vmware-skge.mf, which contains the single line shown:
cat /etc/vmware/init/manifests/vmware-skge.mf
copy /usr/lib/vmware/vmkmod/skge.o

7. Reboot the server; checking /var/log/vmware/esxcfg-boot.log should then confirm:

that the esxcfg boot process has loaded the skge.xml metafile, constructed the new vmware-devices.map file and included the skge.o driver in the initramfs image.

8. Running 'lspci' after adding a second DGE-530T card now shows:

03:00.0 Ethernet controller: D-Link System Inc DGE-530T Ethernet NIC (rev 11)
03:02.0 Ethernet controller: D-Link System Inc DGE-530T Ethernet NIC (rev 11)
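Step 4 can also be checked programmatically. The sketch below is a minimal Python 3 example that assumes the `lspci -n` output has been saved as text and follows the `bus:slot.function class: vendor:device` layout shown above. Newer lspci versions may prefix a PCI domain (for example 0000:), and this pattern does not handle that. The function name `find_nics` is hypothetical and belongs to neither ESX nor lspci.

```python
import re

# One line of `lspci -n` output, e.g. "03:00.0 0200: 1186:4b01 (rev 11)":
# bus:slot.function, PCI class code, then vendor:device IDs in hex.
LSPCI_N_LINE = re.compile(
    r"^(?P<slot>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)

ETHERNET_CLASS = "0200"  # PCI class code for an Ethernet controller


def find_nics(lspci_n_output, vendor_id="1186"):
    """Return (slot, vendor, device) tuples for Ethernet controllers from one vendor.

    The default vendor ID 1186 is D-Link's, as seen in the listing above.
    Lines that do not match the expected layout (e.g. "(snipped)") are skipped.
    """
    vendor_id = vendor_id.lower()
    found = []
    for line in lspci_n_output.splitlines():
        m = LSPCI_N_LINE.match(line.strip())
        if m is None or m.group("cls") != ETHERNET_CLASS:
            continue
        if m.group("vendor").lower() == vendor_id:
            found.append((m.group("slot"),
                          m.group("vendor").lower(),
                          m.group("device").lower()))
    return found


if __name__ == "__main__":
    sample = "03:00.0 0200: 1186:4b01 (rev 11)\n03:02.0 0200: 8086:1026 (rev 04)"
    print(find_nics(sample))
```

The device ID this returns (4b01 here) is the value that goes into the `<device id="...">` element of the pciid file in step 5.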

Of course, the usual caveats and disclaimers apply: this is not supported by VMware, etc.
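Rather than typing out the pciid file from step 5 by hand, you could generate it. This is only a sketch and assumes two things: Python 3.9 or later (for `ET.indent`), and that other cards need exactly the same element layout as the skge.xml listing above. The layout is copied from that one working file, not from any VMware documentation. `make_pciid_xml` and its parameter names are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_pciid_xml(vendor_id, vendor_name, device_id, device_name, driver, linux_desc):
    """Build a pciid file with the same element layout as the skge.xml listing."""
    pcitable = ET.Element("pcitable")
    vendor = ET.SubElement(pcitable, "vendor", id=vendor_id)
    ET.SubElement(vendor, "short").text = vendor_name
    ET.SubElement(vendor, "name").text = vendor_name
    device = ET.SubElement(vendor, "device", id=device_id)
    # Tell ESX to treat the device as a NIC handled by the named driver.
    vmware = ET.SubElement(device, "vmware", label="nic")
    ET.SubElement(vmware, "driver").text = driver
    ET.SubElement(device, "name").text = device_name
    # As in the listing: ignore the generic pcitable entry and map the
    # Linux pcitable entry to the driver module.
    ET.SubElement(device, "table", file="pcitable", module="ignore")
    linux = ET.SubElement(device, "table", file="pcitable.Linux", module=driver)
    ET.SubElement(linux, "desc").text = linux_desc
    ET.indent(pcitable)  # two-space indentation (Python 3.9+)
    body = ET.tostring(pcitable, encoding="unicode")
    return "<?xml version='1.0' encoding='iso-8859-1'?>\n" + body + "\n"
```

For the DGE-530T the call would be `make_pciid_xml("1186", "D-Link System Inc", "4b01", "DGE-530T Ethernet NIC", "skge", "D-Link System|DGE-530T Ethernet NIC")`. Write the result out with `encoding="iso-8859-1"` so the file matches its XML declaration.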

Monday, 12 April 2010

Want to be our new graduate trainee?

We are now advertising for our second graduate traineeship post within the project. This one-year post is intended to provide pre-course experience to a graduate prior to undergoing professional training on one of the recognised archive courses. Being based within futureArch, it particularly suits an applicant wishing to develop an understanding of how the shift to digital communications impacts the work of archivists. 

The postholder will support the curatorial and technical work of the futureArch project, while sampling a variety of more traditional archival work, including providing services to researchers in the Special Collections Reading Room. The postholder will also participate in activities organised through the OWL Graduate Trainee Scheme.

Further details and application forms are available here. The closing date for applications is 10 May 2010 and we expect to interview on 1 June. For a flavour of some of the work Victoria has done during her time as a trainee, take a look at some of her posts to this blog and to the Bodleian graduate trainee scheme blog.

Wednesday, 31 March 2010

Microsoft Works library

Quick future aide-mémoire for accessing Works files: the library is available at the libwps site and is implemented in some OpenOffice.org variants.

Tuesday, 30 March 2010

Disk imaging for older floppies

Thanks to Michael Olson for the link to KryoFlux, which is currently being developed by the Software Preservation Society (an organisation established to preserve disk-based computer games). Stanford are also using the Catweasel floppy disk controller; see Stanford's post on Catweasel and the Catweasel site itself. These could be handy to have around when we receive more unusual floppy formats.

Monday, 22 March 2010

Media Recognition Guide - Flash Media

Flash memory is a type of EEPROM (electrically erasable programmable read-only memory) that is erased and rewritten in blocks rather than one byte at a time. This makes it much less expensive than byte-programmable EEPROM, so large-capacity devices are economically viable. Compared with the magnetic media used in hard, floppy and Zip disks, it also has faster access times and much better shock resistance and durability. Altogether this makes it particularly suitable for use as a portable storage device. Flash memory does support only a finite number of write-erase cycles, but many manufacturers rate their devices for around 100,000 cycles.


USB Flash Drive

Type:

Flash memory data storage device with USB interface

Introduced:

2000, though which company invented the device has been the subject of legal dispute.

Active:

Yes

Cessation:

-

Capacity:

The first drives had a capacity of 8 MB, but the latest versions can hold as much as 256 GB

Compatibility:

Widely supported by modern operating systems including Windows, Mac OS, Linux and Unix systems.

Users:

Broad. Has replaced 3.5” floppy disks as the preferred device for individuals and small organisations for personal data storage, transfer and backup.

File Systems:

FAT, NTFS, HFS+, ext2, ext3

Common manufacturers:

Many manufacturers and brands including Sandisk, Integral, HP, Kingston Technology and Sony


Recognition


USB flash drives can come in a range of shapes and sizes, but as a general rule they measure somewhere in the region of 70mm x 20mm x 10mm and all have a male USB connector at one end. Capacity also varies widely, though the majority of manufacturers specify this either by printing the information on the casing or etching it onto the connector.


Using the word ‘drive’ is misleading as nothing moves mechanically in a USB flash drive. However, they are read and written to by computers in the same way they read and write to disk drives, therefore they are referred to by operating systems as ‘drives’.


The only visible component is the male USB connector, often with a protective cap. Inside the plastic casing are a USB mass storage controller, a NAND flash memory chip and a crystal oscillator to control data output. Some drives also include jumpers and LEDs, and a few also have a write-protect switch.


High Level Formatting


USB drives use many of the same file systems as hard disk drives, though it is rare to find a drive formatted with a file system that predates the USB drive itself. USB drives are therefore more likely to contain FAT32 than FAT16 or FAT12. FAT32 is the file system most commonly found on USB drives because it is supported by all major operating systems. NTFS can be used, but it is less reliable on operating systems other than Windows. If a drive is intended for a specific operating system, you can expect to find either HFS+ (for Macs) or ext2 or ext3 (for Linux).
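Once a USB drive (or any of the other media in this guide) has been imaged, the file system can often be identified from a few signature bytes near the start of the volume. The following Python sketch is a heuristic only. It assumes the input is an image of a single volume or partition, not a whole disk with a partition table in front. The offsets it checks are standard: the FAT and NTFS labels in the boot sector, the ext2/ext3 superblock magic number at byte 1080, and the HFS/HFS+ signature at byte 1024. The function `guess_filesystem` itself is a hypothetical helper. The FAT label is informational, so a definitive FAT12/16/32 decision needs the cluster count.

```python
def guess_filesystem(volume: bytes) -> str:
    """Heuristically name the file system at the start of a volume image."""
    if len(volume) >= 512:
        # NTFS puts its OEM ID in bytes 3-10 of the boot sector.
        if volume[3:11] == b"NTFS    ":
            return "NTFS"
        # FAT32 keeps its file-system type label at offset 82 ...
        if volume[82:90] == b"FAT32   ":
            return "FAT32"
        # ... while FAT12 and FAT16 keep theirs at offset 54.
        if volume[54:62] in (b"FAT12   ", b"FAT16   "):
            return volume[54:59].decode("ascii")
    if len(volume) >= 1082:
        # ext2/ext3 superblock starts at byte 1024; magic 0xEF53
        # (little-endian) sits 56 bytes into it.
        if volume[1080:1082] == b"\x53\xef":
            return "ext2/ext3"
        # HFS+ ("H+"), HFSX ("HX") and classic HFS ("BD") volume
        # headers also begin at byte 1024.
        signature = volume[1024:1026]
        if signature in (b"H+", b"HX"):
            return "HFS+"
        if signature == b"BD":
            return "HFS"
    return "unknown"
```

For example, `guess_filesystem(open("usb-volume.img", "rb").read(2048))` would inspect the first 2 KB of an image file. The filename is illustrative.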


Formatting a USB drive is done in the same way as formatting a floppy disk. On a Windows operating system, for example, the only difference is that you right-click the USB drive icon rather than the floppy drive icon.



FireWire Flash Drive


Type:

Flash memory data storage device with FireWire interface

Introduced:

2004

Active:

Yes

Cessation:

-

Capacity:

Either 4, 8 or 16 GB

Compatibility:

Compatible with any computer with a FireWire port

Users:

Limited. Never achieved the same popularity as USB flash drives. They come in smaller capacities and have slower memory.

File Systems:

FAT, NTFS, HFS+, ext2, ext3

Common manufacturers:

Kanguru


Recognition


FireWire flash drives look similar to USB drives and are built in much the same way. The one difference is that they use a FireWire connector rather than a USB one, which gives them different data transfer rates and capacities. Depending on which version of FireWire the drive uses, its transfer rate is 49.15, 98.30 or 393.2 MB/s. With the exception of 49.15 MB/s, these rates exceed that of USB 2.0 (60 MB/s), but FireWire drives have much smaller capacities. They are also heavier and more expensive, and fewer computers have FireWire ports than USB ports. As a result, FireWire flash drives have never dominated the market and are fairly rare.
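The MB/s figures for FireWire come from dividing the signalling rate in Mbit/s by eight. The short Python sketch below shows that arithmetic. It assumes the actual signalling rates behind the rounded names FireWire 400, 800 and S3200 are 393.216, 786.432 and 3145.728 Mbit/s (all multiples of 24.576 Mbit/s), and 480 Mbit/s for USB 2.0. Protocol overhead is ignored, so real-world throughput will be lower.

```python
# Signalling rates in Mbit/s. FireWire's marketing names are rounded;
# the actual rates are multiples of 24.576 Mbit/s.
RATES_MBIT = {
    "FireWire 400 (S400)": 16 * 24.576,   # 393.216 Mbit/s
    "FireWire 800 (S800)": 32 * 24.576,   # 786.432 Mbit/s
    "FireWire S3200": 128 * 24.576,       # 3145.728 Mbit/s
    "USB 2.0 (Hi-Speed)": 480.0,
}


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


if __name__ == "__main__":
    usb2 = mbit_to_mbyte(RATES_MBIT["USB 2.0 (Hi-Speed)"])
    for name, rate in RATES_MBIT.items():
        mb = mbit_to_mbyte(rate)
        print(f"{name}: {mb:.2f} MB/s, faster than USB 2.0: {mb > usb2}")
```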

High Level Formatting


FireWire drives only differ from USB drives in their type of connector, therefore they will contain the same file systems and can be formatted in the same way.

Media Recognition Guide - Iomega Zip Disks

Type:

Removable disk storage

Introduced:

1994

Active:

Yes, but used by a minority

Cessation:

-

Capacity:

Either 100, 250 or 750MB

Compatibility:

The Zip drive needs to have the same or a higher capacity than the Zip disk. Supports Windows, IBM OS/2, Mac OS 7.6 to 9.2, Mac OS X and some Linux distributions.

Users:

Small businesses and personal users to backup files

File Systems:

NTFS, FAT, ext2, HFS/+, ADFS

Common manufacturers:

Iomega


Recognition


The Zip disk was introduced by Iomega in 1994 as a medium-capacity removable storage device to rival 3.5” floppy disks. There are three versions, with capacities of 100, 250 and 750 MB, which is considerably more than a 3.5” floppy disk holds, and Zip disks have a quicker data transfer rate: 1 MB/s compared with an HD 3.5” floppy disk’s rate of 15.6 KB/s. However, Zip disks never reached the same popularity as floppy disks and could not compete with other forms of removable storage, such as recordable CDs, which offer much larger capacities. Sales therefore declined, and use is limited at the time of writing [2010], though Zip drives and disks are still available from online retailers.


Zip disks are physically similar to floppy disks, except they are larger and not quite rectangular. Their dimensions are 97 x 98 x 6mm compared to 3.5” floppy disk dimensions of 90 x 94 x 3mm.


Zip drives are more compact than floppy drives. Dimensions vary, but are around 170 x 110 x 25mm, although drives with a SCSI interface are larger, measuring 193 x 139 x 44mm. Drives attach to a computer through one of several interfaces: PATA, SCSI, USB or FireWire. Very early drives had an IDE interface (the forerunner to PATA), but these are not common. Not every interface is available on each model of Zip drive; the table below sets out which interfaces each drive supports:


Drive | PATA | SCSI | USB | FireWire
Zip 100 | Yes | Yes | Yes | No
Zip 250 | Yes | Yes | Yes | Yes
Zip 750 | Yes | No | Yes | Yes
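The interface table and the capacity rule given under Compatibility can be combined into a quick lookup. This is a minimal Python sketch with hypothetical helper names. It encodes only the rule that a drive must match or exceed the disk's capacity, and it ignores any read-only or write restrictions that particular drive and disk combinations may have.

```python
# Interfaces available for each Zip drive model (capacity in MB),
# transcribed from the interface table.
DRIVE_INTERFACES = {
    100: {"PATA", "SCSI", "USB"},
    250: {"PATA", "SCSI", "USB", "FireWire"},
    750: {"PATA", "USB", "FireWire"},
}


def drive_accepts(drive_mb, disk_mb):
    """The drive must have the same or a higher capacity than the disk."""
    return drive_mb >= disk_mb


def candidate_drives(disk_mb, interface):
    """Zip drive models (in MB) that take this disk and come with this interface."""
    return [
        drive_mb
        for drive_mb in sorted(DRIVE_INTERFACES)
        if drive_accepts(drive_mb, disk_mb) and interface in DRIVE_INTERFACES[drive_mb]
    ]


if __name__ == "__main__":
    # Which drives could read a 250 MB disk on a computer with only FireWire?
    print(candidate_drives(250, "FireWire"))
```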



High Level Formatting


Zip disks use the same file systems as floppy disks, with the most common being FAT for use with Windows, HFS or HFS+ for use with Mac OS and ext2 for use with Linux. Many disks come preformatted, but can still be reformatted by the user to suit their operating system.


Formatting with Windows

This is done in the same way as formatting floppy disks: with the disk in the drive, open ‘My Computer’, right-click the Zip disk drive icon and select ‘Format’. There are two options, ‘Short Format’ and ‘Long Format’, and with either one you can choose the file system by selecting Mac or PC. Click ‘Start’ and the disk will be formatted.


Formatting with Mac OS

Insert the disk into the Zip drive. Open the IomegaWare folder, then the ‘Tools’ folder, and double-click the Tools icon to open the Tools window. From here, click the icon for the disk you wish to format. There are two options, ‘Short Erase’ and ‘Long Erase’; Long Erase should be used for disks containing errors. Select ‘Erase’ to begin formatting the disk. All content will be erased and the disk will be formatted with a file system appropriate for Mac OS (HFS/+).


Formatting with Linux

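Formatting a Zip disk on Linux is similar to formatting a floppy disk. Note, however, that man zip documents the Info-ZIP compression utility, not Zip disks. A Zip drive usually appears to Linux as an ordinary block device, so the disk can be formatted with the standard mkfs tools (for example mkfs.vfat for FAT or mkfs.ext2 for ext2). For details of formatting floppy disks see: http://linux.die.net/man/8/floppy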


Monday, 15 March 2010

Media Recognition - Hard Disk Drive part 3

Serial Attached SCSI (SAS) Interface


Type:

Magnetic storage media

Introduced:

2004

Active:

Yes [2010]

Cessation:

-

Capacity:

Varies, but majority do not exceed 300GB

Compatibility:

Compatible with all operating systems, though drives with a capacity of 137GB or more are only compatible with Windows 98 onwards and Mac OS 10.2 onwards. Not found on 8” or 5.25” drives.

Users:

Servers and high-end computers

File Systems:

FAT, NTFS, HFS/+, ext

Common manufacturers:

Western Digital, Seagate, Toshiba, Hitachi, Samsung


Recognition


SAS grew out of SCSI development and entered the market in 2004. One feature that makes it preferable to parallel SCSI is its higher transfer rate, and its speed and performance make it suitable for high-end personal computer hard drives and servers. The first version, with a data transfer rate of 300 MB/s, was slower than the latest version of parallel SCSI. In 2009, however, the rate increased to 600 MB/s, and it is expected to reach 1200 MB/s by 2012. SAS uses a point-to-point topology and can support multiple devices (up to 200), which makes it popular for servers. Because it is aimed at this high-end market, SAS hard disk drives are relatively expensive, so they are less common in standard personal computers than drives using the more general-purpose SATA interface.
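The SAS transfer rates follow from the line rates of the three generations, 3, 6 and 12 Gbit/s. SAS uses 8b/10b encoding, which sends every data byte as 10 bits on the wire, so the usable rate in MB/s is the line rate in Mbit/s divided by ten. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def payload_mb_per_s(line_rate_gbit: float, line_bits_per_byte: int = 10) -> float:
    """Usable throughput in MB/s for a serial link with 8b/10b encoding.

    Each data byte travels as 10 line bits, so a 3 Gbit/s link carries
    3000 / 10 = 300 MB/s of data. Protocol overhead is not included.
    """
    return line_rate_gbit * 1000 / line_bits_per_byte


if __name__ == "__main__":
    for generation, gbit in [("SAS-1", 3.0), ("SAS-2", 6.0), ("SAS-3", 12.0)]:
        print(f"{generation}: {gbit:g} Gbit/s -> {payload_mb_per_s(gbit):.0f} MB/s")
```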


The SAS connector is a 29-position connector. It is much smaller than the parallel SCSI connector so that it can be used on 2.5” drives. SAS connectors look similar to SATA connectors. The difference is that on a SATA drive the data and power connectors sit next to each other but are separate, whereas on a SAS drive the two form a single connector, with a piece of plastic keeping them distinct. The similarity is deliberate: a SATA drive can be plugged into a SAS connector, but a SAS drive cannot be plugged into a SATA connector.



External Hard Disk Drives


Early Apple Macintosh computers used external SCSI hard disk drives, despite internal hard disk drives being the standard for other PCs. More recently external hard drives are primarily used as additional storage devices.


Although the early Apple external drives were only compatible with Mac OS, later drives have been manufactured to support all modern operating systems. However, they cannot support any Windows OS preceding Windows 2000, Mac OS before version 8.5.1 or the Linux OS with a kernel earlier than version 2.4 unless updates are installed.


The hard disk in an external hard disk drive is no different to that in an internal drive, though an external drive is encased in plastic and the only visible part is the connector. This can either be a SCSI, eSATA, USB or FireWire connector.


FireWire (IEEE 1394): First released in 1995, FireWire was originally developed as a replacement for SCSI, and many computers since 2003 have a built-in FireWire port, particularly Apple machines. FireWire has a higher transfer rate than USB. The latest version, FireWire S3200, reaches 393 MB/s, which also exceeds eSATA, although the rate achieved under Windows varies. However, FireWire is more expensive than USB, so it has never matched USB’s popularity. It is compatible with Windows from Windows XP onwards (though issues with Vista have been reported), with Linux, and with Mac OS from version 8.6 onwards. A FireWire cable carries both power and data, so only one cable is needed per device.


There have been several versions of FireWire each using different connectors. Here is a brief table setting this out:


Version | Cable used | Date introduced
FireWire 400 (IEEE 1394) | 6-circuit | 1995
FireWire 400 (IEEE 1394a) | 4-circuit | 2000
FireWire 800 (IEEE 1394b) | 9-circuit | 2002
FireWire S3200 | 9-circuit | 2007


It is most common to find 6-circuit connectors on desktop computers and 4-circuit connectors on laptops. However, the 4-circuit connector was standardised in the 2000 amendment (IEEE 1394a), and as a result more of these connectors are now found on desktop computers.


FireWire 800 and S3200 are backwards compatible with these ports but are manufactured with a 9-circuit connector. Adaptors are available so that 9-circuit cables can be used with the 4- and 6-circuit ports on computers.


USB (Universal Serial Bus): USB was introduced in 1996 and has since become the dominant means of connecting computer peripherals to the host controller. The original USB 1.0 had a transfer rate of 12 Mbit/s, which USB 2.0 increased to 480 Mbit/s (60 MB/s). USB 2.0 was released in 2000 and standardised in 2001. Like FireWire, USB connectors carry power as well as data, so they do not require additional power cables.


There are several different USB connectors for different uses. The most common type found on computers is the A plug and its matching port. A second type, similar in size, is usually found on extension cables. Mini plugs are also available for small devices such as cameras. The B plug is squarer and about half the width of the A plug; it is used on devices with removable cables, such as printers. Having two types of connector (A and B) prevents users from accidentally creating an electrical loop.


eSATA: This is SATA’s own external connector, introduced in 2004, with a transfer rate of 131 MB/s. Although this is much faster than USB 2.0, few computers have eSATA ports; manufacturers have favoured USB and FireWire instead.