Back in April, we received an addition to the Ladislav Holy Collection – Ladislav Holy (1933–1997) was a Czech anthropologist and Africanist, who came to St Andrews in 1979 as Reader in Social Anthropology, eventually becoming a Professor in 1987.
The addition was a box containing 144 3.5” and 5.25” floppy discs found amongst Holy’s papers. These were passed to me as Digital Archives Officer to add to our growing register of digital and audio-visual material in the Library’s manuscript collections.
After accessioning these items I thought I’d use them as samples to test some of our digital preservation techniques. Since taking up the new post of Digital Archives Officer, one of my tasks has been to plan for preserving such material in the long term.
Digital preservation is not just an issue of storing files indefinitely – in order to preserve files in any meaningful way we also need to catalogue them in much the same manner as any other object, recording who made them, when, what the object contains and what items are related to it. We also need to keep some specific preservation information, taking into account the susceptibility of digital objects to accidental (or deliberate!) change and obsolescence.
In practice, this means using various tools to extract certain kinds of useful descriptive information automatically (e.g. file size, format, creation date, file name [1]) and to generate some preservation information (e.g. MD5 checksums), then filling in some of the gaps using our old-fashioned analogue brains.
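As a rough illustration of the kind of automatic extraction described above, here is a minimal Python sketch that gathers a file's name, size, last-modified date and MD5 checksum. It is not how FTK Imager works internally, and the function name `describe_file` and the dictionary keys are my own inventions for the example. Note too that the dates reported by a modern operating system for a copied file may not match the dates stored on the original disc.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_file(path, chunk_size=65536):
    """Return basic descriptive and preservation metadata for one file.

    Hypothetical helper for illustration only.
    """
    md5 = hashlib.md5()
    # Read the file in chunks so large files do not need to fit in memory.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)

    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
        "md5": md5.hexdigest(),
    }
```

Calling `describe_file` on a copy of a file returns a small record that could be pasted into a catalogue entry; the gaps (who made it, what it relates to) still have to be filled in by hand.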
For the Holy floppy discs, I could achieve much of this using a free tool called FTK Imager and an external floppy disc drive. FTK is a forensics tool that allows you to examine digital objects without altering their original state, enabling the investigation of all sorts of files that might otherwise be difficult to read.
Having first checked that it was write-protected, I took a look at an unlabelled disc. FTK immediately tells me that it contains the following files:
I took a further look at the metadata for the file called ‘Draft metaphors’ and saw that it was created in 1993 (which doesn’t seem that long ago, until you realise that our incoming undergraduates were born in 1997…).
I can also see lots of other metadata, including an MD5 hash – I’ll keep a record of this, as it is the value that will change if anything about the content of the file is modified, and so can be used to demonstrate that the file has remained unaltered since coming into our custody.
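To show how a recorded hash can later demonstrate that a file is unaltered, here is a small Python sketch of a fixity check. The function names `md5_of` and `verify_fixity` are hypothetical, and this is only an illustration of the principle rather than the procedure used by FTK Imager or by our own systems.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Compute the MD5 hash of a file's contents as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_md5):
    """Return True if the file still matches the hash recorded at accession."""
    # Hex digests are case-insensitive, so normalise before comparing.
    return md5_of(path) == recorded_md5.strip().lower()
```

The idea is to store the output of `md5_of` when the file first arrives, then run `verify_fixity` against that stored value at intervals; a `False` result means the content has changed, even if by a single byte.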
The file type is WDBN – not immediately recognisable to me, but Google suggests it was used by Microsoft Word for Mac in the early 90s, which fits with the file creation date.
So is the name of the file a good indicator of its contents? An in-built viewer lets me read the relevant sectors on the disc to find out:
I can rearrange this view into something a bit more human-friendly, revealing:
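For the curious, the general idea of turning raw bytes into something readable can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of FTK Imager's viewer: it simply pulls out runs of printable ASCII characters, and the function name `printable_runs` is made up for the example. Old Mac word-processor files may well use other character encodings, which this sketch ignores.

```python
import re


def printable_runs(data, min_length=4):
    """Return runs of printable ASCII text found in raw bytes.

    Runs shorter than min_length are dropped, since they are usually noise.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Fed the raw contents of a file or disc image, this returns a list of text fragments in the order they appear, skipping the formatting codes and padding in between.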
This file seems to be a draft of a text titled Metaphors of the Natural and the Artificial in Czech Political Discourse, written in April 1993. A quick search on Google reveals that Holy published an article of the same name in the anthropological journal Man in December 1994, which is available online via JSTOR.
A comparison of the texts suggests that our draft is quite different in the order, and in some cases the specifics, of the argument presented in the published version – differences which may be of interest to anyone studying Holy’s work and thought processes.
FTK Imager also allows me to read any files that were marked for deletion but never actually overwritten on the disc – not always appropriate or useful, but in this case revealing an even earlier draft of Metaphors of the Natural and the Artificial in Czech Political Discourse:
Examination of some of the other discs in the same box suggests that these floppies represent a discrete collection of drafts of Holy’s publications on Czech nationalism in the early 1990s.
With the above contextual information, we can now consider transferring copies of these files from their vulnerable discs to our digital asset management system, cataloguing them, and making additional preservation copies in formats more likely to be supported in the long term.
In addition to these digital files, the Holy collection also contains a significant number of photographs, slides, maps, audio tapes and kinship diagrams – and so is shaping up to be one of our first truly multimedia collections.
Digital Archives Officer
1. Though what people actually name their files can, conversely, be unhelpful or misleading.