Permanent access technology for the virtual heritage
May 2004
Jeffrey R. van der Hoeven

We spend much of our time surrounded by a virtual environment. Surfing the Internet, typing a letter, playing computer games or watching the news: all of these happen in an environment that does not physically exist but is represented on a computer or television screen. This environment has become part of our culture and, like any physical artifact that is specific to our culture, part of it should be saved for future generations.

As part of my research assignment for the Technical Informatics programme at TU Delft, I performed a literature study on permanent access technology. Permanent access is the guarantee that information in digital form remains accessible and understandable over the long term.
The results are published in a report that can be downloaded freely. The research was carried out within the Information Systems Design group at TU Delft and within the scope of the Long-Term Preservation (LTP) project at IBM Netherlands N.V.

With the results of this research I hope to have raised awareness of the fragility of digital documents, which in my opinion is essential to prevent loss of information now and in the future. I also hope to have given a good picture of the current state of permanent access technology worldwide. This picture is intended to help organizations make the right decision about which digital preservation strategy to follow. Finally, I hope that this information leads to new initiatives, new ideas and practical solutions for digital preservation.

Downloads



Executive summary

This report addresses the difficulties of preserving digital documents and of accessing and understanding them in the future. It is guided by the following research question:
"Which of the current strategies regarding permanent access technology taken worldwide ensure accessibility over the long-term?"
Digital documents are different from traditional documents. They consist of streams of bits stored on a hardware device. The meaning of such a document lies in the logical form in which the bits are stored, and this form can only be interpreted using appropriate hardware devices and software applications. In addition, metadata (data about data) is often supplied together with the original document.
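To illustrate how the meaning of a bit stream depends on its logical form, here is a minimal sketch in Python. It is not taken from the report: the byte values, the choice of character encodings and the helper name interpret are assumptions made for this example only.

```python
# Illustrative only: the same stored bytes read under different assumed
# logical forms (here, character encodings) give different documents.

def interpret(raw: bytes, encoding: str) -> str:
    """Decode raw bytes under an assumed encoding (hypothetical helper)."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # The bits are intact, but this reader cannot make sense of them.
        return "<unreadable>"

raw = bytes([0x63, 0x61, 0x66, 0xE9])  # four bytes as stored on some medium

for encoding in ("latin-1", "cp437", "utf-8"):
    print(encoding, "->", interpret(raw, encoding))
```

Only a reader that knows the intended encoding (latin-1 in this example) recovers the word the author wrote. The other two readers see a different last character or nothing usable, even though not a single bit has changed.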
Preserving digital documents is a difficult task. First, the explosive growth of digitally stored information makes it almost impossible to organize this information efficiently. Selections therefore have to be made about which information should be preserved and which should not.
Second, rapid developments in both hardware and software make preservation harder. A document preserved today could be inaccessible two or three years from now.
Finally, the authenticity of a document is at stake. A digital document can only remain authentic if its integrity is safeguarded and if it can be verified as 'the real one'. This is hard to achieve because a digital document depends heavily on its environment (hardware and software). Either this environment must be recreated exactly, or the digital document must be transformed to newer formats without loss of its intrinsic value.
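One part of this requirement, safeguarding integrity at the level of bits, can be sketched with a cryptographic fingerprint recorded at preservation time. The sketch below is an assumption of this page and not a method described in the report. The choice of SHA-256, the sample text and the function names fingerprint and is_intact are illustrative.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of the document's bit stream."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, recorded: str) -> bool:
    """Check the current bits against the digest recorded at preservation time."""
    return fingerprint(data) == recorded

# Hypothetical document and its fingerprint, recorded when it was archived.
original = b"Minutes of the meeting, 6 July 2004"
recorded = fingerprint(original)

# A later copy in which a single detail was altered.
tampered = original.replace(b"6 July", b"7 July")

print(is_intact(original, recorded))
print(is_intact(tampered, recorded))
```

Such a check only shows that the bits are unchanged. It says nothing about whether the document can still be rendered in a future environment, and it no longer matches once the document has been migrated to a newer format, which is one reason why authenticity is hard to guarantee.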

Considering these aspects, how can we preserve digital documents so that access to them and understanding of them are safeguarded? This is the question that permanent access addresses. Several preservation strategies based on permanent access technology are being considered worldwide. In short, these are:
  • Technology preservation: build computer museums with all hardware and software ever created.
  • Saving the hard copy: print everything on paper or microfilm.
  • Encapsulation: supply every document with a self-explanatory description of the file.
  • Migration: convert each document to a newer logical form.
  • Migration on request: do the same as migration, only at retrieval time.
  • Emulation: virtually recreate the original environment of the digital document.
  • XML: store the document with its structure, content and layout separated.
  • Digital Rosetta Stone: build a knowledge archive with specifications of hardware and software.
  • Universal Virtual Computer: view documents in a platform-independent manner.
All of these approaches have their own advantages and disadvantages, so there is no one-size-fits-all solution. The first three approaches (technology preservation, saving the hard copy, and encapsulation) are less suitable than the others, because they are either practically infeasible or inevitably lose information.
Of the six remaining strategies, migration seems suitable for common, widely supported document formats when authenticity is not the highest priority. Emulation can be seen as a last resort for uncommon file formats, when the authenticity of a document is important and initial costs are not an issue. XML is of a different kind, because it aims at a uniform standard used worldwide. Despite the history of standardization (standards come and go), XML can become the standard of standards if it remains in use. It seems very suitable for preserving e-mail, spreadsheets and text documents, but less so for other document formats such as image files. The Digital Rosetta Stone (DRS) is sound in theory, but a complete implementation of the model seems far away. Migration on request, by contrast, is very practical and has already been tested successfully for image formats. A disadvantage is that it is platform dependent. This does not hold for the approach based on the Universal Virtual Computer (UVC). This strategy is the only one that is platform independent and applicable to all digital documents while offering maximum authenticity. To make the UVC approach successful, however, decoders and Logical Data Schemas must be developed at preservation time, which demands considerable effort. More experience with the UVC should therefore be gained to convince others of its potential and to win more support for its development.
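The difference between migration and migration on request can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation evaluated in the report. The format labels, the converter registry CONVERTERS and the names StoredDocument, migrate and retrieve_on_request are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class StoredDocument:
    fmt: str     # label of the logical form, e.g. "latin1-text" (hypothetical)
    data: bytes  # the bit stream held in the archive

Converter = Callable[[bytes], bytes]

def latin1_to_utf8(data: bytes) -> bytes:
    """Example converter between two text encodings."""
    return data.decode("latin-1").encode("utf-8")

# Hypothetical registry of available conversions: (source, target) -> converter.
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("latin1-text", "utf8-text"): latin1_to_utf8,
}

def migrate(doc: StoredDocument, target_fmt: str) -> StoredDocument:
    """Migration: convert now and store the result in place of the original."""
    convert = CONVERTERS[(doc.fmt, target_fmt)]
    return StoredDocument(target_fmt, convert(doc.data))

def retrieve_on_request(doc: StoredDocument, target_fmt: str) -> bytes:
    """Migration on request: keep the original, convert only at retrieval time."""
    if doc.fmt == target_fmt:
        return doc.data
    return CONVERTERS[(doc.fmt, target_fmt)](doc.data)

archived = StoredDocument("latin1-text", "café".encode("latin-1"))
delivered = retrieve_on_request(archived, "utf8-text")
print(delivered.decode("utf-8"))
```

In this sketch the archive keeps the original bit stream untouched, and only the converters have to be maintained as formats change. The converters themselves still run on a particular platform, which matches the platform dependence noted above.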

In general, attention to permanent access is growing. Many organizations are developing (or planning) digital preservation repositories and are becoming aware of the difficulties of preserving digital documents. Frontrunners are exploring the possibilities of different preservation strategies, and many libraries and archives are watching the outcomes closely. This report discusses the most important preservation strategies in order to answer the main research question.
Based on these outcomes it seems clear that no one-size-fits-all solution is possible. Digital documents differ from each other in too many ways and are used for many different purposes by many different users. Organizations that wait for "the" solution will not succeed in preserving digital documents. Risk management should be applied to determine which strategy is most appropriate for each type of document, taking into account how important the authenticity of the document is.

Although we are heading in the right direction, more work remains to be done in the field of digital preservation. First of all, a better understanding of preservation strategies is needed. Beyond this, the core of the problem should not be forgotten: the creation of so many file formats over the last decades, each depending on all kinds of hardware and software, has left us with today's preservation struggles. We are now in a position to solve this problem. Whatever actions are required, what matters most is that valuable information remains valuable, accessible and understandable for future generations, helping our civilization move forward.


About the author

Jeffrey van der Hoeven is a student of Technical Informatics at TU Delft, the Delft University of Technology in the Netherlands. He is currently completing his final task: the graduation project for his master's degree, on the topic of digital preservation and access. The project is carried out at IBM Netherlands N.V. and started in October 2003. This research was completed as a prelude to the graduation project. Graduation will take place in August this year on a specialisation of permanent access: the Universal Virtual Computer.

IBM Netherlands N.V. grants permission to place this research report in libraries for examination purposes. Publication of this report, in whole or in part, requires prior permission from IBM Netherlands N.V.

2004 by J.R. van der Hoeven - Last updated 6 July 2004