The OAIS experience at the British Library
Deborah Woodyard
Digital Preservation Coordinator
ERPANET OAIS Training Seminar, 28-29 Nov 2002
OVERVIEW
Introduction to the British Library
Why the BL chose to use the OAIS model
OAIS theory versus implementation
Terminology
Metadata
Issues not covered by OAIS
Summary of lessons learned about using the OAIS
THE BRITISH LIBRARY
Deposit library
Aiming to get deposit legislation for digital materials
Receiving digital material by voluntary deposit, purchase and digitisation
Wide variety of types of digital material received
Require method/system for long term storage, preservation and access
Seriously embarked on developing such a system in 2000
Initial work developed detailed functional specification of a system aligned with OAIS model concepts
WHY OAIS?
Very little current experience of a system such as this exists
No ‘off-the-shelf’ systems available
No other standards
OAIS model well developed
Considered to be the guidance for best practice
Provided excellent high level framework and convincing back-up argument for political justification for development of such a system
Provided standard terminology for communication
A good match for almost the entire system we were planning to build
OAIS THEORY vs SYSTEM IMPLEMENTATION
High level standard implies no rules for actual design or implementation
OAIS sounds like one system but is not necessarily, or even likely to be, one single entity
No formal method of implementation used
Analysed business processes and matched to OAIS functions
DIAGRAM COMPARISON
OAIS TERMINOLOGY
Useful as a common vocabulary for communicating internally and externally
Difficult to explain without reading a lot of the document, therefore opaque to those not heavily involved (e.g. OAIS vs OAI)
Still needed to create another glossary
Especially useful:
  SIP, AIP, DIP
  Ingest
  Content Information = Content Data Object + Representation Information
Difficulties with:
  defining an object
  naming preservation users
OAIS METADATA TO BL METADATA
Packaging Information (i.e. how and where the bits are stored)
Content Information including Representation Information (i.e. how to interpret the bits into data)
Preservation Description Information including Reference Information, Context Information, Provenance Information and Fixity Information (i.e. how to interpret the data into information)
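As a rough illustration of how these three kinds of metadata sit together in one package, the sketch below models an OAIS information package in Python. All class and field names are hypothetical, chosen only to mirror the OAIS terms on this slide; they are not the BL's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    """Content Data Object plus the Representation Information needed to interpret it."""
    content_data_object: bytes
    representation_information: dict  # illustrative keys, e.g. {"format": "text/plain"}


@dataclass
class PreservationDescriptionInformation:
    """The four PDI categories named by OAIS (field layout is an assumption)."""
    reference: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)
    fixity: dict = field(default_factory=dict)


@dataclass
class InformationPackage:
    """Hypothetical container; package_type is one of SIP, AIP or DIP."""
    package_type: str
    packaging_information: dict  # how and where the bits are stored
    content: ContentInformation
    pdi: PreservationDescriptionInformation

    def __post_init__(self):
        # Reject anything that is not one of the three OAIS package types.
        if self.package_type not in {"SIP", "AIP", "DIP"}:
            raise ValueError(f"unknown package type: {self.package_type}")


# Example values are invented for illustration.
aip = InformationPackage(
    package_type="AIP",
    packaging_information={"volume": "vol-0001", "path": "objects/1"},
    content=ContentInformation(b"hello", {"format": "text/plain", "encoding": "ascii"}),
    pdi=PreservationDescriptionInformation(reference={"identifier": "example-1"}),
)
```

Keeping Packaging Information separate from Content Information and PDI means the storage location can change (for example on refreshment to a new volume) without touching the descriptive parts of the package.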
CONTENT INFORMATION
Representation Information (Content data object description)
  Technical details of files and resource structure
  How the resource appears, is installed and runs
  Documentation
  Significant properties
Representation Information (Environment description)
  Requirements for hardware, peripherals, operating system, application software, input and output, memory requirements and other parameters
  Documentation on installation, use and location of environment components
PRESERVATION DESCRIPTION INFORMATION
Reference Information
  Identifiers & descriptive information
Context Information
  Reason for creation, relationships with other resources
Provenance Information
  Origin of the resource & changes made due to its life in the archive
Fixity Information
  Authentication details
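A minimal sketch of what a Preservation Description Information record might look like when built at ingest. The function names, dictionary keys and the choice of SHA-256 as the fixity algorithm are assumptions made for illustration; the slides do not say which authentication mechanism the BL used.

```python
import hashlib
from datetime import datetime, timezone


def make_fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Return Fixity Information for a byte string (algorithm choice is an assumption)."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def make_pdi(identifier: str, data: bytes, reason: str) -> dict:
    """Assemble the four PDI categories for one object (hypothetical layout)."""
    return {
        "reference": {"identifier": identifier},
        "context": {"reason_for_creation": reason, "related_resources": []},
        # Provenance starts with the ingest event; later changes would be appended.
        "provenance": [
            {"event": "ingest", "date": datetime.now(timezone.utc).isoformat()}
        ],
        "fixity": make_fixity(data),
    }


pdi = make_pdi("example-1", b"hello", "illustration")
```

Recording the algorithm name next to the digest lets a later check recompute the value the same way, even if the archive's default algorithm has changed in the meantime.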
BL METADATA (1/8)
Agent Group
  Agent Identifier
  Agent Role
Personal Agent Group
  Personal Agent Name Affix
  Personal Agent Family Name
  Personal Agent Given Name
  Personal Agent Affiliation
  Personal Agent Vital Date
Corporate Agent Group
  Corp Agent Name
  Corp Agent Place
Event Agent Group
  Event Agent Name
  Event Agent Number
  Event Agent Location
  Event Agent Date
Other Agent Group
  Other Agent Name
  Other Agent Description
BL METADATA (2/8)
Descriptive Items Group
  Language
  Page Range
  Frequency Of Serial
  Issue Data
  Audience
Title Group
  Primary Title
  Title Status
  Alternative Title
  Sub Title
  Series Title
  Series Title Number
  Article Title
  Uniform Title
BL METADATA (3/8)
Subject Group
  LCSH
  DDC
  Name As Subject
  Free Text
  Other Subject Vocabularies
  BL Collection
  BL Classification
Description Group
  Abstract
  Table of Contents
  Map Scale
  Free Text
BL METADATA (4/8)
Date Group
  Date Issued
  Date Available
  Date Created
  Date Archived
  Licence Check Date
  Date Modified
  Date Coverage
  Date Valid
  Vital Date
  Event Date
  Other Descriptive Dates
  System Dates
BL METADATA (5/8)
Coverage Group
  Temporal Coverage
  Spatial Coverage
Terms Group
  Price
  Terms Of Availability Statement
  Terms Of Availability Reference
Type and Identifier Group
  Resource Type
  Object Type
  Object Preservation Category
  Resource Identifier
  System IDs
  Descriptive IDs
Format Group
BL METADATA (6/8)
Relation Group
  Relation Is Version Of
  Relation Is Format Of
  Relation Is Part Of
  Relation Is Component Of
  Relation Is Replaced By
  Relation Replaces
  Relation Requires
  Relation External Object
  Relation Continues
History Group
  Custody History
  Digitisation History
  Ingest History
  Preservation History
  Process Name
  Process Description
  Process Reason
  Process Selection
  Process Specification
  Critical Hardware
  Critical Software
  Process Result
  Process Agent
  Process Date
BL METADATA (7/8)
Object Part Group
  Digital Signature
  Digital Signature Name
  Operating Environment
  Object Part Preservation Status
  Viewing Software
  Object Part Identifier
  Start File
  Underlying Abstract Form
  Essence of Being
External Object Group
  Source
  Relation External Object
  Related Information Object
  Other
Original Environment Group
  Operating System
  Processor Type
  Processor Speed
  Hard Disc Capacity
  RAM
  Video Card
  Sound Card
  CD Speed
BL METADATA (8/8)
Rights Information Group
Rights Group
  Rights URL
  Rights XRML
  Rights Statement
  Rights Holder
Licence Group
  Licence Type
  Licence Fee
  Licence Description
  Location
  Number Of Licences
System Parameter Group
  Licence Key
  Extraordinary Requirements
  Original Carrier
  Copy Counter
ISSUES NOT COVERED BY THE OAIS (1/3)
Boundary of the system under development:
  Which materials will be stored in this system?
  Should descriptive information be stored internally?
  Should object relationships be stored internally?
  Should a retrieval manager component be included?
  Should an exit strategy (high volume data transfer) be built from day one?
Changes to metadata:
  Should changes be allowed without delivery and re-ingest as a new item?
ISSUES NOT COVERED BY THE OAIS (2/3)
Object deletion:
  Not included and may be difficult to implement
  Remove content or only access to content?
Object identification in a volume:
  In the case of corruption or requested refreshment, is it necessary to be able to identify the individual object on a volume?
Independent use of archive volumes:
  Disaster recovery without the exact same system
ISSUES NOT COVERED BY THE OAIS (3/3)
Unique identifier:
  Where should it be generated?
  What structure should it have?
How to store licence information:
  Scan hard copy or data entry?
  Where should it be stored?
Data integrity:
  How often should the data be checked?
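The data integrity question can be made concrete with a small sketch of a periodic check. It assumes each stored object carries a SHA-256 digest and a last-checked date; the record layout, function names and the 90-day interval are all hypothetical, since OAIS leaves the checking frequency to the implementer.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def is_due(last_checked: datetime, now: datetime, interval: timedelta) -> bool:
    """An object is due when at least one full interval has passed since its last check."""
    return now - last_checked >= interval


def verify(data: bytes, expected_sha256: str) -> bool:
    """Recompute the digest and compare it with the stored fixity value."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


def audit(records, now, interval=timedelta(days=90)):
    """Return identifiers of due objects whose digest no longer matches.

    The 90-day default is an arbitrary placeholder, not a recommendation.
    """
    failures = []
    for rec in records:
        if is_due(rec["last_checked"], now, interval) and not verify(rec["data"], rec["sha256"]):
            failures.append(rec["id"])
    return failures


# Invented sample data: "b" is corrupted and due, "c" is corrupted but not yet due.
now = datetime(2002, 11, 28, tzinfo=timezone.utc)
original_digest = hashlib.sha256(b"original").hexdigest()
records = [
    {"id": "a", "data": b"intact", "sha256": hashlib.sha256(b"intact").hexdigest(),
     "last_checked": now - timedelta(days=120)},
    {"id": "b", "data": b"corrupted", "sha256": original_digest,
     "last_checked": now - timedelta(days=120)},
    {"id": "c", "data": b"corrupted", "sha256": original_digest,
     "last_checked": now - timedelta(days=1)},
]
failed = audit(records, now)
```

Object "c" shows the trade-off behind the question on the slide: a longer interval lowers the checking load but lets corruption go unnoticed for longer.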
SUMMARY OF MAIN LESSONS LEARNED
It’s heavy
It’s complex
It doesn’t define your scope
It’s worth understanding the terminology and concepts
It is a very valuable tool and the basis of progressing the long term preservation of digital information