ISSN 1747-1524

DCC | Digital Curation Manual

Instalment on “Metadata”

http://www.dcc.ac.uk/resource/curation-manual/chapters/metadata

Michael Day

UKOLN

University of Bath, Bath BA2 7AY

http://www.ukoln.ac.uk/

November 2005

Version 1.1


Page 2 DCC Digital Curation Manual

Legal Notices

The Digital Curation Manual is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 License.

© in the collective work - Digital Curation Centre (which in the context of these notices shall mean one or more of the University of Edinburgh, the University of Glasgow, the University of Bath, the Council for the Central Laboratory of the Research Councils and the staff and agents of these parties involved in the work of the Digital Curation Centre), 2005.

© in the individual instalments – the author of the instalment or their employer where relevant (as indicated in the catalogue entry below).

The Digital Curation Centre confirms that the owners of copyright in the individual instalments have given permission for their work to be licensed under the Creative Commons license.

Catalogue Entry

Title: DCC Digital Curation Manual Instalment on Metadata

Creator: Michael Day (author)

Subject: Information Technology; Science; Technology--Philosophy; Computer Science; Digital Preservation; Digital Records; Science and the humanities.

Description: Instalment on the role of metadata within the digital curation life-cycle. Describes the increasingly important role of metadata for digital curation, some practical applications for metadata, issues of interoperability between metadata schemes, the topic’s place within the OAIS reference model and the issues associated with preservation metadata.

Publisher: HATII, University of Glasgow; University of Edinburgh; UKOLN, University of Bath; Council for the Central Laboratory of the Research Councils.

Contributor: Seamus Ross (editor)

Contributor: Michael Day (editor)

Date: November 2005 (creation)

Type: Text

Format: Adobe Portable Document Format v.1.2

Resource Identifier: ISSN 1747-1524

Language: English

Rights: © Michael Day, UKOLN, University of Bath

Citation Guidelines: Day M, (November 2005), "Metadata", DCC Digital Curation Manual, S. Ross, M. Day (eds), Retrieved <date>, from http://www.dcc.ac.uk/resource/curation-manual/chapters/metadata/


Michael Day, Metadata Page 3

About the DCC

The JISC-funded Digital Curation Centre (DCC) provides a focus on research into digital curation expertise and best practice for the storage, management and preservation of digital information to enable its use and re-use over time. The project represents a collaboration between the University of Edinburgh, the University of Glasgow through HATII, UKOLN at the University of Bath, and the Council for the Central Laboratory of the Research Councils (CCLRC). The DCC relies heavily on active participation and feedback from all stakeholder communities. For more information, please visit www.dcc.ac.uk. The DCC is not itself a data repository, nor does it attempt to impose policies and practices of one branch of scholarship upon another. Rather, based on insight from a vibrant research programme that addresses wider issues of data curation and long-term preservation, it will develop and offer programmes of outreach and practical services to assist those who face digital curation challenges. It also seeks to complement and contribute towards the efforts of related organisations, rather than duplicate services.

DCC - Digital Curation Manual

Editors

Seamus Ross, Director, HATII, University of Glasgow (UK)

Michael Day, Research Officer, UKOLN, University of Bath (UK)

Peer Review Board

Neil Beagrie, JISC/British Library Partnership Manager (UK)

Georg Büechler, Digital Preservation Specialist, Coordination Agency for the Long-term Preservation of Digital Files (Switzerland)

Filip Boudrez, Researcher DAVID, City Archives of Antwerp (Belgium)

Andrew Charlesworth, Senior Research Fellow in IT and Law, University of Bristol (UK)

Robin L. Dale, Program Manager, RLG Member Programs and Initiatives, Research Libraries Group (USA)

Wendy Duff, Associate Professor, Faculty of Information Studies, University of Toronto (Canada)

Peter Dukes, Strategy and Liaison Manager, Infections & Immunity Section, Research Management Group, Medical Research Council (UK)

Terry Eastwood, Professor, School of Library, Archival and Information Studies, University of British Columbia (Canada)

Julie Esanu, Program Officer, U.S. National Committee for CODATA, National Academy of Sciences (USA)

Paul Fiander, Head of BBC Information and Archives, BBC (UK)

Luigi Fusco, Senior Advisor for Earth Observation Department, European Space Agency (Italy)

Hans Hofman, Director, Erpanet; Senior Advisor, Nationaal Archief van Nederland (Netherlands)

Max Kaiser, Coordinator of Research and Development, Austrian National Library (Austria)

Carl Lagoze, Senior Research Associate, Cornell University (USA)

Nancy McGovern, Associate Director, IRIS Research Department, Cornell University (USA)

Reagan Moore, Associate Director, Data-Intensive Computing, San Diego Supercomputer Center (USA)

Alan Murdock, Head of Records Management Centre, European Investment Bank (Luxembourg)

Julian Richards, Director, Archaeology Data Service, University of York (UK)

Donald Sawyer, Interim Head, National Space Science Data Center, NASA/GSFC (USA)

Jean-Pierre Teil, Head of Constance Program, Archives nationales de France (France)

Mark Thorley, NERC Data Management Coordinator, Natural Environment Research Council (UK)

Helen Tibbo, Professor, School of Information and Library Science, University of North Carolina (USA)

Malcolm Todd, Head of Standards, Digital Records Management, The National Archives (UK)


Preface

The Digital Curation Centre (DCC) develops and shares expertise in digital curation and makes accessible best practices in the creation, management, and preservation of digital information to enable its use and re-use over time. Among its key objectives is the development and maintenance of a world-class digital curation manual. The DCC Digital Curation Manual is a community-driven resource, from the selection of topics for inclusion through to peer review. The Manual is accessible from the DCC web site (http://www.dcc.ac.uk/resource/curation-manual).

Each of the sections of the DCC Digital Curation Manual has been designed for use in conjunction with DCC Briefing Papers. The briefing papers offer a high-level introduction to a specific topic; they are intended for use by senior managers. The DCC Digital Curation Manual instalments provide detailed and practical information aimed at digital curation practitioners. They are designed to assist data creators, curators and re-users to better understand and address the challenges they face and to fulfil the roles they play in creating, managing, and preserving digital information over time. Each instalment will place the topic on which it is focused in the context of digital curation by providing an introduction to the subject, case studies, and guidelines for best practice(s). A full list of areas that the curation manual aims to cover can be found at the DCC web site (http://www.dcc.ac.uk/resource/curation-manual/chapters). To ensure that this manual reflects new developments, discoveries, and emerging practices, authors will have a chance to update their contributions annually. Initially, we anticipate that the manual will be composed of forty instalments, but as new topics emerge and older topics require more detailed coverage, more might be added to the work.

To ensure that the Manual is of the highest quality, the DCC has assembled a peer review panel including a wide range of international experts in the field of digital curation to review each of its instalments and to identify newer areas that should be covered. The current membership of the Peer Review Panel is provided at the beginning of this document.

The DCC actively seeks suggestions for new topics and suggestions or feedback on completed Curation Manual instalments. Both may be sent by email to the editors of the DCC Digital Curation Manual.

Seamus Ross & Michael Day, 18 April 2005


Table of Contents

1. Introduction and scope
2. Definitions
3. The growing importance of metadata
4. Some uses of metadata
   4.1 Resource discovery and retrieval
   4.2 The management of resources
   4.3 The management of archival records
   4.4 Facilitating data sharing and reuse
5. Metadata interoperability
6. The OAIS model and preservation metadata
   6.1 Types of preservation metadata
      6.1.1 Technical and structural metadata
      6.1.2 Descriptive, administrative and contextual metadata
   6.2 Preservation metadata initiatives
   6.3 Metadata packaging and METS
   6.4 Some open questions
7. Conclusions
Acknowledgments
References
Further reading
Further references
Author information


Metadata (information about data) provides a means for discovering data objects as well as providing other useful information about the data objects such as experimental parameters, creation conditions, etc. (Rajasekar & Moore, 2001)

In order to exploit and explore the petabytes of scientific data that will arise from ... high-throughput experiments, supercomputer simulations, sensor networks, and satellite surveys, scientists will need assistance from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers. To create such tools, the data will need to be annotated with relevant "metadata" giving information as to provenance, content, conditions, and so on; and, in many instances, the sheer volume of data will dictate that this process be automated. (Hey & Trefethen, 2005, p. 818)

1. Introduction and scope

This instalment will introduce the key topic of metadata and attempt to highlight just why it is considered critically important for the ongoing stewardship and curation of digital data and information.

Metadata can be defined simply as any "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage" any other resource (NISO, 2004). Unfortunately, the term is used in so many different contexts and applied to so many different things that it sometimes seems to convey very little meaning. For example, Duff (2004) has written that data about data can seemingly refer to everything, and concomitantly, nothing. Despite this, it is perhaps worth persisting with the term for now, partly because it remains a useful way of promoting cross-domain communication.

While many metadata initiatives have focused on the development of standards to facilitate the discovery of objects, there has also been a growing awareness of the role that metadata can play in supporting reuse, management, and long-term preservation. This last role has directly led to the development of projects and initiatives focused on identifying the metadata specifically required to support long-term preservation, perhaps most definitively through the international working group known as PREMIS (http://www.oclc.org/research/projects/pmwg/). Despite this, developing a preservation metadata standard that can be easily implemented has proved difficult. One of the major challenges has been addressing the distinctive metadata requirements of the many different players in the preservation process. For example, referring to the simple taxonomy of users developed by the Digital Curation Centre for its requirements analysis (Carpenter, 2005), it is clear that the metadata needs of data creators may be quite different from those of curators or the re-users of data.

This first Digital Curation Manual instalment on metadata will attempt to provide a general introduction to the subject from a digital curation perspective. It will first attempt some definitions and try to explain why metadata is being seen as increasingly important for supporting reuse and long-term preservation. A section highlighting some of the main uses (or functions) of metadata will be followed by a more detailed introduction to interoperability. Because of its direct relevance to digital curation, the instalment will then consider in slightly more detail the development of preservation metadata standards and the role of packaging formats like the Metadata Encoding and Transmission Standard (METS).

The curation manual will contain further instalments that will consider specific metadata issues and domains in more detail. Those already commissioned or planned (November 2005) include introductions to preservation metadata, interoperability, workflows and the automated extraction of metadata, and reviews of metadata initiatives relevant to learning objects, scientific data and archival records.


2. Definitions

While the term itself has rapidly become ubiquitous, literal definitions of metadata as "data about data" are perhaps now less than helpful. Instead, we must try to define metadata in relation to its use, chiefly the functions that it is intended to support.

There have been a number of attempts to categorise these functions. For example, Haynes (2004, pp. 15-17) consolidated older categorisations into a five-point model, covering resource description, information retrieval, management, documenting ownership and authenticity, and interoperability. One of the most popular categorisations was first developed in the 1990s by a digitisation initiative called the Making of America II Testbed Project (Hurley, et al., 1999). This defined categories for descriptive, structural, and administrative metadata types, a broad structure that has to a large extent been inherited by the influential Metadata Encoding and Transmission Standard (METS). In this simple typology, descriptive metadata is that used for the discovery and identification of objects, structural metadata supports the display and navigation of objects, and administrative metadata includes any management information needed for the object, including information on the creation process, storage formats, the source and provenance of objects, and the intellectual property rights held in them.
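The three-way typology can be pictured as a partition of one object's metadata into separate sections. The Python sketch below is illustrative only: the field names and the `CATEGORY_OF` lookup table are hypothetical assumptions, not terms defined by METS or the Making of America II project, and real METS documents are XML (in which administrative metadata is further split into technical, rights, source and digital provenance sections).

```python
from dataclasses import dataclass, field

# Hypothetical field-to-category table for this sketch only; neither METS
# nor MOA2 defines these field names.
CATEGORY_OF = {
    "title": "descriptive",             # discovery and identification
    "creator": "descriptive",
    "page_order": "structural",         # display and navigation
    "file_format": "administrative",    # storage format
    "provenance": "administrative",     # source and provenance
    "rights_holder": "administrative",  # intellectual property rights
}


@dataclass
class ObjectMetadata:
    """Metadata for one digital object, split into three sections."""
    descriptive: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)


def categorise(flat_fields: dict) -> ObjectMetadata:
    """Sort a flat set of fields into descriptive/structural/administrative.

    Unknown field names raise KeyError, so that metadata which fits none of
    the three categories is noticed rather than silently dropped.
    """
    result = ObjectMetadata()
    for name, value in flat_fields.items():
        section = getattr(result, CATEGORY_OF[name])
        section[name] = value
    return result


meta = categorise({
    "title": "Letter to the editor, 1862",
    "page_order": ["p1.tif", "p2.tif"],
    "file_format": "image/tiff",
})
print(meta.descriptive)     # {'title': 'Letter to the editor, 1862'}
print(meta.structural)      # {'page_order': ['p1.tif', 'p2.tif']}
print(meta.administrative)  # {'file_format': 'image/tiff'}
```

Raising an error for unclassified fields is a deliberate choice in this sketch: information such as an object's archival context has no obvious home in the three categories, and a strict lookup makes that gap visible.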

What is missing from this categorisation is any specific acknowledgement of the importance of context. Gilliland-Swetland (1998) has noted that a large part of the activity of archives and museums has traditionally been focused on elucidating and preserving the context of records and artefacts. So, for example, archivists have long been aware that archival records are highly contingent upon what the InterPARES project refers to as their juridical-administrative, procedural, provenancial, documentary and technological contexts (Gilliland-Swetland & Eppard, 2000). The importance of context - and other archival principles like authenticity - is evident in the well-known definition of 'recordkeeping metadata' first developed at a working meeting held in the Netherlands in June 2000 (Wallace, 2001, p. 255):

Structured or semi-structured information which enables the creation, management and use of records through time and within and across domains in which they are created. Recordkeeping metadata can be used to identify, authenticate, and contextualise records; and the people, processes and systems that create, manage, and maintain and use them.

This is the understanding of metadata that underpins initiatives like the draft records management metadata standard (ISO/FDIS 23081-1:2005) currently under development by the ISO archives/records management subcommittee (ISO/TC46/SC11).

While the word 'metadata' is a fairly recent invention, the idea of metadata is much older, with its roots in library catalogues (and similar) dating back to the Pinakes of Callimachus (a systematic bibliography of Greek literature compiled in the third century BC, probably based on the contents of the Library of Alexandria) and beyond to the record-keeping systems of the ancient Near East (Casson, 2001). The term 'metadata' was first used in the context of database management systems to give a generic name for all the various additional data needed to describe and control the management and use of data (Mark & Roussopoulos, 1986). The increasing importance of computer networking has had two main effects. On the one hand, it has led to the development of a bewildering array of new metadata standards, each focused on a particular subject domain, content type, function or application. On the other hand, this very diversity has led to increased recognition of the importance of metadata in supporting interoperability between systems, both technical and semantic (e.g., Johnston, 2001).

Metadata is now seen as an essential part of the digital world we live in, facilitating the discovery, management and reuse of all kinds of digital and non-digital objects. Gilliland-Swetland (2004) has observed that metadata "is recognised as a critically important, and yet increasingly problematic and complex concept with relevance for information objects of all types as they move through time and space." As already noted, metadata standards have been developed to support an extremely wide range of activities. These include facilitating the discovery of objects, the management of access and integration, and the documentation of object origins, life cycles and contexts - all at multiple levels of aggregation and focused on particular subject domains. Correspondingly, the world of metadata can look extremely complicated, with multiple domain-specific projects, initiatives and standards. This diversity makes providing generic advice on the use of metadata standards extremely difficult.


3. The growing importance of metadata

The importance of metadata is directly related to the roles it plays in supporting the discovery, management and stewardship of digital resources. However, a number of general trends are now making metadata even more crucial to digital curation and stewardship.

The first of these is the vast (and rapidly increasing) amount of information becoming available in digital form, as reflected in the University of California at Berkeley's periodic analyses of the amount of information being created. These suggest that, even when ignoring the (greater) amount of information that flows through electronic channels (e.g., telephone, radio, television, the Internet), the amount of new information being created and stored on all types of media effectively doubled between 1999 and 2002 (Lyman & Varian, 2003). This 'information explosion' or 'data deluge' is evident in many contexts, e.g. in commerce, public administration and healthcare, but is becoming increasingly important in the research domain.
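As a rough illustration of what that doubling implies, and assuming (as a simplification not made in the source) that the growth was compounded evenly over the three years from 1999 to 2002, the implied annual growth rate g of newly created information is:

$$
(1+g)^{3} = 2 \quad\Longrightarrow\quad g = 2^{1/3} - 1 \approx 0.26
$$

Under that steady-growth assumption, each year saw roughly 26% more new information created and stored than the year before.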

Scientists and other researchers are becoming increasingly dependent on the production and analysis of vast amounts of data, typically that generated by high-throughput instruments and computer simulations, or streamed from sensors and satellites (Hey & Trefethen, 2005). A few examples may suffice. In astronomy it has been suggested that the volume of observational data produced by telescopes and sky surveys doubles each year, with a consequent need to federate access to data across distributed multi-terabyte repositories (Szalay & Gray, 2001). In particle physics, it has been estimated that experiments on the Large Hadron Collider (LHC) currently under construction at CERN will, when operational, generate in the region of 12-14 petabytes of data per year, which will then need to be stored and managed across multiple sites through the LHC Computing Grid (LCG) project (http://lcg.web.cern.ch/lgc/). These examples from 'Big Science' domains may have the most extreme requirements, but related developments in bioinformatics, the environmental sciences and medicine (e.g., neuroinformatics) indicate that many other subject disciplines need to respond to the curation challenges of rapid data growth.

In addition to the growing amount of data being generated, there is an increasing focus in science policy on encouraging open access to data. For example, in January 2004, government ministers from all OECD member states (and some others) endorsed a declaration based on the principle that publicly funded research data should be openly available to the maximum extent possible (Arzberger, et al., 2004). An OECD working group is currently working on the development of a set of guidelines that would facilitate open access to digital research data (http://dataaccess.ucsd.edu/). However, as in other contexts, open access does not just depend on the willingness of scientists to share data, or on the existence of appropriate intellectual property rights regimes (e.g., Waelde & McGinley, 2005), but also on the ability of scientists to find appropriate data and to understand it sufficiently to reanalyse it or to integrate it with other data sources. The existence of sufficient good-quality metadata is a prerequisite for the reuse of data. For example, Deelman, et al. (2004) comment that it "is impossible to conduct a correct analysis of a data set without knowing how the data was cleaned, calibrated, what parameters were used in the process, etc." It can be argued that this need for metadata is even more important in the data-rich research environments that are characteristic of e-science. Hey and Trefethen (2005, p. 818) argue that metadata will be a necessary condition for the next generation of scientific tools, e.g. giving scientists assistance "from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers." It is, therefore, not surprising that the US National Science Foundation's Blue-Ribbon Advisory Panel on Cyberinfrastructure (2003) argued that the creation and maintenance of metadata is essential for the ongoing stewardship and curation of data.
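The Deelman et al. point can be made concrete with a small provenance "sidecar" record kept alongside a data file, together with a check that flags missing processing details before anyone attempts a reanalysis. This Python sketch assumes hypothetical field names (`cleaning_steps`, `calibration`, `parameters`) and a hypothetical file name; it does not follow any particular metadata standard.

```python
import json

# Hypothetical minimum set of processing fields a re-user might need before
# reanalysing a dataset. These names are illustrative assumptions only.
REQUIRED_PROCESSING_FIELDS = ("cleaning_steps", "calibration", "parameters")


def missing_provenance(sidecar: dict) -> list:
    """Return the required processing fields that are absent or empty."""
    return [f for f in REQUIRED_PROCESSING_FIELDS if not sidecar.get(f)]


sidecar = {
    "dataset": "run-042.csv",  # hypothetical data file
    "cleaning_steps": ["removed rows with sensor fault flag"],
    "calibration": {"instrument": "thermometer-A", "offset_c": -0.3},
    "parameters": {},          # left empty, so it will be reported
}

print(json.dumps(sidecar, indent=2))
print(missing_provenance(sidecar))  # ['parameters']
```

Treating an empty value the same as a missing one is a design choice: a blank `parameters` entry tells a re-user no more than an absent one.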


4. Some uses of metadata

As indicated before, metadata can be used to support a range of functions, from discovery and the management of access to recording sufficient descriptive and contextual information to enable the preservation or reuse of objects over time. Before elaborating a few of these functions in more detail, it is perhaps first worth noting that one of the fundamental characteristics of metadata is that, while it can be made human-readable, it is primarily intended to be processed by machines, e.g. for searching, sorting or display. This basic ability has been supplemented in recent years by the vision of a Semantic Web that facilitates the integration and reuse of data across applications and domains (Berners-Lee, Hendler & Lassila, 2001). We will return to this topic in our discussion of interoperability in section five.

4.1 Resource discovery and retrieval

Historically, a major focus of metadata development has been supporting discovery and retrieval. For example, this has long been one of the primary roles of the metadata held in library catalogues and one of the functions of archival finding aids. A large number of metadata standards have been developed to support resource discovery, although most of these tend to be focused on particular types of object or subject domain. A smaller number of metadata initiatives exist to promote resource discovery across domains. Perhaps the best known of these is the Dublin Core Metadata Initiative (DCMI), which maintains a fifteen-element core metadata set together with definitions of other metadata terms that can be used to help build interoperability within and across domains (http://dublincore.org/). The element set has been widely implemented, e.g. in cross-domain services like the US National Science Digital Library (Arms & Arms, 2004), but also adapted for use in domain-specific areas like linguistics (Bird & Simons, 2003) or for distributed image collections (e.g., http://www.pictureaustralia.org/). It also underlies a number of metadata standards designed to facilitate access to government information, e.g. the Australian AGLS Metadata Standard (http://www.agls.gov.au/) and the metadata standard defined as part of the UK e-Government Interoperability Framework (e-GIF) (Cabinet Office, Office of the e-Envoy, 2004).
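Because the Dublin Core element set is small and fixed, simple records are easy to generate programmatically. The Python sketch below, using only the standard library, builds an XML record from the fifteen elements in the `http://purl.org/dc/elements/1.1/` namespace and rejects any other element name. The `<record>` wrapper element is an assumption of this sketch, not something DCMI prescribes; in practice Dublin Core is usually embedded in a host format (for example an OAI-PMH response).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise elements with a "dc:" prefix

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def dc_record(fields):
    """Build a simple Dublin Core XML record from (element, value) pairs.

    The <record> wrapper is a hypothetical container used only here.
    """
    root = ET.Element("record")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record_xml = dc_record([
    ("title", "DCC Digital Curation Manual Instalment on Metadata"),
    ("creator", "Day, Michael"),
    ("date", "2005-11"),
    ("identifier", "ISSN 1747-1524"),
])
print(record_xml)
```

A list of pairs is used rather than a dictionary because every Dublin Core element is optional and repeatable, so a record may legitimately carry several creators or subjects.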

The types of information required for resource discovery tend to differ according to the type of digital object being described. For document-like objects, there tends to be a strong focus on the types of information traditionally used by library catalogues or abstracting and indexing services, e.g. author and editor names, titles, abstracts, subject headings, etc. The MARC (Machine-Readable Cataloguing) formats traditionally used by libraries have translated well into the metadata world through the provision of things like the MARC 21 XML Schema (http://www.loc.gov/standards/marcxml/) and mappings to formats like Dublin Core and ONIX (http://www.loc.gov/marc/marcdocz.html), and also through the creation of simplified formats like the XML-based Metadata Object Description Schema (MODS) (http://www.loc.gov/standards/mods/). Metadata standards supporting the discovery of images or multimedia tend to include information describing semantic content as well as a range of relevant technical characteristics. Metadata standards for scientific datasets tend to include additional information about the producers of the data, access provisions, and transfer protocols (e.g., Kim, 1999).
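To illustrate what such a mapping (a "crosswalk") involves, here is a Python sketch that maps a few MARC 21 tags to simple Dublin Core elements. The choice of tags, the use of subfield $a only, and the simplified record structure are all assumptions made for brevity; this is not the Library of Congress crosswalk, which should be consulted for real mappings.

```python
# Illustrative subset of a MARC 21 -> simple Dublin Core crosswalk.
# Assumption: only subfield $a is used; real crosswalks cover many more
# tags, subfields and indicators.
MARC_TO_DC = {
    "020": "identifier",   # ISBN
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "520": "description",  # summary note
    "650": "subject",      # topical subject heading
}


def crosswalk(marc_fields):
    """Map (tag, subfields) pairs to (dc_element, value) pairs.

    Tags with no mapping (or no $a) are collected separately so that
    information lost in the crosswalk is visible rather than silent.
    """
    mapped, unmapped = [], []
    for tag, subfields in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None or "a" not in subfields:
            unmapped.append(tag)
        else:
            mapped.append((element, subfields["a"]))
    return mapped, unmapped


# Hypothetical, heavily simplified MARC record.
record = [
    ("100", {"a": "Day, Michael."}),
    ("245", {"a": "Metadata /", "c": "Michael Day."}),
    ("650", {"a": "Metadata."}),
    ("856", {"u": "http://www.dcc.ac.uk/resource/curation-manual/chapters/metadata"}),
]
dc, lost = crosswalk(record)
print(dc)    # [('creator', 'Day, Michael.'), ('title', 'Metadata /'), ('subject', 'Metadata.')]
print(lost)  # ['856']
```

Tag 856 (electronic location) is deliberately left out of the table here to show the lossy case; fuller crosswalks typically map it to an identifier. The trailing ISBD punctuation in values such as "Metadata /" also shows why crosswalks usually need a clean-up step.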

4.2 The management of resources

Another area where metadata has a potentially important role is in supporting the management of digital resources. Such metadata may, for example, record key aspects of the production or curation process (e.g. reasons for selection, preservation actions undertaken) as well as information about intellectual property rights that could be used to manage end-user access. This is the main focus of the 'administrative metadata' section defined by the Metadata Encoding and Transmission Standard (METS) (http://www.loc.gov/standards/mets/). Other types of administrative metadata have been identified by a Digital Library Federation initiative that has produced a data structure that can be used to support the management of dynamic collections of digital resources within library management systems and similar systems (Jewell, et al., 2004). The ONIX metadata standards for books and serials provide publishers with a way to share product information with each other and with suppliers, in part built on a generalised framework developed to facilitate rights metadata transactions in e-commerce contexts (http://www.editeur.org/).
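Administrative metadata that records curation actions (reasons for selection, preservation actions undertaken) is often kept as an append-only history of events. The Python sketch below assumes a hypothetical `CurationEvent` structure with timestamp, action, agent and note fields; it is not the METS administrative metadata schema or any other standard, only an illustration of recording actions so they can be reviewed later.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class CurationEvent:
    """One administrative-metadata entry (hypothetical structure)."""
    timestamp: datetime
    action: str   # e.g. "selection", "format migration"
    agent: str    # person or system that performed the action
    note: str = ""


class CurationLog:
    """Append-only history of curation actions for one digital object.

    Events are immutable and never removed, so the log records what was
    done to the object, by whom, and why.
    """

    def __init__(self) -> None:
        self._events: List[CurationEvent] = []

    def record(self, action: str, agent: str, note: str = "") -> CurationEvent:
        event = CurationEvent(datetime.now(timezone.utc), action, agent, note)
        self._events.append(event)
        return event

    def history(self, action: Optional[str] = None) -> List[CurationEvent]:
        """Return events in recording order, optionally filtered by action."""
        return [e for e in self._events if action is None or e.action == action]


log = CurationLog()
# Agent names and notes below are hypothetical examples.
log.record("selection", "curator:jsmith", "Unique survey data; high reuse potential")
log.record("format migration", "migration-service", "WordPerfect 5.1 -> PDF")
for event in log.history():
    print(event.timestamp.isoformat(), event.action, event.agent, event.note)
```

Freezing each event and offering no delete method keeps the history trustworthy as evidence of past actions, which matters more for preservation than the convenience of editing entries.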

Supporting the long-term management and reuse of digital objects brings us to the realm of digital preservation. Since the mid-1990s, a number of projects and initiatives, mostly originating in the library domain, have attempted to identify the precise role of metadata in supporting digital preservation activities. In recent years, much of the focus of this activity has been on the international working group on Preservation Metadata: Implementation Strategies (PREMIS) (http://www.oclc.org/research/projects/pmwg/), the outcomes of which will be described in more detail in section six (below) and in a separate instalment of this curation manual that will deal specifically with preservation metadata.

4.3 The management of archival records

Also focused on the longer-term is the important work being undertaken by archivists and records managers in identifying the metadata needed to ensure the preservation of the value of archival records as evidence. Research initiatives like the seminal Pittsburgh Project (Functional Requirements for Evidence in Recordkeeping) (Bearman & Duff, 1997; Duff, 2001), both phases of InterPARES (http://www.interpares.org/), and the Australian Recordkeeping Metadata Schema (RKMS) (McKemmish, et al., 1999) have done much to facilitate a better understanding of the role of metadata in the archives and records domain.

In addition, a number of archives have developed specific standards to support the capture (and presentation) of metadata from electronic records management systems (ERMS). For example, the functional requirements for ERMS published by the National Archives in the UK identify not only the metadata required to support records management functions but also the metadata intended to fulfil external requirements like the e-Government Interoperability Framework, with which they are aligned (National Archives, 2002). Similar standards that specify the metadata that records management software should be able to capture include the influential "Design Criteria Standard for Electronic Records Management Software Applications" issued by the US Department of Defense (DoD 5015.2-STD) and the "Model Requirements for the Management of Electronic Records" (MoReq) (http://www.cornwell.co.uk/moreq.html). This type of records management metadata is primarily designed to support standardisation within organisations, and it is not yet clear how much of it will prove useful in supporting long-term preservation. The National Archives metadata standard (and the related e-Government Metadata Standard) contains a specific section for preservation information (e.g. for recording format information), but includes a note that the area is subject to further development. It is also perhaps worth making the point that such metadata will not entirely remove the need for more traditional forms of archival description, which are much better at reflecting the context of a given body of records and their complex interrelationships.

Currently, the archives and records management sub-committee of the International Organization for Standardization (ISO/TC46/SC11) is working on the development of a standard for records management metadata, building on the metadata requirements identified by the earlier ISO Records Management standard (ISO 15489-1:2001), which defined metadata as "data describing the context, content and structure of records and their management through time." The new standard, ISO 23081, is made up of three parts. The part now under development is a general outline of the principles of records management metadata, currently a draft standard (ISO/FDIS 23081-1:2005). Further parts will look at implementation issues and provide some methods of assessment. Building on a popular definition first developed at a working meeting in 2000 (Wallace, 2001), the draft standard refines the ISO 15489 clause to define records management metadata as "structured or semi-structured information that enables the creation, registration, classification, access, preservation and disposition of records through time and within and across domains," adding that it "can be used to identify, authenticate and contextualize records and the people, processes and systems that create, manage, maintain and use them and the policies that govern them" (ISO/FDIS 23081-1:2005).


This definition highlights the need for recordkeeping systems to capture metadata about many different types of entity, e.g. the records themselves and their business context, the underlying policies of archives, records management processes and the agents that undertake them.

This diverse metadata would be expensive to create manually, so the viability of records management metadata will depend upon the possibility of automatically capturing the desired information from recordkeeping systems, existing metadata, and other sources. Both the Clever Recordkeeping Metadata Project and InterPARES 2 are investigating the extent to which records management metadata can be captured from business processes and systems and are exploring the potential roles of metadata registries (Evans & Lindberg, 2004; Evans, McKemmish & Bhoday, 2004).
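At its very simplest, automatic capture means harvesting properties that a system already knows about an object rather than asking a person to key them in. The sketch below illustrates the idea for a single file using only the Python standard library; the output field names are hypothetical and are not taken from any records management standard, and real recordkeeping systems would also capture business context that no file system holds.

```python
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def capture_file_metadata(path):
    """Derive basic metadata from the file system, with no manual data entry."""
    p = Path(path)
    st = p.stat()
    mime, _ = mimetypes.guess_type(p.name)   # guessed from the file extension only
    return {
        "filename": p.name,
        "size_bytes": st.st_size,
        "last_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "format_guess": mime or "application/octet-stream",
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "minutes.txt"
    sample.write_bytes(b"Board minutes, 3 May")
    record = capture_file_metadata(sample)
print(record)
```

A format guess based on the file extension is weak evidence; dedicated format identification tools inspect the content of the file itself.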

A separate instalment in this curation manual will deal with archival and records management metadata in more detail.

4.4 Facilitating data sharing and reuse

In research domains where data needs to be shared, the creators of data have long recognised the need to maintain contextual and other information about data that allows it to be correctly interpreted or analysed by other researchers. For example, in a paper on ecological metadata, Michener, et al. (1997) noted that "highly detailed instructions or documentation may be required for scientists to accurately interpret and analyze historic or long-term data sets, as well as data resulting from unfamiliar research or complicated experimental designs." Helly, Staudigel and Koppers (2003) view this type of documentation as application metadata, "describing the content, context, quality, structure, accessibility and so on of a specific data set." Large-scale data sharing depends to a large extent upon two things. Firstly, it depends upon the existence of some kind of data sharing infrastructure (e.g. databases, repositories or data centres) that can store, curate and provide continued access to data. Secondly, large-scale data sharing requires standardised forms of data and metadata so that users are able to correctly process the retrieved data. Many scientific disciplines and sub-disciplines, therefore, have been involved in developing standards that can facilitate the exchange of data and metadata (Wouters & Reddy, 2003; Ball, Sherlock & Brazma, 2004). These standards tend to be specific to one sub-discipline or type of data.

Metadata sharing is of particular importance in the geosciences, where a number of standardisation initiatives exist (Kim, 1999). Perhaps the most prominent of these is the Content Standard for Digital Geospatial Metadata (CSDGM), developed by the US Federal Geographic Data Committee for the sharing and dissemination of geospatial data (http://www.fgdc.gov/metadata/contstan.html). It is widely used by federal agencies, local government and universities, especially in the United States. Domain-specific profiles of CSDGM have also been developed for biological data, shoreline data and remote sensing metadata. A technical committee of the International Organization for Standardization (ISO/TC 211) has also developed a metadata standard for describing geographical information and services (ISO 19115:2003).

Another domain where data sharing is important is the social sciences, especially for quantitative data. The Data Documentation Initiative (DDI) is an attempt to develop an international standard for the exchange and preservation of social and behavioural science datasets (http://www.icpsr.umich.edu/DDI/). The standard is currently based on XML and is being used by a growing number of projects and data centres.

The principle of reuse is also a motive behind the development of metadata standards that describe learning objects, most prominently the Learning Object Metadata (LOM) standard developed by the Learning Technology Standards Committee of the IEEE Computer Society (IEEE Std 1484.12.1-2002).


5. Metadata interoperability

A key issue in the networked world is interoperability, the ability of heterogeneous data and metadata to be shared across different systems, e.g. for data aggregation or federated searching. While it is not exclusively a technical issue, the main focus has been on the technical and semantic aspects of interoperability (Johnston, 2001). At the technical level, interoperability is dependent on the existence of standard syntaxes, e.g. based on the Extensible Markup Language (XML), and the use of common communication protocols. Popular protocols include the Z39.50 standard (ANSI/NISO Z39.50-2003), typically used for searching distributed collections of bibliographic data like library catalogues, and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (Lagoze, et al., 2002). Once working, these aspects of interoperability are usually hidden from the user.
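At the protocol level, harvesting with OAI-PMH amounts to issuing HTTP requests that carry a verb parameter and parsing the XML that comes back. The sketch below, using only the Python standard library, builds a ListRecords request and extracts Dublin Core titles and the resumption token from a response. The verb, parameter names and namespaces follow the OAI-PMH 2.0 specification; the repository URL and the abbreviated sample response are invented for illustration, and a real harvester would also need to handle errors, flow control and deleted records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses and the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token); the token is None when the list is complete."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:metadata/oai_dc:dc/dc:title"
    titles = [el.text for el in root.iterfind(path, NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Invented, abbreviated response for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Survey of coastal erosion</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

titles, token = parse_list_records(SAMPLE)
print(titles, token)  # ['Survey of coastal erosion'] page2
print(list_records_url("http://repository.example.org/oai", token))
```

The second request carries only the resumption token, which is how OAI-PMH lets a repository split a large result set into batches.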

Once technical interoperability has been achieved, there is then a need to consider the greater problem of semantic interoperability, e.g. dealing with differences in terminology and meaning across domains. This can be very problematic. In their book Sorting things out: classification and its consequences, Bowker and Star (1999, p. 287) remind us that all forms of classification reflect a particular point of view, "that categories are historically situated artifacts and, like all artifacts, are learned as part of membership in communities of practice."

Reflecting on differences of meaning, Harvey, et al. (1999) argue that true semantic interoperability requires the means "to resolve [the] complex differences that lurk behind apparently consensual terminology and procedures."

The simplest solutions to the semantic interoperability problem involve a combination of metadata transformations based on human-generated mappings (or crosswalks) and the use of cross-domain metadata standards like Dublin Core. The transformation of one metadata schema to another using mappings is a fairly common activity, e.g. when organisations adopt new data formats or systems, but can be far from a straightforward task in practice, with many opportunities for 'mistranslation' (e.g., Woodley, 1998; Godby, Smith & Childress, 2003). The underlying problem, as Duff (2001) reminds us, is that metadata standards are most often developed to address a specific set of needs or requirements and are usually based on quite different conceptual models.
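A crosswalk can be sketched as a field-by-field mapping table. In the minimal example below, the local field names are hypothetical and the targets are Dublin Core element names. Fields with no equivalent are reported rather than silently dropped, and mapping both "editor" and "scanner_operator" onto contributor shows how a distinction made in the source schema can be flattened, one small instance of the 'mistranslation' problem.

```python
# Hypothetical local field names mapped to Dublin Core element names.
CROSSWALK = {
    "title": "title",
    "photographer": "creator",
    "editor": "contributor",
    "scanner_operator": "contributor",
    "date_taken": "date",
    "subject_terms": "subject",
}


def crosswalk(record, mapping=CROSSWALK):
    """Map a local record to Dublin Core; return (dc_record, unmapped_fields)."""
    dc = {}
    unmapped = []
    for local_field, value in record.items():
        target = mapping.get(local_field)
        if target is None:
            unmapped.append(local_field)   # no equivalent: report it rather than drop it silently
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(values)
    return dc, unmapped


dc, lost = crosswalk({
    "title": "Harbour at dusk",
    "photographer": "A. Smith",
    "editor": "B. Jones",
    "scanner_operator": "C. Lee",
    "exposure_seconds": 2,
})
print(dc)    # {'title': ['Harbour at dusk'], 'creator': ['A. Smith'], 'contributor': ['B. Jones', 'C. Lee']}
print(lost)  # ['exposure_seconds']
```

After the mapping, nothing in the Dublin Core output records that B. Jones edited the image while C. Lee scanned it.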

Beyond the Dublin Core, many communities of practice have developed their own standardised formats for facilitating interoperability within particular domains or in relation to particular object types, e.g. the IEEE Standard for Learning Object Metadata (IEEE Std 1484.12.1-2002). For facilitating access to scientific datasets, the Council for the Central Laboratory of the Research Councils (CCLRC) has investigated the development of a generic model for all types of scientific metadata as part of its Data Portal project (Sufi & Matthews, 2004; Drinkwater & Sufi, 2004).

There are some deeper aspects of semantic interoperability. Heflin & Hendler (2000) comment that in order to achieve it, "systems must be able to exchange data in such a way that the precise meaning of the data is readily accessible and the data itself can be translated by any system into a form that it understands." This brings us firmly into the domain of ontologies and the Semantic Web. The latter is a vision of a World Wide Web where the meaning of information can be processed by machines. Berners-Lee and Hendler (2001) stress that the concept of machine-processability is not based on artificial intelligence techniques, but "solely on the machine's ability to solve well-defined operations on well-defined data." What this means in practice is that resources are described or annotated with semantic markup (metadata) so that they can be processed by software agents. Semantic Web technologies like the Resource Description Framework (RDF) and ontology languages have many potential applications, e.g. for the integration of data and information (Hendler, 2003; Staab, 2003; Wroe, et al., 2004), and for supporting collaborative and interdisciplinary e-science (e.g., De Roure & Hendler, 2004).
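The core idea of RDF, describing resources as subject-predicate-object statements whose predicates are themselves URIs, can be illustrated without any RDF library. The sketch below keeps triples in a plain Python set and answers simple pattern queries. The predicate URIs are the real Dublin Core element and FOAF namespaces; the example.org resources are invented. This is a simplification: plain strings cannot distinguish URIs from literal values as RDF does, and a real application would use an RDF toolkit and a query language such as SPARQL.

```python
# Namespace URIs for Dublin Core elements and FOAF; the example.org resources are invented.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    ("http://example.org/dataset/42", DC + "title", "Ocean temperature series"),
    ("http://example.org/dataset/42", DC + "creator", "http://example.org/person/7"),
    ("http://example.org/dataset/43", DC + "creator", "http://example.org/person/7"),
    ("http://example.org/person/7", FOAF + "name", "J. Doe"),
}


def match(s=None, p=None, o=None, store=triples):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# Which datasets did person 7 create, and what is that person's name?
created = [s for s, _, _ in match(p=DC + "creator", o="http://example.org/person/7")]
names = [o for _, _, o in match(s="http://example.org/person/7", p=FOAF + "name")]
print(created, names)
```

Because the creator is a URI rather than a text string, software can follow it to further statements about the person, which is the kind of machine-processable link the Semantic Web relies on.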


6. The OAIS model and preservation metadata

Preservation metadata and the Reference Model for an Open Archival Information System (OAIS) will be dealt with in more detail in other manual instalments. However, their importance to digital curation means that short introductions to both may be useful here.

Since the mid-1990s, those responsible for the long-term preservation of digital objects have realised that all digital preservation strategies depend, to some extent, upon the capture, creation and maintenance of appropriate metadata (e.g., Day, 2004). This 'preservation metadata' is understood to be all of the various types of data that allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano and Moore, 2001). Understood in this way, it is clear that such metadata needs to support an extremely wide range of different functions, ranging from discovery, the technical rendering of objects and the recording of contexts and provenance to the documentation of repository actions and policies. Conceptually, therefore, preservation metadata spans the popular division of metadata into descriptive, structural and administrative categories. Lynch (1999), for example, has noted that within digital repositories, metadata should accompany and make reference to digital objects, providing associated descriptive, structural, administrative, rights management, and other kinds of information.

The wide range of functions that preservation metadata is expected to support means that the definition (or recommendation) of standards is not a simple task. The situation is complicated further by the knowledge that different kinds of metadata will be required to support different digital preservation strategies and that metadata standards themselves need to evolve over time.

6.1 Types of preservation metadata

The OAIS information model (CCSDS 650.0-B-1, 2002) has been very influential on the development of preservation metadata. This section will briefly outline the general types of metadata that it suggests are necessary to support the preservation of digital objects and note, where possible, work being undertaken in related areas.

The OAIS standard defines an information model for the objects that are managed by an archive. This model is built around an entity called an information package, which conceptually links into a single entity the object that is the focus of preservation together with all of the additional information types (metadata) necessary to support its continued use. Of the three information packages defined in the OAIS model, the Archival Information Package (AIP) may perhaps be understood as the most important for preservation purposes, "defined to provide a concise way of referring to a set of information that has, in principle, all the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object" (CCSDS 650.0-B-1, 2002, 4-33). The other two information packages defined by the model, the Submission Information Package (SIP) and the Dissemination Information Package (DIP), emphasise that there are likely to be differences between the objects held within an OAIS (the AIP) and those submitted by producers or disseminated to consumers (Lavoie, 2004). However, in the OAIS information model, the AIP is the key information package that needs to be preserved.

As with all OAIS information packages, an AIP is a conceptual container of two types of information, called Content Information and Preservation Description Information. Both of these are encapsulated and identified by Packaging Information and discoverable through Descriptive Information, package-level metadata that can be used to create finding aids (Figure 1).
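The containment relationships summarised in Figure 1 can be expressed as a small data model. The sketch below uses Python dataclasses; the class and field names paraphrase the OAIS concepts, but the structure is a simplification for illustration rather than an encoding defined by the standard, and the sample values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data_object: bytes                    # the Content Data Object (a bit stream)
    representation_info: list[str]        # pointers to format and semantic descriptions


@dataclass
class PreservationDescriptionInformation:
    reference: dict[str, str]             # identifiers for the content
    provenance: list[str]                 # history of the content
    context: list[str]                    # relationships to other objects
    fixity: dict[str, str]                # e.g. digest algorithm -> value


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_info: str                   # how the parts are bound together
    descriptive_info: dict[str, str] = field(default_factory=dict)  # feeds finding aids

    def missing_pdi(self) -> list[str]:
        """List the PDI categories that are still empty."""
        return [name for name in ("reference", "provenance", "context", "fixity")
                if not getattr(self.pdi, name)]


aip = InformationPackage(
    content=ContentInformation(b"2005-11-01,12.4\n", ["CSV format description"]),
    pdi=PreservationDescriptionInformation(
        reference={"identifier": "example:aip/0001"},
        provenance=["ingested from producer deposit"],
        context=[],
        fixity={},
    ),
    packaging_info="ZIP file with manifest",
    descriptive_info={"title": "Rainfall readings"},
)
print(aip.missing_pdi())   # ['context', 'fixity']
```

A check such as missing_pdi hints at how a repository might flag packages that do not yet carry all four categories of Preservation Description Information.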

Figure 1. Information Package Concepts and Relationships (from OAIS CCSDS 650.0-B-1, 2002, Fig. 2-3)

6.1.1 Technical and structural metadata

Content Information has two components: the Content Data Object, i.e. the object needing preservation (for digital resources this is typically a bit stream), and the associated Representation Information required to make that object understandable to the users of the OAIS. The OAIS model defines Representation Information as "the information that maps a Data Object into more meaningful concepts" (CCSDS 650.0-B-1, 2002, 1-13), but for digital resources it is essentially the technical information (or metadata) needed to render the bit sequences into something meaningful. Typically, Representation Information might include descriptions of the formats, character sets, etc. in use, possibly with descriptions of hardware and software environments (Structure Information). It might also include any additional information that is required to establish the particular meaning of data content, e.g. that raw numbers should be understood as dates or as temperatures in degrees Celsius (Semantic Information). The OAIS information model recognises that Representation Information can be recursive, i.e. that it may itself need further Representation Information, resulting in what the model defines as a Representation Network. While Representation Information is conceptually part of the Content Information, in practice it could simply link to centralised information held elsewhere within the OAIS or in third-party registries. A start has been made with developing registries of information about file formats, but similar approaches could be used for other types of Representation Information. The Digital Curation Centre is itself experimenting with the development of a prototype registry of Representation Information (http://dev.dcc.ac.uk/dccrrt/).
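Because Representation Information can itself require further Representation Information, working out everything needed to interpret an object amounts to walking a graph. The sketch below traverses a hypothetical registry (a plain dictionary standing in for an external registry) and stops at entries assumed to be already understood by the Designated Community. All entry names and dependencies are illustrative, and the set of "known" items is an assumption of the example.

```python
# Hypothetical registry: each entry lists the further Representation Information it depends on.
REGISTRY = {
    "dataset.nc": ["netCDF-3 format spec", "CF conventions"],
    "netCDF-3 format spec": ["XDR encoding spec", "English"],
    "CF conventions": ["UDUNITS", "CF standard name table", "English"],
    "UDUNITS": ["English"],
    "XDR encoding spec": ["English"],
}
# Assumed knowledge base of the Designated Community: no further description needed.
KNOWN_TO_COMMUNITY = {"English"}


def representation_network(item, registry=REGISTRY, known=KNOWN_TO_COMMUNITY):
    """Return every piece of Representation Information reachable from item."""
    needed, stack, seen = [], [item], {item}
    while stack:
        current = stack.pop()
        for dep in registry.get(current, []):
            if dep in seen or dep in known:
                continue                # already collected, or already understood
            seen.add(dep)
            needed.append(dep)
            stack.append(dep)
    return needed


def unresolved(item, registry=REGISTRY, known=KNOWN_TO_COMMUNITY):
    """Entries in the network that the registry cannot describe any further."""
    return [d for d in representation_network(item, registry, known) if d not in registry]


print(representation_network("dataset.nc"))
print(unresolved("dataset.nc"))   # ['CF standard name table']
```

The seen set guards against cycles in the network, and the unresolved list points to gaps that a curator would need to fill before the object could be considered understandable.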

6.1.2 Descriptive, administrative and contextual metadata

In addition to the Content Data Object and its Representation Information, the OAIS model suggests that an AIP would also typically include some Preservation Description Information (PDI). This is the type of information that will allow the continued understanding of the Content Information over time. The OAIS model document says that PDI is "specifically focused on describing the past and present states of the Content Information, ensuring that it is uniquely identifiable, and ensuring that it has not been unknowingly altered" (CCSDS 650.0-B-1, 2002, 4-27). It then defines four classes of PDI, based on categories defined in the seminal 1996 report of a Task Force on Archiving of Digital Information commissioned by the Commission on Preservation and Access and the Research Libraries Group (Garrett & Waters, 1996). This report noted that these four categories, together with the definition of content at different levels of abstraction, were the key features for determining information integrity in the digital environment and argued that they deserved special attention. The following paragraphs will introduce the four categories in more detail.

Fixity - The users of digital resources need to have confidence that they are what they claim to be and that their integrity has not been compromised. Digital information is relatively easy to manipulate, enabling producers to change or withdraw information released previously (Lynch, 1996). This problem is particularly acute for continuously updated databases, such as those that now play an increasingly important role in scientific research and in commerce. While metadata by itself cannot solve the integrity problem, the OAIS model suggests the inclusion of Fixity Information that can support data integrity checks at the level of Content Data Objects. These might include checksums or cryptographic message digests that can help protect bit-level integrity by revealing any changes made to individual data objects.
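Fixity Information of this kind is commonly a message digest recorded when an object is ingested and recomputed later. The minimal sketch below uses SHA-256 from Python's hashlib; the choice of algorithm and the function names are assumptions for illustration. A digest only reveals that bits have changed: it cannot show who changed them, and it is only trustworthy if the recorded value is itself protected.

```python
import hashlib
import os
import tempfile


def compute_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(path, recorded_digest, algorithm="sha256"):
    """True if the file still matches the digest recorded earlier as Fixity Information."""
    return compute_digest(path, algorithm) == recorded_digest.lower()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "object.bin")
    with open(path, "wb") as f:
        f.write(b"abc")
    recorded = compute_digest(path)               # stored at ingest
    intact = check_fixity(path, recorded)         # True
    with open(path, "ab") as f:
        f.write(b"!")                             # simulate an unrecorded change
    after_change = check_fixity(path, recorded)   # False
print(recorded, intact, after_change)
```

Reading in chunks keeps memory use constant, which matters when the same check is run periodically over very large data objects.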

Reference - Another aspect of the integrity of digital resources identified by the Task Force on Archiving of Digital Information was the need for objects to be identified and located over time. Their report said that for an object "to maintain its integrity, its wholeness and singularity, one must be able to locate it definitively and reliably over time among other objects" (Garrett & Waters, 1996, p. 15). This brings us to the traditional realm of descriptive metadata, e.g. that used in bibliographies, catalogues, and finding aids, but also highlights a key role for persistent identifiers. Identifiers feature prominently in the OAIS model's definition of Reference Information, although the practical examples make it clear that other types of descriptive metadata could also be included. There is a separate category in the OAIS information model for descriptive metadata about information packages (Descriptive Information) that can be used to facilitate discovery and access, although the model acknowledges that at least some Reference Information will often be replicated in these Package Descriptions (CCSDS 650.0-B-1, 2002, 4-28).
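The value of a persistent identifier lies in indirection: the identifier cited in metadata stays fixed while the location it resolves to may change. The toy resolver below illustrates that idea only. The class, the identifier syntax and the URLs are hypothetical, and real schemes such as the Handle System or DOIs add registration, governance and distributed resolution that are not modelled here.

```python
class Resolver:
    """A toy identifier-to-location table that keeps a history of moves (hypothetical)."""

    def __init__(self):
        self._locations = {}   # identifier -> list of locations, newest last

    def register(self, identifier, location):
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = [location]

    def move(self, identifier, new_location):
        # The identifier cited in metadata never changes; only its target does.
        self._locations[identifier].append(new_location)

    def resolve(self, identifier):
        return self._locations[identifier][-1]

    def history(self, identifier):
        return list(self._locations[identifier])


PID = "example:12345/report-1996"
resolver = Resolver()
resolver.register(PID, "http://old-server.example.org/reports/1996.pdf")
resolver.move(PID, "http://archive.example.org/objects/report-1996.pdf")
print(resolver.resolve(PID))
```

Keeping the history of locations also gives a small amount of custody information, which overlaps with the Provenance category discussed next.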

Context - Many resources cannot properly be interpreted without some understanding of their context. Digital objects rarely exist in isolation; they interact with other objects and their wider environment. The context might in part be technical, e.g. recording dependencies on particular hardware or software configurations. It might also reflect less tangible realities, e.g. a scientific dataset might be part of a set produced from one experiment, investigation or exploration. In the OAIS model, Context Information is defined as documenting the relationships of the Content Information to its environment (CCSDS 650.0-B-1, 2002, 4-28).

Provenance - Provenance refers to a longstanding principle of the archives profession and embodies the concept that a key part of the integrity of an object is being able to trace its origin and chain of custody. For example, Cook (1993) has written that when archivists adhere to the principles of provenance and original order, "the evidential character of archives is protected, whereby the records inherently reflect the functions, programmes and activities of the person or institution that created them, and the transactional processes by which that actual creation took place." Knowing the provenance or lineage of data is also becoming increasingly important in scientific contexts, where there is a need to be able to trace the origin and subsequent processing history of datasets to facilitate their reuse (Bose & Frew, 2005). This is especially critical in research disciplines where data can be reprocessed many times by different software applications and services. In bioinformatics, for example, Zhao, et al. (2004) have noted the importance of provenance data, understood as the records of where, how and why results were generated, "in order to help e-Scientists to verify results, draw conclusions and test hypotheses." The UK e-Science project myGrid (http://www.mygrid.org.uk) has investigated the development of workflow tools that enable the automatic capture of provenance data, including information about both the organisational context of experiments and their life cycle (Wroe, et al., 2004; Stevens, et al., 2004). The OAIS model views Provenance Information as a special type of context information that documents the history of the Content Information. This might include information about its creation and provide a record of custody and preservation actions undertaken.
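Tracing lineage can be pictured as walking back through a record of processing steps. In the sketch below, each derived dataset records the step that produced it and that step's inputs. The dataset and step names are hypothetical, unknown datasets are treated as original sources, and the structure is far simpler than the workflow provenance captured by systems such as myGrid.

```python
# Hypothetical processing history: dataset -> (step that produced it, its inputs),
# or None for source data with no recorded predecessors.
DERIVATIONS = {
    "raw_reads.fastq": None,
    "reference.fa": None,
    "aligned.bam": ("align reads v2.1", ["raw_reads.fastq", "reference.fa"]),
    "variants.vcf": ("call variants v0.9", ["aligned.bam", "reference.fa"]),
}


def lineage(dataset, derivations=DERIVATIONS):
    """Return (step, output) pairs behind dataset, earliest first."""
    steps, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        record = derivations.get(name)
        if record is None:
            return                      # source data: nothing further to trace
        step, inputs = record
        for parent in inputs:
            visit(parent)               # trace inputs before the step that used them
        steps.append((step, name))

    visit(dataset)
    return steps


def original_sources(dataset, derivations=DERIVATIONS):
    """Return the source datasets that dataset ultimately derives from."""
    record = derivations.get(dataset)
    if record is None:
        return {dataset}
    return set().union(*(original_sources(p, derivations) for p in record[1]))


print(lineage("variants.vcf"))
print(sorted(original_sources("variants.vcf")))
```

Answering "which software produced this result, and from what?" in this way is the kind of question Zhao, et al. describe e-Scientists needing to ask when verifying results.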

It is perhaps worth noting that the traditional descriptive practices adopted by archivists have been particularly good at providing contextual information. Archival description serves to locate archival records in their relationships to other records (documentary context), to the activities that created them (procedural or business context) and to the entities that created, used, and maintained them over time (provenancial context).

6.2 Preservation metadata initiatives

National and research libraries began to develop preservation metadata standards in the late 1990s with the publication of a number of draft element sets. The National Library of Australia produced the first of these (Phillips, et al., 1999), quickly followed by the Cedars and NEDLIB projects (Russell, et al., 2000; Lupovici & Masanès, 2000). An international working group sponsored by OCLC Online Computer Library Center and the Research Libraries Group (RLG) then built upon these (and other) proposals to produce a unified Metadata Framework to Support the Preservation of Digital Objects (Working Group on Preservation Metadata, 2002). While the earlier initiatives had all been informed by the (then) evolving Reference Model for an Open Archival Information System (OAIS) (CCSDS 650.0-B-1, 2002; ISO 14721:2003), the OCLC/RLG Metadata Framework was explicitly structured around its information model.

Following publication of the Metadata Framework, OCLC and RLG commissioned another international group to investigate the issues of implementing preservation metadata in more detail. The resulting Working Group on Preservation Metadata: Implementation Strategies (PREMIS) (http://www.oclc.org/research/projects/pmwg/), co-chaired by Priscilla Caplan and Rebecca Guenther, had the twin objectives of producing a 'core' set of preservation metadata elements and evaluating alternative strategies for encoding, storing, managing and exchanging such metadata. The group first undertook a survey of the practices of existing and planned preservation repositories. The responses to the survey (PREMIS Working Group, 2004) revealed that most repositories were capturing or planning to capture many different types of metadata. Of individual schemes, the Metadata Encoding & Transmission Standard (METS) was the most popular, with over half of respondents using or planning to use it in some way. The next most popular schemes were the ANSI/NISO Z39.87 standard (Data dictionary - Technical metadata for digital still images) and OCLC's Digital Archive Metadata Elements. Many repositories were developing custom-built local schemes based on other standards. The working group, however, acknowledged that the relatively small number of respondents (48) made it hard to know exactly how representative the results were.

The working group issued its proposal for core preservation metadata elements in May 2005 with the publication of the PREMIS Data Dictionary for Preservation Metadata (PREMIS Working Group, 2005; Lavoie & Gartner, 2005). While this is intended to be a translation of the earlier Metadata Framework into a set of implementable semantic units, the Data Dictionary developed its own data model and is not afraid to diverge from the OAIS model in its use of terminology. The Data Dictionary defines preservation metadata as "the information a repository uses to support the digital preservation process," specifically the "metadata supporting the functions of maintaining viability, renderability, understandability, authenticity, and identity in a preservation context" (p. ix). The Data Dictionary itself defines elements (called semantic units) for describing four of the entities identified by the PREMIS data model: objects (at different levels of aggregation), events, agents, and rights, the latter two in no real detail. The working group also limited the scope of the Data Dictionary by excluding categories of metadata deemed not directly relevant to preservation (e.g. descriptive metadata) or outside the expertise of the group (e.g. technical metadata, information about media and hardware).

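One way to picture how the PREMIS entities relate is that events link objects and agents, so that each object accumulates a history of what was done to it and by whom. The sketch below records a format migration and a later fixity check as events. The class and field names are loosely modelled on the Data Dictionary's entity types but are simplified, hypothetical stand-ins rather than its actual semantic units; rights are omitted for brevity, and all identifiers are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Event:
    event_type: str                 # e.g. "migration", "fixity check"
    outcome: str
    agent: Agent                    # who or what carried out the event
    object_ids: list                # identifiers of every object involved
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class DigitalObject:
    identifier: str
    format_name: str
    events: list = field(default_factory=list)


def record_event(event, *objects):
    """Attach the same event to each object it affected, so every object's history stays complete."""
    for obj in objects:
        obj.events.append(event)
    return event


migrator = Agent("agent:migrator-1", "format migration tool")
checker = Agent("agent:audit-1", "fixity audit service")
src = DigitalObject("obj:001", "TIFF 6.0")
dst = DigitalObject("obj:002", "JPEG 2000")
record_event(Event("migration", "success", migrator, [src.identifier, dst.identifier]), src, dst)
record_event(Event("fixity check", "pass", checker, [dst.identifier]), dst)
print([e.event_type for e in dst.events])   # ['migration', 'fixity check']
```

Sharing a single event between source and target objects keeps both sides of a migration traceable from either end.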
6.3 Metadata packaging and METS

The Information Package concept as developed by the OAIS reference model suggests that digital objects should be packaged with both the technical data (Representation Information) needed to convert those bits into meaningful information and all of the other information needed to find, understand and interpret the object (PDI). The model itself does not propose any particular packaging mechanism.

Various models have been proposed for the packaging of data and metadata. For example, the need for some kind of packaging mechanism for different types of metadata and data was realised at the second Dublin Core workshop, held at the University of Warwick in 1996. The outcome of this was the Warwick Framework, a conceptual architecture for the logical aggregation of multiple types of metadata (or data) in packages called containers (Lagoze, Lynch & Daniel, 1996). This, in turn, influenced the development of the active digital object model that is now a key part of the FEDORA repository architecture (http://www.fedora.info/).

One important recent trend has been the development of the Metadata Encoding & Transmission Standard (METS) (http://www.loc.gov/standards/mets/), a standard maintained by the Library of Congress's Network Development and MARC Standards Office. METS is an attempt to provide an XML Schema for encoding metadata that can support the management and exchange of digital library objects. Essentially, it is an XML-based framework in which different types of metadata can be packaged together. Beedham, et al. (2005, p. 70) say that METS "uses XML to provide a vocabulary and syntax for identifying the components that together comprise a digital object, for specifying the location of these components, and for expressing their structural relationships." A METS document consists of seven sections: a METS Header for brief descriptive information about the METS document itself, Descriptive Metadata, Administrative Metadata, a File Section listing all of the files that make up the object, Structural Map and Structural Links sections that enable individual files and metadata to be mapped to the structure of the object, and a Behavior section that provides information on how particular components should be rendered.

The administrative metadata section is intended to store technical information about the files, as well as information about intellectual property rights held in the resource, the source material, and provenance metadata that records relationships between files and migrations. The modular design of METS means that objects can also include metadata from 'extension schemas', i.e. from standards defined elsewhere. For example, the descriptive metadata could include or link to records conforming to standards like the Encoded Archival Description (EAD), the Metadata Object Description Schema (MODS), or Dublin Core. Technical information about still images could be taken from ANSI/NISO Z39.87 or its XML encoding in MIX (http://www.loc.gov/standards/mix/).
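To make the section structure concrete, the sketch below uses Python's xml.etree.ElementTree to assemble a skeletal METS document with a header, a Dublin Core descriptive section, a digital provenance section, a file section and a structural map. The element and attribute names follow the METS schema, but the identifiers, file name and values are hypothetical, the output has not been validated against the schema, and real METS files usually carry much richer content.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def mets_tag(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS}}}{name}"


def build_mets(title, file_url):
    """Assemble a skeletal METS document for one file with DC and provenance metadata."""
    root = ET.Element(mets_tag("mets"))
    ET.SubElement(root, mets_tag("metsHdr"), CREATEDATE="2005-11-01T00:00:00")

    # Descriptive metadata: a Dublin Core record wrapped inline.
    dmd = ET.SubElement(root, mets_tag("dmdSec"), ID="DMD001")
    wrap = ET.SubElement(dmd, mets_tag("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, mets_tag("xmlData"))
    ET.SubElement(xml_data, f"{{{DC}}}title").text = title

    # Administrative metadata: digital provenance held in an external record.
    amd = ET.SubElement(root, mets_tag("amdSec"), ID="AMD001")
    prov = ET.SubElement(amd, mets_tag("digiprovMD"), ID="PROV001")
    ET.SubElement(prov, mets_tag("mdRef"), {
        "LOCTYPE": "URL", "MDTYPE": "PREMIS",
        f"{{{XLINK}}}href": "http://example.org/premis/001.xml"})

    # File section: the content file, linked to its administrative metadata via ADMID.
    file_sec = ET.SubElement(root, mets_tag("fileSec"))
    grp = ET.SubElement(file_sec, mets_tag("fileGrp"), USE="master")
    f = ET.SubElement(grp, mets_tag("file"), ID="FILE001", MIMETYPE="image/tiff", ADMID="PROV001")
    ET.SubElement(f, mets_tag("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": file_url})

    # Structural map: ties the logical unit to its descriptive metadata and its file.
    smap = ET.SubElement(root, mets_tag("structMap"))
    div = ET.SubElement(smap, mets_tag("div"), TYPE="page", DMDID="DMD001")
    ET.SubElement(div, mets_tag("fptr"), FILEID="FILE001")
    return root


doc = build_mets("Page 1 of a digitised diary", "http://example.org/images/p001.tif")
print(ET.tostring(doc, encoding="unicode"))
```

The ID and IDREF-style attributes (DMDID, ADMID, FILEID) are what bind the separate sections into one package, which is why METS has attracted interest as a possible container for OAIS information packages.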

METS evolved from an XML Document Type Definition developed for the Making of America II digitisation project (Hurley, et al., 1999) and it is perhaps true to say that the standard has been most widely implemented to date in similar contexts (Gartner, 2002). It has been used, for example, in the Oxford Digital Library (http://www.odl.ox.ac.uk/) to provide integrated access to digitised image files with searchable texts. However, there has also been some interest in the potential for METS as a container for preservation metadata. Much of this interest has focused on the potential of METS for object exchange and has been linked with the OAIS concept of Information Packages. For example, Harvard University Library (2001) experimented with METS for defining a Submission Information Package in its Mellon-funded E-Journal Archiving Project. METS could also be used within a digital repository to package all of the data and metadata required for an Archival Information Package. Despite this interest, however, a recent study of the potential use of METS by the UK Data Archive and The National Archives concluded that the "potential for aggregating the metadata required for different purposes, such as resource discovery, rendering, processing and preservation, into one METS document to act as an OAIS information package, has not been realised sufficiently in practice" (Beedham, et al., 2005, p. 75).

Other potential packaging formats exist. For example, the Los Alamos National Laboratory Digital Library has experimented with the MPEG-21 Digital Item Declaration (DID) specification from ISO/IEC 21000-2:2003 for the packaging of complex digital objects (e.g., Bekaert, Hochstenbach & Van de Sompel, 2003). The MPEG-21 DID abstract model and its XML syntax, the MPEG-21 Digital Item Declaration Language (DIDL), form one part of a standard originally developed for the expression and communication of intellectual property rights information about multimedia objects, and they have given the Los Alamos team a standards-based way of representing compound objects and their associated metadata. Some consideration has also been given to aligning the MPEG-21 DID with the OAIS model. For example, Bekaert, De Kooning & Van de Walle (2005) defined an OAIS-based model for the systematic comparison of object packaging formats and applied it to METS and the MPEG-21 DID specification.
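The DID approach to compound objects can be outlined in a similar way. In this hypothetical Python sketch, a `DIDL` root holds one `Item`. A `Descriptor`/`Statement` pair carries a plain-text description of the item, and each file becomes a `Component` wrapping a `Resource` that points to the content by reference. The namespace URI is assumed to be that of the ISO/IEC 21000-2 DIDL schema, and the file URLs are placeholders. A real package would normally carry richer, schema-defined metadata in its Descriptors.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ISO/IEC 21000-2 DIDL schema.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def d(local_name: str) -> str:
    """Qualify a local name with the DIDL namespace."""
    return f"{{{DIDL_NS}}}{local_name}"


def make_digital_item(description: str,
                      files: list[tuple[str, str]]) -> ET.Element:
    """Package several files and a description as one Digital Item.

    `files` is a list of (reference, MIME type) pairs.
    """
    didl = ET.Element(d("DIDL"))
    item = ET.SubElement(didl, d("Item"))

    # Metadata about the item as a whole goes in a Descriptor.
    descriptor = ET.SubElement(item, d("Descriptor"))
    statement = ET.SubElement(descriptor, d("Statement"), mimeType="text/plain")
    statement.text = description

    # Each constituent file is a Component holding a Resource.
    for ref, mime_type in files:
        component = ET.SubElement(item, d("Component"))
        ET.SubElement(component, d("Resource"), mimeType=mime_type, ref=ref)
    return didl


package = make_digital_item(
    "Scanned article with OCR text",
    [("http://example.org/article/page1.tif", "image/tiff"),
     ("http://example.org/article/text.xml", "text/xml")],
)
```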

Other candidate packaging frameworks might include the Resource Description Framework (RDF) or packaging models developed for use with learning objects, e.g. the Advanced Distributed Learning initiative's Sharable Content Object Reference Model (SCORM) (http://www.adlnet.org/scorm/) or the IMS Content Packaging XML Binding (http://www.imsglobal.org/). These are areas that need more investigation from a digital preservation perspective.

6.4. Some open questions

The last decade has seen an increased awareness of the role of metadata in supporting the preservation and reuse of digital resources, and some initial progress on developing the necessary standards and schemas. The two working groups sponsored by OCLC and RLG have played a major role in this, as has the widespread acceptance of the OAIS model. However, a number of open questions suggest that much work remains to be done in developing our understanding of the role of metadata in preservation and curation contexts.

A first question that needs to be asked is whether creating and maintaining metadata on the scale needed to support preservation is either achievable or sustainable. The PREMIS Data Dictionary assumes that repositories will have to capture and maintain information about at least three different entities: the objects needing preservation (which themselves can be complex), the actions undertaken on them, and the people, organisations or software programs controlling these actions. The human generation of metadata is expensive and time consuming, so it will be important for repositories, where possible, to make use of automatic means of capturing this information, be it from the objects themselves, from already existing metadata, from third-party repositories of Representation Information, or from repository processes. Combining all of this into a coherent whole will be far from trivial, and it will only ever be possible to check manually a very small proportion of the objects in a repository. Projects like PAWN (Producer - Archive Workflow Network) have now begun to experiment with the automatic capture of metadata as part of repository ingest processes, collecting administrative, preservation and chain of custody (provenance) metadata, and encapsulating it in METS (JaJa, et al., 2004; Smorul, et al., 2004).
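PREMIS calls these three entities Objects, Events and Agents. The sketch below suggests how they might be captured automatically at ingest rather than keyed in by hand. It is a hypothetical Python illustration: the class and field names are loosely modelled on PREMIS semantic units (size, message digest, event type and date, agent type) but are not the Data Dictionary's own, and SHA-256 is an arbitrary choice of fixity algorithm.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Agent:
    identifier: str
    agent_type: str  # e.g. "person", "organisation" or "software"


@dataclass
class PreservationObject:
    identifier: str
    size: int
    digest_algorithm: str
    digest: str


@dataclass
class Event:
    event_type: str
    date_time: str  # ISO 8601 timestamp in UTC
    object_id: str
    agent_id: str
    outcome: str


def ingest(identifier: str, content: bytes,
           agent: Agent) -> tuple[PreservationObject, Event]:
    """Derive object metadata from the bytes and record the ingest event."""
    obj = PreservationObject(
        identifier=identifier,
        size=len(content),
        digest_algorithm="SHA-256",
        digest=hashlib.sha256(content).hexdigest(),
    )
    event = Event(
        event_type="ingestion",
        date_time=datetime.now(timezone.utc).isoformat(),
        object_id=obj.identifier,
        agent_id=agent.identifier,
        outcome="success",
    )
    return obj, event


tool = Agent("ingest-script-0.1", "software")  # hypothetical agent
obj, event = ingest("obj-0001", b"example bitstream", tool)
```

Because every value here is derived from the bitstream or from the ingest process itself, a routine of this kind could run over a whole collection, leaving manual checking for a sample.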

Quality control of metadata is a potential second problem. The importance of consistent metadata has already been recognised by those trying to develop services that combine data from more than one repository using the OAI-PMH (e.g., Hillmann, Dushay & Phipps, 2004). While there are some ways of supporting the creation of consistent metadata in these contexts (e.g. Guy, Powell & Day, 2004), it is difficult to be certain that they have always been followed in practice. Completeness is also likely to be a problem, as some types of metadata will typically not be available for capture. Van Ossenbruggen, Nack and Hardman (2004, p. 39) comment that the editing information for multimedia products is often discarded after production. Also, Vogel (1998) has noted that researchers do not always have sufficient incentives to document their data fully, although this varies from discipline to discipline. In preservation contexts, inconsistent, incomplete and misleading metadata are likely to persist for long periods of time.
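Simple automated checks can catch some of these problems before records are exposed for harvesting. The following hypothetical Python sketch treats a record as a mapping from Dublin Core element names to lists of values, flags required elements that are missing, and flags dates that do not follow a subset of the W3CDTF profile of ISO 8601. The choice of required elements and date rule is an assumption for illustration, not taken from any of the guidelines cited above.

```python
import re

# Elements this sketch treats as mandatory (an illustrative assumption).
REQUIRED = ("title", "creator", "date")

# A subset of the W3CDTF profile of ISO 8601: YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(
    r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$"
)


def check_record(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found in one Dublin Core-style record."""
    problems = []
    # Completeness: each required element needs at least one non-blank value.
    for element in REQUIRED:
        values = record.get(element, [])
        if not any(value.strip() for value in values):
            problems.append(f"missing {element}")
    # Consistency: dates must follow the agreed encoding.
    for value in record.get("date", []):
        if value.strip() and not DATE_PATTERN.match(value.strip()):
            problems.append(f"non-standard date: {value!r}")
    return problems


good = {"title": ["Annual report"], "creator": ["UKOLN"], "date": ["2004-10-11"]}
bad = {"title": ["Annual report"], "date": ["11/10/2004"]}
```

In an OAI-PMH aggregation service, a check like this might be run on each harvested record, routing flagged records for review rather than silently discarding them.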

A related issue is the hidden subjectivity and cultural bias of metadata, especially when it will be maintained over long periods of time. Van Ossenbruggen, Nack and Hardman (2004, p. 46) note that contexts of use will most likely be radically different from anything the human creators of metadata might have imagined. In a thought-provoking paper, Bowker (2000, p. 645) has argued that the creators of databases need to historicise data and its organisation "so as to create flexible databases that are as rich ontologically as the social and natural worlds they map." He provides examples from the history of science to show that even relatively fixed things like measurement standards can change over time. The OAIS model tries to address this problem by saying that an OAIS "must understand the Knowledge Base of its Designated Community to understand the minimum Representation Information that must be maintained," adding that it could also decide to maintain additional Representation Information to enable understanding by a broader community (CCSDS 650.0-B-1, 2002, 2-4). It remains to be seen whether this is a viable way of addressing Bowker's concerns.

A final consideration is the need for metadata itself to be preserved. Metadata is itself digital and will need to be migrated into new forms when necessary, although Rothenberg, et al. (2005, p. 26) note that metadata tend not to be highly application-dependent, meaning that they "are not as vulnerable to loss as more general online information." The OAIS principle of encapsulating the Content Object and its metadata in Information Packages is another possible way of ensuring metadata longevity. In terms of the OAIS model, Preservation Description Information is itself understood to be an Information Object, needing its own associated Representation Information (CCSDS 650.0-B-1, 2002, 2-5). In practice, however, the overhead associated with processing the metadata encapsulated in Information Packages may mean that implementations choose to store metadata in separately managed databases. A related issue is the ongoing evolution of metadata standards and the need to modify existing metadata to conform with them.
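At its simplest, bringing existing metadata into line with a revised standard is a mechanical transformation applied record by record. The sketch below assumes a hypothetical local schema in which version 2 renames `format` to `formatName` and splits a combined `checksum` string into an explicit algorithm and value. It returns a new record and leaves the original untouched, so that both versions could be retained.

```python
from copy import deepcopy

# Hypothetical field renames between version 1 and version 2.
RENAMES = {"format": "formatName"}


def migrate_v1_to_v2(record: dict) -> dict:
    """Return a version 2 copy of a version 1 record."""
    if record.get("schemaVersion") != 1:
        raise ValueError("expected a version 1 record")
    new = deepcopy(record)
    for old_name, new_name in RENAMES.items():
        if old_name in new:
            new[new_name] = new.pop(old_name)
    # "MD5:9e107d9d" becomes {"algorithm": "MD5", "value": "9e107d9d"}.
    if "checksum" in new:
        algorithm, _, value = new.pop("checksum").partition(":")
        new["fixity"] = {"algorithm": algorithm, "value": value}
    new["schemaVersion"] = 2
    return new


old = {"schemaVersion": 1, "format": "TIFF", "checksum": "MD5:9e107d9d"}
new = migrate_v1_to_v2(old)
```

A production migration would presumably also record the change itself as a provenance event, so that the history of the metadata remains as traceable as that of the objects it describes.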


7. Conclusions

This instalment has attempted to introduce the concept of metadata and indicate its general relevance to digital curation and preservation topics. It has provided some definitions, outlined some of the functions that metadata are intended to support, and introduced in more detail the role that metadata play in supporting the preservation and reuse of digital objects.

It has been outside the scope of this introductory instalment to cover all relevant topics. For example, it does not include a detailed discussion of metadata designed for specific types of object (e.g. government information, scientific data, learning objects, multimedia) or for functions like rights management. Other instalments in this manual will provide more detailed introductions to some of these topics.

Acknowledgments

I would like to thank Hans Hofman (Nationaal Archief), Terry Eastwood (University of British Columbia), Andy Powell (UKOLN, University of Bath) and my colleagues in the DCC services team for their comments on earlier drafts of this instalment.


References

ANSI/NISO Z39.50-2003. Information retrieval (Z39.50): application service definition and protocol specification. Bethesda, Md.: NISO Press.

ANSI/NISO Z39.87 AIIM 20. (2005). Data dictionary -- Technical metadata for digital still images [draft standard]. Retrieved January 30, 2006, from the National Information Standards Organization Web site: http://www.niso.org/standards/resources/Z39-87-200x-forballot.pdf

Arms, C. R., & Arms, W. Y. (2004). "Mixed content and mixed metadata: information discovery in a messy world." In: Diane I. Hillmann and Elaine L. Westbrooks, eds., Metadata in practice. Chicago, Ill.: American Library Association, 223-237.

Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., Moorman, D., Uhlir, P., & Wouters, P. (2004). "An international framework to promote access to data." Science, 303, 1777-1778.

Ball, C. A., Sherlock, G., & Brazma, A. (2004). "Funding high-throughput data sharing." Nature Biotechnology, 22(9), 1179-1183.

Bearman, D., & Duff, W. (1997). "Grounding archival description in the functional requirements for evidence." Archivaria, 41, 275-303.

Beedham, H., Missen, J., Palmer, M., & Ruusalepp, R. (2005). Assessment of UKDA and TNA compliance with OAIS and METS standards. Colchester: UK Data Archive. Retrieved January 30, 2006, from http://www.data-archive.ac.uk/news/publications/oaismets.pdf

Bekaert, J., De Kooning, E., & Van de Walle, R. (2005). "Packaging models for the storage and distribution of complex digital objects in archival information systems: a review of MPEG-21 DID principles." Multimedia Systems, 10(4), 286-301.

Bekaert, J., Hochstenbach, P., & Van de Sompel, H. (2003). "Using MPEG-21 DIDL to represent complex digital objects in the Los Alamos National Laboratory Digital Library." D-Lib Magazine, 9(11), November. Retrieved January 30, 2006, from http://www.dlib.org/dlib/november03/bekaert/11bekaert.html

Berners-Lee, T., & Hendler, J. (2001). "Publishing on the Semantic Web." Nature, 410, 1023-1024.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). "The Semantic Web." Scientific American, 284(5), 28-37.

Bird, S., & Simons, G. (2003). "Extending Dublin Core metadata to support the description and discovery of language resources." Computers and the Humanities, 37(4), 375-388.

Bose, R., & Frew, J. (2005). "Lineage retrieval for scientific data processing: a survey." ACM Computing Surveys, 37(1), 1-28.

Bowker, G. C. (2000). "Biodiversity datadiversity." Social Studies of Science, 30(5), 643-683.

Cabinet Office, Office of the e-Envoy. (2004). e-Government Metadata Standard, v. 3.0, 29 April. Retrieved January 30, 2006, from http://www.govtalk.gov.uk/schemasstandards/metadata.asp

Carpenter, L. (2004). "Taxonomy of digital curation users." Retrieved January 30, 2006, from the Digital Curation Centre Web site: http://www.dcc.ac.uk/docs/Taxonomy-dc-users.pdf

Casson, L. (2001). Libraries in the ancient world. New Haven, Conn.: Yale University Press.

CCSDS 650.0-B-1. (2002). Reference model for an Open Archival Information System (OAIS). Washington, D.C.: Consultative Committee for Space Data Systems. Retrieved January 30, 2006, from http://www.ccsds.org/documents/650x0b1.pdf

Cook, T. (1993). "The concept of the archival fonds in the post-custodial era: theory, problems and solutions." Archivaria, 35, 24-27.

Day, M. (2004). "Preservation metadata." In: G. E. Gorman and Daniel G. Dorner, eds., Metadata applications and management, International Yearbook of Library and Information Management, 2003-2004. London: Facet Publishing, 253-273.

Deelman, E., Singh, G., Atkinson, M. P., Chervenak, A., Chue Hong, N. P., Kesselman, C., Patil, S., Pearlman, L., & Su, M.-H. (2004). "Grid-based metadata services." 16th International Conference on Scientific and Statistical Database Management (SSDBM04), Santorini Island, Greece, 21-23 June 2004. Retrieved January 30, 2006, from http://www.isi.edu/~annc/papers/deelman_final1.pdf

De Roure, D., & Hendler, J. A. (2004). "E-science: the Grid and the Semantic Web." IEEE Intelligent Systems, 19(1), 65-71.

DoD 5015.2-STD. (2002). "Design criteria standard for electronic records software applications." Washington, D.C.: Department of Defense, 19 June. Retrieved January 30, 2006, from http://www.dtic.mil/whs/directives/corres/html/50152std.htm

Drinkwater, G., & Sufi, S. (2004). "CCLRC Data Portal." UK e-Science All Hands Meeting 2004 (AHM2004), Nottingham, UK, 31 August - 3 September 2004. Retrieved January 30, 2006, from http://www.allhands.org.uk/2004/proceedings/papers/161.pdf

Duff, W. M. (2001). "Evaluating metadata on a metalevel." Archival Science, 1, 285-294.

Duff, W. (2004). "Metadata in digital preservation: foundations, functions and issues." In: Frank M. Bischoff, Hans Hofman, and Seamus Ross, eds., Metadata in preservation: selected papers from an ERPANET Seminar at the Archives School Marburg, 3-5 September 2003. Veröffentlichungen der Archivschule Marburg, Institut für Archivwissenschaft, 40, 27-38.

Evans, J., & Lindberg, L. (2004). "Describing and analyzing the recordkeeping capabilities of metadata sets." International Conference on Dublin Core and Metadata Applications 2004 (DC2004), Shanghai, China, October 11-14, 2004. Retrieved January 30, 2006, from http://purl.org/metadataresearch/dcconf2004/papers/Paper_27.pdf

Evans, J., McKemmish, S., & Bhoday, K. (2004). "Create once, use many times: the clever use of recordkeeping metadata for multiple archival purposes." 15th International Congress on Archives, Vienna, Austria, 23-29 August 2004. Retrieved January 30, 2006, from http://www.wien2004.ica.org/

Garrett, J., & Waters, D., eds. (1996). Preserving digital information: report of the Task Force on Archiving of Digital Information commissioned by the Commission on Preservation and Access and the Research Libraries Group. Washington, D.C.: Commission on Preservation and Access. Retrieved January 30, 2006, from http://www.rlg.org/en/page.php?Page_ID=114

Gartner, R. (2002). Metadata Encoding and Transmission Standard (METS). JISC Techwatch Report TSW 02-05. Retrieved January 30, 2006, from http://www.jisc.ac.uk/index.cfm?name=techwatch_report_0205

Gilliland-Swetland, A. J. (1998). "Setting the stage." In: Murtha Baca, ed., Introduction to metadata: pathways to digital information. Los Angeles, Calif.: Getty Information Institute. Retrieved January 30, 2006, from http://www.getty.edu/research/conducting_research/standards/intrometadata/

Gilliland-Swetland, A. (2004). "Metadata - where are we going?" In: G. E. Gorman and Daniel G. Dorner, eds., Metadata applications and management, International Yearbook of Library and Information Management, 2003-2004. London: Facet Publishing, 16-33.

Gilliland-Swetland, A. J., & Eppard, P. B. (2000). "Preserving the authenticity of contingent digital objects: the InterPARES project." D-Lib Magazine, 6(7/8), July/August. Retrieved January 30, 2006, from http://www.dlib.org/dlib/july00/eppard/07eppard.html

Godby, J., Smith, D., & Childress, E. (2003). "Two paths to interoperable metadata." International Conference on Dublin Core and Metadata Applications 2003 (DC2003), Seattle, Wa., United States of America, September 28-October 2, 2003. Retrieved January 30, 2006, from http://purl.oclc.org/dc2003/03godby.pdf

Guy, M., Powell, A., & Day, M. (2004). "Improving the quality of metadata in eprint archives." Ariadne, 38. Retrieved January 30, 2006, from http://www.ariadne.ac.uk/issue38/guy/

Harvard University Library. (2001). Submission Information Package (SIP) specification, v. 1.0 draft. Retrieved January 30, 2006, from the Digital Library Federation Web site: http://www.diglib.org/preserve/harvardsip10.pdf

Harvey, F., Kuhn, W., Pundt, H., Bishr, Y., & Riedemann, C. (1999). "Semantic interoperability: a central issue for sharing geographic information." Annals of Regional Science, 33, 213-232.

Haynes, D. (2004). Metadata for informationmanagement and retrieval. London: Facet.

Helly, J., Staudigel, H., & Koppers, A. (2003). "Scalable models of data sharing in earth sciences." Geochemistry, Geophysics, Geosystems, 4(1), 1010.

Hendler, J. (2003). "Science and the Semantic Web." Science, 299, 520-521.

Heflin, J., & Hendler, J. (2000). "Semantic interoperability on the Web." Extreme Markup Languages 2000, Montreal, Canada, August 15-18, 2000. Retrieved January 30, 2006, from http://www.cs.umd.edu/projects/plus/SHOE/pubs/extreme2000.pdf

Hey, T., & Trefethen, A. (2003). "The data deluge: an e-science perspective." In: Fran Berman, Geoffrey Fox and Anthony J. G. Hey, eds., Grid computing: making the global infrastructure a reality. Chichester: Wiley, 809-824. Retrieved January 30, 2006, from http://www.rcuk.ac.uk/escience/documents/report_datadeluge.pdf

Hey, T., & Trefethen, A. E. (2005). "Cyberinfrastructure for e-science." Science, 308, 817-821.

Hillmann, D., Dushay, N., & Phipps, J. (2004). "Improving metadata quality: augmentation and recombination." International Conference on Dublin Core and Metadata Applications 2004 (DC2004), Shanghai, China, October 11-14, 2004. Retrieved January 30, 2006, from http://purl.org/metadataresearch/dcconf2004/papers/Paper_21.pdf

Hurley, B. J., Price-Wilkin, J., Proffitt, M., & Besser, H. (1999). The Making of America II Testbed Project: a digital library service model. Washington, D.C.: Council on Library and Information Resources. Retrieved January 30, 2006, from http://www.clir.org/pubs/abstract/pub87abst.html

IEEE Std 1484.12.1-2002. IEEE Standard for Learning Object Metadata. New York: Institute of Electrical and Electronics Engineers.

ISO 14721:2003. Space data and information transfer systems -- Open archival information system -- Reference model. Geneva: International Organization for Standardization.

ISO 15489-1:2001. Information and documentation -- Records management -- Part 1: General. Geneva: International Organization for Standardization.

ISO/IEC 21000-2:2003. Information technology -- Multimedia framework (MPEG-21) -- Part 2: Digital Item Declaration. Geneva: International Organization for Standardization.

ISO/DIS 23081. Information and documentation -- Records management processes -- Metadata for records -- Part 1: Principles. Geneva: International Organization for Standardization.

JaJa, J., McCall, F., Smorul, M., Moore, R., & Chadduck, R. (2004). "Digital archiving and long term preservation: an early experience with Grid and digital library technologies." Retrieved January 30, 2006, from the National Archives and Records Administration Web site: http://www.archives.gov/era/papers/thic-04.html

Jewell, T. D., Anderson, I., Chandler, A., Farb, S. E., Parker, K., Riggio, A., & Robertson, N. D. M. (2004). Electronic resource management: report of the DLF Initiative. Washington, D.C.: Digital Library Federation, August. Retrieved January 30, 2006, from http://www.diglib.org/pubs/dlfermi0408/

Johnston, P. (2001). "Interoperability: supporting effective access to information resources." Library and Information Briefings, 108. London: South Bank University.

Lagoze, C., Lynch, C. A., & Daniel, R. (1996). "The Warwick Framework: a container architecture for aggregating sets of metadata." Cornell University Technical Report TR96-1593. Ithaca, N.Y.: Cornell University Library, 28 June. Retrieved January 30, 2006, from http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR96-1593

Lagoze, C., Van de Sompel, H., Nelson, M., & Warner, S. (2002). The Open Archives Initiative Protocol for Metadata Harvesting, v. 2.0, 14 June. Retrieved January 30, 2006, from http://www.openarchives.org/OAI/openarchivesprotocol.html

Lavoie, B. F. (2004). The Open Archival Information System Reference Model: introductory guide. DPC Technology Watch Report 04-01, Digital Preservation Coalition. Retrieved January 30, 2006, from http://www.dpconline.org/docs/lavoie_OAIS.pdf

Lavoie, B. F., & Gartner, R. (2005). Preservation metadata. DPC Technology Watch Report 05-01, Digital Preservation Coalition. Retrieved January 30, 2006, from http://www.dpconline.org/docs/reports/dpctw05-01.pdf

Ludäscher, B., Marciano, R., & Moore, R. (2001). "Preservation of digital data with self-validating, self-instantiating knowledge-based archives." SIGMOD Record, 30(3), 54-63. Retrieved January 30, 2006, from http://www.acm.org/sigmod/record/issues/0109/SPECIAL/ludaescher8.pdf

Lupovici, C., & Masanès, J. (2000). Metadata for the long term preservation of electronic publications. The Hague: Koninklijke Bibliotheek. Retrieved January 30, 2006, from http://www.kb.nl/coop/nedlib/results/NEDLIBmetadata.pdf

Lyman, P., & Varian, H. R. (2003). "How much information? 2003." Berkeley, Calif.: University of California at Berkeley, School of Information Management and Systems. http://www.sims.berkeley.edu/research/projects/how-much-info-2003/


Lynch, C. (1996). "Integrity issues in electronic publishing." In: Robin P. Peek and Gregory B. Newby, eds., Scholarly publishing: the electronic frontier. Cambridge, Mass.: MIT Press, 133-145.

Lynch, C. (1999). "Canonicalization: a fundamental tool to facilitate preservation and management of digital information." D-Lib Magazine, 5(9), September. Retrieved January 30, 2006, from http://www.dlib.org/dlib/september99/09lynch.html

Mark, L., & Roussopoulos, N. (1986). "Metadata management." Computer, 19(12), 26-36.

McKemmish, S., Acland, G., Ward, N., & Reed, B. (1999). "Describing records in context in the continuum: the Australian Recordkeeping Metadata Schema." Archivaria, 48, 3-43. Retrieved January 30, 2006, from http://www.sims.monash.edu.au/research/rcrg/publications/archiv01.htm

Michener, W. K., Brunt, J. W., Helly, J. J., Kirchner, T. B., & Stafford, S. G. (1997). "Nongeospatial metadata for the ecological sciences." Ecological Applications, 7(1), 330-342.

National Archives. (2002). Functional requirements for electronic records management systems, 2002 revision. Kew: The National Archives. Retrieved January 30, 2006, from http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/

National Information Standards Organization. (2004). Understanding metadata. Bethesda, Md.: NISO Press. Retrieved January 30, 2006, from http://www.niso.org/standards/resources/UnderstandingMetadata.pdf

National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. (2003). Revolutionizing science and engineering through cyberinfrastructure. Arlington, Va.: National Science Foundation, Directorate for Computer & Information Science & Engineering (CISE). Retrieved January 30, 2006, from http://www.nsf.gov/cise/sci/reports/toc.jsp

Phillips, M., Woodyard, D., Bradley, K., & Webb, C. (1999). Preservation metadata for digital collections: exposure draft, 1999. Retrieved January 30, 2006, from the National Library of Australia Web site: http://www.nla.gov.au/preserve/pmeta.html

PREMIS Working Group. (2004). Implementing preservation repositories for digital materials: current practice and emerging trends in the cultural heritage community. Dublin, Ohio: OCLC Online Computer Library Center. Retrieved January 30, 2006, from http://www.oclc.org/research/projects/pmwg/

PREMIS Working Group. (2005). Data dictionary for preservation metadata. Dublin, Ohio: OCLC Online Computer Library Center. Retrieved January 30, 2006, from http://www.oclc.org/research/projects/pmwg/

Rajasekar, A. K., & Moore, R. W. (2001). "Data and metadata collections for scientific applications." 9th International Conference on High-Performance Computing and Networking (HPCN 2001), Amsterdam, Netherlands, 25-27 June 2001, Lecture Notes in Computer Science, 2110. Berlin: Springer, 72-80. Retrieved January 30, 2006, from http://www.sdsc.edu/dice/Pubs/Data-management_moore.pdf

Rothenberg, J., Graafland-Essers, I., Kranenkamp, H., Lierens, A., Oranje, C. van, & Schaik, R. van. (2005). Designing a national standard for discovery metadata. RAND Corporation Technical Report TR-185-BZK. Retrieved January 30, 2006, from http://www.rand.org/publications/TR/TR185/

Russell, K., Sergeant, D., Stone, A., Weinberger, E., & Day, M. (2000). Metadata for digital preservation: the Cedars project outline specification. Retrieved January 30, 2006, from the University of Leeds Web site: http://www.leeds.ac.uk/cedars/metadata.html

Smorul, M., JaJa, J., Wang, Y., & McCall, F. (2004). "PAWN: Producer - Archive Workflow Network in support of digital preservation." University of Maryland, Institute for Advanced Computer Studies Technical Report UMIACS-TR-2004. Retrieved January 30, 2006, from http://www.umiacs.umd.edu/~joseph/pawn-july2-2004.pdf

Staab, S. (2003). "The Semantic Web: new ways to present and integrate information." Comparative and Functional Genomics, 4(1), 98-103.

Stevens, R., McEntire, R., Goble, C., Greenwood, M., Zhao, J., Wipat, A., & Li, P. (2004). "myGrid and the drug discovery process." Drug Discovery Today: BIOSILICO, 2(4), 140-148.

Sufi, S., & Matthews, B. (2004). CCLRC Scientific Metadata Model, version 2. CCLRC Technical Report, DL-TR-2004-001. Retrieved January 30, 2006, from http://epubs.cclrc.ac.uk/work-details?w=30324

Szalay, A., & Gray, J. (2001). "The world-wide telescope." Science, 293, 2037-2040.

Van de Sompel, H., Bekaert, J., Liu, X., Balakireva, L., & Schwander, T. (2005). "aDORe: a modular, standards-based digital object repository." Computer Journal, 48(5), 514-535.

Van Ossenbruggen, J., Nack, F., & Hardman, L. (2004). "That obscure object of desire: multimedia metadata on the Web, part 1." IEEE Multimedia, 11(4), 38-48.

Vogel, R. L. (1998). "Why scientists haven't been writing metadata." Retrieved January 30, 2006, from the Joint Committee on Antarctic Data Management Web site: http://www.jcadm.scar.org/Articles/why_scientists_dont_write_metadata.htm

Waelde, C., & McGinley, M. (2005). "Public domain; public interest; public funding: focussing on the 'three Ps' in scientific research." SCRIPT-ed, 2(1), March. Retrieved January 30, 2006, from http://www.law.ed.ac.uk/ahrb/script-ed/vol2-1/3ps.asp

Wallace, D. (2001). "Archiving metadata forum: report from the Recordkeeping Metadata Working Meeting, June 2000." Archival Science, 1(3), 253-269.

Woodley, M. (1998). "Crosswalks: the path to universal access?" In: Baca, M., ed., Introduction to metadata: pathways to digital information. Los Angeles, California: Getty Information Institute. Retrieved January 30, 2006, from http://www.getty.edu/research/conducting_research/standards/intrometadata/3_crosswalks/

Working Group on Preservation Metadata. (2002). A metadata framework to support the preservation of digital objects. Dublin, Ohio: OCLC Online Computer Library Center. Retrieved January 30, 2006, from http://www.oclc.org/research/projects/pmwg/pm_framework.pdf

Wouters, P., & Reddy, C. (2003). "Big science data policies." In: P. Wouters and P. Schröder, eds., Promise and practice in data sharing. Amsterdam: Koninklijke Nederlandse Akademie van Wetenschappen, Nederlands Instituut voor Wetenschappelijke Informatiediensten (NIWI-KNAW), 13-40. Retrieved January 30, 2006, from http://www.virtualknowledgestudio.nl/en/

Wroe, C., Goble, C., Greenwood, M., Lord, P., Miles, S., Papay, J., Payne, T., & Moreau, L. (2004). "Automating experiments using semantic data on a bioinformatics grid." IEEE Intelligent Systems, 19(1), 48-55.

Zhao, J., Wroe, C., Goble, C., Stevens, R., Quan, D., & Greenwood, M. (2004). "Using Semantic Web technologies for representing e-science provenance." In: Proceedings of the Third International Semantic Web Conference (ISWC2004), Hiroshima, Japan, November 2004, Lecture Notes in Computer Science, 3298. Berlin: Springer-Verlag, 92-106.

Further reading

There is an extremely extensive (and growing) literature on metadata, much of it freely available on the Web.

The best short general introduction to metadata remains the chapter by Gilliland-Swetland in Baca (1998), which should perhaps now be supplemented by the more recent textbook treatment of the topic by Caplan (2003). Of the older introductions, the paper by Dempsey and Heery (1998) and the chapter by Lagoze and Payette in Moving theory into practice (Kenney & Rieger, 2000) remain useful. An interesting recent paper by researchers based at OCLC Research looks at the potential role of modular metadata services in the constantly changing contexts of research and learning (Dempsey, et al., 2005).

Three edited volumes provide interesting overviews. The chapters in the books edited by Gorman and Dorner (2004) and Hillmann and Westbrooks (2004) provide up-to-date accounts of metadata developments in different cultural heritage domains. The book edited by Jones, Aronheim and Crawford (2002) is based on papers delivered at an event held in 2000 and, although they are of uneven quality, collectively they do provide a good flavour of metadata thinking in the library domain at the end of the 1990s.

A good collection of papers on the various roles of metadata in digital preservation contexts can be found in the book edited by Bischoff, Hofman and Ross (2004), the outcome of an ERPANET training seminar held in late 2003. On the subject of preservation metadata itself, the OAIS model (CCSDS 650.0-B-1, 2002) is fundamental, although this should be supplemented by a reading of the various reports issued by the two working groups on preservation metadata commissioned by OCLC and the Research Libraries Group (http://www.oclc.org/research/projects/pmwg/). The chapter by Day in Gorman and Dorner (2004) is a review of the state of the art at about the time the second of these groups (PREMIS) started its deliberations. Lavoie and Gartner (2005) provide an overview of PREMIS developments and METS from a preservation perspective.

A good deal of work has been undertaken in the archives and records management domain on identifying metadata that supports the preservation of the authenticity and integrity of archival records. Unfortunately, there is not much freely available information on ISO 23081 (the draft standard can be purchased from ISO and from national standards bodies). The main exception is a summary provided by the Government of Quebec (http://www.autoroute.gouv.qc.ca/publica/normes/introduction.htm). Papers by McKemmish, et al. (1999), Cunningham (2000), and Hedstrom (2001) all provide interesting overviews of particular initiatives in the recordkeeping domain.

Issues around the management of multimedia resources are introduced in two papers published in the IEEE Multimedia journal (van Ossenbruggen, Nack & Hardman, 2004; Nack, van Ossenbruggen & Hardman, 2005).

Nilsson, Palmér and Naeve (2002) provide insight into the role of metadata and the Semantic Web in e-learning contexts. A more detailed assessment of the potential role of the Semantic Web in UK higher and further education contexts can be found in Matthews (2005).

Further references

Baca, M., ed. (1998). Introduction to metadata: pathways to digital information, Los Angeles, California: Getty Information Institute. Retrieved January 30, 2006, from http://www.getty.edu/research/conducting_research/standards/intrometadata/

Bischoff, F. M., Hofman, H., & Ross, S., eds. (2004). Metadata in preservation: selected papers from an ERPANET seminar at the Archives School Marburg, 3-5 September 2003, Veröffentlichungen der Archivschule Marburg, Institut für Archivwissenschaft, Nr. 40.

Caplan, P. (2003). Metadata fundamentals for all librarians, Chicago, Illinois: American Library Association.

CCSDS 650.0-B-1. (2002). Reference model for an Open Archival Information System (OAIS), Washington, D.C.: Consultative Committee for Space Data Systems. Retrieved January 30, 2006, from http://public.ccsds.org/publications/archive/650x0b1.pdf

Cunningham, A. (2000). "Dynamic descriptions: recent developments in standards for archival description and metadata," Canadian Journal of Information and Library Science, 25(4), 3-17.

Dempsey, L., & Heery, R. (1998). "Metadata: a current view of practice and issues," Journal of Documentation, 54(2), 145-172.

Dempsey, L., Childress, E. R., Godby, C. J., Hickey, T. B., Houghton, A., Vizine-Goetz, D., & Young, J. (2005). "Metadata switch: thinking about some metadata management and knowledge organization issues in the changing research and learning landscape," in Shapiro, D. (ed.), EScholarship: a LITA guide. Chicago, Illinois: American Library Association. Preprint retrieved January 30, 2006, from http://www.oclc.org/research/publications/archive/2004/dempsey-mslitaguide.pdf

Gorman, G. E., & Dorner, D. G., eds. (2004). Metadata applications and management, International Yearbook of Library and Information Management, 2003-2004, London: Facet.

Hedstrom, M. (2001). "Recordkeeping metadata: presenting the results of a working meeting," Archival Science, 1(3), 243-251.

Hillmann, D. I., & Westbrooks, E. L., eds. (2004). Metadata in practice, Chicago, Illinois: American Library Association.

Jones, W., Aronheim, J. R., & Crawford, J., eds. (2002). Cataloging the Web: metadata, AACR, and MARC21, Lanham, Maryland: Scarecrow Press.

Kenney, A. R., & Rieger, O. Y., eds. (2000). Moving theory into practice: digital imaging for libraries and archives, Mountain View, California: Research Libraries Group.

Lavoie, B. F., & Gartner, R. (2005). Preservation metadata, DPC Technology Watch Report 05-01, Digital Preservation Coalition. Retrieved January 30, 2006, from http://www.dpconline.org/docs/reports/dpctw05-01.pdf

Matthews, B. (2005). Semantic Web technologies, JISC Technology and Standards Watch Report TSW0502. Retrieved January 30, 2006, from http://www.jisc.ac.uk/index.cfm?name=techwatch_ic_reports2005_published

Nack, F., van Ossenbruggen, J., & Hardman, L. (2005). "That obscure object of desire: multimedia metadata on the Web, part 2," IEEE Multimedia, 12(1), 54-63.

Nilsson, M., Palmér, M., & Naeve, A. (2002). "Semantic Web meta-data for e-learning: some architectural guidelines," 11th International World Wide Web Conference, Honolulu, Hawaii, USA, 7-11 May 2002. Retrieved January 30, 2006, from http://kmr.nada.kth.se/papers/SemanticWeb/p744-nilsson.pdf

Van Ossenbruggen, J., Nack, F., & Hardman, L. (2004). "That obscure object of desire: multimedia metadata on the Web, part 1," IEEE Multimedia, 11(4), 38-48.

Author information

Michael Day is a Research Officer at UKOLN, based at the University of Bath, United Kingdom (http://www.ukoln.ac.uk/). Since joining UKOLN in 1996, he has worked on a range of metadata-related research projects, which have mostly concerned the development of Internet subject gateways, interoperability, and digital preservation. His most recent completed projects include ePrints UK, concerned with the development of services that give access to the content of multiple institutional repositories, and phase one of eBank UK, an attempt to develop the open access repository paradigm for research data with an initial focus on crystallography. At present, he is mostly working for the UK Digital Curation Centre (http://www.dcc.ac.uk/), a national focus for research and expertise relating to digital preservation issues, and the European Union-funded DELOS Network of Excellence on Digital Libraries (http://www.delos.info/).


Recommended