UCSD Digital Library Program Working Group February 6, 2002
METS: Metadata Encoding & Transmission Standard
Digital (Library) Objects
• Reformatted to digital
  – scanned photographs, books and journals
  – digitized audio/video files
• “Born digital”
  – TEI-encoded texts
  – digital images, audio, video files
  – GIS, statistical datasets
  – interactive content
Digital (Library) Objects
• Simple
  – single files, e.g.
    • visual TIFF images
    • MP3 files
    • TEI-encoded text
  – objects stand alone
    • no relationships to other objects
Digital (Library) Objects
• Complex
  – multiple related files, e.g.
    • page images from books or articles
    • multiple channels in digital audio files
    • related sound and text files (multimedia)
    • statistical dataset and codebook
  – objects cannot stand alone
    • one or more related files required to interpret the object
  – requires structural metadata to model
Structural metadata
• Maps physical files (digital assets) to logical items (complex digital objects)
• Examples
  – Scanned print material
    • complex publication structures (e.g. journal runs)
    • ordered relationship between digital page images
  – A/V material
    • multiple resolutions of an image
    • multiple channels of an audio file
Structural metadata
• Examples, continued
  – Multimedia presentations
    • relationship between images, text, sound, video, etc. (time-based or other)
  – Web sites
    • linkages between web pages
    • sitemaps
  – Databases
    • table models and ER diagrams
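The idea of mapping physical files to logical items can be sketched in a few lines of Python. This is only an illustration, not METS itself: the `Division` class, the `files_in_order` helper and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Division:
    """A logical item (e.g. an article or a page) in a complex object."""
    label: str
    files: List[str] = field(default_factory=list)            # physical files for this item
    children: List["Division"] = field(default_factory=list)  # ordered sub-items


def files_in_order(div: Division) -> List[str]:
    """Return every physical file under `div`, in reading order."""
    found = list(div.files)
    for child in div.children:
        found.extend(files_in_order(child))
    return found


# Hypothetical scanned article: two pages, each held as one TIFF file.
article = Division("Article", children=[
    Division("Page 1", files=["page0001.tif"]),
    Division("Page 2", files=["page0002.tif"]),
])
print(files_in_order(article))
```

The ordering of `children` is what carries the "ordered relationship between digital page images"; the files on their own do not record it.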
Digital (Library) Objects
• Also have other (non-structural) metadata
  – descriptive
    • MARC, DC, FGDC, VRA core, other ontologies
  – administrative
    • rights, provenance
  – technical
    • format details, OAIS “representation information”
• Standards exist or are emerging for these
METS Scope
• Supports
  – Structural metadata
    • complex reformatted or born digital objects
  – Metadata wrapper framework
    • descriptive, administrative, structural, etc.
    • structural required
    • others use namespaces to reference “extension schemas”
Evolved from MOA2
• Making of America II project
  – Developed November 1997-January 2000
  – Funded by DLF and NEH, participants
    • Cornell, NYPL, Penn State, Stanford, Berkeley
  – Designed for scanned archival collections
  – XML DTD defining explicit descriptive, administrative and structural metadata
Evolved from MOA2
• February 2001 DLF workshop on structural metadata
  – Harvard, LC, MOA2 participants, others
• Outcome: METS definition
  – emphasis on structural metadata
  – wider scope of participants, content types
  – change to XML schema, framework architecture
METS metadata “buckets”
• METS Header (optional)
• Descriptive metadata (optional)
• Administrative metadata (optional)
• File Inventory (optional)
• Structure map (required)
• Behavioral metadata (optional)
METS metadata
• XML “extension schemas”
  – descriptive metadata
    • Dublin Core, MARC, FGDC, VRA, etc.
    • Berkeley’s GDM schema (from MOA2)
  – administrative/technical metadata
    • NISO image technical metadata
    • LC schemas for A/V technical metadata
    • Rights metadata (e.g. PRISM, XrML, etc.)
    • Provenance metadata
METS Descriptive Metadata Section
[Diagram: Descriptive Metadata contains Metadata Reference and Metadata Wrapper]
Metadata Reference (mdRef): A link to external descriptive metadata. The type of link (URN/Handle/etc.) is included as an attribute, as is the metadata type.
Metadata Wrapper (mdWrap): Included descriptive metadata, as either binary data (Base64 encoded) or arbitrary XML using the namespace mechanism. The metadata type is specified as an attribute.
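A minimal Python sketch of telling the two descriptive metadata mechanisms apart, assuming the element layout used in this deck's own examples (wrapped XML placed directly inside mdWrap). The sample document, the URN and the `describe_dmd_sections` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample: one referenced record, one wrapped Dublin Core record.
SAMPLE = """
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dmdSec ID="dmd1">
    <mdRef LOCTYPE="URN" MDTYPE="MARC" xlink:href="urn:x-example:record1"/>
  </dmdSec>
  <dmdSec ID="dmd2">
    <mdWrap MDTYPE="DC">
      <dc:title>Sample title</dc:title>
    </mdWrap>
  </dmdSec>
</mets>
"""


def describe_dmd_sections(root):
    """Return (section ID, mechanism, metadata type, external location) per dmdSec."""
    found = []
    for sec in root.findall("m:dmdSec", NS):
        ref = sec.find("m:mdRef", NS)
        wrap = sec.find("m:mdWrap", NS)
        if ref is not None:
            found.append((sec.get("ID"), "reference", ref.get("MDTYPE"), ref.get(XLINK_HREF)))
        elif wrap is not None:
            found.append((sec.get("ID"), "wrapped", wrap.get("MDTYPE"), None))
    return found


root = ET.fromstring(SAMPLE)
for row in describe_dmd_sections(root):
    print(row)
```

In both cases the metadata type travels as an attribute, so a consumer can decide how to interpret the record before fetching or parsing it.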
METS Administrative Metadata Section
[Diagram: Administrative Metadata contains Technical Metadata, IP Rights Metadata, Source Metadata and Preservation Metadata]
Technical Metadata (techMD): technical metadata regarding content files.
IP Rights Metadata (rightsMD): rights metadata regarding content files or primary source material.
Source Metadata (sourceMD): provenance information for content files.
Preservation Metadata (preservationMD): metadata to assist in preservation of digital content.
All sections use generic metadata reference and wrapper subelements.
METS File Inventory
[Diagram: File Inventory (File Group) contains nested File Groups, each holding Files, and so on]
File Group (fileGrp): provides a mechanism for hierarchically subdividing physical files, for example by type.
File (file): provides a pointer to an external file (Flocat) or includes file content internally (Fcontent) in Base64 encoding.
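The hierarchical subdivision of the file inventory can be sketched as a recursive walk over nested file groups. The sample inventory, the group IDs and the `list_files` helper below are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"

# Hypothetical inventory: image files subdivided by type.
SAMPLE = """
<fileSec xmlns="http://www.loc.gov/METS/">
  <fileGrp ID="images">
    <fileGrp ID="masters">
      <file ID="f1" MIMETYPE="image/tiff"/>
      <file ID="f2" MIMETYPE="image/tiff"/>
    </fileGrp>
    <fileGrp ID="thumbnails">
      <file ID="f3" MIMETYPE="image/gif"/>
    </fileGrp>
  </fileGrp>
</fileSec>
"""


def list_files(group, path=()):
    """Yield (group path, file ID, MIME type) for every file under `group`."""
    for child in group:
        if child.tag == f"{{{METS}}}fileGrp":
            # Descend into the nested group, extending the path with its ID.
            yield from list_files(child, path + (child.get("ID"),))
        elif child.tag == f"{{{METS}}}file":
            yield ("/".join(path), child.get("ID"), child.get("MIMETYPE"))


inventory = list(list_files(ET.fromstring(SAMPLE)))
for entry in inventory:
    print(entry)
```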
METS Structural Map
[Diagram: Structural Map contains nested Divisions; each Division can hold METS Pointers, File Pointers and further Divisions, and so on]
The Structural Map provides a tree structure describing the original document. Each division (div) element is a node in that tree, and can identify content files associated with that division by a METS Pointer (mptr) or a File Pointer (fptr).
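Because the structural map is a tree of div elements, a short recursive function is enough to turn it into a table of contents. The sketch below assumes fptr carries a FILEID attribute, as in the issue example later in this deck; the sample map and the `outline` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}

# Hypothetical structural map: a book with one chapter of two pages.
SAMPLE = """
<structMap xmlns="http://www.loc.gov/METS/" TYPE="LOGICAL">
  <div TYPE="book" LABEL="Sample book">
    <div TYPE="chapter" LABEL="Chapter 1">
      <div TYPE="page" LABEL="Page 1"><fptr FILEID="f1"/></div>
      <div TYPE="page" LABEL="Page 2"><fptr FILEID="f2"/></div>
    </div>
  </div>
</structMap>
"""


def outline(div, depth=0):
    """Return indented outline lines for `div` and all of its descendants."""
    files = [p.get("FILEID") for p in div.findall("m:fptr", NS)]
    suffix = f" -> {', '.join(files)}" if files else ""
    lines = ["  " * depth + div.get("LABEL", "") + suffix]
    for child in div.findall("m:div", NS):
        lines.extend(outline(child, depth + 1))
    return lines


top = ET.fromstring(SAMPLE).find("m:div", NS)
print("\n".join(outline(top)))
```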
METS Pointer and File Pointer
METS Pointer (mptr): xlink to another METS file containing the content for the associated div. Useful for breaking up large objects (e.g., a journal run) into a series of smaller METS documents.
File Pointer (fptr): Identifies one or more entries in the File Inventory section containing the content for the associated div element. Can also limit the link from a div element to a portion of a content file (e.g., a segment of an audio or video file, a subarea of an image or video file, etc.).
METS File Pointer Mechanisms
[Diagram: File Pointer contains Parallel Files or Sequential Files, each holding one or more Areas]
File Pointer (fptr): Can identify a single file in the File Inventory using ID/IDREF linking.
Parallel/Sequential (par/seq): Allows a div to be associated with several content files that should be played/displayed in parallel (video with separate audio track file) or sequentially.
Area (area): identifies a point, linear segment, or 2D area within a content file that corresponds with the associated div element.
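One way to read par and seq is as a playback plan: a par group becomes a single step that uses all of its files at once, while a seq group becomes one step per file. The sketch below uses the FILE attribute on area as named in this deck; the samples and the `playback_plan` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Hypothetical file pointers: video with a separate audio track, and two clips in a row.
PAR_SAMPLE = """
<fptr xmlns="http://www.loc.gov/METS/">
  <par><area FILE="video1"/><area FILE="audio1"/></par>
</fptr>
"""
SEQ_SAMPLE = """
<fptr xmlns="http://www.loc.gov/METS/">
  <seq><area FILE="clip1"/><area FILE="clip2"/></seq>
</fptr>
"""


def playback_plan(fptr):
    """Return a list of steps; each step lists the file IDs used together."""
    steps = []
    for group in fptr:
        files = [area.get("FILE") for area in group.iter(METS + "area")]
        if group.tag == METS + "par":
            steps.append(files)               # one step, all files at once
        elif group.tag == METS + "seq":
            steps.extend([f] for f in files)  # one step per file, in order
    return steps


print(playback_plan(ET.fromstring(PAR_SAMPLE)))
print(playback_plan(ET.fromstring(SEQ_SAMPLE)))
```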
METS Area Element Attributes
FILE: ID for File element in File Inventory
SHAPE: As in HTML Area element
COORDS: As in HTML Area element
BEGIN: A start point within a file for defining a segment
END: An end point within a file for defining a segment
BETYPE: Begin/End type: IDREF, Byte Offset, or SMPTE time code
EXTENT: Length/duration of segment
EXTYPE: Extent Type: Bytes, or SMPTE
Structure Example
<file ID="f1" MIMETYPE="audio/x-wav" SEQ="1">
  <Flocat LOCTYPE="URN">urn:x-nyu:violet42</Flocat>
</file>
<div N="5" LABEL="Question 5">
  <fptr>
    <seq>
      <area FILE="f1" BEGIN="00:23:17:00" END="00:23:38:00" BETYPE="SMPTE"></area>
    </seq>
  </fptr>
</div>
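The BEGIN and END values in the example are SMPTE time codes (HH:MM:SS:FF). A small sketch of working out the segment length follows; it assumes non-drop-frame time code and a frame rate of 30 frames per second, neither of which is stated in the example, and the function names are hypothetical.

```python
def smpte_to_frames(timecode: str, fps: int = 30) -> int:
    """Convert non-drop-frame HH:MM:SS:FF time code to a frame count (fps is an assumption)."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def segment_seconds(begin: str, end: str, fps: int = 30) -> float:
    """Length in seconds of the segment between two SMPTE time codes."""
    return (smpte_to_frames(end, fps) - smpte_to_frames(begin, fps)) / fps


# "Question 5" runs from 00:23:17:00 to 00:23:38:00, i.e. 21 seconds.
print(segment_seconds("00:23:17:00", "00:23:38:00"))
```

Since both time codes end on frame 00, the result here does not depend on the assumed frame rate.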
Related standards: SMIL (W3C), MPEG-7 (ISO)
• Created for multimedia structural encoding
• SMIL has “time-based” orientation
  – for playing multimedia presentations
• Very complex
• May eventually be incorporated
Related standards: RDF (W3C)
• Also metadata wrapper framework
• Structural metadata could be supported, but doesn’t specify how…
• Opaque to use
  – No element semantics provided
  – element names deliberately meaningless
• Originally intended for descriptive metadata
METS and OAIS framework
• Submission Information Package (SIP)
  – METS as transfer syntax
• Dissemination Information Package (DIP)
  – METS as transfer syntax
  – METS as input to display applications
• Archival Information Package (AIP)
  – METS stored internally in an archive
Part Three: Library Applications of METS
Library Applications
• Digital Object transfer syntax
  – between systems
    • enables interoperability
  – between institutions
    • enables collection sharing
  – implements OAIS SIP/DIP/AIP
Library Applications
• Input to Digital Object delivery systems (aka “disseminators”)
  – Simple bit-streaming
  – XSL stylesheet
  – Custom program for complex digital object display
Harvard’s Page Delivery Service (PDS)
• Range of publication types supported
  – 0-4 levels of hierarchy
    • simple 3 page letter, 20 page article
    • diary with entries
    • book containing chapters containing sections
    • report run containing reports containing sections
    • journal bound in volumes containing issues containing articles
• Implemented as METS “tree”
  – example on METS web site
Harvard’s PDS
[Diagram: Letter. A single METS document acts as both citation level and leaf level, pointing directly to five TIFF page images.]
Harvard’s PDS
[Diagram: Diary. A citation-level METS document for the diary sits above four Entry nodes; the entries are leaf-level METS documents that point to the TIFF page images.]
Harvard’s PDS
[Diagram: Journal. The tree runs Journal > Volume > Issue > Article > TIFF page images, with citation-level METS at the top, intermediate-level METS in between, and leaf-level METS directly above the TIFF files.]
Harvard’s PDS
• “Page turner” system
  – implemented as a web application
    • Java servlet, SAX parser
  – minimal descriptive metadata
    • display only (not for discovery)
  – no administrative metadata
  – file inventory only for “leaf” nodes
Harvard’s PDS
• METS maintenance system
  – implemented as a web application
    • Java servlet, DOM parser
  – supports structure updates
    • add a missing volume to a run
    • add a missing page to a scanned manuscript
    • switch two page images
  – supports cascading deletes
    • entire logical object including all underlying digital assets
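A structure update such as "switch two page images" only needs an in-memory tree, which is presumably why a DOM-style parser suits the maintenance system. The Python sketch below is not Harvard's implementation: the sample map, the use of ORDER to pick pages, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}

# Hypothetical manuscript whose two page images were attached in the wrong order.
SAMPLE = """
<structMap xmlns="http://www.loc.gov/METS/">
  <div TYPE="manuscript" LABEL="Sample manuscript">
    <div TYPE="page" ORDER="1"><fptr FILEID="f2"/></div>
    <div TYPE="page" ORDER="2"><fptr FILEID="f1"/></div>
  </div>
</structMap>
"""


def swap_page_images(parent, order_a, order_b):
    """Exchange the file pointers of the two child pages with the given ORDER values."""
    pointers = {d.get("ORDER"): d.find("m:fptr", NS) for d in parent.findall("m:div", NS)}
    a, b = pointers[str(order_a)], pointers[str(order_b)]
    file_a, file_b = a.get("FILEID"), b.get("FILEID")
    a.set("FILEID", file_b)
    b.set("FILEID", file_a)


def page_files(parent):
    """Return the file ID of each child page, in document order."""
    return [d.find("m:fptr", NS).get("FILEID") for d in parent.findall("m:div", NS)]


manuscript = ET.fromstring(SAMPLE).find("m:div", NS)
swap_page_images(manuscript, 1, 2)
print(page_files(manuscript))
```

Only the structural map changes; the image files themselves are left untouched.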
Harvard’s E-Journal Archive
• Capture e-journals of three scholarly journal publishers
  – Wiley, Blackwell, University of Chicago Press
• Accept normative data formats
  – descriptive, administrative metadata
  – article text, images, figures, etc.
  – reference links
  – other supplementary material
Harvard’s E-Journal Archive
• OAIS Submission Information Package
  – received from publishers for each journal issue and article, along with digital content files
• OAIS Archival Information Package
  – stored in Digital Repository Service
• OAIS Dissemination Information Package
  – delivered to subscribers on demand
Harvard’s E-Journal Archive
• Issue-level metadata includes
  – METS header
  – descriptive (i.e. bibliographic) metadata
  – administrative (e.g. rights, provenance, technical) metadata
  – structural metadata
    • issue-level content
      – masthead, editorial board, etc.
    • issue content
      – articles, correspondence, reviews, editorials, errata, etc.
OAIS
• Article-level metadata
  – METS header
  – descriptive (i.e. bibliographic) metadata
  – administrative (e.g. rights, provenance, technical) metadata
  – structural metadata
    • article content
      – XML-encoded text plus images, figures, links, etc.
      – and/or PDF
Example Issue SIP
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:ejar="http://hul.harvard.edu/EJAR/METADATA/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      TYPE="EJARISSUE-major.minor" OBJID="issueid"
      LABEL="issue bibliographic citation" PROFILE="EJAR">
  <metsHdr CREATEDATE="yyyy-mm-dd">
    <agent ROLE="CREATOR" TYPE="ORGANIZATION">
      <name>content provider</name>
    </agent>
  </metsHdr>
  <dmdSec ID="descr:issue">
    <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
      <ejar:descr type="issue">issue descriptive metadata</ejar:descr>
    </mdWrap>
  </dmdSec>
Example Issue SIP, continued
  <admSec ID="admin:issue">
    <rightsMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:copyright>issue copyright metadata</ejar:copyright>
      </mdWrap>
    </rightsMD>
  </admSec>
  <admSec ID="admin:issue-content">
    <techMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:tech type="TEXT">issue content technical metadata</ejar:tech>
      </mdWrap>
    </techMD>
    <digiprovMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:checksum type="MD5">content file checksum</ejar:checksum>
      </mdWrap>
    </digiprovMD>
  </admSec>
Example Issue SIP, continued
  <admSec ID="admin:1">
    <techMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG">
        <niso:...>cover image technical metadata</niso:...>
      </mdWrap>
    </techMD>
    <rightsMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:copyright>cover image copyright metadata</ejar:copyright>
      </mdWrap>
    </rightsMD>
    <digiprovMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:checksum type="MD5">cover image checksum</ejar:checksum>
      </mdWrap>
    </digiprovMD>
  </admSec>
Example Issue SIP, continued
  <fileSec>
    <fileGrp ADMID="admin:issue">
      <file ID="file:issue-content" ADMID="admin:issue-content" CREATED="yyyy-mm-dd"
            MIMETYPE="text/xml" OWNERID="id" SIZE="n">
        <Flocat xlink:type="simple" xlink:href="issue.xml"/>
      </file>
      <file ID="file:1" ADMID="admin:1" CREATED="yyyy-mm-dd"
            MIMETYPE="image/tiff" OWNERID="id" SIZE="n">
        <Flocat xlink:type="simple" xlink:href="cover.tif"/>
      </file>
      ...
    </fileGrp>
  </fileSec>
Example Issue SIP, continued
  <structMap TYPE="LOGICAL">
    <div TYPE="EJARISSUE" ADMID="admin:issue" DMD="descr:issue"
         LABEL="issue bibliographic citation">
      <fptr FILEID="file:issue-content"/>
      <fptr FILEID="file:1"/>
      <div TYPE="EJARSECTION" LABEL="section label" ORDER="n">
        <div TYPE="EJARITEM" LABEL="item bibliographic citation" ORDERLABEL="n">
          <mptr xlink:type="simple" xlink:href="itemid1/item-md.xml"/>
        </div>
        ...
      </div>
      ...
    </div>
  </structMap>
</mets>
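The issue SIP above ties its sections together through ID references (ADMID, DMD, FILEID). A receiving system might want to confirm that every reference resolves before accepting a package. The sketch below is one possible check, not part of the Harvard workflow; the trimmed sample and the `dangling_references` helper are hypothetical, and the attribute and element names follow the spelling used in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed-down package with one deliberately broken file pointer.
SAMPLE = """
<mets xmlns="http://www.loc.gov/METS/">
  <admSec ID="admin1"/>
  <fileSec>
    <fileGrp>
      <file ID="file1" ADMID="admin1"/>
    </fileGrp>
  </fileSec>
  <structMap>
    <div LABEL="issue">
      <fptr FILEID="file1"/>
      <fptr FILEID="file9"/>
    </div>
  </structMap>
</mets>
"""


def dangling_references(root, ref_attrs=("ADMID", "DMD", "FILEID")):
    """Return (attribute, target) pairs whose target matches no ID in the document."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in ref_attrs:
            # A reference attribute may list several space-separated IDs.
            for target in el.get(attr, "").split():
                if target not in ids:
                    missing.append((attr, target))
    return missing


print(dangling_references(ET.fromstring(SAMPLE)))
```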
GenDL (Generic Digital Library)
• Focus of METS-based tools
  – Specify how files and parts of files fit together
  – Coordinate external and internal descriptive and administrative metadata with object structure
  – Mitigate complexity of METS for users
• Efficiency and coherence through standardization
  – Automatic generation of digital objects
  – Presentation of disparate digital material through coherent tools
METS tools at UC Berkeley
• GenDB: Generic database to capture structural, descriptive and administrative metadata for digital reformatting projects
• GenX: Java program to extract metadata from GenDB database and package it up into METS
• GenView: Java programs for end user navigation of METS objects
• GenRep: Repository for METS objects
[Diagram: METS tool architecture at UC Berkeley]
• Gathering Metadata: GenDB
  – GenDB Client (browser/servlet), GenDB Database Server, Database (SQL Server)
• Creating METS Objects: GenX
  – GenX METS Generator
• Viewing METS Objects: GenView
  – GenView Client (browser/servlet), GenView Repository Server, Digital Object Repository (Unix file system)
GenDB
• Tool to capture structural, descriptive and administrative metadata
• First implemented as an MS Access DB
• Now implemented on SQL Server with a web front end
• Java client?
GenDB Key Features
• Exposes Digital Object’s structure
  – UI enables easy visualization to build object structure
• Highly configurable
  – Project manager specifies what fields should appear and how they should be tagged
• Layered architecture enhances flexibility
  – UI doesn’t know underlying DB table structure
  – Different UIs can be layered over same middle layer
GenView
• Tool to view and navigate METS objects
• Web-based user interface (Java)
GenView: Key Features
• Exposes Digital Object’s structure
  – Table of Contents for navigation
  – Select from multiple manifestations of currently selected TOC entry (including side by side display)
  – Link to descriptive/administrative metadata for
    • highest-level object
    • currently selected TOC entry
• Supports non-Roman text (beyond ISO-8859)
METS summary
• Descriptive/technical/administrative metadata
  – not defined internally
  – points to external standard schemas
    • Dublin Core, MARC, MPEG-7, etc.
    • AES audio metadata
  – set of “best practice” schemas being identified
METS summary
• Structural metadata
  – defined internally and required
  – SMIL-lite
    • simple support for multimedia, audio/visual
    • SMIL may eventually replace it
METS summary
• Current users include
  – UC Berkeley (archival collections)
  – Harvard (scanned print publications, e-journals)
  – Library of Congress (audio/visual collections)
  – EU MetaE project (historic newspapers)
  – Michigan State (oral history collections)
  – Univ of Virginia (FEDORA digital objects)
  – National Library of Australia
  – more daily...
METS summary
• Tools under development for
  – metadata capture
  – transformation
  – transfer
  – dissemination/display
• Profiles necessary for interoperation
  – Which extension schemas used?
  – How structure maps are organized…
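One question a profile has to settle, namely which extension schemas a document may use, lends itself to a mechanical check. The sketch below is only an illustration: the allowed set, the sample document and both helper functions are hypothetical and do not come from any published METS profile.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
WRAPPERS = {METS + "mdWrap", METS + "mdRef"}

# Hypothetical profile: only these metadata types are accepted.
ALLOWED_TYPES = {"DC", "MARC", "NISOIMG"}

SAMPLE = """
<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="d1"><mdWrap MDTYPE="DC"/></dmdSec>
  <dmdSec ID="d2"><mdWrap MDTYPE="OTHER" OTHERMDTYPE="EJAR"/></dmdSec>
</mets>
"""


def metadata_types(root):
    """Collect the metadata types declared on every mdWrap and mdRef element."""
    found = set()
    for el in root.iter():
        if el.tag in WRAPPERS:
            md_type = el.get("MDTYPE")
            # For MDTYPE="OTHER" the real type is named in OTHERMDTYPE.
            found.add(el.get("OTHERMDTYPE", "OTHER") if md_type == "OTHER" else md_type)
    return found


def profile_violations(root, allowed=ALLOWED_TYPES):
    """Return, sorted, the metadata types the profile does not allow."""
    return sorted(metadata_types(root) - set(allowed))


print(profile_violations(ET.fromstring(SAMPLE)))
```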
METS summary
• Current status
  – version 1.0 due out in February
  – editorial board being set up
  – LC standards office for maintenance agency
  – DLF and RLG underwriting
    • RLG will host editorial board, offer documentation and training, develop tools, seek funding
METS summary
• METS is not all things to all people…
  – Designed for local institutional application support
    • Solving an immediate local problem
    • Common to many institutions
    • Flexible framework supports many institutional situations
  – Profiling necessary to interoperate
    • For OAIS packages
    • For shared tools
    • For other kinds of interoperation (e.g. cross repository search)