Post on 06-Jan-2016
1
Metadata Tools for JISC Digitisation Projects of still images and text
Ed Fay, BOPCRIS, Hartley Library
University of Southampton
2
Overview: BOPCRIS today
Move to working natively with standards
• Interoperability
• Preservation
Design project procedures from the ground up with metadata in mind
• File-naming and directory structuring
• Metadata capture processes
Production workflow that automates where possible
• Minimizes the possibility of human error / subjectivity
“Final package” of each digital object that records preservation information on the “digital shelf” and aims for maximum interoperability between systems, all in one place
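A file-naming and directory convention only reduces human error if it is checked mechanically rather than by eye. The sketch below assumes a hypothetical convention, PROJECT/ITEM/PROJECT_ITEM_PAGE.ext (e.g. PROJ/0001/PROJ_0001_0042.tif); the pattern, the PageFile type and parse_path are illustrations, not the scheme BOPCRIS actually used.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical convention (an assumption, not BOPCRIS's actual scheme):
#   <PROJECT>/<ITEM>/<PROJECT>_<ITEM>_<PAGE>.<ext>
FILENAME_RE = re.compile(
    r"^(?P<project>[A-Z]+)_(?P<item>\d{4})_(?P<page>\d{4})\.(?P<ext>tif|jpg|txt)$"
)


@dataclass(frozen=True)
class PageFile:
    project: str
    item: str
    page: int
    ext: str


def parse_path(path: str) -> PageFile:
    """Parse a relative path and check that its folders agree with its filename."""
    p = PurePosixPath(path)
    m = FILENAME_RE.match(p.name)
    if m is None:
        raise ValueError(f"filename does not follow the convention: {p.name}")
    # The two parent folders must repeat the project and item ids, so a file
    # saved into the wrong folder is caught automatically.
    if p.parent.parts[-2:] != (m["project"], m["item"]):
        raise ValueError(f"file is in the wrong directory: {path}")
    return PageFile(m["project"], m["item"], int(m["page"]), m["ext"])
```

Because the unique ids appear in both the folder names and the filename, a check like this can reject misfiled scans at capture time instead of during packaging.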
3
Overview: technical details
File-naming / directory structure
• Incorporating project-specific “unique ids”
Final package (digital object)
• Internally consistent “tarball” [*.TAR]
• Relative path-naming conventions
• METS wrapper
• Extension formats for metadata: descriptive (MODS); technical (MIX); process (PREMIS)
Production workflow
• Automated production of final package
Metadata recording
• Dynamic input by scanner operators
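One way to keep a tarball internally consistent is to store every member under a path relative to the digital object's own folder, so the package unpacks to the same layout on any system and relative paths recorded inside it (for instance in the METS file) still resolve. A minimal sketch with Python's standard tarfile module, assuming a hypothetical one-folder-per-object layout; build_package is an illustrative name.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def build_package(object_dir: Path, tar_path: Path) -> list[str]:
    """Pack every file under object_dir into one uncompressed TAR.

    Member names are relative to object_dir's parent (e.g.
    "PROJ_0001/images/p1.tif"), never absolute, so the package unpacks
    to the same layout wherever it is stored. tar_path should lie
    outside object_dir so the archive does not try to include itself.
    """
    base = object_dir.parent
    members: list[str] = []
    with tarfile.open(tar_path, "w") as tar:
        # Sorted so the member order is reproducible between runs.
        for f in sorted(object_dir.rglob("*")):
            if f.is_file():
                arcname = f.relative_to(base).as_posix()
                tar.add(f, arcname=arcname)
                members.append(arcname)
    return members
```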
4
History
Eighteenth Century Parliamentary Papers
• Project under Phase 1 of the JISC Digitisation Programme
• Proprietary system and data formats (Agora)
• Manual input of metadata
o Descriptive and structural
• Advantages and disadvantages
5
History: Advantages
Proprietary system with advanced functionality:
• OCR workflow
• Web presentation
Highly customizable
• Metadata fields specified and modified at will
6
History: Disadvantages
Non-standard metadata fields
• No mapping to standard formats, causing difficulties with interoperability and metadata harvesting
Translation
• Between systems, or between “use” and “archive” formats, introduces the possibility of versioning issues
No scope for preservation metadata
• Separation between workflow / presentation system and preservation strategy
Resulted in a disparate collection of scripts and tools to manage data
7
Present: Metadata Standards
Bibliographic database export
File-system level
• Directory structure
• File-naming conventions
Scanning level
• TIFF headers
• Additional descriptive metadata
METS profile
• Tailored to project needs
• Extension formats (MODS, MIX, PREMIS)
Checksums (MD5)
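MD5 checksums let a later process confirm that files on the “digital shelf” have not changed since packaging. A minimal sketch using Python's hashlib; the manifest format (relative path mapped to hex digest) and the function names are assumptions. MD5 is adequate for spotting accidental corruption but not for detecting deliberate tampering.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large TIFF masters
    never have to fit in memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 digest."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Relative paths whose current MD5 no longer matches the manifest."""
    return [rel for rel, digest in manifest.items() if md5_of(root / rel) != digest]
```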
8
Present: Metadata Origins
[Diagram: metadata origins, from PRECURSORS to GENERATED]
PRECURSORS
• Scanned images (TIFF headers)
• OCR (Agora / ABBYY)
• File-naming
• Directory structure
• Other metadata: process; additional descriptive
• Bibliographic metadata (MARC21 / MODS / etc.)
• File formats: TIFF master / derived JPEG; flat text (TXT) and word-co-ordinated OCR
GENERATED
• METS
• MIX (Z39.87)
• TAR
• PREMIS
• Custom dmdSec
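Technical metadata such as image dimensions and resolution can be read straight from the TIFF header rather than keyed in by hand. The sketch below reads a few baseline tags (TIFF 6.0 tag numbers) from the first image file directory using only the struct module; the choice of tags, and how they would be mapped onto MIX (Z39.87) elements, are assumptions left open here. A production tool would more likely rely on an established imaging library.

```python
from __future__ import annotations

import struct

# Selected baseline TIFF 6.0 tag numbers mapped to readable names.
TAGS = {
    256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
    259: "Compression", 262: "PhotometricInterpretation",
    282: "XResolution", 283: "YResolution", 296: "ResolutionUnit",
}
# Field types handled here: type code -> (struct format, size in bytes).
TYPES = {3: ("H", 2), 4: ("I", 4), 5: ("II", 8)}  # SHORT, LONG, RATIONAL


def read_tiff_header(data: bytes) -> dict[str, object]:
    """Extract selected tags from the first IFD of an in-memory TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little / big endian
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a TIFF file")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (n,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    result: dict[str, object] = {}
    for i in range(n):
        entry = data[ifd + 2 + 12 * i: ifd + 14 + 12 * i]  # 12-byte IFD entry
        tag, typ, count = struct.unpack(order + "HHI", entry[:8])
        if tag not in TAGS or typ not in TYPES:
            continue
        fmt, size = TYPES[typ]
        total = size * count
        if total <= 4:
            raw = entry[8:8 + total]  # small values are stored inline
        else:
            (off,) = struct.unpack(order + "I", entry[8:12])
            raw = data[off:off + total]  # larger values live at an offset
        values = struct.unpack(order + fmt * count, raw)
        if typ == 5:  # RATIONAL: numerator/denominator pairs
            values = tuple(values[j] / values[j + 1] for j in range(0, len(values), 2))
        result[TAGS[tag]] = values[0] if count == 1 else values
    return result
```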
9
Future
One tool for the entire process, from scanned images to METS
Tool would:
• Extract technical metadata
• Include descriptive metadata
• Build flat-structure METS
Tool would require:
• File-naming and directory-structuring conventions
• Image file sources
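A “flat-structure” METS file can be read as one fileSec entry per image plus a single-level structMap with one page div per file. The sketch below builds that skeleton with Python's xml.etree.ElementTree, assuming the page order and MD5 digests are already known; the FILE0001-style IDs, the USE="master" file group and the TYPE values are illustrative choices, not a published METS profile.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def build_flat_mets(pages: list[tuple[str, str]]) -> ET.Element:
    """One <file> and one page <div> per image, in the given order.

    `pages` holds (relative_path, md5_hex) pairs; the paths are assumed
    to follow the project's file-naming and directory conventions.
    """
    mets = ET.Element(q(METS, "mets"))
    # fileSec must precede structMap in the METS schema.
    grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"), USE="master")
    struct_map = ET.SubElement(mets, q(METS, "structMap"), TYPE="physical")
    item = ET.SubElement(struct_map, q(METS, "div"), TYPE="item")
    for order, (path, md5) in enumerate(pages, start=1):
        file_id = f"FILE{order:04d}"
        f = ET.SubElement(grp, q(METS, "file"), ID=file_id, MIMETYPE="image/tiff",
                          CHECKSUM=md5, CHECKSUMTYPE="MD5")
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): path})
        # Flat structure: every page is a direct child of the single item div.
        div = ET.SubElement(item, q(METS, "div"), TYPE="page", ORDER=str(order))
        ET.SubElement(div, q(METS, "fptr"), FILEID=file_id)
    return mets
```

Descriptive (MODS), technical (MIX) and process (PREMIS) metadata would be added as dmdSec and amdSec elements and linked through DMDID and ADMID attributes; they are omitted to keep the sketch short.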
10
Future: Advantages
Abstraction = standardization
• All digitization projects will produce metadata in similar formats → interoperability
• Certain technical base standards will be present → preservation
• Any centrally developed preservation or presentation system would be able to ingest output from any project
Saves the wasted effort of developing similar solutions many times, when one solution can be developed once and adapted
11
Future: Questions…
Usefulness of such a tool?
Relevance to your project?
Problems / obstacles?
How much flexibility is necessary?
Manual input / editing?
Main points: abstraction, functionality, flexibility
12
Further information
Ed Fay, Software Developer
• BOPCRIS, Hartley Library
• University of Southampton
• ef1@soton.ac.uk
• 023 8059 3575