Digitization with Millennium & CONTENTdm
Stuart Hunt
IUG17, Anaheim, May 2009
Overview
• Background
• Digitisation
• Metadata
• Workflows
• Now
University of Warwick
• Royal Charter 1965
• Russell Group
• 16,000 FTE students
• 5,000 staff
University Library
• Approx 1.1 million volumes
• 170 staff (110 FTE)
• Millennium 2003
• Approx 100,000 issues/renewals per yr
• Approx 28,000 new books per yr
• RLUK member
• OCLC member
Content
• Marandet Collection
• 4000+ French plays 1720 to 1900
• Acquired 1970s
• Guide published 1979
• Bibliographic records in Millennium, RLUK, COPAC, & WorldCat
• No IPR issues
Projects
• Revolutionary Drama (1789-1800)
  – 339 plays
• Empire Period Drama (1801-1815)
  – 123 plays
• JISC Digitisation Programme: Enriching Digital Resources
• ‘Exposing Marandet’
  – 1,500 plays/75,000 pages
Objectives
• Cross-searching
• Full-text searching
• Integration with existing & future systems
  – Millennium
  – Web
  – Vertical search solution
Options
• Existing solutions
  – Millennium
  – In-house web publishing tool
• Separate product
  – Digital collection management software
  – CONTENTdm
• Solution would drive approach taken
Digital production
• Image files
  – TIFF & JPEG derivative
  – Full colour & greyscale
  – Outsourced
• Text files/full-text transcripts
  – OCR quality initially not acceptable
  – Re-keying
  – Outsourced
Media Management
• Tried & tested solution
• Quick & easy
• Link digital content
• D2D process simplified
• Existing bibs
• New bibs
• Use existing authentication if required
Media Management
• No full-text searching
• No cross-collection searching (unless in separate scope)
• Tied to MARC metadata
• Metadata enrichment difficult
• Image file format
• Not a total solution
CONTENTdm
• Full-text & cross-collection searching
• Not tied to MARC metadata
• Metadata enrichment simple
• Local Windows server
• Initial licence <50K images
• Upgraded to unlimited licence 2008
Local metadata context
• Separate bibs
  – Print vs electronic
  – Describes what is
  – Supports better (future) FRBRisation
  – Ease of maintenance
  – Location & format based scoping
• 793 for local added entry/uniform title
  – Collection name
Metadata option 1
• Create metadata within CONTENTdm
• Play-by-play
• Metadata already present in Millennium
Metadata option 1
• Assumes that metadata is already available
• Not scalable
• Poor use of resources
• Does not allow data to work harder or smarter
Metadata option 2
• Create metadata outside of Millennium
• Metadata not already present in Millennium
• Play-by-play
• Harvest from CONTENTdm into Millennium via XML Harvester
XML Harvester
• Single configuration file
• Needs to be edited for each separate resource
• Uses XSLT not load table(s)
• Major changes (e.g. harvest different schema) may need to be done by III
Configuration file triggers
@XML_TYPE=DC (or MARCXML)
@OAI_FORMAT=oai_dc
@DBNAME=[Repository name]
@URL=[url for OAI-PMH]
@USEOAI=true (or false)
@OAISET=[Name of set]
@RECID_MARCTAG=001
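The URL, format and set triggers correspond to the arguments of a standard OAI-PMH ListRecords request. As a minimal sketch of the request a harvester would issue (not the XML Harvester's own code), the following builds such a URL; the endpoint and set name are hypothetical placeholders, not values from the talk.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", oai_set=None):
    """Return an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        # Selective harvesting: restrict the request to one set.
        params["set"] = oai_set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
url = build_list_records_url("https://example.org/oai/oai.php", oai_set="marandet")
print(url)
```

Here metadata_prefix plays the role of @OAI_FORMAT, base_url of @URL, and oai_set of @OAISET.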
Harvested metadata
• Loaded through Data Exchange
• Significant re-editing
• Tags & indicators
• Diacritics
• Creating attached items or holdings records
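One likely reason for the re-editing is that a simple DC record carries only element names and values, with no MARC tags, indicators or subfield codes to load. A minimal sketch of reading an oai_dc record shows how little structure is available; the sample record is invented for illustration and is not taken from the Warwick repository.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record in the standard oai_dc container.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example play title</dc:title>
  <dc:creator>Example, Author</dc:creator>
  <dc:language>fre</dc:language>
  <dc:subject>French drama</dc:subject>
  <dc:subject>Example subject</dc:subject>
</oai_dc:dc>"""


def dc_fields(xml_text):
    """Collect repeatable DC elements into {element name: [values]}."""
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)
    for child in root:
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            fields[name].append((child.text or "").strip())
    return dict(fields)


record = dc_fields(SAMPLE)
print(record)
```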
Metadata option 3
• Batchload into CONTENTdm via delimited file from Create Lists
• Cross-walk MARC21 to DC
• Directory structure
MARC to Simple DC crosswalk
Record#     dc:identifier
008/07-10   dc:language
100         dc:creator
245         dc:title
260|ab      dc:publisher
260|c       dc:date
300         dc:format
5XX         dc:description
6XX         dc:subject
700         dc:contributor
700|t       dc:relation
793         dc:source
MARC – DC Crosswalk
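The crosswalk table can be read as a set of tag-and-subfield rules. A minimal sketch under stated assumptions: fields are plain (tag, subfields) tuples rather than real MARC records, the sample data is invented, and limiting 700 to the name subfields for dc:contributor is an assumption. The 008 row is omitted because it needs positional slicing of a fixed field rather than a tag lookup.

```python
# (tag pattern, subfield codes or None for all, DC element), following the table.
CROSSWALK = [
    ("100", None, "dc:creator"),
    ("245", None, "dc:title"),
    ("260", "ab", "dc:publisher"),
    ("260", "c", "dc:date"),
    ("300", None, "dc:format"),
    ("5XX", None, "dc:description"),
    ("6XX", None, "dc:subject"),
    ("700", "abcd", "dc:contributor"),  # assumption: name subfields only
    ("700", "t", "dc:relation"),
    ("793", None, "dc:source"),
]


def tag_matches(pattern, tag):
    """True if tag fits the pattern, where X stands for any digit."""
    return all(p in ("X", t) for p, t in zip(pattern, tag))


def crosswalk(record_number, fields):
    """Map a list of (tag, [(code, value), ...]) fields to simple DC."""
    dc = {"dc:identifier": [record_number]}
    for tag, subfields in fields:
        for pattern, codes, element in CROSSWALK:
            if not tag_matches(pattern, tag):
                continue
            values = [v for c, v in subfields if codes is None or c in codes]
            if values:
                dc.setdefault(element, []).append(" ".join(values))
    return dc


# Invented record for illustration.
fields = [
    ("100", [("a", "Example, Author,"), ("d", "1750-1820.")]),
    ("245", [("a", "Example play :"), ("b", "a comedy in three acts.")]),
    ("260", [("a", "Paris :"), ("b", "Example Publisher,"), ("c", "1791.")]),
    ("650", [("a", "French drama"), ("y", "18th century.")]),
    ("793", [("a", "Marandet Collection.")]),
]
dc = crosswalk(".b12345678", fields)
for element, values in dc.items():
    print(element, values)
```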
Additional DC elements
• dc:rights
• dc:type
• Transcript mapped to dc:description
Metadata workflow
• Create separate bibs for e-versions
• Export print records via Data Exchange
• MarcEdit to remove extraneous tags (907, etc)
• Insert 006, 007, 008/23, GMD, 533
• Re-import into Millennium as new bibs
• [856 CONTENTdm reference url added]
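The editing steps above were done in MarcEdit; the same transformation can be sketched in code to make the sequence explicit. This is an illustration only: records are plain (tag, value) pairs rather than real MARC, and the inserted 006, 007, 533 and GMD values are typical choices for an electronic reproduction, assumed here rather than taken from the talk.

```python
LOCAL_TAGS = {"907"}  # the slide lists "907, etc"; any further tags are a local choice


def make_electronic_bib(print_fields, reference_url):
    """Derive an e-version record from an exported print record.

    Each field is (tag, value): value is a string for control fields
    and a list of (code, text) subfields otherwise.
    """
    out = []
    for tag, value in print_fields:
        if tag in LOCAL_TAGS:
            continue  # drop system-specific tags
        if tag == "008":
            # 008/23 (form of item): "s" = electronic
            value = value[:23] + "s" + value[24:]
        elif tag == "245":
            # Insert the GMD after the first |a; punctuation is not adjusted here.
            value = list(value)
            idx = next((i for i, (code, _) in enumerate(value) if code == "a"), -1)
            value.insert(idx + 1, ("h", "[electronic resource]"))
        out.append((tag, value))
    out += [
        ("006", "m" + " " * 8 + "d" + " " * 8),  # assumed computer-file 006
        ("007", "cr"),                           # electronic resource, remote
        ("533", [("a", "Electronic reproduction.")]),
        ("856", [("u", reference_url)]),
    ]
    return sorted(out, key=lambda f: f[0])


# Invented print record and hypothetical reference URL.
print_bib = [
    ("008", "090501s1791    fr " + " " * 11 + "000 d fre d"),
    ("245", [("a", "Example play :"), ("b", "a comedy.")]),
    ("907", [("a", ".b12345678")]),
]
e_bib = make_electronic_bib(print_bib, "http://example.org/cdm/ref/1")
tags = [tag for tag, _ in e_bib]
print(tags)
```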
Metadata workflow
• Review file of newly loaded bibs exported from Create Lists
• Cross-walked from MARC to DC
• Additional DC elements added
• Item level metadata added
• Loaded to CDM as delimited files with directory structure
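The final step produces one delimited row per play. A minimal sketch of writing such a file, assuming a tab delimiter, hypothetical column names (including a Directory column naming the folder of page images), and "; " as the separator for repeated values; none of these details are confirmed by the talk.

```python
import csv
import io

# Hypothetical column layout: DC-style fields plus the image directory name.
COLUMNS = ["Title", "Creator", "Publisher", "Date", "Subject", "Identifier", "Directory"]


def to_delimited(rows):
    """Serialise cross-walked records as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Join repeated values into one cell (assumed separator).
        writer.writerow({k: "; ".join(v) if isinstance(v, list) else v
                         for k, v in row.items()})
    return buf.getvalue()


# Invented record for illustration.
rows = [{
    "Title": "Example play",
    "Creator": "Example, Author",
    "Publisher": "Paris : Example Publisher",
    "Date": "1791",
    "Subject": ["French drama", "Example subject"],
    "Identifier": ".b12345678",
    "Directory": "b12345678",
}]
text = to_delimited(rows)
print(text)
```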
Metadata in CONTENTdm
• Compound objects
• Document level
• Page level
  – Less rich than document level
• Hospitable to multiple schemas
• Deliberate attempt to stay close to DC
• Administrative metadata
  – Later feature
Document level
• AACR in DC wrapper
• All descriptive metadata from bib (except LDR, 006, 007, 008, GMD)
• Authority control (names, subjects, uniform titles)
• Rights (dc:rights)
• Identifier (.b number)
• Mapped to DC for OAI harvesting
Page level
• Basic descriptive metadata (creator, title, publisher, date)
• Rights (dc:rights)
• Identifier (.b number)
• Transcript (dc:description)
• No OAI harvesting at page level
  – Local decision
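The two levels can be pictured as a compound object whose pages carry a thinner record than the document. A minimal sketch of that data model, using hypothetical class and field names (this is not CONTENTdm's internal structure): only the document level is exposed for OAI harvesting, reflecting the local decision above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    title: str
    creator: str
    identifier: str        # .b number of the parent bib
    rights: str
    transcript: str = ""   # carried in dc:description


@dataclass
class Document:
    title: str
    creator: str
    identifier: str        # .b number
    rights: str
    subjects: List[str] = field(default_factory=list)
    pages: List[Page] = field(default_factory=list)

    def oai_dc(self) -> Dict[str, List[str]]:
        """Document-level fields only; pages are deliberately left out."""
        return {
            "dc:title": [self.title],
            "dc:creator": [self.creator],
            "dc:subject": list(self.subjects),
            "dc:rights": [self.rights],
            "dc:identifier": [self.identifier],
        }


# Invented example for illustration.
play = Document("Example play", "Example, Author", ".b12345678", "Example rights statement",
                subjects=["French drama"])
play.pages.append(Page("Example play", "Example, Author", ".b12345678",
                       "Example rights statement", transcript="ACTE PREMIER ..."))
harvested = play.oai_dc()
print(harvested)
```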
Access & availability
• Availability across local → global continuum
• Metadata contribution
• Collection level descriptions
• OAI
• Collapse D2D
Metadata in WorldCat
• Local CDM server – not able to use Connexion Digital Import
• Bug between WorldCat and CDM for compound objects
• FRBRized display in worldcat.org potentially impedes discovery
Now
• ‘Exposing Marandet’ completes 9/2009
• Established service 4 collections
  – Ancien Régime Drama
  – Revolutionary Drama
  – Empire Period Drama
  – Restoration Drama
• Integration with course delivery
• Metadata enrichment to/from CÉSAR
Links
• http://go.warwick.ac.uk/fac/arts/french/marandet/
• http://www.jisc.ac.uk/whatwedo/programmes/digitisation/enrichingdigi/marandet.aspx
• http://webcat.warwick.ac.uk
• http://contentdm.warwick.ac.uk
stuart.hunt@warwick.ac.uk