Digital Preservation UN FAO May 23-24, 2011 Tom Cramer Perry Willett Stephen Abrams Sheila Morrissey
Digital Preservation

UN FAO

May 23-24, 2011

Tom Cramer

Perry Willett

Stephen Abrams

Sheila Morrissey

Agenda, Day 1

10:30 – 10:45 Welcome, introductions, and review of objectives and agenda

10:45 – 12:30 Preservation goals and concepts: long-term usability, risk, trust

12:30 – 13:30 Lunch

13:30 – 14:15 Policy frameworks and business/technical sustainability

14:15 – 15:00 Strategies: redundancy/replication, migration, emulation

15:00 – 15:30 Afternoon break

15:30 – 16:30 Standards and best practices: reformatting, OAIS, PREMIS, Drambora, TRAC

16:30 – 17:00 Questions and discussion

Agenda, Day 2

08:30 – 08:35 Review of objectives and agenda

08:35 – 09:30 Infrastructure and tools

09:30 – 10:30 Case study: preservation activities at CDL

10:30 – 11:00 Morning break

11:00 – 12:00 Case study: preservation activities at Portico

12:00 – 12:30 Preservation initiatives and organizations: DataNet, DCC, DPC, IIPC, NDSA, OPF

12:30 – 13:30 Lunch

13:30 – 14:00 Case study: preservation activities at Stanford

14:00 – 15:00 Other preservation resources

15:00 – 15:30 Afternoon break

15:30 – 16:00 Format characterization

16:00 – 16:30 Characterization in preservation workflows

16:30 – 17:00 Questions and discussion

Introduction

• Libraries, archives and digital preservation

Policies for:

• Designated community

• Content acquisition

• Ownership

• Access

• Security

• Privacy

• Takedown challenges

• Legal agreements

• Copyright, intellectual property

Planning I

• Perfection is impossible.

• Organizations need to constantly improve their preservation policies and activities.

• The goal of digital preservation is to ensure continued access to important content.

Planning II

• Designated community: who is the audience/clientele?

• Expectations, requirements

• Roles

• What content will you preserve? Legacy data? Projected growth? Metadata? Access? Archival retention period?

Planning III

• Do you have adequate resources to provide service?

• Staff, administration, equipment, training

http://www.flickr.com/photos/nationallibrarynz_commons/3326203787/

Planning IV

• Roles/staff: system administrators, developers, archivists, business analysts, metadata specialists, product managers, administrators

http://digital.nls.uk/74548646

Planning V

• Do you have the authority or permission to archive the content? Do the depositing organizations have the rights to submit it?

• Legal environment: legislative mandates, copyright restrictions, agreements with rightsholders

http://arcweb.archives.gov/arc/action/ExternalIdSearch?id=1696015

Planning VI

• Disaster planning

• Business planning

• Risk mitigation

http://commons.wikimedia.org/wiki/File:I40_Bridge_disaster.jpg

Plato

• The Preservation Planning Tool:

http://www.ifs.tuwien.ac.at/dp/plato

• Planets Project of the Digital Preservation Lab at the Vienna University of Technology

– Define requirements

– Evaluate alternatives

– Analyze results

– Build preservation plan

SLAs

• Service level agreements make explicit the terms of the service

– Define roles, terms, rights, permitted uses

– Define service period

– When is the service available?

– When are maintenance outages scheduled?

– When do you respond to support requests?

– Who is notified for unscheduled outages?

– Process to end agreement

Sustainability

• Funding for the long term

– What’s the total cost of preservation?

– What resources are available?

– What’s possible given resource constraints?

BRTF

• Blue Ribbon Task Force on Sustainable Digital Preservation. Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. February 2010.

http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf

BRTF

• Barriers to sustainable digital access and preservation include:

– Inadequacy of funding models

– Confusion and/or lack of alignment between stakeholders, roles, and responsibilities

– Inadequate incentives to support collaboration

– Complacency that current practices are good enough

– Fear that digital access and preservation is too big.

BRTF

• Conditions for sustainable digital preservation:

– Recognition of the benefits by decision makers

– Process for selecting materials with long-term value

– Incentives for decision makers to preserve in the public interest

– Appropriate organization and governance

– Mechanisms to secure adequate resources

LIFE

• Life Cycle Information for E-Literature

http://www.life.ac.uk/

• University College London and the British Library, funded by JISC

• Developed a model and a tool for predicting the costs of preserving digital content
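The LIFE work estimates preservation cost by breaking it into lifecycle stages. The sketch below assumes a deliberately simple additive model: total cost over a number of years is every one-off stage cost plus each recurring annual cost multiplied by the years. The stage names, the one-off/annual split, and all figures are hypothetical and are not taken from the LIFE model or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCost:
    """Cost attributed to one lifecycle stage (hypothetical currency units)."""
    name: str
    one_off: float = 0.0  # incurred once, e.g. at acquisition or ingest
    annual: float = 0.0   # recurring every year, e.g. storage or preservation


def lifecycle_cost(stages: list[StageCost], years: int) -> float:
    """Total over `years`: every one-off cost plus each annual cost times years.

    Inflation, discounting and falling storage prices are deliberately ignored.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return sum(s.one_off + s.annual * years for s in stages)


if __name__ == "__main__":
    # Hypothetical figures for one collection; not taken from LIFE.
    stages = [
        StageCost("acquisition", one_off=5_000),
        StageCost("ingest", one_off=2_000),
        StageCost("metadata", one_off=3_000),
        StageCost("access", annual=500),
        StageCost("storage", annual=1_200),
        StageCost("preservation", annual=800),
    ]
    print(lifecycle_cost(stages, years=10))  # 35000.0
```

Even this toy model shows why recurring costs dominate over long horizons: after ten years the annual stages account for 25,000 of the 35,000 total.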

Strategies: Replication

• Data redundancy/data replication: geographical (Japan, New Zealand, California)

• System manufacturer/OS heterogeneity: not reliant on single manufacturer (Sun bought by Oracle, Isilon bought by EMC)

• Issues with replication: how much data? How quickly can it be moved? How often does it change? How to monitor data on multiple systems?
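One way to approach "How to monitor data on multiple systems?" is to compute a digest of every copy and flag any replica that disagrees with the majority. This is a minimal sketch, assuming each replica is reachable as a local path and using SHA-256; the function names are hypothetical. With only two copies, or a tie, a vote cannot say which copy is wrong, so the sketch refuses rather than guessing.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def divergent_replicas(replicas: list[Path]) -> list[Path]:
    """Return the replicas whose digest differs from the majority digest."""
    if not replicas:
        return []
    digests = {p: sha256_of(p) for p in replicas}
    ranked = Counter(digests.values()).most_common()
    # A tie between the two most common digests means there is no majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("no majority digest; cannot tell which copy is correct")
    majority = ranked[0][0]
    return [p for p, d in digests.items() if d != majority]
```

Running this on a schedule answers the monitoring question for small collections; at scale, the same comparison is usually done against digests stored at ingest rather than by re-reading every copy.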

Strategies: Identifiers

• Uniquely identify objects within repository

• Bind identifiers to target object
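A minimal sketch of both points: minting identifiers that are unique within one repository, and binding each one to its target by recording the object's location together with a content digest, so the binding can later be checked. The `ark:/99999/` prefix uses what we understand to be the ARK test NAAN purely for illustration; `IdentifierRegistry` and its methods are hypothetical and are not the API of EZID or any other identifier service.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    location: str  # where the object is stored
    sha256: str    # digest tying the identifier to specific content


class IdentifierRegistry:
    """Hypothetical in-memory registry: mint identifiers and bind them to objects."""

    def __init__(self, prefix: str = "ark:/99999/"):  # 99999: assumed test NAAN
        self.prefix = prefix
        self._bindings: dict[str, Binding] = {}

    def mint(self) -> str:
        """Mint an opaque identifier that is not currently bound in this registry."""
        while True:
            candidate = self.prefix + uuid.uuid4().hex[:12]
            if candidate not in self._bindings:
                return candidate

    def bind(self, identifier: str, location: str, content: bytes) -> None:
        """Record where the object lives and what its content digest is."""
        if identifier in self._bindings:
            raise KeyError(f"{identifier} is already bound")
        digest = hashlib.sha256(content).hexdigest()
        self._bindings[identifier] = Binding(location, digest)

    def resolve(self, identifier: str) -> Binding:
        return self._bindings[identifier]

    def verify(self, identifier: str, content: bytes) -> bool:
        """True if `content` is the object originally bound to `identifier`."""
        return self.resolve(identifier).sha256 == hashlib.sha256(content).hexdigest()
```

Keeping the identifier opaque (no title or location inside it) means the object can move without the identifier changing; only the binding is updated.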

Strategies: Format Migration

Why:

• To manage a limited number of formats in the repository

• To create an access copy

• To convert to a preservation-ready format

• To take advantage of additional functionality

• In response to an identified risk of format failure

Strategies: Format Migration

When:

• At time of deposit, as part of ingest (TIFF->JPEG2000)

• As a batch migration (SGML->XML)

• Lossless/lossy
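A migration is easier to trust when it is checked and recorded. The sketch below uses a character-encoding migration (ISO-8859-1 text to UTF-8) as a stand-in for the TIFF or SGML examples above, because Python's standard library can read both encodings. The function name and the fields of the event dictionary are hypothetical. It counts the migration as lossless only if the decoded text is identical before and after.

```python
import hashlib
from datetime import datetime, timezone


def migrate_latin1_to_utf8(source: bytes) -> tuple[bytes, dict]:
    """Re-encode ISO-8859-1 bytes as UTF-8 and record a hypothetical migration event."""
    text = source.decode("latin-1")  # every byte value is valid in Latin-1
    target = text.encode("utf-8")
    # Lossless check: the migrated file must decode to exactly the same text.
    lossless = target.decode("utf-8") == text
    event = {
        "type": "migration",
        "from_format": "text/plain; charset=ISO-8859-1",
        "to_format": "text/plain; charset=UTF-8",
        "source_sha256": hashlib.sha256(source).hexdigest(),
        "target_sha256": hashlib.sha256(target).hexdigest(),
        "lossless": lossless,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return target, event
```

For this particular pair of encodings the check always passes, but the same decode-and-compare pattern exposes loss in lossy conversions, for example re-encoding UTF-8 text to ASCII with replacement characters.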

Strategies: Emulation

• Increasingly robust option, particularly for operating systems and firmware.

• Intellectual property and copyright complexities

• Continued challenges for maintenance and reliance.

Strategies: Metadata

• Sufficient metadata to understand object

• Descriptive, technical, administrative, preservation

• How much is enough?

Strategies: Fixity I

Fixity: test for corruption

– Corruption could happen during submission or storage

Process:

• Calculate digest at ingest

• Recalculate later

• Compare

Mike Johnson: TheBusyBrain.com http://www.flickr.com/photos/thebusybrain/2492945625/
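The three-step fixity process (calculate a digest at ingest, recalculate later, compare) can be sketched with Python's `hashlib`. SHA-256 is an assumption here: the slides do not name an algorithm, and MD5 and SHA-1 were also in common use. The manifest format (a dict from path to digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def digest(path: Path, algorithm: str = "sha256") -> str:
    """Hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def record_fixity(paths) -> dict[str, str]:
    """Step 1, at ingest: calculate and store a digest for each file."""
    return {str(p): digest(Path(p)) for p in paths}


def check_fixity(manifest: dict[str, str]) -> dict[str, str]:
    """Steps 2 and 3, later: recalculate each digest and compare with the manifest."""
    status = {}
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists():
            status[name] = "missing"
        elif digest(p) != expected:
            status[name] = "corrupt"
        else:
            status[name] = "ok"
    return status
```

The manifest itself must be protected too: if it is stored next to the data and both are corrupted together, the comparison proves nothing.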

Strategies: Fixity II

• Need to have a clear policy on how to respond in cases of corruption

• In conjunction with replication provides robust infrastructure
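Fixity combined with replication allows repair: a copy that fails its check can be restored from another copy whose digest still matches the value recorded at ingest. A minimal sketch, assuming local paths and a stored SHA-256 digest; the policy shown (overwrite bad copies, stop and escalate if no good copy exists) is one possible choice, not one the slides prescribe, and `repair` is a hypothetical name.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    # Reads the whole file at once; acceptable for a sketch, not for huge files.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def repair(copies: list[Path], expected: str) -> list[Path]:
    """Restore every copy whose digest != `expected` from a verified good copy.

    Returns the copies that were repaired (including missing ones, which are
    recreated). Raises RuntimeError if no copy matches, because then there is
    nothing trustworthy to copy from.
    """
    good = [p for p in copies if p.exists() and sha256_file(p) == expected]
    if not good:
        raise RuntimeError("no copy matches the ingest digest; escalate per policy")
    source = good[0]
    repaired = []
    for p in copies:
        if p not in good:
            shutil.copyfile(source, p)
            repaired.append(p)
    return repaired
```

The `RuntimeError` branch is exactly where the "clear policy on how to respond" is needed: software can repair from a good copy, but deciding what to do when every copy is bad is an organizational question.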

http://www.flickr.com/photos/library_of_congress/2179131683/

Strategies: Backup

• Digital backups:

– tape, disc, cloud

– Online, nearline, offline

– Frequency, retention period

• Physical backups: print, microfilm

Wikimedia Commons http://commons.wikimedia.org/wiki/File:Backup_Backup_Backup_-_And_Test_Restores.jpg
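Backup frequency and retention period can be written down as an explicit, testable policy. The sketch assumes a hypothetical two-tier rule: keep every backup from the last 14 days, and keep the earliest backup of each month for a year. Both numbers and the monthly rule are illustrative, not recommendations from the slides.

```python
from datetime import date, timedelta


def backups_to_keep(backups: list[date], today: date,
                    daily_days: int = 14, monthly_days: int = 365) -> set[date]:
    """Return the backup dates to retain under a simple two-tier policy.

    - keep every backup taken within the last `daily_days` days
    - keep the earliest backup of each calendar month for `monthly_days` days
    """
    keep = {b for b in backups if today - b <= timedelta(days=daily_days)}
    first_of_month: dict[tuple[int, int], date] = {}
    for b in sorted(backups):
        first_of_month.setdefault((b.year, b.month), b)
    keep |= {b for b in first_of_month.values()
             if today - b <= timedelta(days=monthly_days)}
    return keep
```

Whatever the policy, the slide's image credit makes the key point: a retained backup is only useful if restores from it are tested.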

Strategies: Access

• Access is the goal of preservation

• Users who are actively using digital content will be the first to spot problems

Cornell University http://hdl.handle.net/1813.001/5skm

Best practices: Reformatting

Reformatting: best practices for digitization

• NISO: A Framework of Guidance for Building Good Digital Collections http://www.niso.org/publications/rp/framework3.pdf

• JISC: Advice http://www.jiscdigitalmedia.ac.uk/advice/

• NARA: Technical Information http://www.archives.gov/preservation/technical/

Best practices: Reformatting

• Goal is to use documented, open standards.

• Many image, audio and video formats may have patented software for creation (encoding) and reading (decoding) but use an open metadata standard (e.g., MPEG-4, JPEG2000)

Standards: PREMIS

• PREMIS: Preservation Metadata: Implementation Strategies

• http://www.loc.gov/standards/premis

• A core set of preservation metadata: “the information a repository uses to support the digital preservation process.”

Standards: PREMIS

Basic data model includes:

• Intellectual entity

• Digital Object

• Event

• Agent

• Rights
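The five entities of the PREMIS data model, and the links between them, can be sketched as plain data classes. The field names here are simplified and hypothetical; they are not the PREMIS semantic units, which should be taken from the data dictionary at the URL above. The helper shows why Events matter: linking each Event to the Objects it acted on yields a provenance trail.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    agent_id: str
    name: str
    agent_type: str  # e.g. "person", "organization", "software"


@dataclass
class DigitalObject:
    object_id: str
    format_name: str
    sha256: str


@dataclass
class IntellectualEntity:
    entity_id: str
    title: str
    object_ids: list[str] = field(default_factory=list)  # its digital representations


@dataclass
class Event:
    event_id: str
    event_type: str  # e.g. "ingestion", "fixity check", "migration"
    date_time: str
    outcome: str
    object_ids: list[str]  # objects the event acted on
    agent_ids: list[str]   # agents responsible for it


@dataclass
class Rights:
    rights_id: str
    basis: str  # e.g. "copyright", "license", "statute"
    permitted_acts: list[str]
    object_ids: list[str]
    agent_ids: list[str]


def events_for(obj: DigitalObject, events: list[Event]) -> list[Event]:
    """Provenance trail for one object: every event linked to it, in input order."""
    return [e for e in events if obj.object_id in e.object_ids]
```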

Standards: OAIS

• OAIS: Open Archival Information System

• Model for digital preservation

– ISO 14721:2003

– Defines key terms: archival storage, content object, designated community, representation information, information package (content information and preservation description information)

– Defines process: Submission, Archive, Dissemination

OAIS Functional Entities

OAIS: Functions of Ingest

OAIS concepts

• SIP: Submission Information Package

• AIP: Archival Information Package

• DIP: Dissemination Information Package

• Information Object = Data Object + Representation Information (Rep Info)
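The SIP, AIP and DIP flow can be sketched as two small transformations. In this hypothetical model, ingest turns a SIP into an AIP by adding some preservation description information (here only a fixity value and a provenance note), and access derives a DIP from the AIP after re-checking fixity. All class and field names are illustrative; OAIS defines concepts, not this structure.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationObject:
    data: bytes               # the data object
    representation_info: str  # what is needed to interpret the bits


@dataclass(frozen=True)
class SIP:
    content: InformationObject
    producer: str


@dataclass(frozen=True)
class AIP:
    content: InformationObject
    fixity_sha256: str             # part of the preservation description information
    provenance: tuple[str, ...]    # also preservation description information


@dataclass(frozen=True)
class DIP:
    content: InformationObject
    consumer: str


def ingest(sip: SIP) -> AIP:
    """Ingest: package the submitted content with fixity and provenance."""
    digest = hashlib.sha256(sip.content.data).hexdigest()
    return AIP(sip.content, digest, (f"submitted by {sip.producer}",))


def disseminate(aip: AIP, consumer: str) -> DIP:
    """Access: derive a DIP, refusing if the stored content fails fixity."""
    if hashlib.sha256(aip.content.data).hexdigest() != aip.fixity_sha256:
        raise ValueError("AIP failed fixity check; refusing to disseminate")
    return DIP(aip.content, consumer)
```

Carrying the representation information inside every package reflects the equation above: bits without the information needed to interpret them are not a preserved Information Object.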

Uses of OAIS

• It’s a reference model, not a blueprint

• Used to perform gap analysis

• Provides a way to measure business and administrative practices, succession plans, budgets, staffing

• A framework for audits

• TRAC, Drambora, Data Asset Framework, ISO (which one?)

• Self audit vs external audit

• CRL acting as external auditor; has completed audits for Portico, HathiTrust

TRAC

• Trustworthy Repository Audit and Certification Checklist

• Developed by OCLC, RLG and NARA, now managed by CRL

• Under consideration as an ISO standard

• http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying-0

TRAC

• A tool to guide auditors to discover gaps

• Three main categories:

– Organizational infrastructure

– Digital Object management

– Technologies, Technical Infrastructure and Security

TRAC

• Can be used to conduct a self-audit, or used by external auditors

• Center for Research Libraries serves as an external auditor in the US

– Completed audits on Portico and HathiTrust

– Working on others

• Requires extensive documentation of policies, procedures, staffing, budgets, systems, and technology

Audits

• Quis custodiet ipsos custodes? Can audits be trusted?

– Not such a great track record in the financial industry

– What are the benchmarks?

• Proposed ISO standard http://wiki.digitalrepositoryauditandcertification.org/bin/view

– Metrics for Audits

– Requirements for Auditing Organizations

DRAMBORA

• Digital Repository Audit Method Based on Risk Assessment (DCC and DPE)

• http://www.repositoryaudit.eu/

• A toolkit for administrators to conduct a self-assessment of their digital repositories

• Identify gaps in current policies, systems, staffing

• Measure likelihood and severity of the risk
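DRAMBORA has assessors score each identified risk by its likelihood and its potential impact and combine them into a severity. The sketch below assumes integer scales from 1 to 6 and a severity equal to their product, then ranks risks so the worst are addressed first; check the DRAMBORA toolkit for its actual scales and scoring. The risks in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (very unlikely) .. 6 (almost certain)
    impact: int      # assumed scale: 1 (negligible) .. 6 (catastrophic)

    def __post_init__(self):
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 1 <= value <= 6:
                raise ValueError(f"{label} must be between 1 and 6, got {value}")

    @property
    def severity(self) -> int:
        return self.likelihood * self.impact


def rank(risks: list[Risk]) -> list[Risk]:
    """Highest severity first; ties broken in favour of higher impact."""
    return sorted(risks, key=lambda r: (r.severity, r.impact), reverse=True)


if __name__ == "__main__":
    risks = [
        Risk("loss of key staff", 3, 4),
        Risk("storage media failure", 4, 5),
        Risk("format obsolescence", 2, 5),
        Risk("funding cut", 4, 3),
    ]
    print([r.name for r in rank(risks)])
```

Breaking ties on impact is a design choice: between two equally scored risks, the one that would do more damage if it happened is addressed first.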

DAF

• Data Asset Framework (formerly Data Audit Framework) from the DCC

• http://www.data-audit.eu/

• An online tool to help administrators “identify, locate, describe and assess” research data within their organizations

• Once they’ve acquired a better understanding, the tool will help assess the policies, practices and systems to manage research data

Technical Infrastructure

• Technical infrastructure: is it adequate to respond to user expectations?

– High availability?

– Backup copies of online versions?

– Can it scale up to meet future demand?

• Seemingly simple question: can you put data in and get it back out, the same as it went in?

• Bigger picture: does the infrastructure adhere to the OAIS framework?
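The "seemingly simple question" can be turned into an automated test: write an object, read it back, and compare digests. `DirectoryStore` below is a hypothetical stand-in (one file per object under a root directory) for whatever storage layer is being evaluated; any object with the same `put`/`get` methods could be passed to `round_trip_ok`.

```python
import hashlib
import tempfile
from pathlib import Path


class DirectoryStore:
    """Hypothetical storage layer: one file per object id under a root directory."""

    def __init__(self, root: Path):
        self.root = root

    def put(self, object_id: str, data: bytes) -> None:
        (self.root / object_id).write_bytes(data)

    def get(self, object_id: str) -> bytes:
        return (self.root / object_id).read_bytes()


def round_trip_ok(store, object_id: str, data: bytes) -> bool:
    """Put data in, get it back out, and check it is the same as it went in."""
    expected = hashlib.sha256(data).hexdigest()
    store.put(object_id, data)
    return hashlib.sha256(store.get(object_id)).hexdigest() == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = DirectoryStore(Path(d))
        print(round_trip_ok(store, "obj-001", b"\x00\xffsample bytes"))  # True
```

Comparing digests rather than raw bytes mirrors how the check works against a remote store, where only the digest computed on the far side may be available.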

Support infrastructure

• Do you have adequate staffing to provide the service described in the Service Level Agreement?

Systems and tools

• JHOVE2 characterization: http://jhove2.org

• PRONOM: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

• United Digital Format Registry: http://udfr.org
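Characterization tools and format registries identify formats partly from internal signatures ("magic numbers") rather than from file extensions. The sketch below checks a handful of well-known leading byte sequences. It is a toy illustration of signature matching only: it is not how JHOVE2 or PRONOM-based tools are implemented, it consults no registry, and it performs no validation.

```python
# Leading-byte signatures for a few common formats (illustrative subset).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify(header: bytes) -> str:
    """Return the first format whose signature starts `header`, else 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path) -> str:
    """Identify a file from its first 16 bytes (longer than any signature above)."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

A renamed file (say, a TIFF saved as `.jpg`) is still reported as TIFF, which is exactly why preservation workflows do not trust extensions.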

Systems and tools

• Local systems: Fedora, DSpace, ePrints, Ex Libris Rosetta

• Hosted systems: Merritt (CDL), Chronopolis (UCSD), Tessella Safety Deposit Box, DuraSpace, MetaArchive

• Local/Hosted: LOCKSS

CDL UC3: Who are we?

Case study: CDL

• Founded in 1997 by the University of California (UC)

• Provide service to 10 UC campuses:

– 222,000 students

– 121,000 faculty and staff members

• Work closely with libraries

• Services:

– joint journal licensing and purchasing

– union catalog (Melvyl)

– digitization of special collections

– scholarly communications and publishing

– digital preservation

Case study: CDL UC3

• Digital Preservation Group (DPG)

– History: DPR, METS metadata, largely library clientele

• DPG → University of California Curation Center (UC3)

• New clientele, new requirements, new mandates

http://www.flickr.com/photos/65328860@N00/14320717 By Felix Burton (Flickr) [CC-BY-2.0 (www.creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

Case study: CDL UC3

• Changing funding: Contracting state funding

• Changing clientele: Expanding to include a wide range of groups, researchers, museums, as well as libraries.

• Changing requirements and needs: data management and sharing requirements

• Challenge: CDL must work more efficiently, collaboratively, and find new revenue sources

Case study: CDL UC3

• Main UC3 services:

– EZID: http://n2t.net/ezid

– Web Archiving Service: http://was.cdlib.org

– Merritt Repository Service: http://merritt.cdlib.org

Case Study: UC3 EZID

• Easy identifiers for the long-term

• A service to make and manage actionable ids

– User interface

– Programming interface for bulk ops (Dryad)

• Ids for anything: digital, physical, living, abstract

• Can manage identifiers under different schemes:

– ARKs, DOIs, and more to come (LSIDs, ...)

• Visit EZID at http://n2t.net/ezid
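An identifier is "actionable" when it can be turned into a working link. A minimal sketch for the two schemes named above, assuming ARKs resolve through the n2t.net resolver and DOIs through the doi.org proxy; both resolver base URLs are assumptions to verify, and this is not the EZID API. The sample identifiers in the test use test-style prefixes and are hypothetical.

```python
# Assumed resolver base URLs; verify before relying on them.
RESOLVERS = {
    "ark": "https://n2t.net/",
    "doi": "https://doi.org/",
}


def to_url(identifier: str) -> str:
    """Turn 'ark:/NAAN/name' or 'doi:10.prefix/suffix' into a resolver URL."""
    scheme, sep, rest = identifier.strip().partition(":")
    if not sep:
        raise ValueError(f"no scheme in {identifier!r}")
    scheme = scheme.lower()
    if scheme == "ark":
        # ARKs are passed to the resolver with their 'ark:' label intact.
        return RESOLVERS["ark"] + "ark:" + rest
    if scheme == "doi":
        if not rest.startswith("10."):
            raise ValueError(f"DOI must start with '10.': {identifier!r}")
        return RESOLVERS["doi"] + rest
    raise ValueError(f"unsupported scheme {scheme!r}")
```

Keeping the resolver bases in one table means that if a resolver moves, the stored identifiers do not change; only the table does.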

Case Study: UC3 Web Archiving Service

Build unique archives for local research communities

Geographically focused archives support local research:

• Los Angeles

• Monterey Bay

• Orange County

• San Diego

• Santa Barbara

Topical archives support special research collections:

• Guantanamo Bay / Tamiment Library

• California Water Districts / Water Resources Center Archives

Search Across all Sites in Archive

Easy to Use: Simple Workflow

Analyze Site Change

Are there new documents on this site?

Are there documents in my archive that have been removed from the live site?
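Both questions reduce to set differences between the document URLs found by the latest crawl and those already in the archive. A minimal sketch, assuming each capture has already been reduced to a set of URLs; a real service would also compare content digests to detect documents that changed in place. The function name and URLs are hypothetical.

```python
def analyze_site_change(archived: set[str], live: set[str]) -> dict[str, set[str]]:
    """Compare the URLs already archived with those seen on the live site."""
    return {
        "new_on_site": live - archived,        # new documents on this site
        "removed_from_site": archived - live,  # archived, but gone from the live site
        "still_present": archived & live,
    }
```

For example, with `archived = {"http://example.org/a.pdf", "http://example.org/b.html"}` and `live = {"http://example.org/b.html", "http://example.org/c.html"}`, `c.html` is new and `a.pdf` has been removed from the live site, which is precisely the content a web archive exists to keep.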

WAS Snapshot: Spring 2011

Stats:

• 22 organizations using service; several more joining shortly

• 4725 sites captured

• 26 terabytes of content stored

• 100+ archives under construction

• 35 archives published

Case study: Curation Micro-services

• Curation micro-services: work of Stephen Abrams, John Kunze and Patricia Cruse

• http://www.cdlib.org/uc3/curation/

• https://confluence.ucop.edu/display/Curation

Case Study: Micro-services

• Complex emergent behavior

• Low barrier, low maintenance, low commitment

• Policy neutral, protocol/platform independent

• The file system is the database

http://www.flickr.com/photos/oskay/265899811
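"The file system is the database" means an object's location can be computed from its identifier, so no separate lookup table is needed. The Pairtree convention (listed among Merritt's storage components) does this by substituting a few characters in the identifier and splitting the result into two-character directory names. The sketch implements only the basic substitutions and splitting as we understand them; it omits the hexadecimal escaping of unusual characters that the full Pairtree specification requires, so identifiers that already contain `=`, `+` or `,` will not round-trip. Function names are hypothetical.

```python
from pathlib import PurePosixPath

# Simplified Pairtree character substitutions (hex escaping omitted).
_TO_PATH = str.maketrans({"/": "=", ":": "+", ".": ","})
_FROM_PATH = str.maketrans({"=": "/", "+": ":", ",": "."})


def id_to_ppath(identifier: str, root: str = "pairtree_root") -> PurePosixPath:
    """Map an identifier to a pairtree-style directory path (simplified)."""
    cleaned = identifier.translate(_TO_PATH)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(root, *pairs)


def ppath_to_id(path: PurePosixPath, root: str = "pairtree_root") -> str:
    """Reverse the mapping: join the pairs and undo the substitutions."""
    cleaned = "".join(path.relative_to(root).parts)
    return cleaned.translate(_FROM_PATH)
```

For `ark:/13030/xt12t3` this yields `pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3`; because the mapping is reversible, walking the directory tree is enough to rebuild the list of identifiers in the store.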

The Unix philosophy

“Make each program do one thing well”

“To do a new job, build afresh rather than complicate old programs by adding new features”

“Expect the output of every program to become the input to another, as yet unknown, program”

“Design and build software … to be tried early”

“Don't hesitate to throw away the clumsy parts and rebuild them”

– D. L. McIlroy et al., “UNIX Time-Sharing System: Foreword,” Bell System Technical Journal 57:6, part 2 (1978): 1902

Curation micro-services

[Figure: layered stack of micro-services organized by mode (curation, preservation), focus (interoperation, application, interpretation, state), valence (value, utility, context, protection) and visibility (user-facing, provider-facing); cross-cutting concerns: UI, access control, message queuing]

Value → Service, from the user-facing curation layers down to the provider-facing preservation layers:

• Accretion → Annotation

• Visibility → Notification

• Accessibility → Access

• Derivation → Transformation

• Selectivity → Search

• Actionability → Index

• Stewardship → Ingest

• Epistemology → Characterization

• Ontology → Inventory

• Reliability → Replication

• Fixity → Fixity

• Stability → Storage

• Identity → Identity

Merritt micro-services

• Merritt is built from a micro-services toolkit:

– IdM/Authn/Authz: LDAP (Version 2: GhOST/Shibboleth)

– Persistent identifiers: EZID

– Persistent storage: CAN/Pairtree/Dflat/Checkm/ReDD

– Fixity: Fixity

– Replication: Replication

– Catalog: Inventory/4store

– Ingest: Ingest/Zookeeper

– Characterization: JHOVE2

– Discovery: XTF

– Transformation

– Notification

– Annotation

Design goals

Principle of least surprise

Multiple interface modalities

– RESTful HTTP

– Command line

– Procedural (Java, Perl, Ruby, …)

Linked data

Stable URL references

The file system is the database

[Figure: storage service URL structure: http://example-store/ followed by segments for state or content, storage node, object, version and file (example segments: default/1234/3/xyz, state/)]

Merritt repository

“How can I meet the data management requirements of my grant?”

“I know my desktop content is at risk; what should I do?”

“What’s a good way to share the data underlying a recent publication?”

“How can I ensure persistent availability?”

Model free


Strongly versioned

Easy submission

More info on CDL

• CDL UC3 home page http://www.cdlib.org/uc3

• Curation Micro Services https://confluence.ucop.edu/display/curation/

• Contact:

[email protected]

Other services/projects

• JHOVE2: http://jhove2.org

• United Digital Format Registry (UDFR)

https://bitbucket.org/udfr/main/

• DataONE

• DCXL (Digital Curation for Excel)

• Data Management Plan Tool

https://bitbucket.org/dmptool/main/

