Bryan Beecher University of Michigan Director, Computing & Network Services E:...

Date post: 12-Jan-2016
Page 1:

Bryan Beecher
University of Michigan

Director, Computing & Network Services

E: bryan@umich.edu

W: http://www.icpsr.umich.edu/ICPSR/staff/beecher.html/

Micah Altman
Harvard University

Archival Director, Henry A. Murray Research Archive

Associate Director, Harvard-MIT Data Center

Senior Research Scientist, Institute for Quantitative Social Sciences

E: [email protected]

W: http://maltman.hmdc.harvard.edu/

Page 2:

Roadmap
  Why replicate for preservation?
  What institutional model for replication does Data-PASS use?
  How do we build on LOCKSS to support these institutional needs?

Collaborators and Conspirators
  Leonid Andreev, IQSS; Steve Burling, ICPSR; Jonathan Crabtree, Odum; Marc Maynard, Roper; Nancy McGovern, ICPSR;

Page 3:

Technical
  Media failure: storage conditions, media characteristics
  Format obsolescence
  Preservation infrastructure software failure
  Storage infrastructure software failure
  Storage infrastructure hardware failure

External Threats to Institutions
  Third party attacks
  Institutional funding
  Change in legal regimes

Quis custodiet ipsos custodes?
  Unintentional curatorial modification
  Loss of institutional knowledge & skills
  Intentional curatorial deaccessioning
  Change in institutional mission

Page 4:

There are potential single points of failure in technology, organization, and legal regimes:
  Diversify your portfolio: multiple software systems, hardware, organization
  Find diverse partners: diverse business models, legal regimes

http://failblog.org/2008/02/08/floppy-fail/

Page 5:

Consider organizational credentials

No organization is absolutely certain to be reliable

Consider the trust relationships across institutions

http://flickr.com/photos/phauly/35555985/

Page 6:

Policy Driven
  Institutional policy creates formal replication commitments
  Replication commitments are described in metadata, using schema
  Metadata drives
    Configuration of replication network
    Auditing of replication network

Page 7:

Asymmetric Commitments

Partners vary in …
  … storage commitments to replication
  … size of holdings being replicated
  … what holdings of other partners they replicate

Page 8:

Completeness
  Complete public holdings of each partner
  Retain previous versions of holdings
  Include
    metadata
    data
    documentation
    legal agreements

Page 9:

Restoration guarantees
  Restore groups of versioned content
    to owning archive
    to replication hosts
  Institutional failure restoration: transfer entire holdings of an archive to another

Page 10:

Trust & Verification
  Each partner is trusted …
    … to hold the public content of other partners (not to disseminate improperly)
    … to add units to be harvested
  No partner is trusted to be “super-user”
    No deletion (or direct manipulation) of replication storage owned by another partner
  Legal agreements reinforce trust model
  Schema-based auditing used to …
    … verify replication guarantees are met
    … record replication and storage commitments
    … document related TRAC criteria

Page 11:

Network level
  Identification: name; description; contact; access point URI
  Capabilities: protocol version; number of replicates maintained; replication frequency; versioning/deletion support
  Human readable documentation: restrictions on content that may be placed in the network; services guaranteed by the network; Virtual Organization policies relating to network maintenance

Host level
  Identification: name; description; contact; access point URI
  Capabilities: protocol version; storage available
  Human readable terms of use: documentation of hardware, software and operating personnel in support of TRAC criteria

Archival unit level
  Identification: name; description; contact; access point URI
  Attributes: update frequency; plugin required for harvesting; storage required
  Terms of use: required statement of content compliance with network terms; dissemination terms and conditions

TRAC Integration
  A number of elements comprise documentation showing how the replication system itself supports relevant TRAC criteria
  Other elements may be used to include text, or reference external text, that documents evidence of compliance with TRAC criteria
  Specific TRAC criteria are identified implicitly; they can be explicitly identified with attributes
  Schema documentation describes each element's relevance to TRAC, and its mapping to particular TRAC criteria
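
The three schema levels listed above can be pictured as nested records. The sketch below is only an illustration in Python: the class names, field names, and units (gigabytes, days) are assumptions made for this example and are not taken from the actual Data-PASS schema, which the slides describe only in outline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identification:
    # Shared by all three levels on the slide: name; description; contact; access point URI.
    name: str
    description: str
    contact: str
    access_point_uri: str


@dataclass
class ArchivalUnit:
    ident: Identification
    update_frequency_days: int      # "update frequency" (unit is an assumption)
    plugin: str                     # "plugin required for harvesting"
    storage_required_gb: float      # "storage required" (unit is an assumption)
    terms_of_use: str = ""          # compliance statement and dissemination terms


@dataclass
class Host:
    ident: Identification
    protocol_version: str
    storage_available_gb: float     # "storage available" (unit is an assumption)
    terms_of_use: str = ""          # human readable documentation in support of TRAC


@dataclass
class Network:
    ident: Identification
    protocol_version: str
    replicates_maintained: int      # "number of replicates maintained"
    replication_frequency_days: int
    supports_versioning: bool
    hosts: List[Host] = field(default_factory=list)
    archival_units: List[ArchivalUnit] = field(default_factory=list)

    def storage_demand_gb(self) -> float:
        """Total storage needed to hold every AU at the committed replica count."""
        per_copy = sum(au.storage_required_gb for au in self.archival_units)
        return self.replicates_maintained * per_copy

    def storage_supply_gb(self) -> float:
        """Total storage the hosts have declared as available."""
        return sum(h.storage_available_gb for h in self.hosts)

    def can_meet_commitment(self) -> bool:
        """Coarse feasibility check: enough hosts for distinct copies, enough total space."""
        return (len(self.hosts) >= self.replicates_maintained
                and self.storage_supply_gb() >= self.storage_demand_gb())
```

A check like `can_meet_commitment()` is deliberately coarse: it compares totals only and says nothing about whether any particular placement of units onto hosts exists.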

Page 12:

Initialization: Given a schema instance, distribute AU harvesting responsibility to hosts

Auditing: Does the current host harvesting allocation & history match the replication commitment in the schema?

Recovery of hosts
Deliver AU content to source archive
Addition of AUs, hosts
Growth of AUs over initial commitment

Assumptions
  Nothing is deleted
  Resources in network grow monotonically
  Off-the-path behavior is detected automatically, resolved manually
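
As a rough illustration of the initialization and auditing steps, the Python sketch below assigns each archival unit (AU) to hosts and then checks observed holdings against the commitment. The greedy "most free space first" placement, the function names, and the dictionary layout are all assumptions made for this example; the slides do not say how Data-PASS actually allocates or audits. The sketch also ignores the partner-specific choice of which holdings a host is willing to replicate.

```python
from typing import Dict, List, Set


def allocate(au_sizes: Dict[str, float],
             host_capacity: Dict[str, float],
             replicas: int) -> Dict[str, List[str]]:
    """Initialization sketch: give each AU to `replicas` distinct hosts.

    AUs are placed largest first; each copy goes to the host with the most
    free space that can still hold it. Hosts may have unequal capacity,
    which reflects the asymmetric storage commitments.
    """
    free = dict(host_capacity)
    plan: Dict[str, List[str]] = {h: [] for h in host_capacity}
    for au, size in sorted(au_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = sorted((h for h in free if free[h] >= size),
                            key=lambda h: (-free[h], h))
        if len(candidates) < replicas:
            # Off-the-path case: detected automatically, resolved manually.
            raise ValueError(f"cannot place {replicas} copies of {au!r}")
        for h in candidates[:replicas]:
            plan[h].append(au)
            free[h] -= size
    return plan


def audit(au_sizes: Dict[str, float],
          replicas: int,
          observed: Dict[str, Set[str]]) -> Dict[str, int]:
    """Auditing sketch: return {AU: missing copies} for AUs below commitment."""
    shortfalls: Dict[str, int] = {}
    for au in au_sizes:
        copies = sum(1 for held in observed.values() if au in held)
        if copies < replicas:
            shortfalls[au] = replicas - copies
    return shortfalls
```

An empty result from `audit` means every AU currently has at least the committed number of copies; any non-empty result is the list of units to flag for manual resolution.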

Page 13:
Page 14:
Page 15:

LOCKSS
  Very easy to build and deploy (5 minutes)
  Very easy to plug into public LOCKSS network (5 minutes)
  Very easy to manage thereafter – it is basically an appliance

CLOCKSS
  Also easy to set up
  Grouping your CLOCKSS devices into a private network and paring the 20k-line configuration file into the right 200-line configuration file is not
  Managing a network now, not a device

Page 16:

Standard LOCKSS used for
  Harvesting
  Recovery

New LOCKSS bulk update mechanism used for
  Initial configuration
  Adding AUs

CLOCKSS mechanisms (certificates, cache monitor)
  Content delivery
  Optimize recovery
  Auditing

Data-PASS customizations for schema processing
  Translating schema instance into bulk update requests
  Reporting on compliance based on cache monitor database

Page 17:

Summer 2007: Attended the MetaArchive LOCKSS tutorial
  Very good overview of LOCKSS
Summer 2007: SSP System Requirements Developed & Approved
Winter 2007: First public LOCKSS network nodes built at two Data-PASS sites
Winter 2007: SSP Replication Commitment Schema Developed
Spring 2008: Completed test harvest of MRA collection into LOCKSS
Spring 2008: SSP System Use Cases Developed
Spring 2008: Prototype plugin developed to harvest Dataverse Networks
Spring 2008: Data-PASS sites are joined into single Private LOCKSS Network (PLN)
Spring 2008: Met with LOCKSS developers to review use cases
  SSP will leverage functionality “in the works” by LOCKSS team

Page 18:
Page 19:

Replication ameliorates institutional risks to preservation

Data-PASS requires policy-based, auditable, asymmetric replication commitments
  Formalize policy in schema
  (Re)Configure & audit LOCKSS using schema
  Replication uses standard LOCKSS mechanisms

