Date post: | 15-Jan-2016 |
Upload: | regan-osswald |
Overview of LOCKSS
Session Learning Objectives
Provide an overview of the LOCKSS architecture.
Describe the LOCKSS polling process.
Describe how LOCKSS private networks differ.
Provide a vocabulary of technical terms used frequently with LOCKSS networks.
Architectural Components
Provider Sites (digital collections)
LOCKSS nodes (aka “peers”)
Plugins / Plugin Repository
Cache Manager
Title Database / Conspectus Database
Provider Sites
Prepare a digital collection so that it is web accessible to the preservation nodes
Expose a “manifest” web page for each collection, according to LOCKSS specifications. The manifest page grants permission for LOCKSS to crawl and gives the starting point for the crawl.
Provide information sufficient to create a LOCKSS plugin for the collection (or else create the plugin themselves and reposit that plugin with the LOCKSS network)
LOCKSS Peer Nodes
Data caches for harvested content
Caches organized into archival units (AUs)
Nodes can select which AUs to crawl and preserve
There must be >= 6 copies of an AU in order for the polling process to work properly
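The copy-count requirement above can be checked with a short script. This is a minimal sketch, assuming a hypothetical mapping from node name to the set of AUs that node holds; `MIN_COPIES`, `under_replicated`, and the sample holdings are illustrative names and data, not part of the LOCKSS software.

```python
from collections import Counter

MIN_COPIES = 6  # minimum number of copies per AU stated on the slide


def under_replicated(holdings, min_copies=MIN_COPIES):
    """Return {au: copy_count} for AUs held by fewer than min_copies nodes."""
    counts = Counter(au for aus in holdings.values() for au in aus)
    return {au: n for au, n in counts.items() if n < min_copies}


# Hypothetical network: seven nodes hold DC1-AU1, only three hold DC2-AU1.
holdings = {f"node{i}": {"DC1-AU1"} for i in range(1, 8)}
for name in ("node1", "node2", "node3"):
    holdings[name].add("DC2-AU1")

print(under_replicated(holdings))
```

In this made-up network, DC2-AU1 is reported because only three nodes hold it.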
Plugins / Plugin Repository
Tell LOCKSS where, how and how often to crawl a provider site for AUs
Plugins are Java based
Distinct from core LOCKSS software
Cache Manager
Distributed separately from LOCKSS
Can remotely inspect and manage the caches on the various peer nodes
Title / Conspectus Databases
Title database on each node describes and manages which AUs to preserve on that node
Conspectus Database designed for MetaArchive Project, provides more extensive metadata about the preserved digital collections, and feeds the Title database with entries
[Figure: Digital Collection 1 and Digital Collection 2 (web site, source code, SQL dump), each with a manifest page and divided into AUs (AU 1, AU 2, AU 3); a plugin repository; and a private LOCKSS network of nine nodes (1 to 9) caching copies of DC1 and DC2.]
The Polling Process
Polling Process resulting in “landslide loss”, AU repair
Node 5 calls a poll on AU 1 of Digital Collection 2 (DC2-AU1).
Node 5 invites some recently encountered peers to vote. (Each node maintains a reference list of recently encountered peers.) Those invited are the “inner circle” for this opinion poll.
Invited nodes create a fresh SHA1 digest of the AU.
PollChallenge: an affirmative response to the PollChallenge message allows that inner circle node to participate in the poll.
PollProof: a poll effort proof is cryptographically derived and sent in reply to each affirmative voter’s challenge.
Node 9 nominates Nodes 7 and 8. Nominated Nodes 7 and 8 belong to the “outer circle” and can be invited to subsequent voting rounds by Node 5. Node 5 discovers new peers through the nomination process.
Each valid vote either agrees or disagrees with Node 5’s digest.
There is a “landslide” of valid, disagreeing votes against Node 5’s SHA1 digest of DC2-AU1.
Since agreeing votes are below threshold, Node 5 picks a random disagreeing voter from the inner circle.
Node 5 sends an encrypted RepairRequest message, and the repair is made.
Once the repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair.
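The digest comparison and repair step in this walkthrough can be sketched in a few lines. This is a simplified illustration, not the LOCKSS implementation: it omits invitations, effort proofs, nominations, and message encryption, treats an AU as a single byte string, and the names `au_digest`, `run_poll`, and `threshold` are hypothetical.

```python
import hashlib
import random


def au_digest(content: bytes) -> str:
    """SHA1 digest of an AU's content (here, a single byte string)."""
    return hashlib.sha1(content).hexdigest()


def run_poll(poller_copy, voter_copies, threshold, rng=random):
    """Compare the poller's digest with each voter's and pick a repair source.

    Returns (agreeing_voters, repair_source). repair_source is None when
    the agreeing votes reach the threshold or nobody disagrees.
    """
    mine = au_digest(poller_copy)
    agreeing = [v for v, c in voter_copies.items() if au_digest(c) == mine]
    disagreeing = [v for v in voter_copies if v not in agreeing]
    if len(agreeing) >= threshold or not disagreeing:
        return agreeing, None
    # Agreeing votes are below threshold: pick a random disagreeing voter.
    return agreeing, rng.choice(disagreeing)


# Hypothetical data: the poller and node9 hold a damaged copy.
good, damaged = b"article text", b"article tex?"
voters = {"node1": good, "node2": good, "node4": good, "node9": damaged}

agreeing, source = run_poll(damaged, voters, threshold=3)

# "Repair" by taking the chosen voter's copy, then poll again to verify it.
repaired = voters[source]
agreeing_after, source_after = run_poll(repaired, voters, threshold=3)
```

In the first poll only node9 agrees with the damaged copy, so a repair source is chosen at random from the other three voters. The second poll finds three agreeing votes and requests no further repair.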
Polling Refresh Timer
A peer sets a refresh timer for a given AU to determine the interval between successive polls
System parameter R is the mean for the possible random values generated for the refresh timer
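As an illustration of a randomized refresh timer with mean R, the sketch below draws the interval uniformly around R. The uniform distribution, the `spread` parameter, and the 90-day value are assumptions made for this example; the only property taken from the text is that the mean of the random values equals R.

```python
import random


def next_poll_delay(r_mean, spread=0.5, rng=random):
    """Draw a delay uniformly from [r_mean*(1-spread), r_mean*(1+spread)].

    A uniform draw is an assumption; it is symmetric about r_mean, so
    its mean is r_mean.
    """
    return rng.uniform(r_mean * (1 - spread), r_mean * (1 + spread))


# Hypothetical R of 90 days; a seeded generator keeps the example repeatable.
rng = random.Random(0)
delays = [next_poll_delay(90.0, rng=rng) for _ in range(10000)]
average = sum(delays) / len(delays)
```

Randomizing the interval keeps peers from polling the same AU in lockstep while the long-run polling rate stays governed by R.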
System Parameter – ‘Quorum’
Q = # of valid inner circle votes required to conclude a poll successfully
Q = 6 is the thoroughly tested value in use
If votes < Q, poller invites additional peers, or else aborts the opinion poll
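The quorum rule can be written as a small decision function. This is a sketch under one stated assumption: the slide does not say when the poller aborts rather than invites more peers, so here it aborts only when no uninvited peers remain. The names `QUORUM` and `next_step` are hypothetical.

```python
QUORUM = 6  # Q, the value the slide reports as thoroughly tested


def next_step(valid_votes, peers_left_to_invite, quorum=QUORUM):
    """Decide what the poller does once the current votes are in."""
    if valid_votes >= quorum:
        return "evaluate votes"
    if peers_left_to_invite > 0:
        # Assumption: keep inviting while uninvited peers remain.
        return "invite additional peers"
    return "abort poll"


print(next_step(valid_votes=4, peers_left_to_invite=3))
print(next_step(valid_votes=4, peers_left_to_invite=0))
print(next_step(valid_votes=6, peers_left_to_invite=0))
```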
Polling Outcome – ‘Landslide Win’
The poller considers its current copy to have integrity
This is the only scenario in which an opinion poll concludes successfully
The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)
Reference List Update
Happens only after a successful poll
Poller removes the inner circle peers who had valid votes in the last opinion poll
Culls peers it has not been able to contact for some time
Adds outer circle peers whose votes were valid and eventually agreeing
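The three update rules map naturally onto set operations. This is a minimal sketch with hypothetical names and node numbers; how the real software tracks unreachable peers or stores the list is not described here.

```python
def update_reference_list(reference_list, inner_valid_voters,
                          unreachable, outer_agreeing):
    """Return the new reference list after a successful poll."""
    updated = set(reference_list)
    updated -= set(inner_valid_voters)  # remove inner circle peers with valid votes
    updated -= set(unreachable)         # cull peers not contacted for some time
    updated |= set(outer_agreeing)      # add outer circle peers that voted validly and agreed
    return updated


# Hypothetical state for a poller.
new_list = update_reference_list(
    reference_list={1, 2, 3, 4, 6, 9},
    inner_valid_voters={1, 2, 4, 9},
    unreachable={6},
    outer_agreeing={7, 8},
)
print(sorted(new_list))
```

With this sample data the new list contains peers 3, 7, and 8.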
Polling Outcome - Inconclusive
D = max allowed “minority” votes
If Agreeing Votes > D and Agreeing Votes < Total valid votes – D, then the poll is inconclusive and raises an alarm
Human intervention is needed to determine if nodes have been compromised
Peers voting in agreement with a known bad copy are blacklisted if that peer node can’t be identified or it won’t cooperate
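The inconclusive condition, together with the two landslide outcomes, can be expressed as a single classifier. The inconclusive test is taken from the text above. The boundaries for the landslide cases (loss when agreeing votes are at most D, win when disagreeing votes are at most D) are inferred from it and should be read as an assumption. The function name and the requirement that total valid votes exceed 2D, so the cases cannot overlap, are choices made for this sketch.

```python
def classify_poll(agreeing, total_valid, d):
    """Classify a poll from its agreeing votes, total valid votes, and D."""
    if total_valid <= 2 * d:
        raise ValueError("sketch assumes total_valid > 2 * d")
    if agreeing <= d:
        return "landslide loss"   # assumed boundary: at most D voters agree
    if agreeing >= total_valid - d:
        return "landslide win"    # assumed boundary: at most D voters disagree
    return "inconclusive"         # D < agreeing < total_valid - D


# Hypothetical polls with seven valid votes and D = 2.
for agreeing in (1, 3, 6):
    print(agreeing, classify_poll(agreeing, total_valid=7, d=2))
```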
Further Details on Polling Process
Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS). http://www.eecs.harvard.edu/~mema/publications/TOCS2005.pdf
See also LOCKSS related publications at http://www.lockss.org/lockss/Publications
The LOCKSS Private Network Difference
More flexible (not appliance based)
Can run on any operating system that supports Java
LOCKSS Team maintains rpm packages for Linux installations
Peer Node administrators have greater discretion configuring access, customizing functionality, e.g. altering system parameters
The LOCKSS Private Network Difference (cont.)
Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases
E.g. the MetaArchive Conspectus database
Vocabulary
(Please refer to the workshop binder for terminology and definitions)
Overview of LCAP version 3