LOCKSS: Lots of Copies Keep Stuff Safe
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 739 Distributed Systems
Andrea C. Arpaci-Dusseau
Preserving Peer Replicas By Rate-Limited Sampled Voting, Maniatis, Roussopoulos, Giuli, Rosenthal, Baker, Muliadi (Stanford) -- SOSP’03
Motivation
Librarians: Responsibility to preserve important materials
Traditional approach:
• Acquire lots of copies
• Distribute around world
• Lend or copy to provide access
Academic publishing is moving to Web
• LOCKSS: Real system used by many libraries (1999)
• How to apply techniques to digital preservation?
Strength: Real problem that people care about, real solution being used
Design Goals and Assumptions
Must be cheap to build and maintain
• No RAID systems
Need not operate quickly
• Want to prevent change, not expedite it
Must function properly for decades
No centralized control
Handle failures
• Handle malicious attackers
• Handle catastrophic random failures
How is this different from other P2P systems?
Design Principles
Cheap storage is unreliable
No long-term secrets
• Can’t hold private keys for arbitrary time periods
Use inertia
• Rate limit the amount of activity and change
Avoid third-party reputation
• Malicious users can lie about good users
• Attackers can “cash in” history of good behavior
Reduce predictability
• Make it difficult for attackers to predict behavior of victims
Make intrusion detection intrinsic
• Part of the system itself
Assume strong adversary
• May want to change, suppress, or steal content
LOCKSS Overview
Libraries run persistent web caches
• Collect by crawling journal web sites
• Distribute by acting as limited proxy cache
• Preserve by cooperating with others to detect and repair damage
Peers vote on large archival units (AUs)
• AU == year’s run of a journal
• Each peer holds different AUs
• If AU damaged, call increasingly specific partial polls
Opinion Poll Protocol
Terminology:
• Loyal, malign, healthy, damaged peers
Goal:
• High probability loyal peers are healthy (despite attacks by malign peers and failures)
• Low probability even powerful adversary can damage significant proportion of loyal peers without detection
Overview
• Poll initiator calls opinion polls on an AU at a rate >> rate of random damage
• Invites small subset of known peers (poll participants or voters)
• Voter computes and returns digest of AU
• Vote results for poll initiator:
– Landslide win: Votes overwhelmingly agree with own version
– Landslide loss: Votes overwhelmingly disagree; repair AU by fetching copy of AU from peer
– Inconclusive poll: Raise alarm for human attention
• Who can benefit from the poll? What if voter disagrees?
Peer Lists per AU
Lists for every AU
• Friends list: Peers have outside relationships with friends
• Reference list: Peers encountered recently
– Bootstrap: Init with friends list
– Inner circle: Those invited to influence poll results
– Outer circle: Nominated by inner circle
Poll Initiation
Poll initiation: (about every 3 months per AU)
• Choose N random peers from ref list: Inner circle
• Send Poll [Poll ID, Diffie-Hellman Public Key]
• Wait for responses...
Voter from inner circle: Decide if want to participate
• Why might a peer not participate?
• Pick new DH public key, compute symmetric session key
• How does Diffie-Hellman work?
– A chooses secret a, sends g^a mod p
– B chooses secret b, sends g^b mod p
– Each computes secret (g^b mod p)^a mod p = (g^a mod p)^b mod p
• Why encrypt messages?
• Send back encrypted YES or NO to participate
– Send PollChallenge [Poll ID, DH public key, {challenge, YES}]
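The Diffie-Hellman exchange above can be sketched in a few lines of Python. This is an illustration only, not the LOCKSS implementation. The modulus P, base G, and the helper names dh_keypair and session_key are assumptions made for the example, and the 127-bit modulus is far too small for real use. Hashing the shared value with SHA-256 to get a symmetric key is likewise an assumption.

```python
import hashlib
import secrets

# Toy public parameters (assumed for illustration, NOT secure sizes):
# P is the Mersenne prime 2**127 - 1, G is an arbitrary small base.
P = 2**127 - 1
G = 3

def dh_keypair():
    """Pick a fresh secret exponent and return (secret, public = G^secret mod P)."""
    secret = secrets.randbelow(P - 3) + 2  # uniform in [2, P-2]
    public = pow(G, secret, P)
    return secret, public

def session_key(own_secret, peer_public):
    """Derive a symmetric key from (peer_public)^own_secret mod P."""
    shared = pow(peer_public, own_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# A = poll initiator, B = voter; each sends only its public value.
a, A = dh_keypair()
b, B = dh_keypair()

# (g^b mod p)^a mod p == (g^a mod p)^b mod p, so both sides derive the same key.
key_at_initiator = session_key(a, B)
key_at_voter = session_key(b, A)
print(key_at_initiator == key_at_voter)
```

Because each side draws a fresh key pair per poll, nothing secret has to be stored between polls, which fits the "no long-term secrets" principle.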
Poll Effort
Initiator: Produce computational effort for voter
• Why proof of computation by initiator needed?
• Use memory-bound functions (MBF) with poll ID and challenge as input
– Why are MBF good?
• Send back PollProof [Poll ID, poll effort proof]
– Even send this to voters who responded NO. Why?
Voter: Verifies result
• Less computation needed to verify result than to compute it
• Nominate outer circle peers (more later)
– Randomly selected from reference list
• Send Vote messages for AU
– Also send proof of computational effort in rounds
– Why proof of computation by voter needed? Why in rounds?
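To make the "cheap to verify, expensive to produce" property concrete, here is a small sketch that swaps in a CPU-bound hash puzzle (hashcash style) for the memory-bound functions the protocol actually uses. It does not have the memory-bound property. It only shows the asymmetry, with the poll ID and challenge as inputs. The names prove_effort, verify_effort, and difficulty_bits are hypothetical.

```python
import hashlib
from itertools import count

def _puzzle_hash(poll_id: bytes, challenge: bytes, nonce: int) -> int:
    """Hash the poll ID, challenge, and candidate nonce into a 256-bit integer."""
    digest = hashlib.sha256(poll_id + challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def prove_effort(poll_id: bytes, challenge: bytes, difficulty_bits: int) -> int:
    """Expensive side: search for a nonce whose hash has difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _puzzle_hash(poll_id, challenge, nonce) < target:
            return nonce

def verify_effort(poll_id: bytes, challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap side: a single hash checks the claimed proof."""
    target = 1 << (256 - difficulty_bits)
    return _puzzle_hash(poll_id, challenge, nonce) < target

# Initiator does roughly 2**12 hashes on average; the voter checks with one.
proof = prove_effort(b"poll-42", b"voter-challenge", difficulty_bits=12)
print(verify_effort(b"poll-42", b"voter-challenge", proof, difficulty_bits=12))
```

Binding the proof to the voter's challenge means an initiator cannot precompute effort or reuse one proof across voters.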
Vote Tabulation
Initiator: Tabulates valid votes from inner circle
Three cases (V = number of valid votes, D = landslide threshold):
• Landslide loss: Agreeing votes <= D
– Repair AU
• Landslide win: Agreeing votes > V-D
– Opinion poll concludes successfully; reschedule poll
• Inconclusive: Raise alarm
Repair
• Initiator picks disagreeing voter and requests repair
• When is voter willing to supply content?
• Retabulate results with new content
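The three tabulation cases can be written as a short function. This is a minimal sketch that follows the thresholds on this slide. The names Outcome and tabulate, and the use of SHA-256 as the AU digest, are assumptions for the example.

```python
import hashlib
from enum import Enum

class Outcome(Enum):
    LANDSLIDE_WIN = "landslide win"    # conclude poll, reschedule
    LANDSLIDE_LOSS = "landslide loss"  # repair AU from a disagreeing voter
    INCONCLUSIVE = "inconclusive"      # raise alarm for a human

def au_digest(content: bytes) -> bytes:
    """Digest of an archival unit's content (hash choice is an assumption)."""
    return hashlib.sha256(content).digest()

def tabulate(own_digest: bytes, vote_digests: list, d: int) -> Outcome:
    """Classify a poll from the initiator's digest and the valid inner-circle votes."""
    v = len(vote_digests)
    agreeing = sum(1 for digest in vote_digests if digest == own_digest)
    if agreeing <= d:
        return Outcome.LANDSLIDE_LOSS
    if agreeing > v - d:
        return Outcome.LANDSLIDE_WIN
    return Outcome.INCONCLUSIVE

good = au_digest(b"journal volume, intact")
bad = au_digest(b"journal volume, damaged")

# Hypothetical numbers: V = 10 votes, D = 3.
print(tabulate(good, [good] * 9 + [bad] * 1, d=3))
print(tabulate(good, [good] * 2 + [bad] * 8, d=3))
print(tabulate(good, [good] * 5 + [bad] * 5, d=3))
```

After a repair the initiator would recompute its own digest from the fetched copy and call tabulate again on the same votes.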
Outer Circle
What is the purpose of the outer circle?
Initiator: Picks same number from every nominator
• Repeat same steps of protocol with outer circle
– Why?
• Differences?
Update reference list
• What is a malign peer trying to do?
• Who is removed?
• Insert: Valid/agreeing outer circle peers and random friends
– Why?
Adversary Attacks
Assume powerful adversary
• Total information awareness
• Perfect work balancing
• Perfect digital preservation
• Local eavesdropping
• Local spoofing
• Stealth
• Unconstrained identities
• Exploitation of common peer vulnerabilities
• Complete parameter knowledge
Adversary Attacks
Stealth modification
• Convince loyal peer that it has a damaged AU
• Replace protected content with bad version
• Focus of paper
Nuisance
• Raise alarms
Attrition
• Make loyal peers waste computational resources so they can’t repair damage
Theft
• Acquire published content from peers without fee
• How does LOCKSS prevent?
Free-loading
• Obtain services without supplying to others
Stealth Modification Attack
1) Lurk phase
• Increase foothold: malign peers in reference list (inner circle)
– Wait until invited into circle
– Act loyal
– Nominate more malign peers
2) Attack phase
• When see poll is vulnerable (i.e., overwhelming majority of inner circle is malign), vote bad
Why is attacking successfully hard?
• Rate limiting: Must wait for vulnerable polls to occur
• Damaged loyal peers call and vote in polls using bad copy
– Can be repaired or raise alarms (doesn’t act differently when don’t have majority)
• Must expend effort calling polls too
– Loyal peer only requests repair if voted in malign peer’s polls
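A rough calculation shows why the adversary must wait for a vulnerable poll. The sketch assumes each of the N inner-circle slots is independently malign with probability equal to the foothold ratio. That is a binomial simplification, since sampling without replacement from a finite reference list would be hypergeometric. The threshold K for an "overwhelming majority" is a hypothetical value, not taken from the paper.

```python
from math import comb

def prob_vulnerable(n: int, k: int, foothold: float) -> float:
    """P(at least k of n sampled inner-circle peers are malign), binomial model."""
    return sum(
        comb(n, i) * foothold**i * (1.0 - foothold) ** (n - i)
        for i in range(k, n + 1)
    )

N = 20  # inner circle size
K = 16  # hypothetical count of malign voters treated as an overwhelming majority

for foothold in (0.1, 0.4, 0.7):
    p = prob_vulnerable(N, K, foothold)
    print(f"foothold {foothold:.0%}: P(vulnerable poll) = {p:.3g}")
```

Under these assumptions the chance of a vulnerable poll is tiny at low foothold ratios and rises steeply only once malign peers make up most of the reference list. With polls called only about every 3 months per AU, the adversary gets few draws per year.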
Simulations
Environment
• 1000 peers
• Clusters of 30 peers; 80% for friends, 20% random
• Call polls every 3 months on average
• N (size of inner circle): 20, Q: 10
How many false alarms with no adversaries?
• 20 years, random damage at every peer: 5-10 years
Simulation: Lurking Time
How long must lurk for desired foothold ratio?
• 10% malign; how many years for 40% ratio? 50%?
• 30% malign; how many years for 50% ratio? 70%?
Simulation: Alarm Time
How long before attack detected (i.e., inconclusive poll alarm raised)?
Simulation: Damage to AU
How many bad replicas? How many years?
When is irrecoverable damage caused?
Simulation: Worst-case
How long should adversary lurk before attack?
Simulation: Benefit of Churn
What churn rates are best?
Conclusions
Interesting motivation
• Real problem and deployed solution
Opinion Poll Protocol has many attractive properties
Uses problem domain to guide protocol
• Inertia: Adversaries can’t influence poll timing
• Friend list: Use outside relationships to influence trust
Attacking is very costly
• Must lurk long period to increase foothold in inner circle
• Must continually pay through proofs of computation (MBF)
• Immediately removed from lists if disagree
Easy to set off alarms
• If voting results are inconclusive, human notified