Improving the Efficiency of Fault-Tolerant Distributed
Shared-Memory Algorithms
Eli Sadovnik and Steven Homberg
Second Annual MIT PRIMES Conference, May 19-20, 2012
Introduction
• Shared memory supports concurrent access
– Read & write interface
• Memory models: single writer, multiple reader (SWMR) and multiple writer, multiple reader (MWMR)
– Consistency is important
• Strong consistency provides useful semantics
• Abstraction for message-passing networks
– Shared memory can be emulated
– Difficult to do, but solutions exist
– For example, applications for the Internet, such as Dropbox
Our Research Project
THE RAMBO PROJECT
• Framework for emulating shared memory
– Introduced by Lynch and Shvartsman, extended by Gilbert
– Implements the MWMR model with strong consistency
– Designed for dynamic distributed message-passing settings
OUR GOAL
• RAMBO is elegant but not always efficient
• Extend RAMBO with intelligent data management
Consistency & Atomicity
• There are many consistency models
• We are interested in atomicity
[Figure: three timelines, each starting from the initial value 0 and containing one write(8). Panels, in order: Violation (Safety), Violation (Regularity), Atomicity. Read sequences, in order: read(3) read(0) read(8); read(8) read(0) read(8); read(8) read(8) read(8).]
Emulating Shared Memory
[Figure: one server (Data: 5, Status: WORKING) serving User 1 (Reader), User 2 (Writer), and User 3 (Reader); each user holds Data: 5.]
Weakness of the Centralized Approach
[Figure: the single server has Status: FAILED and holds no data; User 1 (Reader), User 2 (Writer), and User 3 (Reader) each receive an error.]
Replication in Distributed Setting
[Figure: three replica servers; one has Status: FAILED, the other two hold Data: 5 with Status: WORKING; User 1 (Reader), User 2 (Writer), and User 3 (Reader) each still obtain 5.]
The ABD Algorithm
Hagit Attiya, Amotz Bar-Noy, Danny Dolev
An SWMR algorithm
• Operation level wait-freedom
– Termination unaffected by concurrency
• Designed for a message-passing setting
– Allows limited failures
– Communication is reliable
– Messages can be delayed
Quorum Systems and ABD
• ABD is a quorum-based algorithm
– A quorum system is a collection of intersecting sets
• For example, a voting majority quorum system
• Data is replicated in a quorum system
– Quorum system members are networked servers
• Guarantee of atomicity
– Quorum intersection and read/write protocols
• Reads must write! (… sometimes, as we will see later)
– A reader must write the latest data
– Writer cannot be trusted to complete
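The intersection property behind the majority example can be checked directly. The following is a minimal sketch and not part of the original slides; the server names and the helpers `majority_quorums` and `all_intersect` are assumptions made for illustration.

```python
from itertools import combinations


def majority_quorums(servers):
    """Return every subset of `servers` that contains a strict majority."""
    servers = sorted(servers)
    smallest = len(servers) // 2 + 1
    return [
        frozenset(members)
        for size in range(smallest, len(servers) + 1)
        for members in combinations(servers, size)
    ]


def all_intersect(quorums):
    """A quorum system requires every pair of quorums to share a member."""
    return all(a & b for a, b in combinations(quorums, 2))


# Hypothetical three-server deployment.
quorums = majority_quorums({"s1", "s2", "s3"})
```

Because any two majorities of the same server set overlap in at least one server, a reader that contacts one quorum always meets at least one server that took part in the last completed write.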
Phased Read/Write Protocols
[Figure: three servers, all with Status: WORKING, grouped into quorums Q1 and Q2; User 1 (Reader), User 2 (Writer), and User 3 (Reader).]
User 2 writes its data, a 5, to quorum Q1.
Phased Read/Write Protocols
[Figure: the same three servers, all with Status: WORKING, and quorums Q1 and Q2; two servers now hold Data: 5.]
User 1 queries quorum Q2, sees that the latest data is a 5, and writes that back to the computer that does not have the latest data.
Data Versions & Timestamps
[Figure: three servers in quorums Q1 and Q2, all with Status: WORKING; one holds 5,t=1 and two hold 7,t=2; among the users, one holds 7,t=2 and two hold 5,t=1.]
Timestamps allow us to distinguish among different versions of the data.
Data Versions & Timestamps
[Figure: the same three servers in quorums Q1 and Q2, all with Status: WORKING; every server and every user now holds 7,t=2.]
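The two-phase read with timestamps can be sketched in a few lines. This is a simplified single-process illustration under stated assumptions, not the ABD algorithm as published: there is no messaging, no failure handling, and no concurrency, and the names `Replica`, `write`, and `read` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: int = 0
    ts: int = 0

    def store(self, value, ts):
        # Adopt the incoming pair only if it is newer than the local one.
        if ts > self.ts:
            self.value, self.ts = value, ts


def write(replicas, quorum, value, ts):
    """The single writer sends (value, ts) to every member of one quorum."""
    for i in quorum:
        replicas[i].store(value, ts)


def read(replicas, quorum):
    """Query a quorum, then write the newest pair back to that quorum."""
    # Phase 1 (query): find the pair with the highest timestamp.
    newest = max((replicas[i] for i in quorum), key=lambda r: r.ts)
    value, ts = newest.value, newest.ts
    # Phase 2 (propagate): bring stale quorum members up to date.
    for i in quorum:
        replicas[i].store(value, ts)
    return value


replicas = [Replica(), Replica(), Replica()]
write(replicas, quorum=[0, 1], value=5, ts=1)  # writer reaches Q1 = {0, 1}
result = read(replicas, quorum=[1, 2])         # reader uses Q2 = {1, 2}
```

Here the reader finds the 5 on the server shared by both quorums and copies it to the server that missed the write, which is the "reads must write" behavior.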
Quorum Viability
[Figure: five servers; three hold 7,t=2 with Status: WORKING and two have Status: FAILED; quorums Q1 and Q2; User 1 (Reader), User 2 (Writer), and User 3 (Reader) each receive an error.]
A weakness of the ABD algorithm is that it depends on a quorum of servers always being viable. When no quorum is available, operations are blocked.
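The viability condition can be stated as a small check. This is an illustrative sketch only; the server names, the quorum layout, and the helper `viable_quorums` are assumptions and do not come from the slides.

```python
def viable_quorums(quorums, failed):
    """Return the quorums whose members are all still working."""
    return [q for q in quorums if q.isdisjoint(failed)]


# Hypothetical quorum system over three servers.
quorums = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "c"})]

after_one_failure = viable_quorums(quorums, failed={"a"})
after_two_failures = viable_quorums(quorums, failed={"a", "b"})
```

With one failure a quorum survives and operations can finish; with two failures the list is empty, so every read and write would block.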
The RAMBO Framework (Reconfigurable Atomic Memory for Basic Objects)
Seth Gilbert
Nancy Lynch
Alexander Shvartsman
Quorum Reconfiguration
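[Figure: an old configuration with quorums Q1 and Q2, containing one server with Status: FAILED and two servers holding 7,t=2 with Status: WORKING; a new configuration with quorums Q1 and Q2, containing three servers with Status: WORKING and no data yet.]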
RAMBO uses quorum reconfiguration to ensure service longevity.
A new quorum system (a new set of servers) is installed to replace the old one, allowing progress in spite of failures.
Replica Transfer
[Figure: the old configuration (one server FAILED, two servers holding 7,t=2) sends 7,t=2 to the three working servers of the new configuration; both configurations have quorums Q1 and Q2.]
After a new set of servers is installed, these servers do not have any information.
The replica information (copies of data) must be transferred to the new configuration.
Garbage Collection
[Figure: the old configuration has one server FAILED and two servers holding 7,t=2; in the new configuration two servers now hold 7,t=2 and one working server has no data yet; both configurations have quorums Q1 and Q2.]
After information is transferred to the new servers, the old servers are phased out of use.
This process is called 'garbage collection'.
The mechanism for garbage collection has two phases and is analogous to read/write operations (introduced in the next slides).
Read/Write Operations
[Figure: User 1 (Reader) exchanges 7,t=2 with servers of both the old configuration (one server FAILED, two holding 7,t=2) and the new configuration (two servers holding 7,t=2, one with no data yet); both configurations have quorums Q1 and Q2.]
What if reads and writes occur during reconfiguration?
Concurrent operations contact all existing configurations to ensure the latest information is accessed.
Multi-Configuration Access
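A query that spans every active configuration can be sketched as follows. This is a hedged illustration rather than RAMBO's actual code: the function `query_all_configurations`, the dictionary layout, and the configuration names "old" and "new" are assumptions, and each list stands for the replies from one quorum of that configuration.

```python
def query_all_configurations(replies_by_config):
    """Return the newest (timestamp, value) pair seen in any configuration."""
    best = (-1, None)
    for replies in replies_by_config.values():
        for ts, value in replies:
            if ts > best[0]:
                best = (ts, value)
    return best


# During reconfiguration the new servers may still hold the initial value.
replies_by_config = {
    "old": [(2, 7), (2, 7)],
    "new": [(0, 0), (0, 0)],
}
latest = query_all_configurations(replies_by_config)
new_only = query_all_configurations({"new": [(0, 0), (0, 0)]})
```

Querying only the new configuration would return the stale initial pair, which is why a concurrent operation has to cover all existing configurations.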
Read/Write Operations
Old configurations need to be removed from use.
Ongoing read/write operations use their existing configuration knowledge. New operations ignore the old configuration.
[Figure (Garbage Collection): User 1 (Reader) exchanges 7,t=2 with the old configuration (one server FAILED, two holding 7,t=2) and the new configuration (two servers holding 7,t=2, one with no data yet); both configurations have quorums Q1 and Q2.]
Research Questions
Q1: Can a reader (respectively writer) avoid contacting configurations that it learned have been marked as garbage collected?
Q2: When can a reader avoid its second phase, and can a reader propagate selectively?
Q3: Can we propagate to the most recent configuration only?
Concurrent Garbage Collection (Q1)
[Figure: User 1 (Reader) performs a read in numbered steps 1 to 7 across an old and a new configuration, each with quorums Q1 and Q2 and all servers WORKING; servers hold 5,t=1, 7,t=2, or 0,t=0 at different points, and the read returns 7.]
We believe that the garbage collected configuration can in fact be ignored because the reader learns of the configuration’s information regardless.
Improved Configuration Management (Q1)
• The authors of RAMBO conjecture that operations must contact all configurations that are discovered during the query (respectively propagate) phase.
• Communicating with configurations learned to be garbage collected mid-operation is unnecessary
– Intermediate discovery of garbage collected configurations from another server
– That server knows a tag at least as recent as any known in the old configurations
• IMPACT: improves operation liveness
Improved Bookkeeping (Q2)
[Figure: three servers in quorums Q1 and Q2, all holding 7,t=2 with Status: WORKING; User 1 (Reader) receives 7,t=2 from the servers it queries.]
After querying, the reader learns that a majority of nodes has the up-to-date information, which makes propagation unnecessary.
Semi-Fast Read Operations (Q2)
• Read operations always propagate
– Regardless of the actual replica dissemination
– Redundant messages and slow operation
• The proposed solution
– During the query phase, the reader records the latest timestamps of the servers with which it communicated
– The reader contacts servers that are not up-to-date
– Sometimes this allows omitting the propagation phase entirely ('semi-fast' read operations)
• IMPACT: improves operation latency and reduces communication costs
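The bookkeeping for a semi-fast read can be sketched as a decision made after the query phase. This is one possible reading of the proposal, not the authors' implementation: the helper `plan_propagation`, the server names, and the rule "skip propagation when some quorum is entirely up to date" are assumptions made for illustration.

```python
def plan_propagation(timestamps, quorums):
    """Decide which servers the reader still has to update.

    `timestamps` maps each server contacted during the query phase to the
    timestamp it reported. An empty result means propagation can be skipped.
    """
    latest = max(timestamps.values())
    fresh = {s for s, ts in timestamps.items() if ts == latest}
    # Assumed rule: if a whole quorum already stores the latest version,
    # the propagation phase is unnecessary.
    if any(q <= fresh for q in quorums):
        return set()
    # Otherwise contact only the servers that are behind.
    return set(timestamps) - fresh


quorums = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "c"})]
skip = plan_propagation({"a": 2, "b": 2, "c": 1}, quorums)
selective = plan_propagation({"a": 2, "b": 1, "c": 1}, quorums)
```

In the first call the quorum {a, b} is already current, so the read finishes after one phase; in the second call only the two stale servers are contacted.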
Overly Extensive Propagation (Q3)
[Figure: User 1 (Writer) sends 7,t=2 to servers of both the old configuration (one server FAILED, two holding 7,t=2) and the new configuration (two servers holding 7,t=2, one with no data yet); both configurations have quorums Q1 and Q2.]
Currently, RAMBO both queries and propagates to all active configurations. In fact, just the query phase covering all active configurations is sufficient for atomicity.
Propagate to the Latest Configuration (Q3)
• We believe it is not necessary to propagate to any configuration but the last active configuration.
• Properties of configuration information
– All configurations are totally ordered.
– Configurations have a forward link.
– Discovery is faster than reconfiguration.
• Operations query all active configurations
• IMPACT: reduces communication cost
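The communication saving can be illustrated by comparing which servers receive the propagate message. This sketch only counts targets and says nothing about correctness, which the slides state as a belief rather than a proven result; the function `propagation_targets`, the integer configuration ids, and the membership sets are assumptions.

```python
def propagation_targets(configs, latest_only):
    """Return the servers contacted in the propagation phase.

    `configs` maps a totally ordered configuration id to its member set.
    """
    chosen = [max(configs)] if latest_only else sorted(configs)
    return set().union(*(configs[c] for c in chosen))


# Two active configurations that share one member.
configs = {1: {"a", "b", "c"}, 2: {"c", "d", "e"}}

all_targets = propagation_targets(configs, latest_only=False)
latest_targets = propagation_targets(configs, latest_only=True)
```

With these example sets, propagating to every active configuration reaches five servers, while propagating to the latest configuration reaches three.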
Summary
• Algorithmic optimizations
• Opportunistic benefits
– A clear advantage when
• Servers gossip, and
• Configurations have members in common
• Changes are minimally intrusive
– Modest increase in bookkeeping and the size of messages
Future Work
• Formal reasoning
– Use the Input/Output Automata framework to demonstrate that the new changes preserve the consistency guarantees of RAMBO
• Simulation
– Use the TEMPO toolkit to simulate RAMBO executions and build confidence in our proofs
• Empirical experiments
– Augment the existing implementations of RAMBO and collect behavior data on PlanetLab
Special Thanks to:
The MIT PRIMES Program
Supervisor Prof. Nancy Lynch
Mentor Dr. Peter Musial