Post on 30-Dec-2015
Timeliness, Failure Detectors, and Consensus Performance
Idit Keidar and Alexander Shraer
Technion – Israel Institute of Technology
PODC 2006, Keidar & Shraer, Technion, Israel
Basic Model
• Message passing
• Links between every pair of processes
– do not create, duplicate or alter messages (integrity)
• Process and link failures
Eventually Stable (Indulgent) Models
• Initially asynchronous
– for unbounded period of time
• Eventually reach stabilization
– GST (Global Stabilization Time)
– following GST certain assumptions hold
• Examples
– ES (Eventual Synchrony): starting from GST all links have a bound on message delay [Dwork, Lynch, Stockmeyer 88]
– failure detectors [Chandra, Toueg 96], [Chandra, Hadzilacos, Toueg 96]
Indulgent Models: Research Trend
• Weaken post-GST assumptions as much as possible [Guerraoui, Schiper 96], [Aguilera et al. 03, 04], [Malkhi et al. 05]
Weaker = better?
Indulgent Models: Research Trend
“You only need ONE machine with eventually ONE timely link. Buy the hardware to ensure it, set the timeout accordingly, and EVERYTHING WILL WORK.”
Consensus with Weak Assumptions
[Figure: network diagram]
“Why isn’t anything happening???”
“Don’t worry! It will eventually happen!”
What’s Going On?
• In practice, bounds just need to hold “long enough” for the algorithm to finish (running time TA)
• But TA depends on our synchrony assumptions
– with weak assumptions, TA might be unbounded
• For practical systems, eventual completion of the job is not enough!
Our Goal
• Understand the relationship between:
– assumptions (1 timely link, failure detectors, etc.) that eventually hold
– performance of algorithms that exploit these assumptions, and only them
• Challenge: How do we understand the performance of asynchronous algorithms that make very different assumptions?
Typical Metric: Count “Rounds”
• Algorithms normally progress in rounds, though rounds are not synchronized among processes
at process pi:
    forever do
        send messages
        receive messages while (!some conditions)
        compute…
• Previous work:
– look at synchronous runs (every message takes exactly δ time)
– count rounds or “δ”s [Keidar, Rajsbaum 01], [Dutta, Guerraoui 02], [Guerraoui, Raynal 04], [Dutta et al. 03], etc.
Are All “Rounds” the Same?
• Algorithm 1 waits for messages from a majority that includes a pre-defined leader in each round
– takes 3 rounds
• Algorithm 2 waits for messages from all (unsuspected) processes in each round
– E.g., group membership
– takes 2 rounds
GIRAF: General Round-based Algorithm Framework
• Inspired by Gafni’s RRFD, generalizes it
• Organize algorithms into rounds
• Separate algorithm logic from waiting condition
• Waiting condition defines model
• Allows reasoning about lower and upper bounds for rounds of different types
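The separation of algorithm logic from the waiting condition can be illustrated with a minimal Python sketch. All names here (`run_round`, `majority_with_leader`, `all_unsuspected`) are hypothetical and chosen for illustration; they are not GIRAF's actual interface. The two conditions loosely mirror the waiting rules of Algorithm 1 and Algorithm 2 from the previous slide.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Hypothetical types for illustration only; not GIRAF's real interface.
Inbox = Dict[int, str]                    # sender id -> round message
WaitCondition = Callable[[Inbox], bool]   # "may this round end now?"


def majority_with_leader(n: int, leader: int) -> WaitCondition:
    """Round may end once a majority that includes the leader was heard."""
    return lambda inbox: len(inbox) > n // 2 and leader in inbox


def all_unsuspected(n: int, suspected: Set[int]) -> WaitCondition:
    """Round may end once every unsuspected process was heard."""
    return lambda inbox: all(p in inbox for p in range(n) if p not in suspected)


def run_round(arrivals: Iterable[Tuple[int, str]],
              can_end: WaitCondition) -> Optional[Inbox]:
    """Deliver messages in arrival order until the waiting condition holds.

    Returns the inbox at end-of-round, or None if the condition never held.
    """
    inbox: Inbox = {}
    for sender, msg in arrivals:
        inbox[sender] = msg
        if can_end(inbox):
            return inbox
    return None
```

The same `run_round` loop serves both models; only the `can_end` predicate changes, which is the sense in which the waiting condition defines the model.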
Defining Properties in GIRAF
• Environment can have
– perpetual properties
– eventual properties
• In every run r, there exists a round GSR(r)
• GSR(r) – the first round from which:
– no process fails
– all eventual properties hold in each round
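As a rough illustration of the GSR definition, the sketch below scans a finite recorded prefix of a run. The function name `gsr` and the boolean-list encoding are assumptions made for this example; a real run is infinite, so a finite prefix can only suggest where GSR lies.

```python
from typing import List, Optional


def gsr(good_rounds: List[bool]) -> Optional[int]:
    """Return the first round (1-based) from which every recorded round is good.

    good_rounds[i] is True if, in round i + 1, no process failed and all
    eventual properties held. Returns None if the last recorded round is
    not good (or the trace is empty), since no stable suffix is visible.
    """
    k = len(good_rounds)
    # Walk backwards over the trailing run of good rounds.
    while k > 0 and good_rounds[k - 1]:
        k -= 1
    return k + 1 if k < len(good_rounds) else None
```

Note that an isolated good round followed by a bad one does not count: GSR is the start of the suffix in which every round is good.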
Defining Timeliness
• Timely link in round k: pd receives the round k message of ps, in round k
– if pd is correct, and ps executes round k (end-of-round at ps occurs in round k)
Time-free!
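The timeliness definition refers only to rounds, never to clock time, and can be written as a small predicate. This is a sketch under assumed data structures: `received_in_round`, `correct`, and `executed` are hypothetical encodings of a run, not part of the talk.

```python
from typing import Dict, Hashable, Set, Tuple


def link_timely(received_in_round: Dict[Tuple[Hashable, int], Set[Hashable]],
                ps: Hashable, pd: Hashable, k: int,
                correct: Set[Hashable],
                executed: Dict[Hashable, Set[int]]) -> bool:
    """Is the link from ps to pd timely in round k?

    received_in_round[(pd, k)] holds the senders whose round-k message
    pd received during its own round k. executed[ps] holds the rounds
    that ps executes. The requirement only applies when pd is correct
    and ps executes round k; otherwise it holds vacuously.
    """
    if pd not in correct or k not in executed.get(ps, set()):
        return True
    return ps in received_in_round.get((pd, k), set())
```

No timestamps or delay bounds appear anywhere in the predicate, which is what makes the definition time-free.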
Some Results: Context
• Consensus problem
• Global decision time metric
– Time until all correct processes decide
• Message passing
• Crash failures
– t < n/2 potential failures out of n > 1 processes
◊LM Model: Leader and Majority
• Nothing required before GSR
• In every round k ≥ GSR
– Every correct process receives a round k message from a majority of processes, one of which is the Ω-leader.
• Practically requires much shorter timeouts than Eventual Synchrony [Bakr, Keidar]
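The per-round ◊LM requirement can be checked with a short predicate. This is an illustrative sketch: the name `satisfies_lm` and the `heard` mapping are assumptions, and it assumes a process's own message counts toward the majority it hears from.

```python
from typing import Dict, Set


def satisfies_lm(heard: Dict[int, Set[int]], n: int,
                 leader: int, correct: Set[int]) -> bool:
    """Check the ◊LM condition for a single round k >= GSR.

    heard[p] is the set of processes whose round-k message p received
    in round k. Every correct process must hear from a strict majority
    of the n processes, and that majority must include the leader.
    """
    return all(len(heard[p]) > n // 2 and leader in heard[p]
               for p in correct)
```

For example, with n = 5 and leader 0, a process that hears only from {1, 2, 3} violates the condition even though it heard from a majority, because the leader's message is missing.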
◊LM: Previous Work
• Most Ω-based algorithms wait for majority in each round (not ◊LM)
• Paxos [Lamport 98] works for ◊LM
– Takes constant number of rounds in Eventual Synchrony (ES)
– But how many rounds without ES?
Paxos Run in ES
[Figure: Paxos run in ES. Each process holds a BallotNum (the number of attempts to decide initiated by leaders), initially 1, 2, 5, 20, … The Ω leader sends (“prepare”, 2) and receives “yes” and “no” replies; it then sends (“prepare”, 21), receives “yes”, sends (Commit, 21, v1), and processes decide v1.]
Paxos in ◊LM (w/out ES)
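[Figure: Paxos run in ◊LM without ES, over rounds GSR, GSR+1, GSR+2, GSR+3. Processes start with differing BallotNum values (1, 5, 20, 8, 13). The Ω leader sends (“prepare”, 2), (“prepare”, 9) and (“prepare”, 14) in successive rounds; replies include “ok” alongside the refusals no (5), no (8) and no (13).]
Commit may take Ω(n) rounds!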
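A deliberately simplified toy model suggests why the leader can keep chasing ballot numbers. It assumes a worst case in which each failed prepare reveals only the smallest ballot still above the leader's, and it ignores majorities and message contents entirely; `prepare_attempts` is a hypothetical helper, not Paxos itself and not the exact run in the figure.

```python
from typing import List


def prepare_attempts(leader_ballot: int, other_ballots: List[int]) -> int:
    """Count prepare attempts until no process holds a higher ballot.

    Toy adversary: every failed attempt reveals just one refusal, from
    the process with the smallest ballot above the leader's, and the
    leader retries with that ballot plus one.
    """
    attempts = 0
    ballot = leader_ballot
    while True:
        attempts += 1
        higher = [b for b in other_ballots if b > ballot]
        if not higher:
            return attempts  # nobody refuses this prepare
        ballot = min(higher) + 1
```

In this toy, k processes holding distinct, well-separated higher ballots force k + 1 attempts, so the count grows linearly with the number of such processes.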
What Can We Hope For?
• Tight lower bound for ES: 3 rounds from GSR [DGK05]
• ◊LM weaker than ES
• One might expect it to take a longer time in ◊LM than in ES
Result 1: Don't Need ES
• Leader and majority can give you the same performance!
• Algorithm that matches lower bound for ES!
Our ◊LM Algorithm in a Nutshell
• Commit with increasing ballot numbers, decide on value committed by majority
– like Paxos, etc.
• Challenge: Don’t know all ballots, how to choose the new one to be highest one?
• Solution: Choose it to be the round number
• Challenge: rounds are wasted if a prepare/commit fails.
• Solution: pipeline prepares and commits: try in each round
• Challenge: do they really need to say no?
• Solution: support leader’s prepare even if have a higher ballot number
– challenge: higher number may reflect later decision! Won’t agreement be compromised?
– solution: new field “trustMe” ensures supported leader doesn't miss real decisions
Example Run: GSR=100
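[Figure: example run of the ◊LM algorithm with GSR = 100, over rounds GSR, GSR+1 and GSR+2. Initial BallotNum values: 1, 5, 20, 8, 13. Labels: Ω Leader; “<PREPARE, …, trustMe>”; “All PREPARE with !trustMe”; “All COMMIT” (ballot 101 at all five processes); “All DECIDE”; ballots 8, 8, 20, 13, 13 annotated “Did not lead to decision”.]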
Question 2: ◊S and Ω Equivalent?
• ◊S and Ω equivalent in the “classical” sense [Chandra, Hadzilacos, Toueg 96]
– Weakest for consensus
• ◊S: eventually (from GSR onward),
– all faulty processes are suspected by every correct process
– there exists one correct process that is not suspected by any correct process.
• Can we substitute Ω with ◊S in ◊LM?
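The two ◊S conditions stated above can be written as a check over one snapshot of the suspicion lists. The function name `satisfies_diamond_s` and the `suspects` mapping are assumptions for this sketch; the real property must hold in every round from GSR onward, not just in one snapshot.

```python
from typing import Dict, Set


def satisfies_diamond_s(suspects: Dict[int, Set[int]],
                        correct: Set[int], faulty: Set[int]) -> bool:
    """Check the ◊S conditions on a single snapshot.

    suspects[p] is the set of processes that p currently suspects.
    """
    # Every correct process suspects every faulty process.
    completeness = all(faulty <= suspects[p] for p in correct)
    # Some correct process is suspected by no correct process.
    accuracy = any(all(q not in suspects[p] for p in correct)
                   for q in correct)
    return completeness and accuracy
```

Unlike Ω, nothing here forces the correct processes to agree on which unsuspected process to follow; the check only requires that at least one such process exists.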
Result 2: ◊S and Ω not that Equivalent
• Consensus takes linear time from GSR
• By reduction to mobile failure model [Santoro, Widmayer 89]
Result 3: Do We Need Oracles?
• Timely communication with majority suffices!
• ◊AFM (All-From-Majority) simplified:
– In every round k ≥ GSR, every correct process p receives round k message from a majority of processes, and p’s message reaches a majority of processes.
• Decision in 5 rounds from GSR
– 1st constant time algorithm w/out oracle or ES
– idea: information passes to all nodes in 2 rounds
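One way to see the two-round idea: in one round p's message reaches some majority, and in the next round every correct process hears from some majority; if recipients relay what they learned, it suffices that any two majorities share a member. The brute-force check below confirms that intersection property for small n. The helper names are hypothetical, and the relaying behaviour is an assumption of this sketch rather than something stated on the slide.

```python
from itertools import combinations
from typing import List, Set


def majorities(n: int) -> List[Set[int]]:
    """All subsets of {0, ..., n-1} that contain a strict majority."""
    smallest = n // 2 + 1
    return [set(c)
            for size in range(smallest, n + 1)
            for c in combinations(range(n), size)]


def any_two_majorities_intersect(n: int) -> bool:
    """True if every pair of majorities of n processes shares a member."""
    ms = majorities(n)
    return all(reached & heard_from for reached in ms for heard_from in ms)
```

Calling `any_two_majorities_intersect(n)` for small values of n enumerates every pair of majorities, so it is only practical for a handful of processes.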
Result 4: Can We Assume Less?
• ◊MFM: Majority from Majority
– The rest receive a message from a minority
• Only a little missing for ◊AFM
• Stronger than models in literature [Aguilera et al. 03, 04], [Malkhi et al. 05]
• Bounded time from GSR impossible!
Conclusions
• Which guarantees should one implement?
– weaker ≠ better
    • some previously suggested assumptions are too weak
– sometimes a little stronger = much better
    • worth longer timeouts / better hardware
– ES is not essential
    • not worth longer timeouts / better hardware
– future: more models, bounds to explore
• GIRAF