Efficient Eventual Leader Election in Crash-Recovery Systems

UPV / EHU

Mikel Larrea, Cristian Martín, Iratxe Soraluze
University of the Basque Country, UPV/EHU

Mikel Larrea – Mannheim, May 2011

Contents

• Motivation

• System Model
  – Efficiency Definitions

• A Near-Efficient Algorithm
  – Instability Awareness

• Efficient Algorithms

• Relaxing the Assumptions

Motivation

• Unreliable failure detectors have been used to address Consensus and related problems in asynchronous crash-prone distributed systems
  – Theory: impossibility/possibility results, minimality results
  – Practice: efficient implementations, transformations

• The Omega failure detector satisfies the following property (“eventual leader election”):
  – there is a time after which every correct process always trusts the same correct process

• Omega is the weakest failure detector for solving Consensus in the crash failure model

Eventual Leader Election

[Figure: processes p1 to p7, each marked as crashed or correct; every Ω output shown is p4.]

Is Omega a Failure Detector?

• The Eventually Perfect failure detector (◇P) satisfies:
  – Strong completeness: eventually every process that crashes is permanently suspected by every correct process
  – Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process

• The Eventually Strong failure detector (◇S) satisfies:
  – Strong completeness
  – Eventual weak accuracy: there is a time after which some correct process is never suspected by any correct process

• Omega is equivalent to ◇S
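
One direction of this equivalence is easy to illustrate: given the leader output by Omega, a suspect list in the style of the Eventually Strong detector can be obtained by suspecting every process except the trusted leader. Once all correct processes trust the same correct leader, that leader is never suspected again, and every crashed process is permanently suspected. The following is a minimal Python sketch of that direction only; the function name and the use of strings as process identifiers are illustrative assumptions, not taken from the slides. The reverse direction is more involved and is not shown.

```python
from typing import FrozenSet, Iterable, Optional


def suspects_from_omega(processes: Iterable[str],
                        leader: Optional[str]) -> FrozenSet[str]:
    """Derive a suspect list from an Omega output.

    Every process other than the currently trusted leader is suspected.
    If no leader is trusted yet (leader is None), every process is suspected.
    """
    return frozenset(p for p in processes if p != leader)
```

For example, with processes p1 to p4 and Omega outputting p4, the derived suspect list is {p1, p2, p3}.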

This Work

• We address the implementation of Omega in the crash-recovery failure model
  – crashed processes can recover
  – some (unstable) processes can crash and recover infinitely often

• Previously proposed algorithms are not efficient
  – they require every process to periodically send a message to all other processes

• We propose several algorithms in which eventually, among correct processes, only one (the elected leader) keeps sending messages forever

System Model

• Finite set of n processes Π = {p1, p2, ..., pn} that communicate only by message-passing
  – processes are synchronous

• Every pair of processes is connected by two unidirectional communication links, one in each direction
  – types of links: eventually timely, fair lossy

• Crash-recovery failure model
  – types of processes: eventually up, eventually down, unstable
  – eventually up processes are correct, the rest incorrect
  – we assume that at least one process is correct

Efficiency Definitions

• An algorithm implementing Omega in the crash-recovery failure model is efficient if there is a time after which only one process sends messages forever

• An algorithm implementing Omega in the crash-recovery failure model is near-efficient if there is a time after which, among correct processes, only one sends messages forever

• Since the leader must send messages forever, an efficient algorithm is also near-efficient

• In a near-efficient algorithm, besides the leader, unstable processes can send messages forever
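
The two definitions can be restated as a small check over the set of processes that keep sending messages forever in a given run. The sketch below is only an illustration of the definitions: the function name, the string labels, and the idea of being handed the "forever senders" as a finite set are assumptions made for the example (in a real system this set is a property of an infinite run and cannot be observed directly).

```python
from typing import Set


def classify_efficiency(forever_senders: Set[str], correct: Set[str]) -> str:
    """Classify a run according to the two efficiency definitions.

    forever_senders: processes that send messages forever in the run.
    correct: the correct (eventually up) processes.
    """
    # Among correct processes, exactly one (the leader) may send forever.
    if len(forever_senders & correct) != 1:
        return "neither"
    # Efficient: that leader is the only process sending forever at all.
    if len(forever_senders) == 1:
        return "efficient"
    # Otherwise the extra senders are incorrect (unstable) processes.
    return "near-efficient"
```

With correct processes {p1, p4}, the set {p4} is classified as efficient, while {p4, p2} (p2 being unstable) is only near-efficient.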

A Near-Efficient Algorithm

• Assumptions on communication reliability/synchrony:
  – (i) for every correct process p, there is an eventually timely link from p to every correct and every unstable process
  – (ii) for every unstable process u, there is a fair lossy link from u to every correct process

• Uses a set of candidates to become leader, and a counter of the number of times that each process has recovered
  – During initialization (and upon recovery), a RECOVERED message is sent to all other processes
  – The leader is set to the process in the set of candidates with the smallest associated counter

• If a process considers itself the leader, it periodically sends a LEADER message to all other processes
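
The selection rule described above can be sketched in a few lines of Python. This is not the authors' algorithm: it covers only the choice of leader from the candidate set and the recovery counters. Incrementing a sender's counter on each RECOVERED message, breaking ties by process identifier (compared as strings), and all function names are assumptions made for the illustration. How processes enter and leave the candidate set is not described on the slides and is left out.

```python
from typing import Dict, Iterable, Optional


def on_recovered(recovered: Dict[str, int], sender: str) -> None:
    """Count one more recovery for `sender` (assumed reaction to RECOVERED)."""
    recovered[sender] = recovered.get(sender, 0) + 1


def select_leader(candidates: Iterable[str],
                  recovered: Dict[str, int]) -> Optional[str]:
    """Pick the candidate with the smallest recovery counter.

    Ties are broken by process identifier, which is an assumption.
    Returns None when there are no candidates.
    """
    ranked = sorted(candidates, key=lambda p: (recovered.get(p, 0), p))
    return ranked[0] if ranked else None


def sends_leader_messages(me: str, candidates: Iterable[str],
                          recovered: Dict[str, int]) -> bool:
    """A process sends periodic LEADER messages only if it elects itself."""
    return select_leader(candidates, recovered) == me
```

Preferring the smallest counter favors processes that have recovered least often, so a process that keeps crashing and recovering accumulates a large counter and loses to a stable one.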

Unstable Processes Disagree

• With this algorithm, eventually every correct process always trusts the same correct process l. Consequently, eventually, among correct processes, only one keeps sending LEADER messages

• Concerning the behavior of unstable processes:
  – (1) upon recovery, they send a RECOVERED message to all other processes
  – (2) initially they trust themselves, and they can trust other unstable processes before trusting process l

• We propose an adaptation that avoids (2)
  – initially they do not trust any process and, if they remain up for sufficiently long, they then trust l until they crash
  – the adaptation assumes a majority of correct processes

[Figure: Unstable Processes Disagree. Processes p1 to p7, each marked as eventually up, eventually down, or unstable; four Ω outputs show p4 and two show p2.]

Instability Awareness

[Figure: processes p1 to p7, each marked as eventually up, eventually down, or unstable; five Ω outputs show p4 and one shows NULL.]

• The proposed adaptation makes the algorithm no longer near-efficient, since all correct processes may send PONG messages forever

• Can we design an algorithm such that…
  – processes do not have access to stable storage,
  – unstable processes eventually do not disagree,
  – and it is near-efficient?

• Yes We Can!

A Near-Efficient++ Algorithm

An Efficient Algorithm

• Assumes that local stable storage is accessible
  – process recovery counter
  – leader identity

• Assumption on communication reliability/synchrony:
  – (i) for every correct process p, there is an eventually timely link from p to every correct and every unstable process

• No need for RECOVERED messages

• With this algorithm, eventually every process that is up, either correct or unstable, always trusts the same correct process l
  – assuming that every unstable process succeeds in permanently writing l to its stable storage
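
To make the role of stable storage concrete, the sketch below keeps the two items listed above (the recovery counter and the leader identity) in a small JSON file that survives crashes. It is a hypothetical illustration, not the algorithm from the slides: the file format, the function names, and the choice to increment the counter on every start (including the first) are all assumptions. The write goes to a temporary file that is then renamed over the old one, so a crash during the write leaves the previous state intact.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def load_state(path: str) -> Dict[str, Any]:
    """Read the persisted state, or return the initial state if none exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"recoveries": 0, "leader": None}


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Persist the state atomically: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before renaming
    os.replace(tmp_path, path)


def on_start(path: str) -> Dict[str, Any]:
    """Run at every start: bump the recovery counter, keep the stored leader."""
    state = load_state(path)
    state["recoveries"] += 1
    save_state(path, state)
    return state


def set_leader(path: str, leader: Optional[str]) -> None:
    """Record the currently trusted leader so it is known after a recovery."""
    state = load_state(path)
    state["leader"] = leader
    save_state(path, state)
```

Because the leader identity is read back on start, a recovering process can trust the stored leader immediately instead of trusting itself first, which fits the slide's remark that RECOVERED messages are not needed.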

Another Efficient Algorithm

• Besides (i), assumes a non-decreasing local clock at each process

• The elected leader will be the “oldest” correct process, i.e., the process that was the first to recover for good (never crashing again)
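
The "oldest process" rule can be pictured as picking the candidate whose latest recovery happened earliest. The sketch below assumes, purely for illustration, that each candidate's latest recovery time is known and that these values can be compared directly across processes; the slides do not say how the actual algorithm compares them. The function name and the tie-break by identifier are also assumptions.

```python
from typing import Dict, Optional


def select_oldest(recovery_time: Dict[str, float]) -> Optional[str]:
    """Return the candidate whose latest recovery time is the smallest.

    recovery_time maps each candidate to the clock value of its latest
    recovery. Ties are broken by process identifier (an assumption).
    Returns None when there are no candidates.
    """
    if not recovery_time:
        return None
    return min(recovery_time, key=lambda p: (recovery_time[p], p))
```

A process that crashes and recovers again gets a newer recovery time, so unstable processes keep becoming "younger" while a process that stays up keeps its early timestamp.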

Relaxing the Assumptions

• Based on message relaying

• Weaker assumptions on communication reliability/synchrony:
  – (i’) for every correct process p, there is an eventually timely path from p to every correct and every unstable process
  – (ii’) for every unstable process u, there is a fair lossy link from u to some correct process

• Algorithms are no longer (near-)efficient
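
Message relaying can be illustrated with a simple flooding scheme: each process forwards a message over all its outgoing links the first time it receives it, so a message reaches every process connected to the sender by a path, even without a direct link. The sketch below simulates this on a directed graph of links. It is a generic flooding example under assumed names (`flood`, `links`), not the relaying mechanism of the slides, and it ignores timing and message loss.

```python
from collections import deque
from typing import Dict, List, Set


def flood(links: Dict[str, List[str]], origin: str) -> Set[str]:
    """Return the processes reached by a message sent by `origin`.

    links maps each process to the processes it has an outgoing link to.
    Every process relays the message once, on first receipt.
    """
    delivered = {origin}
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for receiver in links.get(sender, []):
            if receiver not in delivered:  # relay only on first receipt
                delivered.add(receiver)
                queue.append(receiver)
    return delivered
```

With links p1 → p2 and p2 → p3 only, a message from p1 still reaches p3 through p2. The price is that p2 has to keep forwarding messages, which is in line with the remark that the algorithms are no longer (near-)efficient.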

The One Slide to Remember

• The Omega failure detector provides an eventual leader election functionality in a distributed system
  – Theory: weakest failure detector for solving Consensus
  – Practice: used by several real fault-tolerant protocols

• It is interesting to design efficient algorithms implementing Omega

• In the crash-recovery failure model, we have to cope with unstable processes
  – to prevent them from sending messages forever
  – to avoid disagreement with correct processes

• Stable storage, if available, makes things easier

An Example: Paxos

• Leslie Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems, 1998. First submitted in 1990!

• Leader-based Consensus algorithms
  – Could benefit from efficient leader election

• Production use of Paxos (from Wikipedia):
  – Google Chubby distributed lock service
  – IBM SAN Volume Controller
  – Microsoft Autopilot cluster management service
  – WANdisco Distributed Coordination Engine
  – Scalien Keyspace
