
1 Paxos Commit Jim Gray Leslie Lamport Microsoft Research Preview of a paper in preparation...

Paxos Commit. Jim Gray, Leslie Lamport, Microsoft Research. Preview of a paper in preparation. Presented at Microsoft Research Techfest, 3 March 2004, Redmond, WA. Article MSR-TR-2003-96, Consensus on Transaction Commit: http://research.microsoft.com/research/pubs/view.aspx?tr_id=701
Transcript
Page 1:

Paxos Commit

Jim Gray, Leslie Lamport

Microsoft Research

Preview of a paper in preparation

Presented at Microsoft Research Techfest, 3 March 2004, Redmond, WA

Article MSR-TR-2003-96: Consensus on Transaction Commit
http://research.microsoft.com/research/pubs/view.aspx?tr_id=701

Page 2:

Commit is Common

• Marriage ceremony: "Do you?" "I do." "I now pronounce you…"

• Theater: "Ready on the set?" "Ready!" "Action!"

• Contract law: Offer, Signature, Deal / lawsuit

Page 3:

The Common Picture

[Figure: the director asks each actor "Ready?"; each actor replies "Ready"; the director then tells every actor "Action!"]

Page 4:

All or Nothing: If any actor says no, the deal is off.

[Figure: the director asks each actor "Ready?"; one actor answers "No!" (or times out) while the others answer "Ready"; the director then tells every actor "No deal!"]

Page 5:

The Database Version

TM: Transaction Manager
RM: Resource Manager

[Figure: the director/actors picture relabeled with client, TM, and RMs; messages shown: Ready?, Ready, Commit.]

Page 6:

Two Phase Commit

• N Resource Managers (RMs)

• Want all RMs to commit or all abort.

• Coordinated by Transaction Manager (TM)

• TM sends Prepare, Commit-Abort

• RM responds Prepared, Aborted

• 3N+1 messages

• N+1 stable writes

• Delay
– 4 messages
– 2 stable writes

• Blocking: if TM fails, Commit-Abort stalls

[Figure: state diagrams. Transaction Manager states: working, committed, aborted. Resource Manager states: working, prepared, committed, aborted. Messages shown: RequestCommit, Prepare, Prepared, Commit.]
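The failure-free round above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the boolean vote format, and the breakdown of the 3N+1 messages (one RequestCommit, then N Prepare, N replies, N Commit/Abort) are assumptions chosen to match the counts on the slide.

```python
from dataclasses import dataclass


@dataclass
class TwoPhaseResult:
    outcome: str   # "commit" or "abort"
    messages: int  # total messages exchanged in this run


def two_phase_commit(rm_votes):
    """Simulate one failure-free 2PC round.

    rm_votes is a hypothetical input format: one boolean per RM,
    True if that RM answers Prepared, False if it answers Aborted.
    """
    n = len(rm_votes)
    messages = 1   # client -> TM: RequestCommit (assumed breakdown)
    messages += n  # TM -> each RM: Prepare
    messages += n  # each RM -> TM: Prepared or Aborted
    # All or nothing: a single Aborted vote aborts the transaction.
    outcome = "commit" if all(rm_votes) else "abort"
    messages += n  # TM -> each RM: Commit or Abort
    return TwoPhaseResult(outcome, messages)
```

With three RMs that all vote Prepared, the sketch reports a commit after 3·3+1 = 10 messages. Nothing here models the blocking case: if the TM stops after collecting votes, the RMs have no one to ask.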

Page 7:

The Problem With 2PC

• Atomicity – all or nothing

• Consistency – does right thing

• Isolation – no concurrency anomalies

• Durability / Reliability – state survives failures

• Availability: always up

Blocks if TM fails

Page 8:

Problem Statement

• ACID Transactions make error handling easy.

• One fault can make 2-Phase Commit block.

• Goal: ACID and Available. Non-blocking despite F faults.

Page 9:

Fault-Tolerant Two Phase Commit

[Figure: client, two TMs, and RMs exchanging RequestCommit, Prepare, and Prepared messages.]

If the 2PC Transaction Manager (TM) fails, the transaction blocks. Solution: add a “spare” transaction manager (non-blocking commit, 3-phase commit).

Page 10:

Fault-Tolerant Two Phase Commit

If the 2PC Transaction Manager (TM) fails, the transaction blocks. Solution: add a “spare” transaction manager (non-blocking commit, 3-phase commit).

But… What if…?

[Figure: client, two TMs, and RMs exchanging RequestCommit, Prepare, and Prepared messages; one TM sends commit while the other sends abort.]

Inconsistent! Now what?

The complexity is a mess.

Page 11:

Fault Tolerant 2PC

• Several workarounds proposed in database community:

• Often called "3-phase" or "non-blocking" commit.

• None with complete algorithm and correctness proof.

Page 12:

“Reaching Agreement in the Presence of Faults”

• 25 years of theory

• Now called the Consensus problem

• N processes want to agree on a value, even if F of them have failed.

Shostak, Pease, & Lamport, JACM, 1980

Page 13:

Consensus

• Collects proposed values

• Picks one proposed value

• Remembers it forever

[Figure: clients send "Propose X" and "Propose W" to the consensus box; the box reports "W Chosen" to every client.]
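The three properties of the consensus box can be shown with a toy, single-process stand-in. This is only a sketch of the abstraction: the class and method names are hypothetical, and the "first proposal wins" rule is an assumption (a real box may pick any proposed value, and must do so across failing machines).

```python
class ConsensusBox:
    """Toy in-memory consensus box: collect, pick one, remember forever."""

    _UNSET = object()  # sentinel so that None can be a legal proposed value

    def __init__(self):
        self.proposals = []               # every value ever proposed
        self._chosen = ConsensusBox._UNSET

    def propose(self, value):
        """Record a proposal and return the chosen value."""
        self.proposals.append(value)
        if self._chosen is ConsensusBox._UNSET:
            # Assumption: the first proposal to arrive is the one chosen.
            self._chosen = value
        return self._chosen

    def chosen(self):
        """Return the chosen value, or None if nothing was proposed yet."""
        return None if self._chosen is ConsensusBox._UNSET else self._chosen
```

If one client proposes W and another then proposes X, both calls return W, and every later call to `chosen()` keeps returning W.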

Page 14:

Consensus for Commit: The Obvious Approach

• Get consensus on TM’s decision.

• TM just learns consensus value.

• TM is “stateless”

[Figure: message flow among client, TMs, RMs, and one consensus box: Request Commit, Prepare, Prepared, Propose Prepared, Prepared Chosen, Commit.]

Page 15:

Consensus for Commit: The Paxos Commit Approach

• Get consensus on each RM’s choice.

• TM just combines consensus values.

• TM is “stateless”

[Figure: message flow among client, TMs, RMs, and one consensus box per RM: Request Commit, Prepare, Propose RM1 Prepared, Propose RM2 Prepared, RM1 Prepared Chosen, RM2 Prepared Chosen, Commit.]
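The "TM just combines consensus values" step can be sketched as a pure function over the per-RM outcomes. The function name, the dictionary format, and the string constants are illustrative assumptions; the rule itself (commit only if every RM's instance chose Prepared) follows the all-or-nothing requirement stated earlier.

```python
PREPARED = "Prepared"
ABORTED = "Aborted"


def combine(chosen_by_rm, rm_ids):
    """Combine the values chosen by each RM's consensus instance.

    chosen_by_rm maps an RM id to the value chosen for that RM; an RM
    whose instance has not chosen a value yet is simply missing.
    Returns "Commit", "Abort", or None while the outcome is undecided.
    """
    values = [chosen_by_rm.get(rm) for rm in rm_ids]
    if any(v == ABORTED for v in values):
        return "Abort"   # one Aborted instance aborts the transaction
    if all(v == PREPARED for v in values):
        return "Commit"  # every instance chose Prepared
    return None          # some instance has no chosen value yet
```

Because the function keeps no state of its own, any TM that can read the chosen values computes the same answer, which is one way to read the slide's "stateless" remark.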

Page 16:

The Obvious Approach vs. Paxos Commit

[Figure: side-by-side message flows. The Obvious Approach: Prepare, Prepared, Propose Prepared, Prepared Chosen, Commit. Paxos Commit: Prepare, Propose RM1 Prepared, Propose RM2 Prepared, RM1 Prepared Chosen, RM2 Prepared Chosen, Commit.]

Paxos Commit has one fewer message delay.

Page 17:

Consensus in Action

• The normal (failure-free) case

• Two message delays

• Can optimize

[Figure: inside the consensus box, the RM sends "Propose RM Prepared" to each of three acceptors; each acceptor sends "Vote RM Prepared" to the TMs; result: "RM Prepared Chosen".]

Page 18:

Consensus in Action

[Figure: RM, TMs, and a consensus box containing three acceptors.]

If a majority of acceptors are working, a TM can always learn what was chosen, or get Aborted chosen if nothing has been chosen yet.
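The majority condition can be illustrated with a deliberately simplified learner for one RM's consensus instance. The function name and the reply format are hypothetical, and ballots are ignored entirely, so this is a sketch of the counting argument only, not of the real recovery protocol.

```python
from collections import Counter


def learn(responses, f):
    """Classify what a TM can conclude from the acceptors it reached.

    responses holds one entry per reachable acceptor (out of 2F+1):
    the value that acceptor voted for, or None if it has not voted.
    """
    quorum = f + 1  # a majority of 2F+1 acceptors
    if len(responses) < quorum:
        # Too few acceptors are working: the TM must wait.
        return ("blocked", None)
    counts = Counter(v for v in responses if v is not None)
    for value, votes in counts.items():
        if votes >= quorum:
            return ("chosen", value)
    # A majority is reachable but no value has a majority of votes yet;
    # a leader could now run a new ballot (not modeled in this sketch).
    return ("undecided", None)
```

With F=1 and replies `["Prepared", "Prepared", None]`, the TM learns that Prepared was chosen; with a single reachable acceptor it can conclude nothing.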

Page 19:

The Complete Algorithm

• Subtle.

• More weird cases than most people imagine.

• Proved correct.

Page 20:

Paxos Commit

• N RMs

• 2F+1 acceptors (~2F+1 TMs)

• If F+1 acceptors see all RMs prepared, then transaction committed.

• 2F(N+1) + 3N + 1 messages, 5 message delays, 2 stable write delays.

[Figure: message sequence among Client, TM, RM 1…N, and Acceptors 0…2F: request commit, prepare, prepared, all prepared, commit.]

Page 21:

Two-Phase Commit vs. Paxos Commit

Two-Phase Commit:

• 3N+1 messages

• N+1 stable writes

• 4 message delays

• 2 stable-write delays

Paxos Commit (tolerates F faults):

• 3N + 2F(N+1) + 1 messages

• N+2F+1 stable writes

• 5 message delays

• 2 stable-write delays

Same algorithm when F=0 and TM = Acceptor
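The cost formulas on this slide are easy to check numerically. The sketch below only encodes the expressions as given; the function names and dictionary keys are illustrative.

```python
def two_pc_cost(n):
    """Costs of Two-Phase Commit with n resource managers."""
    return {
        "messages": 3 * n + 1,
        "stable_writes": n + 1,
        "message_delays": 4,
        "stable_write_delays": 2,
    }


def paxos_commit_cost(n, f):
    """Costs of Paxos Commit with n RMs, tolerating f faults."""
    return {
        "messages": 3 * n + 2 * f * (n + 1) + 1,
        "stable_writes": n + 2 * f + 1,
        "message_delays": 5,
        "stable_write_delays": 2,
    }


if __name__ == "__main__":
    for f in (0, 1, 2):
        print(f, paxos_commit_cost(5, f))
```

Setting F=0 makes the message and stable-write counts collapse to the Two-Phase Commit values for any N. The message-delay figures still differ (5 versus 4) unless the TM is also the acceptor, which is the condition the slide attaches to "same algorithm".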

Page 22:

Summary

• Commit is common

• Two Phase commit is good but… It is the un-availability protocol

• Paxos commit is non-blocking if there are at most F faults.

• When F=0 (no fault-tolerance), Paxos Commit == 2PC


Page 24:

Paxos Consensus

• Group has a leader known to all
– leader election is a subroutine

• Process proposes a value v to leader.

• Leader sends proposal (phase 2) (ballot, value) to all acceptors

• Acceptors respond with: max (ballot, value) they have seen

• If leader gets no higher ballot, and gets at least F+1 responses, then leader can announce (ballot, value)

• Full protocol is 3-phase

• Phase 1:
– Leader starts new ballot

• Phase 2:
– Leader proposes value

• Phase 3:
– If value accepted by F+1 then value is accepted.
– If not, leader tries to get majority value accepted.

6F+4 messages, 2F+1 stable writes, 4 message delays and 2 stable write delays
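For readers who want something concrete, here is a minimal in-memory sketch of one ballot. It follows the commonly taught prepare/promise and accept/accepted formulation of single-decree Paxos rather than the exact phase numbering on the slide, and every name in it is an assumption. Messages are plain method calls, and there is no stable storage, leader election, or message loss.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, value) last accepted, or None

    def on_prepare(self, ballot):
        """Leader starts a new ballot: promise to ignore lower ballots."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def on_accept(self, ballot, value):
        """Leader proposes a value: accept unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def run_ballot(acceptors, ballot, value, f):
    """Run one ballot against the reachable acceptors.

    Returns the chosen value, or None if this ballot did not succeed.
    """
    quorum = f + 1
    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [acc for kind, acc in replies if kind == "promise"]
    if len(promises) < quorum:
        return None  # a higher ballot exists or too few acceptors answered
    prior = [acc for acc in promises if acc is not None]
    if prior:
        # Must re-propose the value of the highest-ballot prior acceptance.
        value = max(prior, key=lambda bv: bv[0])[1]
    votes = sum(a.on_accept(ballot, value) for a in acceptors)
    return value if votes >= quorum else None
```

Running ballot 1 with value "Prepared" on three fresh acceptors (F=1) chooses "Prepared". A later ballot 2 that tries to propose "Aborted" still returns "Prepared", because the leader must adopt the previously accepted value; that is the "remembers it forever" property in action.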

Page 25:

Using Consensus

Have a consensus for each RM

[Figure: message flow among client, TMs, RMs, and one consensus box per RM: RequestCommit, Prepare, Prepared, Commit.]

Page 26:

[Figure: an RM sends "Propose X" and a TM sends "Propose W" to the consensus box; "X Chosen" is reported to the RM and to both TMs.]

Page 27:

Paxos Commit (success case)

[Figure: state diagrams for the Acceptors, Resource Managers, and Commit Leader, with states working, prepared, AllPrepared, committed, and aborted; message flow: Request Commit, Prepare, Prepared, All Prepared, Commit.]

Page 28:

Consensus

• The distributed systems theory community has thought about this a lot.

• They call it Consensus: N processes want to agree on a value

• Want to tolerate F faults
– Tolerate F processes stopping
– Tolerate F messages delayed or lost

• If there are at most F faults in a window, then consensus is achieved (stalls but safe if more than F faults).

• Byzantine faults need 3F+1 “acceptors”

• Benign faults need 2F+1 “acceptors”
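The benign-fault bound can be turned around to ask how many faults a given acceptor group survives. This small sketch covers the benign (2F+1) case only and reads "tolerate F faults" as "makes progress with up to F faulty acceptors"; the function names are hypothetical.

```python
def max_benign_faults(num_acceptors):
    """Largest F such that 2F+1 <= num_acceptors."""
    return (num_acceptors - 1) // 2


def can_make_progress(num_acceptors, num_faulty):
    """True if a majority of acceptors is still working.

    When this is False the protocol stalls, but it stays safe:
    no two processes ever learn different chosen values.
    """
    return num_faulty <= max_benign_faults(num_acceptors)
```

Three acceptors tolerate one fault and five tolerate two. Adding a fourth acceptor to a group of three buys nothing, since a majority of four is still three working acceptors.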

