
12. Distributed DBMS Reliability - 3 of 3[Good]

Date post: 27-Apr-2015
Upload: lakhveer-kaur

Distributed Database Systems

Autumn, 2007

Chapter 12 – Part 3 of 3

Distributed DBMS Reliability


12.1 Reliability Concepts and Measures
12.2 Failures and Fault Tolerance in Distributed Systems
12.3 Failures in Distributed DBMS
12.4 Local Reliability Protocols
12.5 Distributed Reliability Protocols

Section 12.6

Dealing with Site Failures

Problem with 2PC

2PC is designed for dealing with system crashes.

A failed site can recover properly without consulting other sites (independent recovery).

An operational site can terminate properly without waiting for the failed site to recover (non-blocking termination).

Independent recovery and non-blocking protocols exist only for single-site failures.

Problem with 2PC

2PC is inherently blocking!


Subsection 12.6.1

Termination and Recovery Protocols of 2PC


State transition in 2PC protocol


Coordinator time-outs

The coordinator can time out in the WAIT, ABORT, and COMMIT states.

WAIT
◦ The coordinator is waiting for the local decisions from the participants.
◦ Solution: the coordinator decides to globally abort the transaction by writing an abort record in the log and sending a "global-abort" message to all participants.

Coordinator time-outs

COMMIT or ABORT
◦ The coordinator is not certain whether the commit or abort procedures have been completed by the LRMs of all participants.
◦ Solution: resend the "global-commit" or "global-abort" message to the sites that have not acknowledged.
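The coordinator time-out rules can be sketched in a few lines of Python. This is a minimal illustration, not a full 2PC implementation: the `Coordinator` record, its field names, and the list standing in for the stable log are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical coordinator record; the field names are illustrative only."""
    state: str                                # "WAIT", "COMMIT", or "ABORT"
    participants: set
    acked: set = field(default_factory=set)   # sites that acknowledged the decision
    log: list = field(default_factory=list)   # stand-in for the stable log


def on_coordinator_timeout(c):
    """Return the (site, message) pairs the coordinator sends after a time-out."""
    if c.state == "WAIT":
        # Votes are missing: log the abort decision, then notify every participant.
        c.log.append("abort")
        c.state = "ABORT"
        return [(p, "global-abort") for p in sorted(c.participants)]
    if c.state in ("COMMIT", "ABORT"):
        # The decision is already made: resend it only to the unacknowledged sites.
        msg = "global-commit" if c.state == "COMMIT" else "global-abort"
        return [(p, msg) for p in sorted(c.participants - c.acked)]
    raise ValueError(f"no time-out handling defined for state {c.state!r}")
```

For example, a coordinator in COMMIT with participants A, B, C and an acknowledgement from A only would resend "global-commit" to B and C.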

Participant time-outs

A participant can time out in the INITIAL or READY state.

INITIAL
◦ The participant is waiting for a "prepare" message.
◦ The coordinator must have failed in the INITIAL state.
◦ Solution: the participant unilaterally aborts the transaction. If the "prepare" message arrives later, the participant can respond either by voting to abort or by just ignoring the message. Ignoring it causes the coordinator to time out in the WAIT state (see the previous discussion of this case).

Participant time-outs

READY
◦ The participant must have voted to commit and therefore cannot change its vote and unilaterally abort.
◦ Solution: the participant stays blocked until it can learn (from the coordinator or other participants) the ultimate fate of the transaction.

In a centralized communication structure, a participant has to ask the coordinator for its decision. If the coordinator has failed, the participant remains blocked.
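The two participant time-out cases can be sketched as follows, assuming the centralized communication structure. The function name and the `coordinator_decision` parameter are hypothetical; passing `None` models a coordinator that has failed or cannot be reached.

```python
def on_participant_timeout(state, coordinator_decision=None):
    """Return what a 2PC participant does after timing out in the given state.

    coordinator_decision is "commit", "abort", or None when the coordinator
    cannot be reached (centralized structure: nobody else can be asked).
    """
    if state == "INITIAL":
        # No vote has been cast yet, so a unilateral abort is safe.
        return "abort"
    if state == "READY":
        # The participant voted to commit and may not change its mind.
        if coordinator_decision in ("commit", "abort"):
            return coordinator_decision
        return "blocked"
    raise ValueError(f"a participant does not time out in state {state!r}")
```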

Can the blocking problem be overcome?

No! 2PC is an inherently blocking protocol.


Analysis

Assumptions and definitions

Assume participants can communicate with each other.

Let Pi be the participant that times out in the READY state, and Pj be the participant being asked.

All the cases in which Pj can respond

1. Pj is in the INITIAL state. This means Pj has not voted yet. Pj can unilaterally abort the transaction and reply to Pi with a "vote-abort" message.

2. Pj is in the READY state. Pj does not know the global decision and cannot help.

3. Pj is in the COMMIT or ABORT state. Pj can send a "global-commit" or "global-abort" message to Pi.

How Pi interprets these responses

1. Pi receives "vote-abort" messages from all Pj. Pi proceeds to abort the transaction.

2. Pi receives "vote-abort" from some Pj, while other participants are in the READY state. Pi goes ahead and aborts the transaction.

3. Pi learns that all Pj are in the READY state. Pi is blocked, since it has no knowledge of the global decision.

How Pi interprets these responses

4. Pi receives either "global-abort" or "global-commit" messages from all Pj. Pi terminates the transaction according to the message.

5. Pi receives either "global-abort" or "global-commit" messages from some Pj, while the others are in the READY state. Pi takes the same action as in case 4.

These are all the alternatives that a termination protocol needs to handle.
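The five cases collapse into one small decision function. The sketch below is illustrative only: the function name is hypothetical, and the reply strings are an assumed encoding in which a Pj in the READY state answers "ready".

```python
def terminate(responses):
    """Decide Pi's action from the replies of the other participants.

    Each reply is one of "vote-abort", "ready", "global-commit", "global-abort".
    """
    replies = set(responses)
    aborts = {"vote-abort", "global-abort"} & replies
    if "global-commit" in replies and aborts:
        # Cannot happen in a correct run: a commit decision requires every vote.
        raise ValueError("inconsistent replies: both commit and abort reported")
    if "global-commit" in replies:
        return "commit"    # cases 4 and 5
    if aborts:
        return "abort"     # cases 1, 2, 4 and 5
    return "blocked"       # case 3: everyone who answered is READY


print(terminate(["ready", "vote-abort"]))      # case 2
print(terminate(["ready", "ready"]))           # case 3
print(terminate(["global-commit", "ready"]))   # case 5
```

A single decisive reply is enough, which is why cases 2 and 5 behave like cases 1 and 4.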

Recovery protocols

The protocols that a failed coordinator or participant can use to recover when it restarts.

Assumptions:
1. Writing a log record and sending a message form a single atomic action.
2. The state transition occurs after the message is sent.

Coordinator site failures

The coordinator fails while in the INITIAL state.
◦ Action: restart the transaction.

The coordinator fails while in the WAIT state.
◦ Action: restart the commit process by sending the "prepare" message once more.

The coordinator fails while in the COMMIT or ABORT state.
◦ Action: if all ACK messages have been received, no action is needed; otherwise, follow the termination protocol.

Participant site failures

A participant fails while in the INITIAL state.
◦ Action: upon recovery, the participant should abort the transaction unilaterally.

A participant fails while in the READY state.
◦ Action: same as a time-out in the READY state; follow the termination protocol (ask for help).

A participant fails while in the ABORT or COMMIT state.
◦ Action: no action is needed.
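The recovery rules for both roles amount to a lookup on the state recorded before the failure. The sketch below is only a summary of the cases above; the function name, the action strings, and the `all_acks_received` flag are hypothetical.

```python
def recovery_action(role, state, all_acks_received=False):
    """Return the 2PC recovery action for a site that failed in `state`."""
    if role == "coordinator":
        if state == "INITIAL":
            return "restart the transaction"
        if state == "WAIT":
            return "resend prepare"
        if state in ("COMMIT", "ABORT"):
            # Nothing to do once every participant has acknowledged.
            return "none" if all_acks_received else "run termination protocol"
    elif role == "participant":
        if state == "INITIAL":
            return "abort unilaterally"
        if state == "READY":
            # Treated like a time-out in READY: ask the others for help.
            return "run termination protocol"
        if state in ("COMMIT", "ABORT"):
            return "none"
    raise ValueError(f"unknown combination: {role!r}, {state!r}")
```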

Additional cases

The first assumption of the recovery protocols is relaxed, i.e., a site may fail after writing a log record but before sending the corresponding message.

The coordinator fails after begin_commit is written in the log but before the "prepare" message is sent.
◦ Action: same as a failure in the WAIT state; send the "prepare" message upon recovery.

All other additional cases can be treated on the basis of the techniques discussed in this chapter.


Subsection 12.6.2

Three-Phase Commit Protocol

3PC – A non-blocking protocol

A commit protocol that is synchronous within one state transition is non-blocking if and only if its state transition diagram contains neither of the following:
1. A state that is adjacent to both a commit and an abort state;
2. A non-committable state that is adjacent to a commit state.

Action diagram

COMMIT – committable state

WAIT, READY – non-committable states

Add a PRE-COMMIT state between WAIT and COMMIT for the coordinator, and between READY and COMMIT for participants.
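The two conditions can be checked mechanically on a transition diagram. The sketch below encodes the coordinator diagrams of 2PC and 3PC as successor sets and reports violations. Two assumptions are made here: "adjacent" is read as "reachable in one transition", and PRE-COMMIT is treated as a committable state. All names are hypothetical.

```python
def blocking_violations(transitions, committable=frozenset({"COMMIT"})):
    """List (state, reason) pairs that violate the non-blocking conditions."""
    violations = []
    for state, successors in transitions.items():
        if "COMMIT" in successors and "ABORT" in successors:
            violations.append((state, "adjacent to both commit and abort"))
        if "COMMIT" in successors and state not in committable:
            violations.append((state, "non-committable but adjacent to commit"))
    return violations


# Coordinator diagrams: each state maps to the states reachable in one step.
TWO_PC = {
    "INITIAL": {"WAIT"},
    "WAIT": {"COMMIT", "ABORT"},
    "COMMIT": set(),
    "ABORT": set(),
}
THREE_PC = {
    "INITIAL": {"WAIT"},
    "WAIT": {"PRE-COMMIT", "ABORT"},
    "PRE-COMMIT": {"COMMIT"},
    "COMMIT": set(),
    "ABORT": set(),
}

print(blocking_violations(TWO_PC))
print(blocking_violations(THREE_PC, committable={"PRE-COMMIT", "COMMIT"}))
```

In this encoding WAIT breaks both conditions in 2PC, while the 3PC diagram yields no violations because PRE-COMMIT separates WAIT from COMMIT.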


State transitions


Termination protocol

Coordinator time-outs

1. In the WAIT state
Same as in 2PC. The coordinator unilaterally aborts the transaction and sends a "global-abort" message to all participants that have voted to commit.

2. In the PRE-COMMIT state
All participants must be at least in the READY state (they have voted to commit). The coordinator globally commits the transaction and sends a "global-commit" (GC) message to all operational participants.

3. In the COMMIT (or ABORT) state
No action to take.

Termination protocol

Participant time-outs

1. In the INITIAL state
Same as in 2PC.

2. In the READY state
The participant does not know the global decision. Elect a new coordinator, which terminates the transaction according to the termination protocol discussed below.

3. In the PRE-COMMIT state
The participant is waiting for the "global-commit" message from the coordinator. Solution: same as case 2.
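The 3PC time-out cases for both roles fit in a single lookup table. This is only a compact restatement of the cases above; the table name and the action strings are hypothetical.

```python
# (role, state at time-out) -> action taken, following the cases listed above.
TIMEOUT_ACTIONS_3PC = {
    ("coordinator", "WAIT"): "send global-abort",
    ("coordinator", "PRE-COMMIT"): "send global-commit",
    ("coordinator", "COMMIT"): "none",
    ("coordinator", "ABORT"): "none",
    ("participant", "INITIAL"): "abort unilaterally",      # same as in 2PC
    ("participant", "READY"): "elect new coordinator",
    ("participant", "PRE-COMMIT"): "elect new coordinator",
}


def on_timeout_3pc(role, state):
    """Return the action for a 3PC site that times out in `state`."""
    try:
        return TIMEOUT_ACTIONS_3PC[(role, state)]
    except KeyError:
        raise ValueError(f"no time-out defined for {role!r} in {state!r}") from None
```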

For cases 2 and 3 above

The new coordinator (elected from the old participants) may be in the WAIT, PRE-COMMIT, or ABORT state.

If the new coordinator is in WAIT, it will globally abort the transaction. The participants may be in:
◦ INITIAL, READY, or ABORT: no problem with taking the global abort action.
◦ PRE-COMMIT: add an edge from PRE-COMMIT to ABORT.

For cases 2 and 3 above

If the new coordinator is in the PRE-COMMIT state, the participants can be in PRE-COMMIT or COMMIT (but none can be in ABORT).
◦ Solution: globally commit the transaction and send a GC message to all participants.

If the new coordinator is in ABORT, all participants have to move to ABORT.
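The decision taken by the newly elected coordinator depends only on its own state. The sketch below follows the three cases above; the function names and the dictionary of participant states are hypothetical, and the election itself is not modelled.

```python
def new_coordinator_decision(own_state):
    """Return the global decision of a newly elected 3PC coordinator."""
    if own_state in ("WAIT", "ABORT"):
        return "global-abort"
    if own_state == "PRE-COMMIT":
        return "global-commit"
    raise ValueError(f"unexpected new-coordinator state {own_state!r}")


def apply_decision(participant_states, decision):
    """Move every operational participant to the decided final state."""
    if decision == "global-commit" and "ABORT" in participant_states.values():
        # With the new coordinator in PRE-COMMIT, no participant can be in ABORT.
        raise ValueError("commit decided while a participant is in ABORT")
    final = "COMMIT" if decision == "global-commit" else "ABORT"
    # A global abort also moves PRE-COMMIT sites to ABORT (the added edge).
    return {site: final for site in participant_states}


states = {"P1": "READY", "P2": "PRE-COMMIT"}
print(apply_decision(states, new_coordinator_decision("WAIT")))
```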

Recovery Protocols

Only the differences from the 2PC recovery protocols are indicated.

The coordinator fails while in the WAIT state. This causes the participants to time out (see the discussion above).
◦ Solution: the recovered coordinator asks around to determine the fate of the transaction.

The coordinator fails while in the PRE-COMMIT state. This causes the participants to time out in the PRE-COMMIT state.
◦ Solution: ask around upon recovery.

A participant fails while in the PRE-COMMIT state.
◦ Solution: ask the other participants upon recovery.

More about 3PC

Advantages
◦ Non-blocking

Disadvantages
◦ Fewer independent recovery cases
◦ More messages

Section 12.7

Network Partitioning

Network Partitioning

Simple partitioning
◦ The network is partitioned into two parts.

Multiple partitioning
◦ The network is partitioned into more than two parts.

In general, it is not possible to find a non-blocking termination protocol in the presence of network partitioning.

It is possible to design atomic non-blocking protocols that are resilient to simple partitioning.

Design decision

Allow the partitions to continue operating and compromise database consistency, or

guarantee consistency by permitting only one partition to operate, while the sites in the other partitions remain blocked.
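One possible way to choose the single partition that keeps working is a strict-majority rule. The rule is an assumption made here for illustration and is not stated on the slide; the function name is hypothetical. Since at most one partition can hold more than half of the sites, two partitions can never both proceed.

```python
def may_proceed(partition, all_sites):
    """Return True if this partition holds a strict majority of all sites.

    Assumed rule for illustration only: every other partition stays blocked.
    """
    members = set(partition) & set(all_sites)
    return 2 * len(members) > len(set(all_sites))


sites = {"A", "B", "C", "D", "E"}
print(may_proceed({"A", "B", "C"}, sites))
print(may_proceed({"D", "E"}, sites))
```

Under this rule an even split leaves every partition blocked, which trades availability for consistency.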


The End of Chapter 12

