Page 1: DISTRIBUTED TRANSACTION

1

DISTRIBUTED TRANSACTION

FASILKOM

UNIVERSITAS INDONESIA

Page 2: DISTRIBUTED TRANSACTION

2

What is a Transaction?

An atomic unit of database access, which is either completely executed or not executed at all.

It consists of an application-specified sequence of operations, beginning with a begin_transaction primitive and ending with either commit or abort.

Page 3: DISTRIBUTED TRANSACTION

3

E.g.

Transfer $200 from account A in London to account B in Depok:

begin_transaction
    amntA = lookup amount in account A
    amntB = lookup amount in account B
    if (amntA < $200) abort
    set account A = amntA - $200
    set account B = amntB + $200
commit
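As a rough illustration only, the same transfer can be expressed against a relational database. A minimal Python sketch, assuming an sqlite3 database with an accounts(id, balance) table (table and column names are not from the slides):

    import sqlite3

    def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
        """Move `amount` from account `src` to account `dst`, all-or-nothing."""
        try:
            with conn:  # begin_transaction ... commit on success / abort on exception
                cur = conn.cursor()
                row = cur.execute("SELECT balance FROM accounts WHERE id = ?",
                                  (src,)).fetchone()
                if row is None or row[0] < amount:
                    raise RuntimeError("insufficient funds")   # -> abort
                cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                            (amount, src))
                cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                            (amount, dst))
            return True          # committed
        except RuntimeError:
            return False         # aborted: none of the updates are applied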

Page 4: DISTRIBUTED TRANSACTION

4

Transaction Properties

Four main properties, the ACID properties:
– Atomicity: A transaction must be all or nothing.
– Consistency: A transaction takes the system from one consistent state to another consistent state.
– Isolation: The results of an incomplete transaction are not allowed to be revealed to other transactions.
– Durability: The results of a committed transaction will never be lost, independent of subsequent failures.

Atomicity & durability -> failure tolerance

Page 5: DISTRIBUTED TRANSACTION

5

Failure Tolerance

Atomicity & durability -> failure tolerance

Types of failures:
• Transaction-local failures detected by the application (e.g. insufficient funds)
• Transaction-local failures not detected by the application (e.g. divide by zero)
• System failures affecting volatile storage (e.g. CPU failure)
• Media failures (e.g. HD crash)

What is volatile storage? What is stable storage?

Page 6: DISTRIBUTED TRANSACTION

6

Recovery

Based on redundancy. For example:
1. Periodically archive the database.
2. Every time a change is made, record the old and new values to a log.
3. If a failure occurs:
  • If there is no damage to the physical database, undo all 'unreliable' changes.
  • If the database is physically damaged, restore it from the archive and redo the changes.

Page 7: DISTRIBUTED TRANSACTION

7

Logging (1)

Database vs. transaction log. For each change (begin transaction, commit, and abort), write a log record with:
• Transaction ID (TID)
• Record ID
• Type of action
• Old value of record
• New value of record
• Other info, e.g. a pointer to the previous log record of this transaction.
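A minimal sketch of such a log record as a Python structure; the field names follow the slide, everything else (types, defaults) is an illustrative assumption:

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class LogRecord:
        tid: int                        # Transaction ID (TID)
        record_id: Optional[str]        # data record touched (None for begin/commit/abort)
        action: str                     # 'begin' | 'update' | 'commit' | 'abort'
        old_value: Any = None           # before-image, used for undo
        new_value: Any = None           # after-image, used for redo
        prev_lsn: Optional[int] = None  # pointer to this transaction's previous log record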

Page 8: DISTRIBUTED TRANSACTION

8

Logging (2)

After a failure we need to undo or redo changes.

Undo and redo must be idempotent as there may be a failure whilst they are executing.
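For example, an undo that reinstalls the before-image can be repeated safely, while one that applies a compensating delta cannot. A small sketch (the dict-based record and the LogRecord fields from the earlier sketch are assumptions):

    # Idempotent: installing the old value any number of times yields the same state.
    def undo(record, log_entry):
        record["balance"] = log_entry.old_value

    # NOT idempotent: applying the compensation twice corrupts the record.
    def undo_bad(record, log_entry):
        record["balance"] += log_entry.old_value - log_entry.new_value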

Page 9: DISTRIBUTED TRANSACTION

9

Log Write-ahead Protocol (1)

Before performing any update, at least the undo portion of the log record must be written to stable storage.

Before committing a transaction, all log records must have been fully recorded on stable storage. The commit record is written after these.
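A sketch of the first rule applied to a single in-place update; log_mgr.append_and_flush is an assumed helper that forces the record to stable storage before returning:

    def update_record(log_mgr, page, record_id, new_value):
        old_value = page[record_id]
        # Undo information reaches stable storage before the data change (rule 1).
        log_mgr.append_and_flush({"action": "update", "record_id": record_id,
                                  "old_value": old_value, "new_value": new_value})
        # Only now may the database page be changed in place.
        page[record_id] = new_value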

Page 10: DISTRIBUTED TRANSACTION

10

Log Write-ahead Protocol (2)

Reason for the first rule:
– If we write the log record before the database change:
  • log -- change -- crash
  • log -- crash
  (in both cases we can undo safely)
– If we write the log record after the database change:
  • change -- log -- crash
  • change -- crash: can't undo

Page 11: DISTRIBUTED TRANSACTION

11

Checkpointing (1)

How does the recovery manager know which transactions to undo and which to redo after a failure?

Naive approach:
– Examine the entire log from the start. Look for begin transaction records:
  • if a corresponding commit record exists, redo;
  • if there's an abort, do nothing; and
  • if neither, undo.

Page 12: DISTRIBUTED TRANSACTION

12

Checkpointing (2)

Alternative:
– Every so often:
1) Force all log buffers to disk.
2) Write a checkpoint record to disk containing:
   a) A list of all active transactions
   b) The most recent log record for each transaction in a)
3) Force all database buffers to disk - the disk is now totally up-to-date.
4) Write the address of the checkpoint record to a fixed 'restart location' (this write had better be atomic).
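A rough sketch of that checkpoint sequence; the log manager, buffer manager, and transaction-table objects are assumptions used only to make the order of the four steps concrete:

    def take_checkpoint(log_mgr, buf_mgr, active_txns, restart_file):
        # 1) Force all log buffers to disk.
        log_mgr.flush()
        # 2) Write a checkpoint record listing the active transactions and
        #    the LSN of each one's most recent log record.
        ckpt_lsn = log_mgr.append({
            "type": "checkpoint",
            "active": {tid: t.last_lsn for tid, t in active_txns.items()},
        })
        log_mgr.flush()
        # 3) Force all database buffers to disk - the disk is now up-to-date.
        buf_mgr.flush_all()
        # 4) Record where the checkpoint record lives; the slide notes this small
        #    write had better be atomic (a real system would write-and-rename).
        with open(restart_file, "w") as f:
            f.write(str(ckpt_lsn))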

Page 13: DISTRIBUTED TRANSACTION

13

Checkpointing (3)

There are 5 categories of transaction, according to where they start and end relative to the checkpoint and the crash:

[Timeline figure: transactions T1-T5 shown against the checkpoint and the crash]
– T1: Leave (committed before the checkpoint, so its changes are already on disk)
– T2: Redo (committed between the checkpoint and the crash)
– T3: Undo (active at the checkpoint, still active at the crash)
– T4: Undo (started after the checkpoint, still active at the crash)
– T5: Redo (started after the checkpoint, committed before the crash)

Page 14: DISTRIBUTED TRANSACTION

14

Recovery (1)

Look for the most recent checkpoint record. For the transactions active at the checkpoint, we must:
– undo all those still active at the failure
– redo all the others

Page 15: DISTRIBUTED TRANSACTION

15

Recovery (2)

Have 2 lists: undo and redo. Initially, undo contains all TIDs in the checkpoint record & redo is empty.

3 passes through the log:
– Forwards from the checkpoint to the end:
  • If we find 'begin_transaction', add it to the undo list.
  • If we find 'commit', transfer it from the undo to the redo list.
  • If we find 'abort', remove it from the undo list.
– Backwards from the end to the checkpoint: undo.
– Forwards from the checkpoint to the end: redo.
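A condensed sketch of those three passes, using the LogRecord shape sketched earlier; the checkpoint dictionary and the install() helper (which writes a value back into the database) are assumptions:

    def recover(log, checkpoint, install):
        """log: records from the checkpoint onward, in log order."""
        undo_list = set(checkpoint["active"])          # TIDs active at the checkpoint
        redo_list = set()

        # Pass 1: forwards from checkpoint to end - classify transactions.
        for rec in log:
            if rec.action == "begin":
                undo_list.add(rec.tid)
            elif rec.action == "commit":
                undo_list.discard(rec.tid)
                redo_list.add(rec.tid)
            elif rec.action == "abort":
                undo_list.discard(rec.tid)

        # Pass 2: backwards from end to checkpoint - undo unfinished transactions.
        for rec in reversed(log):
            if rec.action == "update" and rec.tid in undo_list:
                install(rec.record_id, rec.old_value)   # idempotent undo

        # Pass 3: forwards from checkpoint to end - redo committed transactions.
        for rec in log:
            if rec.action == "update" and rec.tid in redo_list:
                install(rec.record_id, rec.new_value)   # idempotent redo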

Page 16: DISTRIBUTED TRANSACTION

16

Commit Protocols

Commit protocols. Assume a set of cooperating managers which deal with parts of a transaction. For atomicity we must ensure that:
– At each site, either all actions or none are performed.
– All sites take the same decision on whether to commit or abort.

Page 17: DISTRIBUTED TRANSACTION

17

Two Phase Commit (2PC) Protocol - 1

One node, the coordinator, has a special role; the others are participants.

The coordinator initiates the 2PC protocol. If any participant cannot commit, then all sites must abort.

Page 18: DISTRIBUTED TRANSACTION

18

2PC – 2

Phase I:
– reach a common decision on whether to abort or commit

Phase II:
– implement the decision at all sites

Page 19: DISTRIBUTED TRANSACTION

19

2PC - 3

[State diagram: the 2PC coordinator and participant automata]

Coordinator (states I, U, A, C):
– I -> U: -/PM (start: send Prepare Message)
– U -> A: tm/ACM (timeout: send Abort Command Message)
– U -> A: AAM/ACM (an Abort Answer: send Abort Command Message)
– U -> C: RM/CCM (all Ready Messages: send Commit Command Message)

Participant (states I, R, A, C):
– I -> A: ua/- (unilateral abort)
– I -> A: PM/AAM (cannot commit: answer abort)
– I -> R: PM/RM (can commit: answer ready)
– R -> A: ACM/-
– R -> C: CCM/-

States:
I = Initial state, U = Undecided, R = Ready to Commit, A = Abort, C = Commit

Messages:
PM = Prepare Message, RM = Ready Message, AAM = Abort Answer Message, ACM = Abort Command Message, CCM = Commit Command Message

Other transitions:
ua = Unilateral Abort, tm = timeout

Page 20: DISTRIBUTED TRANSACTION

20

2PC – Phase 1

Coordinator:
– Write prepare record to log
– Multicast prepare message and set timeout

Participant:
– Wait for prepare message
– If we are willing to commit then
  • force log records to stable storage
  • write ready record in log
  • send ready message to coordinator
– else
  • write abort record in log
  • send abort answer message to coordinator

Page 21: DISTRIBUTED TRANSACTION

21

2PC – Phase 2 (1)

Coordinator:
– wait for reply messages (ready or abort) or a timeout
– If the timeout expires or any message is abort:
  • write global abort record in the log
  • send abort command message to all participants
– else, if all answers were ready:
  • write global commit record to the log
  • send commit command message to all participants

Page 22: DISTRIBUTED TRANSACTION

22

2PC – Phase 2 (2)

Participants:
– Wait for command message (abort or commit)
– write abort or commit in the log
– send ack message to coordinator
– execute command (may be null)

Coordinator:
– wait for ack messages from all participants
– write complete in the log
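A compact sketch of the coordinator's side of both phases; the helpers broadcast, gather_replies and wait_for_acks are assumptions, and the timeout/restart handling described on the following slides is omitted:

    def coordinate_2pc(participants, log, timeout):
        # Phase 1: write prepare record, multicast prepare, collect votes.
        log.append("prepare")
        broadcast(participants, "PREPARE")
        votes = gather_replies(participants, timeout)   # pid -> "READY" / "ABORT" / None

        # Decide: commit only if every participant answered READY in time.
        if len(votes) == len(participants) and all(v == "READY" for v in votes.values()):
            log.append("global_commit")
            decision = "COMMIT"
        else:
            log.append("global_abort")
            decision = "ABORT"

        # Phase 2: push the decision, wait for acks, record completion.
        broadcast(participants, decision)
        wait_for_acks(participants)
        log.append("complete")
        return decision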

Page 23: DISTRIBUTED TRANSACTION

23

2PC – Site Failures

Resilient to all failures in which no log information is lost.

Site failures:
– Participant fails before having written ready to its log:
  • timeout expires ---> ABORT
– Participant fails after having written ready to its log:
  • Msg sent -- the others take the decision. This node gets the outcome from the coordinator or other participants after restart.
  • Msg unsent -- timeout expires ---> ABORT

Page 24: DISTRIBUTED TRANSACTION

24

2PC – Coordinator Failures

Coordinator fails after writing prepare but before global commit/global abort (global X):
– All participants must wait for recovery of the coordinator -> BLOCKING
– Recovery of the coordinator involves restarting the protocol from the identities in the prepare log record
– Participants must identify duplicate prepare messages

Coordinator fails after having written global X but before writing complete:
– On restart, the coordinator must resend the decision, to ensure blocked processes get it. Others must discard duplicates.

Coordinator fails after having written complete:
– No action needed

Page 25: DISTRIBUTED TRANSACTION

25

2PC – Lost Messages

A reply message (ready or abort) from a participant is lost:
– Timeout expires -- coordinator ABORTs

A prepare message is lost:
– Timeout expires -- coordinator ABORTs

A commit/abort command message is lost:
– Timeout in participant -- request repetition of the command from the coordinator

An ack message is lost:
– Timeout in coordinator -- coordinator resends the command

Page 26: DISTRIBUTED TRANSACTION

26

2PC - Partitions

Everything aborts, as the coordinator can't contact all participants. Participants in the partition without the coordinator may remain blocked, and their resources are retained until those participants are unblocked.

Page 27: DISTRIBUTED TRANSACTION

27

2PC - Comments

Blocking is a problem if the coordinator or the network fails, which reduces availability -> use 3PC.

Unilateral abort:
– Any node can abort until it sends ready (site autonomy before the ready state).

Efficiency can be increased by:
– Elimination of prepare messages: participants that can commit send the RM automatically.
– Presumed commit/abort when no information is found in the log. See [CER84] 13.5.1, 13.5.2 & 13.5.3.

Page 28: DISTRIBUTED TRANSACTION

28

Impossible Termination in 2PC

No operational participant has received the command. The operational participants are in the R state, but they haven’t received the ACM or CCM, AND

At least one participant failed. Unfortunately the failed participant acted as the coordinator.

Page 29: DISTRIBUTED TRANSACTION

29

Impossible Termination in 2PC

The failed participant might already have performed an action unknown to the others (commit or abort), i.e. it may be in the C or A state.

The operational participants can’t know what the failed participant had done, and can’t take an independent decision.

The problem is solved by the 3PC.

Page 30: DISTRIBUTED TRANSACTION

30

3PC (1)

[State diagram: the 3PC coordinator and participant automata, extending the 2PC diagram]

Coordinator (states I, U, BC, A, C):
– I -> U: -/PM
– U -> A: tm/ACM or AAM/ACM
– U -> BC: RM/PCM (all Ready Messages: send Prepare-to-Commit Message)
– BC -> C: OK/CCM (all OK messages: send Commit Command Message)

Participant (states I, R, PC, A, C):
– I -> A: ua/- or PM/AAM
– I -> R: PM/RM
– R -> A: ACM/-
– R -> PC: PCM/OK
– PC -> C: CCM/-

New states:
PC = Prepared to Commit, BC = Before Commit

New messages:
PCM = Prepare to Commit Message, OK = Entered PC state

The diagram also marks two possible restart transitions for a failed participant: "Restart 1" (the failed participant commits at restart) and "Restart 2" (the failed participant aborts at restart); these are referred to on the following slides.

Page 31: DISTRIBUTED TRANSACTION

31

3PC (2)

Case study:
– See slide no. 3.
– London: Coordinator & Participant 1
– Depok: Participant 2

Page 32: DISTRIBUTED TRANSACTION

32

3PC (3)

3PC avoids problems with 2PC:

1. If any operational participant has received an abort, then all can abort. The failed participant will abort at restart if it hasn't already. [As in 2PC] E.g. Depok fails; London is operational and has received an ACM.

2. If any participant has received the PCM, then all can commit. The failed participant cannot have aborted unilaterally, because it had answered READY (RM). The failed participant will commit at restart (see "restart 1"). E.g. London fails; Depok is operational and has received the PCM.

Page 33: DISTRIBUTED TRANSACTION

33

3PC (4)

3. If none of the operational participants has received the PCM, i.e. all of the operational participants are in the R state, then 2PC would block. With 3PC we can abort safely, since the failed participant cannot have committed: at most it has received the PCM -> it can abort at restart (see "restart 2"). E.g. London fails; Depok is operational and has NOT received the PCM (it is in the R state).

Page 34: DISTRIBUTED TRANSACTION

34

3PC (5)

3PC guarantees that there will be no blocking condition caused by any possible failure during the 2nd phase.

Failures during the 3rd phase -> blocking?
– If the coordinator fails in the 3rd phase, elect another coordinator and continue the commit process (since all participants must be in the PC state).

Page 35: DISTRIBUTED TRANSACTION

35

Consistency & Isolation

Consistency & isolation -> concurrency control.

The Lost Update Problem (time runs downwards):

  Transaction 1       Transaction 2
  Read X
                      Read X
  Update X
                      Update X      <- Transaction 1's update is lost

Page 36: DISTRIBUTED TRANSACTION

36

The Uncommitted Dependency (Temporary Update) Problem

(time runs downwards)

  Transaction 1       Transaction 2
                      Update X
  Read X                            <- reads a temporary, incorrect value of X,
                                       because Transaction 2 is aborted
                      ABORT

Page 37: DISTRIBUTED TRANSACTION

37

The Inconsistent Analysis Problem

(time runs downwards)

  Transaction 1                     Transaction 2
  sum := 0
  Read A
  sum := sum + A                    <- before the update by Transaction 2
                                    Read A
                                    Read B
                                    Update A
                                    Update B
                                    COMMIT
  Read B
  sum := sum + B                    <- after the update by Transaction 2

Page 38: DISTRIBUTED TRANSACTION

38

Concurrent Transactions

If we have concurrent transactions, we must prevent interference.

c.f. the lost update problem:
– Prevent T2's read (because T1 has seen it and may update it) [Locking]
– Prevent T1's update (because T2 has seen it) [Locking]
– Prevent T2's update (because T1 has already updated it and so this is based on obsolete values) [Timestamping]
– Have them work independently and resolve difficulties on commit [Optimistic concurrency control]

Page 39: DISTRIBUTED TRANSACTION

39

Serializability

What we need is some notion of correctness.

Serializability is the notion of correctness usually applied to concurrent transactions.

Page 40: DISTRIBUTED TRANSACTION

40

Serial Transactions

Two transactions execute serially if all operations of one precede all operations of the other. E.g.:

S1: Ri(x) Wi(x) Ri(y) Rj(x) Wj(y) Rk(y) Wk(x), or
S1: Ti Tj Tk,  S2: Tk Tj Ti, ...
(S1 = Schedule 1, S2 = Schedule 2)

All serial schedules are correct, but restrictive of concurrency.

Page 41: DISTRIBUTED TRANSACTION

41

Transaction Conflict

Two operations are in conflict if:
– At least one is a write
– They both act on the same data
– They are issued by different transactions

Which of the following are in conflict?

Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
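That definition is small enough to check mechanically. A sketch in which an operation is encoded as a (transaction, kind, item) tuple (this encoding is an assumption, not from the slides):

    def in_conflict(op1, op2):
        """op = (txn, kind, item), e.g. ('i', 'R', 'x') stands for Ri(x)."""
        t1, k1, x1 = op1
        t2, k2, x2 = op2
        return t1 != t2 and x1 == x2 and 'W' in (k1, k2)

    # The schedule from this slide: Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
    sched = [('i', 'R', 'x'), ('j', 'R', 'x'), ('i', 'W', 'y'),
             ('k', 'R', 'y'), ('j', 'W', 'x')]
    conflicts = [(a, b) for n, a in enumerate(sched) for b in sched[n + 1:]
                 if in_conflict(a, b)]
    # conflicts -> [Ri(x)/Wj(x), Wi(y)/Rk(y)]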

Page 42: DISTRIBUTED TRANSACTION

42

Computationally Equivalent

Two schedules (S1 & S2) are computationally equivalent if:
– The same operations are involved (possibly reordered)
– For every pair of operations in conflict (Oi & Oj), such that Oi precedes Oj in S1, Oi also precedes Oj in S2.

Page 43: DISTRIBUTED TRANSACTION

43

Serializable Schedule

A schedule is serializable if it is computationally equivalent to a serial schedule. E.g.:

Ri(x) Rj(x) Wj(y) Wi(x)  (which is not a serial schedule)
is computationally equivalent to:
Rj(x) Wj(y) Ri(x) Wi(x)  (which is a serial schedule: Tj Ti)

The following is NOT a serial schedule. But is it serializable?
Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
The above schedule is computationally equivalent to the serial schedules Ti Tj Tk and Ti Tk Tj.
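One standard way to answer such questions is to build a precedence (serialization) graph with an edge Ti -> Tj for every conflicting pair ordered that way, and test it for cycles; the slides do not introduce this graph, so the sketch below is an assumed method, not a transcription:

    def is_conflict_serializable(schedule):
        """schedule: list of (txn, kind, item) tuples in execution order."""
        edges = set()
        for n, (t1, k1, x1) in enumerate(schedule):
            for (t2, k2, x2) in schedule[n + 1:]:
                if t1 != t2 and x1 == x2 and 'W' in (k1, k2):
                    edges.add((t1, t2))              # Ti's operation precedes Tj's

        # Serializable iff the precedence graph is acyclic (simple DFS cycle check).
        txns = {t for t, _, _ in schedule}
        adj = {t: [b for a, b in edges if a == t] for t in txns}
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {t: WHITE for t in txns}

        def has_cycle(u):
            colour[u] = GREY
            for v in adj[u]:
                if colour[v] == GREY or (colour[v] == WHITE and has_cycle(v)):
                    return True
            colour[u] = BLACK
            return False

        return not any(colour[t] == WHITE and has_cycle(t) for t in txns)

    # The slide's example Ri(x) Rj(x) Wi(y) Rk(y) Wj(x) -> True
    print(is_conflict_serializable([('i', 'R', 'x'), ('j', 'R', 'x'), ('i', 'W', 'y'),
                                    ('k', 'R', 'y'), ('j', 'W', 'x')]))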

Page 44: DISTRIBUTED TRANSACTION

44

Serializability in Distributed Systems (1)

A local concurrency control mechanism isn't sufficient. E.g.:
– Site 1: Ri(x) Wi(x) Rj(x) Wj(x)  i.e. Ti < Tj
– Site 2: Rj(y) Wj(y) Ri(y) Wi(y)  i.e. Tj < Ti

Page 45: DISTRIBUTED TRANSACTION

45

Serializability in Distributed Systems (2)

Let T1…Tn be a set of transactions and E be an execution of these, modeled by schedules S1…Sm on machines 1…m.

Each local schedule (S1…Sm) is serializable.

Then E is serializable (in a distributed system) if, for all i and j, all conflicting operations from Ti and Tj have the same order in each of the schedules, i.e. there is a global total ordering for all sites.

Page 46: DISTRIBUTED TRANSACTION

46

Locking (1)

How to implement serializability? Use locking.

Shared/eXclusive (Read/Write) locks:
1. A transaction T must hold SLock x or XLock x before any Read x.
2. A transaction T must hold XLock x before any Write x.
3. A transaction T must issue unLock x after the Read x or Write x is completed.

Page 47: DISTRIBUTED TRANSACTION

47

Locking (2)

4. A transaction T can upgrade a lock, i.e. issue an XLock x after holding SLock x, as long as T is the only transaction holding SLock x. Otherwise T must wait.

5. A transaction T can downgrade a lock, i.e. issue an SLock x after holding XLock x.
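A minimal single-site lock table honouring these rules might look roughly like this; the class layout is an assumption, blocking is modelled by an exception, a real lock manager would queue waiters, and downgrading is omitted for brevity:

    class Blocked(Exception):
        """Raised when the requesting transaction must wait."""

    class LockManager:
        def __init__(self):
            self.shared = {}      # item -> set of TIDs holding an S lock
            self.exclusive = {}   # item -> TID holding the X lock, if any

        def slock(self, tid, item):
            holder = self.exclusive.get(item)
            if holder is not None and holder != tid:
                raise Blocked(tid)                       # another transaction holds X
            self.shared.setdefault(item, set()).add(tid)

        def xlock(self, tid, item):
            other_readers = self.shared.get(item, set()) - {tid}
            holder = self.exclusive.get(item)
            if other_readers or (holder is not None and holder != tid):
                raise Blocked(tid)                       # rule 4: upgrade only as sole reader
            self.exclusive[item] = tid

        def unlock(self, tid, item):
            self.shared.get(item, set()).discard(tid)
            if self.exclusive.get(item) == tid:
                del self.exclusive[item]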

Page 48: DISTRIBUTED TRANSACTION

48

Locking (3)

E.g.  T1: X = X + Y    T2: Y = X + Y

If initially X=20, Y=30 then either:
– S1: T1 < T2: X=50, Y=80
– S2: T2 < T1: X=70, Y=50

Both are serial schedules, thus both are correct.

Page 49: DISTRIBUTED TRANSACTION

49

Locking (4)

However using Shared/eXclusive (Read/Write) locks does NOT guarantee serializability.

If any transaction releases a lock and then acquires another, it may produce incorrect results.

Page 50: DISTRIBUTED TRANSACTION

50

Locking (5)

(time runs downwards; initially X=20, Y=30; values are shown after each read or write)

  T1 (X = X + Y)                T2 (Y = X + Y)
                                SLock x
                                temp2 = x           (20)
                                unLock x            <- released too early
  SLock y
  temp1 = y           (30)
  unLock y            <- released too early
                                XLock y
                                temp3 = y           (30)
                                y = temp2 + temp3   (50)
                                COMMIT
                                unLock y
  XLock x
  temp4 = x           (20)
  x = temp4 + temp1   (50)
  COMMIT
  unLock x

The schedule is NOT serializable!!! So it is NOT correct: the final result (X=50, Y=50) matches neither serial schedule from the previous slide.

Page 51: DISTRIBUTED TRANSACTION

51

Locking (6)

What is the problem?
– Y was unlocked too early in T1, and X was unlocked too early in T2. See the unlocks marked "released too early" in the schedule on slide 50.

What is the solution?
– 2 Phase Locking (2PL).

Page 52: DISTRIBUTED TRANSACTION

52

2PL - 1

Two phase locking (2PL):
– Before operating on any object, the transaction must obtain a lock for it.
– After releasing a lock, the transaction never acquires more locks.
– 2 phases:
1. Expanding (growing) phase: acquiring new locks, but NEVER releasing any locks.
2. Shrinking phase: releasing existing locks, but NEVER acquiring new locks.
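A sketch of a transaction wrapper that enforces the two phases on top of the LockManager sketched earlier (class and method names are assumptions):

    class TwoPhaseLockingTxn:
        def __init__(self, tid, lock_mgr):
            self.tid, self.lm = tid, lock_mgr
            self.held = set()
            self.shrinking = False            # becomes True after the first release

        def acquire(self, item, exclusive=False):
            if self.shrinking:
                raise RuntimeError("2PL violation: cannot lock after unlocking")
            (self.lm.xlock if exclusive else self.lm.slock)(self.tid, item)
            self.held.add(item)               # still in the expanding phase

        def release(self, item):
            self.shrinking = True             # enter the shrinking phase
            self.lm.unlock(self.tid, item)
            self.held.discard(item)

        def commit(self):
            for item in list(self.held):      # release everything only at the end
                self.release(item)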

Page 53: DISTRIBUTED TRANSACTION

53

2PL - 2

Exercise: modify the schedule on slide 50 so that it follows 2PL.

2PL may cause deadlocks. See [ELM00].

If a schedule obeys 2PL, it is serializable. What about the converse? Do all serializable schedules follow 2PL?

Page 54: DISTRIBUTED TRANSACTION

54

2PL - 3

Account x is at site 1 & account y is at site 2.
Ti: Ri(x) Wi(x) Ri(y) Wi(y)
Tj: Rj(x) Wj(x) Rj(y) Wj(y)

Serializable but not 2PL (time runs downwards):

  Site 1       Site 2
  Ri(x)
  Wi(x)
  Rj(x)        Ri(y)
  Wj(x)        Wi(y)
               Rj(y)
               Wj(y)

Equivalent 2PL schedule:

  Site 1       Site 2
  Ri(x)
  Wi(x)
               Ri(y)
               Wi(y)
  Rj(x)
  Wj(x)
               Rj(y)
               Wj(y)

New problem: 2PL may limit the amount of concurrency. See the second (2PL) schedule: Tj cannot touch x until Ti has finished at both sites.

Page 55: DISTRIBUTED TRANSACTION

55

Optimistic Concurrency Control

Locking is pessimistic. Assume instead that contention is rare:
– All updates are made to a private copy
– On commit, see if there are conflicts with other transactions started afterwards
– If not, install the changes atomically
– else ABORT

Deadlock free & maximum parallelism, but may get livelock.
– What is livelock?
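A very small backward-validation sketch of that idea; the per-item version numbers and the single-threaded "atomic" install are illustrative assumptions, and real validation rules are more involved:

    class OptimisticTxn:
        def __init__(self, db):
            self.db = db                  # shared dict: item -> (version, value)
            self.read_versions = {}       # versions observed when reading
            self.writes = {}              # private copies of updated items

        def read(self, item):
            if item in self.writes:
                return self.writes[item]
            version, value = self.db[item]
            self.read_versions[item] = version
            return value

        def write(self, item, value):
            self.writes[item] = value     # update only the private copy

        def commit(self):
            # Validation: abort if anything we read has changed since we read it.
            for item, seen in self.read_versions.items():
                if self.db[item][0] != seen:
                    return False          # conflict -> ABORT (retrying risks livelock)
            # Install all changes (atomically, in this single-threaded sketch).
            for item, value in self.writes.items():
                version = self.db.get(item, (0, None))[0]
                self.db[item] = (version + 1, value)
            return True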

Page 56: DISTRIBUTED TRANSACTION

56

Timestamping (1)

Again, no deadlock Rules:

– Each transaction receives a globally unique timestamp, TSi when started.

– Updates are not physically installed until commit.

– Every objects in the database carries the timestamp of the last transaction to read it (RTM(x)) and the last to write it (WTM(x))

Page 57: DISTRIBUTED TRANSACTION

57

Timestamping (2)

– If a transaction Ti requests an operation that conflicts with a younger transaction Tj, then Ti is restarted with a new timestamp.
– An operation from Ti is in conflict with an operation from Tj if:
  • It is a read and the object has already been updated by Tj, i.e. TSi < WTM(x): the read operation is rejected and Ti is restarted with a new timestamp. If the read is OK, set RTM(x) = max(TSi, RTM(x)).
  • It is an update and the object has already been read or updated by Tj, i.e. TSi < RTM(x) or TSi < WTM(x): the update operation is rejected and Ti is restarted with a new timestamp. If the update is OK, set WTM(x) = TSi.
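The same two rules written as a basic single-site timestamp-ordering check; RTM/WTM are kept per object as on the slide, while restarting a rejected transaction with a new timestamp is left to the caller:

    class TimestampOrdering:
        def __init__(self):
            self.rtm = {}   # item -> largest timestamp that has read it
            self.wtm = {}   # item -> largest timestamp that has written it

        def read(self, ts_i, item):
            if ts_i < self.wtm.get(item, 0):
                return False                              # updated by a younger Tj: reject
            self.rtm[item] = max(ts_i, self.rtm.get(item, 0))
            return True                                   # read accepted

        def write(self, ts_i, item):
            if ts_i < self.rtm.get(item, 0) or ts_i < self.wtm.get(item, 0):
                return False                              # read/updated by a younger Tj: reject
            self.wtm[item] = ts_i                         # write accepted
            return True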

Page 58: DISTRIBUTED TRANSACTION

58

References

[CER84] Ceri, S. and G. Pelagatti. Distributed Databases: Principles and Systems. New York: McGraw-Hill, 1984.

[ELM00] Elmasri, R. and S.B. Navathe. Fundamentals of Database Systems, 3rd ed. Reading, MA: Addison-Wesley, 2000.

