
1 CS411 Database Systems 12: Recovery obama and eric schmidt sysadmin song .

Date post: 29-Mar-2015
Upload: jase-bebb
Transcript
Page 1

CS411 Database Systems

12: Recovery

obama and eric schmidt http://www.youtube.com/watch?v=k4RRi_ntQc8
sysadmin song http://www.youtube.com/watch?v=udhd9fmOdCs
14th century sysadmin http://www.youtube.com/watch?v=8UXAF-CUmIA

Page 2

Bad things happen, but the DB contents must live on regardless.

System crashes are the most common problem.

We’ll worry about media failure later.

Page 3

On restart, some transactions should be aborted, others must be durable.

[Figure: timelines of transactions T1, T2, T3, T4, T5, with the crash marked on the time axis]

T1, T2, T3 should be durable.

T4, T5 should be aborted.

Page 4

Recovery has a big impact on buffer management.

Force T’s writes to disk at commit time?
– Poor response time.
– If not, how do we guarantee durability?

Steal dirty buffer pool pages from uncommitted Tns?
– If not, poor throughput.
– If so, what about atomicity?

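Buffer management choices (rows: Force / No Force; columns: No Steal / Steal):
– Force, No Steal: Trivial
– No Force, Steal: Desired

With Steal, if T aborts, we must undo T’s writes on disk!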

Page 5

The log helps us guarantee atomicity and durability.

Append-only file with all info needed to REDO or UNDO every write

Give it its own disk (why?)

Page 6

Undo Logging

(force, steal)

Page 7

Undo logs don’t need to save after-images

Log record types:
<START T> – transaction T has begun
<COMMIT T> – T has committed
<ABORT T> – T has aborted
<T, X, old_v> – T has updated element X, and its old value was old_v

Page 8

Undo logging has 2.5 rules.

U1: If T modifies X, then the log record <T, X, old_v> must be on disk before X is written to disk

U2: If T commits, then <COMMIT T> can’t be written to disk until all data changes by T are on disk (“early OUTPUTs”)

(There may be many pages to force, & other Tns may want them in memory.)

U2.5: Need to do the right thing when a transaction aborts (what?)

(Buffer management rule, not a logging rule)
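To make the ordering in U1 and U2 concrete, here is a minimal Python sketch that replays a trace of writes reaching disk and reports any rule violations. The trace format (("log", record) for a log record reaching disk, ("output", T, X) for T’s new value of X reaching disk) and the function name `check_undo_rules` are hypothetical, chosen only for illustration.

```python
def check_undo_rules(trace):
    """Return the U1/U2 violations in an ordered trace of writes reaching disk.

    Hypothetical trace entries:
      ("log", ("UPDATE", T, X, old_v))   the undo record <T, X, old_v> reaches disk
      ("log", ("COMMIT", T))             <COMMIT T> reaches disk
      ("output", T, X)                   T's new value of X reaches disk
    """
    logged = set()      # (T, X) pairs whose undo record is already on disk
    output = set()      # (T, X) pairs whose new value is already on disk
    violations = []
    for entry in trace:
        if entry[0] == "log":
            record = entry[1]
            if record[0] == "UPDATE":
                logged.add((record[1], record[2]))
            elif record[0] == "COMMIT":
                t = record[1]
                # U2: every X that T logged an update for must already be on disk.
                missing = sorted(x for (u, x) in logged if u == t and (u, x) not in output)
                if missing:
                    violations.append(f"U2: <COMMIT {t}> before output of {missing}")
        elif entry[0] == "output":
            _, t, x = entry
            # U1: the undo record for (T, X) must reach disk before X does.
            if (t, x) not in logged:
                violations.append(f"U1: output of {x} by {t} before its log record")
            output.add((t, x))
    return violations
```

A trace that outputs A before its <T1, A, old_v> record reaches disk is flagged under U1; one that writes <COMMIT T1> before A is output is flagged under U2.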

Page 9

Crash recovery is easy with an undo log.

1. Scan log, decide which transactions T completed:
<START T>….<COMMIT T>….
<START T>….<ABORT T>…….
<START T>………………………

2. Starting from the end of the log, undo all modifications made by incomplete transactions.

The chance of crashing during recovery is relatively high!

But undo recovery is idempotent: just restart it if it crashes.

Page 10

Detailed algorithm for undo log recovery

From the last entry in the log to the first:
– <COMMIT T>: mark T as completed
– <ABORT T>: mark T as completed
– <T,X,v>: if T is not completed then write X=v to disk, else ignore
– <START T>: ignore

So how should we handle ordinary Tn aborts?
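The backward scan above can be sketched in Python as follows. The tuple encoding of log records and the dict standing in for the disk are assumptions for illustration; a real system works on pages and forces each restored value to disk.

```python
def undo_recover(log, disk):
    """Undo-log recovery: scan from the last log record to the first.

    Hypothetical record format:
      ("START", T), ("COMMIT", T), ("ABORT", T), ("UPDATE", T, X, old_v)
    `disk` is a dict standing in for the database on disk; it is updated in place.
    """
    completed = set()
    for record in reversed(log):
        kind = record[0]
        if kind in ("COMMIT", "ABORT"):
            completed.add(record[1])      # T finished: leave its writes alone
        elif kind == "UPDATE":
            _, t, x, old_v = record
            if t not in completed:
                disk[x] = old_v           # restore the before-image
        # ("START", T): ignore
    return disk
```

Running `undo_recover` a second time on its own output leaves the disk unchanged, which is the idempotence property from the previous slide.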

Page 11

Undo recovery practice

…<T6,X6,v6>……<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>

Which actions do we undo, in which order?

What could go wrong if we undid them in a different order?

Page 12

Scanning a year-long log is SLOW and businesses lose money every minute their DB is down.

Solution: checkpoint the database periodically.

Easy version:

1. Stop accepting new transactions

2. Wait until all current transactions complete

3. Flush log to disk

4. Write a <CKPT> log record, flush

5. Resume transactions

Page 13

During undo recovery, stop at first checkpoint.

……<T9,X9,v9>……(all completed)<CKPT><START T2><START T3><START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>

(Log after <CKPT>: T2, T3, T4, T5. Log before <CKPT>: other transactions.)
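The only change from plain undo recovery is where the backward scan stops. This hypothetical sketch uses an assumed tuple encoding of log records, with ("CKPT",) for the checkpoint record, and returns the number of records it examined, to show that nothing before <CKPT> is read.

```python
def undo_recover_with_ckpt(log, disk):
    """Undo recovery that stops at the most recent quiescent <CKPT>.

    Hypothetical record format: ("START", T), ("COMMIT", T), ("ABORT", T),
    ("UPDATE", T, X, old_v), ("CKPT",). Returns how many records were examined.
    """
    completed = set()
    examined = 0
    for record in reversed(log):
        examined += 1
        kind = record[0]
        if kind == "CKPT":
            break                           # every earlier transaction had completed
        if kind in ("COMMIT", "ABORT"):
            completed.add(record[1])
        elif kind == "UPDATE" and record[1] not in completed:
            disk[record[2]] = record[3]     # restore the before-image
    return examined
```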

Page 14

This “quiescent checkpointing” isn’t good enough for 24/7 applications. Instead:

1. Write <START CKPT(T1,…,Tk)>, where T1,…,Tk are all active transactions

2. Continue normal operation

3. When all of T1,…,Tk have completed, write <END CKPT>

Page 15

Example of undo recovery with nonquiescent checkpointing

…<START CKPT T4, T5, T6>…………<END CKPT>………

Regions of this log (figure labels): T4, T5, T6, plus later transactions; earlier transactions plus T4, T5, T6; later transactions.

What would go wrong if we didn’t use <END CKPT>?

Page 16

Crash recovery algorithm with undo log, nonquiescent checkpoints.

1. Scan log backwards until the start of the latest completed checkpoint, deciding which transactions T completed:
<START T>….<COMMIT T>….
<START T>….<ABORT T>…….
<START CKPT {T…}>….<COMMIT T>….
<START CKPT {T…}>….<ABORT T>…….
<START T>………………………

2. Starting from the end of the log, undo all modifications made by incomplete transactions.
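A sketch of this algorithm, assuming log records are tuples such as ("START_CKPT", ("T4", "T5")) and ("END_CKPT",). It covers both cases: if <END CKPT> is met first, the scan stops at the matching <START CKPT>; if <START CKPT> is met first (the crash hit mid-checkpoint), the scan continues back until every still-incomplete transaction in its list has had its <START T> seen. The encoding and function name are illustrative, not from the course, and checkpoints are assumed not to overlap.

```python
def undo_recover_nonquiescent(log, disk):
    """Undo recovery with nonquiescent checkpoints (hypothetical tuple records)."""
    completed = set()
    seen_end = False
    waiting = None   # incomplete Tns from an unfinished checkpoint whose <START T> is still ahead
    for record in reversed(log):
        kind = record[0]
        if kind == "END_CKPT":
            seen_end = True
        elif kind == "START_CKPT":
            if seen_end:
                break                        # completed checkpoint: all earlier Tns finished
            # Crash during the checkpoint: keep scanning until its incomplete Tns have started.
            waiting = {t for t in record[1] if t not in completed}
            if not waiting:
                break
        elif kind in ("COMMIT", "ABORT"):
            completed.add(record[1])
        elif kind == "UPDATE":
            _, t, x, old_v = record
            if t not in completed:
                disk[x] = old_v              # restore the before-image
        elif kind == "START" and waiting is not None:
            waiting.discard(record[1])
            if not waiting:
                break                        # reached the earliest incomplete <START T>
    return disk
```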

Page 17

Redo Logging

(no force, no steal)

Page 18

Redo log entries are just slightly different from undo log entries.

<START T>, <COMMIT T>, <ABORT T>: same as before

<T, X, new_v> – T has updated element X, and its new value is new_v

Page 19

Redo logging has one rule.

R1: If T modifies X, then both <T, X, new_v> and <COMMIT T> must be written to disk before X is written to disk (“late OUTPUT”)

Don’t have to force all those dirty data pages to disk before committing!

Don’t steal dirty buffer pages from uncommitted tns!

Implicit and reasonable assumption: log records reach disk in order; otherwise terrible things will happen.

Page 20

Recovery is easy with a redo log.

1. Decide which transactions T completed:
<START T>….<COMMIT T>….
<START T>….<ABORT T>…….
<START T>………………………

2. Read log from the beginning, redo all updates of committed transactions.

The chance of crashing during recovery is relatively high!

But REDO recovery is idempotent: just restart it if it crashes.
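A minimal sketch of redo recovery, assuming a hypothetical tuple encoding of log records and a dict standing in for the disk. Because updates of uncommitted transactions never reached disk under no-steal, they are simply skipped.

```python
def redo_recover(log, disk):
    """Redo-log recovery: find committed Tns, then replay their updates oldest first.

    Hypothetical records: ("START", T), ("COMMIT", T), ("ABORT", T), ("UPDATE", T, X, new_v).
    """
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    for record in log:                       # forward, from the beginning of the log
        if record[0] == "UPDATE" and record[1] in committed:
            _, _t, x, new_v = record
            disk[x] = new_v                  # reinstall the after-image
    return disk
```

Replaying in log order matters when two committed transactions wrote the same element: the later after-image must be the one left on disk.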

Page 21

Example of redo recovery

<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>……

Which actions do we redo, in which order?

What could go wrong if we redid them in a different order?

Page 22

Nonquiescent checkpointing is trickier with a redo log than an undo log

1. Write a <START CKPT(T1,…,Tk)>, where T1,…,Tk are the active transactions

2. Flush to disk all dirty data pages of transactions committed by the time the checkpoint started, while continuing normal operation

3. After that, write <END CKPT>

dirty = written

Page 23

What exactly does “dirty” mean?

• When you are talking about buffer management and which buffers you can steal, a dirty page is a data page in memory that has been modified but not yet sent back to disk.

• When you are talking about concurrency control, a dirty page is a data page in memory that has been modified but not yet committed. A dirty read is a read of a dirty page.

Either way, the dirty pages are the ones that can get you in trouble.

Page 24

Example of redo recovery with nonquiescent checkpointing

…<START T1>…<COMMIT T1>……<START CKPT T4, T5, T6>……<END CKPT>……<START CKPT T9, T10>…

1. Look for the last <END CKPT>.

2. Redo from <START T>, for committed T in {T4, T5, T6}.

3. Normal redo for committed Tns that started after this point.

(All data written by T1 is known to be on disk.)
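One way to code up these steps in Python, under a hypothetical tuple encoding of log records. It assumes a well-formed log in which checkpoints do not overlap, and treats a log with no completed checkpoint as needing plain redo from the beginning.

```python
def redo_recover_nonquiescent(log, disk):
    """Redo recovery using the last completed nonquiescent checkpoint.

    Hypothetical records: ("START", T), ("COMMIT", T), ("UPDATE", T, X, new_v),
    ("START_CKPT", (T, ...)), ("END_CKPT",).
    """
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    start_pos = {r[1]: i for i, r in enumerate(log) if r[0] == "START"}
    ends = [i for i, r in enumerate(log) if r[0] == "END_CKPT"]
    if ends:
        # 1. The last <END CKPT> and the <START CKPT> that precedes it.
        ckpt = max(i for i in range(ends[-1]) if log[i][0] == "START_CKPT")
        active = set(log[ckpt][1])
    else:
        ckpt, active = -1, set()             # no completed checkpoint: redo everything
    # 2. Start at the earliest <START T> of a committed T in the checkpoint's list.
    begin = min([start_pos[t] for t in active & committed] + [ckpt + 1])
    for r in log[begin:]:
        if r[0] != "UPDATE" or r[1] not in committed:
            continue
        t = r[1]
        # 3. Redo listed Tns and Tns that started after <START CKPT>; Tns that
        #    committed before the checkpoint started were flushed by it.
        if t in active or start_pos[t] > ckpt:
            disk[r[2]] = r[3]
    return disk
```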

Page 25

But neither undo nor redo logging matches what we would like to have for buffer management

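Buffer management choices (rows: Force / No Force; columns: No Steal / Steal):
– Force, No Steal: Trivial
– Force, Steal: Undo Logging
– No Force, No Steal: Redo Logging
– No Force, Steal: Desired (use undo/redo logging to attain this nirvana)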

Page 26

Redo/undo logs save both before-images and after-images.

<START T>, <COMMIT T>, <ABORT T>

<T, X, old_v, new_v> – T has written element X; its old value was old_v, and its new value is new_v

Page 27

Undo/redo recovery has 1.5 rules.

1. Must force the log record for an update to disk before the corresponding data page goes to disk.

As usual, T committed iff <T commits> is on disk.

1.5: Need to do the right thing when a transaction aborts (what?)

Item X can be updated on disk once <T wrote X> is on disk, either before or after <T commits> is on disk (i.e., early or late OUTPUT).

(Rule 1 is “write-ahead logging”.)

Page 28

Recovery is more complex with undo/redo logging.

1. Redo all committed transactions, starting at the beginning of the log

2. Undo all incomplete transactions, starting from the end of the log

<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>……

REDO runs forward; UNDO runs backward.

“incomplete” = started & not committed or aborted

How do we know these undos won’t undo some committed writes?
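The two passes can be sketched as follows, assuming a hypothetical tuple encoding in which each update record carries both images. Aborted transactions are neither redone nor undone here, on the assumption that abort processing already restored their before-images during normal operation.

```python
def undo_redo_recover(log, disk):
    """Undo/redo recovery without checkpoints.

    Hypothetical records: ("START", T), ("COMMIT", T), ("ABORT", T),
    ("UPDATE", T, X, old_v, new_v).
    """
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    finished = committed | {r[1] for r in log if r[0] == "ABORT"}
    # 1. Redo committed transactions, starting at the beginning of the log.
    for r in log:
        if r[0] == "UPDATE" and r[1] in committed:
            disk[r[2]] = r[4]                # after-image
    # 2. Undo incomplete transactions, starting from the end of the log.
    for r in reversed(log):
        if r[0] == "UPDATE" and r[1] not in finished:
            disk[r[2]] = r[3]                # before-image
    return disk
```

With steal and no force, the disk may hold any mix of old and new values; after both passes, committed writes are present and incomplete ones are gone.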

Page 29

Algorithm for non-quiescent checkpoint for undo/redo

1. Write <start checkpoint, list of all active transactions> to log

2. Flush log to disk

3. Write to disk all dirty buffers, whether or not their transaction has committed (this implies some log records may need to be written to disk (WAL))

4. Write <end checkpoint> to log

5. Flush log to disk

[Figure: log from <start checkpoint, active Tns are T1, T2, …> to <end checkpoint>, with “Flush dirty buffer pool pages” in between and pointers to the active Tns]

Pointers are one of many tricks to speed up future undos.

Page 30

Algorithm for undo/redo recovery with nonquiescent checkpoint

1. Backwards undo pass (end of log to start of last completed checkpoint)
a. C = transactions that committed after the checkpoint started
b. Undo actions of transactions that (are in A or started after the checkpoint started) and (are not in C), where A is the checkpoint’s list of active transactions

2. Undo remaining actions by incomplete transactions
a. Follow undo chains for transactions in (checkpoint active list) – C

3. Forward pass (start of last completed checkpoint to end of log)
a. Redo actions of transactions in C

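[Figure: log with the active Tns’ earlier records, then <start checkpoint, A=active Tns> … <end checkpoint>; the UNDO pass runs backward from the end of the log and the REDO pass runs forward]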
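A simplified Python rendering of these three steps, assuming a hypothetical tuple encoding of log records, a log containing at least one completed checkpoint, and that aborted transactions were already rolled back during normal operation. Because of that last assumption it undoes only transactions with neither <COMMIT> nor <ABORT>, which departs slightly from step 1b. Instead of following undo chains (prevLSN pointers), step 2 simply keeps scanning backward until each pending transaction’s <START T> has been seen.

```python
def undo_redo_recover_ckpt(log, disk):
    """Undo/redo recovery from the last completed nonquiescent checkpoint.

    Hypothetical records: ("START", T), ("COMMIT", T), ("ABORT", T),
    ("UPDATE", T, X, old_v, new_v), ("START_CKPT", (T, ...)), ("END_CKPT",).
    """
    end = max(i for i, r in enumerate(log) if r[0] == "END_CKPT")
    start = max(i for i in range(end) if log[i][0] == "START_CKPT")
    active = set(log[start][1])                                        # A
    finished = {r[1] for r in log if r[0] in ("COMMIT", "ABORT")}
    committed_after = {r[1] for r in log[start:] if r[0] == "COMMIT"}  # C
    # 1. Backward undo pass from the end of the log to <START CKPT>.
    for r in reversed(log[start:]):
        if r[0] == "UPDATE" and r[1] not in finished:
            disk[r[2]] = r[3]
    # 2. Keep undoing, further back, the active-list Tns that never finished.
    pending = active - finished
    for r in reversed(log[:start]):
        if not pending:
            break
        if r[0] == "UPDATE" and r[1] in pending:
            disk[r[2]] = r[3]
        elif r[0] == "START":
            pending.discard(r[1])
    # 3. Forward redo pass from <START CKPT>; earlier writes were flushed by the checkpoint.
    for r in log[start:]:
        if r[0] == "UPDATE" and r[1] in committed_after:
            disk[r[2]] = r[4]
    return disk
```

On the two examples that follow, this undoes A, B, and C when T1 never commits, and redoes only B and C when it does.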

Page 31

Examples: what to do at recovery time?

…T1 wrote A, …
…checkpoint start (T1 active)
…T1 wrote B, …
…checkpoint end
…T1 wrote C, …
(no <T1 commit>)

Undo T1 (undo A, B, C)

Page 32

Examples: what to do at recovery time?

…T1 wrote A, …
…checkpoint start (T1 active)
…T1 wrote B, …
…checkpoint end
…T1 wrote C, …
…T1 commit

Redo T1: (redo B, C); A was already flushed to disk by the checkpoint

Page 33

Real world actions

E.g., dispense cash at ATM

Ti = a1 … aj … an, where action aj dispenses the cash ($)

“Solution”:
(1) try to make idempotent
(2) execute real-world actions after commit

Why are these a problem from a DB perspective?
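As an illustration of both “solutions”, here is a hypothetical cash machine that tags each withdrawal with a unique request ID, remembers which IDs it has paid out, and is only asked to dispense after the transaction commits. The class and its fields are invented for this sketch. It quietly assumes the set of paid-out IDs survives crashes and is updated atomically with the cash leaving the machine, which is exactly why the slide puts “Solution” in quotes.

```python
class CashMachine:
    """Hypothetical ATM whose dispense action is made idempotent with request IDs."""

    def __init__(self):
        self.paid_out = set()      # request IDs already dispensed (assumed durable)
        self.cash_dispensed = 0

    def dispense(self, request_id, amount):
        if request_id in self.paid_out:
            return False           # replayed after a crash: do nothing
        self.paid_out.add(request_id)
        self.cash_dispensed += amount
        return True


def finish_withdrawal(machine, request_id, amount, committed):
    """Perform the real-world action only once the DB transaction has committed."""
    if not committed:
        return False
    return machine.dispense(request_id, amount)
```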

Page 34

PHYSICAL DISASTERS

Page 35

These recovery algorithms won’t help you if your disk fails.

Solution: careful replication!

Page 36

Example 1: Triple modular redundancy

Keep 3 copies on separate disks
• Output(X) --> three outputs
• Input(X) --> three inputs + vote

[Figure: Copy 1, Copy 2, Copy 3]
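A toy Python model of triple modular redundancy, with three dicts standing in for the three separate disks (an assumption for illustration; values must be hashable so they can be counted). A read takes the majority value and fails if no two copies agree.

```python
from collections import Counter


class TripleModularStore:
    """Hypothetical store keeping three copies of every element."""

    def __init__(self):
        self.copies = [{}, {}, {}]          # stand-ins for three separate disks

    def write(self, x, value):
        for disk in self.copies:            # Output(X) --> three outputs
            disk[x] = value

    def read(self, x):
        # Input(X) --> three inputs + vote
        votes = Counter(disk.get(x) for disk in self.copies)
        value, count = votes.most_common(1)[0]
        if count < 2:
            raise OSError(f"no majority among the copies of {x}")
        return value
```

One bad copy is outvoted by the other two; two bad copies that disagree with each other and with the good copy leave no majority.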

Page 37

Example 2: Redundant writes, single reads

Keep N copies on separate disks
• Output(X) --> N outputs
• Input(X) --> Input one copy
- if ok, done
- else try another one

Assumes bad data can be detected (traditional but false)

[Figure: the N copies, each on its own disk]
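A sketch of redundant writes with single reads. The slide does not say how bad data is detected; this example assumes a CRC-32 checksum (Python’s `zlib.crc32`) stored next to each copy, and dicts standing in for the N disks.

```python
import zlib


class RedundantStore:
    """Hypothetical store: write all N copies, read one and fall back on a bad checksum."""

    def __init__(self, n):
        self.copies = [{} for _ in range(n)]    # stand-ins for N separate disks

    def write(self, x, data):
        record = (data, zlib.crc32(data))
        for disk in self.copies:                # Output(X) --> N outputs
            disk[x] = record

    def read(self, x):
        for disk in self.copies:                # Input(X) --> input one copy
            data, checksum = disk[x]
            if zlib.crc32(data) == checksum:
                return data                     # if ok, done
        raise OSError(f"every copy of {x} failed its checksum")   # else try another one
```

The caveat on the slide still applies: corruption that happens to leave a matching checksum (for example, a buggy write that stores wrong data with a freshly computed CRC) goes unnoticed.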

Page 38

Example 3: DB dump + log

[Figure: backup database, active database, log]

If active database is lost,
– restore active database from backup
– bring up-to-date using redo entries in log
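A minimal sketch of media recovery from a dump plus the log. It assumes the dump is a consistent snapshot (containing only committed data) taken when the log had `dump_position` records, and uses a hypothetical tuple encoding of redo records; the function name is illustrative.

```python
def restore_from_dump(backup, log, dump_position):
    """Rebuild a lost active database from a backup plus the redo entries logged since.

    Hypothetical records: ("START", T), ("COMMIT", T), ("UPDATE", T, X, new_v).
    """
    db = dict(backup)                                   # restore active DB from backup
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    for r in log[dump_position:]:                       # bring it up to date
        if r[0] == "UPDATE" and r[1] in committed:
            db[r[2]] = r[3]
    return db
```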

Page 39

When can log be discarded?

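[Figure: the log along a time axis, with the db dump, a checkpoint, and the last needed undo marked on it. Parts of the log are labeled “not needed for media recovery”, “not needed for undo after system failure”, and “not needed for redo after system failure”.]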

Page 40

The real picture: what’s stored where

DB: data pages, each with a pageLSN (LSN of last write to that data page); master record

RAM: Xact Table (lastLSN, status); Dirty Page Table (recLSN); flushedLSN

LOG: log records, each identified by an LSN (log sequence number), with fields prevLSN, XID, type, length, pageID, offset, before-image, after-image
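A Python sketch of two uses of these fields: the write-ahead rule expressed with LSNs (a page may go to disk only when its pageLSN is no greater than flushedLSN) and walking a transaction’s prevLSN chain backward from its lastLSN for undo. The field names follow the figure; the classes and functions are hypothetical, and fields such as length and offset are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int
    prev_lsn: Optional[int]      # previous record of the same transaction (undo chain)
    xid: str
    kind: str                    # record type, e.g. "update" or "commit"
    page_id: int
    before_image: bytes = b""
    after_image: bytes = b""


@dataclass
class Page:
    page_id: int
    page_lsn: int = 0            # LSN of last write to this page


def can_flush(page: Page, flushed_lsn: int) -> bool:
    """WAL: the log must be on disk up to the page's last update before the page goes out."""
    return page.page_lsn <= flushed_lsn


def undo_chain(log: Dict[int, LogRecord], last_lsn: Optional[int]) -> List[int]:
    """Follow prevLSN pointers from a transaction's lastLSN back to its first record."""
    lsns = []
    lsn = last_lsn
    while lsn is not None:
        lsns.append(lsn)
        lsn = log[lsn].prev_lsn
    return lsns
```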

Page 41

Summary of Logging/Recovery

• Recovery manager guarantees atomicity & durability, two of the ACID properties.

• Redo logging and undo logging are simple but make the system too slow in practice for serious applications.

• Use write-ahead logging with undo/redo logging to speed up the system (by allowing STEAL/NO-FORCE) without sacrificing correctness.

Page 42

Summary, Cont.

• Checkpointing: A quick way to limit the amount of log to scan on recovery. Nonquiescent checkpoints are especially useful.

• Recovery works in 3 phases:
– Analysis: Forward from checkpoint.
– Redo: Forward.
– Undo: Backward.

