Date post: | 21-Dec-2015 |
ICS 214B: Transaction Processing and Distributed Data Management
Distributed Database Systems
ICS214B Notes 11 2
So far: Centralized DB systems
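Software layers (top to bottom):
  Application
  SQL Front End
  Query Processor
  Transaction Proc.
  File Access
[Figure: the software runs on a single processor (P) with its memory (M) ...]
• Simplifications:
  – single front end
  – one place to keep locks
  – if processor fails, system fails, ...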
ICS214B Notes 11 3
Next: distributed database systems
• Multiple processors ( + memories)
• Heterogeneity and autonomy of “components”
ICS214B Notes 11 4
Why do we need Distributed Databases?
• Example: Big Corp. has offices in London, New York, and Hong Kong.
• Employee data:
  – EMP(ENO, NAME, TITLE, SALARY, …)
• Where should the employee data table reside?
ICS214B Notes 11 5
Big Corp. Data Access Pattern
• Mostly, employee data is managed at the office where the employee works
  – E.g., payroll, benefits, hire and fire
• Periodically, Big Corp. needs consolidated access to employee data
  – E.g., Big Corp. changes benefit plans and that affects all employees.
  – E.g., Annual bonus depends on global net profit.
ICS214B Notes 11 6
[Figure: a single EMP table stored at one site; the London, New York, and Hong Kong payroll apps all access it over the Internet.]
Problem: NY and HK payroll apps run very slowly!
ICS214B Notes 11 7
[Figure: EMP split into LondonEmp, NYEmp, and HKEmp; each fragment is stored at its own office next to the local payroll app, and the offices are connected over the Internet.]
Much better!!
ICS214B Notes 11 8
[Figure: LondonEmp, NYEmp, and HKEmp stored at their own offices, connected over the Internet; an Annual Bonus app now runs in London alongside the London payroll app.]
Distribution provides opportunities for parallel execution
ICS214B Notes 11 10
[Figure: the three databases now hold Lon+NY Emp, NY+HK Emp, and HK+Lon Emp, so every fragment is stored at two sites; payroll apps run at each office and the Annual Bonus app runs in London.]
Replication improves availability
ICS214B Notes 11 11
Heterogeneity and Autonomy
[Figure: one Application on top of three different kinds of sources: an RDBMS, Files, and a Stock ticker tape. Data labels: Portfolio; History of dividends, ratios, ...]
ICS214B Notes 11 12
• Data management with multiple processors and possible autonomy, heterogeneity
  – Impact on:
      Data organization
      Query processing
      Access structures
      Concurrency control
      Recovery
ICS214B Notes 11 13
• Transaction monitors
  – Coordinate transaction execution
      Multiple DBMSs
      High performance
  – Have workflow facilities
  – Manage communications with client “terminals”
ICS214B Notes 11 14
DB architectures
(1) Shared memory
[Figure: processors P, P, P, ... all attached to one shared memory M.]
ICS214B Notes 11 15
DB architectures
(2) Shared disk
[Figure: processors P, P, P, ..., each with its own memory M, all attached to the shared disks.]
ICS214B Notes 11 16
DB architectures
(3) Shared nothing
[Figure: nodes, each with its own processor P and memory M (and its own disks), connected only by a network.]
ICS214B Notes 11 17
DB architectures
(4) Hybrid example – Hierarchical or Clustered
[Figure: several clusters, each a group of processors P, P, P, ... sharing one memory M.]
ICS214B Notes 11 18
Issues for selecting architecture
• Reliability
• Scalability
• Geographic distribution of data
• Data “clusters”
• Performance
• Cost
ICS214B Notes 11 19
Parallel or distributed DB system?
• More similarities than differences!
ICS214B Notes 11 20
• Typically, parallel DBs:
  – Fast interconnect
  – Homogeneous software
  – High performance is goal
  – Transparency is goal
ICS214B Notes 11 21
• Typically, distributed DBs:
  – Geographically distributed
  – Data sharing is goal (may run into heterogeneity, autonomy)
  – Disconnected operation possible
ICS214B Notes 11 22
Distributed Database Challenges
• Distributed Database Design
  – Deciding what data goes where
  – Depends on data access patterns of major applications
  – Two subproblems:
      Fragmentation: partition tables into fragments
      Allocation: allocate fragments to nodes
ICS214B Notes 11 23
Distributed Database Challenges
• Distributed Query Processing
  – Centralized query plan goal: minimize number of disk I/Os
  – Additional factors in distributed scenario:
      Communication costs
      Opportunity for parallelism
  – Space of possible query plans is much larger!
ICS214B Notes 11 24
Distributed Database Challenges
• Distributed Concurrency Control
  – Transactions span nodes
      Must be globally serializable
  – Two main approaches:
      Locking
      Timestamps
  – Distributed Deadlock Management
  – Multiple data copies: need to be kept in sync when updates occur
ICS214B Notes 11 25
Distributed Database Challenges
• Reliability of Distributed Databases
  – Centralized database failure model: processor fails
  – Distributed database failure model:
      One or more processors may fail
      Network may fail
      Network may be partitioned
  – Data must be kept in sync
ICS214B Notes 11 26
To illustrate synchronization problems: “Two Generals” Problem
ICS214B Notes 11 27
The one general problem (Trivial!)
[Figure: a single general G commands the Troops on the Battlefield.]
ICS214B Notes 11 28
The two generals problem:
[Figure: the Blue army (general Blue G) and the Red army (general Red G) face a common Enemy; the two generals communicate only through messengers.]
ICS214B Notes 11 29
Rules:
• Blue and red army must attack at same time
• Blue and red generals synchronize through messengers
• Messengers can be lost
ICS214B Notes 11 30
Distributed Database Challenges
• Heterogeneity
[Figure: one Application on top of an RDBMS, Files, and a Stock ticker tape. Data labels: Portfolio; History of dividends, ratios, ...]
ICS214B Notes 11 31
Distributed Database Challenges
• Autonomy
  – Example: unable to get statistics for query optimization
  – Example: blue general may have mind of his (or her) own!
ICS214B Notes 11 32
• Distributed DB Design
ICS214B Notes 11 33
Distributed DB Design
Top-down approach:
  - have DB…
  - how to split it and allocate it to the sites
Bottom-up approach:
  - multi-database (possibly heterogeneous, autonomous)
  - no design issues!
ICS214B Notes 11 34
Two issues in DDB design:
• Fragmentation
• Allocation
Note: issues not independent, but will cover separately
ICS214B Notes 11 35
Motivation:
Employee relation E(#, name, loc, sal, …)
Two sites: Sa, Sb
  40% of queries (issued at Sa):   Qa: select * from E where loc=Sa and …
  40% of queries (issued at Sb):   Qb: select * from E where loc=Sb and ...
ICS214B Notes 11 36
E:
  #   NM     Loc  Sal
  5   Joe    Sa   10
  7   Sally  Sb   25
  8   Tom    Sa   15
  ..  ..     ..   ..

F:
  At Sa:
  #   NM     Loc  Sal
  5   Joe    Sa   10
  8   Tom    Sa   15
  ..  ..     ..   ..

  At Sb:
  #   NM     Loc  Sal
  7   Sally  Sb   25
  ..  ..     ..   ..
ICS214B Notes 11 37
F = { F1, F2 }
$F_1 = \sigma_{loc=Sa}(E)$        $F_2 = \sigma_{loc=Sb}(E)$
called primary horizontal fragmentation
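
As a rough illustration of primary horizontal fragmentation, the sketch below applies a selection on a local attribute (loc) to the sample rows of E. The dictionary keys and the helper name `select` are choices made for this example only.

```python
# Employee relation E(#, name, loc, sal) as a list of dicts
# (sample rows taken from the example table).
E = [
    {"no": 5, "name": "Joe", "loc": "Sa", "sal": 10},
    {"no": 7, "name": "Sally", "loc": "Sb", "sal": 25},
    {"no": 8, "name": "Tom", "loc": "Sa", "sal": 15},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Primary horizontal fragments: each is defined on an attribute of E itself.
F1 = select(E, lambda t: t["loc"] == "Sa")  # to be stored at site Sa
F2 = select(E, lambda t: t["loc"] == "Sb")  # to be stored at site Sb

print([t["name"] for t in F1])
print([t["name"] for t in F2])
```

With this split, Qa touches only F1 and Qb touches only F2, so each query can be answered at the site that issues it.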
ICS214B Notes 11 38
Fragmentation
• Horizontal [Figure: R cut into groups of rows]
  – Primary: depends on local attributes
  – Derived: depends on foreign relation
• Vertical [Figure: R cut into groups of columns]
ICS214B Notes 11 39
Three common horizontal fragmentation techniques
• Round robin (used mostly in parallel DBs)
• Hash partitioning (used mostly in parallel DBs)
• Range partitioning (used in parallel DBs and distributed DBs)
ICS214B Notes 11 40
• Round robin
  Relation R = t1, t2, t3, t4, ... spread over disks D0, D1, D2:
    D0: t1, t4, ...
    D1: t2, t5, ...
    D2: t3, ...
• Evenly distributes data
• Good for scanning full relation
• Not good for point or range queries
• Not suitable for databases distributed over WAN
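
A minimal sketch of round-robin placement, assuming tuples arrive in a fixed order and disks are numbered from 0. The function name `round_robin` is made up for this example.

```python
def round_robin(tuples, n_disks):
    """Place the i-th tuple on disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


disks = round_robin(["t1", "t2", "t3", "t4", "t5"], 3)
for d, contents in enumerate(disks):
    print(f"D{d}: {contents}")
```

Placement depends only on arrival order, not on any attribute value, which is why a point or range query has to look at every disk.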
ICS214B Notes 11 41
• Hash partitioning
  Relation R spread over disks D0, D1, D2 by hashing the key:
    t1: h(k1)=2  → D2
    t2: h(k2)=0  → D0
    t3: h(k3)=0  → D0
    t4: h(k4)=1  → D1
    ...
• Good for point queries on key; also for joins on key
• Not good for range queries; point queries not on key
• If hash function good, even distribution
• Not suitable for databases distributed over a WAN
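
A small sketch of hash partitioning, assuming a CRC32-based hash (chosen here only because it is deterministic across runs; the slides do not name a hash function). The helper names are hypothetical.

```python
import zlib


def disk_for(key, n_disks):
    """Deterministic hash of the key, reduced modulo the number of disks."""
    return zlib.crc32(str(key).encode("utf-8")) % n_disks


def hash_partition(tuples, key_of, n_disks):
    """Place each tuple on the disk chosen by hashing its key."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[disk_for(key_of(t), n_disks)].append(t)
    return disks


def point_lookup(disks, key, key_of):
    """A point query on the key needs to visit exactly one disk."""
    d = disk_for(key, len(disks))
    return [t for t in disks[d] if key_of(t) == key]


rows = [(5, "Joe"), (7, "Sally"), (8, "Tom")]
disks = hash_partition(rows, key_of=lambda t: t[0], n_disks=3)
print(point_lookup(disks, 7, key_of=lambda t: t[0]))
```

A range query on the key gets no such help: neighbouring key values hash to unrelated disks, so all disks must be searched.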
ICS214B Notes 11 42
• Range partitioning
  Relation R spread over disks D0, D1, D2 using partitioning vector (V0, V1) = (4, 7):
    t1: A=5  → D1
    t2: A=8  → D2
    t3: A=2  → D0
    t4: A=3  → D0
    ...
• Good for point queries on A; also for joins on A
• Good for some range queries on A
• Need to select good vector: else unbalanced
  – data skew, execution skew
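
The sketch below shows range partitioning with the vector (4, 7) from the example. The boundary convention (a value equal to a vector entry goes to the lower partition) is an assumption; the example values never hit a boundary, so the slides do not settle it.

```python
from bisect import bisect_left


def disk_for(value, vector):
    """Index of the partition holding `value`.

    Assumption: a value equal to a boundary goes to the lower partition.
    """
    return bisect_left(vector, value)


def range_partition(tuples, attr_of, vector):
    """Place each tuple on the disk whose range contains its attribute A."""
    disks = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        disks[disk_for(attr_of(t), vector)].append(t)
    return disks


def disks_for_range(lo, hi, vector):
    """Disks that a range query lo <= A <= hi has to visit."""
    return list(range(disk_for(lo, vector), disk_for(hi, vector) + 1))


vector = [4, 7]
rows = [("t1", 5), ("t2", 8), ("t3", 2), ("t4", 3)]
disks = range_partition(rows, attr_of=lambda t: t[1], vector=vector)
print(disks)
print(disks_for_range(2, 3, vector))
```

A badly chosen vector puts most tuples (data skew) or most query work (execution skew) on one disk, which is the balancing concern noted above.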
ICS214B Notes 11 43
Which are good fragmentations?
Example:
F = { F1, F2 }
$F_1 = \sigma_{sal<10}(E)$        $F_2 = \sigma_{sal>20}(E)$
Problem: Some tuples lost! (those with $10 \le sal \le 20$)
ICS214B Notes 11 44
Which are good fragmentations?
Second example:
F = { F3, F4 }
$F_3 = \sigma_{sal<10}(E)$        $F_4 = \sigma_{sal>5}(E)$
Tuples with 5 < sal < 10 are duplicated...
ICS214B Notes 11 45
Better design
Example: F = { F5, F6, F7 }
$F_5 = \sigma_{sal \le 5}(E)$        $F_6 = \sigma_{5 < sal < 10}(E)$        $F_7 = \sigma_{sal \ge 10}(E)$
Then replicate F6 if convenient (part of allocation problem)
ICS214B Notes 11 46
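Desired properties for fragmentation
$R \Rightarrow F = \{F_1, F_2, \ldots, F_n\}$
• Completeness
  – For every data item $x \in R$, $\exists F_i \in F$ such that $x \in F_i$
• Disjointness
  – $x \in F_i \Rightarrow \neg\exists F_j$ such that $x \in F_j$, $i \neq j$
• Reconstruction
  – There is a function $g$ such that $R = g(F_1, F_2, \ldots, F_n)$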
ICS214B Notes 11 47
Desired properties for horizontal fragmentation
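$R \Rightarrow F = \{F_1, F_2, \ldots, F_n\}$
• Completeness
  – For every tuple $t \in R$, $\exists F_i \in F$ such that $t \in F_i$
• Disjointness
  – $t \in F_i \Rightarrow \neg\exists F_j$ such that $t \in F_j$, $i \neq j$
• Reconstruction (can safely ignore)
  – Completeness $\Rightarrow R = \bigcup_{F_i \in F} F_i$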
ICS214B Notes 11 48
How do we get completeness and disjointness?
(1) Check it “manually”!
e.g., $F_1 = \sigma_{sal<10}(E)$ ; $F_2 = \sigma_{sal \ge 10}(E)$
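
The "manual" check can be mechanised on a sample instance. The sketch below reports tuples that fall into no fragment (completeness violated) or into more than one (disjointness violated). It tests only the tuples it is given, so a clean result is evidence, not a proof, and the salary values are invented for the example.

```python
def check_fragmentation(relation, predicates):
    """Return (lost, duplicated) tuples for fragments defined by selection predicates."""
    lost = [t for t in relation if not any(p(t) for p in predicates)]
    duplicated = [t for t in relation if sum(1 for p in predicates if p(t)) > 1]
    return lost, duplicated


# Hypothetical sample instance of E, reduced to the sal attribute.
E = [{"sal": s} for s in (3, 5, 7, 10, 15, 20, 25)]

designs = {
    "sal<10, sal>20": [lambda t: t["sal"] < 10, lambda t: t["sal"] > 20],
    "sal<10, sal>5": [lambda t: t["sal"] < 10, lambda t: t["sal"] > 5],
    "sal<10, sal>=10": [lambda t: t["sal"] < 10, lambda t: t["sal"] >= 10],
}

results = {}
for name, preds in designs.items():
    lost, dup = check_fragmentation(E, preds)
    results[name] = ([t["sal"] for t in lost], [t["sal"] for t in dup])
    print(name, "lost:", results[name][0], "duplicated:", results[name][1])
```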
ICS214B Notes 11 49
How do we get completeness and disjointness?
(2) “Automatically” generate fragments with these properties
• Horizontal fragments are defined by selection predicates
• Generate a set of selection predicates with the desired properties
ICS214B Notes 11 50
Example of generation
• Say queries use predicates: A<10, A>5, Loc = SA, Loc = SB
• Next:
  - generate “minterm” predicates
  - eliminate useless ones
• Given simple predicates $Pr = \{p_1, p_2, \ldots, p_n\}$, minterm predicates are of the form
  $p_1^* \wedge p_2^* \wedge \cdots \wedge p_n^*$
  where $p_k^*$ is $p_k$ or is $\neg p_k$
ICS214B Notes 11 51
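Minterm predicates (part I)
(1)–(4) reduce to $5 < A < 10$:
(1) $A<10 \wedge A>5 \wedge Loc=SA \wedge Loc=SB$
(2) $A<10 \wedge A>5 \wedge Loc=SA \wedge \neg(Loc=SB)$
(3) $A<10 \wedge A>5 \wedge \neg(Loc=SA) \wedge Loc=SB$
(4) $A<10 \wedge A>5 \wedge \neg(Loc=SA) \wedge \neg(Loc=SB)$
(5)–(8) reduce to $A \le 5$:
(5) $A<10 \wedge \neg(A>5) \wedge Loc=SA \wedge Loc=SB$
(6) $A<10 \wedge \neg(A>5) \wedge Loc=SA \wedge \neg(Loc=SB)$
(7) $A<10 \wedge \neg(A>5) \wedge \neg(Loc=SA) \wedge Loc=SB$
(8) $A<10 \wedge \neg(A>5) \wedge \neg(Loc=SA) \wedge \neg(Loc=SB)$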
ICS214B Notes 11 52
Minterm predicates (part II)
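(9)–(12) reduce to $A \ge 10$:
(9) $\neg(A<10) \wedge A>5 \wedge Loc=SA \wedge Loc=SB$
(10) $\neg(A<10) \wedge A>5 \wedge Loc=SA \wedge \neg(Loc=SB)$
(11) $\neg(A<10) \wedge A>5 \wedge \neg(Loc=SA) \wedge Loc=SB$
(12) $\neg(A<10) \wedge A>5 \wedge \neg(Loc=SA) \wedge \neg(Loc=SB)$
(13)–(16) are unsatisfiable ($A \ge 10$ and $A \le 5$ cannot both hold):
(13) $\neg(A<10) \wedge \neg(A>5) \wedge Loc=SA \wedge Loc=SB$
(14) $\neg(A<10) \wedge \neg(A>5) \wedge Loc=SA \wedge \neg(Loc=SB)$
(15) $\neg(A<10) \wedge \neg(A>5) \wedge \neg(Loc=SA) \wedge Loc=SB$
(16) $\neg(A<10) \wedge \neg(A>5) \wedge \neg(Loc=SA) \wedge \neg(Loc=SB)$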
ICS214B Notes 11 53
Final fragments:
F2: $5 < A < 10 \wedge Loc=SA$
F3: $5 < A < 10 \wedge Loc=SB$
F6: $A \le 5 \wedge Loc=SA$
F7: $A \le 5 \wedge Loc=SB$
F10: $A \ge 10 \wedge Loc=SA$
F11: $A \ge 10 \wedge Loc=SB$
ICS214B Notes 11 54
Note: elimination of useless fragments depends on application semantics:
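e.g.: if Loc could be $\neq SA$, $\neq SB$, we need to add fragments
F4: $5 < A < 10 \wedge Loc \neq SA \wedge Loc \neq SB$
F8: $A \le 5 \wedge Loc \neq SA \wedge Loc \neq SB$
F12: $A \ge 10 \wedge Loc \neq SA \wedge Loc \neq SB$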
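
The generate-and-eliminate procedure can be sketched as follows. Usefulness is approximated here by checking whether any tuple of a small sample domain satisfies the minterm; that domain (a handful of A values, Loc restricted to SA and SB) is an assumption standing in for the application semantics.

```python
from itertools import product

# Simple predicates used by the queries, as (label, test) pairs.
simple = [
    ("A<10", lambda t: t["A"] < 10),
    ("A>5", lambda t: t["A"] > 5),
    ("Loc=SA", lambda t: t["Loc"] == "SA"),
    ("Loc=SB", lambda t: t["Loc"] == "SB"),
]


def minterms(preds):
    """Yield (number, label, test) for every conjunction of p_k or NOT p_k."""
    for number, signs in enumerate(product([True, False], repeat=len(preds)), start=1):
        label = " AND ".join(
            name if positive else f"NOT({name})"
            for (name, _), positive in zip(preds, signs)
        )

        def test(t, signs=signs):
            # A tuple satisfies the minterm if each simple predicate has the required truth value.
            return all(p(t) == positive for (_, p), positive in zip(preds, signs))

        yield number, label, test


# Assumed sample domain: Loc can only be SA or SB.
domain = [{"A": a, "Loc": loc} for a in (0, 5, 6, 9, 10, 20) for loc in ("SA", "SB")]

all_minterms = list(minterms(simple))
useful = [(n, label, test) for n, label, test in all_minterms
          if any(test(t) for t in domain)]

for n, label, _ in useful:
    print(f"F{n}: {label}")
```

Adding a third location to the sample domain makes minterms 4, 8, and 12 satisfiable as well, which is the dependence on application semantics noted above.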
ICS214B Notes 11 55
Why does this algorithm work?
• Must prove that the set of fragments is:– Complete– Disjoint
ICS214B Notes 11 56
Summary
• Given simple predicates $Pr = \{p_1, p_2, \ldots, p_n\}$, minterm predicates are
  $M = \{\, m \mid m = \bigwedge_{p_k \in Pr} p_k^*,\ 1 \le k \le n \,\}$
  where $p_k^*$ is $p_k$ or is $\neg p_k$
• Fragments $\sigma_m(R)$ for all $m \in M$ are complete and disjoint
ICS214B Notes 11 57
Distributed commit problem
[Figure: Transaction T runs at three sites, performing actions a1, a2 at the first, a3 at the second, and a4, a5 at the third.]
Commit must be atomic
ICS214B Notes 11 58
Distributed commit problem
• Commit must be atomic, despite:
  – site failures
  – communication failures
  – network partitions
  – timeout failures
• Solution: Atomic commit protocol (ACP)
  – must ensure that, despite failures, once all failures are repaired the transaction commits or aborts at all sites.
• Most common ACP: Two-phase commit (2PC)
  – Centralized 2PC
  – Distributed 2PC
  – Linear 2PC
  – Many other variants…
ICS214B Notes 11 59
Terminology
• Resource Managers (RMs)
  – Usually databases
• Participants
  – RMs that did work on behalf of transaction
• Coordinator
  – Component that runs two-phase commit on behalf of transaction
ICS214B Notes 11 60
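[Figure: 2PC message sequence, commit case ( * = to or from every participant)]
  Coordinator → Participant: REQUEST-TO-PREPARE
  Participant → Coordinator: PREPARED*
  Coordinator → Participant: COMMIT*
  Participant → Coordinator: DONE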
ICS214B Notes 11 61
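[Figure: 2PC message sequence, abort case]
  Coordinator → Participant: REQUEST-TO-PREPARE
  Participant → Coordinator: NO
  Coordinator → Participant: ABORT
  Participant → Coordinator: DONE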
ICS214B Notes 11 62
States of the Transaction
• At coordinator:
  – Initiated (I) -- transaction known to system
  – Preparing (P) -- prepare message sent to participants
  – Committed (C) -- has committed
  – Aborted (A) -- has aborted
• At participant:
  – Initiated (I)
  – Prepared (P) -- prepared to commit, if the coordinator so desires
  – Committed (C)
  – Aborted (A)
ICS214B Notes 11 63
Protocol Database
• Coordinator maintains a protocol database (in main memory) for each transaction
• Protocol database
  – enables coordinator to execute 2PC
  – answers inquiries by participants about status of transaction
      cohorts may make such inquiries during recovery after they fail
  – entry for transaction deleted when coordinator is sure that no one will ever inquire about transaction again (when it has been acked by all the participants)
ICS214B Notes 11 64
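Two-phase commit (messages)
[Figure: state-transition diagrams; each transition is labelled incoming message / outgoing message]
Coordinator (states I, P, C, A, F):
  I → P: commit-request / request-prepare*
  P → A: no / abort*
  P → C: prepared* / commit*
  C → F: ack*
  A → F: ack*
Participant (states I, P, C, A):
  I → P: request-prepare / prepared
  I → A: request-prepare / no
  P → C: commit / ack
  P → A: abort / ack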
ICS214B Notes 11 65
• Notation: Incoming message / Outgoing message ( * = everyone)
• When participant enters “P” state:
  – it must have acquired all resources
  – it can only abort or commit if so instructed by a coordinator
• Coordinator only enters “C” state if all participants are in “P”, i.e., it is certain that all will eventually commit
ICS214B Notes 11 66
Two phase commit -- normal actions (coordinator)
– Make an entry in the protocol database for the transaction, marking its status as initiated, when the coordinator first learns about the transaction.
– Add each participant to the cohort list in the protocol database when the coordinator learns about the cohorts.
– Change the status of the transaction to preparing before sending the prepare message. (It is assumed that the coordinator knows about all the participants before this step.)
– On receipt of a PREPARED message from a cohort, mark the cohort as PREPARED. If all cohorts are PREPARED, change the status to COMMITTED and send the COMMIT message.
    Must force a commit log record to disk before sending the commit message.
– On receipt of an ACK message from a cohort, mark the cohort as ACKED. When all cohorts have acked, delete the transaction's entry from the protocol database.
    Must write a completed log record to disk before deletion from the protocol database. No need to force the write though.
ICS214B Notes 11 67
Two Phase Commit - normal actions (participant)
• On receipt of PREPARE message, write PREPARED log record before sending PREPARED message
  – needs to be forced to disk since coordinator may now commit.
• On receipt of COMMIT message, write COMMIT log record before sending ACK to coordinator
  – cohort must ensure log forced to disk before sending ack -- but no great urgency for doing so.
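
The normal-case actions of coordinator and participant can be put together in a small in-memory sketch. It has no network, no timeouts, and no real disk: the "log" is a Python list, message passing is a method call, and all class and function names are invented for this illustration. Writing no coordinator log record on abort is an assumption, based on the notes logging only the commit and completion transitions.

```python
class Participant:
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "I"      # I, P, C, or A
        self.log = []         # stands in for the participant's log on disk

    def on_request_prepare(self):
        if self.vote_yes:
            self.log.append("prepared")  # would be forced to disk before replying
            self.state = "P"
            return "PREPARED"
        self.state = "A"                 # a NO voter can abort unilaterally
        return "NO"

    def on_decision(self, decision):
        self.log.append(decision.lower())  # must reach disk before the ack is sent
        self.state = "C" if decision == "COMMIT" else "A"
        return "ACK"


def two_phase_commit(participants):
    """Run one failure-free round of centralized 2PC; return (decision, coordinator log)."""
    protocol_db = {"status": "initiated", "cohorts": [p.name for p in participants]}
    coord_log = []

    # Phase 1: ask every participant to prepare and collect the votes.
    protocol_db["status"] = "preparing"
    votes = [p.on_request_prepare() for p in participants]

    # Phase 2: decide, log, then send the decision to participants still in P.
    if all(v == "PREPARED" for v in votes):
        coord_log.append("commit")       # forced before any COMMIT is sent
        protocol_db["status"] = "committed"
        decision = "COMMIT"
    else:
        protocol_db["status"] = "aborted"
        decision = "ABORT"
    acks = [p.on_decision(decision) for p in participants if p.state == "P"]

    # All acks are in (trivially so in this failure-free sketch).
    if decision == "COMMIT" and all(a == "ACK" for a in acks):
        coord_log.append("completed")    # written, not forced; entry can now be deleted
    return decision, coord_log


ps = [Participant("P1"), Participant("P2"), Participant("P3")]
print(two_phase_commit(ps), [p.state for p in ps])

qs = [Participant("P1"), Participant("P2", vote_yes=False)]
print(two_phase_commit(qs), [q.state for q in qs])
```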
ICS214B Notes 11 68
Timeout actions
• At various stages of the protocol, the transaction waits for messages at both coordinator and participants.
• If a message is not received, the timeout action is executed on timeout.
• Coordinator timeout actions:
  – Waiting for votes of participants: ABORT transaction, send aborts to all.
  – Waiting for ack from some participant: forward the transaction to a recovery process that periodically sends COMMIT to the participant. Once the participant has recovered and all participants have sent an ACK, the coordinator writes a completion log record and deletes the entry from the protocol database.
• Cohort timeout actions:
  – Waiting for prepare: abort the transaction, send abort message to coordinator. Alternatively, it could wait for the coordinator to ask for prepare.
  – Waiting for decision: forward transaction to recovery process. Recovery process executes status-transaction call to the coordinator. Such a transaction is blocked until the failure is repaired. The participant could have used a different termination protocol, e.g., polling other participants (cooperative termination).
ICS214B Notes 11 69
2PC is blocking
Sample scenario:
  Coord
  P1
  P2: W
  P3: W
  P4: W
ICS214B Notes 11 70
[Figure: coord and P1 are down; P2, P3, and P4 are in state W.]
Case I: P1 → “W”; coordinator sent commits; P1 → “C”
Case II: P1 → NO; P1 → A
⇒ P2, P3, P4 (surviving participants) cannot safely abort or commit transaction
ICS214B Notes 11 71
Recovery Actions (cohort)
• All sites execute REDO-UNDO pass
• Detection: A site knows it is a cohort if it finds a prepared log record for a transaction
• If the log does not contain a commit log record:
  – reacquire all locks for the transaction
  – ask coordinator for the status of transaction
• If log contains a commit log record:
  – do nothing
ICS214B Notes 11 72
Recovery Action (coordinator)
• If the protocol database was made fault-tolerant by logging every change, simply reconstruct the protocol database and restart 2PC from the point of failure.
• However, since we have only logged the commit and completion transitions and nothing else:
  – If the log does not contain a commit record, simply abort the transaction. If a cohort asks for status in the future, the transaction is not in the protocol database and it will be considered aborted.
  – If there is a commit log record but no completion log record, recreate the transaction's entry as committed in the protocol database; the recovery process will ask all the participants if they are still waiting for a commit message. If no one is waiting, the completion entry will be written.
  – If there is a commit log record and a completion log record, do nothing.
ICS214B Notes 11 74
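Variants of 2PC
• Linear
  [Figure: Coord and the participants form a chain; “ok” is passed from node to node along the chain, and “commit” is passed back in the opposite direction.]
• Hierarchical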
ICS214B Notes 11 75
Variants of 2PC
• Distributed
  – Nodes broadcast all messages
  – Every node knows when to commit
ICS214B Notes 11 76
Cooperative Termination Protocol
• Bad case
  – Participant P recovers from failure
  – Has prepared record for transaction T
  – No commit or abort record for T
  – Coordinator is down
• Participant P is blocked until coordinator recovers
ICS214B Notes 11 77
Cooperative termination protocol
• But perhaps some other participant can help?
• Requires participants “know” each other!
ICS214B Notes 11 78
Cooperative Termination Protocol
• Participant P sends a DECISION-REQUEST message to other participants
• Alive participants respond with COMMIT, ABORT, or UNCERTAIN
• If any participant replies with a decision (COMMIT or ABORT), P acts on decision
  – And sends decision to UNCERTAIN participants
ICS214B Notes 11 79
Cooperative Termination Protocol
• When P receives a DECISION-REQUEST
  – If it knows decision, responds with COMMIT or ABORT
  – If it has not prepared transaction, responds ABORT
  – If it is prepared but does not know decision, responds UNCERTAIN
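
The reply rules and the requester's reaction can be sketched as two small functions. The state names ("committed", "aborted", "not_prepared", "prepared") and the "BLOCKED" result are labels chosen for this example, not terms from the protocol.

```python
def answer_decision_request(local_state):
    """Reply of a live participant to DECISION-REQUEST."""
    if local_state == "committed":
        return "COMMIT"
    if local_state == "aborted":
        return "ABORT"
    if local_state == "not_prepared":
        # It has not voted yet, so it can still abort unilaterally
        # (and should then abort locally as well).
        return "ABORT"
    return "UNCERTAIN"  # prepared, but the decision is unknown


def cooperative_termination(other_states):
    """Outcome for a prepared participant that polls the other live participants."""
    replies = [answer_decision_request(s) for s in other_states]
    for reply in replies:
        if reply in ("COMMIT", "ABORT"):
            return reply  # act on it and forward it to the UNCERTAIN ones
    return "BLOCKED"      # everyone reachable is uncertain: wait for the coordinator


print(cooperative_termination(["committed", "prepared"]))
print(cooperative_termination(["prepared", "aborted"]))
print(cooperative_termination(["prepared", "prepared"]))
```

The last call shows why cooperative termination only reduces blocking: if every reachable participant is uncertain, nobody can decide.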
ICS214B Notes 11 80
Cooperative Termination
Sample scenario:
  Coord
  P1: C
  P2: W
  P3: W
ICS214B Notes 11 81
Cooperative Termination
Sample scenario:
  Coord
  P1: W
  P2: W
  P3: A
ICS214B Notes 11 82
Cooperative Termination
Sample scenario:
  Coord
  P1: W
  P2: W
  P3: W
ICS214B Notes 11 83
Is there a non-blocking protocol?
Theorem: If communication failures or total site failures (i.e., all sites are down simultaneously) are possible, then every atomic commit protocol may cause processes to become blocked.
Two exceptions:
• If we ignore communication failures, it is possible to design such a protocol (Skeen et al. 83)
• If we impose some restrictions on transactions (i.e., what data they can read/write), such a protocol can also be designed (Mehrotra et al. 92)
ICS214B Notes 11 84
Next…
• Three-phase commit (3PC)
  – Nonblocking if reliable network (no communications failure) and no total site failures
  – Handling communications failures
ICS214B Notes 11 85
Why does 2PC block?
• An operational site that times out in the prepared state does not know whether the failed site(s) had committed or aborted the transaction.
• Polling all operational sites does not help, since all the operational sites might be in doubt.
ICS214B Notes 11 86
Approach to Making ACP Non-blocking
• For a given state S of a transaction T in the ACP, let the concurrency set of S be the set of states that other sites could be in.
• For example, in 2PC, the concurrency set of the PREPARE state is {PREPARE, ABORT, COMMIT}
• To develop a non-blocking protocol, we ensure that:
  – No state has a concurrency set that contains both a commit and an abort.
  – There exists no non-committable state whose concurrency set contains a commit. A state is committable if occupancy of the state by any site implies everyone has voted to commit the transaction.
• Necessity of these conditions is illustrated by considering a situation with only 1 site operational. If either of the above is violated, there will be blocking.
• Sufficiency is illustrated by designing a termination protocol that terminates the protocol correctly if the above conditions hold.
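
The two conditions can be checked mechanically once concurrency sets are written down. In the sketch below the 2PC set for the prepared state is the one given above; the 3PC sets (states A, U, PC, C) are an assumption based on the usual description of three-phase commit without communication failures, and should be read as an illustration rather than a derivation.

```python
def blocking_violations(concurrency, committable, commit="C", abort="A"):
    """List the (state, reason) pairs that break the two non-blocking conditions."""
    violations = []
    for state, cset in sorted(concurrency.items()):
        if commit in cset and abort in cset:
            violations.append((state, "concurrency set contains both commit and abort"))
        if state not in committable and commit in cset:
            violations.append((state, "non-committable state whose concurrency set contains commit"))
    return violations


# 2PC: concurrency set of the prepared state, as stated in the notes.
two_pc = {"P": {"P", "A", "C"}}
two_pc_committable = {"C"}

# 3PC (assumed sets): A = abortable, U = uncertain, PC = precommitted, C = committed.
three_pc = {
    "A": {"A", "U"},
    "U": {"A", "U", "PC"},
    "PC": {"U", "PC", "C"},
    "C": {"PC", "C"},
}
three_pc_committable = {"PC", "C"}

print(blocking_violations(two_pc, two_pc_committable))
print(blocking_violations(three_pc, three_pc_committable))
```

Under these assumptions the prepared state of 2PC breaks both conditions, while the extra precommitted state of 3PC keeps commit out of the concurrency set of the uncertain state.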
ICS214B Notes 11 90
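[Figure: 3PC message sequence, commit case]
  Coordinator → Participant: REQUEST-TO-PREPARE
  Participant → Coordinator: PREPARED
  Coordinator → Participant: PRECOMMIT
  Participant → Coordinator: ACK
  Coordinator → Participant: COMMIT
  Participant → Coordinator: DONE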
ICS214B Notes 11 91
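[Figure: 3PC message sequence, abort case]
  Coordinator → Participant: REQUEST-TO-PREPARE
  Participant → Coordinator: NO
  Coordinator → Participant: ABORT
  Participant → Coordinator: DONE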
ICS214B Notes 11 92
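[Figure: 3PC message sequence with logging]
  Coordinator: Log start-3PC record (participant list)
  Coordinator → Participant: REQUEST-PREPARE
  Participant: Log prepared record (state W)
  Participant → Coordinator: PREPARED
  Coordinator → Participant: PRECOMMIT
  Participant → Coordinator: ACK
  Coordinator: Log commit record (state C)
  Coordinator → Participant: COMMIT
  Participant: Log committed record (state C)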
ICS214B Notes 11 93
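[Figure: 3PC message sequence (REQUEST-PREPARE, PREPARED, PRECOMMIT, ACK, COMMIT) with timeout actions]
Coordinator:
  1. Timeout waiting for PREPARED: Abort
  2. Timeout waiting for ACK: ignore
Participant:
  1. Timeout waiting for REQUEST-PREPARE: abort
  2. Timeout waiting for PRECOMMIT: Termination Protocol
  3. Timeout waiting for COMMIT: Termination Protocol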
ICS214B Notes 11 94
Process categories
• Three categories
  – Operational: Process has been up since start of 3PC
  – Failed: Process has halted since start of 3PC, or is recovering
  – Recovered: Process that failed and has completed recovery
ICS214B Notes 11 95
Three Phase Commit - Termination Protocol
• Choose a backup coordinator from the remaining operational sites.
• Backup coordinator sends messages to other operational sites to make transition to its local state (or to find out that such a transition is not feasible) and waits for response.
• Based on response as well as its local state, it continues to commit or abort the transaction.
• It commits, if its concurrency set includes a commit state. Else, it aborts.
ICS214B Notes 11 96
Termination Protocol
[Figure: timeline: Start 3PC → Coordinator fails → Decision reached → All sites learn decision]
• Only operational processes participate in termination protocol.
• Recovered processes wait until decision is reached and then learn decision
ICS214B Notes 11 97
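[Figure: participant states along the 3PC message sequence (REQUEST-PREPARE, PREPARED, PRECOMMIT, ACK, COMMIT)]
  Abortable (A): before sending PREPARED
  Uncertain (U): after sending PREPARED, before receiving PRECOMMIT
  Precommitted (PC): after receiving PRECOMMIT, before receiving COMMIT
  Committed (C): after receiving COMMIT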
ICS214B Notes 11 98
Termination Protocol
• Elect new coordinator
  – Use Election Protocol (coming soon…)
• New coordinator sends STATE-REQUEST to participants
• Makes decision using termination rules
• Communicates to participants
ICS214B Notes 11 99
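[Figure: termination rule, some participant is abortable]
  Coordinator → Participant: STATE-REQUEST*
  Participant → Coordinator: ABORTABLE
  Coordinator → Participant: ABORT*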
ICS214B Notes 11 100
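[Figure: termination rule, some participant is committed]
  Coordinator → Participant: STATE-REQUEST*
  Participant → Coordinator: COMMITTED
  Coordinator → Participant: COMMIT*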
ICS214B Notes 11 101
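[Figure: termination rule, all participants are uncertain]
  Coordinator → Participant: STATE-REQUEST*
  Participant → Coordinator: UNCERTAIN*
  Coordinator → Participant: ABORT*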
ICS214B Notes 11 102
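[Figure: termination rule, some participant is precommitted and none is committed]
  Coordinator → Participant: STATE-REQUEST*
  Participant → Coordinator: PRECOMMITTED, NO COMMITTED
  Coordinator → Participant: PRECOMMIT*
  Participant → Coordinator: ACK*
  Coordinator → Participant: COMMIT*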
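
The four termination rules can be collected into one decision function. This is a sketch of the rule table only: election, message passing, and waiting for ACKs are left out, and the function name and the list-of-messages return value are choices made for the example.

```python
def termination_decision(states):
    """Messages the new coordinator broadcasts, given the states reported
    by the operational participants (its own state included)."""
    if "ABORTABLE" in states:
        return ["ABORT"]
    if "COMMITTED" in states:
        return ["COMMIT"]
    if all(s == "UNCERTAIN" for s in states):
        return ["ABORT"]
    # Some participant is precommitted and none is committed:
    # bring everyone to precommitted, collect the ACKs, then commit.
    return ["PRECOMMIT", "COMMIT"]


print(termination_decision(["UNCERTAIN", "UNCERTAIN", "UNCERTAIN"]))
print(termination_decision(["UNCERTAIN", "UNCERTAIN", "PRECOMMITTED"]))
```

The rules rely on there being no communication failures: under that assumption an abortable and a committed participant never coexist, so the order of the first two checks does not matter.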
ICS214B Notes 11 103
Termination Protocol
Sample scenario:
  Coord
  P1: W
  P2: W
  P3: W
ICS214B Notes 11 104
Termination Protocol
Sample scenario:
  Coord
  P1: W
  P2: W
  P3: PC
ICS214B Notes 11 105
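Note: 3PC unsafe with communication failures!
[Figure: a network partition separates three sites in state W from two sites in state P; each side runs the termination protocol on its own, so the W side decides abort while the P side decides commit.]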
ICS214B Notes 11 106
• After coordinator receives DONE message, it can forget about the transaction
  – E.g., cleanup control structures