Page 1: Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of Computer Science McGill University, Montreal.

Consistent and Efficient Database Replication

based on Group Communication

Bettina Kemme
School of Computer Science
McGill University, Montreal


Database Replication, Bettina Kemme, Feb. 2001

Outline

State of the Art in Database Replication
Key Points of our Approach
Overview of the Algorithms
Implementation Issues
Performance Results
Ongoing Work


Replication -- Why?

Fault-Tolerance: Take-Over

Scale-Up: cluster instead of bigger mainframe


Replica Control: Research and Reality

UPDATE WHEN: Eager / Lazy
UPDATE WHERE: Primary Copy / Update Everywhere

Globally correct, too expensive
Deadlocks
Inconsistent reads
Configuration restrictions
Feasible
Inconsistent reads, reconciliation
Feasible in some applications

Quorum/ROWA, Oracle Synchr. Repl
Placement Strat., Sybase/IBM/Oracle
Placement Strategies
Weak Consistency, Sybase/IBM/Oracle


Requirements

Develop and apply appropriate techniques in order to avoid current limitations of eager update everywhere approaches
Keep flexibility of update everywhere
  no restrictions on type of transactions and where to execute them
Consistency and fault-tolerance of eager replication
Good performance
  response time + throughput
Straightforward, implementable solution
Easy integration in existing systems


Response Time and Message Overhead

Goal: Reduce number of messages per transaction
  reduce response time
  reduce message overhead
Solution
  local execution of transaction
  bundle writes and send them in a single message at the end of the transaction (as done in lazy schemes)

[Figure: three timelines of Read and Write operations: central transaction; transaction with individual remote writes; transaction with local execution and write set propagation at the end]
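The message saving can be made concrete with a back-of-the-envelope count. The sketch below assumes that each individually propagated write costs one multicast and ignores acknowledgements and commit messages; the function names are illustrative and do not come from the talk.

```python
def messages_individual_writes(num_writes: int) -> int:
    """One multicast per write operation, sent while the transaction runs."""
    return num_writes


def messages_bundled_write_set(num_writes: int) -> int:
    """All writes travel in one write-set message at the end of the
    transaction; a read-only transaction sends nothing."""
    return 1 if num_writes > 0 else 0


if __name__ == "__main__":
    for writes in (1, 5, 10):
        print(writes,
              messages_individual_writes(writes),
              messages_bundled_write_set(writes))
```

Under these assumptions the bundled scheme sends a constant number of messages per update transaction, independent of how many writes it contains.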


Ordering Transactions

Before: uncoordinated message delivery; danger of deadlocks
Now: pre-order txns by using total order multicast of Group Communication
Before: 2-phase-commit
Now: Independent execution at the different sites

[Figure: Total Order]
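To illustrate what total order multicast gives the database, here is a minimal sketch in which a fixed sequencer stamps each message and every site delivers strictly by sequence number. The sequencer design is an assumption chosen for brevity; the talk does not say which ordering algorithm the group communication system uses, and `Sequencer`, `Site`, and `run` are hypothetical names.

```python
import itertools
import random


class Sequencer:
    """Hypothetical fixed sequencer: stamps every message with a global number."""

    def __init__(self):
        self._counter = itertools.count(1)

    def stamp(self, message):
        return next(self._counter), message


class Site:
    """Delivers stamped messages strictly in sequence-number order."""

    def __init__(self):
        self.pending = {}
        self.next_seq = 1
        self.delivered = []

    def receive(self, seq, message):
        self.pending[seq] = message
        # Deliver every message whose predecessors have all been delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def run(messages, num_sites=3, seed=0):
    rng = random.Random(seed)
    sequencer = Sequencer()
    stamped = [sequencer.stamp(m) for m in messages]
    sites = [Site() for _ in range(num_sites)]
    for site in sites:
        arrival = stamped[:]
        rng.shuffle(arrival)  # the network may reorder messages per site
        for seq, message in arrival:
            site.receive(seq, message)
    return [site.delivered for site in sites]
```

Even though each site receives the messages in a different arrival order, all sites deliver them in the same order, which is the property the replication protocol relies on to pre-order transactions.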


Group Communication Systems

Group Communication:
  Multicast
  Delivery order (FIFO, causal, total, etc.)
  Reliable delivery: on all nodes vs. on all available nodes
  Membership control
  ISIS, Totem, Transis, Phoenix, Horus, Ensemble, ...
Goal: Exploit rich semantics of group communication on a low level of the software hierarchy


Replica Control based on 2 Phase Locking

[Figure: timeline of one transaction across three nodes. Node 1: Local Transaction (Local Phase, Send Phase, Lock Phase, Write Phase, Commit Phase). Node 2 and Node 3: Remote Transaction (Lock Phase, Write Phase). Messages: write set (WS), commit.]

Transaction is first performed locally at a single site.
Writes are sent in one message to all sites at the end of transaction.
Write messages are totally ordered.
Serialization order obeys total order.


Concurrency Control

One possible solution: given a transaction T
  Local Phase: T acquires standard local read and write locks
  Send Phase: send write set using total order multicast
  Upon reception of write set of T on local node
    Commit Phase: multicast commit message
  Upon reception of write set of T on remote node
    Lock Phase: request all write locks in a single step; if there is a local transaction T’ with conflicting lock and T’ is still in local phase or send phase, abort T’. If T’ is in send phase, multicast abort
    Write Phase: apply updates of T
  Upon reception of commit/abort message of T on remote node, terminate T accordingly
For two transactions of same node: 2 phase locking.
For concurrent transactions of different nodes: optimistic scheme with early conflict detection: when write set of one transaction is delivered the conflict is detected.
Adjustment to other concurrency control schemes possible
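The phases above can be sketched as a small single-process simulation. This is a simplified illustration under several assumptions, not the Postgres-R implementation: only write locks are modelled, commit and abort messages are collapsed into a shared `state` field, lock waits between transactions of the same node are not simulated, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    origin: int            # node where the transaction executes locally
    write_set: dict        # item -> new value
    state: str = "local"   # local -> send -> committed / aborted


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.data = {}
        self.write_locks = {}  # item -> Txn holding the write lock

    def local_phase(self, txn):
        """Local Phase: take local write locks (read locks omitted here)."""
        for item in txn.write_set:
            self.write_locks[item] = txn

    def release(self, txn):
        for item in [i for i, t in self.write_locks.items() if t is txn]:
            del self.write_locks[item]

    def deliver_write_set(self, txn):
        """Invoked in the same total order at every node."""
        if txn.origin != self.node_id:
            # Lock Phase: a local transaction whose own write set has not
            # been delivered yet loses against the delivered one.
            for item in txn.write_set:
                holder = self.write_locks.get(item)
                if holder is not None and holder.state in ("local", "send"):
                    holder.state = "aborted"  # stands in for the abort multicast
                    self.release(holder)
                self.write_locks[item] = txn
        # Write Phase: install the updates of the delivered transaction.
        self.data.update(txn.write_set)


def multicast_in_total_order(txns, nodes):
    """Deliver write sets in list order, which plays the role of the total order."""
    for txn in txns:
        if txn.state == "aborted":
            continue                   # every node discards an aborted write set
        for node in nodes:
            node.deliver_write_set(txn)
        txn.state = "committed"        # stands in for the commit multicast
        for node in nodes:
            node.release(txn)


def demo():
    nodes = [Node(0), Node(1)]
    t1 = Txn("T1", origin=0, write_set={"x": 1})
    t2 = Txn("T2", origin=1, write_set={"x": 2})
    nodes[0].local_phase(t1)
    nodes[1].local_phase(t2)
    t1.state = t2.state = "send"       # both write sets are in flight
    multicast_in_total_order([t1, t2], nodes)
    return t1.state, t2.state, [n.data for n in nodes]
```

In `demo()`, T1 and T2 update the same item concurrently on different nodes. Because T1 is ordered first, the conflict is detected when T1's write set is delivered at T2's node, T2 is aborted, and both nodes end up with the same value for `x`.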


Message Delivery Guarantees

Uniform-Reliable Delivery
  If a site delivers a message, all non-faulty sites deliver the message
  Correctness on all sites (faulty or non-faulty): when a transaction commits at any site then it commits at all non-faulty sites
  High message delay
Reliable Delivery
  If a non-faulty site (non-faulty for a sufficiently long time) delivers a message it is delivered at all non-faulty sites.
  Correctness in the failure-free case
  In case of failures:
    all non-faulty sites commit the same set of transactions
    a transaction might be committed at a faulty site (shortly before failure) and it is not committed at the other sites.
  Low message delay


A Suite of Replication Protocols

                     Uniform Reliable   Reliable
Serializability      SER-UR             SER-R
Cursor Stability     CS-UR              CS-R
Snapshot Isolation   SI-UR              SI-R
Hybrid               HYB-UR             HYB-R

The solutions provide
  flexibility
  accepted correctness criteria


Implementation

Integration of our replica control approach into the database system PostgreSQL

Purpose - We wanted to answer the following questions
  Can the abstract protocols really be mapped to concrete algorithms in a relational database?
  How difficult is it to integrate the replication tool into a real database system?
  What is the performance in a real cluster environment?


Architecture of Postgres-R

[Figure: components shown are Clients, Postmaster, Server processes running Local Txn and Remote Txn (original PostgreSQL), Replication Mgr, and Group Communication (Ensemble)]


Write Set Messages

[Figure: the SQL statement UPDATE employee SET salary = salary+100 WHERE salary < 2000 passes through the Parser/Optimizer and the Executor, which produces a set of physical records]

Send SQL statements and reexecute at all sites
  Simple
  Small message size
  High execution overhead on all sites
  Problem with locks for implicit reads in statement
Send physical updates and only apply changes
  Opposite characteristics than sending statements
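The trade-off between the two write set formats can be sketched with an in-memory table. This is an illustration only: a Python dict keyed by a primary key stands in for the relation, and `execute_statement`, `apply_write_set`, and `raise_low_salaries` are hypothetical helpers, not PostgreSQL or Postgres-R functions.

```python
import copy


def execute_statement(table, predicate, update):
    """Run an update statement: scan every row, test the predicate, apply the
    change. Returns the physical write set as {primary_key: new_row}."""
    write_set = {}
    for key, row in table.items():
        if predicate(row):
            new_row = update(row)
            table[key] = new_row
            write_set[key] = new_row
    return write_set


def apply_write_set(table, write_set):
    """Install already-computed rows by primary key: no scan, no predicate."""
    table.update(write_set)


def raise_low_salaries(table):
    # Mirrors: UPDATE employee SET salary = salary+100 WHERE salary < 2000
    return execute_statement(
        table,
        predicate=lambda row: row["salary"] < 2000,
        update=lambda row: {**row, "salary": row["salary"] + 100},
    )


def demo():
    employee = {
        1: {"name": "a", "salary": 1500},
        2: {"name": "b", "salary": 2500},
        3: {"name": "c", "salary": 1950},
    }
    origin = copy.deepcopy(employee)
    replica_statement = copy.deepcopy(employee)
    replica_physical = copy.deepcopy(employee)

    write_set = raise_low_salaries(origin)         # executed once at the origin
    raise_low_salaries(replica_statement)          # option 1: re-execute the SQL
    apply_write_set(replica_physical, write_set)   # option 2: apply the records
    return origin, replica_statement, replica_physical, write_set
```

Both replicas reach the same state as the origin, but re-executing the statement repeats the scan and predicate evaluation at every site, whereas applying the physical write set touches only the changed rows at the price of a larger message.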


Gain of sending and applying physical changes

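Scaleup for different remote update costs and update rate of 1

[Figure: Scaleup (0 to 8) vs. Number of Nodes (1 to 20), one curve per value of wo: 0.1, 0.2, 0.5, 1]

$$\text{Scaleup} = \frac{\text{numberOfNodes}}{1 + \text{updateRate} \cdot \frac{\text{remoteUpdateCost}}{\text{localUpdateCost}} \cdot (\text{numberOfNodes} - 1)}$$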
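The scaleup formula on this slide can be evaluated directly. The sketch below assumes the reading Scaleup = numberOfNodes / (1 + updateRate * wo * (numberOfNodes - 1)), where wo is taken to be the ratio remoteUpdateCost / localUpdateCost shown in the plot legend; that interpretation of wo is an assumption.

```python
def scaleup(number_of_nodes: int, update_rate: float, wo: float) -> float:
    """Scaleup of a replicated system relative to a single node.

    wo is assumed to be remoteUpdateCost / localUpdateCost.
    """
    return number_of_nodes / (1 + update_rate * wo * (number_of_nodes - 1))


if __name__ == "__main__":
    nodes = (1, 5, 10, 15, 20)
    for wo in (0.1, 0.2, 0.5, 1.0):
        row = ", ".join(f"{scaleup(n, 1.0, wo):.2f}" for n in nodes)
        print(f"wo={wo}: {row}")
```

With an update rate of 1 and wo = 1, applying a remote update costs as much as executing it locally, so the scaleup stays at 1 regardless of the number of nodes; smaller values of wo are what make adding nodes worthwhile.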


Comparison with standard distr. locking

[Figure: two panels of Response Time in ms vs. Number Servers (1 to 5): Traditional (0 to 800 ms) and Postgres-R (0 to 200 ms)]

Workload:
  10 transactions per second
  5 concurrent clients (each submitting a transaction every 500 milliseconds)
For all experiments:
  Database: 10 relations of 1000 tuples each
  Transactions: 10 updates per transaction


Response Time vs. Throughput


Scalability with fixed workload

Workload:
  1 update transaction per second per server
  14 queries per second per server
  3 clients per server

[Figure: Postgres-R, Response Time in ms (0 to 200) for Write Txn and Read Txn at 1 server/15 tps, 5 servers/75 tps, 10 servers/150 tps, 15 servers/225 tps]


Differently loaded Nodes

Nodes:
  10 nodes
  15 clients in total


Conclusions

Eager, update everywhere replication is feasible (at least in clusters) by using adequate techniques
  As few messages as possible within transaction boundaries
  As few synchronization points as possible
  Complete transaction execution only at one site
Simple to adjust to existing concurrency control mechanisms


Current Work

Recovery under various failure models
Development of a middleware replication tool
Development of group communication protocols that better support the needs of the database system
  Ordering semantics
  Failure models
Building a complete system
  Adding other replica control protocols
  System administration
  Partial replication functionality

