
Parallel and distributed databases R & G Chapter 22.

Date post: 21-Dec-2015
Page 1: Parallel and distributed databases R & G Chapter 22.

Parallel and distributed databases

R & G Chapter 22

Page 2: Parallel and distributed databases R & G Chapter 22.

What is a distributed database?

Page 3: Parallel and distributed databases R & G Chapter 22.

Why distribute a database

Scalability and performance

Resilience to failures

[Figure: axes labeled "Throughput" and "Data size"; a second diagram labeled "versus", with nodes marked X]

Page 4: Parallel and distributed databases R & G Chapter 22.

Why distribute a database

Data is already distributed, or needs to be distributed

Data is in multiple systems

Page 5: Parallel and distributed databases R & G Chapter 22.

Why not distribute a database

You must earn your complexity!

Communication needed: must build a complex infrastructure; unpredictable latencies must be masked

More types of failures: more components to fail; network failures (congestion, timeouts)

More complex planning: communication cost plus I/O cost

May have to deal with heterogeneity: different types of systems; different schemas, possibly incompatible; different administrative domains

Page 6: Parallel and distributed databases R & G Chapter 22.

Types of distributed databases

Page 7: Parallel and distributed databases R & G Chapter 22.

The old days: mainframes

Definitely not distributed!

Page 8: Parallel and distributed databases R & G Chapter 22.

Client-server

User interaction

Data processing

Network

Page 9: Parallel and distributed databases R & G Chapter 22.

Parallel database

Page 10: Parallel and distributed databases R & G Chapter 22.

Primary/secondary

[Figure: one node marked X]

Page 11: Parallel and distributed databases R & G Chapter 22.

Multidatabase

Page 12: Parallel and distributed databases R & G Chapter 22.

How do they work?

What is shared? How to distribute the data? How to process the data? How to update the data?

Page 13: Parallel and distributed databases R & G Chapter 22.

What is shared?

Memory

CPUs RAM Disk

Most modern DBMSs

Page 14: Parallel and distributed databases R & G Chapter 22.

What is shared?

Disk

RAM

Oracle RAC

Page 15: Parallel and distributed databases R & G Chapter 22.

What is shared?

Nothing

RAM

Search engines, Teradata

Page 16: Parallel and distributed databases R & G Chapter 22.

How to distribute the data?

Server 1 Server 2 Server 3 Server 4

Bike     $86    6/2/07   636353
Chair    $10    6/5/07   662113
Couch    $570   6/1/07   424252
Car      $1123  6/1/07   256623
Lamp     $19    6/7/07   121113
Bike     $56    6/9/07   887734
Scooter  $18    6/11/07  252111
Hammer   $8000  6/11/07  116458

Page 17: Parallel and distributed databases R & G Chapter 22.

How to distribute the data?

Hash partitioning: (key,value) -> Hash()

Range partitioning: (key,value) -> <= X | > X
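The two schemes named on this slide can be sketched in a few lines. This is a minimal illustration, not the slide's own code: the function names, the use of `zlib.crc32` as the hash, the four-server count, and the split points are all assumptions chosen for the example.

```python
import zlib
from bisect import bisect_left

NUM_SERVERS = 4  # assumption: four servers, as in the neighbouring slides


def hash_partition(key, num_servers=NUM_SERVERS):
    """Pick a server by hashing the key (crc32 is stable across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_servers


def range_partition(key, split_points):
    """Pick a server by comparing the key with sorted split points.

    A key <= split_points[0] goes to server 0; a key greater than the
    last split point goes to the last server.
    """
    return bisect_left(split_points, key)


# Hypothetical split points on the price column.
SPLITS = [20, 100, 1000]
prices = [86, 10, 570, 1123, 19, 56, 18, 8000]
placement = [range_partition(p, SPLITS) for p in prices]
```

Hashing spreads keys evenly but scatters neighbouring values, so a range query has to visit every server; range partitioning keeps neighbours together at the risk of uneven load if the split points are badly chosen.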

Page 18: Parallel and distributed databases R & G Chapter 22.

How to distribute the data?

Server 1   Server 2   Server 3   Server 4
Bike       $86        6/2/07     636353
Chair      $10        6/5/07     662113
Couch      $570       6/1/07     424252
Car        $1123      6/1/07     256623
Lamp       $19        6/7/07     121113
Bike       $56        6/9/07     887734
Scooter    $18        6/11/07    252111
Hammer     $8000      6/11/07    116458

Page 19: Parallel and distributed databases R & G Chapter 22.

Query processing

Intra-operator parallelism

Inter-operator parallelism

Page 20: Parallel and distributed databases R & G Chapter 22.

Parallel scanning

[Figure: six filter operators running in parallel, feeding a single Result]
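A parallel scan applies the same filter to every partition at once and merges the outputs. The sketch below is an illustration under stated assumptions: threads stand in for servers, the assignment of rows to partitions is made up (the slides do not say which row lives where), and `scan_partition` and `parallel_scan` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of (item, price) rows on four servers.
PARTITIONS = [
    [("Bike", 86), ("Chair", 10)],
    [("Couch", 570), ("Car", 1123)],
    [("Lamp", 19), ("Bike", 56)],
    [("Scooter", 18), ("Hammer", 8000)],
]


def scan_partition(rows, predicate):
    """Scan one partition and keep only the rows that pass the filter."""
    return [row for row in rows if predicate(row)]


def parallel_scan(partitions, predicate):
    """Run the same filter over every partition concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = pool.map(
            lambda rows: scan_partition(rows, predicate), partitions
        )
        return [row for part in per_partition for row in part]


cheap = parallel_scan(PARTITIONS, lambda row: row[1] < 50)
```

Because each filter touches only its own partition, no coordination is needed until the final merge.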

Page 21: Parallel and distributed databases R & G Chapter 22.

Sorting

Page 22: Parallel and distributed databases R & G Chapter 22.

Sorting
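The sorting slides carry no text, so the following is only one common way to parallelise a sort, not necessarily what the figures showed: range-partition the values, sort each range independently, and concatenate the ranges in order. The split points and the name `parallel_sort` are assumptions for the example.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, split_points):
    """Range-partition, sort each range locally, then concatenate."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for v in values:
        buckets[bisect_left(split_points, v)].append(v)
    # Each bucket could be sorted on a different server; threads stand in here.
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    # Buckets cover increasing ranges, so plain concatenation is sorted.
    return [v for bucket in sorted_buckets for v in bucket]


prices = [86, 10, 570, 1123, 19, 56, 18, 8000]
result = parallel_sort(prices, split_points=[20, 100, 1000])
```

No merge step is needed because every value in one bucket is smaller than every value in the next.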

Page 23: Parallel and distributed databases R & G Chapter 22.

Parallel hash join

Hash()
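One way to read this slide: both inputs are hashed on the join key so that matching rows land on the same worker, and each worker then runs an ordinary hash join on its share. The sketch below assumes that reading; the table contents, the function names, and the choice of `zlib.crc32` are illustrative only.

```python
import zlib
from collections import defaultdict


def partition_by_key(rows, key_index, num_workers):
    """Route each row to a worker by hashing its join key."""
    parts = [[] for _ in range(num_workers)]
    for row in rows:
        h = zlib.crc32(str(row[key_index]).encode("utf-8"))
        parts[h % num_workers].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Build a hash table on the left input, then probe it with the right."""
    table = defaultdict(list)
    for row in left:
        table[row[left_key]].append(row)
    return [l + r for r in right for l in table.get(r[right_key], [])]


def parallel_hash_join(left, right, left_key, right_key, num_workers=4):
    left_parts = partition_by_key(left, left_key, num_workers)
    right_parts = partition_by_key(right, right_key, num_workers)
    out = []
    # Each (left, right) pair is independent and could run on its own server.
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp, left_key, right_key))
    return out


# Hypothetical inputs: sales(item, price) and categories(item, category).
sales = [("Bike", 86), ("Chair", 10), ("Bike", 56)]
categories = [("Bike", "sports"), ("Lamp", "home")]
joined = sorted(parallel_hash_join(sales, categories, 0, 0))
```

The loop here is sequential for simplicity; the point is that rows with equal keys always meet in the same partition pair.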

Page 24: Parallel and distributed databases R & G Chapter 22.

Join

Page 25: Parallel and distributed databases R & G Chapter 22.

Semi-join
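A semi-join reduces the data shipped between sites: site A sends only its join-column values, site B returns only the rows that match, and the full join is finished at site A. The sketch below follows that textbook outline; the two-site setup, the sample rows, and the function names are assumptions for illustration.

```python
def semi_join_reduce(s_rows, s_key, join_values):
    """At site B: keep only the S rows whose join value appears at site A."""
    return [row for row in s_rows if row[s_key] in join_values]


def distributed_join(r_rows, r_key, s_rows, s_key):
    # Step 1 (site A): project R onto the join column and ship it to site B.
    join_values = {row[r_key] for row in r_rows}
    # Step 2 (site B): reduce S and ship only the surviving rows back.
    reduced_s = semi_join_reduce(s_rows, s_key, join_values)
    # Step 3 (site A): join R with the reduced S locally.
    joined = [r + s for r in r_rows for s in reduced_s if r[r_key] == s[s_key]]
    return joined, len(reduced_s)


# Hypothetical tables held at two different sites.
r = [("Bike", 86), ("Lamp", 19)]
s = [("Bike", "sports"), ("Sofa", "home"), ("Desk", "office")]
joined, rows_shipped = distributed_join(r, 0, s, 0)
```

Here only one of the three S rows crosses the network. The trade-off is the extra round trip for the join-column values, which pays off only when the reduction is large.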

Page 26: Parallel and distributed databases R & G Chapter 22.

Inter-operator parallelism

Page 27: Parallel and distributed databases R & G Chapter 22.

Updating distributed data

Synchronous: read-any-write-all

Reads are fast
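Read-any-write-all can be sketched as a toy in-memory store: a write must reach every replica, so a read can be served by any single one. The class name, the `up` flags used to simulate disconnection, and the three-replica default are assumptions for the example.

```python
import random


class ReadAnyWriteAll:
    """Toy replicated store: writes go to all copies, reads go to any copy."""

    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.up = [True] * num_replicas  # flip an entry to simulate a failure

    def write(self, key, value):
        # Every replica must be reachable, otherwise the write cannot proceed.
        if not all(self.up):
            raise RuntimeError("write blocked: a replica is unreachable")
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        live = [i for i, ok in enumerate(self.up) if ok]
        if not live:
            raise RuntimeError("no replica reachable")
        # Any single live replica is guaranteed to hold the latest value.
        return self.replicas[random.choice(live)][key]


store = ReadAnyWriteAll()
store.write("Bike", 86)
store.up[1] = False          # one replica drops out
value = store.read("Bike")   # reads still succeed
```

The cost of fast reads shows up on the write side: with one replica down, `store.write(...)` raises until it comes back.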

Page 28: Parallel and distributed databases R & G Chapter 22.

Updating distributed data

Synchronous: voting

Page 29: Parallel and distributed databases R & G Chapter 22.

Updating distributed data

Synchronous: voting

Writes tolerant to disconnection
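Voting can be sketched with versioned copies: a write needs only a write quorum, a read consults a read quorum and takes the highest version, and the two quorums are sized so that they always overlap. The class below is a minimal illustration; the quorum sizes, the `(version, value)` representation, and all names are assumptions, and concurrent writers are not handled.

```python
class QuorumStore:
    """Toy voting scheme over n replicas; each copy stores (version, value)."""

    def __init__(self, n=3, read_quorum=2, write_quorum=2):
        # Overlap condition: every read quorum meets every write quorum.
        if read_quorum + write_quorum <= n:
            raise ValueError("need read_quorum + write_quorum > n")
        self.replicas = [{} for _ in range(n)]
        self.up = [True] * n  # flip an entry to simulate a disconnection
        self.r, self.w = read_quorum, write_quorum

    def _quorum(self, size):
        live = [i for i, ok in enumerate(self.up) if ok]
        if len(live) < size:
            raise RuntimeError("not enough reachable replicas")
        return live[:size]

    def _latest(self, key):
        votes = [self.replicas[i].get(key, (0, None)) for i in self._quorum(self.r)]
        return max(votes, key=lambda vote: vote[0])

    def read(self, key):
        return self._latest(key)[1]

    def write(self, key, value):
        version, _ = self._latest(key)
        for i in self._quorum(self.w):
            self.replicas[i][key] = (version + 1, value)


store = QuorumStore()
store.write("Bike", 86)               # lands on replicas 0 and 1
store.up[0] = False
store.write("Bike", 90)               # still succeeds with replica 0 disconnected
store.up = [True, True, False]
value = store.read("Bike")            # stale replica 0 is outvoted by replica 1
```

Compared with read-any-write-all, reads become more expensive because they must contact several replicas, but a write no longer blocks when one replica is unreachable.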

Page 30: Parallel and distributed databases R & G Chapter 22.

Consistency of distributed data

Should provide ACID

Page 31: Parallel and distributed databases R & G Chapter 22.

Primary/secondary

Page 32: Parallel and distributed databases R & G Chapter 22.

Two-phase commit

PREPARE

PREPARED PREPARED

COMMIT

Page 33: Parallel and distributed databases R & G Chapter 22.

Two-phase commit

PREPARE

PREPARED ABORT

ABORT

Page 34: Parallel and distributed databases R & G Chapter 22.

Two-phase commit

PREPARE

PREPARED

ABORT

Page 35: Parallel and distributed databases R & G Chapter 22.

Two-phase commit

PREPARE

PREPARED PREPARED

[Figure: X marks a failed node]
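The message exchanges on the two-phase commit slides can be sketched as follows. This is a simplified, single-process illustration: the `Participant` class, its flags, and treating a missing reply as a vote to abort are assumptions, and neither logging nor a coordinator crash is modelled.

```python
class Participant:
    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit
        self.reachable = reachable
        self.state = "INIT"

    def prepare(self):
        """Answer the coordinator's PREPARE; None models a lost reply."""
        if not self.reachable:
            return None
        self.state = "PREPARED" if self.can_commit else "ABORT"
        return self.state

    def finish(self, decision):
        """Apply the coordinator's COMMIT or ABORT decision."""
        if self.reachable:
            self.state = decision


def two_phase_commit(participants):
    # Phase 1: send PREPARE to everyone and collect the votes.
    votes = [p.prepare() for p in participants]
    # Commit only if every participant answered PREPARED.
    decision = "COMMIT" if all(v == "PREPARED" for v in votes) else "ABORT"
    # Phase 2: broadcast the decision.
    for p in participants:
        p.finish(decision)
    return decision


all_yes = two_phase_commit([Participant("a"), Participant("b")])
one_no = two_phase_commit([Participant("a"), Participant("b", can_commit=False)])
one_silent = two_phase_commit([Participant("a"), Participant("b", reachable=False)])
```

The three calls correspond to the cases where both participants reply PREPARED, where one replies ABORT, and where one reply never arrives.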

Page 36: Parallel and distributed databases R & G Chapter 22.

Conclusion

Parallelism and distribution are very useful: performance, fault tolerance, scale

But complex! Must rethink lots of aspects of the system; must earn the complexity

