Intro to Database Systems
15-445/15-645
Fall 2020
Andy Pavlo, Computer Science, Carnegie Mellon University

Lecture #22: Introduction to Distributed Databases
ADMINISTRIVIA
Homework #5: Sunday Dec 6th @ 11:59pm
Project #4: Sunday Dec 13th @ 11:59pm
Potpourri + Review: Wednesday Dec 9th
→ Vote for what system you want me to talk about: https://cmudb.io/f20-systems

Final Exam:
→ Session #1: Thursday Dec 17th @ 8:30am
→ Session #2: Thursday Dec 17th @ 1:00pm
UPCOMING DATABASE TALKS
Confluent ksqlDB (Kafka)
→ Monday Nov 23rd @ 5pm ET

Microsoft SQL Server Optimizer
→ Monday Nov 30th @ 5pm ET

Snowflake Lecture
→ Monday Dec 7th @ 3:20pm ET
PARALLEL VS. DISTRIBUTED
Parallel DBMSs:
→ Nodes are physically close to each other.
→ Nodes connected with high-speed LAN.
→ Communication cost is assumed to be small.

Distributed DBMSs:
→ Nodes can be far from each other.
→ Nodes connected using public network.
→ Communication cost and problems cannot be ignored.
DISTRIBUTED DBMSs
Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments.
→ Optimization & Planning
→ Concurrency Control
→ Logging & Recovery
TODAY'S AGENDA
System Architectures
Design Issues
Partitioning Schemes
Distributed Concurrency Control
SYSTEM ARCHITECTURE
A DBMS's system architecture specifies what shared resources are directly accessible to CPUs.
This affects how CPUs coordinate with each other and where they retrieve/store objects in the database.
SYSTEM ARCHITECTURE
(Diagram: the four architectures arranged by how much the CPUs share, from everything to nothing: Shared Everything, Shared Memory, Shared Disk, Shared Nothing, each connected over a network.)
SHARED MEMORY
CPUs have access to a common memory address space via a fast interconnect.
→ Each processor has a global view of all the in-memory data structures.
→ Each DBMS instance on a processor has to "know" about the other instances.
SHARED DISK
All CPUs can access a single logical disk directly via an interconnect, but each has its own private memory.
→ Can scale the execution layer independently from the storage layer.
→ Must send messages between CPUs to learn about their current state.
SHARED DISK EXAMPLE

(Diagram: an application server talks to compute nodes that all read and write a shared storage layer. One node serves Get Id=101 by fetching Page ABC from storage; another serves Get Id=200 by fetching Page XYZ. A new node can be added and immediately serve Get Id=101 by reading Page ABC from shared storage. When a node executes Update 101, it writes Page ABC back to storage, and the other nodes must be told that their cached copies of the page are now stale.)
SHARED NOTHING
Each DBMS instance has its own CPU, memory, and disk.
Nodes only communicate with each other via the network.
→ Harder to scale capacity.
→ Harder to ensure consistency.
→ Better performance & efficiency.
SHARED NOTHING EXAMPLE

(Diagram: the database is split across two nodes, P1→ID:1-150 and P2→ID:151-300. A Get Id=200 request goes to the node holding P2. A request that needs both Id=10 and Id=200 must either be routed to both nodes or be forwarded from one node to the other. Adding a third node requires physically moving data to rebalance the partitions, e.g., P1→ID:1-100, P3→ID:101-200, P2→ID:201-300.)
EARLY DISTRIBUTED DATABASE SYSTEMS
MUFFIN – UC Berkeley (1979)
SDD-1 – CCA (1979)
System R* – IBM Research (1984)
Gamma – Univ. of Wisconsin (1986)
NonStop SQL – Tandem (1987)
(Pictured: Bernstein, Mohan, DeWitt, Gray, Stonebraker.)
DESIGN ISSUES
How does the application find data?
How to execute queries on distributed data?
→ Push query to data.
→ Pull data to query.
How does the DBMS ensure correctness?
HOMOGENEOUS VS. HETEROGENEOUS

Approach #1: Homogeneous Nodes
→ Every node in the cluster can perform the same set of tasks (albeit on potentially different partitions of data).
→ Makes provisioning and failover "easier".

Approach #2: Heterogeneous Nodes
→ Nodes are assigned specific tasks.
→ Can allow a single physical node to host multiple "virtual" node types for dedicated tasks.
MONGODB HETEROGENEOUS ARCHITECTURE

(Diagram: the application server sends Get Id=101 to a router (mongos). The router consults the config server (mongod) for the partition map (P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400) and forwards the query to the shard (mongod) that holds P2.)
DATA TRANSPARENCY
Users should not be required to know where data is physically located or how tables are partitioned or replicated.
A query that works on a single-node DBMS should work the same on a distributed DBMS.
DATABASE PARTITIONING
Split database across multiple resources:
→ Disks, nodes, processors.
→ Often called "sharding" in NoSQL systems.
The DBMS executes query fragments on each partition and then combines the results to produce a single answer.
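To make the fragment-and-combine step concrete, here is a minimal scatter-gather sketch in C++. Everything in it (the Partition type, the ExecuteFragment stub, the ExecuteQuery function) is hypothetical and not the API of any real system: the same query fragment is sent to every partition, and the partial results are concatenated into one answer.

#include <string>
#include <vector>

// Hypothetical stand-ins for a partition handle and a result row.
struct Partition { int id; };
using Row = std::string;

// Stub: pretend each partition runs the fragment locally and
// returns its matching rows.
std::vector<Row> ExecuteFragment(const Partition &p, const std::string &sql) {
  return {"row-from-partition-" + std::to_string(p.id)};
}

// Scatter-gather: send the same fragment to every partition, then
// concatenate the partial results into a single answer.
std::vector<Row> ExecuteQuery(const std::vector<Partition> &partitions,
                              const std::string &sql) {
  std::vector<Row> result;
  for (const auto &p : partitions) {                    // scatter
    std::vector<Row> partial = ExecuteFragment(p, sql); // local execution
    result.insert(result.end(), partial.begin(), partial.end()); // gather
  }
  return result;
}

A real DBMS would run the fragments in parallel and may need a smarter combine step (e.g., merging sorted runs or re-aggregating), but the shape is the same.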
NAÏVE TABLE PARTITIONING
Assign an entire table to a single node.
Assumes that each node has enough storage space for an entire table.
Ideal if queries never join data across tables stored on different nodes and access patterns are uniform.
NAÏVE TABLE PARTITIONING

Ideal Query: SELECT * FROM table

(Diagram: Table1 and Table2 are each assigned, whole, to a different partition. A query that scans only one table touches exactly one node.)
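A tiny sketch of the routing this scheme needs, with a made-up table-to-node map; the names and node IDs are illustrative only.

#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical routing table: every table lives, whole, on one node.
const std::unordered_map<std::string, int> kTableToNode = {
    {"table1", 0},  // all of Table1 is on node 0
    {"table2", 1},  // all of Table2 is on node 1
};

// Route a single-table query to the node that stores the table.
int NodeForTable(const std::string &table) {
  auto it = kTableToNode.find(table);
  if (it == kTableToNode.end()) throw std::out_of_range("unknown table");
  return it->second;
}

// A join between table1 and table2 defeats this scheme: the two
// tables live on different nodes, so data must be shipped between them.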
HORIZONTAL PARTITIONING
Split a table's tuples into disjoint subsets.
→ Choose column(s) that divide the database equally in terms of size, load, or usage.
→ Hash Partitioning, Range Partitioning

The DBMS can partition a database physically (shared nothing) or logically (shared disk).
HORIZONTAL PARTITIONING

Ideal Query: SELECT * FROM table WHERE partitionKey = ?

(Diagram: Table1's tuples (101 a XXX 2019-11-29, 102 b XXY 2019-11-28, 103 c XYZ 2019-11-29, 104 d XYX 2019-11-27, 105 e XYY 2019-11-29) are assigned to four partitions P1-P4 by hashing the partitioning key: hash(a)%4 = P2, hash(b)%4 = P4, hash(c)%4 = P3, hash(d)%4 = P2, hash(e)%4 = P1. When a fifth partition is added, the assignment becomes hash(key)%5 and almost every tuple maps to a different partition: hash(a)%5 = P4, hash(b)%5 = P3, hash(c)%5 = P5, hash(d)%5 = P1, hash(e)%5 = P3. See the sketch below.)
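A small sketch of the modulo-hashing assignment from this example. std::hash stands in for whatever hash function the DBMS actually uses, so the concrete partition numbers will not match the slide; the point is that changing N in hash(key) % N remaps almost every key.

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Assign a key to one of num_partitions partitions by hashing it.
size_t PartitionFor(const std::string &key, size_t num_partitions) {
  return std::hash<std::string>{}(key) % num_partitions;
}

int main() {
  const std::vector<std::string> keys = {"a", "b", "c", "d", "e"};
  // With four partitions, every key has a stable home...
  for (const auto &k : keys) {
    std::cout << k << " -> P" << PartitionFor(k, 4) + 1 << "\n";
  }
  // ...but growing to five partitions changes hash(key) % N for
  // almost every key, so nearly all tuples must be moved.
  for (const auto &k : keys) {
    std::cout << k << " -> P" << PartitionFor(k, 5) + 1 << "\n";
  }
}

Consistent hashing, shown next, avoids this wholesale reshuffling when partitions are added or removed.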
CONSISTENT HASHING

(Diagram: the hash space is a ring running from 0 to 1. Partitions P1, P2, P3 are hashed onto positions on the ring, and a key is assigned to the first partition clockwise from hash(key); hash(key1) and hash(key2) each land on an arc and are owned by the next partition clockwise. Adding a new partition P4 (and later P5 and P6) only claims the arc of keys between it and its predecessor; all other keys stay put. With Replication Factor = 3, a key such as key1 is also copied to the next two partitions clockwise from its home.)
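Here is a minimal sketch of such a ring, using std::map as a sorted ring of node positions. This illustrates the idea rather than any particular system's implementation; a production version would also hash each node to multiple "virtual" positions to balance load.

#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Consistent hashing: nodes and keys hash into the same space, and a
// key is owned by the first node at or after hash(key), wrapping
// around at the end of the ring.
class Ring {
 public:
  void AddNode(const std::string &node) { ring_[Hash(node)] = node; }
  void RemoveNode(const std::string &node) { ring_.erase(Hash(node)); }

  // Find the node responsible for a key (ring_ must be non-empty).
  const std::string &NodeFor(const std::string &key) const {
    auto it = ring_.lower_bound(Hash(key));    // first node clockwise
    if (it == ring_.end()) it = ring_.begin(); // wrap around the ring
    return it->second;
  }

 private:
  static uint64_t Hash(const std::string &s) {
    return std::hash<std::string>{}(s);
  }
  std::map<uint64_t, std::string> ring_;  // ring position -> node
};

Adding a node only takes ownership of the arc between it and its predecessor, so only those keys move. With a replication factor of k, NodeFor would be extended to return the next k distinct nodes clockwise instead of just the first.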
LOGICAL PARTITIONING

(Diagram: shared-disk. Two nodes sit in front of shared storage that holds all tuples Id=1 through Id=4. Each node is logically responsible for a subset of the IDs: Get Id=1 is routed to the first node and Get Id=3 to the second, but both nodes read the tuples from the same shared storage.)
PHYSICAL PARTITIONING

(Diagram: shared-nothing. Each node stores its partition's tuples on its own local disk: the first node holds Id=1 and Id=2 and serves Get Id=1; the second holds Id=3 and Id=4 and serves Get Id=3.)
SINGLE-NODE VS. DISTRIBUTED
A single-node txn only accesses data that is contained on one partition.
→ The DBMS does not need to coordinate the behavior of concurrent txns running on other nodes.

A distributed txn accesses data at one or more partitions.
→ Requires expensive coordination.
TRANSACTION COORDINATION
If our DBMS supports multi-operation and distributed txns, we need a way to coordinate their execution in the system.
Two different approaches:
→ Centralized: Global "traffic cop".
→ Decentralized: Nodes organize themselves.
TP MONITORS
A TP Monitor is an example of a centralized coordinator for distributed DBMSs.
Originally developed in the 1970-80s to provide txns between terminals and mainframe databases.
→ Examples: ATMs, Airline Reservations.

Many DBMSs now support the same functionality internally.
CENTRALIZED COORDINATOR

(Diagram 1: the application server sends its lock requests for partitions P1-P4 to a dedicated coordinator, which acknowledges once the locks are granted. When the application server later sends a commit request, the coordinator asks the partitions whether it is safe to commit and then acknowledges the commit.)

(Diagram 2: the coordinator can also be a middleware layer that knows the partition map (P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400), routes query requests to the right partitions, and asks the partitions whether it is safe to commit when the commit request arrives.)
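The "safe to commit?" exchange previews atomic commit protocols, which the next lecture covers as two-phase commit. Here is a deliberately simplified sketch of the coordinator's vote collection; the PartitionNode interface and CoordinateCommit function are hypothetical, and a real protocol must also log its decisions and handle node failures and timeouts.

#include <vector>

// Hypothetical interface for a partition participating in a txn.
// In a real system these calls would be network RPCs.
struct PartitionNode {
  bool SafeToCommit() { return true; }  // stub: node votes yes/no
  void Commit() {}
  void Abort() {}
};

// Centralized coordinator: a txn commits only if every partition it
// touched says it is safe to do so; one "no" vote aborts everywhere.
bool CoordinateCommit(std::vector<PartitionNode *> &touched) {
  for (auto *p : touched) {
    if (!p->SafeToCommit()) {
      for (auto *q : touched) q->Abort();
      return false;
    }
  }
  for (auto *p : touched) p->Commit();
  return true;
}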
DECENTRALIZED COORDINATOR

(Diagram: the application server sends its begin request to one node, which becomes the master node for that txn. Query requests go directly to the partitions (P1-P4) that hold the data. On the commit request, the master node asks the other partitions whether it is safe to commit.)
DISTRIBUTED CONCURRENCY CONTROL
Need to allow multiple txns to execute simultaneously across multiple nodes.
→ Many of the same protocols from single-node DBMSs can be adapted.

This is harder because of:
→ Replication.
→ Network Communication Overhead.
→ Node Failures.
→ Clock Skew.
DISTRIBUTED 2PL

(Diagram: two application servers run txns T1 and T2 against Node 1, which stores A=1, and Node 2, which stores B=8, connected over a network. T1 locks A on Node 1 and sets A=2 while T2 locks B on Node 2 and sets B=7. Then T1 tries Set B=9 on Node 2 and T2 tries Set A=0 on Node 1; each blocks waiting for the other's lock. The waits-for graph contains the cycle T1→T2→T1, but no single node can see the whole cycle from its local state.)
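To see why the deadlock is invisible locally, here is a sketch that unions each node's local waits-for edges into a global graph and runs a simple path-based cycle search. The graph encoding and txn IDs are illustrative only; a real detector would collect these edges over the network or rely on lock timeouts.

#include <map>
#include <set>

// txn id -> set of txn ids it waits for (the lock holders blocking it).
using WaitsFor = std::map<int, std::set<int>>;

// Simple path-based DFS: returns true if a cycle is reachable from t.
bool HasCycleFrom(int t, const WaitsFor &g, std::set<int> &path) {
  if (!path.insert(t).second) return true;  // t is already on this path
  if (auto it = g.find(t); it != g.end()) {
    for (int u : it->second) {
      if (HasCycleFrom(u, g, path)) return true;
    }
  }
  path.erase(t);
  return false;
}

int main() {
  WaitsFor global;
  // Node 1 only sees: T2 waits for T1 (T2's Set A=0 blocks on T1's lock).
  global[2].insert(1);
  // Node 2 only sees: T1 waits for T2 (T1's Set B=9 blocks on T2's lock).
  global[1].insert(2);
  // Neither local graph has a cycle, but their union does: T1 -> T2 -> T1.
  std::set<int> path;
  return HasCycleFrom(1, global, path) ? 0 : 1;  // 0 = deadlock found
}

Each local graph is acyclic on its own, which is exactly why distributed deadlock detection requires periodically unioning the local graphs (or falling back to timeouts) before anyone can pick a victim to abort.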
CONCLUSION
I have barely scratched the surface on distributed database systems…
It is hard to get this right.
NEXT CLASS
Distributed OLTP Systems
Replication
CAP Theorem
Real-World Examples