Postgres-XC: Write-Scalable PostgreSQL Cluster
Mason Sharp
August 7th, 2012
CC License: Attribution-NonCommercial-ShareAlike
Aug 7, 2012 2
Content Attribution
• Koichi Suzuki
• Michael Paquier
• Ashutosh Bapat
• Pavan Deolasee
• Mason Sharp
• ...?
Who am I
● Mason Sharp
● Co-organizer of NYC PUG
● Co-founder of StormDB
● Previously worked at EnterpriseDB
● Original architect of Stado (GridSQL)
● One of the original architects of Postgres-XC
PostgreSQL User Groups
San Francisco: 616 Members
New York: 502 Members
Tokyo: 2000? Members
New: Philadelphia, Los Angeles
NYC PUG Meetup Membership
NYC PUG Speakers
● Recent speakers include
  ● Bruce Momjian
  ● Greg Smith
  ● Greg Stark
  ● Joe Conway
  ● Joachim Wieland
NYC PUG Speakers
We want you!
Postgres-XC Talk
● Background
● Postgres-XC Introduction & Usage
● Postgres-XC Components
● Postgres-XC Details
Background
Data Tier Scaling
● Up versus Out
  ● More memory, more cores
● Read-only Replicated Slaves
● Caching
  ● Memcached
● Sharding
● NoSQL
● NewSQL
XC Origins
Koichi Suzuki, NTT Data
Mason Sharp
PostgreSQL-Related Clustering Projects
● pgpool-II
  ● Read replicated slaves
● PL/Proxy
  ● Used by Skype, meetme (myYearbook)
  ● All access is over a stored function
● Postgres-R, PostgresForest
● Stado (GridSQL)
  ● Parallel Query
  ● Not write-scalable
Can we make it write scalable?
Postgres-XC Introduction
Overview
● PostgreSQL-based database cluster
  ● Same API to Apps as PostgreSQL
    – Same drivers
  ● Currently based upon PG 9.1. Soon: 9.2.
● Symmetric Multi-headed Cluster
  ● No master, no slave
    – Not just PostgreSQL replication.
    – Application can read/write to any coordinator server
  ● Consistent database view to all the transactions
    – Complete ACID property to all the transactions in the cluster
● Scales both for Write and Read
Postgres-XC Cluster
[Figure: several PG-XC servers, each holding a Coordinator and a Data Node, communicate with one another and with the Global Transaction Manager (GTM). PG-XC servers can be added as needed. An application can connect to any server to have the same database view and service.]
Read/Write Scalability
DBT-1 throughput scalability
I Consistency
Is XC right for you?
● I need write scalability
● I like ACID
● I like SQL
● I don't want to rewrite my existing SQL applications
● I want to leverage the PostgreSQL community for all of their contrib modules
Why XC may not be right for you
● I need MPP parallel query capability
  ● Parallel query in XC is limited
  ● Try Stado: www.stado.us
● I need a solution with built-in HA
● I need massive scale and have loose consistency requirements
● I would rather use a NoSQL solution so I can put it on my resume
Postgres-XC Components
Coordinator Overview
● Based on PostgreSQL 9.1 (9.2 soon)
● Accepts connections from clients
● Parses and plans requests
● Interacts with Global Transaction Manager
● Uses pooler for Data Node connections
● Sends down XIDs and snapshots to Data Nodes
● Collects results and returns to client
● Uses two phase commit if necessary
Data Node Overview
● Based on PostgreSQL 9.1 (9.2 soon)
● Where user created data is actually stored
● Coordinators (not clients) connect to Data Nodes
● Accepts XID and snapshots from Coordinator
● The rest is fairly similar to vanilla PostgreSQL
Global Transaction Manager
[Figure: the GTM provides cluster nodes with XIDs, snapshots, timestamps, and sequence values.]
Summary
● Coordinator
  ● Visible to apps
  ● SQL analysis, planning, execution
  ● Connection pooling
● Datanode (or simply “NODE”)
  ● Actual database store
  ● Local SQL execution
● GTM (Global Transaction Manager)
  ● Provides consistent database view to transactions
    – GXID (Global Transaction ID)
    – Snapshot (List of active transactions)
    – Other global values such as SEQUENCE
● GTM Proxy: integrates server-local transaction requirement for performance
Coordinator and Datanode: Postgres-XC core, based upon vanilla PostgreSQL. Share same binary. May want to colocate.
GTM and GTM Proxy: different binaries.
Data Distribution
Distribution Strategies
Distributing the data
● Replicated table
  ● Each row in the table is replicated to the datanodes
  ● Statement based replication
● Distributed table
  ● Each row of the table is stored on one datanode, decided by one of the following strategies
    – Hash
    – Round Robin
    – Modulo
    – Range and user defined function (future)
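The distribution strategies listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not Postgres-XC's implementation: the datanode names are made up, and `zlib.crc32` is a stand-in for whatever hash function XC really uses.

```python
import itertools
import zlib

DATANODES = ["dn1", "dn2", "dn3"]  # hypothetical datanode names


def hash_node(key, nodes=DATANODES):
    """HASH: hash the distribution column value, then map it to a node."""
    digest = zlib.crc32(str(key).encode("utf-8"))  # stand-in hash, not XC's
    return nodes[digest % len(nodes)]


def modulo_node(key, nodes=DATANODES):
    """MODULO: integer column value modulo the number of nodes."""
    return nodes[key % len(nodes)]


def make_round_robin(nodes=DATANODES):
    """ROUND ROBIN: each new row goes to the next node in turn."""
    cycle = itertools.cycle(nodes)
    return lambda: next(cycle)


def replicated_nodes(nodes=DATANODES):
    """REPLICATION: every row is written to every node."""
    return list(nodes)
```

With hash and modulo, the same key always lands on the same node, which is what makes single-node lookups and join pushdown possible. Round robin spreads rows evenly but gives the planner nothing to route on.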
Table Distribution and Replication
● Each table can be distributed or replicated
● Strategy based on usage
  – Transaction tables → Distributed
  – Static lookup tables → Replicate
  – Distribute parent-children together
● Join pushdown when possible
● Where clause pushdown
● Simple parallel aggregates
Defining Tables
● Table Distribution/Replication
  ● CREATE TABLE tab (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUND ROBIN | REPLICATION
Replicated Tables
Writes: a write goes to every datanode (write, write, write), so each datanode holds the same full copy.
Reads: a read is served by a single datanode.
Each of the three datanodes holds:
val  val2
1    2
2    10
3    4
Distributed Tables
Read: the read goes to every datanode (read, read, read) and a combiner merges the results.
Write: a write goes to a single datanode.
Datanode 1:
val  val2
1    2
2    10
3    4
Datanode 2:
val  val2
11   21
21   101
31   41
Datanode 3:
val  val2
10   20
20   100
30   40
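A rough Python sketch of the distributed-table picture: a read fans out to all datanodes and a combiner merges the pieces, while a write touches one datanode only. The rows are the ones on the slide; the node names and function names are invented for illustration.

```python
# Rows held by three hypothetical datanodes, taken from the slide.
NODE_ROWS = {
    "dn1": [(1, 2), (2, 10), (3, 4)],
    "dn2": [(11, 21), (21, 101), (31, 41)],
    "dn3": [(10, 20), (20, 100), (30, 40)],
}


def read_all(node_rows):
    """Send the read to every datanode, then combine the partial results."""
    combined = []
    for rows in node_rows.values():
        combined.extend(rows)
    return sorted(combined)


def write_row(node_rows, node, row):
    """A write to a distributed table touches exactly one datanode.

    The target node would be picked by the table's distribution strategy;
    here the caller passes it in to keep the sketch short.
    """
    node_rows[node].append(row)
```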
Join Pushdown
Joins that can be pushed down, by distribution of the two tables:
● Hash/Modulo distributed with Hash/Modulo distributed: inner join with equality condition on the distribution column with same data type and same distribution strategy
● Hash/Modulo distributed with Round Robin: no
● Hash/Modulo distributed with Replicated: inner join if replicated table's distribution list is superset of distributed table's distribution list
● Round Robin with Round Robin: no
● Round Robin with Replicated: inner join if replicated table's distribution list is superset of distributed table's distribution list
● Replicated with Replicated: all kinds of joins
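The join pushdown rules can be read as a small decision function. The sketch below encodes only what the slide states; the `Table` class, its field names, and the return values are assumptions for illustration and do not correspond to anything in the Postgres-XC planner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    strategy: str                        # "hash", "modulo", "roundrobin" or "replicated"
    nodes: frozenset                     # datanodes holding the table (its distribution list)
    dist_col_type: Optional[str] = None  # data type of the distribution column, if any


def pushable_join(a, b, equi_join_on_dist_cols=False):
    """Return "all", "inner" or "none", following the slide's matrix."""
    a_rep = a.strategy == "replicated"
    b_rep = b.strategy == "replicated"
    if a_rep and b_rep:
        return "all"
    if a_rep or b_rep:
        rep, dist = (a, b) if a_rep else (b, a)
        # The replicated table must be present wherever the other table is.
        return "inner" if rep.nodes >= dist.nodes else "none"
    hashed = {"hash", "modulo"}
    if (a.strategy in hashed
            and a.strategy == b.strategy
            and a.dist_col_type == b.dist_col_type
            and equi_join_on_dist_cols):
        return "inner"
    return "none"  # anything involving round robin without a replicated side
```

The slide does not mention node lists for the hash/modulo case, so the sketch does not check them there either.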
Constraints
● XC does not support Global constraints – i.e. constraints across datanodes
● Constraints within a datanode are supported
By distribution strategy:
● Replicated
  – Unique, primary key constraints: Supported
  – Foreign key constraints: Supported if the referenced table is also replicated on the same nodes
● Hash/Modulo distributed
  – Unique, primary key constraints: Supported if primary OR unique key is distribution key
  – Foreign key constraints: Supported if the referenced table is replicated on same nodes OR it's distributed by primary key in the same manner and same nodes
● Round Robin
  – Unique, primary key constraints: Not supported
  – Foreign key constraints: Supported if the referenced table is replicated on same nodes
Demo
Transaction Management
Why MVCC is Important for Consistency
Global Transaction Manager
Multi-version Concurrency Control (MVCC) (quick overview)
● Readers do not block writers
● Writers do not block readers
● Transaction Ids (XIDs)
  ● Every transaction gets an ID
● Snapshots contain a list of running XIDs
Multi-version Concurrency Control (MVCC) (quickly discussed)
Example:
T1 Begin...
T2 Begin; INSERT...; Commit
T3 Begin...
T4 Begin; SELECT
● T4's snapshot contains T1 and T3
● T2 already committed
● It can see T2's commits, but not T1's nor T3's
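The example can be worked through with a much-simplified visibility check. This is a sketch of the idea only: real PostgreSQL visibility rules also cover a transaction's own writes, deletes, subtransactions and more, and the `Snapshot` class here is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    running: frozenset  # XIDs still in progress when the snapshot was taken
    next_xid: int       # first XID not yet assigned at that time


def is_visible(row_xid, snapshot, committed):
    """Can a transaction holding `snapshot` see a row written by `row_xid`?"""
    if row_xid >= snapshot.next_xid:
        return False  # writer started after the snapshot was taken
    if row_xid in snapshot.running:
        return False  # writer was still running at snapshot time
    return row_xid in committed


# T1..T4 map to XIDs 1..4; T4 takes its snapshot while T1 and T3 are running.
t4_snapshot = Snapshot(running=frozenset({1, 3}), next_xid=5)
committed = {2}
```

Note that a row from T3 stays invisible to T4 even if T3 commits later, because T3 is in T4's snapshot.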
Multi-version Concurrency Control (MVCC) on 2 Independent Nodes
Example:
T1 Begin...
T2 Begin; INSERT..; Commit;
T3 Begin...
T4 Begin; SELECT
● Node 1: T2 Commit, T4 SELECT
● Node 2: T4 SELECT, T2 Commit
● T4's SELECT statement returns inconsistent data
  ● Includes data from Node1, but not Node2.
● C in ACID Fails
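The anomaly, and how a single cluster-wide snapshot avoids it, can be sketched as follows. The function and variable names are made up, and visibility is reduced to "committed and not in the snapshot's running list".

```python
def visible_xids(row_xids, snapshot_running, committed):
    """XIDs whose rows are visible: committed and not running at snapshot time."""
    return {x for x in row_xids if x in committed and x not in snapshot_running}


T1, T2, T3 = 1, 2, 3
rows_written_by = {T2}  # T2 inserted one row on each node

# Independent nodes: each node takes its own local snapshot when T4's SELECT arrives.
# Node 1 has already seen T2 commit; Node 2 has not.
node1_local = visible_xids(rows_written_by, snapshot_running={T1, T3}, committed={T2})
node2_local = visible_xids(rows_written_by, snapshot_running={T1, T2, T3}, committed=set())

# With a global snapshot: T4 obtains one snapshot and carries it to both nodes.
# Here it was taken before T2 committed anywhere, so T2 is listed as running.
global_running = {T1, T2, T3}
node1_global = visible_xids(rows_written_by, global_running, committed={T2})
node2_global = visible_xids(rows_written_by, global_running, committed=set())
```

With local snapshots the two nodes disagree about T2. With one shared snapshot both nodes give the same answer, whichever way the commit and the SELECT happen to interleave on each node.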
Global Transaction Manager (GTM)
● Provides Global Transaction Consistency
[Figure: the GTM provides cluster nodes with XIDs, snapshots, timestamps, and sequence values.]
Transaction Management
● 2PC is used to guarantee transactional consistency across nodes
  ● When more than one node is involved OR
  ● When there are explicit 2PC transactions
● Only those nodes where write activity has happened participate in 2PC
● In PostgreSQL, 2PC cannot be applied if temporary tables are involved. The same restriction applies in Postgres-XC
● When a single coordinator command needs multiple datanode commands, we encase those in a transaction block
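A minimal sketch of the commit rule described above: two phase commit only when more than one node was written, with read-only nodes committing normally. The `Datanode` class is a hypothetical in-memory stand-in that just records calls; it is not the coordinator's real code path, and explicit 2PC and error recovery are left out.

```python
class Datanode:
    """Hypothetical in-memory stand-in for a datanode connection."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.log = []

    def prepare(self, gid):
        if self.fail_prepare:
            raise RuntimeError(f"{self.name}: prepare failed")
        self.log.append(("prepare", gid))

    def commit_prepared(self, gid):
        self.log.append(("commit prepared", gid))

    def rollback_prepared(self, gid):
        self.log.append(("rollback prepared", gid))

    def commit(self):
        self.log.append(("commit",))

    def rollback(self):
        self.log.append(("rollback",))


def commit_transaction(gid, written_nodes, read_only_nodes=()):
    """Commit across nodes; use 2PC only when more than one node was written."""
    for node in read_only_nodes:
        node.commit()  # nothing to make durable, so no 2PC needed
    if len(written_nodes) <= 1:
        for node in written_nodes:
            node.commit()  # a single writer can commit in one step
        return True
    prepared = []
    try:
        for node in written_nodes:  # phase 1: prepare everywhere
            node.prepare(gid)
            prepared.append(node)
    except RuntimeError:
        for node in prepared:
            node.rollback_prepared(gid)
        for node in written_nodes:
            if node not in prepared:
                node.rollback()
        return False
    for node in written_nodes:  # phase 2: commit everywhere
        node.commit_prepared(gid)
    return True
```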
Postgres-XC Considerations
• Depending on implementation
  – Current implementation
  – Large snapshot size and number
  – Too many interactions between GTM and Coordinators
Can GTM be a Performance Bottleneck?
July 12th, 2012 42
Applicable up to five PG-XC servers (DBT-1)
[Figure: current implementation. Labels: Coordinators, Coordinator Backend, Client Library, Internet Domain Socket, GTM Main Thread (create, terminate), GTM Threads, Snapshot Data, Lock.]
Can GTM be a Performance Bottleneck?
Proxy Implementation
• Request/Response grouping
• Single representative snapshot applied to multiple transactions
[Figure: proxy implementation. Labels: Coordinators, Coordinator Backend, Client Library, Unix Domain Socket, GTM Proxy Thread, Proxy Main Thread (connection create, terminate, assignment), Backend Command Handler, Backend Response Handler, Server Protocol Handler, Internet Domain Socket, GTM Main Thread (create, terminate), GTM Worker Threads, GTM Snapshot Handler, Snapshot Data, Lock.]
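The two proxy ideas, request/response grouping and one representative snapshot for several transactions, can be illustrated with a toy model. Everything here (class names, the `flush` method, the XIDs) is hypothetical and only shows why the number of GTM round trips drops.

```python
class Gtm:
    """Hypothetical stand-in for the GTM that counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self.running = {101, 102}  # made-up running XIDs

    def get_snapshot(self):
        self.round_trips += 1
        return frozenset(self.running)


class GtmProxy:
    """Collects snapshot requests from local backends and answers them as a group."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def request_snapshot(self, backend_id):
        self.pending.append(backend_id)  # queue instead of calling the GTM right away

    def flush(self):
        if not self.pending:
            return {}
        snapshot = self.gtm.get_snapshot()  # one round trip for the whole group
        replies = {backend: snapshot for backend in self.pending}
        self.pending = []
        return replies
```

Three backends asking for a snapshot through the proxy cost one GTM call instead of three, and all three receive the same snapshot object.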
Can GTM be a SPOF?
• Implement GTM Standby
GTM Master GTM Standby
Checkpoint next starting point (GXID and Sequence)
Standby can failover the master without referring to GTM master information.
Parallel Query
● OK for simple queries
● Also when all joins can be pushed down
  – Star schema with replicated dimensions
● Even aggregates
  ● SELECT SUM(col1) FROM tab1
● Performs poorly if a cross-node join is needed
  ● Data on one node needs to join with another
  ● Ships all data to coordinator for joining
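A simple parallel aggregate such as SELECT SUM(col1) FROM tab1 can be pictured as one partial sum per datanode plus a final sum on the coordinator. The sketch below uses made-up node names and values and only illustrates the split, not how XC plans it.

```python
NODE_ROWS = {  # hypothetical col1 values held by each datanode
    "dn1": [2, 10, 4],
    "dn2": [21, 101, 41],
    "dn3": [20, 100, 40],
}


def partial_sum(rows):
    """Runs on each datanode: SUM(col1) over the local rows only."""
    return sum(rows)


def coordinator_sum(node_rows):
    """Runs on the coordinator: add up one partial result per datanode."""
    return sum(partial_sum(rows) for rows in node_rows.values())
```

Only one number per datanode crosses the network here, which is why this case scales well. A cross-node join has no such shortcut, so the rows themselves have to be shipped.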
High Availability
● GTM-standby provides basic HA
● No native HA for nodes
● Use HA middleware such as Pacemaker
● Each data node should be configured with synchronous replication
Status
Settings and options
Present Status
● Project/Developer site
  ● http://postgres-xc.sourceforge.net/
  ● http://sourceforge.net/projects/postgres-xc/
● Version 1.0 available
  ● Base PostgreSQL version: 9.1
  ● Soon, PostgreSQL 9.2!
    – Group commit: even more write scalability
    – “Index-only Scans”
● Get Involved
  ● Even as just a tester
Easy way of trying it out?
● www.stormdb.com
  ● Not Postgres-XC, but similar
  ● Nothing to install, cloud hosted
  ● Free beta