Date posted: 12-Apr-2017
Category: Software
Uploaded by: frederic-descamps

Galera Replication Demystified
How does it work?

about.me/lefred
Who am I?
  Frédéric Descamps (@lefred)
  Working for Percona since 2011
  Senior Architect
  Managing MySQL since 3.23
  devops believer
  and I installed my first Galera Cluster in February 2010 ;-)

Galera Replication - Cluster
  What is it?
  What does it handle?

Standard asynchronous replication is server-centric: one server streams data to another one. All the nodes have a specific role.

In Galera, the dataset is synchronized between one or more servers: it is data-centric.

You can write to any node in your cluster. No need to worry about the nodes eventually getting out of sync.

Write events/transactions are sent in parallel.

Cluster Membership
  determined by the cluster
  wsrep_cluster_address is just a pointer
  any node is permitted to join that
    knows the cluster name
    can find a single active cluster node
  wsrep_cluster_address = gcomm://<comma-separated node addresses>

The cluster manages quorum and has split-brain protection.

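The quorum rule behind split-brain protection can be sketched as a strict-majority check. This is a minimal illustration that assumes every node carries the same weight; the function name `has_quorum` is hypothetical and not part of Galera.

```python
def has_quorum(reachable_nodes: int, previous_cluster_size: int) -> bool:
    """True when this partition sees a strict majority of the last known membership."""
    return 2 * reachable_nodes > previous_cluster_size


if __name__ == "__main__":
    # 3-node cluster split 2/1: only the 2-node side keeps quorum.
    print(has_quorum(2, 3))
    print(has_quorum(1, 3))
    # 4-node cluster split 2/2: neither side has a majority, so neither accepts writes.
    print(has_quorum(2, 4))
```

An even split leaves no partition with a majority, which is why an odd number of nodes is the usual recommendation.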
Replication
  Delivers the writeset to all nodes in the cluster
    and all nodes acknowledge the writeset
  Cost is ~roundtrip latency to furthest node
  Serialized by Group Communication

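The cost statement above can be turned into a back-of-the-envelope estimate. The sketch below assumes the commit cost is dominated by the round trip to the furthest node; the node names and round-trip times are made-up illustration values.

```python
from typing import Dict


def commit_latency_ms(rtt_ms_by_node: Dict[str, float]) -> float:
    """Replication cost of one commit: roughly the round trip to the furthest node."""
    return max(rtt_ms_by_node.values())


def max_commits_per_second(latency_ms: float) -> float:
    """Upper bound for a single connection that commits one transaction at a time."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical round-trip times from the writing node to its peers.
    rtts = {"node2": 0.5, "node3": 40.0}
    latency = commit_latency_ms(rtts)
    print(latency)                          # 40.0
    print(max_commits_per_second(latency))  # 25.0
```

The bound applies per connection: several connections committing concurrently can still reach a higher aggregate throughput.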
GTID
  Not the same as MySQL 5.6 asynchronous GTIDs
    though they appear the same
    939aac77-f7d1-11e3-bd5e-b211d6ab1ec6:1534285
  GTIDs ensure cluster members are consistent with each other
    nodes joining a cluster have their GTIDs checked
  GTIDs can be used to compare downed nodes to each other
  The highest GTID is the most recently written
  Generally the best practice is to bootstrap the node with the most recent data

Global Transaction IDs
  initial dataset
    bfb912e5-f560-11e2-0800-1eefab05e57d:0
  first change/transaction/writeset
    bfb912e5-f560-11e2-0800-1eefab05e57d:1
  undefined GTID
    00000000-0000-0000-0000-000000000000:-1

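The `uuid:seqno` format above is easy to compare programmatically, for example to pick the node to bootstrap from. The following is a small sketch, not Galera code; the helper names (`parse_gtid`, `most_advanced`) and the node names are hypothetical.

```python
from typing import Dict, NamedTuple

UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"


class Gtid(NamedTuple):
    uuid: str   # identifies the cluster
    seqno: int  # position in the cluster-wide order of writesets


def parse_gtid(text: str) -> Gtid:
    """Split 'uuid:seqno' on the last colon (the seqno may be negative)."""
    uuid, _, seqno = text.strip().rpartition(":")
    return Gtid(uuid, int(seqno))


def is_undefined(gtid: Gtid) -> bool:
    return gtid.uuid == UNDEFINED_UUID or gtid.seqno < 0


def most_advanced(positions: Dict[str, str]) -> str:
    """Return the name of the node holding the highest seqno."""
    known = {name: parse_gtid(p) for name, p in positions.items()}
    known = {name: g for name, g in known.items() if not is_undefined(g)}
    if not known:
        raise ValueError("no node has a defined GTID")
    if len({g.uuid for g in known.values()}) != 1:
        raise ValueError("nodes belong to different clusters")
    return max(known, key=lambda name: known[name].seqno)


if __name__ == "__main__":
    cluster = {
        "node1": "bfb912e5-f560-11e2-0800-1eefab05e57d:0",
        "node2": "bfb912e5-f560-11e2-0800-1eefab05e57d:1",
        "node3": "00000000-0000-0000-0000-000000000000:-1",
    }
    print(most_advanced(cluster))  # node2
```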
Global Transaction IDs: Galera vs MySQL 5.6
  In MySQL 5.6: a GTID listing (values not recoverable) in which the UUID changes after "new server promoted"
  In Galera: a GTID listing (values not recoverable) in which the UUID stays the same after "new server promoted"

GTID Assignment
  UUID
    The UUID section (128-bit) is generated during bootstrapping to identify the cluster.
    It is generated by mixing the timer value with pseudo-random numbers (derived primarily from the timer and the PID). Currently the NIC's MAC address is not used, although in theory it could be.
    More info: https://gist.github.com/lefred/88a2cec88d03854d9934
  SEQNO
    The seqno (64-bit) is incremented only when the transaction passes certification and is ready for commit.
    For the curious, the algorithm used to derive the sequence number is the Totem Single-Ring Ordering protocol.
    Before that, there is already communication between the nodes: group communication is used to define a group-channel id, which is a counter maintained locally by each node in sync with the group.

Serialization of writesets
Roles
  We have 4 distinct roles in Galera:
    2 for replication
    2 for state transfer

Replication Roles
  Within the cluster, all nodes are equal
  'master/donor node'
    The node a given transaction was written and committed on.
  'slave/joiner node'
    The node that received the given transaction via Galera replication.
  The terms master and slave in this context are only relevant for a given transaction
  Writeset: Galera's term for a transaction. One or more RBR row changes

State Transfer Roles
  New nodes joining an existing cluster get provisioned automatically
    Joiner = New node
    Donor = Node giving a copy of the datadir
  State Snapshot Transfer
    Full backup of Donor to Joiner
  Incremental State Transfer
    Only changes since node left cluster

Writeset
  RBR payload (black box to Galera)
  Replication keys (generated by master/donor node)
    Primary keys
    Unique keys
    Foreign keys
    Table names
    Schema names
  Keys are what make certification possible

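The writeset described above can be pictured as an opaque payload plus a set of keys. The sketch below is an illustration only: the `Writeset` class, the `(schema, table, key value)` tuple layout and the sample names are assumptions, not Galera's actual wire format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical key layout: (schema name, table name, key value).
Key = Tuple[str, str, str]


@dataclass(frozen=True)
class Writeset:
    payload: bytes        # row-based replication events, opaque to Galera
    keys: FrozenSet[Key]  # what certification compares


def keys_overlap(a: Writeset, b: Writeset) -> bool:
    """Two writesets can only conflict if they share at least one key."""
    return not a.keys.isdisjoint(b.keys)


if __name__ == "__main__":
    ws1 = Writeset(b"...", frozenset({("shop", "orders", "pk=1")}))
    ws2 = Writeset(b"...", frozenset({("shop", "orders", "pk=1")}))
    ws3 = Writeset(b"...", frozenset({("shop", "orders", "pk=2")}))
    print(keys_overlap(ws1, ws2))  # True
    print(keys_overlap(ws1, ws3))  # False
```

Because only the keys are inspected, the conflict check never needs to decode the row data itself.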
Replication?
  It consists of 4 operations:
    Apply
    Replication
    Certification
    Commit
  The order differs with the node's role

Replication Order on Master/Donor
  1. Apply
  2. Replication
  3. Certification
  4. Commit

Replication Order on Slave/Joiner
  1. Replication (from master/donor)
  2. Certification
  3. Apply
  4. Commit

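The two orderings above contain the same four operations; only the position of the apply step changes. A tiny sketch makes that explicit (the constant and function names are hypothetical):

```python
from typing import Tuple

ORIGIN_ORDER = ("apply", "replication", "certification", "commit")
RECEIVER_ORDER = ("replication", "certification", "apply", "commit")


def steps_for(is_origin: bool) -> Tuple[str, ...]:
    """Order of operations for the node a transaction was written on, or for a receiver."""
    return ORIGIN_ORDER if is_origin else RECEIVER_ORDER


def applies_before_certification(order: Tuple[str, ...]) -> bool:
    return order.index("apply") < order.index("certification")


if __name__ == "__main__":
    print(applies_before_certification(steps_for(True)))   # True
    print(applies_before_certification(steps_for(False)))  # False
```

The originating node has already executed the transaction when certification runs, which is why a failed certification there surfaces as a rollback to the client.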
Certification
  Can this writeset be applied?
    Based on unapplied earlier transactions on master/donor
    Such conflicts must come from other nodes
  Happens on every node
  Should be deterministic on every node
  Results are not reported to the cluster
    Pass: enter apply queue (commit success on master/donor)
    Fail: drop transaction (or return deadlock on master/donor)
  Serialized by group communication sequence (and GTID will be synchronized following the same sequence)
  Cost based on # of keys or # of rows

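The certification test above can be sketched as a "first committer wins" check over keys. This is a simplified model under stated assumptions, not Galera's implementation: keys are plain strings, each writeset carries the last seqno its originating node had seen (`last_seen_seqno`, a hypothetical field name), and the seqno is incremented only on success, as the slides describe.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Writeset:
    keys: FrozenSet[str]
    last_seen_seqno: int  # seqno the originating node had reached when the transaction ran


class Certifier:
    """One instance per node; every node is fed the same ordered stream."""

    def __init__(self) -> None:
        self.seqno = 0
        self.last_writer: Dict[str, int] = {}  # key -> seqno of the last certified writer

    def certify(self, ws: Writeset) -> bool:
        # Fail if any key was changed by a writeset the originator had not seen yet.
        if any(self.last_writer.get(k, 0) > ws.last_seen_seqno for k in ws.keys):
            return False
        self.seqno += 1  # incremented only when certification passes
        for k in ws.keys:
            self.last_writer[k] = self.seqno
        return True


def run_node(stream: List[Writeset]) -> List[bool]:
    node = Certifier()
    return [node.certify(ws) for ws in stream]


if __name__ == "__main__":
    stream = [
        Writeset(frozenset({"t.row1"}), 0),  # trx1
        Writeset(frozenset({"t.row1"}), 0),  # trx2, ran without seeing trx1
        Writeset(frozenset({"t.row2"}), 0),  # unrelated row
    ]
    print(run_node(stream))                      # [True, False, True]
    print(run_node(stream) == run_node(stream))  # True: same input, same verdicts
```

Since each node evaluates the same writesets in the same order against the same rule, every node reaches the same verdict without exchanging results.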
Apply
  Apply is done on slave nodes after certification
  Can be parallelized
    if wsrep_slave_threads > 1
    if there are no other writesets with conflicting keys also being applied
  Cost: size of transaction
  Generates brute force aborts on local node for conflicts

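The two conditions for parallel apply listed above can be expressed as a single predicate. This is a sketch under simplifying assumptions (keys as strings, one writeset per applier thread); the function name is hypothetical.

```python
from typing import FrozenSet, List


def can_start_applying(
    candidate_keys: FrozenSet[str],
    in_flight: List[FrozenSet[str]],
    wsrep_slave_threads: int,
) -> bool:
    """A writeset may start if an applier thread is free and no in-flight writeset shares a key."""
    if len(in_flight) >= wsrep_slave_threads:
        return False
    return all(candidate_keys.isdisjoint(keys) for keys in in_flight)


if __name__ == "__main__":
    busy = [frozenset({"t.row1"})]
    print(can_start_applying(frozenset({"t.row2"}), busy, 4))  # True
    print(can_start_applying(frozenset({"t.row1"}), busy, 4))  # False: conflicting key
    print(can_start_applying(frozenset({"t.row2"}), busy, 1))  # False: no free thread
```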
Commit
  Final local InnoDB commit
    i.e., innodb_flush_log_at_trx_commit
  GTID gets generated
  Done by applier threads on slaves/joiners
  Done by client thread on master/donor
  innodb_flush_log_at_trx_commit=1 is generally not required for PXC (Percona XtraDB Cluster)!

Galera Replication (autocommit)
Galera Replication (full transaction)
Optimistic Locking
  Traditional locking
  Optimistic locking

Certification Failure
  Trx1 and Trx2 are open on two different nodes
  One of the nodes gets COMMIT
  Synchronous replication
  Certification tests run in isolation on each node
  Certification tests: asynchronous
  Synchronous replication: deterministic
  Certification succeeds
  Certified transaction goes to the apply queue
  On the node that received the COMMIT, a successful cert test means an actual commit
  and transactions in the apply queue on the two other nodes are executed asynchronously
  On the other writing node we commit the transaction
  Synchronous replication

Brute Force Abort (bfa)
Local Certification Failure (lcf)
Certification Errors: summary
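Both brute force aborts and local certification failures reach the client as a deadlock error (MySQL error 1213), so applications writing to several nodes usually retry the transaction. The sketch below shows one way to do that; `DeadlockError` is a hypothetical stand-in for whatever exception your database driver raises, and the retry policy is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver exception carrying MySQL error 1213."""


def run_with_retry(transaction: Callable[[], T], attempts: int = 3, backoff_s: float = 0.0) -> T:
    """Run the whole transaction again when it is rolled back as a deadlock."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    calls = []

    def flaky() -> str:
        calls.append(1)
        if len(calls) < 3:
            raise DeadlockError("simulated certification failure")
        return "committed"

    print(run_with_retry(flaky))  # committed
    print(len(calls))             # 3
```

The callable must contain the entire transaction, because a certification failure rolls back everything the transaction did.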
Flow Control
  Ability of any node in the cluster to ask the rest of the nodes to pause writes while it catches up
  Feedback mechanism for replication process
  ONLY caused by wsrep_local_recv_queue exceeding a node's fc_limit
  CAN pause the entire cluster and look like a cluster stall!

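The feedback loop above can be modelled as a small state machine driven by the receive queue length. This is a simplified sketch: the pause threshold follows the slide (`fc_limit`), while the resume threshold `fc_limit * fc_factor` is modelled on Galera's `gcs.fc_factor` option. The class name and the example values are illustrative, not defaults.

```python
class FlowControl:
    """Tracks whether a node is asking the cluster to pause replication."""

    def __init__(self, fc_limit: int, fc_factor: float) -> None:
        self.fc_limit = fc_limit
        self.fc_factor = fc_factor
        self.paused = False

    def observe(self, recv_queue_len: int) -> bool:
        """Feed the current receive queue length; return True while flow control is active."""
        if not self.paused and recv_queue_len > self.fc_limit:
            self.paused = True   # queue too long: ask the other nodes to pause
        elif self.paused and recv_queue_len <= self.fc_limit * self.fc_factor:
            self.paused = False  # caught up enough: let replication resume
        return self.paused


if __name__ == "__main__":
    fc = FlowControl(fc_limit=16, fc_factor=0.5)
    print([fc.observe(q) for q in (10, 17, 12, 8, 5)])
    # [False, True, True, False, False]
```

The gap between the pause and resume thresholds keeps the node from toggling flow control on every writeset.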
Tuning Flow Control
  Too low:
    frequent FC from any and all nodes in the cluster
  Too high:
    increase in replication conflicts in multi-node writes
    increase in apply lag on nodes
    increase in commit lag on nodes
  One node with FC issues
    deal with that node: bad hardware? too slow?

Flow Control
State Transfer
  There are two types of State Transfer in Galera:
    1. SST (State Snapshot Transfer): full data copy
         rsync
         mysqldump
         xtrabackup
    2. IST (Incremental State Transfer): only copy the missing events
  It's always better to try to avoid SST!
  wsrep_sst_donor can be used to specify the donor

State Transfer: grastate.dat
  When MySQL starts it checks the grastate.dat file (in datadir)
  This file holds the GTID between MySQL restarts
  Contains UUID and Seqno

    # GALERA saved state
    version: 2.1
    uuid:    <cluster UUID>
    seqno:   <last committed seqno>
    cert_index:

  node shut down cleanly

grastate.dat - example

    # GALERA saved state
    version: 2.1
    uuid:    <cluster UUID>
    seqno:   -1
    cert_index:

  node is running or did not shut down cleanly

    # GALERA saved state
    version: 2.1
    uuid:    00000000-0000-0000-0000-000000000000
    seqno:   -1
    cert_index:

  node aborted, SST on next restart

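The three grastate.dat cases above can be told apart with a few lines of parsing. The sketch below assumes the simple `key: value` layout shown on the slides; the function names, the returned messages and the sample seqno 42 are illustrative.

```python
from typing import Dict

UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text: str) -> Dict[str, str]:
    """Read 'key: value' lines, skipping blank lines and '#' comments."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def startup_hint(state: Dict[str, str]) -> str:
    seqno = int(state.get("seqno", "-1"))
    if state.get("uuid", UNDEFINED_UUID) == UNDEFINED_UUID:
        return "SST on next restart"
    if seqno == -1:
        return "running or not shut down cleanly"
    return "shut down cleanly at seqno %d" % seqno


if __name__ == "__main__":
    sample = (
        "# GALERA saved state\n"
        "version: 2.1\n"
        "uuid:    bfb912e5-f560-11e2-0800-1eefab05e57d\n"
        "seqno:   42\n"
        "cert_index:\n"
    )
    print(startup_hint(parse_grastate(sample)))  # shut down cleanly at seqno 42
```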
State Transfer: IST
  When a node starts, it knows the UUID of the cluster it belonged to and the last sequence number it applied.
  It sends that position to the other members of the cluster. If a node can send the next events (ws/trx), IST is performed; if none can, SST is triggered.

Galera Cache
  Those events are stored in the galera.cache file.
    preallocated file with a specific size
    used to store the writesets in circular buffer style
    default size is 128M
    can be increased via provider option gcache.size
      wsrep_provider_options = "gcache.size=<size>"
  Galera Cache is mmapped (I/O buffered to memory)
  wsrep_local_cached_downto provides the first seqno present in the cache for that node

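The circular-buffer behaviour of the cache, and how it decides between IST and SST, can be sketched as follows. This is a toy model: the real gcache is sized in bytes, whereas this `GCache` class (a hypothetical name) counts writesets and stores only their seqnos.

```python
from collections import deque
from typing import Optional


class GCache:
    """Toy circular buffer: old writesets fall out as new ones arrive."""

    def __init__(self, capacity: int) -> None:
        self._seqnos = deque(maxlen=capacity)

    def store(self, seqno: int) -> None:
        self._seqnos.append(seqno)

    @property
    def cached_downto(self) -> Optional[int]:
        """Lowest seqno still in the cache (compare wsrep_local_cached_downto)."""
        return self._seqnos[0] if self._seqnos else None

    def can_serve_ist(self, joiner_seqno: int) -> bool:
        """IST is possible if the first writeset the joiner misses is still cached."""
        oldest = self.cached_downto
        return oldest is not None and joiner_seqno + 1 >= oldest


if __name__ == "__main__":
    cache = GCache(capacity=5)
    for seqno in range(1, 9):  # writesets 1..8; only 4..8 fit
        cache.store(seqno)
    print(cache.cached_downto)     # 4
    print(cache.can_serve_ist(5))  # True: needs 6..8
    print(cache.can_serve_ist(2))  # False: needs 3, already overwritten, so SST
```

A larger cache keeps more history, so a node can stay down longer and still rejoin through IST.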
Galera Cache & IST
Thank you!