Mashing the Data
Real-Time replication from MySQL to Google Cloud Datastore
Ingredients
● MySQL
● NodeJS
● ZongJi
● Google Cloud Datastore
There are two types of DBAs:
1) DBAs that do backups
2) DBAs that will do backups
MySQL
● Most used open source DB - second place overall after Oracle (but almost equal)*
● Since 1995
● Currently at version 5.7 (5.7.16 in Oct’16)
● Several forks - MariaDB, Percona
● Several storage engines, most used is InnoDB
● NDB Cluster and Master-Master Replication for HA
* According to http://db-engines.com/en/ranking
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
MySQL replication
● Master - Slave(s)
● Slaves can be Masters in their turn (Master->Slave->Slave->...->Slave)
  ○ log_slave_updates
● Only data-modifying queries are logged (Create, Update, Delete; not Reads)
● 2 ½ types of replication
  ○ Statement Based (SBR) -> binary log records queries (UPDATE … SET ..) which are then replayed on the slave
  ○ Row Based (RBR) -> binary log records the values of the affected row directly, before and after the change is applied
  ○ Mixed -> binary log records a mix of SBR and RBR (default is SBR, but for certain statements + storage engine used, the log is automatically switched to row-based)
Q: Why do you never ask SQL people to help you move your furniture?
A: They sometimes drop the table
MySQL replication (cont’d)
● SBR is good when changes affect lots of rows (e.g. for 1k modified rows we only send a few bytes across the wire)
● SBR has problems when there are inconsistencies between master and slave or when queries are not deterministic (e.g. UPDATE … SET … LIMIT 100)
● RBR is good at maintaining better consistency (as every changed row is replicated)
● RBR can be problematic when many rows are changed with a single statement (lots of traffic over the network)
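The non-determinism problem with SBR can be illustrated with a toy model. This is only a sketch, not how MySQL works internally: tables are plain arrays, "physical order" is array order, and the helper names (updateWithLimit, applyRowImages, flaggedIds) are made up for the illustration.

```javascript
// Toy model only: tables are arrays of rows, "physical order" is array order.
const master = [{ id: 1, flag: 0 }, { id: 2, flag: 0 }, { id: 3, flag: 0 }];
// Same data on the slave, but stored in a different physical order.
const slave = [{ id: 3, flag: 0 }, { id: 1, flag: 0 }, { id: 2, flag: 0 }];

// Models "UPDATE t SET flag = 1 LIMIT n" without ORDER BY:
// whichever rows come first get updated. Returns before/after row images.
function updateWithLimit(table, n) {
  const changed = [];
  for (const row of table.slice(0, n)) {
    const before = { ...row };
    row.flag = 1;
    changed.push({ before, after: { ...row } });
  }
  return changed;
}

// Models RBR: apply the recorded "after" images, matched by primary key.
function applyRowImages(table, images) {
  for (const { after } of images) {
    const target = table.find((r) => r.id === after.id);
    if (target) Object.assign(target, after);
  }
}

// Sorted ids of all rows that ended up flagged.
const flaggedIds = (table) =>
  table.filter((r) => r.flag === 1).map((r) => r.id).sort((a, b) => a - b);

// SBR: the same statement is replayed on a copy of the slave.
const sbrSlave = slave.map((r) => ({ ...r }));
const rowImages = updateWithLimit(master, 2); // master flags ids 1 and 2
updateWithLimit(sbrSlave, 2);                 // slave flags ids 3 and 1

// RBR: the row images are shipped to another copy of the slave.
const rbrSlave = slave.map((r) => ({ ...r }));
applyRowImages(rbrSlave, rowImages);

console.log('master:   ', flaggedIds(master));   // [ 1, 2 ]
console.log('SBR slave:', flaggedIds(sbrSlave)); // [ 1, 3 ]
console.log('RBR slave:', flaggedIds(rbrSlave)); // [ 1, 2 ]
```

The SBR copy diverges from the master because the statement picked different rows, while the RBR copy matches at the cost of shipping one image per changed row.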
Google Cloud Datastore
What is GCD
● NoSQL document database
● Automatic scaling
● High performance
● Flexible storage
GCD (cont’d)
● Balance of strong and eventual consistency
  ○ Entity lookups by key and ancestor queries always receive strongly consistent data
  ○ Other queries are eventually consistent
● Encryption at rest
  ○ Encrypts all data before it is written to disk
● Querying of data through GQL
  ○ Similar to “classic” SQL; e.g. SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200 or SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
● By default all properties are indexed; composite indexes are supported (a bit more work to enable them though)
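For comparison, the two GQL examples could be expressed with the query builder of the Node client roughly as follows. This is a sketch that assumes `datastore` is a client object from the @google-cloud/datastore package; it is passed in as a parameter so the snippet needs no credentials. The function names rangeQuery and topQuery are made up, and the three-argument filter form shown here is the older style, so check it against the client version in use.

```javascript
// `datastore` is assumed to be a @google-cloud/datastore client instance.

// SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200
function rangeQuery(datastore) {
  return datastore
    .createQuery('myKind')
    .filter('myProp', '>=', 100)
    .filter('myProp', '<', 200);
}

// SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
function topQuery(datastore) {
  return datastore
    .createQuery('myKind')
    .order('myProp', { descending: true })
    .limit(100);
}
```

A query built this way would then be handed to datastore.runQuery(). Neither of them is an ancestor query, so both fall under the "eventually consistent" case above.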
Our Setup
Setup
[Diagram: MySQL Master, MySQL Slave, NodeJS App and Google Cloud Datastore, with links labeled SBR, RBR and Google Cloud Node modules]
Details about NodeJS App
● Uses ZongJi (https://github.com/nevill/zongji - MySQL binlog listener)
var ZongJi = require('zongji');
var zongji = new ZongJi(config.database);
zongji.on('binlog',function (evt) {doSomething('binlog',evt)})
zongji.on('query', function(evt) {doSomething('query',evt)})
zongji.on('writerows',function(evt) {doSomething('insert',evt)})
zongji.on('updaterows', function(evt) {doSomething('update',evt)})
zongji.on('deleterows', function(evt) {doSomething('delete',evt)})
NodeJS (cont’d)
zongji.start({
  startAtEnd: true,
  includeSchema: {"yourDBhere":true,"yourOtherDBHere":true},//config.monitor,
  includeEvents: [ 'tablemap', 'writerows', 'updaterows', 'deleterows' , 'query','rotate']
});
var doSomething = function(type, event) {
//event has a rows attribute containing every modified row
//it also has a tableMap containing table metadata (most important - table name)
}
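One possible way to fill in the handler body is to flatten each row event into per-row changes first. This is a minimal sketch: extractChanges is a hypothetical helper, and it assumes the event exposes tableId, a tableMap keyed by that id (with a tableName field), and rows that are plain objects for inserts and deletes but {before, after} pairs for updates.

```javascript
// Sketch: turn one row event into a list of { table, op, row } changes.
// Assumed event shape: event.tableId, event.tableMap[tableId].tableName,
// event.rows (plain rows for insert/delete, {before, after} for update).
function extractChanges(type, event) {
  var meta = event.tableMap && event.tableMap[event.tableId];
  // Events without row data (e.g. 'query', 'rotate') yield no changes.
  if (!meta || !Array.isArray(event.rows)) return [];
  return event.rows.map(function (row) {
    return {
      table: meta.tableName,
      op: type,
      // For updates only the new values are of interest downstream.
      row: type === 'update' ? row.after : row,
    };
  });
}

// Usage with a hand-built event (the values are made up):
var sample = {
  tableId: 7,
  tableMap: { 7: { tableName: 'users' } },
  rows: [{ before: { id: 1, name: 'a' }, after: { id: 1, name: 'b' } }],
};
console.log(extractChanges('update', sample));
```

Keeping this step free of I/O makes it easy to check against captured events before anything is written to Datastore.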
NodeJS (last one, I promise)
var sendToDataStore = function(namespace,idfldname,row) {
  // the first key path element is used by Datastore as the entity kind
  var k = datastore.key([namespace, row[idfldname]]);
  datastore.save({key:k,data:row} ,function(err,res){
    if(err) console.log("ERROR",err)
    else console.log("OK",JSON.stringify(res))
  });
}
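The function above covers inserts and updates. Deletes could be handled along the same lines, as in the sketch below. It assumes `datastore` is a @google-cloud/datastore client (passed in as a parameter here), and `change` is a hypothetical { table, op, row } object; applyChange and its parameters are illustrative names, not part of the original app.

```javascript
// `datastore`: assumed @google-cloud/datastore client.
// `change`: hypothetical { table, op, row }, op is 'insert' | 'update' | 'delete'.
// `idField`: name of the primary key column; `done`: callback(err, res).
function applyChange(datastore, change, idField, done) {
  var key = datastore.key([change.table, change.row[idField]]);
  if (change.op === 'delete') {
    datastore.delete(key, done);
  } else {
    // save() upserts by default, so inserts and updates share one path.
    datastore.save({ key: key, data: change.row }, done);
  }
}
```

Because save() upserts, replaying the same insert or update twice leaves the entity in the same state, which is convenient if the listener restarts and re-reads part of the binlog.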
Demo Time
In case the demo does not work
Thank you!