Mashing the data

Post on 15-Feb-2017


Mashing the Data
Real-Time replication from MySQL to Google Cloud Datastore

Ingredients
● MySQL
● NodeJS
● ZongJi
● Google Cloud Datastore

There are two types of DBAs:
1) DBAs that do backups
2) DBAs that will do backups

MySQL
● Most used open source DB - second place overall after Oracle (but almost equal)*
● Since 1995
● Currently at version 5.7 (5.7.16 in Oct’16)
● Several forks - MariaDB, Percona
● Several storage engines, most used is InnoDB
● NDB Cluster and Master-Master Replication for HA

* According to http://db-engines.com/en/ranking

A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"

MySQL replication
● Master - Slave(s)
● Slaves can be Masters in their turn (Master->Slave->Slave->...->Slave)
  ○ log_slave_updates
● Only data-modifying queries are logged (Create, Update, Delete; not Reads)
● 2 ½ types of replication
  ○ Statement Based (SBR) -> binary log records queries (UPDATE … SET ..) which are then replayed on the slave
  ○ Row Based (RBR) -> binary log records the values of each affected row before and after the change is applied
  ○ Mixed -> binary log records a mix of SBR and RBR (default is SBR, but for certain statements and storage engines the log automatically switches to row-based)

Q: Why do you never ask SQL people to help you move your furniture?

A: They sometimes drop the table

MySQL replication (cont’d)
● SBR is good when changes affect lots of rows (e.g. for 1k modified rows we only send a few bytes across the wire)
● SBR has problems when there are inconsistencies between master and slave or when queries are not deterministic (e.g. UPDATE … SET … LIMIT 100)
● RBR is good at maintaining better consistency (as every changed row is replicated)
● RBR can be problematic when many rows are changed with a single statement (lots of traffic over the network)
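
To make the size trade-off concrete, here is a toy model of what each format has to record for a single UPDATE touching 1,000 rows. It is only a sketch: real binlog events are binary rather than JSON, and the `products` table and its columns are made up for illustration.

```javascript
// Toy model only: real binlog events are binary, not JSON.
// Compares what each format would have to record for one UPDATE.

// Build a hypothetical table of `n` rows: { id, price }.
function makeTable(n) {
  const rows = [];
  for (let id = 1; id <= n; id++) rows.push({ id: id, price: 100 });
  return rows;
}

// Statement-based: one log entry holding the SQL text.
function logStatementBased(sql) {
  return [{ type: 'query', sql: sql }];
}

// Row-based: one entry per changed row, with before and after images.
function logRowBased(table, change) {
  return table.map(function (row) {
    return { type: 'updaterows', before: row, after: change(row) };
  });
}

const table = makeTable(1000);
const sbr = logStatementBased('UPDATE products SET price = price * 2');
const rbr = logRowBased(table, function (row) {
  return { id: row.id, price: row.price * 2 };
});

console.log('SBR entries:', sbr.length, 'bytes:', JSON.stringify(sbr).length);
console.log('RBR entries:', rbr.length, 'bytes:', JSON.stringify(rbr).length);
```

The statement-based log stays at one entry no matter how many rows match, while the row-based log grows with the number of changed rows. In exchange, the row-based log says exactly which rows ended up with which values.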

Google Cloud Datastore

What is GCD
● NoSQL document database
● Automatic scaling
● High performance
● Flexible storage

GCD (cont’d)
● Balance of strong and eventual consistency
  ○ Entity lookups by key and ancestor queries always receive strongly consistent data
  ○ Other queries are eventually consistent
● Encryption at rest
  ○ Encrypts all data before it is written to disk
● Querying of data through GQL
  ○ Similar to “classic” SQL; e.g. SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200 or SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
● By default all properties are indexed; composite indexes are supported (a bit more work to enable them though)
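
From NodeJS, the two GQL examples above could be written with the client's query builder roughly as follows. This is a sketch, assuming `datastore` is a client from the `@google-cloud/datastore` package and that its `createQuery`, `filter`, `order` and `limit` methods are available in the installed version. The helper names `rangeQuery` and `topQuery` are made up for illustration.

```javascript
// `datastore` is assumed to be a client from the @google-cloud/datastore package;
// it is passed in so the sketch does not depend on how the client is created.

// SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200
function rangeQuery(datastore, kind, prop, min, max) {
  return datastore.createQuery(kind)
    .filter(prop, '>=', min)
    .filter(prop, '<', max);
}

// SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
function topQuery(datastore, kind, prop, count) {
  return datastore.createQuery(kind)
    .order(prop, { descending: true })
    .limit(count);
}
```

A query built this way would then be passed to `datastore.runQuery(...)`. Neither query is an ancestor query, so both fall under the eventually consistent case above.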

Our Setup

Setup
[Diagram: MySQL Master → MySQL Slave → NodeJS App → Google Cloud Datastore; links labelled SBR, RBR and Google Cloud Node modules]

Details about NodeJS App
● Uses ZongJi (https://github.com/nevill/zongji - MySQL binlog listener)

var ZongJi = require('zongji');
var zongji = new ZongJi(config.database);

zongji.on('binlog', function (evt) { doSomething('binlog', evt); });
zongji.on('query', function (evt) { doSomething('query', evt); });
zongji.on('writerows', function (evt) { doSomething('insert', evt); });
zongji.on('updaterows', function (evt) { doSomething('update', evt); });
zongji.on('deleterows', function (evt) { doSomething('delete', evt); });

NodeJS (cont’d)

zongji.start({
  startAtEnd: true,
  includeSchema: {"yourDBhere": true, "yourOtherDBHere": true}, //config.monitor,
  includeEvents: ['tablemap', 'writerows', 'updaterows', 'deleterows', 'query', 'rotate']
});

var doSomething = function (type, event) {
  //event has a rows attribute containing every modified row
  //it also has a tableMap containing table metadata (most important - table name)
};
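
One way the body of doSomething might flatten a row event into per-row changes is sketched below. The event shape is an assumption drawn from the comments above and from how ZongJi row events are commonly consumed (`tableId`, a `tableMap` keyed by table id, and `{ before, after }` pairs for updates), so check it against the ZongJi version in use. `extractChanges` is a hypothetical helper name.

```javascript
// Assumed event shape (verify against your ZongJi version):
//   evt.tableId  -> id of the table the event refers to
//   evt.tableMap -> { [tableId]: { parentSchema, tableName, ... } }
//   evt.rows     -> array of rows; for updates each item is { before, after }
function extractChanges(type, evt) {
  var table = evt.tableMap[evt.tableId];
  return evt.rows.map(function (row) {
    return {
      op: type,                        // 'insert' | 'update' | 'delete'
      schema: table.parentSchema,
      table: table.tableName,
      // Updates carry both images; keep the new values.
      row: type === 'update' ? row.after : row
    };
  });
}
```

Keeping this step free of any Datastore calls means it can be exercised with hand-built events, without a MySQL server or a cloud project.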

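NodeJS (last one, I promise)

// datastore is a Google Cloud Datastore client created elsewhere
// namespace is used as the entity kind; idfldname is the name of the row's id column
var sendToDataStore = function (namespace, idfldname, row) {
  var k = datastore.key([namespace, row[idfldname]]);
  datastore.save({key: k, data: row}, function (err, res) {
    if (err) console.log("ERROR", err);
    else console.log("OK", JSON.stringify(res));
  });
};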
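
sendToDataStore covers inserts and updates. A delete in MySQL would also have to remove the matching entity, which could look roughly like this. It is a sketch, assuming `datastore` is a `@google-cloud/datastore` client passed in by the caller. `applyChange` and the `{ op, table, row }` change object are hypothetical names used only for illustration.

```javascript
// `datastore` is assumed to be a @google-cloud/datastore client, injected by the caller.
// `change` is a hypothetical object: { op, table, row }.
// `idField` is the name of the id column, as in sendToDataStore above.
function applyChange(datastore, change, idField, callback) {
  // The table name is used as the entity kind, the id column value as the key id.
  var key = datastore.key([change.table, change.row[idField]]);
  if (change.op === 'delete') {
    // Row removed in MySQL: remove the matching entity.
    datastore.delete(key, callback);
  } else {
    // Insert or update: save() without an explicit method upserts the entity.
    datastore.save({ key: key, data: change.row }, callback);
  }
}
```

Passing the client in as a parameter also lets the function run against a stub during tests.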

Demo Time

In case the demo does not work

Thank you!