
Post on 28-Dec-2019


Tungsten Replicator for Kafka, Elasticsearch, Cassandra

Topics

In today's session:
• Replicator Basics
• Filtering and Glue
• Kafka and Options
• Elasticsearch and Options
• Cassandra
• Future Direction

Asynchronous replication decouples transaction processing on master and slave DBMS nodes

[Diagram: the Master Replicator (Extractor) reads DBMS-specific logging (i.e. Redo or Binary) from MySQL/Oracle and writes THL (Events + Metadata); the Slave Replicator (Applier) downloads transactions via the network and applies them using JDBC.]

Extractor Options

• Option 1: Local Install. The Extractor reads directly from the logs, even when the DBMS service is down. This is the default.
• Option 2: Remote. The Extractor gets log data via the MySQL Replication Slave protocol (which requires the DBMS service to be online) or the Redo Reader feature. This is how we handle RDS and Oracle extraction tasks.

Parallel apply maximizes DBMS I/O bandwidth when updating replicas

[Diagram: Slave Replicator Pipeline. The master replicator ships THL (events + metadata) to the slave, where a parallel queue feeds several Extract → Filter → Apply channels across three stages: remote-to-thl, thl-to-q, and q-to-dbms.]
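The parallel-apply idea above can be sketched in miniature. This is an illustration, not Tungsten's implementation: events are assigned to apply channels by hashing a partition key (hashing by schema name here, a common sharding choice), which spreads I/O across channels while keeping per-schema commit order.

```python
import hashlib

CHANNELS = 5  # number of parallel apply channels (illustrative value)

def channel_for(schema: str, channels: int = CHANNELS) -> int:
    """Map a schema name to a stable apply channel.

    Events for the same schema always land on the same channel,
    so commit order within that schema is preserved.
    """
    digest = hashlib.md5(schema.encode("utf-8")).hexdigest()
    return int(digest, 16) % channels

# Events from the same schema share a channel; different schemas spread out.
events = [("sbtest", 1), ("sbtest", 2), ("billing", 3), ("crm", 4)]
assignments = {seqno: channel_for(schema) for schema, seqno in events}
```

Because the mapping is deterministic, two updates to the same schema can never be applied out of order by different channels.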

Why Kafka

• Kafka is a high-performance message bus
• NOT a database
• Great for distributing messages and firing/triggering operations on content
• Log aggregation
• Activity/security tracking
• Metrics
• Auditing
• Data ingestion for Hadoop

Mass Data Collection with Kafka

[Diagram: a single Tungsten Replicator feeding multiple Kafka instances.]

Multiple Target Distribution

[Diagram: databases feed Tungsten Replicator, which publishes into Kafka; downstream consumers such as image processing, email, and metrics services each read from the stream.]

How Kafka Replication Works

[Diagram: the Master Replicator (Extractor) reads DBMS-specific logging (i.e. Redo or Binary) from MySQL/Oracle and writes THL (Events + Metadata); the Slave Replicator downloads transactions via the network and applies them with the native Kafka Applier, with Kafka coordinated by Zookeeper.]

What Tungsten Replicator Does to Apply into Kafka

• Takes an incoming row and converts it to a message
• Message consists of metadata:
  – Schema name, table name
  – Sequence number
  – Commit timestamp
  – Operation type
• Embedded message content

Message Structure

[Diagram: rows from each schema/table are published to a topic named Schema_Table; each row becomes a message whose MsgID is built from the schema, table, and primary key.]
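The naming scheme above can be sketched as follows. This is a minimal illustration of the structure shown on the slide, not a published Tungsten API; the separator used in the message ID is an assumption.

```python
def topic_name(schema: str, table: str) -> str:
    """Topic is named Schema_Table: one topic per replicated table."""
    return f"{schema}_{table}"

def message_id(schema: str, table: str, pkey) -> str:
    """Message key combines schema, table, and primary key, so that
    every change to the same row carries the same key (separator is
    an assumption for illustration)."""
    return f"{schema}_{table}_{pkey}"

# Example: rows from sbtest.sbtest land on topic "sbtest_sbtest".
topic_name("sbtest", "sbtest")            # "sbtest_sbtest"
message_id("sbtest", "sbtest", 255759)    # "sbtest_sbtest_255759"
```

Keying by primary key means Kafka's log compaction, if enabled, retains the latest state of each row.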

Sample Message

{
  "_meta_committime" : "2017-05-27 14:27:18.0",
  "_meta_source_schema" : "sbtest",
  "_meta_seqno" : "10130",
  "_meta_source_table" : "sbtest",
  "_meta_optype" : "INSERT",
  "record" : {
    "c" : "Base Msg",
    "k" : "100",
    "id" : "255759",
    "pad" : "Some other submsg"
  }
}
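A consumer can split the Tungsten metadata (the underscore-prefixed keys) from the embedded row content. A minimal sketch using only the standard library; the actual Kafka consumer loop is omitted:

```python
import json

# The sample message from the slide above.
raw = """{
  "_meta_committime" : "2017-05-27 14:27:18.0",
  "_meta_source_schema" : "sbtest",
  "_meta_seqno" : "10130",
  "_meta_source_table" : "sbtest",
  "_meta_optype" : "INSERT",
  "record" : {"c" : "Base Msg", "k" : "100",
              "id" : "255759", "pad" : "Some other submsg"}
}"""

msg = json.loads(raw)

# Separate replication metadata from the row payload.
meta = {k: v for k, v in msg.items() if k.startswith("_meta_")}
row = msg["record"]

# A consumer would typically route on the operation type.
if meta["_meta_optype"] == "INSERT":
    target = f"{meta['_meta_source_schema']}.{meta['_meta_source_table']}"
    print(f"insert into {target}: id={row['id']}")
```

Routing on `_meta_optype` is what lets a downstream service trigger different actions for inserts, updates, and deletes.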

Customizable Elements

• Whether acknowledgements are required from Kafka
• How much distribution/replication is required before sending the message
• Format of the message key
• Whether to embed schema and table name
• Whether the commit timestamp should be embedded
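The options above split into two groups: Kafka-side durability (acknowledgements and replication, which map onto standard Kafka producer settings such as `acks`) and message-construction switches. A sketch under that assumption; the function and flag names here are hypothetical, not Tungsten configuration properties:

```python
def build_message(row, schema, table, commit_time,
                  embed_names=True, embed_committime=True):
    """Assemble the message body, optionally embedding the schema/table
    names and commit timestamp (hypothetical flags for illustration)."""
    msg = {"record": row}
    if embed_names:
        msg["_meta_source_schema"] = schema
        msg["_meta_source_table"] = table
    if embed_committime:
        msg["_meta_committime"] = commit_time
    return msg

# Durability options correspond to standard Kafka producer settings:
# acks="all" waits for the full in-sync replica set before acknowledging.
producer_config = {"acks": "all", "retries": 3}

slim = build_message({"id": "1"}, "sbtest", "sbtest",
                     "2017-05-27 14:27:18.0", embed_committime=False)
```

Trimming optional metadata keeps messages smaller when consumers do not need it.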

Demo

Elasticsearch

• Immediately replicate data into Elasticsearch for searching
• Contains the core text and content of the records
• Provides the original information to track back to the record
• Content is structured against the schema (index type) and table name (index)
• Document ID is based on the primary key and other configurable information

How Elasticsearch Replication Works

[Diagram: the Master Replicator (Extractor) reads the DBMS redo logging (or a Redo Reader generated PLOG for Oracle) and writes THL (Events + Metadata); the Slave Replicator downloads transactions via the network and applies them to Elasticsearch through the REST API.]

Sample Entry

{
  "_id" : "99999",
  "_type" : "mg",
  "found" : true,
  "_version" : 2,
  "_index" : "msg",
  "_source" : {
    "msg" : "Hello ElasticSearch",
    "id" : "99999"
  }
}
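Given the mapping described above (table name → index, schema → index type, primary key → document ID), fetching a replicated record back is a single Elasticsearch document GET. A sketch of the URL construction and response handling; host and port are assumptions, and no live cluster is contacted here:

```python
import json

def doc_url(host: str, table: str, schema: str, pkey) -> str:
    """Elasticsearch document URL /<index>/<type>/<id>, using the
    index/type/id mapping described above (port 9200 assumed)."""
    return f"http://{host}:9200/{table}/{schema}/{pkey}"

# A response shaped like the sample entry above.
response = json.loads("""{
  "_id" : "99999", "_type" : "mg", "found" : true,
  "_version" : 2, "_index" : "msg",
  "_source" : { "msg" : "Hello ElasticSearch", "id" : "99999" } }""")

record = response["_source"] if response["found"] else None  # original row content
```

The `_version` field increments on every replicated update to the row, which is useful for detecting stale reads.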

Replicating into Cassandra

Demo

Cassandra

• Great for fast online and CRM-style deployments
• Highly fault tolerant and scalable
• Has some data and formatting changes
  – Currently needs our DDL translation tool (soon to be built-in)
• Quasi table/document style

How Cassandra Replication Works

[Diagram: the Master Replicator extracts from the base database; the Slave Replicator writes batches to CSV, which the Ruby connector loads into staging tables; JS batch scripts then merge the staged changes into the base tables.]
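The CSV → staging → merge flow can be illustrated in miniature. This sketch is not the actual Ruby/JS batch scripts; it only demonstrates the batch-merge idea: changes arrive as CSV rows, are parsed into a staging set, and are then merged into the base table keyed by primary key, with the latest change per key winning.

```python
import csv
import io

# A batch of changes written as CSV (seqno, primary key, value).
batch = ("10130,255759,Base Msg\n"
         "10131,255759,Updated Msg\n"
         "10132,300000,New Row\n")

# "Staging": parse the CSV batch into rows.
staging = list(csv.reader(io.StringIO(batch)))

# "Merge": apply rows in sequence order into the base table,
# so the latest change for each primary key wins.
base = {"255759": "Old Msg"}
for seqno, pk, value in staging:
    base[pk] = value
```

Batching like this trades per-row latency for much higher apply throughput, which suits Cassandra's write path.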

Cassandra

Demo

Future Direction for these appliers and related technology

• Full transaction support for Kafka
• Support for Amazon Elasticsearch
• Kafka Extraction
  – Parsing contents of Kafka message queues
  – Database updates
  – Large-scale distribution of database changes
  – Filtering and re-submission

General Tungsten Replicator Functionality

• Expanding the standard filter technology
  – Data translation (dates, numbers, hex)
  – Basic lookup/combination to aid ETL-style deployments
  – Data munging/obfuscation (PII, credit cards) for analytics
• More appliers
  – InfluxDB
  – SQL Server
  – PostgreSQL
  – Hadoop JDBC
  – MemSQL
  – Amazon (Aurora, Elasticsearch)
  – CouchDB/Base
• THL Compression/Encryption

Next Steps

• If you are interested in knowing more about Tungsten Replicator and would like to try it out for yourself, please contact our sales team, who will be able to take you through the details and set up a POC – sales@continuent.com

• Read the documentation at http://docs.continuent.com/tungsten-replicator-5.2/index.html

• Subscribe to our Tungsten University YouTube channel! http://tinyurl.com/TungstenUni


For more information, contact us:

MC Brown, VP Products
mc.brown@continuent.com