Date posted: 16-Apr-2017 | Category: Data & Analytics | Uploaded by: datamantra
Real Time ETL processing
By Veeramani Moorthy
Agenda
Real-Time ETL Architecture
Why a Reconciler?
Reconciler Data Model
Q & A
Requirements for Reconciler
Real-Time ETL Architecture
[Architecture diagram. Source DB → GoldenGate ([1.0] Data Extract) → Trail Files → [1.1] Data Pump → Adapter Read → [2.1] CDC Events to broker: Companies Topic, Addresses Topic → Streaming Reconciler jobs (Spark Reconciler; Read/Write for Reconcile Companies, Read/Write for Reconcile Addresses; Write output) → Reconciled Companies Topic, Reconciled Addresses Topic → [3.1] CDC Events to broker → Streaming Joiner/Transformer Job (Spark Joiner; fn; Get Mapping from the Mapping service). Schema path: [1.2] Get/Create/Update Schema in the Schema Registry; [1.2.1] JDBC Fetch Table Schema; both Spark jobs Get Table Schema from the Schema Registry.]
• Schema Registry: a repository of ALL schemas, each of which is versioned.
• GoldenGate: captures the table change events.
• Kafka: distributed messaging system.
• CDC: Change Data Capture.
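Step [1.2] (Get/Create/Update Schema) relies on the registry keeping every schema as a numbered version. A minimal plain-Python sketch of that idea follows; `SchemaRegistry`, `get_or_register` and `latest` are hypothetical names for illustration, not the API of GoldenGate, Kafka or any real schema registry client.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory stand-in for a versioned schema registry (not a real client API)."""

    # table name -> list of {column: type} dicts; list position + 1 is the version number
    versions: dict = field(default_factory=dict)

    def get_or_register(self, table: str, columns: dict) -> int:
        """Sketch of step [1.2]: return the version of `columns`, registering it if it is new."""
        history = self.versions.setdefault(table, [])
        for version, known in enumerate(history, start=1):
            if known == columns:
                return version  # unchanged schema: reuse the existing version
        history.append(dict(columns))  # new or changed schema: append the next version
        return len(history)

    def latest(self, table: str) -> tuple:
        """What a streaming job's 'Get Table Schema' call might return: (version, columns)."""
        history = self.versions[table]
        return len(history), history[-1]


registry = SchemaRegistry()
v1_cols = {"company_id": "int", "company_name": "string", "company_addr": "string"}
v1 = registry.get_or_register("companies", v1_cols)
v1_again = registry.get_or_register("companies", dict(v1_cols))  # same schema fetched again
v2 = registry.get_or_register("companies", {**v1_cols, "Registered_name": "string"})
print(v1, v1_again, v2)  # 1 1 2
```

Re-fetching an unchanged table schema returns the existing version, so only genuine changes create new versions.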
Requirements for Reconciler
Support for idempotency
Support for immutability
Support for schema evolution
Support for handling out-of-order CDC events
Challenges in Spark Streaming
Out-of-sequence events: an UPDATE can arrive before the INSERT of the same row.
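To make the out-of-sequence problem concrete, here is a minimal plain-Python sketch (not Spark) of a naive reconciler that keeps one mutable row per key. The event shape (`op`, `company_id`, `ts`, `after`) and the `apply_naive` helper are hypothetical. When the UPDATE arrives first there is no row to update, so it is dropped, and the late INSERT then writes the stale address.

```python
def apply_naive(table: dict, event: dict) -> None:
    """Naive row-level reconciler: one mutable row per key; the event's ts is ignored."""
    key = event["company_id"]
    if event["op"] == "INSERT":
        table[key] = dict(event["after"])  # overwrites whatever is already there
    elif event["op"] == "UPDATE":
        if key in table:
            table[key].update(event["after"])
        # else: nothing to update yet, so the change is silently lost


events_in_arrival_order = [
    {"op": "UPDATE", "company_id": 10201, "ts": 22345677,
     "after": {"company_addr": "Ecospace, BLR"}},
    {"op": "INSERT", "company_id": 10201, "ts": 12345677,
     "after": {"company_id": 10201, "company_name": "ABC Inc", "company_addr": "EGL, BLR"}},
]

table = {}
for e in events_in_arrival_order:
    apply_naive(table, e)
print(table[10201]["company_addr"])  # EGL, BLR (stale: the newer UPDATE was lost)
```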
Data model
Instead of:
Company_id | Company_name | Company_addr
10201 | ABC Inc | EGL, BLR
….
Go with:
Tuple Id | Source DB Timestamp | Attribute Name | Attribute Value | isDelete?
10201 | 12345677 | company_id | 10201 | false
10201 | 12345677 | company_name | ABC Inc | false
10201 | 12345677 | company_addr | EGL, BLR | false
10201 | 22345677 | company_addr | Ecospace, BLR | false
….
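A minimal sketch of how a row-level change could be exploded into the attribute-level rows of the proposed model. `AttributeTuple` and `explode` are hypothetical names, and keeping every value as text is an assumption made so that a single column can hold attributes of any type.

```python
from typing import NamedTuple


class AttributeTuple(NamedTuple):
    tuple_id: int          # Tuple Id: primary key of the source row
    source_ts: int         # Source DB Timestamp of the change
    attribute_name: str
    attribute_value: str   # stored as text (assumption)
    is_delete: bool


def explode(tuple_id: int, source_ts: int, changed: dict, is_delete: bool = False) -> list:
    """Turn one row-level change into one tuple per changed attribute."""
    return [AttributeTuple(tuple_id, source_ts, name, str(value), is_delete)
            for name, value in changed.items()]


insert_rows = explode(10201, 12345677,
                      {"company_id": 10201, "company_name": "ABC Inc", "company_addr": "EGL, BLR"})
update_rows = explode(10201, 22345677, {"company_addr": "Ecospace, BLR"})
for row in insert_rows + update_rows:
    print(row.tuple_id, row.source_ts, row.attribute_name, row.attribute_value, row.is_delete)
```

An UPDATE only produces tuples for the attributes it changed, so the address change becomes a single new row rather than a rewrite of the whole record.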
How does this model address the requirements?
Immutability?
Idempotency?
Out-of-sequence events?
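One way the attribute-level log could answer these three questions, sketched in plain Python rather than as the author's Spark job. The log is only appended to and never modified (immutability). The current row is derived by keeping, per (Tuple Id, Attribute Name), the tuple with the newest Source DB Timestamp, so arrival order does not matter (out-of-sequence events) and replaying the same tuples leaves the result unchanged (idempotency). `reconcile` is a hypothetical helper; it assumes at most one value per attribute per source timestamp and treats an `isDelete` tuple as hiding that attribute.

```python
from typing import NamedTuple


class AttributeTuple(NamedTuple):
    """One row of the attribute-level model (field names are hypothetical)."""
    tuple_id: int
    source_ts: int
    attribute_name: str
    attribute_value: str
    is_delete: bool


def reconcile(log: list) -> dict:
    """Derive current rows from an append-only log without modifying the log."""
    latest = {}  # (tuple_id, attribute_name) -> tuple with the newest source timestamp
    for t in log:
        key = (t.tuple_id, t.attribute_name)
        # Newest Source DB Timestamp wins regardless of arrival order; a replayed
        # duplicate never beats the copy already kept, so reprocessing is harmless.
        if key not in latest or t.source_ts > latest[key].source_ts:
            latest[key] = t
    rows = {}
    for (tuple_id, name), t in latest.items():
        if not t.is_delete:  # assumption: a delete marker hides the attribute
            rows.setdefault(tuple_id, {})[name] = t.attribute_value
    return rows


log = [
    AttributeTuple(10201, 22345677, "company_addr", "Ecospace, BLR", False),  # UPDATE arrives first
    AttributeTuple(10201, 12345677, "company_id", "10201", False),
    AttributeTuple(10201, 12345677, "company_name", "ABC Inc", False),
    AttributeTuple(10201, 12345677, "company_addr", "EGL, BLR", False),
]
once = reconcile(log)
twice = reconcile(log + log)  # the same events delivered twice
print(once[10201]["company_addr"])  # Ecospace, BLR
print(once == twice)  # True
```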
Schema Evolution
Tuple Id | Source DB Timestamp | Attribute Name | Attribute Value | isDelete?
10201 | 12345677 | company_id | 10201 | false
10201 | 12345677 | company_name | ABC Inc | false
10201 | 12345677 | company_addr | EGL, BLR | false
10201 | 22345677 | company_addr | Ecospace, BLR | false
10201 | 22345900 | Registered_name | ABC India Pvt Ltd | false
….
Do I have to change the destination schema?
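A small plain-Python sketch suggesting why the answer can be no: in the attribute-level store, a new source column such as Registered_name is just another row with the same five fields, and a read-time pivot picks it up. The `pivot` function and `STORE_COLUMNS` tuple are hypothetical illustrations, not the deck's Spark code.

```python
# Attribute-level rows: (tuple_id, source_ts, attribute_name, attribute_value, is_delete)
STORE_COLUMNS = ("tuple_id", "source_ts", "attribute_name", "attribute_value", "is_delete")

store = [
    (10201, 12345677, "company_id", "10201", False),
    (10201, 12345677, "company_name", "ABC Inc", False),
    (10201, 12345677, "company_addr", "EGL, BLR", False),
    (10201, 22345677, "company_addr", "Ecospace, BLR", False),
]


def pivot(rows):
    """Latest value per (tuple_id, attribute); output columns are whatever attributes exist."""
    latest = {}
    for tuple_id, ts, name, value, is_delete in rows:
        key = (tuple_id, name)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value, is_delete)
    out = {}
    for (tuple_id, name), (ts, value, is_delete) in latest.items():
        if not is_delete:
            out.setdefault(tuple_id, {})[name] = value
    return out


before = pivot(store)
# The new source column arrives as one more attribute row; the store's columns stay the same.
store.append((10201, 22345900, "Registered_name", "ABC India Pvt Ltd", False))
after = pivot(store)
print(sorted(before[10201]))  # ['company_addr', 'company_id', 'company_name']
print(after[10201]["Registered_name"])  # ABC India Pvt Ltd
print(all(len(r) == len(STORE_COLUMNS) for r in store))  # True
```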
Schema Evolution
Addition of a new column
Deletion of an existing column
Change of a column's data type
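A sketch of how column deletion and data type changes might be absorbed at read time, assuming reconciled values are kept as text and the reader projects them onto a chosen schema version from the registry. The `SCHEMAS` history, the `employee_count` column and its value, and the `project` helper are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical registry history for the "companies" table: version -> {column: type}.
# "employee_count" is an invented column used only to illustrate evolution.
SCHEMAS = {
    1: {"company_id": "int", "company_name": "string", "employee_count": "string"},
    2: {"company_id": "int", "company_name": "string", "employee_count": "int"},  # data type change
    3: {"company_id": "int", "company_name": "string"},                           # column deleted
}
CASTS = {"int": int, "string": str}


def project(row: dict, schema: dict) -> dict:
    """Shape a reconciled attribute map (values kept as text) to one schema version at read time."""
    return {col: CASTS[typ](row[col]) for col, typ in schema.items() if col in row}


# Reconciled attributes for one tuple id; the stored tuples are never rewritten.
row = {"company_id": "10201", "company_name": "ABC Inc", "employee_count": "250"}

print(project(row, SCHEMAS[1]))  # {'company_id': 10201, 'company_name': 'ABC Inc', 'employee_count': '250'}
print(project(row, SCHEMAS[2]))  # {'company_id': 10201, 'company_name': 'ABC Inc', 'employee_count': 250}
print(project(row, SCHEMAS[3]))  # {'company_id': 10201, 'company_name': 'ABC Inc'}
```

Under these assumptions, only the read-side projection changes between versions; the attribute-level storage keeps the same five columns throughout.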