New Database Replication and Data Integration with Hadoop and BI Jeffrey Surretsky NYOUG December 2013
Page 1: New Database Replication and Data Integration with Hadoop ...nyoug.org/Presentations/2013/Winter/Surretsky-NYOUG-2013-Hadoo… · Analytics OLTP Data Warehouse. 12 Log-based CDC Replication

New Database Replication and Data Integration with Hadoop and BI

Jeffrey Surretsky

NYOUG

December 2013

2

Big Data – Hadoop®

3

The explosion of data continues to burden the data tool chain

Terabyte → Petabyte → Exabyte → Zettabyte

mainframe → PC → internet → mobile → machine

Transactional Data
Traditionally, only transactional data was generated and stored in databases

• Structured

• Measured growth

Human Files
But over time, we started creating unstructured data

• Docs, Images, Video

• Multiple formats

• Fast growth

Social & Machines have added exponentially

• Likes, tweets, relationships (social)

• Log files (machine)

• Exponential growth

4

Big data market drivers

• Proliferation of new user generated data creation and data capture technologies

• Increased “interconnectedness” drives consumption (creating more data)

• Inexpensive storage makes it possible to keep more data longer

• Need to extract actionable insights from all data assets to gain competitive edge

*Source: IDC 2011

3Vs

• Velocity: Batch, Near time, Real time, Streams

• Volume: Petabytes, Records, Transactions, Tables, files

• Variety: Structured, Unstructured, Semi-structured, All the above

5

Big data: Scaling up on RDBMS

• Partitioning

• Materialized Views

• In memory cache

• …and who are we kidding here!

RDBMS Yodabytes handle cannot!

6

Big data: RDBMS Cluster

[Diagram: a Controller and a row of SQL nodes labeled by month, Jan 1990 through Aug 1990 and on to Jun 2013]

7

Big data - Hadoop

9

Big data – Hadoop benefits

Scalable storage

Massive parallel processing

Cost effective

10

Hadoop operational use cases

1. Staging

2. Warehousing

3. Archiving

Not glamorous, but highly effective.

11

Today’s solutions

Analytics

OLTP

Data Warehouse

12

Log-based CDC Replication

• Near real-time log-based CDC from Oracle

• Applying Changes to Hadoop
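
As a rough illustration of what log-based CDC involves, the sketch below replays an ordered list of change records against a target keyed by primary key. The `Change` record, its field names and the in-memory dict target are hypothetical stand-ins; a real product reads Oracle redo and archive logs rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Change:
    """One committed row change, as a log reader might emit it (hypothetical shape)."""
    scn: int                 # position in the log; changes are applied in this order
    op: str                  # "insert", "update" or "delete"
    key: Any                 # primary key of the affected row
    values: Dict[str, Any] = field(default_factory=dict)


def apply_changes(target: Dict[Any, Dict[str, Any]], changes: List[Change]) -> None:
    """Replay changes on the target in log order so it converges on the source."""
    for change in sorted(changes, key=lambda c: c.scn):
        if change.op == "insert":
            target[change.key] = dict(change.values)
        elif change.op == "update":
            # Only the changed columns are carried; merge them into the row.
            target.setdefault(change.key, {}).update(change.values)
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")


target_table: Dict[Any, Dict[str, Any]] = {}
apply_changes(target_table, [
    Change(scn=102, op="update", key=1, values={"city": "Boston"}),
    Change(scn=101, op="insert", key=1, values={"name": "Ann", "city": "NYC"}),
    Change(scn=103, op="insert", key=2, values={"name": "Bob", "city": "LA"}),
    Change(scn=104, op="delete", key=2),
])
print(target_table)
```

Sorting by log position before applying is the point of the sketch: the changes arrive out of order in the example, yet the target ends up in the same state as the source.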

13

Log-based CDC from Oracle-to-Oracle Architecture

[Diagram: Source (Redo/Archive logs) → Capture → Capture queue → Read → Export queue → Export → Import → Post queue → Post (SQL) → Target]
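
Reading the labels on this slide as a chain of processes connected by queues, the flow might be sketched as below. The queue names follow the slide; the function bodies, the record layout and the generated SQL text are hypothetical and only illustrate how each stage hands work to the next.

```python
from collections import deque
from typing import Deque, Dict, List

# Queue names follow the slide; everything else here is a hypothetical stand-in.
capture_queue: Deque[Dict] = deque()
export_queue: Deque[Dict] = deque()
post_queue: Deque[Dict] = deque()
applied_sql: List[str] = []


def capture(redo_records: List[Dict]) -> None:
    """Source side: copy committed changes from the log into the capture queue."""
    capture_queue.extend(redo_records)


def read() -> None:
    """Source side: move changes from the capture queue to the export queue."""
    while capture_queue:
        export_queue.append(capture_queue.popleft())


def export_import() -> None:
    """Ship queued changes from source to target (the hop between the two systems)."""
    while export_queue:
        post_queue.append(export_queue.popleft())


def post() -> None:
    """Target side: turn each queued change into SQL and record it as applied."""
    while post_queue:
        change = post_queue.popleft()
        applied_sql.append(
            f"UPDATE {change['table']} SET {change['column']} = :1 WHERE id = :2"
        )


capture([{"table": "orders", "column": "status", "id": 7, "value": "SHIPPED"}])
read()
export_import()
post()
print(applied_sql)
```

Because every stage only talks to a queue, a stalled target leaves changes waiting in the post queue instead of blocking capture on the source.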

14

Log-based CDC Replication – impact-free and limitless!

15

Log-based CDC Data Integration Architecture

Combined source & target process implementation

[Diagram: Oracle source (Redo/Archive logs) → Capture → Capture queue → Read → Post queue → JMS post → JMS queue → Target(s): Custom App, Dell App, …And more]

Near real-time data integration

16

Log-based CDC Database Replication & Near Real-time Data Integration Summary

[Diagram: Source → Target(s), by two paths: database replication, and near real-time data integration through a JMS queue to a custom app, …And more]

17

Connector for Hadoop

• Provides near real-time data replication from Oracle to Hadoop environments. The solution enables organizations to affordably replicate live data from Oracle tables

– In near real time to HDFS and Hive environments

– In real time to HBase
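
One plausible reading of the split between "near real time" and "real time" is that file-oriented storage such as HDFS is usually written in batches, while a row-addressable store such as HBase can take each change as it arrives. The sketch below illustrates that difference only; `RowStore`, `BatchFileStore` and the batch size are hypothetical and do not describe how the connector itself is implemented.

```python
from typing import Any, Dict, List


class RowStore:
    """Stand-in for a row-addressable store such as HBase: each change lands at once."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Any]] = {}

    def put(self, key: Any, values: Dict[str, Any]) -> None:
        self.rows.setdefault(key, {}).update(values)


class BatchFileStore:
    """Stand-in for file-oriented storage such as HDFS: changes are buffered, then flushed."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[Dict[str, Any]] = []
        self.files: List[List[Dict[str, Any]]] = []   # each flush becomes one "file"

    def append(self, change: Dict[str, Any]) -> None:
        self.buffer.append(change)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.files.append(self.buffer)
            self.buffer = []


row_store = RowStore()
file_store = BatchFileStore(batch_size=2)

for change in [
    {"key": 1, "values": {"status": "NEW"}},
    {"key": 1, "values": {"status": "PAID"}},
    {"key": 2, "values": {"status": "NEW"}},
]:
    row_store.put(change["key"], change["values"])   # visible immediately
    file_store.append(change)                        # visible only after a flush

print(row_store.rows)
print(len(file_store.files), len(file_store.buffer))
```

After three changes the row store already reflects all of them, while the file store has written one batch of two and still holds the third in its buffer.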

38

SharePlex Connector for Hadoop architecture

[Diagram built up over slides 18 to 38. Labels: Log-based CDC (shown as SharePlex for Oracle on slides 25 to 28), SQOOP, JMS, Connector for Hadoop, HBase, HDFS]
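
The diagram labels suggest two paths into Hadoop, SQOOP and JMS. A common pattern, assumed here and not confirmed by the slides, is to take a bulk snapshot first and then apply only the changes captured after the snapshot point. The function names, the `scn` field and the record layout below are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def bulk_snapshot(source: Dict[int, Row], scn: int) -> Tuple[Dict[int, Row], int]:
    """Copy every row as of `scn` (the role a bulk loader would play)."""
    return {key: dict(row) for key, row in source.items()}, scn


def apply_stream(target: Dict[int, Row], snapshot_scn: int, changes: List[Row]) -> int:
    """Apply queued changes newer than the snapshot; return how many were applied."""
    applied = 0
    for change in sorted(changes, key=lambda c: c["scn"]):
        if change["scn"] <= snapshot_scn:
            continue   # already reflected in the snapshot, so skip it
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:          # treat insert and update alike as an upsert
            target.setdefault(change["key"], {}).update(change["values"])
        applied += 1
    return applied


source_rows = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
hadoop_copy, snap_scn = bulk_snapshot(source_rows, scn=200)

queued = [
    {"scn": 199, "op": "update", "key": 2, "values": {"name": "Bob"}},
    {"scn": 201, "op": "update", "key": 1, "values": {"name": "Anne"}},
    {"scn": 202, "op": "delete", "key": 2},
]
applied_count = apply_stream(hadoop_copy, snap_scn, queued)
print(applied_count, hadoop_copy)
```

Filtering on the snapshot position keeps a change that the snapshot already contains from being applied a second time.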

39

SharePlex Connector for Hadoop – use case

[Diagram: sources Siebel CRM, PeopleSoft HR, SAP Manufacturing and Oracle Financials feed a data warehouse, stage and archive layer, which serves Reporting Dashboards and Analytics]

...

40

Questions
