Post on 06-Feb-2018
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Data Integration Solutions (DIS): An Overview
Market Overview
estimates the DI market
will reach $2.8 billion
estimates the DQ market
will reach $1.75 billion
by 2016 with an average growth rate of 18.2%
Exponential Growth in Data Volumes
Oracle Data Integration Solutions and Proven Benefits
Improve Agility • Deploy Projects Faster
• Reliable Real-Time
Reduce Risk • Popular, Proven Tools
• Open, Not Proprietary
Reduce Costs • Better Productivity
• Eliminate ETL Servers
Analytic Data Integration • Big Data Integration & Governance • Data Warehouse Integration • Business Intelligence Applications
Enterprise Data Integration and Governance • Enterprise Data Quality and Profiling • Comprehensive, Heterogeneous Data Integration • Business Glossary and Metadata Management
Business Continuity • Active-Active for Maximum Availability • Zero Downtime Migrations • Data Consolidation / Application Modernization
24 x 7 x 365
Oracle Data Integration 12c: Delivering real-time data integration for Cloud and Big Data
Big Data
Cloud
Apps
Database
• Real-time data replication; optimized for Database 12c and Oracle Exadata
• End-to-end integrated with simplified deployment
• Unified tooling for both structured data sources and Hadoop / NoSQL
• Flexible deployment on-premise or in the Cloud for heterogeneous systems
• Expanded support for 3rd party systems and Oracle Applications in real-time data integration and continuous availability solutions
Oracle Data Integrator
Oracle GoldenGate
Oracle Enterprise Data Quality
Oracle Data Services Integrator
Oracle Meta Data Management
Oracle Active Data Guard
Comprehensive Data Integration & Governance Capabilities
Real-Time Data Movement – Low impact capture, stage in Hadoop – Continuous data availability
Data Transformation – Bulk data movement – Pushdown data processing
Data Federation – Virtualized Data Services
Data Quality & Verification – Fix quality at the source – Verify data consistency
Metadata Management – Lineage and Impact Analysis – Business Glossary Semantics
Data Governance Foundation
Oracle Data Integrator (Transformation)
Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)
Fast Load
Oracle GoldenGate (Movement)
Enterprise Metadata Management & Business Glossary (Business Glossary, Data Lineage, Impact Analysis and Data Provenance)
Data Service Integrator (Federation)
GoldenGate Veridata (Online Data Verification)
ELT Processing on Hadoop or SQL
Continuous Availability
Differentiated Technical Approach
Dynamic Data Movement
– Real-time CDC by default, not ETL
– Least invasive on sources
– Proven best performance
– Integrated Oracle capture/apply
No ETL Engines
– Take the processing to the data; don’t move the data to the process
– Leverage your data engines for the workloads (Hadoop or SQL)
Most Heterogeneous
– Leverage open source Hadoop, not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards
Comprehensive, Open & Heterogeneous Data Integration
Hadoop HBase Hadoop Hive/Flume HP Enscribe HP NonStop HP Neoview Hypersonic SQL IBM DB2 i Series IBM DB2 UDB IBM DB2 z Series IBM Informix IBM Netezza JMS / MQ Microsoft Access Microsoft SQLServer MySQL Pivotal Greenplum PostgreSQL Salesforce.com SAP BW / BI SAP ERP / ECC SAS SQL/MP SQL/MX Sybase ASE Sybase IQ Teradata
Adaptive Altova Apache Hcatalog Apache Hive/HQL Borland CA ERwin Cloudera Impala COBOL Copybook DataStax Embarcadero EMC ProActivity GentleWare Google BigQuery Grandite Hadapt Hive Hortonworks Hive IBM Cognos IBM DB2 IBM DataStage IBM Discovery IBM Federation Server IBM Lotus Notes IBM Netezza IBM Rational Rose IBM Rational Architect Informatica Metadata Mgr. Informatica PowerCenter
CoSORT ISO SQL Standard (DDL) MapR Hadoop Hive MicroFocus Microsoft Access Microsoft Office Excel Microsoft Visio Microsoft SQL Server Microsoft SSIS Microsoft Visual Studio Microstrategy Magic Draw OMG CWM Standard OMG UML Standard Oracle BI Answers Oracle BI Enterprise Edition Oracle BI Server Oracle DAC Oracle Data Integrator Oracle Data Modeler Oracle Database Oracle Designer Oracle Hyperion Applications Oracle Hyperion Essbase Oracle Warehouse Builder Pivotal Greenplum PostgreSQL
QlikView SAP BO Crystal Reports SAP BO Designer SAP BO Desktop Intelligence SAP BO Repository SAP BO Data Integrator SAP BO Data Steward SAP Master Data Management SAP Sybase PowerDesigner SAP Sybase ASE Database SAS Data Integration Studio SAS BI Server SAS Information Map SAS Metadata Management SAS OLAP Server Select Sparx Architect Syncsort Tableau Talend Teradata Tigris Visible W3C DTD & XSD Schema
Operational Integration (Movement / Transformation) Metadata Harvesting (Glossary, Lineage & Impact Analysis) Oracle Database Oracle Exadata Oracle Big Data Appliance Oracle TimesTen Oracle OLAP Oracle Business Intelligence Oracle BI Applications Oracle E-Business Suite Oracle JD Edwards Enterprise One Oracle JD Edwards World Oracle Fusion Applications Oracle Governance Risk and Compliance Oracle Fusion AIA Oracle Retail Applications Oracle Agile BI / DW Oracle Agile PLM for Process Oracle iFlex FlexCUBE Oracle iFlex Mantas Oracle Hyperion Applications Oracle PeopleSoft Oracle Siebel CRM / OnDemand Oracle Communications Oracle WebLogic Server Oracle Coherence Data Grid Oracle SOA Suite Oracle Enterprise Service Bus
+ open APIs and standards based meta-model
Oracle Enterprise Metadata Management
Packaged for two offerings:
Oracle Enterprise Metadata Management (OEMM) Fully featured enterprise edition product
Oracle Metadata Management for Oracle Business Intelligence (OMM) Limited for use with OBIEE, no Business Glossary
Key Features: Report to Source Lineage
Impact Analysis
Model Versioning
Annotations and Tagging
Supports Metadata Standards
Business Glossary
3rd Party BI Metadata
3rd Party ETL Metadata
3rd Party DB Metadata
Big Data Ready
Metadata Harvesting from all Popular Platforms
Oracle Data Enrichment Cloud Service (ODECS)
Data Discovery & Visualization
Desktop Analytics
Enterprise Reporting
Internet
Logs
Unstructured & Structured Data
Oracle GoldenGate Low-Impact, Real-Time Data Integration & Transactional Replication
PERFORMANCE: Low-impact Real-Time Data Integration and Replication
FLEXIBLE: Open, Modular Architecture – Heterogeneous including Cloud and Big Data
RELIABLE: Maintains Transactional Integrity – Resilient against Failures
Real-Time Changed Data Capture
Data Integrator
New DB/ HW/OS/APP
Fully Active Distributed DB
Reporting Database
Data Warehouse
Message Bus
Oracle & Non-Oracle Database(s)
Cloud
Cloud & On-Premises
Big Data
Message Bus
Oracle Data Integrator E-LT: Bulk Data Processing and Fast Data Transformation
Big Data
Cloud
Apps
Database
Oracle Data Integrator
High Performance E-LT
Declarative Design
Extensible Knowledge Modules
Data Services
Structured & Unstructured Data
• Certified for leading technologies to deliver fast time to value
• High-performance, low cost of ownership E-LT architecture
• Lightweight deployment
• Flexible, easy to enrich functionality
Industry Leading Performance: Extremely Fast Execution and Reduced Cost
E-LT provides a flexible architecture for optimized performance on any platform
Benefits
Leverages set-based transformations
Improves performance for loading, no network hop
Takes advantage of existing infrastructure: hardware and software
Conventional ETL Architecture: Extract, Transform, Load
Next Generation Architecture “E-LT”: Extract, Load, Transform
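The E-LT idea on this slide can be illustrated with a minimal sketch. It assumes Python's standard-library sqlite3 module as a stand-in for the target database, and the table and column names (stg_orders, fact_sales, customer, amount) are hypothetical. It is not code generated by Oracle Data Integrator. It only shows the pattern of loading raw rows first and then transforming them with one set-based statement inside the target engine.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows into the target engine, then transform them there."""
    db = sqlite3.connect(":memory:")  # stand-in for the target database
    db.execute("CREATE TABLE stg_orders (customer TEXT, amount REAL)")
    db.execute("CREATE TABLE fact_sales (customer TEXT, total REAL)")

    # Extract + Load: bulk-insert the untransformed rows into a staging table.
    db.executemany("INSERT INTO stg_orders VALUES (?, ?)", rows)

    # Transform: one set-based INSERT ... SELECT executed by the database
    # itself, so no row passes through a separate transformation server.
    db.execute(
        "INSERT INTO fact_sales "
        "SELECT UPPER(TRIM(customer)), SUM(amount) "
        "FROM stg_orders GROUP BY UPPER(TRIM(customer))"
    )
    result = db.execute(
        "SELECT customer, total FROM fact_sales ORDER BY customer"
    ).fetchall()
    db.close()
    return result
```

Calling `run_elt([(" acme ", 10.0), ("ACME", 5.0), ("bolt", 2.5)])` cleans and aggregates the rows entirely inside the database. In a conventional ETL layout the same cleanup would run row by row on a middle-tier server before loading.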
EDQ and ODI: Comprehensive Data Quality Process
Sources
Oracle Enterprise Data Quality: Parsing, Standardization, Cleansing, Matching, Merging
Oracle Data Integrator: E-LT/ETL Process
Targets
1. Profile Data
2. Create new Data Quality Rules
3. Add Data Quality to E-LT/ETL Flow
4. Continuous Quality Monitoring, Quality Alerts
Data Profiling
• Analyze and understand data to build ODI mappings
Automated Processes
• Data De-duplication
• Semantic/Contextual data parsing, cleansing and standardization
• Address Validation & Geolocation > 240 countries
• Invoked in ODI workflow
Measure Ongoing Data Quality
• Assess quality of data in target system. How well is ETL working?
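As a rough illustration of what the "Profile Data" step produces, here is a minimal Python sketch that computes a few common profile statistics for one column. The function name profile_column and the pattern notation (9 for a digit, A for a letter) are assumptions made for this example. This is not the EDQ API or its output format.

```python
from collections import Counter
import re


def profile_column(values):
    """Return simple profile statistics for one column of string values."""
    non_null = [v for v in values if v not in (None, "")]
    # Reduce each value to a format pattern: every digit -> 9, every letter -> A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in non_null
    )
    return {
        "count": len(values),                  # total rows inspected
        "nulls": len(values) - len(non_null),  # missing or empty values
        "distinct": len(set(non_null)),        # distinct non-null values
        "patterns": dict(patterns),            # format pattern frequencies
    }
```

A profile like this exposes null rates and inconsistent formats (for example a code written both as `AA-99` and `AA99`) before mappings are written, which is the point the slide makes about not building ETL from specifications alone.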
ODI and EDQ: Core Use Cases
Reduce ODI Implementation Time and Risk
• 50% of data warehouse/BI projects have limited acceptance or are outright failures as a result of lack of attention to data quality issues
• ETL mappings should not be solely developed based on specifications
• Data Profiling helps uncover defects, patterns, formats early in the ETL development process
• Use EDQ Profiling to analyze and understand your data and required mappings
Populate a Data Warehouse with High Quality Data
• Avoid making poor decisions based on poor data (avoid garbage-in, garbage-out)
• Platform for Data Governance/Data Stewardship and ongoing quality improvement
• Engage business users in defining and implementing appropriate business rules
• Use EDQ Batch Processing to deliver accurate, consistent and complete data
EDQ Product Architecture
• All Java Server (Stateless)
• Java Webstart Client Applications
• Fully integrated with a single repository and UI
• Batch and Real-time Execution
• Connects to virtually any source/target of data
• Platform Independent
Core Four Plays for Data Integration Solutions
DIS for Business Intelligence & Data Warehousing
Modernization and Consolidations
High Availability (24 X 7)
Data Integration for Oracle Applications
DIS for Business Intelligence & Data Warehousing
• Real-time or near-real-time data feeds
• Move to E-LT and remove middle-tier ETL
• Integrated Data Cleansing using EDQ
• Optimized for Exadata
• Make business decisions with real-time data
• Oracle’s BI Apps solutions
Modernization and Consolidation
Infrastructure Modernization
• Move away from legacy to gain better ROI & drive innovation
• Cross platform (DB/OS), Cloud or On-Premises with ease
Data Consolidation to Exadata and Cloud
• DIS is red stack optimized – only ODI can run on Exadata for best data loading performance
• OGG fully supports Oracle on Exadata
• EDQ for de-duplication and cleansing of data
High Availability (24 X 7)
Active-Active, Multi-Master, Disaster Recovery
• Simply the best HA solution using Oracle GoldenGate and Active Data Guard
• Make better use of your HA investments
Zero Downtime Operations
• Avoid downtime, planned or unplanned
• Keep production systems making money for the company!
• Reduce risk with fail-back or phased migrations
High Availability Solution: Avoid Planned or Unplanned Downtime
Solution
• Zero downtime operations for any supported database
  o Upgrades, Maintenance for HW, OS, DB or Applications
• Logical, heterogeneous replication with GoldenGate for DR for non-Oracle databases
• Active-Active bi-directional or multi-master replication with GoldenGate
• Physical replication with Active Data Guard
  o Best for Disaster Recovery for Oracle Applications
  o Best for Disaster Recovery for Oracle Database
Benefits
• Ensure business continuity in any situation
• Eliminate planned downtime for maintenance for any supported database with Oracle GoldenGate
• Improve ROI by utilizing standby database
• Mitigate Risk for Applications and infrastructure upgrades and migrations
Data Integration for Oracle Applications
Optimized for Oracle Applications
• Active-active data synchronization
• Integrated operational reporting
• Included in Oracle BI Applications
Data Integration for SOA
• Large, complex transformations
• Direct to database integration connections with no impact on performance, avoiding placing large data traffic on queues
Data Integration for Oracle Applications Solution: Integrated Application Data in Both Batch and Real-Time
Solution
• Oracle Apps Unlimited to Fusion Migrations
• E-Business Suite Application Database Migrations
• E-Business Suite Operational Reporting
• Oracle BI Applications utilizing Oracle Data Integrator
• Siebel CRM Zero Downtime App Upgrades
• ATG Active-Passive or Active-Active
• JD Edwards Zero Downtime App Upgrades
• PeopleSoft Real-Time Integrated Operational Reporting
Benefits
• Trusted, pre-built, certified solutions
• Right tools for the job
• Improved report generation times
• Improved performance on transaction systems
Integration with Oracle Coherence
Oracle TopLink
Tight Integration with Oracle Coherence enables real-time updates to Coherence cache
Refreshes invalidated object in the Coherence cache when the database is directly modified
Coherence users can access real-time data without any changes required to the source system
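The refresh behaviour described above can be sketched in a few lines of Python. Plain dictionaries stand in for the cache and the database here, and the function name refresh_cache is hypothetical. This is not the Coherence or GoldenGate adapter API, only an illustration of refreshing cached objects when the underlying rows change outside the application.

```python
def refresh_cache(cache, database, changed_keys):
    """Refresh cached entries whose rows were changed directly in the database."""
    refreshed = []
    for key in changed_keys:
        if key not in cache:
            continue  # never cached, so there is nothing stale to refresh
        if key in database:
            cache[key] = database[key]  # replace the stale object
        else:
            del cache[key]              # row was deleted at the source
        refreshed.append(key)
    return refreshed
```

In this sketch the list of changed keys plays the role of the captured change stream: cache readers see current values without the source application having to notify the cache itself.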
Oracle Coherence Grid Edition 12.1.2
Oracle & Non-Oracle Database(s)
Capture
Trail Files
Coherence Adapter
Next Major Disruptive Forces
Cloud: Affecting everything; the location of the data or the data processing can be anywhere in public or private cloud data centers
Data Self Service (…from IT led ETL to): Bringing automation and simplicity to data movement, sandboxing, and preparation
Big Data Reservoir (…from ELT w/SQL to): Enterprise scale use of Hadoop for staging, storage and manipulation of all types of data
Devices & Things (…from SQL Logs to): Integrating data that originates from devices, things and any other event sources on the network
Virtualization (…from Federation to): Enabling data consumers to access and manipulate data regardless of its physical location
What Will Transform Data Integration Solutions in the Future?
Cloud
Big Data Reservoir
Data Governance (GOV)
Oracle Data Integration – Pragmatic Solutions for Cloud
Cloud BI / Analytics
• Oracle Business Intelligence Cloud uses Oracle Data Integration
• Oracle Data Integration also supports non-Oracle BI/Analytics
Cloud SaaS to Mart/EDW
• Bring SaaS Application data into on-premise data warehouses
• Synchronize reference data or master data with SaaS Apps
Cloud Database
• Sync On-premise DBs to managed or private cloud data centers
• Sync local databases with Database as a Service (DBaaS)
On-Premise Apps to Heterogeneous Cloud BI/Analytics
Diagram labels: On-Premise, OGG, ODI, EDQ, FTP, Amazon S3 Bucket, Amazon Redshift, OBIEE
Key Opportunity
• Provide high volume data movement and data synchronization capabilities between on-premises and Cloud-based resources
• Perform E-LT/ETL and Data Quality transformations natively on Cloud BI/Analytics platforms
• Avoid using different Data Integration solutions for Cloud and on-premises deployments
SaaS Application Data into On-Premise BI/Analytics
Diagram labels: On-Premise, ODI, EDQ, OGG
Key Opportunity
• Integrate natively with on-premises resources and Cloud-based Applications such as Salesforce.com, Sales Cloud, Service Cloud or Eloqua
• Offload reporting to eliminate impact on production systems
• Provide high volume data movement and data synchronization capabilities for Cloud and on-premises Apps
Database to Database Replication in the Cloud
Diagram labels: On-Premise, OGG, ODI, Private Cloud / Managed Cloud
Key Opportunity
• Synchronize data efficiently between on-premises databases and Oracle DBaaS
• Consolidate numerous databases into a Private or Public Cloud database infrastructure
• Implement a highly available infrastructure for both on-premises and Cloud database deployments
Oracle Data Integration Can Help Now
Unidirectional: Query Offloading, Zero-Downtime Migration, Data Integration; Cloud or On-Premises
Bi-Directional: Active-Active for Multi-Master/HA; Cloud or On-Premises
Big Data Delivery: Real-Time and Batch Delivery, Structured Data to Data Reservoir
Data Distribution via Messaging
Cloud Apps Integration
Oracle Data Integration Components in the Cloud
• Oracle GoldenGate with Amazon RDS is available under the “Bring-your-own-license” model in all AWS regions
• Oracle Data Integrator is already being used internally within various Oracle Cloud Applications such as Oracle Sales Cloud (ex- FA CRM)
• Enterprise Data Quality is already being used in some of our Cloud Applications such as the Address Verification Service
• Amazon RDS supports migration and replication across several Oracle Database Editions using Oracle GoldenGate. We do not support nor prevent customers from migrating or replicating across heterogeneous databases
• Oracle Data Integrator is also an integral part of the Oracle Cloud to OBIA Connector offering
• Cloud to On-Premise App replication using OGG has also been proven at customer sites
• ODI can be installed in Cloud environments such as Oracle Cloud or Amazon EC2
• Customers are successfully using ODI with Cloud databases such as Amazon Redshift
Big Data Reservoir
GOV Data Governance Cloud
What Will Transform Data Integration Solutions in the Future?
Why the word “Reservoir?”
https://blogs.oracle.com/bigdata/entry/big_data_and_analytic_top
True Hadoop Opportunity: Big Data Reservoir
Data Discovery: Data staged / merged in Hadoop to provide single place to explore/discover data
Data Preparation: External data staging and long running batch jobs run in Hadoop to make the most of the DB
Deep Data Storage: Store more raw detail data for less cost, while keeping aggregates in the DB
DW
New Data Discovery: Support for Exploratory Analytics without time consuming data modeling
Data Staging & Preparation: Lower cost data staging and data preparation
Detailed, Deep Data: Lower cost storage for questionable business data
Data Self Service/Reservoir Solution – What is it?
High Level Pattern #1:
Also used for long term storage of Detail data records (vs. Summary) and other aged data
Analytics run (a) directly on Hadoop
High Level Pattern #2:
Hadoop as a pre-processing platform for staging, preparing and transforming data prior to loading the Data Warehouse
Also used for long term storage of Detail data records (vs. Summary) and other aged data
Analytics run (a) directly on Hadoop, (b) federated with DW, or (c) only on DW
High Level Pattern #3:
Hadoop as transparent backend expansion point for Detail data records (vs. Summary) and other aged data
Also used for long term storage of Detail data records (vs. Summary) and other aged data
Analytics run only on DW
Logical Architecture – Seamless Data Integration is Crucial
Data Sources
Structured Data Sources
• Operational Data
• COTS Data
• Streaming & BAM
Master & Reference Data Sources
Content: Docs, Web & Social Media, SMS
Data Engines & Poly-structured sources
Data Ingestion
Information Interpretation
Raw Data Reservoir: Immutable raw data reservoir. Raw data at rest is not interpreted
Foundation Data Layer: Immutable modelled data. Business Process Neutral form. Abstracted from business process changes
Access & Performance Layer: Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
Discovery Lab Sandboxes: Project based data stores to support specific discovery objectives
Rapid Development Sandboxes: Project based data stored to facilitate rapid content / presentation delivery
Virtualisation & Query Federation
Information Services
Enterprise Performance Management
Pre-built & Ad-hoc BI Assets
Data Science
Data Integration & Governance (DI&G)
Concrete Business Value with Big Data Reservoir
Lower TCO for the Data Warehouse
LoB Faster Access to Analytic Data
New Types of Analytics for All Data
• Control the costs of the Data Warehouse
• Massive value multipliers for Teradata and Netezza customers
• Put an end to the annual upgrade cycle
• Give analytics to the business earlier in the data lifecycle
• Avoid up front modelling overhead for Discovery
• Empower IT to focus on highest value analytics
• Run BI queries faster
• Support Exploratory Analytics directly from Hadoop
• Run Streaming Analytics from OEP, Storm, Flume etc.
• Drive new business solutions (telematics data, machine data, log data, unstructured data)
COST SPEED VALUE
Oracle Data Integration with Hadoop
Sources
Oracle Data Integrator (E-LT & ETL)
Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)
Fast Load
Oracle GoldenGate (Replication)
E-LT & DQ
Enterprise Meta Data Management (Lineage, Impact Analysis and Data Provenance)
Comprehensive data integration platform designed to work with all data.
• Data Replication
– Continuous data staging into Hadoop
• Data Transformation
– Pushdown processing in Hadoop
• Data Federation
– Query Hadoop SQL via JDBC
• Data Quality
– Fix quality at the source or invoke Machine Learning in Hadoop
• Metadata Management
– Lineage and Impact Analysis w/Hadoop
Data Service Integrator (Federation)
Oracle Big Data Connectors
Data Load: Oracle Loader for Hadoop
Data Access: Oracle SQL Connector for HDFS
R Analytics: Oracle R Advanced Analytics on Hadoop
Oracle Data Integrator Application Adapter for Hadoop
XML/XQuery: Oracle XQuery on Hadoop
XQuery R Client
Optimized for Hadoop
• Maximise parallelism
• Fast performance
• Analyze data on Hadoop using familiar client tools
Oracle Data Integrator for Big Data: Heterogeneous Integration with Hadoop Environments
Supports Hadoop standards
Reverse Engineer Hadoop metadata
Check, Validate and Ensure Data Integrity with Hadoop
Load Data into HDFS/Hive
Generate HiveQL and execute in Hadoop
Leverage existing Hadoop transformations
Oracle Data Integrator for Big Data
Simplifies creation of Hadoop and MapReduce code to boost productivity
Integrates big data heterogeneously via industry standards: Hadoop, MapReduce, Hive, NoSQL, HDFS
Unifies integration tooling across unstructured/semi-structured and structured data
Optimizes loading of big data to Oracle Exadata using Oracle Big Data Connectors
Engineered for running on and integrating with Oracle Big Data Appliance via Big Data Connectors
Oracle GoldenGate for Continuous Streaming to Hadoop
• Leverages GoldenGate & HDFS / Hive Java APIs
• My Oracle Support Documents
• HDFS – 1586210.1
• Hive – 1586188.1
• Can also integrate with Flume for delivery to HDFS
• Flume – 1926867.1
Oracle Data Integration Can Help Right Now
Diagram labels: Any Sources, Staging, Temp, Prod, Files, Detail, MR, Oracle Data Integrator, Oracle GoldenGate, Fast Load, SQL
#1 – Tools not Spaghetti
• “ETL 101”: avoid complex, costly custom coding
#2 – Non-invasive Capture and Staging
• Move data without inefficient batch extracts
#3 – Processing is Taken to the Data
• No separate ETL engine needed
• Eliminate unnecessary data movement
• Reclaim latency and time from network overhead
#4 – Native Hadoop Execution
• Choose the right Hadoop language for your use case
• HiveQL, Pig, Spark, Storm, Java/MR2, etc.
• Template driven code gen keeps pace w/change on Hadoop platform
#5 – Native SQL Pushdown
• Optimize some join types within the Data Warehouse
#6 – Oracle Optimized
• OGG and ODI certified to run on the Oracle Appliances
Heterogeneous Reservoir with Oracle Data Integration
Flume Hive on MR, Tez, Spark
Logs
OLTP DB
SQOOP
OGG
Pig on MR, Tez, Spark
ODI
SQOOP
Any DW
OGG
Spark
Oozie
OEDQ OEMM
Data Validation & Cleansing
Metadata Mgmt & Lineage
API/File
Hive/HCat, HDFS,HBase
Hive/HCat, HDFS,HBase
NoSQL
Flume
Red Stack Reservoir with Oracle Data Integration
Load to Oracle
OLH/OSCH
Transform Hive
ODI
Hive/HDFS
Federate Hive/HDFS to Oracle
Big Data SQL
Oracle DB OLTP
Load from Oracle
CopyToBDA
Hive/HDFS
Federate Oracle to Hive
Query Provider for Hadoop
OGG OGG Hive/HDFS
Engineered System for Big Data from Oracle
DISK
PCI
FLASH
DRAM
Warm
Data
Hottest Data
Active Data
• Engineered data platform
• ODI Data Transformation at the speed of DRAM or the scale of Hadoop
• Utilize each data tier for specialized algorithms & compression
• Speed of DRAM
• I/Os of Flash
• Cost of Disk
• Scale of Hadoop
Hadoop
DISKS Deep Data
Oracle Data Integrator
Oracle GoldenGate
Fully exploit Big Data SQL, In-Memory and No-SQL Advancements from Oracle
Oracle Big Data Strategy
Acquire – Organize – Analyze
Oracle BI Foundation Suite
Oracle Real-Time Decisions
Endeca Information Discovery
Decide
Oracle Big Data Connectors
Oracle Data Integrator
Oracle Advanced Analytics
Oracle Database
Oracle Spatial & Graph
Stream
Oracle Event Processing
Apache Flume
Oracle GoldenGate
Oracle NoSQL Database
Cloudera Hadoop
Oracle R Distribution
Oracle Big Data SQL
Streaming Reservoir with NoSQL and DIS
Transform (Hive, Pig/Oozie, Spark)
ODI
Federate Hive/HDFS
Big Data SQL
Oracle NoSQL
Hive/HDFS
OGG
OGG
Hive/HDFS Any DB
Sensors & Events
Hive/HDFS
OEP
Load to Oracle
OLH/OSCH
GoldenGate and Streaming Data
Sensors
Apps
Apps
Storm / Flume / Spark / Kafka / etc
Hive (high speed apply) & HBase
OGG
Leverage DB transactions w/in realtime analytic streams
Stage DB records for subsequent processing
…
Open OGG APIs for capture of non-DBMS events
Non-invasive Capture and Staging
• Move data without batch extracts
Oracle Does Big Data Integration Better
Dynamic Data Movement
– CDC is by default, not an add-on
– Least invasive on sources
– Proven best performance
– Native Oracle capture/apply
vs. Batch Data Movement
– Typical ETL vendors all default to batch data movement in their reference architectures
– Some can “talk the talk” but their CDC tech can’t touch Oracle GoldenGate scale/performance
NoETL Engine
– Take the processing to the data; don’t move the data to the process
– Leverage your data engines for the workloads (Hadoop or SQL)
vs. ETL Engine Must Scale Alongside Hadoop
– Carefully watch how ETL engines scale out; parallelism runs via the Engine – more H/W to buy
– Map out the physical deployment architecture, compare to ODI, the TCO difference will be clear
Most Heterogeneous
– Leverage open source Hadoop, not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards
vs. Proprietary Vendor Lock-in
– One popular ETL vendor puts their engines at the center of the architecture, not Hadoop
– The mainframe of ETL vendors has proprietary features that mainly run in their own distro
– A “fake free” ETL vendor sells proprietary add-ons
Oracle Does Big Data Better: Dynamic Data Movement
HDFS (Files)
HBase (NoSQL)
Hive / Hive Streaming (SQL)
Flume & Storm (Streaming)
Kafka (MPP Pub/Sub)
Spark Streaming (Machine Learning)
Capture Database Transactions and Deliver to Big Data in Real-Time
Capture → Trail → Pump → Route → Deliver
GoldenGate
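The capture, trail and deliver stages named on this slide can be illustrated with a toy Python sketch. A list of JSON lines stands in for the trail and a dictionary for the target, and the record layout (op, key, row) is an assumption made for this example. It is not the GoldenGate trail format or API, only the general change-data-capture pattern of an ordered change log plus a delivery checkpoint.

```python
import json


def capture(changes, trail):
    """Append each committed change to the trail as one JSON line, in order."""
    for change in changes:
        trail.append(json.dumps(change))


def deliver(trail, target, checkpoint=0):
    """Apply trail records after `checkpoint` to `target`; return new checkpoint."""
    for line in trail[checkpoint:]:
        rec = json.loads(line)
        if rec["op"] == "delete":
            target.pop(rec["key"], None)
        else:  # insert or update: store the latest row image
            target[rec["key"]] = rec["row"]
    return len(trail)
```

Because delivery resumes from the returned checkpoint, only new changes are applied on each pass. That is why this style of movement avoids repeated batch extracts from the source.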
Oracle Does Big Data Better: Invented Pushdown Processing
ORCL Investments in ELT/Pushdown Tech
Scripted SQL
Stored Procs
Warehouse Builder
Data Integrator (Heterogeneous)
ODI for Columnar DBs
ODI for In-Memory DBs
ODI for Engineered Systems
ODI for Hadoop Hive
ODI for Hadoop NoSQL
ODI for Hadoop Pig & Oozie
ODI for Spark
ODI for …
1990’s
Eon of Scripts and PL-SQL, Era of Native SQL, Big Data Revolution
Oracle’s tool maturity and operational know-how for E-LT is unmatched
10x bigger footprint with E-LT than next closest competitor using “pushdown”
Simple and easy way to blend Hadoop and SQL E-LT execution from one tool
Oracle Does Big Data Better: NoETL Approach
One Logical Design: Many Engine Alternatives:
Data Engine | Examples | Engine I/O | Best Use
SQL / OLTP Database | Oracle DBMS, Any OLTP DBMS, DW Appliances | SSD / Disk based | High volumes of transformations on relational data
MapReduce | Hive / MR2, Pig / Oozie / MR2 | SSD / Disk based | Huge batch-like transformations on any data types
In Memory (SQL / Big Data) | Oracle InMemory, Hive / Tez / YARN, Spark / YARN, Cloudera Impala | D/RAM; with various built in spill to disk approaches | Highly interactive data transformation patterns
Streaming Big Data | Storm / YARN, Oracle Event Processor (OEP) | D/RAM; “always on” data pipeline | Very low latency transformations
Modern design studio for simple map development
Team-based GUI Tooling for work on Enterprise projects
Integrated lifecycle and metadata management
Automated support for Changed Data Capture
SEPARATE ETL ENGINE NOT REQUIRED!
Data Integrator
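The "one logical design, many engines" idea can be sketched as template-driven code generation. In this minimal Python example the template dictionary, the mapping keys and the function name generate are all hypothetical. It is not the format of ODI knowledge modules, only an illustration of rendering a single logical mapping into statements for different execution engines.

```python
# One statement template per execution engine (illustrative only).
TEMPLATES = {
    "sql": "INSERT INTO {target} SELECT {cols} FROM {source} WHERE {filter}",
    "hive": "INSERT INTO TABLE {target} SELECT {cols} FROM {source} WHERE {filter}",
}


def generate(mapping, engine):
    """Render one logical mapping into a statement for the chosen engine."""
    return TEMPLATES[engine].format(
        target=mapping["target"],
        source=mapping["source"],
        cols=", ".join(mapping["columns"]),
        filter=mapping["filter"],
    )
```

The logical mapping is written once. Switching the engine argument changes only the generated text, and the resulting statement then runs inside the chosen data engine rather than in a separate ETL server.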
Oracle Does Big Data Better: Clear Business Benefits
Proven Technology (RISK)
• Unlike custom coding, a tools based approach is proven to result in lower cost long term operations
• Oracle GoldenGate is industry standard for Data Replication
• Oracle invented E-LT Pushdown processing and is 10x more widely deployed than competitors
Better Architecture (SCALE)
• Oracle GoldenGate provides the most scalable, native integration for database replication
• Oracle Data Integrator provides ultimate scalability and choice for Hadoop data transformations
• Consistent agent-based architecture avoids having multiple, incompatible engines (e.g., old style ETL tools)
Best for Oracle (COMPLETE)
• Exadata – OGG and ODI are deeply integrated and are the only Replication and ETL processes certified to run on the appliance
• Big Data Appliance – deeply integrated technology part of core reference architecture
• Big Data Connectors – ODI included with core connector technologies for Hadoop
Heterogeneous Access
Big Data Reservoir
Cloud
GOV Data Governance
What Will Transform Data Integration Solutions in the Future?
Core Data Governance Solution Use Cases
System Consolidation/ Migration
Enterprise DQ Services/ Governance
DW/BI Enablement
MDM Enablement
Application Enablement
Compliance
Resources
Oracle Data Integration OracleDataintegration OracleGoldenGate ORCLGoldenGate blogs.oracle.com/dataintegration
Follow us and connect with our community