Date post: | 27-Jan-2015 |
Category: | Technology |
Upload: | michael-rainey |
www.rittmanmead.com [email protected] @rittmanmead
Real-Time Data Warehouse Upgrade - Success Stories
Nick Hurt - IFPI
Michael Rainey - Rittman Mead
KScope 2014 - Seattle, WA
Introduction
•Michael Rainey (Rittman Mead) ‣Principal Consultant ‣Oracle Data Integration expert
-GoldenGate and Oracle Data Integrator ‣Oracle ACE @mRainey
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
About Rittman Mead
•Oracle Gold partner with offices in US (Atlanta), Europe, Australia, India and South Africa
•World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI
•Provide consulting, training, global managed services for customers around the world
•120+ consultants including 1 Oracle ACE Director, 3 Oracle ACEs and 1 Oracle ACE Associate
•All experts in Oracle BI, DW, EPM and Analytics technology
•Skills in broad range of supporting Oracle tools: OBIEE, OBIA, ODIEE, Essbase, Oracle OLAP, GoldenGate, Exadata, Endeca
•Blog : http://www.rittmanmead.com/blog/ •Twitter : @rittmanmead
Introduction
•Nick Hurt (IFPI) ‣Solutions Developer at IFPI - using Oracle since 2002
@nicholas_hurt
•IFPI = International Federation of the Phonographic Industry ‣represents the interests of the recording industry worldwide ‣green light for OBIEE in 2010 ‣required “near” real-time anti-piracy analytics ‣joined in 2011 to work on delivery
Agenda
•IFPI data - the good, the challenging, the ugly •Pre-upgrade ‣Environment ‣Challenges
•Overview of GoldenGate and Oracle Data Integrator •Upgrade - planning, migration steps •Post-upgrade results •Closing remarks on real-time warehousing
Challenges of IFPI data
•The good ‣Seek & destroy infringing URLs
•The challenging ‣Velocity - 1 mil+ upserts per day ‣Volatility depth - indefinite retrospective updates ‣Large wide product dimension - 12 million rows
•The ugly ‣Multiple redundant updates ‣Back-dated corrections ‣Multiple sources of information (data consistency & quality)
-heavy data cleansing - identifying duplicates -inconsistencies (error-tolerant/error-correction ETL)
Link Lifecycle (events over time)
•Primary event ‣t0: Infringing URL Detected - Link Found
•Optional events ‣t0+tn: Deleted / Matching - Link Correction ‣t1 = t0+tn: Cease & Desist - Link Actioned ‣t2 = t1+tn: Take-down - Link Removed
Process Flow / Dataset
Process flow (over time): Event Detected -> ETL (Cleansing, De-duping) -> Summaries -> Dashboards
Fact table representation
Time Found | Link | New Unique File | Unique Link | Actioned | Taken-down
4/10/14 2:50 PM | www.4shared.com/rar/-6ebvl89/Justin_Bieber_-_All_Around_The.html | 1 | 1 | 0 | 0
4/15/14 11:44 AM | www.4shared.com/mp3/-2J4lahU/Nickel_Back_-_If_Everyone_Care.htm | 1 | 1 | 0 | 0
4/15/14 2:50 PM | www.4shared.com/rar/-6ebvl89 | 0 | 1 | 0 | 0
Process Flow / Dataset
Process flow (over time): Event Detected -> ETL upserts -> Summaries -> Dashboards
Fact table representation
Time Found | Link | New Unique File | Unique Link | Actioned | Taken-down
4/10/14 2:50 PM | www.4shared.com/rar/-6ebvl89/Justin_Bieber_-_All_Around_The.html | 1 | 1 | 1 | 1
4/15/14 11:44 AM | www.4shared.com/mp3/-2J4lahU/Nickel_Back_-_If_Everyone_Care.htm | 1 | 1 | 1 | 0
4/15/14 2:50 PM | www.4shared.com/rar/-6ebvl89 | 0 | 1 | 1 | 1
4/15/14 11:01 PM | www.4shared.com/mp3/-qXkFru8/Kanye_West__Jay-Z_Bingo_Player.html | 1
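The upsert behaviour in the fact table above can be sketched in a few lines of Python. This is an illustrative model only, not the ETL used at IFPI: the class and function names are hypothetical, and the de-duplication key (the first two path segments of the URL) is an assumption inferred from the sample rows, where two links that share `rar/-6ebvl89` count as one file.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class LinkFact:
    time_found: str
    link: str
    new_unique_file: int
    unique_link: int = 1
    actioned: int = 0
    taken_down: int = 0


def file_key(link):
    """Hypothetical de-dup key: the first two path segments of the URL."""
    path = urlsplit("//" + link).path  # the sample links carry no scheme
    return "/".join(path.strip("/").split("/")[:2])


class FactTable:
    def __init__(self):
        self.rows = {}           # one fact row per link
        self.seen_files = set()  # file keys already counted as new

    def upsert(self, time_found, link, event):
        """Insert the row on first sight, then update its flags in place."""
        row = self.rows.get(link)
        if row is None:
            key = file_key(link)
            is_new_file = int(key not in self.seen_files)
            self.seen_files.add(key)
            row = LinkFact(time_found, link, new_unique_file=is_new_file)
            self.rows[link] = row
        if event == "actioned":
            row.actioned = 1
        elif event == "taken_down":
            row.taken_down = 1
        return row


facts = FactTable()
a = facts.upsert("4/10/14 2:50 PM",
                 "www.4shared.com/rar/-6ebvl89/Justin_Bieber_-_All_Around_The.html",
                 "found")
c = facts.upsert("4/15/14 2:50 PM", "www.4shared.com/rar/-6ebvl89", "found")
# Later lifecycle events update the existing row rather than adding a new one.
facts.upsert("4/10/14 2:50 PM", a.link, "actioned")
facts.upsert("4/10/14 2:50 PM", a.link, "taken_down")
```

Because later events arrive at an arbitrary distance from the original detection, every lifecycle event has to locate and rewrite an existing row, which is what makes the load an upsert rather than an append.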
Pre-Upgrade Architecture
OLTP (source) -> Streams & CDC -> ODS (Subscriber Views) -> ETL (OWB Mappings) -> EDW (Star Schema) -> OBIEE Dashboards
Asynchronous Distributed HotLog Configuration
Pre-Upgrade Architecture - Streams and Oracle CDC
Pre-upgrade Challenges
•Throughput •Complex views •Recovery after VM/DB crash •Maintenance and development •Purging auditing information •Volume of redo •Oracle’s Statement of direction
Oracle Data Integrator 11g
•Oracle’s strategic product for data integration •Uses ELT (Extract, Load, Transform) approach ‣No middle ETL engine necessary ‣Uses the power of the target database to perform transformations
•Supports heterogeneous data sources •Declarative design - separation of business and technical integration
•Data integrity controls create a “data firewall” •Extensible through “Knowledge Modules”
ODI 11g Journalizing (CDC)
•Oracle Data Integrator Change Data Capture (CDC) delivered via Journalizing ‣Identify, capture, and deliver changes made to source data ‣Journalizing Knowledge Module (JKM) performs setup and creates infrastructure
•ODI CDC Framework ‣Capture Process - mechanism for capturing changed data from the source database (e.g. Oracle GoldenGate) ‣Journals - tables (J$) hold references to changed records and the change type (insert / update / delete) ‣Journalizing Views - (JV$, JV$D) provide access to changed data, used by IKM / LKM in mappings ‣Subscribers - allow consumption of changed data at different intervals, for multiple applications, etc.
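The moving parts of the CDC framework (capture, journal, journalizing view, subscribers) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the objects ODI generates: the class, method and subscriber names are hypothetical, and a single flag value `"I"` stands for both inserts and updates as a simplification.

```python
from dataclasses import dataclass


@dataclass
class JournalRow:
    subscriber: str
    pk: int
    flag: str               # "I" = insert or update, "D" = delete (simplified)
    consumed: bool = False


class Journal:
    """In-memory stand-in for a J$ table: references to changes, not the data."""

    def __init__(self, subscribers):
        self.subscribers = list(subscribers)
        self.rows = []

    def capture(self, pk, flag):
        # The capture process records one journal row per subscriber.
        for name in self.subscribers:
            self.rows.append(JournalRow(name, pk, flag))

    def lock(self, subscriber):
        # Freeze the set of changes this subscriber will process in this run.
        for row in self.rows:
            if row.subscriber == subscriber:
                row.consumed = True

    def view(self, subscriber, source):
        # Journalizing-view analogue: join locked changes to current source data.
        latest = {}
        for row in self.rows:
            if row.subscriber == subscriber and row.consumed:
                latest[row.pk] = row.flag   # the last change per key wins
        return [(flag, pk, source.get(pk)) for pk, flag in latest.items()]

    def purge(self, subscriber):
        # Drop the journal rows this subscriber has finished with.
        self.rows = [r for r in self.rows
                     if not (r.subscriber == subscriber and r.consumed)]


source = {1: "link-a", 2: "link-b"}
journal = Journal(["EDW", "AUDIT"])     # hypothetical subscriber names
journal.capture(1, "I")
journal.capture(2, "I")
journal.capture(1, "I")                 # an update to key 1
del source[2]
journal.capture(2, "D")

journal.lock("EDW")
changes = journal.view("EDW", source)   # [("I", 1, "link-a"), ("D", 2, None)]
journal.purge("EDW")                    # AUDIT's four rows remain untouched
```

Keeping one set of journal rows per subscriber is what lets each consumer read and purge changes on its own schedule without affecting the others.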
GoldenGate and ODI Integration
•JKM Oracle to Oracle Consistent (OGG) Knowledge Module ‣ODI Metadata used to generate GoldenGate parameter files (extract, pump, replicat) and configuration files ‣Delivered with ODI
•ODI CDC Framework generated ‣Staging table - replica of source ‣J$ (journal) table - change rows
•Journalized data used in transformations (via JV$ views)
Migration Decisions / Upgrade Planning
•ODI Master repository location •GoldenGate considerations ‣Installation and configuration (RAC is trickier) ‣Classic vs Integrated capture (requires EE for both source & target) ‣How to use it? Product built for migration and/or replication ‣Naming conventions
•OWB mappings to ODI interfaces ‣Various migration approaches
•Control, Monitoring & Alerting (no free lunch) •Testing & Go-live approach
Migration Steps
•Migrate OLTP applications to RAC ‣GoldenGate RAC target kept in-sync during application migration
•Performance tuning & ODI KM Modifications ‣Retain existing CDC framework objects when adding new tables ‣Update column mapping in replicat ‣Remove unnecessary code in Integration Knowledge Module
•Generate GoldenGate extract, pump and replicat ‣ODI Journalizing Knowledge Module ‣Source definitions file recommended
•Migrate OWB mappings to ODI interfaces ‣3Rs: re-assess, replicate and refine existing mappings
•Test the migration ‣Run both systems in parallel and compare results ‣Trends, aggregates, row counts
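The parallel-run comparison in the last step can be sketched as a simple metric diff. The function name, metric names and figures below are hypothetical and only illustrate the idea of comparing row counts and aggregates from both systems.

```python
def compare_metrics(legacy, upgraded, tolerance=0.0):
    """Return {metric: (legacy_value, upgraded_value)} for every mismatch."""
    mismatches = {}
    for name in sorted(set(legacy) | set(upgraded)):
        old, new = legacy.get(name), upgraded.get(name)
        # A metric missing on either side is always reported.
        if old is None or new is None or abs(old - new) > tolerance:
            mismatches[name] = (old, new)
    return mismatches


# Made-up numbers standing in for query results from each system.
legacy = {"fact_rows": 1000, "links_actioned": 420, "links_taken_down": 310}
upgraded = {"fact_rows": 1000, "links_actioned": 420, "links_taken_down": 308}

strict = compare_metrics(legacy, upgraded)
relaxed = compare_metrics(legacy, upgraded, tolerance=2)
```

A non-zero tolerance may be useful while both systems are still loading, since the two pipelines are unlikely to be at exactly the same point in the change stream when the queries run.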
Micro-batch ETL
Variables to track execution status
Error handling
Recursive execution
Execute Load Plan
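The control flow named on this slide (status variables, error handling, repeated execution of a load plan) can be sketched outside ODI as follows. This is an assumption-laden illustration: `run_micro_batches` and `fake_load_plan` are hypothetical names, and a plain loop stands in for the recursive execution shown on the slide.

```python
import time


def run_micro_batches(load_plan, max_batches, pause_seconds=0.0, max_failures=3):
    """Run load_plan repeatedly; return (batch_no, status, rows) per batch."""
    history = []
    failures = 0
    for batch_no in range(1, max_batches + 1):
        try:
            rows = load_plan()
            status = "DONE"
            failures = 0                    # a success resets the failure count
        except Exception as exc:            # error-handling branch
            rows = 0
            status = f"ERROR: {exc}"
            failures += 1
        history.append((batch_no, status, rows))
        if failures >= max_failures:        # stop after repeated failures
            break
        time.sleep(pause_seconds)
    return history


calls = {"n": 0}


def fake_load_plan():
    """Stand-in for executing a load plan; fails on its second call."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("source unavailable")
    return 100 * calls["n"]


history = run_micro_batches(fake_load_plan, max_batches=3)
```

Recording a status per batch means a failed run does not stop the cycle by itself: the next batch simply picks up the changes that were left unprocessed.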
Post-Upgrade Architecture
OLTP (source, 2-node RAC) -> GoldenGate Replication -> ODS (J$ Tables) -> ETL: ODI Interfaces (CDC) -> EDW (Star Schema) -> OBIEE Dashboards
Control, Alerting, Monitoring
Control, Alerting and Monitoring
•GoldenGate status and lag •ODI Agent monitoring •ETL throughput / health: ODI session tables •Enterprise Manager job scheduler to control ETL process •Monitoring dashboard
Monitoring Dashboard
Fact Table Load - Volume and Duration
ETL currently running and duration
Scheduled Job Duration
OLTP -> BI Latency
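The OLTP -> BI latency panel boils down to the gap between a change being committed at the source and the same change being loaded into the warehouse. A minimal sketch of that calculation follows; the function name, the sample timestamps and the one-minute alert threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def latency_report(events, threshold=timedelta(minutes=1)):
    """events: non-empty list of (source_commit_time, warehouse_load_time)."""
    lags = [loaded - committed for committed, loaded in events]
    worst = max(lags)
    average = sum(lags, timedelta()) / len(lags)
    return {"worst": worst, "average": average, "alert": worst > threshold}


# Made-up sample: two changes that reached the warehouse 20 s and 40 s late.
base = datetime(2014, 4, 15, 14, 50, 0)
events = [
    (base, base + timedelta(seconds=20)),
    (base + timedelta(minutes=5), base + timedelta(minutes=5, seconds=40)),
]
report = latency_report(events)
```

Tracking the worst case alongside the average helps, because a single stalled batch can hide behind an otherwise healthy mean.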
Upgrade Results
•Reduced lag ‣From 5-15 minutes to <1 minute
•Stabilised fact mapping with equivalent load volumes ‣Pre-upgrade 2 mins - hours ‣Post-upgrade 10 - 25 seconds
•Reduced ETL downtime ‣From 2+ days per month to minutes per month
•Simpler to extend tables under CDC •Purging audit information <1 hour rather than days
Upgrade Effects
•Faster troubleshooting & diagnosis times •Shorter maintenance & development times •Focus on performance and streamlining processes •Investigation into excessive redo volumes ‣Understanding incremental statistics
•MDM project kick-off •Contemplation of The Reference Architecture…
Reference Architecture & Realtime DW
•Staging Data Layer ‣Buffers reception for right-time distribution ‣Apply business rules to make the data clean, consistent and complete ‣Retain rejected data for manual/automatic correction
•Performance Layer ‣Dimensional model - star schema ‣Permanent & non-volatile data (traditionally speaking)
•Something in-between… ‣Caters for deeply volatile data by persisting historic and real-time facts ‣Combines elements of staging and performance layers ‣Facilitates agile de-coupled ETL processes
Real-time DW/BI - Blogged by Stewart Bryson 2011
http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/
Reference Architecture - Mashup
[Diagram: Reference DW Architecture Mashup]
Layers: Sources, OLTP, Staging Data Layer, Foundation Layer, Performance Layer, Hybrid Layer
Approaches shown: Federated OLTP+EDW, Federated EDW+rolling hot partition, Standard EDW (Star), In-memory Materialized Views, OLAP, Temporary Structures, Extreme Real-time EDW
Axis and interval labels: Time, latency, Query performance, volatility depth, refresh interval, ETL interval, timeframe
Conclusion
•This was not a sales pitch! •Real-time DW/BI inevitable •Upgrade now •Share your thoughts & experiences: •[email protected] •[email protected]