Date post: | 28-Dec-2015 |
Category: |
Documents |
Upload: | dale-griffin |
View: | 216 times |
Download: | 1 times |
My experience building a custom ETL system
Problems, solutions and Oracle quirks
orHow scary Oracle can look for a Java developer
Agenda
• WHY do we need an ETL?• HOW it works• Experience:– task– existing solutions?– problem or Oracle quirk– my solution
WHY do we need an ETL?
• OLTP - in most cases calculation is non-trivial:– SQLs grow in size & complexity– increased maintenance effort– poor SQL performance
• business values calculation is implemented in most of reports independently - no code reuse:– maintenance effort is multiplied by number of reports– copy-paste-driven development
WHY do we need an ETL?
OLTP-like RC• 46 lines• 7 joins• 3 levels
after RADIO• 13 lines• 3 tables• 1 level
Balance / UPL calculation:
Streams• Propagate changes to RC - replica tables
Triggers• Collect transaction info• Collect changes – change log tables
Processing routines• Read changes• Analyze transaction info• Denormalize, calculate additional fields• Insert into denormalized tables• O!
RADIO: Data flow
Routines:• FLAG – extract rows/entities
for each event in transaction, sort
• MAIN – given event rows, run actions
• SNAP – when we have all TX from OLTP snap – calculate appropriate DN snap
Radio: Data flow
CLOG tables
FLAG job
MAIN job
DN tables
TX info
Action
CLDao
Dao
DNDao
PLSQL code generator
>600kB of pl/sql code:• TX element row create type as object• resulting DN row create type as object• action for each event type create type as object
Code maintenance is pain use higher level language !!
JPLSQL = java+pl/sql: jsp-like parser for producing pl/sql.• XML-based DB structure• XML-based flag/action mapping• power of Java
Streams, Triggers & CLOGs
• after trigger• my equals• duplicate scn
After trigger
• we keep TX apply state in package variables• before trigger is invoked• SUDDENLY!, transaction is rolled back –
package variables stay altered!
Use only after triggers
My equals
We need to filter changes, that happened in columns we don’t collect. But what we do with Oracle’s null ?
nvl(:new.val = :old.val, :new.val is null and :old.val is null)
Simple inline in JPLSQL.
Streams duplicate SCN
Ingredients:• several sessions• several tables• Streams replication
Apply process can produce message with same SCN.Oracle BUG ID: ???
Processing
• Job control• FLAG MAIN communication• MAIN
Processing: JOB control
• identification – how do we know a job is running ?• communication – how do we communicate a job ?– dbms_alert has implicit commits– dbms_pipe is not compatible with RAC
• sleep – conditioned wait
Processing: FLAG
•250 lines of SQL•300 lines of explain plan•1 kTX p/second
Processing: FLAG
FLAG computes:– table of
• table of– table of
» number
– event types
• event occasions– table index for this event
» TX row id
3-dimensional table problems:• ordering (no order by in collect statement in 10g)• storing – nested table doesn’t preserve ordering
FLAG job
MAIN jobTX info
Processing: MAIN
What is different from Java• object has default constructors – very useful for bulk creation• encapsulation is bad - package method access is slower, than variable access• reading from package variable is much, much faster, than reading from tables
cache everything
What is very different from Java• object/record assignment works by value, not by reference
Processing: MAIN
Processing: MAIN
Java-like toString:• get all object fields using user_source view• execute immediate …• very useful for debugging
Processing: MAIN
Tom Kyte’s “when others” rule exception:• we really want to catch all kind of errors:
– infrastructure logic– business logic constraints– Oracle internal errors
• we really want to stop after any error
Post-processing: Deployer
Relieves system engineers from deployment paint• Read installation bundle• Read DB objects• Compute difference• Build patch
Each object type has it’s own:• create / change statement syntax• system view structure
Post-processing: Deployer
Oracle has object dependencies:• pl/sql depends on tables• tables depend on user types• user types depend on their parent types
Misc
• SNAP – Trade/RC scn mapping• datatest – xmlforest, emails• very slow dbms_output retrieval
Questions ?