Date post: | 18-Jan-2018 |
Upload: | virgil-armstrong |
1
Do You Need an ETL Tool?
Ben Bor
NZ Ministry of Health
2
Ben Bor
Over 20 years in IT, most of it in Information Management
Oracle specialist since version 5
Involved in Business Intelligence for over 10 years
Consulted for the world’s largest corporations
Presents regularly on Information Management
Was annual Guest Lecturer at Sussex University
3
Contents
What is ETL
ETL tools vs. ‘handcraft’ code
PL/SQL techniques
4
What is ETL
ETL = Extract, Transform and Load:
    Any source, any target
    Built-in complex transformations
Point-to-point vs. hub-and-spoke
5
Traditional ETL
6
Our Own ETL Requirements
[Diagram. Labels shown: Flat Files, SQL Loader, PL/SQL, Data Quality]
7
Travel Company Example
[Diagram: data flows for the travel company example]
Systems shown: Aurora, CTQ, 3rd Party Data, Oracle Financials, Calypso, Australian Reservations, Tropics, 3rd Party Marketing, CRM, FileMaker, Website, Others
Stages shown: Load Area, Oracle Staging Area, DQE, Cleansed Data, Audit Report, Progress Report, Business Group, Business Rules, Manual Data Effort
Estimated volumes:
    150,000 Travel Agencies
    500 Groups
    50 Consortia
    500,000 Consultants
    3 million Bookings
    1 million Brochure Requests
    400,000 Questionnaires
    35,000 Supplier Types
    Other labels with volumes on the diagram: Brochure Reqs, 1 million, Supplier, Employees, 250,000
Annotation: No existing process
Key: future system, existing system, feed back
8
Tools or Handcraft?
ETL Advantages:
    Graphical user interface
    Automatic documentation
    Off-the-shelf set of ready-to-use transformations
    Built-in scheduler
    Database agnostic
Handcrafting Advantages:
    No limitation
        reuse existing code & non-ETL
    No specific methodology
    No license cost
    No impact on infrastructure
    Transportable
    Release & code management by script
9
Oracle ETL Facilities
External Tables
Merge
SQL Loader
PL/SQL
Database links
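Two of the facilities listed above, external tables and MERGE, are often used together: the flat file is exposed as a table, then merged into the target in one statement. The following is only a sketch under stated assumptions. The directory object etl_dir, the file customers.csv and both table names are hypothetical and do not come from the slides.

```sql
-- Hypothetical target table
CREATE TABLE customers (
    customer_id  NUMBER PRIMARY KEY,
    name         VARCHAR2(100)
);

-- External table over a CSV file.
-- Assumes a directory object etl_dir already exists and contains customers.csv.
CREATE TABLE customers_ext (
    customer_id  NUMBER,
    name         VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY etl_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('customers.csv')
)
REJECT LIMIT UNLIMITED;

-- Update rows that already exist, insert the rest
MERGE INTO customers tgt
USING customers_ext src
ON (tgt.customer_id = src.customer_id)
WHEN MATCHED THEN
    UPDATE SET tgt.name = src.name
WHEN NOT MATCHED THEN
    INSERT (customer_id, name)
    VALUES (src.customer_id, src.name);
```

The file is read each time the external table is queried, so no separate load step is needed before the MERGE.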
10
Why Use PL/SQL
Integrated environment (no installation required)
Available resources
Reuse code ‘snippets’
Good performance
Integration with and control of the database
11
PL/SQL Tips and Techniques
1. Quality
2. Techniques
3. Tricks
12
Quality
13
What is Quality?
[1] “Totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs.”
[The ISO 8402 definition for quality]
14
Quality 2
[2] Quality is a collection of “ilities”:
    Reliability - operate error free
    Modifiability - have enhancement changes made easily
    Understandability - understand the software readily
    Efficiency - the speed of the software
    Usability - use the software easily
    Testability - construct and execute test cases easily
    Portability - transport the software easily
15
Quality 3
[3] “All the things you do today in your software development, in order to bear fruit in the future.”
16
Standards & Conventions
Use meaningful names
    V_Number_Of_Items_In_Array vs. i or no_itms
Distinguish between types:
    V_ Variable
    a_ Parameter
    C_ Constant
    G_ Global constant
17
Using Packages
Central package with utilities and all output
    All error messages and numbers
    All common constants (date format etc.)
    Global variables
    Statistics data
Other packages encapsulate related logic
Within package:
    Procedures & functions have:
        Meaningful name
        A99_ prefix. A is the level (A highest). 99 is a unique ID
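To make the conventions concrete, here is a minimal sketch of what the specification of such a central package might look like. The package name XXX_Utils, the constants and the global variables are assumptions made for illustration. Only the prefixes and the two routine names follow the slides.

```sql
-- Hypothetical central utilities package; every name here is illustrative.
CREATE OR REPLACE PACKAGE XXX_Utils AS
    -- C_ prefix: common constants kept in one place
    C_Date_Format    CONSTANT VARCHAR2(30) := 'DD-MON-YYYY HH24:MI:SS';
    C_Err_No_Match   CONSTANT INTEGER      := 1001;

    -- G_ prefix: package globals, shared for the life of the session
    G_Run_Id         INTEGER;
    G_Place_In_Code  VARCHAR2(64);

    -- A04: level A, unique ID 04. The a_ prefix marks a parameter.
    FUNCTION A04_Remove_Spaces(a_Instring IN VARCHAR2) RETURN VARCHAR2;

    -- E00: the single routine through which errors are written
    PROCEDURE E00_Write_Error_Log(
        a_Err_Num         IN INTEGER,
        a_Severity        IN INTEGER,
        a_Err_Location    IN VARCHAR2,
        a_Err_Description IN VARCHAR2);
END XXX_Utils;
/
```

This is a specification only. It compiles on its own, but calling the routines requires a matching package body.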
18
Example: procedure and variable naming
XXX_Write_Flat_File.U03_Write_Record_To_CSV(
    a_File_Handle,
    C_Field_Delim,
    C_Field_Separ,
    C_Record_Separ,
    RM_REFERENCE_rec.REFTYPE,
    RM_REFERENCE_rec.CODE,
    RM_REFERENCE_rec.DESCRIPTION,
    To_Char(RM_REFERENCE_rec.ISDEFAULT, '9')) ;
19
Techniques
Error logging
Autonomous Transaction
Run statistics
Release mechanism
Overloading
20
Error Logging Technique
Global variables keep key information:
    Record ID
    Run ID
    Location in code
Local error trapping decides severity and error code.
All trapped errors are passed up.
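One possible reading of this technique, as a self-contained sketch: the local handler chooses the error number and severity, hands them to a central logging routine, and re-raises so the caller also sees the failure. Log_Error is a hypothetical stand-in for the central procedure, B06_Lookup_Plan is an invented routine, and the lookup against DUAL exists only to force a NO_DATA_FOUND.

```sql
DECLARE
    G_Record_Id      VARCHAR2(20);   -- hypothetical global: record being processed
    G_Place_In_Code  VARCHAR2(64);   -- hypothetical global: current location in code

    -- Stand-in for the central logging procedure; prints instead of inserting
    PROCEDURE Log_Error(a_Err_Num     IN INTEGER,
                        a_Severity    IN INTEGER,
                        a_Description IN VARCHAR2) IS
    BEGIN
        DBMS_OUTPUT.PUT_LINE('ERR ' || a_Err_Num || ' severity ' || a_Severity ||
                             ' at ' || G_Place_In_Code || ' record ' || G_Record_Id ||
                             ': ' || a_Description);
    END Log_Error;

    PROCEDURE B06_Lookup_Plan(a_Plan_Code IN VARCHAR2) IS
        V_Dummy  VARCHAR2(1);
    BEGIN
        G_Place_In_Code := 'B06';
        -- No row of DUAL matches, so this raises NO_DATA_FOUND
        SELECT dummy INTO V_Dummy FROM dual WHERE dummy = a_Plan_Code;
    EXCEPTION
        WHEN NO_DATA_FOUND THEN
            -- The local handler decides the error number and severity
            Log_Error(1001, 10,
                      'No match found for [Plan_Code] value [' || a_Plan_Code || ']');
            RAISE;   -- pass the error up to the caller
    END B06_Lookup_Plan;
BEGIN
    G_Record_Id := '223010913';
    B06_Lookup_Plan('C3');
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Caller saw the error raised in ' || G_Place_In_Code);
END;
/
```

Because the globals are set before any work is done, the logging routine can report the record and code location without those values being passed as parameters.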
21
Error Logging Structure
TABLE ERROR_LOG(
    ERR_TIME          DATE,
    ERR_NUM           INTEGER,
    SOURCE_URN        VARCHAR2(20),
    SOURCE_SYSTEM_ID  VARCHAR2(5),
    PLACE_IN_CODE     VARCHAR2(64),
    ERR_LOCATION      VARCHAR2(255),
    ERR_DESCRIPTION   VARCHAR2(512),
    SEVERITY          NUMBER(6) )

ERR_TIME          18-OCT-02 10:04:52
ERR_NUM           1001
SOURCE_URN        223010913
SOURCE_SYSTEM     CRS
PLACE_IN_CODE     In FLIP_PKG B06 ; 6(utils A08)
ERR_LOCATION      A08_Lookup_Type
ERR_DESCRIPTION   No match found for [Plan_Code] value [C3]
SEVERITY          10
22
-- ===================
PROCEDURE E00_write_error_log(
-- ===================
    a_err_num         IN Integer ,
    a_Severity        IN Integer ,
    a_err_location    IN VarChar ,
    a_err_description IN VarChar )
IS
    PRAGMA AUTONOMOUS_TRANSACTION;
    V_Place_In_Code DW_Process.Error_Log.Place_In_Code%Type;
BEGIN
    V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')' ;
    INSERT INTO DW_Process.Error_Log
        (err_time, err_num, Severity,
         BOROUGH_ID, SOURCE_URN, SOURCE_SYSTEM_ID,
         Place_In_Code, err_location, err_description)
    VALUES
        (sysdate, a_err_num, a_Severity,
         G_BOROUGH_ID, G_SOURCE_URN, G_SOURCE_SYSTEM_ID,
         V_Place_In_Code, a_err_location, a_err_description) ;
    COMMIT ; -- commit the autonomous transaction, outside transaction is unaffected.
    G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1 ;
-- ===================
END E00_Write_Error_Log ;
-- ===================
Autonomous Transaction
23
Run Statistics
G_Stats_Rec is a record with all the statistics fields
Defined in the central package (therefore resident in memory)
It is updated by the writing procedures (all central)
It is written out at the end of the run
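A minimal sketch of such a statistics record follows. The slides only show the field TOTAL_NO_OF_ERRORS, so the record type and the other fields are assumptions. An anonymous block stands in for the central package, and the loop simulates the writing procedures.

```sql
DECLARE
    -- Hypothetical record type; only Total_No_Of_Errors appears in the slides
    TYPE Stats_Rec_Type IS RECORD (
        Run_Start            DATE    := SYSDATE,
        Total_No_Of_Records  INTEGER := 0,
        Total_No_Of_Errors   INTEGER := 0
    );
    G_Stats_Rec  Stats_Rec_Type;
BEGIN
    -- Simulate five records, with every second one failing
    FOR Record_No IN 1 .. 5 LOOP
        G_Stats_Rec.Total_No_Of_Records := G_Stats_Rec.Total_No_Of_Records + 1;
        IF MOD(Record_No, 2) = 0 THEN
            G_Stats_Rec.Total_No_Of_Errors := G_Stats_Rec.Total_No_Of_Errors + 1;
        END IF;
    END LOOP;

    -- Written out once, at the end of the run
    DBMS_OUTPUT.PUT_LINE('Run started: ' ||
                         TO_CHAR(G_Stats_Rec.Run_Start, 'DD-MON-YYYY HH24:MI:SS'));
    DBMS_OUTPUT.PUT_LINE('Records:     ' || G_Stats_Rec.Total_No_Of_Records);
    DBMS_OUTPUT.PUT_LINE('Errors:      ' || G_Stats_Rec.Total_No_Of_Errors);
END;
/
```

In a package, the record would be declared once at package level, so that every writing procedure updates the same copy for the duration of the session.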
24
Release Mechanism
Table of ‘release notes’
Each package has C_Version constant updated each release
‘Show_Version’ scripts display versions and notes
Results shipped with each release
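The slides do not show the release-notes table or the scripts, so the following is only a guess at their general shape. The table Release_Notes, its columns and the package XXX_Demo are hypothetical. SET SERVEROUTPUT ON is a SQL*Plus command, which assumes the script is run from that client.

```sql
-- Hypothetical release-notes table
CREATE TABLE Release_Notes (
    Package_Name  VARCHAR2(30),
    Version_No    VARCHAR2(10),
    Release_Date  DATE,
    Notes         VARCHAR2(512)
);

-- Each package carries its own version constant
CREATE OR REPLACE PACKAGE XXX_Demo AS
    C_Version  CONSTANT VARCHAR2(10) := '1.2.0';
END XXX_Demo;
/

-- A Show_Version-style script: print the compiled version, then the notes
SET SERVEROUTPUT ON
BEGIN
    -- Package constants are read from PL/SQL, not from a plain SQL statement
    DBMS_OUTPUT.PUT_LINE('XXX_Demo version ' || XXX_Demo.C_Version);
END;
/

SELECT Package_Name, Version_No, Release_Date, Notes
FROM   Release_Notes
ORDER  BY Package_Name, Release_Date;
```

Comparing the printed constant with the latest row in the table gives a quick check that the deployed code matches the release that was shipped.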
25
Remove Spaces
-- ===================
FUNCTION A04_Remove_Spaces(
-- ===================
    a_Instring IN Varchar )
Return Varchar
IS
/*
** Removes all the spaces from a string, leaving the rest of the printable characters
*/
BEGIN
    G_place_in_UTILS_code := 'A04' ; -- For use by the error trapping routine
    RETURN TRANSLATE( a_Instring,
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' || '\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*() ',
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' || '\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*()' ) ;
-- ===================
END A04_Remove_Spaces ;
-- ===================
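The function works because TRANSLATE drops any character of the second argument that has no counterpart in the third, and leaves characters it does not find untouched. The short sketch below shows the same effect with a single placeholder character, next to the REPLACE equivalent. The sample string is an assumption chosen for illustration.

```sql
DECLARE
    V_In  VARCHAR2(30) := ' NZ  Ministry of Health ';
BEGIN
    -- 'x' maps to itself, the space maps to nothing and is removed.
    -- The third argument must not be empty, or TRANSLATE returns NULL.
    DBMS_OUTPUT.PUT_LINE('[' || TRANSLATE(V_In, 'x ', 'x') || ']');

    -- REPLACE with no replacement string removes every occurrence
    DBMS_OUTPUT.PUT_LINE('[' || REPLACE(V_In, ' ') || ']');
END;
/
```

Both lines print [NZMinistryofHealth].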
26
Strip Leading non-numerics
-- ============================
FUNCTION F09_Strip_Leading_non_digits(
-- ============================
    a_String IN VARCHAR2 )
RETURN VARCHAR2
IS
/*
** Remove leading non-digits from the input.
** Example: Input string:  'abcde12345edcba'
**          Output string: '12345edcba'
*/
    v_string          Varchar2(4000) ;
    v_first_digit_pos Integer ;
BEGIN
    -- Replace all digits by 0
    v_string := Translate(a_String, '1234567890' , '0000000000') ;
    v_first_digit_pos := instr(v_string,'0') ;
    RETURN F01_Right(a_String, v_first_digit_pos ) ;
-- ============================
END F09_Strip_Leading_non_digits;
-- ============================
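The helper F01_Right is not shown in the slides. The sketch below assumes it returns the part of the string that starts at the given position, and uses the built-in SUBSTR in its place. The handling of an input with no digits is also an assumption.

```sql
DECLARE
    V_Input            VARCHAR2(40) := 'abcde12345edcba';
    V_Masked           VARCHAR2(40);
    V_First_Digit_Pos  INTEGER;
BEGIN
    -- Turn every digit into '0' so one INSTR call finds the first digit
    V_Masked := TRANSLATE(V_Input, '1234567890', '0000000000');
    V_First_Digit_Pos := INSTR(V_Masked, '0');

    IF V_First_Digit_Pos > 0 THEN
        -- SUBSTR stands in for the F01_Right helper (assumed behaviour)
        DBMS_OUTPUT.PUT_LINE(SUBSTR(V_Input, V_First_Digit_Pos));
    ELSE
        DBMS_OUTPUT.PUT_LINE('(no digits found)');
    END IF;
END;
/
```

For the sample input, the first digit is at position 6, so the block prints 12345edcba.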
27
Overloading
-- =======================
PROCEDURE U03_Write_Record_To_CSV(
-- =======================
    a_File_Handle  IN utl_file.file_type ,
    a_Field_Delim  IN VarChar , -- the quotes, for CSV
    a_Field_Separ  IN VarChar , -- the comma , for CSV
    a_Record_Separ IN VarChar , -- the Carriage Return + Line feed , for CSV
    a_String1      IN VarChar := G_default_Value ,
    a_String2      IN VarChar := G_default_Value ,
    a_String3      IN VarChar := G_default_Value ,
    ...)
IS
BEGIN
    IF a_String1 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim) ;
    IF a_String2 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ) ;
    IF a_String3 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ) ;
    ...
    <<End_Of_Record>>
    U01_Write_Line(a_File_Handle, a_Record_Separ) ;
-- =======================
END U03_Write_Record_To_CSV ;
-- =======================
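U03_Write_Record_To_CSV accepts a variable number of strings by giving each one a default value. For comparison, PL/SQL also supports overloading in the strict sense, where several subprograms share a name and differ in parameter types. The sketch below is purely illustrative. The procedure name Show is hypothetical and is not part of the author's packages.

```sql
DECLARE
    V_Text  VARCHAR2(10) := 'abc';
    V_Date  DATE         := DATE '2002-10-18';

    -- Two local procedures with the same name and different parameter types
    PROCEDURE Show(a_Value IN VARCHAR2) IS
    BEGIN
        DBMS_OUTPUT.PUT_LINE('text: ' || a_Value);
    END Show;

    PROCEDURE Show(a_Value IN DATE) IS
    BEGIN
        DBMS_OUTPUT.PUT_LINE('date: ' || TO_CHAR(a_Value, 'YYYY-MM-DD'));
    END Show;
BEGIN
    Show(V_Text);   -- resolves to the VARCHAR2 version
    Show(V_Date);   -- resolves to the DATE version
END;
/
```

The compiler picks the version from the type of the actual argument, so callers use one name for every data type.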
28
Summary
ETL or PL/SQL? Your choice. Consider:
    Overall cost
    ‘Politics’
    Convenience
    Portability
    Speed of development
    Reusability
If PL/SQL: ensure quality
29
Thank you !