Do You Need an ETL Tool? Ben Bor, NZ Ministry of Health.

Post on 18-Jan-2018


1

Do You Need an ETL Tool?

Ben Bor, NZ Ministry of Health

2

Ben Bor

- Over 20 years in IT, most of it in Information Management
- Oracle specialist since version 5
- Involved in Business Intelligence for over 10 years
- Consulted for the world’s largest corporations
- Presents regularly on Information Management
- Was annual Guest Lecturer at Sussex University

3

Contents
- What is ETL
- ETL tools vs. ‘handcraft’ code
- PL/SQL techniques

4

What is ETL

ETL = Extract, Transform and Load:
- Any source, any target
- Built-in complex transformations

Point-to-point vs. hub-and-spoke

5

Traditional ETL

6

Our Own ETL Requirements

[Diagram labels: Flat Files, SQL Loader, PL/SQL, Data Quality]

7

Travel Company Example

[Diagram: data flows for the travel company. Labels recoverable from the slide:
Systems and feeds: Aurora, CTQ, 3rd Party Data, Oracle Financials, Calypso, Tropics, Australian Reservations, FileMaker, CRM, 3rd Party Marketing, Others
Processing: Load Area, Oracle Staging Area, DQE, Business Rules, Manual Data Effort, Business Group, Cleansed Data, Audit Report, Progress Report, "No existing process"
Estimated volumes: 150,000 Travel Agencies; 500 Groups; 50 Consortia; 500,000 Consultants; 3 million Bookings; 1 million Brochure Requests; 400,000 Questionnaires
Other labels and figures (pairing unclear): Brochure Reqs, 1 million, Website, Supplier, 35,000, Supplier Types, Employees, 250,000
Key: future system, existing system, feed back]

8

Tools or Handcraft?

ETL Advantages:
- Graphic User Interface
- Automatic documentation
- Off-the-shelf set of ready-to-use transformations
- Built-in scheduler
- Database agnostic

Handcrafting Advantages:
- No limitation
- Reuse existing code & non ETL
- No specific methodology
- No license cost
- No impact on infrastructure
- Transportable
- Release & Code-Management by script

9

Oracle ETL Facilities

- External Tables
- Merge
- SQL Loader
- PL/SQL
- Database links
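As a rough illustration of how two of these facilities can work together, the sketch below exposes a flat file as an external table and then upserts its rows into a target table with MERGE. Every name in it is a hypothetical stand-in rather than something from the presentation: the load_area directory object (which must already exist), the bookings.csv file, the bookings and bookings_ext tables, their columns, and the date format.

```sql
-- Target table (hypothetical).
CREATE TABLE bookings (
  booking_id   NUMBER PRIMARY KEY,
  agency_code  VARCHAR2(10),
  booking_date DATE
);

-- External table over a comma-separated flat file (hypothetical names).
CREATE TABLE bookings_ext (
  booking_id   NUMBER,
  agency_code  VARCHAR2(10),
  booking_date VARCHAR2(10)   -- kept as text here, converted in the MERGE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_area
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('bookings.csv')
)
REJECT LIMIT UNLIMITED;

-- Update rows that already exist, insert the ones that do not.
MERGE INTO bookings tgt
USING (SELECT booking_id,
              agency_code,
              TO_DATE(booking_date, 'YYYY-MM-DD') AS booking_date
       FROM   bookings_ext) src
ON (tgt.booking_id = src.booking_id)
WHEN MATCHED THEN
  UPDATE SET tgt.agency_code  = src.agency_code,
             tgt.booking_date = src.booking_date
WHEN NOT MATCHED THEN
  INSERT (booking_id, agency_code, booking_date)
  VALUES (src.booking_id, src.agency_code, src.booking_date);
```

Keeping the date as text in the external table is only one possible choice; it moves the conversion, and any conversion error, into the SQL that performs the load.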

10

Why Use PL/SQL

- Integrated environment (no installation required)
- Available resources
- Reuse code ‘snippets’
- Good performance
- Integration with and control of the database

11

PL/SQL Tips and Techniques

1. Quality
2. Techniques
3. Tricks

12

Quality

13

What is Quality?

[1] “Totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs.”

[The ISO 8402 definition of quality]

14

Quality 2

[2] Quality is a collection of “ilities”:
- Reliability: operate error free
- Modifiability: have enhancement changes made easily
- Understandability: understand the software readily
- Efficiency: the speed of the software
- Usability: use the software easily
- Testability: construct and execute test cases easily
- Portability: transport the software easily

15

Quality 3

[3] “All the things you do today in your software development, in order to bear fruit in the future.”

16

Standards & Conventions

Use meaningful names: V_Number_Of_Items_In_Array vs. i or no_itms

Distinguish between types:
- V_ Variable
- a_ Parameter
- C_ Constant
- G_ Global constant
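A small anonymous block can show how these prefixes read in practice. It is only a sketch: the constant, the counter and the Is_Full function are made up for illustration. The G_ prefix is not shown, on the assumption that global constants would be declared in a package specification.

```sql
DECLARE
  C_Max_Items                CONSTANT PLS_INTEGER := 3;  -- C_ : constant
  V_Number_Of_Items_In_Array PLS_INTEGER := 0;           -- V_ : variable

  -- a_ : parameter
  FUNCTION Is_Full(a_Item_Count IN PLS_INTEGER) RETURN BOOLEAN IS
  BEGIN
    RETURN a_Item_Count >= C_Max_Items;
  END Is_Full;
BEGIN
  -- Count up until the (hypothetical) array is full.
  WHILE NOT Is_Full(V_Number_Of_Items_In_Array) LOOP
    V_Number_Of_Items_In_Array := V_Number_Of_Items_In_Array + 1;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Items: ' || V_Number_Of_Items_In_Array);  -- Items: 3
END;
/
```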

17

Using Packages

Central package with utilities and all output:
- All error messages and numbers
- All common constants (date format etc.)
- Global variables
- Statistics data

Other packages encapsulate related logic.

Within a package, procedures & functions have:
- A meaningful name
- An A99_ prefix: A is the level (A highest), 99 is a unique ID

18

Example: procedure and variable naming

XXX_Write_Flat_File.U03_Write_Record_To_CSV(
    a_File_Handle,
    C_Field_Delim,
    C_Field_Separ,
    C_Record_Separ,
    RM_REFERENCE_rec.REFTYPE,
    RM_REFERENCE_rec.CODE,
    RM_REFERENCE_rec.DESCRIPTION,
    To_Char(RM_REFERENCE_rec.ISDEFAULT, '9')) ;

19

Techniques
- Error logging
- Autonomous Transaction
- Run statistics
- Release mechanism
- Overloading

20

Error Logging Technique

Global variables keep key information:
- Record ID
- Run ID
- Location in code

Local error trapping decides severity and error code.

All error trapping passed up.
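One way to read the three points above is sketched in the anonymous block below. It assumes that "passed up" means the local handler logs the error and then re-raises it to its caller. The globals, the Log_Error stub (which prints instead of writing to a table) and the Lookup_Type function are hypothetical stand-ins; only the error number, severity and message text echo the sample row shown in the presentation.

```sql
DECLARE
  -- Stand-ins for the package-level globals that hold the key information.
  G_Record_Id     VARCHAR2(20);
  G_Place_In_Code VARCHAR2(64);
  V_Type          VARCHAR2(30);

  -- Stub logger: a real routine would insert into the error log table.
  PROCEDURE Log_Error(a_Err_Num         IN INTEGER,
                      a_Severity        IN INTEGER,
                      a_Err_Description IN VARCHAR2) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE('ERR ' || a_Err_Num || ' severity ' || a_Severity ||
                         ' record ' || G_Record_Id || ' at ' || G_Place_In_Code ||
                         ': ' || a_Err_Description);
  END Log_Error;

  FUNCTION Lookup_Type(a_Plan_Code IN VARCHAR2) RETURN VARCHAR2 IS
  BEGIN
    G_Place_In_Code := 'Lookup_Type';   -- record the location in code
    IF a_Plan_Code = 'C1' THEN
      RETURN 'Standard';
    END IF;
    RAISE NO_DATA_FOUND;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN
      -- The local handler decides the error code (1001) and severity (10) ...
      Log_Error(1001, 10,
                'No match found for [Plan_Code] value [' || a_Plan_Code || ']');
      RAISE;                            -- ... and passes the error up
  END Lookup_Type;
BEGIN
  G_Record_Id := '223010913';
  V_Type := Lookup_Type('C3');
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    -- The caller decides what the failure means for the current record.
    DBMS_OUTPUT.PUT_LINE('Record ' || G_Record_Id || ' rejected');
END;
/
```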

21

Error Logging Structure

TABLE ERROR_LOG(
    ERR_TIME          DATE,
    ERR_NUM           INTEGER,
    SOURCE_URN        VARCHAR2(20),
    SOURCE_SYSTEM_ID  VARCHAR2(5),
    PLACE_IN_CODE     VARCHAR2(64),
    ERR_LOCATION      VARCHAR2(255),
    ERR_DESCRIPTION   VARCHAR2(512),
    SEVERITY          NUMBER(6) )

Example row:
    ERR_TIME          18-OCT-02 10:04:52
    ERR_NUM           1001
    SOURCE_URN        223010913
    SOURCE_SYSTEM_ID  CRS
    PLACE_IN_CODE     In FLIP_PKG B06 ; 6(utils A08)
    ERR_LOCATION      A08_Lookup_Type
    ERR_DESCRIPTION   No match found for [Plan_Code] value [C3]
    SEVERITY          10

22

Autonomous Transaction

-- ===================
PROCEDURE E00_write_error_log(
-- ===================
    a_err_num         IN integer ,
    a_Severity        IN Integer ,
    a_err_location    IN VarChar ,
    a_err_description IN VarChar )
IS
    PRAGMA AUTONOMOUS_TRANSACTION;
    V_Place_In_Code DW_Process.Error_Log.Place_In_Code%Type;
BEGIN
    V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')' ;
    INSERT INTO DW_Process.Error_Log
        (err_time, err_num, Severity,
         BOROUGH_ID, SOURCE_URN, SOURCE_SYSTEM_ID,
         Place_In_Code, err_location, err_description)
    VALUES
        (sysdate, a_err_num, a_Severity,
         G_BOROUGH_ID, G_SOURCE_URN, G_SOURCE_SYSTEM_ID,
         V_Place_In_Code, a_err_location, a_err_description) ;

    COMMIT ; -- commit the autonomous transaction, outside transaction is unaffected.
    G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1 ;
-- ===================
END E00_Write_Error_Log ;
-- ===================

23

Run Statistics

- G_Stats_Rec is a record with all the statistics fields
- Defined in the central package (therefore resident in memory)
- Updated by the writing procedures (all central)
- Written out at the end of the run
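The arrangement described above could look roughly like the package below. Only G_Stats_Rec and its TOTAL_NO_OF_ERRORS field come from the presentation. The package name, the other field, and the two procedures are assumptions made for the sketch, and the output goes to DBMS_OUTPUT rather than to a real file or table.

```sql
CREATE OR REPLACE PACKAGE Etl_Stats_Demo AS
  -- All run statistics live in one record type (field list is illustrative).
  TYPE Stats_Rec_Type IS RECORD (
    Total_No_Of_Records_Written PLS_INTEGER := 0,
    Total_No_Of_Errors          PLS_INTEGER := 0
  );

  -- Package-level variable: its state persists for the life of the session.
  G_Stats_Rec Stats_Rec_Type;

  PROCEDURE Write_Record(a_Line IN VARCHAR2);
  PROCEDURE Write_Run_Statistics;
END Etl_Stats_Demo;
/
CREATE OR REPLACE PACKAGE BODY Etl_Stats_Demo AS
  -- Every central writing procedure updates the statistics as a side effect.
  PROCEDURE Write_Record(a_Line IN VARCHAR2) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(a_Line);   -- stand-in for the real output routine
    G_Stats_Rec.Total_No_Of_Records_Written :=
      G_Stats_Rec.Total_No_Of_Records_Written + 1;
  END Write_Record;

  -- Called once at the end of the run.
  PROCEDURE Write_Run_Statistics IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE('Records written: ' ||
                         G_Stats_Rec.Total_No_Of_Records_Written);
    DBMS_OUTPUT.PUT_LINE('Errors: ' || G_Stats_Rec.Total_No_Of_Errors);
  END Write_Run_Statistics;
END Etl_Stats_Demo;
/
```

A run would call Etl_Stats_Demo.Write_Record for each output line and Etl_Stats_Demo.Write_Run_Statistics once at the end.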

24

Release Mechanism

- Table of ‘release notes’
- Each package has a C_Version constant, updated each release
- ‘Show_Version’ scripts display versions and notes
- Results shipped with each release
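A minimal sketch of such a mechanism follows. C_Version is the only name taken from the presentation. The release_notes columns, the Etl_Loader_Demo package and its Get_Version function are hypothetical. The function is there because a package constant cannot be referenced directly from a SQL statement, so the script reads it through a function call.

```sql
-- Hypothetical release notes table.
CREATE TABLE release_notes (
  package_name VARCHAR2(30),
  version      VARCHAR2(10),
  release_date DATE,
  notes        VARCHAR2(512)
);

CREATE OR REPLACE PACKAGE Etl_Loader_Demo AS
  C_Version CONSTANT VARCHAR2(10) := '1.3';   -- updated with each release
  FUNCTION Get_Version RETURN VARCHAR2;
END Etl_Loader_Demo;
/
CREATE OR REPLACE PACKAGE BODY Etl_Loader_Demo AS
  FUNCTION Get_Version RETURN VARCHAR2 IS
  BEGIN
    RETURN C_Version;
  END Get_Version;
END Etl_Loader_Demo;
/

-- A 'show version' style script: installed version plus its notes.
SELECT Etl_Loader_Demo.Get_Version AS installed_version,
       rn.release_date,
       rn.notes
FROM   release_notes rn
WHERE  rn.package_name = 'ETL_LOADER_DEMO'
AND    rn.version      = Etl_Loader_Demo.Get_Version;
```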

25

Remove Spaces

-- ===================
FUNCTION A04_Remove_Spaces(
-- ===================
    a_Instring IN Varchar )
Return Varchar
IS
/*
** Removes all the spaces from a string, leaving the rest of the printable characters
*/
BEGIN
    G_place_in_UTILS_code := 'A04' ; -- For use by the error trapping routine

    RETURN TRANSLATE( a_Instring,
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' || '\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*() ',
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' || '\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*()' ) ;
-- ===================
END A04_Remove_Spaces ;
-- ===================
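TRANSLATE leaves any character that is not in its second argument untouched, so the long character lists above only ensure that the space has no counterpart and is dropped. Assuming that removing spaces is the only goal, the block below sketches two shorter forms that should give the same result: REPLACE with no replacement string, and TRANSLATE with a single placeholder character. The sample input is made up.

```sql
DECLARE
  V_Input         VARCHAR2(50) := ' 12 Main St, Apt #4 ';
  V_Via_Replace   VARCHAR2(50);
  V_Via_Translate VARCHAR2(50);
BEGIN
  -- REPLACE with the third argument omitted deletes every occurrence.
  V_Via_Replace := REPLACE(V_Input, ' ');

  -- 'x' maps to itself; the space has no counterpart, so it is removed.
  V_Via_Translate := TRANSLATE(V_Input, 'x ', 'x');

  DBMS_OUTPUT.PUT_LINE('[' || V_Via_Replace   || ']');  -- [12MainSt,Apt#4]
  DBMS_OUTPUT.PUT_LINE('[' || V_Via_Translate || ']');  -- [12MainSt,Apt#4]
END;
/
```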

26

Strip Leading non-numerics

-- ============================
FUNCTION F09_Strip_Leading_non_digits(
-- ============================
    a_String IN VARCHAR2 )
RETURN VARCHAR2
IS
/*
** Remove leading non-digits from the input.
** Example: Input string:  'abcde12345edcba'
**          Output string: '12345edcba'
*/
    v_string          Varchar2(4000) ;
    v_first_digit_pos Integer ;
BEGIN
    -- Replace all digits by 0
    v_string := Translate(a_String, '1234567890' , '0000000000') ;
    v_first_digit_pos := instr(v_string,'0') ;
    RETURN F01_Right(a_String, v_first_digit_pos ) ;
-- ============================
END F09_Strip_Leading_non_digits;
-- ============================
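F01_Right is not shown in the presentation. Judging by the documented example, it appears to return the input from the given position onwards, so the self-contained sketch below substitutes SUBSTR for it. The function name is hypothetical, and returning NULL when the input contains no digit at all is a further assumption.

```sql
DECLARE
  FUNCTION Strip_Leading_Non_Digits(a_String IN VARCHAR2) RETURN VARCHAR2 IS
    V_First_Digit_Pos PLS_INTEGER;
  BEGIN
    -- Turn every digit into '0', then find the first '0'.
    V_First_Digit_Pos :=
      INSTR(TRANSLATE(a_String, '1234567890', '0000000000'), '0');

    IF V_First_Digit_Pos = 0 THEN
      RETURN NULL;   -- assumption: no digits means nothing is kept
    END IF;

    -- Assumed equivalent of F01_Right: everything from that position on.
    RETURN SUBSTR(a_String, V_First_Digit_Pos);
  END Strip_Leading_Non_Digits;
BEGIN
  DBMS_OUTPUT.PUT_LINE(Strip_Leading_Non_Digits('abcde12345edcba'));  -- 12345edcba
END;
/
```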

27

Overloading

-- =======================
PROCEDURE U03_Write_Record_To_CSV(
-- =======================
    a_File_Handle  IN utl_file.file_type ,
    a_Field_Delim  IN VarChar , -- the quotes, for CSV
    a_Field_Separ  IN VarChar , -- the comma , for CSV
    a_Record_Separ IN VarChar , -- the Carriage Return + Line feed , for CSV
    a_String1      IN VarChar := G_default_Value ,
    a_String2      IN VarChar := G_default_Value ,
    a_String3      IN VarChar := G_default_Value ,
    ...)
IS
BEGIN
    IF a_String1 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim) ;

    IF a_String2 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ) ;

    IF a_String3 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ) ;
    ...

    <<End_Of_Record>>
    U01_Write_Line(a_File_Handle, a_Record_Separ) ;
-- =======================
END U03_Write_Record_To_CSV ;
-- =======================

28

Summary

ETL or PL/SQL? Your choice. Consider:
- Overall cost
- ‘Politics’
- Convenience
- Portability
- Speed of development
- Reusability

If PL/SQL: ensure quality

29

Thank you !


I can be contacted at ben_bor@moh.govt.nz