Evaluating ETL and Data Integration Platforms

Wayne Eckerson and Colin White

REPORT SERIES


Research Sponsors

Business Objects

DataMirror Corporation

Hummingbird Ltd

Informatica Corporation


Acknowledgements

TDWI would like to thank many people who contributed to this report. First, we appreciate the many users who responded to our survey, especially those who responded to our requests for phone interviews. Second, we’d like to thank Steve Tracy and Pieter Mimno, who reviewed draft manuscripts and provided feedback, as well as our report sponsors, who reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI’s production team: Denelle Hanlon, Theresa Johnston, Marie McFarland, and Donna Padian.

About TDWI

The Data Warehousing Institute (TDWI), a division of 101communications LLC, is the premier provider of in-depth, high-quality education and training in the business intelligence and data warehousing industry. TDWI supports a worldwide membership program, quarterly educational conferences, regional seminars, onsite courses, leadership awards programs, numerous print and online publications, and a public and private (Members-only) Web site.

This special report is the property of The Data Warehousing Institute (TDWI) and is made available to a restricted number of clients only upon these terms and conditions. TDWI reserves all rights herein. Reproduction or disclosure in whole or in part to parties other than the TDWI client, who is the original subscriber to this report, is permitted only with the written permission and express consent of TDWI. This report shall be treated at all times as a confidential and proprietary document for internal use only. The information contained in the report is believed to be reliable but cannot be guaranteed to be correct or complete.

For more information about this report or its sponsors, and to view the archived report Webinar, please visit: www.dw-institute.com/etlreport/.

© 2003 by 101communications LLC. All rights reserved. Printed in the United States. The Data Warehousing Institute is a trademark of 101communications LLC. Other product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. The Data Warehousing Institute is a division of 101communications LLC based in Chatsworth, CA.


Table of Contents

Scope, Methodology, and Demographics . . . 3
Executive Summary: The Role of ETL in BI . . . 4
ETL in Flux . . . 4
The Evolution of ETL . . . 5
Framework and Components . . . 5
A Quick History . . . 6
Code Generation Tools . . . 6
Engine-Based Tools . . . 6
Code Generators versus Engines . . . 7
Database-Centric ETL . . . 8
Data Integration Platforms . . . 9
ETL versus EAI . . . 9
ETL Trends and Requirements . . . 10
Large Volumes of Data . . . 10
Diverse Data Sources . . . 10
Shrinking Batch Windows . . . 11
Operational Decision Making . . . 11
Data Quality Add-Ons . . . 12
Meta Data Management . . . 13
Packaged Solutions . . . 14
Enterprise Infrastructure . . . 14
SUMMARY . . . 15
Build or Buy? . . . 15
Why Buy? . . . 16
Maintaining Custom Code . . . 16
Why Build? . . . 17
Build and Buy . . . 17
User Satisfaction with ETL Tools . . . 18
Challenges in Deploying ETL . . . 19
Pricing . . . 21
Recommendations . . . 21
Buy If You … . . . 21
Build If You … . . . 22
Data Integration Platforms . . . 22
Platform Characteristics . . . 23
Data Integration Characteristics . . . 23
High Performance and Scalability . . . 23
Built-In Data Cleansing and Profiling . . . 23
Complex, Reusable Transformations . . . 24
Reliable Operations and Robust Administration . . . 25
Diverse Source and Target Systems . . . 25
Update and Capture Utilities . . . 26
Global Meta Data Management . . . 27
SUMMARY . . . 28
ETL Evaluation Criteria . . . 28
Available Resources . . . 29
Vendor Attributes . . . 29
Overall Product Considerations . . . 30
Design Features . . . 30
Meta Data Management Features . . . 31
Transformation Features . . . 31
Data Quality Features . . . 32
Performance Features . . . 32
Extract and Capture Features . . . 33
Load and Update Features . . . 33
Operate and Administer Component . . . 34
Integrated Product Suites . . . 34
Company Services and Pricing . . . 35
Conclusion . . . 36


About the Authors

WAYNE ECKERSON is director of research for The Data Warehousing Institute (TDWI), the leading provider of high-quality, in-depth education and research services to data warehousing and business intelligence professionals worldwide. Eckerson oversees TDWI’s Member publications and research services.

Eckerson has written and spoken on data warehousing and business intelligence since 1994. He has published in-depth reports and articles about data quality, data warehousing, customer relationship management, online analytical processing (OLAP), Web-based analytical tools, analytic applications, and portals, among other topics. In addition, Eckerson has delivered presentations at industry conferences, user group meetings, and vendor seminars. He has also consulted with many vendor and user firms.

Prior to joining TDWI, Eckerson was a senior consultant at the Patricia Seybold Group, and director of the Group’s Business Intelligence & Data Warehouse Service, which he launched in 1996.

COLIN WHITE is president of Intelligent Business Strategies. White is well known for his in-depth knowledge of leading-edge business intelligence, enterprise portal, database, and Web technologies, and how they can be integrated into an IT infrastructure for building and supporting the intelligent business.

With more than 32 years of IT experience, White has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. He is a faculty member at TDWI and currently serves on the advisory board of TDWI’s Business Intelligence Strategies program. He also chairs a portals and Web Services conference.

White has co-authored several books, and has written numerous articles on business intelligence, portals, database, and Web technology for leading IT trade journals. He writes a regular column for DM Review magazine entitled “Intelligent Business Strategies.” Prior to becoming an analyst and consultant, White worked at IBM and Amdahl.

TDWI Definitions

ETL: Extract, transform, and load (ETL) tools play a critical part in creating data warehouses, which form the bedrock of business intelligence (see below). ETL tools sit at the intersection of myriad source and target systems and act as a funnel to pull together and blend heterogeneous data into a consistent format and meaning and populate data warehouses.

Business Intelligence: TDWI uses the term "business intelligence" or BI as an umbrella term that encompasses ETL, data warehouses, reporting and analysis tools, and analytic applications. BI projects turn data into information, knowledge, and plans that drive profitable business decisions.

About the TDWI Report Series

The TDWI Report Series is designed to educate technical and business professionals about critical issues in BI. TDWI’s in-depth reports offer objective, vendor-neutral research consisting of interviews with industry experts and a survey of BI professionals worldwide. TDWI in-depth reports are sponsored by vendors who collectively wish to evangelize a discipline within BI or an emerging approach or technology.


Scope, Methodology, and Demographics

Scope and Methodology

Report Scope. This report examines the current and future state of ETL tools. It describes how business requirements are fueling the creation of a new generation of products called data integration platforms. It examines the pros and cons of building versus buying ETL functionality and assesses the challenges and success factors involved with implementing ETL tools. It finishes by providing a list of evaluation criteria to assist organizations in selecting ETL products or validating current ones.

Methodology. The research for this report was conducted by interviewing industry experts, including consultants, industry analysts, and IT professionals, who have implemented ETL tools. The research is also based on a survey of 1,000+ business intelligence professionals that TDWI conducted in November 2002.

Survey Methodology. TDWI received 1,051 responses to the survey. Of these, TDWI qualified 741 respondents who had both deployed ETL functionality and were either IT professionals or consultants at end-user organizations. Branching logic accounts for the variation in the number of respondents to each question. For example, respondents who “built” ETL programs were asked a few different questions from those who “bought” vendor ETL tools. Multi-choice questions and rounding techniques account for totals that don’t equal 100 percent.

Survey Demographics. Most respondents were corporate IT professionals who work at large U.S. companies in a range of industries. (See the illustrations on this page for breakouts.)

Demographics

Position: corporate IT professional (63%); systems integrator or external consultant (26%); business sponsor or business user (5%); other (6%).

Company Revenues: less than $10 million (15%); $10 million to $100 million (18%); $100 million to $1 billion (25%); $1 billion to $10 billion (28%); $10 billion + (15%).

Country: USA (62%); Canada (6%); Scandinavian countries (5%); India (4%); United Kingdom (3%); Mexico (2%); other (18%).

Organization Structure: corporate (51%); business unit/division (21%); department/functional group (21%); does not apply (6%).

Industry: bar chart of respondents by industry (0 to 15 percent scale) covering software/Internet, financial services, manufacturing (non-computer), retail/wholesale/distribution, consulting/professional services, state/local government, insurance, healthcare, telecommunications, education, federal government, and other industries.


Executive Summary: The Role of ETL in Business Intelligence

ETL is the heart and soul of business intelligence (BI). ETL processes bring together and combine data from multiple source systems into a data warehouse, enabling all users to work off a single, integrated set of data: a single version of the truth. The result is an organization that no longer spins its wheels collecting data or arguing about whose data is correct, but one that uses information as a key process enabler and competitive weapon.

In these organizations, BI systems are no longer nice to have, but essential to success. These systems are no longer stand-alone and separate from operational processing; they are integrated with overall business processes. As a result, an effective BI environment based on integrated data enables users to make strategic, tactical, and operational decisions that drive the business on a daily basis.

Why ETL is Hard. According to most practitioners, ETL design and development work consumes 60 to 80 percent of the resources in an entire BI project. With such an inordinate amount of resources tied up in ETL work, it behooves BI teams to optimize this layer of their BI environment.

ETL is so time consuming because it involves the unenviable task of re-integrating the enterprise’s data from scratch. Over the span of many years, organizations have allowed their business processes to dis-integrate into dozens or hundreds of local processes, each managed by a single fiefdom (e.g., departments, business units, divisions) with its own systems, data, and view of the world.

With the goal of achieving a single version of truth, business executives are appointing BI teams to re-integrate what has taken years or decades to undo. Equipped with ETL and modeling tools, BI teams are now expected to swoop in like conquering heroes and rescue the organization from information chaos. Obviously, the challenges and risks are daunting.

Mitigating Risk. ETL tools are perhaps the most critical instruments in a BI team’s toolbox. Whether built or bought, a good ETL tool in the hands of an experienced ETL developer can speed deployment, minimize the impact of system changes and new user requirements, and mitigate project risk. A weak ETL tool in the hands of an untrained developer can wreak havoc on BI project schedules and budgets.

ETL in Flux

Given the demands placed on ETL and the more prominent role that BI is playing in corporations, it is no wonder that this technology is now in a state of flux.

More Complete Solutions. Organizations are now pushing ETL vendors to deliver more complete BI “solutions.” Primarily, this means handling additional back-end data management and processing responsibilities, such as providing data profiling, data cleansing, and enterprise meta data management utilities. A growing number of users also want BI vendors to deliver a complete solution that spans both back-end data management functions and front-end reporting and analysis applications.

Better Throughput and Scalability. They also want ETL tools to increase throughput and performance to handle exploding volumes of data and shrinking batch windows. Rather than refresh the entire data warehouse from scratch, they want ETL tools to capture and update changes that have occurred in source systems since the last load.
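The idea of loading only what changed since the last run can be sketched with a timestamp “high-water mark.” This is a minimal illustration, not any vendor’s method: it assumes a hypothetical source table `orders` with an `updated_at` column stored as sortable ISO-8601 text, and assumes the ETL job saves the mark between runs. Python’s built-in `sqlite3` module stands in for the source database.

```python
import sqlite3

def extract_changes(conn, last_mark):
    """Return rows modified after last_mark, plus the new high-water mark.

    Assumes a hypothetical table orders(id, amount, updated_at) where
    updated_at is ISO-8601 text, so string order matches time order.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_mark,),
    ).fetchall()
    # Advance the mark only if something new arrived.
    new_mark = rows[-1][2] if rows else last_mark
    return rows, new_mark

# Usage: only the row changed after the previous run's mark is extracted.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2002-11-01T08:00"), (2, 25.0, "2002-11-02T09:30")],
)
changes, mark = extract_changes(src, "2002-11-01T12:00")
print(changes, mark)  # [(2, 25.0, '2002-11-02T09:30')] 2002-11-02T09:30
```

Timestamp-based extraction can miss rows whose transactions commit after the extraction but carry an earlier or equal timestamp, which is one reason many change data capture products read database logs instead.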

ETL: The Heart and Soul of BI

ETL Work Consumes 60 to 80 Percent of All BI Projects

ETL Tools Are the Most Important in a BI Toolbox

Users Seek Integrated Toolsets

ETL Tools Must Process More Data in Less Time


More Sources, Greater Complexity, Better Administration. ETL tools also need to handle a wider variety of source system data, including Web, XML, and packaged applications. To integrate these diverse data sets, ETL tools must also handle more complex mappings and transformations, and offer enhanced administration to improve reliability and speed deployments.

“Near-Real-Time” Data. Finally, ETL tools need to feed data warehouses more quickly with more up-to-date information. This is because batch processing windows are shrinking and business users want integrated data delivered on a timelier basis (i.e., the previous day, hour, or minute) so they can make critical operational decisions without delay.

Clearly, the market for ETL tools is changing and expanding. In response to user requirements, ETL vendors are transforming their products from single-purpose ETL products into multi-purpose data integration platforms. BI professionals need help understanding these market and technology changes as well as how to leverage the convergence of ETL with new technologies to optimize their BI architectures and ensure a healthy return on their ETL investments.

The Evolution of ETL

Framework and Components

Before we examine the future of ETL, we will define the major components in an ETL framework.

ETL stands for extract, transform, and load. That is, ETL programs periodically extract data from source systems, transform the data into a common format, and then load the data into the target data store, usually a data warehouse.
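The three steps can be sketched in a few lines. This is a hypothetical, minimal pipeline, assuming a small CSV export as the source and an in-memory SQLite table named `dim_customer` as the target warehouse; commercial ETL tools wrap transport, meta data, and administration services around these same three steps.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """cust_id,name,country
1, alice ,us
2,Bob,CA
"""

def extract(text):
    """Extract: read raw rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert rows into a common format (types, case, trimming)."""
    return [
        (int(r["cust_id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]

def load(conn, rows):
    """Load: insert the conformed rows into the target warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(cust_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
print(warehouse.execute("SELECT * FROM dim_customer ORDER BY cust_id").fetchall())
# [(1, 'Alice', 'US'), (2, 'Bob', 'CA')]
```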

As an acronym, however, ETL only tells part of the story. ETL tools also commonly move or transport data between sources and targets, document how data elements change as they move between source and target (i.e., meta data), exchange this meta data with other applications as needed, and administer all run-time processes and operations (e.g., scheduling, error management, audit logs, and statistics). A more accurate acronym might be EMTDLEA!

Users Want Timelier Data

Do We Need a New Acronym?

ETL Processing Framework

Illustration S1. The diagram shows the core components of an ETL product. Courtesy of Intelligent Business Strategies. Components shown: source adapters, extract, transform, load, and target adapters, supported by transport services, runtime meta data services, administration and operations services, a design manager, a meta data repository, and meta data import/export. Legacy applications and databases and files appear as sources and targets.

The diagram on page 5 depicts the major components involved in ETL processing. The following bullets describe each component in more detail:


• Design manager: Provides a graphical mapping environment that lets developers define source-to-target mappings, transformations, process flows, and jobs. The designs are stored in a meta data repository.

• Meta data management: Provides a repository to define, document, and manage information (i.e., meta data) about the ETL design and runtime processes. The repository makes meta data available to the ETL engine at run time and to other applications.

• Extract: Extracts source data using adapters, such as ODBC, native SQL formats, or flat file extractors. These adapters consult meta data to determine which data to extract and how.

• Transform: ETL tools provide a library of transformation objects that let developers transform source data into target data structures and create summary tables to improve performance.

• Load: ETL tools use target data adapters, such as SQL or native bulk loaders, to insert or modify data in target databases or files.

• Transport services: ETL tools use network and file protocols (e.g., FTP) to move data between source and target systems and in-memory protocols (e.g., data caches) to move data between ETL run-time components.

• Administration and operation: ETL utilities let administrators schedule, run, and monitor ETL jobs as well as log all events, manage errors, recover from failures, and reconcile outputs with source systems.
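Several of the administration duties in the last bullet can be illustrated with a small job wrapper. This sketch assumes a hypothetical `sales` target table in SQLite and a caller-supplied transform function; it sets bad rows aside instead of aborting, logs audit statistics, and reconciles rows read against rows actually loaded plus rows rejected. Scheduling and failure recovery are left out.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("etl")

def run_job(name, source_rows, target, transform):
    """Load rows into a hypothetical `sales` table with error routing,
    audit logging, and a row-count reconciliation."""
    target.execute("CREATE TABLE IF NOT EXISTS sales (qty INTEGER)")
    before = target.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    rejected = []
    for row in source_rows:
        try:
            target.execute("INSERT INTO sales VALUES (?)", (transform(row),))
        except (KeyError, ValueError) as exc:
            # Error management: set bad rows aside instead of aborting the job.
            rejected.append((row, repr(exc)))
    target.commit()
    # Reconciliation: count what actually landed in the target table.
    loaded = target.execute("SELECT COUNT(*) FROM sales").fetchone()[0] - before
    if loaded + len(rejected) != len(source_rows):
        raise RuntimeError(f"{name}: row counts do not reconcile")
    # Audit statistics for the administrator.
    log.info("%s read=%d loaded=%d rejected=%d",
             name, len(source_rows), loaded, len(rejected))
    return loaded, rejected

db = sqlite3.connect(":memory:")
loaded, rejected = run_job(
    "load_sales", [{"qty": "3"}, {"qty": "x"}, {}], db, lambda r: int(r["qty"])
)
# loaded == 1; rejected holds the "x" row (ValueError) and the {} row (KeyError)
```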

The components above come with most vendor-supplied ETL tools, but they can also be built by data warehouse developers. A majority of companies have a mix of packaged and homegrown ETL applications. In some cases, they use different tools in different projects, or they use custom code to augment the functionality of an ETL product. The pros and cons of building and buying ETL components are discussed later in the report. (See “Build or Buy?” on page 15.)

A Quick History

Code Generation Tools

In the early 1990s, most organizations developed custom code to extract and transform data from operational systems and load it into data warehouses. In the mid-1990s, vendors recognized an opportunity and began shipping ETL tools designed to reduce or eliminate the labor-intensive process of writing custom ETL programs.

Early vendor ETL tools provided a graphical design environment that generated third-generation language (3GL) programs, such as COBOL. Although these early code-generation tools simplified ETL development work, they did little to automate the runtime environment or lessen code maintenance and change control work. Often, administrators had to manually distribute and manage compiled code, schedule and run jobs, or copy and transport files.
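The code-generation model can be sketched as a function that turns a design (here a hypothetical list of source-to-target mappings) into program text that is then compiled and run on its own. The sketch emits Python rather than COBOL purely for illustration; the mapping format and the `transform_row` name are assumptions.

```python
# Hypothetical design-time mapping: (target column, Python expression over `src`).
MAPPING = [
    ("customer_id", "int(src['id'])"),
    ("full_name", "src['first'].strip() + ' ' + src['last'].strip()"),
]

def generate_program(mapping, func_name="transform_row"):
    """Emit the source text of a standalone transform function."""
    lines = [f"def {func_name}(src):", "    return {"]
    for target, expr in mapping:
        lines.append(f"        {target!r}: {expr},")
    lines.append("    }")
    return "\n".join(lines)

code = generate_program(MAPPING)
print(code)  # the generated program text, which could be versioned and deployed

# "Compile" step: the generated text becomes ordinary executable code.
namespace = {}
exec(compile(code, "<generated>", "exec"), namespace)
print(namespace["transform_row"]({"id": "7", "first": "Ada ", "last": "Lovelace"}))
# {'customer_id': 7, 'full_name': 'Ada Lovelace'}
```

Once generated, the program runs independently of the design tool, so every mapping change means regenerating, recompiling, and redeploying code, which is the maintenance and change-control work the early tools left to administrators.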

Engine-Based Tools

To automate more of the ETL process, vendors began delivering “engine-based” products in the mid to late 1990s that employed proprietary scripting languages running within an ETL or DBMS server. These ETL engines use language interpreters to process ETL workflows at runtime. The ETL workflows defined by developers in the graphical environment are stored in a meta data repository, which the engine reads at runtime to determine how to process incoming data.
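By contrast with code generation, an engine keeps the design as data and interprets it at runtime. In this minimal sketch, `WORKFLOW` is a hypothetical list of steps as it might be stored in a meta data repository, and `TRANSFORMS` is an assumed library of transformation objects; neither reflects any particular product’s format.

```python
# Hypothetical library of reusable transformation objects.
TRANSFORMS = {
    "rename": lambda row, p: {p.get(k, k): v for k, v in row.items()},
    "upper": lambda row, p: {**row, p["column"]: row[p["column"]].upper()},
    "filter_eq": lambda row, p: row if row.get(p["column"]) == p["value"] else None,
}

# Workflow as it might be read from a meta data repository (assumed format).
WORKFLOW = [
    {"op": "rename", "params": {"cntry": "country"}},
    {"op": "upper", "params": {"column": "country"}},
    {"op": "filter_eq", "params": {"column": "country", "value": "US"}},
]

def run_engine(workflow, rows):
    """Interpret each stored workflow step against each incoming row."""
    out = []
    for row in rows:
        for step in workflow:
            row = TRANSFORMS[step["op"]](row, step["params"])
            if row is None:  # a filter step dropped the row
                break
        else:
            out.append(row)
    return out

print(run_engine(WORKFLOW, [{"id": 1, "cntry": "us"}, {"id": 2, "cntry": "fr"}]))
# [{'id': 1, 'country': 'US'}]
```

Changing the engine’s behavior means editing the stored workflow rather than regenerating and redeploying code.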

Although this interpretive approach more tightly unifies design and execution environments, it doesn’t necessarily eliminate all custom coding and maintenance. To handle complex or unique requirements, developers often resort to coding custom routines and exits that the tool accesses at runtime. These user-developed routines and exits increase complexity and maintenance, and therefore should be used sparingly.

ETL Engines Interpret Design Rules at Runtime

All Processing Occurs in the Engine. Another significant characteristic of an engine-based approach is that all processing takes place in the engine, not on source systems (although hybrid architectures exist). The engine typically runs on a Windows or UNIX machine and establishes direct connections to source systems. If the source system is non-relational, administrators use a third-party gateway to establish a direct connection or create a flat file to feed the ETL engine.

Some engines support parallel processing, which enables them to process ETL workflows in parallel across multiple hardware processors. Others require administrators to manually define a parallel processing framework in advance (using partitions, for example), which makes for a much less dynamic environment.
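Manually defined partitions can be sketched as follows: rows are assigned to partitions by a stable hash of a key, each partition is processed independently, and the results are merged. This is an illustrative sketch with hypothetical column names; it uses a thread pool from `concurrent.futures` to stay self-contained, although CPU-bound Python work would normally use processes or a parallel engine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

def partition(rows, key, n):
    """Assign each row to one of n partitions by a stable hash of its key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted str hash().
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

def aggregate(part):
    """Per-partition work: total sales by region."""
    totals = defaultdict(float)
    for row in part:
        totals[row["region"]] += row["amount"]
    return dict(totals)

rows = [{"region": r, "amount": a}
        for r, a in [("east", 10), ("west", 5), ("east", 2.5), ("north", 1)]]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(aggregate, partition(rows, "region", 4)))

# Every region lands in exactly one partition, so results merge without re-aggregation.
merged = {k: v for part in results for k, v in part.items()}
print(merged)
```

Partitioning on the grouping key guarantees that no key spans two partitions; choosing that key and the partition count in advance is the static decision the text contrasts with engines that parallelize dynamically.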

Code Generators versus Engines

There is still a debate about whether ETL engines or code generators, which have improved since their debut in the mid-1990s, offer the best functionality and performance.

Benefits of Code Generators. “The benefit of code generators is that they can handle more complex processing than their engine-based counterparts,” says Steve Tracy, assistant director of information delivery and application strategy at Hartford Life Insurance in Simsbury, CT. “Many engine-based tools typically want to read a record, then write a record, which limits their ability to efficiently handle certain types of transformations.”

Because they can handle more complex processing, code generators also eliminate the need for developers to maintain user-written routines and exits to handle complex transformations in ETL workflows. This avoids creating “blind spots” in the tools and their meta data.

In addition, code generators produce compiled code to run on various platforms. Compiled code is not only fast, it also enables organizations to distribute processing across multiple platforms to optimize performance.

“Because the [code generation] product ran on multiple platforms, we did not have to buy a huge server,” says Kevin Light, managing consultant of business intelligence solutions at EDS Canada. “Consequently, our client’s overall capital outlay [for a code generation product] was less than half what it would cost them to deploy a comparable engine-based product.”

Benefits of Engines. Engines, on the other hand, concentrate all processing in a single server. Although this can become a bottleneck, administrators can optimally configure the engine platform to deliver high performance. Also, since all processing occurs on one machine, administrators can more easily monitor performance and quickly upgrade capacity if needed to meet service-level agreements.

This approach also removes political obstacles that arise when ETL administrators try to distribute transformation code to run on various source systems. By offloading transformation processing from source systems, ETL engines interfere less with critical runtime operations.

Enhanced Graphical Development. In addition, most engine-based ETL products offer visual development environments that make them easier to use. These graphical workspaces enable developers to create ETL workflows composed of multiple ETL objects for defining data mappings and transformations. (See Illustration S2.) Although code generators also offer visual development environments, they often aren’t as easy to use as engine-based tools, lengthening overall development times, according to many practitioners.

ETL Engines Centralize Processing in a High-Performance Server

Code Generators Eliminate the Need to Maintain Custom Routines and Exits

Visual Development Environments Help Manage Complex Jobs


Database-Centric ETL

Today, several DBMS vendors embed ETL capabilities in their DBMS products (as well as OLAP and data mining capabilities). Since these vendors offer ETL capabilities at little or no extra charge, organizations are seriously exploring this option because it promises to reduce costs and simplify their BI environments.

In short, users are now asking, “Why should we purchase a third-party ETL tool when we can get ETL capabilities for free from our database vendor of choice? What are the additional benefits of buying a third-party ETL tool?”

To address these questions, organizations first need to understand the level and type of ETL processing that database vendors support. Today, there are three basic groups:

Cooperative ETL. Here, third-party ETL tools can leverage common database functionality to perform certain types of ETL processing. For example, third-party ETL tools can leverage stored procedures and enhanced SQL to perform transformations and aggregations in the database where appropriate. This enables third-party ETL tools to optimize performance by exploiting the optimization, parallel processing, and scalability features of the DBMS. It also improves recoverability since stored procedures are maintained in a common recoverable data store.
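The difference between processing rows inside the ETL tool and pushing an aggregation down to the database can be sketched with Python’s built-in `sqlite3` module. The tables are hypothetical, and SQLite merely stands in for a server DBMS; it does not itself provide the parallel processing mentioned above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10), ('west', 5), ('east', 2.5);
    CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL);
""")

# Tool-side processing: every detail row is pulled out of the database
# and aggregated inside the ETL process.
totals = {}
for region, amount in db.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0) + amount

# Pushdown: only the SQL statement travels; the database aggregates the
# detail rows and writes the summary table itself.
db.execute("""
    INSERT INTO sales_by_region (region, total)
    SELECT region, SUM(amount) FROM sales GROUP BY region
""")
db.commit()
print(db.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('east', 12.5), ('west', 5.0)]
```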

Complementary ETL. Some database vendors now offer ETL functions that mirror features offered by independent ETL vendors. For example, materialized views create summary tables that can be automatically updated when detailed data in one or more associated tables change. Summary tables can speed up query performance by several orders of magnitude. In addition, some DBMS products can issue SQL statements that interact with Web Services-based applications or messaging queues, which are useful when building and maintaining near-real-time data warehouses.
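SQLite has no materialized views, but the idea of a summary table that updates automatically when detail rows change can be approximated with a trigger. This is a hypothetical sketch that handles inserts only; updates and deletes would need their own triggers, whereas a DBMS with native materialized views manages the refresh itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    CREATE TABLE sales_summary (region TEXT PRIMARY KEY, total REAL NOT NULL);

    -- Keep the summary current whenever a detail row is inserted.
    CREATE TRIGGER sales_ins AFTER INSERT ON sales BEGIN
        INSERT OR IGNORE INTO sales_summary VALUES (NEW.region, 0);
        UPDATE sales_summary SET total = total + NEW.amount
         WHERE region = NEW.region;
    END;
""")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("east", 10), ("west", 5), ("east", 2.5)])
print(db.execute("SELECT * FROM sales_summary ORDER BY region").fetchall())
# [('east', 12.5), ('west', 5.0)]
```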

Competitive ETL. Most database vendors offer graphical development tools that exploit the ETL capabilities of their database products. These tools provide many of the features offered by third-party ETL tools, but at zero cost, or for a license fee that is a fraction of the price of independent tools. At present, however, database-centric ETL solutions vary considerably in quality and functionality. Third-party ETL tools still offer important advantages, such as the range of data sources supported, transformation power, and administration. Today, database-centric ETL products are useful for building departmental data marts, but this will change over time as DBMS vendors enhance their ETL capabilities.

ETL Visual Mapping Interface

Illustration S2. Many ETL tools provide a graphical interface for mapping sources to targets and managing complex workflows.

“Why Purchase an ETL Tool When Our Database Bundles One?”

In summary, database vendors currently offer ETL capabilities that both enhance and compete with independent ETL tools. We expect database vendors to continue to enhance their ETL capabilities and compete aggressively to increase their share of the ETL market.

Data Integration Platforms 

During the past several years, business requirements for BI projects have expanded dramatically, placing new demands on ETL tools. These requirements are outlined in the following section (see “ETL Trends and Requirements”). Consequently, ETL vendors have begun to dramatically change their products to meet these requirements. The result is a new generation of ETL tool that TDWI calls a data integration platform.

As a platform, this emerging generation of ETL tools provides greater performance, throughput, and scalability to process larger volumes of data at higher speeds. To deal with shrinking batch windows, these platforms also load data warehouses more quickly and reliably, using change data capture techniques, continuous processing, and improved runtime operations.

The platforms also provide expanded functionality, especially in the areas of data quality, transformation power, and administration.
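Change data capture can be implemented in several ways (database log readers, triggers, timestamp columns, or snapshot comparison). The sketch below illustrates only the simplest of these, snapshot comparison, assuming each source row carries a primary key and that the previous night's extract is still available. The function name `diff_snapshots` and the dict-of-rows layout are hypothetical, not any vendor's API; the point is that only the delta, rather than the whole table, has to be loaded.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

def diff_snapshots(
    previous: Dict[object, Row], current: Dict[object, Row]
) -> Tuple[List[Row], List[Row], List[object]]:
    """Compare two snapshots keyed by primary key.

    Returns (inserted rows, updated rows, keys of deleted rows).
    """
    inserts = [row for key, row in current.items() if key not in previous]
    updates = [
        row for key, row in current.items()
        if key in previous and previous[key] != row
    ]
    deletes = [key for key in previous if key not in current]
    return inserts, updates, deletes

# Hypothetical extracts of an orders table taken on consecutive nights.
last_night = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 7}, 4: {"id": 4, "qty": 2}}
tonight = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 9}, 3: {"id": 3, "qty": 1}}
inserts, updates, deletes = diff_snapshots(last_night, tonight)
# Only these changes need to reach the warehouse, not every row.
print(inserts, updates, deletes)
```

Snapshot comparison still has to read the full source on every run; log-based capture avoids that cost, which is one reason it matters for shrinking batch windows.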

As a data integration hub, these products connect to a broader array of databases, systems, and applications as well as other integration hubs. They capture and process data in batch or real time using either a hub-and-spoke or peer-to-peer information delivery architecture. They coordinate and exchange meta data among heterogeneous systems to deliver a highly integrated environment that is easy to use and adapts well to change.

Solution Sets. Ultimately, data integration platforms are designed to meet a larger share of an organization’s BI requirements. To do this, some ETL vendors are extending their product lines horizontally, adding data quality tools and near-real-time data capture facilities to provide a complete data management solution. Others are extending vertically, adding analytical tools and applications to provide a complete BI solution.

ETL versus EAI

To be clear, ETL tools are just one of several technologies vying to become the preeminent enterprise data integration platform. One of the most important technologies is enterprise application integration (EAI) software, such as BEA Systems, Inc.’s WebLogic Platform and Tibco Software Inc.’s ActiveEnterprise.

EAI software enables developers to create real-time, event-driven interfaces among disparate transaction systems. For example, during the Internet boom, companies flocked to EAI tools to connect e-commerce systems with back-end inventory and shipping systems to ensure that Web sites accurately reflected product availability and delivery times.

Although EAI software is event-driven and supports transaction-style processing, it does not generally have the same transformation power as an ETL tool. On the other hand, most ETL tools do not support real-time data processing. To overcome these limitations, some ETL and EAI vendors are now partnering to provide the best of both approaches. EAI software captures data and application events in real time and passes them to the ETL tools, which transform the data and load it into the BI environment.
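The division of labor described above, with EAI software capturing events and ETL software transforming and loading them, can be pictured as a producer and an "always awake" consumer sharing a queue. The sketch below uses Python's standard `queue` and `threading` modules as a stand-in for a messaging product; the event fields, the `transform` rules, and the in-memory `warehouse` list are hypothetical.

```python
import queue
import threading
from typing import Dict, List

events = queue.Queue()      # stand-in for an EAI message queue
warehouse: List[Dict] = []  # stand-in for a warehouse fact table

def transform(event: Dict) -> Dict:
    """Hypothetical transformation: standardize the SKU and derive an amount."""
    return {
        "order_id": event["order_id"],
        "sku": event["sku"].strip().upper(),
        "amount": round(event["qty"] * event["unit_price"], 2),
    }

def etl_consumer() -> None:
    """Always-awake loop: transform and load each event as it arrives."""
    while True:
        event = events.get()
        if event is None:  # sentinel tells the consumer to stop
            break
        warehouse.append(transform(event))

worker = threading.Thread(target=etl_consumer)
worker.start()
# The "EAI" side publishes application events as they occur.
events.put({"order_id": 1, "sku": " ab-1 ", "qty": 2, "unit_price": 9.99})
events.put({"order_id": 2, "sku": "cd-2", "qty": 1, "unit_price": 5.0})
events.put(None)
worker.join()
print(warehouse)
```

A real deployment would also need delivery guarantees and error handling, which this sketch omits.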

Database Vendors Both Enhance and Compete with ETL Vendors

Higher Performance, Complete Solution

ETL Vendors Are Expanding Vertically and Horizontally

Some ETL and EAI Vendors Are Partnering...

Already, some ETL vendors have begun incorporating EAI functionality into their core engines. These tools contain adapters to operational applications and operate in an “always awake” mode so they can receive and process data and events generated by the applications in real time.

Future Reality. It won’t be long before all ETL and EAI vendors follow suit, integrating both ETL and EAI capabilities within a single product. The winners in this race will have a substantial advantage in vying for delivery of a complete, robust data integration platform.

ETL Trends and Requirements

As mentioned earlier, business requirements are fueling the evolution of ETL into data integration platforms. Business users are demanding access to more timely, detailed, and relevant information to inform critical business decisions and drive fast-moving business processes.

Large Volumes of Data

More business users today want access to more granular data (e.g., transactions) and more years of history across more subject areas than ever before. Not surprisingly, data warehouses are exploding in size and scope. Terabyte data warehouses, a rarity several years ago, are now fairly commonplace. The largest data warehouses exceed 50 terabytes of raw data, and the size of these data warehouses is putting enormous pressure on ETL administrators to speed up ETL processing.

Our survey indicates that the percentage of organizations loading more than 500 gigabytes will triple in 18 months, while those loading less than 1 gigabyte will decrease by a third. (See Illustration 1.)

Diverse Data Sources

One reason for increased data volumes is that business users want the data warehouse to cull data from a wider variety of systems. According to our survey, on average, organizations now extract data from 12 distinct data sources. This average will inexorably increase over time as organizations expand their data warehouses to support more subject areas and groups in the organization.

Although almost all companies use ETL to extract data from relational databases, flat files, and legacy systems, a significant percentage now want to extract data from application packages, such as SAP R/3 (39 percent), XML files (15 percent), Web-based data sources (15 percent), and EAI software (12 percent). (See Illustration 2.)
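Supporting many source types usually means writing one small extractor per format, each emitting records in a common shape, so downstream transformations need not care where a row came from. A minimal sketch using only Python's standard `csv`, `sqlite3` (standing in for a relational source), and `xml.etree.ElementTree` modules; the `customer` table, field names, and sample inputs are hypothetical.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

Record = Dict[str, str]

def from_relational(conn: sqlite3.Connection) -> Iterator[Record]:
    """Extract rows from a relational table."""
    for row_id, name in conn.execute("SELECT id, name FROM customer"):
        yield {"id": str(row_id), "name": name}

def from_flat_file(text: str) -> Iterator[Record]:
    """Extract rows from a delimited flat file with a header row."""
    yield from csv.DictReader(io.StringIO(text))

def from_xml(text: str) -> Iterator[Record]:
    """Extract <customer> elements, using child tags as field names."""
    for elem in ET.fromstring(text).iter("customer"):
        yield {child.tag: (child.text or "") for child in elem}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")

# All three sources end up in one uniform list of records.
records = [
    *from_relational(conn),
    *from_flat_file("id,name\n2,Globex\n"),
    *from_xml("<customers><customer><id>3</id><name>Initech</name></customer></customers>"),
]
print(records)
```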

“Spreadmarts.” In addition, many respondents wrote in the “other” category that they want ETL tools to extract data from Microsoft Excel and Access files. In many companies, critical data is locked up in these personal data stores, which are controlled by executives or managers running different divisions or departments. Each group populates these data stores from different sources at different times using different rules and definitions. As a result, these spreadsheets become surrogate independent data marts, or “spreadmarts” as TDWI calls them. The proliferation of spreadmarts precipitates organizational feuds (“dueling spreadsheets”) that can paralyze an organization and drive a CEO to the brink of insanity.

ETL and EAI Will Converge

More History, More Detail, More Subject Areas

Illustration 1. Eighteen-Month Plans for Data Load Volumes. The average data load will increase significantly in the next 18 months. Based on 756 respondents.

                    TODAY    18 MONTHS
Less than 1GB        59%        40%
1GB to 500GB         38%        50%
500GB+                3%        10%

On Average, Data Warehouses Cull Data from 12 Distinct Sources

Excel Spreadsheets Are a Critical Data Source

Shrinking Batch Windows

With the expansion in data volumes, many organizations are finding it virtually impossible to load a data warehouse in a single batch window. There are no longer enough hours at night or on the weekend to finish loading data warehouses containing hundreds of gigabytes or terabytes of data.

24x7 Requirements. Compounding the problem, batch windows are shrinking or becoming non-existent. This is especially true in large organizations whose data warehouses and operational systems span regions or continents and must be available around the clock. There is no longer any “systems downtime” to execute time-consuming extracts and loads.

Variable Update Cycles. Finally, the growing number and diversity of sources that feed the data warehouse make it difficult to coordinate a single batch load. Each source system operates on a different schedule and each contains data with different levels of “freshness” or relevancy for different groups of users. For example, retail purchasing managers find little value in sales data if they can’t view it within one day, if not immediately. Accountants, however, may want to examine general ledger data at the end of the month.
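One way to honor different freshness requirements is to give each source its own refresh interval and load only the sources that are due, rather than forcing everything into one batch. A minimal sketch; the source names and intervals are hypothetical and only loosely echo the purchasing versus general ledger example, and "monthly" is approximated as 30 days.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical refresh policy: how stale each source may become.
REFRESH_INTERVAL: Dict[str, timedelta] = {
    "pos_sales": timedelta(hours=1),       # purchasing managers want it fast
    "inventory": timedelta(days=1),
    "general_ledger": timedelta(days=30),  # accountants review month-end
}

def sources_due(last_loaded: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the sources whose data is older than their refresh interval.

    A source that has never been loaded is always due.
    """
    return sorted(
        name for name, interval in REFRESH_INTERVAL.items()
        if now - last_loaded.get(name, datetime.min) >= interval
    )

now = datetime(2003, 6, 30, 12, 0)
last_loaded = {
    "pos_sales": datetime(2003, 6, 30, 10, 30),  # 90 minutes old
    "inventory": datetime(2003, 6, 30, 2, 0),    # 10 hours old
    "general_ledger": datetime(2003, 6, 1),      # 29.5 days old
}
print(sources_due(last_loaded, now))
```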

Operational Decision Making

Another reason that ETL tools must support continuous extract/load processing is that users want timelier data. Data warehouses traditionally support strategic or tactical decision making based on historical data compiled over several months or years. Typical strategic or tactical questions are:

•“What are our year-over-year trends?”

•“How close are we to meeting this month’s revenue goals?”

•“If we reduce prices, what impact will it have on our sales and inventory for the next three months?”

Illustration 2. Data Sources. Types of data sources that ETL programs process. Multiple-choice question, based on 755 respondents. (Chart: percentage of respondents extracting from relational databases, flat files, mainframe/legacy systems, packaged applications, XML, Web, EAI/messaging software, replication or change data capture utilities, and other sources.)

More Data, Less Time

Data Has Different “Expiration Dates”

Users Want Timelier Data

Near-Real-Time Loads Will Triple in 18 Months

Users Want ETL Vendors to Offer Data Quality Tools

Operational BI. However, business users increasingly want to make operational decisions based on yesterday’s or today’s data. They are now asking:

•“Based on yesterday’s sales, which products should we ship to which stores at what prices to maximize today’s revenues?”

•“What is the most profitable way to reroute packages off a truck that broke down this morning?”

•“What products and discounts should I offer to the customer that I’m currently speaking with on the phone?”

According to our survey, more than two-thirds of organizations already load their data warehouses on a nightly basis. (See Illustration 3.) However, in the next 18 months, the number of organizations that will load their data warehouses multiple times a day will double, and the number that will load data warehouses in near real time will triple!

To meet operational decision-making requirements, ETL processing will need to support two types of processing: (1) large batch ETL jobs that involve extracting, transforming, and loading a significant amount of data, and (2) continuous processing that captures data and application events from source systems and loads them into data warehouses in rapid intervals or near real time.

Most organizations will need to merge these two types of processing, which is complicated because of the numerous interdependencies that administrators must manage to ensure consistency of information in the data warehouse and reports.
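A common way to manage such interdependencies is to declare which jobs depend on which and derive a safe execution order from that graph. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the job names are hypothetical. Dimension tables load before the fact table that references them, and the near-real-time trickle feed waits for the batch load so reports stay consistent.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
dependencies = {
    "load_dim_customer": {"extract_crm"},
    "load_dim_product": {"extract_erp"},
    "load_fact_sales": {"load_dim_customer", "load_dim_product"},
    "trickle_feed_orders": {"load_fact_sales"},
}

# static_order() yields every job after all of its predecessors and
# raises graphlib.CycleError if the declared dependencies form a loop.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Declaring dependencies as data, rather than hard-coding a sequence, means adding a new source only requires adding its edges.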

Data Quality Add-Ons

Many organizations want to purchase complete solutions, not technology or tools for building infrastructure and applications.

Add-on Products. When asked which add-on products from ETL vendors are “very important,” users expressed significant interest in data cleansing and profiling tools. (See Illustration 4.) This is not surprising since many BI projects fail because the IT department or (more than likely) an outside consultancy underestimates how dirty the source data is.

Code, Load, and Explode. A common scenario is the “code, load, and explode” phenomenon, in which ETL developers code the extracts and transformations, then start processing only to discover an unacceptably large number of errors due to unanticipated values in the source data files. They fix the errors and rerun the ETL process, only to find more errors, and so on. This ugly scenario repeats itself until the project deadlines and budgets become imperiled and angry business sponsors halt the project.
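Profiling the source before writing transformations is the usual antidote, because it surfaces unanticipated values while they are still cheap to handle. A minimal sketch of column profiling over rows already extracted into dictionaries; the `zip` column, the sample rows, and the five-digit ZIP rule are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional

def profile_column(
    rows: Iterable[Dict[str, Optional[str]]],
    column: str,
    pattern: Optional[str] = None,
) -> Dict[str, object]:
    """Summarize one column: nulls, distinct values, top values, pattern misses."""
    values: List[Optional[str]] = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    misses = [v for v in present if not re.fullmatch(pattern, v)] if pattern else []
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(3),
        "pattern_misses": misses,
    }

rows = [
    {"zip": "19104"}, {"zip": "1910"}, {"zip": None},
    {"zip": "19104"}, {"zip": "N/A"},
]
result = profile_column(rows, "zip", pattern=r"\d{5}")
print(result)
```

Running such a profile on every source column before coding transformations turns surprises like "N/A" into design decisions instead of load failures.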

Illustration 3. Data Warehouse Load Frequency. Based on 754 respondents.

                          TODAY    IN 18 MONTHS
Monthly                    32%         27%
Weekly                     34%         29%
Daily/nightly              69%         65%
Multiple times per day     15%         30%
Near real time              6%         19%


“In my experience, 10 percent [of the time and cost associated with ETL] is learning the ETL tool the first time out and making it conform to your will,” said John Murphy, an IT professional, in a newsgroup posting. “Ninety percent is the interminable iterative process of learning just how little you know about the source data, how dirty that data really is, how difficult it is to establish a cleansing policy that results in certifiably correct data on the target that matches the source, and how long it takes to re-work all the assumptions you made at the outset while you were still an innocent.”

The problem with current ETL tools is that they assume the data they receive is clean and consistent. They can’t assess the consistency or accuracy of source data, and they can’t handle specialized cleansing routines, such as name and address scrubbing. To meet these needs, ETL vendors are partnering with or acquiring data quality vendors to integrate specialized data cleansing routines within ETL workflows.
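Real name and address scrubbing relies on specialized reference data, but the shape of a cleansing step inside an ETL workflow can be sketched simply: standardize what can be standardized, and route rows that fail validation to a reject list for review instead of letting them abort the load. The abbreviation table and the "must start with a house number" rule below are toy assumptions, not a postal standard.

```python
import re
from typing import Dict, List, Tuple

# Toy standardization table; real tools use postal reference data.
STREET_SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def cleanse_address(raw: str) -> str:
    """Strip periods, collapse whitespace, upper-case, abbreviate suffixes."""
    words = re.sub(r"\s+", " ", raw.replace(".", "")).strip().upper().split(" ")
    return " ".join(STREET_SUFFIXES.get(word, word) for word in words)

def cleanse_rows(
    rows: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split rows into (clean, rejects); a valid address starts with a number."""
    clean, rejects = [], []
    for row in rows:
        address = cleanse_address(row.get("address", ""))
        if re.match(r"\d+ ", address):
            clean.append({**row, "address": address})
        else:
            rejects.append(row)  # keep the original for data stewards
    return clean, rejects

clean, rejects = cleanse_rows([
    {"id": "1", "address": "12  Main Street."},
    {"id": "2", "address": "Main Avenue"},
])
print(clean, rejects)
```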

Meta Data Management 

As the number and variety of source and target systems proliferate, BI managers are seeking automated methods to document and manage the interdependencies among data elements in the disparate systems. In short, they want a global meta data management solution.

As the hub of a BI project, an ETL tool is well positioned to document, manage, and coordinate information about data within data modeling tools, source systems, data warehouses, data marts, analytical tools, portals, and even other ETL tools and repositories. Many users would like to see ETL tools play a more pivotal role in managing global meta data, although others think ETL tools will always be inherently unfit to play the role of global traffic cop.
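Much of the value of shared meta data comes from lineage: recording which target elements are derived from which sources, so that a proposed change can be traced forward (impact analysis). A minimal sketch, assuming column-level mappings have already been harvested into simple (source, target) pairs; all system and column names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical lineage harvested from ETL mappings and report definitions.
MAPPINGS: List[Tuple[str, str]] = [
    ("crm.customer.cust_no", "stage.customer.customer_id"),
    ("stage.customer.customer_id", "dw.dim_customer.customer_id"),
    ("dw.dim_customer.customer_id", "mart.sales_by_customer.customer_id"),
    ("erp.orders.amount", "dw.fact_sales.amount"),
]

def downstream(element: str, mappings: List[Tuple[str, str]]) -> Set[str]:
    """Return every element derived, directly or indirectly, from `element`."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for source, target in mappings:
        graph[source].append(target)
    seen: Set[str] = set()
    pending = deque([element])
    while pending:  # breadth-first walk of the lineage graph
        for target in graph[pending.popleft()]:
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return seen

print(sorted(downstream("crm.customer.cust_no", MAPPINGS)))
```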

“Certainly, ETL vendors should store meta data for its own domain, but if they start trying to become a global meta data repository for all BI tools, they will have trouble,” says Tracy from Hartford Life. “I’d prefer they focus on improving their performance, manageability, impact analysis reports, and ability to support complex logic.”

Enforce Data Consistency. Nevertheless, large companies want a global meta data repository to manage and distribute “standard” data definitions, rules, and other elements within and throughout a network of interconnected BI environments. This global repository could help ensure data consistency and a “single version of the truth” throughout an organization.
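In practice, a "single version of the truth" often comes down to mapping each source's local codes onto one enterprise-standard code set held in the shared repository. A minimal sketch, assuming the repository's conformance table can be represented as a dictionary keyed by (source system, local code); all codes are hypothetical. Unmapped codes are reported rather than silently passed through, so inconsistencies surface instead of spreading.

```python
from typing import Dict, List, Tuple

# Hypothetical conformance table distributed from a shared repository.
STANDARD_REGION: Dict[Tuple[str, str], str] = {
    ("billing", "01"): "NORTHEAST",
    ("billing", "02"): "SOUTHWEST",
    ("crm", "NE"): "NORTHEAST",
    ("crm", "SW"): "SOUTHWEST",
}

def conform(source: str, codes: List[str]) -> Tuple[List[str], List[str]]:
    """Translate local codes to standard ones; return (conformed, unmapped)."""
    conformed, unmapped = [], []
    for code in codes:
        standard = STANDARD_REGION.get((source, code))
        if standard is None:
            unmapped.append(code)
        else:
            conformed.append(standard)
    return conformed, unmapped

# Two systems with different local codes agree after conformance.
print(conform("crm", ["NE", "SW", "XX"]))
print(conform("billing", ["01", "02"]))
```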

Although ETL vendors often pitch their global meta data management capabilities, most tools today fall short of user expectations. (Even single-vendor, all-in-one BI suites fall short.) When asked in an open-ended question, “What would make your ETL tool better?” many survey respondents wrote “better meta data management” or “enhanced meta data integration.”

Illustration 4. Rating the Importance of Add-On Products. A significant percentage of users want ETL tools to offer data cleansing and profiling capabilities. Based on 740 respondents. (Chart: percentage of respondents rating each add-on product “very important”: packaged analytic reports, data cleansing tools, packaged data models, analytic tools, and data profiling tools.)

Lack of Knowledge about Source Data Cripples Projects

ETL Tools Assume the Data Is Clean

Users Want ETL Tools to Play a Pivotal Role in Meta Data

Most ETL Tools Fall Short of Expectations

Packaged Solutions Can Speed Deployment

“I don’t know if ETL tools should be used for a [BI] meta data repository, but no one else is stepping up to the plate,” says EDS Canada’s Light. Light says the gap between expectation and reality sometimes puts him as a consultant in a tough position. “Many clients think the ETL tools can do it all, and we [as consultants] end up as the meat in a sandwich trying to force a tool to do something it can’t.”

Packaged Solutions 

Besides add-on products, an increasing number of organizations are looking to buy comprehensive, off-the-shelf packages that provide near-instant insight into their data. For instance, more than a third of respondents (39 percent) said they are more attracted to vendors that offer ETL as part of a complete application or analytic suite. (See Illustration 5.)

Kerr-McGee Corporation, an energy and chemicals firm in Oklahoma City, purchased a packaged solution to support its Oracle Financials application. “We had valid numbers within a week, which is incredible,” says Brian Morris, a data warehousing specialist at the firm. “We don’t have a lot of time so anything that is pre-built is helpful.”

A packaged approach excels, as in Kerr-McGee’s case, when there is a single source that is itself a packaged application with a documented data model and API. The package Kerr-McGee used also supplied analytic reports, which the firm did not use since it already had reports defined in another reporting tool.

Some Skepticism. However, there is skepticism among some users about ETL vendors who want to deliver end-to-end BI packages. “We want ETL vendors to focus on their core competency, not their brand,” says Cynthia Connolly, vice president of application development at Alliance Capital Management. “We want them to enhance their ETL products, not branch off into new areas.”

Enterprise Infrastructure

Multi-Faceted Usage. Finally, organizations also want to broaden their use of ETL to support additional applications besides BI. (See Illustration 6.) In essence, they want to standardize on a single infrastructure product to support all their enterprise data integration requirements.

A majority of organizations already use ETL tools for BI, data migration, and conversion projects. But a growing number will use ETL tools to support application integration and master reference data projects in the next 18 months.

Illustration 5. Analytic Suites. Based on 747 respondents.

“Does a vendor’s ability to offer ETL as part of a complete analytic suite make the vendor and its products more attractive to your team?”

Yes          39%
No           43%
Not sure     18%

Organizations Want a Standard Data Integration Infrastructure

SUMMARY

As data warehouses amass more data from more sources, BI teams need ETL tools to process data more quickly, efficiently, and reliably. They also need ETL tools to assume a larger role in coordinating the flow of data among source and target systems and managing data consistency and integrity. Finally, many want ETL vendors to provide more complete solutions, either to meet an organization’s enterprise infrastructure needs or to deliver an end-to-end BI solution.

Build or Buy?

First Decision. As BI teams examine their data integration needs, the first decision they need to make is whether to build or buy an ETL tool. Despite the ever-improving functionality offered by vendor ETL tools, there are still many organizations that believe it is better to hand-write ETL programs than to use off-the-shelf ETL software. These companies point to the high cost of many ETL tools and the abundance of programmers on their staff to justify their decision.

Strong Market to “Buy.” Nonetheless, the vast majority of BI teams (82 percent) have purchased a vendor ETL tool, according to our survey. Overall, 45 percent use vendor tools exclusively, while 37 percent use a combination of tools and custom ETL code. Only 18 percent exclusively write ETL programs from scratch. (See Illustration 7.)

Illustration 6. Current and Planned Uses of ETL. Besides BI, organizations want to use ETL products for a variety of data integration needs. Based on 740 responses. (Chart: percentage of respondents using ETL today and planning to use it in 18 months for data warehousing/business intelligence, data migration/conversion, application integration, and reference data integration such as customer hubs.)

Illustration 7. Build or Buy? The vast majority of organizations building data warehouses have purchased an ETL tool from a vendor. Based on 761 responses.

Build only       18%
Buy only         45%
Build & buy      37%

Most Companies Buy ETL Tools

Why Buy?

Maintaining Custom Code

The primary reason that organizations purchase vendor ETL tools is to minimize the time and cost of developing and maintaining proprietary code.

“We need to expand the scale and scope of our data warehouse and we can’t do it by hand-cranking code,” says Richard Bowles, data warehouse developer at Safeway Stores plc in London. Safeway is currently evaluating vendor ETL tools in preparation for the expansion of its data warehouse.

The problem with Safeway’s custom COBOL programs, according to Bowles, is that it takes too much time to test each program whenever a change is made to a source system or target data warehouse. First, you need to analyze which programs are affected by a change, then check business rules, rewrite the code, update the COBOL copybooks, and finally test and debug the revised programs, he says.

“Custom code equals high maintenance. We want our team focused on building new routines to expand the data warehouse, not preoccupied with maintaining existing ones,” Bowles says.

Tools that Speed Deployment Save Money. Many BI teams are confident that vendor ETL tools can better help them deliver projects on time and within budget. These managers ranked “speed of deployment” as the number one reason they decided to purchase an ETL tool, according to our survey. (See Table 1.)

Ken Kirchner, a data warehousing manager at Werner Enterprises, Inc., in Omaha, NE, is using an ETL tool to replace a team of outside consultants that botched the firm’s first attempt at building a data warehouse.

“Our consultants radically underestimated what needed to be done. It took them 15 months to code 130 stored procedures,” says Kirchner. “A [vendor ETL] tool will reduce the number of people, time, and money we need to deliver our data warehouse.”

EDS Canada’s Light says that in many situations, clients have older ETL environments where the business rules are encapsulated in COBOL or other 3GLs, making it impossible to do impact analysis. Moving to an engine-based tool allows the business rules to be maintained in a single location, which facilitates maintenance.
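The contrast Light draws can be illustrated by keeping transformation rules as data that a generic engine interprets, instead of burying them in procedural code. With the rules in one place, a question such as "which targets read CITY?" becomes a lookup. A minimal sketch; the rule format, column names, and helper functions are hypothetical and not modeled on any particular ETL product.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Each target column: (source columns it reads, function that derives it).
RULES: Dict[str, Tuple[List[str], Callable[..., str]]] = {
    "customer_name": (["CUST_NAME"], lambda name: name.strip().title()),
    "full_address": (["STREET", "CITY"], lambda s, c: f"{s.strip()}, {c.strip()}"),
    "status": (["STAT_CD"], lambda code: {"A": "Active", "I": "Inactive"}.get(code, "Unknown")),
}

def apply_rules(row: Row) -> Row:
    """Generic engine: derive every target column from its declared sources."""
    return {
        target: fn(*(row[src] for src in sources))
        for target, (sources, fn) in RULES.items()
    }

def targets_using(source_column: str) -> List[str]:
    """Impact analysis becomes a lookup over the rule definitions."""
    return [t for t, (sources, _) in RULES.items() if source_column in sources]

row = apply_rules({"CUST_NAME": "  jane doe ", "STREET": "1 Elm St ", "CITY": " Omaha", "STAT_CD": "A"})
print(row)
print(targets_using("CITY"))
```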

Table 1. Reasons for Buying an ETL Tool (ranking). Ranking of reasons for buying an ETL tool.

1. To deploy ETL more quickly
2. It had all the functionality we need
3. Help us manage meta data
4. Build a data integration infrastructure

Custom Code Is Laborious to Modify


Why Build?

Cheaper and Faster. However, some organizations believe it is cheaper and quicker to code ETL programs than to use a vendor ETL tool. (See Table 2.)

“Custom software is what saves us time and money, even in maintenance, because it’s geared to our business,” says Mike Galbraith, director of Global Business Systems Development at Tyco Electronics in Harrisburg, PA.

The key for ETL programs, or any software development project, is to write good code, Galbraith says. “We use software engineering techniques. We write our code from specifications and a meta data model. Everything is self-documenting and that has saved us a lot.”

Hubert Goodman, director of business intelligence at Cummins, Inc., a leading maker of diesel engines, says the cost to maintain his group’s custom ETL code is less than the annual maintenance fee and training costs that another department is paying to use a vendor ETL tool.

Goodman’s team uses object-oriented techniques to write efficient, maintainable PL/SQL code. To keep costs down and speed turnaround times, he outsources ETL code development overseas to programmers in India. (Some practitioners say it is impractical to outsource GUI-based ETL development because of the interactive nature of such tools.)

The Hard Job of ETL. Several BI managers also said vendor ETL tools don’t address the really challenging aspects of migrating source data into a data warehouse, specifically how to identify and clean dirty data, build interfaces to legacy systems, and deliver near-real-time data. For these managers, the math doesn’t add up: why spend $200,000 or more to purchase an ETL tool that only handles the “easy” work of mapping and transforming data?

Build and Buy 

Slow Migration to ETL Packages. Despite these arguments, there is a slow and steady migration away from writing custom ETL programs. More than a quarter (26 percent) of organizations that use custom code for ETL will soon replace it with a vendor ETL tool. Another 23 percent are planning to augment their custom code with a vendor ETL tool. (See Illustration 8.)

When asked why they are switching, many BI managers said that “vendor ETL tools didn’t exist” when they first implemented their data warehouse. Others said they initially couldn’t afford a vendor ETL tool or didn’t have time to investigate new tools or train team members.

Typically, these teams leave existing hand code in place and use the ETL package to support a new project or source system. Over time, they displace custom code with the vendor ETL tool as the team gains experience with the tool and the tool proves it can deliver adequate performance and functionality.

Table 2. Reasons for Coding ETL Programs (ranking). Ranking of reasons for coding ETL programs. Based on respondents who code ETL exclusively.

1. Less expensive
2. Our culture encourages in-house development
3. Build a data integration infrastructure
4. Deploy more quickly

Cummins Outsources ETL Coding Overseas to Speed Development

Firms Are Gradually Replacing Home-Grown Code

Mix and Match. Some teams, however, don’t plan to abandon custom code. Their strategy is to use whatever approach makes sense in each situation. These teams often use custom code to handle processes that vendor ETL tools don’t readily support out of the box.

For example, Preetam Basil, a Priceline.com data warehousing architect, points out that Priceline.com uses a vendor ETL tool for the bulk of its ETL processing, but it has developed UNIX and SQL code to extract and scrub data from its voluminous Web logs.
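Web server logs are a typical example of the kind of source teams hand-code for. The report does not describe Priceline.com's actual log format or code; as a minimal sketch, assume logs in the widely used Common Log Format. Malformed lines are counted and skipped rather than loaded, which is one simple form of scrubbing.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log(lines: Iterable[str]) -> Tuple[List[Dict[str, object]], int]:
    """Return (parsed hits, count of rejected lines)."""
    hits, rejected = [], 0
    for line in lines:
        match = CLF.match(line)
        if not match:
            rejected += 1
            continue
        hits.append({
            "host": match["host"],
            "time": datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z"),
            "path": match["path"],
            "status": int(match["status"]),
            "bytes": 0 if match["size"] == "-" else int(match["size"]),
        })
    return hits, rejected

sample = [
    '10.0.0.1 - - [10/Oct/2002:13:55:36 -0700] "GET /hotels?city=nyc HTTP/1.0" 200 2326',
    "garbled line from a truncated write",
]
hits, rejected = parse_log(sample)
print(hits, rejected)
```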

User Satisfaction with ETL Tools

Overall, most BI teams are generally pleased with their ETL tools. According to our survey, 28 percent of data warehousing teams are “very satisfied” with their vendor ETL tool, and 51 percent are “mostly satisfied.” Vendor ETL tools get a slightly higher satisfaction rating than hand-coded ETL programs. (See Illustration 9.)

A majority of organizations (52 percent) plan to upgrade to the next version of their ETL tool within 18 months, a sign that these teams are committed to the product. Another 22 percent will maintain the product “as is” during the period. A small fraction plan to replace their vendor ETL tool with another vendor ETL tool (7 percent) or custom code (2 percent). (See Illustration 10.)

Illustration 8. Eighteen-Month Plans for Your ETL Custom Code. Almost half (49 percent) of respondents plan to replace or augment their custom code with a vendor ETL tool. Based on 415 respondents.

Continue to enhance the custom code     43%
Replace it with a vendor ETL tool       26%
Augment it with a vendor ETL tool       23%
Other                                    7%
Outsource it to a third party            2%

Users Are “Mostly Satisfied” with ETL Tools

Strategy: Custom Code Where Appropriate

Illustration 9. Satisfaction Rating: Vendor Tools versus Custom Code. Vendor ETL tools have a higher satisfaction rating than custom ETL programs for organizations that use either vendor ETL tools or custom code exclusively. Based on 341 and 134 responses, respectively.

                       Vendor ETL tool    Custom code
Very satisfied               28%              18%
Mostly satisfied             51%              46%
Somewhat satisfied           16%              24%
Not very satisfied            5%              12%

Very few organizations have abandoned vendor ETL tools. Most have purchased one tool and continue to use it in a production environment. (See Illustrations 11a and 11b.) In other words, there is very little ETL shelfware.

Challenges in Deploying ETL

Despite this rosy picture, BI teams encounter numerous challenges when deploying vendor ETL tools. The top two challenges are “ensuring adequate data quality” and “understanding source data,” according to our survey. (See Table 3.) Although these problems are not necessarily a function of the ETL tool or code, they do represent an opportunity for ETL vendors to expand the capabilities of their toolsets to meet these needs.

Complex Transformations. The next most significant challenge is designing complex transformations and mappings. “The goal of [an ETL] tool is to minimize the amount of code you have to write outside of the tool to handle transformations,” says Alliance Capital Management’s Connolly.

Unfortunately, most ETL tools fall short in this area. “We wish there was more flexibility in writing transformations,” says Brian Koscinski, ETL project lead at American Eagle Outfitters in Warrendale, PA. “We often need to drop a file into a temporary area and use our own code to sort the data the way we want it before we can load it into our data warehouse.”
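The workaround Koscinski describes (land a file in a staging area, reorder it with custom code, then load it) might look like the following. A minimal sketch, assuming a CSV staging file small enough to sort in memory; the file names and columns are hypothetical, and very large files would call for an external (disk-based) sort or the database's own sort instead.

```python
import csv
import tempfile
from pathlib import Path
from typing import List

def sort_staged_file(src: Path, dest: Path, keys: List[str]) -> None:
    """Read a staged CSV, sort its rows by the given columns, write the result."""
    with src.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames  # reads the header row
        rows = sorted(reader, key=lambda r: tuple(r[k] for k in keys))
    with dest.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

staging = Path(tempfile.mkdtemp())  # stand-in for the temporary area
raw = staging / "sales_raw.csv"
raw.write_text("store,sku,qty\n02,B7,3\n01,C1,5\n01,A2,1\n")
sort_staged_file(raw, staging / "sales_sorted.csv", keys=["store", "sku"])
print((staging / "sales_sorted.csv").read_text())
```

Keys are compared as strings here, so zero-padded codes such as "01" and "02" sort correctly; numeric keys would need conversion first.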

Illustration 11a. ETL Products in Production. Most BI teams only use one ETL tool. Based on 617 responses.

Number of ETL tools    Share of teams
0                        6%
1                       60%
2                       25%
3                        6%
4+                       3%

Illustration 11b. ETL Shelfware. Very few ETL tools go unused. Based on 617 responses.

“How many vendor ETL tools has your team purchased that it does not use?”

0     81%
1     15%
2      4%
3      1%

Illustration 10. Eighteen-Month Plans for Your Vendor ETL Tool. Most teams are committed to their current vendor ETL tool. Based on 622 responses.

Upgrade to the next or latest version           52%
Maintain as is                                  22%
Augment it with custom code or functions        11%
Replace it with a tool from another vendor       7%
Augment it with a tool from another vendor       3%
Replace it with hand-written code                2%
Outsource it to a third party                    0%
Other                                            4%

Data Quality Issues Top All Challenges

Goal: Minimize Code Written Outside the ETL Tool

Page 22: Etl Report

7/26/2019 Etl Report

http://slidepdf.com/reader/full/etl-report 22/40

20   THE DATA WAREHOUSING INSTITUTE www.dw-institute.com

Evaluating ETL and Data Integration Platforms

According to our survey, only a small percentage of teams (11 percent) are currently extending vendor ETL tools with custom code. The rest are using a “reasonable amount” of custom code or “none at all” to augment their vendor ETL tools.

Skilled ETL Developers. Another significant challenge, articulated by many survey respondents and users interviewed for this report, is finding and training skilled ETL programmers. Many say the skills of ETL developers and their familiarity with the tools can seriously affect the outcome of a project.

Priceline.com’s Basil agrees. “Initially, we didn’t have ETL developers who were knowledgeable about data warehousing, database design, or ETL processes. As a result, they built inefficient processes that took forever to load.”

EDS Canada’s Light says it’s imperative to hire experienced ETL developers at the outset of a project, no matter what the cost, to establish development standards. Otherwise, the “problems encountered down the line will be much larger and harder to grapple with.” Others add that it’s wise to hire at least two ETL developers in case one leaves midstream.

Most BI managers are frustrated by the time it takes developers to learn a new tool (about three months) and then become proficient with it (12 or more months). Although the skill and experience of a developer count heavily here, a good ETL development environment, along with high-quality training and support from the vendor, can shorten the learning curve.

The Temptation to Code. Developers often complain that it takes much longer to learn the ETL tool and manipulate its transformation objects than to simply code the transformations from scratch. And many give in to temptation. But perseverance pays off: developers are five to six times more productive once they learn an ETL tool than when using a command-line interface, according to Pieter Mimno.

Table 3. Data quality issues are the top two challenges facing ETL developers. [Chart “Ranking of ETL Challenges” (x-axis: ranking score, 0 to 8,000) covering 15 challenges: ensuring adequate data quality; creating complex mappings; creating complex transformations; understanding source data; ensuring adequate performance; providing access to meta data; finding skilled ETL programmers; collecting and maintaining meta data; ensuring adequate scalability; loading data in near real time; ensuring adequate reliability; integrating with third-party tools or applications; creating and integrating user defined functions; integrating with load utilities; debugging programs.]


Pricing

Sticker Shock. Perhaps the biggest hurdle with vendor ETL tools is their price tag. Many data warehousing teams experience sticker shock when vendor ETL salespeople quote license fees. Many teams weigh the perceived value of a tool against its list price, which for many tools exceeds $200,000, and walk away from the deal.

“For $200,000 you can buy at least 4,000 man-hours of development, and the tool’s ongoing maintenance fee pays for one-half of a head count,” says data warehousing consultant Darrell Piatt.
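Piatt’s comparison implies a ceiling on the hourly development rate he is assuming. The arithmetic below uses only the figures in his quote. The maintenance-fee half of the quote cannot be computed, because no fee is given.

```latex
\text{implied rate} \le \frac{\$200{,}000}{4{,}000\ \text{hours}} = \$50\ \text{per hour}
```

At any higher blended rate, the same budget buys fewer than 4,000 hours. The build-versus-buy comparison therefore depends heavily on local labor costs.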

The price of ETL tools is a “difficult pill to swallow” for companies that don’t realize that a data warehouse is different from a transaction system, says Bryan LaPlante, senior consultant with Pragmatek, a Minneapolis consultancy that works with mid-market companies. These companies don’t realize that an ETL tool makes it easier to keep up with the continuous flow of changes that are part and parcel of every data warehousing program, he says.

But even companies that understand the importance of ETL tools find it difficult to open their checkbooks. “Although ETL tools are much more functional than they were five years ago, most are still not worth buying at list price,” says Piatt.

Spread the Wealth. To justify the steep costs, many teams sell the tool internally as an infrastructure component that can support multiple projects. “The initial purchase is expensive, but it is easier to justify when you use it for multiple projects,” says Heath Hatchett, group manager of business intelligence engineering at Intuit in Mountain View, CA. Others negotiate with vendors until they get the price they want. “It’s a buyers’ market right now,” says Safeway’s Bowles.

Given the current economy and the increasing maturity of the ETL market, list prices for ETL products are starting to fall. The downward pressure on prices will continue for the next few years as software giants bundle robust ETL programs into core data management products, sometimes at no extra cost, and BI vendors package ETL tools inside comprehensive BI suites. In addition, a few start-ups now offer small-footprint products at extremely affordable prices.

Recommendations 

Every BI project team has different levels and types of resources to meet different business requirements. Thus, there is no right or wrong decision about whether to build or buy ETL capabilities. But here are some general recommendations, based on our research, to guide your decision.

Buy If You …
• Want to Minimize Hand Coding. ETL tools enable developers to create ETL workflows and objects using a graphical workbench that minimizes hand coding and the problems it can create.
• Want to Speed Project Deployment. In the hands of experienced ETL programmers, an ETL tool can significantly boost productivity and shorten deployment time. (However, be careful: these productivity gains don’t happen until ETL programmers have worked with a tool for at least a year.)
• Want to Maintain ETL Rules Transparently. Rather than bury ETL rules within code, ETL tools store source-to-target mappings and transformations in a meta data repository accessible via the tool’s GUI and/or an API. This reduces downstream maintenance costs and insulates the team when key ETL programmers leave.
• Have Unique Requirements the Tool Supports. If the tool supports your unique requirements (such as providing adapters to a packaged application your company just purchased), it may be worthwhile to invest in a vendor ETL tool.


• Need to Standardize Your BI Architecture. Purchasing a scalable ETL tool with sufficient functionality is a good way to enable your organization to establish a standard infrastructure for BI and other data management projects. Reusing the ETL tool in multiple projects helps to justify purchasing the tool.

Build If You …
• Have Skilled Programmers Available. Building ETL programs makes sense when your IT department has skilled programmers available who understand BI and are not bogged down delivering other projects.
• Practice Sound Software Engineering Techniques. The key to coding ETL is writing efficient code that is designed from a common meta model and rigorously documented.
• Have a Solid Relationship with a Consulting Firm. Consultancies that you’ve worked with successfully in the past and that understand data warehousing can efficiently write and document ETL programs. (Just make sure they teach you the code before they leave!)
• Have Unique Functionality. If you have a significant number of complex data sources or transformations that ETL tools can’t support out of the box, then coding ETL programs is a good option.
• Have Existing Code. If you can reuse existing, high-quality code in a new project, then it may be wiser to continue to write code.

Data Integration Platforms

As mentioned earlier, many vendors are extending their ETL products to meet new user requirements. The result is a new generation of ETL products that TDWI calls data integration platforms. These products extend ETL tools with a variety of new capabilities, including data cleansing, data profiling, advanced data capture, incremental updates, and a host of new source and target systems. (See Illustration S3.)

We can divide the main features of a data integration platform into platform characteristics and data integration characteristics. No ETL product today delivers all the features outlined below.

Illustration S3. Data integration platforms extend the capabilities of ETL tools. [Diagram “Data Integration Platforms” showing core services (extract, capture, profile, clean, transform, update, load); source and target adapters; transport services; administration and operations services; runtime meta data services; a design manager; a global meta data repository; meta data import/export; and an adapter development kit. Sources and targets shown include databases and files, legacy applications, application packages, application integration software, Web and XML data, and event data.]


However, most ETL vendors are moving quickly to deliver full-fledged data integration platforms.

Platform Characteristics:
• High performance and scalability
• Built-in data cleansing and profiling
• Complex, reusable transformations
• Reliable operations and robust administration

Data Integration Characteristics:
• Diverse source and target systems
• Update and capture facilities
• Near-real-time processing
• Global meta data management

High Performance and Scalability 

Parallelization. Data integration platforms offer exceedingly high throughput and scalability due to parallel processing facilities that exploit high-performance computing platforms.

The parallel engine processes workflow streams in parallel using multiple operating system threads, multi-pass SQL, and an in-memory data cache to hold temporary data without storing it to disk. Where dependencies exist among or within streams, the engine uses pipelining to optimize throughput.
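The Python sketch below is a rough illustration of the two ideas in this paragraph. Independent workflow streams run on separate operating system threads. Within each stream, generators pipeline the stages, so a row can be transformed before the whole source has been read. The stream names, the `amount` and `fx_rate` fields, and all function names are hypothetical. In CPython, threads help mainly when stages wait on I/O such as database or network calls. A commercial parallel engine would also spread work across processes and servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator

def extract(rows: Iterable[dict]) -> Iterator[dict]:
    # Stage 1: hand rows on one at a time so downstream stages can start at once
    for row in rows:
        yield row

def transform(rows: Iterator[dict]) -> Iterator[dict]:
    # Stage 2: pipelined; each row is converted as soon as extract yields it
    for row in rows:
        yield {**row, "amount_usd": round(row["amount"] * row["fx_rate"], 2)}

def load(rows: Iterator[dict]) -> list[dict]:
    # Stage 3: collect into an in-memory target instead of staging to disk
    return list(rows)

def run_stream(source: list[dict]) -> list[dict]:
    return load(transform(extract(source)))

def run_parallel(streams: dict[str, list[dict]], workers: int = 4) -> dict[str, list[dict]]:
    # Streams with no dependencies on each other each get their own thread
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_stream, rows) for name, rows in streams.items()}
        return {name: future.result() for name, future in futures.items()}

streams = {
    "orders_us": [{"amount": 10.0, "fx_rate": 1.0}],
    "orders_eu": [{"amount": 10.0, "fx_rate": 1.1}, {"amount": 5.0, "fx_rate": 2.0}],
}
results = run_parallel(streams)
```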

The high-performance computing platform supports clusters of servers running multiple CPUs and load balancing software to optimize performance under large, sustained loads.

Benchmarks. In the mid-1990s, ETL engines achieved a maximum throughput of 10 to 15 gigabytes of data per hour, which was sufficient to meet the needs of most BI projects at the time. Today, however, ETL engines boast throughput of 100 to 150 gigabytes an hour or more, thanks to steady improvements in memory management, parallelization, distributed processing, and high-performance servers.¹
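A simple division shows what these throughput figures mean for a batch window. The sketch assumes a hypothetical 600 GB nightly volume and a constant sustained rate. As the footnote to this paragraph warns, real throughput depends on transformation complexity, the number of sources, and the target database.

```python
def load_hours(volume_gb: float, throughput_gb_per_hour: float) -> float:
    # Hours needed to move a volume at a constant sustained throughput
    return volume_gb / throughput_gb_per_hour

def required_throughput(volume_gb: float, window_hours: float) -> float:
    # Minimum sustained throughput that fits the volume into the batch window
    return volume_gb / window_hours

NIGHTLY_VOLUME_GB = 600  # hypothetical volume, for illustration only
mid_1990s_hours = load_hours(NIGHTLY_VOLUME_GB, 15)   # upper end of the mid-1990s range
today_hours = load_hours(NIGHTLY_VOLUME_GB, 150)      # upper end of today's range
needed_for_6h_window = required_throughput(NIGHTLY_VOLUME_GB, 6)
```

The results:
- At 15 GB an hour, the load takes 40 hours, far beyond any nightly window.
- At 150 GB an hour, it takes 4 hours.
- Fitting it into a six-hour window needs at least 100 GB an hour.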

Built-In Data Cleansing and Profiling 

To avoid the “code, load, and explode” syndrome mentioned earlier, good data integration platforms provide built-in support for data cleansing and profiling functionality.

Data Profiling Tools. Data profiling tools provide an exhaustive inventory of source data. (Although many companies use SQL to sample source data values, this is akin to sighting the tip of an iceberg.) Data profiling tools identify the range of values and formats of every field as well as all column, row, and table dependencies. The tools spit out reports that serve as the Rosetta Stone of your source files, helping you translate between hopelessly out-of-date copybooks and what’s really in your source systems.
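A data profiling tool does far more than this. Still, the core idea of inventorying a field’s values and formats fits in a few lines of Python. The hypothetical function below does four things:
- counts missing and distinct values;
- reports the lexicographic minimum and maximum;
- reduces each value to a format pattern;
- tallies those patterns.

Run against a ZIP code column, it surfaces a four-digit value that has probably lost its leading zero. An out-of-date copybook would not reveal that kind of discrepancy.

```python
from collections import Counter

def value_pattern(value: str) -> str:
    # Reduce a value to its format: 9 = digit, A = letter, other characters kept as-is
    return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value)

def profile_column(values: list) -> dict:
    present = [str(v) for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        # Values are compared as text, so min/max are lexicographic, not numeric
        "min": min(present, default=None),
        "max": max(present, default=None),
        "patterns": Counter(value_pattern(v) for v in present),
    }

zip_profile = profile_column(["02134", "2134", None, "02139-1234", "02134"])
```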

Data Cleansing Tools. Data cleansing tools validate source data against business rules and correct it. Most data cleansing tools apply specialized functions for scrubbing name and address data: (1) parsing and standardizing the format of names and addresses, (2) verifying those names and addresses against a third-party database, (3) matching names to remove duplicates, and (4) householding names to consolidate mailings to a single address.

¹ There are many variables that affect throughput: the complexity of transformations, the number of sources that must be integrated, the complexity of target data models, the capabilities of the target database, and so on. Your throughput rate may vary considerably from these rates.


Some data cleansing tools can now also apply many of the above functions to specific types of non-name-and-address data, such as products.
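The sketch below walks through three of the four name-and-address steps on a toy scale: parsing and standardizing, matching duplicates, and householding. Verification against a third-party database is omitted because it needs licensed reference data. The abbreviation table, field names, and functions are hypothetical. Matching here is exact after standardization, whereas commercial cleansing tools rely on far richer reference data and fuzzy matching.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; commercial tools ship much larger reference data
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "SUITE": "STE"}

def standardize(text: str) -> str:
    # Parse and standardize: uppercase, turn punctuation into spaces, abbreviate tokens
    tokens = re.sub(r"[^\w\s]", " ", text.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def cleanse(records: list[dict]) -> dict[str, list[str]]:
    seen = set()
    households = defaultdict(list)
    for record in records:
        name, address = standardize(record["name"]), standardize(record["address"])
        # Match: identical standardized name and address means a duplicate record
        if (name, address) in seen:
            continue
        seen.add((name, address))
        # Household: distinct names at the same standardized address share one mailing
        households[address].append(name)
    return dict(households)

records = [
    {"name": "Jane Doe", "address": "12 Main Street"},
    {"name": "jane  doe", "address": "12 Main St."},
    {"name": "John Doe", "address": "12 MAIN STREET"},
    {"name": "Ann Lee", "address": "5 Oak Avenue"},
]
households = cleanse(records)
```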

Data integration platforms integrate data cleansing routines within a visual ETL workflow. (See Illustration S4.) At runtime, the ETL tool issues calls to the data cleansing tool to perform its validation, scrubbing, or matching routines.

Ideally, data integration platforms exchange meta data with profiling and cleansing tools. For instance, a data profiling tool can share errors and formats with an ETL tool so developers can design mappings and transformations to correct the dirty data. Also, a data cleansing tool can share information about matched or householded customer records so the ETL tool can combine them into a single record before loading the data warehouse.

Complex, Reusable Transformations

Extensible, Reusable Objects. Data integration platforms provide a robust set of transformation objects that developers can extend and combine to create new objects and reuse them in various ETL workflows. Ideally, developers should be able to deploy a custom object that dynamically adapts to its environment, minimizing the time developers spend updating multiple instances of the object.

“Most ETL tools today limit reuse because they are context sensitive,” says Steve Tracy at Hartford Life Insurance in Simsbury, CT. “They let you copy and paste an object to support 37 instances of a source system, but you have to configure each object separately for each. This becomes painful if there is a change in the source system and you have to reconfigure all 37 instances of the object.”
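One way to avoid the copy-and-configure problem Tracy describes is to keep a single transformation definition and pass instance-specific settings in at runtime. In this hypothetical Python sketch, the 37 source instances share one column mapping. A change in the source layout is therefore edited once rather than 37 times, and only the per-instance context differs. The class, field, and mapping names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceContext:
    # Settings that legitimately differ per source instance (hypothetical fields)
    instance_id: str
    currency: str

# Single shared definition of the source layout: a source-system change is edited here once
SOURCE_COLUMNS = {"cust_no": "customer_id", "amt": "amount"}

def standardize_row(row: dict, ctx: SourceContext) -> dict:
    # One reusable transformation; its behavior adapts to the context supplied at runtime
    out = {target: row[source] for source, target in SOURCE_COLUMNS.items()}
    out["source_instance"] = ctx.instance_id
    out["currency"] = ctx.currency
    return out

# 37 instances of the same source system, each with its own context but no copied logic
contexts = [SourceContext(instance_id=f"store_{i:02d}", currency="USD") for i in range(1, 38)]
first_row = standardize_row({"cust_no": 7, "amt": 9.5}, contexts[0])
```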

External Code. Although ETL tools have gained a raft of built-in and extensible transformation objects, developers always run into situations where they need to write custom code to support complex or unique transformations.

To address this pain point, data integration platforms provide internal scripting languages that enable developers to write custom code without leaving the tool. But if a developer is more comfortable coding in C, C++, or another language, the tool should also be able to call external routines from within an ETL workflow and manage those routines as if they were internal objects. In the same way, the tool should be able to call third-party schedulers, security systems, meta data repositories, and analytical tools.
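The sketch below shows one way a workflow can treat an external routine as if it were an internal object. Every step, built-in or external, has the same signature: a list of rows in, a list of rows out. The JSON-over-stdin/stdout contract and all names are assumptions for illustration. The “external” program is a short script run by a separate Python interpreter, standing in for a compiled C or C++ routine.

```python
import json
import subprocess
import sys
from typing import Callable

Step = Callable[[list[dict]], list[dict]]

def uppercase_names(rows: list[dict]) -> list[dict]:
    # A built-in (internal) transformation
    return [{**row, "name": row["name"].upper()} for row in rows]

def external_step(command: list[str]) -> Step:
    # Wrap an external program so the workflow can call it like any internal step.
    # Assumed contract: rows arrive as JSON on stdin; transformed rows leave as JSON on stdout.
    def run(rows: list[dict]) -> list[dict]:
        completed = subprocess.run(
            command, input=json.dumps(rows), capture_output=True, text=True, check=True
        )
        return json.loads(completed.stdout)
    return run

def run_workflow(rows: list[dict], steps: list[Step]) -> list[dict]:
    # The workflow neither knows nor cares which steps run outside the process
    for step in steps:
        rows = step(rows)
    return rows

# Stand-in for a compiled routine: a short script executed by a separate interpreter
EXTERNAL_SCRIPT = (
    "import json, sys\n"
    "rows = json.load(sys.stdin)\n"
    "for r in rows: r['long_name'] = len(r['name']) > 3\n"
    "json.dump(rows, sys.stdout)\n"
)
workflow = [uppercase_names, external_step([sys.executable, "-c", EXTERNAL_SCRIPT])]
output = run_workflow([{"name": "ann"}, {"name": "maria"}], workflow)
```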

Illustration S4. Data integration platforms integrate data cleansing routines into an ETL graphical workflow. [Screenshot: “Integrating Cleansing and ETL.”]


Reliable Operations and Robust Administration

No matter how much horsepower a system possesses, if its runtime environment is unstable, overall performance suffers. In our survey, 86 percent of respondents rated “reliability” as a “very important” ETL feature. (See Illustration 13.) This was the highest rating of any ETL feature respondents were asked to evaluate.

Robust Schedulers. Data integration platforms provide much more robust operational and administrative features than the current generation of ETL tools. For example, they provide robust schedulers that orchestrate the flow of inter-dependent jobs (both ETL and non-ETL) and run in a lights-out environment. Or they interface with a third-party scheduling tool so the organization can manage both ETL and non-ETL jobs within the same environment in a highly granular fashion.
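Dependency-driven scheduling of this kind is essentially a topological ordering of a job graph. This hypothetical sketch uses Python’s standard `graphlib.TopologicalSorter` (Python 3.9+) to release each job only after all its prerequisites have finished. The job names are invented. A production scheduler would add calendars, retries, alerting, and parallel dispatch.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job lists the jobs it depends on (ETL and non-ETL mixed)
JOBS = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_dim_customer": {"extract_customers"},
    "load_fact_orders": {"extract_orders", "load_dim_customer"},
    "refresh_reports": {"load_fact_orders"},  # a non-ETL job downstream of the load
}

def run_schedule(jobs: dict[str, set[str]], run_job) -> list[str]:
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises CycleError if the dependencies form a loop
    completed = []
    while sorter.is_active():
        # Every job returned here has all prerequisites finished; a real scheduler
        # could dispatch them in parallel, while this sketch runs them one by one
        for job in sorted(sorter.get_ready()):
            run_job(job)
            completed.append(job)
            sorter.done(job)
    return completed

order = run_schedule(JOBS, run_job=lambda job: None)  # replace the lambda with real job launches
```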

Conditional Execution. Job processing is more reliable because data integration platforms are “smart” enough to perform conditional execution based on thresholds, content, or inter-process dependencies. This reduces the amount of code needed to prevent outages, especially as BI environments become more complex.
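Below is a minimal sketch of threshold-based conditional execution. Before a load step runs, the workflow checks the batch and holds it in either of two cases:
- the volume is suspiciously low, which may signal a failed or partial upstream job;
- the reject ratio is too high.

The thresholds (50 percent of expected volume, 2 percent rejects) and all names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class BatchStats:
    rows_extracted: int
    rows_rejected: int

def next_action(stats: BatchStats, expected_rows: int,
                min_volume_ratio: float = 0.5, max_reject_ratio: float = 0.02) -> str:
    # Guard first so the ratio check below never divides by zero
    if stats.rows_extracted == 0:
        return "hold: empty extract"
    # Threshold 1: an unusually small extract suggests a failed or partial upstream job
    if stats.rows_extracted < expected_rows * min_volume_ratio:
        return "hold: volume below threshold"
    # Threshold 2: too many rejected rows suggests a data quality problem in the source
    if stats.rows_rejected / stats.rows_extracted > max_reject_ratio:
        return "hold: reject ratio above threshold"
    return "load"
```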

Debugging and Error Recovery. In addition, data integration platforms provide robust debugging and error recovery capabilities that minimize how much code developers have to write to recover from errors in the design or runtime environments. These tools also deliver reports and diagnostics that are easy to understand and actionable. Instead of logging what happened and when, users want the tools to say why it happened and recommend fixes.

Enterprise Consoles. Surprisingly, only 18 percent of companies said a single ETL operations console and graphical monitoring was “very important.” (See Illustration 15.) This indicates that most organizations do not yet need to manage multiple ETL servers. This will change as ETL usage grows and organizations decide to standardize the scheduling and management of ETL processes across the enterprise.

We suspect most organizations will use in-house enterprise systems management tools, such as Computer Associates’ Unicenter, to manage distributed ETL systems.

Diverse Source and Target Systems

Intelligent Adapters. The mark of a good data integration platform is the number and diversity of source and target systems it supports. Besides supporting relational databases and flat files, which most ETL tools support today, data integration platforms provide adapters to intelligently connect to a variety of complex and unique data stores.

Application Interfaces and Data Types. For example, operational application packages, such as SAP’s R/3, contain unique data types, such as pooled and clustered tables, as well as several different application and data interfaces. Data integration platforms offer native support for these interfaces and can intelligently handle unique data types.

Web Services and XML. Data integration platforms also provide ample support for the Web world. We expect XML and Web Services to constitute an ever greater portion of the processing that ETL tools perform. XML is becoming a standard means of interchanging both data and meta data. Web Services (which use XML) are becoming the standard method for enabling heterogeneous applications to interact over the Internet. Not surprisingly, almost half (47 percent) of survey respondents said Web Services were an “important” or “fairly important” feature for ETL tools to possess. It’s likely that Web Services and XML will become a de facto method for streaming XML data from operational systems to ETL tools and from data warehouses to reporting and analytical tools and applications.
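The sketch below gives a sense of what consuming an XML feed involves. It streams order records with Python’s standard `xml.etree.ElementTree.iterparse` and turns each `<order>` element into a row for an ETL workflow. The feed layout and the tag and attribute names are hypothetical. In practice, such a document might arrive in a Web Services response rather than as an in-memory string.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator

# Hypothetical feed from an operational system; tag and attribute names are illustrative
FEED = b"""<orders>
  <order id="1001"><customer>ACME</customer><amount currency="USD">250.00</amount></order>
  <order id="1002"><customer>Globex</customer><amount currency="EUR">99.50</amount></order>
</orders>"""

def stream_orders(source) -> Iterator[dict]:
    # iterparse reads the document incrementally, so large feeds need not fit in memory
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "order":
            amount = elem.find("amount")
            yield {
                "order_id": int(elem.get("id")),
                "customer": elem.findtext("customer"),
                "amount": float(amount.text),
                "currency": amount.get("currency"),
            }
            elem.clear()  # discard the processed element's contents to keep memory flat

rows = list(stream_orders(io.BytesIO(FEED)))
```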

Desktop Data Stores. Also, data integration platforms support pervasive desktop applications, such as Microsoft Excel and Access. By connecting to these sources via XML and Web Services, data integration platforms enable organizations to halt the growth of “spreadmarts” and the organizational infighting they create.

Adapter Development Kits. Finally, data integration platforms provide adapter development kits that let developers create custom adapters to link to unique or uncommon data sources or applications.

Update and Capture Utilities 

To meet business requirements for more timely data and negotiate shrinking batch windows, data integration platforms support a variety of data capture and loading techniques.

Incremental Updates. First, data integration platforms update data warehouses incrementally instead of rebuilding or “refreshing” them (that is, their dimension tables) from scratch every time. More than one-third of our survey respondents (34 percent) said incremental update is a “very important” feature in ETL tools. (See Illustration 14.)

Change Data Capture. To incrementally update a data warehouse, ETL tools need to detect and download only the changes that occurred in the source system since the last load, a process called “change data capture.” (Sometimes, the source system has a facility that identifies changes since the last extract and feeds them to the ETL tool.) The ETL tool then uses SQL update, insert, and delete commands to apply the changed data to the target tables.

This process is more efficient than “bulk refreshes,” in which the ETL tool completely rebuilds all dimension tables from scratch and appends new transactions to the fact table in a data warehouse. More than a third (38 percent) of our respondents said that change data capture was a “very important” feature in ETL tools. (See Illustration 14.)
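The sketch below illustrates change data capture by snapshot comparison. This is one simple technique; log-based and timestamp-based capture are common alternatives. It then applies only the changes with SQL INSERT, UPDATE, and DELETE statements, using Python’s built-in `sqlite3` module as a stand-in target. The `dim_customer` table and its columns are hypothetical. A real warehouse would often preserve history rather than delete dimension rows.

```python
import sqlite3

def diff_snapshots(previous: dict, current: dict):
    # Snapshot comparison: one simple way to capture changes since the last load
    inserts = {key: row for key, row in current.items() if key not in previous}
    updates = {key: row for key, row in current.items() if key in previous and previous[key] != row}
    deletes = [key for key in previous if key not in current]
    return inserts, updates, deletes

def apply_changes(conn: sqlite3.Connection, inserts: dict, updates: dict, deletes: list) -> None:
    # Apply only the changed rows, inside one transaction, instead of a full refresh
    with conn:  # commits on success, rolls back if any statement fails
        conn.executemany("INSERT INTO dim_customer (id, city) VALUES (?, ?)", list(inserts.items()))
        conn.executemany("UPDATE dim_customer SET city = ? WHERE id = ?",
                         [(city, key) for key, city in updates.items()])
        conn.executemany("DELETE FROM dim_customer WHERE id = ?", [(key,) for key in deletes])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, city TEXT)")
previous = {1: "Boston", 2: "Denver", 3: "Austin"}  # source state at the last load
with conn:
    conn.executemany("INSERT INTO dim_customer (id, city) VALUES (?, ?)", list(previous.items()))
current = {1: "Boston", 2: "Seattle", 4: "Miami"}   # source state today
inserts, updates, deletes = diff_snapshots(previous, current)
apply_changes(conn, inserts, updates, deletes)
warehouse_rows = conn.execute("SELECT id, city FROM dim_customer ORDER BY id").fetchall()
```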

More Frequent Loads. Another way ETL tools can minimize load times is to extract and load data at more frequent intervals (every hour or less), executing many small batch runs throughout the day instead of a single large one at night or on the weekends.

For example, Intuit Inc. uses a vendor ETL tool to continuously extract data every couple of minutes from an Oracle 11i source system and load it into the data warehouse, according to Intuit’s Hatchett. However, for its large nightly loads, it uses hardware clusters on its ETL server and load balancing to meet its ever-shrinking batch windows.

One way to accommodate continuous loads without interfering with user queries is for administrators to create mirror images of tables or data partitions. The ETL tool then loads the mirror image in the background while end users query the “active” table or partition in the foreground. Once the load is completed, administrators switch pointers from the original partition to the newly updated one.²

Bulk Loading. Data integration platforms leverage native bulk load utilities to “block slam” data into target databases rather than simply using SQL inserts and updates. Data integration platforms can update target tables by selectively applying SQL within a bulk load operation, something that today’s ETL tools can’t support.


² Administrators must make sure that the ETL tool or database does not update summary tables in the target database until a “quiet period” (usually at night), and users must be educated about the data’s volatility during the period when it’s being continuously updated.


Near-real-time Data Capture. Finally, organizations seek ETL tools that capture and load data into the data warehouse in near real time. This “trickle feed” process is needed because business users want integrated data delivered on a timelier basis (for example, as of the previous day, hour, or minute) so they can make critical tactical and operational decisions without delay.

Priceline.com, for example, uses middleware to capture source data changes into a staging area and then uses a vendor ETL tool to load them into the data warehouse. It has built a custom program to feed multiple sessions in parallel to its vendor ETL tool, but plans to upgrade soon to a new version that supports parallel processing.

EAI Software. As mentioned earlier in this report, most ETL tools today partner with EAI tools to deliver near-real-time capabilities. The EAI tools typically use application-level interfaces to peel off new transactions or events as soon as they are generated by the source application. They then deliver these events to any application that needs them either immediately (near real time), at predefined intervals (scheduled), or when the target application asks for them (publish and subscribe).
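The delivery modes described above can be sketched with a toy in-process broker:
- Push subscribers receive each event as soon as it is published.
- Pull subscribers accumulate events until they fetch them, either on a schedule or on request.

The class and method names are hypothetical and do not correspond to any EAI product’s API.

```python
from collections import defaultdict, deque
from typing import Callable

Event = dict

class EventBroker:
    """Toy in-process broker; EAI products add persistence, routing, and delivery guarantees."""

    def __init__(self) -> None:
        self._push: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self._pull: dict[str, dict[str, deque]] = defaultdict(dict)

    def subscribe_push(self, topic: str, callback: Callable[[Event], None]) -> None:
        self._push[topic].append(callback)

    def subscribe_pull(self, topic: str, subscriber: str) -> None:
        self._pull[topic][subscriber] = deque()

    def publish(self, topic: str, event: Event) -> None:
        for callback in self._push[topic]:
            callback(event)       # immediate delivery (near real time)
        for buffer in self._pull[topic].values():
            buffer.append(event)  # held until the subscriber asks for it

    def fetch(self, topic: str, subscriber: str) -> list[Event]:
        # Called on a schedule or on demand; drains everything buffered so far
        buffer = self._pull[topic][subscriber]
        events = list(buffer)
        buffer.clear()
        return events

broker = EventBroker()
pushed: list[Event] = []
broker.subscribe_push("orders", pushed.append)
broker.subscribe_pull("orders", "warehouse_loader")
broker.publish("orders", {"order_id": 1})
broker.publish("orders", {"order_id": 2})
batch = broker.fetch("orders", "warehouse_loader")
```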

Bi-directional, Peer-to-Peer Interfaces. Although EAI tools are associated with delivering real-time data, they support a variety of delivery methods. Also, in contrast to ETL tools, their application interfaces are typically bidirectional and peer-to-peer in nature, instead of unidirectional and hub-and-spoke. EAI software flows data and events back and forth among applications, each alternately serving as source or target.

Unlike most of today’s ETL tools, data integration platforms will blend the best of both ETL and EAI capabilities in a single toolset. Few organizations want to support dual data integration platforms; most would prefer to source ETL and EAI capabilities from a single vendor.

The upshot is that data integration platforms will process data in batch or near real time depending on business requirements. They will also support unidirectional and bidirectional flows of data and events among disparate applications and data stores using a hub-and-spoke or peer-to-peer architecture.

Global Meta Data Management

To optimize and coordinate the flow of data among applications and data stores, data integration platforms offer global meta data management services. The tools automatically document, exchange, and synchronize meta data among various applications and data stores in a BI environment.

Meta Data Repository and Interchange. To do this, the tools maintain a global repository of meta data. The repository stores meta data about ETL mappings and transformations. It may also contain relevant meta data from upstream or downstream systems, such as data modeling and analytical tools.

Common Warehouse Metamodel. More importantly, the tools exchange and synchronize meta data with other tools via a robust meta data interchange interface, such as the Object Management Group’s Common Warehouse Metamodel (CWM), which is based on XML.

According to our survey, 48 percent of respondents rated “openness” and 41 percent rated “meta data interface” as “very important” design features. (See Illustration 12.) Although some ETL vendors recently announced support for CWM and although the specifications are considered robust, it’s still too early to tell whether these standards will gain substantial momentum to facilitate widespread meta data integration.


Impact Analysis Reports. Data integration platforms also generate impact analysis reports that analyze the dependencies among elements used by all systems in a BI environment. The reports identify objects that may be affected by a change, such as adding a column, revising a calculation, or changing a data type or attribute.

Such reports “reduce the amount of time and effort required to [manage changes] and reduce the need for custom work,” wrote one survey respondent. Moreover, when appropriate, the tools automatically update each other with such changes without IT intervention. (See Illustration S5 for a sample impact analysis report.) Thirty-three percent of respondents said impact analysis reports are a “very important” feature.

Data Lineage. In addition, data integration platforms offer “data lineage” reports that describe or depict the origins of a data element or report and the business rules and transformations that were applied to it as it moved from a source system to the data warehouse or a data mart.

Data lineage reports help business users understand the nature of the data they are analyzing.
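Impact analysis and data lineage can both be computed from the same meta data if the repository records which element is derived from which, and by what rule. In this hypothetical sketch, the graph can be walked in two directions:
- Forward, it lists everything a change could affect (impact analysis).
- Backward, it reconstructs an element’s origins and the rules applied along the way (data lineage).

Element names and rules are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical repository contents: (source element, derived element, rule applied)
EDGES = [
    ("crm.customer.zip", "stage.customer.zip", "trim and left-pad to 5 digits"),
    ("stage.customer.zip", "dw.dim_customer.zip", "direct copy"),
    ("dw.dim_customer.zip", "mart.sales_by_region", "join on zip to region lookup"),
    ("erp.orders.amount", "dw.fact_sales.amount", "convert to USD"),
    ("dw.fact_sales.amount", "mart.sales_by_region", "sum by region"),
]

def build_graph(edges):
    forward, backward = defaultdict(list), defaultdict(list)
    for source, target, rule in edges:
        forward[source].append(target)
        backward[target].append((source, rule))
    return forward, backward

def impact_analysis(forward, element):
    # Breadth-first walk downstream: every object a change to `element` could affect
    affected, queue = [], deque([element])
    while queue:
        for dependent in forward[queue.popleft()]:
            if dependent not in affected:
                affected.append(dependent)
                queue.append(dependent)
    return affected

def lineage(backward, element, depth=0):
    # Recursive walk upstream (assumes no cycles): origins plus the rule used at each hop
    lines = []
    for source, rule in backward[element]:
        lines.append("  " * depth + f"{element} <- {source} ({rule})")
        lines.extend(lineage(backward, source, depth + 1))
    return lines

forward, backward = build_graph(EDGES)
affected = impact_analysis(forward, "crm.customer.zip")
trail = lineage(backward, "dw.dim_customer.zip")
```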

SUMMARY

Data integration platforms represent the next generation of ETL tools. These tools extend current ETL capabilities with enhanced performance, transformation power, reliability, administration, and meta data integration. The platforms also support a wider array of source and target systems and provide utilities to capture and load data more quickly and efficiently, including support for near-real-time data capture.

ETL Evaluation Criteria

Guidelines for Selecting ETL Products. It may take several years for major ETL tools to evolve into data integration platforms and support all the features described above. In the meantime, you may need to select an ETL tool or reevaluate the one you are using. If so, there are many things to consider. What follows is a detailed, but by no means comprehensive, list of evaluation criteria.

Illustration S5. ETL tools should quickly generate reports that clearly outline dependencies among data elements in a BI environment and highlight the impact of changes in source systems or downstream applications. [Sample: “Impact Analysis Report.”]


It’s important to remember that not all the criteria below may be applicable or important to your environment. The key is to first understand your needs and requirements and then build criteria from there. The list that follows can help trigger ideas for important features that you may need.

We’ve tried to provide descriptive attributes to qualify the requirements so you can understand the extent to which a tool meets the criteria. If asked, all vendors will say their tool can perform certain functions. It’s better to phrase questions in a way that lets you ascertain their true level of support.
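A weighted scoring matrix is a common way to turn graded answers into a comparison. The sketch below is illustrative only and is not the TDWI matrix mentioned in this section. The criteria, weights (summing to 1.0), tool names, and 0-to-5 ratings are all hypothetical, so the maximum possible score is 5.0.

```python
# Hypothetical criteria weights (summing to 1.0) and 0-5 ratings gathered from demos and references
WEIGHTS = {"reliability": 0.4, "performance": 0.3, "meta data interface": 0.2, "web services": 0.1}
RATINGS = {
    "Tool A": {"reliability": 4, "performance": 5, "meta data interface": 2, "web services": 3},
    "Tool B": {"reliability": 5, "performance": 3, "meta data interface": 4, "web services": 2},
}

def weighted_scores(weights: dict[str, float], ratings: dict[str, dict[str, int]]) -> dict[str, float]:
    # A criterion a vendor could not demonstrate counts as 0 rather than being skipped
    return {
        tool: round(sum(weight * scores.get(criterion, 0) for criterion, weight in weights.items()), 2)
        for tool, scores in ratings.items()
    }

scores = weighted_scores(WEIGHTS, RATINGS)
```

Here Tool B edges out Tool A (3.9 versus 3.8) because reliability carries the most weight, even though Tool A rates higher on performance.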

Available Resources

ETL Matrix. The evaluation list below is a summary version of a matrix document that TDWI Members can access at the following URL: www.dw-institute.com/etlmatrix/. Another set of criteria is available at: www.clickstreamdatawarehousing.com/ClickstreamETLCriteria.htm.

ETL Product Guide. If you would like to view a complete list of ETL products, go to TDWI’s Marketplace Online (www.dw-institute.com/marketplace), click on the “Data Integration” category, and then select the “DW Mapping and Transformation” subcategory and related subcategories.

Courses on ETL. Finally, TDWI also offers several courses on ETL, including TDWI Data Acquisition, TDWI Data Cleansing, and Evaluating and Selecting ETL and Data Cleansing Tools. See our latest conference and seminar catalogs for more details (www.dw-institute.com/education/).

Vendor Attributes 

Before examining tools, evaluate the company selling the ETL tool. You want to find out (1) whether the vendor is financially stable and likely to stay in business, and (2) how committed the vendor is to its ETL toolset and the ETL market. Asking these questions can quickly shorten your list of vendors and the time spent analyzing products.

For more details, ask questions in the following areas:

• Mission and Focus
  o Primary focus of company
  o Percent of revenues and profits from ETL
  o ETL market and product strategy
• Maturity and Scale
  o Years in business. Public or private?
  o Number of employees? Salespeople? Distributors? VARs?
• Financial Health
  o Year-over-year trends in revenues, profits, stock price
  o Cash reserves; debt
• Financial Indicators
  o License-to-service revenues
  o Percent of revenues spent on R&D
• Relationships
  o Number of distributors, OEMs, VARs

Focus on Your Requirements

How Stable Is the Vendor?

Overall Product Considerations

Before diving in and evaluating an ETL tool’s features and functions, make sure you have the right mindset. The first principle of tool selection is to question and verify everything. “It’s never as easy as the vendor says,” writes a BI professional from a major electronics retailer. “Verify everything they say down to the last turn of the screw before the sale is complete and the software license agreement is signed.”

Platforms. After adopting a skeptical attitude, step back and see whether the product meets your overall needs in key areas. Topping the list is whether the product runs on the platforms you have in house and works with your existing tools and utilities (e.g., security, schedulers, analytical tools, data modeling tools, and source systems).

Full Lifecycle Support. Second, find out whether the product supports all ETL processes and provides full lifecycle support for the development, testing, and production of ETL programs. You don’t want a partial solution. Also, make sure it supports the preferred development style of your ETL developers, whether that’s procedural, object-oriented, or GUI-driven.

Requisite Performance and Functionality. Finally, make sure the tool provides adequate performance and supports the transformations you need. The only way to validate this is to have vendors perform a proof of concept (POC) with your data. Although a POC may take several days to set up and run, it is the only reliable way to determine whether the tool will meet your business and technical requirements and is worth buying. Never pay a vendor to perform a POC, and never agree in advance to purchase the product if the vendor completes the POC successfully.

“It’s really important to take these tools for a test drive with your data,” says EDS Canada’s Light. Mimno adds, “A proof of concept exposes how different the tools really are; this is something you’ll never ascertain from evaluations on paper only.”

Design Features 

Ease of Use. Most BI teams would like vendors to make ETL tools easier to learn and use. A graphical development environment can enhance the usability of an ETL product and shorten its steep learning curve. For example, almost all survey respondents (84 percent) said a visual mapping interface was either a “very important” or “fairly important” design feature.

[Chart: Ratings of Importance of Design, Transformation, and Meta Data Features (percent of respondents). Features rated: visual mapping interface; single logical meta data store; meta data interfaces; ease of use; openness; debugging; meta data reports (e.g., impact analysis); transformation language and power.]

Illustration 12. Ease of use is rated as a “very important” design feature. Based on 746 respondents.

You Don’t Want a Partial Solution

Good ETL tools save developers time by letting them combine and reuse objects, tasks, and processes in various workflows. Workflows are visual and interactive, letting developers focus on one step at a time. Wizards, cut-and-paste features, and interactive debuggers that don’t require developers to write code also speed development.

More specifically, ask whether the ETL tool provides:

• An integrated, graphical development environment
• Interactive workflow diagrams to depict complex flows of jobs, processes, tasks, and data
• A graphical drag-and-drop interface for mapping source and target data
• The ability to combine multiple jobs or tasks into a “container” object that can be visually depicted and opened up in a workflow diagram
• Reuse of containers and custom objects in multiple workflows and projects
• A procedural or object-oriented development environment
• A scripting language to write custom transformation objects and routines
• Exits to call external code or routines on the same or different platforms
• Version control with check-in and check-out
• An audit trail that tracks changes to every process, job, task, and object

Meta Data Management Features

As the hub of a data warehousing or data integration effort, an ETL tool is well positioned to capture and manage meta data from many tools. Today, most ETL tools automatically document information about their development and runtime processes in a meta data repository (e.g., relational tables or a proprietary engine), which can be accessed via query tools, a documented API, or a Web browser.

The best ETL tools import and export technical and business meta data with targeted upstream and downstream systems in the BI environment. They also generate impact analysis and data lineage reports across these tools. Most vendors should be working on ways to automate the exchange of meta data using the Common Warehouse Metamodel (CWM) and Web Services standards and to keep heterogeneous BI systems in sync.

For more details on meta data, ask whether an ETL product provides:

• Self-documenting meta data for all design and runtime processes
• Support for technical, business, and operational meta data
• A rich repository for storing meta data
• A hierarchical repository that can manage and synchronize multiple local or globally distributed meta data repositories
• The ability to reverse-engineer source meta data, including flat files and spreadsheets
• A robust interface to interchange meta data with third-party tools
• Robust impact analysis reports
• Data lineage reports that show the origin and derivation of an object
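Impact analysis and data lineage reports are, at bottom, traversals of a dependency graph in opposite directions: impact analysis follows edges downstream from a changed element, while lineage follows them upstream to the origins. The sketch below assumes a hypothetical, hand-built list of (source, derived) edges with made-up element names; an ETL tool would instead derive such edges from the mappings stored in its meta data repository.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source element, element derived from it).
EDGES = [
    ("crm.customer.zip", "stage.customer.zip"),
    ("stage.customer.zip", "dw.dim_customer.region"),
    ("dw.dim_customer.region", "report.sales_by_region"),
    ("erp.orders.amount", "dw.fact_sales.amount"),
    ("dw.fact_sales.amount", "report.sales_by_region"),
]


def build_graphs(edges):
    """Index the edges in both directions."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(graph, start):
    """Breadth-first search: every element reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_graphs(EDGES)
# Impact analysis: what is affected if the CRM zip code column changes?
print(sorted(reachable(downstream, "crm.customer.zip")))
# Data lineage: where does the sales-by-region report get its data?
print(sorted(reachable(upstream, "report.sales_by_region")))
```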

Transformation Features 

Extensible, Reusable, Context-Independent Objects. The best ETL tools provide a rich library of transformation objects that can be extended and combined to create new, context-independent, reusable objects. The tools also provide an internal scripting language and transparent support for calls to external routines.

Evaluate the Reality versus Promise of a Vendor’s Meta Data Strategy

To learn about transformation features, ask whether the tool provides:

• A rich library of base transformation objects
• The ability to extend base objects
• Context-independent objects or containers
• Record- and set-level processing logic and transforms
• Generation of surrogate keys in memory in a single pass
• Support for multiple types of transformation and mapping functions
• The ability to create temporary files, if needed for complex transformations
• The ability to perform recursive processing or loops
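Generating surrogate keys “in memory in a single pass” can be pictured as follows: the tool looks up each incoming natural key in an in-memory map and assigns the next integer key the first time it sees a new one, without a separate lookup pass against the warehouse. The sketch below assumes the existing key assignments for the dimension fit in memory and are passed in as a dictionary; the function and column names are hypothetical.

```python
def assign_surrogate_keys(rows, natural_key, sk_column="sk", key_map=None):
    """Add a surrogate key column to each row in a single pass.

    key_map holds existing natural-key -> surrogate-key assignments (for
    example, loaded from the dimension table before the run). New natural
    keys receive the next unused integer. Returns (new_rows, updated_map).
    """
    key_map = dict(key_map or {})  # copy so the caller's map is not mutated
    next_key = max(key_map.values(), default=0) + 1
    out = []
    for row in rows:
        nk = row[natural_key]
        if nk not in key_map:
            key_map[nk] = next_key
            next_key += 1
        out.append({**row, sk_column: key_map[nk]})
    return out, key_map


existing = {"C001": 1}
incoming = [{"cust_id": c} for c in ["C002", "C001", "C002", "C003"]]
rows, key_map = assign_surrogate_keys(incoming, "cust_id", "customer_sk", existing)
print([r["customer_sk"] for r in rows])  # [2, 1, 2, 3]
```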

Data Quality Features

The best ETL tools provide data profiling facilities and can spawn data cleansing routines from within ETL workflows. To get details on data profiling and cleansing facilities, ask whether the ETL product:

• Provides internal or third-party data profiling and cleansing tools
• Integrates cleansing tasks into visual workflows and diagrams
• Enables profiling, cleansing, and ETL tools to exchange data and meta data
• Automatically generates rules for ETL tools to build mappings that detect and fix data defects
• Profiles formats, dependencies, and values of source data files
• Parses and standardizes records and fields to a common format
• Verifies record and field contents against external reference data
• Matches and consolidates file records
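Parsing, standardizing, and matching work together: records can only be matched reliably once their fields share a common format. The sketch below shows the idea with deliberately simple, hypothetical rules (whitespace and case normalization, a tiny street-suffix table, five-digit ZIP codes) and exact matching on the standardized fields, keeping the first record in each group as the survivor. Commercial cleansing tools rely on reference data and fuzzy matching instead.

```python
import re

# Hypothetical standardization table; real tools use postal reference data.
SUFFIXES = {"street": "st", "st.": "st", "avenue": "ave", "ave.": "ave"}


def standardize(record):
    """Return the match fields of `record` in a common format."""
    name = " ".join(record["name"].lower().split())
    tokens = record["address"].lower().replace(",", " ").split()
    address = " ".join(SUFFIXES.get(t, t) for t in tokens)
    zip5 = re.sub(r"\D", "", record["zip"])[:5]  # keep digits, first five only
    return {"name": name, "address": address, "zip": zip5}


def match_and_consolidate(records):
    """Group records whose standardized fields agree exactly; keep the first of each group."""
    groups = {}
    for rec in records:
        std = standardize(rec)
        key = (std["name"], std["address"], std["zip"])
        groups.setdefault(key, []).append(rec)
    return [{"survivor": members[0], "duplicates": len(members) - 1}
            for members in groups.values()]


records = [
    {"name": "Jane  Doe", "address": "12 Main Street", "zip": "98188-1234"},
    {"name": "jane doe", "address": "12 Main St.", "zip": "98188"},
    {"name": "John Roe", "address": "5 Oak Avenue", "zip": "94063"},
]
for group in match_and_consolidate(records):
    print(group["survivor"]["name"], group["duplicates"])
```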

Performance Features 

Performance features received some of the highest ratings in our survey. (See Illustration 13.) As data volumes increase, batch windows shrink, and users require more timely data, teams are looking to ETL tools to make up the difference.

The best ETL tools offer a parallel processing engine that runs on a high-performance computing server. It leverages threads, clusters, load balancing, and other facilities to speed throughput and ensure adequate scalability.

[Chart: Important Performance Features: Rating (percent of respondents). Features rated: parallel processing; real-time data capture; incremental update; reliability; availability; scalability; performance and throughput.]

Illustration 13. Reliability and performance/throughput were rated as “very important” features by more than 80 percent of 750 survey respondents.

To learn more about performance features, ask whether the product provides:

• A flexible, high-performance way to process and transform source data
• Intelligent management of aggregate/summary tables
  o Bundled with product or extra priced option
• High-speed processing using multiple concurrent workflows, multiple hardware processors, system threads, and clustered servers
• In-memory cache to avoid creating intermediate temporary files
• Linear performance improvement when adding processors and memory

Extract and Capture Features 

Leading ETL tools provide native support for a wide range of source databases, a top requirement among our survey respondents. (See Illustration 14.) The lowest common denominator is support for ODBC/JDBC, but make sure the tool can directly access the sources you use, preferably at no additional cost and without a third-party gateway.

Users also emphasized the importance of extracting and integrating data from multiple source systems. “The ability to use an ETL tool to extract data from three sources and combine them together is worth the effort compared to writing the routine in something like PL/SQL,” says American Eagle’s Koscinski.

To understand extract capabilities, ask whether the tool provides:

• The ability to schedule extracts by time, interval, or event
• Robust adapters that extract data from various sources
• A development framework for creating custom data adapters
• A robust set of rules for selecting source data
• Selection, mapping, and merging of records from multiple source files
• A facility to capture only changes in source files
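Where a source offers no change log or timestamps, one simple way to “capture only changes” is to compare the current full extract with the previous one by key. The sketch below shows that snapshot-comparison technique; it is not how log-based change data capture works (that approach reads the database’s transaction log), and the `id` key column and sample rows are assumptions.

```python
def diff_snapshots(previous, current, key):
    """Compare two full extracts (lists of dicts) by primary key.

    Returns (inserts, updates, deletes), each sorted by key, so only the
    changed rows need to flow through the rest of the ETL job.
    """
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    inserts = [curr[k] for k in sorted(curr.keys() - prev.keys())]
    deletes = [prev[k] for k in sorted(prev.keys() - curr.keys())]
    updates = [curr[k] for k in sorted(curr.keys() & prev.keys()) if curr[k] != prev[k]]
    return inserts, updates, deletes


yesterday = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}, {"id": 3, "status": "new"}]
today = [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}, {"id": 4, "status": "new"}]
inserts, updates, deletes = diff_snapshots(yesterday, today, key="id")
print(inserts)  # id 4 is new
print(updates)  # id 2 changed status
print(deletes)  # id 3 disappeared
```

Comparing full snapshots still requires extracting the whole source, so it saves transformation and load work rather than extract work; that trade-off is one reason to ask vendors which capture techniques they support.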

[Chart: Extract and Load Features: User Ratings (percent of respondents). Features rated: support for native load utilities; Web services support; real-time data capture; breadth of sources; incremental update; breadth of targets supported; change data capture.]

Illustration 14. Users want an ETL tool to support a wide range of sources. Based on 745 respondents who rated the above items as “very important.”

Load and Update Features

To understand load and update features, ask whether the product can:

• Load data into multiple types of target systems
• Load data into heterogeneous target systems in parallel
• Support data partitions on the target
• Both refresh and append data in the target database
• Incrementally update target data
• Support high-speed load utilities via an API
• Turn off referential integrity and drop indexes, if desired
• Take a snapshot of data after a load for recovery purposes, if desired
• Automatically generate DDL to create tables
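The difference between refreshing, appending, and incrementally updating a target can be shown against a small table. The sketch below uses Python’s built-in sqlite3 module and SQLite’s upsert syntax (`INSERT ... ON CONFLICT ... DO UPDATE`, available in SQLite 3.24 and later) as a stand-in for a warehouse; the `target` table, its columns, and the `load` function are hypothetical.

```python
import sqlite3

INSERT = "INSERT INTO target (id, name, amount) VALUES (?, ?, ?)"
UPSERT = INSERT + (
    " ON CONFLICT(id) DO UPDATE SET name = excluded.name, amount = excluded.amount"
)


def load(conn, rows, mode):
    """Load (id, name, amount) tuples into the hypothetical `target` table.

    refresh     - replace the table's contents with `rows`
    append      - add `rows` (ids must be new)
    incremental - insert new ids and update existing ones (upsert)
    """
    with conn:  # one transaction: commit on success, roll back on error
        if mode == "refresh":
            conn.execute("DELETE FROM target")
            conn.executemany(INSERT, rows)
        elif mode == "append":
            conn.executemany(INSERT, rows)
        elif mode == "incremental":
            conn.executemany(UPSERT, rows)
        else:
            raise ValueError(f"unknown load mode: {mode}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
load(conn, [(1, "a", 10.0), (2, "b", 20.0)], "refresh")
load(conn, [(2, "b", 25.0), (3, "c", 30.0)], "incremental")
load(conn, [(4, "d", 40.0)], "append")
print(conn.execute("SELECT * FROM target ORDER BY id").fetchall())
```

Wrapping each load in a single transaction means a failed load leaves the target as it was, which simplifies recovery.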

Operate and Administer Component 

The ultimate test of an ETL product is how well it runs the programs that ETL developers design and how well it recovers from errors. Thus, monitoring and managing the ETL runtime environment is a critical requirement. Our survey respondents rated “error reporting and recovery” especially high (79 percent), with “scheduling” not far behind (66 percent). (See Illustration 15.)

For detailed information on administrative features, ask whether the product provides:

• A visual console for managing and monitoring ETL processes
• A robust, graphical scheduler to define job schedules and dependencies
• The ability to validate jobs before running them
• A command line or application interface to control and run jobs
• Robust graphical monitoring in near real time
• The ability to restart from checkpoints without losing data
• MAPI-compliant notification of errors and job statistics
• Built-in logic for defining and managing rejected records
• A robust set of easy-to-read, useful administrative reports
• A detailed job or audit log that records all activity and events in the job
• Rigorous security controls, including links to LDAP and other directories
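Restarting “from checkpoints without losing data” generally means the job records how far it got after each completed unit of work and, on restart, resumes from that point instead of from the beginning. The sketch below keeps a hypothetical JSON checkpoint file and writes it with an atomic rename; it assumes the `process` callback is idempotent, because a crash between processing a batch and saving the checkpoint means that batch runs again. The `flaky_load` loader is invented for the demonstration.

```python
import json
import os
import tempfile


def run_with_checkpoints(records, process, checkpoint_path, batch_size=100):
    """Process `records` in batches, resuming after the last completed batch.

    The checkpoint file stores the index of the next unprocessed record.
    It is removed when the run finishes, so the next scheduled run starts fresh.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(records), batch_size):
        batch = records[i:i + batch_size]
        process(batch)  # should be idempotent: it may be re-run after a crash
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + len(batch)}, f)
        os.replace(tmp_path, checkpoint_path)  # atomic swap: never a half-written file

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)


loaded = []
crash = {"enabled": True}


def flaky_load(batch):
    """Hypothetical loader that fails once, on the batch starting at record 4."""
    if crash["enabled"] and batch[0] == 4:
        raise RuntimeError("simulated failure")
    loaded.extend(batch)


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "job.ckpt")
    try:
        run_with_checkpoints(list(range(10)), flaky_load, ckpt, batch_size=2)
    except RuntimeError:
        print("failed after loading", loaded)  # [0, 1, 2, 3]
    crash["enabled"] = False
    run_with_checkpoints(list(range(10)), flaky_load, ckpt, batch_size=2)
    print("after restart", loaded)  # records 0-9, each loaded once
```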

Integrated Product Suites 

Many vendors today bundle ETL tools within a business intelligence or data management suite. Their goal is to offer a complete, integrated solution that provides more value and appeals to more organizations.

[Chart: Rating of Administration Features (percent of respondents). Features rated: single operations console; graphical monitoring; error reporting and recovery; single logical meta data store; debugging; scheduling.]

Illustration 15. Error reporting leads all administration features, according to 745 respondents who rated the above items as “very important.”

To drill into add-on products, ask:

• Is the package geared to a specific source application, or is it more generic?
• How integrated is the ETL tool with other tools in the suite?
  o Do they have a common look and feel?
  o Do they share meta data in an automated, synchronized fashion?
  o Do they share a common install and management system?

Company Services and Pricing

As mentioned earlier in this report, many user organizations find the price of ETL tools too high. Almost three-quarters of survey respondents (71 percent) consider pricing among the top three factors affecting their decision to buy a vendor ETL tool. (See Illustration 16.)

Bundles versus Components. Users pay close attention to vendor packaging options. Vendors that offer all-in-one packages at high list prices can price themselves out of contention. Vendors that break their packages into components, so users can buy only what they need, are more attractive to users who want to minimize their initial cost outlays. However, runtime charges can ruin the equation if a vendor charges the customer each time a component runs on a different platform.
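The arithmetic behind that warning is easy to check during an evaluation. The sketch below compares a bundled license with a componentized one over three years; every figure in it (license prices, an 18 percent annual maintenance rate, a one-time runtime fee per platform, five platforms) is hypothetical and chosen only to show how runtime charges can erase a lower initial price.

```python
def total_cost(license_fee, maintenance_pct, years, runtime_fee=0, platforms=0):
    """Hypothetical multi-year cost of ownership, in whole dollars.

    license_fee     - up-front license price
    maintenance_pct - annual maintenance as a percentage of the license fee
    runtime_fee     - assumed one-time charge for each platform a component runs on
    """
    maintenance = license_fee * maintenance_pct * years // 100
    return license_fee + maintenance + runtime_fee * platforms


bundle = total_cost(200_000, 18, years=3)
components = total_cost(80_000, 18, years=3, runtime_fee=40_000, platforms=5)
print(bundle, components)  # 308000 323200
```

With two platforms instead of five, the componentized option would cost 203,200 and win easily; asking for the runtime fee schedule up front shows where the crossover lies.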

More specifically, ask about:

• Pricing for a four-CPU system on Windows and UNIX
• Annual maintenance fees
• Level of support, training, consulting, and documentation provided
• Pricing for add-on software and interfaces required to run the product
• Annual user conferences
• Cost of moving from a single server to multiple servers
• User group availability
• Online discussion groups

[Chart: How Does Pricing Affect Your ETL Purchasing Decision?]
• A lot. It’s a top factor. (10%)
• Significantly. Among the top 3 factors. (61%)
• Somewhat. An important but not critical factor. (26%)
• Not much. It is a requirement for doing business. (4%)

Illustration 16. Pricing is a significant factor affecting purchasing decisions. Based on 621 respondents.

Conclusion

ETL tools are the traffic cop for business intelligence applications. They control the flow of data between myriad source systems and BI applications. As BI environments expand and grow more complex, ETL tools need to change to keep pace.

Next Generation Functionality. Today, organizations need to move more data from more sources more quickly into a variety of distributed BI applications. To manage this data flow, ETL tools need to evolve from batch-oriented, single-threaded processing that extracts and loads data in bulk to continuous, parallel processes that capture data in near real time. They also need to provide enhanced administration and ensure reliable operations and high availability.

As the traffic cop of the BI environment, ETL tools also need to connect to a wider variety of systems and data stores (notably XML, Web Services-based data sources, and spreadsheets) and coordinate the exchange of information among these systems via a global meta data management system.

To simplify and speed the development of complex designs, the tools need a visual work environment that allows developers to create and reuse custom transformation objects.

Finally, ETL tools need to deliver a larger part of the BI solution by integrating data cleansing and profiling capabilities, EAI functionality, toolkits for building custom adapters, and possibly reporting and analysis tools and analytic applications.

This is a huge list of new functionality. The sum total represents a next-generation ETL product, or a data integration platform. It will take ETL vendors several years to extend their products to support these areas, but many are well on their way.

Focus on Your Business Requirements. The key is to understand your current and future ETL processing needs and then identify the product features and functions that support those needs. With this knowledge, you can then identify one or more products that can adequately meet your ETL and BI needs.

ETL Tools Need to Keep Pace

It Will Take ETL Vendors Several Years to Deliver Data Integration Platforms

Business Objects

3030 Orchard Parkway

San Jose, CA 95134

408.953.6000

Web: www.businessobjects.com

Business Objects is the world's leading provider of business intelligence (BI) solutions. Business intelligence lets organizations access, analyze, and share information internally with employees and externally with customers, suppliers, and partners. It helps organizations improve operational efficiency, build profitable customer relationships, and develop differentiated product offerings.

The company's products include data integration tools, the industry's leading integrated business intelligence platform, and a suite of enterprise analytic applications. Business Objects is the first to offer a complete BI solution that is composed of best-of-breed components, giving organizations the means to deploy end-to-end BI to the enterprise, from data extraction to analytic applications.

Business Objects has more than 17,000 customers in over 80 countries. The company's stock is publicly traded under the ticker symbols NASDAQ:BOBJ and Euronext Paris (Euroclear code 12074). It is included in the SBF 120 and IT CAC 50 French stock market indexes.

DataMirror

3100 Steeles Avenue East,

Suite 1100

Markham, Ontario L3R 8T3

Canada

905.415.0310

Fax: 905.415.0340

Web: www.datamirror.com

DataMirror (Nasdaq:DMCX; TSX:DMC), a leading provider of enterprise application integration and resiliency solutions, gives companies the power to manage, monitor, and protect their corporate data in real time. DataMirror’s comprehensive family of LiveBusiness™ solutions enables customers to easily and cost effectively capture, transform, and flow data throughout the enterprise. DataMirror unlocks the experience of now™ by providing the instant data access, integration, and availability companies require today across all computers in their business.

1,700 companies have gone live with DataMirror software, including Debenhams, Energis, GMAC Commercial Mortgage, the London Stock Exchange, OshKosh B’Gosh, Priority Health, Tiffany & Co., and Union Pacific Railroad.

Hummingbird Ltd.

1 Sparks Avenue

Toronto, Ontario M2H 2W1

Canada

Web: www.hummingbird.com

Headquartered in Toronto, Canada, Hummingbird Ltd. (NASDAQ:HUMC, TSE:HUM) is a global enterprise software company employing 1,300 people in nearly 40 offices around the world. Hummingbird’s revolutionary Hummingbird Enterprise™, an integrated information and knowledge management solution suite, manages the entire lifecycle of information and knowledge assets.

Hummingbird Enterprise creates a 360-degree view of all enterprise content, both structured and unstructured data, with a portfolio of products that are both modular and interoperable. Today, five million users rely on Hummingbird to connect, manage, access, publish, and search their enterprise content. For more information, please visit: http://www.hummingbird.com.

Informatica Corporation

Headquarters

2100 Seaport Boulevard

Redwood City, CA 94063

650.385.5000

Toll-free U.S.: 800.653.3871

Fax: 650.385.5500

Informatica offers the industry's only integrated business analytics suite, including the industry-leading enterprise data integration platform, a suite of packaged and domain-specific data warehouses, the only Internet-based business intelligence solution, and market-leading analytic applications. The Market Leading Data Integration Solution, comprising Informatica PowerCenter®, PowerMart®, and Informatica PowerConnect™ products, is the industry's leading data integration platform; it helps companies fully leverage and move data from virtually any corporate system into data warehouses, operational data stores, staging areas, or other analytical environments. Offering real-time performance, scalability, and extensibility, the Informatica data integration platform can handle the challenging and unique analytic requirements of even the largest enterprises.

SPONSORS

The Data Warehousing Institute

5200 Southcenter Blvd., Suite 250

Seattle, WA 98188

Local: 206.246.5059

Membership

As the data warehousing and business intelligence field continues to evolve and develop,

it is necessary for information technology professionals to connect and interact with one

another. TDWI provides these professionals with the opportunity to learn from each

other, network, share ideas, and respond as a collective whole to the challenges and

opportunities in the data warehousing and BI industry.

Through Membership with TDWI, these professionals make positive contributions to the industry and advance their professional development. TDWI Members benefit through increased knowledge of all the hottest trends in data warehousing and BI, which makes TDWI Members some of the most valuable professionals in the industry. TDWI Members are able to avoid common pitfalls, quickly learn data warehousing and BI fundamentals, and network with peers and industry experts to give their projects and companies a competitive edge in deploying data warehousing and BI solutions.

TDWI Membership includes more than 4,000 Members who are data warehousing and information technology (IT) professionals from Fortune 1000 corporations, consulting organizations, and governments in 45 countries. Benefits to Members from TDWI include:

• Quarterly Business Intelligence Journal
• Biweekly TDWI FlashPoint electronic bulletin
• Quarterly TDWI Member Newsletter
• Annual Data Warehousing Salary, Roles, and Responsibilities Report
• Quarterly Ten Mistakes to Avoid series
• TDWI Best Practices Awards summaries
• Semiannual What Works: Best Practices in Business Intelligence and Data Warehousing corporate case study compendium
• TDWI’s Marketplace Online comprehensive product and service guide
• Annual technology poster
• Periodic research report summaries
• Special discounts on all conferences and seminars
• Fifteen-percent discount on all publications and merchandise

Membership with TDWI is available to all data warehousing, BI, and IT professionals

for an annual fee of $245 ($295 outside the U.S.). TDWI also offers a Corporate

Membership for organizations that register 5 or more individuals as TDWI Members.

