
An Effective Data Integration: Strategy to Drive Innovation on the InfoSphere Platform Simon Tang...

Date post: 20-Jan-2016
Upload: lynn-perkins
Page 1: An Effective Data Integration: Strategy to Drive Innovation on the InfoSphere Platform Simon Tang InfoSphere Technical Manager IBM GCG.
Page 2

An Effective Data Integration: Strategy to Drive Innovation on the InfoSphere Platform

Simon Tang, InfoSphere Technical Manager, IBM GCG

Page 3

Pain Point: Understanding Core Information Assets

• “How can I see how this is used?” – Governance Steward
• “What systems will be impacted by this change?” – DBA
• “I’m not sure what the business wants.” – Developer
• “This data does not look right.” – Business User
• “I don’t have the information I need.” – Business Analyst
• “We are not leveraging our information.” – Architect

Page 4

Impact of NOT Managing Core Information Assets

• Inaccurate or incomplete data is a leading cause of failure in business-intelligence and CRM projects
• 83% of data integration projects either overrun or fail
• Low data quality costs companies $611 billion annually
• Defects that go undetected cost 10 to 100 times as much to fix once they have propagated downstream
• 25% of time is spent clarifying bad data

Consequences: lack of consumer confidence, lost opportunities, scrap and rework, increased $$$

Page 5

Who Is Looking for Trusted Information?

Target Audience
• Data/Business Analysts
• Subject Matter Experts
• Architects
• Governance Stewards

What are they working on? Information-centric projects:
• BI & Data Warehousing
• Master Data Management
• Application Implementation, Consolidation or Migration
• Information Architecture
• Governance Initiatives

What do these roles do today?
• Manage information manually in disconnected tools, documents, and spreadsheets

What is wrong with what they do today?
• Time-consuming – churn between business & IT
• Imprecise & error-prone – manual processes are not thorough enough
• No collaboration – different roles work in silos
• No audit trail – no ongoing record
• Redundancy – duplication of effort & storage

Trusted Information is:
1. Accurate
2. Complete
3. Insightful
4. Real-time

Page 6

A Flexible Platform for Managing, Integrating, Analyzing and Governing Information

[Diagram: Manage, Integrate, Analyze and Govern capabilities connecting transactional & collaborative applications, business analytics applications and external information sources with master data, content, data warehouses, cubes, big data, and streaming information (streams). Govern covers quality, security & privacy, and lifecycle.]

Page 7

Challenges in Data Management

• Inconsistent islands of information underlying applications
• Complex, manual & costly copy synchronization
• Inconsistent and poor-quality data
• Inability to exploit enterprise meta data across tools
• Touching data multiple times at its source – storing it multiple times and updating it multiple times
• Inability to share common business rules across projects, processes and applications
• Lack of a single, repeatable methodology for consistency across all projects

[Diagram: CRM, Order Processing, Supply Chain and Procurement applications.]

Page 8

Convert information into a trusted strategic asset

• Discover and understand the data across heterogeneous systems
• Design trusted information structures for business optimization
• Govern that information over time

Only IBM has invested to provide the breadth of capabilities to define and govern your information…

• Business Vocabulary
• Data Relationships
• Data Quality Compliance
• Data Models and Mapping
• Business Specification Rules
• Provenance of information

Page 9

Remedy: 10 Proven Strategies

• Consider where your organization’s most SIGNIFICANT data pain exists, and take that approach first
• No single path is THE panacea for all corporate data problems; multiple approaches must be employed

Page 10

Strategy #1 – Understand Source Systems

Business Analysis + Data Analysis:

1. Discover the actual characteristics of the data
2. Verify whether the characteristics of the data conform to established / known business rules
3. Report on the assessment and the variances / exceptions
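The three profiling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how InfoSphere Information Analyzer works internally: the sample rows, the column names `CUST_ID` and `CUST_TYPE`, and the rule "type codes must be 01 or 02" are all hypothetical. Real profilers also infer data types, frequency distributions and cross-column relationships.

```python
import re
from collections import Counter

def format_pattern(value):
    """Reduce a value to its shape: digits -> 9, letters -> A (e.g. '02-117' -> '99-999')."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile_column(values):
    """Step 1: discover the actual characteristics of one column."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": Counter(format_pattern(v) for v in non_null),
    }

def check_rule(values, rule):
    """Step 2: verify values against a known business rule; return the exceptions."""
    return [v for v in values if not rule(v)]

# Hypothetical sample data; the rule below is an assumption for illustration.
rows = [{"CUST_ID": "01-001", "CUST_TYPE": "01"},
        {"CUST_ID": "02-117", "CUST_TYPE": "02"},
        {"CUST_ID": "03-009", "CUST_TYPE": "03"},
        {"CUST_ID": "",       "CUST_TYPE": "02"}]

# Step 3: report the assessment and the variances / exceptions.
for column in ("CUST_ID", "CUST_TYPE"):
    print(column, profile_column([r[column] for r in rows]))
print("CUST_TYPE exceptions:",
      check_rule([r["CUST_TYPE"] for r in rows], lambda v: v in {"01", "02"}))
```

Format patterns are a cheap way to spot variance: a column that should hold one shape but shows several usually signals a data-entry or integration problem.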

Page 11

Strategy #1 – Understand Source Systems

• Poor data quality costs U.S. businesses over $600 billion each year
• Data deteriorates by up to 3% every month
• What is the key to integrating corporate data? Having the right data before you start

[Chart: data integration challenges rated on a 0–100 scale: ensuring adequate data quality; understanding source data; creating complex transformations; creating complex mappings; ensuring adequate performance; collecting and maintaining meta data; finding skilled programmers; providing access to meta data; ensuring adequate scalability; integrating 3rd-party tools; ensuring adequate reliability.]
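To see why “up to 3% every month” matters, compound it over a year. This sketch assumes the worst case (the full 3%) and assumes the decay applies each month to the records that are still accurate, so the accurate fraction after n months is 0.97^n; both assumptions are ours, not the slide’s.

```python
def accurate_fraction(monthly_decay, months):
    """Fraction of records still accurate after `months`, assuming compounding decay."""
    return (1 - monthly_decay) ** months

# Worst case from the slide: 3% per month, compounded over 12 months.
after_one_year = accurate_fraction(0.03, 12)
print(f"still accurate after 12 months: {after_one_year:.1%}")
print(f"degraded after 12 months: {1 - after_one_year:.1%}")
```

Under these assumptions roughly 31% of the records would be stale after a year, which is why profiling once at project start is not enough.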

Page 12

Recommended Best Practices: Automated Data Profiling

No coding

Advice: You won’t have the time, $ or energy to profile 100% quickly, so go automated.

[Diagram: column analysis, table & primary key analysis, and foreign key & duplicate analysis across Source 1 and Source 2.]
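Table, primary-key, foreign-key and duplicate analysis reduce to simple set checks, which is why they automate well. A minimal sketch with hypothetical data from two sources: a column is a primary-key candidate if its values are unique and never null; a foreign-key candidate if every value is found in the other source’s key column; duplicates are key values that occur more than once.

```python
from collections import Counter

def is_primary_key_candidate(values):
    """Unique and never null."""
    return None not in values and len(set(values)) == len(values)

def duplicates(values):
    """Values that occur more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)

def foreign_key_orphans(child_values, parent_keys):
    """Child values with no matching parent key (empty list => foreign-key candidate)."""
    parents = set(parent_keys)
    return sorted({v for v in child_values if v is not None and v not in parents})

# Hypothetical Source 1 (customer IDs) and Source 2 (customer IDs on orders).
customer_ids = ["C1", "C2", "C3", "C2"]
order_customer_ids = ["C1", "C3", "C9"]

print(is_primary_key_candidate(customer_ids))                 # False: C2 repeats
print(duplicates(customer_ids))                               # ['C2']
print(foreign_key_orphans(order_customer_ids, customer_ids))  # ['C9']
```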

Page 13

Strategy #2 – Build in Data Quality

• Same company / person?
• Same address?
• Same parts?
• Same instructions?

NAME | ADDRESS
IBM | 187 N. Pk. Str. Salem NH 01456
I.B.M. Inc. | 187 N. Pk. St. Sarem NH 01456
International Bus. M. | 187 No. Park St Salem NH 04156
Int. Bus. Machines | 187 Park Ave Salem NH 01456
Inter-Nation Consult. | 15 Main St. Andover MA 02341
Int. Bus. Consultants | PO Box 9 Boston MA 02210
I.B. Manufacturing | Park Blvd. Boston MA 04106

PART DESCRIPTION
WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT ¼ INCH
WING ASSEMBLY, USE 5J868-A HEX BOLT .25” – DRILL FOUR HOLES
USE 4 5J868A BOLTS (HEX .25) – DRILL HOLES FOR EA ON WING ASSEM
RUDER, TAP 6 HOLES, SECURE W/KL 2301 RIVETS (10 CM)

[Callouts: spelling errors; lack of standards in synonyms, acronyms, abbreviations; error codes?; assembly; part size; instruction.]
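A first step toward answering “same company?” for the names above is to standardize each name into a comparable key. This is a minimal sketch, assuming a hypothetical abbreviation table built just for this example (mapping “m” to “machines” would be far too aggressive in real data); it uses exact comparison of standardized keys, which is a simplification of the weighted, probabilistic matching production tools typically use.

```python
import re
from collections import defaultdict

# Hypothetical standardization tables, for this example only.
ABBREVIATIONS = {
    "ibm": ["international", "business", "machines"],
    "int": ["international"],
    "bus": ["business"],
    "m": ["machines"],
}
LEGAL_SUFFIXES = {"inc", "corp", "ltd"}

def name_key(name):
    """Standardize a company name into a comparable match key."""
    text = name.lower().replace(".", "")        # "I.B.M." -> "ibm"
    words = []
    for token in re.split(r"[\s\-]+", text):    # split on spaces and hyphens
        if not token or token in LEGAL_SUFFIXES:
            continue                            # drop empty tokens and "Inc."
        words.extend(ABBREVIATIONS.get(token, [token]))
    return " ".join(words)

def group_names(names):
    """Group names whose standardized keys are identical."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)

names = ["IBM", "I.B.M. Inc.", "International Bus. M.", "Int. Bus. Machines",
         "Inter-Nation Consult.", "Int. Bus. Consultants", "I.B. Manufacturing"]
for key, members in group_names(names).items():
    print(f"{key!r}: {members}")
```

The first four names collapse into one group, while “Int. Bus. Consultants” stays separate even though it shares two of three words; deciding such near-misses, and weighing the addresses as well, is where a real matching engine earns its keep.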

Page 14

Recommended Best Practices: Data Cleansing

Data Re-Engineering: Original → Standardize → Match → Survive → Final Result

Original:
Blk 1, 1 St, 05-00
05-00 Frist St, Block 1
1 First Str, #05-00
Block 1, First Str, #05-00
1, St, #05-00

Standardize (Building | Street | Unit):
Blk 1 | First St | 05-00
Blk 1 | First St | 05-00
1 | First St | #05-00
Blk 1 | First St | #05-00
1 | St | #05-00

Final Result (after Match and Survive):
#05-00, Blk 1, First St
#05-00, 1, St
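The standardize, match and survive steps above can be sketched as follows. This is a minimal illustration, not QualityStage’s rule sets: the lookup tables are hypothetical and cover only this example, matching is an exact comparison on (street, unit), and the survive rule simply keeps the longest (most complete) building value. Unlike the slide, the sketch normalizes every unit to the “#05-00” form during standardization.

```python
import re
from collections import defaultdict

# Hypothetical standardization tables covering just this example.
STREET_SUFFIX = {"st": "St", "str": "St", "street": "St"}
SPELLING = {"frist": "First"}
ORDINAL = {"1": "First", "2": "Second", "3": "Third"}

def standardize(raw):
    """Parse a free-form address into (building, street, unit)."""
    text, building, unit = raw, None, None
    m = re.search(r"#?(\d{2}-\d{2})", text)                 # unit such as 05-00 or #05-00
    if m:
        unit = "#" + m.group(1)
        text = text[:m.start()] + text[m.end():]
    m = re.search(r"\b(?:Blk|Block)\s+(\d+)", text, re.IGNORECASE)
    if m:
        building = "Blk " + m.group(1)
        text = text[:m.start()] + text[m.end():]
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    if building is None and len(tokens) >= 2 and tokens[0].isdigit():
        building = tokens.pop(0)                             # bare leading number
    words = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in STREET_SUFFIX:
            words.append(STREET_SUFFIX[low])
        elif low in SPELLING:
            words.append(SPELLING[low])
        elif tok in ORDINAL and i + 1 < len(tokens):         # "1 St" -> "First St"
            words.append(ORDINAL[tok])
        else:
            words.append(tok)
    return building, " ".join(words), unit

def match_and_survive(records):
    """Match on (street, unit); survive the most complete building value."""
    groups = defaultdict(list)
    for building, street, unit in records:
        groups[(street, unit)].append(building or "")
    return [f"{unit}, {max(buildings, key=len)}, {street}"
            for (street, unit), buildings in groups.items()]

RAW = ["Blk 1, 1 St, 05-00", "05-00 Frist St, Block 1", "1 First Str, #05-00",
       "Block 1, First Str, #05-00", "1, St, #05-00"]
standardized = [standardize(r) for r in RAW]
for row in standardized:
    print(" | ".join(part or "" for part in row))
for line in match_and_survive(standardized):
    print(line)
```

The record “1, St, #05-00” stays unmatched because its street is incomplete; that is the same outcome the slide shows, and in practice such records are routed to a data steward for review.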

Page 15

Strategy #3 – Share Common Meta Data

The same customer entity, as described by four tools:
• From Data Model: Customer (CustomerNumber, Name, Address, Comments)
• From ETL Tool: CustomerTbl (CustomerID, Name, Address, Address1, Comments)
• From BI Tool: CustomerDetails (CustomerNumber, Name, Address, Remarks)
• From Database: (CustomerID, Name, Address1, Address2, Descr)

Definitions found for the customer identifier:
• “The identifier of customers that are tracked for ordering purposes. Corporate customer identifiers are assigned by the Sales Data Controller according to the corporate data description and naming policy for reference identifiers.”
• “Unique identifier of customers that are tracked for ordering purposes. Values start with 02 for non-Corporate customers and 01 for Corporate customers.”
• <NULL>
• “Customer’s identifier numbers. Values start with 01 for Corporate customers, 02 for non-Corporate customers, 03 for overseas-based Customers.”

Which meta data is right?
Which one is current?
Which one should be used?

Page 16

Recommended Best Practices: Create a common repository

Integrated Meta Data Repository: integrate by gathering in from diverse applications and sources:
• Modeling tool
• BI tool / BI Repository
• COBOL definition files
• Other sources’ definition files
• ETL Tool + Processes

Page 17

Shared Metadata Server

& Repository

Category: Costs

Term: Tax Expense

Full Name: Tax to be paid on Gross Income

“The expense due to taxes …..”

(John Walsh is responsible for updates. 90% reliable source)

Status: CURRENT

Database = DB2

Schema = NAACCT

Table = DLYTRANS

Column = TAXVL

data type = Decimal (14,2)

Derivation: SUM(TRNTXAMT)

Achieve a common vocabulary between business & technical users!

InfoSphere DataStage InfoSphere Business Glossary

Create a Common Vocabulary
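The Tax Expense example links a business term to a physical column. The sketch below models that link with plain dataclasses; it is a hypothetical, in-memory stand-in for a shared metadata repository and does not reflect the InfoSphere Business Glossary data model or API. The field values come from the slide.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PhysicalColumn:
    """A technical asset, as a DBA or ETL developer sees it."""
    database: str
    schema: str
    table: str
    column: str
    data_type: str
    derivation: str = ""

    def qualified_name(self):
        return f"{self.schema}.{self.table}.{self.column}"

@dataclass
class GlossaryTerm:
    """A business term, as a business user sees it, linked to physical assets."""
    category: str
    name: str
    full_name: str
    status: str
    assets: list = field(default_factory=list)

class Glossary:
    """Hypothetical in-memory stand-in for a shared metadata repository."""
    def __init__(self):
        self._terms = {}

    def add(self, term):
        self._terms[term.name.lower()] = term

    def lookup(self, business_name):
        return self._terms[business_name.lower()]

glossary = Glossary()
glossary.add(GlossaryTerm(
    category="Costs", name="Tax Expense",
    full_name="Tax to be paid on Gross Income", status="CURRENT",
    assets=[PhysicalColumn("DB2", "NAACCT", "DLYTRANS", "TAXVL",
                           "DECIMAL(14,2)", "SUM(TRNTXAMT)")]))

term = glossary.lookup("tax expense")
for asset in term.assets:
    print(term.name, "->", asset.qualified_name(), asset.data_type, asset.derivation)
```

Business users search by the term, technical users by the qualified column name; both arrive at the same record, which is the point of a common vocabulary.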

Page 18

Collaborate and Share Feedback

Author Standard Definitions:

GL Organizational Unit
STEWARD: Controllers Office
FORMAT: X(7)
DEFINITION: A seven-digit number designating the organizational unit to which this account belongs.

Annotate and Share Feedback:

“I’ve noticed that the last two digits of the GL Organizational Unit, which indicate the sub-department, are often blank.”
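An annotation like the one above is easy to turn into a measurable data rule. This sketch assumes values arrive as fixed-width, 7-character strings (FORMAT X(7)), so blank sub-department digits show up as two trailing spaces; the sample values are hypothetical.

```python
import re
from collections import Counter

# Rule from the standard definition: a seven-digit number.
VALID = re.compile(r"\d{7}")
# The steward's annotation: last two (sub-department) digits blank.
BLANK_SUB_DEPT = re.compile(r"\d{5} {2}")

def classify(org_unit):
    """Classify a fixed-width, 7-character GL Organizational Unit value."""
    if VALID.fullmatch(org_unit):
        return "valid"
    if BLANK_SUB_DEPT.fullmatch(org_unit):
        return "blank sub-department"
    return "invalid"

# Hypothetical extract of the column.
values = ["1020304", "10203  ", "1020399", "10204  ", "ABC1234"]
print(Counter(classify(v) for v in values))
```

If the extract trims trailing spaces, the blank case arrives as a 5-character value instead, and the second pattern would need to be `\d{5}`.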

Page 19

Extend Business Information

• Categorize Information Assets according to Business Logic
• Map Business Terms to Information Assets
• Find and view relevant details of Information Assets
• View the Stewardship of Information Assets

Page 20

Where does a Field of Data in this Report Come From?

• Import & browse full BI report metadata
• Navigate through report attributes
• Visually navigate through data lineage across tools
• Combine operational & design viewpoints

Page 21

IBM Confidential

Metadata Lineage available from Studio & Viewers

Page 22

Access Business Glossary from Cognos Studios

Page 23

Strategy #4 – Connect to Any System, Anywhere

• DB2, Informix, Netezza, ODBC, Oracle, Red Brick, SAS, Sybase, Teradata, etc.
• Adabas, Allbase/SQL, Datacom/DB, DB2/400, DB2/OS390, Essbase, FOCUS, IDMS/SQL, IMS, NonStopSQL, RDB, VSAM, etc.
• WebSphere MQ, SeeBeyond, JMS, XML, EJB, Web Services, EXML, XMLS, EDI, SWIFT, etc.
• Oracle Applications, PeopleSoft, SAP R/3, SAP BW, Siebel

Page 24

Recommended Best Practices: Native Connectivity Software

Do you want to worry about which application or database you will need to connect to next?

Advice: Go for pre-built connectors that need little or no coding.

Page 25

Strategy #5 – Abandon Hand-coding

These hand-written Visual Basic, Java, C++ and UNIX scripts can be developed cheaply, and they work…

… but what happens when there is a new source or requirement?

Cheap? Works? Maybe not.

Page 26

Recommended Best Practices: Graphical ETL Tools

Benefits:

1. Jobs are easy to develop, understand, debug and maintain
2. A robust, fully tested, best-practices approach to data migration or extraction

Page 27

Recommended Best Practices: Graphical ETL Tools

Benefits:

1. Complex transformations can be built simply, with point-and-click

Page 28

Workflow Process - Sequences

• Workflow is as important as dataflow.
• Dynamic workflow processes can be defined during the workflow itself.
• DataStage can run external processes and perform complex evaluations inline.
• Advanced concepts such as looping are supported.

Page 29

[Screenshot: performance monitoring charts: physical machine utilization, disk throughput, average process distribution, percent CPU utilization, free memory (whisker box).]

Page 30

Strategy #6 – Implement a Highly Scalable Foundation

Prediction: Your data volume is not going to get smaller.

44x as much data and content over the coming decade:
• 2009: 800,000 petabytes
• 2020: 35 zettabytes
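The 44x figure follows directly once both volumes are in the same unit. This sketch assumes decimal (SI) units, where 1 zettabyte is 1,000,000 petabytes, and also derives the implied compound annual growth rate, which is our own calculation rather than a figure from the slide.

```python
# Decimal (SI) units assumed: 1 zettabyte = 1,000,000 petabytes.
PB_PER_ZB = 1_000_000

volume_2009_pb = 800_000
volume_2020_pb = 35 * PB_PER_ZB

growth = volume_2020_pb / volume_2009_pb      # 43.75, i.e. roughly 44x
years = 2020 - 2009
annual_rate = growth ** (1 / years) - 1       # compound annual growth rate

print(f"growth factor: {growth:.2f}x")
print(f"implied annual growth: {annual_rate:.1%}")
```

Roughly 41% growth per year means a volume that a system handles comfortably today doubles in about two years.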

Page 31

Strategy #6 – Implement a Highly Scalable Foundation

Two considerations in handling growth: processing time and processing throughput.

[Charts: processing time (hours) and processing throughput (hundreds of gigabytes, 1X to 32X) plotted against the number of processors (1, 8, 16, 24, 32, ...), contrasting the scaling curves you want (“You want these”) with those you do not (“Not these”).]
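The curves you want are linear: each added processor cuts processing time, or raises throughput, in proportion. One standard way to reason about why real curves flatten is Amdahl’s law, which the slide does not name: if a fraction p of a job can run in parallel, the best possible speedup on n processors is 1 / ((1 − p) + p / n). A small sketch:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only `parallel_fraction` of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# Fully parallel work vs. work that is 95% parallel, at the slide's processor counts.
for n in (1, 8, 16, 24, 32):
    print(n, round(amdahl_speedup(1.0, n), 1), round(amdahl_speedup(0.95, n), 1))
```

With 95% of the work parallelizable, 32 processors give at most about 12.5x, so even a small serial step (a single-threaded read, sort or lookup) caps how far a data integration job can scale.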

Page 32

Strategy #6 – Implement a Highly Scalable Foundation

Three Elements of a Scalable Infrastructure

• Scalable Database Platform: Database vendors have offered scalable parallel relational databases for more than 5 years.
• Scalable Hardware Platform: Hardware vendors have offered scalable parallel computers for more than 5 years.
• Scalable Data Integration Platform: Data integration vendors are starting to offer “scalable” “parallel” platforms.

Page 33

Recommended Best Practices: Parallelism

[Diagram: SMP systems (CPUs with shared memory and shared disk) shown in two configurations, contrasting the parallel architecture to aim for (“Make sure you get this”) with the one to avoid (“Not this”).]

Page 34

Recommended Best Practices: Parallelism

Application Execution: Sequential or Parallel

[Diagram: the same job (source data → transform → enrich → load → data warehouse) run sequentially on a uniprocessor, 4-way parallel on an SMP system, and 64-way parallel on MPP, GRID, and clustered systems.]

• One application assembly
• Auto parallel-enabled and parallel-aware run-time execution

[Chart: time to process for scan, join and sort operations, serial versus parallel.]
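Running one application assembly at different degrees of parallelism usually relies on data partitioning: rows are split by key so each partition can be processed independently, and the partial results are then combined. The sketch below shows hash partitioning plus a per-partition aggregate; the sales rows are hypothetical, and threads are used only to keep the example portable (CPython threads do not speed up CPU-bound work; a real parallel engine runs partitions in separate processes or on separate nodes).

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(rows, key, n_partitions):
    """Hash-partition rows so that all rows with the same key land in one partition."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted built-in hash() for str.
        index = zlib.crc32(row[key].encode("utf-8")) % n_partitions
        parts[index].append(row)
    return parts

def aggregate(rows):
    """The transform step: total sales per store within one partition."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)

def run_parallel(rows, n_partitions=4):
    parts = partition(rows, "store", n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(aggregate, parts))
    merged = {}
    for partial in partials:
        merged.update(partial)  # safe: each store lives in exactly one partition
    return merged

rows = [{"store": s, "amount": a} for s, a in
        [("S1", 10.0), ("S2", 5.0), ("S1", 2.5), ("S3", 7.0), ("S2", 1.0)]]
print(run_parallel(rows))
```

Partitioning on the grouping key is what makes the final merge trivial; partitioning on any other column would require a second aggregation pass after the partials are collected.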

Page 35

Strategy #7 – Architect for “Right-Time”

• In an InformationWeek 2003 survey of 467 business professionals about how often their IT systems provide business managers with timely updates on primary products or services:
– 3% no such process
– 1% annually
– 17% monthly
– 13% weekly
– 36% daily
– 5% hourly
– 8% every minute

• In that same report: “Whereas 57% of sites surveyed a year ago said that real-time business information was a key company focus, 70% see it that way today.”

Page 36

Recommended Best Practices: Right-Time

[Diagram: business events and the appropriate responses to them, each with an acceptable latency: campaign initiated → tuning; customer churns → win-back; website click → offer made; fraud committed → prevention. Timeline: business event occurs → recognition (awareness) → appropriate response, with latency in each step.]

Latency is defined as the elapsed time between when an event occurs and when an appropriate response or action is made.
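The latency definition above splits naturally into recognition latency (event to awareness) and response latency (awareness to action). A minimal sketch with a hypothetical fraud scenario and a hypothetical 5-minute acceptable latency; neither number comes from the slide.

```python
from datetime import datetime, timedelta

def latencies(event_at, recognized_at, responded_at):
    """Split total latency into recognition latency and response latency."""
    recognition = recognized_at - event_at
    response = responded_at - recognized_at
    return recognition, response, recognition + response

# Hypothetical: fraudulent transaction at 10:00, flagged by a nightly batch at 02:00,
# card blocked when staff act on the report at 09:30 the next morning.
event = datetime(2024, 1, 15, 10, 0, 0)
recognized = datetime(2024, 1, 16, 2, 0, 0)
responded = datetime(2024, 1, 16, 9, 30, 0)

recognition, response, total = latencies(event, recognized, responded)
acceptable = timedelta(minutes=5)  # hypothetical business requirement for fraud
print(recognition, response, total, total <= acceptable)
```

Here most of the latency is recognition latency caused by the nightly batch, which is exactly the part that change data capture and real-time feeds attack.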

Page 37

Recommended Best Practices: Right-Time

1. Improving the ability to recognize business events
2. Improving the ability to respond to those events

[Diagram: latency split into recognition latency (business event occurs → recognition) and response latency (recognition → response).]

Page 38

Log-Based Change Data Capture

[Diagram: a source engine reads changes from the database logs (DB2, Oracle, SQL Server, etc.) and sends them over TCP/IP to a target engine, which delivers them to a database, message queue, web services, flat files, or InfoSphere Information Server; a monitoring and configuration component manages both engines.]

Key Benefits:
– Low impact
– Flexible implementation
– Heterogeneous platform support
– Easy to use
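Log-based CDC reads committed changes from the database log instead of querying source tables, which is why its impact on the source is low. The sketch below shows only the apply side, using a hypothetical log format (sequence number, operation, key, row image) that is not how InfoSphere CDC represents its records. Keeping a bookmark of the last applied sequence number makes the apply step safe to repeat when a batch is re-sent after a restart.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogRecord:
    lsn: int                    # log sequence number, increasing (hypothetical format)
    op: str                     # "I" insert, "U" update, "D" delete
    key: str
    row: Optional[dict] = None  # row image after the change; None for deletes

class Target:
    def __init__(self):
        self.rows = {}
        self.last_lsn = 0       # bookmark: highest log position already applied

    def apply(self, records):
        for rec in sorted(records, key=lambda r: r.lsn):
            if rec.lsn <= self.last_lsn:
                continue        # already applied, e.g. replay after a restart
            if rec.op in ("I", "U"):
                self.rows[rec.key] = rec.row
            elif rec.op == "D":
                self.rows.pop(rec.key, None)
            else:
                raise ValueError(f"unknown operation {rec.op!r}")
            self.last_lsn = rec.lsn

log = [LogRecord(1, "I", "C1", {"name": "IBM"}),
       LogRecord(2, "I", "C2", {"name": "Acme"}),
       LogRecord(3, "U", "C1", {"name": "International Business Machines"}),
       LogRecord(4, "D", "C2")]

target = Target()
target.apply(log[:3])
target.apply(log)   # re-sent batch: records 1-3 are skipped, only 4 is applied
print(target.rows, target.last_lsn)
```

In a real deployment the bookmark would be stored in the target in the same transaction as the applied changes, so the two can never disagree after a crash.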

Page 39

InfoSphere CDC & InfoSphere DataStage (ETL)

[Diagram: InfoSphere Change Data Capture continuously captures changes from the native log of a retail point-of-sale database (Oracle) and delivers them to IBM Information Server through a queue, staging table, message queue, direct connect, or flat file (out of the box: DataStage DSX file format; TCP via a DataStage operator). DataStage consumes the changes and performs the ETL load into the EDW (Teradata, DB2, Oracle, SQL Server, Sybase…), including BalOp (ELT).]

Page 40

Strategy #8 – Extend Quality and Transformation Capabilities throughout the Enterprise

1. Hand-coded rules in each project/tool are not reusable in other projects/tools
2. High costs are associated with building & maintaining data access, data quality and transformation rules in each project

[Diagram: portals; EAI, BPM, EII; web applications; dashboards; legacy apps; packaged apps; business partner data; data warehouses; master data stores.]

Page 41

Recommended Best Practices: Data Integration Services

1. A Service-Oriented Architecture (SOA) approach packages data integration logic as services that SOA-friendly applications can call
2. Services can be invoked as Web Services, EJB, or JMS by any third-party application

[Diagram: SOA approach. A “get customer” service in a service-oriented architecture, consumed through Java application servers, message queues/EAI and web services, and drawing on business partner data, legacy apps, packaged apps, data warehouses and master data stores.]
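The “get customer” idea can be sketched as one reusable rule wrapped once and exposed over HTTP. This is a minimal illustration of the pattern using Python’s standard WSGI interface, not Information Services Director; the customer store, the ID format, the `/customer` path and the name-standardization rule are all hypothetical, and the EJB and JMS bindings mentioned above are not shown.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical customer store and shared standardization rule.
CUSTOMERS = {"01-001": {"name": " i.b.m. inc. ", "type": "01"}}

def standardize_name(raw):
    """One reusable rule, shared by every caller of the service."""
    return " ".join(raw.replace(".", "").upper().split())

def get_customer(customer_id):
    record = CUSTOMERS.get(customer_id)
    if record is None:
        return None
    return {"id": customer_id, "name": standardize_name(record["name"]), "type": record["type"]}

def app(environ, start_response):
    """WSGI wrapper exposing get_customer as GET /customer?id=..."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/customer":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    params = parse_qs(environ.get("QUERY_STRING", ""))
    result = get_customer(params.get("id", [""])[0])
    start_response("200 OK" if result else "404 Not Found", headers)
    return [json.dumps(result or {"error": "unknown customer"}).encode("utf-8")]

# Call the service in-process with a test environment instead of a real server.
environ = {}
setup_testing_defaults(environ)
environ.update(PATH_INFO="/customer", QUERY_STRING="id=01-001")
captured = []
body = b"".join(app(environ, lambda status, headers: captured.append(status)))
print(captured[0], body.decode())
```

To serve it for real, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would do for a demo; the design point is that portals, dashboards and EAI flows all call the same rule instead of re-implementing it.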

Page 42

Strategy #9 – Choose a Proven Deployment Methodology Designed for Quick Success

• Many are available.
• How many, and which, are workable? Who knows?
• Be aware that there are as many risks in the deployment methodology as there are in tool usage.

Page 43

Recommended Best Practices: Iterative Deployment Plan

[Diagram: an iterative cycle from Start to End over 12–24 weeks: establish business drivers → deploy solution → evaluate results → derive business value. Phases: investigate, design, develop, deploy, operate; activities include plan, prototype, unit test, system test, UAT, production audit, regression test, maintenance, etc. Each iteration is monitored and managed.]

Page 44

Blueprint Director: The GPS for your information project

• Palette: free-form “sketching” elements
• Diagram for a blueprint
• Method browser (displaying method content); asset browser (browsing the metadata repository); glossary explorer (showing the glossary tree view)
• Context-specific property view
• Outline (zoom in/out view); blueprint explorer (shows a tree view of the elements in the blueprint)

Page 45

Business and IT: Working Together

Business requirements → successful data integration project

• Business Analyst: collects business terms and business requirements; converts them into business rules in a spec
• Mapping specification: created from the business terms; critical to collaboration between IT and business
• Developer: takes those business rules and the mapping spec and turns them into code, such as a DataStage job (extract, transform, load)

Create DataStage jobs and data flows that reflect business needs.

Page 46

Track business requirements to application deployment

• Single, centrally managed infrastructure to track requirements to deployment
• Import Excel mapping spreadsheets
• Define and link business terms to physical structures
• Generate DataStage jobs with annotated to-do tasks for the developer
• Generate historical documentation for tracking

[Screenshot callouts: define mapping specification with business rules and terms; auto-generate DataStage jobs; flexible reporting and tracking.]
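Generating load logic from a mapping specification can be illustrated in miniature. This sketch turns a hypothetical CSV mapping spec (as it might be exported from a spreadsheet) into a single SQL `INSERT ... SELECT` plus to-do notes for rows with no rule. It stands in for, and is not, FastTrack’s DataStage job generation; the column names, table names and spec layout are assumptions.

```python
import csv
import io

# Hypothetical mapping specification, as exported from a spreadsheet to CSV.
SPEC = """target_column,source_column,rule
CUSTOMER_ID,CUSTNO,TRIM(CUSTNO)
CUSTOMER_NAME,NAME,UPPER(NAME)
REMARKS,COMMENTS,
"""

def generate_load(spec_text, source_table, target_table):
    """Turn mapping rows into one INSERT ... SELECT plus developer to-do notes."""
    targets, expressions, todos = [], [], []
    for row in csv.DictReader(io.StringIO(spec_text)):
        targets.append(row["target_column"])
        if row["rule"].strip():
            expressions.append(row["rule"].strip())
        else:
            # No rule given: copy the source column and flag it for the developer.
            expressions.append(row["source_column"])
            todos.append(f"{row['target_column']}: no rule specified; "
                         f"copied from {row['source_column']}")
    sql = (f"INSERT INTO {target_table} ({', '.join(targets)}) "
           f"SELECT {', '.join(expressions)} FROM {source_table}")
    return sql, todos

sql, todos = generate_load(SPEC, "SRC.CUSTOMER", "DW.CUSTOMER")
print(sql)
for note in todos:
    print("TODO:", note)
```

Because the spec, not the code, is the source of truth, a business analyst’s change to a rule regenerates the load logic instead of requiring a developer to hunt for it.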

Page 47

Strategy #10 – Ensure Interoperability of Integration Infrastructures

The Goal: connected, integrated, seamless

The Reality: cobbled, piecemeal, manual-intensive

Page 48

Data Integration Projects Require a Collaborative Effort

[Diagram: business user, business analyst, data analyst, data modeler and developer exchanging business requirements, business terms, data model, transformation rules and data flow (extract, transform, load) to build the application.]

Page 49

[Diagram: a numbered workflow (steps 1–7) around the Metadata Server. Steps shown: establish platform; import & enhance industry model; define business requirement & glossary; discovery: understand data relationships; assess, monitor, manage data quality rules; map sources to target model; generate logic to load warehouse; deliver reports. Tools shown: Business Glossary (populates, links), Information Analyzer, FastTrack, DataStage & QualityStage, Data Architect, Cognos.]

Simplification & Content: reduces project time, risk and cost!

Page 50

Recommended Best Practices: Integrated Tool Suites

• Business Glossary – enterprise data dictionary
• DataStage – extract, transform, and load in batch or real-time; parallel processing; rich connectivity to applications, data, and content
• Information Services Director – publish SOA services for information integration and access
• Metadata Server / Metadata Workbench / FastTrack – manage and track consistent metadata across information integration tasks and automate generation of data flow logic
• Information Analyzer – data source profiling & problem diagnosis
• Federation Server – virtualize access to disparate information
• CDC & Replication – deliver and replicate changed data
• QualityStage – data quality: standardize, correct & match data
• Global Name Recognition – recognize & classify multi-cultural names

Page 51

Summary

1. A number of large enterprises have successfully integrated their enterprise systems, achieving business results that drove revenue and lowered costs
2. These enterprises accomplished this through a set of technologies collectively known as Enterprise Data Integration
3. There are 10 proven strategies for success in an enterprise data integration initiative, although no single path is THE panacea for all corporate data problems; multiple approaches must be employed

Page 52

Convert Data into Trusted Information

InfoSphere Information Server:
• Test Data Generation
• Application Consolidation
• Data De-identification
• Data Quality
• Data Integration
• Data Archival
• Master Data Management
• Data Warehousing

Page 53

Your Choice…

[Diagram: an integrated platform (Recommended) compared with point products across models, cleansing, ETL, MDM, warehouse, BI and mashups; the point-products row shows gaps (“?”) between tools.]