Informatica PowerExchange with Oracle CDC

Raghavendra Parasharam Venkata

White Paper

Copyright 2007 by Tata Consultancy Services. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means – electronic, mechanical, photocopying, recording, or otherwise – without the permission of Tata Consultancy Services.

TCS Confidential

Confidentiality Statement

This is a controlled document. Unauthorized access, copying, replication and usage for a purpose other than for which this is intended are prohibited.

All trademarks that appear in the document have been used for identification purposes only and belong to their respective companies.

Abstract

This paper explains the concepts of PowerExchange and Oracle CDC, and the implementation of PowerExchange with CDC for PowerCenter. It also describes the installation of the PowerExchange plug-in and the testing of rows through the PowerExchange Navigator.

This paper also discusses the benefits and challenges of PowerExchange with CDC.

Domain

PowerExchange provides the extract and apply functionality required to support data warehouse and migration initiatives. PowerExchange has become the standard in this area, demonstrating excellent performance, scalability, productivity and ease-of-use. Informatica PowerExchange, based on a services-oriented architecture (SOA), provides on-demand access to data in all critical enterprise data systems, including mainframe, midrange, and file-based systems.

With the CDC option, an organization can respond to business events at the moment they occur rather than waiting hours or days, and can extend data integration to satisfy today’s need for timely, accurate data. PowerExchange with CDC deals with sensitive customer data from all domains.

Contents

INTRODUCTION
    OVERVIEW ON INFORMATICA POWEREXCHANGE
    OVERVIEW ON POWEREXCHANGE WITH CHANGE DATA CAPTURE
        1. When it is required?
        2. When it is recommended?
        3. Difference between with and without CDC
        4. Example of a Change Data Capture System
        5. Components and Terminology for Synchronous Change Data Capture
1. POWEREXCHANGE INSTALLATION
    1.1 CHECK LIST FOR INSTALLATION
    1.2 INSTALLATION STEPS
    1.3 AUTHENTICATION OF INSTALLATION
2. CREATING CDC MAPPINGS
3. POWEREXCHANGE NAVIGATOR - TO VIEW THE CHANGES
4. POWEREXCHANGE LISTENER - TO START/STOP THE POWEREXCHANGE
5. POWEREXCHANGE WITH CDC - CHALLENGES
    5.1 PERFORMANCE CHALLENGES
    5.2 IMPLEMENTATION CHALLENGES
6. POWEREXCHANGE WITH CDC - BENEFITS
ADDENDUM A: PERFORMANCE TIPS
ADDENDUM B: CHANGE DATA CAPTURE VIEWS
CONCLUSION
ACKNOWLEDGEMENTS
REFERENCES

Introduction

Overview on Informatica PowerExchange

Informatica PowerExchange, based on a services-oriented architecture (SOA), provides on-demand access to data in all critical enterprise data systems, including mainframe, midrange, and file-based systems. Tightly integrated with Informatica PowerCenter, PowerExchange helps organizations leverage mission-critical operational data by making it available to people and processes without requiring manual coding of data extraction programs. Its SQL access to native database APIs provides high-performance extraction, conversion, and filtering of data without intermediary staging and programming. Shared services offer data delivery options that enable IT organizations to flexibly and efficiently manage processing demands.

Organizations today demand immediate access to accurate information for quick decision making and high-speed operations. At the same time, the volume and variety of data is exploding, stretching the capacity of IT resources and infrastructures. To leverage the full value of their information, organizations must be able to integrate data from a wide variety of transactional applications and systems for easy access and “right time” delivery.

PowerExchange provides on-demand access to immediate, accurate, and understandable data. With Informatica PowerExchange you can:

● Access and deliver data in “right time”
● Extend existing IT investments
● Unlock complex systems without coding

Access data on demand:

PowerExchange offers several options for capturing data and making it available to a range of targets. It can capture data from relational and non-relational data sources either as whole data sets or as incremental changed data, in real time or in a scheduled batch process. PowerExchange enables organizations to schedule data delivery to multiple targets weekly, daily, hourly, or even at sub-second intervals. Informatica PowerExchange provides a single architecture that allows a seamless transition from batch and bulk, to batch and changed data capture, to real-time changed data capture delivery. PowerExchange also eliminates the need for multiple-step processes with extraction, file transfer, and load scripts for batch data processing.

Extend existing IT investments:

Based on an extensible, service-oriented architecture, PowerExchange supports a variety of platforms. As a business decides to make more of its data sources available to other applications across the enterprise, it can add platforms easily. SQL access to native database APIs delivers high performance: PowerExchange extracts, converts, filters, and makes data available to target systems without intermediary staging and program coding.

Unlock complex systems without coding:

Unlocking the value of legacy systems once meant costly new system development and data migration, or complex hand coding to access critical data. Informatica PowerExchange significantly reduces the time and resources required to leverage existing investments, streamlining access to legacy systems and delivering data to a range of business applications. It masks the complexity of source systems from developers and offers an intuitive GUI with SQL-like access, eliminating the need for lengthy training and implementation.

Overview on PowerExchange with Change Data Capture

With the CDC option, an organization can respond to business events at the moment they occur rather than waiting hours or days, and can extend data integration to satisfy today’s need for timely, accurate data.

The PowerExchange CDC option recognizes business events, such as customer creation or order shipment, by capturing the database inserts, updates, and deletes for these events as soon as they occur. This captured stream of database activity can be delivered to multiple targets in real time, without intermediate queues or staging tables.

1. When it is Required?

PowerExchange Change Data Capture (CDC) Option is essential whenever you need timelier access to data.

Data extraction is an integral part of all data warehousing. Data is often extracted on a nightly basis from the transactional systems and transported to the data warehouse. Typically, all the data in the data warehouse is refreshed with data extracted from the source system. However, this involves the extraction and transportation of huge volumes of data and is very expensive in terms of both resource and time.

Since the data extraction takes place daily, it would be much more efficient to extract and transport only the data that has changed since the last extraction. However, in most source systems, it is extremely difficult, if not impossible, to identify and extract only the recently changed data.

2. When it is Recommended?

Beyond the challenge of identifying the recently changed data, many extraction, transformation, and loading (ETL) environments involve one source system feeding data into multiple target systems. It is also a challenge to synchronize change data from one source to many targets. With the CDC option:

● Database extraction from INSERT, UPDATE, and DELETE operations occurs immediately, at the same time the changes occur to the source tables.
● Updates can be captured from relational and pre-relational databases on mainframe, midrange, and commodity systems.
● A real-time data integration service can be developed and deployed with Informatica PowerCenter.
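The idea of one captured change stream feeding several targets can be illustrated with a small, self-contained Python sketch. It is only a conceptual model under stated assumptions: the `Change` record, the `apply_changes` helper, and the sample rows are hypothetical names invented for illustration and are not part of PowerExchange or Oracle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One captured DML operation (hypothetical record layout)."""
    op: str                      # "INSERT", "UPDATE", or "DELETE"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for DELETE


def apply_changes(target: Dict[int, dict], changes: List[Change]) -> None:
    """Apply captured changes, in capture order, to one in-memory target."""
    for change in changes:
        if change.op in ("INSERT", "UPDATE"):
            target[change.key] = dict(change.row)
        elif change.op == "DELETE":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")


# Changes captured once at the source (sample data, for illustration only).
captured = [
    Change("INSERT", 1, {"name": "Ann", "city": "Pune"}),
    Change("INSERT", 2, {"name": "Raj", "city": "Delhi"}),
    Change("UPDATE", 1, {"name": "Ann", "city": "Mumbai"}),
    Change("DELETE", 2),
]

# The same stream is delivered to two independent targets.
warehouse: Dict[int, dict] = {}
reporting_mart: Dict[int, dict] = {}
for target in (warehouse, reporting_mart):
    apply_changes(target, captured)
```

Because only the four change records are moved, neither target needs a full reload of the source table to stay in sync.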

3. Difference Between with and without CDC:

The table below describes the differences between database extraction with and without Change Data Capture.

Extraction
    With Change Data Capture: Database extraction from INSERT, UPDATE, and DELETE operations occurs immediately, at the same time the changes occur to the source tables.
    Without Change Data Capture: Database extraction is marginal at best for INSERT operations, and problematic for UPDATE and DELETE operations, because the data is no longer in the table.

Staging
    With Change Data Capture: Stages data directly to relational tables; there is no need to use flat files.
    Without Change Data Capture: The entire contents of tables are moved into flat files.

Interface
    With Change Data Capture: Provides an easy-to-use publish and subscribe interface using DBMS_LOGMNR_CDC_PUBLISH and DBMS_LOGMNR_CDC_SUBSCRIBE packages.
    Without Change Data Capture: Error prone and manpower intensive to administer.

Cost
    With Change Data Capture: Supplied with the Oracle9i (and later) database server. Reduces overhead cost by simplifying the extraction of change data.
    Without Change Data Capture: Expensive because you must write and maintain the capture software yourself, or purchase it from a third-party vendor.

A Change Data Capture system is based on the interaction of a publisher and subscribers to capture and distribute change data.

Publish and Subscribe Model

Most Change Data Capture systems have one publisher that captures and publishes change data for any number of Oracle source tables. There can be multiple subscribers accessing the change data. Change Data Capture provides PL/SQL packages to accomplish the publish and subscribe tasks.

Publisher – The publisher is usually a database administrator (DBA) who is in charge of creating and maintaining schema objects that make up the Change Data Capture system. The publisher performs these tasks:

● Determines the relational tables (called source tables) from which the data warehouse application is interested in capturing change data.
● Uses the Oracle supplied package, DBMS_LOGMNR_CDC_PUBLISH, to set up the system to capture data from one or more source tables.
● Publishes the change data in the form of change tables.
● Allows controlled access to subscribers by using the SQL GRANT and REVOKE statements to grant and revoke the SELECT privilege on change tables for users and roles.

Subscribers – The subscribers, usually applications, are consumers of the published change data. Subscribers subscribe to one or more sets of columns in source tables. Subscribers perform the following tasks:

● Use the Oracle supplied package, DBMS_LOGMNR_CDC_SUBSCRIBE, to subscribe to source tables for controlled access to the published change data for analysis.
● Extend the subscription window and create a new subscriber view when the subscriber is ready to receive a set of change data.
● Use SELECT statements to retrieve change data from the subscriber views.
● Drop the subscriber view and purge the subscription window when finished processing a block of changes.
● Drop the subscription when the subscriber no longer needs its change data.

4. Example of a Change Data Capture System

The Change Data Capture system captures the effects of DML statements, including INSERT, DELETE, and UPDATE, when they are performed on the source table. As these operations are performed, the change data is captured and published to corresponding change tables.

To capture change data, the publisher creates and administers change tables, which are special database tables that capture change data from a source table.

For example, for each source table for which you want to capture data, the publisher creates a corresponding change table. Change Data Capture ensures that none of the updates are missed or duplicated.

Each subscriber has its own view of the change data. This makes it possible for multiple subscribers to simultaneously subscribe to the same change table without interfering with one another. The following diagram shows the publish and subscribe model in a Change Data Capture system.

For example, assume that the change tables contain all of the changes that occurred between Monday and Friday, and also assume that:

● Subscriber 1 is viewing and processing data from Tuesday.
● Subscriber 2 is viewing and processing data from Wednesday to Thursday.

Subscribers 1 and 2 each have a unique subscription window that contains a block of transactions. Change Data Capture manages the subscription window for each subscriber by creating a subscriber view that returns a range of transactions of interest to that subscriber. The subscriber accesses the change data by performing SELECT statements on the subscriber view that was generated by Change Data Capture.

When a subscriber needs to read additional change data, the subscriber makes procedure calls to extend the window and to create a new subscriber view. Each subscriber can walk through the data at its own pace, while Change Data Capture manages the data storage. As each subscriber finishes processing the data in its subscription window, it calls procedures to drop the subscriber view and purge the contents of the subscription window. Extending and purging windows is necessary to prevent the change table from growing indefinitely, and to prevent the subscriber from seeing the same data again.

Thus, Change Data Capture provides the following benefits for subscribers:

● Guarantees that each subscriber sees all of the changes, does not miss any changes, and does not see the same change data more than once.
● Keeps track of multiple subscribers and gives each subscriber shared access to change data.
● Handles all of the storage management, automatically removing data from change tables when it is no longer required by any of the subscribers.
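The extend, select, and purge cycle described above can be modeled with a short Python sketch. This is a toy in-memory model, not the Oracle implementation: the `ChangeTable` class, its method names, and the integer sequence numbers are all assumptions made for illustration, and the real work is done by the DBMS_LOGMNR_CDC_SUBSCRIBE package.

```python
from typing import Dict, List, Tuple


class ChangeTable:
    """Toy change table with one subscription window per subscriber."""

    def __init__(self) -> None:
        self._rows: List[Tuple[int, str]] = []   # (sequence number, change)
        self._next_seq = 1
        # subscriber -> [low, high); rows with low <= seq < high are visible
        self._windows: Dict[str, List[int]] = {}

    def publish(self, change: str) -> None:
        """Record one change, as the capture process would."""
        self._rows.append((self._next_seq, change))
        self._next_seq += 1

    def subscribe(self, name: str) -> None:
        """Start a subscriber with an empty window at the oldest retained row."""
        start = self._rows[0][0] if self._rows else self._next_seq
        self._windows[name] = [start, start]

    def extend_window(self, name: str) -> None:
        """Move the high watermark up to the newest published change."""
        self._windows[name][1] = self._next_seq

    def view(self, name: str) -> List[str]:
        """Return the changes currently inside the subscriber's window."""
        low, high = self._windows[name]
        return [change for seq, change in self._rows if low <= seq < high]

    def purge_window(self, name: str) -> None:
        """Mark the window as processed and drop rows no subscriber still needs."""
        window = self._windows[name]
        window[0] = window[1]
        oldest_needed = min(low for low, _ in self._windows.values())
        self._rows = [(s, c) for s, c in self._rows if s >= oldest_needed]

    def stored_rows(self) -> int:
        return len(self._rows)


table = ChangeTable()
table.subscribe("subscriber_1")
table.subscribe("subscriber_2")
for change in ["mon", "tue", "wed"]:
    table.publish(change)

table.extend_window("subscriber_1")
first_batch = table.view("subscriber_1")
table.purge_window("subscriber_1")
rows_after_first_purge = table.stored_rows()   # subscriber_2 has not read yet

table.publish("thu")
table.extend_window("subscriber_1")
second_batch = table.view("subscriber_1")

table.extend_window("subscriber_2")
catch_up_batch = table.view("subscriber_2")
table.purge_window("subscriber_2")
rows_after_second_purge = table.stored_rows()
```

In this model each subscriber advances at its own pace, never sees a change twice, and rows are discarded only once every subscriber has moved past them, which mirrors the three benefits listed above.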

5. Components and Terminology for Synchronous Change Data Capture

The Change Data Capture components are shown in the diagram below. The publisher is responsible for all of the components, except for the subscriber views. The publisher creates and maintains all of the schema objects that make up the Change Data Capture system, and publishes change data so that subscribers can use it.

Subscribers are the consumers of change data and are granted controlled access to the change data by the publisher. Subscribers subscribe to one or more columns in source tables.

With synchronous data capture, the change data is generated as data manipulation language (DML) operations are made to the source table. Every time a DML operation occurs on a source table, a record of that operation is written to the change table.

Components in a Synchronous Change Data Capture System

Change Data Capture components in more detail

Source System: A source system is a production database that contains source tables for which Change Data Capture will capture changes.

Source Table: A source table is a database table that resides on the source system that contains the data you want to capture. Changes made to the source table are immediately reflected in the change table.

Change Source: A change source represents a source system. There is a system-generated change source named SYNC_SOURCE.

Change Set: A change set represents the collection of change tables. There is a system-generated change set named SYNC_SET.

Change Table: A change table contains the change data resulting from DML statements made to a single source table. A change table consists of two things: the change data itself, which is stored in a database table, and the system metadata necessary to maintain the change table. A given change table can capture changes from only one source table. In addition to published columns, the change table contains control columns that are managed by Change Data Capture. See “Columns in a Change Table” for more information.

Publication: A publication provides a way for publishers to publish multiple change tables on the same source table, and control subscriber access to the published change data. For example, Publication A consists of a change table that contains all the columns from the EMPLOYEE source table, while Publication B contains all the columns except the salary column from the EMPLOYEE source table. Because each change table is a separate publication, the publisher can implement security on the salary column by allowing only selected subscribers to access Publication A.

Subscriber View: A subscriber view is a view created by Change Data Capture that returns all of the rows in the subscription window. In Figure 15-2, the subscribers have created two views: one on columns 7 and 8 of Source Table 3 and one on columns 4, 6, and 8 of Source Table 4. The columns included in the view are based on the actual columns that the subscribers subscribed to in the source table.

Subscription Window: A subscription window defines the time range of change rows that the subscriber can currently see. The oldest row in the window is the low watermark; the newest row in the window is the high watermark. Each subscriber has a subscription window.
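The Publication A / Publication B example in the component descriptions above can be sketched in Python as a column projection with its own access list. This is an illustrative model only: the `Publication` class, the `emp_id` and `name` columns, and the subscriber names are hypothetical, and in Oracle the access control is done with GRANT and REVOKE on the change tables.

```python
from typing import Dict, Iterable, List, Set


class Publication:
    """A named projection of change rows with its own access list."""

    def __init__(self, name: str, columns: Iterable[str]) -> None:
        self.name = name
        self.columns: List[str] = list(columns)
        self._granted: Set[str] = set()

    def grant_select(self, subscriber: str) -> None:
        self._granted.add(subscriber)

    def revoke_select(self, subscriber: str) -> None:
        self._granted.discard(subscriber)

    def select(
        self, subscriber: str, change_rows: List[Dict[str, object]]
    ) -> List[Dict[str, object]]:
        """Return only the published columns, and only to granted subscribers."""
        if subscriber not in self._granted:
            raise PermissionError(f"{subscriber} has no SELECT on {self.name}")
        return [{col: row[col] for col in self.columns} for row in change_rows]


# Sample change rows for the EMPLOYEE source table (made-up values).
employee_changes = [
    {"emp_id": 7, "name": "Meera", "salary": 52000},
    {"emp_id": 9, "name": "Arun", "salary": 48000},
]

publication_a = Publication("A", ["emp_id", "name", "salary"])  # all columns
publication_b = Publication("B", ["emp_id", "name"])            # no salary

publication_a.grant_select("payroll_app")
publication_b.grant_select("reporting_app")

visible_to_reporting = publication_b.select("reporting_app", employee_changes)
```

Here `reporting_app` can read the change rows but never sees the salary column, while a `select` on Publication A by the same subscriber raises `PermissionError`.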

1. PowerExchange Installation

PowerExchange and the PowerExchange Client for PowerCenter are two different products, and they must be installed in separate directories for the setup to work.

If you installed the PowerCenter Integration Service on a 32-bit platform, you must install the 32-bit version of PowerExchange on the Integration Service platform; if you installed the Integration Service on a 64-bit platform, you must install the 64-bit version.

1.1 Check List for Installation:

1. PowerExchange is installed in a stand-alone directory, usually $PWX_HOME. Set the environment variables $PM_HOME and $PWX_HOME.
2. The PowerExchange Client for PowerCenter is installed in three different areas:
   a. The same directory where the pmserver is already installed, usually $PM_HOME.
   b. The repository server plug-in directory.
   c. The Windows box where your Designer resides.
3. Open ‘.bash_profile’ or ‘.profile’ in $HOME:
   a. Make sure that both PM_HOME and PWX_HOME are present in the ‘path’ statement, and that PM_HOME comes before PWX_HOME.
   b. Make sure that both PM_HOME and PWX_HOME are present in the ‘ld_library_path’ statement, and that PM_HOME comes before PWX_HOME.
4. Make sure the following environment variables are configured properly:
   a. ODBCHOME
   b. ORACLE_HOME
   c. ODBCINI
   d. PATH
   e. LD_LIBRARY_PATH
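The ordering rule in step 3 of the checklist can be verified with a small Python helper. This is a minimal sketch, not an Informatica tool: the function names and the sample directory paths are hypothetical, and it assumes that PM_HOME and PWX_HOME are separate, non-nested directories, as the installation note above requires.

```python
from typing import Dict, List, Mapping, Optional


def _first_index(entries: List[str], home: str) -> Optional[int]:
    """Index of the first entry equal to, or located under, the given home."""
    home = home.rstrip("/")
    if not home:
        return None
    for index, entry in enumerate(entries):
        entry = entry.rstrip("/")
        if entry == home or entry.startswith(home + "/"):
            return index
    return None


def pm_before_pwx(path_value: str, pm_home: str, pwx_home: str, sep: str = ":") -> bool:
    """True only if both homes appear and PM_HOME appears before PWX_HOME."""
    entries = [entry for entry in path_value.split(sep) if entry]
    pm_index = _first_index(entries, pm_home)
    pwx_index = _first_index(entries, pwx_home)
    return pm_index is not None and pwx_index is not None and pm_index < pwx_index


def check_profile(env: Mapping[str, str]) -> Dict[str, bool]:
    """Check the ordering rule for PATH and LD_LIBRARY_PATH."""
    pm_home = env.get("PM_HOME", "")
    pwx_home = env.get("PWX_HOME", "")
    return {
        name: pm_before_pwx(env.get(name, ""), pm_home, pwx_home)
        for name in ("PATH", "LD_LIBRARY_PATH")
    }


# Hypothetical environment: PATH is ordered correctly, LD_LIBRARY_PATH is not.
sample_env = {
    "PM_HOME": "/opt/informatica/pc",
    "PWX_HOME": "/opt/informatica/pwx",
    "PATH": "/usr/bin:/opt/informatica/pc:/opt/informatica/pwx",
    "LD_LIBRARY_PATH": "/opt/informatica/pwx:/opt/informatica/pc/lib",
}
result = check_profile(sample_env)
```

On a real server the same check could be run against the live environment by passing `os.environ` to `check_profile`.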

1.2 Installation Steps:

No special installation steps are required for PowerExchange Client for PowerCenter (PWXPC) on the PowerCenter Client machine. PWXPC is a native plug-in and is installed automatically when the PowerCenter Client is installed. You must still configure the PowerExchange configuration files on the Integration Service node.

To configure PowerExchange Client for PowerCenter for use on the PowerCenter Integration Service and Client, you must add NODE statements in the PowerExchange dbmover.cfg file on the PowerCenter Client and Integration Service machines for those PowerExchange Listeners to which you wish to connect.

Setup is as follows:

Create a directory to hold the PowerExchange files, and untar the PowerExchange archive and hotfix04 for PowerExchange. The files are:

● pwxrhas_v522_p04_hf04.tar
● pwxrhas_v522_p04.tar

Command to untar the files

List of files in the directory after untarring these two files

Update the license key, which the vendor, Informatica, should have provided; keys are server specific.

Now register the plug-in through the Repository Server Administration Console.

1.3 Authentication of Installation:

Verify that both of the above options are installed properly by doing the following:

a. PowerExchange Client for PowerCenter 7.1.3 – Repository Plugin:
   1. Open up the Repository Server Administration Console.
   2. Connect to your repository.
   3. Double click on the repositories folder.
   4. Click on the plus sign next to the repository; from the left navigation bar, click on registered packages.
   5. Click on the plus sign next to the registered packages.
   6. Click on the registered packages right below that; on the right hand side you should see a lot of different plug-ins.

b. PowerExchange Client for PowerCenter 7.1.3 – Server:
   On the Linux / UNIX box, log in as the Informatica user, then go to the $PM_HOME directory and check for these files by using the commands:
   1. ls *cdc*
   2. ls *vsam*
   3. ls *db2*
   4. ldd libpmdb2390.so

c. Make sure that dbmover.cfg has been created properly by checking the following variables:
   ● LISTENER
   ● NODE
   ● CAPT_XTRA
   ● CAPT_PATH
   ● LOGPATH
   ● ORACLEID

d. Go to the directory where the ‘repserver’ is installed and look in the plug-in directory for the pwxplugin.xml file.
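The dbmover.cfg check in step c can be automated with a short Python sketch that reports which of the six statements are missing. It rests on two assumptions that should be confirmed against the PowerExchange reference for the installed version: that each statement is written as `KEYWORD=value` at the start of a line, and that comment lines begin with `/*`. The sample values below are placeholders, not a working configuration.

```python
import re
from typing import List, Set

# Statements named in the verification step above.
REQUIRED = ("LISTENER", "NODE", "CAPT_XTRA", "CAPT_PATH", "LOGPATH", "ORACLEID")

_STATEMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*=")


def statements_present(cfg_text: str) -> Set[str]:
    """Collect the keyword of every KEYWORD=value line, ignoring comments."""
    found: Set[str] = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("/*"):   # assumed comment syntax
            continue
        match = _STATEMENT.match(line)
        if match:
            found.add(match.group(1).upper())
    return found


def missing_statements(cfg_text: str) -> List[str]:
    """Required statements that do not appear in the configuration text."""
    present = statements_present(cfg_text)
    return [name for name in REQUIRED if name not in present]


# Placeholder configuration text (hypothetical values, for illustration only).
SAMPLE_CFG = """
/* sample file with two statements left out */
LISTENER=(node1,TCPIP,2480)
NODE=(node1,TCPIP,localhost,2480)
CAPT_PATH=/example/capture
LOGPATH=/example/logs
"""

missing = missing_statements(SAMPLE_CFG)
```

To check a real file, read it with `open(path).read()` and pass the text to `missing_statements`; an empty list means all six statements were found.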

2. Creating CDC Mappings

Once a source or target definition has been created, it can be included in a mapping to extract data from the source or load data to the target. Source data can be extracted in batch, change, or real-time mode. One source definition and one mapping can be used for all modes. With PowerExchange Client for PowerCenter, source definitions can be read from the following relational databases:

● DB2/390
● DB2/400
● DB2/UDB
● Microsoft SQL Server (MSSQL)
● Oracle

And also from the following non-relational sources:

● ADABAS
● DATACOM
● IDMS
● IMS
● Sequential files (SEQ)
● VSAM

However, this paper is limited to Oracle, a relational database.

2.1 CDC Mapping

Once the source or target definitions have been imported from the Oracle database, create a mapping that has all tables in it to minimize the reads on the archive log. Any changes that need to be made to the worktables should be made in this mapping. Similarly, the corresponding workflow has all worktables to minimize the reads on the archive log, and any changes that need to be made to the worktables should be made in this workflow. This CDC workflow should run indefinitely unless it encounters an error.

Checklist to Start the CDC Process:

● Make sure archive logging is turned on.
● Make sure the Alter Table scripts to capture supplemental logging on the tables have been run against the source database; this only has to be done once.
● Make sure there is a catalog copy sent to the archive log (see below).
● Make sure the Detail Listener is running on the Linux / Unix server.
● Make sure the PMServer is running on the Linux / Unix server.
● Remove token files if necessary.

Steps to Start / Stop CDC

● To start, start the workflows from the Workflow Manager.
● To stop them, right click on the session and select stop.
● To restart, restart the session in RECOVERY mode. This cleans up any rows in the buffer and updates the token file.

*Catalog copy to archive log command:

begin
  -- Write a copy of the LogMiner dictionary (catalog) to the redo logs
  sys.dbms_logmnr_d.build(options => sys.dbms_logmnr_d.store_in_redo_logs);
end;

3. PowerExchange Navigator – To view the Changes

The first thing to do after installation is to make sure that PowerExchange is working, and the way to do this is a row test through the ‘PowerExchange Navigator’ to view your changes. Once the row test works, we can assume that PowerExchange has been installed properly.

Instructions for creating/changing Extraction mappings in PowerExchange for CDC:

3.1 Log on to PowerExchange Navigator

1. If the Registration Group does NOT already exist:
   a. Right-click on the Registration Groups folder.
   b. Select Add Registration Group (see screen shot example below).
   c. Fill out the form as below, making sure to UNSELECT the Add Registration box.
   d. This should automatically create the extraction group, but if not, proceed to the next step.
2. Similarly, if the Extraction Group does NOT already exist:
   a. Right-click on the Extraction Groups folder.
   b. Select Add Extraction Group.
   c. Fill out the form similar to the one below, making sure to UNSELECT the Add Extraction Definition box.

Note: The name can be whatever you want (for example, CDCSIMSR).

3.2 Re-Create Applicable Extract Definitions
Go into the Extraction Groups. Double-click the Registration group you created. It will now list all the extraction maps that have been created for CDC.

The process to change an existing extraction map is to delete the current one and rebuild it. This assumes all the fields are in the SIMSR/IIMSR/PIMSR database. These names are special: you only have eight characters to work with, so I got a little creative.

For example, loppdoc is Loan_Post_Pch_Doc. However, it is important to keep the names the same, since they are referenced in the workflow as the Data Extraction mapping (see Row Testing). The basic steps are:

1. First make a backup from /u01/opt/powerexchange/v522_p04/capture/camps:
   a. Create a new backup directory on the server (e.g. with the date in the folder name).
   b. Copy (cp) *.dmx to the new backup directory (even though you will not be replacing ALL of them). Use a tool like PuTTY.
2. Note the 8-character names of the groups to be deleted.
3. Delete the group from two different places. NOTE: Delete ONLY those groups affected by this migration:
   a. Delete the Extract Definition from within the Extraction Groups.
   b. Delete the Capture Registration from the Registration Groups (Resources tab).
4. Add back the Capture Registration:
   a. Add Capture Registration from the Registration Group (right-click the Name, e.g. CDCSIMR). Schema will be "IMSR".
5. Double-click the table name and press the "Select All Columns" button, then click Next.

6. The next screen creates the supplemental logging SQL, which needs to be run in the database. Change the Status to ACTIVE, click Finish, and save the supplemental logging script to a file (if not done previously).

Now you are done. This process has also created a data extraction mapping on the [smspai142L] server, named, for example: /u01/opt/powerexchange/v522/capture/camps/d8oracoll.commit.dmx
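The backup in step 1 can also be scripted. The following is a minimal sketch in Python, not part of the original procedure: the function name backup_dmx_files and the backup folder naming are assumptions made for illustration, and only the *.dmx pattern and the source path come from the steps above.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_dmx_files(camps_dir, backup_root):
    """Copy every *.dmx file in camps_dir into a new dated backup directory.

    Returns the backup directory and the list of copied file names.
    The "camps_backup_YYYYMMDD" folder name is a hypothetical convention.
    """
    camps = Path(camps_dir)
    target = Path(backup_root) / f"camps_backup_{date.today():%Y%m%d}"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for dmx in sorted(camps.glob("*.dmx")):
        shutil.copy2(dmx, target / dmx.name)  # copy2 also keeps timestamps
        copied.append(dmx.name)
    return target, copied
```

A call such as backup_dmx_files("/u01/opt/powerexchange/v522_p04/capture/camps", "/some/backup/root") would copy all extraction maps, matching the advice to back up every file even when only some are replaced. The backup root shown here is a placeholder.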

3.3 Row Testing
One way to test your extraction mapping is to run a row test. This is done from the Extraction Group.

If you just finished re-creating extraction groups, you must first:

1. Stop and start the Detail Listener
2. Create a catalog copy

Double-click the Resource, then double-click the Extract Definition. Click the Row Test button.

This is what follows: Change the DB Type to CAPXRT (real time) and select GO. The name generated in the FROM clause, "d8oracoll.poolmstr_POOL_MASTER", is the Data Extraction mapping name that goes into the workflow of the session.

If successful, you should see the list of fields displayed, possibly with some data.

Repeat the row test for EACH of the tables that have been modified.

The CDC feature was introduced in Oracle9i Database. CDC helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time the INSERT, UPDATE, or DELETE operations occur in the source tables, and the change data is stored inside the database in change tables. The change data, thus captured, is then made available to the target systems in a controlled manner, using database views.

*These steps assume a Linux or UNIX server.

4. PowerExchange Listener – To Start/Stop the PowerExchange

The PowerExchange Listener is used to start or stop PowerExchange. Follow the steps below.

4.1 To Start the Listener
Follow the steps below to start the Listener on a Linux/UNIX server.

1. Go to the PowerExchange directory $PWX_HOME:
   $ cd /u01/opt/powerexchange/v522
2. Issue the command below:
   $ nohup dtllst node1 &

4.2 To Stop the Listener
Follow the steps below to stop the Listener on a Linux/UNIX server.

1. Find the process ID of the dtllst node1 session:
   $ ps -aef | grep dtllst
2. Issue the command below to kill it:
   $ kill <process ID from the output of the above command>
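Picking the right process ID out of the ps output by eye is easy to get wrong, because the grep command itself also shows up in the listing. The sketch below shows one way to extract the Listener process ID from captured ps -aef output. It assumes the common Linux column layout (UID PID PPID C STIME TTY TIME CMD) with no spaces inside the STIME column, and the function name find_listener_pids is hypothetical.

```python
def find_listener_pids(ps_output, command="dtllst"):
    """Return the PIDs of processes whose executable name equals `command`.

    Expects `ps -aef` style text: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # at most 8 columns; CMD keeps its spaces
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed row
        cmd = fields[7].split()
        # Compare the executable name only, so the "grep dtllst" row is skipped.
        if cmd and cmd[0].rsplit("/", 1)[-1] == command:
            pids.append(int(fields[1]))
    return pids


# Illustrative sample; the user name and PIDs are made up.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
pwx      22786     1  0 09:56 ?        00:00:03 dtllst node1
pwx      23011 22990  0 10:02 pts/1    00:00:00 grep dtllst
"""
```

With the sample above, find_listener_pids(SAMPLE_PS) returns only the PID of the dtllst row, which is the value to pass to kill.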

Helpful log files (in the PowerExchange directory $PWX_HOME):
• detail.log (this file is extremely helpful when trying to debug a failed CDC)
• nohup.out
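When detail.log grows large, a quick tally of message codes can show where to start reading. The sketch below is an illustration only. It assumes that log lines carry codes of the form PWX-nnnnn and ORA-nnnnn, as in the error samples quoted in this paper, and the function name summarize_codes is hypothetical.

```python
import re
from collections import Counter

# Five-digit PowerExchange (PWX) and Oracle (ORA) message codes.
CODE_PATTERN = re.compile(r"\b(?:PWX|ORA)-\d{5}\b")


def summarize_codes(lines):
    """Count how often each PWX-/ORA- message code appears in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(CODE_PATTERN.findall(line))
    return counts


# Lines shortened from the Error # 2 sample in this paper.
SAMPLE_LOG = [
    "060713 095657 LNX 22786 PWX-10824 Oracle Capture: Error occurred, RC <-1>",
    "060713 095657 LNX 22786 PWX-10809 Oracle Error Code <235>: Error message follows.",
    "060713 095657 LNX 22786 ORA-00235: controlfile fixed table inconsistent",
]
```

To run it against a real file, pass an open file object, for example summarize_codes(open("detail.log")), and look up the most frequent codes first.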

5. PowerExchange with CDC – Challenges

5.1 Performance Challenges
PowerExchange runs as a standard application within an address space and processes native database requests as any application would. In addition to reading data using native calls, PowerExchange handles the following functions while moving data from source to target, all without coding: EBCDIC/ASCII conversion, data filtering, data type conversions (e.g. packed to unpacked), data type validity checking, and moving the data over TCP/IP onto a target environment.

Major factors that influence the amount of CPU consumed during these processes:

• Filtering (leveraging available indexes) performed on the mainframe, to ensure that only desired records are selected for extract
• Amount of packed data that needs to be unpacked
• Length of records
  ● The bigger the record, the more conversion will occur
  ● For non-relational sources, selecting only a subset of the record's fields will reduce conversion time
• The priority of the PowerExchange Listener job that is used to access data
  ● Adjusting the priority doesn't reduce CPU consumption, but it does ensure that the Listener doesn't slow down other higher-priority (critical) tasks
• Transporting the data from source to target across the network
  ● The network bandwidth will influence the time needed to transport data from source to target

All of these factors will have an effect on CPU consumption, and they will vary between environments. PowerExchange has several tuning parameters (e.g. APPBUFSIZE, COMPRESS, OFFLOAD PROCESSING for VSAM/Seq files, TCP/IP packet sizing) that can be used to enhance performance and reduce CPU consumption. PowerExchange is a TCP socket application that was built with performance optimization as a top deliverable.

5.2 Implementation Challenges
Below are typical errors encountered during CDC implementation, with the resolution for each.

Error # 1:
[Informatica][SCLI PWX Driver] PWX-10704 CAPI: ERROR: Subordinate CAPI_Read returned 8
[Informatica][SCLI PWX Driver] PWX-10970 Oracle Capture: Unformatted log data found at SCN <05729994267265> sequence <2>.
[Informatica][SCLI PWX Driver] update "UNKNOWN"."OBJ# 1151780" set "COL 13" = HEXTORAW('4e') where "COL 13" IS NULL and ROWID = 'AAEZMkAAMAAAAIUAAA';

Solution: To resolve this issue, add the BYPASSUF=Y parameter to the CAPI_CONNECTION statement in the PowerExchange dbmover.cfg file.

Error # 2:
060713 095657 LNX 22786 PWX-10824 Oracle Capture: Error occurred, RC <-1> whilst executing the following statement. Oracle messages follow.
060713 095657 LNX 22786 SELECT TO_CHAR(MAX(NEXT_TIME)) AS NEXT_TIME FROM V$ARCHIVED_LOG WHERE DICTIONARY_BEGIN = 'YES' AND (THREAD# = (SELECT VALUE FROM V$PARAMETER WHERE NAME = 'instance_number') OR THREAD# = (SELECT VALUE FROM V$PARAMETER WHERE NAME = 'instance_number' AND VALU
060713 095657 LNX 22786 E = '0') + 1)
060713 095657 LNX 22786 PWX-10809 Oracle Error Code <235>: Error message follows.
060713 095657 LNX 22786 ORA-00235: controlfile fixed table inconsistent due to concurrent update

Solution: To resolve this issue, stop the Listener, remove the token files, and restart the PowerExchange Listener.

Error # 3:
ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","SELECT CM_LOAN_MASTER_WK_S_0...","sga heap(1,0)","kglsim heap")

Solution: Increase the memory allocation for the Oracle instance. Review the Oracle instance's configuration (init.ora file) and the alert.log for further information on this error. Refer to the Oracle documentation for further help on troubleshooting this error. Another option is to decrease the DTM buffer size and then restart.

Error # 4:
[Informatica][SCLI PWX Driver] PWX-01251 DBNTC Failed to CONNECT comms to location "as400_2", rcs 1219/-1/1203.
[Informatica][SCLI PWX Driver] PWX-01203 DTLNET Request time out (see CFG timeout values)

Solution: To resolve this, make sure that the port number specified in the DBMOVER.CFG file on the client machine is the port on which the PowerExchange Listener is running.

Error # 5:
[Informatica][SCLI PWX Driver] PWX-07038 Group Fetch read connection failed: rc=271 rc1=? rc2=?.
[Informatica][SCLI PWX Driver] PWX-00271 DBAPI Error. DB_READ failed for file <Capture Extract Realtime>.
[Informatica][SCLI PWX Driver] PWX-01266 DBNTC Receive READ header for file <Capture Extract Realtime> failed, rcs 0/2011/9951.
[Informatica][SCLI PWX Driver] PWX-02011 SQL fetch error. SQLCODE = 9951.
[Informatica][SCLI PWX Driver] PWX-04566 Capture Extract RC=9951 from CREAD_Read
[Informatica][SCLI PWX Driver] PWX-09951 CAPI i/f: RC=8 from CAPI_Read
[Informatica][SCLI PWX Driver] PWX-10704 CAPI: ERROR: Subordinate CAPI_Read returned 8
[Informatica][SCLI PWX Driver] PWX-10824 Oracle Capture: Error occurred, RC <-1> whilst executing the following statement. Oracle messages follow. ].

Solution: This error occurs when hlq.LOAD is not concatenated in the STEPLIB of the Listener job. hlq.LOAD must be concatenated in the Listener STEPLIB in order to do row tests with changed data (CAPXRT), as this is done via the Listener. To resolve this, recycle the Listener with hlq.LOAD concatenated in the STEPLIB.

Error # 6:
PWX-00277 DBAPI Error. DB_OPENCONNECTION failed for location MVS.
PWX-01289 DBNTC Connection target is "10.7.14.35" "10.7.14.35" Port < 8576>
PWX-01251 DBNTC Failed to CONNECT comms to location "MVS", rcs 1217/5/10061.
PWX-00652 TCP/IP CONNECT Error, rc=10061, reason <Connection refused.>

It could also be that the Listener is a different version from the Navigator that you are using on the client, for example, a PowerExchange version 5.1 Navigator client trying to establish a connection with a PowerExchange version 5.2 Listener on the server:
PWX-00277 DBAPI Error. DB_OPENCONNECTION failed for location MVS.
PWX-01251 DBNTC Failed to CONNECT comms to location "MVS", rcs 120/0/0.
PWX-00120 Version/Release mismatch, local=5.0, remote=5.2

Solution: Determine the cause by reviewing the accompanying messages and take corrective action. For example, start the Listener, check the Listener port number in the configuration file, and/or ensure that the versions of the server and client are the same (including any installed patch updates).
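For the timeout and "Connection refused" cases (Error # 4 and Error # 6), it can help to confirm from the client machine whether anything is listening on the configured host and port before looking further. The sketch below uses only the Python standard library. The function name listener_reachable is hypothetical, and the check covers TCP reachability only; it cannot detect a version mismatch such as PWX-00120.

```python
import socket


def listener_reachable(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, "") on success, otherwise (False, reason).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except socket.timeout:
        return False, "timed out"
    except ConnectionRefusedError:
        return False, "connection refused"
    except OSError as exc:
        return False, str(exc)
```

A "connection refused" result suggests that no Listener is running on that port, while "timed out" points more towards a wrong host, a wrong port, or a network block. The host and port to test are the ones in the client's configuration file.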

6. PowerExchange with CDC – Benefits
Below are the benefits of using PowerExchange with CDC:

• Access and deliver the data in "right time"
• Improve system uptime by eliminating the need for batch update windows
• Free development resources to tackle new initiatives
• Unlock complex systems without coding
• Accelerate data integration initiatives with flexible access to source data
• Speed development and deployment with an easy-to-use graphical interface
• Reduce errors with point-and-click development methods
• Support rapid impact assessment and deployment as changes arise, powered by automated metadata capture
• Enable business events, not technology, to drive your business processes by allowing virtually any application to create business events without modification

Real-time Benefits:
• In practice, batch time has been reduced by four hours after implementation of PowerExchange with CDC. This helps in meeting the SLAs well in time.
• PowerExchange with CDC helped reduce ongoing development activities, maintenance costs, and database tables, which include more than 50 mappings and 35 database tables.
• Significant reduction in development cost.
• Previously, users had to wait until the next day to access the data; now they are able to access the data well in time after implementation of PowerExchange with CDC.

Addendum A: Performance Tips

6.1 Reduce the Amount of Data Being Moved
A significant potential benefit is to reduce the data being moved over the wire. For example, using FTP one has to move the complete file from OS/390 to PowerCenter. With PowerExchange, one can move "just what is needed", and that alone can dramatically reduce network traffic and data movement. This can be done in two places: in the Navigator or when setting up the PowerCenter mappings. In the Navigator, define a table with just the columns needed, or in PowerCenter select only the columns needed for the session.

If the Navigator is not involved (say DB2), then this can only be done in PowerCenter, by specifying only the needed columns.

All the extract logic is done on the source system, so only the reduced data will be moved over the wire and put through EBCDIC to ASCII conversions.

6.2 Using Compression
It is usually assumed that turning compression on will always improve performance. This is not always true, so it should be checked carefully: on lightly loaded CPUs and networks, better performance may be obtained by not using compression.

In the DBMOVER.cfg file, specify COMPRESS=Y. Then test the data movement and transfer rates. Now rerun an identical test with COMPRESS=N and compare results. If the results are quite similar, it is probably better to set COMPRESS=Y, as the reduction in data traffic will probably be better for other applications.

Also remember that any timing test should ideally be done at the time of the day that the real production run is going to occur. Daytime machine performance (real-time, interactive) is often substantially different from the night time (loads of batch jobs).
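The comparison described above can be reduced to a small calculation once both test runs have been timed. The sketch below is only an illustration of that decision rule. The function names and the 5 percent threshold used for "quite similar" are assumptions, not values from PowerExchange or from this paper.

```python
def throughput_mb_per_s(bytes_moved, seconds):
    """Transfer rate in MB/s for one timed test run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_moved / (1024 * 1024) / seconds


def recommend_compress(rate_on, rate_off, tolerance=0.05):
    """Suggest a COMPRESS setting ("Y" or "N") from two measured rates.

    rate_on  : throughput measured with COMPRESS=Y
    rate_off : throughput measured with COMPRESS=N
    tolerance: relative difference treated as "quite similar" (assumed 5%)
    """
    if abs(rate_on - rate_off) <= tolerance * max(rate_on, rate_off):
        return "Y"  # similar results: prefer less traffic on the network
    return "Y" if rate_on > rate_off else "N"
```

Both runs should move the same source data, so the same byte count is used for each rate and only the elapsed time differs.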

6.3 Minimizing Passes through Files and Databases
Even though PowerExchange delivers good performance, when there is a multi-record VSAM or flat file, it is often easiest to set things up to select, say, all the record type 01s in one pass, then the record type 15s in another pass, the record type 23s in another, and so on. This produces three passes of the file, which will take some time if the file is very large. If it is a tape, the three passes cannot be run concurrently due to tape contention, so they would have to be serialized. An analysis of the file might reveal that all the data can be obtained in a single pass, and PowerCenter can then split this single source into the multiple desired outputs in a transformation. Sometimes this is not possible, but it is always worth getting a detailed understanding of the relationships between the different record types to see whether it is.

The same concept can be very significant for IMS. IMS databases are designed mainly for random access, so the database is not optimized for sequential reading. Any reduction in the number of passes needed to extract data will therefore be very beneficial.
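The single-pass idea can be illustrated outside PowerCenter with a few lines of Python. This sketch stands in for the splitting transformation and is not PowerExchange or PowerCenter code. It assumes, purely for illustration, that the record type is held in the first two characters of each record, and the function name split_by_record_type is hypothetical.

```python
from collections import defaultdict


def split_by_record_type(records, wanted_types, type_of=lambda rec: rec[:2]):
    """Route records into per-type lists while reading the input only once.

    records      : iterable of records (here, strings)
    wanted_types : record types to keep, e.g. {"01", "15", "23"}
    type_of      : function returning a record's type (assumed: first 2 chars)
    """
    wanted = set(wanted_types)
    buckets = defaultdict(list)
    for rec in records:  # one pass over the source
        rtype = type_of(rec)
        if rtype in wanted:
            buckets[rtype].append(rec)
    return dict(buckets)
```

Reading the source once and routing each record replaces three separate filtered reads, which matters most when the source is large or, like a tape, cannot be read concurrently.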

6.4 Adjust the Application Buffer Size
In the DBMOVER.cfg file, try increasing the APPBUFSIZE.

Addendum B: Change Data Capture Views
Information about the Change Data Capture environment is provided in the following views:

1. CHANGE_SOURCES – Allows a publisher to see existing change sources
2. CHANGE_SETS – Allows a publisher to see existing change sets
3. CHANGE_TABLES – Allows a publisher to see existing change tables
4. ALL_SOURCE_TABLES – Allows subscribers to see all of the published source tables for which the subscribers have privileges to subscribe
5. DBA_SOURCE_TABLES – Allows a publisher to see all of the existing (published) source tables
6. USER_SOURCE_TABLES – Allows the user to see all of the published source tables for which this user has privileges to subscribe
7. ALL_SOURCE_TAB_COLUMNS – Allows subscribers to see all of the source table columns that have been published, as well as the schema name and table name of the source table
8. DBA_SOURCE_TAB_COLUMNS – Allows subscribers to see all of the source table columns that have been published, as well as the schema name and table name of the source table
9. USER_SOURCE_TAB_COLUMNS – Allows users to see all of the source table columns that have been published, as well as the schema name and table name of the source table

10. ALL_PUBLISHED_COLUMNS – Allows a subscriber to see all of the published source table columns for which the subscriber has privileges
11. DBA_PUBLISHED_COLUMNS – Allows a subscriber to see all of the published source table columns for which the subscriber has privileges
12. USER_PUBLISHED_COLUMNS – Allows a user to see all of the published source table columns for which the user has privileges
13. ALL_SUBSCRIPTIONS – Allows a user to see all current subscriptions
14. DBA_SUBSCRIPTIONS – Allows a publisher to see all of the subscriptions
15. USER_SUBSCRIPTIONS – Allows a subscriber to see all of their current subscriptions
16. ALL_SUBSCRIBED_TABLES – Allows a user to see all of the published tables for which there are subscribers
17. DBA_SUBSCRIBED_TABLES – Allows a publisher to see all of the published tables to which subscribers have subscribed
18. USER_SUBSCRIBED_TABLES – Allows a subscriber to see all of the published tables to which the subscriber has subscribed
19. ALL_SUBSCRIBED_COLUMNS – Allows a user to see all of the published columns for which there are subscribers
20. DBA_SUBSCRIBED_COLUMNS – Allows a publisher to see all of the columns of published tables to which subscribers have subscribed
21. USER_SUBSCRIBED_COLUMNS – Allows a publisher to see all of the columns of published tables to which the subscriber has subscribed

Conclusion
The PowerExchange Change Data Capture (CDC) Option can satisfy our business requirements for up-to-the-minute data.

The PowerExchange Change Data Capture (CDC) Option is available for all popular enterprise database systems. This option is essential whenever you need timelier access to data.

When combined with a real-time data integration platform, event-driven data can be accessed, transformed, and cleansed continuously and used to drive business results in any enterprise, large or small.

Acknowledgements
I would like to express my gratitude to all those who made it possible for me to complete this paper. I want to thank Mr. David Chaise and Mr. Kishor Desai for all their help, support and valuable hints.

