© 2015 IBM Corporation
IBM InfoSphere Data Replication’s 11.3.3.1 Change Data Capture (CDC) Enhancements
© IBM Corporation 2015. All Rights Reserved.
Disclaimer: Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Overview
▪ The following is available in the IIDR 11.3.3.1 release:
  – CDC Oracle Windows redo-log support
  – Recovery mechanism after Oracle failover
  – Oracle Exadata ASM & Flex ASM support
  – WebHDFS
  – Cloudant apply
  – Extension to the DataStage custom flat file formatter
  – CDC i refresh while active support
  – MS SQL Server support of online index rebuild operations
  – MC/AS upgrade support (version 11.3.3)
IBM InfoSphere Data Replication (IIDR) Coverage
[Figure: IIDR coverage diagram, sources replicating to targets]
Sources: DB2 (i, LUW), Informix, Oracle/Exadata, MS SQL Server, Sybase
Targets:
  – DB2 (z/OS, i, LUW), Informix, Oracle/Exadata, MS SQL Server, Sybase
  – Pure Data for Analytics (Netezza), Teradata
  – Information Server, Cloudant
  – DataStage to GreenPlum, …
  – Message Queues (ESB, MQ Series, JMS, …)
  – Files (flat file, HDFS…)
  – Customized Apply (Hadoop/Streams HDFS/Hive, WebHDFS, User Exit)
  – FlexRep (JDBC targets: MySQL, EnterpriseDB…)
IMS → IMS
VSAM → VSAM
DB2 z/OS → DB2 z/OS
InfoSphere Data Replication - Expansive support
Sources: DB2 (z/OS, i, LUW); Oracle/Exadata; MS SQL Server; Informix; Sybase; IMS; VSAM
Targets: All sources (1); Pure Data for Analytics; Information Server; Hadoop/Streams (2); Cloudant; FlexRep (MySQL, EnterpriseDB); Teradata; MQ Series / JMS; WebMethods / BEA / TIBCO; Greenplum (3)
O/S and hardware:
  – z/OS on System z
  – Red Hat / SuSE on System z
  – AIX on System p
  – IBM i OS on System i
  – Red Hat / SuSE on Intel / AMD / Power (4)
  – MS Windows on Intel / AMD
  – HP-UX on HP Itanium
  – HP-UX on HP PA-RISC
  – Solaris on Sun SPARC
1. IMS is only a target for IMS sources. VSAM is only a target for VSAM sources
2. Via HDFS, WebHDFS or custom user exit
3. Via DataStage
4. Power8 with little endian, for DB2 LUW only
New Database/Platform Support for IIDR’s CDC
▪ CDC Oracle Windows Redo
  – Supports reading Oracle redo logs on Windows, as on other platforms, including Oracle RAC and ASM
  – Supports all configuration modes supported by the Linux/UNIX CDC Oracle versions, such as local, remote, and the various log shipping modes
  – Supports multi-byte character sets (MBCS), which the trigger-based version did not
Recovery mechanism after Oracle DataGuard failover
▪ Recovery mechanism after an Oracle failover to a Data Guard (DG) standby database
  – In a failover to a DG standby, re-instantiating the database results in a new incarnation
  – CDC only supports reading logs from the current incarnation of the database and will not “see” any unprocessed logs from the previous incarnation
  – Use the new dmfailoverrecovery command to instruct CDC to scrape the required logs from the previous incarnation of the database (needed if there was latency at the time of failover)
  – The command only works for the previous incarnation of the database
  – dmfailoverrecovery starts mirroring with a scheduled end, so replication stops once the last log entry from the previous incarnation has been read
  – If the command does not succeed, the tables must be refreshed
    • This can occur, for instance, if the last log is corrupted
Recovery mechanism after Oracle DataGuard failover
▪ dmfailoverrecovery command syntax
    <CDC_INSTALL_HOME>/bin> ./dmfailoverrecovery -I <instance name> [-d | -r]
  Options:
    -d  Displays the information the command will use to run the recovery process. It does not start the recovery itself.
    -r  Starts the recovery process. Recovery time depends on the amount of data that needs to be processed.
▪ If the recovery succeeds, users can resume normal replication. If the recovery process fails, users must perform a full refresh of all tables to ensure data consistency.
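The runbook above (display with -d, recover with -r, fall back to a full refresh on failure) can be scripted. The following Java sketch only assembles the command line for java.lang.ProcessBuilder; the class name, the idea of wrapping the command, and the install path and instance name used when calling it are illustrative assumptions, not part of the product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles dmfailoverrecovery command lines for ProcessBuilder.
// Only the -I, -d and -r options come from the slide; class and method names are illustrative.
final class FailoverRecoveryCommands {

    enum Mode { DISPLAY, RECOVER }

    // Builds: <CDC_INSTALL_HOME>/bin/dmfailoverrecovery -I <instance name> (-d | -r)
    static List<String> build(String cdcInstallHome, String instanceName, Mode mode) {
        List<String> cmd = new ArrayList<>();
        cmd.add(cdcInstallHome + "/bin/dmfailoverrecovery");
        cmd.add("-I");
        cmd.add(instanceName);
        cmd.add(mode == Mode.DISPLAY ? "-d" : "-r"); // -d only displays; -r starts recovery
        return cmd;
    }

    // Follow-up action per the slide: resume on success, full refresh on failure.
    static String nextStep(boolean recoverySucceeded) {
        return recoverySucceeded
                ? "resume normal replication"
                : "perform a full refresh of all tables";
    }
}
```

For example, new ProcessBuilder(FailoverRecoveryCommands.build("/opt/ibm/cdc", "ora1", FailoverRecoveryCommands.Mode.DISPLAY)).inheritIO().start() would show what the recovery will use (the path and instance name are placeholders). How dmfailoverrecovery reports success (exit code or message text) is not described here, so inspect its output before choosing the next step.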
Oracle Exadata ASM & Flex ASM Support
▪ CDC can now be configured to operate directly on the Exadata appliance
  – CDC can be installed locally on Exadata and read directly from ASM
  – To read the Exadata logs remotely, you must ship the logs to a non-ASM location that is accessible to CDC
▪ The same user configuration experience is provided for both traditional Oracle database ASM configurations and ASM on Exadata
Oracle Exadata ASM & Flex ASM Support…
▪ CDC now transparently supports ASM automatic rebalancing
▪ The new ASM support requires the user to add a TNS entry to the tnsnames.ora file that points to the ASM instance
  – The configuration tool has been modified to specify a tnsnames.ora location
Oracle Exadata ASM & Flex ASM Support…
▪ CDC now supports Oracle 12c Flex ASM
  – Flex ASM allows an Oracle RAC node to use the ASM instance on other nodes instead of being tied to the ASM instance on its own node
  – For CDC to support it, you must make minor changes to tnsnames.ora, specifying the node names, e.g.:
      ASM =
        (DESCRIPTION =
          (ADDRESS_LIST =
            (FAILOVER = on)
            (ADDRESS = (PROTOCOL = TCP)(HOST = cdcrac2.torolab.ibm.com)(PORT = 1521))
            (ADDRESS = (PROTOCOL = TCP)(HOST = cdcrac3.torolab.ibm.com)(PORT = 1521))
            (ADDRESS = (PROTOCOL = TCP)(HOST = cdcrac1.torolab.ibm.com)(PORT = 1521))
          )
          (CONNECT_DATA =
            (SERVER = DEDICATED)
            (SERVICE_NAME = +ASM)
          )
        )
  – If one ASM instance is down, the connection automatically switches to the next one
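With several RAC nodes, the entry grows by one ADDRESS per node. A minimal Java sketch that renders an entry in the same shape as the example above, assuming TCP on a single port for every node; the class and method names are hypothetical, and the generated text still has to be placed in the tnsnames.ora file that the configuration tool points to.

```java
import java.util.List;

// Hypothetical generator for a Flex ASM tnsnames.ora entry with client-side failover.
final class AsmTnsEntry {

    static String build(String alias, List<String> hosts, int port) {
        StringBuilder sb = new StringBuilder();
        sb.append(alias).append(" =\n");
        sb.append("  (DESCRIPTION =\n");
        sb.append("    (ADDRESS_LIST =\n");
        sb.append("      (FAILOVER = on)\n"); // try the next address if one ASM instance is down
        for (String host : hosts) {
            sb.append("      (ADDRESS = (PROTOCOL = TCP)(HOST = ").append(host)
              .append(")(PORT = ").append(port).append("))\n");
        }
        sb.append("    )\n");
        sb.append("    (CONNECT_DATA =\n");
        sb.append("      (SERVER = DEDICATED)\n");
        sb.append("      (SERVICE_NAME = +ASM)\n");
        sb.append("    )\n");
        sb.append("  )\n");
        return sb.toString();
    }
}
```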
WebHDFS Support Overview
▪ The new WebHDFS support uses the WebHDFS REST API
  – Allows much greater flexibility in where the CDC target is installed
    • The CDC installation no longer needs to be part of the Hadoop cluster
    • An added benefit is that changes or upgrades to the Hadoop cluster do not impact the server where the CDC target engine is running
  – Allows CDC to target any Hadoop distribution
▪ Removes the restriction on which underlying file system is used
  – As a result, replication to Hadoop on GPFS is now supported
▪ Allows CDC to interact with a Hadoop installation that is configured to use Kerberos security
▪ WebHDFS with Kerberos authentication delivers approximately the same throughput as the CDC local HDFS option
  – WebHDFS with simple authentication has lower performance
WebHDFS Support Configuration
▪ A new WebHDFS option is available when mapping tables
WebHDFS Support Configuration…
Specify the name of the directory where the HDFS files will reside
Note: the server name is specified in the Hadoop properties
Configuring Hadoop Properties for the Subscription
▪ The WebHDFS connection information is specified in the Hadoop properties
  – Supports simple and Kerberos authentication
  – Supports both HTTP and HTTPS
▪ Note that the fully qualified connection string must be supplied, including the /webhdfs/v1/ suffix
▪ Additional examples by Hadoop service for WebHDFS with the default configuration:
  – Through the HttpFS proxy (BigInsights 3.0)
    • http://192.xxx.xxx.xxx:14000/webhdfs/v1/
    • https://192.xxx.xxx.xxx:14443/webhdfs/v1/
  – Through the Knox gateway (BigInsights 4.0)
    • https://192.xxx.xxx.xxx:8443/gateway/default/webhdfs/v1/
  – Directly to the HDFS NameNode (rarely permitted in production)
    • http://192.xxx.xxx.xxx:50070/webhdfs/v1/
    • https://192.xxx.xxx.xxx:50470/webhdfs/v1/
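Because the connection string must already include /webhdfs/v1/, one way to sanity-check it is to compose a WebHDFS REST request from it. A sketch, assuming Apache Hadoop's standard WebHDFS URL layout (<base>/webhdfs/v1/<path>?op=<OPERATION>) and, for simple authentication only, the user.name query parameter; the helper is hypothetical, does not URL-encode paths, and leaves Kerberos (SPNEGO) negotiation out of scope.

```java
// Hypothetical helper: compose a WebHDFS REST URL from a configured connection string.
final class WebHdfsUrl {

    static String forPath(String baseUrl, String hdfsPath, String op, String simpleAuthUser) {
        // The slide requires the fully qualified string, including /webhdfs/v1/.
        if (!baseUrl.endsWith("/webhdfs/v1/")) {
            throw new IllegalArgumentException("connection string must end with /webhdfs/v1/: " + baseUrl);
        }
        // Paths are appended as-is (no URL encoding) relative to the /webhdfs/v1/ root.
        String relative = hdfsPath.startsWith("/") ? hdfsPath.substring(1) : hdfsPath;
        StringBuilder url = new StringBuilder(baseUrl).append(relative).append("?op=").append(op);
        if (simpleAuthUser != null) {
            url.append("&user.name=").append(simpleAuthUser); // simple (pseudo) authentication only
        }
        return url.toString();
    }
}
```

For instance, a base of http://192.0.2.10:14000/webhdfs/v1/ (a documentation address standing in for the elided host) with path /data/cdc/CUSTOMER and op GETFILESTATUS yields a request that should return the directory status if the proxy is reachable.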
Naming Convention of Files
▪ CDC uses the following convention to name the HDFS flat files produced during replication:
  – (_)[Table].D[Date].[Time][# Records]
    • _ = currently open HDFS file; the prefix is removed when the file is completed
    • [Date] = Julian date (year, day number within the year)
    • [Time] = hh24mmss when the flat file was created (in GMT)
    • [# Records] = optionally, the number of records can be added
▪ For those familiar with standard IIDR flat file production, IIDR HDFS files behave differently in a few ways:
  – The file prefix is different
    • HDFS uses _ instead of @ for the working file
  – Fields are not quoted in files produced in HDFS
  – HDFS does not create a [Table].STOPPED file when the subscription is stopped
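The naming convention can be expressed with java.time. This sketch assumes a four-digit year followed by a zero-padded three-digit day of year for [Date], HHmmss in GMT for [Time], and no separators beyond those shown; it does not model the optional [# Records] suffix, whose exact layout is not given here. The class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper reproducing (_)[Table].D[Date].[Time] under the assumptions above.
final class HdfsFileNames {

    // Julian date: year + day of year (zero-padded to 3 digits), evaluated in GMT.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("uuuuDDD").withZone(ZoneOffset.UTC);
    // hh24mmss creation time, in GMT.
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("HHmmss").withZone(ZoneOffset.UTC);

    static String name(String table, Instant created, boolean open) {
        String base = table + ".D" + DATE.format(created) + "." + TIME.format(created);
        return open ? "_" + base : base; // "_" marks the currently open file
    }

    // When the file is completed, the "_" prefix is removed.
    static String hardenedName(String openName) {
        return openName.startsWith("_") ? openName.substring(1) : openName;
    }
}
```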
HDFS Record Format
▪ Standard columns containing information about the change:
  – DM_TIMESTAMP: the timestamp, obtained from the log, of when the operation occurred (contains the value from the &TIMSTAMP journal control field)
  – DM_TXID: the transaction identifier (contains the value from the &CCID journal control field)
  – DM_OPERATION_TYPE: a single character indicating the type of operation:
    • "I" for an insert
    • "D" for a delete
    • For the single record format, one type represents the update image: "U" for an update
    • For the multiple record format, two separate types represent the before and after images: "B" for the row containing the before image of an update, and "A" for the row containing the after image of an update
  – DM_USER: the user who performed the operation (contains the value from the &USER journal control field)
HDFS Record Format…
▪ Single record format
  – An update operation is sent as a single row
  – The before and after images are contained in the same record
  – E.g. inserting 1 row followed by deleting 1 row:
      2015-07-15 22:09:46,6674163,I,GSAKUTH,\N,\N,\N,4381 Kelly Ave,San Jose,CA
      2015-07-15 22:09:47,6674174,D,GSAKUTH,4381 Kelly Ave,San Jose,CA,\N,\N,\N
▪ Multiple record format
  – An update operation is sent as two rows: the first row contains the before image and the second row contains the after image
▪ Note that the following characters are escaped:
  – Comma: escaped with "\" (written as \,)
  – Escape character: written as "\\"
  – Null: written as "\N" (as illustrated in the example above)
▪ Binary data is encoded in Base64
▪ A sample custom formatter (SampleDataFormatForWebHdfs.java) is provided with the product if the output format requires customization
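The record layout and escaping rules above can be reproduced in a few lines of Java, which is also a convenient way to unit-test a consumer of these files. This is an illustrative sketch, not the product's formatter or SampleDataFormatForWebHdfs.java; it assumes the single record format places the before-image columns ahead of the after-image columns (as in the insert/delete example) and that each multiple-record row carries one image.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Illustrative model of the HDFS record layout; class and method names are hypothetical.
final class HdfsRecordFormat {

    static final String NULL_MARKER = "\\N"; // the two characters \N

    // Escape the escape character first, then commas; null becomes \N.
    static String escape(String value) {
        if (value == null) {
            return NULL_MARKER;
        }
        return value.replace("\\", "\\\\").replace(",", "\\,");
    }

    // Binary column values are written as Base64 text.
    static String encodeBinary(byte[] value) {
        return value == null ? NULL_MARKER : Base64.getEncoder().encodeToString(value);
    }

    // One row: DM_TIMESTAMP, DM_TXID, DM_OPERATION_TYPE, DM_USER, then the given images.
    private static String row(String timestamp, String txId, char op, String user, String[]... images) {
        List<String> fields = new ArrayList<>();
        fields.add(escape(timestamp));
        fields.add(escape(txId));
        fields.add(String.valueOf(op));
        fields.add(escape(user));
        for (String[] image : images) {
            for (String v : image) {
                fields.add(escape(v));
            }
        }
        return String.join(",", fields);
    }

    // Single record format: before image then after image in the same row (I, D or U).
    static String singleRecord(String timestamp, String txId, char op, String user,
                               String[] before, String[] after) {
        return row(timestamp, txId, op, user, before, after);
    }

    // Multiple record format: an update becomes a "B" row followed by an "A" row.
    static List<String> multipleRecordUpdate(String timestamp, String txId, String user,
                                             String[] before, String[] after) {
        List<String> rows = new ArrayList<>();
        rows.add(row(timestamp, txId, 'B', user, before));
        rows.add(row(timestamp, txId, 'A', user, after));
        return rows;
    }
}
```

Feeding it the insert from the example (nulls for the before image, the address for the after image) reproduces the first sample line exactly.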
Cloudant Target Support
▪ New CDC target engine that applies directly to Cloudant
  – Receives changed data based on “relational” tables and transforms the data into equivalent JSON documents
▪ Uses the existing CDC DataStage target engine infrastructure
▪ A new ‘Cloudant’ delivery method is available
Cloudant Target Support…
▪ The Cloudant URL and login credentials are provided in the subscription properties dialog
▪ The connection to Cloudant is secured using HTTPS
Cloudant Target Support…
[Screenshot: Cloudant table mapping dialog]
The Cloudant database field indicates which Cloudant database to apply to
Select the parent table for compound documents
Cloudant Target Support…
▪ Full CHCCLP scripting support is available, e.g.:
add table mapping
  [name]              Specifies the name of the subscription. If a name is not provided, the subscription that is currently identified as the context will be used. To view the current context, use the "show context" command.
  [sourceDatabase]    Database for the source table.
  sourceSchema        Schema for the source table.
  sourceTable         Name of the source table.
  type                Table mapping type. VALID VALUES: cloudant
  cloudantDatabase    Name of the Cloudant database.
  primaryKeyColumns   Set of columns comprising the primary key of the source table.
  [parentSchema]      Schema of the parent table in the source database.
  [parentTable]       Name of the parent table in the source database.
Example:
  add table mapping sourceSchema cdcschema sourceTable Invoices type cloudant cloudantDatabase inv primaryKeyColumns "inv_number"
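When many tables have to be mapped, the CHCCLP command can be generated rather than typed. A hypothetical Java helper that emits the add table mapping line for a Cloudant target, using only the parameter names listed above; it passes primaryKeyColumns through verbatim because the syntax for multi-column keys is not shown here, and the position of the optional parent parameters is an assumption.

```java
// Hypothetical CHCCLP script generator for Cloudant table mappings.
final class CloudantMappingScript {

    static String addTableMapping(String sourceSchema, String sourceTable, String cloudantDatabase,
                                  String primaryKeyColumns, String parentSchema, String parentTable) {
        StringBuilder cmd = new StringBuilder("add table mapping")
                .append(" sourceSchema ").append(sourceSchema)
                .append(" sourceTable ").append(sourceTable)
                .append(" type cloudant")
                .append(" cloudantDatabase ").append(cloudantDatabase)
                // Quoted and passed through verbatim; multi-column key syntax is not assumed.
                .append(" primaryKeyColumns \"").append(primaryKeyColumns).append('"');
        // Optional parent table, used when building compound documents.
        if (parentSchema != null && parentTable != null) {
            cmd.append(" parentSchema ").append(parentSchema)
               .append(" parentTable ").append(parentTable);
        }
        return cmd.toString();
    }
}
```

Writing one generated line per table into a script file keeps large mapping changes reviewable before they are run through CHCCLP.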
Cloudant Apply Detail…
▪ A changed row (insert/update/delete) in the source table is replicated to a JSON document in Cloudant
▪ The JSON document has a document ID (“_id”) based on the key of the source table (internal relationship)
[Figure: INSERT into CDCSCHEMA.TABLE_1 values (25, “94401”, ….) is captured by Change Data Capture and applied as a Cloudant JSON document]
▪ The _id is used to resolve to the document in Cloudant
Cloudant Apply Detail…
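▪ The apply replicates source data regardless of whether the document already exists in Cloudant
▪ The apply mode is conceptually similar to ‘adaptive apply’ (an insert for an existing document updates it, and an update for a missing document creates it)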
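To make the "regardless of document existence" behavior concrete, here is a sketch that uses an in-memory map as a stand-in for a Cloudant database. The _id scheme (key column values joined with |) is a hypothetical choice for illustration, since only "based on the key of the source table" is stated; in this model inserts and updates both become upserts, and deleting a missing document is ignored.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the Cloudant apply semantics; not the product's code.
final class AdaptiveApplySketch {

    enum Op { INSERT, UPDATE, DELETE }

    // In-memory stand-in for a Cloudant database: _id -> document fields.
    final Map<String, Map<String, Object>> docs = new HashMap<>();

    // HYPOTHETICAL _id scheme: source key values joined with '|'.
    static String docId(List<Object> keyValues) {
        StringBuilder id = new StringBuilder();
        for (int i = 0; i < keyValues.size(); i++) {
            if (i > 0) {
                id.append('|');
            }
            id.append(keyValues.get(i));
        }
        return id.toString();
    }

    void apply(Op op, List<Object> keyValues, Map<String, Object> row) {
        String id = docId(keyValues);
        switch (op) {
            case INSERT:
            case UPDATE: {
                // Upsert: write the document whether or not it already exists.
                Map<String, Object> doc = new LinkedHashMap<>(row);
                doc.put("_id", id);
                docs.put(id, doc);
                break;
            }
            case DELETE:
                // Deleting a missing document is not treated as an error.
                docs.remove(id);
                break;
        }
    }
}
```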
Cloudant Apply: Compound JSON Documents
▪ Designate parent-child relationships
  – Compound documents are created on the fly
[Figure: source database tables mapped to a compound Cloudant document in the ODS]
Parent: PROFILE (card holder), key CARD_NUM
Child: TRANSACTION, key CARD_NUM, TRANS_ID (multiple transactions become repeating elements in the parent document)
Target: Cloudant document in the ODS
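The parent-child folding can be pictured as grouping child rows under their parent key. A minimal Java sketch using plain maps and lists in place of JSON; the PROFILE/TRANSACTION roles follow the figure, while the array field name, the _id choice, and any sample values are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: fold child rows (e.g. TRANSACTION) into their parent row (e.g. PROFILE)
// as a repeating element, matched on the shared parent key column.
final class CompoundDocSketch {

    static Map<String, Object> build(Map<String, Object> parentRow, String parentKeyColumn,
                                     List<Map<String, Object>> childRows, String childArrayName) {
        Object parentKey = parentRow.get(parentKeyColumn);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", String.valueOf(parentKey)); // assumed: document id derived from the parent key
        doc.putAll(parentRow);
        List<Map<String, Object>> children = new ArrayList<>();
        for (Map<String, Object> child : childRows) {
            // Only children whose key matches this parent become repeating elements.
            if (parentKey != null && parentKey.equals(child.get(parentKeyColumn))) {
                children.add(new LinkedHashMap<>(child));
            }
        }
        doc.put(childArrayName, children);
        return doc;
    }
}
```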
Extended Custom Flat file formatter
▪ Allows users to write custom user exits that customize the temporary and hardened flat file names
▪ Supports customization of DataStage (DS) flat files generated on both LUW and Hadoop (HDFS) file systems
▪ Allows users to suppress logging of update before images in the flat file
  – When before images are suppressed, the file contents in Single Record and Multiple Records modes will be similar
Extended Custom Flat file formatter
▪ Four new methods for the extended custom formatter:
  – getContextForDataStageExtendedFileCharacteristicsIF
    • Called once for each table mapping, just before the apply processes the first operation for that table. The returned context object is provided to all subsequent calls to getHardenedFileName, getTempFileName, and assumeNoKeyChangesOccur.
  – getHardenedFileName
    • Provides the fully qualified name to use for the hardened file. Called just before the file is hardened, after all the data has been written to it.
  – getTempFileName
    • Provides the fully qualified name to use for the temporary file. Called just before the first data is written to this temporary file.
  – assumeNoKeyChangesOccur
    • The file is written as if the Multiple Records option were selected, with the update before-image rows omitted, regardless of whether "Single Record" or "Multiple Records" is selected. Note: if any update operations change the key columns used by the application consuming these files, that application will not be able to maintain an accurate copy of the data.
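A user exit built on these methods typically keeps per-table state in the context object. The Java sketch below illustrates that call sequence with a HYPOTHETICAL interface: the method names match the list above, but the parameter and return types are assumptions, so the product's user exit Javadoc is the authority on the real signatures. The naming scheme (one directory per table, numbered batch files) is likewise only an example.

```java
import java.util.Locale;

// HYPOTHETICAL interface: method names follow the slide; parameter and return
// types are assumptions, not the product's actual user exit signatures.
interface ExtendedFileCharacteristics {
    Object getContextForDataStageExtendedFileCharacteristicsIF(String tableName);
    String getTempFileName(Object context);
    String getHardenedFileName(Object context);
    boolean assumeNoKeyChangesOccur(Object context);
}

// Per-table state, created once per table mapping and handed back on every later call.
final class TableFileContext {
    final String directory;
    int batch; // incremented each time a file is hardened

    TableFileContext(String directory) {
        this.directory = directory;
    }
}

// Illustrative naming scheme: one directory per table, numbered batch files.
final class SequencedFileNaming implements ExtendedFileCharacteristics {
    private final String rootDirectory;

    SequencedFileNaming(String rootDirectory) {
        this.rootDirectory = rootDirectory;
    }

    @Override
    public Object getContextForDataStageExtendedFileCharacteristicsIF(String tableName) {
        return new TableFileContext(rootDirectory + "/" + tableName.toLowerCase(Locale.ROOT));
    }

    @Override
    public String getTempFileName(Object context) {
        TableFileContext ctx = (TableFileContext) context;
        return ctx.directory + "/batch_" + ctx.batch + ".tmp";
    }

    @Override
    public String getHardenedFileName(Object context) {
        TableFileContext ctx = (TableFileContext) context;
        String name = ctx.directory + "/batch_" + ctx.batch + ".dat";
        ctx.batch++; // the next temporary file starts a new batch
        return name;
    }

    @Override
    public boolean assumeNoKeyChangesOccur(Object context) {
        // Conservative default: only safe to return true if key columns never change.
        return false;
    }
}
```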
CDC i Refresh While active support
▪ CDC i now has the option to acquire a lock for a very short period before refreshing the table, to ensure that a mirroring point is established in the log (journal)
▪ As a result, you can now refresh while active (*RWA) and still respect commit boundaries on the target apply
  – For example, on LUW target engines the system parameter mirror_commit_on_transaction_boundary no longer needs to be set to false
MS SQL Server support of online index rebuild operations
▪ CDC now transparently handles index rebuilds and reorgs on tables that CDC is replicating
  – If no structural table change was performed before the index rebuild or reorg, mirroring simply continues normally
  – Previously, users had to follow the manual DDL procedures, because replication ended when the index rebuild or reorg was encountered in the log
MC/AS Upgrade support
▪ The latest IIDR 11.3.3 Management Console build (5101) allows the user to perform an upgrade
  – To upgrade Management Console, install a later version of the software over an existing 11.3.3 installation
▪ Similarly, the latest IIDR 11.3.3 Access Server build allows the user to perform an upgrade
  – To upgrade Access Server, install a later version of the software over an existing 11.3.3 installation
Additional Resources
▪ IBM developerWorks CDC community:
  – https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=a9b542e4-7c66-4cf3-8f7b-8a37a4fdef0c
▪ IBM CDC Knowledge Center:
  – http://www-01.ibm.com/support/knowledgecenter/SSTRGZ_11.3.3/
▪ CDC Redbook:
  – http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247941.html?Open
▪ IBM CDC Support:
  – http://www-947.ibm.com/support/entry/portal/product/information_management/infosphere_change_data_capture?productContext=-873715215
▪ Passport Advantage:
  – https://www-112.ibm.com/software/howtobuy/softwareandservices/passportadvantage
Legal Disclaimer
• © IBM Corporation 2015. All Rights Reserved.
• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
• Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
• Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
• Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
• UNIX is a registered trademark of The Open Group in the United States and other countries.
• Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.