2
Informatica Data Quality Upgrade
Marlene Simon, Practice Manager
IPS Data Quality Vertical
Informatica
3
Biography
Marlene Simon
• Practice Manager IPS Data Quality Vertical
• Based in Colorado
• 5+ years with Informatica Professional Services
• 20+ years of experience delivering and managing data quality projects
4
Agenda
• Informatica Data Explorer (IDE)
• Data Quality v9.1 Functionality
• IDE Migration/Upgrade
• Informatica Data Quality (IDQ)
• Overview of version differences
• IDQ Migration/Upgrade
• Questions
5
Informatica Data Explorer
6
Data Quality v9.1 Functionality
IDE 9.1 only
• Structural Profiling: Table Profiling, Primary Key Inference, Functional Dependency, Cross Table Profiling, Foreign Key Inference, Overlap Discovery, Profile Model
• NOT entitled to DQ Transformations
IDQ 9.1 only
• DQ Transformations
• NOT entitled to Structural Profiling features
Common to BOTH IDQ & IDE
• Informatica DQ Analyst: 5 named users (additional users available on tiered pricing model by seat)
• Informatica Analyst Viewer: Unlimited ('read only' Analyst user)
• Informatica Developer: Unlimited; entitled to core transforms
• Server: CPU Cores
• Base Profiling Features: Column Profiling, Rule Profiling, Scorecarding, RTM, Midstream, Comparative Profiling, Mapping generation from Profile, Join Validation
Feature Supported By (legend): Both Analyst and Developer, Analyst Only, or Developer Only
Note: Effective in 9.1.0, Informatica Data Explorer Standard Edition is referred to as Informatica Data Explorer Legacy and Informatica Data Explorer Advanced Edition is renamed to Informatica Data Explorer.
7
IDE Migration/Upgrade Overview
• Supported versions
• Informatica Data Explorer Legacy 8.6.x – 9.1
• Informatica Data Explorer 9.1
• You can transfer the following objects to the
Informatica Data Explorer platform as part of the
migration:
• Database connections
• Data sources
• Column profiling
• Schemas
• Projects
8
IDE Migration/Upgrade Overview
• IDE Migration Object Conversion
Informatica Data Explorer Legacy → Informatica Data Explorer
Data Source Connections → Data Source Connections
Tables → Data Objects
Rules → Expression Rules
Action Items → Comments
Notes → Comments
Schemas → Folders
Projects → Projects
9
IDE Migration path to current release
• Starting versions: IDE 3.0; IDE Legacy 8.5 – 8.6.2; IDE Legacy 9.0; IDE Legacy 9.1
• Informatica V9 – Mercury Platform: IDE 9.1, IDE 9.5
• Migration Utility
• Install IDE Legacy Migration Server, then run the migration utility to upgrade to IDE 9.1
• Migration Server is part of the IDE Legacy 9.1 installation; run the utility to upgrade
• Upgrade IDE 9.1 to 9.5
10
Migration Reference Documents
• Documentation is available on the Informatica
website
• 0281_MigratingObjectsFromIDELegacyToIDE.pdf
• Outlines the migration process from Informatica Data Explorer Legacy (V8.6.2) to Informatica Data Explorer (V9.1)
11
Informatica Data Quality
12
IDQ Overview – Version Differences
• Different Architecture
• Differences between 8.6.2 Components and V9
transformations
• Change in terminology between Versions
13
IDQ 8.6.2 Architecture
• 8.6.2 Common Configuration
• One repository per user
• Reference data on local file system
• Data quality metadata contained in IDQ plan
• Connection details embedded within IDQ plan
Diagram: IDQ Client with local repository; IDQ Server and shared repository
14
Informatica 9 Architecture
• 9.1 Common Configuration
• Central repository shared by all users
• Analyst and Developer Tool
• Reference data in Reference Table Manager
• Data Quality metadata in 9.0.1 models
• Connection details stored and managed centrally
Diagram components: Informatica Analyst; Informatica Developer; Informatica Server Environment (MRS, Data Quality Server, Informatica Repository, Address Doctor 5); File System (Address Doctor 5 Reference Data)
15
8.6.2 Component vs. V9 Transformation
Direct Object Conversion: 8.6.2 Component converts to V9 Transformation
• Source, Target or component converts directly to a 9.0.1 transformation
• Merge :: Merge; To Upper :: Case Converter
Multiple Objects in 9.0.1: 8.6.2 Component converts to multiple V9 Transformations
• Source, Target or component converts to multiple transformations or a mapplet
• Context Parser = Labeler + Parser (set to pattern-based)
Supported, Enhanced Functionality: V9 Transformation provides enhanced functionality
• Provides equivalent functionality, but has evolved from 8.6.2
• Jaro, Bigram, Edit and Hamming distance convert to Comparison
Not Supported: 8.6.2 Component does not migrate to V9
• Component not supported in 9.0.1 repository
• Scripting component :: will convert to expression
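The four conversion categories on this slide can be pictured as a small lookup. The following is a minimal sketch, not part of the migration utility: the `CONVERSIONS` table and `group_by_category` function are hypothetical names, and the entries are a hand-picked subset of the examples on this slide and in Appendix A.

```python
from collections import defaultdict

# Hypothetical lookup: 8.6.2 component -> (category, 9.0.1 transformations).
# Entries are a small subset taken from this slide and Appendix A.
CONVERSIONS = {
    "Merge": ("direct", ["Merge"]),
    "Context Parser": ("multiple", ["Labeler", "Parser"]),
    "Jaro Distance": ("enhanced", ["Comparison"]),
    "Bigram": ("enhanced", ["Comparison"]),
    "Edit Distance": ("enhanced", ["Comparison"]),
    "Hamming Distance": ("enhanced", ["Comparison"]),
    "Mixed Field Matcher": ("not supported", []),
}


def group_by_category(components):
    """Group 8.6.2 component names by conversion category.

    Components missing from the lookup are reported as "unknown"
    so they can be checked against Appendix A by hand.
    """
    groups = defaultdict(list)
    for name in components:
        category, _targets = CONVERSIONS.get(name, ("unknown", []))
        groups[category].append(name)
    return dict(groups)


if __name__ == "__main__":
    plan = ["Merge", "Bigram", "Mixed Field Matcher"]
    print(group_by_category(plan))
```

Grouping a plan's components this way gives a rough first view of how much rework a plan may need after migration.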
16
Example Migration Exclusions
• Report Targets (flat file and relational)
• Scorecards now part of Analyst tool and based on profiling results (of columns or rules)
• Scripting (TCL)
• Version 9.0.1 uses Java for scripting requirements
• A TCL script can also sit inside the Rule Based Analyzer (RBA); it will be converted to a Decision transformation
• Mixed Field Matcher
• Component no longer supported
NOTE: Full set of component to transformation conversions
provided in Appendix A of this presentation
17
Changes in terminology
IDQ 8.6.2 → IDQ 9
Plan → Mapping
Component → Transform / Transformation
Dictionary → Reference Table
18
IDQ Migration/Upgrade
• Migration/Upgrade Path
• Migration/Upgrade Considerations
• Migration Pre-requisites
• Migration Process
• Post Migration Tasks
• Upgrade Process
• Post Upgrade Tasks
19
Migration/Upgrade Path
Version timeline: IDQ 3.0, IDQ 3.1, IDQ 8.5, IDQ 8.6.x prior to 8.6.2, IDQ 8.6.2, IDQ 9.0.1, IDQ 9.1, IDQ 9.5
Informatica V9 – Mercury Platform: IDQ 9.0.1, IDQ 9.1, IDQ 9.5
• IDQ 3.0: Upgrade 3.0 to 3.1, then upgrade 3.1 to 8.6.2
• IDQ 3.1: Upgrade 3.1 to 8.6.2
• IDQ 8.5: Upgrade 8.5 to 8.6.2
• IDQ 8.6.x prior to 8.6.2: Upgrade to 8.6.2
• IDQ 8.6.2: Migration Utility (only works between 8.6.2 and 9.0.1 versions)
• Export plans and dictionaries from 8.6.2
• Migrate to 9.0.1
• Validate mappings and mapplets
• Upgrade to 9.1 or 9.5
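The path on this slide can be sketched as a step planner. This is an illustrative sketch only: `PRE_V9_STEPS` and `plan_path` are hypothetical names, the version strings are plain labels, and the sketch only covers the hops the slide states explicitly (it does not model a 9.1 to 9.5 upgrade).

```python
# Hypothetical table of pre-V9 hops, as read from the slide:
# starting version -> (next version, action).
PRE_V9_STEPS = {
    "3.0": ("3.1", "upgrade"),
    "3.1": ("8.6.2", "upgrade"),
    "8.5": ("8.6.2", "upgrade"),
    "8.6.x": ("8.6.2", "upgrade"),
    "8.6.2": ("9.0.1", "migrate"),  # Migration Utility: 8.6.2 to 9.0.1 only
}


def plan_path(start, target="9.5"):
    """Return the ordered steps from `start` to `target` ("9.1" or "9.5")."""
    if target not in ("9.1", "9.5"):
        raise ValueError("target must be '9.1' or '9.5'")
    steps = []
    current = start
    while current != target:
        if current == "9.0.1":
            # After migration, the 9.0.1 repository is upgraded to the target.
            nxt, action = target, "upgrade"
        elif current in PRE_V9_STEPS:
            nxt, action = PRE_V9_STEPS[current]
        else:
            raise ValueError(f"no path known in this sketch from {current}")
        steps.append(f"{action} {current} -> {nxt}")
        current = nxt
    return steps


if __name__ == "__main__":
    for step in plan_path("3.0"):
        print(step)
```

Starting from 3.0, the sketch lists four steps, which matches the multi-step, multi-version nature of the path shown on the slide.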
20
Migration/Upgrade Considerations
• Identify Level of effort for Migration/Upgrade
• Stand Alone Data Quality Migration
• IDQ Server Only (shared repository)
• IDQ Clients (local repositories)
• Data Quality Batch scripts
• PowerCenter Integration
• Verify all integrated DQ objects exist in the IDQ repository for migration/upgrade
• How many PC mappings require upgrade/modification
• Contain GAV component
• Contain deprecated components
21
Migration/Upgrade Considerations
• Review plans and dictionaries slated for
migration. Remove or delete those that do not
need to be migrated
• Identify any duplicate plans and determine which
need to be migrated
• Review batch scripts and remove or delete any
which are obsolete
22
Migration/Upgrade Considerations
• Migrate plans or re-develop mappings
• Less than 4 plans and low to medium complexity
• Recommended to redevelop in the IDQ Developer tool rather than perform migration and upgrade
• Ensure you plan adequate time for testing and QA
• More than 4 plans or extremely complex plans
• Typically more efficient to perform migration/upgrade
• Ensure you plan adequate time for testing and QA
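The rule of thumb above can be written down as a tiny decision helper. This is a sketch under assumptions: `recommend_approach` is a hypothetical name, and because the slide does not say what to do with exactly 4 plans, the sketch returns a neutral answer for that case instead of guessing.

```python
def recommend_approach(plan_count, extremely_complex=False):
    """Suggest 'redevelop' or 'migrate' following the slide's rule of thumb.

    - fewer than 4 plans, low to medium complexity -> redevelop
    - more than 4 plans, or extremely complex plans -> migrate
    - exactly 4 plans is not covered by the slide -> judgment call
    """
    if plan_count < 0:
        raise ValueError("plan_count must not be negative")
    if extremely_complex or plan_count > 4:
        return "migrate"
    if plan_count < 4:
        return "redevelop"
    return "judgment call"


if __name__ == "__main__":
    print(recommend_approach(2))
    print(recommend_approach(12))
    print(recommend_approach(3, extremely_complex=True))
```

Either way, the slide's advice to plan adequate time for testing and QA applies to both outcomes.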
23
Migration/Upgrade Considerations
• The IDQ/PC integration and the IDQ repository no longer exist
• If the integrated DQ plans do not contain any GAV or deprecated components: run the CRS utility in the PowerCenter environment; the 8.6.2 plans continue running in the 9.x PowerCenter mappings
• If the integrated plans contain GAV or deprecated components: you CANNOT export IDQ 8.6.2 plans from PowerCenter to the IDQ Workbench or Developer tools; you will have to recreate the DQ mappings in the Developer tool and export them to PowerCenter
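As a rough illustration of the split described above, the sketch below sorts an inventory of integrated plans into those that can keep running after the CRS utility is run and those that must be recreated. The function name and the dictionary keys (`name`, `has_gav`, `has_deprecated`) are assumptions made for the example; the inventory itself would have to be compiled by hand.

```python
def split_integrated_plans(plans):
    """Split integrated 8.6.2 plans into (keep_running, recreate) name lists.

    `plans` is an iterable of dicts with the hypothetical keys
    'name', 'has_gav' and 'has_deprecated'.
    """
    keep_running, recreate = [], []
    for plan in plans:
        if plan["has_gav"] or plan["has_deprecated"]:
            # Must be recreated in the Developer tool and exported again.
            recreate.append(plan["name"])
        else:
            # Continues to run in 9.x PowerCenter once the CRS utility is run.
            keep_running.append(plan["name"])
    return keep_running, recreate


if __name__ == "__main__":
    inventory = [
        {"name": "std_names", "has_gav": False, "has_deprecated": False},
        {"name": "validate_addr", "has_gav": True, "has_deprecated": False},
    ]
    print(split_integrated_plans(inventory))
```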
24
Migration/Upgrade Considerations
Stand Alone Data Quality Migration/Upgrade (diagram)
• IDQ 8.6.2 local repositories and IDQ 8.6.2 Server repository
• Migration (ClientPackage, ServerPackage, XML Import) into the MRS of an Informatica 9.0.1 Domain
• Upgrade to the MRS of an Informatica 9.1 or 9.5 Domain
25
Migration/Upgrade Considerations
Data Quality Migration/Upgrade with PowerCenter Integration
Single Informatica 9.1 or 9.5 Domain (diagram)
• Informatica 9.1 or 9.5 Domain with MRS and PowerCenter Repository
• Run CRS utility: allows 8.6.2 DQ plans to run in 9.x environment
• Export DQ mappings requiring upgrade (AV transformation, deprecated components)
• IDQ 8.6.2 local repositories and IDQ 8.6.2 Server repository
• Migration (ClientPackage, ServerPackage, XML Import) into the MRS of an Informatica 9.0.1 Domain *
• Upgrade to the MRS of an Informatica 9.1 or 9.5 Domain
• Export upgraded MRS and Content
* Requires separate 9.0.1 domain for upgrade or VMware
26
Migration/Upgrade Considerations
Data Quality Migration/Upgrade with PowerCenter Integration
Separate DQ 9.x Domain and PowerCenter 9.x Domain (diagram)
• Informatica 9.1 or 9.5 Domain with PowerCenter Repository
• Run CRS utility: allows 8.6.2 DQ plans to run in 9.x environment
• Export DQ mappings requiring upgrade (AV transformation, deprecated components)
• IDQ 8.6.2 local repositories and IDQ 8.6.2 Server repository
• Migration (ClientPackage, ServerPackage, XML Import) into the MRS of an Informatica 9.0.1 Domain
• MRS in the Informatica DQ 9.1 or 9.5 Domain
27
Migration/Upgrade Considerations
• Create Migration/Upgrade project plan
• To manage the IDQ migration/upgrade project, it is crucial to identify and understand all tasks that must be executed to complete the project.
• Clearly define and list all steps required for the multi-step migration and multi-version upgrade of IDQ.
• Allocate adequate time for IDQ re-development and testing after upgrade.
• Allocate adequate time for PowerCenter/DQ integration testing
• Include V9 product training as part of the plan
28
Migration Pre-requisites
• Informatica 9.0.1 needs to be properly installed,
configured, and running before starting the migration
process
• IDQ is at version 8.6.2. Earlier versions cannot be
directly migrated to Informatica 9.0.1
• Ensure Java version is current
• Gather all required login and connection information
• Review the migration steps and create a project plan
• Download and extract IDQMigration.zip
29
3 Step Migration Process
Step 1: Client Package (IDQ 8.6.2 Client)
• Export IDQ plans from IDQ repository
• Identify connection details
• Gather local directories
• Package data for next step
Step 2: Server Import (9.0.1 Server)
• Unpack data from Client Package
• Create connections
• Import dictionary data into Reference Table Manager
• Convert plans to 9.0.1 XML
Step 3: XML Import (9.0.1 Client)
• Import XML from Server Import into 9.0.1 Repository
30
Client Package
• Executing the ClientPackage will create the
Package and Stage folders along with the
corresponding logs
31
ClientPackage - Report
• <MigrationPackageLocation>/Package/PackageReport.html
• Identifies dictionaries used by plans and dictionaries that exist but are not used by any plan
• Lists database connections used by plans, with one entry for every DSN/Username/Password combination
32
ServerImport
• Pre-requisites (perform in 9.0.1 Developer Tool)
• Create new blank project for mappings to be imported to
• Create new folder for imported reference tables
• Install Informatica Content packages in shared project
• Server Import
• Unpacks data from ClientPackage (MigrationPackage.zip)
• Creates connections
• Imports dictionary data into Reference Table Manager
• Converts 8.6.2 Plans to 9.0.1 Mapping XML
33
ServerImport – Summary/Overview Report
• Overall status of conversion
• Links to detail / individual reports
• Default location
• <MigrationPackageLocation>/migration_reports
34
ServerImport – Detail Reports
• One Detail report per 8.6.2 plan/9.0.1 mapping
• Component / Port level detail
• Includes warnings / errors
• Default location
• <MigrationPackageLocation>/migration_reports
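The report locations named on the ClientPackage and ServerImport slides can be assembled with a few lines of Python, for example to check after each step that the reports were written. This is only a sketch: the `report_locations` and `missing_reports` helpers are hypothetical, and the paths are the default locations quoted on the slides relative to whatever `<MigrationPackageLocation>` is on a given machine.

```python
from pathlib import Path


def report_locations(migration_package_location):
    """Build the default report paths quoted on the slides."""
    root = Path(migration_package_location)
    return {
        # ClientPackage report (one HTML file)
        "client_package_report": root / "Package" / "PackageReport.html",
        # ServerImport summary and detail reports (a folder)
        "server_import_reports": root / "migration_reports",
    }


def missing_reports(migration_package_location):
    """Return the keys of the report locations that do not exist yet."""
    locations = report_locations(migration_package_location)
    return [key for key, path in locations.items() if not path.exists()]


if __name__ == "__main__":
    # "migration_package" is a placeholder for <MigrationPackageLocation>.
    print(missing_reports("migration_package"))
```

Before ClientPackage has run, both keys are expected to be listed as missing; after ServerImport, the list should be empty if the defaults were used.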
35
Client XML Import
• Using Developer tool, import the
MigratedMappings.xml file produced during the
ServerImport process.
36
Post Migration Tasks
• Validate all DQ mappings
• Make any required changes to get mappings to
validate – do not start re-development effort at
this time
37
Upgrade Process
• Stand Alone Data Quality
• Upgrade 9.0.1 repository to most current version of Informatica – 9.1 or 9.5
• PowerCenter and Data Quality Integration
• Single Domain for PowerCenter and Data Quality
• Upgrade 9.0.1 repository to current domain version
• Separate Domain for PowerCenter and Data Quality
• Upgrade 9.0.1 repository to most current version of Informatica
• Run CRS Utility to enable 8.6.2 plans to continue running in PowerCenter 9.1 or 9.5 environments
• For a new PowerCenter install with a repository upgrade, the 8.6 DQ binaries need to be accessible to the new environment (either moved or referenced)
38
Post Upgrade Tasks
• Re-develop DQ mappings with deprecated
components
• During migration, deprecated components convert to expressions; modify the mappings to implement the appropriate logic
• Address Validation – some 8.6.2 output fields are not available in the 9.x releases. Status codes in 9.x are different from 8.6.2 codes.
• Validate output for all mappings in Developer
tool to ensure desired results are achieved
39
Post Upgrade Tasks
• Export modified DQ mappings to PowerCenter
• Only mappings which contain deprecated components or Address Validation transformation must be re-exported
• Verify PowerCenter mappings which utilize Address Validation match status codes are producing desired results
• Perform any necessary code changes to PowerCenter mappings utilizing updated DQ mappings
• QA results from PowerCenter mappings to ensure desired output is being achieved with updated DQ logic
40
IDQ Migration Reference Documents
• Documentation is available on the Informatica
website
• IN_901_DQ_Repository_Migration_Guide_en
• Installation and configuration documentation for current version is also available at http://communities.informatica.com – product documentation
41
Informatica Velocity Methodology
Velocity is the blueprint for delivering efficient and successful Informatica solutions that solve business problems.
New Website
• New search capability
• Filtering/viewing content by project type, project phase, or other tags
• New accelerator tools
• Hot links between the articles
• Check out more than 100 new articles!
Access at: mysupport.informatica.com
Visit the Informatica Pavilion at the Technology and Solutions Fair for more details.
42
Questions?
43
Appendix A Component to Transformation Conversion
44
Component to Transformation conversion
8.6.2 Component → 9.0.1 Component
Aggregation → Aggregator transformation
Association [for PowerCenter] → Association transformation
Bigram → Comparison transformation
Character Labeler → Labeler transformation
Consolidation [for PowerCenter] → Consolidation transformation
Context Parser → Labeler and Parser transformations. The Parser is set to pattern-based parsing mode.
Count → Mapplet containing Aggregator, Union, Expression, Joiner, Sorter, and Filter transformations
CSV Dual Match Source → Two file-based data sources and a Match transformation
CSV Identity Group Source → File-based data source and Match transformation. May convert to a mapplet.
CSV Match Sink → File-based data target
CSV Match Source → File-based data source and Match transformation
CSV Merge Sink → File-based data target
CSV Sink → File-based data target
CSV Source → File-based data source
DB Identity Group Source → Relational data source and Match transformation. May convert to a mapplet.
DB Match Source → Relational data source and Match transformation
DB Report Sink → File-based data target
DB Sink → SQL transformation and relational data target
45
Component to Transformation conversion
8.6.2 Component → 9.0.1 Component
DB Source → Relational data source
Dual Group Source → Multiple file-based data sources and Union transformation if required. May convert to a mapplet.
Edit Distance → Comparison transformation
Fixed Width Sink → File-based data target
Fixed Width Source → File-based data source
Global AV [Address Doctor engine] → Address Validator transformation. This transformation needs additional configuration following import to the 9.0.1 Model repository.
Global AV [Melissa Data engine] → Address Validator transformation. This transformation needs additional configuration following import to the 9.0.1 Model repository.
Global AV [QAS engine] → Address Validator transformation. This transformation needs additional configuration following import to the 9.0.1 Model repository.
Global AV [SDK] → Not supported
Group Sink → Flat-file data target and Sorter and Expression transformations
Group Source → Multiple file-based data sources and Union transformation if required. May convert to a mapplet.
Hamming Distance → Comparison transformation
Identity Group Target → Mapplet output transformation
Identity Match → Match transformation
Jaro Distance → Comparison transformation
Match Key Sink → Relational data target
46
Component to Transformation conversion
8.6.2 Component → 9.0.1 Component
Merge → Merge transformation
MinAvgMax → Mapplet containing Aggregator, Union, Expression, Joiner, and Router transformations
Missing Values → Mapplet containing Aggregator, Expression, Joiner transformations
Mixed Field Matcher → Not supported
Normalization [SDK] → Not supported
NYSIIS → Key Generator transformation
Parsing [SDK] → Not supported
Profile Standardizer → Parser transformation. The Parser is set to pattern-based parsing mode.
Range Counter → Linear Range: Aggregator, Expression, Joiner, and Sorter transformations. Variable Range: Aggregator, Expression, Union transformations.
Realtime Sink → Mapplet containing data target
Realtime Source → Mapplet containing data source
Report Sink → Flat-file data target
Rule Based Analyzer → Decision transformation
SAP Sink → Not supported
SAP Source → Not supported
Scripting → Not supported
Search Replace → Standardizer transformation
Similarity [SDK] → Not supported
Soundex → Key Generator transformation
47
Component to Transformation conversion
8.6.2 Component → 9.0.1 Component
Splitter → Labeler, Parser, and Expression transformations. The Parser is set to pattern-based parsing mode.
Sum → Mapplet containing Aggregator, Expression, Joiner, Sorter, and Union transformations
To Upper → Case Converter transformation
Token Labeler → Labeler transformation
Token Parser → Parser transformation
Weight Based Analyzer → Weighted Average transformation
Word Manager → Standardizer transformation
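One practical use of Appendix A is a pre-migration audit: list the components in each 8.6.2 plan and flag the ones marked "Not supported". The sketch below assumes the component names have already been collected by hand or from the ClientPackage report; `UNSUPPORTED_COMPONENTS` and `find_unsupported` are hypothetical names, and the set simply copies the "Not supported" rows of the appendix.

```python
# Components listed as "Not supported" in Appendix A.
UNSUPPORTED_COMPONENTS = frozenset({
    "Global AV [SDK]",
    "Mixed Field Matcher",
    "Normalization [SDK]",
    "Parsing [SDK]",
    "SAP Sink",
    "SAP Source",
    "Scripting",
    "Similarity [SDK]",
})


def find_unsupported(plan_components):
    """Return the sorted, de-duplicated unsupported components in a plan."""
    return sorted(set(plan_components) & UNSUPPORTED_COMPONENTS)


if __name__ == "__main__":
    # Hypothetical plan contents, for illustration only.
    plan = ["CSV Source", "Scripting", "Merge", "Mixed Field Matcher", "CSV Sink"]
    print(find_unsupported(plan))
```

Plans that return a non-empty list are candidates for the post-upgrade re-development work described earlier in the deck.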