Posted on 14-Jul-2016
Informatica confidential. For discussion purposes only.
Pushdown Optimization
Jason Hamby
Agenda
• Pushdown Optimization Overview and Benefits
• How it works
• How to Configure Pushdown Optimization
• What Is and What Is Not Supported
  • What can/cannot be pushed down
  • Limitations: details of rules
  • When is Pushdown Optimization appropriate
• Demo
Overview
Pushdown Optimization Overview
Push transformation processing to data sources
Benefits
- Reduce data movement when source and target are in the same database
- Utilize database-specific processing that may be more efficient
- Maintain metadata and lineage in PowerCenter
Customer Scenario
Batch transformation and load -- staging and target tables in the same target database
Transformation and load from real-time status table to data warehouse in the same database
[Diagram: Data Sources feed Staging and Warehouse tables in the Target Database, in two steps (Step 1, Step 2)]
Solution Overview
Pushdown optimization is an option that the user selects
The SQL to be processed in the database is generated automatically
A session may be partially or completely pushed down
[Diagram: DI Server, Metadata Repository, Optimizer, SQL; Data Sources, Staging and Warehouse in the Target Database; Step 1, Step 2]
How Does It Work
How It Works
• Available as a session property
• Pushdown Optimization Options
  – Partial pushdown optimization to source
  – Partial pushdown optimization to target
  – Full pushdown optimization
• Integration Service analyzes the mapping and generates one or more SQL statements based on the mapping transformation logic
• Integration Service executes SQL against the database instead of processing the transformation logic itself
How It Works (cont’d)
• Integration Service analyzes the mapping and session to determine the transformation logic it can push to the database
• Integration Service processes transformation logic that it cannot push down to the database
• Generated SQL is not saved in the repository
• Results are displayed in the session Mapping tab (in Workflow Manager)
  – Transformations that can/can’t be pushed down
  – Generated SQL
  – Reason why certain transformations can’t be pushed down
Configuration (from Workflow Mgr)
Viewing the Result
Preview from Session—Mapping Tab
Transformations Pushed to Source or Target Database
Generated SQL Statement
What Is and What Is Not Supported
Supported Databases
• Teradata (V2R5 or above)
• Oracle (9i or above)
• DB2 (v8 or above)
• SQL Server (7 and above)
• Sybase (ASE 12.5)
• ODBC source/target
Supported Transformations
• To Source
  • Aggregator
  • Expression
  • Filter
  • Joiner
  • Lookup
  • Sorter
  • Union
• To Target
  • Expression
  • Lookup
Unsupported Transformations
• Custom Transformation
• External Procedure
• XML
• Normalizer
• Rank
• Router
• Sequence Generator
• Stored Procedure
• Transaction Control (TCT)
• Update Strategy
Partial Source Pushdown
• Condition:
  • One or more transformations can be processed in source database
• Virtual source – transformations pushed to source
• Generated SQL:
  • SELECT … FROM s … WHERE (filter/join condition) … GROUP BY …
[Diagram: Source DB, Target; Extract, Transform, Load]
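The effect of pushing a filter and an aggregator to the source can be sketched as follows. This is only an illustration, not SQL generated by PowerCenter: Python's built-in sqlite3 stands in for the source database, and the table `s`, its columns, and the filter condition are hypothetical.

```python
import sqlite3

# Hypothetical source table; SQLite stands in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (region TEXT, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?)",
    [("east", 10, "ok"), ("east", 5, "ok"), ("west", 7, "ok"), ("west", 3, "bad")],
)

# Without pushdown: every row is extracted, then filtered and
# aggregated outside the database.
all_rows = conn.execute("SELECT region, amount, status FROM s").fetchall()
totals_outside = {}
for region, amount, status in all_rows:
    if status == "ok":  # Filter transformation
        totals_outside[region] = totals_outside.get(region, 0) + amount  # Aggregator

# With pushdown to source: the filter becomes WHERE, the aggregator GROUP BY.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM s WHERE status = 'ok' GROUP BY region"
).fetchall()
totals_pushed = dict(pushed_rows)

rows_moved_without = len(all_rows)   # rows leaving the database without pushdown
rows_moved_with = len(pushed_rows)   # rows leaving the database with pushdown
```

Both approaches produce the same totals, but the pushed-down query returns one row per group instead of every source row.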
Partial Target Pushdown
• Condition:
  • One or more transformations can be processed in target database
• Virtual target – transformations pushed to target
• Generated SQL:
  • INSERT INTO t (…) VALUES (?+1, SOUNDEX(?))
[Diagram: Source, Target DB; Extract, Transform, Load]
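A minimal sketch of target-side pushdown, again with sqlite3 standing in for the target database. The table `t` and its columns are hypothetical, and `UPPER` is substituted for `SOUNDEX` because a default SQLite build does not include `SOUNDEX`. The point is that the expressions are evaluated by the database as part of the parameterized INSERT.

```python
import sqlite3

# Hypothetical target table; SQLite stands in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name_key TEXT)")

# Rows as they arrive from the extract/transform stages outside the database.
incoming = [(1, "smith"), (41, "jones")]

# The expressions (? + 1 and UPPER(?)) are evaluated by the target database
# as part of the INSERT, not by the integration engine.
conn.executemany("INSERT INTO t (id, name_key) VALUES (? + 1, UPPER(?))", incoming)

loaded = conn.execute("SELECT id, name_key FROM t ORDER BY id").fetchall()
```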
Full Pushdown
• Condition:
  • Source and target are in the same RDBMS
  • All transformations can be processed in database
• Data not extracted outside of DB
• Generated SQL:
  • INSERT INTO t (…) SELECT … FROM s …
[Diagram: Source DB, Target DB; Extract, Transform, Load]
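Full pushdown can be illustrated with a single INSERT … SELECT that never moves rows out of the database. This is a sketch under assumptions: sqlite3 plays the role of the shared RDBMS, and the tables `s` and `t` and their columns are hypothetical.

```python
import sqlite3

# Source and target tables live in the same (stand-in) database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (region TEXT, amount INTEGER, status TEXT)")
conn.execute("CREATE TABLE t (region TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?)",
    [("east", 10, "ok"), ("east", 5, "ok"), ("west", 7, "ok"), ("west", 3, "bad")],
)

# Extract, transform (filter + aggregate), and load in one statement.
conn.execute(
    "INSERT INTO t (region, total) "
    "SELECT region, SUM(amount) FROM s WHERE status = 'ok' GROUP BY region"
)
conn.commit()

result = conn.execute("SELECT region, total FROM t ORDER BY region").fetchall()
```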
Design (Two-Pass)
• Pass 1:
  • Start from the source, traverse transformations downstream, and build a SQL query (SELECT statement).
  • Stop if a transformation cannot be processed in the source database, and settle for partial pushdown to source.
  • If the target is reached, full pushdown can be done with an INSERT … SELECT statement.
Design (Two-Pass)
• Pass 2:
  • Skipped if pass 1 results in full pushdown optimization.
  • Start from the target, traverse transformations upstream, and build a SQL statement (INSERT, DELETE, or UPDATE) for partial pushdown to target.
  • Stop if a transformation cannot be processed in the target database or has already been pushed to the source database.
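The two passes can be sketched roughly as below. This is a simplified illustration, not the actual optimizer: it assumes a linear pipeline with no branches, checks only the transformation type (using the supported lists from the earlier slides), and ignores the other limitations. The function name `plan_pushdown` and its return shape are hypothetical.

```python
# Transformation types taken from the "Supported Transformations" slide.
SOURCE_PUSHABLE = {"Aggregator", "Expression", "Filter", "Joiner", "Lookup", "Sorter", "Union"}
TARGET_PUSHABLE = {"Expression", "Lookup"}


def plan_pushdown(pipeline, same_database):
    """Split a linear pipeline (ordered source -> target) into
    (pushed_to_source, processed_in_engine, pushed_to_target, is_full)."""
    # Pass 1: walk downstream from the source until a transformation
    # cannot be processed in the source database.
    n_source = 0
    for transformation in pipeline:
        if transformation not in SOURCE_PUSHABLE:
            break
        n_source += 1

    # Target reached and source/target share a database: full pushdown.
    if n_source == len(pipeline) and same_database:
        return list(pipeline), [], [], True

    # Pass 2: walk upstream from the target, stopping at a transformation
    # that cannot be processed in the target database or that pass 1
    # already pushed to the source.
    n_target = 0
    for i in range(len(pipeline) - 1, n_source - 1, -1):
        if pipeline[i] not in TARGET_PUSHABLE:
            break
        n_target += 1

    split = len(pipeline) - n_target
    return pipeline[:n_source], pipeline[n_source:split], pipeline[split:], False


full = plan_pushdown(["Filter", "Aggregator"], same_database=True)
partial = plan_pushdown(["Filter", "Router", "Expression"], same_database=True)
source_only = plan_pushdown(["Filter", "Expression"], same_database=False)
```

In the `partial` case the Filter goes to the source, the Router stays in the engine, and the trailing Expression goes to the target.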
Considerations
• Error handling is subject to DBMS error handling
• No row-level error logging
• Mappings that generate long transactions:
  – Require more database resources (locks and log space)
  – No partial commit: the entire transaction is rolled back when an error is encountered
• Results may differ between executing in PowerCenter and pushing to the database, depending on database configuration:
  – Case sensitivity
  – How nulls are treated in sort order
  – Formats (numeric-to-char and date-to-char conversion)
  – Data precision
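The case-sensitivity and null-ordering differences can be shown with a small sketch. It does not reproduce PowerCenter's behavior: plain Python comparison and sorting stand in for processing outside the database, sqlite3 stands in for the database, and the `COLLATE NOCASE` column is an assumed database configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE is an assumed (hypothetical) database configuration.
conn.execute("CREATE TABLE s (name TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO s VALUES (?)", [("Smith",), ("SMITH",), (None,), ("adams",)]
)
rows = [r[0] for r in conn.execute("SELECT name FROM s")]

# Filter evaluated outside the database: case-sensitive comparison.
matched_outside = [n for n in rows if n == "smith"]

# Same filter pushed to the database: the column collation ignores case.
matched_pushed = [
    r[0] for r in conn.execute("SELECT name FROM s WHERE name = 'smith'")
]

# Sort outside the database, placing nulls last (one possible convention).
sorted_outside = sorted(rows, key=lambda n: (n is None, n or ""))

# Sort pushed to SQLite: NULLs come first in ascending order.
sorted_pushed = [r[0] for r in conn.execute("SELECT name FROM s ORDER BY name")]
```

Here the filter matches no rows outside the database but two rows when pushed down, and the null value lands at opposite ends of the two sorted results.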
Limitations
A transformation will not be pushed down / stops the optimization if:
• A Source Qualifier, lookup, or update transformation contains a SQL override
  – The optimizer does not parse user-defined SQL overrides (i.e. lookup, update, DSQ)
  – The DSQ SQL override limitation will be removed in GA by using temporary views
• It uses a mapping variable
• It contains a variable port
• It overrides default values for input/output ports
• An expression uses a function that has no equivalent function in the database
• It is part of a data profiling session
• Debugging is turned on
• An external loader is used (can only push to source, not to target)
• Row error logging is enabled
Limitations
• A transformation will not be pushed down / stops the optimization if:
  • The mapping is too complex, i.e. has too many pipeline branches (max 64 two-way branches, 43 three-way branches, or 32 four-way branches)
  • Partitioning is configured where:
    • The partition type is not pass-through
    • There are different partition types for transformations in the pipeline and the optimizer can’t remerge the partitions
  • Multiple match for lookup is configured (except for error report)
• Limited by the single SQL statement generated at target (INSERT INTO). The optimizer doesn’t use temp tables or views (in FCS; GA will use temporary views)
• Generated SQL can’t be modified
Appropriate Use of Pushdown Optimization
• Pushdown Optimization is ideal where:
  • Source and target are located in the same database
  • Transformations processed in the source DB reduce the amount of data moved
    • Such as filters, aggregators
• Processing within PowerCenter is used when:
  • Operation can’t be done in database (i.e. using SQL)
  • Source or target is not a database