Date post: 13-Mar-2018
Dynamic Data Quality Rule Management Framework Marketplace Listing User Guide Document

Document Name: Dynamic Data Quality Rule Management Framework – Marketplace Listing User Guide Document

Author: Andrew Faulkner, Analyst, Informatica Professional Services; Emily Deng, Analyst, Informatica Professional Services

Update Date: March 24, 2016

Version: Version 1

Table of Contents

Purpose
    Scope
Rule Management Framework
    Framework Workflow
    Framework Component Overview
        Control Table
        Data Quality Rules
        Composite Mapplet
        Mapping
    Framework Development Best Practices
Control Table
        Pre-Check vs. Check
    Control Table Development
        Add a New Level 1 Simple Rule Check
        Add a New Level 2 Complex Rule Check
Data Quality Rules
    Rule #1: rule_Completeness
    Rule #2: rule_Date_Format
    Rule #3: rule_Email_Format
    Rule #4: rule_Numeric
    Rule #5: rule_Uniqueness
    Rule #6: rule_A_to_B
    Rule #7: rule_A_to_B_Date
    Rule #8: rule_A_to_B_Exact_Col
    Rule A to B Use Cases
    Rule Development
Composite Mapplet
    Composite Mapplet Development
Mapping
    Mapping Development

Purpose

This document describes the technical design of the Informatica Data Quality (IDQ) components for the Dynamic Data Quality Rule Management Framework designed by Informatica Professional Services. With maintainability in mind, reusable IDQ Rules and Mapplets have been created to cleanse, standardize, and validate the incoming data from various sources according to project-specific requirements.

Scope

This document describes the step-by-step approach to a select set of IDQ Rules and Mapplets, with the goal of orienting readers well enough that they can maintain and extend these artifacts in the future if needed.

Several Out-of-the-Box Accelerator rules are available in Informatica Data Quality. Where these are used as part of a rule, they are assumed to be sufficiently documented by their comments and the standard product documentation, so this document does not describe how they work.

Rule Management Framework

The Rule Management Framework is a dynamic, reusable solution with a business user focus. It provides a flexible way to manage and build Data Quality rule checks using a control-table-driven framework and reusable mapplets. It allows for increased collaboration between business and technical users, and it can build rule checks for data across heterogeneous source systems without the need to develop landing processes. The solution does not rely on complex SQL to execute rule checks. By producing easy-to-maintain code on the Informatica platform, the Rule Management Framework maximizes reusability while minimizing non-reusable development effort.

Rule development is performed in Informatica Analyst and Informatica Developer, which have a clean user interface and powerful data discovery and profiling capabilities.

Framework Workflow

To use the Rule Management Framework, business and technical users follow this development workflow:

1. Business and technical users collaborate on and plan which rules will be implemented.

2. The technical user develops the data quality rule mapplets in Developer.

3. The technical user develops the composite mapplet in Developer.

4. The technical user develops a mapping for each table in Developer.

5. Business and technical users enter rules into the Control Table in Analyst.

o Business users enter rules with the status “Needs Review”.

o Technical users review the rules entered by business users, develop the necessary components, and change the status to “Complete” to signal the framework to run that rule.

6. Run the mappings through Developer or a command-line batch.

Framework Component Overview

Control Table

The Control Table is a reference table that directly drives the entire Rule Management Framework. Once

all the rule components are developed, users can simply enter desired data quality rules into the Control

Table to check data across tables and source systems. The table is referenced by the other framework

components to minimize development work and maximize reusability. The Control Table can be

accessed from Informatica Developer or Analyst, which enables collaboration between business users

and technical developers.

Data Quality Rules

The data quality rules are built as mapplets. They are the smallest components of the Rule Management Framework, and each is a one-time development effort. A rule can include custom-built or Out-of-the-Box Accelerator rules, but it must have particular inputs and outputs to function in the framework. A project can have as many data quality rules as desired.

Within the Rule Management Framework, rules are categorized by complexity level:

Level 1: Simple checks involving only 1 column

Level 2: Complex checks involving more than 1 column, can include a conditional Pre-Check

Composite Mapplet

The Composite Mapplet is a single mapplet containing all of the data quality rule mapplets. It contains a

router that directs the data to the appropriate data quality rule check. This component requires

additional development as new data quality rules are added.

Mapping

The mapping is developed for each database table that the project wishes to implement data quality

checks on. This component is a one-time development effort for all Level 1 Simple checks, but requires

development for every additional Level 2 Complex check. A project can have as many mappings as

desired.

Framework Development Best Practices

Rule development should be conducted with small, representative test data sets as input to a mapping. Once the logic of the rule is completed and tested in the mapping, the contents of the rule can be copied into a Mapplet and validated as a Rule. Debugging tools such as Data Preview and mid-stream profiling are not available in a Mapplet. Detailed step-by-step guidelines are provided for developing new component pieces.

Control Table

The Control Table is a reference table that contains the user-entered data quality rules. Business users and technical developers can access it from both the Analyst tool and the Developer tool to collaborate on rule creation. The Control Table drives the Rule Management Framework.

The columns in the Control Table are as follows:

Rule ID: Unique identifying key for a rule

DQ Rule Name: Brief descriptive name for rule, used mainly for reporting purposes

DQ Rule Description: Explanatory description for rule

Rule Status: Completion status, rules will only run if status is “Complete”

Rule Type: Reusable rule type

Key Name: Key column(s) for table that is to be checked

Table Name: Table name that is to be checked

Column Name: Column name that is to be checked, this column name will trigger the rule check

Pre-Check Columns (Level 2 Rules Only): Column name for the condition to be checked

Pre-Check Operators (Level 2 Rules Only): Comparison used in the condition

o Can be evaluated as <, >, =, <>

Pre-Check Values (Level 2 Rules Only): Values or column name(s) that the Pre-Check Column will

be checked against, can also be NULL

Check Columns (Level 2 Rules Only): Column name for the rule check

Check Operators (Level 2 Rules Only): Comparison used in the rule check

Check Values (Level 2 Rules Only): Values or column name(s) that the Check Column will be

checked against

Check Table (Level 2 Rules Only): Table where the Pre-Check or Check Columns and Values can

be found

Complexity Index: Level and Index of the rule complexity
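
As an illustration only, the column list above can be pictured as one record per rule. The sketch below is a Python stand-in, not an Informatica object: the class name, the field names, and the helper `runnable_rules` are assumptions made for this example. The only behavior taken from the guide is that a rule runs only when its status is “Complete”; the exact string match is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlTableRow:
    """One Control Table entry; field names are illustrative, not the real column names."""
    rule_id: str
    dq_rule_name: str
    dq_rule_description: str
    rule_status: str
    rule_type: str
    key_name: str
    table_name: str
    column_name: str
    complexity_index: str
    # The fields below are filled in for Level 2 rules only.
    pre_check_columns: Optional[str] = None
    pre_check_operators: Optional[str] = None
    pre_check_values: Optional[str] = None
    check_columns: Optional[str] = None
    check_operators: Optional[str] = None
    check_values: Optional[str] = None
    check_table: Optional[str] = None


def runnable_rules(rows: List[ControlTableRow]) -> List[ControlTableRow]:
    """Keep only rules whose status is exactly 'Complete' (assumed exact match)."""
    return [row for row in rows if row.rule_status.strip() == "Complete"]
```

A row left at “Needs Review” stays in the table but is filtered out, which mirrors how business users can draft rules without triggering them.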

Pre-Check vs. Check

The pre-check acts as a conditional for the data quality rule check. The data must first pass the pre-check condition before the rule check itself is evaluated. In other words, if the data does not pass the pre-check condition, the data is not applicable to the data quality rule check and the rule outputs an NA value. If the data does pass the pre-check condition, it is then evaluated against the rule check for a PASS or FAIL value.
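
The three possible outcomes can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not the framework's implementation; the function name `evaluate_rule` and the dictionary-based record are assumptions for illustration.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, object]


def evaluate_rule(
    record: Record,
    check: Callable[[Record], bool],
    pre_check: Optional[Callable[[Record], bool]] = None,
) -> str:
    """Return 'NA' when the pre-check fails, otherwise 'PASS' or 'FAIL' from the check."""
    if pre_check is not None and not pre_check(record):
        return "NA"  # the record is outside the rule's scope
    return "PASS" if check(record) else "FAIL"


# Hypothetical rule: if quantity is greater than 0, unit price should be greater than 0.
def quantity_positive(record: Record) -> bool:
    return record["QUANTITY"] > 0


def unit_price_positive(record: Record) -> bool:
    return record["UNIT_PRICE"] > 0
```

With these helpers, a record with `QUANTITY` 0 yields NA regardless of its price, while a record with `QUANTITY` 2 and `UNIT_PRICE` 0 yields FAIL.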

Control Table Development

Add a New Level 1 Simple Rule Check

1. Click the menu arrow in the top right corner of the screen. Select Add Row

2. Fill out form for new row. See screenshot below for examples of a Level 1 and Level 2 rule.

3. Rule ID should be the next sequential integer.

4. The DQ Rule Name should describe the rule type and the column name that will be checked.

5. Enter a short description for the DQ Rule Description.

6. Enter the rule status. If the rule is a draft and needs review, enter Review. If the rule has

finished development and testing, set the status to Complete to enable rule execution.

7. Enter the Rule Type that exactly matches the rule’s mapplet name.

8. Enter the table’s key column as Key Name.

9. Enter the table name.

10. Enter the column name that will be checked.

11. Skip the Pre-Check and Check fields and leave them blank.

12. Enter 1 as the Complexity Index. This indicates to the framework that this rule is a Level 1 Simple

check.

13. Click OK to add the new row.

Example: Organization ID should only be numeric.

Add a New Level 2 Complex Rule Check

1. Repeat Steps 1 through 10 of the Level 1 Simple Rule Check procedure.

a. Note: In Step 10, the column name can be any of the columns involved in the rule, as long as it is from the table entered in Step 9 as Table Name.

2. For the Pre-Check Columns, enter the column name used as the condition. If entering more than one column name, separate with commas.

3. For the Pre-Check Operators, enter the comparison operator that will be applied to the Pre-Check Column. If there are multiple columns in the condition, enter an operator for each and separate with commas.

4. For the Pre-Check Values, enter the value that will be compared to the value in the Pre-Check Column. If there are multiple columns in the condition, enter a value for each and separate with commas.

5. For the Check Columns, enter the column name that will be checked. If entering more than one column name, separate with commas.

6. For the Check Operators, enter the comparison operator that will be applied to the Check Column. If there are multiple columns in the rule check, enter an operator for each and separate with commas.

7. For the Check Values, enter the value that will be compared to the value in the Check Column. If there are multiple columns in the rule check, enter a value for each and separate with commas.

8. For the Check Table, enter the table name that the check columns are sourced from. If all pre-check and check columns are from the same table, enter the same name as in the Table Name field. This field makes the development process easier for the technical user but does not drive the Rule Management Framework.

9. For the Complexity Index, enter 2_ followed by the sequential number of this Level 2 rule within the same table. For example, enter “2_1” for the first Level 2 rule for the ORGANIZATIONS table, “2_2” for the second Level 2 rule for the ORGANIZATIONS table, and so on.

Example: If quantity is greater than 0, then unit price should be greater than 0.
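
To make the comma-separated convention concrete, here is a small Python sketch of how the column, operator, and value fields line up position by position. The helper names and the sample column names (`QUANTITY`, `UNIT_PRICE`) are hypothetical; the framework itself does this parsing inside Informatica mapplets.

```python
from typing import List, Optional, Tuple


def split_field(value: Optional[str]) -> List[str]:
    """Split a comma-separated Control Table field and trim each entry."""
    if not value:
        return []
    return [part.strip() for part in value.split(",")]


def build_conditions(columns: str, operators: str, values: str) -> List[Tuple[str, str, str]]:
    """Pair the n-th column with the n-th operator and the n-th value."""
    cols, ops, vals = split_field(columns), split_field(operators), split_field(values)
    if not (len(cols) == len(ops) == len(vals)):
        raise ValueError("columns, operators, and values must have the same number of entries")
    return list(zip(cols, ops, vals))


def complexity_index(sequence_in_table: int) -> str:
    """Build the Level 2 Complexity Index, e.g. '2_1' for the first Level 2 rule of a table."""
    return f"2_{sequence_in_table}"


# Hypothetical entries for the quantity / unit price example.
pre_check = build_conditions("QUANTITY", ">", "0")
check = build_conditions("UNIT_PRICE", ">", "0")
```

Raising an error on mismatched counts is a design choice of this sketch; it surfaces a mis-entered row early instead of silently dropping a comparison.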

Data Quality Rules

Rule #1: rule_Completeness

This rule checks whether the data is null. It is Level 1 complexity.

The rule takes in the following as input:

Rule ID

Key Value

Column Value

The rule gives the following as output:

Rule ID

Key Value

Result

The transformation logic is as follows:

1. The expression in dq_IsComplete checks whether the value is null or an empty string. It outputs

a ‘PASS’ or ‘FAIL’.
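
The check amounts to a one-line test. The Python sketch below mirrors it under the assumption that only null and the empty string count as incomplete; whether whitespace-only values also fail is not stated in this guide, so they pass here. The function name and output keys are illustrative.

```python
from typing import Dict, Optional


def rule_completeness(rule_id: str, key_value: str, column_value: Optional[str]) -> Dict[str, str]:
    """Return the rule output with RESULT 'FAIL' for null or empty input, else 'PASS'."""
    is_complete = column_value is not None and column_value != ""
    return {
        "RULE_ID": rule_id,
        "KEY_VALUE": key_value,
        "RESULT": "PASS" if is_complete else "FAIL",
    }
```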

Rule #2: rule_Date_Format

This rule validates that the date is in the correct format. It is Level 1 complexity.

The rule takes in the following as input:

Rule ID

Key Value

Column Value

The rule gives the following as output:

Rule ID

Key Value

Result

The transformation logic is as follows:

1. The null_check expression runs a simple check to make sure the field is not null.

2. The format_check expression checks that it is in the correct Date format according to the

desired format in the in_Date_Format variable, as seen below. It will output a ‘PASS’ or ‘FAIL’.
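
A rough Python equivalent of the two steps is shown below. It uses Python's `strptime` format codes rather than Informatica date format strings, and it returns FAIL for null input; both are assumptions of this sketch, since the guide does not state the result for null dates.

```python
from datetime import datetime
from typing import Optional


def rule_date_format(column_value: Optional[str], date_format: str = "%Y-%m-%d") -> str:
    """Return 'PASS' when the value parses with date_format, else 'FAIL'."""
    # Step 1: null check (assumption: null or blank input is reported as FAIL).
    if column_value is None or column_value.strip() == "":
        return "FAIL"
    # Step 2: format check against the desired format.
    try:
        datetime.strptime(column_value.strip(), date_format)
    except ValueError:
        return "FAIL"
    return "PASS"
```

The `date_format` parameter plays the role of the in_Date_Format variable: changing it changes which layout is accepted without touching the rule logic.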

Rule #3: rule_Email_Format

This rule checks for valid email addresses. It is Level 1 complexity.

The rule takes in the following as input:

Rule ID

Key Value

Column Value

The rule gives the following as output:

Rule ID

Key Value

Result

The transformation logic is as follows:

1. Exp_trim_spaces trims any excess spaces to the left or right of the input value.

2. Case_Converter converts the value to all lowercase. This standardizes any variation in casing.

3. Dq_Validate_Domain is a Labeler transformation that labels the domain in the value according

to the Informatica reference table of valid email domains. If a domain is not found or is not

valid, it is not labelled.

4. Dq_Validate_Email_Format is an expression that checks whether the domain is valid and the ‘@’ character is present and not the first character. The expression outputs a ‘PASS’ or ‘FAIL’ result. Missing and not applicable inputs output a ‘PASS’.
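
The four steps can be approximated in Python as follows. The `VALID_DOMAINS` set is a hypothetical stand-in for the Informatica reference table of valid email domains, and matching on the full text after the ‘@’ is an assumption about how the Labeler compares domains.

```python
from typing import FrozenSet, Optional

# Hypothetical stand-in for the reference table of valid email domains.
VALID_DOMAINS: FrozenSet[str] = frozenset({"example.com", "example.org"})


def rule_email_format(column_value: Optional[str], valid_domains: FrozenSet[str] = VALID_DOMAINS) -> str:
    """Return 'PASS' or 'FAIL'; missing input passes, as described in step 4."""
    if column_value is None or column_value.strip() == "":
        return "PASS"
    value = column_value.strip().lower()  # steps 1 and 2: trim, then lowercase
    at_position = value.find("@")
    if at_position <= 0:  # '@' is absent (-1) or is the first character (0)
        return "FAIL"
    domain = value[at_position + 1:]  # step 3: the part that would be labeled
    return "PASS" if domain in valid_domains else "FAIL"
```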

Rule #4: rule_Numeric

This rule checks whether a value contains only numbers. It is Level 1 complexity.

The rule takes in the following as input:

Rule ID

Key Value

Column Value

The rule gives the following as output:

Rule ID

Key Value

Result

The transformation logic is as follows:

1. Dq_IsNumeric trims excess spaces off the left and right of the value and evaluates whether the

input is a number. It outputs ‘PASS’ or ‘FAIL’ as the result.
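
A minimal Python sketch of the same idea follows. What exactly counts as a number in Dq_IsNumeric is not spelled out in this guide, so the pattern below (optional sign, ASCII digits, optional decimal part) is an assumption, as is returning FAIL for null input.

```python
import re
from typing import Optional

# Assumed definition of "numeric": optional sign, digits, optional decimal part.
_NUMBER_PATTERN = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def rule_numeric(column_value: Optional[str]) -> str:
    """Trim the value and return 'PASS' when it matches the assumed number pattern."""
    if column_value is None:
        return "FAIL"  # assumption: null is not numeric
    trimmed = column_value.strip()
    return "PASS" if _NUMBER_PATTERN.fullmatch(trimmed) else "FAIL"
```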

Rule #5: rule_Uniqueness

This rule checks whether the column value is unique within the specified table, i.e. only one instance of this value occurs in the table. It is Level 1 complexity.

The rule takes in the following as input:

Rule ID

Key Values

Column Value

The rule gives the following as output:

Rule ID

Key Values

Result

The transformation logic is as follows:

1. The aggregator transformation is used to separate the data and identify where the data has

identical sets of data. The sorter transformation will enhance the aggregator’s performance.

From the Help documents:

2. The sorter transformation sorts those results in order to have them listed in order. This will

improve performance downstream in the joiner.

3. The joiner transformation brings the data back together in one compiled list.

4. The expression transformation pulls in the values and the count of each value in order to send

them through to the output to determine whether or not a value is unique.

5. The rule will output ‘PASS’ or ‘FAIL’.
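
Conceptually, the aggregate, join, and expression steps count how often each value occurs and then tag every row with that count. The Python sketch below shows that count-then-join idea with a `Counter`; it is a simplified in-memory model, and the function name and tuple layout are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rule_uniqueness(rule_id: str, rows: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """rows holds (key_value, column_value) pairs; a value passes only if it occurs once."""
    # Aggregate: count occurrences of each column value.
    counts = Counter(column_value for _, column_value in rows)
    # Join the counts back to every row and derive the result.
    return [
        {
            "RULE_ID": rule_id,
            "KEY_VALUE": key_value,
            "RESULT": "PASS" if counts[column_value] == 1 else "FAIL",
        }
        for key_value, column_value in rows
    ]
```

Note that every row sharing a duplicated value is marked FAIL, not just the second and later occurrences.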

Rule #6: rule_A_to_B

This rule compares a column value to the user-entered value from the Control Table. It is Level 2 complexity.

The rule takes in the following as input:

Rule ID

Key Values

Pre-Check Columns

Check Columns

The rule gives the following as output:

Rule ID

Key Values

Result

The transformation logic is as follows:

1. The mplt_lkp_ControlTable_RULE_ID is a reusable mapplet that contains a lookup to the Control Table using Rule ID. It returns all data from the Control Table, including the Pre-Check/Check Operators and Values. These values are static and can simply be referenced from the Control Table. The Pre-Check/Check Columns, however, carry the source values that are checked against the Pre-Check/Check Values. They are not static and must therefore be passed as input from the source table to the composite mapplet at the mapping level.

2. The mplt_Parse_Ctrl_Table is a reusable mapplet that standardizes and parses a comma-separated list of up to 5 values. For this rule, it parses the values for Pre-Check/Check Columns, Operators, and Values.

3. The exp_Cond_Check applies the comparison logic between the Columns and Values to be checked. It evaluates the hard-coded operator looked up from the Control Table and then executes that comparison. A passing comparison evaluates as ‘P’ for PASS. If a Pre-Check comparison fails, it evaluates as ‘N’ for NOT APPLICABLE. If a Check comparison fails, it evaluates as ‘F’ for FAIL. If there are fewer than 5 comparisons, the empty columns evaluate as ‘A’ for ABSENT.

4. The expression transformation then evaluates all the comparisons. If any ‘N’ is present, the final result is ‘NA’ for not applicable. Otherwise, if any ‘F’ is present, the result is ‘FAIL’; if not, the result is ‘PASS’.
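
The flag logic in steps 3 and 4 can be sketched in Python as below. This is an illustrative model, not the exp_Cond_Check expression: the function names, the numeric coercion, the treatment of the literal `null`, and the choice to pad each side to 5 flags are assumptions.

```python
import operator

# Operators the Control Table allows for Pre-Check and Check comparisons.
OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq, "<>": operator.ne}
MAX_COMPARISONS = 5  # the parsing mapplet handles up to 5 values


def _coerce(value):
    """Map blanks and the literal 'null' to None; use numbers where possible (assumption)."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.lower() == "null":
        return None
    try:
        return float(text)
    except ValueError:
        return text


def compare(actual, op, expected):
    """Return True when `actual op expected` holds."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    left, right = _coerce(actual), _coerce(expected)
    if left is None or right is None:
        both_null = left is None and right is None
        if op == "=":
            return both_null
        if op == "<>":
            return not both_null
        return False  # assumption: ordering comparisons against null never pass
    if isinstance(left, float) != isinstance(right, float):
        # Mixed number and text: fall back to comparing the trimmed strings.
        left, right = str(actual).strip(), str(expected).strip()
    return OPERATORS[op](left, right)


def evaluate_a_to_b(pre_checks, checks):
    """Each comparison is a tuple (actual_value, operator, expected_value)."""
    if len(pre_checks) > MAX_COMPARISONS or len(checks) > MAX_COMPARISONS:
        raise ValueError("at most 5 pre-checks and 5 checks are supported")
    flags = ["P" if compare(*c) else "N" for c in pre_checks]
    flags += ["A"] * (MAX_COMPARISONS - len(pre_checks))  # absent pre-checks
    flags += ["P" if compare(*c) else "F" for c in checks]
    flags += ["A"] * (MAX_COMPARISONS - len(checks))  # absent checks
    if "N" in flags:
        return "NA"
    if "F" in flags:
        return "FAIL"
    return "PASS"
```

For instance, `evaluate_a_to_b([(3, ">", "0")], [(0, ">", "0")])` models a row with quantity 3 and unit price 0 and evaluates to FAIL, because the pre-check passes and the check does not.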

Rule #7: rule_A_to_B_Date

This rule conditionally compares a date column value to another date value, taken either from a column or from a user-entered date value in the Control Table. It is Level 2 complexity.

The rule takes in the following as input:

Rule ID

Key Values

Pre-Check Columns

Pre-Check Values

Check Columns

Check Values

The rule gives the following as output:

Rule ID

Key Values

Result

The transformation logic is as follows:

1. The mplt_lkp_ControlTable_RULE_ID is a reusable mapplet that contains a lookup to the Control Table using Rule ID. It returns all data from the Control Table, including the Pre-Check/Check Operators. These values are static and can simply be referenced from the Control Table. The Pre-Check/Check Columns and Pre-Check/Check Values (unlike in rule_A_to_B) are not static and must be passed as input from the source table to the composite mapplet at the mapping level. This allows the rule to dynamically compare the values from two columns.

2. The mplt_Parse_Ctrl_Table is a reusable mapplet that standardizes and parses a comma-separated list of up to 5 values. For this rule, it parses the values for Pre-Check/Check Columns, Operators, and Values.

3. The exp_Cond_Check applies the comparison logic between the Columns and Values to be checked. It evaluates the hard-coded operator looked up from the Control Table and then executes that comparison. Only the Check comparisons compare dates, as stored in the V_date_compare variable. A passing comparison evaluates as ‘P’ for PASS. If a Pre-Check comparison fails, it evaluates as ‘N’ for NOT APPLICABLE. If a Check comparison fails, it evaluates as ‘F’ for FAIL. If there are fewer than 5 comparisons, the empty columns evaluate as ‘A’ for ABSENT.

4. The expression transformation then evaluates all the comparisons. If any ‘N’ is present, the final result is ‘NA’ for not applicable. Otherwise, if any ‘F’ is present, the result is ‘FAIL’; if not, the result is ‘PASS’.
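
One reason a date-specific rule is useful is that comparing date strings as text gives wrong answers for many layouts. The Python sketch below parses both sides before comparing; the function name, the default `%m/%d/%Y` layout, and the requirement that both values be non-null are assumptions of this example.

```python
import operator
from datetime import datetime

OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq, "<>": operator.ne}


def compare_dates(left: str, op: str, right: str, date_format: str = "%m/%d/%Y") -> str:
    """Return 'P' when `left op right` holds for the parsed dates, else 'F'."""
    left_date = datetime.strptime(left.strip(), date_format)
    right_date = datetime.strptime(right.strip(), date_format)
    return "P" if OPERATORS[op](left_date, right_date) else "F"


# Hypothetical values: an order dated 12/01/2015 that shipped on 03/05/2016.
order_date, shipping_date = "12/01/2015", "03/05/2016"
as_dates = compare_dates(order_date, "<", shipping_date)
as_text = order_date < shipping_date  # plain string comparison, shown for contrast
```

Here the parsed comparison reports that the order date precedes the shipping date, while the plain string comparison does not, because "1" sorts after "0".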

Rule #8: rule_A_to_B_Exact_Col

This rule compares a column value to another column value. It is Level 2 complexity.

The rule takes in the following as input:

Rule ID

Key Values

Pre-Check Columns

Pre-Check Values

Check Columns

Check Values

The rule gives the following as output:

Rule ID

Key Values

Result

The transformation logic is as follows:

1. The mplt_lkp_ControlTable_RULE_ID is a reusable mapplet that contains a lookup to the Control Table using Rule ID. It returns all data from the Control Table, including the Pre-Check/Check Operators. These values are static and can simply be referenced from the Control Table. The Pre-Check/Check Columns and Pre-Check/Check Values (unlike in rule_A_to_B) are not static and must be passed as input from the source table to the composite mapplet at the mapping level. This allows the rule to dynamically compare the values from two columns.

2. The mplt_Parse_Ctrl_Table is a reusable mapplet that standardizes and parses a comma-separated list of up to 5 values. For this rule, it parses the values for Pre-Check/Check Columns, Operators, and Values.

3. The exp_Cond_Check applies the comparison logic between the Columns and Values to be checked. It evaluates the hard-coded operator looked up from the Control Table and then executes that comparison. A passing comparison evaluates as ‘P’ for PASS. If a Pre-Check comparison fails, it evaluates as ‘N’ for NOT APPLICABLE. If a Check comparison fails, it evaluates as ‘F’ for FAIL. If there are fewer than 5 comparisons, the empty columns evaluate as ‘A’ for ABSENT.

4. The expression transformation then evaluates all the comparisons. If any ‘N’ is present, the final result is ‘NA’ for not applicable. Otherwise, if any ‘F’ is present, the result is ‘FAIL’; if not, the result is ‘PASS’.

Rule A to B Use Cases

rule_A_to_B, rule_A_to_B_Date, and rule_A_to_B_Exact_Col are flexible rules that can be used in various scenarios. Use the following examples as guidance on the use cases for each rule.

Use Case: Check for presence of B value based on presence of A.
Reusable Rule Type: rule_A_to_B
Example: If order date is present, then order number should be present.
    Pre-Check: ORDER_DATE <> null
    Check: ORDER_NO <> null

Use Case: Check for presence of B value based on specific value of A.
Reusable Rule Type: rule_A_to_B
Example: If shipping confirmation is ‘Y’, then the corresponding shipping date should be present.
    Pre-Check: SHIPPING_CONFIRMATION = Y
    Check: SHIPPING_DATE <> null

Use Case: Check for specific value of B based on presence of A.
Reusable Rule Type: rule_A_to_B
Example: If Order ID is present, then the corresponding Order Confirmation is ‘Y’.
    Pre-Check: ORDER_ID <> null
    Check: ORDER_CONFIRMATION = Y

Use Case: Check for specific values of B based on specific values of A, multiple columns.
Reusable Rule Type: rule_A_to_B
Example: If the state is California, then the country name should be USA.
    Pre-Check: STATE = CA
    Check: COUNTRY = USA

Use Case: Check that B date is greater or lesser than A date; can also add a pre-check condition.
Reusable Rule Type: rule_A_to_B_Date
Example: Order date should not be greater than shipping date.
    Check: ORDER_DATE < SHIPPING_DATE

Use Case: Check that column value B is equal to column value A; can also add a pre-check condition.
Reusable Rule Type: rule_A_to_B_Exact_Col
Example: Order Details table has an Order ID not present in the Order table.
    Check: ORDER_ID = ORDER_ID
    Check Table: ORDER_DETAILS

Rule Development

Creating a new data quality rule check.

1. Create a new mapplet.

2. Rename the mapplet with the prefix rule_[name] and click Finish.

3. Create a new Input transformation.

4. Right-click the transformation title, select Rename and enter “INPUT” as the new name.

5. In the bottom pane, under Properties > Ports, create three new ports.

6. Enter the following as the input fields:

Name               Type     Precision
RULE_ID            string   200
KEY_VALUE_COMMA    string   200
COL_VALUE          string   60

a. These three fields must be present in all rules. More complex rules will have additional

inputs, see example mapplet rule_A_to_B.

7. Add whatever transformation(s) necessary to perform the data quality rule check using the field

COL_VALUE. It must lead to a RESULT output as “PASS” or “FAIL”.

8. Add a new Output transformation and rename it OUTPUT.

9. Enter the following as the output fields:

Name             Type    Precision
RESULT           string  10
KEY_VALUE_COMMA  string  200
RULE_ID          string  200

a. RULE_ID is important to maintain rule traceability within the Rule Management

Framework.

10. Connect the KEY_VALUE_COMMA and RULE_ID ports directly from the INPUT transformation to the OUTPUT transformation.

11. Validate and save your new rule mapplet.
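
The port contract described in the steps above (three required inputs, three outputs, with KEY_VALUE_COMMA and RULE_ID passed straight through) can be pictured with a small Python sketch. This is only an analogy for the mapplet interface: the class names and the example rule rule_not_blank are hypothetical and do not come from the framework.

```python
from typing import NamedTuple, Optional


class RuleInput(NamedTuple):
    rule_id: str                # RULE_ID
    key_value_comma: str        # KEY_VALUE_COMMA
    col_value: Optional[str]    # COL_VALUE


class RuleOutput(NamedTuple):
    result: str                 # RESULT: "PASS" or "FAIL"
    key_value_comma: str        # passed through unchanged
    rule_id: str                # passed through for traceability


def rule_not_blank(record: RuleInput) -> RuleOutput:
    """Hypothetical rule: COL_VALUE must contain a non-space character."""
    has_value = bool(record.col_value and record.col_value.strip())
    result = "PASS" if has_value else "FAIL"
    return RuleOutput(result, record.key_value_comma, record.rule_id)


print(rule_not_blank(RuleInput("R1", "1001", "  ")))
```

Whatever logic sits in the middle, every rule returns the same three output fields, which is what lets the results of different rules be combined later.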

Composite Mapplet

This Composite Mapplet consolidates all Level 1 and Level 2 reusable rule types into one mapplet, and is used once per table/mapping. Each rule from the Control Table is processed through the Composite Mapplet and is routed to the appropriate reusable rule type.

The transformation logic is as follows:

1. The mplt_lkp_ControlTable_TAB_COL is a reusable mapplet that contains a lookup on the

Control Table using Table Name and Column Name. This returns all the data from the Control

Table for the rules assigned to that table.

2. The Expression transformation performs a simple clean-up on the Column Name to remove

extra spaces and commas.

3. The Router reads the Rule Type that is referenced by the Lookup mapplet, and sends the

information from the Control Table to the appropriate reusable rule that will perform the

assigned rule check.

4. The Union brings together the results from all the reusable rule types and passes the combined results to the output.
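
The four steps above (lookup, clean-up, route, union) can be sketched in Python as a dispatch table. The control table contents, the rule names, and the function composite are all hypothetical stand-ins; in the framework these are a Lookup mapplet, an Expression, a Router, and a Union transformation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical control table: (table name, column name) -> [(rule id, rule type)]
CONTROL_TABLE: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("CUSTOMER", "EMAIL"): [("R1", "rule_not_blank")],
    ("CUSTOMER", "AGE"): [("R2", "rule_is_numeric")],
}

# Hypothetical reusable rules, keyed by rule type (one Router group each).
RULES: Dict[str, Callable[[str], str]] = {
    "rule_not_blank": lambda v: "PASS" if v.strip() else "FAIL",
    "rule_is_numeric": lambda v: "PASS" if v.strip().isdigit() else "FAIL",
}


def composite(table: str, column: str, key_values: str, col_value: str) -> List[Tuple[str, str, str]]:
    """Return (RESULT, KEY_VALUE_COMMA, RULE_ID) for every rule assigned to the column."""
    column = column.strip(" ,")                      # step 2: remove stray spaces and commas
    results = []
    for rule_id, rule_type in CONTROL_TABLE.get((table, column), []):  # step 1: lookup
        rule = RULES.get(rule_type)                  # step 3: route by rule type
        if rule is None:
            continue                                 # no group exists for this rule type
        results.append((rule(col_value), key_values, rule_id))  # step 4: union
    return results


print(composite("CUSTOMER", " EMAIL, ", "1001", "a@example.com"))
```

Under this picture, adding a rule type means adding one entry to the dispatch table and collecting its output with the others.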

Composite Mapplet Development

Adding a new rule to the Composite Mapplet.

1. Open the mplt_Composite mapplet.

2. Select the Router transformation and go to Properties > Groups.

3. Create a new group using your new rule name as the Group Name.

4. Enter the Group Filter Condition as RULE_TYPE = '[your rule name here]'

a. Be sure to use single quotes.

5. Drag and drop your new rule mapplet into the Composite Mapplet workspace.

6. Connect the necessary outputs from the new Router group to your rule inputs.

7. Select the Union transformation and go to Properties > Groups.

8. Create a new group using your rule name as the Group Name.

9. Connect the outputs from your rule to the group inputs from the Union.

10. Validate and save the Composite Mapplet.

Mapping

This mapping performs data quality checks on the CUSTOMER table. Each mapping contains one source table, and it ends by loading the data quality check results into the Fact and Dimension tables to be used in reporting. This is a representative mapping that would need to be produced for each table.

The transformation logic is as follows:

1. Exp_prep_table_name contains the hard-coded name of the Table to be checked.

2. Lookup_ref_DQ_Rule_Control looks up the Control Table by Table Name and returns all the

rules that apply to the table.

3. Rtr_Complexity_Levels then routes the data based on the Complexity Index and checks that the

Rule Status is set as ‘Complete’.

4. Exp_Level_1 preps the data for all the simple rules. The Key Values are hard-coded as a comma-separated list. The Column Value is obtained through an IF statement and is formatted into a comma-separated list if more than one column is checked.

5. Exp_Level_2_1 preps the data for a singular complex rule. Key Values are hard-coded as a

comma separated list. Pre-Check Column and Check Column must be hard-coded for

rule_A_to_B. In addition, Pre-Check Value and Check Value must be hard-coded for

rule_A_to_B_exact_col and rule_A_to_B_Date.

6. Each additional Level 2 complex rule check has its own expression to prep data, e.g. Exp_Level_2_2, Exp_Level_2_3, etc.

7. Un_Input_For_Composite unions all the prepped column names and values for Level 1 and Level

2 rules as input to the Composite Mapplet.

8. Mplt_composite_TEST_REF is the composite mapplet, further detailed in the “Mapplet #1:

Composite Mapplet” section.

9. Un_All_Levels combines all the results from Level 1 and Level 2 rules.

10. Exp_Anchor acts as an anchor to test and adjust the mapping before the final step.

11. Write_RESULTS is the target table for the results.
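
As a rough illustration of the Level 1 prep in step 4, the Python sketch below builds the two comma-separated strings that a rule receives: the key values and the column values. The function name prep_level_1, the sample columns, and the use of an empty string for a null value are assumptions; in the mapping this logic lives in the ports of the Exp_Level_1 expression.

```python
from typing import Any, List, Mapping, Tuple


def prep_level_1(row: Mapping[str, Any], key_columns: List[str], lkp_column_name: str) -> Tuple[str, str]:
    """Return (KEY_VALUE_COMMA, COL_VALUE) for one source row."""
    # Key values: the record's key columns joined with commas.
    key_value_comma = ",".join(str(row[c]) for c in key_columns)

    # Column values: one value per column named in the Control Table entry.
    columns = [c.strip() for c in lkp_column_name.split(",") if c.strip()]
    # Assumption: a null column value is written as an empty string.
    col_value = ",".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return key_value_comma, col_value


row = {"CUSTOMER_ID": 7, "FIRST_NAME": "Ann", "LAST_NAME": None}
print(prep_level_1(row, ["CUSTOMER_ID"], "FIRST_NAME, LAST_NAME"))  # ('7', 'Ann,')
```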

Mapping Development

Adding a new Level 2 rule to an existing mapping.

1. Select the Router transformation and go to Properties > Groups.

2. Create a new group. Enter the Group Name as the complexity index number of that rule, e.g.

Level_2_[next sequential number].

3. Enter the following as the Group Filter Condition:

LKP_COMPLEXITY_INDEX = '2_[next sequential number]' AND NOT ISNULL(LKP_TABLE_NAME) AND NOT ISNULL(LKP_COLUMN_NAME) AND LKP_RULE_STATUS = 'Complete'

4. Copy and paste another Level 2 expression and rename it to the appropriate complexity index.

5. Connect the output ports from the Router group to the new expression.

a. Tip: use the Auto Link function for this step.

6. Select the Union transformation and go to Properties > Groups.

7. Create a new group. Enter the complexity index as the Group Name.

8. Connect the output ports from the Level 2 expression to the Union group.

9. Validate and save your mapping.
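
The Group Filter Condition from step 3 reads as four tests joined by AND. The Python predicate below restates it to make the logic explicit; the function name and the dictionary representation of a looked-up row are hypothetical, and the port names are taken from the condition itself.

```python
from typing import Any, Mapping


def matches_level_2_group(row: Mapping[str, Any], complexity_index: str) -> bool:
    """True when a looked-up Control Table row belongs in the given Level 2 group."""
    return (
        row.get("LKP_COMPLEXITY_INDEX") == complexity_index   # e.g. '2_3'
        and row.get("LKP_TABLE_NAME") is not None             # NOT ISNULL(LKP_TABLE_NAME)
        and row.get("LKP_COLUMN_NAME") is not None            # NOT ISNULL(LKP_COLUMN_NAME)
        and row.get("LKP_RULE_STATUS") == "Complete"          # only completed rules run
    )


rule_row = {
    "LKP_COMPLEXITY_INDEX": "2_3",
    "LKP_TABLE_NAME": "CUSTOMER",
    "LKP_COLUMN_NAME": "STATE",
    "LKP_RULE_STATUS": "Complete",
}
print(matches_level_2_group(rule_row, "2_3"))  # True
```

A row whose rule status is anything other than 'Complete', or whose lookup returned no table or column name, falls through this group.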

Creating a new mapping for applying the Rule Management Framework to a new source table.

1. Copy any mapping and paste it to the same mappings folder.

2. Select Reuse object dependencies in new object.

a. Copying the dependencies would instead create copies of all the associated mapplets, reference tables, etc.

3. Rename the mapping to m_[Source Table Name]_load_DQ_rule_results.

4. Delete the old source table, and drag and drop the new source table with Read access.

5. Next, replace all the column names from the old source table with the new columns (steps 6 to 8).

6. In Exp_prep_table_name, delete all the old input columns (everything except the first field

O_TABLE_NAME). Drag over all the ports from the source.

7. Do the same for Lookup_ref_DQ_rule_control and for the Input Group of

Rtr_Complexity_Levels.

8. From the corresponding output groups in the Router, replace the old source fields in their

respective Level 1 and Level 2 expressions.

9. Now we must change the V_COL_VALUE variable ports in all the expressions. Using the Variable

Concatenation Excel spreadsheet, copy and paste all the column names in the first column. This

will generate the variable expression in the second column.

a. Tip: A quick way to copy all the column names is to profile the source table and export the profile to an Excel spreadsheet. The first page of this profile lists all the column names.

10. Copy the second column (without the heading).

11. Select the Exp_Level_1 expression and go to Properties > Ports > Variables. Go to the

V_COL_VALUE variable and open the expression field.

12. Paste the expression that was generated from the Variable Concatenation Excel spreadsheet.

Delete the last two || characters.

13. Validate the expression.

14. If there are no Level 2 rules on this mapping, you are finished. Validate and save the mapping.

15. If you do have Level 2 rules, you must modify the variables in the Exp_Level_2 transformations. Change V_PRE_CHECK_COL, V_PRE_CHECK_VALUE, V_CHECK_COL, and V_CHECK_VALUE to match the column names that were entered into the Control Table for those fields. If there is more than one column, separate the column names with ||','||.

16. Validate the expression. Validate and save the mapping.
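
The guide does not show the formula inside the Variable Concatenation Excel spreadsheet. As a hedged illustration of the kind of expression it produces, the Python sketch below joins a list of column names with the ||','|| separator described in step 15. The function name is hypothetical, and the exact output of the spreadsheet may differ (for example, it leaves a trailing || that step 12 removes, whereas a join adds separators only between names).

```python
from typing import Iterable


def build_col_value_expression(column_names: Iterable[str]) -> str:
    """Join column names into a concatenation expression using ||','||."""
    cleaned = [name.strip() for name in column_names if name.strip()]
    return "||','||".join(cleaned)


print(build_col_value_expression(["FIRST_NAME", "LAST_NAME", "EMAIL"]))
# FIRST_NAME||','||LAST_NAME||','||EMAIL
```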

