Slowly Changing Dimensions with Oracle9i Warehouse Builder An Oracle White Paper June 2003

Introduction

Case Study Environment

The Scenario

Type I Slowly Changing Dimension with Warehouse Builder

Type II Slowly Changing Dimensions with Warehouse Builder

Type III Slowly Changing Dimensions with Warehouse Builder

Deploy and Execute the Mappings

Conclusion

INTRODUCTION

The Slowly Changing Dimension (SCD) is a well-defined strategy to manage both current and historical data across the time span in data warehouses. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.

There are typically three types of SCD to implement:

1. Type I: Overwriting. There is only one version of the dimension record, and the record is modified with no history tracking required.

2. Type II: Creating another dimension record. There are multiple versions of the same dimension record. When new versions are created, old versions are retained.

3. Type III: Creating a current value field. There are two versions of the same dimension record: an old value and a current value. The old value is retained while the current value is modified.
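To make the difference concrete, the following minimal Python sketch applies one population change to a single city record under each of the three types. The function names, the POPULATION_OLD attribute, and the sample figures are illustrative assumptions; they do not come from Warehouse Builder.

```python
def type1(record, new_population):
    """Type I: overwrite in place; one version, no history."""
    return [{**record, "POPULATION": new_population}]


def type2(record, new_population):
    """Type II: keep the old version and add a new one."""
    return [record, {**record, "POPULATION": new_population}]


def type3(record, new_population):
    """Type III: one row holding both the old and the current value."""
    return [{**record,
             "POPULATION_OLD": record["POPULATION"],  # hypothetical column name
             "POPULATION": new_population}]


city = {"NAME": "Austin", "POPULATION": 650000}  # illustrative data
versions_type1 = type1(city, 700000)
versions_type2 = type2(city, 700000)
versions_type3 = type3(city, 700000)
```

Type I and Type III each leave one row for the city, while Type II leaves two.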

Users have to decide which type of Slowly Changing Dimension they should build based on their business requirements. For more information, refer to The Data Warehouse Toolkit [1].

Once a particular SCD type has been selected, users should:

1. Create a dimension that can keep history data.

2. Create a mapping that extracts data from the source system, transforms it, and then loads it into the pre-defined dimension target.

3. Generate and deploy both the dimension and the mapping into Oracle Database.

4. Execute the mapping.

[1] The Data Warehouse Toolkit, by Ralph Kimball.

This paper includes steps to design and deploy different Slowly Changing Dimensions with Warehouse Builder quickly and simply. This paper also provides an outline of how Warehouse Builder code uses the Oracle9i Database engine.

CASE STUDY ENVIRONMENT

This feature is implemented using the Warehouse Builder 9.0.4 release. An Oracle9i Release 2 database stores the repository for both design and runtime, as well as the schema for the warehouse targets.

• Oracle Warehouse Builder Release 2 (9.0.4)

• Oracle9i Database Release 2

THE SCENARIO

We demonstrate how to construct an SCD with the following sample scenario:

• The source system includes a source table named GEO_SRC from which data is extracted.

• The target warehouse includes the following:

o A sequence named DIM_ID to populate surrogate keys.

o A dimension named GEO_DIM as a Type I SCD to which data is loaded.

o A mapping to load data from GEO_SRC to GEO_DIM as a Type I SCD target.

o A dimension named GEO_DIM_TYPE2 as a Type II SCD to which data is loaded.

o A mapping to load data from GEO_SRC to GEO_DIM_TYPE2 as a Type II SCD target.

o A dimension named GEO_DIM_TYPE3 as a Type III SCD to which data is loaded.

o A mapping to load data from GEO_SRC to GEO_DIM_TYPE3 as a Type III SCD target.

Source Description

We use a geographical data source as our case study for illustration. A typical example of simplified geographical source data has the following attributes:

Target Description

We use a geography dimension as our case study for illustration. A typical example of a geography dimension has two levels: city and state. A city level is the lowest level in the geographical hierarchy, while state level is a higher level. A simplified city level has the following attributes:

• ID: This is the surrogate key for city level.

• NAME: This is the natural key for city level.

• POPULATION: Population of the city.

A simplified state level has the following attributes:

• ID: This is the surrogate key for state level.

• NAME: This is the natural key for state level.

• BUDGET: Budget of the state.

Star Schema

We use a star schema to store data for all levels on the same dimension table target, because it is one of the most commonly used strategies.

TYPE I SLOWLY CHANGING DIMENSION WITH WAREHOUSE BUILDER

With Type I SCD, the target table retains no history and only stores the latest value for the dimension record. The following is a sample dimension table defined for GEO_DIM to support Type I SCD:

Once the GEO_DIM dimension is defined, it can be used as a target in a mapping. To load a Type I slowly changing dimension, data is extracted from the source and directly loaded into the target. GEO_SRC is the example source table here from which data is extracted and loaded into GEO_DIM.

Populating surrogate key

To ensure that unique numbers are assigned for surrogate keys for new dimension records, a sequence operator is used to map to CITY_ID (the lowest level key and the surrogate key column of GEO_DIM).

Configuring target properties

The properties for the GEO_DIM operator are configured to ensure that data loads properly. First, the loading type of GEO_DIM is configured as ‘UPDATE/INSERT’.

In addition, the following configuration is required for each mapped column:

• CITY_ID is the surrogate key and is to be loaded only when inserting rows.

• CITY_NAME is the natural key and is to be loaded only when inserting rows. It is also to be matched when updating rows.

• CITY_POPULATION is to be loaded both when inserting and updating rows. STATE_NAME and STATE_BUDGET are configured in the same way.
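The effect of these column settings can be sketched in plain Python. This is only an approximation of the update/insert behavior described above, not the code Warehouse Builder generates. It assumes that GEO_SRC rows carry the same column names as GEO_DIM (minus CITY_ID), and it uses itertools.count as a stand-in for the DIM_ID sequence.

```python
from itertools import count

# Columns loaded both when inserting and when updating rows.
INSERT_AND_UPDATE = ("CITY_POPULATION", "STATE_NAME", "STATE_BUDGET")


def load_type1(geo_dim, source_rows, next_id):
    """Update/insert source rows into a Type I dimension, matching on CITY_NAME."""
    for src in source_rows:
        row = geo_dim.get(src["CITY_NAME"])
        if row is None:
            # Insert: surrogate key and natural key are loaded only here.
            row = {"CITY_ID": next(next_id), "CITY_NAME": src["CITY_NAME"]}
            geo_dim[src["CITY_NAME"]] = row
        # Overwrite the remaining columns; no history is kept.
        for col in INSERT_AND_UPDATE:
            row[col] = src[col]
    return geo_dim


geo_dim = {}          # keyed by natural key for easy matching
dim_id = count(1)     # stand-in for the DIM_ID sequence
load_type1(geo_dim, [{"CITY_NAME": "Austin", "CITY_POPULATION": 650000,
                      "STATE_NAME": "Texas", "STATE_BUDGET": 100}], dim_id)
load_type1(geo_dim, [{"CITY_NAME": "Austin", "CITY_POPULATION": 700000,
                      "STATE_NAME": "Texas", "STATE_BUDGET": 100}], dim_id)
```

After the second load the Austin row keeps its original CITY_ID, and only its population has been overwritten.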

Generating Code

If the target database type (configurable from warehouse module configuration properties) is set to Oracle9i, then the MERGE feature is already set when code is generated.

TYPE II SLOWLY CHANGING DIMENSIONS WITH WAREHOUSE BUILDER

With Type II SCD, a new version of the dimension record is created, and the existing version is marked as history. To accommodate this, extra metadata is required for the dimension table, including an effective date column and an expiration date column. These columns are used to differentiate a current version from a historical version as follows:

• Effective date column stores the effective date of the version; also known as start date.

• Expiration date column stores the expiration date of the version; also known as end date.

• Expiration date value of the current version is always set to NULL or a default date value.

The user must identify the columns whose history will be tracked (by creating a new version) whenever their values are changed, such as CITY_POPULATION. These columns are known as trigger columns and should be described as part of the metadata.

The following is a sample dimension table defined for GEO_DIM_TYPE2 to support Type II:

Once the GEO_DIM_TYPE2 dimension is defined, it can be used as a target in the mapping. GEO_SRC is the sample source table here from which data is loaded into GEO_DIM_TYPE2.

To load a Type II slowly changing dimension, data is extracted from the source and transformed before it is loaded into the target. In the following mapping, data is first extracted from GEO_SRC, transformed by a series of operators, and finally loaded into GEO_DIM_TYPE2.

How is the data actually transformed? Warehouse Builder supports any operators required for Type II Slowly Changing Dimensions. With Warehouse Builder, the whole ETL process of Type II Slowly Changing Dimensions can be performed in one single mapping. We demonstrate how data is transformed in a step-by-step fashion.

Detect a match

First, each source row from GEO_SRC which matches a current dimension record in GEO_DIM_TYPE2 must be identified. A Joiner operator is used to match GEO_SRC with GEO_DIM_TYPE2 exclusively (using outer join), with natural key columns as the join condition.

Also notice that GEO_SRC should only match current dimension records in GEO_DIM_TYPE2, rather than history dimension records. To do this, we apply a Filter operator to filter out history records from matching.
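The filter-then-join step can be sketched in Python as follows. The sketch assumes that a current version is marked by a NULL (None) expiration date in a column named DATE_EXP, and that CITY_NAME is the only natural key column. The function name and sample rows are illustrative.

```python
def detect_matches(source_rows, dim_rows):
    """Pair every source row with its current dimension version, if any."""
    # Filter step: drop historical versions (expiration date already set).
    current = {d["CITY_NAME"]: d for d in dim_rows if d["DATE_EXP"] is None}
    # Outer join on the natural key: unmatched source rows pair with None.
    return [(src, current.get(src["CITY_NAME"])) for src in source_rows]


geo_dim_type2 = [
    {"CITY_ID": 1, "CITY_NAME": "Austin", "CITY_POPULATION": 600000,
     "DATE_EXP": "2003-01-01"},          # historical version
    {"CITY_ID": 2, "CITY_NAME": "Austin", "CITY_POPULATION": 650000,
     "DATE_EXP": None},                  # current version
]
geo_src = [
    {"CITY_NAME": "Austin", "CITY_POPULATION": 700000},
    {"CITY_NAME": "Dallas", "CITY_POPULATION": 1200000},
]
joined = detect_matches(geo_src, geo_dim_type2)
```

Because the join is an outer join, the Dallas row survives with no target match, which is what later allows it to be inserted as a new dimension record.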

Split join results

After the Joiner operator, the output data now includes both the source data rows and the matched target rows. Each output row from the Joiner operator must be categorized into the following groups:

1. OPEN_SET is defined to create a new version or overwrite a current version.

2. CLOSE_SET is defined to mark a current version as historical.

Perform this categorizing by splitting the Joiner output into OPEN_SET and CLOSE_SET groups using a Splitter operator.

A Joiner output row is written into the OPEN_SET group if the row comes from GEO_SRC and either matches a current record in GEO_DIM_TYPE2 or matches no record in GEO_DIM_TYPE2. This is accomplished by specifying the Splitter condition for the OPEN_SET group.

A Joiner output row is written into the CLOSE_SET group if both the following conditions are true:

1. If the record comes from a row in GEO_SRC that matches any current version in GEO_DIM_TYPE2, and

2. If any trigger column in GEO_DIM_TYPE2 is not equal to the corresponding column in GEO_SRC.

To this end, ‘AND’ is used for the above two condition clauses in the Splitter condition for the CLOSE_SET group.
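The two Splitter conditions can be sketched in Python as follows. The sketch assumes that the Joiner output is a list of (source row, matched target row or None) pairs and that CITY_POPULATION is the only trigger column. The function name and sample data are hypothetical.

```python
TRIGGER_COLUMNS = ("CITY_POPULATION",)


def split_join_output(joined):
    """Route each Joiner output row to OPEN_SET and, if needed, CLOSE_SET."""
    open_set, close_set = [], []
    for src, tgt in joined:
        # OPEN_SET: every source row, whether or not it matched a current version.
        open_set.append((src, tgt))
        # CLOSE_SET: matched a current version AND some trigger column differs.
        if tgt is not None and any(src[c] != tgt[c] for c in TRIGGER_COLUMNS):
            close_set.append((src, tgt))
    return open_set, close_set


changed = ({"CITY_NAME": "Austin", "CITY_POPULATION": 700000},
           {"CITY_ID": 2, "CITY_NAME": "Austin", "CITY_POPULATION": 650000})
unchanged = ({"CITY_NAME": "Waco", "CITY_POPULATION": 120000},
             {"CITY_ID": 3, "CITY_NAME": "Waco", "CITY_POPULATION": 120000})
unmatched = ({"CITY_NAME": "Dallas", "CITY_POPULATION": 1200000}, None)

open_set, close_set = split_join_output([changed, unchanged, unmatched])
```

A row can land in both groups: the changed Austin row must both close its current version and open a new one.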

Determining merge rows

With OPEN_SET and CLOSE_SET, the following two delta sets which will be loaded to GEO_DIM_TYPE2 are computed:

• From CLOSE_SET to update GEO_DIM_TYPE2; also known as UPDATE_DELTA_ROW

• From OPEN_SET to update/insert GEO_DIM_TYPE2; also known as MERGE_DELTA_ROW

Expression operators can accomplish both tasks. UPDATE_DELTA_ROW and MERGE_DELTA_ROW are created as two separate Expression operators from the output of CLOSE_SET and OPEN_SET, respectively. The output groups of both Expression operators are then combined (UNION) by using a Set operator, whose output row set is ready to be mapped to GEO_DIM_TYPE2 directly.

Expression UPDATE_DELTA_ROW

UPDATE_DELTA_ROW represents the row set that will overwrite the final target row in order to mark a current matched version as historical. Specifically, the target expiration timestamp must be updated with the current system date value. This operation is also known as closing the current version. The expression of attribute DATE_EXP is specified as SYSDATE.

The rest of the columns do not have to be updated, so specify the original target column values for the corresponding expressions.
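As a rough Python equivalent of the UPDATE_DELTA_ROW expressions, the sketch below copies the matched target row and sets only DATE_EXP. The function name is hypothetical, and the `now` parameter stands in for SYSDATE so that the result is deterministic.

```python
from datetime import datetime


def update_delta_row(tgt, now):
    """Build the row that closes the current version of a dimension record."""
    row = dict(tgt)        # every other column keeps its original target value
    row["DATE_EXP"] = now  # closing the version; `now` stands in for SYSDATE
    return row


current_version = {"CITY_ID": 2, "CITY_NAME": "Austin", "CITY_POPULATION": 650000,
                   "DATE_EFF": datetime(2002, 1, 1), "DATE_EXP": None}
closed_version = update_delta_row(current_version, datetime(2003, 6, 1))
```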

Expression MERGE_DELTA_ROW

MERGE_DELTA_ROW represents the row set which will overwrite the final target row in order to:

• Create another current version if there is either no current version matched in target, or the matched version has a different value in any of the trigger columns.

• Otherwise, update the matched version directly.

Specifically, an expression is built for each final target column to differentiate between the above two scenarios by instantiating a CASE expression, i.e., ‘Case When (…) Then (…) Else (…) End’. Fortunately, Warehouse Builder supports a user-friendly expression builder to easily accomplish this.

For DATE_EFF or any effective timestamp column, the expression is specified to:

• Either use the current system time (SYSDATE) if the creation of another version is required, or

• Preserve the effective timestamp value derived from the target (i.e., update the matched version).

The following is an example of how to specify the expression for DATE_EFF or any effective timestamp column:

For DATE_EXP or any expiration timestamp column, specify the expression to:

• Either use a default value (such as NULL or some future timestamp 01/01/2004) to mark the version as the current version if the creation of another version is required, or

• Preserve the expiration timestamp value derived from the target (i.e., update the matched version).

The following is an example of how to specify the expression for DATE_EXP or any expiration timestamp column:

For CITY_NAME or any natural key column, the record is always overwritten with the natural key value derived from the source. The following is an example of how to specify the expression for CITY_NAME or any natural key column:

For CITY_ID_KEY or any surrogate key column, preserve the surrogate key value derived from the target. If the matched version is to be updated, this value is used to match against the final target row.

The derived target surrogate key is NULL when a new version is to be created; a sequence number is introduced later to ensure that a unique surrogate key value is assigned to the new dimension record.

The following is an example of how to specify the expression for CITY_ID_KEY or any surrogate key column:

For STATE_NAME or any non-trigger column, the record is always overwritten with the value derived from the source.

For CITY_POPULATION or any trigger column, the record is always overwritten with the value derived from the source.
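Taken together, the per-column rules for MERGE_DELTA_ROW can be sketched as one Python function, with each conditional playing the role of a CASE expression. This is one reading of the rules above under stated assumptions: NULL (None) is the default expiration value, CITY_POPULATION is the only trigger column, and CITY_ID_KEY is left NULL whenever a new version is created. The function name and the `now` parameter (a stand-in for SYSDATE) are hypothetical.

```python
from datetime import datetime

TRIGGER_COLUMNS = ("CITY_POPULATION",)


def merge_delta_row(src, tgt, now):
    """Build the row that creates a new current version or updates the matched one."""
    # CASE WHEN no current match OR a trigger column changed THEN new version.
    new_version = tgt is None or any(src[c] != tgt[c] for c in TRIGGER_COLUMNS)
    return {
        "CITY_ID_KEY": None if new_version else tgt["CITY_ID"],
        "CITY_NAME": src["CITY_NAME"],              # natural key: always from source
        "CITY_POPULATION": src["CITY_POPULATION"],  # trigger column: from source
        "STATE_NAME": src["STATE_NAME"],            # non-trigger column: from source
        "DATE_EFF": now if new_version else tgt["DATE_EFF"],
        "DATE_EXP": None if new_version else tgt["DATE_EXP"],
    }


now = datetime(2003, 6, 1)
target = {"CITY_ID": 2, "CITY_NAME": "Austin", "CITY_POPULATION": 650000,
          "STATE_NAME": "Texas", "DATE_EFF": datetime(2002, 1, 1), "DATE_EXP": None}
grown = {"CITY_NAME": "Austin", "CITY_POPULATION": 700000, "STATE_NAME": "Texas"}
renamed_state = {"CITY_NAME": "Austin", "CITY_POPULATION": 650000, "STATE_NAME": "TX"}

new_row = merge_delta_row(grown, target, now)             # trigger column changed
updated_row = merge_delta_row(renamed_state, target, now) # only a non-trigger change
first_row = merge_delta_row(grown, None, now)             # no match in the target
```

Only the second call keeps the target surrogate key and effective date, so it updates the matched version in place; the other two produce rows that are inserted as new current versions.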

Populating the surrogate key

To ensure that unique numbers are assigned as surrogate keys for new dimension records, use a Sequence operator to insert CITY_ID, which is the surrogate key column of GEO_DIM_TYPE2.

The derived target surrogate key from UNION is used to match with the final target surrogate key during loading. An additional attribute, MATCHING, is created for the final target, and the derived target surrogate key, CITY_ID_KEY, is mapped to it.

The MATCHING attribute stands for the unique key of the final target chosen as the matching criteria to ensure that data loads properly. Here, the final target surrogate key column CITY_ID is used as the MATCHING attribute. The bound name is set to be the same as CITY_ID:

Configuring target properties

This section discusses how to configure the properties for GEO_DIM_TYPE2 operator to ensure that data loads properly.

First, configure the loading type of GEO_DIM_TYPE2 to be ‘UPDATE/INSERT’.

Additionally, the following must be configured for each mapped column:

• CITY_ID is the surrogate key and is loaded only when inserting rows.

• CITY_NAME is the natural key and is loaded only when inserting rows.

• CITY_POPULATION is loaded both when inserting and updating rows. STATE_NAME, EFFECTIVE_DATE and EXPIRATION_DATE are configured in the same way.

• MATCHING is to be matched when updating rows.
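A minimal Python sketch of the resulting load is shown below, assuming the delta rows already carry a MATCHING value (the derived target surrogate key, or None for new versions). The date columns are called DATE_EFF and DATE_EXP here, itertools.count stands in for the sequence, and the function name is hypothetical. It illustrates the update/insert behavior only and is not the code that Warehouse Builder generates.

```python
from itertools import count

# Columns loaded both when inserting and when updating rows.
INSERT_AND_UPDATE = ("CITY_POPULATION", "STATE_NAME", "DATE_EFF", "DATE_EXP")


def load_type2(dim_rows, delta_rows, next_id):
    """Update/insert delta rows into the Type II dimension, matching on CITY_ID."""
    by_id = {row["CITY_ID"]: row for row in dim_rows}
    for delta in delta_rows:
        match = by_id.get(delta["MATCHING"])  # MATCHING is bound to CITY_ID
        if match is None:
            # Insert: surrogate key and natural key are loaded only here.
            match = {"CITY_ID": next(next_id), "CITY_NAME": delta["CITY_NAME"]}
            dim_rows.append(match)
        for col in INSERT_AND_UPDATE:
            match[col] = delta[col]
    return dim_rows


dim = [{"CITY_ID": 1, "CITY_NAME": "Austin", "CITY_POPULATION": 650000,
        "STATE_NAME": "Texas", "DATE_EFF": "2002-01-01", "DATE_EXP": None}]
deltas = [
    # From UPDATE_DELTA_ROW: closes version 1.
    {"MATCHING": 1, "CITY_NAME": "Austin", "CITY_POPULATION": 650000,
     "STATE_NAME": "Texas", "DATE_EFF": "2002-01-01", "DATE_EXP": "2003-06-01"},
    # From MERGE_DELTA_ROW: opens a new current version.
    {"MATCHING": None, "CITY_NAME": "Austin", "CITY_POPULATION": 700000,
     "STATE_NAME": "Texas", "DATE_EFF": "2003-06-01", "DATE_EXP": None},
]
load_type2(dim, deltas, count(2))
```

The first delta row matches CITY_ID 1 and only closes it; the second matches nothing and is inserted with a fresh surrogate key.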

Generating Code

If the target database type (configurable from warehouse module configuration properties) is set to Oracle9i, the MERGE feature is already set when code is generated.

TYPE III SLOWLY CHANGING DIMENSIONS WITH WAREHOUSE BUILDER

With Type III SCD, a current value field is created to keep the current value of the dimension record apart from its previous value. To accomplish this, two columns are created for each data field: one storing the current value and one storing the previous value. The following is a sample dimension table defined for GEO_DIM_TYPE3 to support Type III SCD:

Once the GEO_DIM_TYPE3 dimension is defined, it can be used as a target in a mapping. GEO_SRC is the sample source table from which data is extracted and loaded into GEO_DIM_TYPE3.

To load Type III Slowly Changing Dimensions, records are extracted from the source and then transformed before being directly loaded into the target. The following sample mapping first extracts records from GEO_SRC, transforms them by a series of operators, and finally loads them into GEO_DIM_TYPE3.

The following sections explore how data is transformed in a step-by-step fashion.

Detect a match

First, each source row from GEO_SRC that matches a current dimension record in GEO_DIM_TYPE3 must be identified. For this, a Joiner operator is used to match GEO_SRC with GEO_DIM_TYPE3 exclusively (using an outer join) with natural key columns as the join condition.

Populating current value

For Type III SCD, the current value columns of the target are always overwritten with the records extracted from the source. This is accomplished by creating mapping lines from the Joiner output directly to the target, GEO_DIM_TYPE3.

Populating previous value columns by Expression

For Type III SCD, it is important to decide when and how to overwrite the target’s previous value columns, including CITY POPULATION_OLD, CITY STATE_BUDGET_OLD, and CITY STATE_NAME_OLD. Specifically:

• Overwrite the previous value column with the current value column of the target if the current value of the target is different from that of the source; or

• Otherwise, no change is required for the previous value column.

To accomplish this, build an Expression from the previous Joiner result and instantiate the expression using a CASE expression.

The following is an example of how to instantiate the expression using a CASE expression.
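A minimal Python sketch of that CASE logic for one attribute follows. It assumes a current value column named CITY_POPULATION with a previous value column named CITY_POPULATION_OLD. Both function names are hypothetical, and the same pattern would apply to the other previous value columns.

```python
def previous_value(src_value, tgt_current, tgt_previous):
    """CASE WHEN target current <> source THEN target current ELSE target previous END."""
    if tgt_current != src_value:
        return tgt_current   # value changed: the old current value becomes previous
    return tgt_previous      # no change: keep the previous value as it is


def type3_row(src, tgt):
    """Build the row to update/insert into the Type III dimension for one source row."""
    old = None               # unmatched source row: no previous value exists yet
    if tgt is not None:
        old = previous_value(src["CITY_POPULATION"],
                             tgt["CITY_POPULATION"],
                             tgt["CITY_POPULATION_OLD"])
    return {"CITY_NAME": src["CITY_NAME"],
            "CITY_POPULATION": src["CITY_POPULATION"],  # always from the source
            "CITY_POPULATION_OLD": old}


target = {"CITY_NAME": "Austin", "CITY_POPULATION": 650000,
          "CITY_POPULATION_OLD": 600000}
changed = type3_row({"CITY_NAME": "Austin", "CITY_POPULATION": 700000}, target)
unchanged = type3_row({"CITY_NAME": "Austin", "CITY_POPULATION": 650000}, target)
brand_new = type3_row({"CITY_NAME": "Dallas", "CITY_POPULATION": 1200000}, None)
```

Only one level of history is kept: when the population changes again, the older previous value is overwritten.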

Populating the surrogate key

This is similar to the steps for Type I SCD.

Configuring target properties

This is similar to the steps for creating a Type I SCD.

The loading type of GEO_DIM_TYPE3 must be ‘UPDATE/INSERT’.

CITY_ID is the surrogate key and is to be loaded only when inserting rows.

CITY_NAME is the natural key and is to be loaded only when inserting rows. It is also to be matched when updating rows.

Others are to be loaded both when inserting and updating rows.

Generating Code

If the target database type (configurable from warehouse module configuration properties) is set to Oracle9i, the MERGE feature is already set when code is generated.

DEPLOY AND EXECUTE THE MAPPINGS

Once the dimensions and mappings are constructed, they must be deployed and executed using the Deployment Manager. Set-based mode is available if the target database is Oracle9i to ensure best performance.

CONCLUSION

We have demonstrated that Warehouse Builder is ideal for implementing all types of Slowly Changing Dimensions. By leveraging the Oracle9i Database engine, Warehouse Builder is superior in generating native SQL code to perform complex ETL tasks, such as loading data into Slowly Changing Dimensions. This is made possible with custom mappings, which are well defined and easy to create using the Warehouse Builder mapping operators. In conclusion, the following Warehouse Builder qualities are ideal for creating Slowly Changing Dimensions:

• Easy to program

• Highly performant code

• Fully supports all types of SCD

White Paper Title
June 2003
Author: Yuan-lu Chang
Contributing Authors: N/A

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
www.oracle.com

Copyright © 2003, Oracle. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
