
DWH Training CTS

Date post: 28-Nov-2014
Upload: bharat-kumar-kakani
Transcript
Page 1: DWH Training CTS

Introduction to Data Warehousing

Page 2: DWH Training CTS

2©Copyright 2004, Cognizant Academy, All Rights Reserved

Session Objectives

• Overview of Data Warehousing

• Data Warehouse Architectures

• How to create a data warehouse

• How to design a data warehouse

• Understand the ETL process

• What is metadata

• How to administer a data warehouse

Page 3: DWH Training CTS

Operational Systems

Page 4: DWH Training CTS

What is an Operational System?

• Operational systems are just what their name implies: they are the systems that help us run day-to-day enterprise operations.
• These are the backbone systems of any enterprise, such as order entry and inventory.
• The classic examples are airline reservations, credit-card authorizations, and ATM withdrawals.

Page 5: DWH Training CTS


Characteristics of Operational Systems

• Continuous availability

• Predefined access paths

• Transaction integrity

• Volume of transaction - High

• Data volume per query - Low

• Used by operational staff

• Supports day to day control operations

• Large number of users

Page 6: DWH Training CTS

Historical Look at Informational Processing

The goal of informational processing is to turn data into information!

Why? Because business questions are answered using information and the knowledge of how to apply that information to a given problem.

Data → Information → Knowledge

Page 7: DWH Training CTS

Need for a Separate Informational System

• Data: Informational data is distinctly different from operational data in its structure and content.
• Processing: Informational processing is distinctly different from operational processing in its characteristics and use of data.

Page 8: DWH Training CTS

The Information Center

• Management requires business information
• A request for a report is made to the Information Center
• The Information Center works on developing the report
• Requirements for the report must be clarified

Page 9: DWH Training CTS

The Information Center

• Report provided to analyst
• Analyst manipulates data for decision making
• Management receives information, but...
– What took so long?
– How do I know it's right?

Page 10: DWH Training CTS

The Information Center

Too many steps involved!

Page 11: DWH Training CTS

Tactical Information

Example: an inventory control system on an OLTP server, tracking production quantity, transported quantity, and order quantity.

• Supports day-to-day control operations
• Transaction processing
• High-performance operational systems
• Fast response time
• Initiates immediate action

Page 12: DWH Training CTS


Strategic Information

• Understand Business Issues

• Analyze Trends and Relationships

• Analyze Problems

• Discover Business Opportunities

• Plan for the Future

[Figure: business areas shown are Finance, Payroll, Marketing, and Production & Inventory]

Page 13: DWH Training CTS

Need for Tactical and Strategic Information

Operational data helps the organization meet operational and tactical requirements for data, while data warehouse data helps the organization meet strategic requirements for information.

[Figure: the OLTP server holds operational data and supplies tactical information; a periodic refresh feeds the data warehouse server, which supplies strategic information]

Page 14: DWH Training CTS

Operational Vs Analytical Systems

Operational                               Analytical
Primarily primitive                       Primarily derived
Current; accurate as of now               Historical; accuracy maintained over time
Constantly updated                        Less frequently updated
Minimal redundancy                        Managed redundancy
Highly detailed data                      Summarized data
Referential integrity                     Historical integrity
Supports day-to-day business functions    Supports long-term informational requirements
Normalized design                         De-normalized design

Page 15: DWH Training CTS

Data Warehousing

Page 16: DWH Training CTS

Data Warehouse Definition

The data warehouse is a

• Subject-oriented
• Integrated
• Time-variant
• Non-volatile

collection of data in support of management decision processes.

Page 17: DWH Training CTS

Data Warehouse - Differences from Operational Systems

Operational systems: Operational data is organized by specific processes or tasks and is maintained by separate systems (Accounting, Order Entry, Billing).

Data warehouse: Warehoused data is organized by subject area and is populated from many operational systems (Customer, Usage, Revenue).

Page 18: DWH Training CTS

Data Warehouse - Differences from Operational Systems

Operational systems (application specific):
• Applications and their databases were designed and built separately
• Evolved over long periods of time

Data warehouse (integrated):
• Integrated from the start
• Designed (or "architected") at one time, implemented iteratively over short periods of time

Page 19: DWH Training CTS

Data Warehouse - Differences from Operational Systems

Operational systems: Primarily concerned with current data
Data warehouse: Generally concerned with historical data

Page 20: DWH Training CTS

Data Warehouse - Differences from Operational Systems

Operational systems database (constant change):
• Updated constantly through inserts, updates, and deletes
• Data changes according to need, not a fixed schedule

Data warehouse (load/update at consistent points in time):
• Added to regularly through an initial load followed by incremental loads, but loaded data is rarely directly changed
• This does NOT mean the data warehouse is never updated or never changes!
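The load pattern on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the row layout (`order_id`, `order_date`, `qty`) and the function names are assumptions, and a real warehouse load would run against database tables rather than in-memory lists.

```python
from datetime import date

def initial_load(source_rows):
    """Copy every source row into a new warehouse table (a plain list here)."""
    return [dict(row) for row in source_rows]

def incremental_load(warehouse, source_rows, last_load_date):
    """Append only rows dated after the previous load; loaded rows are never edited."""
    new_rows = [dict(r) for r in source_rows if r["order_date"] > last_load_date]
    warehouse.extend(new_rows)
    return len(new_rows)

# Hypothetical operational data at the time of the initial load.
source = [
    {"order_id": 1, "order_date": date(2004, 1, 5), "qty": 10},
    {"order_id": 2, "order_date": date(2004, 1, 6), "qty": 4},
]
warehouse = initial_load(source)

# The operational system keeps changing; the warehouse catches up at the next load.
source.append({"order_id": 3, "order_date": date(2004, 1, 7), "qty": 7})
added = incremental_load(warehouse, source, last_load_date=date(2004, 1, 6))
```

The warehouse only grows at load time, which is what gives it consistent points in time, while the source list can change at any moment.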

Page 21: DWH Training CTS

Data in a Data Warehouse

What about the data in the data warehouse?

• Separate DSS database
• Storage of data only; no data is created
• Integrated and scrubbed data
• Historical data
• Read only (no recasting of history)
• Various levels of summarization
• Metadata
• Subject oriented
• Easily accessible

Page 22: DWH Training CTS


Data Warehousing Features

• Strategic enterprise level decision support

• Multi-dimensional view on the enterprise data

• Caters to the entire spectrum of management

• Descriptive, standard business terms

• High degree of scalability

• High analytical capability

• Historical data only

Page 23: DWH Training CTS

Data Warehouse - Business Benefits

Benefits to business:

• Understand business trends
• Make better forecasting decisions
• Bring better products to market in a timely manner
• Analyze daily sales information and make quick decisions
• Solution for maintaining your company's competitive edge

Page 24: DWH Training CTS


Data Warehouse- Application Areas

Following are some Business Applications of a data warehouse:

• Risk management

• Financial analysis

• Marketing programs

• Profit trends

• Procurement analysis

• Inventory analysis

• Statistical analysis

• Claims analysis

• Manufacturing optimization

• Customer relationship management

Page 25: DWH Training CTS

Data Marts

Page 26: DWH Training CTS

What is a Data Mart?

• A data mart is a decentralized subset of data, found either within a data warehouse or as a standalone subset, designed to support the unique requirements of a specific business unit or decision-support system.
• Data marts have specific business-related purposes, such as measuring the impact of marketing promotions or measuring and forecasting sales performance.

[Figure: an enterprise data warehouse and two data marts]

Page 27: DWH Training CTS

Data Marts - Main Features

• Low cost
• Controlled locally rather than centrally, conferring power on the user group
• Contain less information than the warehouse
• Rapid response
• More easily understood and navigated than an enterprise data warehouse
• Within the range of divisional or departmental budgets

Page 28: DWH Training CTS

Advantages of a Data Mart over a Data Warehouse

• Typically a single subject area and fewer dimensions
• Limited feeds
• Very quick time to market (30-120 days to pilot)
• Quick impact on bottom-line problems
• Focused user needs
• Limited scope
• Optimum model for DW construction
• Demonstrates ROI
• Allows prototyping

Page 29: DWH Training CTS

Disadvantages of Data Marts

• Do not provide an integrated view of business information
• Uncontrolled proliferation of data marts results in redundancy
• A large number of data marts is complex to maintain
• Scalability issues for large numbers of users and increased data volume

Page 30: DWH Training CTS

Different Approaches for Implementing Data Marts

Page 31: DWH Training CTS

Non-architected Data Marts

Q: When is a data warehouse not a data warehouse?
A: When it's an unarchitected collection of data marts.

Page 32: DWH Training CTS

Non-architected Data Marts

Significant and expensive duplication of effort and data.

[Figure: source systems, data marts, end user access]

Page 33: DWH Training CTS

Upsides and Downsides of Non-architected Data Marts

The upsides of non-architected data marts are:
1. Speed
2. Low cost

The downsides of non-architected data marts are:
1. Multiple extraction processes
2. Multiple business rules
3. Multiple semantics
4. Extremely challenging to integrate

Page 34: DWH Training CTS

Architected Data Warehouse

[Figure: source systems, data staging, enterprise data warehouse with metadata, end user access]

Page 35: DWH Training CTS

Unarchitected Data Marts Vs Data Warehouse

Unarchitected data marts (easy to do, not architected):
? Are the extracts, transformations, integrations and loads consistent?
? Is the redundancy managed?
? What is the impact on the sources?

Enterprise data warehouse (architected):
• Data and results consistent
• Redundancy is managed
• Detailed history available for drill-down
• Metadata is consistent!

Page 36: DWH Training CTS

The Operational Data Store

Page 37: DWH Training CTS

ODS Definition

The ODS is defined to be a structure that is:

• Integrated
• Subject oriented
• Volatile, where updates can be done
• Current valued, containing data that is a day or perhaps a month old
• Detailed, containing detailed data only

Page 38: DWH Training CTS

Why Do We Need an Operational Data Store?

• To obtain a "system of record" that contains the best data that exists in a legacy environment as a source of information
• "Best" here implies data that is
– Complete
– Up to date
– Accurate
– In conformance with the organization's information model

Page 39: DWH Training CTS

Operational Data Store - Insulated from OLTP

• ODS data resolves data integration issues
• Data is physically separated from the production environment to insulate it from the processing demands of reporting and analysis
• Access to current data is facilitated

[Figure: OLTP server, ODS, tactical analysis]

Page 40: DWH Training CTS

Operational Data Store - Data

• Detailed data
– Records of business events (e.g. order capture)
• Data from heterogeneous sources
• Does not store summary data
• Contains current data

Page 41: DWH Training CTS

ODS - Benefits

• Integrates the data
• Synchronizes the structural differences in data
• High transaction performance
• Serves the operational and DSS environment
• Transaction-level reporting on current data

[Figure: flat files (e.g. 60,5.2,"JOHN" and 72,6.2,"DAVID"), a relational database, and Excel files feeding the operational data store]

Page 42: DWH Training CTS

Operational Data Store - Update Schedule

ODS data:
• Update schedule: daily or at shorter intervals
• Detail of data is mostly between 30 and 90 days
• Addresses operational needs

Data warehouse data:
• Update schedule: weekly or at longer intervals
• Potentially infinite history
• Addresses strategic needs

Page 43: DWH Training CTS

ODS Vs Data Warehouse Characteristics

Parameters compared for the ODS and the data warehouse:

• Integrated and subject oriented
• Updated by transactions
• Stores summarized data
• Used for strategic decisions
• Used at managerial level
• Used for tactical decisions
• Contains current and detailed data
• Lengthy historical perspective

Page 44: DWH Training CTS

OLAP

Page 45: DWH Training CTS

What is OLAP?

• OLAP tools are used for analyzing data
• They help users gain insight into the organization's data
• They help users carry out multi-dimensional analysis on the available data
• Using OLAP techniques, users can view the data from different perspectives
• Helps in decision making and business planning
• Converts OLTP data into information
• Solution for maintaining your company's competitive edge

Page 46: DWH Training CTS

OLAP Terminology

• Drill down and drill up
• Slice and dice
• Multi-dimensional analysis
• What-if analysis

Page 47: DWH Training CTS

Data Warehouse Architecture

Page 48: DWH Training CTS

Basic Data Warehouse Architecture

[Figure: components shown are operational & external data, ODS, data staging layer, data warehouse, data marts, information servers, and information access (reporting tools, web browsers, OLAP, mining), with metadata management and administration alongside]

Page 49: DWH Training CTS


Basic Data Warehouse Architecture

Page 50: DWH Training CTS

Operational & External Data Layer

• The database-of-record
• Consists of system-specific reference data and event data
• Source of data for the data warehouse
• Contains detailed data
• Continually changes due to updates
• Stores data up to the last transaction

Page 51: DWH Training CTS

Data Staging Layer

• Extracts data from operational and external databases.
• Transforms the data and loads it into the data warehouse.
• This includes decoding production data and merging records from multiple DBMS formats.
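As a rough illustration of the extract, transform, and load steps above, the sketch below merges two small extracts that use different field names and code values. Everything specific here is an assumption made for the example: the CSV layouts, the `GENDER_CODES` table, and the function names are hypothetical, and real staging jobs usually read from database connections rather than strings.

```python
import csv
import io

# Hypothetical code table: each source system encodes gender differently.
GENDER_CODES = {"M": "Male", "F": "Female", "1": "Male", "2": "Female"}

def extract(csv_text):
    """Read one source extract (CSV text) into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(row, id_field, name_field, gender_field):
    """Map a source row onto the common warehouse layout and decode its codes."""
    return {
        "cust_code": int(row[id_field]),
        "cust_name": row[name_field].strip().title(),
        "sex": GENDER_CODES[row[gender_field]],
    }

def load(warehouse, rows):
    """Merge rows into the warehouse table, keyed by cust_code."""
    for row in rows:
        warehouse[row["cust_code"]] = row

# Two hypothetical source extracts with different column names and encodings.
billing_extract = "cust_id,name,sex\n101,john smith,M\n102,mary jones,F\n"
orders_extract = "customer_no,customer_name,gender\n102,MARY JONES,2\n103,DAVID LEE,1\n"

warehouse = {}
load(warehouse, [transform(r, "cust_id", "name", "sex")
                 for r in extract(billing_extract)])
load(warehouse, [transform(r, "customer_no", "customer_name", "gender")
                 for r in extract(orders_extract)])
```

Customer 102 appears in both sources but ends up as a single warehouse row, because both extracts are mapped to the same key and the same decoded values before loading.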

Page 52: DWH Training CTS

Data Warehouse Layer

• Stores data used for informational analysis
• Presents summarized data to the end user for analysis
• The nature of the operational data, the end-user requirements, and the business objectives of the enterprise determine the structure

Page 53: DWH Training CTS

Metadata Layer

• Metadata is data about data.
• Stored in a repository.
• Contains all corporate metadata resources: database catalogs, data dictionaries

Page 54: DWH Training CTS

Process Management Layer

• Scheduler or high-level job control
• Builds and maintains the data warehouse and data directory information
• Keeps the data warehouse up to date

Page 55: DWH Training CTS

Information Access Layer

• Interfaces with the data warehouse through an OLAP server.
• Performs analytical operations and presents data for analysis.
• End users generate ad hoc reports and perform multidimensional analysis using OLAP tools.

Page 56: DWH Training CTS

Data Warehouse Architecture - Implementation

The following should be considered for a successful implementation of a data warehousing solution:

Architecture:
• Open data warehousing architecture with common interfaces for product integration
• Data warehouse database server

Tools:
• Data modeling tools
• Extraction and transformation/propagation tools
• Analysis/end-user tools: OLAP and reporting
• Metadata management tools

Page 57: DWH Training CTS

Different Approaches for Implementing an Enterprise Data Warehouse

Page 58: DWH Training CTS

What is an Enterprise Data Warehouse (EDW)?

• An Enterprise Data Warehouse (EDW) contains detailed as well as summarized data
• Separate subject-oriented database
• Supports detailed analysis of business trends over a period of time
• Used for short- and long-term business planning and decision making covering multiple business units

Page 59: DWH Training CTS

EDW - "Top Down" Approach

[Figure: heterogeneous source systems (Source 1, Source 2, Source 3), staging through a common staging interface layer, the enterprise data warehouse, then a data mart bus architecture layer with incremental architected data marts (DM 1, DM 2, DM 3)]

Page 60: DWH Training CTS

EDW - "Top Down" Approach - Implementation

• An EDW is composed of multiple subject areas, such as Finance, Human Resources, Marketing, Sales, Manufacturing, etc.
• In a top-down scenario, the entire EDW is architected, and then a small slice of a subject area is chosen for construction
• Subsequent slices are constructed until the entire EDW is complete

Page 61: DWH Training CTS

Upsides and Downsides of Top-Down Approach

The upsides to a "top down" approach are:
1. Coordinated environment
2. Single point of control & development

The downsides to a "top down" approach are:
1. "Cross everything" nature of enterprise project
2. Analysis paralysis
3. Scope control
4. Time to market
5. Risk and exposure

Page 62: DWH Training CTS

EDW - "Bottom Up" Approach

[Figure: heterogeneous source systems (Source 1, Source 2, Source 3), staging through a common staging interface layer, a data mart bus architecture layer with incremental architected data marts (DM 1, DM 2, DM 3), then the enterprise data warehouse]

Page 63: DWH Training CTS

EDW - "Bottom Up" Approach - Implementation

• Initially an Enterprise Data Mart Architecture (EDMA) is developed
• Once the EDMA is complete, an initial subject area is selected for the first incremental Architected Data Mart (ADM)
• The EDMA is expanded in this area to include the full range of detail required for the design and development of the incremental ADM

Page 64: DWH Training CTS

Upsides and Downsides of Bottom-Up Approach

The upsides to a "bottom up" approach are:
1. Quick ROI
2. Low-risk, low-political-exposure learning and development environment
3. Lower-level, shorter-term political will required
4. Fast delivery
5. Focused problem, focused team
6. Inherently incremental

The downsides to a "bottom up" approach are:
1. Multiple team coordination
2. Must have an EDMA to integrate incremental data marts

Page 65: DWH Training CTS

Data Warehouse Architecture - Summary

• Many tools and technologies
• Data warehouse system architectures
• Top-down approach
• Bottom-up approach

Page 66: DWH Training CTS

Building a Data Warehouse

Page 67: DWH Training CTS


Building a Data Warehouse

The initiatives involved in building a data warehouse are

• Identify the need and justify the cost

• Architect the warehouse

• Choose product and vendors

• Create a dimensional business model

• Create the physical model

• Design & develop extract, transform and load systems

• Test and refine the data warehouse

Data warehouse design is driven by business users, not by the IS team.

Page 68: DWH Training CTS

Data Warehouse Life Cycle

[Figure: steps shown are business requirements, map data sources, reverse engineering of the OLTP system, map requirements to OLTP, logical modeling, and refine model; ETL from the OLTP system and external data storage populates the enterprise data warehouse, which is accessed through reporting tools, web browsers, OLAP, and mining]

Page 69: DWH Training CTS

ER Modeling

Page 70: DWH Training CTS

Review of Logical Modeling Terms & Symbols

• Entities define specific groups of information

[Figure: the box "Sales Organization" (Sales Org ID, Distribution Channel) is labeled "Entity"]

Page 71: DWH Training CTS

Review of Logical Modeling Terms & Symbols

• Entities are made up of attributes

[Figure: in "Sales Organization", the fields Sales Org ID and Distribution Channel are labeled "Attributes"]

Page 72: DWH Training CTS

Review of Logical Modeling Terms & Symbols

• One or more attributes uniquely identify an instance of an entity

[Figure: "Sales Organization" (Sales Org ID, Distribution Channel) with the callout "Identifier"]

Page 73: DWH Training CTS

Review of Logical Modeling Terms & Symbols

• The logical model identifies relationships between entities

[Figure: "Sales Detail" (Sales Record ID) connected to "Sales Rep" (Sales Rep ID); the connection is labeled "Relationship"]

Page 74: DWH Training CTS

Logical Data Model

Entities shown, with their identifiers:

• Sales Detail (Sales Record ID)
• Customer (Customer ID)
• Product (Product SKU)
• Suppliers (Supplier ID)
• Manufacturing Group (Manufacturing Org ID)
• Factory (Factory ID)
• Sales Organization (Sales Org ID, Distribution Channel)
• Sales Rep (Sales Rep ID)
• Product Sales Plan (Plan ID)
• Retail, Wholesale, Market, Industry

Page 75: DWH Training CTS

Dimensional Modeling

Page 76: DWH Training CTS

Facts and Measures

• Facts or measures are the key performance indicators of an enterprise
• Factual data about the subject area
• Numeric, summarized

Examples: Net Profit, Sales Revenue, Gross Margin, Profitability, Cost

Page 77: DWH Training CTS

Dimension

• Dimensions put measures in perspective
• What, when and where qualifiers to the measures
• Dimensions could be products, customers, time, geography, etc.

Example for the measure Sales Revenue: What was sold? Whom was it sold to? When was it sold? Where was it sold?

Page 78: DWH Training CTS

Some Examples of Data Warehousing Dimensions

The following dimensions are common to all data warehouses in various forms:

• Product dimension
• Service dimension
• Geographic dimension
• Time dimension

Page 79: DWH Training CTS

Dimension Elements

• Components of a dimension
• Represent the natural elements in the business dimension
• Directly related to the dimension
• Facilitate analysis from different perspectives of a dimension
• Often referred to as levels of a dimension

Example dimensions: Time, Product, Geography

Page 80: DWH Training CTS

Dimension Hierarchy

• Represents the natural business hierarchy within dimension elements
• Clarifies the drill-up and drill-down directions
• Each element represents a different level of aggregation
• End users may need custom hierarchies

Time dimension example (drill down goes from Year to Date, drill up from Date to Year):

Year:   1999
Month:  April, May
Date:   9/4/99, 28/4/99 (April); 5/5/99, 17/5/99 (May)
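The Year, Month, Date hierarchy can be illustrated with a small roll-up function. This is a sketch only: the four dates come from the slide (read as day/month/year), but the sales amounts attached to them and the name `roll_up` are invented for the example.

```python
from collections import defaultdict
from datetime import date

# Dates from the slide; the amounts are hypothetical.
sales = [
    (date(1999, 4, 9), 10),
    (date(1999, 4, 28), 5),
    (date(1999, 5, 5), 8),
    (date(1999, 5, 17), 2),
]

# One key function per level of the time hierarchy.
LEVEL_KEYS = {
    "date": lambda d: d.isoformat(),
    "month": lambda d: d.strftime("%Y-%m"),
    "year": lambda d: str(d.year),
}

def roll_up(rows, level):
    """Aggregate the measure at the requested level of the hierarchy."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[LEVEL_KEYS[level](day)] += amount
    return dict(totals)

by_month = roll_up(sales, "month")
by_year = roll_up(sales, "year")
```

Moving from `"date"` to `"month"` to `"year"` is a drill up; moving the other way is a drill down over the same underlying rows.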

Page 81: DWH Training CTS

Multi-Dimensional Analysis

• Characteristic of online analytical processing (OLAP)

[Figure: chart of the table below, with Geography, Time, and Product as the three dimensions]

Geography  Product  1st Qtr  2nd Qtr  3rd Qtr  4th Qtr
East       A        20.4     27.4     90.0     20.4
East       B        19.8     26.6     87.3     19.8
West       A        30.6     38.6     34.6     31.6
West       B        29.7     37.4     33.6     30.7
North      A        45.9     46.9     45.0     43.9
North      B        44.5     45.5     43.7     42.6
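Slice and dice, listed earlier under OLAP terminology, can be tried out on the figures in this table. The sketch below assumes the usual meaning of the terms (a slice fixes one dimension to a single value, a dice selects a sub-cube across several dimensions); the dictionary layout and function names are illustrative choices, not part of any OLAP product.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# Values from the table: (geography, product) -> one figure per quarter.
TABLE = {
    ("East", "A"): [20.4, 27.4, 90.0, 20.4],
    ("East", "B"): [19.8, 26.6, 87.3, 19.8],
    ("West", "A"): [30.6, 38.6, 34.6, 31.6],
    ("West", "B"): [29.7, 37.4, 33.6, 30.7],
    ("North", "A"): [45.9, 46.9, 45.0, 43.9],
    ("North", "B"): [44.5, 45.5, 43.7, 42.6],
}

# A three-dimensional cube keyed by (geography, product, quarter).
cube = {
    (region, product, quarter): value
    for (region, product), values in TABLE.items()
    for quarter, value in zip(QUARTERS, values)
}

def slice_cube(cube, quarter):
    """Slice: fix the time dimension to one quarter, leaving a 2-D view."""
    return {(r, p): v for (r, p, q), v in cube.items() if q == quarter}

def dice_cube(cube, regions, products, quarters):
    """Dice: keep only the cells inside the chosen values of every dimension."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in regions and key[1] in products and key[2] in quarters
    }

third_quarter = slice_cube(cube, "Q3")
sub_cube = dice_cube(cube, {"East", "West"}, {"A"}, {"Q1", "Q2"})
```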

Page 82: DWH Training CTS

Drill Up & Drill Down

• Drill down is the process of requesting more detailed information
• Drill up is the process of summarizing the existing information

Summary level (drill up):

        1999
East    158.2
West    135.4
North   181.7

Detail level (drill down), the current result set:

        1st Qtr  2nd Qtr  3rd Qtr  4th Qtr
East    20.4     27.4     90       20.4
West    30.6     38.6     34.6     31.6
North   45.9     46.9     45       43.9
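The relationship between the two result sets is plain aggregation, which the following sketch reproduces with the slide's numbers. The function names are illustrative; totals are rounded to one decimal place to avoid floating-point noise.

```python
# Quarterly figures from the detailed result set on this slide.
quarterly = {
    "East": [20.4, 27.4, 90.0, 20.4],
    "West": [30.6, 38.6, 34.6, 31.6],
    "North": [45.9, 46.9, 45.0, 43.9],
}

def drill_up(detail):
    """Summarize quarterly figures into one yearly total per region."""
    return {region: round(sum(values), 1) for region, values in detail.items()}

def drill_down(detail, region):
    """Return the quarterly breakdown behind one region's yearly total."""
    return dict(zip(["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"], detail[region]))

yearly = drill_up(quarterly)
east_detail = drill_down(quarterly, "East")
```

The yearly totals computed this way match the 1999 column shown on the slide.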

Page 83: DWH Training CTS

Dimensional Modeling

Subject Area:         What do you want to know about?
Atomic Detail:        What level of detail do you need?
Dimensions:           Analyze key performance indicators
Facts:                Measures
Frequency of Update:  How fresh do you need it?
Depth of History:     How far back do you need to know it?

Page 84: DWH Training CTS


Requirements for a Dimensional model

• Clean, current, accurate logical models

• Physical models

• A subject area model

• Star / Snowflake schema design

Page 85: DWH Training CTS

Dimensional Modeling Methodology

[Figure: steps shown are business requirements, data sources (OLTP system and external), map requirements to OLTP, logical modeling, and refine model]

Page 86: DWH Training CTS


Techniques for Implementing a Dimensional model

• Star Schema

• Snow-flake Schema

• Hybrid Schema

• Optimal Snow-flake Schema

Page 87: DWH Training CTS

Star Schema - Logical Structure

Fact table: Employee, Product, Customer, Day, Units sold, Revenue

Dimensions: Time, Product, Customer, Employee

Page 88: DWH Training CTS

Star Schema: Physical View

Time_dim:       day_code, date, day_of_week, month_seq, month_num, month_long_name, month_short_name, qtr_seq, qtr_num, quarter, year
Geography_dim:  emp_code, emp_name, city_code, city, state_code, state, region_code, region
Product_dim:    prod_code, prod_name, brand, color_code
Customer_dim:   cust_code, cust_name, age_code, age, sex_code, sex, city_code, city
Fact table:     emp_code, prod_code, day_code, cust_code, units, revenue
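The physical view above can be written out as table definitions. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column names follow the slide, while the column types, the primary and foreign key declarations, and the table name `sales_fact` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    day_code INTEGER PRIMARY KEY, date TEXT, day_of_week TEXT,
    month_seq INTEGER, month_num INTEGER,
    month_long_name TEXT, month_short_name TEXT,
    qtr_seq INTEGER, qtr_num INTEGER, quarter TEXT, year INTEGER
);
CREATE TABLE geography_dim (
    emp_code INTEGER PRIMARY KEY, emp_name TEXT,
    city_code INTEGER, city TEXT, state_code TEXT, state TEXT,
    region_code INTEGER, region TEXT
);
CREATE TABLE product_dim (
    prod_code INTEGER PRIMARY KEY, prod_name TEXT, brand TEXT, color_code TEXT
);
CREATE TABLE customer_dim (
    cust_code INTEGER PRIMARY KEY, cust_name TEXT,
    age_code INTEGER, age INTEGER, sex_code TEXT, sex TEXT,
    city_code INTEGER, city TEXT
);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    emp_code  INTEGER REFERENCES geography_dim(emp_code),
    prod_code INTEGER REFERENCES product_dim(prod_code),
    day_code  INTEGER REFERENCES time_dim(day_code),
    cust_code INTEGER REFERENCES customer_dim(cust_code),
    units INTEGER,
    revenue INTEGER
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Each dimension is a single wide table, so any query needs at most one join per dimension to reach a descriptive attribute.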

Page 89: DWH Training CTS

Star Schema Characteristics

• A star schema is a highly denormalized, query-centric model where the basic premise is that information can be broken into two groups: facts and dimensions.
• In a star schema, facts are in a single place (the fact table) and the descriptions (or elements) that lead to those facts are in dimension tables.
• The star schema is built for simplicity and speed. The assumption behind it is that the database is static, with no updates being performed online.

Page 90: DWH Training CTS

Star Schema: Dimension Table

Geography_dim

Empl_Code  empl_name      city_code  city           state_code  state         region_code  region
2341       Mike King      101        Atlantic city  NJ          New Jersey    1            New Jersey
3424       Jim McCann     106        Chicago        IL          Illinois      2            Illinois
1232       Kitty Stokes   104        Austin         PA          Pennsylvania  1            New Jersey
3554       Clem Akins     102        Medford        NJ          New Jersey    1            New Jersey
3963       Duncan Moore   101        Atlantic city  NJ          New Jersey    1            New Jersey
2924       Dawn McGuire   103        Englewood      NJ          New Jersey    1            New Jersey
2673       Joe Becker     105        Alverton       PA          Pennsylvania  1            New Jersey
3253       Geoff Bergren  107        Springfield    IL          Illinois      2            Illinois
234        Garth Boyd     106        Chicago        IL          Illinois      2            Illinois
2342       Lin Cepele     104        Austin         PA          Pennsylvania  1            New Jersey

Callouts: PK (Empl_Code); elements (Region, State, City, Employee); attributes.

• De-normalized structure
• Easy navigation within the dimension

Page 91: DWH Training CTS

Star Schema: Fact Table

sales_fact

day_code  prod_code  cust_code  empl_code  units sold  revenue
1211      345        1231123    1232       23          7935
1211      22         1245223    3554       12          264
1211      112        1522342    3963       6           672
1212      233        1524665    2924       34          7922
1212      112        1366454    2673       76          8512
1212      22         1403453    3554       22          484

Dimension keys: day_code, prod_code, cust_code, empl_code. Measures: units sold, revenue.

• Contains columns for measures and dimensions
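To show how the dimension keys are used, the sketch below loads the six fact rows from this slide and the matching employees from the Geography_dim slide into an in-memory SQLite database, then totals the measures by state. Only the columns needed for the query are created, and the column name `units_sold` (with an underscore) is an adaptation for SQL; treat it as an illustration rather than a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography_dim (empl_code INTEGER PRIMARY KEY, empl_name TEXT, state TEXT);
CREATE TABLE sales_fact (
    day_code INTEGER, prod_code INTEGER, cust_code INTEGER,
    empl_code INTEGER REFERENCES geography_dim(empl_code),
    units_sold INTEGER, revenue INTEGER
);
""")

# Employees that appear in the fact rows, taken from the Geography_dim slide.
conn.executemany("INSERT INTO geography_dim VALUES (?, ?, ?)", [
    (1232, "Kitty Stokes", "Pennsylvania"),
    (3554, "Clem Akins", "New Jersey"),
    (3963, "Duncan Moore", "New Jersey"),
    (2924, "Dawn McGuire", "New Jersey"),
    (2673, "Joe Becker", "Pennsylvania"),
])

# Fact rows exactly as shown on this slide.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1211, 345, 1231123, 1232, 23, 7935),
    (1211, 22, 1245223, 3554, 12, 264),
    (1211, 112, 1522342, 3963, 6, 672),
    (1212, 233, 1524665, 2924, 34, 7922),
    (1212, 112, 1366454, 2673, 76, 8512),
    (1212, 22, 1403453, 3554, 22, 484),
])

# One join from the fact table to the dimension, then aggregate the measures.
revenue_by_state = conn.execute("""
    SELECT g.state, SUM(f.units_sold), SUM(f.revenue)
    FROM sales_fact AS f
    JOIN geography_dim AS g ON g.empl_code = f.empl_code
    GROUP BY g.state
    ORDER BY g.state
""").fetchall()
```

The fact table carries only keys and numbers; the descriptive attribute used for grouping (`state`) comes from the dimension table through a single join.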

Page 92: DWH Training CTS

Snow-flake Schema

Fact table measures: Revenue, Units Sold, Net Profit

Dimension tables shown: Product, Brand, Color, Time, Customer, City, Region, Country

Page 93: DWH Training CTS

Snow-flake: Physical View

Fact table: emp_code, cust_code, prod_code, day_code, units, revenue

Employee and geography tables:
• emp_code, emp_name
• emp_code, city_code, cityname
• city_code, state_code, statename
• state_code, region_code, regionname
• region_code, country_code, countryname

Product tables:
• prod_code, brand_code, prod_name
• brand_code, brand_name, color_code
• color_code, color_name

Time tables:
• day_code, day_name, week_code
• week_code, week_name, month_code
• month_code, month_name, quarter_code, year

Customer table:
• cust_code, cust_name, age_code, age, sex_code, sex, city_code, city

Page 94: DWH Training CTS

Hybrid schema: Physical view

[Figure: physical tables of the hybrid schema, one table per line]

• Fact table: emp_code, cust_code, prod_code, day_code, units, revenue
• prod_code, brand_code, prod_name
• brand_code, brand_name, color_code
• color_code, color_name
• day_code, day_name, week_code
• week_code, week_name, month_code
• month_code, month_name, quarter_code, year
• emp_code, emp_name, city_code, city, state_code, state, region_code, region
• cust_code, cust_name, age_code, age, sex_code, sex, city_code, city

Page 95: DWH Training CTS

Optimal Snow-flake schema

[Figure: physical tables of the optimal snow-flake schema, one table per line]

• Fact table: emp_code, cust_code, prod_code, day_code, brand_code, units, revenue
• prod_code, brand_code, prod_name
• brand_code, brand_name, color_code
• color_code, color_name
• day_code, day_name, week_code
• week_code, week_name, month_code
• month_code, month_name, quarter_code, year
• emp_code, emp_name, city_code, city, state_code, state, region_code, region
• cust_code, cust_name, age_code, age, sex_code, sex, city_code, city

Page 96: DWH Training CTS

What is a Slowly Changing Dimension?

• Although dimension tables are typically static lists, most dimension tables do change over time.

• Since these changes are smaller in magnitude compared to changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.

Page 97: DWH Training CTS

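Slowly Changing Dimension - Classification

Slowly changing dimensions are classified into three types:

• TYPE I: the changed value overwrites the old value, so no history is kept

• TYPE II: a new row is added for each change, so the full history is kept

• TYPE III: the previous value is kept in an additional column of the same row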

Page 98: DWH Training CTS

Slowly Changing Dimensions Type I

[Figure: Type I example, shown before and after a change to Email]

Source
Emp id  Name   Email
1001    Shane  [email protected]

Target
Emp id  Name   Email
1001    Shane  [email protected]

The changed Email value replaces the existing value in the same target row.
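A Type I load can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular ETL tool: the function name apply_type1, the dictionary keys and the example addresses are all hypothetical.

```python
def apply_type1(target, source_row, key="emp_id"):
    """Overwrite the matching target row in place; insert the row if the key is new."""
    for row in target:
        if row[key] == source_row[key]:
            row.update(source_row)  # old values are lost, no history is kept
            return target
    target.append(dict(source_row))
    return target


target = [{"emp_id": 1001, "name": "Shane", "email": "old@example.com"}]
apply_type1(target, {"emp_id": 1001, "name": "Shane", "email": "new@example.com"})
print(target)
```

The target still holds a single row for employee 1001 after the change, which is what distinguishes Type I from Type II.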

Page 99: DWH Training CTS

Slowly Changing Dimensions Type II

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_VERSION_NUMBER
1000           10      Shane  [email protected]  0

Page 100: DWH Training CTS

Slowly Changing Dimensions - Versioning

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_VERSION_NUMBER
1000           10      Shane  [email protected]
1001           10      Shane  [email protected]

Page 101: DWH Training CTS

Slowly Changing Dimensions - Versioning

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_VERSION_NUMBER
1001           10      Shane  [email protected]
1003           10      Shane  [email protected]
1000           10      Shane  [email protected]

Page 102: DWH Training CTS

Slowly Changing Dimensions Type II - Flag

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_CURRENT_FLAG
1000           10      Shane  [email protected]  1

Page 103: DWH Training CTS

Slowly Changing Dimensions - Flag Current

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_CURRENT_FLAG
1000           10      Shane  [email protected]
1001           10      Shane  [email protected]

Page 104: DWH Training CTS

Slowly Changing Dimensions - Flag Current

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_CURRENT_FLAG
1001           10      Shane  [email protected]
1003           10      Shane  [email protected]
1000           10      Shane  [email protected]

Page 105: DWH Training CTS

Slowly Changing Dimensions Type II

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_BEGIN_DATE  PM_END_DATE
1000           10      Shane  [email protected]  01/01/00

Page 106: DWH Training CTS

Slowly Changing Dimensions - Effective Date

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_BEGIN_DATE  PM_END_DATE
1000           10      Shane  [email protected]  01/01/00       03/01/[…]
1001           10      Shane  [email protected]  03/01/00

Page 107: DWH Training CTS

Slowly Changing Dimensions - Effective Date

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_BEGIN_DATE  PM_END_DATE
1001           10      Shane  [email protected]  03/01/00       05/02/[…]
1003           10      Shane  [email protected]  05/02/00
1000           10      Shane  [email protected]  01/01/00       03/01/[…]
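The three Type II variants above (version number, current flag, effective dates) differ only in the bookkeeping columns they maintain. The following Python sketch keeps all three at once so they can be compared. It is an illustration under assumptions: apply_type2, the dictionary keys, the example addresses and the dates are hypothetical, and a real ETL tool generates the surrogate key itself.

```python
from datetime import date


def apply_type2(target, source_row, change_date, next_key, tracked=("name", "email")):
    """Close the current version of a row (if any) and append a new version."""
    current = next(
        (r for r in target
         if r["emp_id"] == source_row["emp_id"] and r["current_flag"] == 1),
        None,
    )
    if current is not None:
        if all(current[col] == source_row[col] for col in tracked):
            return target  # nothing changed, keep the current version
        current["current_flag"] = 0        # flag variant
        current["end_date"] = change_date  # effective-date variant
    version = 0 if current is None else current["version"] + 1  # versioning variant
    target.append({
        "pk": next_key,  # surrogate key of the new version
        "emp_id": source_row["emp_id"],
        **{col: source_row[col] for col in tracked},
        "version": version,
        "current_flag": 1,
        "begin_date": change_date,
        "end_date": None,
    })
    return target


dim = []
apply_type2(dim, {"emp_id": 10, "name": "Shane", "email": "a@example.com"}, date(2000, 1, 1), 1000)
apply_type2(dim, {"emp_id": 10, "name": "Shane", "email": "b@example.com"}, date(2000, 3, 1), 1001)
for row in dim:
    print(row)
```

Every change adds a row, so facts loaded before the change keep pointing at the version that was valid at the time.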

Page 108: DWH Training CTS

Slowly Changing Dimensions Type III

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_Prev_ColumnName  PM_EFFECT_DATE
1              10      Shane  [email protected]                      01/01/00

Page 109: DWH Training CTS

Slowly Changing Dimensions Type III

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_Prev_ColumnName  PM_EFFECT_DATE
1              10      Shane  [email protected]  [email protected]   01/02/[…]

Page 110: DWH Training CTS

Slowly Changing Dimensions Type III

Source
Emp id  Name   Email
10      Shane  [email protected]

Target
PM_PRIMARYKEY  Emp id  Name   Email              PM_Prev_ColumnName  PM_EFFECT_DATE
1              10      Shane  [email protected]  [email protected]   01/03/[…]
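Type III keeps exactly one previous value per tracked column, in the same row. A minimal Python sketch, assuming hypothetical names (apply_type3, prev_email, effect_date) and made-up addresses and dates:

```python
from datetime import date


def apply_type3(row, new_email, effect_date):
    """Move the current value into the 'previous' column, then store the new value."""
    if new_email != row["email"]:
        row["prev_email"] = row["email"]  # only one level of history survives
        row["email"] = new_email
        row["effect_date"] = effect_date
    return row


row = {"pk": 1, "emp_id": 10, "name": "Shane",
       "email": "a@example.com", "prev_email": None, "effect_date": date(2000, 1, 1)}
apply_type3(row, "b@example.com", date(2000, 1, 2))
apply_type3(row, "c@example.com", date(2000, 1, 3))
print(row)
```

After the second change the first address is gone: the row holds only the current value and the one before it, and the primary key never changes.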

Page 111: DWH Training CTS

Conformed Dimensions

• Conformed dimensions are those which are consistent across Data marts.

• Essential for integrating the Data marts into an Enterprise Data warehouse

Page 112: DWH Training CTS

Causal Dimensions

• Causal dimensions can be used to explain why a record exists in a fact table

• Causal dimensions should not change the grain of the fact table

Page 113: DWH Training CTS

Causal Dimension - Example

Example:

• Why did a customer buy a particular product?

• Why did a customer use a particular ATM?

Page 114: DWH Training CTS

Factless Fact Tables

The two types of factless fact tables are:

• Coverage tables

• Event tracking tables

Page 115: DWH Training CTS

Factless Fact Tables - Coverage Tables

Coverage tables are required when a primary fact table is sparse

Example: Tracking products in a store that did not sell

Page 116: DWH Training CTS

Factless Fact Tables - Event Tracking

These tables are used for tracking an event:

Example: Tracking student attendance

Page 117: DWH Training CTS

Helper Tables

• Helper tables are used when there are multi-valued dimensions, that is, when there is a many-to-many relationship between a fact table and a dimension table

• A helper table can be placed between two dimension tables or between a dimension table and a fact table

Page 118: DWH Training CTS

Helper Tables - Example

Example : A customer having more than one bank account

Page 119: DWH Training CTS

Surrogate Keys

• Joins between fact and dimension tables should be based on surrogate keys

• Surrogate keys should not be composed of natural keys glued together

• Users should not obtain any information by looking at these keys

• These keys should be simple integers
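The rules above can be sketched as a small key generator in Python. This is only an illustration: the class name SurrogateKeyMap, the source-system labels and the natural keys are hypothetical, and in practice the mapping usually lives in a database table or is handled by the ETL tool.

```python
from itertools import count


class SurrogateKeyMap:
    """Hand out simple integer keys that carry no information about the source key."""

    def __init__(self, start=1):
        self._next = count(start)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        # The natural key is only used for lookup; it is never embedded in the result.
        lookup = (source_system, natural_key)
        if lookup not in self._keys:
            self._keys[lookup] = next(self._next)
        return self._keys[lookup]


keys = SurrogateKeyMap()
print(keys.key_for("crm", "C-17"))      # first key handed out
print(keys.key_for("billing", "C-17"))  # same code in another system gets its own key
print(keys.key_for("crm", "C-17"))      # repeat lookup returns the first key again
```

Keying the lookup on both the source system and the natural key keeps identical codes from different systems apart.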

Page 120: DWH Training CTS

Why Existing Keys Should Not Be Used

• Keys may be reused after they have been purged, even though they are still used in the warehouse

• A product description or a customer description could be changed without changing the key

• Key formats may be generalized to handle some new situation

• A mistake could be made and a key could be reused

Page 121: DWH Training CTS

ETL - Extraction, Transformation & Loading

Page 122: DWH Training CTS

What is ETL?

• ETL (Extraction, Transformation and Loading) is the process by which data from the operational systems is extracted, transformed, integrated and loaded into the data warehouse environment

[Figure: ETL components. Operational systems; Filters and Extractors; Cleanser (Cleaning Rules: Rule 1, Rule 2, Rule 3); Transformation Engine (Transformation Rules: Rule 1, Rule 2, Rule 3); Integrator; Loader; Warehouse. Error handling: View, Check, Correct.]

Page 123: DWH Training CTS

Operational Data - Challenges

• Data from heterogeneous sources

• Format differences

• Data Variations

• Context

– Across locations the same code could represent different customers

– Across periods of time a product code could have been reused

Page 124: DWH Training CTS

Extraction

[Figure: extraction from three sources into the target]

• Oracle (80 tables): data from 30 tables

• Sybase (50 tables): filter, data from 10 tables where Date < 10/12/99

• Text files: data from files

Page 125: DWH Training CTS

Transformation

Source

Emp id  First Name  Last Name
10001   Indiana     Jones
10002   Sherlock    Holmes

Transformation: Name = Concat(First Name, Last Name)

Staging Area

Name
Indiana Jones
Sherlock Holmes

Page 126: DWH Training CTS

Loading

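[Figure: two loading paths from the source to the data warehouse]

• Direct load: source to data warehouse

• Via the staging area: cleaning, transformation and integration of raw data, followed by a load of the clean, transformed and integrated data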

Page 127: DWH Training CTS

Volume of ETL in a Data warehouse

[Figure: data flow from Source OLTP Systems through the Enterprise Data Warehouse to Data Marts, with Meta Data and System Monitoring spanning all stages. Stages: Design, Mapping; Extract, Scrub, Transform; Load, Index, Aggregation; Replication, Data Set Distribution; Access & Analysis, Resource Scheduling & Distribution.]

60 to 80% of the work is in ETL

Page 128: DWH Training CTS

Factors Influencing ETL Architecture

• Volume at each warehouse component.

• The time window available for extraction.

• The extraction type (Full,Periodic etc.)

• Complexity of the processes at each stage.

Page 129: DWH Training CTS

Extraction Types

Page 130: DWH Training CTS

Extraction Types

Extraction

• Full Extract

• Periodic/Incremental Extract

Page 131: DWH Training CTS

Full Extract

[Figure: Source System > Full Extract > Data Mart, which holds existing data]

Page 132: DWH Training CTS

Full Extract

[Figure: Source System > Full Extract > Data Mart, which now holds new data]

Page 133: DWH Training CTS

Incremental Extract

[Figure: Source System > Incremental Extract > Incremental Data > Data Mart, which holds existing data]

Page 134: DWH Training CTS

Incremental Extract

[Figure: Source System > Incremental Extract > Incremental Data (new data and changed data) > Data Mart, which holds existing data]

Page 135: DWH Training CTS

Incremental Extract

[Figure: Source System > Incremental Extract > Incremental Data (new data and changed data) > Data Mart. Existing data is updated using the changed data; new data is an incremental addition to the data mart.]
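Applying an incremental extract to a data mart amounts to an update for changed rows and an insert for new rows. A minimal Python sketch, assuming rows are dictionaries identified by a hypothetical "id" key (apply_incremental is likewise an illustrative name):

```python
def apply_incremental(mart, incremental, key="id"):
    """Update rows whose key already exists in the mart; append rows whose key is new."""
    position = {row[key]: i for i, row in enumerate(mart)}
    updated = inserted = 0
    for row in incremental:
        if row[key] in position:
            mart[position[row[key]]] = dict(row)  # changed data replaces existing data
            updated += 1
        else:
            position[row[key]] = len(mart)
            mart.append(dict(row))                # new data is added to the mart
            inserted += 1
    return updated, inserted


mart = [{"id": 1, "city": "Chicago"}, {"id": 2, "city": "Austin"}]
extract = [{"id": 2, "city": "Medford"}, {"id": 3, "city": "Englewood"}]
print(apply_incremental(mart, extract))
print(mart)
```

A full extract, by contrast, would simply replace the whole contents of the mart.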

Page 136: DWH Training CTS

Transformation

Page 137: DWH Training CTS

Data Transformation

• Conversions

– Data type (e.g. Char to Date)

– Bring data to common units (currency, measuring units)

• Classifications

– Changing continuous values to discrete ranges (e.g. temperatures to temperature ranges)

• Splitting of fields

• Merging of fields

• Aggregations (e.g. Sum, Avg., Count)

• Derivations (Percentages, Ratios, Indicators)

Page 138: DWH Training CTS

Structural Transformations

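[Figure: two structural transformations from OLTP to the data warehouse]

• Additive: orders arrive every two minutes in the OLTP system and are aggregated in the data warehouse

• Average: daily productivity figures from the OLTP system are averaged in the data warehouse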

Page 139: DWH Training CTS

Format transformation

Data Type Conversions

• Source Schema: “32” (Age as a String)

• Target Schema: 32 (Age as an Integer)

Splitting

• Source Schema: “15-10-1992” (Date as a String)

• Target Schema: Day 15, Month 10, Year 1992 (Date as a combination of 3 integer fields)
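Both format transformations on this slide are one-liners in Python. The function names below are hypothetical, and the day-month-year order of the date string is taken from the slide's Day, Month, Year target fields.

```python
def age_to_int(age_text):
    """Data type conversion: age stored as a string becomes an integer."""
    return int(age_text.strip())


def split_date(date_text):
    """Splitting: a 'DD-MM-YYYY' string becomes three integer fields."""
    day, month, year = (int(part) for part in date_text.split("-"))
    return day, month, year


print(age_to_int("32"))
print(split_date("15-10-1992"))
```

Both functions raise ValueError on malformed input, which is a convenient hook for routing bad rows to an error table.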

Page 140: DWH Training CTS

Simple Conversions

• Transformations using Simple Conversions

Source Schema                      Conversion          Target Schema
Rs. 10000 (Revenue in Rupees)      Multiply by 1/43    $232.56 (Revenue in Dollars)
1000 lbs. (Production in Pounds)   Multiply by 0.4536  453.6 kgs. (Production in Kilograms)
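The two conversions can be checked with a short Python sketch. The constants are the factors given on the slide (43 rupees per dollar is the slide's illustrative rate, not a current one), and the function names are hypothetical.

```python
RUPEES_PER_DOLLAR = 43   # rate used on the slide
KG_PER_POUND = 0.4536    # factor used on the slide


def rupees_to_dollars(rupees):
    """Revenue in rupees multiplied by 1/43, rounded to cents."""
    return round(rupees / RUPEES_PER_DOLLAR, 2)


def pounds_to_kilograms(pounds):
    """Production in pounds multiplied by 0.4536."""
    return round(pounds * KG_PER_POUND, 2)


print(rupees_to_dollars(10000))
print(pounds_to_kilograms(1000))
```

With these factors, Rs. 10000 comes to about $232.56 and 1000 lbs. to about 453.6 kg.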

Page 141: DWH Training CTS

Classification

Name              Age
John Black        27
Richard Wayne     53
Jennifer Goldman  45
Helmut Koch       37
Anna Ludwig       32
Shito Maketha     28
Tracy Withman     39
Ada Zhesky        25
David Rosenberg   33
Pankaj Sharma     29
Zhu Ling          44
George Kurtz      27
Rita Hartman      34

Grouping

Age Group  Frequency
20-25      1
26-30      4
31-35      3
36-40      2
41-45      2
46-50      0
51-55      1
56-60      0
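The grouping step can be reproduced with a short Python sketch. The bin boundaries are the age groups from the slide; the names BINS, age_group and frequencies are hypothetical. For the thirteen ages listed, no age falls in the 46-50 group.

```python
from collections import Counter

# Age groups as listed on the slide (both bounds inclusive).
BINS = [(20, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50), (51, 55), (56, 60)]


def age_group(age):
    """Classification: map a continuous value to its discrete range."""
    for low, high in BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the defined groups")


def frequencies(ages):
    """Count how many ages fall into each group, including empty groups."""
    counts = Counter(age_group(age) for age in ages)
    return {f"{low}-{high}": counts.get(f"{low}-{high}", 0) for low, high in BINS}


ages = [27, 53, 45, 37, 32, 28, 39, 25, 33, 29, 44, 27, 34]
print(frequencies(ages))
```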

Page 142: DWH Training CTS

Data Consistency Transformations

Gender codes by system:

Source 1  Male – M, Female – F
Source 2  Male – Male, Female – Female
Source 3  Male – 1, Female – 2
Target    Male – M, Female – F
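A consistency transformation like this is typically a lookup table. The sketch below maps the three source encodings from the slide to the target codes; GENDER_CODES and standardize_gender are hypothetical names.

```python
# Every source encoding shown on the slide, mapped to the target code.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}


def standardize_gender(value):
    """Return the target code ('M' or 'F') for any of the known source encodings."""
    key = str(value).strip().lower()
    if key not in GENDER_CODES:
        raise ValueError(f"unknown gender code: {value!r}")
    return GENDER_CODES[key]


print([standardize_gender(v) for v in ["M", "Female", 1, "2"]])
```

Raising an error for unknown codes keeps unexpected values from slipping silently into the warehouse.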

Page 143: DWH Training CTS

Reconciliation of Duplicated data

• Joe Smith, 123 Maine St., MA - 70127

• Joseph Smith, 123 Maine St., MA - 70127

• J.R.Smith, 123 Maine St., MA - 70127

• Joseph R Smith, 123 Maine St., MA - 70127

Page 144: DWH Training CTS

Data Aggregation - Design Requirements

• Aggregates must be stored in their own fact tables and each level should have its own fact table

• Dimension tables attached to the aggregate fact tables should, wherever possible, be shrunken versions of the dimension tables attached to the base fact table

• The base fact table and all of its related aggregate fact tables must be associated together as a family of schemas

Page 145: DWH Training CTS

Loading

Page 146: DWH Training CTS

Types of Data warehouse Loading

• Target update types

– Insert

– Update

Page 147: DWH Training CTS

Types of Data Warehouse Updates

[Figure: Source data > Data Staging > Data Warehouse]

• Insert (Point in Time Snapshots, New Data, Changed Data)

• Full Replace

• Selective Replace

• Update

• Update plus Retain History

Page 148: DWH Training CTS

New Data and Point-In-Time Data Insert

[Figure: the source data supplies either new data or a point-in-time snapshot (e.g. monthly); the new data is added to the existing data]

Page 149: DWH Training CTS

Changed Data Insert

[Figure: changed data from the source data is added to the existing data]

Page 150: DWH Training CTS

Change of Dimension values

When the value of a dimension in a data warehouse changes:

• The history of the change needs to be maintained

• Only the changed data needs to be identified

• Changed data should be easy to access

• Reconstructing the dimension table at any point in time should be easy

Page 151: DWH Training CTS

ETL - Approach in a nutshell

1) Identify the Operational systems based on data islands in the target

2) Map source-target dependencies

3) Define cleaning and transformation rules

4) Validate source-target mapping

5) Consolidate Meta data for ETL

6) Draw the ETL architecture

7) Build the cleaning, transformation and auditing routines using either a tool or customized programs

Page 152: DWH Training CTS

Meta Data in a Data Warehouse

Page 153: DWH Training CTS

What is Metadata?

• Data about data and the processes

• Metadata is stored in a data dictionary and repository

• Insulates the data warehouse from changes in the schema of operational systems

• It serves to identify the contents and location of data in the data warehouse

Page 154: DWH Training CTS

Why Do You Need Meta Data?

• Share resources

– Users

– Tools

• Document system

• Without meta data

– Not sustainable

– Not able to fully utilize resources

Page 155: DWH Training CTS

The Role of Meta Data in the Data Warehouse

Meta Data enables data to become information, because with it you

• Know what data you have

and

• You can trust it!

Page 156: DWH Training CTS

Meta Data Answers….

How have business definitions and terms changed over time?

How do product lines vary across organizations?

What business assumptions have been made?

How do I find the data I need?

What is the original source of the data?

How was this summarization created?

What queries are available to access the data?

Page 157: DWH Training CTS

Meta Data Process

• Integrated with entire process and data flow

– Populated from beginning to end

– Begin population at design phase of project

– Dedicated resources throughout

• Build

• Maintain

[Figure: Meta Data and System Monitoring span every stage of the data flow: Design, Mapping; Extract, Scrub, Transform; Load, Index, Aggregation; Replication, Data Set Distribution; Access & Analysis, Resource Scheduling & Distribution]

Page 158: DWH Training CTS

Types of ETL Meta Data

ETL Meta data

• Technical Meta data

• Operational Meta data

Page 159: DWH Training CTS

Classification of ETL Meta Data

• Data Warehouse Meta data

This Meta data stores descriptive information about the physical implementation details of the data warehouse.

• Source Meta data

This Meta data stores information about the source data and the mapping of source data to data warehouse data.

Page 160: DWH Training CTS

ETL Meta Data

• Transformations & Integrations

This Meta data holds comprehensive information about the transformation and loading.

• Processing Information

This Meta data stores information about the activities involved in the processing of data, such as scheduling and archiving.

• End User Information

This Meta data records information about user profiles and security.

Page 161: DWH Training CTS

ETL -Planning for the Movement

The following may be helpful for planning the movement

• Develop an ETL plan

• Specifications

• Implementation

Page 162: DWH Training CTS

Data Warehouse Administration

Page 163: DWH Training CTS

Data Warehouse Administrative Tasks

• Building and maintaining the data warehouse

• Maintaining the meta data

• Keeping the data warehouse up to date

• Tuning the data warehouse

• General administrative tasks

Page 164: DWH Training CTS

Dormant Data

• The data that is hardly used in a data warehouse is called dormant data

• The faster a data warehouse grows, the more of its data becomes dormant. Over a period of time the amount of dormant data in a data warehouse increases

Page 165: DWH Training CTS

Origins of Dormant Data

• Storing history data that is not required

• Storing columns that are never used

• Storing detail level data when only summary level data is used

• Creating summary data that is never used

Page 166: DWH Training CTS

Strategy For Removing Dormant Data

The strategy for removing dormant data might include:

• Removing data after a period of time, say after two years

• Removing summary data that has not been accessed in the past six months

• Removing columns that have never or only very infrequently been accessed

• Storing data for high profile users even though that data has not been accessed

• Storing data for selected accounts even though that data has not been accessed

Page 167: DWH Training CTS

Tuning a Data Warehouse

Some of the techniques that can be used for tuning a data warehouse are:

• Handling dormant data

• Storing pre-summarized data based on data usage patterns

• Creating indexes for data that is frequently used

• Merging tables that have common and regular access

