
Dimensional_Modeling[1]

Date post: 24-Nov-2014
Upload: ramasatyam
Transcript
Page 1: Dimensional_Modeling[1]

1

Dimensional Design

Presented by Dr. Debashis Parida

Page 2: Dimensional_Modeling[1]

2

Course Agenda

• Rationale for dimensional modeling
• Dimensional modeling basics
• Dimensional modeling details
• Fact table details
• Dimension table details
• Design process
• Aggregate schemas
• Multiple fact tables
• Architected data marts

Page 3: Dimensional_Modeling[1]

3

Rationale for Dimensional Modeling

Page 4: Dimensional_Modeling[1]

4

OLTP Design Characteristics

Focus of OLTP design
• Individual data elements
• Data relationships
Design goals
• Accurately model business
• Remove redundancy

Page 5: Dimensional_Modeling[1]

5

OLTP Design Shortcomings

• Complex
• Unfamiliar to business people
• Incomplete history
• Slow query performance

Page 6: Dimensional_Modeling[1]

6

Emergence of Dimensional Model

Logical modeling technique
• For designing relational database structures
Addresses OLTP design shortcomings
• For use in analytic systems
First developed in the early 1980s
• Packaged goods industry
Popularized by Ralph Kimball, PhD.
• 1996 book: 'The Data Warehouse Toolkit'

Page 7: Dimensional_Modeling[1]

7

Dimensional Modeling Basics

Page 8: Dimensional_Modeling[1]

8

Facts

Process Measurement

Coffee Maker Fulfillment Report

Brand            Product                  Units Sold   Units Shipped   % Shipped
Captain Coffee   Standard Coffee Maker         5,000           3,800         76%
                 Thermal Coffee Maker          2,400           1,632         68%
                 Deluxe Coffee Maker           2,073           1,658         80%
                 All Products                  9,473           7,090         75%

Measures
• Metrics or indicators by which people evaluate a business process
• Referred to as “Facts”
Examples
• Margin
• Inventory Amount
• Sales Dollars
• Receivable Dollars
• Return Rate

Page 9: Dimensional_Modeling[1]

9

Perspective Focus

Process-oriented business perspectives

[Diagram labels: category; product; warehouse; G/L account; supplier; Operations; Sales and Marketing; Customer Services; Product Development]

Page 10: Dimensional_Modeling[1]

10

Dimensions

Process Perspectives

[Figure: the Coffee Maker Fulfillment Report from page 8, repeated]

Dimensions
• The parameters by which measures are viewed
• Used to break out, filter or roll up measures
• Often found after the word “by” in a business question
• Descriptive business terms
Examples
• Product
• Warehouse
• Customer
• Supplier

Page 11: Dimensional_Modeling[1]

11

Dimensional Model

Definition
• Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
Dimensional Model = Star Schema
• Serves as basis for the design of a relational database schema
• Can easily translate into multi-dimensional database design if required
• Overcomes OLTP design shortcomings

Page 12: Dimensional_Modeling[1]

12

Dimensional Model Advantages

• Understandable
• Systematically represents history
• Reliable join paths
• High performance query
• Enterprise scalability

Page 13: Dimensional_Modeling[1]

13

Star Schema

[Diagram: a central Facts table joined to Store, Time, and Product dimensions]

Schema Simplicity

Fewer tables
• Denormalized
• Consolidated
Dimensional
• Familiar to users
• Facts go in the fact tables
• Dimensions in dimension tables
Increases understandability

Page 14: Dimensional_Modeling[1]

14

Data Familiarity

[Diagram: the source field ord_date expanded into a Time Dimension with year, quarter, month, date, day of the week, and holiday flag]

Adding business context
• Single source field
• Expanded into parts
• Decoded into business terms
• Add special indicators and flags
• e.g. time dimension
Increases understandability
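As a rough illustration of expanding a single source field into time-dimension attributes, the sketch below builds one row from an order date. The function name, the output column names, and the holiday list are assumptions made for this example; they do not come from the deck.

```python
from datetime import date

# Hypothetical holiday calendar; a real warehouse would load this from a business calendar.
HOLIDAYS = {date(2014, 1, 1), date(2014, 12, 25)}

def time_dimension_row(ord_date: date) -> dict:
    """Expand one source date into descriptive time-dimension attributes."""
    return {
        "date": ord_date.isoformat(),
        "year": ord_date.year,
        "quarter": f"Q{(ord_date.month - 1) // 3 + 1}",
        "month": ord_date.strftime("%B"),
        "day_of_week": ord_date.strftime("%A"),
        "holiday_flag": "Y" if ord_date in HOLIDAYS else "N",
    }

row = time_dimension_row(date(2014, 12, 25))
```

Each attribute is decoded into a business term (a quarter label, a month name) so that report users can filter and group without date arithmetic.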

Page 15: Dimensional_Modeling[1]

15

Representing History

[Diagram: a Facts table joined to Store, Product, and Time dimensions; the Time Dimension holds year, quarter, month, date, day of the week, and holiday flag]

Time dimension
• Part of every star schema
• Marks the date when the facts (process measurements) occurred
• Allows the schema to easily add and query data over time
• Especially useful for performing comparison queries

Page 16: Dimensional_Modeling[1]

16

Fewer Join Paths

Star schema joins
• Defined during schema design, not at runtime
• Business people can easily understand these relationships
• One-to-many relations between dimensions and facts
• Referential integrity always enforced

Page 17: Dimensional_Modeling[1]

17

High Performance Design

Fewer joins means less 'expensive' queries

Deterministic query patterns

Star schema query optimization supported by all major RDBMS vendors

Page 18: Dimensional_Modeling[1]

18

Subject Area Models

[Diagram: subject area E/R models and subject area dimensional models. Labels: Manufacturing and Process Control; Sales Order Entry and Campaign Management; Customer Support and Relationship Management; Shipping and Inventory Management; Operations; Sales and Marketing; Customer Services; Product Development]

Page 19: Dimensional_Modeling[1]

19

Enterprise Models

Enterprise Scope E/R model

Enterprise scope dimensional model

Page 20: Dimensional_Modeling[1]

20

Dimensional Design Details

Page 21: Dimensional_Modeling[1]

21

Star Schema Dimension Tables

[Diagram: three Dimension tables]

Dimension tables
• Store dimension values
• Textual content
• Dimension tables usually referred to simply as 'dimensions'
• Spend extra effort to add dimensional attributes

Page 22: Dimensional_Modeling[1]

22

Dimension Keys

[Diagram: three Dimension tables, each with a key]

Synthetic keys
• Each table assigned a unique primary key, specifically generated for the data warehouse
• Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema

Page 23: Dimensional_Modeling[1]

23

Dimension Columns

[Diagram: three Dimension tables, each with a key and several attributes]

Dimension attributes
• Specify the way in which measures are viewed: rolled up, broken out or summarized
• Often follow the word “by” as in “Show me Sales by Region and Quarter”
• Frequently referred to as 'Dimensions'

Page 24: Dimensional_Modeling[1]

24

Star Schema Fact Table

[Diagram: a Fact Table holding fact1, fact2, fact3]

Process measures
• Start by assigning one fact table per business subject area
• Fact tables store the process measures (aka Facts)
• Compared to dimension tables, fact tables usually have a very large number of rows

Page 25: Dimensional_Modeling[1]

25

Fact Table Primary Key

[Diagram: a Fact Table holding three keys and fact1, fact2, fact3]

Every fact table
• Multi-part primary key added
• Made up of foreign keys referencing dimensions

Page 26: Dimensional_Modeling[1]

26

Fact Table Sparsity

Sparsity
• Term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period
• Because fact tables contain a very small percentage of all possible combinations, they are said to be "sparsely populated" or "sparse"
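The arithmetic behind sparsity can be sketched as the number of fact rows actually present divided by the number of possible dimension-key combinations. The dimension sizes and row count below are invented for illustration only.

```python
def density(rows_present: int, *dimension_row_counts: int) -> float:
    """Fraction of possible dimension-key combinations that have a fact row."""
    possible = 1
    for n in dimension_row_counts:
        possible *= n
    return rows_present / possible

# Hypothetical figures: 1,000 products x 50 stores x 365 days, 600,000 fact rows loaded.
fill = density(600_000, 1_000, 50, 365)
```

With these assumed numbers there are 18,250,000 possible combinations, so only a few percent of them appear in the fact table.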

Page 27: Dimensional_Modeling[1]

27

Fact Table Grain

Grain
• The level of detail represented by a row in the fact table
• Must be identified early
• Cause of greatest confusion during design process
Example
• Each row in the fact table represents the daily item sales total
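To make the example grain concrete, the sketch below rolls individual source transactions up to one row per day and item. The transaction records and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical source transactions: (sale date, item, amount).
transactions = [
    ("2014-11-24", "Standard Coffee Maker", 40.0),
    ("2014-11-24", "Standard Coffee Maker", 40.0),
    ("2014-11-24", "Deluxe Coffee Maker", 95.0),
    ("2014-11-25", "Standard Coffee Maker", 40.0),
]

# Grain: one fact row per (day, item), holding the daily item sales total.
daily_item_sales = defaultdict(float)
for day, item, amount in transactions:
    daily_item_sales[(day, item)] += amount
```

Four transactions collapse into three fact rows, because two of them share the same day and item.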

Page 28: Dimensional_Modeling[1]

28

Designing a Star Schema

Five initial design steps
• Based on Kimball's six steps
• Start designing in order
• Re-visit and adjust over project life

Page 29: Dimensional_Modeling[1]

29

Step One

1. Identify fact table

Start by naming the fact table with the name of the business subject area

Page 30: Dimensional_Modeling[1]

30

Step Two

2. Identify fact table grain

Describe what a row in the fact table represents, in business terms

Page 31: Dimensional_Modeling[1]

31

Step Three

3. Identify dimensions

Page 32: Dimensional_Modeling[1]

32

Step Four

4. Select facts

Page 33: Dimensional_Modeling[1]

33

Step Five

5. Identify dimensional attributes

Page 34: Dimensional_Modeling[1]

34

Fact Table Details

Page 35: Dimensional_Modeling[1]

35

Example Fact Table

Sales Facts: model_key, dealer_key, time_key, revenue, quantity

Page 36: Dimensional_Modeling[1]

36

Facts

Fully additive
• Can be summed across any and all dimensions
• Stored in fact table
• Examples: revenue, quantity

Page 37: Dimensional_Modeling[1]

37

Facts

Semi-additive
• Can be summed across most dimensions but not all
• Anything that measures a “level”
• Must be careful with ad-hoc reporting
• Often aggregated across the “forbidden dimension” by averaging
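A small sketch of a semi-additive "level" measure, assuming a hypothetical inventory snapshot: units on hand can be summed across products within a month, but across months they are averaged rather than summed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inventory snapshot rows: (month, product, units_on_hand).
snapshots = [
    ("Jan", "Standard", 100), ("Jan", "Deluxe", 40),
    ("Feb", "Standard", 90),  ("Feb", "Deluxe", 60),
]

# Summing across the product dimension within one month is meaningful.
on_hand_by_month = defaultdict(int)
for month, _product, units in snapshots:
    on_hand_by_month[month] += units

# Time is the "forbidden dimension" here: summing months would count
# stock that never existed at the same moment, so average instead.
average_on_hand = mean(on_hand_by_month.values())
```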

Page 38: Dimensional_Modeling[1]

38

Facts

Non-additive
• Cannot be summed across any dimension
• All ratios are non-additive
• Break down to fully additive components, store them in fact table
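The Coffee Maker Fulfillment Report from earlier in the deck gives a worked case: % Shipped is a ratio, so the "All Products" figure has to be computed from the additive components (units sold and units shipped) rather than by adding the per-product percentages. The variable names are chosen for this sketch.

```python
# (units_sold, units_shipped) per product, taken from the fulfillment report.
rows = {
    "Standard Coffee Maker": (5000, 3800),
    "Thermal Coffee Maker": (2400, 1632),
    "Deluxe Coffee Maker": (2073, 1658),
}

total_sold = sum(sold for sold, _ in rows.values())
total_shipped = sum(shipped for _, shipped in rows.values())

# Ratio of the summed components: the figure the report shows for All Products.
pct_shipped_all = total_shipped / total_sold

# Summing the stored ratios instead gives a number with no business meaning.
sum_of_ratios = sum(shipped / sold for sold, shipped in rows.values())
```

Storing units sold and units shipped in the fact table lets the ratio be recomputed correctly at any level of rollup.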

Page 39: Dimensional_Modeling[1]

39

Factless Fact Table

• A fact table with no measures in it
• Nothing to measure...
• …Except the convergence of dimensional attributes
• Sometimes store a “1” for convenience
• Examples: Attendance, Customer Assignments, Coverage

Page 40: Dimensional_Modeling[1]

40

Dimension Table Details

Page 41: Dimensional_Modeling[1]

41

Example Dimension Tables

Dealer: dealer_key, region, state, city, dealer
Model: model_key, brand, category, line, model
Time: time_key, year, quarter, month, date
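One possible physical rendering of the example fact and dimension tables, sketched with Python's built-in sqlite3 module. The column names follow the slides; the table name time_dim, the data types, and every sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dealer   (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
CREATE TABLE model    (model_key  INTEGER PRIMARY KEY, brand TEXT, category TEXT, line TEXT, model TEXT);
CREATE TABLE time_dim (time_key   INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
CREATE TABLE sales_facts (
    model_key  INTEGER REFERENCES model(model_key),
    dealer_key INTEGER REFERENCES dealer(dealer_key),
    time_key   INTEGER REFERENCES time_dim(time_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (model_key, dealer_key, time_key)  -- multi-part key made of foreign keys
);
-- Sample rows are invented for this sketch.
INSERT INTO dealer   VALUES (1, 'West', 'CA', 'Fresno', 'Valley Motors'), (2, 'East', 'NY', 'Albany', 'Capital Autos');
INSERT INTO model    VALUES (1, 'Acme', 'Sedan', 'Family', 'A100');
INSERT INTO time_dim VALUES (1, 2014, 'Q1', 'January', '2014-01-15'), (2, 2014, 'Q2', 'April', '2014-04-15');
INSERT INTO sales_facts VALUES (1, 1, 1, 20000.0, 1), (1, 1, 2, 40000.0, 2), (1, 2, 1, 60000.0, 3);
""")

# "Show me revenue and quantity by region and quarter."
rows = conn.execute("""
    SELECT d.region, t.quarter, SUM(f.revenue), SUM(f.quantity)
    FROM sales_facts f
    JOIN dealer   d ON d.dealer_key = f.dealer_key
    JOIN time_dim t ON t.time_key   = f.time_key
    GROUP BY d.region, t.quarter
    ORDER BY d.region, t.quarter
""").fetchall()
```

The query touches only the fact table and the two dimensions named after "by" in the business question; each join follows a key defined at design time.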

Page 42: Dimensional_Modeling[1]

42

Dimension Tables

Characteristics
• Hold the dimensional attributes
• Usually have a large number of attributes (“wide”)
• Add flags and indicators that make it easy to perform specific types of reports
• Have small number of rows in comparison to fact tables (most of the time)

Page 43: Dimensional_Modeling[1]

43

Don’t Normalize Dimensions

• Saves very little space
• Impacts performance
• Can confuse matters when multiple hierarchies exist
• A star schema with normalized dimensions is called a "snowflake schema"
• Usually advocated by software vendors whose products require snowflake for performance

Page 44: Dimensional_Modeling[1]

44

Slowly Changing Dimensions

• Dimension source data may change over time
• Relative to fact tables, dimension records change slowly
• Allows dimensions to have multiple 'profiles' over time to maintain history
• Each profile is a separate record in a dimension table

Page 45: Dimensional_Modeling[1]

45

Slowly Changing Dimension Example

Example: A woman gets married
Possible changes to customer dimension
• Last Name
• Marriage Status
• Address
• Household Income
Existing facts need to remain associated with her single profile
New facts need to be associated with her married profile

Page 46: Dimensional_Modeling[1]

46

Slowly Changing Dimension Types

Three types of slowly changing dimensions
Type 1
• Updates existing record with modifications
• Does not maintain history
Type 2
• Adds new record
• Does maintain history
• Maintains old record
Type 3
• Keep old and new values in the existing row
• Requires a design change

Page 47: Dimensional_Modeling[1]

47

Designing Loads to Handle SCD

Design and implementation guidelines
• Gather SCD requirements when designing data mapping and loading
• SCD needs to be defined and implemented at the dimensional attribute level
• Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
• If one Type 1 column changes, then all Type 1 columns will be updated
• If one Type 2 column changes, then a new record will be inserted into the dimension table
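The column-level rules above can be sketched as a small load routine. This is a minimal illustration under stated assumptions, not a production loader: the column classification, the is_current flag, the customer_key generator, and the sample customer are all hypothetical.

```python
from itertools import count

# Hypothetical classification of customer-dimension columns.
TYPE1_COLUMNS = {"phone"}                        # overwrite in place, no history
TYPE2_COLUMNS = {"last_name", "marital_status"}  # new profile row, history kept

_next_key = count(1)  # warehouse-generated synthetic keys

def apply_change(dimension, customer_id, new_values):
    """Apply one source-system change to the profiles of a single customer."""
    profiles = [row for row in dimension if row["customer_id"] == customer_id]
    if not profiles:
        dimension.append({"customer_key": next(_next_key), "customer_id": customer_id,
                          "is_current": True, **new_values})
        return
    current = next(row for row in profiles if row["is_current"])
    changed = {col for col, val in new_values.items() if current.get(col) != val}
    if changed & TYPE1_COLUMNS:
        # Type 1: update all supplied Type 1 columns on every profile.
        for row in profiles:
            row.update({c: new_values[c] for c in TYPE1_COLUMNS if c in new_values})
    if changed & TYPE2_COLUMNS:
        # Type 2: retire the current profile and insert a new one under a new key.
        current["is_current"] = False
        dimension.append({**current, **new_values,
                          "customer_key": next(_next_key), "is_current": True})

customers = []
apply_change(customers, "C-17", {"last_name": "Smith", "marital_status": "Single", "phone": "555-0100"})
apply_change(customers, "C-17", {"last_name": "Jones", "marital_status": "Married", "phone": "555-0100"})
apply_change(customers, "C-17", {"phone": "555-0199"})
```

After the marriage change the dimension holds two profiles for the same source customer: existing facts keep pointing at customer_key 1, while new facts would be loaded against customer_key 2. The later phone change overwrites both profiles without adding a row.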

Page 48: Dimensional_Modeling[1]

48

Designing Loads to Handle SCD

Design and implementation guidelines
• For large dimension tables, change data capture techniques may be used to minimize the data volume
• For smaller dimension tables, compare all OLTP records with dimension table records
• Balance data volume with change data capture logic complexities

Page 49: Dimensional_Modeling[1]

49

Degenerate Dimensions

• Dimensions with no other place to go
• Stored in the fact table
• Are not facts
• Common examples include invoice numbers or order numbers

Page 50: Dimensional_Modeling[1]

50

Dimensional Design Process

Project Context

Page 51: Dimensional_Modeling[1]

51

Data Mart Development

[Diagram: Design Phase, Development Phase, Deployment Phase]

Dimensional modeling is a critical part of the data mart development effort

Page 52: Dimensional_Modeling[1]

52

Data Mart Development

Design phase
• Determine requirements and design schema
Development phase
• Iterative build and feedback
Deployment phase
• Automate load, document, train users

Page 53: Dimensional_Modeling[1]

53

Project Deliverables

Design
• Project definition document
• Project plan
• Schema design
• Mapping document
• Report design
Development
• Populated data mart
• Load routines (Sagent “Plans”)
• Query and reporting environment
Deployment
• Automation
• Documentation
• Training materials

Page 54: Dimensional_Modeling[1]

54

Project Approach

[Diagram: Design Phase, Development Phase, Deployment Phase]

• The dimensional model is developed during the design stage
• Scope of the project has already been determined

Page 55: Dimensional_Modeling[1]

55

Design Stage Activities

[Diagram: Design Phase, Development Phase, Deployment Phase]

• Gather requirements through requirements workshops
• Develop star schema
• Conduct design review

Page 56: Dimensional_Modeling[1]

56

Gather Requirements

Requirements definition
• User workshops
• Spreadsheets
• Sample reports
Source systems analysis
• DBA interviews
• Copybooks
• E/R diagrams

Page 57: Dimensional_Modeling[1]

57

Design Deliverables

Deliverables
• The star schema itself
• Load mapping document
How these primary components are delivered will depend on needs and format chosen
• Modeling tools
• Spreadsheets
• Text documents

Page 58: Dimensional_Modeling[1]

58

Notation

• No recognized standard
• ER semantics unnecessary
• Clarity is the only characteristic that really matters

Page 59: Dimensional_Modeling[1]

59

Design Naming Standards

Responsibility of data administration
• Extended to the data warehouse
• Important to start early in the project
Suggested conventions
• Fact tables
• Dimension tables
• Aggregate tables
• Keys

Page 60: Dimensional_Modeling[1]

60

Data Element Definitions

Clear descriptions
• Facts
• Calculated formulae
• Dimensional attributes
• Multiple meanings/synonymous terms
• Aliases

Page 61: Dimensional_Modeling[1]

61

Data Element Instances

Example of Data

As it will exist in the warehouse

After decoding

Adds to model understanding

Removes ambiguity/uncertainty

Page 62: Dimensional_Modeling[1]

62

Data Element Mapping

Where is the data coming from

Source system

Table

Column

Record

Field

Page 63: Dimensional_Modeling[1]

63

Data Transformation

Changing the data

Serves as spec for ETL process

Decodes

Type conversion

Conditional logic

Handling of NULLs

Page 64: Dimensional_Modeling[1]

64

Aggregate Schemas

Page 65: Dimensional_Modeling[1]

65

Aggregate Designs

Aggregates
• Pre-stored fact summaries
• Along one or more dimensions
• The most effective tool for improving performance
Examples
• Summary of sales by region, by product, by category
• Monthly sales

Page 66: Dimensional_Modeling[1]

66

Aggregate Background

Aggregate rationale
• Improve end user query performance
• Reduce required CPU cycles
• Powerful cost saving tool
Restrictions
• Additive facts only
• Must use dimensional design

Page 67: Dimensional_Modeling[1]

67

Aggregate Guidelines

• Don’t start with aggregates
• Design and build based on usage
• Sooner or later you'll need to build aggregates

Page 68: Dimensional_Modeling[1]

68

Aggregate Types

Level field

Separate fact tables

Page 69: Dimensional_Modeling[1]

69

Aggregate Types

Level field
• Old technique
• Requires “level” attribute in appropriate dimensions
• Aggregates and base-level facts stored in same table
• Same number of total fact records as separate table approach
Drawbacks
• Every query must constrain on the level field
• Possibility of double counting
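The double-counting risk of the level-field approach can be shown with a few rows. The table layout and figures below are hypothetical; the point is only that a query which forgets the level constraint adds base rows and their own summaries together.

```python
# Hypothetical single fact table mixing base-level and aggregate rows.
facts = [
    {"level": "day",   "period": "2014-01-01", "sales": 100},
    {"level": "day",   "period": "2014-01-02", "sales": 150},
    {"level": "month", "period": "2014-01",    "sales": 250},  # pre-stored summary
]

# Forgetting the level constraint counts the January sales twice.
unconstrained_total = sum(row["sales"] for row in facts)

# Constraining on the level field gives the correct total.
day_level_total = sum(row["sales"] for row in facts if row["level"] == "day")
```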

Page 70: Dimensional_Modeling[1]

70

Aggregate Types

Separate tables
• Separate fact table for every aggregate
• Separate dimension table for every aggregate dimension
• Same number of fact records as level field tables
Advantage
• Removes possibility of double counting
• Schema clarity
Caveat
• Requires software with aggregate navigation capability

Page 71: Dimensional_Modeling[1]

71

Aggregate Pitfalls

Sparsity failure
• Term used to describe the result of building too many aggregate fact tables that do not summarize enough rows.
• When sparsity failure occurs, a relatively small star schema can grow (in terms of disk size) thousands of times.
• Sparsity failure = aggregate explosion

Page 72: Dimensional_Modeling[1]

72

Aggregate Design Guidelines

Rule of twenty
• To avoid aggregate explosion
• Make sure each aggregate record summarizes 20 or more lower-level records
Remember
• Total number of possible fact tables in any given dimensional model = Cartesian product of all levels in all the dimensions

Page 73: Dimensional_Modeling[1]

73

Hierarchies & Aggregate Design

Time hierarchy (members per parent, and totals over 5 years)

Level     Per parent   Total
Year               1       5 years
Quarter            4      20 quarters
Month             12      60 months
Date             365    1825 days

Hierarchy diagram
• Helps visualize options for building aggregates
• Adding cardinalities ensures following the rule of 20
• Not required to build initial star schema
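Using the cardinalities from the Time hierarchy, a rough rule-of-twenty check might look like the sketch below. It compares dimension member counts only, which gives an upper bound; the actual number of fact rows summarized also depends on how sparse the fact table is. The function names are made up for this example.

```python
# Total members per level over five years, from the hierarchy above.
levels = {"date": 1825, "month": 60, "quarter": 20, "year": 5}

def rows_summarized(base_level: str, aggregate_level: str) -> float:
    """Average number of base-level members rolled into one aggregate member."""
    return levels[base_level] / levels[aggregate_level]

def passes_rule_of_twenty(base_level: str, aggregate_level: str) -> bool:
    return rows_summarized(base_level, aggregate_level) >= 20

date_to_month = passes_rule_of_twenty("date", "month")        # about 30 days per month
month_to_quarter = passes_rule_of_twenty("month", "quarter")  # only 3 months per quarter
```

On these numbers a monthly aggregate built from daily facts is a candidate, while a quarterly aggregate built from a monthly one would summarize too few rows to be worth storing.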

Page 74: Dimensional_Modeling[1]

74

Aggregate Navigation

Description
• Function provided by software layer: Aggregate Navigator
• Directs user queries to the most favorable available aggregate
• Transparent to the end user
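A toy version of what an aggregate navigator does, assuming a hypothetical catalog of fact tables with the attributes each can answer and its row count: pick the smallest table that still carries every attribute the query asks for.

```python
# Hypothetical catalog; table names, attributes, and row counts are invented.
catalog = [
    {"table": "sales_facts",
     "attributes": {"date", "month", "quarter", "year", "model", "brand"},
     "rows": 5_000_000},
    {"table": "sales_agg_month_brand",
     "attributes": {"month", "quarter", "year", "brand"},
     "rows": 40_000},
    {"table": "sales_agg_year_brand",
     "attributes": {"year", "brand"},
     "rows": 400},
]

def navigate(requested: set) -> str:
    """Return the smallest table that can answer a query on the requested attributes."""
    candidates = [t for t in catalog if requested <= t["attributes"]]
    return min(candidates, key=lambda t: t["rows"])["table"]
```

A query by year and brand is routed to the yearly aggregate, a query by quarter and brand to the monthly aggregate, and anything needing dates falls back to the base fact table, all without the user naming a table.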

Page 75: Dimensional_Modeling[1]

75

Aggregate Framework

[Diagram: Business View and Designer View]

Page 76: Dimensional_Modeling[1]

76

Aggregate Deployment

Incremental

Based on usage

Transparent to users

Typically warehouse DBA responsibility

Page 77: Dimensional_Modeling[1]

77

Aggregate Deployment

[Diagram: staggered build sequence]
• Build Subject Area 1, no aggregates
• Build Subject Area 2, no aggregates; build aggregates for Subject Area 1
• Build Subject Area 3, no aggregates; build aggregates for Subject Area 2
• Build Subject Area 4, no aggregates; build aggregates for Subject Area 3
• Some re-work required

Page 78: Dimensional_Modeling[1]

78

Multiple Fact Tables

Page 79: Dimensional_Modeling[1]

79

Multiple Fact Tables

Different business processes usually require different fact tables

There are also several cases where a single business process will require multiple fact tables
• Core and custom
• Snapshot and transaction
• Coverage
• Aggregates

Page 80: Dimensional_Modeling[1]

80

Different Business Processes

Different business processes usually require different fact tables

In practice, it may be hard to identify what a “process” is

Sometimes you can spot different processes because measures are recorded
• With different dimensions
• At differing grains

Page 81: Dimensional_Modeling[1]

81

Different Dimensions or Grain

Don’t take shortcuts with grain
• The 'not applicable' dimension value
• Using a 'not applicable' row in a dimension confuses the grain and can introduce reporting difficulty

Page 82: Dimensional_Modeling[1]

82

Different Points in Time

Sometimes, it is not easy to identify the discrete business processes

All measures may have the same dimensionality or grain

Different measures are recorded at different times
• Quantity sold is not recorded at the same time as quantity shipped

Page 83: Dimensional_Modeling[1]

83

Different Timing

Building a single fact table would require recording zero or null for measures that are not applicable at a point in time

Reports would contain a confusing combination of zeros, nulls, and absence of data

Page 84: Dimensional_Modeling[1]

84

Identifying Different Processes

Look at the measures in question

Sort them into fact tables based on
• Dimensions
• Grain
• Differing timings of events measured

Page 85: Dimensional_Modeling[1]

85

Design Tools for Multiple Tables

Create a set of matrices
• Facts vs dimension
• Facts vs dimensional attributes
Mark where facts apply to dimensions
Mark where facts apply to dimensional attributes
When facts don't apply, assume separate fact table
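The facts-versus-dimensions matrix can be sketched as a mapping from each measure to the set of dimensions it applies to; measures that share exactly the same set are candidates for the same fact table. The measures and dimension names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical matrix: the dimensions marked as applying to each measure.
matrix = {
    "units_sold":    {"product", "store", "day"},
    "sales_dollars": {"product", "store", "day"},
    "units_shipped": {"product", "warehouse", "day"},
    "units_on_hand": {"product", "warehouse", "month"},
}

# Group measures by identical dimensionality; each group suggests one fact table.
fact_tables = defaultdict(list)
for measure, dimensions in matrix.items():
    fact_tables[frozenset(dimensions)].append(measure)
```

Here the four measures fall into three groups, which matches the slide's advice to assume a separate fact table wherever a fact does not apply to a dimension. Timing differences between measures would still need to be checked by hand.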

Page 86: Dimensional_Modeling[1]

86

Multiple Fact Table Summary

Different processes need different tables
Identified with
• Grain
• Dimensionality
• Timing
Same process may need multiple fact tables
• Heterogeneous attributes
• Coverage
• Snapshot and transaction
• Aggregates

Page 87: Dimensional_Modeling[1]

87

Architected Data Marts

Page 88: Dimensional_Modeling[1]

88

Data Mart

Meaning of the term 'data mart' has shifted over the last several years...

Page 89: Dimensional_Modeling[1]

89

Data Mart Architecture 1993

[Diagram components: Operational Systems; E.T.L. Software; Data Warehouse; E.T.L. Software; Data Marts; Query & Reporting Software; Analysis Users]

Page 90: Dimensional_Modeling[1]

90

Data Mart Architecture 1997

[Diagram components: Operational Systems; E.T.L. Software; Data Marts; Query & Reporting Software; Analysis Users]

Page 91: Dimensional_Modeling[1]

91

Architected Data Marts

[Diagram components: Operational Systems; E.T.L. Software; Data Warehouse; Data Mart; Query & Reporting Software; Analysis Users]

Page 92: Dimensional_Modeling[1]

92

Data Mart

• Warehouse Subject Area
• Incremental warehouse development
• Centralized architecture
• Not new
• Well-suited to star schemas

Page 93: Dimensional_Modeling[1]

93

“Stovepipe” Data Marts

[Diagram: three separate stars with dimensions not conformed. Sales Facts with Store, Product, Time (Day); Shipments Facts with Product, Time (Day), Warehouse; Inventory Facts with Warehouse, Product, Month]

“Stovepipe” data marts
• Inconsistent and overlapping data
• Difficult and costly to maintain
• Redundant data load
• Can’t drill across
• Integration requires starting over

Page 94: Dimensional_Modeling[1]

94

Conformed Dimensions

Definition
• Dimensions are conformed when they are the same
-or-
• When one dimension is a strict rollup of another

Page 95: Dimensional_Modeling[1]

95

Conformed Dimensions

Same dimensions must:
1. ... have exactly the same set of primary keys
and
2. ... have the same number of records

Page 96: Dimensional_Modeling[1]

96

Conformed Dimensions

Rolled up dimension
• When one dimension is a strict rollup of another
Which means
• Two conformed dimensions can be combined into a single logical dimension by creating a union of the attributes

Page 97: Dimensional_Modeling[1]

97

Conformed Dimensions

Description Shared common dimensions

Integrates logical design

Ensures consistency between data marts

Allows incremental development

Independent of physical location

Some re-work may be required

Page 98: Dimensional_Modeling[1]

98

Conformed Dimensions

Advantages
• Enables an incremental development approach
• Easier and cheaper to maintain
• Drastically reduces extraction and loading complexity
• Answers business questions that cross data marts
• Supports both centralized and distributed architectures
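A question that crosses data marts can be sketched as a drill-across: summarize each fact table separately by the shared, conformed dimension, then merge the results on that dimension's values. The figures reuse the Coffee Maker Fulfillment Report; treating sold and shipped units as coming from two separate fact tables is an assumption of this sketch.

```python
# Each dictionary stands for one fact table already summarized by the
# conformed Product dimension (figures from the fulfillment report).
units_sold = {"Standard Coffee Maker": 5000, "Thermal Coffee Maker": 2400, "Deluxe Coffee Maker": 2073}
units_shipped = {"Standard Coffee Maker": 3800, "Thermal Coffee Maker": 1632, "Deluxe Coffee Maker": 1658}

def drill_across(sold: dict, shipped: dict) -> dict:
    """Merge two per-product summaries on the shared product values."""
    report = {}
    for product in sorted(sold.keys() | shipped.keys()):
        s, sh = sold.get(product, 0), shipped.get(product, 0)
        pct = round(100 * sh / s) if s else None  # ratio computed after the merge
        report[product] = {"sold": s, "shipped": sh, "pct_shipped": pct}
    return report

report = drill_across(units_sold, units_shipped)
```

The merge only works because both summaries use the same product values. With stovepipe marts, where each mart keeps its own product list, the rows would not line up.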

Page 99: Dimensional_Modeling[1]

99

Conformed Dimensions

Interlocking Star Schemas

[Diagram: Sales Facts, Shipment Facts, and Inventory Facts interlocked through shared Store, Product, Time, Warehouse, and Month dimensions]

Page 100: Dimensional_Modeling[1]

100

Kimball’s Data Warehouse Bus

[Matrix: fact tables (rows: Sales Facts, Shipment Facts, Inventory Facts) against conformed dimensions (columns: Store, Product, Day, Warehouse, Month)]

Page 101: Dimensional_Modeling[1]

101

Course Review

• Rationale for dimensional modeling
• Dimensional modeling basics
• Dimensional modeling details
• Fact table details
• Dimension table details
• Design process
• Aggregate schemas
• Multiple fact tables
• Architected data marts

