+ All Categories
Home > Technology > Data Warehouse Back to Basics: Dimensional Modeling

Data Warehouse Back to Basics: Dimensional Modeling

Date post: 18-Jan-2017
Category:
Upload: dunn-solutions-group
View: 140 times
Download: 1 times
Share this document with a friend
51
Jose Hernandez Director of Business Intelligence Dunn Solutions Group
Transcript

Jose Hernandez

Director of Business Intelligence

Dunn Solutions Group

Agenda

Introduction

What is a Data Warehouse?

Dimensional Modeling

Full-service IT consulting firm

Founded in 1988

Offices

Chicago Minneapolis Raleigh Bangalore, India

Overview

Chicago Minneapolis Raleigh Bangalore

Practice Areas

Business Intelligence

DI + EIM/Quality

Budgeting & Planning

End-to-End BI

Data Warehouse

Dashboards

Map Intelligence

Managed Services

Predictive Analytics

Training

Open-Enrollment

On-Site + Custom

Jumpstart/Mentoring

Packaged Solutions

Legal Dashboard

Visible Visitors

Application Development

Web Design

E-Commerce

Custom App Dev

Mobile App Dev

Portals

Introduction: This Presentation

We start with 50,000 foot view

Assuming you are new to data

warehousing

Keep it fundamental

Kimball point of view

What, Why and How

Data Warehouse Back to Basics

Why Build a Data Warehouse

We have mountains of data in this

company but we can’t access it!

We need to slice and dice the data

in a variety of ways.

You have to make it easy for

business people to get at the data.

Two people present the same

business metrics and the numbers

are different!

We want people to make decisions

based on facts.

Why Build a Data Warehouse

Operational systems are not

integrated

• IDs and Codes not conformed

• Inconsistent format

• Data quality issues

Operational systems generally

not ideal for reporting

• Lack history

• Complex data structure

• Moving target

• Poor query performance

Goals of a Data Warehouse

Make an organization’s data easy

to access

Present the organization’s data

consistently

Be adaptive and resilient to

change

Trusted and secure

Serve as the foundation for

informed decisions

Business community must accept

the warehouse if it is to be

successful

Agenda

Introduction

What is a Data Warehouse?

Dimensional Modeling

What is a Data Warehouse?

• A simple question

- does not seem

to have simple

answer!

• Many definitions

• Two that you

should consider

• Ralph Kimball

• Bill Inmon

What is a Data Warehouse

“A data warehouse is a system that extracts,

cleans, conforms and delivers source data into a

dimensional data store and then supports and

implements querying and analysis for the purpose

of decision making...”

…“It’s the place where users go to get their data”Ralph Kimball

What a Data Warehouse is NOT

It is NOT…

A product

A language

A project

A data model

A copy of your transactional systems

*Note: There are bundled products that come close to covering many aspects of

a data warehouse!

Jose

The BI StackS

ou

rce

Syste

ms Legacy

mainframe systems

Production databases

Transactional systems

Subscription data

ET

L S

yste

m Extract

Clean

Conform

Deliver

ETL Management Services

ETL Data Stores

Pre

se

nta

tio

n S

erv

er Data Marts

Stars & Snowflakes

Conformed Dimensions

Conformed Facts BI A

pp

lica

tio

ns Reporting

systems

Ad hoc systems

Dashboards

Analytics systems

Back Room Front Room

Metadata

Infrastructure and Security

So

urc

e S

yste

ms Legacy

mainframe systems

Production databases

Transactional systems

Subscription data

ET

L S

yste

m Extract

Clean

Conform

Deliver

ETL Management Services

ETL Data Stores

Pre

se

nta

tio

n S

erv

er Data Marts

Stars & Snowflakes

Conformed Dimensions

Conformed Facts BI A

pp

lica

tio

ns Reporting

systems

Ad hoc systems

Dashboards

Analytics systems

Our Focus Today

Back Room Front Room

Metadata

Infrastructure and Security

Agenda

Introduction

What is a Data Warehouse?

Dimensional Modeling

Dimensional Modeling

Dimensional modeling

is a technique which

allows you to design a

database that meets

the goals of a data

warehouse.

Steps

Identify Business Process

Identify Grain (level of

detail)

Identify Dimensions

Identify Facts

Build Star

Identify the Business Process

Requirements + Data Availability

Determine discrete business

processes (e.g.)

Sales

Inventory

Student Registration

Identify the Grain

Grain is the level of detail

stored in the data

warehouse.

• Do we store all products, or

just product categories?

• Each month, week, day,

hour?

• Has a big impact on size of

database.

Can be a different grain

for each fact

Typically implement the

lowest possible

dimension grain:

• not because users need

individual records

• because they want to

aggregate in many different

ways

Identify Dimensions

Selection Criteria (where Gender=“Female”)

Row Headers (“College Name”, “Region”, …)

How do you want to slice the data?

What are the artifacts of your business?

Time Dimension - Always present

Conforming Dimensions – very important aspect

of a successful data warehouse!**More on this later

Identify the Facts

Facts are the storage place for the measurements

we take...

Flavors of Facts

Counts, Sums

Additive

Non-Additive

Semi-Additive

Fact-less Facts

Transaction Grain

Periodic Snapshot Grain

Accumulating Snapshot

Grain

Dimensional Modeling - Stars

Why is it called a star?

Dimensional Modeling - Stars

Because it looks like a

star! (kinda)

Fact Table in the center

Dimension Tables

surrounding it

Dimensional Modeling - Constellation

Dimensional Modeling – Fact Tables

Fact Tables

The center of the star

schema

Based on a business

process

Contains the business

process measures

All measures in the fact are

of the same grain

Fact tables are narrow but

deep

Dimensional Modeling – Dim Tables

Dimension Tables

Business entities used to

slice up (determine the

grain) of the Facts

Verbose and textual

Should be conformed

across the organization

Wide but shallow

Always use surrogate

keys*

*exception for the Date Dimension

Star Schema – Physical Model

Date Dimension (my favorite dimension)

The Basic Date Dimension

Date Dimension

Special Date Dimension Attributes

In another language

Semester (First Semester, Second

Semester, …)

High Season (Y/N), Low Season (Y/N)

Season (Winter, Spring, Summer, Fall)

Reporting Day (CurrDay, CurrDay-1D,

CurrDay-2d)

Reporting Month (CurrMonth,

CurrMont-1M, …)

Last Day of Quarter (Y/N)

Last Day of Week (Y/N)

American Holiday (Independence Day,

Christmas, …)

Canadian Holiday

And so many more!

Slowly Changing Dimensions

Known as SCDs

Dimensions change, how

do you handle this?

Three Basic Types

•Type 1

•Type 2

•Type 3

Hmmm.... these

are very

descriptive names.

Slowly Changing Dimensions (SCDs)

Type 1:

• Do not preserve history

• Overwrite the record

Type 2:

• Preserve all history

• Add a new record, indicate

current version

Type 3:

• Preserve a point-in-time

history

• Add additional column(s)

Type 2

Slowly Changing Dimensions: Type 2

SCD workhorse approach

When a dimension

attribute changes, add a

new row and update

effective dates

Old fact rows point to the

previous dimension row

New fact rows point to the

current dimension row

You can use a flag too

Other types of Dimensions

Rapidly Changing

Dimensions

Mini-dimensions

Degenerate Dimension

Junk Dimension

Outrigger

Rapidly Changing Dimensions

AKA: Rapidly Changing

Monster Dimensions

A dimension with

attributes that change

frequently is considered a

rapidly changing

dimension

Produces very large

dimension tables

Cannot be handled with

Type 2 approach (gets

too big)

Mini-dimensions

Technique for Rapidly

Changing Monster Dimension

Use mini-dimensions

• Split up the rapidly changing

attributes to a mini-dimension

• Join the mini-dimension to the fact

table

Use banded ranges

• Minimizes rows (no discrete values)

• A significant compromise

Customer Dimension

PK Customer Key

Customer ID Name Address DoB Date of First Order ------- Age Gender Annual Income Number of Children Marital Status

Fact Table

FK1 Customer Key More Foreign Keys Facts...

New Customer Dimension

PK Customer Key

Customer ID Name Address DoB Date of First Order

Customer Demographics Dim

PK Customer Demo Key

Age Band Gender Annual Income Band Num of Children Band Marital Status

Fact Table 2

FK2 Customer KeyFK3 Customer Demo Key More Foreign Keys Facts...

Other Dimensions

Rapidly Changing

Dimensions

Mini-dimensions

Degenerate Dimension

Junk Dimension

Outrigger

Other Dimensions

Rapidly Changing

Dimensions

Mini-dimensions

Degenerate Dimension

Junk Dimension

Outrigger

A dimension key that

has no attributes.

A dimensional attribute

stored in the fact table

Examples:

Transaction Number

Invoice Number

Line Item Number

Ticket Number

Other Dimensions

Rapidly Changing

Dimensions

Mini-dimensions

Degenerate Dimension

Junk Dimension

Outrigger

Do you have a drawer in

your kitchen that is a catch

all for stuff that you might

need...the junk drawer?

A collection of low

cardinality flags and

indicators that you might

need.

Examples: Payment Type,

Inbound/Outbound, Order

Type

Other Dimensions

Rapidly Changing

Dimensions

Mini-dimensions

Degenerate Dimension

Junk Dimension

Outrigger

Exception, not the rule!

The start of snow-flaking

A secondary dimension table is

connected to a dimension table

(not via a fact).

Human Resource Fact

FK1 Employee Key More FK HR Fact 1 HR Fact 2

Employee Dimension

PK Employee Key

Employee Attributes ......FK1 Emp Skill Key

Emplyee Skill Group (Outrigger)

PK Emp Skill Key

Emp Skill Description Emp Skill Category

Just the Facts Tables

Home for the numerical measures

Typically Additive

Three types of Fact Tables

Transactional Grain

Periodic Snapshot Grain

Accumulating Snapshot Grain

Comparison of Fact Table Types

Characteristic Transaction Grain Periodic

Snapshot Grain

Accumulating

Snapshot Grain

Time period

represented

Point in time Regular,

predictable

intervals

Indeterminate time

span, typically

short-lived

Grain One row per

transaction event

One row per period One row per life

Fact table loads Insert Insert Insert and update

Fact row updates Not revisited Not revisited Revisited

whenever activity

Date dimensions Transaction date End of period date Multiple dates for

standard

milestones

Facts Transaction activity Performance for

predefined time

interval

Performance over

finite lifetime

What makes it Enterprise?

Conformed Dimensions & Facts

Common fields across the enterprise domains

Common definition across the enterprise domains

The Bus Architecture

Allows traversing across business processes

Promotes conformity

Conformed Dimensions / Bus Architecture

Dimensional Modeling Embellishments

Snowflaking

Normalizing a dimension

table

OLTP modeler tendency

Not optimal for query

performance

Outriggers

A dimension table is

referenced in another

dimension (i.e. hire date

example)

Bridges

Many to many

relationships not resolved

in fact tables

Sits between a dimension

and a fact

Ragged and variable

depth hierarchies

Snowflaking

What is Snowflaking?

Normalizing in a star

schema

Should be avoided

• Adds complexity to

presentation layer

• SQL is more complex

*good for low cardinality fields

• Adds burden to database optimizers

• Very little space savings

• Impacts Bitmap indexes*

Sometimes OK (Outriggers for low cardinality attributes)

Snowflaking

What is Snowflaking?

Normalizing in a star

schema

Should be avoided

• Adds complexity to

presentation layer

• SQL is more complex

*good for low cardinality fields

• Adds burden to database optimizers

• Very little space savings

• Impacts Bitmap indexes*

Sometimes OK (Outriggers for low cardinality attributes)

DW Tips: Dimensional Modeling Myths

Dimensional data warehouses

are appropriate for summary

level data only

Dimensional models

presuppose the business

questions and therefore are

inflexible

Dimensional models are

departmental

Brining a new data source into

a dimensional data warehouse

breaks existing schemas and

requires new fact tables

A good way to narrow the

scope and manage risk is to

focus on delivering the report

most often requested

Dimensional models are fully

de-normalized

Ralph Kimball invented the fact

and dimension terminology

Kimball University White Paper

DW Tips: 10 Essential Dim Mod Rules

Load detailed atomic data into

dimensional structures

Structure dimensional models

around business processes

Ensure every fact table has a

date dimension table

Ensure all facts in a Fact table

are the same grain

Resolve many-to-many

relationships in fact tables

Resolve many to one

relationships in dimension

tables

Store report lables and filter

domain values in dimension

tables

Dimension tables should use

surrogate keys

Create conformed dimensions

to integrate data across the

enterprise

Continuously balance

requirements and realities to

deliver a DW/BI solution that’s

accepted by business users

and that supports their

decision making

Kimball University Article, Margy Ross, InformationWeek

Thank You

Future Webinars

The ETL Process

Stars in Motion

Columnar and In-memory

databases

Modeling Business Process

• Retail Sales

• Inventory

• CRM

• HR


Recommended