The Data Warehouse · smiertsc/4397cis/Roiger_Chapter06.pdf · Data Modeling and...

Post on 25-Sep-2020


The Data Warehouse

Chapter 6

6.1 Operational Databases

Data Modeling and Normalization

• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships

Data Modeling and Normalization

• First Normal Form
• Second Normal Form
• Third Normal Form

[Diagram: entity Vehicle-Type with attributes Type ID, Make, Year; entity Customer with attributes Customer ID, Income Range]

Figure 6.1 A simple entity-relationship diagram

The Relational Model

Table 6.1a • Relational Table for Vehicle-Type

Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997

Table 6.1b • Relational Table for Customer

Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390

Table 6.2 • Join of Tables 6.1a and 6.1b

Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997
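The join in Table 6.2 can be reproduced with a short script. What follows is a minimal sketch using Python's built-in sqlite3 module. The table and column names (vehicle_type, customer, type_id, and so on) are illustrative choices, not names used in the chapter, and the income ranges are written with a plain hyphen.

```python
import sqlite3

# In-memory database holding the two relational tables (Tables 6.1a and 6.1b).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicle_type (type_id INTEGER PRIMARY KEY, make TEXT, year INTEGER)"
)
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, income_range TEXT, "
    "type_id INTEGER REFERENCES vehicle_type(type_id))"
)

conn.executemany("INSERT INTO vehicle_type VALUES (?, ?, ?)", [
    (4371, "Chevrolet", 1995),
    (6940, "Cadillac", 2000),
    (4595, "Chevrolet", 2001),
    (2390, "Cadillac", 1997),
])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("0001", "70-90K", 2390),
    ("0002", "30-50K", 4371),
    ("0003", "70-90K", 6940),
    ("0004", "30-50K", 4595),
    ("0005", "70-90K", 2390),
])

# Join the two tables on the shared Type ID column.
joined = conn.execute(
    "SELECT c.customer_id, c.income_range, c.type_id, v.make, v.year "
    "FROM customer AS c JOIN vehicle_type AS v ON c.type_id = v.type_id "
    "ORDER BY c.customer_id"
).fetchall()

for row in joined:
    print(row)
```

Each customer row picks up the Make and Year of the vehicle type it references, which is why Type ID 2390 appears twice in the result even though it is stored only once in the Vehicle-Type table.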

6.2 Data Warehouse Design

The Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process” (W.H. Inmon).

Granularity

Granularity is a term used to describe the level of detail of stored information.
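To make the idea concrete, the sketch below holds the same purchases at two levels of granularity: individual transactions and monthly totals. The sample transactions are hypothetical and serve only to illustrate the difference.

```python
from collections import defaultdict

# Hypothetical fine-grained data: one record per purchase (date, amount).
transactions = [
    ("2002-01-05", 14.50),
    ("2002-01-05", 22.40),
    ("2002-01-17", 8.25),
    ("2002-02-03", 30.00),
]

# Coarser granularity: one total per month.
# Once only the totals are kept, individual purchases cannot be recovered.
monthly_totals = defaultdict(float)
for date, amount in transactions:
    month = date[:7]  # keep only the "YYYY-MM" part of the date
    monthly_totals[month] += amount

for month, total in sorted(monthly_totals.items()):
    print(month, round(total, 2))
```

The finer level takes more storage but can answer questions about single purchases; the coarser level is smaller but supports only monthly questions.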

[Diagram components: External Data, Operational Database(s), ETL Routine (Extract/Transform/Load), Independent Data Mart, Data Warehouse, Extract/Summarize Data, Dependent Data Mart, Decision Support System, Report]

Figure 6.2 A data warehouse process model

Entering Data into the Warehouse

• Independent Data Mart
• ETL (Extract, Transform, Load Routine)
• Metadata

Structuring the Data Warehouse: Two Methods

• Structure the warehouse model using the star schema

• Structure the warehouse model as a multidimensional array

The Star Schema

• Fact Table
• Dimension Tables
• Slowly Changing Dimensions

[Diagram: a central fact table linked to four dimension tables]

Fact Table: Cardholder Key, Purchase Key, Location Key, Time Key, Amount
Purchase Dimension: Purchase Key, Category (1 Supermarket, 2 Travel & Entertainment, 3 Auto & Vehicle, 4 Retail, 5 Restaurant, 6 Miscellaneous)
Time Dimension: Time Key, Month, Day, Quarter, Year
Location Dimension: Location Key, Street, City, State, Region (e.g., 425 Church St, Charleston, SC)
Cardholder Dimension: Cardholder Key, Name, Gender, Income Range (1 John Doe, Male, 50 - 70,000; 2 Sara Smith, Female, 70 - 90,000)

Figure 6.3 A star schema for credit card purchases
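A star schema is typically queried by joining the fact table to whichever dimension tables the question needs. The sketch below, using Python's built-in sqlite3 module, is one way to express that under stated assumptions: the table and column names are illustrative, only two of the four dimensions are included, and the fact rows are invented for the example rather than taken from Figure 6.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_dim (purchase_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE cardholder_dim (cardholder_key INTEGER PRIMARY KEY, name TEXT, gender TEXT);
-- Fact table: dimension keys plus the measured amount, nothing descriptive.
CREATE TABLE purchase_fact (
    cardholder_key INTEGER REFERENCES cardholder_dim(cardholder_key),
    purchase_key   INTEGER REFERENCES purchase_dim(purchase_key),
    amount         REAL
);
INSERT INTO purchase_dim VALUES (1, 'Supermarket'), (2, 'Travel & Entertainment'), (4, 'Retail');
INSERT INTO cardholder_dim VALUES (1, 'John Doe', 'Male'), (2, 'Sara Smith', 'Female');
-- Illustrative fact rows (assumed values, not the figure's data).
INSERT INTO purchase_fact VALUES (1, 2, 14.50), (2, 4, 8.25), (1, 2, 22.40);
""")

# Total spending per cardholder and purchase category:
# the fact table supplies the numbers, the dimensions supply the labels.
rows = conn.execute("""
SELECT c.name, p.category, SUM(f.amount) AS total
FROM purchase_fact AS f
JOIN cardholder_dim AS c ON f.cardholder_key = c.cardholder_key
JOIN purchase_dim   AS p ON f.purchase_key = p.purchase_key
GROUP BY c.name, p.category
ORDER BY c.name, p.category
""").fetchall()

for name, category, total in rows:
    print(name, category, round(total, 2))
```

Keeping descriptive attributes in the dimension tables means the fact table stays narrow, which matters because it is by far the largest table in the schema.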

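The Multidimensionality of the Star Schema

[Diagram: the fact table drawn as a multidimensional space with axes Cardholder Key (Ci), Purchase Key, Time Key, and Location Key; a sample cell is labeled A(Ci, 1, 2, 10)]

Figure 6.4 Dimensions of the fact table shown in Figure 6.3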
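One way to picture this multidimensional view in code is to treat each fact-table row as a single cell addressed by its four keys. The sketch below assumes a key order of (cardholder, purchase, time, location) and uses invented rows; neither is taken from the figure.

```python
# Each fact-table row becomes one cell of a four-dimensional array.
# Assumed key order: (cardholder_key, purchase_key, time_key, location_key).
# The rows are illustrative values, not data from the figure.
fact_rows = [
    (1, 2, 10, 1, 14.50),
    (2, 4, 11, 5, 8.25),
    (1, 2, 10, 3, 22.40),
]

# Sparse representation: only cells that actually hold a purchase are stored.
cube = {}
for cardholder, purchase, time, location, amount in fact_rows:
    cube[(cardholder, purchase, time, location)] = amount

# Looking up one cell is a direct index into the array.
print(cube[(1, 2, 10, 3)])

# Fixing one coordinate selects a lower-dimensional sub-array.
cardholder_1 = {key: amt for key, amt in cube.items() if key[0] == 1}
print(len(cardholder_1))
```

A dictionary is used instead of a dense array because most key combinations never occur, so a dense array would be almost entirely empty.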

Additional Relational Schemas

• Snowflake Schema
• Constellation Schema

[Diagram: two fact tables sharing dimension tables]

Purchase Fact Table: Cardholder Key, Purchase Key, Location Key, Time Key, Amount
Promotion Fact Table: Cardholder Key, Promotion Key, Time Key, Response (Yes/No)
Time Dimension: Time Key, Month, Day, Quarter, Year
Promotion Dimension: Promotion Key, Description, Cost (1 watch promo 15.25)
Purchase Dimension: Purchase Key, Category (1 Supermarket, 2 Travel & Entertainment, 3 Auto & Vehicle, 4 Retail, 5 Restaurant, 6 Miscellaneous)
Location Dimension: Location Key, Street, City, State, Region (e.g., 425 Church St, Charleston, SC)
Cardholder Dimension: Cardholder Key, Name, Gender, Income Range (1 John Doe, Male, 50 - 70,000; 2 Sara Smith, Female, 70 - 90,000)

Figure 6.5 A constellation schema for credit card purchases and promotions
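The practical benefit of a constellation schema is that results from different fact tables can be lined up through the dimensions they share. The following plain-Python sketch shows the idea with a shared cardholder dimension; all keys, amounts, and responses are illustrative values chosen for the example.

```python
# Shared dimension: both fact tables refer to the same cardholder keys.
cardholder_dim = {1: "John Doe", 2: "Sara Smith"}

# Purchase fact rows: (cardholder_key, purchase_key, amount). Illustrative values.
purchase_fact = [(1, 2, 14.50), (1, 2, 22.40), (2, 4, 8.25)]

# Promotion fact rows: (cardholder_key, promotion_key, time_key, response). Illustrative values.
promotion_fact = [(1, 1, 5, "Yes"), (2, 1, 5, "No")]

# Summarize the purchase fact table per cardholder.
spend = {}
for cardholder_key, _purchase_key, amount in purchase_fact:
    spend[cardholder_key] = spend.get(cardholder_key, 0.0) + amount

# Combine with the promotion fact table through the shared cardholder key.
report = []
for cardholder_key, _promotion_key, _time_key, response in promotion_fact:
    report.append(
        (cardholder_dim[cardholder_key], response, spend.get(cardholder_key, 0.0))
    )

for name, response, total in report:
    print(name, response, round(total, 2))
```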

Decision Support: Analyzing the Warehouse Data

• Reporting Data
• Analyzing Data
• Knowledge Discovery

6.3 On-line Analytical Processing

OLAP Operations

• Slice – A single dimension operation
• Dice – A multidimensional operation
• Roll-up – A higher level of generalization
• Drill-down – A greater level of detail
• Rotation – View data from a new perspective
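Three of these operations can be sketched on a tiny cube stored as a Python dictionary. The function names and the axis numbering (0 = month, 1 = region, 2 = category) are assumptions made for this example. The Dec./Two/Vehicle amount of 6,720 matches the cell highlighted in Figure 6.6; the other values are invented.

```python
# A tiny cube: (month, region, category) -> purchase amount.
cube = {
    ("Dec", "Two", "Vehicle"): 6720,
    ("Dec", "One", "Vehicle"): 4100,
    ("Dec", "Two", "Retail"): 2500,
    ("Nov", "Two", "Vehicle"): 5900,
    ("Nov", "One", "Supermarket"): 3200,
}

def slice_cube(cube, axis, value):
    """Slice: fix a single dimension to one value."""
    return {k: v for k, v in cube.items() if k[axis] == value}

def dice_cube(cube, allowed):
    """Dice: restrict several dimensions at once; `allowed` maps axis -> set of values."""
    return {
        k: v for k, v in cube.items()
        if all(k[axis] in values for axis, values in allowed.items())
    }

def rotate_cube(cube, order):
    """Rotation: reorder the axes to view the same data from a new perspective."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

december = slice_cube(cube, 0, "Dec")
dec_region_two = dice_cube(cube, {0: {"Dec"}, 1: {"Two"}})
by_category_first = rotate_cube(cube, (2, 1, 0))

print(len(december), len(dec_region_two))
print(by_category_first[("Vehicle", "Two", "Dec")])
```

Roll-up and drill-down differ from these three in that they change the level of detail of an axis rather than selecting or reordering cells.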

[Diagram: a three-dimensional cube with axes Month (Jan. through Dec.), Region (One through Four), and Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle). Highlighted cell: Month = Dec., Region = Two, Category = Vehicle, Count = 110, Amount = 6,720]

Figure 6.6 A multidimensional cube for credit card purchases

Concept Hierarchy

A mapping that allows attributes to be viewed from varying levels of detail.

[Diagram: hierarchy levels from most general to most detailed: Region, State, City, Street Address]

Figure 6.7 A concept hierarchy for location
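A concept hierarchy like this one can be represented as an ordered list of levels, with a helper that generalizes a record to any chosen level. The sketch below is a minimal illustration; the helper name view_at and the sample record are assumptions made for the example.

```python
# The levels of Figure 6.7, listed here from most detailed to most general.
LEVELS = ["street", "city", "state", "region"]

# One location record; the values are illustrative.
location = {
    "street": "425 Church St",
    "city": "Charleston",
    "state": "SC",
    "region": "Three",
}

def view_at(record, level):
    """Return the record generalized to `level`, dropping all finer levels."""
    start = LEVELS.index(level)
    # List the surviving levels from most general to most detailed.
    return tuple(record[name] for name in reversed(LEVELS[start:]))

print(view_at(location, "street"))  # full detail
print(view_at(location, "city"))    # street address dropped
print(view_at(location, "region"))  # region only
```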

[Diagram: the purchase cube with the time axis rolled up to quarters; axes Time (Q1 through Q4), Region (One through Four), and Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle). Highlighted cell: Month = Oct./Nov./Dec., Category = Supermarket, Region = One]

Figure 6.8 Rolling up from months to quarters
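The roll-up from months to quarters amounts to replacing each month with its quarter and adding together the cells that merge. The sketch below shows this on a small cube of purchase counts. The count of 110 for Dec./Two/Vehicle comes from Figure 6.6; the remaining counts are hypothetical.

```python
from collections import defaultdict

# Month -> quarter mapping (the concept hierarchy for time).
QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
    "Jul": "Q3", "Aug": "Q3", "Sep": "Q3",
    "Oct": "Q4", "Nov": "Q4", "Dec": "Q4",
}

# Monthly cube: (month, region, category) -> purchase count.
monthly = {
    ("Oct", "One", "Supermarket"): 40,
    ("Nov", "One", "Supermarket"): 35,
    ("Dec", "One", "Supermarket"): 50,
    ("Dec", "Two", "Vehicle"): 110,
    ("Jan", "One", "Supermarket"): 20,
}

# Roll-up: replace each month with its quarter and sum the cells that merge.
quarterly = defaultdict(int)
for (month, region, category), count in monthly.items():
    quarterly[(QUARTER[month], region, category)] += count

for key, count in sorted(quarterly.items()):
    print(key, count)
```

Drill-down is the reverse step, and it is only possible if the monthly cube is still stored; the quarterly totals alone cannot be split back into months.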

6.4 Excel Pivot Tables for Data Analysis

Creating a Simple Pivot Table

Figure 6.9 A pivot table template

Figure 6.10 A summary report for income range

Figure 6.11 A pie chart for income range

Pivot Tables for Hypothesis Testing

Figure 6.12 A pivot table showing age and credit card insurance choice

Figure 6.13 Grouping the credit card promotion data by age

Figure 6.14 PivotTable Layout Wizard

Creating a Multidimensional Pivot Table

[Diagram: a three-dimensional cube with axes Watch Promo (Yes/No), Magazine Promo (Yes/No), and Life Insurance Promo (Yes/No). Highlighted cell: Watch Promo = No, Magazine Promo = Yes, Life Insurance Promo = Yes]

Figure 6.15 A credit card promotion cube
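Outside Excel, the same cube can be approximated by counting how many customers fall into each combination of responses. The sketch below uses Python's collections.Counter on a handful of hypothetical customer records; it illustrates the counting idea and is not a reproduction of the chapter's data set.

```python
from collections import Counter

# Hypothetical customer records: (watch_promo, magazine_promo, life_insurance_promo).
customers = [
    ("No", "Yes", "Yes"),
    ("No", "Yes", "Yes"),
    ("Yes", "Yes", "No"),
    ("No", "No", "No"),
    ("Yes", "No", "Yes"),
]

# Each cell of the 2 x 2 x 2 cube counts the customers with that combination of responses.
cube = Counter(customers)

# The cell highlighted in Figure 6.15: Watch = No, Magazine = Yes, Life Insurance = Yes.
print(cube[("No", "Yes", "Yes")])

# Collapsing the other two axes gives the totals for a single promotion.
watch_totals = Counter(watch for watch, _magazine, _life in customers)
print(watch_totals["Yes"], watch_totals["No"])
```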

Figure 6.16 A pivot table with page variables for credit card promotions