Posted on 25-Sep-2020
The Data Warehouse
Chapter 6
6.1 Operational Databases
Data Modeling and Normalization
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships
• First Normal Form
• Second Normal Form
• Third Normal Form
Figure 6.1 A simple entity-relationship diagram
(Entities: Vehicle-Type, with attributes Type ID, Make, and Year; Customer, with attributes Customer ID and Income Range.)
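Judging from the relational tables that follow, each customer references exactly one vehicle type through Type ID, while one vehicle type can be referenced by several customers: a one-to-many relationship. The sketch below shows one way to enforce that with a foreign key in SQLite via Python's standard sqlite3 module; the table names vehicle_type and customer and the helper insert_customer are hypothetical.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# "One" side of the relationship: each vehicle type appears once.
conn.execute("""
    CREATE TABLE vehicle_type (
        type_id TEXT PRIMARY KEY,
        make    TEXT NOT NULL,
        year    INTEGER NOT NULL
    )""")

# "Many" side: each customer row points at exactly one vehicle type,
# while one vehicle type may be referenced by many customers.
conn.execute("""
    CREATE TABLE customer (
        customer_id  TEXT PRIMARY KEY,
        income_range TEXT NOT NULL,
        type_id      TEXT NOT NULL REFERENCES vehicle_type(type_id)
    )""")

conn.execute("INSERT INTO vehicle_type VALUES ('2390', 'Cadillac', 1997)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("0001", "70–90K", "2390"), ("0005", "70–90K", "2390")],
)

def insert_customer(customer_id, income_range, type_id):
    """Hypothetical helper: True if the row was accepted, False if it broke the foreign key."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)",
                     (customer_id, income_range, type_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_customer("0006", "30–50K", "9999"))  # False: no such vehicle type
```

Storing the foreign key on the "many" side (Customer) rather than the "one" side is what lets two customers share Type ID 2390 without repeating the Make and Year values.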
The Relational Model
Table 6.1a • Relational Table for Vehicle-Type
Type ID  Make       Year
4371     Chevrolet  1995
6940     Cadillac   2000
4595     Chevrolet  2001
2390     Cadillac   1997
Table 6.1b • Relational Table for Customer
Customer ID  Income Range ($)  Type ID
0001         70–90K            2390
0002         30–50K            4371
0003         70–90K            6940
0004         30–50K            4595
0005         70–90K            2390
Table 6.2 • Join of Tables 6.1a and 6.1b
Customer ID  Income Range ($)  Type ID  Make       Year
0001         70–90K            2390     Cadillac   1997
0002         30–50K            4371     Chevrolet  1995
0003         70–90K            6940     Cadillac   2000
0004         30–50K            4595     Chevrolet  2001
0005         70–90K            2390     Cadillac   1997
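Table 6.2 is produced by joining the two tables on their shared Type ID attribute. A minimal sketch of that join in SQLite, using Python's sqlite3 module and the rows of Tables 6.1a and 6.1b (the lowercase table and column names are assumptions, not part of the source):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle_type (type_id TEXT PRIMARY KEY, make TEXT, year INTEGER)")
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, income_range TEXT, type_id TEXT)")

# Rows from Tables 6.1a and 6.1b.
conn.executemany("INSERT INTO vehicle_type VALUES (?, ?, ?)", [
    ("4371", "Chevrolet", 1995),
    ("6940", "Cadillac", 2000),
    ("4595", "Chevrolet", 2001),
    ("2390", "Cadillac", 1997),
])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("0001", "70–90K", "2390"),
    ("0002", "30–50K", "4371"),
    ("0003", "70–90K", "6940"),
    ("0004", "30–50K", "4595"),
    ("0005", "70–90K", "2390"),
])

# Join on the shared Type ID attribute to obtain the rows of Table 6.2.
joined = conn.execute("""
    SELECT c.customer_id, c.income_range, c.type_id, v.make, v.year
    FROM customer AS c
    JOIN vehicle_type AS v ON v.type_id = c.type_id
    ORDER BY c.customer_id
""").fetchall()

for row in joined:
    print(row)
```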
6.2 Data Warehouse Design
The Data Warehouse
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process” (W. H. Inmon).
Granularity
Granularity is a term used to describe the level of detail of stored information.
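To make the idea concrete: the same purchases can be stored one row per transaction (fine granularity) or summarized to one row per cardholder per month (coarser granularity), and the coarser version can no longer answer questions about individual purchases. The records and the helper summarize_by_month below are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fine-grained records: one row per individual purchase.
purchases = [
    ("John Doe", date(2002, 1, 15), 14.50),
    ("John Doe", date(2002, 1, 20), 8.25),
    ("Sara Smith", date(2002, 1, 3), 22.40),
    ("John Doe", date(2002, 2, 2), 10.00),
]

def summarize_by_month(rows):
    """Coarser granularity: one total per (cardholder, year, month)."""
    totals = defaultdict(float)
    for name, day, amount in rows:
        totals[(name, day.year, day.month)] += amount
    return dict(totals)

monthly = summarize_by_month(purchases)
print(monthly)
```

Four detail rows collapse to three summary rows; the two January purchases by John Doe become a single total.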
Figure 6.2 A data warehouse process model
(Components: operational database(s), external data, ETL routine (extract/transform/load), independent data mart, data warehouse, extract/summarize data, dependent data mart, decision support system, report.)
Entering Data into the Warehouse
• Independent Data Mart
• ETL (Extract, Transform, Load) Routine
• Metadata
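An ETL routine pulls records from operational sources, reconciles their differing conventions, and writes the cleaned result into the warehouse. The sketch below is a toy version of those three steps; the two source formats, their field names, and the customer_dim table are all hypothetical.

```python
import sqlite3

# Hypothetical operational sources that record the same facts differently.
source_a = [{"cust": "0001", "income": "70-90K", "gender": "M"}]
source_b = [{"customer_id": "0002", "income_range": "30-50K", "sex": "female"}]

def extract():
    """Extract: pull raw records from each operational source, tagged by origin."""
    for rec in source_a:
        yield ("a", rec)
    for rec in source_b:
        yield ("b", rec)

def transform(origin, rec):
    """Transform: map each source's field names and codes onto one format."""
    if origin == "a":
        gender = {"M": "Male", "F": "Female"}[rec["gender"]]
        return (rec["cust"], rec["income"], gender)
    gender = rec["sex"].capitalize()
    return (rec["customer_id"], rec["income_range"], gender)

def load(conn, rows):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_dim (customer_id TEXT, income_range TEXT, gender TEXT)"
)
load(warehouse, [transform(origin, rec) for origin, rec in extract()])
print(warehouse.execute("SELECT * FROM customer_dim ORDER BY customer_id").fetchall())
```

Real ETL tools add scheduling, error handling, and metadata capture; this sketch only illustrates the division of work among the three steps.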
Structuring the Data Warehouse: Two Methods
• Structure the warehouse model using the star schema
• Structure the warehouse model as a multidimensional array
The Star Schema
• Fact Table
• Dimension Tables
• Slowly Changing Dimensions
Figure 6.3 A star schema for credit card purchases
(A central Fact Table with Cardholder Key, Purchase Key, Location Key, Time Key, and Amount, linked to four dimension tables: Cardholder Dimension (Cardholder Key, Name, Gender, Income Range); Purchase Dimension (Purchase Key, Category: Supermarket, Travel & Entertainment, Auto & Vehicle, Retail, Restaurant, Miscellaneous); Location Dimension (Location Key, Street, City, State, Region); and Time Dimension (Time Key, Month, Day, Quarter, Year).)
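A typical query against a star schema joins the fact table to one or more dimension tables and aggregates the measure. The sketch below builds a reduced version of Figure 6.3 (only the Purchase and Cardholder dimensions) in SQLite; the category names follow the figure, but the fact rows are hypothetical and the snake_case table names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_dim (purchase_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE cardholder_dim (cardholder_key INTEGER PRIMARY KEY, name TEXT);
    -- The fact table holds only foreign keys plus the measure (amount).
    CREATE TABLE fact (
        cardholder_key INTEGER REFERENCES cardholder_dim,
        purchase_key   INTEGER REFERENCES purchase_dim,
        amount         REAL
    );
""")
conn.executemany("INSERT INTO purchase_dim VALUES (?, ?)", [
    (1, "Supermarket"), (2, "Travel & Entertainment"), (3, "Auto & Vehicle"),
    (4, "Retail"), (5, "Restaurant"), (6, "Miscellaneous"),
])
conn.executemany("INSERT INTO cardholder_dim VALUES (?, ?)",
                 [(1, "John Doe"), (2, "Sara Smith")])
# Hypothetical fact rows (not copied from Figure 6.3).
conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", [
    (1, 2, 14.50), (1, 4, 8.25), (2, 2, 22.40), (2, 1, 30.00),
])

# Star join: join the fact table to a dimension, then aggregate the measure.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact AS f
    JOIN purchase_dim AS p ON p.purchase_key = f.purchase_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```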
The Multidimensionality of the Star Schema
Figure 6.4 Dimensions of the fact table shown in Figure 6.3
(Axes: Cardholder Key, Purchase Key, Time Key, and Location Key.)
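Because every fact row is identified by its four dimension keys, the fact table can be read as a four-dimensional array whose coordinates are those keys. Most coordinate combinations never occur, so the array is sparse. The sketch below stores it as a Python dictionary keyed by coordinate tuples; the fact rows and all dimension sizes except the six purchase categories are hypothetical.

```python
import math

# Number of distinct keys per dimension (only "purchase" comes from Figure 6.3).
dim_sizes = {"cardholder": 2, "purchase": 6, "time": 12, "location": 10}

# Hypothetical fact rows. The four foreign keys act as coordinates,
# (cardholder_key, purchase_key, time_key, location_key) -> amount,
# so the fact table is a sparse four-dimensional array.
facts = {
    (1, 2, 10, 10): 14.50,
    (1, 4, 11, 5): 8.25,
    (2, 2, 10, 10): 22.40,
}

def cell(cardholder, purchase, time, location):
    """Read one cell; coordinates with no recorded purchase read as 0.0."""
    return facts.get((cardholder, purchase, time, location), 0.0)

possible_cells = math.prod(dim_sizes.values())
density = len(facts) / possible_cells

print(cell(1, 2, 10, 10))  # 14.5
print(f"{len(facts)} of {possible_cells} cells are filled ({density:.2%})")
```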
Additional Relational Schemas
• Snowflake Schema
• Constellation Schema
Figure 6.5 A constellation schema for credit card purchases and promotions
(A Purchase Fact Table with Cardholder Key, Purchase Key, Location Key, Time Key, and Amount, and a Promotion Fact Table with Cardholder Key, Promotion Key, Time Key, and Response, share the Cardholder and Time dimensions. The Purchase Fact Table also links to the Purchase and Location dimensions; the Promotion Fact Table links to a Promotion Dimension with Promotion Key, Description, and Cost.)
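In a constellation schema, two fact tables share dimension tables, so measures from both can be combined through the shared keys. The sketch below summarizes purchases per cardholder and lines that summary up with each cardholder's response to a promotion via the shared Cardholder dimension. The promotion rows are modeled on the values visible in Figure 6.5 (a "watch promo" costing 15.25, with responses Yes and No); the purchase rows and all table names are hypothetical, and the Time dimension is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cardholder_dim (cardholder_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, description TEXT, cost REAL);
    -- Two fact tables that share the cardholder dimension.
    CREATE TABLE purchase_fact (cardholder_key INTEGER, purchase_key INTEGER, amount REAL);
    CREATE TABLE promotion_fact (cardholder_key INTEGER, promotion_key INTEGER, response TEXT);
""")
conn.executemany("INSERT INTO cardholder_dim VALUES (?, ?)", [(1, "John Doe"), (2, "Sara Smith")])
conn.execute("INSERT INTO promotion_dim VALUES (1, 'watch promo', 15.25)")
conn.executemany("INSERT INTO promotion_fact VALUES (?, ?, ?)", [(1, 1, "Yes"), (2, 1, "No")])
# Hypothetical purchase facts.
conn.executemany("INSERT INTO purchase_fact VALUES (?, ?, ?)",
                 [(1, 2, 14.50), (1, 4, 8.25), (2, 2, 22.40)])

# Summarize the purchase facts per cardholder, then combine that summary
# with the promotion facts through the shared cardholder dimension.
rows = conn.execute("""
    SELECT c.name, pr.response, p.total
    FROM cardholder_dim AS c
    JOIN promotion_fact AS pr ON pr.cardholder_key = c.cardholder_key
    JOIN promotion_dim AS d ON d.promotion_key = pr.promotion_key
    JOIN (SELECT cardholder_key, SUM(amount) AS total
          FROM purchase_fact GROUP BY cardholder_key) AS p
      ON p.cardholder_key = c.cardholder_key
    WHERE d.description = 'watch promo'
    ORDER BY c.cardholder_key
""").fetchall()
print(rows)
```

Aggregating purchase_fact in a subquery before the join keeps each cardholder's total from being multiplied by the number of promotion rows.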
Decision Support: Analyzing the Warehouse Data
• Reporting Data
• Analyzing Data
• Knowledge Discovery
6.3 On-line Analytical Processing
OLAP Operations
• Slice – A single dimension operation
• Dice – A multidimensional operation
• Roll-up – A higher level of generalization
• Drill-down – A greater level of detail
• Rotation – View data from a new perspective
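Slice, dice, and rotation can be sketched on a small cube stored as a dictionary keyed by (month, region, category). The cube contents, dimension members, and function names below are hypothetical; a production OLAP engine would use specialized storage rather than a Python dictionary.

```python
from itertools import product

# Hypothetical cube: (month, region, category) -> purchase amount.
months = ["Jan", "Feb", "Mar"]
regions = ["One", "Two"]
categories = ["Supermarket", "Vehicle"]
cube = {
    (m, r, c): 100 * (i + 1)
    for i, (m, r, c) in enumerate(product(months, regions, categories))
}

def slice_cube(cube, axis, value):
    """Slice: fix a single dimension (by axis position) to one value."""
    return {k: v for k, v in cube.items() if k[axis] == value}

def dice_cube(cube, allowed):
    """Dice: keep cells whose coordinate on each listed axis is in that axis's allowed set."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in values for axis, values in allowed.items())}

def rotate(table):
    """Rotation: swap the row and column dimensions of a 2-D view."""
    return {(col, row): v for (row, col), v in table.items()}

region_two = slice_cube(cube, 1, "Two")
diced = dice_cube(cube, {0: {"Jan", "Feb"}, 2: {"Vehicle"}})
# A 2-D month x category view of the slice, then the same view rotated.
view = {(m, c): v for (m, _, c), v in region_two.items()}
rotated = rotate(view)
print(rotated[("Vehicle", "Jan")])  # 400
```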
Figure 6.6 A multidimensional cube for credit card purchases
(Dimensions: Month, Jan. through Dec.; Category: Supermarket, Travel, Vehicle, Restaurant, Retail, Miscellaneous; Region: One through Four. Highlighted cell: Month = Dec., Region = Two, Category = Vehicle, with Count = 110 and Amount = 6,720.)
Concept Hierarchy
A mapping that allows attributes to be viewed from varying levels of detail.
Figure 6.7 A concept hierarchy for location
(Levels, from most to least detailed: Street Address, City, State, Region.)
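A concept hierarchy can be represented as a chain of lookup tables, each mapping a value to its parent at the next coarser level. The example row resembles the Location dimension of Figure 6.3 (425 Church St, Charleston, SC, Region 3), but treat the mappings and the function name generalize as hypothetical.

```python
# Hypothetical lookup tables for the hierarchy Street Address -> City -> State -> Region.
street_to_city = {"425 Church St": "Charleston"}
city_to_state = {"Charleston": "SC"}
state_to_region = {"SC": 3}

LEVELS = ["street", "city", "state", "region"]
PARENT = {"street": street_to_city, "city": city_to_state, "state": state_to_region}

def generalize(value, from_level, to_level):
    """Climb the hierarchy from `from_level` up to the coarser `to_level`."""
    i, j = LEVELS.index(from_level), LEVELS.index(to_level)
    if j < i:
        raise ValueError("can only move to a coarser level")
    for level in LEVELS[i:j]:
        value = PARENT[level][value]
    return value

print(generalize("425 Church St", "street", "state"))  # SC
```

Moving up the chain corresponds to viewing location data at a coarser level of detail; moving down would require the finer-grained values to still be available.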
Figure 6.8 Rolling up from months to quarters
(Dimensions: Time, Q1 through Q4; Category: Supermarket, Travel, Vehicle, Restaurant, Retail, Miscellaneous; Region: One through Four. Highlighted cell: Month = Oct./Nov./Dec. rolled up into Q4, with Category = Supermarket and Region = One.)
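Rolling up from months to quarters replaces each month with its quarter in the time concept hierarchy and totals the cells that land in the same (quarter, region, category) position. A minimal sketch with hypothetical cell amounts:

```python
from collections import defaultdict

# Concept hierarchy for time: month -> quarter.
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
    "Jul": "Q3", "Aug": "Q3", "Sep": "Q3",
    "Oct": "Q4", "Nov": "Q4", "Dec": "Q4",
}

# Hypothetical cube cells: (month, region, category) -> amount.
monthly = {
    ("Oct", "One", "Supermarket"): 120.0,
    ("Nov", "One", "Supermarket"): 80.0,
    ("Dec", "One", "Supermarket"): 200.0,
    ("Jan", "One", "Supermarket"): 50.0,
    ("Dec", "Two", "Vehicle"): 30.0,
}

def roll_up(cube, mapping):
    """Roll-up on the time dimension: map each month to its quarter and
    total the amounts that land in the same (quarter, region, category) cell."""
    totals = defaultdict(float)
    for (month, region, category), amount in cube.items():
        totals[(mapping[month], region, category)] += amount
    return dict(totals)

quarterly = roll_up(monthly, MONTH_TO_QUARTER)
print(quarterly[("Q4", "One", "Supermarket")])  # 400.0
```

Drill-down goes the other way, from Q4 back to Oct., Nov., and Dec.; that is only possible if the monthly cells are kept, which is why the level of detail stored (granularity) limits how far an analyst can drill down.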
6.4 Excel Pivot Tables for Data Analysis
Creating a Simple Pivot Table
Figure 6.9 A pivot table template
Figure 6.10 A summary report for income range
Figure 6.11 A pie chart for income range
Pivot Tables for Hypothesis Testing
Figure 6.12 A pivot table showing age and credit card insurance choice
Figure 6.13 Grouping the credit card promotion data by age
Figure 6.14 PivotTable Layout Wizard
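The pivot tables in this section are built in Excel. As a rough analogue outside Excel, the sketch below groups ages into ranges and cross-tabulates them against credit card insurance choice, which is the kind of summary shown in Figures 6.12 and 6.13. The records, the 10-year bin width, and the function age_group are hypothetical.

```python
from collections import Counter

# Hypothetical customer records: (age, credit card insurance purchased?).
records = [(23, "No"), (45, "Yes"), (37, "No"), (52, "Yes"), (29, "No"), (61, "Yes"), (41, "No")]

def age_group(age, width=10):
    """Bucket ages into ranges such as 20-29 or 30-39 (hypothetical bin width)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Count rows per (age group, insurance) pair: age group is the row field,
# insurance choice is the column field.
counts = Counter((age_group(age), ins) for age, ins in records)

groups = sorted({age_group(age) for age, _ in records})
print(f"{'Age':<8}{'No':>4}{'Yes':>5}")
for g in groups:
    print(f"{g:<8}{counts[(g, 'No')]:>4}{counts[(g, 'Yes')]:>5}")
```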
Creating a Multidimensional Pivot Table
Figure 6.15 A credit card promotion cube
(Dimensions: Watch Promo, Magazine Promo, and Life Insurance Promo, each Yes or No. Highlighted cell: Watch Promo = No, Magazine Promo = Yes, Life Insurance Promo = Yes.)
Figure 6.16 A pivot table with page variables for credit card promotions
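A multidimensional pivot table with a page variable behaves like the promotion cube of Figure 6.15 viewed one layer at a time: choosing a value for the page variable fixes one dimension and leaves a two-dimensional table. The sketch below counts hypothetical customers in a Yes/No cube and then fixes Watch Promo as the page variable; the records and the function page are assumptions, not Excel's API.

```python
from collections import Counter

# Hypothetical rows: (watch_promo, magazine_promo, life_insurance_promo).
customers = [
    ("No", "Yes", "Yes"), ("No", "Yes", "No"), ("Yes", "Yes", "Yes"),
    ("No", "No", "No"), ("No", "Yes", "Yes"), ("Yes", "No", "Yes"),
]

# Three-dimensional count cube: one cell per Yes/No combination.
cube = Counter(customers)

def page(cube, watch):
    """Fix the page variable (Watch Promo) and return a 2-D
    Magazine Promo x Life Insurance Promo table of counts."""
    return {(mag, life): cube[(watch, mag, life)]
            for mag in ("No", "Yes") for life in ("No", "Yes")}

print(page(cube, "No"))
```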