Date post: | 05-Apr-2018 |
Category: |
Documents |
Upload: | spear-brisko |
7/31/2019 2 Final DW Concepts
Data Warehousing Concepts
Module 1
Agenda
Data warehousing overview
Data warehouse Vs OLTP
Data warehouse Vs DataMart
Data Modeling
Decision Support
In order to make correct decisions, accurate, meaningful information
about business environments, external issues, and internal workings must be available in a timely fashion.
OLTP Systems
Capable of answering questions of a specific nature and time frame.
How many items do I have in stock today?
How many tickets were sold on a specific date?
What is the current price of an item?
OLTP Systems
Transaction-based systems experience great difficulty in answering analytical and decision support questions.
Analysis takes a long time, interfering with:
  transaction performance
  daily operations
The nature of the data is dynamic and dispersed.
OLTP Systems
Most organizations have created a spiderweb of systems and data sources.
A producer wants to know:
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
What is the most effective distribution channel?
Data, data everywhere, yet ...
I can't find the data I need
  data is scattered over the network
  many versions, subtle differences
I can't get the data I need
  need an expert to get the data
I can't understand the data I found
  available data poorly documented
I can't use the data I found
  results are unexpected
  data needs to be transformed from one form to another
What is a Data Warehouse?
A single, complete and consistent store of data obtained from a variety
of different sources made available to end users in a way they can understand and use
in a business context.
[Barry Devlin]
What are the users saying...
Data should be integrated across the enterprise
Summary data has a real value to the organization
Historical data holds the key to understanding data over time
What-if capabilities are required
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference
[Figure: Data --> Information]
Evolution
60s: Batch reports
  hard to find and analyze information
  inflexible and expensive, reprogram every new request
70s: Terminal-based DSS and EIS (executive information systems)
  still inflexible, not integrated with desktop tools
80s: Desktop data access and analysis tools
  query tools, spreadsheets, GUIs
  easier to use, but only access operational databases
90s till now: Data warehousing with integrated OLAP engines and tools, real-time DW
Data Warehousing -- It is a process
A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible
A decision support database maintained separately from the organization's operational database
Data Warehouse
A data warehouse is a
subject-oriented
integrated
time-varying
non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996
Explorers, Farmers and Tourists
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
Farmers: Harvest information from known access paths
Tourists: Browse information harvested by farmers
Data Warehouse Architecture
[Figure: sources (relational databases, legacy data, purchased data, ERP systems); extraction and cleansing; optimized loader; data warehouse engine; metadata repository; analyze and query]
Data Warehouse for Decision Support & OLAP
Putting information technology to help the knowledge worker make faster and better decisions
Which of my customers are most likely to go to the competition?
What product promotions have the biggest impact on revenue?
How did the share price of software companies correlate with profits over the last 10 years?
Decision Support
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and can be ad-hoc
Used by managers and end-users to understand the business and make judgements
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
We want to know ...
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
If I raise the price of my product by Rs. 2, what is the effect on my ROI?
If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
Which of my customers are likely to be the most loyal?
Data Mining helps extract such information
Application Areas
Industry | Application
Finance | Credit card analysis
Insurance | Claims, fraud analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data service providers | Value added data
Utilities | Power usage analysis
What makes data mining possible?
Advances in the following areas are making data mining deployable:
  data warehousing
  better and more data (i.e., operational, behavioral, and demographic)
  the emergence of easily deployed data mining tools and
  the advent of new data mining techniques.
-- Gartner Group
Why Separate Data Warehouse?
Performance
  Operational databases are designed and tuned for known transactions and workloads.
  Complex OLAP queries would degrade performance for operational transactions.
  Special data organization, access and implementation methods are needed for multidimensional views and queries.
Function
  Missing data: Decision support requires historical data, which operational databases do not typically maintain.
  Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
  Data quality: Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.
Benefits of a Data Warehouse
Reliable reporting
Rapid access to data
Integrated data
Flexible presentation of data
Better decision making
What are Operational Systems?
They are OLTP systems
Run mission-critical applications
Need to work with stringent performance requirements for routine tasks
Used to run a business!
RDBMS used for OLTP
Database systems have been used traditionally for OLTP
  clerical data processing tasks
  detailed, up-to-date data
  structured repetitive tasks
  read/update a few records
  isolation, recovery and integrity are critical
Operational Systems
Run the business in real time
Based on up-to-the-second data
Optimized to handle large numbers of simple read/write transactions
Optimized for fast response to predefined transactions
Used by people who deal with customers, products -- clerks, salespeople etc.
They are increasingly used by customers
Examples of Operational Data
Data | Industry | Usage | Technology | Volumes
Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium
So, what's different?
Application-Orientation vs. Subject-Orientation
Application-orientation (operational database): Loans, Credit Card, Trust, Savings
Subject-orientation (data warehouse): Customer, Vendor, Product, Activity
OLTP vs. Data Warehouse
OLTP systems are tuned for known transactions and workloads, while the workload is not known a priori in a data warehouse
Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)
e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
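The example query above cuts across several dimensions at once (city, time of day, month). A minimal sketch of how it might be expressed, assuming a hypothetical `calls` table with made-up rows and an arbitrarily chosen year (none of these names come from the slides), using Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical call-record table; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (city TEXT, started_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("Pune", "2018-12-03 10:15:00", 12.0),
        ("Pune", "2018-12-10 16:40:00", 8.0),
        ("Pune", "2018-12-10 21:05:00", 30.0),    # outside 9AM-5PM
        ("Mumbai", "2018-12-04 11:00:00", 50.0),  # different city
        ("Pune", "2018-11-20 09:30:00", 99.0),    # different month
    ],
)

def avg_daytime_spend(conn, city, month):
    """Average call amount for one city and month (YYYY-MM), calls starting 09:00-16:59."""
    row = conn.execute(
        """
        SELECT AVG(amount) FROM calls
        WHERE city = ?
          AND strftime('%Y-%m', started_at) = ?
          AND CAST(strftime('%H', started_at) AS INTEGER) BETWEEN 9 AND 16
        """,
        (city, month),
    ).fetchone()
    return row[0]

pune_december_avg = avg_daytime_spend(conn, "Pune", "2018-12")
```

A query like this scans and aggregates many rows, which is the kind of workload an OLTP system tuned for single-record access is not designed for.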
OLTP vs Data Warehouse
OLTP | Warehouse (DSS)
Application oriented | Subject oriented
Used to run business | Used to analyze business
Detailed data | Summarized and refined
Current, up to date | Snapshot data
Isolated data | Integrated data
Repetitive access | Ad-hoc access
Clerical user | Knowledge user (manager)
OLTP vs Data Warehouse
OLTP | Data Warehouse
Performance sensitive | Performance relaxed
Few records accessed at a time (tens) | Large volumes accessed at a time (millions)
Read/update access | Mostly read (batch update)
No data redundancy | Redundancy present
Database size 100 MB - 100 GB | Database size 100 GB - few terabytes
OLTP vs Data Warehouse
OLTP | Data Warehouse
Transaction throughput is the performance metric | Query throughput is the performance metric
Thousands of users | Hundreds of users
Managed in entirety | Managed by subsets
To summarize ...
OLTP systems are used to run a business
The data warehouse helps to optimize the business
Why Now?
Data is being produced
ERP provides clean data
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available
Data Warehouses: Architecture, Design & Construction
DW Architecture
Loading, refreshing
Structuring/Modeling
DWs and Data Marts
Data Warehouse Architectures
Generic Two-Level Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
All involve some form of extraction, transformation and loading (ETL)
Generic two-level architecture
ETL
One, company-wide warehouse
Periodic extraction: data is not completely current in the warehouse
Independent Data Mart
Data marts: mini-warehouses, limited in scope
ETL
Separate ETL for each independent data mart
Data access complexity due to multiple data marts
Dependent data mart with operational data store
ETL
Single ETL for enterprise data warehouse (EDW)
Simpler data access
ODS provides option for obtaining current data
Dependent data marts loaded from EDW
The ETL Process
Capture
Scrub or data cleansing
Transform
Load
ETL = Extract, transform, and load
Steps in data reconciliation
Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
Static extract = capturing a snapshot of the source data at a point in time
Incremental extract = capturing changes that have occurred since the last static extract
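The two extract styles can be sketched in a few lines. This is only an illustration under stated assumptions: the source rows, the `updated_at` change-tracking column and both function names are hypothetical, and real systems often detect changes through logs or triggers instead.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
source = [
    {"id": 1, "name": "Asha", "updated_at": datetime(2019, 7, 1)},
    {"id": 2, "name": "Ravi", "updated_at": datetime(2019, 7, 15)},
    {"id": 3, "name": "Meena", "updated_at": datetime(2019, 7, 30)},
]

def static_extract(rows):
    """Static extract: copy every row as a point-in-time snapshot."""
    return [dict(r) for r in rows]

def incremental_extract(rows, last_extract_time):
    """Incremental extract: copy only rows changed after the previous extract."""
    return [dict(r) for r in rows if r["updated_at"] > last_extract_time]

snapshot = static_extract(source)
changes = incremental_extract(source, datetime(2019, 7, 10))
```

Here `snapshot` holds all three rows, while `changes` holds only the rows touched after 10 July.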
Steps in data reconciliation (continued)
Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
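A few of the scrub tasks listed above (decoding inconsistent codes, catching erroneous dates, dropping duplicates, logging errors) can be sketched with simple rules. This sketch is rule-based rather than the pattern recognition or AI techniques the slide mentions, and the records, the `STATE_CODES` lookup and the `scrub` function are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical raw records with typical quality problems.
raw = [
    {"cust": " Asha  Rao ", "state": "mh", "joined": "2019-03-01"},
    {"cust": "asha rao", "state": "Maharashtra", "joined": "2019-03-01"},  # duplicate
    {"cust": "Ravi K", "state": "KA", "joined": "2019-02-30"},             # invalid date
]

# Assumed lookup that reconciles inconsistent state codes.
STATE_CODES = {"mh": "MH", "maharashtra": "MH", "ka": "KA", "karnataka": "KA"}

def scrub(records):
    """Return (cleaned rows, error log); the first copy of a duplicate is kept."""
    cleaned, errors, seen = [], [], set()
    for rec in records:
        # Reformatting: collapse whitespace and normalize capitalization.
        name = re.sub(r"\s+", " ", rec["cust"]).strip().title()
        # Decoding: map free-form state values to one code (None if unknown).
        state = STATE_CODES.get(rec["state"].strip().lower())
        # Error detection: reject dates that do not exist on the calendar.
        try:
            joined = datetime.strptime(rec["joined"], "%Y-%m-%d").date()
        except ValueError:
            joined = None
            errors.append((name, "bad or missing date"))
        if name in seen:
            errors.append((name, "duplicate"))
            continue
        seen.add(name)
        cleaned.append({"cust": name, "state": state, "joined": joined})
    return cleaned, errors

cleaned, errors = scrub(raw)
```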
Steps in data reconciliation (continued)
Transform = convert data from format of operational system to format of data warehouse
Record-level:
  Selection: data partitioning
  Joining: data combining
  Aggregation: data summarization
Field-level:
  single-field: from one field to one field
  multi-field: from many fields to one, or one field to many
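The record-level and field-level transformations above can be illustrated on a small in-memory data set. The `orders` and `customers` structures and the `full_name` helper are hypothetical, chosen only to show each step once.

```python
from collections import defaultdict

# Hypothetical operational rows; all names and values are illustrative.
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 250.0, "status": "PAID"},
    {"order_id": 2, "cust_id": 11, "amount": 100.0, "status": "CANCELLED"},
    {"order_id": 3, "cust_id": 10, "amount": 50.0, "status": "PAID"},
]
customers = {
    10: {"first": "Asha", "last": "Rao"},
    11: {"first": "Ravi", "last": "K"},
}

def full_name(cust):
    """Multi-field transformation: two source fields become one target field."""
    return f"{cust['first']} {cust['last']}"

# Record-level selection: keep only the rows that meet a condition.
paid = [o for o in orders if o["status"] == "PAID"]

# Record-level joining: combine each order with its customer record.
joined = [{**o, "cust_name": full_name(customers[o["cust_id"]])} for o in paid]

# Record-level aggregation: summarize amounts per customer.
totals = defaultdict(float)
for row in joined:
    totals[row["cust_name"]] += row["amount"]
```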
Steps in data reconciliation (continued)
Load/Index = place transformed data into the warehouse and create indexes
Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to data warehouse
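The difference between the two load modes can be sketched against an in-memory SQLite table. The `dw_customer` table, its index and both function names are assumptions made for this illustration; production warehouses typically use bulk loaders rather than row inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
# The "index" part of load/index: support lookups by name.
conn.execute("CREATE INDEX idx_dw_customer_name ON dw_customer (name)")

def refresh_load(conn, rows):
    """Refresh mode: bulk rewrite of the whole target table."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM dw_customer")
        conn.executemany("INSERT INTO dw_customer VALUES (?, ?)", rows)

def update_load(conn, changed_rows):
    """Update mode: write only rows that changed in the source."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dw_customer VALUES (?, ?)", changed_rows
        )

refresh_load(conn, [(1, "Asha"), (2, "Ravi")])
update_load(conn, [(2, "Ravi K"), (3, "Meena")])
final_rows = conn.execute("SELECT * FROM dw_customer ORDER BY cust_id").fetchall()
```

After the update, row 1 is untouched, row 2 carries the changed name and row 3 is new.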
Data Warehouse vs. Data Marts
What comes first ?
Data Mart
Data mart is:
A functional segment of an enterprise restricted for purposes of security, locality, performance, or business necessity using modeling and information delivery techniques identical to data warehousing.
Data Mart
Why build a data mart?
Allows an organization to visualize the large but focus on the small and attainable.
Provides a platform for rapid delivery of an operational system.
Minimizes risk.
A corporate warehouse can be constructed from the union of the enterprise data marts.
Data Mart
[Figure: data from transaction sources feeds the data warehouse; updates from the warehouse feed the financial, logistics and contract data marts]
The data warehouse populates the data marts.
Data Mart
[Figure: data from transaction sources feeds the financial, logistics and contract data marts; updates from the data marts feed the data warehouse]
The data marts populate the data warehouse.
Data Mart- Approach
Physical data warehouse (physical)
Data warehouse --> data marts
Data marts --> data warehouse
Parallel data warehouse and data marts
Top-down
[Figure: source data (external data, operational data), staging area, data warehouse, data marts]
Physical data warehouse: Data Warehouse --> Data Marts
Bottom-up approach
[Figure: source data (external data, operational data), staging area, data warehouse, data marts]
Physical data warehouse: Data Marts --> Data Warehouse
Hybrid
[Figure: source data (external data, operational data), staging area, data warehouse, data marts]
Physical data warehouse: Parallel Data Warehouse & Data Marts
Data Modeling
WHAT IS A DATA MODEL?
A data model is an abstraction of some aspect of the real world (system).
WHY A DATA MODEL?
Helps to visualise the business
A model is a means of communication.
A model helps elicit and document requirements.
Models reduce the cost of change.
What is Data Modeling
Data-oriented activity!
Part art, part science
Highly detailed, iterative process
Uses basic objects to deliver pictorial image of requirements
  Entities (ERD & DDM)
  Attributes (ERD & DDM)
  Relationships (ERD & DDM)
Uses metadata to supplement data requirements described by pictorial image
What is Data Modeling (continued)
Based on use and enforcement of Data Standards
Depends on knowledgeable, committed participants for its ultimate success
  Business subject matter experts (SME)
  Information technology professionals
Foundation for future business data requirements
More important now than ever!
Why do you need one?
To identify the basic information structure of the business and its systems.
To develop and promote a standard, unified vocabulary for data.
To provide a firm foundation for delivering high-quality systems that meet business needs.
To facilitate data integration.
Why do you need one?
To eliminate ambiguity of data definitions
To reduce software development cost
To reduce database development cost
To provide standardised methods of handling data
To facilitate application integration
To position the corporation for future database technology
To preserve data context.
Why do you need one?
The quality of a database is
in its initial design!
It is a means, not an end!
What happens if you don't have one?
Individual Data Store
What happens if you don't have one?
Corporate Data Store
What happens if you don't have one?
You Lose Data!
Individuals know where to find data.
Data is modified when moved.
Loses meaning and value.
Levels of modeling
[Figure: Business Process, Conceptual, Logical Model, Physical Model]
Levels of Data Modeling
Conceptual Modeling
  The 50,000 foot view of the business requirements
  The precursor to Logical Modeling
Logical Modeling
  Focused on business requirements
  Independent of technical platform
  Normalized - The Key, the whole Key and nothing but the Key, so help me Codd!
Physical Modeling
  Focused on technical requirements
  Dependent on a technical platform and DBMS
  De-normalized (Optimized) to enhance performance
What Data Modeling is not
A waste of time!
A one time effort
The ultimate IT application development cure
A quick process
A function solely performed and understood by and for IT professionals
Where Data Models are used
Operational Systems
  Traditional applications designed to run the day-to-day business of the Enterprise
External Systems ***
  Data used within an Enterprise that is obtained from outside sources
Staging Areas ***
  Created to aid in the collection and transformation of data that is targeted for a Data Warehouse
Operational Data Store ***
  W. H. Inmon and Claudia Imhoff definition: A subject-oriented, integrated, volatile, current valued data store containing only corporate detailed data.
*** - Not discussed here
Where Data Models are used (continued)
Data Warehouse (DW)
  W. H. Inmon definition: A subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs.
Data Mart (DM)
  TDWI definition: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.
Sample conceptual model
[Figure: entities Products, Customer Invoices, Customer Addresses, Customers, Geographic Boundaries, Sales Reps]
Sample logical model
[Figure: entity-relationship diagram; # marks identifying attributes]
PRODUCT: #PRODUCT CODE, PRODUCT DESCRIPTION
CUSTOMER: #CUSTOMER ID, #SNAPSHOT DATE, CUSTOMER NAME
CUSTOMER INVOICE: #INVOICE ID, #LINE ITEM SEQ, INVOICE DATE
SALES REP: #SALES REP ID
CUSTOMER ADDRESS: #CUSTOMER ID, #ADDRESS ID
GEOGRAPHIC BOUNDARY: #GEO CODE
Physical Model
A Physical data model may include
Referential Integrity
Indexes
Views
Alternate keys and other constraints
Tablespaces and physical storage objects.
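Several of the physical-model elements listed above can be shown in a short DDL sketch. The tables, columns and helper function here are hypothetical and use SQLite through Python's sqlite3 module; SQLite has no tablespaces, so that item is not illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE                 -- alternate key
    );
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customers(customer_id),     -- referential integrity
        amount REAL NOT NULL CHECK (amount >= 0)   -- other constraint
    );
    CREATE INDEX idx_invoices_customer ON invoices (customer_id);
    CREATE VIEW customer_totals AS
        SELECT customer_id, SUM(amount) AS total
        FROM invoices GROUP BY customer_id;
    INSERT INTO customers VALUES (1, 'asha@example.com');
    INSERT INTO invoices VALUES (1, 1, 40.0), (2, 1, 60.0);
    """
)

def insert_orphan_invoice(conn):
    """Try to insert an invoice for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO invoices VALUES (3, 999, 10.0)")
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = insert_orphan_invoice(conn)
totals = conn.execute("SELECT * FROM customer_totals").fetchall()
```

The foreign key rejects the orphan invoice, and the view exposes a per-customer summary without storing it separately.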
Sample Physical Model
[Figure: physical tables and columns; # marks key columns, o marks optional columns]
PRODUCTS: #PRODUCT_CODE, PRODUCT_DESCRIPTION, CATEGORY_CODE, CATEGORY_DESCRIPTION
CUSTOMER_INVOICES: #INVOICE_ID, #LINE_ITEM_SEQ, INVOICE_DATE, CUSTOMER_ID, BILL_TO_ADDRESS_ID, SALES_REP_ID, MANAGER_REP_ID, ORGANIZATION_ID, ORG_ADDRESS_ID, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, o PRODUCT_COST, LOAD_DATE
CUSTOMERS: #CUSTOMER_ID, #SNAPSHOT_DATE, CUSTOMER_NAME, o AGE, o MARITAL_STATUS, CREDIT_RATING
CUSTOMER_ADDRESSES: #CUSTOMER_ID, #ADDRESS_ID, ADDRESS_LINE1, o ADDRESS_LINE2, o POSTAL_CODE, SALES_REP_ID, GEO_CODE, LOAD_DATE
GEOGRAPHIC_BOUNDARIES: #GEO_CODE, CITY_NAME, STATE_NAME, COUNTRY_NAME, o CITY_ABBRV, o STATE_ABBRV, o COUNTRY_ABBRV
SALES_REPS: #SALES_REP_ID, LAST_NAME, FIRST_NAME, o MANAGER_FIRST_NAME, o MANAGER_LAST_NAME
Types of Data Models
Multiple varieties
Entity Relationship Model (ERM)
Dimensional Data Model (DDM)
Each serves specific purpose
Common to OLTP applications:
Entity Relationship Model
Common to OLAP applications:
Entity Relationship Model
Dimensional Data Model
Entity Relationship Model
[Figure: entity-relationship diagram for a movie rental business]
CUSTOMER: customer number, name, address, phone, credit card, credit card exp, status code
CREDIT CARD: credit card number, credit card exp, credit card type
CHECK: check bank number, check number
MOVIE: movie number, name, director, description, star, rating, genre, rental rate
STORE: store number, manager, address, phone
EMPLOYEE: employee number, name, address, phone, ss#, hire_date, salary, supervisor (FK)
PAYMENT: payment transaction number, type, amount, date, status
MOVIE RENTAL RECORD: rental record date, rental date, due date, rental status, rental rate, overdue charge
MOVIE COPY: movie copy number, general condition
IN STOCK MOVIE: movie-number, store-number, quantity
Relationship labels: is rented under, reports to, completes, employs, receives, makes, rents under, is made on, is rented as
Dimensional Modeling
Database organization
  must look like business
  must be recognizable by business user
  approachable by business user
  must be simple
De-normalized
Schema types
  Star schema
  Snowflake schema
  Fact constellation schema
Components of a star schema
Fact tables contain factual or quantitative data
Dimension tables contain descriptions about the subjects of the business
1:N relationship between dimension tables and fact tables
Dimension tables are denormalized to maximize performance
Excellent for ad-hoc queries, but bad for online transaction processing
Star schema example
Fact table provides statistics for sales broken down by product, period and store dimensions
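Since the slide's diagram did not survive extraction, here is a minimal sketch of what such a star schema might look like. All table names, column names and rows are assumptions for illustration, not the schema from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes.
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
    -- Fact table: one row per product, period and store, with numeric measures.
    CREATE TABLE sales (
        product_key  INTEGER REFERENCES product(product_key),
        period_key   INTEGER REFERENCES period(period_key),
        store_key    INTEGER REFERENCES store(store_key),
        units_sold   INTEGER,
        dollars_sold REAL
    );
    INSERT INTO product VALUES (1, 'Soap'), (2, 'Tea');
    INSERT INTO period  VALUES (1, 2018, 1), (2, 2018, 2);
    INSERT INTO store   VALUES (1, 'Pune'), (2, 'Mumbai');
    INSERT INTO sales VALUES
        (1, 1, 1, 10, 100.0),
        (1, 2, 1,  5,  50.0),
        (2, 1, 2,  4,  80.0),
        (2, 2, 1,  1,  20.0);
    """
)

# Ad-hoc query: total sales per store city, one join from fact to dimension.
sales_by_city = conn.execute(
    """
    SELECT s.city, SUM(f.dollars_sold)
    FROM sales f JOIN store s ON s.store_key = f.store_key
    GROUP BY s.city ORDER BY s.city
    """
).fetchall()
```

Each dimension is a single join away from the fact table, which is what keeps ad-hoc queries simple in this layout.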
Star schema with sample data
Snowflake schema
Represent dimensional hierarchy directly by normalizing tables.
Easy to maintain and saves storage
[Figure: fact table (date, custno, prodno, cityname, ...) linked to time, prod, cust and city dimensions; city linked to region]
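The city-to-region hierarchy from the figure can be sketched as normalized dimension tables. The column names follow the figure where it names them (date, custno, prodno, cityname); the `region_name` column, the `amount` measure and all rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The region is stored once, not repeated on every city row.
    CREATE TABLE region (region_name TEXT PRIMARY KEY);
    CREATE TABLE city (
        cityname    TEXT PRIMARY KEY,
        region_name TEXT REFERENCES region(region_name)
    );
    CREATE TABLE fact (
        date TEXT, custno INTEGER, prodno INTEGER,
        cityname TEXT REFERENCES city(cityname),
        amount REAL  -- assumed measure
    );
    INSERT INTO region VALUES ('West'), ('South');
    INSERT INTO city VALUES ('Pune', 'West'), ('Mumbai', 'West'), ('Chennai', 'South');
    INSERT INTO fact VALUES
        ('2018-12-01', 1, 100, 'Pune',    10.0),
        ('2018-12-01', 2, 100, 'Mumbai',  20.0),
        ('2018-12-02', 1, 101, 'Chennai',  5.0);
    """
)

# Rolling up to region walks the hierarchy through the city dimension.
sales_by_region = conn.execute(
    """
    SELECT c.region_name, SUM(f.amount)
    FROM fact f JOIN city c ON c.cityname = f.cityname
    GROUP BY c.region_name ORDER BY c.region_name
    """
).fetchall()
```

The trade-off against a star schema is extra joins at query time in exchange for less redundancy in the dimension data.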
Fact Constellation
Multiple fact tables that share many dimension tables
Booking and Checkout may share many dimension tables in the hotel industry
[Figure: fact tables Booking and Checkout share the dimension tables Hotels, Travel Agents, Promotion, Room Type and Customer]
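A small sketch of the hotel example, with two fact tables sharing the same dimension tables. Only two of the shared dimensions are modeled, and every column name, measure and row is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimension tables.
    CREATE TABLE hotel     (hotel_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE room_type (room_type_key INTEGER PRIMARY KEY, label TEXT);
    -- Two fact tables that reference the same dimensions.
    CREATE TABLE booking (
        hotel_key     INTEGER REFERENCES hotel(hotel_key),
        room_type_key INTEGER REFERENCES room_type(room_type_key),
        nights_booked INTEGER
    );
    CREATE TABLE checkout (
        hotel_key     INTEGER REFERENCES hotel(hotel_key),
        room_type_key INTEGER REFERENCES room_type(room_type_key),
        nights_stayed INTEGER
    );
    INSERT INTO hotel VALUES (1, 'Lake View'), (2, 'City Inn');
    INSERT INTO room_type VALUES (1, 'Single'), (2, 'Double');
    INSERT INTO booking  VALUES (1, 1, 3), (1, 2, 2), (2, 1, 4);
    INSERT INTO checkout VALUES (1, 1, 3), (2, 1, 2);
    """
)

# Compare both facts along the shared hotel dimension.
booked_vs_stayed = conn.execute(
    """
    SELECT h.name,
           (SELECT SUM(nights_booked) FROM booking b WHERE b.hotel_key = h.hotel_key),
           (SELECT SUM(nights_stayed) FROM checkout c WHERE c.hotel_key = h.hotel_key)
    FROM hotel h ORDER BY h.name
    """
).fetchall()
```

Because both fact tables use the same hotel keys, their measures can be lined up side by side per hotel.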
DATA MODELING - FOR WHICH COMPONENT?
STAGING AREA
YES! (maybe multiple data models are required)
ODS
YES!
DATA WAREHOUSE
YES!
Where to use what?
Stages | Types of model
OLTP | Normalized data model
Staging area | Flat table without constraints
ODS | Normalized model
Data marts | Dimensional model
DW and role of E/R Modelling
Ralph Kimball says:
  ER models are too complicated for end users to understand
  ER modeling/normalising is only suitable for OLTP or in the data staging area, since it eliminates redundancy
  Results in too many tables to be easy to query
  ER models are optimised for update activity, not high performance querying
Bill Inmon says:
  ER model is suitable for data warehouses because it is stable, and supports consistency and flexibility
  Normalised data is an ideal basis for the design of the Data Warehouse and the ODS
  May not be suitable for the data mart, which deals heavily with regular query activity and time-variant analysis