7/29/2019 M05740010120124040M0574-Pert 15-16 Data warehouse
Business Intelligence and Decision Support Systems
Session 15-16:
Data Warehousing
Course: M0574 Decision Support Systems
Year: September 2012
Copyright 2011 Pearson Education, Inc. Publishing as Prentice Hall
Learning Objectives
Understand the basic definitions and concepts of data warehouses
Learn different types of data warehousing architectures; their comparative advantages and disadvantages
Describe the processes used in developing and managing data warehouses
Explain data warehousing operations
Explain the role of data warehouses in decision support
Learning Objectives
Explain data integration and the extraction, transformation, and load (ETL) processes
Describe real-time (a.k.a. right-time and/or active) data warehousing
Understand data warehouse administration and security issues
Opening Vignette:
DirecTV Thrives with Active Data Warehousing
Company background
Problem description
Proposed solution
Results
Answer & discuss the case questions.
Main Data Warehousing (DW) Topics
DW definitions
Characteristics of DW
Data Marts
ODS, EDW, Metadata
DW Framework
DW Architecture & ETL Process
DW Development
DW Issues
Data Warehouse Defined
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time
Characteristics of DW
Subject oriented
Integrated
Time-variant (time series)
Nonvolatile
Summarized
Not normalized
Metadata
Web based, relational/multi-dimensional
Client/server
Real-time and/or right-time (active)
Data Mart
A departmental data warehouse that stores only relevant data
Dependent data mart
A subset that is created directly from a data warehouse
Independent data mart
A small data warehouse designed for a strategic business unit or a department
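As a rough illustration of a dependent data mart, the sketch below creates a small marketing mart directly from a warehouse table using Python's built-in sqlite3 module. The table names (warehouse_sales, mart_marketing), the columns, and the sample rows are hypothetical and do not come from the course material.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "campaign A", 120.0),
        ("marketing", "campaign B", 80.0),
        ("finance", "audit", 300.0),
    ],
)

# Dependent data mart: a subset created directly from the warehouse,
# holding only the rows relevant to one department.
conn.execute(
    "CREATE TABLE mart_marketing AS "
    "SELECT product, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM mart_marketing ORDER BY product"
).fetchall()
print(mart_rows)
```

An independent data mart would instead be loaded from the source systems, without a warehouse table in between.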
Data Warehousing Definitions
Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse
Oper marts
An operational data mart.
Enterprise data warehouse (EDW)
A data warehouse for the enterprise.
Metadata
Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
A Conceptual Framework for DW
[Figure: conceptual framework for DW. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed the ETL process (select, extract, transform, integrate, load), which populates the enterprise data warehouse and its metadata. Data are replicated to data marts (engineering, marketing, finance, ...), or accessed directly under the "no data marts" option. API/middleware gives access to the applications (visualization): routine business reporting; data/text mining; OLAP, dashboard, Web; and custom-built applications.]
Generic DW Architectures
Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access and analyze data from the warehouse
Two-tier architecture
The first two tiers of the three-tier architecture are combined into one
Sometimes there is only one tier?
Generic DW Architectures
[Figure: three-tier architecture with Tier 1: client workstation, Tier 2: application server, Tier 3: database server; two-tier architecture with Tier 1: client workstation, Tier 2: application & database server.]
DW Architecture Considerations
Issues to consider when deciding which architecture to use:
Which database management system (DBMS) should be used?
Will parallel processing and/or partitioning be used?
Will data migration tools be used to load the data warehouse?
What tools will be used to support data retrieval and analysis?
A Web-based DW Architecture
[Figure: Web-based DW architecture. Components shown: client (Web browser), Internet/intranet/extranet, Web server, Web pages, application server, data warehouse.]
Alternative DW Architectures
(a) Independent Data Marts Architecture
Source systems -> staging area (ETL) -> independent data marts (atomic/summarized data) -> end-user access and applications
(b) Data Mart Bus Architecture with Linked Dimensional Data Marts
Source systems -> staging area (ETL) -> dimensionalized data marts linked by conformed dimensions (atomic/summarized data) -> end-user access and applications
(c) Hub and Spoke Architecture (Corporate Information Factory)
Source systems -> staging area (ETL) -> normalized relational warehouse (atomic data) -> dependent data marts (summarized/some atomic data) -> end-user access and applications
Alternative DW Architectures
(d) Centralized Data Warehouse Architecture
Source systems -> staging area (ETL) -> normalized relational warehouse (atomic/some summarized data) -> end-user access and applications
(e) Federated Architecture
Existing data warehouses, data marts, and legacy systems -> data mapping/metadata -> logical/physical integration of common data elements -> end-user access and applications
Which Architecture is the Best?
Bill Inmon versus Ralph Kimball
Enterprise DW versus Data Marts approach
Empirical study by Ariyachandra and Watson (2006)
Data Warehousing Architectures
Ten factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management's information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
Enterprise Data Warehouse (by Teradata Corporation)
Data Integration and the Extraction, Transformation, and Load (ETL) Process
Data integration
Integration that comprises three major processes: data access, data federation, and change capture.
Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from source systems into a data warehouse
Enterprise information integration (EII)
An evolving tool space that promises real-time data integration from a variety of sources
Service-oriented architecture (SOA)
A new way of integrating information systems
Data Integration and the Extraction, Transformation, and Load (ETL) Process
[Figure: extraction, transformation, and load (ETL) process. Sources (packaged application, legacy system, other internal applications, transient data source) pass through the Extract, Transform, Cleanse, and Load steps into the data warehouse and data mart.]
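The Extract, Transform, Cleanse, and Load steps named on this slide can be sketched as a small pipeline. This is a minimal illustration only: the source records, field names, and cleansing rule are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical records from two source systems (names and fields are assumptions).
legacy_rows = [{"cust": " alice ", "sales": "100.50"}, {"cust": "BOB", "sales": "n/a"}]
packaged_rows = [{"customer": "Carol", "amount": 75}]


def extract():
    """Extract: pull raw records from each source into one common shape."""
    for row in legacy_rows:
        yield {"customer": row["cust"], "amount": row["sales"]}
    for row in packaged_rows:
        yield {"customer": row["customer"], "amount": row["amount"]}


def transform(record):
    """Transform: standardize the name format and convert the amount to float."""
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        amount = None  # unparseable value, handled in the cleanse step
    return {"customer": record["customer"].strip().title(), "amount": amount}


def cleanse(records):
    """Cleanse: drop records whose amount could not be parsed."""
    return [r for r in records if r["amount"] is not None]


def load(records, conn):
    """Load: write the cleansed records into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(cleanse([transform(r) for r in extract()]), warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the stages on the slide; a real ETL tool would add scheduling, logging, and metadata capture on top.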
ETL
Issues affecting the purchase of an ETL tool
Data transformation tools are expensive
Data transformation tools may have a long learning curve
Important criteria in selecting an ETL tool
Ability to read from and write to an unlimited number of data sources/architectures
Automatic capturing and delivery of metadata
A history of conforming to open standards
An easy-to-use interface for the developer and the functional user
Benefits of DW
Direct benefits of a data warehouse
Allows end users to perform extensive analysis
Allows a consolidated view of corporate data
Better and more timely information
Enhanced system performance
Simplification of data access
Indirect benefits of a data warehouse
Enhance business knowledge
Present competitive advantage
Enhance customer service and satisfaction
Facilitate decision making
Help in reforming business processes
Data Warehouse Development
Data warehouse development approaches
Inmon Model: EDW approach (top-down)
Kimball Model: Data mart approach (bottom-up)
Which model is best?
There is no one-size-fits-all strategy for DW
One alternative is the hosted warehouse
Data warehouse structure: The Star Schema vs. Relational
Real-time data warehousing?
DW Development Approaches
See Table 8.3 for details
(Inmon Approach) (Kimball Approach)
DW Structure: Star Schema (a.k.a. Dimensional Modeling)
[Figure: star schema example for an automobile insurance data warehouse. A central Claim Information table is surrounded by the Driver, Automotive, Time, and Location dimensions.]
Dimensions:
How data will be sliced/diced (e.g., by location, time period, type of automobile or driver)
Facts:
Central table that contains (usually summarized) information; also contains foreign keys to access each dimension table.
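A minimal sketch of a star schema for the automobile insurance example, using Python's built-in sqlite3 module. Only two of the four dimensions (driver and location) are modeled to keep it short; all table names, column names, and sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: the axes along which claims are sliced and diced.
    CREATE TABLE dim_driver (driver_id INTEGER PRIMARY KEY, age_group TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: the measure plus a foreign key to each dimension table.
    CREATE TABLE fact_claim (
        claim_id INTEGER PRIMARY KEY,
        driver_id INTEGER REFERENCES dim_driver(driver_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        claim_amount REAL
    );

    INSERT INTO dim_driver VALUES (1, 'under 25'), (2, '25 and over');
    INSERT INTO dim_location VALUES (1, 'Jakarta'), (2, 'Bandung');
    INSERT INTO fact_claim VALUES
        (1, 1, 1, 500.0), (2, 1, 2, 250.0), (3, 2, 1, 100.0);
    """
)

# Join the fact table to one dimension and aggregate by a dimension attribute.
claims_by_city = conn.execute(
    """
    SELECT l.city, SUM(f.claim_amount)
    FROM fact_claim AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    GROUP BY l.city
    ORDER BY l.city
    """
).fetchall()
print(claims_by_city)
```

The same query pattern works for any other dimension: join the fact table on that dimension's key and group by the attribute of interest.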
Dimensional Modeling
Data cube
A two-dimensional, three-dimensional, or higher-dimensional object in which each dimension of the data represents a measure of interest
- Grain
- Drill-down
- Slicing
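Grain, drill-down, and slicing can be illustrated on a tiny in-memory data set. The sketch below uses plain Python rather than an OLAP engine; the sample records and the roll_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records: (year, quarter, region, sales amount).
facts = [
    (2011, "Q1", "East", 100),
    (2011, "Q2", "East", 150),
    (2011, "Q1", "West", 80),
    (2012, "Q1", "East", 120),
]


def roll_up(records, key):
    """Aggregate the sales measure at the grain chosen by `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += record[3]
    return dict(totals)


# Coarse grain: total sales per year.
by_year = roll_up(facts, key=lambda r: r[0])

# Drill-down: move from the year level to the finer (year, quarter) level.
by_year_quarter = roll_up(facts, key=lambda r: (r[0], r[1]))

# Slicing: fix one dimension (region == "East") and keep the remaining ones.
east_slice = [r for r in facts if r[2] == "East"]
east_by_year = roll_up(east_slice, key=lambda r: r[0])

print(by_year, by_year_quarter, east_by_year)
```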
Best Practices for Implementing DW
The project must fit with corporate strategy
There must be complete buy-in to the project
It is important to manage user expectations
The data warehouse must be built incrementally
Adaptability must be built in from the start
The project must be managed by both IT and business professionals (a business-supplier relationship must be developed)
Only load data that have been cleansed and are of high quality
Do not overlook training requirements
Be politically aware.
Risks in Implementing DW
No mission or objective
Quality of source data unknown
Skills not in place
Inadequate budget
Lack of supporting software
Source data not understood
Weak sponsor
Users not computer literate
Political problems or turf wars
Unrealistic user expectations
(Continued)
Risks in Implementing DW (Cont.)
Architectural and design risks
Scope creep and changing requirements
Vendors out of control
Multiple platforms
Key people leaving the project
Loss of the sponsor
Too much new technology
Having to fix an operational system
Geographically distributed environment
Team geography and language culture
Things to Avoid for Successful Implementation of DW
Starting with the wrong sponsorship chain
Setting expectations that you cannot meet
Engaging in politically naive behavior
Loading the warehouse with information just because it is available
Believing that data warehousing database design is the same as transactional DB design
Choosing a data warehouse manager who is technology oriented rather than user oriented
(see more on page 356)
Real-time DW (a.k.a. Active Data Warehousing)
Enabling real-time data updates for real-time analysis and real-time decision making is growing rapidly
Push vs. Pull (of data)
Concerns about real-time BI
Not all data should be updated continuously
Mismatch of reports generated minutes apart
May be cost prohibitive
May also be infeasible
Evolution of DSS & DW
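The push vs. pull distinction on the slide above can be sketched in a few lines. This is an illustration under stated assumptions: the SourceSystem and Warehouse classes are hypothetical, the "active" warehouse is assumed to receive pushed changes, and the "traditional" one is assumed to pull in periodic batches.

```python
class SourceSystem:
    """Hypothetical operational system that records transactions."""

    def __init__(self):
        self.transactions = []
        self.subscribers = []

    def record(self, amount):
        self.transactions.append(amount)
        # Push: the source notifies subscribers as soon as data changes.
        for callback in self.subscribers:
            callback(amount)


class Warehouse:
    """Hypothetical warehouse that can be fed by push or by pull."""

    def __init__(self):
        self.rows = []
        self.pulled_upto = 0

    def on_push(self, amount):
        """Push path: accept a single change sent by the source."""
        self.rows.append(amount)

    def pull(self, source):
        """Pull path: fetch everything recorded since the last pull."""
        new_rows = source.transactions[self.pulled_upto:]
        self.rows.extend(new_rows)
        self.pulled_upto = len(source.transactions)


source = SourceSystem()
active_dw = Warehouse()        # updated by push
traditional_dw = Warehouse()   # updated by periodic pull
source.subscribers.append(active_dw.on_push)

source.record(10)
source.record(20)
pushed_before_pull = list(active_dw.rows)
pulled_before_pull = list(traditional_dw.rows)

traditional_dw.pull(source)    # e.g. a scheduled batch load
pulled_after_pull = list(traditional_dw.rows)
print(pushed_before_pull, pulled_before_pull, pulled_after_pull)
```

Until the batch pull runs, the two warehouses disagree. A report run just before a load can differ in the same way from one run just after it, which is one way to picture the "mismatch of reports generated minutes apart" concern.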
Evolution of DSS & DW
Active Data Warehousing (by Teradata Corporation)
Comparing Traditional and Active DW
Data Warehouse Administration
Due to its huge size and its intrinsic nature, a DW requires especially strong monitoring in order to sustain its efficiency, productivity, and security.
The successful administration and management of a data warehouse entails skills and proficiency that go past what is required of a traditional database administrator.
Requires expertise in high-performance software, hardware, and networking technologies
DW Scalability and Security
Scalability
The main issues pertaining to scalability:
The amount of data in the warehouse
How quickly the warehouse is expected to grow
The number of concurrent users
The complexity of user queries
Good scalability means that queries and other data-access functions will grow linearly with the size of the warehouse
Security
Emphasis on security and privacy
BI / OLAP Portal for Learning
MicroStrategy, and much more
www.TeradataStudentNetwork.com
Password: [**Keyword**]
http://www.teradatastudentnetwork.com/