7/28/2019 134294817 Data Warehousing
17 July 2013
Contents
Data Warehouse Concepts
Data Warehouse Architectures
Data Modeling Approaches
Data Modeling Development Cycle
Data Warehouse Concepts
Data Warehouse Concepts - Agenda
A. What is a Data Warehouse (DW)?
B. What are the components of a DW?
C. What are the various architectures/formats of a DW?
D. Examples of Data Warehousing tools in use
Need for Data Warehousing - Business View
Customer Centricity
Single view of each customer and his/her activities
Integrated information from heterogeneous sources
Adaptability to rapidly changing business needs
Multiple ways to view business performance
Low cycle time, faster analytics
Increased Global Competition
Crunch more and more data, faster and faster
Mergers and Acquisitions
With each acquisition comes another set of disparate IT systems affecting consistency and performance
OLTP vs. DSS: A comparison
OLTP Environment (get data IN) | DSS Environment (get information OUT)
large volumes of simple transaction queries | small number of diverse queries
continuous data changes | periodic updates only
low processing time | high processing time
mode of processing | mode of discovery
transaction details | subject oriented - summaries
data inconsistency | data consistency
mostly current data | historical data is relevant
high concurrent usage | low concurrent usage
highly normalized data structure | fewer tables, but more columns per table
static applications | dynamic (ad-hoc) applications
automates routines | facilitates creativity
Data Warehouse Defined
A Data Warehouse is a
Subject-Oriented
Integrated
Time-Variant
Non-volatile
collection of data enabling management decision making.
Subject Orientation
Transactional Storage (Process Oriented): Entry (Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, Mail Address)
Data Warehouse Storage (Subject Oriented): Sales, Customers, Products
Data Volatility
Transactional Storage (Volatile): record-by-record data manipulation (Insert, Change, Delete, Access)
Data Warehouse Storage (Non-Volatile): mass load / access of data (Load, Access)
Time Variance
Transactional Storage: Current Data
Data Warehouse Storage: Historical Data
[Chart: Sales (in lakhs) by region (East, West, North) for January, February and March of Year 97, 1st Qtr; axis scale 0 to 20]
Data Warehouse Components
[Figure: components shown are the Legacy System and source feeds FS1, FS2, ..., FSn; Extraction and Transmission over the Network; a Staging Area with Cleansing, Transformation, Aggregation and Summarization; the ODS and DW; Data Mart Population into DM1, DM2, ..., DMn; OLAP Analysis and Knowledge Discovery; and a Metadata Layer]
Data Warehouse Build Lifecycle
Data extraction
Data Cleansing and Transformation
Data Load and refresh
Build derived data and views
Service queries
Administer the warehouse
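The lifecycle steps above can be sketched end to end on a toy data set. This is only an illustration, not a prescribed implementation: it uses Python's built-in sqlite3 module, and the sample rows, the `sales` table and the `sales_by_region` view are assumptions made for the example. Warehouse administration is not covered.

```python
import sqlite3

# Hypothetical rows "extracted" from an operational system (assumed sample data).
SOURCE_ROWS = [
    {"region": " east ", "amount": "120.50"},
    {"region": "WEST", "amount": "80"},
    {"region": "East", "amount": "not-a-number"},  # dirty row, rejected below
    {"region": "west", "amount": "19.50"},
]


def extract():
    """Data extraction: read rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def cleanse_and_transform(rows):
    """Cleansing and transformation: tidy region names, parse amounts, drop bad rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        clean.append((row["region"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Data load: mass-insert the cleansed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)


def build_derived(conn):
    """Derived data and views: a summary view over the detail table."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )


def run_pipeline():
    """Run extract, cleanse, load and derive, then service one query."""
    conn = sqlite3.connect(":memory:")
    load(conn, cleanse_and_transform(extract()))
    build_derived(conn)
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Each function corresponds to one lifecycle step, so a refresh would amount to running the extract, cleanse and load functions again on new source rows.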
Data Warehouse Architectures
Virtual Data Warehouse
Enterprise Data Warehouse
Distributed Data Marts
Multi-tiered warehouse
Virtual Data Warehouse
[Figure: components shown are Operational Systems Data (Legacy, Client/Server, OLTP Application, External), a Reporting Tool, and Users]
Enterprise Data Warehouse
[Figure: components shown are Operational Systems Data (Legacy, OLTP, External, Client/Server); Data Preparation (Select, Extract, Transform, Integrate, Maintain); the Data Warehouse with its Metadata Repository; an API; a Reporting Tool; and Users]
Distributed Data Marts
[Figure: components shown are Operational Systems Data (Legacy, OLTP, External, Client/Server); Data Preparation (Select, Extract, Transform, Integrate, Maintain); three Data Marts; an API; a Reporting Tool; and Users]
Multi-tiered Data Warehouse: Option 1
[Figure: components shown are Operational Systems (Legacy, Client/Server, OLTP, External); the Select, Extract, Transform, Integrate and Maintain steps; the Data Warehouse holding enterprise-wide data, with its Metadata Repository; three Data Marts; an API; a Reporting Tool; and Users]
Multi-tiered Data Warehouse: Option 2
[Figure: components shown are Operational Systems Data (Legacy, OLTP, External, Client/Server); Data Preparation (Select, Extract, Transform, Integrate, Maintain); three Data Marts; the Data Warehouse with its Metadata Repository; an API; a Reporting Tool; and Users]
Relative Data Sizes in a Data Warehouse
Highly Summarized Data
Lightly Summarized Data
Current Detail Data
Older Detail Data
Metadata
Data Warehouse - Example
Highly summarized: Monthly sales by product for 1991-94; Monthly sales by region for 1991-94
Lightly summarized: Weekly sales by product/sub-product for 1991-94; Weekly sales by region for 1991-94
Current detail: Sales detail for 1991-94
Older detail: Sales detail for 1985-90
Metadata
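The relationship between detail data and the weekly and monthly summaries in this example can be illustrated with a short sketch. The detail rows, product names and function names below are hypothetical; the point is only that both summary levels are derived from the same detail records.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows: (sale date, product, amount). Assumed sample data.
SALES_DETAIL = [
    (date(1991, 1, 2), "Widget", 10.0),
    (date(1991, 1, 3), "Widget", 5.0),
    (date(1991, 1, 14), "Widget", 7.0),
    (date(1991, 2, 4), "Gadget", 3.0),
]


def weekly_sales_by_product(detail):
    """Lightly summarized: totals per (ISO year, ISO week, product)."""
    totals = defaultdict(float)
    for day, product, amount in detail:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week, product)] += amount
    return dict(totals)


def monthly_sales_by_product(detail):
    """Highly summarized: totals per (year, month, product)."""
    totals = defaultdict(float)
    for day, product, amount in detail:
        totals[(day.year, day.month, product)] += amount
    return dict(totals)


if __name__ == "__main__":
    print(weekly_sales_by_product(SALES_DETAIL))
    print(monthly_sales_by_product(SALES_DETAIL))
```

The monthly summary has fewer rows than the weekly one, which in turn has fewer rows than the detail, in line with the relative data sizes listed earlier.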
Building a Data Warehouse - Steps
Identify key business drivers, sponsorship, risks, ROI
Survey information needs, identify desired functionality, and define functional requirements for the initial subject area
Architect the long-term data warehousing architecture
Evaluate and finalize DW tools and technology
Conduct Proof-of-Concept
Building a Data Warehouse - Steps (cont.)
Design the target database schema
Build data mapping, extract, transformation, cleansing and aggregation/summarization rules
Build the initial data mart, using an exact subset of the enterprise data warehousing architecture, and expand to the enterprise architecture over subsequent phases
Maintain and administer the data warehouse
Representative DW Tools
Tool Category | Products
ETL Tools | ETI Extract, Informatica, IBM Visual Warehouse, Oracle Warehouse Builder
OLAP Server | Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB
OLAP Tools | Oracle Express Suite, Business Objects, Web Intelligence, SAS, Cognos PowerPlay/Impromptu, KALIDO, MicroStrategy, Brio Query, MetaCube
Data Warehouse | Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server, Red Brick
Data Mining & Analysis | SAS Enterprise Miner, IBM Intelligent Miner, SPSS/Clementine, TCS Tools
Top-Down Approach
Using the top-down approach, you can discover and draft a description of the business process. That description supplies you with concepts that will be used as a starting place. This is a functional or process-driven analysis. As the name implies, you start at the top and drill downward, increasing the level of detail in an iterative fashion. This typically needs more time for development.
Without it you may miss the following:
Assumptions everyone expects you to know
Future developments that could change your direction
Opportunities to increase the quality, usability, and accessibility of enterprise data
Top-Down Approach
With it you gain the following:
An understanding of the way things fit together, from high to low levels of detail
A sense of the political environment that may surround the data
An enhancement of your understanding of data importance
The guide to the level of detail you need for different audiences
Top-Down Approach
In top-down analysis, people are the best source of your information.
The top-down implementation can also imply more of a need for an enterprise-wide or corporate-wide data warehouse, with a higher degree of access to the data across workgroups, departments, or lines of business.
A top-down implementation can result in more consistent data definitions and the enforcement of business rules across the organization from the beginning.
However, the cost of the initial planning and design can be significant. It is a time-consuming process and can delay actual implementation, benefits, and return on investment.
Bottom-Up Approach
The bottom-up approach focuses instead on the inventory of things in a process. It implies an in-depth understanding of as much of the process as can be known at this point. Using this approach, you discover and draft a list of potential elements without regard to how they're used. The list usually consists of a mixed set of very low-level, detailed notions and high-level concepts. The trick is to aggregate them to the same level of detail. This is a data-driven analysis: you concentrate on what things are, on the parts rather than the process. As the name implies, you start at the bottom and aggregate up, increasing your level of aggregation, again in an iterative fashion.
Without it you may miss a real-world understanding of the data and how it fits together, as well as the following:
Data areas that everyone expects you to know
Bottom-Up Approach
Relationships
Fuzzy, currently undefined areas that need extra work to bring them to the same level of understanding
With it you gain the following:
An understanding of the things involved
A sense of the quality levels that may be inherent to the data
An enhancement of your understanding of data definitions
Bottom-Up Approach
In bottom-up analysis, the current environment is the best source of your information.
The bottom-up implementation approach has become the choice of many organizations, especially business management, because of the faster payback. It enables faster results because data marts have a less complex design than a global data warehouse. In addition, the initial implementation is usually less expensive in terms of hardware and other resources than deploying the global data warehouse.
Typically, the bottom-up approach is confined to a limited set of requirements and focuses on a short-term solution that delivers the reporting needs quickly.
Data Modeling Development Cycle
Conceptual Data Modeling: This data model includes all major things that need to be tracked, along with constraints. It is usually specified in terms of business requirements, forms, reports, etc.
Logical Data Modeling: This is the actual implementation of a conceptual model in a logical data model. It is usually expressed in terms of entities, attributes, relationships, and keys.
Physical Data Modeling: This is a complete model that includes all required tables, columns, relationships, database properties, and referential integrity constraints for the physical implementation.
Database Creation: DBAs instruct the data modeling tool to create SQL code from the physical data model. The SQL code is then executed on the server to create databases.
Development Cycle - Conceptual Data Modeling
A Conceptual Data Model (CDM) is the first step in constructing a data model in the top-down approach. It is a clear and accurate visual representation of the business of an organization.
In many ways it represents the user's view of the business, and it provides high-level information about the subject areas of an organization.
CDM discussion starts with the main subject area of an organization. It relies on specs, reports, forms, views, requirements, application demos, and user interactions to form a conceptual view of the business.
Development Cycle - Physical Data Modeling
Physical Data Models are used to design the internal schema of a database, depicting the data tables (derived from the logical data entities), the data columns of those tables (derived from the entity attributes), and the relationships between the tables (derived from the entity relationships).
Database performance, indexing strategy, physical storage and denormalization are important parameters of a physical model.
The transformations from logical model to physical model include imposing database rules, implementing referential integrity, super types and sub types, etc.
Once the physical data model is completed, it is forwarded to the technical teams (developer, group lead, DBA) for review.
Development Cycle - CDM, LDM, PDM comparisons
Conceptual Data Model | Logical Data Model | Physical Data Model
Provides high-level information about the subject areas and user's view of an organization. | Represents business information and defines business rules. | Represents the physical implementation of the model in a database.
Subject Areas | Entity | Table
Things to track | Attribute | Column
No Keys identified | Primary Key | Primary Key Constraint
No Keys identified | Alternate Key | Unique Constraint or Unique Index
No Rules or constraints | Rule, Functional Dependencies | Check Constraint, Default Value, User Defined constraints, referential constraints
Relationship | Relationship | Foreign Key
No Definition or comment | Definition | Comment
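The Physical Data Model column of this comparison can be made concrete with a small sketch. It uses SQLite through Python's built-in sqlite3 module purely for convenience; the `customer` and `sales_order` tables, their columns and the helper names are hypothetical, and other database systems spell some of these constraints differently.

```python
import sqlite3

# Hypothetical physical model: entities become tables, attributes become columns.
DDL = """
CREATE TABLE customer (
    customer_key INTEGER PRIMARY KEY,             -- primary key -> primary key constraint
    customer_id  TEXT NOT NULL UNIQUE,            -- alternate key -> unique constraint
    status       TEXT NOT NULL DEFAULT 'active'   -- default value
                 CHECK (status IN ('active', 'inactive'))  -- rule -> check constraint
);
CREATE TABLE sales_order (
    order_key    INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL
                 REFERENCES customer (customer_key),  -- relationship -> foreign key
    quantity     INTEGER NOT NULL CHECK (quantity > 0)
);
"""


def create_database():
    """Create the two tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(DDL)
    return conn


def violates_constraint(conn, sql, params=()):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

Inserting a second customer with an existing `customer_id`, or an order that points at a missing customer, is then rejected by the database rather than by application code.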
Development Cycle - Database Creation/Development
A physical database definition (say, DDL for DB2, or a schema for Sybase or Oracle) can be generated by entering the gathered information into a physical design tool.
The generated script must be reviewed carefully and, in all likelihood, modified to some degree, since no physical design tool generates 100 percent perfect database definitions.
The script can then be run against the database management system to define the physical environment.
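What a design tool does at this step can be approximated in a few lines. This is a deliberately simplified sketch, not the behavior of any particular product: the `PHYSICAL_MODEL` structure, the `product` table and both function names are assumptions, identifiers are not quoted or validated, and SQLite stands in for the target database management system.

```python
import sqlite3

# Hypothetical physical model, roughly as a design tool might hold it internally:
# table name -> list of (column name, SQL type, constraint text).
PHYSICAL_MODEL = {
    "product": [
        ("product_key", "INTEGER", "PRIMARY KEY"),
        ("product_id", "TEXT", "NOT NULL UNIQUE"),
        ("description", "TEXT", ""),
    ],
}


def generate_ddl(model):
    """Render one CREATE TABLE statement per table in the model."""
    statements = []
    for table, columns in model.items():
        column_defs = ",\n".join(
            f"    {name} {sql_type} {constraint}".rstrip()
            for name, sql_type, constraint in columns
        )
        statements.append(f"CREATE TABLE {table} (\n{column_defs}\n);")
    return "\n".join(statements)


def create_database(ddl):
    """Run the reviewed script against the DBMS (here, in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    return conn


if __name__ == "__main__":
    script = generate_ddl(PHYSICAL_MODEL)
    print(script)  # the point at which a human review would take place
    create_database(script)
```

Keeping generation and execution as separate functions leaves room for the manual review and modification of the script described above.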
Data Modeling for a Data Warehouse
The following data modeling techniques are commonly used:
Dimensional Modeling:
a) Star Schema (denormalized data)
b) Snowflake Schema (partially normalized data)
ER Modeling or Relational Modeling (normalized data: 1NF, 2NF, 3NF)
Pros and cons of each technique.
Star Schema
Fact Table
This table is the core of the Star Schema structure and contains the Facts or Measures available through the Data Warehouse. These Facts answer the questions of What, How Much, or How Many.
Some examples: Sales Dollars, Units Sold, Gross Profit, Expense Amount, Net Income, Unit Cost, Number of Employees, Turnover, Salary, Tenure, etc.
Star Schema
Dimension Tables
These tables describe the Facts or Measures. They contain the Attributes and may also be hierarchical. These Dimensions answer the questions of Who, What, When, or Where.
Some examples:
Day, Week, Month, Quarter, Year
Sales Person, Sales Manager, VP of Sales
Product, Product Category, Product Line
Cost Center, Unit, Segment, Business, Company
Star Schema
Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Required Data (Business Metrics or Measures), ...
Time_Dim: TimeKey, TheDate, ...
Employee_Dim: EmployeeKey, EmployeeID, ...
Product_Dim: ProductKey, ProductID, ...
Customer_Dim: CustomerKey, CustomerID, ...
Shipper_Dim: ShipperKey, ShipperID, ...
Star Schema
Particular form of a dimensional model
Central fact table containing Measures
Surrounded by one perimeter of descriptors - Dimensions
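A cut-down version of the star schema described above can be built and queried as follows. The table and key names come from the slides; the `SalesAmount` measure, the sample rows and the function names are assumptions, and SQLite (via Python's sqlite3 module) is used only because it needs no setup.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE Sales_Fact (
    TimeKey     INTEGER REFERENCES Time_Dim (TimeKey),
    ProductKey  INTEGER REFERENCES Product_Dim (ProductKey),
    SalesAmount REAL  -- assumed measure; the slides do not name specific columns
);
"""


def build_star():
    """Create the schema and load a few assumed sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Time_Dim VALUES (?, ?)",
                     [(1, "1997-01-15"), (2, "1997-02-15")])
    conn.executemany("INSERT INTO Product_Dim VALUES (?, ?)",
                     [(10, "P-10"), (20, "P-20")])
    conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?, ?)",
                     [(1, 10, 5.0), (1, 20, 2.0), (2, 10, 4.0)])
    return conn


def sales_by_month_and_product(conn):
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute(
        """
        SELECT substr(t.TheDate, 1, 7) AS month, p.ProductID, SUM(f.SalesAmount)
        FROM Sales_Fact AS f
        JOIN Time_Dim AS t ON t.TimeKey = f.TimeKey
        JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
        GROUP BY month, p.ProductID
        ORDER BY month, p.ProductID
        """
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_month_and_product(build_star()))
```

Every dimension is exactly one join away from the fact table, which is the single perimeter of descriptors mentioned above.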
Snow Flake Schema
Complex dimensions are re-normalized
Different levels or hierarchies of a dimension are kept separate
A given dimension has relationships to other levels of the same dimension
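The effect of keeping dimension levels separate can be seen in a small sketch. Here the product dimension is split so that the category level lives in its own table; `Category_Dim`, `UnitsSold`, the sample rows and the function names are all assumptions chosen for illustration.

```python
import sqlite3

SCHEMA = """
-- Hypothetical snowflaked product dimension: the category level is its own table.
CREATE TABLE Category_Dim (CategoryKey INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE Product_Dim (
    ProductKey  INTEGER PRIMARY KEY,
    ProductID   TEXT,
    CategoryKey INTEGER REFERENCES Category_Dim (CategoryKey)
);
CREATE TABLE Sales_Fact (
    ProductKey INTEGER REFERENCES Product_Dim (ProductKey),
    UnitsSold  INTEGER
);
"""


def build_snowflake():
    """Create the schema and load a few assumed sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Category_Dim VALUES (?, ?)",
                     [(1, "Hardware"), (2, "Software")])
    conn.executemany("INSERT INTO Product_Dim VALUES (?, ?, ?)",
                     [(10, "P-10", 1), (20, "P-20", 1), (30, "P-30", 2)])
    conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?)",
                     [(10, 3), (20, 4), (30, 5), (10, 1)])
    return conn


def units_by_category(conn):
    """Reaching the category level takes one more join than in a star schema."""
    return conn.execute(
        """
        SELECT c.CategoryName, SUM(f.UnitsSold)
        FROM Sales_Fact AS f
        JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
        JOIN Category_Dim AS c ON c.CategoryKey = p.CategoryKey
        GROUP BY c.CategoryName
        ORDER BY c.CategoryName
        """
    ).fetchall()


if __name__ == "__main__":
    print(units_by_category(build_snowflake()))
```

The category name is stored once per category rather than once per product, at the cost of the additional join.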
Modeling - ER Model
In ER modeling, naming entities is important for easy and clear understanding and communication. Usually, the entity name is expressed grammatically as a noun rather than a verb. The criterion for selecting an entity name is how well the name represents the characteristics and scope of the entity. In the detailed ER model, defining a unique identifier of an entity is the most critical task. These unique identifiers are called candidate keys. From them we can select the key that is most commonly used to identify the entity. It is called the primary key.
Another important concept in ER modeling is normalization. Normalization is a process for assigning attributes to entities in a way that reduces data redundancy, avoids data anomalies, provides a solid architecture for updating data, and reinforces the long-term integrity of the data model. The third normal form is usually adequate. A process for resolving many-to-many relationships is an example of normalization.
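Resolving a many-to-many relationship, mentioned above as an example of normalization, usually means introducing an associative entity between the two sides. The sketch below assumes a hypothetical case in which an order can contain many products and a product can appear on many orders; all table, column and function names are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE purchase_order (order_id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Associative entity: one row per (order, product) pair.
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES purchase_order (order_id),
    product_id INTEGER NOT NULL REFERENCES product (product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
"""


def build_orders():
    """Create the three tables and load a few assumed sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO purchase_order VALUES (?, ?)",
                     [(1, "2013-07-17"), (2, "2013-07-18")])
    conn.executemany("INSERT INTO product VALUES (?, ?)",
                     [(100, "Bolt"), (200, "Nut")])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                     [(1, 100, 5), (1, 200, 2), (2, 100, 1)])
    return conn


def products_on_order(conn, order_id):
    """List (description, quantity) for every product on one order."""
    return conn.execute(
        "SELECT p.description, l.quantity FROM order_line AS l "
        "JOIN product AS p ON p.product_id = l.product_id "
        "WHERE l.order_id = ? ORDER BY p.description",
        (order_id,),
    ).fetchall()
```

The composite primary key of `order_line` plays the role of the unique identifier for the associative entity, and each half of it is a foreign key to one of the original entities.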
Modeling - Example of ER model
[Figure: example ER model]