Date post: | 02-Apr-2018 |
Category: |
Documents |
Upload: | balasubramanian-sharma |
View: | 218 times |
Download: | 0 times |
of 64
7/27/2019 01 Introduction to DW
1/64
Systech Education
DW Concepts
Introduction to DW
Author: Deepak Natarajan &Aditya Gollapudi
7/27/2019 01 Introduction to DW
2/64
Introduction to DW Systech Education 2
Course Agenda
Session 1 What is DW?
Elements of DW
Session 2 OLTP vs. OLAP
Types of OLAP Session 3
Implementing Life Cycle
7/27/2019 01 Introduction to DW
3/64
Introduction to DW Systech Education 3
Purpose
The purpose of this module is to give an insightinto the basic concepts and terminology of data
warehousing and business intelligence
7/27/2019 01 Introduction to DW
4/64
Introduction to DW Systech Education 4
Objective
Upon completion of this chapter a participant can
Define a Data warehouse.
Understand the elements of DW.
Differentiate between OLTP and OLAP system.
Explain the types of OLAP system
Understand the implementation life cycle
7/27/2019 01 Introduction to DW
5/64
Systech Education
What is DW?
Data Warehouse
7/27/2019 01 Introduction to DW
6/64
Introduction to DW Systech Education 6
The ProblemIBM
VAX
Freds PC
Hunter #?
WLIC
WASTE
?
?
HP
SI ERS
?
7/27/2019 01 Introduction to DW
7/64
Introduction to DW Systech Education 7
The Solution
DATAWAREHOUSE
meta data
7/27/2019 01 Introduction to DW
8/64
Introduction to DW Systech Education 8
Data Warehouse
A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in
support of management's decision making process.
7/27/2019 01 Introduction to DW
9/64
Introduction to DW Systech Education 9
Data Warehouse
Subject Oriented:
Data that gives information about a particular subject
instead of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a
variety of sources and merged into a coherent whole.
7/27/2019 01 Introduction to DW
10/64
Introduction to DW Systech Education 10
Data Warehouse
Time-variant:
All data in the data warehouse is identified with a
particular time period.
Non-volatile
Data is stable in a data warehouse. More data is added
but data is never removed. This enables management togain a consistent picture of the business.
7/27/2019 01 Introduction to DW
11/64
Systech Education
Elements of DW
Source SystemStaging AreaDimension ModelData MartData Warehouse ODSEDWKimballs ApproachInmons Approach
7/27/2019 01 Introduction to DW
12/64
Introduction to DW Systech Education 12
Source System
An operational system of record whose function isto capture the transactions of the business.
Characteristics of Source System:
Priorities are uptime and availability.
Queries against source system are narrow and severely
restricted. Maintains little historical data.
7/27/2019 01 Introduction to DW
13/64
Introduction to DW Systech Education 13
Staging Area
It is a storage area and set of processes that clean,transform, combine, de-duplicate, household,
archive and prepare source data for use in the datawarehouse.
Characteristics of Staging Area:
It is layered between the source system and presentationserver.
It does not provide query and presentation services.
7/27/2019 01 Introduction to DW
14/64
Introduction to DW Systech Education 14
Dimensional Model
It is a technique for modeling data that is alternative toentity-relationship (E-R) modeling. A dimensional model
contains the same information as an E-R model butpackages the data in a symmetric format.
Components Of Dimensional Model Fact Table
Is the primary table in each dimensional model that is meant to contain
measurements of the business. Dimension Table
Each dimension is defined by its primary key that serves as the basisfor referential integrity with any given fact table.
7/27/2019 01 Introduction to DW
15/64
Introduction to DW Systech Education 15
Data Mart
A logical subset of the complete data warehouse. Adata mart is usually built for a single part of the
business and organized around single businessprocess.
Every data mart must be represented by a dimensional
model. Basis for top-down and bottom-up approach in data
warehouse.
7/27/2019 01 Introduction to DW
16/64
Introduction to DW Systech Education 16
Data Warehouse
The queryable source of data in the enterprise. Thedata warehouse is the union of all the constituent
data marts. Historical data is maintained.
Data is fed from the data staging area.
It is also frequently updated on a controlled load basis asdata is corrected, snapshots are accumulated and label arechanged.
7/27/2019 01 Introduction to DW
17/64
Introduction to DW Systech Education 17
Operational Data Store
An ODS is an integrated database of operationaldata. Its sources include legacy systems and it
contains current or near term data. An ODS maycontain 30 to 60 days of information, while a datawarehouse typically contains years of data.
7/27/2019 01 Introduction to DW
18/64
Introduction to DW Systech Education 18
Operational Data Store
An ODS is usually designed to contain low level oratomic (indivisible) data such as transactions and
prices.
Only Data warehouse contains aggregate data.
7/27/2019 01 Introduction to DW
19/64
Introduction to DW Systech Education 19
Enterprise Data Warehouse
An Enterprise Data Warehouse is a data warehousecontaining all publishable quality data of a
permanent nature collected by an organization. Thisinevitably includes historic data from multiple datasources.
7/27/2019 01 Introduction to DW
20/64
Introduction to DW Systech Education 20
Operational transaction data is usually excluded dueto its volatile nature. Enterprise data warehouses are
valuable resources, are costly to construct, andrequire a long time to evolve.
7/27/2019 01 Introduction to DW
21/64
Introduction to DW Systech Education 21
Kimballs Approach
Start with clearly defined user requirements.
Build a subject area at a time based on a star
schema. The data warehouse is the union of all the data
marts but only if the dimensions conform across allthe fact tables.
Before loading the facts and dimensions, firstconcentrate on the staging area.
7/27/2019 01 Introduction to DW
22/64
Introduction to DW Systech Education 22
Inmons Approach
Advocates normalized enterprise data warehouse.
Time variant data structures.
Dont be concerned about requirements too much
up front.
Build it and they will come.
7/27/2019 01 Introduction to DW
23/64
Systech Education
OLTP vs. OLAP
OLTPER Model
OLAPER Model
Relationship between ER Model and Dimension ModelAdvantages of Dimension Modeling
7/27/2019 01 Introduction to DW
24/64
Introduction to DW Systech Education 24
OLTP - ER Model
Logical design technique that seeks to eliminatedata redundancy
conceptual data model that views the real world asentities and relationships
Entity-Relationship diagram is used to visually
represents data objects
7/27/2019 01 Introduction to DW
25/64
Introduction to DW Systech Education 25
Facts about ER Models
The ER modeling technique is a discipline used toilluminate the microscopic relationships among data
elements.
The highest art form of ER modeling is to removeall redundancy in the data.
This is immensely beneficial to transactionprocessing because transactions are made verysimple and deterministic.
7/27/2019 01 Introduction to DW
26/64
Introduction to DW Systech Education 26
Example:- The transaction of updating a customer'saddress may devolve to a single record lookup in a
customer address master table. This lookup iscontrolled by a customer address key, which definesuniqueness of the customer address record andallows an indexed lookup that is extremely fast.
It is safe to say that the success of transactionprocessing in relational databases is mostly due tothe discipline of ER modeling
7/27/2019 01 Introduction to DW
27/64
Introduction to DW Systech Education 27
7/27/2019 01 Introduction to DW
28/64
Introduction to DW Systech Education 28
OLAPDimensional Modeling
Logical design technique that seeks to present thedata in a standard framework that is intuitive and
allows for high performance access.
Every dimensional model is composed of
One table with a multi part key called the fact table
A set of smaller tables called dimension tables
7/27/2019 01 Introduction to DW
29/64
Introduction to DW Systech Education 29
7/27/2019 01 Introduction to DW
30/64
Introduction to DW Systech Education 30
The Relation between Dimensional
Modeling and Entity-RelationshipModeling
A single entity relationship diagram breaks downinto multiple fact table diagrams
ER diagrams are useful, but they are meant to be
viewed in small sections, not all at once.
7/27/2019 01 Introduction to DW
31/64
Introduction to DW Systech Education 31
OLTP OLAP
OLTP is a class of program thatfacilitates and manages transaction-oriented applications, typically for
data entry and retrieval transactions
OLAP enables a user to easily andselectively extract and view datafrom different points-of-view.
Source of data:
Operational data; OLTPs are theoriginal source of the data
Source of data:
Consolidation data;
OLAP data comes from the various
OLTP databases
7/27/2019 01 Introduction to DW
32/64
Introduction to DW Systech Education 32
OLTP OLAP
Purpose of Data:
To control and run
fundamental business tasks
Purpose of Data:
To help with planning,
problem solving, and
decision support
What the data reveals:
A snapshot of ongoing business
processes
What the data reveals:
Multi-dimensional views of variouskinds of business activities
7/27/2019 01 Introduction to DW
33/64
Introduction to DW Systech Education 33
OLTP OLAP
Inserts and Updates:
Short and fast inserts
and updates initiated
by end users
Inserts and Updates:
Short and fast inserts
and updates initiated
by end users
Queries: Relatively standardized
and simple queries returning
relatively few records
Queries: Often complex queries
involving aggregations
7/27/2019 01 Introduction to DW
34/64
Introduction to DW Systech Education 34
OLTP OLAP
Processing speed: Typically very fast Processing speed: Depends on the
amount of data involved;
batch data refreshes and
complex queries may
take many hours;query speed can be
improved by creating indexes
Space requirements: Can berelatively small
if historical data is archived
Space requirements: Larger due tothe existence of aggregationstructures and history data;
requires more indexes
than OLTP
7/27/2019 01 Introduction to DW
35/64
Introduction to DW Systech Education 35
OLTP OLAP
Database design: Highlynormalized with many tables
Database design: Typically de-normalized with fewer tables;
use of star and/or snowflake
schemas
Data Access:Very frequent access;small quantities of data peroperation;
Reading, writing,
modifying, deletion
Data Access: Moderate accessfrequency;large quantities of data;
predominantly reading
operations
7/27/2019 01 Introduction to DW
36/64
Introduction to DW Systech Education 36
OLTP OLAP
Backup and recovery: Backupreligiously; operational data iscritical to run the business,
data loss is likely to entailsignificant monetary
loss and legal liability
Backup and recovery: Instead ofregular backups,
some environments
may consider simply
reloading the OLTP data
as a recovery method
7/27/2019 01 Introduction to DW
37/64
Introduction to DW Systech Education 37
Steps Involved in Converting ER Model
to Dimensional Model The first step in converting an ER diagram to a set of DM
diagrams is to separate the ER diagram into its discrete
business processes and to model each one separately. The second step is to select those many-to-many
relationships in the ER model containing numeric andadditive nonkey facts and to designate them as fact tables.
The third step is to denormalize all of the remaining tablesinto flat tables with single-part keys that connect directly tothe fact tables. These tables become the dimension tables.
7/27/2019 01 Introduction to DW
38/64
Introduction to DW Systech Education 38
Structure of a Dimensional Model evolved
from an ER Model The master DM model of a data warehouse for a
large enterprise will consist of somewhere between
10 and 25 very similar-looking star join schemas. Each star join will have four to 12 dimension tables.
If the design has been done correctly, many of these
dimension tables will be shared from fact table tofact table.
7/27/2019 01 Introduction to DW
39/64
Introduction to DW Systech Education 39
Dimension Model and Drilling
Drilling Down means adding more dimensionattributes to the SQL answer set from within a
single star join. Drilling Up means removing more dimension
attributes from the SQL answer set within a singlestar join.
Drilling Across means linking separate fact tablestogether through the conformed (shared)dimensions.
7/27/2019 01 Introduction to DW
40/64
Introduction to DW Systech Education 40
Advantages of Dimensional Modeling
predictable, standard framework
predictable framework of the star join schema
withstands unexpected changes in user behavior
extensible to accommodate unexpected new dataelements and new design decisions
administrative utilities and software processes helpsto manage and use aggregates
7/27/2019 01 Introduction to DW
41/64
Systech Education
Types of OLAP
ROLAP
MOLAPHOLAP
7/27/2019 01 Introduction to DW
42/64
Introduction to DW Systech Education 42
Types of OLAP
ROLAP Relational OLAP. ROLAP systems store data in the
relational database. MOLAP
Multidimensional OLAP. MOLAP systems store data inthe multidimensional cubes.
HOLAP HOLAP technologies attempt to combine the advantages
of MOLAP and ROLAP.
7/27/2019 01 Introduction to DW
43/64
Introduction to DW Systech Education 43
ROLAP
Relational Viewers
SQL-
Read
SourceSystems
DataWarehouseServer
Clients
Relational Data
7/27/2019 01 Introduction to DW
44/64
Introduction to DW Systech Education 44
MOLAP
Multi Dimensional Viewers
Multi Dimensional Data
RDBMSServer
MDBMSServer
Clients
SQL-Read
7/27/2019 01 Introduction to DW
45/64
Introduction to DW Systech Education 45
7/27/2019 01 Introduction to DW
46/64
Introduction to DW Systech Education 46
HOLAP
Multi Dimensional Viewers
Multi Dimensional Data
SQL ReachTrough
RDBMSServer
MDBMSServer
Clients
7/27/2019 01 Introduction to DW
47/64
Systech Education
Implementing Life Cycle
PlanningBusiness Requirements DefinitionDimensional Modeling
Physical DesignData Staging Design and DevelopmentTechnical Architecture Design
7/27/2019 01 Introduction to DW
48/64
Systech Education
Implementing Life Cycle
(Contd..)
Product Selection and InstallationEnd User Application SpecificationEnd User Application Development
DeploymentMaintenance and GrowthProject Management
7/27/2019 01 Introduction to DW
49/64
Introduction to DW Systech Education 49
Lifecycle Approach
Successful implementation of a data warehousedepends on the appropriate integration of
numerous tasks and components. You need to coordinate the many facets of a data
warehouse and demonstrate strength across allaspects of the project for success.
The Business Dimensional Lifecycle ensures thatthe project pieces are brought together in the rightorder and at the right time.
7/27/2019 01 Introduction to DW
50/64
Introduction to DW Systech Education 50
Business Dimensional Lifecycle
ProjectPlanning
BusinessRequirementDefinition
TechnicalArchitecture
Design
DimensionalModeling
End-UserApplication
Specification
ProductSelection &Installation
PhysicalDesign
End-UserApplication
Development
Data StagingDesign &
Development
Deployment Maintenance& Growth
Project Management
Technology
Track
Data
Track
Application
Track
7/27/2019 01 Introduction to DW
51/64
Introduction to DW Systech Education 51
Project Planning
The lifecycle begins with project planning.
It addresses the definition and scoping of the data
warehouse project, including early critical tasks likereadiness assessment and business justification.
Then project planning focuses on resource and skill-levelstaffing requirements coupled with project task
assignments, duration and sequencing. Project planning is dependent on the business requirements,
as denoted by the two-way arrow between the activities.
7/27/2019 01 Introduction to DW
52/64
Introduction to DW Systech Education 52
Business Requirements Definition
A data warehouses success is greatly increased by a sound
understanding of the business end users and their analytical
requirements. The designers must understand the key factors driving the
business to effectively determine business requirements andtranslate them into design considerations.
The business requirements establish the foundation for thethree parallel tracks focused on technology, data and enduser applications.
7/27/2019 01 Introduction to DW
53/64
Introduction to DW Systech Education 53
Dimension Modeling
The definition of the business requirementsdetermines the data needed to address business
users analytical requirements. Then data models to support these analyses are
designed by the construction a matrix thatrepresents key business processes and their
dimensionality which will serve as a blueprint toensure that the data warehouse is extensible acrossthe organization over time.
7/27/2019 01 Introduction to DW
54/64
Introduction to DW Systech Education 54
Dimension Modeling (Cont.)
Coupling this data analysis with our earlierunderstanding of the business requirements, we
then develop a dimensional model with a fact tablegrain, associated dimensions, attributes, andhierarchical drill paths and facts.
The logical database design is completed with the
appropriate table structure and primary/foreign keyrelationships and also the preliminary aggregationplan is also developed.
7/27/2019 01 Introduction to DW
55/64
Introduction to DW Systech Education 55
Physical Design
Physical database design focuses on defining thephysical structures necessary to support the logical
database design. Primary elements include defining the naming
standards and setting up the database environment.
Preliminary indexing and partitioning strategies arealso determined.
7/27/2019 01 Introduction to DW
56/64
Introduction to DW Systech Education 56
Data Staging Design & Development
The data staging process has three major steps:
Extraction, Transformation and Load
The extract process always exposes data quality issues thathave been buried within the operational data store.
If not addressed properly they can significantly impact thecredibility of the data warehouse.
Also two warehouse staging processes need to be build: One for the initial population of the data warehouse.
Another for the regular, incremental loads.
7/27/2019 01 Introduction to DW
57/64
Introduction to DW Systech Education 57
Technical Architecture Design
The technical architecture design establishes theoverall architecture framework and vision.
Three factors must be considered simultaneously toestablish the data warehouse technical architecturedesign:
Business requirements, current technical environment,and planned strategic technical directions.
7/27/2019 01 Introduction to DW
58/64
Introduction to DW Systech Education 58
Product Selection and Installation
Using the technical architecture design as framework,specific architectural components such as the hardwareplatform, database management system, data staging tool,or data access tool will need to be evaluated and selected.
A standard technical evaluation process is defined alongwith specific evaluation factors for each architecturalcomponent.
Once the product has been evaluated and selected, they arethen installed and thoroughly tested to ensure end-to-endintegration with the data warehouse environment.
7/27/2019 01 Introduction to DW
59/64
Introduction to DW Systech Education 59
End User Application Specification
A set of standard end user application is usuallydefined since not all business users need ad hoc
access to the data warehouse. Application specification describe the report
template, user driven parameters, and requiredcalculations.
These specifications ensure that the developmentteam and the business users have a commonunderstanding of the applications to be delivered.
7/27/2019 01 Introduction to DW
60/64
Introduction to DW Systech Education 60
End User Application Development
The development of the end user applicationsinvolves configuring the tool metadata and
constructing the specified reports. Optimally these applications are build using an
advanced data access tool that provides significantproductivity gains for the development team.
In addition, it offers a powerful mechanism forbusiness users to easily modify existing reporttemplates.
7/27/2019 01 Introduction to DW
61/64
Introduction to DW Systech Education 61
Deployment
Deployment represents the convergence of technology,data, and end user applications accessible from the businessusers desktop.
Extensive planning is required to ensure that these puzzlepieces fit together properly.
Business users education integrating all aspects of theconvergence must be developed and delivered.
In addition, user support and communication or feedbackstrategies should be established before any business usershave access to the data warehouse.
7/27/2019 01 Introduction to DW
62/64
Introduction to DW Systech Education 62
Maintenance & Growth
Focus on the business users must continue byproviding them with ongoing support and
education. Also focus on the backroom must continue to
ensure that the processes and procedures are in
place for effective ongoing operations of the datawarehouse.
7/27/2019 01 Introduction to DW
63/64
Introduction to DW Systech Education 63
Project Management
Used to ensure that Business Dimensional Lifecycleactivities remain on track and in sync.
Activities include: Monitoring project status
Issue tracking
Change control to preserve scope boundaries
The development of a comprehensive projectcommunication plan that addresses both the businessand information systems organizations
7/27/2019 01 Introduction to DW
64/64
Guidelines for using the Business
Dimension Life Cycle Business Lifecycle diagram identifies:
High level task sequencing
Activities that should be happening concurrentlythroughout the technology data
Application tracts
Focus on sequencing and concurrency and notabsolute timelines.