8/10/2019 Introduction DWH OLAP
Decision Support, Data Warehousing, and OLAP
By Prof. Sham Navathe
Georgia Institute of Technology
(Courtesy : Prof. Anindya Datta)
Extensions by Svetlana Mansmann
University of Konstanz
Outline
Terminology: OLAP vs. OLTP
Data Warehousing Architecture
Technologies
Products
References
Decision Support and OLAP
Information technology to help the knowledge worker
(executive, manager, analyst) make faster and better decisions
What were the sales volumes by region and product category for the last year?
How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
Which orders should we fill to maximize revenues?
Will a 10% discount increase sales volume sufficiently?
Which of two new medications will result in the best outcome:
higher recovery rate & shorter hospital stay?
On-Line Analytical Processing (OLAP) is an element of
decision support systems (DSS)
Business Intelligence
Evolution
60s: Batch reports
hard to find and analyze information
inflexible and expensive, reprogram every new request
70s: Terminal-based DSS and EIS (executive information systems)
still inflexible, not integrated with desktop tools
80s: Desktop data access and analysis tools
query tools, spreadsheets, GUIs
easier to use, but only access operational databases
90s: Data warehousing with integrated OLAP engines and tools
2000s: Personalization engines and e-commerce
OLTP vs. OLAP
                     OLTP                              OLAP
User                 Clerk, IT professional            Knowledge worker
Function             Day-to-day operations             Decision support
DB design            Application-oriented (E-R based)  Subject-oriented (star, snowflake)
Data                 Current, isolated                 Historical, consolidated
View                 Detailed, flat relational         Summarized, multidimensional
Usage                Structured, repetitive            Ad hoc
Unit of work         Short, simple transaction         Complex query
Access               Read/write                        Read mostly
Operations           Index/hash on primary key         Lots of scans
# Records accessed   Tens                              Millions
# Users              Thousands                         Hundreds
DB size              100 MB-GB                         100 GB-TB
Metric               Transaction throughput            Query throughput, response
Data Warehouse
A decision support database that is maintained separately from the organization's operational databases.
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
Why Separate Data Warehouse?
Performance
Operational databases designed & tuned for known
transactions & workloads
Complex OLAP queries would degrade performance,
taxing operations
Special data organization, access & implementation
methods needed for multidimensional views & queries
Why Separate Data Warehouse?
Function
Missing data: Decision support requires historical data, which operational databases do not typically maintain
Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
Data quality: Different sources typically use
inconsistent data representations, codes, and formats
which have to be reconciled.
Data Warehousing / OLAP Market
Data Warehousing / OLAP Market
Data Warehousing Market
Data Warehousing Architecture
Three-Tier Architecture
Warehouse database server
Almost always a relational DBMS; rarely flat files
OLAP servers
Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations.
Multidimensional OLAP (MOLAP): special purpose server that directly implements multidimensional data and operations.
Clients
Query and reporting tools
Analysis tools
Data mining tools (e.g., trend analysis, prediction)
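To make the ROLAP idea concrete, the following is a minimal sketch of how a multidimensional aggregation request might be translated into a standard relational operation (a GROUP BY query). The `sales` table, its columns, the sample rows, and the `rollup` helper are all assumptions made for illustration; they do not come from the slides, and a real ROLAP server does far more than this.

```python
import sqlite3

# Hypothetical sales facts: (city, product, year, amount).
rows = [
    ("Atlanta", "Cola", 2017, 10.0),
    ("Atlanta", "Milk", 2017, 5.0),
    ("Konstanz", "Cola", 2017, 7.0),
    ("Konstanz", "Cola", 2018, 3.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (city TEXT, product TEXT, year INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

ALLOWED_DIMENSIONS = {"city", "product", "year"}


def rollup(conn, dimensions):
    """Translate 'total amount by these dimensions' into a GROUP BY query."""
    if not dimensions or not set(dimensions).issubset(ALLOWED_DIMENSIONS):
        raise ValueError("dimensions must be a non-empty subset of the known ones")
    cols = ", ".join(dimensions)  # safe: names were checked against a whitelist
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


by_city = rollup(conn, ["city"])
by_city_year = rollup(conn, ["city", "year"])
```

Here the client asks for an aggregation in terms of dimensions, and the helper (standing in for the OLAP layer) produces plain SQL for the relational engine.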
Data Warehouse vs. Data Marts
Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization.
Requires extensive business modeling
May take years to design and build
Data Marts: Departmental subsets that focus on selected subjects:
Marketing data mart: customer, products, sales.
Faster roll out, but complex integration in the long run
Virtual warehouse: views over operational DBs
Materialize some summary views for efficient query processing
Easier to build
Requires excess capacity on operational DB servers
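The difference between a virtual warehouse view and a materialized summary view can be sketched as follows. The `orders` table, the view and table names, and the sample rows are hypothetical; SQLite has no native materialized views, so a stored table created from the view stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical operational table.
conn.execute(
    "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "North", 50.0), (3, "South", 70.0)],
)

# Virtual warehouse: a view that is evaluated against operational data at query time.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# Materialized summary: a stored snapshot that has to be refreshed explicitly.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A new operational order arrives after the snapshot was taken.
conn.execute("INSERT INTO orders VALUES (4, 'South', 30.0)")

view_rows = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
mat_rows = conn.execute(
    "SELECT region, total FROM sales_by_region_mat ORDER BY region"
).fetchall()
```

The view reflects the new order immediately but places its query load on the operational server; the stored summary is cheap to read but stays stale until it is refreshed.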
Design & Operational Process
Define architecture. Do capacity planning.
Integrate DB and OLAP servers, storage and client tools.
Design warehouse schema, views.
Design physical warehouse organization: data placement, partitioning, access methods.
Connect sources: gateways, ODBC drivers, wrappers.
Design & implement scripts for data extract, load, refresh.
Define metadata and populate repository.
Design & implement end-user applications.
Roll out warehouse and applications.
Monitor the warehouse.
OLAP for Decision Support
Goal of OLAP is to support ad-hoc querying for the business analyst
Business analysts are familiar with spreadsheets
Extend spreadsheet analysis model to work with
warehouse data
Large data set
Semantically enriched to understand business terms (e.g., time,
geography)
Combined with reporting features
Multidimensional view of data is the foundation of OLAP
OLAP for Decision Support
Pivot table - a multidimensional spreadsheet
Multidimensional Data Model
Database is a set of facts (points) in a multidimensional space
A fact has a measure dimension
quantity that is analyzed, e.g., sale, budget
A set of dimensions on which data is analyzed
e.g., store, product, date associated with a sale amount
Dimensions form a sparsely populated coordinate system
Each dimension has a set of attributes
e.g., owner, city and county of store
Attributes of a dimension may be related by partial order
Hierarchy: e.g., street > county > city
Lattice: e.g., date > month > year, date > week > year
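One way to picture this model is as a sparse mapping from coordinates to measures, with helper functions that climb a dimension hierarchy. The sketch below assumes (store, product, date) coordinates and a sale amount as the measure; the store identifiers, the `store_city` attribute table, and the ISO date strings are invented for illustration.

```python
from collections import defaultdict

# Sparse cube: only coordinates that actually carry a fact are stored.
# Coordinates are (store, product, date); the measure is a sale amount.
facts = {
    ("S1", "Cola", "2018-03-01"): 10.0,
    ("S1", "Milk", "2018-03-02"): 4.0,
    ("S2", "Cola", "2018-04-01"): 6.0,
}

# A dimension attribute: each store belongs to a city (hypothetical values).
store_city = {"S1": "Atlanta", "S2": "Konstanz"}


def month_of(date):
    """Climb one level of the date hierarchy: date -> month."""
    return date[:7]


def year_of(date):
    """Climb to the top of the date hierarchy: date -> year."""
    return date[:4]


def aggregate(facts, key_fn):
    """Sum the measure over all facts that map to the same coarser key."""
    totals = defaultdict(float)
    for coords, measure in facts.items():
        totals[key_fn(coords)] += measure
    return dict(totals)


by_city_month = aggregate(facts, lambda c: (store_city[c[0]], month_of(c[2])))
by_year = aggregate(facts, lambda c: year_of(c[2]))
```

Of the 2 stores x 2 products x 3 dates that could exist, only three cells are populated, which is the sparsity the slide refers to.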
Multidimensional Data
[Figure: cube of sales volume as a function of date, city and product. Product axis: Juice, Cola, Milk, Cream. Date axis: 3/1, 3/2, 3/3, 3/4. Sample cell values: 10, 47, 30, 12.]
Sample Data Cube
[Figure: sample data cube. Degree axis: Diploma, B.Sc., M.Sc. Term axis: 1st, 2nd, 3rd, 4th. Country axis: Germany, Switzerland, U.S.A. Highlighted cell: German students in the 4th term pursuing a diploma.]
Operations in Multidimensional Data Model
Aggregation (roll-up)
dimension reduction: e.g., total sales by city
summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year
Navigation to detailed data (drill-down)
e.g., (sales - expense) by city, top 3% of cities by average income
Selection (slice) defines a subcube
e.g., sales where city = Palo Alto and date = 1/15/96
Visualization Operations (e.g., Pivot)
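The operations listed above can be sketched on a small in-memory data set. The records, the `region` attribute, and the helper names `roll_up`, `slice_` and `pivot` are assumptions for illustration and are not taken from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical detail records; 'region' is a coarser level above 'city'.
records = [
    {"city": "Palo Alto", "region": "West", "year": 1996, "sales": 5},
    {"city": "San Jose", "region": "West", "year": 1996, "sales": 7},
    {"city": "Atlanta", "region": "South", "year": 1996, "sales": 4},
    {"city": "Palo Alto", "region": "West", "year": 1997, "sales": 6},
]


def roll_up(records, dims, measure="sales"):
    """Aggregate the measure over the listed dimensions."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def slice_(records, **conditions):
    """Select the subcube whose records match every given dimension value."""
    return [r for r in records if all(r[k] == v for k, v in conditions.items())]


def pivot(totals):
    """Rotate a two-dimensional result by swapping its two key positions."""
    return {(b, a): v for (a, b), v in totals.items()}


by_city_year = roll_up(records, ["city", "year"])      # detailed level
by_region_year = roll_up(records, ["region", "year"])  # summarize up the hierarchy
by_region = roll_up(records, ["region"])               # dimension reduction
palo_alto_1996 = slice_(records, city="Palo Alto", year=1996)
by_year_region = pivot(by_region_year)
```

Drill-down is the reverse navigation: moving from `by_region_year` back to `by_city_year`, which requires that the detailed records are still available.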
A Visual Operation: Pivot (Rotate)
[Figure: the sales cube rotated. Product axis: Juice, Cola, Milk, Cream. Date axis: 3/1, 3/2, 3/3, 3/4. Sample cell values: 10, 47, 30, 12.]
Approaches to OLAP Servers
Relational OLAP (ROLAP)
Relational and Specialized Relational DBMS to store and manage warehouse data
OLAP middleware to support missing pieces
Optimize for each DBMS backend
Aggregation Navigation Logic
Additional tools and services
Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
Domain-specific enrichment
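As a rough illustration of array-based storage with direct access, the sketch below keeps every cell of a small cube in one flat list and computes a cell's position from its dimension members. The product and date members reuse the values from the earlier cube figure; the city members and the row-major layout are assumptions, and real MOLAP servers add compression and sparse-array handling on top of this idea.

```python
# Dimension members, in a fixed order that defines array positions.
products = ["Juice", "Cola", "Milk", "Cream"]
dates = ["3/1", "3/2", "3/3", "3/4"]
cities = ["Atlanta", "Konstanz"]  # hypothetical members

dims = [products, dates, cities]
index = [{member: i for i, member in enumerate(d)} for d in dims]
sizes = [len(d) for d in dims]

# One flat array holds every cell of the cube, including the empty ones.
cells = [0] * (sizes[0] * sizes[1] * sizes[2])


def offset(product, date, city):
    """Row-major position of a cell, computed without any search or join."""
    p, d, c = index[0][product], index[1][date], index[2][city]
    return (p * sizes[1] + d) * sizes[2] + c


def put(product, date, city, value):
    cells[offset(product, date, city)] = value


def get(product, date, city):
    return cells[offset(product, date, city)]


def total_for_product(product):
    """All cells of one product are contiguous, so a roll-up is a slice sum."""
    start = offset(product, dates[0], cities[0])
    return sum(cells[start:start + sizes[1] * sizes[2]])


put("Cola", "3/2", "Atlanta", 47)
put("Juice", "3/1", "Konstanz", 10)
```

The trade-off is visible even at this size: lookups are a single index computation, but all 32 cells are allocated although only two hold data.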
Relational DBMS as Warehouse Server
Schema design
Specialized scan, indexing and join techniques
Handling of aggregate views (querying and materialization)
Supporting query language extensions beyond SQL
Complex query processing and optimization
Data partitioning and parallelism
Warehouse Database Schema
ER design techniques not appropriate
Design should reflect multidimensional view
Star Schema
Snowflake Schema
Fact Constellation Schema
Example of a Star Schema
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
Date: DateKey, Date
City: CityName, State, Country
Star Schema
A single fact table and a single table for each dimension
Every fact points to one tuple in each of the dimensions and has additional attributes
Does not capture hierarchies directly
Generated keys are used for performance and maintenance reasons
Fact constellation: Multiple fact tables that share many dimension tables
Example: Projected expense and the actual expense may share dimensional tables
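A reduced version of a star schema can be sketched with SQLite as follows. Only three dimensions (product, city, date) are included, and the table names, column names and sample rows are assumptions chosen for the example rather than a definitive design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE city (city_name TEXT PRIMARY KEY, state TEXT, country TEXT);
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, date TEXT);

-- Each fact row points to exactly one row in every dimension table.
CREATE TABLE sales_fact (
    order_no    INTEGER,
    prod_no     INTEGER REFERENCES product(product_no),
    date_key    INTEGER REFERENCES date_dim(date_key),
    city_name   TEXT    REFERENCES city(city_name),
    quantity    INTEGER,
    total_price REAL
);

INSERT INTO product VALUES (1, 'Cola', 'Beverage'), (2, 'Cream', 'Dairy');
INSERT INTO city VALUES ('Atlanta', 'GA', 'USA'), ('Konstanz', 'BW', 'Germany');
INSERT INTO date_dim VALUES (1, '2018-03-01'), (2, '2018-03-02');
INSERT INTO sales_fact VALUES
    (100, 1, 1, 'Atlanta', 2, 4.0),
    (101, 1, 2, 'Konstanz', 1, 2.0),
    (102, 2, 2, 'Atlanta', 3, 9.0);
""")

# A typical star join: the fact table joined to the dimensions it is grouped by.
star_join = conn.execute("""
    SELECT c.country, p.category, SUM(f.total_price)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_no = f.prod_no
    JOIN city AS c ON c.city_name = f.city_name
    GROUP BY c.country, p.category
    ORDER BY c.country, p.category
""").fetchall()
```

Because `country` and `category` are stored directly in the flat dimension tables, the hierarchy (city, state, country) is implicit in the columns rather than represented by separate tables.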
Example of a Snowflake Schema
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, State, Country
State: StateName, Country
Snowflake Schema
Represent dimensional hierarchy directly by normalizing the dimension tables
Easy to maintain
Saves storage, but allegedly reduces the effectiveness of browsing (Kimball)
Galaxy schema: multiple fact tables with shared dimension categories
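The normalization step, and the extra join it introduces when browsing, might look like the sketch below. The flat product rows and the helper names `snowflake` and `browse` are hypothetical; the point is only to show where the storage saving and the browsing cost come from.

```python
# Denormalized (star-style) product dimension rows; values are hypothetical.
star_product = [
    {"product_no": 1, "prod_name": "Cola", "category": "Beverage",
     "category_descr": "Drinks"},
    {"product_no": 2, "prod_name": "Juice", "category": "Beverage",
     "category_descr": "Drinks"},
    {"product_no": 3, "prod_name": "Cream", "category": "Dairy",
     "category_descr": "Milk products"},
]


def snowflake(rows):
    """Split the flat dimension into a product table and a category table."""
    category = {}
    product = []
    for r in rows:
        # Each category description is now stored once instead of once per product.
        category[r["category"]] = {
            "category_name": r["category"],
            "category_descr": r["category_descr"],
        }
        product.append({
            "product_no": r["product_no"],
            "prod_name": r["prod_name"],
            "category": r["category"],
        })
    return product, list(category.values())


def browse(product, category):
    """Rebuild the flat view; this lookup is the extra join browsing now needs."""
    by_name = {c["category_name"]: c for c in category}
    return [
        {**p, "category_descr": by_name[p["category"]]["category_descr"]}
        for p in product
    ]


product_table, category_table = snowflake(star_product)
flat_again = browse(product_table, category_table)
```

Three category descriptions shrink to two stored rows, while reading a product together with its category description now goes through a second table.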
Population & Refreshing the Warehouse
Data extraction
Data cleaning
Data transformation
Convert from legacy/host format to warehouse format
Load
Sort, summarize, consolidate, compute views, check integrity, build indexes, partition
Refresh
Propagate updates from sources to the warehouse
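The stages above can be strung together in a small pipeline sketch. The legacy field names (`CUST`, `CTRY`, `AMT`), the country code mapping, and the warehouse tables are all invented for the example; a production pipeline would also handle errors, logging and incremental change detection.

```python
import sqlite3

# Hypothetical extract from a legacy source, with inconsistent representations.
legacy_rows = [
    {"CUST": " alice ", "CTRY": "DE", "AMT": "10.50"},
    {"CUST": "Bob", "CTRY": "Germany", "AMT": "4.00"},
    {"CUST": "", "CTRY": "US", "AMT": "7.25"},  # missing customer, rejected below
]

COUNTRY_CODES = {"DE": "Germany", "US": "U.S.A."}  # assumed reconciliation table


def clean(rows):
    """Data cleaning: trim names and drop rows without a customer."""
    for r in rows:
        name = r["CUST"].strip()
        if name:
            yield {**r, "CUST": name.title()}


def transform(rows):
    """Data transformation: legacy fields -> warehouse tuple with unified codes."""
    for r in rows:
        yield (r["CUST"], COUNTRY_CODES.get(r["CTRY"], r["CTRY"]), float(r["AMT"]))


def load(conn, rows):
    """Load: insert rows, build an index, and recompute the summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS sales_country ON sales (country)")
    conn.execute("DROP TABLE IF EXISTS sales_by_country")
    conn.execute(
        "CREATE TABLE sales_by_country AS "
        "SELECT country, SUM(amount) AS total FROM sales GROUP BY country"
    )


def refresh(conn, new_rows):
    """Refresh: propagate newly extracted source rows into the warehouse."""
    load(conn, transform(clean(new_rows)))


def summary(conn):
    return conn.execute(
        "SELECT country, total FROM sales_by_country ORDER BY country"
    ).fetchall()


conn = sqlite3.connect(":memory:")
refresh(conn, legacy_rows)
initial_summary = summary(conn)
refresh(conn, [{"CUST": "carol", "CTRY": "US", "AMT": "1.00"}])
refreshed_summary = summary(conn)
```

This sketch recomputes the summary from scratch on every refresh, which is the simplest choice; incremental maintenance of summaries is a separate design decision.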
Metadata Repository
Administrative metadata
source databases and their contents
gateway descriptions
warehouse schema, view & derived data definitions
dimensions, hierarchies
pre-defined queries and reports
data mart locations and contents
data partitions
data extraction, cleansing, transformation rules, defaults
data refresh and purging rules
user profiles, user groups
security: user authorization, access control
Metadata Repository (2)
Business data
business terms and definitions
ownership of data
charging policies
Operational metadata
data lineage: history of migrated data and sequence of
transformations applied
currency of data: active, archived, purged
monitoring information: warehouse usage statistics, error
reports, audit trails.
Warehouse Design Tools
Creating and managing a warehouse is hard
Development tools
defining & editing metadata repository contents (schemas, scripts, rules)
Queries and reports
Shipping metadata to and from RDBMS catalogue (e.g., Prism Warehouse Manager)
Planning & analysis tools
impact of schema changes
capacity planning
refresh performance: changing refresh rates or time windows
Warehouse Management Tools
Monitoring and reporting tools (e.g., HP Intelligent Warehouse Advisor)
which partitions, summary tables, columns are used
query execution times for summary tables, types & frequencies of roll downs
warehouse usage over time (detect peak periods)
Systems and network management tools (e.g., HP OpenView, IBM NetView, Tivoli): traffic, utilization
Exception reporting/alerting tools (e.g., DB2 Event Alerters, Information Advantage InfoAgents & InfoAlert)
runaway queries
Analysis/Visualization tools: OLAP on metadata
OLAP Tools
Existing Tools: Seagate, Brio, Cognos
Functionality:
- Choice of tables
- Allowing user to specify relationships between tables
- Use of filtering conditions
- Construction of cubes on the fly
Main Problems: Cost per license, poor semantics of aggregations across tables, performance for multiple dimension cubes
Visual OLAP Tool Tableau:
http://www.tableausoftware.com/ptour.htm