Introduction DWH OLAP

Date post: 02-Jun-2018
Upload: puneetha89
  • 8/10/2019 Introduction DWH OLAP

    1/37

    Decision Support, Data Warehousing, and OLAP

    By Prof. Sham Navathe

    Georgia Institute of Technology

    (Courtesy: Prof. Anindya Datta)

    Extensions by Svetlana Mansmann

    University of Konstanz


    Outline

    - Terminology: OLAP vs. OLTP
    - Data Warehousing Architecture
    - Technologies
    - Products
    - References


    Decision Support and OLAP

    Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions:

    - What were the sales volumes by region and product category for the last year?
    - How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
    - Which orders should we fill to maximize revenues?
    - Will a 10% discount increase sales volume sufficiently?
    - Which of two new medications will result in the best outcome: higher recovery rate & shorter hospital stay?

    On-Line Analytical Processing (OLAP) is an element of decision support systems (DSS).


    Business Intelligence


    Evolution

    - 60s: Batch reports
        - hard to find and analyze information
        - inflexible and expensive; every new request required reprogramming
    - 70s: Terminal-based DSS and EIS (executive information systems)
        - still inflexible, not integrated with desktop tools
    - 80s: Desktop data access and analysis tools
        - query tools, spreadsheets, GUIs
        - easier to use, but only access operational databases
    - 90s: Data warehousing with integrated OLAP engines and tools
    - 2000s: Personalization engines and e-commerce


    OLTP vs. OLAP

                        OLTP                              OLAP
    User                Clerk, IT professional            Knowledge worker
    Function            Day-to-day operations             Decision support
    DB design           Application-oriented (E-R based)  Subject-oriented (star, snowflake)
    Data                Current, isolated                 Historical, consolidated
    View                Detailed, flat relational         Summarized, multidimensional
    Usage               Structured, repetitive            Ad hoc
    Unit of work        Short, simple transaction         Complex query
    Access              Read/write                        Read mostly
    Operations          Index/hash on primary key         Lots of scans
    # Records accessed  Tens                              Millions
    # Users             Thousands                         Hundreds
    DB size             100 MB-GB                         100 GB-TB
    Metric              Transaction throughput            Query throughput, response


    Data Warehouse

    - A decision support database that is maintained separately from the organization's operational databases.
    - A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.


    Why Separate Data Warehouse?

    Performance

    - Operational databases are designed & tuned for known transactions & workloads
    - Complex OLAP queries would degrade performance for operational transactions
    - Special data organization, access & implementation methods are needed for multidimensional views & queries


    Why Separate Data Warehouse?

    Function

    - Missing data: Decision support requires historical data, which operational databases do not typically maintain.
    - Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
    - Data quality: Different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.


    Data Warehousing / OLAP Market


    Data Warehousing / OLAP Market


    Data Warehousing Market


    Data Warehousing Architecture


    Three-Tier Architecture

    - Warehouse database server
        - Almost always a relational DBMS; rarely flat files
    - OLAP servers
        - Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations.
        - Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations.
    - Clients
        - Query and reporting tools
        - Analysis tools
        - Data mining tools (e.g., trend analysis, prediction)
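The ROLAP idea of mapping a multidimensional operation onto standard relational operations can be sketched as follows. This is only an illustration: the `sales` table, its columns, and the `rollup_sql` helper are assumptions made up for the example, not part of any product named in these slides.

```python
import sqlite3

def rollup_sql(table, measure, dims):
    """Translate 'total <measure> by <dims>' into a relational GROUP BY query.

    Hypothetical helper: identifiers are interpolated directly, so it is
    only safe with trusted, hard-coded names as in this sketch.
    """
    cols = ", ".join(dims)
    return (f"SELECT {cols}, SUM({measure}) FROM {table} "
            f"GROUP BY {cols} ORDER BY {cols}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Atlanta", "Juice", 10), ("Atlanta", "Cola", 47),
     ("Konstanz", "Juice", 30), ("Konstanz", "Milk", 12)],
)

# "Total sales by city" becomes an ordinary aggregate query.
by_city = conn.execute(rollup_sql("sales", "amount", ["city"])).fetchall()
print(by_city)
```

A MOLAP server would answer the same request from its own multidimensional storage instead of generating SQL.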


    Data Warehouse vs. Data Marts

    - Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization.
        - Requires extensive business modeling
        - May take years to design and build
    - Data marts: departmental subsets that focus on selected subjects
        - Marketing data mart: customer, products, sales.
        - Faster roll out, but complex integration in the long run
    - Virtual warehouse: views over operational DBs
        - Materialize some summary views for efficient query processing
        - Easier to build
        - Requires excess capacity on operational DB servers
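A rough sketch of the virtual-warehouse trade-off, assuming a hypothetical operational `orders` table. SQLite has no materialized views, so a snapshot table stands in for a materialized summary view here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "East", 50.0), (3, "West", 75.0)],
)

# Virtual warehouse: a plain view, recomputed from operational data per query.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(total) AS region_total FROM orders GROUP BY region
""")

# Stand-in for a materialized summary view: a snapshot table that is cheap
# to query but stays stale until it is refreshed.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A new operational order arrives after the snapshot was taken.
conn.execute("INSERT INTO orders VALUES (4, 'West', 25.0)")

live = conn.execute(
    "SELECT * FROM sales_by_region ORDER BY region").fetchall()
snapshot = conn.execute(
    "SELECT * FROM sales_by_region_mat ORDER BY region").fetchall()
print(live, snapshot)
```

The live view always reflects the operational data but runs its aggregation on the operational server each time, which is where the excess capacity requirement comes from.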


    Design & Operational Process

    - Define architecture. Do capacity planning.
    - Integrate DB and OLAP servers, storage and client tools.
    - Design warehouse schema, views.
    - Design physical warehouse organization: data placement, partitioning, access methods.
    - Connect sources: gateways, ODBC drivers, wrappers.
    - Design & implement scripts for data extraction, load & refresh.
    - Define metadata and populate repository.
    - Design & implement end-user applications.
    - Roll out warehouse and applications.
    - Monitor the warehouse.


    OLAP for Decision Support

    - Goal of OLAP is to support ad-hoc querying for the business analyst
    - Business analysts are familiar with spreadsheets
    - Extend spreadsheet analysis model to work with warehouse data
        - Large data set
        - Semantically enriched to understand business terms (e.g., time, geography)
        - Combined with reporting features
    - Multidimensional view of data is the foundation of OLAP


    OLAP for Decision Support

    Pivot table - a multidimensional spreadsheet


    Multidimensional Data Model

    - Database is a set of facts (points) in a multidimensional space
    - A fact has a measure dimension
        - quantity that is analyzed, e.g., sale, budget
    - A set of dimensions on which data is analyzed
        - e.g., store, product, date associated with a sale amount
    - Dimensions form a sparsely populated coordinate system
    - Each dimension has a set of attributes
        - e.g., owner, city, and county of store
    - Attributes of a dimension may be related by partial order
        - Hierarchy: e.g., street > county > city
        - Lattice: e.g., date > month > year, date > week > year
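As a minimal sketch of this model, facts can be held as points in a sparse coordinate system keyed by (store, product, date), with the date hierarchy expressed as functions from a finer level to a coarser one. All store names, attribute values, and amounts below are invented for illustration.

```python
from datetime import date

# Sparse cube: only coordinates that actually have a fact are stored.
# Key = (store, product, date); value = the measure (sale amount).
facts = {
    ("Store A", "Juice", date(1996, 1, 15)): 10,
    ("Store A", "Cola", date(1996, 1, 15)): 47,
    ("Store B", "Milk", date(1996, 2, 3)): 30,
}

# Attributes of the store dimension (hypothetical values).
store_attrs = {
    "Store A": {"owner": "Owner 1", "city": "Palo Alto"},
    "Store B": {"owner": "Owner 2", "city": "Atlanta"},
}

# One path through the date lattice: date > month > year.
def month_of(d):
    return (d.year, d.month)

def year_of(d):
    return d.year

def total_by(level):
    """Sum the measure after mapping each fact's date to a coarser level."""
    out = {}
    for (_store, _product, day), amount in facts.items():
        key = level(day)
        out[key] = out.get(key, 0) + amount
    return out

by_month = total_by(month_of)
by_year = total_by(year_of)
print(by_month, by_year)
```

The other lattice path (date > week > year) would just be a different mapping function applied to the same facts.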


    Multidimensional Data

    [Figure: data cube showing sales volume as a function of date, city and product. Product axis: Juice, Cola, Milk, Cream. Date axis: 3/1, 3/2, 3/3, 3/4. Sample cell values: 10, 47, 30, 12.]


    Sample Data Cube

    [Figure: data cube with three axes. Degree: Diploma, B.Sc., M.Sc. Term: 1st, 2nd, 3rd, 4th. Country: Germany, Switzerland, U.S.A. Highlighted cell: German students in the 4th term pursuing a diploma.]


    Operations in Multidimensional Data Model

    - Aggregation (roll-up)
        - dimension reduction: e.g., total sales by city
        - summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year
    - Navigation to detailed data (drill-down)
        - e.g., (sales - expense) by city, top 3% of cities by average income
    - Selection (slice) defines a subcube
        - e.g., sales where city = Palo Alto and date = 1/15/96
    - Visualization Operations (e.g., Pivot)
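These operations can be sketched on a tiny in-memory data set. The rows, the city-to-region assignment, and the helper names are assumptions for the example only; a real OLAP server would run them against warehouse data.

```python
from collections import defaultdict

# (city, region, year, sales); all values are made up for illustration.
rows = [
    ("Palo Alto", "West", 1995, 10),
    ("Palo Alto", "West", 1996, 47),
    ("San Jose", "West", 1996, 30),
    ("Atlanta", "South", 1996, 12),
]

def roll_up(data, key):
    """Aggregate the sales measure over whatever grouping `key` extracts."""
    totals = defaultdict(int)
    for row in data:
        totals[key(row)] += row[3]
    return dict(totals)

# Roll-up by dimension reduction: drop the year, keep the city.
by_city = roll_up(rows, lambda r: r[0])

# Roll-up along the hierarchy: (city, year) -> (region, year).
by_city_year = roll_up(rows, lambda r: (r[0], r[2]))
by_region_year = roll_up(rows, lambda r: (r[1], r[2]))

# Drill-down: expand the ("West", 1996) cell back into its cities.
west_1996 = roll_up(
    [r for r in rows if r[1] == "West" and r[2] == 1996], lambda r: r[0])

# Slice: fix dimension values to select a subcube.
palo_alto_1996 = [r for r in rows if r[0] == "Palo Alto" and r[2] == 1996]

# Pivot: same cells, axes swapped from (city, year) to (year, city).
by_year_city = {(year, city): v for (city, year), v in by_city_year.items()}

print(by_region_year, west_1996)
```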


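    A Visual Operation: Pivot (Rotate)

    [Figure: the sales cube rotated so that a different pair of axes faces the viewer. Product axis: Juice, Cola, Milk, Cream. Date axis: 3/1, 3/2, 3/3, 3/4. Sample cell values: 10, 47, 30, 12.]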


    Approaches to OLAP Servers

    - Relational OLAP (ROLAP)
        - Relational and specialized relational DBMS to store and manage warehouse data
        - OLAP middleware to support missing pieces
            - Optimize for each DBMS backend
            - Aggregation navigation logic
            - Additional tools and services
    - Multidimensional OLAP (MOLAP)
        - Array-based storage structures
        - Direct access to array data structures
        - Domain-specific enrichment
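Array-based storage with direct access can be illustrated as follows: the cube lives in one flat array, and a cell's position is computed from its coordinates rather than looked up through an index or a join. The dimension members (including the two cities) and the row-major layout are assumptions for this sketch, not a description of any particular MOLAP product.

```python
# Dimension members, in a fixed order that defines the array layout.
products = ["Juice", "Cola", "Milk", "Cream"]
dates = ["3/1", "3/2", "3/3", "3/4"]
cities = ["Atlanta", "Konstanz"]  # hypothetical city members

# One dense, flat array holding every cell of the cube (row-major order).
cells = [0] * (len(products) * len(dates) * len(cities))

def offset(product, day, city):
    """Compute a cell's array position directly from its coordinates."""
    p = products.index(product)
    d = dates.index(day)
    c = cities.index(city)
    return (p * len(dates) + d) * len(cities) + c

cells[offset("Cola", "3/2", "Atlanta")] = 47
cells[offset("Juice", "3/1", "Konstanz")] = 10

print(offset("Cola", "3/2", "Atlanta"), cells[offset("Cola", "3/2", "Atlanta")])
```

The dense layout reserves space for empty cells as well, which is the cost of this direct addressing when the cube is sparsely populated.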


    Relational DBMS as Warehouse Server

    - Schema design
    - Specialized scan, indexing and join techniques
    - Handling of aggregate views (querying and materialization)
    - Supporting query language extensions beyond SQL
    - Complex query processing and optimization
    - Data partitioning and parallelism


    Warehouse Database Schema

    - ER design techniques not appropriate
    - Design should reflect multidimensional view
        - Star Schema
        - Snowflake Schema
        - Fact Constellation Schema


    Example of a Star Schema

    [Figure: star schema with a central fact table linked to six dimension tables]

    - Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
    - Order: Order No, Order Date
    - Customer: Customer No, Customer Name, Customer Address, City
    - Salesperson: SalespersonID, SalespersonName, City, Quota
    - Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
    - Date: DateKey, Date
    - City: CityName, State, Country


    Star Schema

    - A single fact table and a single table for each dimension
    - Every fact points to one tuple in each of the dimensions and has additional attributes
    - Does not capture hierarchies directly
    - Generated keys are used for performance and maintenance reasons
    - Fact constellation: multiple fact tables that share many dimension tables
        - Example: projected expense and the actual expense may share dimensional tables
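A cut-down version of such a star schema can be tried out with Python's built-in `sqlite3` module. Only two of the dimensions are modeled, and the table names, column names, and rows are simplified assumptions rather than the exact schema from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one denormalized table per dimension.
CREATE TABLE product (prod_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE city (city_name TEXT PRIMARY KEY, state TEXT, country TEXT);

-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    order_no    INTEGER,
    prod_no     INTEGER REFERENCES product(prod_no),
    city_name   TEXT REFERENCES city(city_name),
    quantity    INTEGER,
    total_price REAL
);

INSERT INTO product VALUES (1, 'Juice', 'Beverage'), (2, 'Cream', 'Dairy');
INSERT INTO city VALUES ('Palo Alto', 'CA', 'USA'), ('Atlanta', 'GA', 'USA');
INSERT INTO sales_fact VALUES
    (100, 1, 'Palo Alto', 2, 6.0),
    (101, 2, 'Palo Alto', 1, 4.0),
    (102, 1, 'Atlanta', 3, 9.0);
""")

# A typical star join: the fact table joined to each dimension it needs,
# then grouped by dimension attributes.
revenue = conn.execute("""
    SELECT c.state, p.category, SUM(f.total_price)
    FROM sales_fact AS f
    JOIN product AS p ON p.prod_no = f.prod_no
    JOIN city AS c ON c.city_name = f.city_name
    GROUP BY c.state, p.category
    ORDER BY c.state, p.category
""").fetchall()
print(revenue)
```

Note that `state` and `country` sit side by side in the `city` table: the hierarchy between them is not represented in the schema itself.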


    Example of a Snowflake Schema

    [Figure: snowflake schema; the dimension tables of the star schema are normalized into additional hierarchy tables]

    - Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
    - Order: Order No, Order Date
    - Customer: Customer No, Customer Name, Customer Address, City
    - Salesperson: SalespersonID, SalespersonName, City, Quota
    - Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
    - Category: CategoryName, CategoryDescr
    - Date: DateKey, Date, Month
    - Month: Month, Year
    - Year: Year
    - City: CityName, State, Country
    - State: StateName, Country


    Snowflake Schema

    - Represent dimensional hierarchy directly by normalizing the dimension tables
    - Easy to maintain
    - Saves storage, but it is alleged that it reduces effectiveness of browsing (Kimball)
    - Galaxy schema: multiple fact tables with shared dimension categories
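The effect of normalizing a dimension can be sketched with `sqlite3`: the city dimension is split into `city` and `state` tables, so the hierarchy is explicit, but a roll-up to state or country needs one more join. Table names, columns, and rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: each hierarchy level gets its own table.
CREATE TABLE state (state_name TEXT PRIMARY KEY, country TEXT);
CREATE TABLE city (
    city_name  TEXT PRIMARY KEY,
    state_name TEXT REFERENCES state(state_name)
);
CREATE TABLE sales_fact (
    city_name   TEXT REFERENCES city(city_name),
    total_price REAL
);

INSERT INTO state VALUES ('CA', 'USA'), ('GA', 'USA');
INSERT INTO city VALUES ('Palo Alto', 'CA'), ('San Jose', 'CA'), ('Atlanta', 'GA');
INSERT INTO sales_fact VALUES ('Palo Alto', 6.0), ('San Jose', 4.0), ('Atlanta', 9.0);
""")

# Rolling up to state level walks the hierarchy: fact -> city -> state.
by_state = conn.execute("""
    SELECT s.country, s.state_name, SUM(f.total_price)
    FROM sales_fact AS f
    JOIN city AS c ON c.city_name = f.city_name
    JOIN state AS s ON s.state_name = c.state_name
    GROUP BY s.country, s.state_name
    ORDER BY s.state_name
""").fetchall()
print(by_state)
```

Each state's country is stored once instead of once per city, which is the storage saving; the extra join is the price paid when browsing.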


    Population & Refreshing the Warehouse

    - Data extraction
    - Data cleaning
    - Data transformation
        - Convert from legacy/host format to warehouse format
    - Load
        - Sort, summarize, consolidate, compute views, check integrity, build indexes, partition
    - Refresh
        - Propagate updates from sources to the warehouse
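The extract, clean, transform, and load steps can be sketched as a small pipeline. The source rows, the gender code mapping, and the `clean` and `load` helpers are hypothetical; real warehouse loads would also handle summarization, integrity checks, partitioning, and incremental refresh.

```python
import sqlite3

# Extract: rows as they might arrive from sources with inconsistent formats.
source_rows = [
    {"cust": " Alice ", "gender": "F", "amount": "10.50"},
    {"cust": "BOB", "gender": "male", "amount": "7"},
    {"cust": "", "gender": "M", "amount": "3.00"},  # no customer name
]

# Hypothetical mapping that reconciles the sources' differing gender codes.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def clean(row):
    """Clean and transform one source row; return None to reject it."""
    name = row["cust"].strip().title()
    if not name:
        return None
    return {
        "cust": name,
        "gender": GENDER_CODES[row["gender"].lower()],
        "amount": float(row["amount"]),
    }

def load(conn, rows):
    """Load cleaned rows into the warehouse table and build an index."""
    conn.executemany("INSERT INTO sales VALUES (:cust, :gender, :amount)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_cust ON sales (cust)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (cust TEXT, gender TEXT, amount REAL)")

cleaned = [r for r in map(clean, source_rows) if r is not None]
load(conn, cleaned)

loaded = conn.execute(
    "SELECT cust, gender, amount FROM sales ORDER BY cust").fetchall()
print(loaded)
```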


    Metadata Repository

    - Administrative metadata
        - source databases and their contents
        - gateway descriptions
        - warehouse schema, view & derived data definitions
        - dimensions, hierarchies
        - pre-defined queries and reports
        - data mart locations and contents
        - data partitions
        - data extraction, cleansing, transformation rules, defaults
        - data refresh and purging rules
        - user profiles, user groups
        - security: user authorization, access control


    Metadata Repository .. 2

    - Business data
        - business terms and definitions
        - ownership of data
        - charging policies
    - Operational metadata
        - data lineage: history of migrated data and sequence of transformations applied
        - currency of data: active, archived, purged
        - monitoring information: warehouse usage statistics, error reports, audit trails.


    Warehouse Design Tools

    - Creating and managing a warehouse is hard
    - Development tools
        - defining & editing metadata repository contents (schemas, scripts, rules)
        - Queries and reports
        - Shipping metadata to and from RDBMS catalogue (e.g., Prism Warehouse Manager)
    - Planning & analysis tools
        - impact of schema changes
        - capacity planning
        - refresh performance: changing refresh rates or time windows


    Warehouse Management Tools

    - Monitoring and reporting tools (e.g., HP Intelligent Warehouse Advisor)
        - which partitions, summary tables, columns are used
        - query execution times for summary tables, types & frequencies of roll downs
        - warehouse usage over time (detect peak periods)
    - Systems and network management tools (e.g., HP OpenView, IBM NetView, Tivoli): traffic, utilization
    - Exception reporting/alerting tools (e.g., DB2 Event Alerters, Information Advantage InfoAgents & InfoAlert)
        - runaway queries
    - Analysis/Visualization tools: OLAP on metadata


    OLAP Tools

    Existing Tools: Seagate, Brio, Cognos

    Functionality:

    - Choice of tables
    - Allowing the user to specify relationships between tables
    - Use of filtering conditions
    - Construction of cubes on the fly

    Main Problems: cost per license, poor semantics of aggregations across tables, performance for multiple-dimension cubes

    Visual OLAP Tool: Tableau, http://www.tableausoftware.com/ptour.htm
