
OLAP and Data Mining

Date post: 03-Jun-2018
Upload: asad-rao
  • 8/11/2019 o Lap and Mining

    CS561 - SPRING 2012

    WPI, MOHAMED ELTABAKH

    OLAP & DATA MINING

    Online Analytic Processing (OLAP)

    OLAP

    OLAP: Online Analytic Processing

    OLAP queries are complex queries that

    Touch large amounts of data

    Discover patterns and trends in the data

    Typically expensive queries that take a long time

    Also called decision-support queries

    In contrast to OLAP, OLTP: Online Transaction Processing

    OLTP queries are simple queries, e.g., over banking or airline systems

    OLTP queries touch a small amount of data for fast transactions

    OLTP vs. OLAP

    On-Line Transaction Processing (OLTP):

    technology used to perform updates on operational or transactional systems (e.g., point-of-sale systems)

    On-Line Analytical Processing (OLAP):

    technology used to perform complex analysis of the data in a data warehouse

    OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user. [source: OLAP Council: www.olapcouncil.org]

    OLAP AND DATA WAREHOUSE

    [Figure: data warehouse architecture. Labels: Internal Sources, External Sources, Operational DBs, Data Integration Component, Data Warehouse, Metadata, OLAP Server, Query and Analysis Component, Client Tools (OLAP, Reports, Data Mining)]

    OLAP AND DATA WAREHOUSE

    Typically, OLAP queries are executed over a separate copy of the working data

    Over the data warehouse

    The data warehouse is periodically updated, e.g., overnight

    OLAP queries tolerate such out-of-date gaps

    Why run OLAP queries over a data warehouse?

    The warehouse collects and combines data from multiple sources

    The warehouse may organize the data in certain formats to support OLAP queries

    OLAP queries are complex and touch large amounts of data

    They may lock the database for long periods of time

    This negatively affects all other OLTP transactions

    OLAP ARCHITECTURE

    EXAMPLE OLAP APPLICATIONS

    Market Analysis

    Find which items are frequently sold over the summer but not over the winter

    Credit Card Companies

    Given a new applicant, is (s)he credit-worthy?

    Need to check other similar applicants (age, gender, income, etc.) and observe how they perform, then make a prediction for the new applicant

    OLAP queries are also called decision-support queries

    MULTI-DIMENSIONAL VIEW

    Data is typically viewed as points in multi-dimensional space

    Typical OLAP applications have many dimensions

    [Figure: raw data cube (raw level, without aggregation). Axes: Time (3/1, 3/2, 3/3, 3/4), Items (Milk 1% fat, Milk 2% fat, Orange juice, Bread), Location (NY, MA, CA). Sample cell values: 10, 47, 30, 12]

    ANOTHER EXAMPLE

    APPROACHES FOR OLAP

    Relational OLAP (ROLAP)

    Multi-dimensional OLAP (MOLAP)

    Hybrid OLAP (HOLAP) = ROLAP + MOLAP

    RELATIONAL OLAP: ROLAP

    Data are stored in the relational model (tables)

    Special schema called Star Schema

    One relation is the fact table, all the others are dimension tables

    [Figure: star schema. A large fact table (Facts: Product, Region, Time, Channel, Revenue, Expenses, Units) surrounded by small dimension tables. Dimension labels: Product (Model, Type, Color), Time (Week, Year), Region (Nation, District, Dealer), Channel]
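As a rough illustration of the star schema idea, the sketch below builds a tiny fact table with two dimension tables using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions made for illustration; they are not taken from the slides.

```python
import sqlite3

def build_star_schema():
    """Create an in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables (small): describe products and regions.
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
        CREATE TABLE region  (region_id  INTEGER PRIMARY KEY, nation TEXT, district TEXT);
        -- Fact table (large): one row per sale, with foreign keys to the dimensions.
        CREATE TABLE facts (
            product_id INTEGER REFERENCES product(product_id),
            region_id  INTEGER REFERENCES region(region_id),
            revenue REAL,
            units INTEGER
        );
        -- Hypothetical sample data.
        INSERT INTO product VALUES (1, 'M1', 'sedan', 'red'), (2, 'M2', 'truck', 'blue');
        INSERT INTO region  VALUES (1, 'USA', 'East'), (2, 'USA', 'West');
        INSERT INTO facts   VALUES (1, 1, 100.0, 2), (1, 2, 50.0, 1), (2, 1, 300.0, 3);
    """)
    return conn

def revenue_by_color(conn):
    """Join the fact table with the product dimension and sum revenue per color."""
    rows = conn.execute("""
        SELECT p.color, SUM(f.revenue)
        FROM facts f JOIN product p ON f.product_id = p.product_id
        GROUP BY p.color
    """).fetchall()
    return dict(rows)

if __name__ == "__main__":
    print(revenue_by_color(build_star_schema()))
```

The fact table holds only keys and measures, so a typical query joins it with one or more dimension tables and aggregates a measure, as `revenue_by_color` does.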

    CUBE vs. STAR SCHEMA

    Data inside the cube are the fact records

    Dimension tables describe the dimensions

    [Figure: the star schema (fact table with its dimension tables) shown next to the data cube with axes Time, Items, Location]

    SLICING & DICING

    Dicing

    How each dimension in the cube is divided

    Different granularities

    When building the data cube

    Slicing

    Selecting slices of the data cube to answer the OLAP query

    When answering a query

    [Figure: data cube with axes Time, Items, Location. Time is diced by day (3/1, 3/2, 3/3, 3/4); Location is diced by state (NY, MA, CA)]

    SLICING & DICING: EXAMPLE 1

    [Figure: two panels, Dicing and Slicing]

    Slicing operation in ROLAP is basically:

    -- Selection conditions on some attributes (WHERE clause) +

    -- Group by and aggregation
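The "WHERE clause plus GROUP BY" description of slicing can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

def make_sales():
    """Tiny fact table; rows are (day, item, state, units). Values are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, item TEXT, state TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("3/1", "bread", "NY", 10), ("3/1", "bread", "MA", 4),
        ("3/1", "milk", "NY", 7), ("3/2", "bread", "NY", 5),
        ("3/2", "milk", "CA", 12),
    ])
    return conn

def slice_state(conn, state):
    """Slice: fix Location = state (WHERE), then sum units per item (GROUP BY)."""
    rows = conn.execute(
        "SELECT item, SUM(units) FROM sales WHERE state = ? GROUP BY item",
        (state,),
    ).fetchall()
    return dict(rows)

if __name__ == "__main__":
    print(slice_state(make_sales(), "NY"))
```

Here the selection condition picks one slice of the cube (a single state), and the aggregation collapses the Time dimension inside that slice.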

    SLICING & DICING: EXAMPLE 2

    SLICING & DICING: EXAMPLE 3

    DRILL-DOWN & ROLL-UP

    Region          Sales variance
    Africa          105%
    Asia            57%
    Europe          122%
    North America   97%
    Pacific         85%
    South America   163%

    Drill-down (group by Nation)

    Nation          Sales variance
    China           123%
    Japan           52%
    India           87%
    Singapore       95%

    Roll-up (group by Region)

    ROLAP: DRILL-DOWN & ROLL-UP

    [Figure: two panels, Drill-down and Roll-up]
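In a relational setting, roll-up and drill-down can be read as grouping the same fact rows at a coarser or finer level. The sketch below assumes a hypothetical `sales` table with `region`, `nation`, and `amount` columns; the data are invented and do not reproduce the sales-variance figures on the previous slide.

```python
import sqlite3

def make_sales():
    """Hypothetical fact rows: (region, nation, amount)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, nation TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("Asia", "China", 30), ("Asia", "Japan", 20), ("Asia", "China", 10),
        ("Europe", "France", 25), ("Europe", "Spain", 15),
    ])
    return conn

def roll_up(conn):
    """Coarser view: total amount per region (GROUP BY region)."""
    return dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

def drill_down(conn, region):
    """Finer view: total amount per nation inside one region (GROUP BY nation)."""
    return dict(conn.execute(
        "SELECT nation, SUM(amount) FROM sales WHERE region = ? GROUP BY nation",
        (region,),
    ))

if __name__ == "__main__":
    conn = make_sales()
    print(roll_up(conn))
    print(drill_down(conn, "Asia"))
```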

    MOLAP

    Unlike ROLAP, in MOLAP data are stored in special structures called Data Cubes (array-based storage)

    Data cubes pre-compute and aggregate the data

    Possibly several data cubes with different granularities

    Data cubes are aggregated materialized views over the data

    As long as the data does not change frequently, the overhead of data cubes is manageable

    [Figure: a Sales cube (labels: 1996, 1997, Red blob, Blue blob) with two example granularities: "every day, every item, every city" and "every week, every item category, every city"]

    MOLAP: CUBE OPERATOR

    [Figure: CUBE operator. Panels: raw data (fact table); aggregation over the X axis; aggregation over the Y axis; aggregation over the Z axis; aggregation over X,Y]
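One way to picture the CUBE operator is as the union of group-bys over every subset of the dimensions. The following is a minimal sketch under that reading, using plain Python dictionaries rather than array storage; the `ALL` marker, the function name, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker meaning "this dimension has been aggregated out"

def cube(rows, n_dims):
    """Sum the measure for every subset of dimensions (2**n_dims group-bys).

    rows: list of tuples (d1, ..., dn, measure).
    Returns {cell_key: total}; aggregated-out dimensions appear as ALL.
    """
    result = defaultdict(int)
    dims = range(n_dims)
    for k in range(n_dims + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Keep the values of the chosen dimensions, collapse the rest.
                key = tuple(row[i] if i in kept else ALL for i in dims)
                result[key] += row[n_dims]
    return dict(result)

if __name__ == "__main__":
    # Hypothetical (day, item, state, units) facts.
    facts = [("3/1", "bread", "NY", 10), ("3/1", "milk", "NY", 7), ("3/2", "bread", "MA", 5)]
    for cell, total in sorted(cube(facts, 3).items()):
        print(cell, total)
```

The cell with every dimension set to `ALL` holds the grand total, and the cells with no `ALL` are the raw fact records.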

    MOLAP & ROLAP

    Commercial offerings of both types are available

    In general, MOLAP is good for smaller warehouses and is optimized for canned queries

    In general, ROLAP is more flexible and leverages relational technology

    ROLAP may pay a performance penalty to realize flexibility

    OLAP: SUMMARY

    OLAP stands for Online Analytic Processing and is used in decision support systems

    Usually runs on a data warehouse

    In contrast to OLTP, OLAP queries are complex, touch large amounts of data, and try to discover patterns or trends in the data

    OLAP Models

    Relational (ROLAP): uses relational star schema

    Multidimensional (MOLAP): uses data cubes

    Overview of Data Mining Techniques

    DATA MINING vs. OLAP

    OLAP - Online Analytical Processing

    Provides you with a very good view of what is happening, but cannot predict what will happen in the future or why it is happening

    Data Mining is a combination of discovery techniques + prediction techniques

    DATA MINING TECHNIQUES

    Clustering

    Classification

    Association Rules

    Frequent Itemsets

    Outlier Detection

    ...

    FREQUENT ITEMSET MINING

    Very common problem in Market-Basket applications

    Given a set of items I = {milk, bread, jelly, ...}

    Given a set of transactions where each transaction contains a subset of items

    t1 = {milk, bread, water}

    t2 = {milk, nuts, butter, rice}

    What are the itemsets frequently sold together?

    % of transactions in which the itemset appears >= α

    EXAMPLE

    Assume α = 60%, what are the frequent itemsets?

    {Bread} → 80%

    {PeanutButter} → 60%

    {Bread, PeanutButter} → 60%

    The percentage is called Support

    These are all the frequent itemsets given α = 60%

    HOW TO FIND FREQUENT ITEMSETS

    Naive Approach

    Enumerate all possible itemsets and then count each one

    All possible itemsets of size 1

    All possible itemsets of size 2

    All possible itemsets of size 3

    All possible itemsets of size 4
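The naive approach can be sketched in a few lines of Python. The transaction table from the slides did not survive in this transcript, so the five transactions below are an assumption, chosen only because they reproduce the supports quoted in the example (Bread 80%, PeanutButter 60%, both together 60%). The function names are hypothetical as well.

```python
from itertools import combinations

# Assumed transactions (not from the transcript), consistent with the quoted supports.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def naive_frequent_itemsets(transactions, min_support):
    """Enumerate all 2**n - 1 non-empty itemsets and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
    return frequent

if __name__ == "__main__":
    for itemset, s in naive_frequent_itemsets(TRANSACTIONS, 0.6).items():
        print(itemset, s)
```

With n distinct items the loop visits 2^n - 1 candidates, which is what makes this approach impractical for realistic item catalogs.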

    CAN WE OPTIMIZE?

    Assume α = 60%, what are the frequent itemsets?

    {Bread} → 80%

    {PeanutButter} → 60%

    {Bread, PeanutButter} → 60%

    The percentage is called Support

    Property: For an itemset S = {X, Y, Z, ...} of size n to be frequent, all its subsets of size n-1 must be frequent as well
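The subset property suggests a level-wise search: build size-k candidates only from frequent itemsets of size k-1 and discard any candidate that has an infrequent subset. Below is a minimal Apriori-style sketch of that idea, not necessarily the exact algorithm presented in the lecture. The transactions are assumed data chosen to match the supports quoted in the slides, and the function name is hypothetical.

```python
from itertools import combinations

# Assumed transactions (not from the transcript), consistent with the quoted supports.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def apriori(transactions, min_support):
    """Level-wise search for frequent itemsets; returns {frozenset: support}."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = set().union(*transactions)
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count step: keep only the candidates that meet the threshold.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

if __name__ == "__main__":
    for itemset, s in apriori(TRANSACTIONS, 0.6).items():
        print(sorted(itemset), s)
```

On the assumed data only two single items are frequent, so a single size-2 candidate is counted instead of all ten possible pairs.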

    APRIORI EXAMPLE

    APRIORI EXAMPLE (CONTD)

    DATA MINING TECHNIQUES

    Clustering

    Classification

    Association Rules

    Frequent Itemsets

    Outlier Detection

    ...

    ASSOCIATION RULES MINING

    What is the probability that when a customer buys bread in a transaction, (s)he also buys milk in the same transaction?

    Bread → milk (implies?)

    Frequent itemsets cannot answer this question, but association rules can

    General Form

    Association rule: x1, x2, ..., xn → y1, y2, ..., ym

    Meaning: when the L.H.S appears (or occurs), the R.H.S also appears (or occurs) with certain probability

    Two measures for a given rule:

    1- Support: Support(L.H.S ∪ R.H.S), which must exceed a chosen minimum threshold

    2- Confidence C = Support(L.H.S ∪ R.H.S) / Support(L.H.S)

    EXAMPLE

    Rule: Bread → PeanutButter

    Support of rule = support(Bread, PeanutButter) = 60%

    Confidence of rule = support(Bread, PeanutButter) / support(Bread) = 75%

    Rule: Bread, Jelly → PeanutButter

    Support of rule = support(Bread, Jelly, PeanutButter) = 20%

    Confidence of rule = support(Bread, Jelly, PeanutButter) / support(Bread, Jelly) = 100%

    Usually we search for rules with Support > a minimum support threshold and Confidence > a minimum confidence threshold
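The two rule measures can be checked with a short Python sketch. The transactions are assumed data (the original table is not in this transcript), picked so that they agree with the percentages quoted in the example; `rule_measures` is a hypothetical helper name.

```python
# Assumed transactions (not from the transcript), consistent with the quoted numbers.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)

def rule_measures(lhs, rhs, transactions):
    """Return (support, confidence) for the rule lhs -> rhs.

    support    = count(lhs U rhs) / number of transactions
    confidence = count(lhs U rhs) / count(lhs)
    Assumes lhs occurs in at least one transaction.
    """
    both = count(set(lhs) | set(rhs), transactions)
    return both / len(transactions), both / count(lhs, transactions)

if __name__ == "__main__":
    print(rule_measures({"Bread"}, {"PeanutButter"}, TRANSACTIONS))
    print(rule_measures({"Bread", "Jelly"}, {"PeanutButter"}, TRANSACTIONS))
```

Dividing counts rather than two already-rounded support fractions gives the same confidence as the formula on the slide, since the number of transactions cancels out.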

