Date post: 03-Apr-2018

11/9/16  


OLAP Operations

Helena Galhardas DEI/IST

References

•  A. Vaisman and E. Zimányi, Data Warehouse Systems: Design and Implementation, Springer, 2014 (chap. 3)

•  J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001 (chap. 2)

•  C. Ciferri, R. Ciferri, L.I. Gómez, M. Schneider, A.A. Vaisman, E. Zimányi, Cube algebra: a generic user-centric model and query language for OLAP cubes. Int. J. Data Warehousing Mining 9(2), 39–65, 2013

•  A. Wichert, H. Galhardas, SAD slides, MEIC/IST


Recap: the multidimensional model

•  The multidimensional model enables viewing data from multiple perspectives and at several levels of detail

Example

•  Dimensions: perspectives used to analyze the data

–  Example: a 3-dimensional cube for sales data with dimensions Product, Time, and Customer, and a measure Quantity

•  Attributes describe dimensions

–  The Product dimension may have attributes ProductNumber and UnitPrice (not shown)

•  Cells or facts have associated numeric values called measures

–  Each cell of the data cube represents the Quantity of units sold by category, quarter, and customer's city

[Figure: three-dimensional sales cube. Dimensions: Product (Category: Beverages, Condiments, Seafood, Produce), Time (Quarter: Q1 to Q4), Customer (City: Paris, Lyon, Köln, Berlin). Each cell holds a measure value.]
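A minimal sketch of how such a cube could be held in memory, assuming a plain Python dictionary keyed by (category, quarter, city). The quantities and the helper `members` are hypothetical; they only illustrate the split between dimensions (the key) and the measure (the value).

```python
# Hypothetical sample of the sales cube: coordinates -> Quantity.
# Coordinates follow the order (Product category, Time quarter, Customer city).
# The quantities are made-up illustration values.
cube = {
    ("Beverages", "Q1", "Paris"): 21,
    ("Beverages", "Q1", "Lyon"): 12,
    ("Seafood", "Q2", "Berlin"): 30,
    ("Produce", "Q2", "Köln"): 18,
}

DIMENSIONS = ("Product", "Time", "Customer")


def members(cube, dimension):
    """Return the sorted members that occur along one dimension."""
    index = DIMENSIONS.index(dimension)
    return sorted({coords[index] for coords in cube})


print(members(cube, "Customer"))          # ['Berlin', 'Köln', 'Lyon', 'Paris']
print(cube[("Seafood", "Q2", "Berlin")])  # 30
```

Storing only the non-empty cells keeps the sketch small; a dense array would be an equally valid choice.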


Hierarchies

•  Allow viewing data at several granularities

–  Define a sequence of mappings relating lower-level, detailed concepts to higher-level ones

–  The lower level is called the child and the higher level is called the parent

–  The hierarchical structure of a dimension is called the dimension schema

–  A dimension instance comprises all members at all levels in a dimension

•  Example

–  Hierarchies of the Product, Time, and Customer dimensions

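[Figure: hierarchies of the three dimensions, from the most detailed level up to All]

–  Product: Product → Category → All

–  Time: Day → Month → Quarter → Semester → Year → All

–  Customer: Customer → City → State → Country → Continent → All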
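The child-to-parent mappings of a hierarchy can be sketched as lookup tables. The following is only an illustration, assuming two steps of the Customer hierarchy (City to Country to Continent) with the State level skipped for brevity; the helper `ancestor` is a hypothetical name.

```python
# Child -> parent mappings for two steps of the Customer hierarchy.
# Assumption: the State level is skipped here to keep the sketch short.
CITY_TO_COUNTRY = {
    "Paris": "France",
    "Lyon": "France",
    "Köln": "Germany",
    "Berlin": "Germany",
}
COUNTRY_TO_CONTINENT = {"France": "Europe", "Germany": "Europe"}


def ancestor(member, *mappings):
    """Follow child -> parent mappings upward, one level per mapping."""
    for mapping in mappings:
        member = mapping[member]
    return member


print(ancestor("Lyon", CITY_TO_COUNTRY))                        # France
print(ancestor("Köln", CITY_TO_COUNTRY, COUNTRY_TO_CONTINENT))  # Europe
```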

Outline

•  OLAP operations


OLAP Operations: definition

•  Allow the multiple perspectives and levels of detail of the multidimensional model to be materialized by exploiting dimensions and their hierarchies

•  Provide an interactive data analysis environment

•  Supported by OLAP modules, e.g., Saiku (Pentaho)

OLAP Operations (1)

[Figure: four cubes with axes Product (Category), Time, and Customer: the original cube at the Quarter and City levels; a cube with Customer at the Country level (France, Germany); a cube with Time at the Month level (Jan, Feb, Mar, ..., Dec); and a cube with the Product (Category) members reordered]


OLAP Operations (2)

[Figure: four cubes: a cube with rotated axes (Time (Quarter), Customer (City), Product (Category)); a two-dimensional result with only Product (Category) and Time (Quarter); a cube restricted to the cities Paris and Lyon and the quarters Q1 and Q2; and the original cube]

OLAP Operations (3)

[Figure: four cubes, each with axes Product (Category: Beverages, Condiments, Seafood, Produce), Time (Quarter: Q1 to Q4), and Customer (City: Paris, Lyon, Köln, Berlin)]


OLAP Operations (4)

[Figure: SUM BY Time, Customer: a two-dimensional table of Quantity totals with axes Time (Quarter: Q1 to Q4) and Customer (City: Paris, Lyon, Berlin, Köln)]

[Figure: max() by quarter and city: cubes with axes Product (Category), Time (Quarter), and Customer (City)]

•  According to the authors of the book, aggregation functions can be classified as:

–  cumulative: compute the measure value of a cell from several other cells (e.g., SUM, COUNT, AVG)

–  filtering: filter the members of a dimension that appear in the result (e.g., MIN, MAX); they must compute not only the aggregated value but also determine the dimension members that belong to the result

OLAP Operations (5)

[Figure: four cubes: two with Time at the Month level (Jan, Feb, Mar, ..., Dec); the original cube with Time at the Quarter level; and a cube whose Customer (City) axis additionally includes Bilbao and Madrid]


Algebra of OLAP Operations

•  There is not yet a standard definition of OLAP operations comparable to what the relational algebra provides for relational databases

•  Many proposals for an OLAP algebra exist in the literature

•  We adopt the one proposed in [Ciferri et al 2013]

Algebra of OLAP Operations – rollup

•  Roll-up: aggregates measures along a dimension hierarchy (using an aggregate function) to obtain measures at a coarser granularity

–  ROLLUP(CubeName, (Dimension → Level), AggFunction(Measure))

[Figure: the original cube with Customer at the City level (Paris, Lyon, Köln, Berlin) and the result with Customer at the Country level (France, Germany)]

–  Example: ROLLUP(Sales2012, Customer → Country, SUM(Quantity))
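The roll-up from City to Country with SUM can be sketched as a group-and-sum over the parent members. This is an illustration only, assuming a cube stored as a dictionary keyed by (category, quarter, city); the quantities and the function name `rollup_customer` are hypothetical.

```python
from collections import defaultdict

# Hypothetical cells of Sales2012: (category, quarter, city) -> Quantity.
sales2012 = {
    ("Beverages", "Q1", "Paris"): 21,
    ("Beverages", "Q1", "Lyon"): 12,
    ("Beverages", "Q1", "Berlin"): 9,
    ("Seafood", "Q1", "Köln"): 18,
}
CITY_TO_COUNTRY = {
    "Paris": "France",
    "Lyon": "France",
    "Köln": "Germany",
    "Berlin": "Germany",
}


def rollup_customer(cube, to_parent):
    """Sum Quantity after mapping each city to its parent member."""
    result = defaultdict(int)
    for (category, quarter, city), quantity in cube.items():
        result[(category, quarter, to_parent[city])] += quantity
    return dict(result)


by_country = rollup_customer(sales2012, CITY_TO_COUNTRY)
print(by_country[("Beverages", "Q1", "France")])  # 33
```

Paris and Lyon collapse into a single France cell, which is why the result has fewer cells than the input.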


Algebra of OLAP Operations – drill-down

•  Drill-down: moves from a more general level to a more detailed level in a hierarchy

–  DRILLDOWN(CubeName, (Dimension → Level))

[Figure: the original cube with Time at the Quarter level (Q1 to Q4) and the result with Time at the Month level (Jan, Feb, Mar, ..., Dec)]

–  Example: DRILLDOWN(Sales2012, Time → Month)
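Detailed values cannot be recovered from an aggregated cube alone, so a drill-down needs access to finer-grained data. The sketch below assumes the cells are kept at the Month level and that both views are derived from them; the data, `MONTH_TO_QUARTER`, and `view_at` are hypothetical.

```python
from collections import defaultdict

MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Hypothetical base cells at the Month level: (category, month, city) -> Quantity.
base = {
    ("Beverages", "Jan", "Paris"): 5,
    ("Beverages", "Feb", "Paris"): 7,
    ("Beverages", "Apr", "Paris"): 4,
}


def view_at(base, time_level):
    """Aggregate the base cells to the requested Time level."""
    if time_level not in ("Month", "Quarter"):
        raise ValueError(f"unsupported level: {time_level}")
    result = defaultdict(int)
    for (category, month, city), quantity in base.items():
        time_member = month if time_level == "Month" else MONTH_TO_QUARTER[month]
        result[(category, time_member, city)] += quantity
    return dict(result)


quarterly = view_at(base, "Quarter")  # coarse view shown first
monthly = view_at(base, "Month")      # drill-down: Time -> Month
print(quarterly[("Beverages", "Q1", "Paris")])  # 12
print(monthly[("Beverages", "Feb", "Paris")])   # 7
```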

Algebra of OLAP Operations – sort

•  Sort: returns a cube where the members of a dimension have been sorted

–  SORT(CubeName, Dimension, Expression [ASC | DESC])

–  where the members of Dimension are sorted according to the value of Expression

[Figure: the original cube and the result, in which the Product (Category) members appear in a different order]

–  Example: SORT(Sales2012, Product, Category)
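Sorting affects only the order of the members along one axis, not the cell values. A minimal sketch, assuming the members are held in a list and the sort expression is a function of the member; the initial order and the totals are made up.

```python
# Hypothetical Product members in their stored order, with made-up totals.
categories = ["Seafood", "Beverages", "Produce", "Condiments"]
total_quantity = {"Seafood": 40, "Beverages": 75, "Produce": 22, "Condiments": 31}


def sort_members(members, expression, descending=False):
    """Return the members ordered by the value of expression(member)."""
    return sorted(members, key=expression, reverse=descending)


by_name = sort_members(categories, lambda member: member)
by_total = sort_members(categories, total_quantity.get, descending=True)
print(by_name)   # ['Beverages', 'Condiments', 'Produce', 'Seafood']
print(by_total)  # ['Beverages', 'Seafood', 'Condiments', 'Produce']
```

The first call mirrors sorting by the category name; the second shows that the expression may equally be a measure-based value.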


Algebra of OLAP Operations – pivot

•  Pivot (or rotate): rotates the axes of a cube to provide an alternative presentation of data

–  PIVOT(CubeName, (Dimension → Axis)*)

–  where the axes are specified as {X, Y, Z, X1, Y1, Z1, ...}

[Figure: the original cube and the result with rotated axes: Time (Quarter), Customer (City), Product (Category)]

–  Example: PIVOT(Sales2012, Time → X, Customer → Y, Product → Z)
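With a cube stored as a dictionary of coordinate tuples, a pivot amounts to reordering the coordinates. The sketch assumes that axes X, Y, Z correspond to tuple positions 0, 1, 2; the function name and the sample cell are hypothetical.

```python
def pivot(cube, dimensions, new_order):
    """Reorder the coordinates of every cell to follow new_order."""
    positions = [dimensions.index(name) for name in new_order]
    return {
        tuple(coords[i] for i in positions): value
        for coords, value in cube.items()
    }


# Hypothetical cell; coordinates are (Product, Time, Customer).
sales2012 = {("Beverages", "Q1", "Paris"): 21}
rotated = pivot(
    sales2012,
    ("Product", "Time", "Customer"),
    ("Time", "Customer", "Product"),
)
print(rotated)  # {('Q1', 'Paris', 'Beverages'): 21}
```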

Algebra of OLAP Operations – slice

•  Slice: removes a dimension from a cube, so that a cube of n-1 dimensions is obtained from a cube of n dimensions

–  SLICE(CubeName, Dimension, Level = Value)

•  Dimension is dropped by fixing a single Value in the Level; the other dimensions remain unchanged

•  Slice assumes that the granularity of the cube is at the specified level of the dimension

[Figure: the original cube and the two-dimensional result with axes Product (Category) and Time (Quarter)]

–  Example: SLICE(Sales2012, Customer, City = 'Paris')
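A slice can be sketched as a filter on one coordinate followed by dropping that coordinate. The sketch assumes the dictionary representation keyed by (category, quarter, city); the data and the name `slice_cube` are hypothetical.

```python
def slice_cube(cube, dim_index, value):
    """Keep cells whose member at dim_index equals value, then drop that dimension."""
    return {
        coords[:dim_index] + coords[dim_index + 1:]: quantity
        for coords, quantity in cube.items()
        if coords[dim_index] == value
    }


# Hypothetical cells: (category, quarter, city) -> Quantity.
sales2012 = {
    ("Beverages", "Q1", "Paris"): 21,
    ("Beverages", "Q1", "Lyon"): 12,
    ("Seafood", "Q2", "Paris"): 14,
}
paris = slice_cube(sales2012, 2, "Paris")  # Customer is at position 2
print(paris)  # {('Beverages', 'Q1'): 21, ('Seafood', 'Q2'): 14}
```

The resulting keys have two components instead of three, matching the n-1 dimensions of the definition.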


Algebra of OLAP Operations – dice

•  Dice: keeps the cells of a cube that satisfy a Boolean condition Φ

–  DICE(CubeName, Φ)

•  Φ is a Boolean condition over dimension levels, attributes, and measures

[Figure: the original cube and the result restricted to the cities Paris and Lyon and the quarters Q1 and Q2]

–  Example: DICE(Sales2012, (Customer.City = 'Paris' OR Customer.City = 'Lyon') AND (Time.Quarter = 'Q1' OR Time.Quarter = 'Q2'))
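Unlike slice, dice keeps all dimensions and only filters cells. A minimal sketch, assuming the condition Φ is passed as a Python function over the coordinates and the measure; the data and names are hypothetical.

```python
def dice(cube, condition):
    """Keep only the cells for which condition(coords, quantity) is true."""
    return {
        coords: quantity
        for coords, quantity in cube.items()
        if condition(coords, quantity)
    }


# Hypothetical cells: (category, quarter, city) -> Quantity.
sales2012 = {
    ("Beverages", "Q1", "Paris"): 21,
    ("Beverages", "Q3", "Paris"): 17,
    ("Seafood", "Q2", "Lyon"): 14,
    ("Seafood", "Q2", "Berlin"): 30,
}


def paris_or_lyon_first_half(coords, quantity):
    """(City = Paris OR City = Lyon) AND (Quarter = Q1 OR Quarter = Q2)."""
    _category, quarter, city = coords
    return city in ("Paris", "Lyon") and quarter in ("Q1", "Q2")


result = dice(sales2012, paris_or_lyon_first_half)
print(sorted(result))  # [('Beverages', 'Q1', 'Paris'), ('Seafood', 'Q2', 'Lyon')]
```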

Algebra of OLAP Operations – drill-across

•  Drill-across: combines cells from two data cubes that have the same schema

–  DRILLACROSS(CubeName1, CubeName2, [Condition])

[Figure: three cubes, each with axes Product (Category), Time (Quarter), and Customer (City): the two input cubes and the combined result]

–  Example: Sales2011-2012 ← DRILLACROSS(Sales2011, Sales2012)
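Drill-across can be sketched as a join of two cubes on their coordinates. The sketch assumes that, with no Condition given, cells are matched on identical coordinates and cells present in only one cube are dropped; that join behaviour, the data, and the function name are assumptions for illustration.

```python
def drill_across(cube1, cube2):
    """Pair up the measures of cells that share the same coordinates."""
    shared = cube1.keys() & cube2.keys()
    return {coords: (cube1[coords], cube2[coords]) for coords in shared}


# Hypothetical cells: (category, quarter, city) -> Quantity.
sales2011 = {
    ("Beverages", "Q1", "Paris"): 20,
    ("Seafood", "Q1", "Lyon"): 10,
}
sales2012 = {
    ("Beverages", "Q1", "Paris"): 25,
    ("Seafood", "Q2", "Lyon"): 14,
}
sales2011_2012 = drill_across(sales2011, sales2012)
print(sales2011_2012)  # {('Beverages', 'Q1', 'Paris'): (20, 25)}
```

Each resulting cell carries both measures (Quantity2011, Quantity2012), which is what later allows a derived measure to compare them.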


Algebra of OLAP Operations – ADD-MEASURE

•  Add measure: adds new measures to a cube

–  ADDMEASURE(CubeName, (NewMeasure = Expression)*)

•  Drop measure: deletes a measure from a cube schema

–  DROPMEASURE(CubeName, Measure*)

[Figure: two cubes, each with axes Product (Category), Time (Quarter), and Customer (City): the cube before and after adding the measure]

–  Example: ADDMEASURE(Sales2011-2012, PercChange = (Quantity2011 - Quantity2012) / Quantity2011)
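The PercChange expression can be sketched cell by cell. The sketch assumes each cell of Sales2011-2012 holds the pair (Quantity2011, Quantity2012) and that Quantity2011 is never zero; the data and the function name are hypothetical. With the formula as written, a negative result corresponds to an increase from 2011 to 2012.

```python
def add_perc_change(cube):
    """Append PercChange = (q2011 - q2012) / q2011 to every cell."""
    return {
        coords: (q2011, q2012, (q2011 - q2012) / q2011)
        for coords, (q2011, q2012) in cube.items()
    }


# Hypothetical cells: coordinates -> (Quantity2011, Quantity2012).
sales2011_2012 = {
    ("Beverages", "Q1", "Paris"): (20, 25),
    ("Seafood", "Q1", "Lyon"): (10, 5),
}
result = add_perc_change(sales2011_2012)
print(result[("Beverages", "Q1", "Paris")])  # (20, 25, -0.25)
print(result[("Seafood", "Q1", "Lyon")])     # (10, 5, 0.5)
```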

Algebra of OLAP Operations – ADD-MEASURE

•  Another example:

–  Computes the value of a cell by aggregating the measures of several nearby cells

[Figure: two cubes with Time at the Month level (Jan, Feb, Mar, ..., Dec): the cube before and after adding the measure]

–  ADDMEASURE(Sales2012, MovAvg = AVG(Quantity) OVER Time 2 CELLS PRECEDING)
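The moving average can be sketched for one (category, city) series ordered along Time. The sketch assumes the window covers the current cell plus up to two preceding cells, with shorter windows at the start of the series; the monthly quantities are made up.

```python
def moving_average(values, preceding=2):
    """Average each value with up to `preceding` values before it."""
    averages = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical monthly quantities for one category and one city.
months = ["Jan", "Feb", "Mar", "Apr"]
quantity = [10, 20, 30, 40]
mov_avg = dict(zip(months, moving_average(quantity)))
print(mov_avg)  # {'Jan': 10.0, 'Feb': 15.0, 'Mar': 20.0, 'Apr': 30.0}
```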


Algebra of OLAP Operations – aggregate functions

•  Aggregation functions in OLAP are also needed at the current granularity, that is, without performing a roll-up

–  AggFunction(CubeName, Measure) [BY Dimension*]

–  Cumulative: compute the measure value of a cell from several other cells; examples are SUM, COUNT, and AVG

–  Filtering: filter the members of a dimension that appear in the result; examples are MIN and MAX. Filtering functions compute not only the aggregated value, but also the members of the dimension that belong to the result

[Figure: the original cube and the result of SUM BY Time, Customer, a two-dimensional table of Quantity totals with axes Time (Quarter: Q1 to Q4) and Customer (City: Paris, Lyon, Berlin, Köln)]

–  Example: SUM(Sales2012, Quantity) BY Time, Customer
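A cumulative function such as SUM with a BY clause can be sketched as grouping on the listed dimensions and collapsing the others. The sketch assumes the dictionary representation keyed by (category, quarter, city); the quantities are made up.

```python
from collections import defaultdict


def sum_by_time_customer(cube):
    """Total Quantity per (quarter, city), collapsing the Product dimension."""
    totals = defaultdict(int)
    for (_category, quarter, city), quantity in cube.items():
        totals[(quarter, city)] += quantity
    return dict(totals)


# Hypothetical cells: (category, quarter, city) -> Quantity.
sales2012 = {
    ("Beverages", "Q1", "Paris"): 21,
    ("Seafood", "Q1", "Paris"): 14,
    ("Beverages", "Q1", "Lyon"): 12,
}
print(sum_by_time_customer(sales2012))  # {('Q1', 'Paris'): 35, ('Q1', 'Lyon'): 12}
```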

Algebra of OLAP Operations – aggregate functions

•  Another example: max sales by quarter and city

[Figure: two cubes, each with axes Product (Category), Time (Quarter), and Customer (City): the original cube and the result of the MAX operation]

–  MAX(Sales2012, Quantity) BY Time, Customer
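A filtering function returns the member that attains the value as well as the value itself. The sketch below assumes that ties are resolved in favour of the first cell encountered; the data and the function name are hypothetical.

```python
def max_by_time_customer(cube):
    """For each (quarter, city), return the category with the largest Quantity."""
    best = {}
    for (category, quarter, city), quantity in cube.items():
        key = (quarter, city)
        # Keep the first cell seen on ties (an assumption of this sketch).
        if key not in best or quantity > best[key][1]:
            best[key] = (category, quantity)
    return best


# Hypothetical cells: (category, quarter, city) -> Quantity.
sales2012 = {
    ("Beverages", "Q1", "Paris"): 21,
    ("Seafood", "Q1", "Paris"): 14,
    ("Beverages", "Q1", "Lyon"): 12,
}
print(max_by_time_customer(sales2012))
# {('Q1', 'Paris'): ('Beverages', 21), ('Q1', 'Lyon'): ('Beverages', 12)}
```

In contrast to SUM, the Product dimension is not collapsed away entirely: the result records which category each maximum belongs to.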


Algebra of OLAP Operations – union, difference, drill-through

•  Union merges two cubes having the same schema but disjoint instances

–  Ex: if SalesSpain is a cube having the same schema as the original cube but containing only the sales to Spanish customers, we can perform: UNION(Sales2012, SalesSpain)

[Figure: the original cube and the result of the union, whose Customer (City) axis additionally includes Bilbao and Madrid]

•  Difference removes the cells in a cube that belong to another one; the two cubes must have the same schema

•  Drill-through allows moving from data at the bottom level in a cube to data in the operational systems from which the cube was derived; it could be used when trying to determine the reason for outlier values in a data cube
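Union and difference can be sketched as set-like operations on the cells. The sketch assumes cells are identified by their coordinates only, and that a union of overlapping instances is rejected; the data and function names are hypothetical.

```python
def union(cube1, cube2):
    """Merge two cubes with the same schema and disjoint instances."""
    overlap = cube1.keys() & cube2.keys()
    if overlap:
        raise ValueError(f"instances are not disjoint: {sorted(overlap)}")
    return {**cube1, **cube2}


def difference(cube1, cube2):
    """Remove from cube1 the cells whose coordinates also occur in cube2."""
    return {
        coords: quantity
        for coords, quantity in cube1.items()
        if coords not in cube2
    }


# Hypothetical cells: (category, quarter, city) -> Quantity.
sales2012 = {("Beverages", "Q1", "Paris"): 21}
sales_spain = {
    ("Beverages", "Q1", "Madrid"): 8,
    ("Seafood", "Q1", "Bilbao"): 6,
}
merged = union(sales2012, sales_spain)
print(len(merged))                      # 3
print(difference(merged, sales_spain))  # {('Beverages', 'Q1', 'Paris'): 21}
```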

Next Lecture

•  Conceptual Data Warehouse Design
