
05 OLAP v6 weekend

Date post: 11-May-2015
Upload: prithwis-mukerjee
Business Information Systems OLAP Cubes in Datawarehousing Prithwis Mukerjee, Ph.D. Acknowledgement Hector Garcia Molina – Stanford FORWISS - Bavarian Research Centre for Knowledge Based Systems
Transcript
Page 1: 05 OLAP  v6 weekend

Business Information Systems

OLAP Cubes in Datawarehousing

Prithwis Mukerjee, Ph.D.

Acknowledgement
• Hector Garcia Molina – Stanford
• FORWISS - Bavarian Research Centre for Knowledge Based Systems

Page 2: 05 OLAP  v6 weekend

OLTP vs. OLAP

OLTP: On-Line Transaction Processing
• Describes processing at operational sites
• Mostly updates
• Many small transactions
• MB-TB of data
• Raw data
• Clerical users
• Up-to-date data
• Consistency, recoverability critical

OLAP: On-Line Analytical Processing
• Describes processing at warehouse
• Mostly reads
• Queries long, complex
• GB-TB of data
• Summarized, consolidated data
• Decision-makers, analysts as users

Page 3: 05 OLAP  v6 weekend

Warehouse Models & Operators

Data Models
• relations
• stars & snowflakes
• cubes

Operators
• slice & dice
• roll-up, drill down
• pivoting
• other

Page 4: 05 OLAP  v6 weekend

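Star Schema Terms

• Fact table
• Dimension tables
• Measures

[Figure: star schema with the fact table in the centre]
sale (fact table): orderId, date, custId, prodId, storeId, qty, amt
customer (dimension): custId, name, address, city
product (dimension): prodId, name, price
store (dimension): storeId, city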

Page 5: 05 OLAP  v6 weekend

Star

customer  custId  name   address    city
          53      joe    10 main    sfo
          81      fred   12 main    sfo
          111     sally  80 willow  la

product   prodId  name  price
          p1      bolt  10
          p2      nut   5

store     storeId  city
          c1       nyc
          c2       sfo
          c3       la

sale      orderId  date    custId  prodId  storeId  qty  amt
          o100     1/7/97  53      p1      c1       1    12
          o102     2/7/97  53      p2      c1       2    11
          105      3/8/97  111     p1      c3       5    50
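The star layout above can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the warehouse DBMS; the column types and the variable name star_rows are assumptions, while the rows are the ones shown on the slide. It joins the fact table to each dimension table, which is the typical access pattern for a star schema.

```python
import sqlite3

# In-memory database holding the slide's star schema (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);
CREATE TABLE sale (
    orderId TEXT, date TEXT,
    custId  INTEGER REFERENCES customer(custId),
    prodId  TEXT REFERENCES product(prodId),
    storeId TEXT REFERENCES store(storeId),
    qty INTEGER, amt INTEGER
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (53, "joe", "10 main", "sfo"),
    (81, "fred", "12 main", "sfo"),
    (111, "sally", "80 willow", "la"),
])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("p1", "bolt", 10), ("p2", "nut", 5)])
conn.executemany("INSERT INTO store VALUES (?, ?)",
                 [("c1", "nyc"), ("c2", "sfo"), ("c3", "la")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("o100", "1/7/97", 53, "p1", "c1", 1, 12),
    ("o102", "2/7/97", 53, "p2", "c1", 2, 11),
    ("105", "3/8/97", 111, "p1", "c3", 5, 50),
])

# Star join: every dimension is reached from the fact table in one hop.
star_rows = conn.execute("""
    SELECT c.name, p.name, s.city, f.amt
    FROM sale AS f
    JOIN customer AS c ON c.custId  = f.custId
    JOIN product  AS p ON p.prodId  = f.prodId
    JOIN store    AS s ON s.storeId = f.storeId
    ORDER BY f.amt
""").fetchall()
print(star_rows)
```

The measures (qty, amt) live only in the fact table; the dimension tables supply the descriptive attributes used for labelling and grouping.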

Page 6: 05 OLAP  v6 weekend

Dimension Hierarchies

store   storeId  cityId  tId  mgr
        s5       sfo     t1   joe
        s7       sfo     t2   fred
        s9       la      t1   nancy

city    cityId  pop  regId
        sfo     1M   north
        la      5M   south

region  regId  name
        north  cold region
        south  warm region

sType   tId  size   location
        t1   small  downtown
        t2   large  suburbs

[Figure: hierarchy sType <- store -> city -> region]

• snowflake schema
• constellations

Page 7: 05 OLAP  v6 weekend

Cube

Fact table view:

sale  prodId  storeId  amt
      p1      c1       12
      p2      c1       11
      p1      c3       50
      p2      c2       8

Multi-dimensional cube:

      c1   c2   c3
p1    12        50
p2    11   8

dimensions = 2
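The step from the fact table view to the cube view can be sketched in a few lines of Python. This is only an illustration under the assumption that empty cube cells are represented as None; the names facts and cube are hypothetical.

```python
# Fact-table rows from the slide: (prodId, storeId, amt)
facts = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]

# The positions along each dimension are the distinct ids seen in the facts.
products = sorted({prod for prod, _, _ in facts})
stores = sorted({store for _, store, _ in facts})

# Start with an empty 2-D cube, then drop each fact into its cell.
cube = {prod: {store: None for store in stores} for prod in products}
for prod, store, amt in facts:
    cube[prod][store] = amt

# Print the cube as a grid; "-" marks cells without a fact.
print("    " + "  ".join(f"{s:>3}" for s in stores))
for prod in products:
    cells = ("-" if cube[prod][s] is None else str(cube[prod][s]) for s in stores)
    print(f"{prod:<4}" + "  ".join(f"{c:>3}" for c in cells))
```

Each fact row becomes one cell, and combinations that never occur in the fact table stay empty.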

Page 8: 05 OLAP  v6 weekend

3-D Cube

Fact table view:

sale  prodId  storeId  date  amt
      p1      c1       1     12
      p2      c1       1     11
      p1      c3       1     50
      p2      c2       1     8
      p1      c1       2     44
      p1      c2       2     4

Multi-dimensional cube:

day 1:
      c1   c2   c3
p1    12        50
p2    11   8

day 2:
      c1   c2   c3
p1    44   4
p2

dimensions = 3

Page 9: 05 OLAP  v6 weekend

ROLAP vs. MOLAP

• ROLAP: Relational On-Line Analytical Processing
• MOLAP: Multi-Dimensional On-Line Analytical Processing

Page 10: 05 OLAP  v6 weekend

Aggregates

sale  prodId  storeId  date  amt
      p1      c1       1     12
      p2      c1       1     11
      p1      c3       1     50
      p2      c2       1     8
      p1      c1       2     44
      p1      c2       2     4

• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

Result: 81
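The query can be checked against the slide's six rows. A minimal sketch, assuming Python's built-in sqlite3 module and assumed column types; total_day1 is a hypothetical variable name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])

# The slide's query: a single aggregate over the rows selected by WHERE.
(total_day1,) = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()
print(total_day1)  # 12 + 11 + 50 + 8 = 81
```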

Page 11: 05 OLAP  v6 weekend

Aggregates

sale  prodId  storeId  date  amt
      p1      c1       1     12
      p2      c1       1     11
      p1      c3       1     50
      p2      c2       1     8
      p1      c1       2     44
      p1      c2       2     4

• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

ans   date  sum
      1     81
      2     48

Page 12: 05 OLAP  v6 weekend

Another Example

sale  prodId  storeId  date  amt
      p1      c1       1     12
      p2      c1       1     11
      p1      c3       1     50
      p2      c2       1     8
      p1      c1       2     44
      p1      c2       2     4

• Add up amounts by day, product
• In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

sale  prodId  date  amt
      p1      1     62
      p2      1     19
      p1      2     48

• drill-down: from the sums by day to the sums by day and product
• rollup: from the sums by day and product back to the sums by day
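The relationship between the two groupings can be made concrete. The sketch below assumes Python's built-in sqlite3 module and assumed column types; by_day, by_day_product and rolled_up are hypothetical names. It runs the coarse and the fine GROUP BY and then shows that summing the fine result over products reproduces the coarse one, which is what a rollup does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])

# Coarse level: one row per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Drill-down: add prodId to the grouping to get one row per (day, product).
by_day_product = conn.execute(
    "SELECT prodId, date, sum(amt) FROM sale GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Rollup: collapse the product column again by summing per day.
rolled_up = {}
for _prod, day, amt in by_day_product:
    rolled_up[day] = rolled_up.get(day, 0) + amt

print(by_day)
print(by_day_product)
print(sorted(rolled_up.items()))
```

Note that every column in the SELECT list that is not an aggregate also appears in the GROUP BY clause.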

Page 13: 05 OLAP  v6 weekend

Aggregates

• Operators: sum, count, max, min, median, avg
• “Having” clause
• Using dimension hierarchy
  • average by region (within store)
  • maximum by month (within date)
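A short sketch of several of these operators and of a HAVING clause, assuming Python's built-in sqlite3 module and the six-row sale table used on the earlier slides. The threshold of 20 and the variable names are hypothetical. SQLite has no built-in median aggregate, so median is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])

# Several aggregate operators computed per product in one pass.
stats_by_product = conn.execute("""
    SELECT prodId, count(*), max(amt), min(amt), avg(amt)
    FROM sale GROUP BY prodId ORDER BY prodId
""").fetchall()

# HAVING filters groups after aggregation (WHERE filters rows before it).
# The threshold 20 is an arbitrary value chosen for illustration.
big_stores = conn.execute("""
    SELECT storeId, sum(amt)
    FROM sale GROUP BY storeId
    HAVING sum(amt) > 20
    ORDER BY storeId
""").fetchall()

print(stats_by_product)
print(big_stores)
```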

Page 14: 05 OLAP  v6 weekend

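Cube Aggregation

Example: computing sums

day 1:
      c1   c2   c3
p1    12        50
p2    11   8

day 2:
      c1   c2   c3
p1    44   4
p2

Rollup over date:
      c1   c2   c3
p1    56   4    50
p2    11   8

Rollup over product:
      c1   c2   c3
sum   67   12   50

Rollup over store:
      sum
p1    110
p2    19

Rollup over all dimensions: 129

Drill-down moves in the opposite direction, from a sum back to the cells it was computed from.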
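The chain of sums in this example can be reproduced with one small helper. This is a sketch rather than how any particular OLAP engine does it; the function rollup and its keep parameter are hypothetical names.

```python
from collections import defaultdict

# (prodId, storeId, date, amt) rows from the slides.
facts = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

def rollup(cells, keep):
    """Sum cell values, keeping only the key positions listed in `keep`."""
    out = defaultdict(int)
    for key, amt in cells.items():
        out[tuple(key[i] for i in keep)] += amt
    return dict(out)

cube3d = {(p, s, d): amt for p, s, d, amt in facts}

by_prod_store = rollup(cube3d, keep=(0, 1))     # sum out the date dimension
by_store = rollup(by_prod_store, keep=(1,))     # then sum out product
by_product = rollup(by_prod_store, keep=(0,))   # or sum out store instead
total = rollup(by_store, keep=())               # finally a single grand total

print(by_prod_store)
print(by_store)
print(by_product)
print(total)
```

Each rollup step only needs the result of the previous, finer level, which is why aggregates are usually computed from detailed to coarse.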

Page 15: 05 OLAP  v6 weekend

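Cube Operators

day 1:
      c1   c2   c3
p1    12        50
p2    11   8

day 2:
      c1   c2   c3
p1    44   4
p2

Summed over date:
      c1   c2   c3
p1    56   4    50
p2    11   8

Summed over date and product:
      c1   c2   c3
sum   67   12   50

Summed over date and store:
      sum
p1    110
p2    19

Grand total: 129

Cells addressed with * for "all values":
• sale(c1,*,*) = 67
• sale(c2,p2,*) = 8
• sale(*,*,*) = 129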
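The sale(...) notation can be read as a lookup in which * means "sum over every value of this dimension". A minimal sketch follows, assuming the argument order store, product, date, which is how the labels on the slide are written; the Python function itself is hypothetical.

```python
# (prodId, storeId, date, amt) rows from the slides.
facts = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

def sale(store="*", prod="*", date="*"):
    """Sum amt over all facts matching the given coordinates; "*" matches anything."""
    return sum(
        amt
        for p, s, d, amt in facts
        if store in ("*", s) and prod in ("*", p) and date in ("*", d)
    )

print(sale("c1", "*", "*"))   # all sales of store c1
print(sale("c2", "p2", "*"))  # product p2 in store c2, all days
print(sale("*", "*", "*"))    # grand total
```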

Page 16: 05 OLAP  v6 weekend

Extended Cube

day 1:
      c1   c2   c3   *
p1    12        50   62
p2    11   8         19
*     23   8    50   81

day 2:
      c1   c2   c3   *
p1    44   4         48
p2
*     44   4         48

* (all days):
      c1   c2   c3   *
p1    56   4    50   110
p2    11   8         19
*     67   12   50   129

Example: sale(*,p2,*) = 19
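The extended cube stores every such aggregate next to the base cells. One way to materialize it, sketched here under the assumption that keys are ordered (store, product, date) as in the sale(...) notation, is to let each fact contribute to all eight cells obtained by replacing any subset of its coordinates with "*". The name extended is hypothetical.

```python
from collections import defaultdict
from itertools import product

# (prodId, storeId, date, amt) rows from the slides.
facts = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

extended = defaultdict(int)
for p, s, d, amt in facts:
    # Each coordinate is either kept or generalized to "*": 2**3 = 8 target cells.
    for key in product((s, "*"), (p, "*"), (d, "*")):
        extended[key] += amt

print(extended[("*", "p2", "*")])   # margin for product p2
print(extended[("c1", "*", 1)])     # store c1 on day 1, all products
print(extended[("*", "*", "*")])    # grand total
```

Answering an aggregate query then becomes a single lookup, at the cost of storing more cells than the base cube.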

Page 17: 05 OLAP  v6 weekend

Aggregation Using Hierarchies

day 1:
      c1   c2   c3
p1    12        50
p2    11   8

day 2:
      c1   c2   c3
p1    44   4
p2

Hierarchy: customer -> region -> country

Aggregated to region level:
      region A   region B
p1    56         54
p2    11         8

(customer c1 in Region A; customers c2, c3 in Region B)
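Aggregating along a hierarchy amounts to mapping each member to its parent before summing. A minimal sketch using the mapping stated on the slide; the names region_of and by_region are hypothetical.

```python
from collections import defaultdict

# (prodId, customer/store id, date, amt) rows from the slides.
facts = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# One level of the hierarchy, as given on the slide.
region_of = {"c1": "A", "c2": "B", "c3": "B"}

by_region = defaultdict(int)
for prod, cust, _date, amt in facts:
    # Replace the member by its parent, then aggregate as usual.
    by_region[(prod, region_of[cust])] += amt

print(dict(by_region))
```

Rolling up further to country would repeat the same step with a region-to-country mapping.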

Page 18: 05 OLAP  v6 weekend

Pivoting

Fact table view:

sale  prodId  storeId  date  amt
      p1      c1       1     12
      p2      c1       1     11
      p1      c3       1     50
      p2      c2       1     8
      p1      c1       2     44
      p1      c2       2     4

Multi-dimensional cube:

day 1:
      c1   c2   c3
p1    12        50
p2    11   8

day 2:
      c1   c2   c3
p1    44   4
p2

Pivoted to product by store (summed over days):
      c1   c2   c3
p1    56   4    50
p2    11   8

Page 19: 05 OLAP  v6 weekend

What is a Multi-Dimensional Database?

A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are
• intimately related and
• stored, viewed and analyzed from different perspectives.

These perspectives are called dimensions.

Page 20: 05 OLAP  v6 weekend

2. Relational and Multi-Dimensional Models: An Example

The Relational Structure

SALES VOLUMES FOR GLEASON DEALERSHIP

MODEL         COLOR  SALES VOLUME
MINI VAN      BLUE   6
MINI VAN      RED    5
MINI VAN      WHITE  4
SPORTS COUPE  BLUE   3
SPORTS COUPE  RED    5
SPORTS COUPE  WHITE  5
SEDAN         BLUE   4
SEDAN         RED    3
SEDAN         WHITE  2

Page 21: 05 OLAP  v6 weekend

Multidimensional Structure

Sales Volumes (measurement), MODEL by COLOR:

            Blue  Red  White
Mini Van    6     5    4
Coupe       3     5    5
Sedan       4     3    2

• Dimensions: MODEL, COLOR
• Dimension positions: Mini Van, Coupe, Sedan; Blue, Red, White
• Measurement: Sales Volumes

Page 22: 05 OLAP  v6 weekend

The “Classic” Star Schema

Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.

Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer

Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day

Page 23: 05 OLAP  v6 weekend

Differences between MDDB and Relational Databases

• MDDB: Perspectives embedded directly in the structure.
  Normalized Relational: Data reorganized based on query. Perspectives are placed in the fields, which tells us nothing about the contents.
• MDDB: Data retrieval and manipulation are easy.
  Normalized Relational: Browsing and data manipulation are not intuitive to the user.
• MDDB: Fast retrieval for large datasets due to predefined structure.
  Normalized Relational: Slows down for large datasets due to the multiple JOIN operations needed.
• MDDB: Relatively inflexible. Changes in perspectives necessitate reprogramming of structure.
  Normalized Relational: Flexible. Anything an MDDB can do can be done this way.

Page 24: 05 OLAP  v6 weekend

Relational Model and Multi Dimensional Databases - Example 2

SALES VOLUMES FOR ALL DEALERSHIPS

MODEL         COLOR  DEALERSHIP  VOLUME
MINI VAN      BLUE   CLYDE       6
MINI VAN      BLUE   GLEASON     6
MINI VAN      BLUE   CARR        2
MINI VAN      RED    CLYDE       3
MINI VAN      RED    GLEASON     5
MINI VAN      RED    CARR        5
MINI VAN      WHITE  CLYDE       2
MINI VAN      WHITE  GLEASON     4
MINI VAN      WHITE  CARR        3
SPORTS COUPE  BLUE   CLYDE       2
SPORTS COUPE  BLUE   GLEASON     3
SPORTS COUPE  BLUE   CARR        2
SPORTS COUPE  RED    CLYDE       7
SPORTS COUPE  RED    GLEASON     5
SPORTS COUPE  RED    CARR        2
SPORTS COUPE  WHITE  CLYDE       4
SPORTS COUPE  WHITE  GLEASON     5
SPORTS COUPE  WHITE  CARR        1
SEDAN         BLUE   CLYDE       6
SEDAN         BLUE   GLEASON     4
SEDAN         BLUE   CARR        2
SEDAN         RED    CLYDE       1
SEDAN         RED    GLEASON     3
SEDAN         RED    CARR        4
SEDAN         WHITE  CLYDE       2
SEDAN         WHITE  GLEASON     2
SEDAN         WHITE  CARR        3

Page 25: 05 OLAP  v6 weekend

Multidimensional Representation

[Figure: 3-D cube of Sales Volumes with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr)]

Page 26: 05 OLAP  v6 weekend

Viewing Data - An Example

[Figure: Sales Volumes cube with dimensions MODEL, COLOR and DEALERSHIP]

• Assume that each dimension has 10 positions, as shown in the cube above.
• How many records would there be in a relational table?
• Implications for viewing data from an end-user standpoint?
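The record count follows from multiplying the number of positions per dimension. A small sketch of that arithmetic, assuming every combination of positions actually has a sales volume; the dictionary names are hypothetical.

```python
from math import prod

# Positions per dimension, as assumed in the exercise.
positions = {"MODEL": 10, "COLOR": 10, "DEALERSHIP": 10}

# A fully populated cube needs one relational row per cell.
max_records = prod(positions.values())
print(max_records)  # 10 * 10 * 10

# Same calculation for the 3 x 3 x 3 dealership example.
example2 = {"MODEL": 3, "COLOR": 3, "DEALERSHIP": 3}
example2_records = prod(example2.values())
print(example2_records)
```

A flat list of that many rows is hard to scan, whereas the cube lets the user look at one 10 by 10 face at a time.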

Page 27: 05 OLAP  v6 weekend

Adding Dimensions - An Example

[Figure: three Sales Volumes cubes, one each for JANUARY, FEBRUARY and MARCH; every cube has the dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr)]

Page 28: 05 OLAP  v6 weekend

3. When is MDD (In)appropriate?

First, consider situation 1

PERSONNEL

LAST NAME  EMPLOYEE#  EMPLOYEE AGE
SMITH      01         21
REGAN      12         19
FOX        31         63
WELD       14         31
KELLY      54         27
LINK       03         56
KRANZ      41         45
LUCAS      33         41
WEISS      23         19

Page 29: 05 OLAP  v6 weekend

When is MDD (In)appropriate?

Now consider situation 2

SALES VOLUMES FOR GLEASON DEALERSHIP

MODEL         COLOR  VOLUME
MINI VAN      BLUE   6
MINI VAN      RED    5
MINI VAN      WHITE  4
SPORTS COUPE  BLUE   3
SPORTS COUPE  RED    5
SPORTS COUPE  WHITE  5
SEDAN         BLUE   4
SEDAN         RED    3
SEDAN         WHITE  2

1. Set up a MDD structure for situation 1, with LAST NAME and Employee# as dimensions, and AGE as the measurement.
2. Set up a MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.

Page 30: 05 OLAP  v6 weekend

When is MDD (In)appropriate?

MDD Structures for the Situations

Sales Volumes (MODEL by COLOR):

            Blue  Red  White
Mini Van    6     5    4
Coupe       3     5    5
Sedan       4     3    2

Employee Age (LAST NAME by EMPLOYEE #):

[Figure: grid with the last names Smith, Regan, Fox, Weld, Kelly, Link, Kranz, Lucas, Weiss on one axis and the employee numbers 01, 03, 12, 14, 23, 31, 33, 41, 54 on the other; each employee's age fills exactly one cell and all other cells are empty]

Note the sparseness in the second MDD representation
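The difference in sparseness can be quantified as the share of cells that actually hold a value. The sketch below uses the two tables from the preceding slides; the function name density is hypothetical, and it assumes each pair of dimension values occurs at most once.

```python
def density(records):
    """Fraction of filled cells when the first two fields are used as dimensions."""
    dim1 = {r[0] for r in records}
    dim2 = {r[1] for r in records}
    return len(records) / (len(dim1) * len(dim2))

# Situation 1: (LAST NAME, EMPLOYEE#, AGE)
personnel = [
    ("SMITH", "01", 21), ("REGAN", "12", 19), ("FOX", "31", 63),
    ("WELD", "14", 31), ("KELLY", "54", 27), ("LINK", "03", 56),
    ("KRANZ", "41", 45), ("LUCAS", "33", 41), ("WEISS", "23", 19),
]

# Situation 2: (MODEL, COLOR, SALES VOLUME) for the Gleason dealership
sales = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

print(f"sales density:     {density(sales):.2f}")      # 9 of 9 cells filled
print(f"personnel density: {density(personnel):.2f}")  # 9 of 81 cells filled
```

In the personnel case the two "dimensions" identify the same thing, so almost every cell is empty and the multidimensional layout adds nothing.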

Page 31: 05 OLAP  v6 weekend

When is MDD (In)appropriate?

Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis. When there are no interrelationships, the MDD structure is not appropriate.

Page 32: 05 OLAP  v6 weekend

4. MDD Features - Rotation

Sales Volumes

View #1 (MODEL by COLOR):

            Blue  Red  White
Mini Van    6     5    4
Coupe       3     5    5
Sedan       4     3    2

( ROTATE 90° )

View #2 (COLOR by MODEL):

         Mini Van  Coupe  Sedan
Blue     6         3      4
Red      5         5      3
White    4         5      2

• Also referred to as “data slicing.”
• Each rotation yields a different slice or two dimensional table of data, that is, a different face of the cube.
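For a two-dimensional view, the rotation is simply a transpose: rows and columns trade places while every value stays attached to the same pair of positions. A minimal sketch with the Gleason sales volumes; the variable names are hypothetical.

```python
models = ["Mini Van", "Coupe", "Sedan"]
colors = ["Blue", "Red", "White"]

# View #1: one row per model, one column per color.
view1 = [
    [6, 5, 4],   # Mini Van
    [3, 5, 5],   # Coupe
    [4, 3, 2],   # Sedan
]

# Rotating by 90 degrees swaps the axes: one row per color, one column per model.
view2 = [list(row) for row in zip(*view1)]

for color, row in zip(colors, view2):
    print(f"{color:<6}", row)
```

No data is recomputed; only the presentation changes.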

Page 33: 05 OLAP  v6 weekend

MDD Features - Rotation

[Figure: the Sales Volumes cube with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr), shown in six orientations labelled View #1 to View #6 and linked by 90° rotations]

Page 34: 05 OLAP  v6 weekend

MDD Features - Ranging

[Figure: the Sales Volumes cube reduced to the selected positions MODEL (Mini Van, Coupe), COLOR (Metal Blue, Normal Blue) and DEALERSHIP (Clyde, Carr)]

• The end user selects the desired positions along each dimension.
• Also referred to as "data dicing."
• The data is scoped down to a subset grouping.
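Ranging can be sketched as a filter on each dimension. The example below uses the all-dealerships sales volumes from Example 2; because that table has no Metal Blue or Normal Blue positions, the selection shown (two models, one color, two dealerships) is a hypothetical one, as are the names cube and dice.

```python
DEALERS = ("Clyde", "Gleason", "Carr")

# Sales volumes from Example 2: (model, color) -> volumes for Clyde, Gleason, Carr.
raw = {
    ("Mini Van", "Blue"): (6, 6, 2), ("Mini Van", "Red"): (3, 5, 5),
    ("Mini Van", "White"): (2, 4, 3),
    ("Sports Coupe", "Blue"): (2, 3, 2), ("Sports Coupe", "Red"): (7, 5, 2),
    ("Sports Coupe", "White"): (4, 5, 1),
    ("Sedan", "Blue"): (6, 4, 2), ("Sedan", "Red"): (1, 3, 4),
    ("Sedan", "White"): (2, 2, 3),
}
cube = {(m, c, d): v for (m, c), vols in raw.items() for d, v in zip(DEALERS, vols)}

def dice(cube, models, colors, dealers):
    """Keep only the cells whose position on every dimension was selected."""
    return {
        key: vol for key, vol in cube.items()
        if key[0] in models and key[1] in colors and key[2] in dealers
    }

subset = dice(cube, {"Mini Van", "Sports Coupe"}, {"Blue"}, {"Clyde", "Carr"})
print(subset)
```

The result is itself a smaller cube, so it can be rotated or rolled up like the original.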

Page 35: 05 OLAP  v6 weekend

MDD Features - Roll-Ups & Drill Downs

ORGANIZATION DIMENSION

[Figure: hierarchy levels, top to bottom]
REGION:      Midwest
DISTRICT:    Chicago, St. Louis, Gary
DEALERSHIP:  Clyde, Gleason, Carr, Levi, Lucas, Bolton

• The figure presents a definition of a hierarchy within the organization dimension.
• Aggregations are perceived as being part of the same dimension.
• Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.”

Page 36: 05 OLAP  v6 weekend

The Time Dimension

• TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years.
• Eliminates the effort required to build sophisticated hierarchies every time a database is set up.
• Extra performance advantages

Page 37: 05 OLAP  v6 weekend

5. Pros/Cons of MDD

Pros
• Cognitive advantages for the user
• Ease of data presentation and navigation, time dimension
• Performance

Cons
• Less flexible
• Requires greater initial effort

Page 38: 05 OLAP  v6 weekend

Multidimensional OLAP (MOLAP)

• specialized database technology
• multidimensional storage structures
• E.g. Hyperion Essbase, Oracle Express, Cognos PowerPlay (Server)

• Query Performance
• Powerful MD Model
• write access
• Database Features: multiuser access, backup and recovery
• Sparsity Handling -> DB Explosion

[Figure: Frontend Tool on top of a Multidim. Database]

Page 39: 05 OLAP  v6 weekend

MOLAP Server

• Multi-Dimensional OLAP Server

[Figure: M.D. tools and utilities on top of a multi-dimensional server, which could also sit on a relational DBMS; example Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B) and Date (1, 2, 3, 4)]

Page 40: 05 OLAP  v6 weekend

Relational OLAP (ROLAP)

• idea: use relational data storage
• star (snowflake) schema
• E.g. Microstrategy, SAP BW

+ advantages of RDBMS
+ scalability, reliability, security etc.
+ Sparsity handling
- Query Performance
- Data Model Complexity
- no write access

[Figure: Frontend Tool, MD-Interface, ROLAP-Engine with Meta Data, SQL, Relational DB]

Page 41: 05 OLAP  v6 weekend

ROLAP Server

• Relational OLAP Server

[Figure: tools and utilities on top of a ROLAP server, which sits on a relational DBMS]

sale  prodId  date  sum
      p1      1     62
      p2      1     19
      p1      2     48

• Special indices, tuning
• Schema is “denormalized”

Page 42: 05 OLAP  v6 weekend

Client (Desktop) OLAP

• proprietary data structure on the client
• data stored as file
• mostly RAM based architectures
• E.g. Business Objects, Cognos PowerPlay

+ mobile user
+ ease of installation and use
- data volume
- no multiuser capabilities

[Figure: Client-OLAP]

Page 43: 05 OLAP  v6 weekend

DW Integration

[Figure: MOLAP (Multidim. Database), ROLAP (ROLAP-Engine) and Client-OLAP, each connected to the DW-DB (mostly relational)]

