
MOLAP on Cloud

Date post: 23-Feb-2016
Transcript
Page 1: MOLAP on Cloud

Hongwei Zhao, Xiaojun Ye

MOLAP on Cloud
Interactive, Cluster Data Warehouse

Tsinghua University [email protected], [email protected]

Page 2: MOLAP on Cloud

Motivation
Extend the cube model to support OLAP operations on Big Data:
» OLAP operations
» Interactive queries

Page 3: MOLAP on Cloud

Outline
Cube modelling
Building and querying
Experimenting

Page 4: MOLAP on Cloud

Data Transform for Cube

[Diagram labels: TPC-DS tables, Star views, Cube data, User queries]

Page 5: MOLAP on Cloud

A Simplified Cube Model

[Diagram: a Cube Instance comprises Cube Metadata, Dimension Instances, and Cuboid Instances. A Dimension Instance holds Key/Member pairs; a Cuboid Instance holds Key/Measure pairs (nodes and cells). Cuboid lattice over dimensions A, B, C: ABC; AB, AC, BC; A, B, C; *. Other labels: Base Cuboids, Result.]

Page 6: MOLAP on Cloud

Example: TPC-DS Query 7

select i_item_id,
       avg(ss_quantity) agg1,
       avg(ss_list_price) agg2,
       avg(ss_coupon_amt) agg3,
       avg(ss_sales_price) agg4
  from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk
   and ss_item_sk = i_item_sk
   and ss_cdemo_sk = cd_demo_sk
   and ss_promo_sk = p_promo_sk
   and cd_gender = '[GEN]'
   and cd_marital_status = '[MS]'
   and cd_education_status = '[ES]'
   and (p_channel_email = 'N' or p_channel_event = 'N')
   and d_year = [YEAR]
 group by i_item_id
 order by i_item_id

Page 7: MOLAP on Cloud

Relation Schema

[Schema diagram with tables: Store Sales, Date Dim, Item, Promotion, Customer Demographics]

Page 8: MOLAP on Cloud

Converting to BitKey

Dimension A   Dimension B   Dimension C   Measure
A1            B1            C1            M1
A2            B1            C2            M2
A3            B2            C2            M3

Dimension Member   BitKey   Dimension Mask
A1                 000001   000001
B1                 000010   000010
C1                 000100   000100
A2                 001000   001001
B1                 000010   000010
C2                 010000   010100

BitKeys   Value
000111    M1
011010    M2

[Diagram labels: Fact1, Fact2, Intermediate, Result1, Result2]
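The slide does not spell out how BitKeys are assigned, so the following is only a sketch of one scheme that reproduces the values shown here and in the storage example on the last slide. It assumes each dimension numbers its members 1, 2, 3, ... and writes that number in binary across bit positions owned by the dimension, claiming the next free global bit whenever it needs one more. The class and method names (`BitKeyEncoder`, `encode_member`, `encode_row`, `mask`) are hypothetical.

```python
class BitKeyEncoder:
    """Hypothetical encoder: one binary counter per dimension,
    spread over bit positions claimed on demand."""

    def __init__(self):
        self.next_bit = 0    # next unclaimed global bit position
        self.dim_bits = {}   # dimension -> bit positions it owns (low to high)
        self.members = {}    # dimension -> {member: ordinal starting at 1}

    def encode_member(self, dim, member):
        members = self.members.setdefault(dim, {})
        bits = self.dim_bits.setdefault(dim, [])
        if member not in members:
            members[member] = len(members) + 1
        ordinal = members[member]
        # Claim more bits when the ordinal no longer fits.
        while ordinal.bit_length() > len(bits):
            bits.append(self.next_bit)
            self.next_bit += 1
        # Spread the ordinal's binary digits over the owned positions.
        key = 0
        for i, pos in enumerate(bits):
            if (ordinal >> i) & 1:
                key |= 1 << pos
        return key

    def mask(self, dim):
        """Dimension mask: every bit position the dimension owns."""
        return sum(1 << pos for pos in self.dim_bits[dim])

    def encode_row(self, row):
        """Cell BitKey: OR of the member BitKeys of one fact row."""
        key = 0
        for dim, member in row.items():
            key |= self.encode_member(dim, member)
        return key


encoder = BitKeyEncoder()
rows = [
    {"A": "A1", "B": "B1", "C": "C1"},
    {"A": "A2", "B": "B1", "C": "C2"},
    {"A": "A3", "B": "B2", "C": "C2"},
]
keys = [format(encoder.encode_row(r), "06b") for r in rows]
masks = {d: format(encoder.mask(d), "06b") for d in "ABC"}
```

Under these assumptions `keys` starts with `000111` and `011010`, matching the BitKeys table, and the final masks for A and B are `001001` and `100010`, matching the storage example. Keys issued earlier stay valid when a dimension claims a new bit, because the new bit was zero in every older key.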

Page 9: MOLAP on Cloud

Cube Storage

Storage hierarchy: Table, Region, Column Family, Row, Column, Version, Value (Cell)

One table for dimension instances storage:
Row Key          Dimension Name
Column Family    Default
Column           Member BitKey
Value            Member Value

Multiple tables for cuboid instances:
Table Name       Cuboid Name
Row Key          Cell BitKey
Column Family    Default
Column           Measure Name
Value            Measure Value
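The layout above can be illustrated without any database client. The sketch below only produces (table, row key, column family, column, value) tuples in the shape the slide describes; it does not call a real storage API. The function names and the 6-bit key width are assumptions; the table name `Dimension`, the `default` family, and the `Mea_count` / `Mea_sum` columns are taken from the storage example on the last slide.

```python
def dimension_puts(dim_name, members, family="default", width=6):
    """One row per dimension: column = member BitKey, value = member value.

    members maps an integer BitKey to the member value.
    """
    return [
        ("Dimension", dim_name, family, format(bitkey, f"0{width}b"), value)
        for bitkey, value in members.items()
    ]


def cuboid_puts(cuboid_name, cells, family="default", width=6):
    """One table per cuboid: row key = cell BitKey, column = measure name.

    cells maps an integer cell BitKey to a {measure name: value} dict.
    """
    puts = []
    for bitkey, measures in cells.items():
        row_key = format(bitkey, f"0{width}b")
        for name, value in measures.items():
            puts.append((cuboid_name, row_key, family, name, value))
    return puts


dim_rows = dimension_puts("Dimension A", {0b000001: "A1", 0b001000: "A2"})
cell_rows = cuboid_puts(
    "Cuboid_ABC",
    {0b000111: {"Mea_count": 1, "Mea_sum": "M1"}},
)
```

Keeping the BitKey as the row key means a cell lookup is a single row read, and a scan over one cuboid table touches only cells of that cuboid.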

Page 10: MOLAP on Cloud

MDX for query 7

select { i_item_id } on rows,
       { avg(ss_quantity), avg(ss_list_price),
         avg(ss_coupon_amt), avg(ss_sales_price) } on columns
  from store_sales_cube
 where (cd_gender.[Male], cd_marital_status.[Single],
        cd_education_status.[College], d_year.[2000])

Page 11: MOLAP on Cloud

Cube Implementation

Base cuboid building with 4 stages:
• Dimension constructing
• Hive query
• Aggregation
• Saving

Query execution with 4 stages:
• Loading dimension
• Other cuboid constructing
• Mapping
• Reducing
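As a rough illustration of the "Aggregation" and "Other cuboid constructing" stages, the sketch below aggregates fact rows into base-cuboid cells and then derives a coarser cuboid by masking dimensions out of each cell key. It assumes every cell keeps a count and a sum (as the `Mea_count` and `Mea_sum` columns on the last slide suggest), so averages can be computed after any roll-up. The function names and the sample measure values are hypothetical.

```python
from collections import defaultdict


def build_base_cuboid(facts):
    """Aggregate (bitkey, measure) pairs into {bitkey: (count, sum)}."""
    cells = defaultdict(lambda: [0, 0.0])
    for key, value in facts:
        cells[key][0] += 1
        cells[key][1] += value
    return {key: tuple(cell) for key, cell in cells.items()}


def roll_up(cuboid, keep_mask):
    """Derive a coarser cuboid: keep only the bits of the dimensions in
    keep_mask and merge cells whose reduced keys collide."""
    out = defaultdict(lambda: [0, 0.0])
    for key, (count, total) in cuboid.items():
        reduced = key & keep_mask
        out[reduced][0] += count
        out[reduced][1] += total
    return {key: tuple(cell) for key, cell in out.items()}


# Three fact rows keyed as in the BitKey example (made-up measure values).
facts = [(0b000111, 10.0), (0b011010, 20.0), (0b111001, 30.0)]
base_abc = build_base_cuboid(facts)

MASK_C = 0b010100              # dimension mask of C from the BitKey slide
cuboid_c = roll_up(base_abc, MASK_C)
```

Here the two rows with member C2 collapse into one cell of cuboid C with count 2 and sum 50.0, while the C1 row keeps its own cell.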

Page 12: MOLAP on Cloud

OLAP System

[Architecture diagram: an Engine on top of a Columnar Database. Nodes shown: Master Node, Region Nodes, Dispatcher Node, Worker Nodes. Data shown: cube data, cached data.]

Cluster Framework
Dispatcher Node
Worker Nodes

• Dynamically distribute cube data onto worker nodes

• Parallelize OLAP operations in a concurrent model

Page 13: MOLAP on Cloud

Actor of Akka
• State
• Behavior
• Mailbox
• Lifecycle
• Fault tolerance

Page 14: MOLAP on Cloud

Execute Query

[Sequence diagram with participants Query Dispatcher, Cuboid Manager, Dimension Manager, Mapper, and Reducer; numbered steps 1 to 4; message labels: require, cuboid ready, dimension load, data ready, extract query, hit cell.]

Actors for Query
• Load dimension members
• Build other cuboids
• Mapping
• Reducing

Page 15: MOLAP on Cloud

Compiling & Mapping

Query 7 condition: GEN=M and MS=S and ES=College and YEAR=2000

Dimension   Mask        Member    Member BitKey
GEN         000000011   Male      000000010
MS          000011100   Single    000001100
ES          001100000   College   001000000
YEAR        110000000   2000      010000000

Mask:      111111111
FilterKey: 011001110

[Diagram: the Query Dispatcher sends the compiled filter to Mapper1, Mapper2, and Mapper3.]

for each cell in mapper {
    if ((key & mask) == FilterKey)
        send cell to Reducer
}
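The compile and map steps above can be sketched as follows: the query mask is the OR of the dimension masks, the FilterKey is the OR of the selected member BitKeys, and a cell is a hit when its masked key equals the FilterKey. The reducer shown is an assumption: it groups hits by the bits outside the mask (for Query 7 that would be the i_item_id bits) and averages from a (count, sum) pair per cell. All function names and the sample cells are hypothetical.

```python
def compile_filter(conditions):
    """conditions: iterable of (dimension mask, member BitKey) pairs."""
    mask = 0
    filter_key = 0
    for dim_mask, member_key in conditions:
        mask |= dim_mask
        filter_key |= member_key
    return mask, filter_key


def map_cells(cells, mask, filter_key):
    """Yield the cells whose filtered dimensions match the query."""
    for key, measure in cells:
        if (key & mask) == filter_key:
            yield key, measure


def reduce_avg(hits, mask):
    """Group hits by the unfiltered bits and average (count, sum) pairs."""
    groups = {}
    for key, (count, total) in hits:
        group = key & ~mask
        old_count, old_total = groups.get(group, (0, 0.0))
        groups[group] = (old_count + count, old_total + total)
    return {g: total / count for g, (count, total) in groups.items()}


# Masks and member keys from the slide.
mask, filter_key = compile_filter([
    (0b000000011, 0b000000010),   # GEN  = Male
    (0b000011100, 0b000001100),   # MS   = Single
    (0b001100000, 0b001000000),   # ES   = College
    (0b110000000, 0b010000000),   # YEAR = 2000
])

# Made-up cells: bits 9 and above stand in for the item dimension.
cells = [
    ((1 << 9) | 0b011001110, (2, 10.0)),   # item 1, matches
    ((1 << 9) | 0b011001101, (1, 99.0)),   # item 1, GEN differs
    ((2 << 9) | 0b011001110, (4, 8.0)),    # item 2, matches
]
averages = reduce_avg(map_cells(cells, mask, filter_key), mask)
```

With the slide's values, `mask` is `111111111` and `filter_key` is `011001110`; the second sample cell is rejected because its GEN bits differ.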

Page 16: MOLAP on Cloud

[Diagram: a Master exchanges messages and results with three Workers; each Worker reads one of Region 1, Region 2, Region 3 and keeps its own cache (Cache 1, Cache 2, Cache 3).]

Query Execution
• Master sends task messages to workers
• Each worker caches the data of its region
• Subsequent tasks reuse the cached data

The first query on the 1G data set takes 48 seconds; the following queries with various parameters take 2.4 seconds each.
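The gap between the first and the following queries is what one would expect if region data is loaded once and then kept in memory. A minimal single-process sketch of that caching behaviour, assuming a hypothetical `Worker` class and a caller-supplied `load_region` function (neither is from the slides, and no actor framework is involved here):

```python
class Worker:
    """Hypothetical worker that loads its region once and reuses it."""

    def __init__(self, region, load_region):
        self.region = region
        self._load_region = load_region
        self._cache = None
        self.loads = 0   # how many times the region was actually read

    def run(self, task):
        if self._cache is None:          # only the first task pays for loading
            self._cache = self._load_region(self.region)
            self.loads += 1
        return task(self._cache)


def run_query(workers, task, combine):
    """Master side: send the same task to every worker, combine results."""
    return combine(worker.run(task) for worker in workers)


# Made-up region contents: lists of measure values.
regions = {"Region 1": [1, 2], "Region 2": [3], "Region 3": [4, 5, 6]}
workers = [Worker(name, regions.__getitem__) for name in regions]

first = run_query(workers, sum, sum)     # loads every region
second = run_query(workers, len, sum)    # served from the caches
```

After both queries each worker has loaded its region exactly once, so the second query only pays for the task itself.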

Page 17: MOLAP on Cloud

Experiments On TPC-DS

[Chart: number of fact records and cube cells (0 to 300,000,000) for the 1G, 10G, and 100G data sets.]

                    1G          10G          100G
records number      2,653,108   26,532,571   265,325,821
cube cell number    1,836,162   10,190,922   41,892,286

4 nodes:
• 2*Intel Xeon CPU E5-2630
• 4*600G 15000r/s SAS
• 256G RAM
• 10Gb Network

Dimensions:
1. "i_item_id"
2. "cd_gender"
3. "cd_marital_status"
4. "cd_education_status"
5. "p_channel_email"
6. "p_channel_event"
7. "d_year"

Measures:
8. ss_quantity_avg
9. ss_list_price_avg
10. ss_coupon_amt_avg
11. ss_sales_price_avg

Page 18: MOLAP on Cloud

Build Cube for Query 7

[Bar chart: running time (seconds, 0 to 6000) of the querying, aggregating, and saving stages for TPC-DS data sizes 1G, 10G, and 100G.]

• Partition by the largest dimension (i_item_id)
• In-memory aggregation
• The saving stage can be ignored (cache)
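The slides do not say how the partitioning is done. One plausible reading, sketched below, is a hash partition on i_item_id so that all rows of an item land on the same worker and can be aggregated in memory without cross-worker merging. The function names, the use of CRC32 as the hash, and the sample rows are assumptions.

```python
import zlib
from collections import defaultdict


def partition_by_dimension(rows, dim, num_workers):
    """Assign each row to a worker by hashing the value of one dimension."""
    parts = defaultdict(list)
    for row in rows:
        # CRC32 gives a hash that is stable across runs and processes.
        worker = zlib.crc32(str(row[dim]).encode("utf-8")) % num_workers
        parts[worker].append(row)
    return parts


def aggregate_partition(rows, dim, measure):
    """In-memory aggregation of one partition into {member: (count, sum)}."""
    acc = {}
    for row in rows:
        count, total = acc.get(row[dim], (0, 0.0))
        acc[row[dim]] = (count + 1, total + row[measure])
    return acc


# Made-up fact rows.
rows = [
    {"i_item_id": "item-1", "ss_quantity": 2.0},
    {"i_item_id": "item-2", "ss_quantity": 5.0},
    {"i_item_id": "item-1", "ss_quantity": 4.0},
    {"i_item_id": "item-3", "ss_quantity": 1.0},
]
parts = partition_by_dimension(rows, "i_item_id", num_workers=4)
partials = [aggregate_partition(p, "i_item_id", "ss_quantity")
            for p in parts.values()]
```

Because an item never spans two partitions, the per-worker results can simply be concatenated; no item needs a second merge pass.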

Page 19: MOLAP on Cloud

Execute Query 7

[Line chart: running time (seconds, 0 to 400) against iteration number (1 to 5) for 4, 8, and 16 workers.]

First execution on the cube includes:
• Dimension loading
• Other cuboids construction
• Caching
• Mapping
• Reducing

Subsequent executions include:
• Mapping
• Reducing

Page 20: MOLAP on Cloud

Hive Query for Fact Data

select p_channel_email, p_channel_event, cd_gender, cd_marital_status,
       cd_education_status, i_item_id, d_year,
       ss_quantity, ss_list_price, ss_coupon_amt, ss_sales_price
  from store_sales
  join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
  join item on (store_sales.ss_item_sk = item.i_item_sk)
  join customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
  join promotion on (store_sales.ss_promo_sk = promotion.p_promo_sk)

Page 21: MOLAP on Cloud

Compare with Hive

[Two bar charts comparing Hive and the prototype at 1G, 10G, and 100G; vertical axis 0 to 1400.]

First query time comparison: 2-3X

Subsequent execution time: 30-50X

Page 22: MOLAP on Cloud

Future work
• Cube Model:
    • Demand-driven & Data-driven
• Cube Data:
    • Model-driven & Requirement-driven
• More experiments on TPC-DS queries
    • Report, ad hoc, iterative, data mining
• MDX/XMLA compliance

Page 23: MOLAP on Cloud

Thanks.

Page 24: MOLAP on Cloud

Storage for Example

Table: Dimension

Row Key       Column Family: default
Dimension A   Mask     000001   001000   001001
              001001   A1       A2       A3
Dimension B   Mask     000010   100000
              100010   B1       B2

Table: Cuboid_ABC

Row Key   Column Family: default
000111    Mea_count   Mea_sum
          1           M1
011010    Mea_count   Mea_sum
          1           M2

