Using Tiling to Scale Parallel Datacube Implementation
Ruoming Jin, Karthik Vaidyanathan, Ge Yang, Gagan Agrawal
The Ohio State University
Introduction to Data Cube Construction
Data cube construction involves computing aggregates for all values across all possible subsets of the dimensions.
If the original dataset is n-dimensional, data cube construction includes computing and storing C(n, m) m-dimensional arrays for each m < n.
Three-dimensional data cube construction involves computing arrays AB, AC, BC, A, B, C and a scalar value all.
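As a minimal illustration (hypothetical toy data, using NumPy summation as the aggregate; not the paper's algorithm), the three-dimensional case can be sketched as:

```python
import numpy as np

# Hypothetical 2x3x4 input array, indexed by dimensions (A, B, C)
cube = np.arange(2 * 3 * 4, dtype=np.int64).reshape(2, 3, 4)

# Two-dimensional aggregates: collapse one dimension each
AB = cube.sum(axis=2)   # drop C
AC = cube.sum(axis=1)   # drop B
BC = cube.sum(axis=0)   # drop A

# One-dimensional aggregates, computed from a 2-D parent rather than raw data
A = AB.sum(axis=1)
B = AB.sum(axis=0)
C = AC.sum(axis=0)

# The scalar "all" aggregate
all_agg = A.sum()
print(AB.shape, AC.shape, BC.shape, all_agg)
```

Computing A, B and C from two-dimensional parents rather than the raw array is the point of the minimal-parent idea discussed later.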
Part I
Motivation
• Datasets for off-line processing are becoming larger.
  – A system storing and allowing analysis on such datasets is a data warehouse.
• Frequent queries on data warehouses require aggregation along one or more dimensions.
  – Data cube construction performs all aggregations in advance to facilitate fast responses to all queries.
• Data cube construction is a compute- and data-intensive problem.
  – Memory requirements become the bottleneck for sequential algorithms.
Construct data cubes in parallel in cluster environments!
Our Earlier Work
• Parallel Algorithms for Small Dimensional Cases and Use of a Cluster Middleware (CCGRID 2002, FGCS 2003)
• Parallel algorithms and theoretical results (ICPP 2003, HiPC 2003)
• Evaluating parallel algorithms (IPDPS 2003)
Using Tiling
• One important issue: memory requirements for intermediate results
  – From a sparse m-dimensional array, we compute m dense (m-1)-dimensional arrays.
• Tiling can help scale sequential and parallel datacube algorithms.
• Two important issues:
  – Algorithms for using tiling
  – How to tile so as to incur minimum overhead
Outline
• Main Issues and Data Structures
• Parallel algorithms without tiling
• Tiling for Sequential Datacube construction
• Theoretical analysis
• Tiling for Parallel Datacube construction
• Experimental evaluation
Main Issues
• Cache and memory reuse
  – Each portion of the parent array is read only once to compute its children; the corresponding portions of each child should be updated simultaneously.
• Using minimal parents
  – If a child has more than one parent, it uses the minimal parent, which requires less computation to obtain the child.
• Memory management
  – Write the output array back to disk if no child is computed from this array.
  – Manage available main memory effectively.
• Communication volume
  – Appropriately partition along one or more dimensions to guarantee minimal communication volume.
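The minimal-parent rule can be sketched as follows (hypothetical dimension sizes; the product of dimension sizes stands in for the cost of aggregating a parent down to a child):

```python
# Hypothetical sizes for the dimensions D1..D4 of a four-dimensional cube
dim_size = {1: 16, 2: 32, 3: 64, 4: 128}

def parents(child):
    """All arrays with one more dimension that contain the child's dimensions."""
    return [tuple(sorted(child + (d,))) for d in set(dim_size) - set(child)]

def minimal_parent(child):
    """The parent with the smallest array, i.e. the cheapest to aggregate from."""
    def size(dims):
        total = 1
        for d in dims:
            total *= dim_size[d]
        return total
    return min(parents(child), key=size)

print(minimal_parent((1, 2)))  # D1D2 is computed from D1D2D3, not D1D2D4
```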
Aggregation Tree
Given a set X = {1, 2, …, n} and a prefix tree P(n), the corresponding aggregation tree A(n) is constructed by complementing every node in P(n) with respect to X.
Part III
[Figure: prefix lattice, prefix tree, and aggregation tree]
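The complement construction above can be sketched directly (a toy version, assuming nodes are represented as sorted tuples of dimension indices):

```python
def prefix_tree(n):
    """Children of a prefix-tree node S are S + (d,) for each d greater than max(S)."""
    def children(s):
        start = (max(s) + 1) if s else 1
        return [s + (d,) for d in range(start, n + 1)]
    tree = {}
    stack = [()]
    while stack:
        node = stack.pop()
        tree[node] = children(node)
        stack.extend(tree[node])
    return tree

def aggregation_tree(n):
    """Complement every prefix-tree node with respect to X = {1, ..., n}."""
    full = set(range(1, n + 1))
    comp = lambda s: tuple(sorted(full - set(s)))
    return {comp(node): [comp(c) for c in kids]
            for node, kids in prefix_tree(n).items()}

tree = aggregation_tree(3)
print(tree[(1, 2, 3)])  # the root's children: D2D3, D1D3, D1D2
```

For n = 3 this reproduces the tree used in the examples that follow: the root D1D2D3 has children D2D3, D1D3 and D1D2.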
Theoretical Results
• For data cube construction using the aggregation tree:
  – The total memory requirement for holding the results is bounded.
  – The total communication volume is bounded.
  – It is guaranteed that all arrays are computed from their minimal parents.
  – A procedure for partitioning input datasets exists that minimizes interprocessor communication.
Level One Parallel Algorithm
Main ideas
• Each processor computes a portion of each child at the first level.
• Lead processors have the final results after interprocessor communication.
• If the output is not used to compute other children, write it back; otherwise compute children on lead processors.
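A minimal sketch of the lead-processor rule under the assumptions of the example that follows (a 2x2x2 processor grid; illustrative only, omitting the actual interprocessor communication):

```python
from itertools import product

def is_lead(coords, aggregated):
    """A processor is a lead for a child if its coordinate is 0 along
    every dimension the child aggregates away."""
    return all(coords[d] == 0 for d in aggregated)

# 2x2x2 processor grid; child D1D2 aggregates away dimension D3 (index 2)
leads = [p for p in product(range(2), repeat=3) if is_lead(p, aggregated=[2])]
print(leads)  # the four lead processors (l1, l2, 0)
```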
Example
• Assumption
  – 8 processors
  – Each of the three dimensions is partitioned in half
• Initially
  – Each processor computes partial results for each of D1D2, D1D3 and D2D3
[Figure: aggregation tree for the three-dimensional array D1D2D3]
Example (cont.)
• Lead processors for D1D2 have labels (l1, l2, 0); each receives from (l1, l2, 1):
  (0, 0, 0) ← (0, 0, 1)
  (0, 1, 0) ← (0, 1, 1)
  (1, 0, 0) ← (1, 0, 1)
  (1, 1, 0) ← (1, 1, 1)
• Write back D1D2 on lead processors
Example (cont.)
• Lead processors for D1D3 have labels (l1, 0, l3); each receives from (l1, 1, l3):
  (0, 0, 0) ← (0, 1, 0)
  (0, 0, 1) ← (0, 1, 1)
  (1, 0, 0) ← (1, 1, 0)
  (1, 0, 1) ← (1, 1, 1)
• Compute D1 from D1D3 on lead processors; write back D1D3 on lead processors
• Lead processors for D1 have labels (l1, 0, 0); each receives from (l1, 0, 1):
  (0, 0, 0) ← (0, 0, 1)
  (1, 0, 0) ← (1, 0, 1)
• Write back D1 on lead processors
Tiling-based Approach
• Motivation
  – Parallel machines are not always available
  – Memory of an individual computer is limited
• Tiling-based Approaches
  – Sequential: tile along dimensions on one processor
  – Parallel: partition among processors and, on each processor, tile along dimensions
Part IV
Sequential Tiling-based Algorithm
• Main Idea
  A portion of a node in the aggregation tree is expandable (can be used to compute its children) once enough tiles of the portion of this node have been processed.
• Main Mechanism
  Each tile is given a label.
• Example: 4 tiles, tiling along D2 and D3. Each tile is given a label (0, l2, l3):
  Tile 0 – (0, 0, 0)
  Tile 1 – (0, 0, 1)
  Tile 2 – (0, 1, 0)
  Tile 3 – (0, 1, 1)
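The tile labeling in this example can be sketched as (assuming zero-based tile indices along each tiled dimension):

```python
from itertools import product

def tile_labels(tiled_dims, splits, ndims=3):
    """Enumerate tile labels: 0 on untiled dimensions, and a tile index
    in [0, splits) on each tiled dimension."""
    labels = []
    for idx in product(range(splits), repeat=len(tiled_dims)):
        label = [0] * ndims
        for d, i in zip(tiled_dims, idx):
            label[d] = i
        labels.append(tuple(label))
    return labels

# Tile along D2 and D3 (indices 1 and 2), two pieces each -> 4 tiles
print(tile_labels(tiled_dims=[1, 2], splits=2))
```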
Example
Processing order (tiling along D2 and D3):
• Tile (0, 0, 0) done: portion 0 of D1D3 and portion 0 of D1D2 partially computed
• Tile (0, 0, 1) done: portion 1 of D1D3 partially computed; portion 0 of D1D2 merged & expanded
• Tile (0, 1, 0) done: portion 1 of D1D2 partially computed; portion 0 of D1D3 merged & expanded
• Tile (0, 1, 1) done: portion 1 of D1D3 merged & expanded; portion 1 of D1D2 merged & expanded
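The merge-and-expand bookkeeping in this example can be sketched by counting processed tiles per portion (hypothetical helper names; a portion becomes expandable once all tiles contributing to it are done):

```python
from collections import defaultdict

def process_tiles(tiles, portions_of, tiles_per_portion):
    """Replay the rule: a portion of a child array becomes expandable
    (ready to merge & expand) once all tiles contributing to it are done."""
    seen = defaultdict(int)
    events = []
    for label in tiles:
        for child, portion in portions_of(label):
            seen[(child, portion)] += 1
            if seen[(child, portion)] == tiles_per_portion[child]:
                events.append((label, child, portion))
    return events

# The 4-tile example: tiles labeled (0, l2, l3)
tiles = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]

def portions_of(label):
    _, l2, l3 = label
    # Each tile contributes to one portion of each first-level child
    return [("D2D3", (l2, l3)), ("D1D3", l3), ("D1D2", l2)]

# D1D2 and D1D3 portions each need two tiles; D2D3 portions need only one
need = {"D2D3": 1, "D1D3": 2, "D1D2": 2}
events = process_tiles(tiles, portions_of, need)
print(events)
```

Replaying the example, portion 0 of D1D2 becomes expandable after tile (0, 0, 1) and portion 0 of D1D3 after tile (0, 1, 0), matching the trace above.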
Tiling Overhead
• The tiling-based algorithm requires writing back and rereading portions of results.
• We want to tile so as to minimize this overhead.
• Suppose the dimension Di is tiled 2^ki times.
• The total tiling overhead can then be computed as a function of the parameters ki.
Minimizing Tiling Overhead
• Tile the largest dimension first, and update its effective size.
• Keep choosing the (currently) largest dimension until the memory requirements are below the available memory.
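The greedy selection above can be sketched as follows (a simplified memory model in which the footprint is just the product of the effective dimension sizes):

```python
def choose_tiling(sizes, mem_limit):
    """Greedily halve the largest effective dimension until the
    (simplified) memory estimate fits. Returns k with Di tiled 2**k[i] times."""
    eff = list(sizes)
    k = [0] * len(sizes)
    def footprint():
        total = 1
        for s in eff:
            total *= s
        return total
    while footprint() > mem_limit:
        i = max(range(len(eff)), key=lambda j: eff[j])
        eff[i] //= 2      # halve the effective size of the largest dimension
        k[i] += 1         # Di is now tiled in twice as many pieces
    return k

# A hypothetical 128^4 cube that must fit in 1/8 of its untiled footprint
print(choose_tiling([128, 128, 128, 128], mem_limit=128 ** 4 // 8))
```

Because the greedy rule always splits the currently largest dimension, the tiling spreads across several dimensions rather than cutting one dimension many times, which is what reduces the overhead in the experiments below.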
Parallel Tiling-based Algorithm
• Assumptions
  – Three-dimensional partition (0 1 1 1)
  – Two-dimensional tiling (0 0 1 1)
• Solutions
  – Apply the tiling-based approach to first-level nodes only
  – Apply the Level One Parallel Algorithm to the other nodes

[Figure: four-dimensional aggregation tree over D1D2D3D4]
Choosing Tiling Parameters

[Chart: execution time (s) vs. sparsity level (25% and 5%) for a 128^4 dataset on 1 processor with 8 tiles, comparing the sequential algorithm without tiling against three-, two-, and one-dimensional tiling]

• Tiling overhead exists.
• Tiling along multiple dimensions can reduce tiling overhead.
Parallel Tiling-based Algorithm Results

[Chart: execution time (s) vs. sparsity level (25% and 5%) for a 128^4 dataset on 8 processors with a three-dimensional partition, comparing tiling parameters (1 0 0 1), (0 0 1 1), and (0 0 0 2)]

The algorithm for choosing tiling parameters to reduce tiling overhead remains effective in parallel environments!
Conclusions
• Tiling can help scale parallel datacube construction.
• Our work provides algorithms for tiling-based sequential and parallel datacube construction, together with analytical results.