Incremental Maintenance for Non-Distributive Aggregate Functions

work done at IBM Almaden Research Center

Themis Palpanas (U of Toronto), Richard Sidle, Bobbie Cochrane, Hamid Pirahesh

UoA, Sep 2002 Themis Palpanas - U of Toronto

Motivation

- large amounts of data are stored in databases
- data warehouses are often used to
  - consolidate data from many sources
  - offer a more general and descriptive view of the data
- warehouses are queried by business intelligence tools and decision support systems
  - these produce expensive OLAP queries
- these OLAP queries have nice properties:
  - based on the same set of tables
  - perform similar aggregations

Motivation (cont’d)

- can efficiently support such queries with Automatic Summary Tables (ASTs)
  - materialized queries defined over a set of base tables
  - precomputed once, used many times
  - answer complex queries fast
- must maintain ASTs when base tables change
  - inserts, updates, deletes

Motivation (cont’d)

[Figure: insert/update/delete operations are applied to the base tables; the AST is derived from the base tables via the AST definition]

Aggregate Functions

- characterization of functions with respect to insertion and deletion operations
  - updates are series of deletions and insertions
- distributive aggregate functions
  - new value computed based on old value and value of operation
  - SUM()
- non-distributive aggregate functions
  - above property does not hold
  - STDDEV()
  - MIN() (because of deletions)
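
The distinction can be made concrete with a minimal Python sketch. The function names `apply_sum` and `apply_min` are illustrative and do not come from the slides; returning `None` is an assumed convention meaning "the old value is not enough, the group must be recomputed".

```python
def apply_sum(old_sum, value, op):
    """SUM is distributive: old value plus the changed value gives the new value."""
    return old_sum + value if op == "insert" else old_sum - value


def apply_min(old_min, value, op):
    """MIN absorbs insertions, but some deletions cannot be resolved locally.

    Returns None when the old minimum (or a duplicate of it) is deleted,
    because the next-smallest value is not known without the base data.
    """
    if op == "insert":
        return min(old_min, value)
    if value > old_min:
        return old_min  # deleting a larger value leaves the minimum unchanged
    return None  # the minimum itself was deleted: recomputation needed


values = [4, 7, 9]
total, lowest = sum(values), min(values)

# Delete the value 4 from the group.
total = apply_sum(total, 4, "delete")    # still computable from the old sum
lowest = apply_min(lowest, 4, "delete")  # not computable from the old minimum
```

After the deletion, `total` is still correct, while `lowest` signals that the group has to be recomputed from the base table.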

Problem Statement

- given ASTs with aggregate functions
  - distributive: SUM, COUNT
  - non-distributive: STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, …
- when base tables change, incrementally maintain affected ASTs
- efficient maintenance of ASTs with non-distributive aggregate functions

Outline

Current Approach

Our Solution

Experimental Evaluation

Related Work

Conclusions

Current Approach

[Figure: base tables (insert/update/delete) → propagate phase (uses the AST definition) → delta → apply phase (combine old and new values) → AST]

Current Approach (cont’d)

- works for distributive functions
  - SUM, COUNT
- does not work for non-distributive functions
  - STDDEV, CORRELATION, REGRESSION
  - MIN/MAX
  - XMLAGG
- need new way to deal with these functions
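
For the distributive case, the propagate and apply phases can be sketched as follows. This is an illustration in Python, not the DB2 implementation: the change format `(op, group, value)`, the dictionary layout of the AST, and the names `propagate` and `apply_delta` are all assumptions made for the example.

```python
from collections import defaultdict


def propagate(changes):
    """Propagate phase: aggregate base-table changes into a per-group delta."""
    delta = defaultdict(lambda: {"cnt": 0, "sum": 0})
    for op, group, value in changes:
        sign = 1 if op == "insert" else -1
        delta[group]["cnt"] += sign
        delta[group]["sum"] += sign * value
    return dict(delta)


def apply_delta(ast, delta):
    """Apply phase: combine old and new values; empty groups leave the AST."""
    for group, d in delta.items():
        row = ast.setdefault(group, {"cnt": 0, "sum": 0})
        row["cnt"] += d["cnt"]
        row["sum"] += d["sum"]
        if row["cnt"] == 0:
            del ast[group]
    return ast


ast = {"A": {"cnt": 2, "sum": 30}, "B": {"cnt": 1, "sum": 5}}
changes = [("insert", "A", 10), ("delete", "B", 5), ("insert", "C", 7)]
apply_delta(ast, propagate(changes))
```

The same two phases have no direct counterpart for MIN/MAX or STDDEV, because their delta cannot be combined with the old value in this way.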

Our Solution

- selective recomputation
  - no longer enough to compute delta
  - must recompute some aggregation groups
- minimize work to be done
  - choose which groups to recompute
  - optimize query plan
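
A minimal sketch of selective recomputation for an AST holding one MIN per group, assuming the base table is available as a list of `(group, value)` rows after the changes have been applied. The helper names are hypothetical; the point is only that groups untouched by the change set are never read again.

```python
def affected_groups(changes):
    """Groups touched by at least one insert or delete in the change set."""
    return {group for _op, group, _value in changes}


def recompute_min(base_rows, groups):
    """Recompute MIN from the base rows, restricted to the given groups."""
    result = {}
    for group, value in base_rows:
        if group in groups:
            result[group] = min(value, result.get(group, value))
    return result


def maintain_min_ast(ast, base_rows_after, changes):
    """Replace only the affected groups; drop groups that have no rows left."""
    groups = affected_groups(changes)
    fresh = recompute_min(base_rows_after, groups)
    for group in groups:
        if group in fresh:
            ast[group] = fresh[group]
        else:
            ast.pop(group, None)
    return ast


ast = {"A": 3, "B": 5, "C": 1}
changes = [("delete", "B", 5), ("delete", "C", 1)]
base_after = [("A", 3), ("A", 8), ("C", 6)]  # base table after the deletions
maintain_min_ast(ast, base_after, changes)
```

Group `A` is left untouched, `B` disappears, and `C` gets its new minimum from the remaining base rows.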

Our Solution (cont’d)

[Figure: base tables (insert/update/delete) → propagate phase (uses the AST definition) → delta → recompute affected groups → apply phase (combine old and new values) → AST]

Our Solution (cont’d)

the 5 steps

1. compute new aggregate values
2. change column derivation
3. recompute only affected groups
4. eliminate unnecessary operations
5. optimize for special cases

Initial Query Plan

[Figure: initial query plan drawn as a Query Graph Model (QGM), with boxes labeled prop, UDI, LOJ, and AST]

1. Compute New Aggregate Values

- compute delta for distributive functions
- recompute non-distributive functions
- get those values only for affected groups

duplicate computation for distributive functions!

[Figure: query plan with boxes labeled prop, UDI, LOJ, AST, AST, and LOJ]

2. Change Column Derivation

- change column derivation
- rewrite phase projects out unused columns

entire AST gets recomputed!

[Figure: query plan with boxes labeled prop, UDI, LOJ, AST, AST, and LOJ; edges labeled "non-distributive only" and "distributive only"]

2. Change Column Derivation

example AST:
SELECT dept_id, COUNT(emp_id), MAX(age), STDDEV(salary)
FROM employees
GROUP BY dept_id

- result of COUNT() computed from old propagate phase
- results of MAX() and STDDEV() from AST definition
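
The split derivation for this example AST can be sketched in Python for a single affected group. The function `maintain_group` and the row layout are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption about what STDDEV() returns.

```python
import statistics


def maintain_group(old_row, delta_count, group_rows):
    """Build the new AST row for one affected dept_id.

    old_row:     current AST row, e.g. {"count": 3, ...}
    delta_count: net change in row count from the propagate phase
    group_rows:  (age, salary) base rows of this group after the change
    """
    ages = [age for age, _salary in group_rows]
    salaries = [salary for _age, salary in group_rows]
    return {
        "count": old_row["count"] + delta_count,       # distributive: old + delta
        "max_age": max(ages),                          # recomputed from base rows
        "stddev_salary": statistics.pstdev(salaries),  # recomputed from base rows
    }


old_row = {"count": 3, "max_age": 52, "stddev_salary": 81.6}
# One employee (age 52) was deleted from this department.
remaining = [(30, 100.0), (45, 200.0)]
new_row = maintain_group(old_row, delta_count=-1, group_rows=remaining)
```

Only COUNT() reuses the stored value; MAX() and STDDEV() ignore `old_row` and are taken from the recomputed group.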

3. Recompute Affected Groups

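- push join predicate down in AST
  - only affected groups are recomputed
- special rules for super-aggregates
  - GROUPING SETS
  - ROLLUP
  - CUBE

[Figure: query plan with boxes labeled prop, UDI, LOJ, AST, AST*, and LOJ; edges labeled "non-distributive only" and "distributive only"; joins (J) over base tables T1 … Tk]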

3. Recompute Affected Groups

- special treatment for ASTs with super-aggregates
  - predicates not pushdownable
  - caution not to compute totals of totals
- build special join predicate
  - ensure correct aggregations
- change rewrite rules
  - allow predicate pushdown through super-aggregates
  - applicable only for special join predicate
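
The slides do not spell out the special join predicate. One plausible reading, sketched below in Python, is that a super-aggregate group is affected by a changed row when every grouping column is either rolled up or equal to the row's value. The names `rollup_groups` and `matches`, the use of `None` as the rolled-up marker, and the assumption that grouping columns contain no genuine NULLs are all illustrative choices, not the authors' definition.

```python
ROLLED_UP = None  # marker for a grouping column aggregated away by ROLLUP


def rollup_groups(key):
    """All ROLLUP(c1, ..., cn) groups that a row with grouping values `key` feeds."""
    n = len(key)
    return [key[:i] + (ROLLED_UP,) * (n - i) for i in range(n, -1, -1)]


def matches(group, key):
    """Assumed join predicate: each column is rolled up or equal to the changed key."""
    return all(g is ROLLED_UP or g == k for g, k in zip(group, key))


# Hypothetical AST defined with GROUP BY ROLLUP(category, year).
ast_groups = [
    ("toys", 2002), ("toys", 2001), ("toys", None),
    ("books", 2002), ("books", None),
    (None, None),
]
changed_key = ("toys", 2002)
affected = [g for g in ast_groups if matches(g, changed_key)]
```

Here `affected` contains the detail group, the per-category subtotal, and the grand total. Each of them would be recomputed from base rows rather than from other AST rows, which is one way to avoid totals of totals.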

4. Remove Unnecessary Operations

- outerjoin not always needed
  - when changes are only inserts
    - reroute columns from propagate phase through AST
    - remove outerjoin operator
  - same for updates not referencing AST grouping columns and predicates

[Figure: query plan with boxes labeled prop, UDI, LOJ, AST, and AST; joins (J) over base tables T1 … Tk; edges labeled "all columns" and "distributive only"]

4. Remove Unnecessary Operations

example AST:
SELECT dept_id, COUNT(emp_id), MAX(age), STDDEV(salary)
FROM employees
GROUP BY dept_id

modification on base tables:
UPDATE employees SET salary=10 WHERE age>40

- outerjoin operation will not be built
  - update does not refer to grouping column (dept_id), and no predicate in AST refers to updated column (salary)
  - certain that no tuples in AST will be deleted
- only STDDEV() will be recomputed
  - the rest are not affected by changes
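
The reasoning in this example can be written as a small check over column sets. This Python sketch is an assumption about how such a test might look; `plan_for_update` and its parameters are hypothetical names, and a real optimizer works on the query graph rather than on plain sets.

```python
def plan_for_update(updated_cols, grouping_cols, predicate_cols, aggregates):
    """Decide whether the outerjoin is needed and which aggregates to recompute.

    aggregates maps an aggregate expression to the column it reads.
    """
    # Groups can appear or disappear only if the update changes a grouping
    # column or a column used in a predicate of the AST definition.
    touches_groups = bool(updated_cols & (grouping_cols | predicate_cols))
    recompute = sorted(
        name for name, column in aggregates.items() if column in updated_cols
    )
    return {"needs_outerjoin": touches_groups, "recompute": recompute}


# UPDATE employees SET salary=10 WHERE age>40
plan = plan_for_update(
    updated_cols={"salary"},      # age only selects rows, it is not modified
    grouping_cols={"dept_id"},
    predicate_cols=set(),         # the example AST has no WHERE clause
    aggregates={
        "COUNT(emp_id)": "emp_id",
        "MAX(age)": "age",
        "STDDEV(salary)": "salary",
    },
)
```

For the update above, the plan needs no outerjoin and lists only `STDDEV(salary)` for recomputation.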

5. Optimize for Special Cases

recomputation step not needed when

- only insertions and only MIN/MAX functions
  - build predicate in apply phase
  - check if new min/max should replace old values
- only deletions referring only to grouping columns of AST
  - can only cause entire groups to be deleted
  - handled in apply phase
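
The first special case can be sketched as a comparison in the apply phase. The Python below assumes the propagate phase has already produced the per-group maximum of the inserted rows (`delta_max`); the function name is illustrative.

```python
def apply_insert_only_max(ast, delta_max):
    """Insert-only maintenance of MAX: keep the larger of old and new value."""
    for group, new_max in delta_max.items():
        old_max = ast.get(group)
        # The predicate: replace only if the group is new or the value is larger.
        if old_max is None or new_max > old_max:
            ast[group] = new_max
    return ast


ast = {"A": 50, "B": 41}
delta_max = {"A": 45, "B": 60, "C": 30}  # maxima of the newly inserted rows
apply_insert_only_max(ast, delta_max)
```

No base-table access is needed: group `A` keeps its value, `B` is replaced, and `C` enters the AST as a new group. MIN works the same way with the comparison reversed.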

5. Optimize for Special Cases

example AST:
SELECT dept_id, COUNT(emp_id), MAX(age), STDDEV(salary)
FROM employees
GROUP BY dept_id

modification on base tables:
DELETE FROM employees WHERE dept_id>40

- selective recomputation step not needed
  - deletion refers only to grouping column (dept_id)
  - certain that entire groups will be deleted from AST
  - no other groups will be affected
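
Because the deletion predicate mentions only the grouping column, it can be evaluated directly against the AST keys. A minimal Python sketch, with the AST held as a dictionary keyed by `dept_id` and made-up aggregate values:

```python
def apply_group_deletion(ast, group_predicate):
    """Drop every AST group whose key satisfies the deletion predicate."""
    return {key: row for key, row in ast.items() if not group_predicate(key)}


# Illustrative AST rows: dept_id -> (count, max_age, stddev_salary)
ast = {
    10: (4, 55, 120.0),
    40: (2, 38, 15.5),
    50: (7, 61, 300.2),
}

# DELETE FROM employees WHERE dept_id>40
ast = apply_group_deletion(ast, lambda dept_id: dept_id > 40)
```

Groups 10 and 40 keep their stored aggregates unchanged, and group 50 is removed without touching the base table.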

Experimental Evaluation

- prototype implementation in IBM DB2 UDB
- star schema database
  - sales of products over a 5-year time period
  - fact table: 10 million tuples
- AST with non-distributive aggregate function
  - 240,000 tuples
- workload simulates nightly updates
  1. add/delete data for first day of month
  2. add/delete data for second day of month
  3. add/delete data for full month

Experimental Evaluation (cont’d)

              workload 1   workload 2   workload 3
incremental          286          294          420
full refresh         699          702          692

deletions require 40-60% of full refresh time

              workload 1   workload 2   workload 3
incremental            3          n/a           31
full refresh         699          702          692

optimized deletions require 1-4% of full refresh time

Experimental Evaluation (cont’d)

              workload 1   workload 2   workload 3
incremental          151          158          180
full refresh         702          702          721

insertions/updates require 20-25% of full refresh time
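
The quoted percentages can be checked against the tables with a few lines of Python. The numbers are copied from the three result tables; the transcript does not state the unit of the timings, so only ratios are computed, and the dictionary layout is simply a convenience for this sketch.

```python
timings = {
    "deletions": {
        "incremental": [286, 294, 420],
        "full": [699, 702, 692],
    },
    "optimized deletions": {
        "incremental": [3, None, 31],  # workload 2 is reported as n/a
        "full": [699, 702, 692],
    },
    "insertions/updates": {
        "incremental": [151, 158, 180],
        "full": [702, 702, 721],
    },
}


def percent_of_full(incremental, full):
    """Incremental time as a percentage of full refresh time, per workload."""
    return [
        None if inc is None else round(100 * inc / ref, 1)
        for inc, ref in zip(incremental, full)
    ]


ratios = {
    name: percent_of_full(t["incremental"], t["full"])
    for name, t in timings.items()
}

for name, values in ratios.items():
    print(name, values)
```

The computed ratios fall in or near the ranges quoted on the slides; the optimized-deletion figures come out slightly outside the rounded 1-4% range at both ends.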

Related Work

incremental view maintenance

- differential refresh algorithms
  - Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991
- deferred incremental maintenance
  - Colby et al. 1996, Salem et al. 2000
- views with aggregation
  - Quass 1996, Mumick et al. 1997

Conclusions

- incremental maintenance for ASTs with non-distributive aggregate functions
  - support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, …
- efficient selective recomputation
  - recompute only affected groups
  - optimize query plan
  - customize for special cases
- significant performance improvements

Future Work

- examine use of work areas
  - temporary storage space
  - store intermediate values
  - maintenance without recomputation
    - STDDEV, MIN/MAX(?), …
  - very helpful for ASTs defined with super-aggregates
- ASTs with HAVING clauses
  - do not know when groups will enter/leave AST
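
For STDDEV, the work-area idea can be illustrated with the standard identity that the variance follows from the count, the sum, and the sum of squares. The Python class below is a sketch of that identity, not the design proposed in the talk; the class name is hypothetical, the population standard deviation is assumed, and the formula is known to lose precision when values are large relative to their spread.

```python
import math


class StddevWorkArea:
    """Per-group intermediate values that make STDDEV maintainable in place."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def insert(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def delete(self, x):
        self.count -= 1
        self.total -= x
        self.total_sq -= x * x

    def stddev(self):
        """Population standard deviation, or None for an empty group."""
        if self.count == 0:
            return None
        mean = self.total / self.count
        variance = max(self.total_sq / self.count - mean * mean, 0.0)
        return math.sqrt(variance)


area = StddevWorkArea()
for salary in (100.0, 200.0, 300.0):
    area.insert(salary)
area.delete(300.0)  # handled without rereading the base table
result = area.stddev()
```

All three stored values are distributive, so both insertions and deletions are absorbed without recomputing the group. MIN/MAX has no equally simple work area, which may be why the slide marks it with a question mark.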

