Incremental Maintenance for Non-Distributive Aggregate Functions
work done at IBM Almaden Research Center
Themis Palpanas (U of Toronto)Richard SidleBobbie CochraneHamid Pirahesh
UoA, Sep 2002 Themis Palpanas - U of Toronto
2
Motivation
large amounts of data stored in databases often times data warehouses are used
consolidate data from many sources offer more general and descriptive view of data
queried by business intelligence tools and decision support systems produce expensive OLAP queries
these OLAP queries have nice properties: based on same set of tables perform similar aggregations
UoA, Sep 2002 Themis Palpanas - U of Toronto
3
Motivation (cont’d)
can efficiently support such queries with Automatic Summary Tables (ASTs) materialized queries defined over a set of base tables
precomputed once, used many times answer complex queries fast
must maintain ASTs when base tables change inserts, updates, deletes
4
Motivation (cont’d)
basetables
AST
insert/update/delete
ASTdefinition
UoA, Sep 2002 Themis Palpanas - U of Toronto
5
Aggregate Functionscharacterization of functions wrt insertion and deletion operations updates are series of deletions and insertions
distributive aggregate functions new value computed based on old value and
value of operation SUM()
non-distributive aggregate functions above property does not hold
STDDEV() MIN() (because of deletions)
UoA, Sep 2002 Themis Palpanas - U of Toronto
6
Problem Statementgiven ASTs with aggregate functions distributive
SUM, COUNT non-distributive
STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, …
when base tables change incrementally maintain affected ASTs
efficient maintenance of ASTs with non-distributive aggregate functions
UoA, Sep 2002 Themis Palpanas - U of Toronto
7
Outline
Current Approach
Our Solution
Experimental Evaluation
Related Work
Conclusions
8
AST
Current Approach
basetablesinsert/update/delete
delta
combine old and
new values
ASTdefinition
Propagatephase
Applyphase
UoA, Sep 2002 Themis Palpanas - U of Toronto
9
Current Approach (cont’d)
works for distributive SUM, COUNT
does not work for non-distributive STDDEV, CORRELATION, REGRESSION MIN/MAX XMLAGG
need new way to deal with these functions
UoA, Sep 2002 Themis Palpanas - U of Toronto
10
Our Solution
selective recomputation no longer enough to compute delta must recompute some aggregation groups
minimize work to be done choose which groups to recompute optimize query plan
11
Our Solution (cont’d)
AST
basetablesinsert/update/delete
delta
recomputeaffectedgroups
combine old and
new valuesPropagatephase
ASTdefinition
Applyphase
UoA, Sep 2002 Themis Palpanas - U of Toronto
12
Our Solution (cont’d)
the 5 steps
1. compute new aggregate values2. change column derivation3. recompute only affected groups4. eliminate unnecessary operations5. optimize for special cases
UoA, Sep 2002 Themis Palpanas - U of Toronto
13
Initial Query Plan
prop
UDI
LOJ
AST
Query Graph Model (QGM)
UoA, Sep 2002 Themis Palpanas - U of Toronto
14
1. Compute New Aggregate Values
compute delta for distributive functionsrecompute non-distributive functionsget those values only for affected groups
duplicate computation for distributive functions!
prop
UDI
LOJ
AST
AST
LOJ
UoA, Sep 2002 Themis Palpanas - U of Toronto
15
2. Change Column Derivation
change column derivationrewrite phase projects out unused columns
entire AST gets recomputed!
prop
UDI
LOJ
AST
AST
LOJnon-distributive only
distributive only
UoA, Sep 2002 Themis Palpanas - U of Toronto
16
2. Change Column Derivation
example AST:
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary)FROM employeesGROUP BY dept_id
result of COUNT() computed from old propagate phaseresults of MAX() and STDDEV() from AST definition
UoA, Sep 2002 Themis Palpanas - U of Toronto
17
3. Recompute Affected Groups
push join predicate down in AST
only affected groups are recomputed
special rules for super-aggregates
GROUPING SETS ROLLUP CUBE
prop
UDI
LOJ
AST
AST*
LOJ
non-distributive only
distributive
only
T1 Tk…
J J
UoA, Sep 2002 Themis Palpanas - U of Toronto
18
3. Recompute Affected Groups
special treatment for ASTs with super-aggregates predicates not pushdownable caution not to compute totals of totals
build special join predicate ensure correct aggregations
change rewrite rules allow predicate pushdown through super aggregates applicable only for special join predicate
UoA, Sep 2002 Themis Palpanas - U of Toronto
19
4. Remove Unnecessary Operations
outerjoin not always neededwhen changes are only inserts
reroute columns from propagate phase through AST
remove outerjoin operator
same for updates not referencing AST grouping columns and predicates
prop
UDI
LOJ
AST
AST
T1 Tk…
J J
all columns
distributive
only
UoA, Sep 2002 Themis Palpanas - U of Toronto
20
4. Remove Unnecessary Operations
example AST:SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary)FROM employeesGROUP BY dept_id
modification on base tables:UPDATE employees SET salary=10 WHERE age>40
outerjoin operation will not be built update does not refer to grouping column (dept_id),
and no predicate in AST refers to updated column (salary)
certain that no tuples in AST will be deletedonly STDDEV() will be recomputed
the rest are not affected by changes
UoA, Sep 2002 Themis Palpanas - U of Toronto
21
5. Optimize for Special Cases
recomputation step not needed when
only insertions and only MIN/MAX functions build predicate in apply phase check if new min/max should replace old values
only deletions referring only to grouping columns of AST can only cause entire groups to be deleted handled in apply phase
UoA, Sep 2002 Themis Palpanas - U of Toronto
22
5. Optimize for Special Casesexample AST:SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary)
FROM employees
GROUP BY dept_id
modification on base tables:DELETE FROM employees WHERE dept_id>40
selective recomputation step not needed deletion refers only to grouping column (dept_id) certain that entire groups will be deleted from AST no other groups will be affected
UoA, Sep 2002 Themis Palpanas - U of Toronto
23
Experimental Evaluationprototype implementation in IBM DB2 UDBstar schema database
sales of products over 5 year time period fact table: 10 million tuples
AST with non-distributive aggregate function 240,000 tuples
workload simulates nightly updates1. add/delete data for first day of month2. add/delete data for second day of month3. add/delete data for full month
UoA, Sep 2002 Themis Palpanas - U of Toronto
24
Experimental Evaluation (cont’d)
workload 1 workload 2 workload 3
incremental 286 294 420
full refresh 699 702 692
deletions require 40-60% of full refresh time
workload 1 workload 2 workload 3
incremental 3 n/a 31
full refresh 699 702 692
optimized deletions require 1-4% of full refresh time
UoA, Sep 2002 Themis Palpanas - U of Toronto
25
Experimental Evaluation (cont’d)
workload 1 workload 2 workload 3
incremental 151 158 180
full refresh 702 702 721
insertions/updates require 20-25% of full refresh time
UoA, Sep 2002 Themis Palpanas - U of Toronto
26
Related Work
incremental view maintenance
differential refresh algorithms Lindsay et al. 1986, Blakeley et al. 1986, Qian and
Wiederhold 1991, Ceri and Widom 1991
deferred incremental maintenance Colby et al. 1996, Salem et al. 2000
views with aggregation Quass 1996, Mumick et al. 1997
UoA, Sep 2002 Themis Palpanas - U of Toronto
27
Conclusions
incremental maintenance for ASTs with non-distributive aggregate functions support MIN/MAX, STDDEV, CORRELATION,
REGRESSION, XMLAGG, …
efficient selective recomputation recompute only affected groups optimize query plan customize for special cases
significant performance improvements
UoA, Sep 2002 Themis Palpanas - U of Toronto
28
Future Work
examine use of work areas temporary storage space store intermediate values maintenance without recomputation
STDDEV, MIN/MAX(?), … very helpful for ASTs defined with super-
aggregates
ASTs with HAVING clauses do not know when groups will enter/leave AST