Wavelet Synopses with Predefined Error Bounds: Windfalls of Duality

Panagiotis Karras
DB seminar, 23 March, 2006

Algorithms for Maximum-Error Wavelet Synopses

[Diagram: taxonomy of algorithms]

• Restricted, Space-Bounded (Direct): GK04,05; Guha05

• Restricted, Error-Bounded: Muthukrishnan05

• Unrestricted, Space-Bounded (Direct): GH05,06

• Indirect?

Compact Data Synopses useful in:

• Approximate Query Processing (exact answers not always required)

• Learning, Classification, Event Detection

• Data Mining, Selectivity Estimation

• Situations where massive data arrives in a stream

Haar Wavelets

[Figure: Haar tree for the data 34 16 2 20 20 0 36 16. Pairwise averages by level: 25 11 10 26, then 18 18, then 18. Detail coefficients by level: 9 -9 10 10, then 7 -8, then 0.]

• Wavelet transform: orthogonal transform for the hierarchical representation of functions and signals

• Haar wavelets: simplest wavelet system, easy to understand and implement

• Haar tree: structure for the visualization of decomposition and value reconstructions

• Synopsis: Wavelet representation with few non-zero terms.
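A minimal sketch of the Haar decomposition behind the tree, applied to the eight data values of the slide. The function names `haar_decompose` and `haar_reconstruct` are illustrative only, and the unnormalized convention (average and half-difference of each pair) is assumed because it matches the numbers in the figure.

```python
def haar_decompose(data):
    """Return [overall average, detail coefficients from coarse to fine]."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Finer levels are computed first, so prepend each coarser level.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: left child gets +c, right child gets -c."""
    values = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        values = [x for v, c in zip(values, level) for x in (v + c, v - c)]
        pos += len(level)
    return values


data = [34, 16, 2, 20, 20, 0, 36, 16]
coeffs = haar_decompose(data)
print(coeffs)                    # average first, then details level by level
print(haar_reconstruct(coeffs))  # recovers the original data
```

A synopsis in this setting is the same coefficient list with most entries set to zero; reconstruction then yields approximate values.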

Maximum-Error Metrics

• Error Metrics providing tight error guarantees for all reconstructed values:

– Maximum Absolute Error: $\max_i |\hat{d}_i - d_i|$

– Maximum Relative Error with Sanity Bound $s$ (to avoid domination by small data values): $\max_i \frac{|\hat{d}_i - d_i|}{\max\{|d_i|, s\}}$

• Aim at minimization of these metrics
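The two metrics can be computed directly from a data vector and its reconstruction. This is a small sketch; the function names and the example approximation (every value replaced by the overall average 18) are chosen here for illustration.

```python
def max_abs_error(data, approx):
    """Largest absolute deviation between reconstructed and actual values."""
    return max(abs(a - d) for d, a in zip(data, approx))


def max_rel_error(data, approx, sanity_bound):
    """Largest relative deviation; the sanity bound keeps tiny values from dominating."""
    return max(abs(a - d) / max(abs(d), sanity_bound)
               for d, a in zip(data, approx))


data = [34, 16, 2, 20, 20, 0, 36, 16]
approx = [18] * len(data)  # synopsis that keeps only the overall average
print(max_abs_error(data, approx))
print(max_rel_error(data, approx, sanity_bound=10))
```

Without the sanity bound, the zero in the data would make the relative error undefined.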

Restricted Synopses

• Compute Haar wavelet decomposition of D

• Preserve best coefficient subset that satisfies bound

• Space-Bounded Problem: [GK04,05,Guha05]

Bound B on number of non-zero coefficients

• Error-Bounded Problem: [Muthukrishnan05]

Bound ε on maximum error

Faster indirect solution to Space-Bounded Problem

How does it work?

• Space-Bounded Problem

GK04,05: Global Tabulation

$$E[i,b,S] = \min\Big\{ \min_{0 \le b' \le b} \max\big\{E[i_L,b',S],\ E[i_R,b-b',S]\big\},\ \min_{0 \le b' \le b-1} \max\big\{E[i_L,b',S \cup \{c_i\}],\ E[i_R,b-b'-1,S \cup \{c_i\}]\big\} \Big\}$$

Guha05: Local Tabulation

- Tabulate four one-dimensional arrays: $E[i_L,*,S \cup \{c_i\}]$, $E[i_R,*,S \cup \{c_i\}]$, $E[i_L,*,S]$, $E[i_R,*,S]$

- Extract $E[i,*,S]$ from these four, then delete them

- At most $O(\log n)$ arrays concurrently stored

- Derive solution at the top, solve the problem again below

- $O(n^2)$ time, $O(n)$ space

[Figure: error tree from the root down to node i, with children i_L (+) and i_R (-); S = subset of selected ancestors]
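The space-bounded recurrence E[i, b, S] can be sketched as a memoized recursion. This is an illustrative version under stated assumptions, not the GK04,05 or Guha05 implementation: the name `min_max_error` is hypothetical, the ancestor subset S is represented by the incoming value it induces at node i, the error metric is maximum absolute error, and none of the space-saving tabulation tricks are applied.

```python
from functools import lru_cache


def min_max_error(data, budget):
    """Smallest max absolute error using at most `budget` Haar coefficients."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    prefix = [0]
    for d in data:
        prefix.append(prefix[-1] + d)

    def mean(lo, hi):
        return (prefix[hi] - prefix[lo]) / (hi - lo)

    @lru_cache(maxsize=None)
    def E(lo, hi, b, v):
        # v is the value contributed by the retained ancestors (the set S).
        if hi - lo == 1:
            return abs(v - data[lo])
        mid = (lo + hi) // 2
        c = (mean(lo, mid) - mean(mid, hi)) / 2  # coefficient c_i of this node
        # Drop c_i: split the whole budget b between the two subtrees.
        best = min(max(E(lo, mid, bl, v), E(mid, hi, b - bl, v))
                   for bl in range(b + 1))
        if b >= 1:
            # Keep c_i: one unit of budget is spent here.
            keep = min(max(E(lo, mid, bl, v + c), E(mid, hi, b - 1 - bl, v - c))
                       for bl in range(b))
            best = min(best, keep)
        return best

    drop_root = E(0, n, budget, 0.0)
    if budget == 0:
        return drop_root
    return min(drop_root, E(0, n, budget - 1, mean(0, n)))


data = [34, 16, 2, 20, 20, 0, 36, 16]
for b in (0, 1, 7):
    print(b, min_max_error(data, b))
```

The inner minimization over the budget split b' is what makes the tabulation two-dimensional per node, which the error-bounded formulation avoids.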

How does it work?

• Error-Bounded Problem, Muthukrishnan05

$$M[i,S] = \min\big\{ M[i_L,S] + M[i_R,S],\ M[i_L,S \cup \{c_i\}] + M[i_R,S \cup \{c_i\}] + 1 \big\}$$

[Figure: error tree from the root down to node i, with children i_L (+) and i_R (-); S = subset of selected ancestors]

- At $\log\log n$ levels from the bottom, stop recursion and enter local search

- $O(n^2 / \log n)$ time, $O(n)$ space

• No need to tabulate

• The solution to this problem is more economical

• Dual Space-Bounded problem solved indirectly via binary search
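A sketch of the error-bounded recurrence M[i, S] and of the indirect route to the space-bounded problem by binary search. The assumptions are mine: the names `min_coeffs_for_error` and `min_error_for_budget` are hypothetical, S is again represented by the incoming value it induces, the metric is maximum absolute error, and the local-search speedup near the bottom of the tree is omitted, so this version runs in quadratic time.

```python
import math


def min_coeffs_for_error(data, eps):
    """Fewest retained Haar coefficients with max absolute error <= eps."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    prefix = [0]
    for d in data:
        prefix.append(prefix[-1] + d)

    def mean(lo, hi):
        return (prefix[hi] - prefix[lo]) / (hi - lo)

    def M(lo, hi, v):
        # v is the value contributed by the retained ancestors (the set S).
        if hi - lo == 1:
            return 0 if abs(v - data[lo]) <= eps + 1e-9 else math.inf
        mid = (lo + hi) // 2
        c = (mean(lo, mid) - mean(mid, hi)) / 2
        drop = M(lo, mid, v) + M(mid, hi, v)
        keep = 1 + M(lo, mid, v + c) + M(mid, hi, v - c)
        return min(drop, keep)

    # The overall average is itself a coefficient that may be kept or dropped.
    return min(M(0, n, 0.0), 1 + M(0, n, mean(0, n)))


def min_error_for_budget(data, budget, iterations=60):
    """Binary search on eps for the smallest error reachable within budget."""
    lo, hi = 0.0, float(max(abs(d) for d in data))  # hi: empty synopsis
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if min_coeffs_for_error(data, mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi


data = [34, 16, 2, 20, 20, 0, 36, 16]
print(min_coeffs_for_error(data, 18))
print(min_error_for_budget(data, 1))
```

The binary search relies on monotonicity: a larger error bound never requires more coefficients.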

Unrestricted Synopses [GH05,06]

• Forget about actual coefficient values

• Choose a best set of non-zero wavelet terms of any values

• In practice:

Examined values are multiples of resolution step δ

$$E[i,b,v] = \min\Big\{ \min_{0 \le b' \le b} \max\big\{E[i_L,b',v],\ E[i_R,b-b',v]\big\},\ \min_{z,\ 0 \le b' \le b-1} \max\big\{E[i_L,b',v+z],\ E[i_R,b-b'-1,v-z]\big\} \Big\}$$

Unrestricted Synopses [GH05,06]

• Approximation quality better than restricted

• Time asymptotically linear in n

But:

- Examined values bounded by M [GH05]

- Multiple Guesses of error result [GH06]

- Space-Bounded Problem:

Two-Dimensional Tabulation E(b,v) on each tree node

→ High Running Time and Space demands

Our Approach: Wavelet Synopses with Predefined Error Bounds

• Error-Bounded Problem

DP algorithm:

- Demarcates examined values using error bound ε

- Tabulates only S(v), one dimension per node

• Space-Bounded Problem

Enhanced Solution:

- Calculate upper bound for error, use it to bound values

Indirect Solution:

- Use binary search on Error-Bounded problem

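How does it work?

• One-dimensional tabulation on values only

$$S[i,v] = \min_{z} \big\{ S[i_L, v+z] + S[i_R, v-z] + [z \neq 0] \big\}$$

• Examined incoming values v bounded by error bound

$$|v_i - v| \le \varepsilon$$

• Examined assigned values z also bounded

$$|z_i - z| \le \varepsilon - |v_i - v|$$

(Here $v_i$ and $z_i$ are read as the incoming value and the coefficient of node i in the exact Haar decomposition.)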

• Strong version of problem: minimize error within space
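A sketch of the one-dimensional tabulation S[i, v] for the unrestricted error-bounded problem. The assumptions are mine and not confirmed by the slides: the name `min_terms_error_bounded` is hypothetical, v_i is taken to be the average of the data under node i and z_i the exact Haar coefficient, candidate values are integer multiples of the resolution step `delta`, the metric is maximum absolute error, and a small floating-point tolerance is used at the interval ends.

```python
import math
from functools import lru_cache


def min_terms_error_bounded(data, eps, delta=1.0):
    """Fewest non-zero terms (multiples of delta) with max abs error <= eps."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    tol = 1e-9

    def grid(lo, hi):
        # Integers k with lo <= k * delta <= hi (empty when lo > hi).
        return range(math.ceil((lo - tol) / delta),
                     math.floor((hi + tol) / delta) + 1)

    @lru_cache(maxsize=None)
    def S(lo, hi, k):
        v = k * delta  # incoming value at this node
        if hi - lo == 1:
            return 0 if abs(v - data[lo]) <= eps + tol else math.inf
        mid = (lo + hi) // 2
        mean_l = sum(data[lo:mid]) / (mid - lo)
        mean_r = sum(data[mid:hi]) / (hi - mid)
        v_i = (mean_l + mean_r) / 2   # exact incoming value
        z_i = (mean_l - mean_r) / 2   # exact coefficient
        slack = eps - abs(v - v_i)    # bound on |z_i - z|
        best = math.inf
        for m in grid(z_i - slack, z_i + slack):
            cost = (m != 0) + S(lo, mid, k + m) + S(mid, hi, k - m)
            best = min(best, cost)
        return best

    mean = sum(data) / n
    # The top term sets the incoming value of the root of the detail tree.
    return min(((k != 0) + S(0, n, k) for k in grid(mean - eps, mean + eps)),
               default=math.inf)


data = [34, 16, 2, 20, 20, 0, 36, 16]
for eps in (0, 18, 36):
    print(eps, min_terms_error_bounded(data, eps))
```

Each node keeps a table indexed by v alone, with no budget dimension. Running this inside a binary search over ε gives the indirect solution to the space-bounded problem.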

Complexity

• Error-Bounded Problem:

Time: [bound] or [bound]

Space: [bound] or [bound]

• Space-Bounded Problem:

Time: [bound] vs. [bound]

Space: [bound] vs. [bound]

[The bounds are expressions in n, B, E and M; the formulas did not survive extraction.]

Experiments: Error-Bounded Problem

[Figures: result plots on two slides, not preserved in the transcript]

Experiments: Space-Bounded Problem

[Figures: result plots on three slides, not preserved in the transcript]

Related Work

• M. Garofalakis and A. Kumar. Deterministic wavelet thresholding for maximum-error metrics. PODS 2004

• S. Guha. Space efficiency in synopsis construction algorithms. VLDB 2005

• S. Guha and B. Harb. Wavelet Synopses for Data Streams: Minimizing Non-Euclidean Error. KDD 2005

• S. Muthukrishnan. Subquadratic algorithms for workload-aware Haar wavelet synopses. FSTTCS 2005

• S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. SODA 2006

Thank you! Questions?

