Wavelet Synopses with Predefined Error Bounds: Windfalls of Duality
Panagiotis Karras
DB seminar, 23 March, 2006
Algorithms for Maximum-Error Wavelet Synopses
• Restricted, Space-Bounded (Direct): GK04,05; Guha05
• Restricted, Error-Bounded: Muthukrishnan05
• Unrestricted, Space-Bounded (Direct): GH05,06
• Indirect?
Compact Data Synopses useful in:
• Approximate Query Processing (exact answers not always required)
• Learning, Classification, Event Detection
• Data Mining, Selectivity Estimation
• Situations where massive data arrives in a stream
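Haar Wavelets
[Figure: Haar tree for the data 34 16 2 20 20 0 36 16. Pairwise averages: 25 11 10 26, then 18 18, then the overall average 18. Detail coefficients: 0 at the top, then 7 -8, then 9 -9 10 10.]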
• Wavelet transform: orthogonal transform for the hierarchical representation of functions and signals
• Haar wavelets: simplest wavelet system, easy to understand and implement
• Haar tree: structure for the visualization of decomposition and value reconstructions
• Synopsis: wavelet representation with few non-zero terms
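The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names `haar_decompose` and `haar_reconstruct` are hypothetical, the unnormalized averaging/differencing form is assumed, and the input length is assumed to be a power of two. The data vector is the one shown on the slide.

```python
def haar_decompose(data):
    """Return Haar coefficients as [overall average, top detail, next level, ...]."""
    averages = list(data)
    details = []  # kept with the coarsest level first
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level = [(a - b) / 2 for a, b in pairs]      # detail coefficients of this level
        averages = [(a + b) / 2 for a, b in pairs]   # averages passed to the next level
        details = level + details
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: left child gets +detail, right child gets -detail."""
    values = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        values = [x for v, c in zip(values, level) for x in (v + c, v - c)]
        pos += len(level)
    return values


data = [34, 16, 2, 20, 20, 0, 36, 16]
coeffs = haar_decompose(data)
print(coeffs)                    # [18.0, 0.0, 7.0, -8.0, 9.0, -9.0, 10.0, 10.0]
print(haar_reconstruct(coeffs))  # the original eight values, as floats
```

A synopsis keeps only a few of these coefficients and treats the rest as zero during reconstruction.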
Maximum-Error Metrics
• Error Metrics providing tight error guarantees for all reconstructed values:
– Maximum Absolute Error
– Maximum Relative Error with Sanity Bound (to avoid domination by small data values)
• Aim at minimization of these metrics
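Maximum relative error with sanity bound $s$: $\max_i \frac{|\hat{d}_i - d_i|}{\max\{|d_i|, s\}}$
Maximum absolute error: $\max_i |\hat{d}_i - d_i|$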
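As a small illustration of the two metrics, the sketch below evaluates them for an approximation of a data vector. The function names and the parameter name `sanity_bound` are hypothetical, and a strictly positive sanity bound is assumed so that the division is always defined.

```python
def max_abs_error(data, approx):
    """Largest absolute deviation between reconstructed and original values."""
    return max(abs(a - d) for d, a in zip(data, approx))


def max_rel_error(data, approx, sanity_bound):
    """Largest relative deviation; small |d| is replaced by the sanity bound."""
    return max(abs(a - d) / max(abs(d), sanity_bound) for d, a in zip(data, approx))


data = [34, 16, 2, 20, 20, 0, 36, 16]
approx = [18] * len(data)  # synopsis that keeps only the overall average
print(max_abs_error(data, approx))
print(max_rel_error(data, approx, sanity_bound=20))
```

Without the sanity bound, the data value 0 in this vector would make the relative error undefined, and the value 2 would dominate it.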
Restricted Synopses
• Compute Haar wavelet decomposition of D
• Preserve best coefficient subset that satisfies bound
• Space-Bounded Problem: [GK04,05,Guha05]
  Bound B on number of non-zero coefficients
• Error-Bounded Problem: [Muthukrishnan05]
  Bound ε on maximum error
  Faster Indirect solution to Space-Bounded Problem
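How does it work?
• Space-Bounded Problem, GK04,05: Global Tabulation
$$E(i, b, S) = \min\left\{ \min_{0 \le b' \le b} \max\{E(i_L, b', S),\ E(i_R, b - b', S)\},\ \min_{0 \le b' \le b - 1} \max\{E(i_L, b', S \cup \{c_i\}),\ E(i_R, b - b' - 1, S \cup \{c_i\})\} \right\}$$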
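The global-tabulation recurrence for the restricted space-bounded problem can be sketched as follows. This is a small illustrative version for maximum absolute error, not the GK04,05 implementation: instead of the ancestor subset S it passes down the signed error `err` that the dropped ancestors cause (which is what S determines), and the name `min_max_error_restricted` is hypothetical. The input length is assumed to be a power of two.

```python
import math
from functools import lru_cache


def min_max_error_restricted(data, budget):
    """Smallest max absolute error when at most `budget` Haar coefficients are kept."""
    n = len(data)

    def avg(lo, hi):
        return sum(data[lo:hi]) / (hi - lo)

    @lru_cache(maxsize=None)
    def E(lo, hi, b, err):
        # err = (estimate - true value) inherited from dropped ancestors
        if hi - lo == 1:
            return abs(err)
        mid = (lo + hi) // 2
        c = (avg(lo, mid) - avg(mid, hi)) / 2  # detail coefficient of this node
        best = math.inf
        for bl in range(b + 1):  # drop c: left leaves lose +c, right leaves lose -c
            best = min(best, max(E(lo, mid, bl, err - c), E(mid, hi, b - bl, err + c)))
        for bl in range(b):      # keep c: costs one unit of the budget
            best = min(best, max(E(lo, mid, bl, err), E(mid, hi, b - 1 - bl, err)))
        return best

    c0 = avg(0, n)                 # overall average coefficient
    best = E(0, n, budget, -c0)    # drop the average
    if budget >= 1:
        best = min(best, E(0, n, budget - 1, 0.0))  # keep the average
    return best


data = [34, 16, 2, 20, 20, 0, 36, 16]
print(min_max_error_restricted(data, 1))
```

The two inner loops mirror the two branches of the recurrence: splitting the whole budget between the children when the coefficient is dropped, and splitting the budget minus one when it is kept.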
Guha05: Local Tabulation
- Tabulate four one-dimensional arrays: $E(i_L, *, S \cup \{c_i\})$, $E(i_R, *, S \cup \{c_i\})$, $E(i_L, *, S)$, $E(i_R, *, S)$
- Extract $E(i, *, S)$ from these four, delete them
- At most $O(\log n)$ arrays concurrently stored
- Derive solution at the top, solve the problem again below
- $O(n^2)$ time, $O(n)$ space
[Figure: Haar tree with the root, a node i, and its children i_L (+) and i_R (-); S = subset of selected ancestors]
How does it work?
• Error-Bounded Problem, Muthukrishnan05
$$M(i, S) = \min\{ M(i_L, S) + M(i_R, S),\ M(i_L, S \cup \{c_i\}) + M(i_R, S \cup \{c_i\}) + 1 \}$$
[Figure: Haar tree with the root, a node i, and its children i_L (+) and i_R (-); S = subset of selected ancestors]
- At $\log\log n$ levels from bottom stop recursion, enter local search
- $O(n^2 / \log n)$ time, $O(n)$ space
• No need to tabulate
• The solution to this problem is more economical
• Dual Space-Bounded problem solved indirectly via binary search
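The error-bounded recurrence can be sketched in plain recursive form. This is an illustrative version for maximum absolute error only, without the local-search speed-up mentioned above, and it is not Muthukrishnan's implementation: the name `min_coeffs_restricted` is hypothetical, and the subset S is represented by the signed error `err` that the dropped ancestors cause. The input length is assumed to be a power of two.

```python
import math


def min_coeffs_restricted(data, eps):
    """Fewest retained Haar coefficients such that every value is within eps."""
    n = len(data)

    def avg(lo, hi):
        return sum(data[lo:hi]) / (hi - lo)

    def M(lo, hi, err):
        # err = (estimate - true value) inherited from dropped ancestors
        if hi - lo == 1:
            return 0 if abs(err) <= eps else math.inf
        mid = (lo + hi) // 2
        c = (avg(lo, mid) - avg(mid, hi)) / 2            # detail coefficient of this node
        keep = 1 + M(lo, mid, err) + M(mid, hi, err)      # retain c: one more term
        drop = M(lo, mid, err - c) + M(mid, hi, err + c)  # discard c: error shifts by -c / +c
        return min(keep, drop)

    c0 = avg(0, n)  # overall average coefficient
    return min(1 + M(0, n, 0.0), M(0, n, -c0))


data = [34, 16, 2, 20, 20, 0, 36, 16]
print(min_coeffs_restricted(data, 18))
```

Only one number per call is returned and nothing is tabulated over a space budget, which is the sense in which this problem is cheaper than its space-bounded dual.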
Unrestricted Synopses [GH05,06]
• Forget about actual coefficient values
• Choose a best set of non-zero wavelet terms of any values
• In practice: examined values are multiples of resolution step δ
$$E(i, b, v) = \min\left\{ \min_{0 \le b' \le b} \max\{E(i_L, b', v),\ E(i_R, b - b', v)\},\ \min_{z,\ 0 \le b' \le b - 1} \max\{E(i_L, b', v + z),\ E(i_R, b - b' - 1, v - z)\} \right\}$$
Unrestricted Synopses [GH05,06]
• Approximation quality better than restricted
• Time asymptotically linear in n
But:
- Examined values bounded by M [GH05]
- Multiple Guesses of error result [GH06]
- Space-Bounded Problem:
Two-Dimensional Tabulation E(b,v) on each tree node
→ High Running Time and Space demands
Our Approach: Wavelet Synopses with Predefined Error Bounds
• Error-Bounded Problem, DP algorithm:
  - Demarcates examined values using error bound ε
  - Tabulates only S(v), one dimension per node
• Space-Bounded Problem
  Enhanced Solution:
  - Calculate upper bound for error, use it to bound values
  Indirect Solution:
  - Use binary search on Error-Bounded problem
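The indirect solution can be sketched as a binary search wrapped around any error-bounded solver. This is a minimal illustration under stated assumptions: `count_terms` is a hypothetical callable that returns the minimum number of non-zero terms for a given error bound, it is assumed to be non-increasing in the bound, and `candidate_errors` is an assumed finite set of error values to search over.

```python
def min_error_within_budget(count_terms, budget, candidate_errors):
    """Smallest candidate error whose synopsis fits in `budget` terms, or None."""
    errors = sorted(candidate_errors)
    lo, hi = 0, len(errors) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if count_terms(errors[mid]) <= budget:
            best = errors[mid]  # feasible: try a tighter error bound
            hi = mid - 1
        else:
            lo = mid + 1        # too many terms needed: relax the error bound
    return best
```

Each probe runs the error-bounded algorithm once, so the space-bounded problem costs a logarithmic number of error-bounded runs in the size of the candidate set.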
How does it work?
• One-dimensional tabulation on values only
$$S(i, v) = \min_z \{ S(i_L, v + z) + S(i_R, v - z) + [z \neq 0] \}$$
• Examined incoming values v bounded by error bound
$$|v_i - v| \le \varepsilon$$
• Examined assigned values z also bounded
$$|z_i - z| \le \varepsilon - |v_i - v|$$
• Strong version of problem: minimize error within space
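The one-dimensional tabulation on values can be sketched for maximum absolute error as below. This is an illustrative version under several assumptions, not the presenter's algorithm as implemented: the resolution step δ is taken to be 1 (so incoming values v and assigned values z are integers), the name `min_terms_unrestricted` is hypothetical, and the incoming value at each node is confined to within ε of the average of the data under that node. That window is safe because the reconstructed values under a node always average to the node's incoming value.

```python
import math
from functools import lru_cache


def min_terms_unrestricted(data, eps):
    """Fewest non-zero integer-valued Haar terms with max absolute error <= eps."""
    n = len(data)  # assumed to be a power of two

    def window(lo, hi):
        # Integer incoming values allowed for the subtree over data[lo:hi].
        a = sum(data[lo:hi]) / (hi - lo)
        return math.ceil(a - eps), math.floor(a + eps)

    @lru_cache(maxsize=None)
    def S(lo, hi, v):
        if hi - lo == 1:
            return 0 if abs(v - data[lo]) <= eps else math.inf
        mid = (lo + hi) // 2
        l_lo, l_hi = window(lo, mid)   # allowed values of v + z
        r_lo, r_hi = window(mid, hi)   # allowed values of v - z
        z_min = max(l_lo - v, v - r_hi)
        z_max = min(l_hi - v, v - r_lo)
        best = math.inf
        for z in range(z_min, z_max + 1):
            best = min(best, S(lo, mid, v + z) + S(mid, hi, v - z) + (z != 0))
        return best

    v_lo, v_hi = window(0, n)
    # The value entering the root is the stored average term (0 if none is stored).
    return min((S(0, n, v) + (v != 0) for v in range(v_lo, v_hi + 1)), default=math.inf)


data = [34, 16, 2, 20, 20, 0, 36, 16]
print(min_terms_unrestricted(data, 18))
```

Each node is tabulated over v alone, and the error bound is what keeps the ranges of v and z small.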
Complexity
• Error-Bounded Problem: time and space bounds, each given in two alternative forms
• Space-Bounded Problem: time and space bounds, each compared ("vs.") against another bound
[Complexity expressions not recoverable from the extracted slide]
Related Work
• M. Garofalakis and A. Kumar. Deterministic wavelet thresholding for maximum-error metrics. PODS 2004
• S. Guha. Space efficiency in synopsis construction algorithms. VLDB 2005
• S. Guha and B. Harb. Wavelet synopses for data streams: minimizing non-Euclidean error. KDD 2005
• S. Muthukrishnan. Subquadratic algorithms for workload-aware Haar wavelet synopses. FSTTCS 2005
• S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. SODA 2006