1
Wavelet Synopses with Error Guarantees
Minos Garofalakis, Phillip B. Gibbons - Information Sciences Research Center, Bell Labs, Lucent Technologies, Murray Hill, NJ 07974
ACM SIGMOD 2002
2
Outline
Introduction
Wavelet basics
Probabilistic wavelet synopses
Experimental study
Conclusions
3
Introduction
The wavelet decomposition has proven effective at reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries.
Due to the exploratory nature of many Decision Support Systems applications, there are a number of scenarios in which the user may prefer a fast approximate answer.
4
Introduction
A major criticism of wavelet-based techniques is that conventional wavelet synopses cannot provide guarantees on the error of individual approximate query answers.
5
Introduction
The problem with approximate query processing over wavelet synopses stems from their deterministic approach to selecting coefficients and their lack of error guarantees.
We propose an approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers.
6
Introduction
The technique is based on a probabilistic thresholding scheme that assigns each coefficient a probability of being retained, based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis.
7
Wavelet basics
Given the data vector A, the wavelet transform of A can be computed as follows.
To equalize the importance of all wavelet coefficients, we normalize each coefficient according to its resolution level; the sketch below illustrates both steps.
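As a concrete illustration (not taken from the slides), here is a minimal sketch of the unnormalized Haar decomposition and the usual normalization by resolution level. It assumes the standard error-tree layout (c[0] is the overall average, c[1] the coarsest detail, and the children of c[j] for j >= 1 are c[2j] and c[2j+1]) and the standard Haar weights sqrt(N / 2^level).

```python
import numpy as np

def haar_transform(data):
    """Unnormalized Haar decomposition: pairwise averages and semi-differences.

    Returns coefficients in error-tree order: c[0] is the overall average,
    c[1] the coarsest detail, and the children of c[j] (j >= 1) are c[2j], c[2j+1].
    """
    a = np.asarray(data, dtype=float)
    levels = []
    while len(a) > 1:
        levels.append((a[0::2] - a[1::2]) / 2.0)   # detail coefficients at this level
        a = (a[0::2] + a[1::2]) / 2.0              # averages carried to the next level
    return np.concatenate([a] + levels[::-1])      # coarsest details first

def normalize(coeffs):
    """Scale each coefficient by sqrt(N / 2^level), so that retaining the largest
    normalized coefficients minimizes the overall L2 reconstruction error."""
    c = np.asarray(coeffs, dtype=float)
    n = len(c)
    weights = np.empty(n)
    weights[0] = np.sqrt(n)                        # overall average
    for j in range(1, n):
        level = j.bit_length() - 1                 # resolution level of detail c[j]
        weights[j] = np.sqrt(n / 2 ** level)
    return c * weights

coeffs = haar_transform([2, 2, 0, 2, 3, 5, 4, 4])  # a small illustrative array
print(coeffs)       # [ 2.75 -1.25  0.5   0.    0.   -1.   -1.    0.  ]
print(normalize(coeffs))
```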
8
Wavelet basics
A helpful tool for exploring and understanding the key properties of the wavelet decomposition is the error tree structure.
9
Wavelet basics
The important reconstruction properties:
(P1) The reconstruction of any data value di depends only on the values of the nodes in path(di).
(P2) A range sum d(l:h) depends only on the coefficients on path(dl) and path(dh): the root contributes (h - l + 1)·c0, and every other coefficient cj contributes (xL - xR)·cj, where xL and xR are the numbers of leaves in [l, h] that lie under cj's left and right child, respectively.
10
Wavelet basics
d5 = c0 - c2 + c5 - c10 = 65 - 14 + (-20) - 28 = 3
d(3:5) = 3c0 + (1 - 2)c2 - c4 + 2c5 - c9 + (1 - 1)c10 = 93
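Both properties can be checked mechanically. The sketch below is an illustration using a small example array (not the one behind the slide's numbers): it reconstructs one value from the O(log N) coefficients on its path, and evaluates a range sum using only the coefficients on the two boundary paths.

```python
# Haar coefficients of the example array [2, 2, 0, 2, 3, 5, 4, 4], in error-tree order:
# c[0] is the overall average, c[1] the coarsest detail, children of c[j] are c[2j], c[2j+1].
c = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
n = len(c)

def reconstruct(i):
    """P1: d_i is rebuilt from the coefficients on path(d_i) only."""
    val, j, lo, hi = c[0], 1, 0, n - 1
    while j < n:
        mid = (lo + hi) // 2
        if i <= mid:
            val += c[j]; j, hi = 2 * j, mid          # leaf under left child: +c_j
        else:
            val -= c[j]; j, lo = 2 * j + 1, mid + 1  # leaf under right child: -c_j
    return val

def range_sum(lo_q, hi_q):
    """P2: d(lo_q:hi_q) uses only coefficients on path(d_lo_q) and path(d_hi_q)."""
    total = (hi_q - lo_q + 1) * c[0]                 # the root average contributes once per value
    stack = [(1, 0, n - 1)]
    while stack:
        j, lo, hi = stack.pop()
        if j >= n or hi < lo_q or lo > hi_q:         # no overlap with the query range
            continue
        if lo_q <= lo and hi <= hi_q:                # fully covered: contributions cancel out
            continue
        mid = (lo + hi) // 2
        left = max(0, min(mid, hi_q) - max(lo, lo_q) + 1)      # query leaves under left child
        right = max(0, min(hi, hi_q) - max(mid + 1, lo_q) + 1)  # query leaves under right child
        total += (left - right) * c[j]
        stack.append((2 * j, lo, mid))
        stack.append((2 * j + 1, mid + 1, hi))
    return total

data = [2, 2, 0, 2, 3, 5, 4, 4]
assert all(abs(reconstruct(i) - data[i]) < 1e-9 for i in range(n))
assert abs(range_sum(3, 5) - sum(data[3:6])) < 1e-9   # d(3:5) = 2 + 3 + 5 = 10
```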
11
Probabilistic wavelet synopses: A. The problem with conventional wavelets
Conventional coefficient thresholding is a completely deterministic process that typically retains the B wavelet coefficients with the largest absolute value after normalization; this deterministic process minimizes the overall L2 error. A minimal sketch of this baseline follows.
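For reference, a minimal sketch of the deterministic baseline (the normalization weights are assumed to be the sqrt(N / 2^level) factors from the Haar sketch above):

```python
import numpy as np

def deterministic_synopsis(coeffs, weights, B):
    """Keep the B coefficients with largest normalized magnitude; zero out the rest."""
    coeffs = np.asarray(coeffs, dtype=float)
    keep = np.argsort(-np.abs(coeffs * weights))[:B]   # indices of the B largest |c_i*|
    synopsis = np.zeros_like(coeffs)
    synopsis[keep] = coeffs[keep]                      # retained coefficients keep their exact values
    return synopsis
```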
12
Probabilistic wavelet synopses: A. The problem with conventional wavelets
After conventional thresholding drops the smaller coefficients: d5 = 65 - 0 + 0 - 0 = 65 and d(3:5) = 3·65 - 0 - 0 + 0 - 0 = 195 (instead of the true values 3 and 93).
13
Probabilistic wavelet synopses: A. The problem with conventional wavelets
Root causes:
(1) strict deterministic thresholding
(2) independent thresholding
(3) the bias resulting from dropping coefficients without compensating for their loss
14
Probabilistic wavelet synopses: B. General Approach
Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value (the rounding value) or down to zero.
By carefully selecting the rounding values we ensure that:
(1) we expect a total of B coefficients to be retained, and
(2) we minimize a desired error metric in the reconstruction of the data.
15
Probabilistic wavelet synopses: B. General Approach
The key idea of the thresholding scheme is to associate with each non-zero coefficient ci a random variable Ci such that (1) Ci = 0 with some probability, and (2) E[Ci] = ci.
We select a rounding value λi for each non-zero ci such that Ci takes the value λi with probability yi = ci/λi and 0 otherwise, which makes E[Ci] = ci.
16
Probabilistic wavelet synopses: B. General Approach
Our thresholding scheme essentially "rounds" each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability yi = ci/λi.
Its variance is simply Var(Ci) = ci(λi - ci). A quick Monte Carlo check of both facts appears below.
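A minimal sketch of the coin-flipping step (illustrative values; λ = 3c/2 mirrors the λi = 3ci/2 choice used in the slide's example, and the mean and variance follow from the definitions above):

```python
import numpy as np

rng = np.random.default_rng(0)

def round_coefficient(c_i, lam_i, rng):
    """Return lam_i with probability y_i = c_i / lam_i, else 0.

    Requires lam_i to have the same sign as c_i and |lam_i| >= |c_i|, so 0 < y_i <= 1.
    The result is an unbiased estimate of c_i.
    """
    y_i = c_i / lam_i
    return lam_i if rng.random() < y_i else 0.0

c_i, lam_i = -20.0, -30.0          # example coefficient and rounding value (lam = 3c/2)
samples = np.array([round_coefficient(c_i, lam_i, rng) for _ in range(200_000)])
print(samples.mean())              # close to c_i = -20        (E[C_i] = c_i)
print(samples.var())               # close to c_i*(lam_i - c_i) = (-20)*(-10) = 200
```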
17
Probabilistic wavelet synopses: B. General Approach
For example: λ0 = c0, λ10 = 2c10, λi = 3ci/2.
18
Probabilistic wavelet synopses: B. General Approach. The impact of the λi's:
Choosing λi closer to ci reduces the variance of Ci (but raises the retention probability yi = ci/λi, and hence the expected space).
Choosing λi further from ci reduces the expected number of retained coefficients (but increases the variance).
19
Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimizes some overall error metric (e.g., the expected L2 error).
20
Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error
Letting yi = ci/λi denote the retention probability of ci, the expected L2 error minimization problem is equivalent to minimizing the total rounding variance subject to the expected-space constraint Σ yi ≤ B.
Based on the Cauchy-Schwarz inequality, the minimum value of the objective is reached when the yi are proportional to the coefficient magnitudes, as sketched below.
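A sketch of the optimization, written out under the assumptions above (yi = ci/λi is the retention probability, B the expected space budget; in practice the ci would be the normalized coefficients, so that the sum of variances equals the expected L2 reconstruction error; the slide's own equations were not preserved in this transcript):

```latex
% Expected-L2 objective and space constraint (y_i = c_i / \lambda_i):
\min_{y}\ \sum_i c_i^2\left(\tfrac{1}{y_i} - 1\right)
\qquad \text{s.t.} \qquad \sum_i y_i \le B, \quad 0 < y_i \le 1.
% Since \sum_i c_i^2 is a constant, this is equivalent to minimizing \sum_i c_i^2 / y_i.
% The Cauchy-Schwarz inequality gives
\left(\sum_i \frac{c_i^2}{y_i}\right)\left(\sum_i y_i\right) \ \ge\ \left(\sum_i |c_i|\right)^2,
% with equality exactly when y_i is proportional to |c_i|, i.e.
y_i \;=\; \min\!\left\{1,\ \frac{B\,|c_i|}{\sum_j |c_j|}\right\}.
```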
21
Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error
Let yi denote the resulting retention probabilities; the closed form follows from the Cauchy-Schwarz equality condition above, with each yi capped at 1.
22
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual data values (relative error).
The goal is to produce an estimate d̂i for each value di such that the relative error |d̂i - di| / max{|di|, S} is small, where S is a sanity bound.
23
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
The expected value of d̂i is di (the estimate is unbiased), so we would like to minimize its variance.
More precisely, we seek to minimize the normalized standard error (NSE) of a reconstructed data value, spelled out below.
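Spelled out (this is our reading of the quantities named on the slide, with S the sanity bound): since the coin flips are independent, the variance of a reconstructed value is the sum of the variances of the coefficients on its path, and the NSE normalizes its standard deviation by max{|di|, S}:

```latex
\operatorname{Var}(\hat d_i) \;=\; \sum_{c_j \in \mathrm{path}(d_i)} \operatorname{Var}(C_j),
\qquad
\mathrm{NSE}(\hat d_i) \;=\; \frac{\sqrt{\operatorname{Var}(\hat d_i)}}{\max\{|d_i|,\, S\}}.
```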
24
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
Note that by applying Chebyshev's inequality we obtain, for all α > 1, Pr( |d̂i - di| / max{|di|, S} ≥ α · NSE(d̂i) ) ≤ 1/α².
So minimizing the NSE will indeed minimize the probabilistic bounds on the relative error metric.
25
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
26
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem.
Let PATHSj denote the set of all root-to-leaf paths in Tj, and let M[j, B] denote the optimal (minimum achievable) value of the maximum NSE over all data values dk in Tj, assuming a space budget of B.
27
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
The recurrence for M[j, B] is depicted in equation (11); a simplified sketch of its structure follows.
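Since equation (11) itself is not reproduced in this transcript, the sketch below only illustrates the shape of such a dynamic program: allocate a budget (in units of 1/q) to the coefficient at node j and split the remainder between its two subtrees, minimizing the worst root-to-leaf sum of rounding variances. The per-leaf normalization by max{|dk|, S} that the paper's NSE objective requires is deliberately omitted, so this is an illustration of the recursion structure, not the paper's exact recurrence.

```python
from functools import lru_cache

q = 10                                                # y_i quantized to multiples of 1/q
c = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]     # example error-tree coefficients
n = len(c)

def var(j, units):
    """Rounding variance c_j^2 * (1/y_j - 1) when y_j = units / q."""
    if c[j] == 0.0 or units == q:
        return 0.0
    return c[j] ** 2 * (q / units - 1.0)

@lru_cache(maxsize=None)
def M(j, budget_units):
    """Minimum achievable worst root-to-leaf variance sum in the subtree rooted at c_j,
    given budget_units/q expected retained coefficients for that subtree."""
    if j >= n:                                        # below the leaves: nothing to allocate
        return 0.0
    best = float("inf")
    min_units = 0 if c[j] == 0.0 else 1               # a non-zero coefficient needs y_j > 0
    for units in range(min_units, min(q, budget_units) + 1):
        rest = budget_units - units
        for left in range(rest + 1):                  # split the remaining budget between children
            cost = var(j, units) + max(M(2 * j, left), M(2 * j + 1, rest - left))
            best = min(best, cost)
    return best

# c[0] (the overall average) is assumed to be retained exactly; optimize the detail coefficients:
print(M(1, 3 * q))   # best worst-path variance with ~3 expected detail coefficients retained
```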
28
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
29
Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
The problem with (11) is that the yi and bL each range over a continuous interval, which makes it infeasible to use directly.
The key technical idea is to quantize the solution space.
We modify the constraint so that each yi is restricted to multiples of 1/q, i.e., yi ∈ {1/q, 2/q, ..., 1}, where q is an input integer.
30
Probabilistic wavelet synopses: E. Low-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probability yi, where, as before, the yi's are selected to minimize a desired error metric. A sketch of this variant follows.
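A minimal sketch of this variant as we read it (the retained value is the exact coefficient ci rather than the inflated λi, so the estimator has a small bias yi·ci - ci but a smaller variance; this reading is an assumption, not a verbatim transcription of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def low_bias_round(c_i, y_i, rng):
    """Retain the exact value c_i with probability y_i, otherwise drop it."""
    return c_i if rng.random() < y_i else 0.0

c_i, y_i = -20.0, 2.0 / 3.0
samples = np.array([low_bias_round(c_i, y_i, rng) for _ in range(200_000)])
print(samples.mean())   # close to y_i * c_i = -13.33..., i.e. slightly biased
print(samples.var())    # close to c_i^2 * y_i * (1 - y_i) ~ 88.9, below the unbiased 200
```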
31
Probabilistic wavelet synopses: F. Summary of the approach
32
Experimental study
A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0).
We also use a real-world data set downloaded from the National Forest Service.
We set q = 10, the sanity bound S to the 10th percentile of the data, and the perturbation Δ = min{0.01, S/100}.
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions
We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers.
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
11
Probabilistic wavelet synopsesAThe problem with conventional wavelets
Conventional coefficient thresholding is a completely deterministic process that typically retain the B wavelet coefficients with the largest absolute value after normalization this deterministic process minimizes the overall L2 error
12
Probabilistic wavelet synopsesAThe problem with conventional wavelets
d5=65-0+0-0=65 d(35)=365-0-0+0-0=195
13
Probabilistic wavelet synopsesAThe problem with conventional wavelets
Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi
cients without compensating for their loss
14
Probabilistic wavelet synopses BGeneral Approach
Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero
By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be
retained (2)We minimize a desired error metric in the
reconstruction of the data
15
Probabilistic wavelet synopses BGeneral Approach
The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci
where we select a rounding value λi for each non-zero ci such that
16
Probabilistic wavelet synopses BGeneral Approach
Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability
It variance is simply
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
12
Probabilistic wavelet synopsesAThe problem with conventional wavelets
d5=65-0+0-0=65 d(35)=365-0-0+0-0=195
13
Probabilistic wavelet synopsesAThe problem with conventional wavelets
Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi
cients without compensating for their loss
14
Probabilistic wavelet synopses BGeneral Approach
Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero
By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be
retained (2)We minimize a desired error metric in the
reconstruction of the data
15
Probabilistic wavelet synopses BGeneral Approach
The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci
where we select a rounding value λi for each non-zero ci such that
16
Probabilistic wavelet synopses BGeneral Approach
Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability
It variance is simply
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
13
Probabilistic wavelet synopsesAThe problem with conventional wavelets
Root causes (1)strict deterministic thresholding (2)independent thresholding (3)the bias resulting from dropping coeffi
cients without compensating for their loss
14
Probabilistic wavelet synopses BGeneral Approach
Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero
By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be
retained (2)We minimize a desired error metric in the
reconstruction of the data
15
Probabilistic wavelet synopses BGeneral Approach
The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci
where we select a rounding value λi for each non-zero ci such that
16
Probabilistic wavelet synopses BGeneral Approach
Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability
It variance is simply
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
14
Probabilistic wavelet synopses BGeneral Approach
Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value( rounding value) or down to zero
By carefully selecting the rounding values we ensure that (1)We expect a total of B coefficients to be
retained (2)We minimize a desired error metric in the
reconstruction of the data
15
Probabilistic wavelet synopses BGeneral Approach
The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci
where we select a rounding value λi for each non-zero ci such that
16
Probabilistic wavelet synopses BGeneral Approach
Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability
It variance is simply
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
15
Probabilistic wavelet synopses BGeneral Approach
The key idea in thresholding scheme is to associate a random variable Ci such that (1)Ci=0 with some probability (2)E[Ci] = ci
where we select a rounding value λi for each non-zero ci such that
16
Probabilistic wavelet synopses BGeneral Approach
Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability
It variance is simply
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
16
Probabilistic wavelet synopses BGeneral Approach
Our thresholding scheme essentially ldquoroundsrdquo each non-zero wavelet coefficient ci independently to either λi or zero by flipping a biased coin with success probability
It variance is simply
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
17
Probabilistic wavelet synopses BGeneral Approach 1
For example λ0=c0 λ10= 2c10 λi=3ci2
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
18
Probabilistic wavelet synopses BGeneral Approach The impact of the λirsquo s
λi closer ci reduce the variance
λi further from ci reduces the expected number of retained coefficients
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
19
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
A reasonable approach is to select the λi values in a way that minimize the some overall error metric (egL2)
1
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
20
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Letting and The expected L2 error minimization problem is
equivalent to
Based on the Cauchy-Schwarz inequality the minimum value of the objective is reached when
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
21
Probabilistic wavelet synopses CRounding to minimize the expected mean-square error
Let
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
22
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual (related error)
The goal is to produce estimate for each value di such that
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
23
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The expected value of we would like to minimize the variance
More precisely we seek to minimize the normalized standard error for a reconstructed data value
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
24
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
Note that by applying Chebyshevrsquos Inequality we obtain( for all αgt1)
So that minimizing NSE will indeed minimize the probabilistic bounds on relative error metric
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
25
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
26
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
We would like to formulate a dynamic programming recurrence for this problem
Let PATHSj denote the set of all root-to-leaf pahts in Tj M[jB] denote the optimal value of the maximum among all data dk in Tj assuming a space budget of B
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
27
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
M[jB] depicted in (11)
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
28
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
29
Probabilistic wavelet synopses DRounding to minimize the maximum relative error
The problem in (11) is that the yi and bL each range over a continuous interval making it infeasible to use
The key technical idea is to quantize the solution space
We modify the constraint
where q is a input integer
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
30
Probabilistic wavelet synopses ELow-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities yi where as before the yirsquos are selected to minimize a desired error metric
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
31
Probabilistic wavelet synopsesF Summary of the approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
32
Experimental study A Zipfian data generator was used to produ
ce Zipfian frequencies for various levels of skew (z parameter between 03 to 20)
We use real world data set download from the National Forest Service
Let q=10 sanity bound S as the 10-percentile in the da
ta perturbation Δ= min001 S100
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
33
Experimental study
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
34
Experimental study
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
35
Experimental study
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach
36
Conclusions We has introduced probabilistic wavelet synopses
the first wavelet-based data reduction technique that provably enables unbiased data reconstruction with error guarantees on individual approximate answers
We have described a number of novel techniques for tuning our scheme to minimize desired error metrics
Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach