Date post: | 31-Dec-2015 |
Upload: | hester-newman |
APPROXIMATE QUERY PROCESSING USING WAVELETS
Kaushik Chakrabarti (Univ. of Illinois)
Minos Garofalakis (Bell Labs)
Rajeev Rastogi (Bell Labs)
Kyuseok Shim (KAIST and AITrc)
Presented By:
Charanmai Koorapati Ramesh
Harika Guniganti
AGENDA
• Introduction
• Motivation
• Prior Work
• Wavelet Decomposition
• Building Wavelet Synopses
• Processing Relational Queries
• Experimental Study
• Quality Metrics
• Query Execution Times
• Conclusion
DECISION SUPPORT SYSTEMS
Comparative sales figures between one week and the next
Projected revenue figures based on new product sales assumptions
The consequences of different decision alternatives, given past experience in a context that is described
MOTIVATION
DSS users pose very complex queries to the underlying DBMS that require complex operations over Gigabytes or Terabytes of disk-resident data.
[Figure: SQL Query -> Decision Support Systems -> Exact Answer. Long Response Times!]
Exact answers NOT always required. User may prefer a fast, approximate answer.
[Figure: the SQL Query run directly against the Decision Support Systems data (GB/TB) gives an Exact Answer with Long Response Times! A "Transformed" Query run against Compact Data Synopses (KB/MB) gives an Approximate Answer, FAST!!]
APPROXIMATE QUERY PROCESSING
Viable solution for dealing with
• Huge amounts of data
• High query complexities
• Increasingly stringent response-time requirements
PRIOR WORK
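Sampling Based Techniques
Limitations:
• Join operator on two uniform samples yields a non-uniform sample of the join result
• Non-aggregate queries return only a small subset of the exact answer
Histogram Based Techniques
Limitations:
• Storage overhead
• Construction cost
Both become very high when histograms must achieve reasonable error rates for high-dimensional data sets.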
WAVELET BASED TECHNIQUES
Wavelet: a mathematical function used to divide a given function or continuous-time signal into different frequency components and study each component with a resolution that matches its scale.
This paper extends the scope of earlier work, establishing the viability and effectiveness of wavelets as a generic approximate query processing tool for modern high-dimensional DSS applications.
APPROXIMATE QUERY PROCESSING USING WAVELETS
Novel approach consisting of two steps:
• Multi-dimensional Haar wavelets: effective, compact synopses
• Novel query processing algorithms: fast and accurate approximate query answers
WAVELET DECOMPOSITION/TRANSFORM
One-dimensional Haar Wavelets
Data vector A = [2,2,5,7]
Wavelet transform, WA = [4,-2,0,-1] (each entry is a wavelet coefficient)

Resolution   Averages     Detail Coefficients
2            [2,2,5,7]    -
1            [2,6]        [0,-1]
0            [4]          [-2]
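The pairwise averaging and differencing behind this example can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation; the function name `haar_1d` is an assumption, and the input length is assumed to be a power of two.

```python
def haar_1d(data):
    """Unnormalized 1-D Haar transform by repeated pairwise averaging/differencing.

    Returns [overall average, coarsest detail, ..., finest details].
    `haar_1d` is a hypothetical helper name, not taken from the paper.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(data)
    details_by_level = []          # coarsest level ends up first
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / 2 for a, b in pairs]    # half the pairwise difference
        averages = [(a + b) / 2 for a, b in pairs]   # pairwise average
        details_by_level.insert(0, details)
    result = averages
    for level_details in details_by_level:
        result = result + level_details
    return result


print(haar_1d([2, 2, 5, 7]))
```

For the data vector A = [2,2,5,7] this reproduces the coefficients 4, -2, 0, -1 from the table.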
NORMALIZED WAVELET TRANSFORM
To equalize the importance of all the wavelet coefficients, we normalize the final entries of WA by dividing each wavelet coefficient by √(2^l), where l is the level of resolution at which the coefficient appears (l = 0 is the coarsest level).
Thus WA = [4, -2, 0, -1/√2]
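As a rough sketch of the normalization step, the snippet below divides each coefficient by √(2^l). It assumes the coefficients are stored as [overall average, coarsest detail, ..., finest details], so that the level of the entry at index i can be read off the index; the function name `normalize` and this storage layout are assumptions made for illustration.

```python
import math


def normalize(wa):
    """Divide each Haar coefficient by sqrt(2**l), l = resolution level.

    Assumed layout: index 0 is the overall average (level 0), index 1 the
    coarsest detail (level 0), indices 2-3 are level 1, indices 4-7 level 2, ...
    """
    normalized = []
    for i, coefficient in enumerate(wa):
        level = 0 if i == 0 else i.bit_length() - 1   # floor(log2(i)) for i >= 1
        normalized.append(coefficient / math.sqrt(2 ** level))
    return normalized


print(normalize([4, -2, 0, -1]))
```

Only the last entry of [4, -2, 0, -1] changes magnitude here, because the first two coefficients sit at level 0 and the third is zero.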
MULTIDIMENSIONAL HAAR WAVELETS
Standard Decomposition
First, fix an ordering for the data dimensions (say 1,2,…,d) and then proceed to apply the complete one-dimensional wavelet transform for each one-dimensional "row" of array cells along dimension k, for all k=1,2,…,d.
Non-standard Decomposition
Given an ordering for the data dimensions (1,2,…,d), we perform one step of pairwise averaging and differencing for each one-dimensional row of array cells along dimension k, for each k=1,…,d. This process is repeated recursively only on the quadrant containing averages across all dimensions.
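The difference between the two decompositions may be easier to see in code. The sketch below handles only the two-dimensional case on a square array whose side is a power of two; the function names and the nested-list layout are assumptions for illustration, and the coefficients are left unnormalized.

```python
def haar_step(v):
    """One step of pairwise averaging and differencing: averages first, then details."""
    pairs = list(zip(v[0::2], v[1::2]))
    return [(a + b) / 2 for a, b in pairs] + [(a - b) / 2 for a, b in pairs]


def haar_full(v):
    """Complete 1-D transform: keep applying haar_step to the averages."""
    v = list(v)
    size = len(v)
    while size > 1:
        v[:size] = haar_step(v[:size])
        size //= 2
    return v


def transpose(m):
    return [list(col) for col in zip(*m)]


def standard_2d(a):
    """Standard: complete 1-D transform along dimension 1, then along dimension 2."""
    rows_done = [haar_full(row) for row in a]
    return transpose([haar_full(col) for col in transpose(rows_done)])


def nonstandard_2d(a):
    """Non-standard: one step per dimension, then recurse on the averages quadrant."""
    w = [list(row) for row in a]
    size = len(w)
    while size > 1:
        for r in range(size):                       # one step along each row
            w[r][:size] = haar_step(w[r][:size])
        for c in range(size):                       # one step along each column
            col = haar_step([w[r][c] for r in range(size)])
            for r in range(size):
                w[r][c] = col[r]
        size //= 2                                  # averages sit in the top-left quadrant
    return w


print(nonstandard_2d([[1, 3], [5, 7]]))
```

In both variants the top-left entry ends up holding the overall average of the array; the two differ in how the remaining detail coefficients are organized.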
NON-STANDARD DECOMPOSITION
EXAMPLE DECOMPOSITION OF A 4×4 ARRAY
MULTIDIMENSIONAL HAAR COEFFICIENTS- SEMANTICS AND REPRESENTATION
SUPPORT REGIONS AND SIGNS FOR 16 NONSTANDARD 2-DIMENSIONAL HAAR BASIS FUNCTIONS
A Haar wavelet coefficient can be represented with the triple W = <R,S,v>, where

1) W.R is the d-dimensional support hyper-rectangle of W.
Along each dimension j, 1 <= j <= d:
Low boundary value - W.R.bound[j].lo
High boundary value - W.R.bound[j].hi
Coefficient W contributes to each data cell A[i1,…,id] satisfying the condition W.R.bound[j].lo <= ij <= W.R.bound[j].hi for all dimensions j, 1 <= j <= d.

2) W.S stores sign information for all d-dimensional quadrants of W.R.
The two elements of the sign vector of coefficient W along dimension j are denoted by W.S.sign[j].lo and W.S.sign[j].hi, corresponding to the lower and upper half of W.R's extent along dimension j.
The sign for a quadrant is computed as the product of the d sign-vector entries that map to that quadrant.

3) W.v is the (scalar) magnitude of coefficient W. This is exactly the quantity that W contributes (positively or negatively, depending on W.S) to all data array cells enclosed in W.R.
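One possible way to hold the triple <R,S,v> in code is sketched below. The class and method names are assumptions, not the paper's data structure; the lower half of the support along each dimension is taken to end at the midpoint of its bounds, and the example coefficients are the unnormalized ones of the one-dimensional vector [2,2,5,7].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Coefficient:
    bound: List[Tuple[int, int]]   # W.R.bound[j] as (lo, hi), inclusive
    sign: List[Tuple[int, int]]    # W.S.sign[j] as (lo sign, hi sign), each +1 or -1
    v: float                       # W.v, the magnitude

    def contribution(self, cell):
        """Signed amount this coefficient adds to the given data cell."""
        total_sign = 1
        for j, index in enumerate(cell):
            lo, hi = self.bound[j]
            if not lo <= index <= hi:
                return 0               # cell lies outside the support rectangle
            mid = (lo + hi) // 2       # last index of the lower half (assumption)
            lo_sign, hi_sign = self.sign[j]
            total_sign *= lo_sign if index <= mid else hi_sign
        return total_sign * self.v


def reconstruct_cell(coeffs, cell):
    """A data cell is the sum of the contributions of all coefficients."""
    return sum(w.contribution(cell) for w in coeffs)


# Coefficients of the 1-D example A = [2,2,5,7]: overall average and three details.
example = [
    Coefficient([(0, 3)], [(+1, +1)], 4),
    Coefficient([(0, 3)], [(+1, -1)], -2),
    Coefficient([(0, 1)], [(+1, -1)], 0),
    Coefficient([(2, 3)], [(+1, -1)], -1),
]
print([reconstruct_cell(example, (i,)) for i in range(4)])
```

Summing the signed contributions cell by cell gives back the original data vector, which is a quick check that the bounds and signs are consistent.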
BUILDING WAVELET-COEFFICIENT SYNOPSES
Relation (ROLAP) Representation

Attr1   Attr2   Count
2       0       4
1       1       6
3       1       3

Joint Data Distribution Array
[Figure: array over Attr1 (0..3) and Attr2 (0..3) holding the counts 4, 6 and 3 at the cells given by the tuples above.]
Capturing d-dimensional array AR (joint frequency distribution) from relational table R (“set of tuples” ROLAP)
What is the size of the wavelet-coefficient synopsis?
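Capturing the joint frequency distribution array from the relational representation can be sketched as follows. The function name `joint_distribution` is an assumption, and the sketch covers only the two-attribute case, with each row given as (Attr1, Attr2, Count).

```python
def joint_distribution(rows, domain_sizes):
    """Build a 2-D joint frequency array from (attr1, attr2, count) rows.

    `domain_sizes` is (size of Attr1 domain, size of Attr2 domain).
    Cells with no matching tuple stay at 0.
    """
    size1, size2 = domain_sizes
    array = [[0] * size2 for _ in range(size1)]
    for attr1, attr2, count in rows:
        array[attr1][attr2] += count
    return array


relation = [(2, 0, 4), (1, 1, 6), (3, 1, 3)]
array = joint_distribution(relation, (4, 4))
print(array)
```

The wavelet decomposition is then applied to this array rather than to the tuples directly.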
PROCESSING RELATIONAL QUERIES IN WAVELET-COEFFICIENT DOMAIN
• Reduce relations into compact wavelet-coefficient synopses
[Figure: two paths from Wavelet Synopses to Final Approximate Results. Compressed domain (FAST): Querying in Wavelet Domain gives Query Results in Wavelet Domain, which are then rendered. Relation domain (SLOW): the synopses are first rendered into Approximate Relations, followed by Querying in Relation Domain.]
WAVELET QUERY PROCESSING
Each operator (e.g., select, project, join, aggregates, etc.)
• input: set of wavelet coefficients
• output: set of wavelet coefficients
Finally, rendering step
• input: set of wavelet coefficients
• output: (multi)set of tuples
[Figure: example query plan with select, select, join, project and a final render step; a set of coefficients is passed between operators.]
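To make the rendering step concrete, here is a deliberately naive one-dimensional sketch: it inverts the unnormalized Haar transform and emits one (value, count) tuple per non-empty cell. It is not the render algorithm of the paper, and the function names are assumptions for illustration.

```python
def inverse_haar_1d(coeffs):
    """Invert the unnormalized 1-D Haar transform.

    `coeffs` is [overall average, coarsest detail, ..., finest details].
    """
    values = [coeffs[0]]
    position = 1
    while position < len(coeffs):
        details = coeffs[position:position + len(values)]
        expanded = []
        for average, detail in zip(values, details):
            expanded.extend([average + detail, average - detail])
        values = expanded
        position += len(details)
    return values


def render(coeffs):
    """Turn coefficients into a multiset of tuples, written as (attribute value, count)."""
    counts = inverse_haar_1d(coeffs)
    return [(i, round(c)) for i, c in enumerate(counts) if round(c) > 0]


print(render([4, -2, 0, -1]))
```

With a truncated synopsis the reconstructed counts are only approximate, which is why the sketch rounds them before emitting tuples.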
QUICK REVIEW OF NOTATIONS
SELECTION OPERATOR (SELECT)
SELECTION -- RELATIONAL DOMAIN
In relational domain, interested in only those cells inside query range
In wavelet domain, interested in only the coefficients that contribute to those cells
Relation

Dim D1 (Attr1)   Dim D2 (Attr2)   Count
0                6                6
1                2                3
1                3                4
1                5                6
1                6                8
2                6                7
3                0                1
4                2                3
5                2                2
6                1                3
6                2                2
6                5                1
6                6                3

Joint Data Distribution Array
[Figure: array over Dim. D1 and Dim. D2 holding the counts above, with the Query Range marked as a rectangle.]
APPROXIMATE QUERY EXECUTION ENGINE PROCESS FOR SELECT
SELECTION -- WAVELET DOMAIN
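[Figure: sign quadrants (+/-) of a coefficient over dimensions D1 and D2, shown against the Query Range, and the same coefficient after its support is restricted to the Query Range.]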
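A selection in the wavelet domain can be sketched as a filter over coefficients: keep those whose support overlaps the query range, clip their support to the range, and collapse the sign vector when only one half of the support survives. The code below is a simplified illustration under stated assumptions, not the paper's algorithm; in particular the `split` field (first index of the upper half) is added here so the sign change stays well defined after clipping, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DimInfo:
    lo: int        # low boundary of the support along this dimension
    hi: int        # high boundary (inclusive)
    sign_lo: int   # sign for indices below `split`
    sign_hi: int   # sign for indices at or above `split`
    split: int     # first index of the upper half (assumed extra field)


@dataclass(frozen=True)
class Coefficient:
    dims: Tuple[DimInfo, ...]
    v: float


def select(coeffs, ranges):
    """Keep coefficients overlapping the query range, clipped to that range."""
    selected = []
    for w in coeffs:
        new_dims = []
        for d, (q_lo, q_hi) in zip(w.dims, ranges):
            lo, hi = max(d.lo, q_lo), min(d.hi, q_hi)
            if lo > hi:
                break                       # no overlap along this dimension
            sign_lo, sign_hi = d.sign_lo, d.sign_hi
            if hi < d.split:                # only the lower half survives
                sign_hi = sign_lo
            elif lo >= d.split:             # only the upper half survives
                sign_lo = sign_hi
            new_dims.append(DimInfo(lo, hi, sign_lo, sign_hi, d.split))
        else:
            selected.append(Coefficient(tuple(new_dims), w.v))
    return selected


def value_at(coeffs, cell):
    """Sum of the signed contributions of all coefficients to one cell."""
    total = 0
    for w in coeffs:
        sign = 1
        for d, index in zip(w.dims, cell):
            if not d.lo <= index <= d.hi:
                sign = 0
                break
            sign *= d.sign_lo if index < d.split else d.sign_hi
        total += sign * w.v
    return total


# Unnormalized coefficients of the 1-D vector [2,2,5,7].
coeffs = [
    Coefficient((DimInfo(0, 3, +1, +1, 2),), 4),
    Coefficient((DimInfo(0, 3, +1, -1, 2),), -2),
    Coefficient((DimInfo(0, 1, +1, -1, 1),), 0),
    Coefficient((DimInfo(2, 3, +1, -1, 3),), -1),
]
result = select(coeffs, [(2, 3)])
print(len(result), [value_at(result, (i,)) for i in range(4)])
```

In this small example the coefficient whose support is [0,1] is dropped, and the remaining three still reproduce the cells inside the range while contributing nothing outside it.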
PROJECTION OPERATOR (PROJECT)
PROJECTION- WAVELET DOMAIN
JOIN OPERATOR (JOIN)
EQUI-JOIN -- RELATIONAL DOMAIN
Relational domain: Join count = 7*3 = (A1-A3)*(B2+B3)
Wavelet domain: A1*B2 + A1*B3 - A3*B2 - A3*B3
Consider all pairs of coefficients:
(1) check joinability (overlap in join dimension(s))
(2) compute output coefficients

Relation 1

Dim D1 (Attr1)   Dim D2 (Attr2)   Count
6                2                7
4                3                6

Relation 2

Dim D1 (Attr1)   Dim D3 (Attr3)   Count
6                3                3

Join along D1

Dim D1 (Attr1)   Dim D2 (Attr2)   Dim D3 (Attr3)   Count
6                2                3                21

[Figure: joint data distribution of Relation 1 (Join Dim. D1 by Dim. D2, cells 7 and 6) and of Relation 2 (Join Dim. D1 by Dim. D3, cell 3). Coefficients A1 (+) and A3 (-) contribute to the cell holding 7; coefficients B2 (+) and B3 (+) contribute to the cell holding 3.]
EQUI-JOIN -- WAVELET DOMAIN
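[Figure: a coefficient of Relation 1 over (D1, D2) with value v1 and a coefficient of Relation 2 over (D1, D3) with value v2, overlapping along the join dimension D1, produce a join output coefficient over (D1, D2, D3).]
Join output coefficient: v = v1 * v2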
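The identity used for the join, (A1-A3)*(B2+B3) = A1*B2 + A1*B3 - A3*B2 - A3*B3, says that the join count at a cell is the sum over all pairs of contributing coefficients of the product of their signed values. The sketch below checks this numerically; the coefficient values are hypothetical, chosen only so that A1-A3 = 7 and B2+B3 = 3, and the function name is an assumption.

```python
def join_count(contribs_a, contribs_b):
    """Sum of products over all pairs of signed contributions.

    Each contribution is (sign, value): the sign with which the coefficient
    contributes to the joining cell, and its magnitude.
    """
    return sum(
        sign_a * value_a * sign_b * value_b
        for sign_a, value_a in contribs_a
        for sign_b, value_b in contribs_b
    )


# Hypothetical values: A1 = 5, A3 = -2, so A1 - A3 = 7; B2 = 1, B3 = 2, so B2 + B3 = 3.
relation1 = [(+1, 5), (-1, -2)]   # A1 (+) and A3 (-)
relation2 = [(+1, 1), (+1, 2)]    # B2 (+) and B3 (+)
print(join_count(relation1, relation2))
```

Each term of the sum corresponds to one pair of joinable coefficients, which is why the join can be evaluated pair by pair with v = v1 * v2.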
EXPERIMENTAL STUDY
Improved Answer Quality
Low Synopsis Construction Costs
Fast Query Execution
ERROR METRICS FOR SET-VALUED QUERY ANSWERS
Need an error metric for (multi)sets that accounts for both
• differences in element frequencies
• differences in element values
Proposed Solutions
• MAC (Match-And-Compare) Error [IP99]: based on perfect bipartite graph matching
• EMD (Earth Mover's Distance) Error [CGR00, RTG98]: based on bipartite network flows
QUERY EXECUTION TIMES
SELECT-JOIN-SUM QUERY ERRORS ON REAL-LIFE DATA
SELECT QUERY ERRORS ON REAL-LIFE DATA
SELECT-SUM QUERY ERRORS ON REAL-LIFE DATA
CONCLUSION
Multidimensional wavelets are an effective tool for general-purpose approximate query processing in modern, high-dimensional applications.
The query processing algorithms operate directly on the wavelet-coefficient synopses of relational data, thus allowing for very fast processing of arbitrarily complex queries entirely in the wavelet-coefficient domain.
An extensive experimental study with synthetic as well as real-life data sets verifies the effectiveness of our wavelet-based approach compared to both sampling and histograms.
Questions???
THANK YOU