Date post: | 14-Dec-2015 |
PREFER: A System for the Efficient Execution
of Multi-parametric Ranked Queries
• Vagelis Hristidis
University of California, San Diego
• Nick Koudas
AT&T Research
• Yannis Papakonstantinou
University of California, San Diego
Example
Car ID  Mileage  Year  Price  Doors
1       10000    1997  20000  2
2       20000    2000  11000  4
3       17000    1998  12000  6
4       15000    1990  8000   2
5       5000     1990  12000  2
6       15000    1990  5000   4
7       12000    1985  5000   4
ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price
Example (cont’d)
Relation after applying the ORDER BY expression:
Car ID  Mileage  Year  Price  Doors
2       20000    2000  11000  4
1       10000    1997  20000  2
3       17000    1998  12000  6
5       5000     1990  12000  2
4       15000    1990  8000   2
6       15000    1990  5000   4
7       12000    1985  5000   4
Problem: Retrieve WHOLE relation
PREFER retrieves only part of the relation
Applications
Such preference queries are used in Web sites like:
• www.Zagat.com (restaurants)
• www.personallogic.com (online retailer)
Definitions - Problem statement
• A preference query orders the tuples of a relation according to a function of the attribute values, e.g. 0.01·Mileage + 0.6·Year + 0.03·Price
• Goal is to produce the top-K answers of a preference query while retrieving the minimum # of tuples
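As a point of comparison, here is a minimal sketch of the naive way to answer a preference query: score every tuple and keep the K best. It uses the weights from the example expression, but the attribute values are hypothetical and already scaled to [0,1], because the slides do not say how the raw values are scaled. The names `preference_score` and `top_k` are illustrative only.

```python
import heapq
from typing import Dict, List

Car = Dict[str, float]

# Hypothetical tuples with attribute values scaled to [0, 1] (an assumption).
CARS: List[Car] = [
    {"id": 1, "Mileage": 0.50, "Year": 0.85, "Price": 1.00},
    {"id": 2, "Mileage": 1.00, "Year": 1.00, "Price": 0.55},
    {"id": 3, "Mileage": 0.85, "Year": 0.90, "Price": 0.60},
    {"id": 4, "Mileage": 0.75, "Year": 0.50, "Price": 0.40},
]

# Weights taken from the example preference query.
WEIGHTS = {"Mileage": 0.01, "Year": 0.6, "Price": 0.03}


def preference_score(car: Car, weights: Dict[str, float]) -> float:
    """Linear preference function: weighted sum of the attribute values."""
    return sum(w * car[attr] for attr, w in weights.items())


def top_k(relation: List[Car], weights: Dict[str, float], k: int) -> List[Car]:
    """Naive top-K: every tuple of the relation is read and scored."""
    return heapq.nlargest(k, relation, key=lambda c: preference_score(c, weights))


if __name__ == "__main__":
    for car in top_k(CARS, WEIGHTS, 2):
        print(car["id"], round(preference_score(car, WEIGHTS), 4))
```

Even for K = 1 this baseline touches the whole relation, which is the cost the goal above aims to avoid.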
Our Approach
PREFER materializes a number of ranked views of the relation and uses them to answer preference queries efficiently.
Our Approach (cont’d)
[Figure: preference vectors drawn on Price and Year axes. Ranked view 0.08*Price + 0.2*Year; ranked view 0.075*Price + 0.8*Year; preference query 0.07*Price + 0.35*Year.]
PREFER Architecture
Preprocessing stage
• Input: Relation, Space constraints, Discretization of ranked views’ vectors
• Views Creation produces the Mat. Views and the index of mat. views
• Which ranked views should we materialize?
Runtime Process
• Query → View Selection → (Query, Ranked View id) → Pipelining Algorithm → Output results
• Which ranked view should we use to answer a specific preference query?
• How to use a ranked view to answer a preference query?
How to use a ranked view to answer a preference query
Ranked View, ordered by fv = 0.02*Mileage + 0.4*Year + 0.04*Price
Result, ordered by fq = 0.01*Mileage + 0.6*Year + 0.03*Price
Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8   (t1, first tuple of the view)
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4   (last tuple above the watermark)
4       15000    1990  8000   2      10.2
5       5000     1990  12000  2      9.8
6       15000    1990  5000   4      9
7       12000    1985  5000   4      6.4
Watermark = 14.26
Watermark
\[ \forall t \in R:\quad f_v(t) < T^{1}_{v,q} \;\Rightarrow\; f_q(t) < f_q(t^{1}_{v}) \]
Calculating the Watermark
\[ f_q(t) = \sum_{i=1}^{k} q_i A_i(t) = f_v(t) + \sum_{i=1}^{k} (q_i - v_i) A_i(t) \]
\[ f_v(t') + \sum_{i=1}^{k} (q_i - v_i) A_i(t') < f_q(t^{1}_{v}) \]
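One way to read the watermark condition: the watermark is the smallest view score f_v that any tuple scoring at least f_q(t1) under the query could have. The sketch below computes that bound under two assumptions that are not stated on the slides: every attribute value lies in [0,1], and all weights are non-negative. Under those assumptions the bound is a small fractional-knapsack problem, solved greedily. The names `score` and `watermark` are hypothetical, and this is not necessarily the procedure PREFER itself uses.

```python
from typing import Sequence


def score(weights: Sequence[float], row: Sequence[float]) -> float:
    """Linear preference function: sum of weight * attribute value."""
    return sum(w * a for w, a in zip(weights, row))


def watermark(v: Sequence[float], q: Sequence[float], target: float) -> float:
    """Smallest f_v over [0,1]^k among points whose f_q is at least `target`.

    Any tuple whose f_v is below the returned value must have f_q < target.
    Assumes attribute values in [0, 1] and non-negative weights.
    """
    # Attributes that buy query score most cheaply (low v_i per unit q_i) first.
    order = sorted((i for i in range(len(q)) if q[i] > 0),
                   key=lambda i: v[i] / q[i])
    need, cost = target, 0.0
    for i in order:
        if need <= 0:
            break
        take = min(1.0, need / q[i])   # attribute value capped at 1
        cost += v[i] * take
        need -= q[i] * take
    return cost


if __name__ == "__main__":
    v = (0.02, 0.4, 0.04)      # view weights from the slide
    q = (0.01, 0.6, 0.03)      # query weights from the slide
    t1 = (0.5, 0.85, 1.0)      # hypothetical first view tuple, scaled to [0, 1]
    print(watermark(v, q, score(q, t1)))
```

Because the attribute scaling on the slides is unknown, this sketch is not expected to reproduce the value 14.26.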
How to use a ranked view to answer a preference query (cont’d)
PipelineResults Algorithm
Ranked View, ordered by fv = 0.02*Mileage + 0.4*Year + 0.04*Price
Result, ordered by fq = 0.01*Mileage + 0.6*Year + 0.03*Price
1. Calculate the Watermark for t1
2. Find the prefix of the view with fv greater than the watermark value and sort it by fq
3. Output tuples up to t1
4. Repeat using the first unprocessed tuple as t1

Pass 1: t1 = Car 1, Watermark = 14.26. Cars 1, 2 and 3 have fv above the watermark; sorted by fq they are 2, 1, 3. Output up to t1: 2, 1.
Pass 2: t1 = Car 3 (first unprocessed), Watermark = 13.1. Output: 3.
Pass 3: t1 = Car 4, Watermark = 8.3. Cars 4, 5 and 6 have fv above the watermark; sorted by fq they are 5, 4, 6. Output up to t1: 5, 4.

View tuples after pass 3, rearranged in fq order within each processed prefix:
Car ID  Mileage  Year  Price  Doors  fv
2       20000    2000  11000  4      16.4
1       10000    1997  20000  2      16.8
3       17000    1998  12000  6      15.4
5       5000     1990  12000  2      9.8
4       15000    1990  8000   2      10.2
6       15000    1990  5000   4      9
7       12000    1985  5000   4      6.4
Output so far (Car ID): 2, 1, 3, 5, 4
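The four steps above can be sketched as a Python generator. This is an illustration under assumptions, not the authors' implementation: attribute values are taken to lie in [0,1], weights are non-negative, and the watermark is computed with a greedy bound (the hypothetical helper `watermark`). The sketch reads view tuples with fv greater than or equal to the watermark, a slightly more conservative prefix than the "greater than" on the slide, so ties at the boundary are handled safely.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[float, ...]


def score(weights: Sequence[float], row: Row) -> float:
    return sum(w * a for w, a in zip(weights, row))


def watermark(v: Sequence[float], q: Sequence[float], target: float) -> float:
    """Smallest f_v over [0,1]^k among points with f_q >= target (assumed domain)."""
    order = sorted((i for i in range(len(q)) if q[i] > 0),
                   key=lambda i: v[i] / q[i])
    need, cost = target, 0.0
    for i in order:
        if need <= 0:
            break
        take = min(1.0, need / q[i])
        cost += v[i] * take
        need -= q[i] * take
    return cost


def pipeline_results(view: List[Row], v: Sequence[float],
                     q: Sequence[float]) -> Iterator[Row]:
    """Yield tuples in f_q order; `view` must be sorted by f_v, highest first."""
    n = len(view)
    emitted = [False] * n
    pending: List[int] = []  # view positions read but not yet output
    fetched = 0              # length of the view prefix read so far
    first = 0                # position of the first unprocessed tuple (t1)
    while first < n:
        t1_score = score(q, view[first])
        mark = watermark(v, q, t1_score)            # step 1
        # Step 2: read the prefix that reaches the watermark (always includes t1).
        while fetched < n and (fetched <= first
                               or score(v, view[fetched]) >= mark):
            pending.append(fetched)
            fetched += 1
        pending.sort(key=lambda i: score(q, view[i]), reverse=True)
        # Step 3: everything scoring at least as high as t1 is safe to output.
        while pending and score(q, view[pending[0]]) >= t1_score:
            i = pending.pop(0)
            emitted[i] = True
            yield view[i]
        # Step 4: continue with the first tuple not yet output.
        while first < n and emitted[first]:
            first += 1


if __name__ == "__main__":
    rows = [(0.9, 0.2, 0.7), (0.1, 0.9, 0.3), (0.4, 0.5, 0.5),
            (0.8, 0.1, 0.1), (0.3, 0.7, 0.9), (0.6, 0.4, 0.2)]  # hypothetical data
    v, q = (0.2, 0.4, 0.4), (0.1, 0.6, 0.3)
    view = sorted(rows, key=lambda r: score(v, r), reverse=True)
    for row in pipeline_results(view, v, q):
        print(row, round(score(q, row), 2))
```

Because the function is a generator, a caller that needs only the top K results can stop iterating early, and the rest of the view is never read.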
Define coverage
[Figure: preference vectors drawn on Price and Year axes. V1: ranked view 0.8*Price + 0.2*Year; q1: preference query 0.7*Price + 0.35*Year.]
V1 covers q1: at most k tuples are retrieved from V1 in order to output the first result of q1.
Which ranked view should we use to answer a specific preference query?
[Figure: preference vectors drawn on Price and Year axes. V1: ranked view 0.8*Price + 0.2*Year; second ranked view 0.75*Price + 0.8*Year; q1: preference query 0.7*Price + 0.35*Year.]
V1 covers q1
Which ranked views should we materialize?
ViewSelection Algorithm
while (not all preference vectors in [0,1]^n covered)
    Randomly pick v ∈ [0,1]^n and add it to the list of views L
VIEWS ← ∅
for i = 1 to C do
    select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
    VIEWS ← VIEWS ∪ {v}
C = 3 (constraint on # of views)
Maximum coverage problem using the minimum # of materialized views is NP-Hard.
Greedy heuristic is a (1 - 1/e) approximation for maximum coverage.
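The greedy loop of the ViewSelection algorithm can be sketched as follows. The sketch assumes the coverage relation has already been computed and is passed in as a mapping from each candidate view to the set of (discretized) query vectors it covers. That mapping, the view names and the function `select_views` are hypothetical. The random candidate generation of the first loop is not shown.

```python
from typing import Dict, List, Set


def select_views(coverage: Dict[str, Set[int]], max_views: int) -> List[str]:
    """Greedy maximum coverage: pick up to `max_views` candidate views.

    coverage[view] is the set of query-vector ids that the view covers.
    Each round takes the view covering the most still-uncovered vectors.
    """
    uncovered: Set[int] = set().union(*coverage.values())
    chosen: List[str] = []
    for _ in range(max_views):
        best = max(coverage,
                   key=lambda name: len(coverage[name] & uncovered),
                   default=None)
        if best is None or not (coverage[best] & uncovered):
            break  # nothing left to gain
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical candidates and the query vectors (by id) each one covers.
    candidates = {
        "v1": {1, 2, 3},
        "v2": {3, 4},
        "v3": {4, 5, 6, 7},
        "v4": {1},
    }
    print(select_views(candidates, max_views=3))
```

The loop stops early once every vector is covered, so fewer than C views may be returned.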
Related Work
• Preference Query Framework [AW00]
• Top-k queries
  – Joins
    • Fagin [F99, F96, F01], equijoins of ordered data
  – Selections [reduce top-k selection to range query]
    • Histograms to estimate cutoff [Chaudhuri & Gravano 99]
    • Probabilistic model [Donjerkovic & Ramakrishnan 99]
    • Partitioning [Carey & Kossmann 97, 98]
Related Work (cont’d)
The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space.
Drawbacks of Onion:
• Does not scale
• Computing the convex hull is very computationally intensive
• Not efficient if the domain of an attribute has a small cardinality
• Not efficient for more than the top-1 result
Experiments
Measured parameters
• # attributes
• size of relation
• # views
• constraint on max # tuples retrieved
Parameters of Experiments
• synthetic datasets
• 3 to 5 attributes
• 10,000 to 500,000 tuples
• random & correlated data
• discretization of 0.1 or 0.05
Experiments (cont’d)
Execution Times (correlated dataset)
[Figure: Time (secs), 0 to 45, vs. # results requested, 0 to 600. Series: using our algorithm, using Oracle.]
Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views
Experiments (cont’d)
Varying the dataset size
[Figure: % queries covered, 0 to 100, vs. # of ranked views to use, 1 to 26. Series: 10,000 tuples, 50,000 tuples, 500,000 tuples.]
4 attr, constraint = 500 tuples, discretization = 0.1
Experiments (cont’d)
Varying the number of attributes
[Figure: % queries covered, 0 to 100, vs. # ranked views to use, 1 to 51. Series: 3 attr, 4 attr, 5 attr.]
500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1
Experiments (cont’d)
Varying guarantee (maximum number of tuples retrieved to output first result), 50,000 tuples
[Figure: % queries covered, 0 to 100, vs. constraint (# tuples), 0 to 8000. Series: 10 views, 20 views.]
4 attr, discretization = 0.1
Experiments (cont’d)
Varying guarantee (maximum number of tuples retrieved to output first result), 500,000 tuples
[Figure: % queries covered, 0 to 100, vs. constraint (# tuples), 0 to 40000. Series: 10 views, 20 views.]
4 attr, discretization = 0.1
Experiments (cont’d)
Comparison with the Onion Technique
[Figure: # tuples retrieved from database, 0 to 60000, vs. # of results, 1 to 100. Series: onion, 1 index, 2 indices, 4 indices, 6 indices.]
50,000 tuples, 3 attr, discretization = 0.05
More Resources
www.db.ucsd.edu/PREFER
• PREFER demo
• PREFER Application
  – Construct Materialized Views
  – Issue preference queries
MS Windows, on top of Oracle DBMS
Conclusions
• Methodology to efficiently answer top-K linearly weighted queries
• Algorithm that uses a ranked view to answer a preference query
• Ranked materialized views were used
• Experimental evaluation