PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis...

PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries

• Vagelis Hristidis

University of California, San Diego

• Nick Koudas

AT&T Research

• Yannis Papakonstantinou

University of California, San Diego

Example

Car ID  Mileage  Year  Price  Doors
1       10000    1997  20000  2
2       20000    2000  11000  4
3       17000    1998  12000  6
4       15000    1990   8000  2
5        5000    1990  12000  2
6       15000    1990   5000  4
7       12000    1985   5000  4

Example (cont’d)

ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

Car ID  Mileage  Year  Price  Doors
2       20000    2000  11000  4
1       10000    1997  20000  2
3       17000    1998  12000  6
5        5000    1990  12000  2
4       15000    1990   8000  2
6       15000    1990   5000  4
7       12000    1985   5000  4

Problem: evaluating the ORDER BY retrieves the WHOLE relation.

PREFER retrieves only part of the relation.

Applications

Such preference queries are used on Web sites such as:

• www.Zagat.com (restaurants): http://www.zagat.com/

• www.personallogic.com (online retailer): http://www.personallogic.com/
Definitions - Problem statement

• A preference query orders the tuples of a relation according to a function of the attribute values, e.g. 0.01·Mileage + 0.6·Year + 0.03·Price

• The goal is to produce the top-K answers of a preference query while retrieving the minimum # of tuples
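As a baseline, the problem statement can be sketched in Python. The relation below, its normalization to [0, 1], and the names `score` and `naive_top_k` are assumptions made for illustration; none of them come from the slides. The sketch only shows that the straightforward evaluation scores every tuple, which is the cost PREFER tries to avoid.

```python
import heapq

def score(weights, row):
    """Linear preference function: weighted sum of the attribute values."""
    return sum(w * a for w, a in zip(weights, row))

def naive_top_k(relation, weights, k):
    """Return the k highest-scoring tuples; every tuple is scored once."""
    return heapq.nlargest(k, relation, key=lambda row: score(weights, row))

# Hypothetical relation: two attributes, already normalized to [0, 1].
relation = [(0.2, 0.3), (0.8, 0.9), (0.1, 0.1), (1.0, 0.65), (0.5, 0.6)]

top2 = naive_top_k(relation, weights=(0.6, 0.4), k=2)
print(top2)
```

Even for k = 2 the function touches all five tuples; the ranked views introduced next are meant to bound that number.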

Our Approach

PREFER materializes a number of ranked views of the relation and uses them to answer preference queries efficiently.

Our Approach (cont’d)

[Figure: ranked views and a preference query drawn as vectors, with Price and Year as axes]

Ranked view: 0.08*Price + 0.2*Year

Ranked view: 0.075*Price + 0.8*Year

Preference query: 0.07*Price + 0.35*Year

PREFER Architecture

Preprocessing stage

• Inputs: relation, space constraints, discretization of ranked views’ vectors

• Views Creation: which ranked views should we materialize?

• Output: materialized views (Mat. Views) and an index of mat. views

Runtime Process

• Query → View Selection: which ranked view should we use to answer a specific preference query?

• Query, Ranked View id → Pipelining Algorithm

• Output results

How to use a ranked view to answer a preference query

Ranked view, ordered by fv = 0.02*Mileage + 0.4*Year + 0.04*Price

Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2      9.8
6       15000    1990   5000  4      9
7       12000    1985   5000  4      6.4

Figure labels: t1, last tuple, Watermark = 14.26

Result, ordered by fq = 0.01*Mileage + 0.6*Year + 0.03*Price

Car ID  ...  Doors  fq

Watermark

\forall t \in R:\quad f_v(t) < T^{1}_{v,q} \;\Rightarrow\; f_q(t) < f_q(t^{1}_{v})

Calculating the Watermark

f_q(t) = \sum_{i=1}^{k} q_i A_i(t) = f_v(t) + \sum_{i=1}^{k} (q_i - v_i) A_i(t)

f_v(t') + \sum_{i=1}^{k} (q_i - v_i) A_i(t') < f_q(t^{1}_{v})
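The identity f_q(t) = f_v(t) + Σ (q_i − v_i) A_i(t) already yields a simple, safe threshold when every attribute lies in a known range. The Python sketch below computes that conservative threshold; it is not the exact watermark calculation from the slides, which is presumably tighter. The attribute bounds `lo` and `hi`, the sample relation, and the function names are all assumptions made for illustration.

```python
def score(weights, row):
    """Linear preference function: weighted sum of the attribute values."""
    return sum(w * a for w, a in zip(weights, row))

def conservative_watermark(q, v, t1, lo, hi):
    """Threshold W such that f_v(t) < W implies f_q(t) < f_q(t1).

    The correction term sum((q_i - v_i) * A_i(t)) is at most `slack`
    when each A_i(t) lies in [lo_i, hi_i], so f_q(t) <= f_v(t) + slack.
    """
    slack = sum((qi - vi) * (h if qi > vi else l)
                for qi, vi, l, h in zip(q, v, lo, hi))
    return score(q, t1) - slack

q = (0.6, 0.4)                      # preference query weights (hypothetical)
v = (0.5, 0.5)                      # ranked view weights (hypothetical)
lo, hi = (0.0, 0.0), (1.0, 1.0)     # assumed attribute ranges
relation = [(0.2, 0.3), (0.8, 0.9), (0.1, 0.1), (1.0, 0.65), (0.5, 0.6)]

view = sorted(relation, key=lambda t: score(v, t), reverse=True)
t1 = view[0]                        # first tuple of the ranked view
wm = conservative_watermark(q, v, t1, lo, hi)

# Tuples ranked below the threshold in the view cannot outrank t1 in the query.
below = [t for t in view if score(v, t) < wm]
print(wm, below)
```

In this toy relation the tuple (1.0, 0.65) sits above the threshold and does outrank t1 under the query, which is exactly the situation the watermark is meant to detect.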


How to use a ranked view to answer a preference query (cont’d)

PipelineResults Algorithm

Ranked view, ordered by fv (t1 is the first tuple):

Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2      9.8
6       15000    1990   5000  4      9
7       12000    1985   5000  4      6.4

1. Calculate the watermark for t1, which is 14.26.

2. Find the prefix of the view with fv greater than the watermark value and sort those tuples by fq.

Car ID  Mileage  Year  Price  Doors  fv
2       20000    2000  11000  4      16.4
1       10000    1997  20000  2      16.8
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2      9.8
6       15000    1990   5000  4      9
7       12000    1985   5000  4      6.4

3. Output tuples up to t1.

Result so far (Car ID): 2, 1

4. Repeat, using the first unprocessed tuple as t1.

Result so far (Car ID): 2, 1, 3

Final state of the view:

Car ID  Mileage  Year  Price  Doors  fv
2       20000    2000  11000  4      16.4
1       10000    1997  20000  2      16.8
3       17000    1998  12000  6      15.4
5        5000    1990  12000  2      9.8
4       15000    1990   8000  2      10.2
6       15000    1990   5000  4      9
7       12000    1985   5000  4      6.4

Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price (Car ID): 2, 1, 3, 5, 4
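The four steps of the walkthrough can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a conservative threshold derived from assumed attribute ranges `lo` and `hi` in place of the exact watermark, and the relation and all names are hypothetical.

```python
def score(weights, row):
    """Linear preference function: weighted sum of the attribute values."""
    return sum(w * a for w, a in zip(weights, row))

def pipeline_results(view, q, v, lo, hi, k):
    """Return (top-k tuples by f_q, number of view tuples fetched).

    `view` must be sorted by descending f_v.
    """
    # Upper bound on f_q(t) - f_v(t) over the assumed attribute ranges.
    slack = sum((qi - vi) * (h if qi > vi else l)
                for qi, vi, l, h in zip(q, v, lo, hi))
    out, buffer, pos = [], [], 0
    while len(out) < k:
        if not buffer:
            if pos == len(view):
                break                          # view exhausted
            buffer.append(view[pos])
            pos += 1
        t1 = buffer[0]                         # first unprocessed tuple in view order
        target = score(q, t1)
        watermark = target - slack             # step 1 (conservative variant)
        # Step 2: fetch the view prefix whose f_v is not below the watermark.
        while pos < len(view) and score(v, view[pos]) >= watermark:
            buffer.append(view[pos])
            pos += 1
        # Step 3: output buffered tuples up to and including t1, in f_q order.
        ready = sorted((t for t in buffer if score(q, t) >= target),
                       key=lambda t: score(q, t), reverse=True)
        out.extend(ready)
        # Step 4: keep the rest and repeat with the next unprocessed tuple.
        buffer = [t for t in buffer if score(q, t) < target]
    return out[:k], pos

q, v = (0.6, 0.4), (0.5, 0.5)                  # hypothetical query and view weights
lo, hi = (0.0, 0.0), (1.0, 1.0)                # assumed attribute ranges
relation = [(0.2, 0.3), (0.8, 0.9), (0.1, 0.1), (1.0, 0.65), (0.5, 0.6)]
view = sorted(relation, key=lambda t: score(v, t), reverse=True)

top, fetched = pipeline_results(view, q, v, lo, hi, k=2)
print(top, fetched)
```

As in the slide example, the first tuple of the view is not the first result of the query: the second view tuple is fetched because it lies above the threshold, and it is output ahead of t1 once the buffer is sorted by f_q.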


Define coverage

[Figure: ranked view V1 and preference query q1 drawn as vectors, with Price and Year as axes]

Ranked view V1: 0.8*Price + 0.2*Year

Preference query q1 (axis values 0.7 and 0.35)

V1 covers q1: at most k tuples are retrieved from V1 in order to output the first result of q1.
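A coverage test along these lines can be sketched in Python: count how many view tuples must be read before the first result of the query can be output, then compare that count with the limit. The sketch relies on a conservative threshold computed from assumed attribute ranges, so its counts are upper bounds on what the exact watermark would need. The relation, the two candidate views, and every function name are hypothetical.

```python
def score(weights, row):
    """Linear preference function: weighted sum of the attribute values."""
    return sum(w * a for w, a in zip(weights, row))

def tuples_for_first_result(relation, q, v, lo, hi):
    """Number of tuples read from the view ranked by v before the first
    result of query q is known (relation must be non-empty)."""
    view = sorted(relation, key=lambda t: score(v, t), reverse=True)
    slack = sum((qi - vi) * (h if qi > vi else l)
                for qi, vi, l, h in zip(q, v, lo, hi))
    threshold = score(q, view[0]) - slack
    # The view is sorted by f_v, so these tuples form a prefix of it.
    return sum(1 for t in view if score(v, t) >= threshold)

def covers(relation, q, v, lo, hi, limit):
    """True if the view ranked by v answers q within `limit` tuples."""
    return tuples_for_first_result(relation, q, v, lo, hi) <= limit

q = (0.6, 0.4)                                  # hypothetical query
lo, hi = (0.0, 0.0), (1.0, 1.0)                 # assumed attribute ranges
relation = [(0.2, 0.3), (0.8, 0.9), (0.1, 0.1), (1.0, 0.65), (0.5, 0.6)]

near = tuples_for_first_result(relation, q, (0.5, 0.5), lo, hi)
far = tuples_for_first_result(relation, q, (0.1, 0.9), lo, hi)
print(near, far)
```

The view whose weights are closer to the query needs fewer tuples here, which matches the intuition behind picking a covering view at runtime.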

Which ranked view should we use to answer a specific preference query?

[Figure: ranked views and preference query q1 drawn as vectors, with Price and Year as axes]

Ranked view V1: 0.8*Price + 0.2*Year

Ranked view: 0.75*Price + 0.8*Year

Preference query q1 (axis values 0.7 and 0.35)

V1 covers q1


Which ranked views should we materialize?

ViewSelection Algorithm

while (not all preference vectors in [0,1]^n covered)
    Randomly pick v ∈ [0,1]^n and add it to the list of views L

VIEWS ← ∅
for i = 1 to C do
    select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
    VIEWS ← VIEWS ∪ {v}
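The second loop of the algorithm is a greedy maximum-coverage selection, which can be sketched in Python as follows. The sketch assumes the candidate list L and its coverage sets have already been computed. The `coverage` dictionary, the view and query labels, and the function name are hypothetical placeholders, not data from the slides.

```python
def greedy_view_selection(coverage, C):
    """Pick up to C views, each time taking the one that covers the most
    still-uncovered query vectors.

    `coverage` maps a candidate view to the set of query vectors it covers.
    Returns (chosen views in pick order, query vectors left uncovered).
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    for _ in range(C):
        if not coverage:
            break
        best = max(coverage, key=lambda view: len(coverage[view] & uncovered))
        if not coverage[best] & uncovered:
            break                       # nothing left to gain
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen, uncovered

# Hypothetical candidates and the query vectors each one covers.
coverage = {
    "v1": {"q1", "q2", "q3"},
    "v2": {"q3", "q4"},
    "v3": {"q4", "q5", "q6", "q7"},
    "v4": {"q1"},
}

views, left = greedy_view_selection(coverage, C=2)
print(views, left)
```

With a budget of two views the sketch first picks v3 (four vectors), then v1 (three more), leaving nothing uncovered in this toy input.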

Which ranked views should we materialize? (cont’d)

Constraint on # of views: C = 3

Maximum coverage problem using the minimum # of materialized views is NP-Hard.

The greedy heuristic is a (1 − 1/e)-approximation for maximum coverage.
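For reference, the standard guarantee for greedy maximum coverage with a budget of C views can be written as below. The symbols GREEDY_C (the number of vectors covered by the greedy choice) and OPT_C (the best coverage achievable with C views) are notation introduced here, not taken from the slides.

```latex
\[
\mathrm{GREEDY}_C \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{C}\bigr)^{C}\Bigr)\,\mathrm{OPT}_C
\;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr)\,\mathrm{OPT}_C \;\approx\; 0.632\,\mathrm{OPT}_C
\]
```

For C = 3 the first factor is 1 − (2/3)^3 = 19/27 ≈ 0.704.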

Related Work

• Preference Query Framework [AW00]

• Top-k queries

  – Joins
    • Fagin [F99, F96, F01], equijoins of ordered data

  – Selections [reduce top-k selection to range query]
    • Histograms to estimate cutoff [Chaudhuri & Gravano 99]
    • Probabilistic model [Donjerkovic & Ramakrishnan 99]
    • Partitioning [Carey & Kossmann 97, 98]

Related Work (cont’d)

The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space.

Drawbacks of Onion:

• Does not scale

• Computing the convex hull is very computationally intensive

• Not efficient if the domain of an attribute has a small cardinality

• Not efficient for more than the top-1 result

Experiments

Measured parameters

• # attributes

• size of relation

• # views

• constraint on max # tuples retrieved

Parameters of Experiments

• synthetic datasets

• 3 to 5 attributes

• 10,000 to 500,000 tuples

• random & correlated data

• discretization of 0.1 or 0.05

Experiments (cont’d)

Execution Times (correlated dataset)

[Plot: time (secs), 0 to 45, vs. # results requested, 0 to 600. Series: using our algorithm, using Oracle]

Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views

Experiments (cont’d)

Varying the dataset size

[Plot: % queries covered, 0 to 100, vs. number of ranked views to use, 1 to 26. Series: 10,000 tuples, 50,000 tuples, 500,000 tuples]

4 attr, constraint = 500 tuples, discretization = 0.1

Experiments (cont’d)

Varying the number of attributes

[Plot: % queries covered, 0 to 100, vs. # ranked views to use, 1 to 51. Series: 3 attr, 4 attr, 5 attr]

500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1

Experiments (cont’d)

Varying guarantee (maximum number of tuples retrieved to output first result), 50,000 tuples

[Plot: % queries covered, 0 to 100, vs. constraint (# tuples), 0 to 8000. Series: 10 views, 20 views]

4 attr, discretization = 0.1

Experiments (cont’d)

Varying guarantee (maximum number of tuples retrieved to output first result), 500,000 tuples

[Plot: % queries covered, 0 to 100, vs. constraint (# tuples), 0 to 40000. Series: 10 views, 20 views]

4 attr, discretization = 0.1

Experiments (cont’d)

Comparison with the Onion Technique

[Plot: # tuples retrieved from database, 0 to 60000, vs. # of results, 1 to 100. Series: onion, 1 index, 2 indices, 4 indices, 6 indices]

50,000 tuples, 3 attr, discretization = 0.05

More Resources

www.db.ucsd.edu/PREFER

• PREFER demo

• PREFER Application
  – Construct Materialized Views
  – Issue preference queries

MS Windows, on top of Oracle DBMS

Conclusions

• Methodology to efficiently answer top-K linearly weighted queries

• Algorithm that uses a ranked view to answer a preference query

• Ranked materialized views were used

• Experimental evaluation

Date post:	14-Dec-2015
