Relaxing Join Selection & Queries Nick Koudas et al
Presented by Pradnya Chavarkar
Relaxing Join and Selection Queries
2
Motivation
Complex queries in terms of predicates against large databases.
Right parameter values not known. Parameter adjustment becomes a trial-
and-error method. More the predicates, difficult the
adjustment
Relaxing Join and Selection Queries
3
Example
Relaxing Join and Selection Queries
4
Example
How many conditions to adjust?? How much to adjust?? No. of choices exponential in number of
conditions
Relaxing Join and Selection Queries
5
Outline
Relaxation framework Relaxing select conditions Relaxing All conditions Lattice traversal Experimental results Conclusion
Relaxing Join and Selection Queries
6
Query Relaxation
Top–K queriesWeights given to each condition
Skyline Points which are not dominated by any
other points in the given set
Relaxing Join and Selection Queries
7
Relaxation Framework
Selection Condition
Join Condition
Relaxing Join and Selection Queries
8
Relaxation Skyline
Relaxing Join and Selection Queries
9
Simple Solution !
Compute Skyline on Jobs Compute Skyline on Candidates Join the skylines
Relaxing Join and Selection Queries
10
Simple Solution ! (Contd)
Incorrect results ! Computes skyline locally
Relaxing Join and Selection Queries
11
Relaxing Selection Conditions Many joins are on Identifier attributes Relaxation skyline should satisfy :
Relaxing Join and Selection Queries
12
Join First (JF)
Join without select conditions Compute skyline for select conditions
on the resulting tuples May not be efficient for joins which
return a large number of pairs
Relaxing Join and Selection Queries
13
Pruning Join (PJ)
Compute skyline during join step Assuming index on relation S For each tuple of R, find joining tuples
from S Call ‘update’ for each such pair and
current skyline Discard the pair if it is already
dominated else discard the pairs dominated by the current pair
Relaxing Join and Selection Queries
14
Pruning Join (PJ) (Contd.)
Advantage: Produces less pairs after join step
as compared to Join First Disadvantage
Need for modification of join methods
Relaxing Join and Selection Queries
15
Pruning Join+ (PJ+)
Modifies pruning Join Algorithm Computes “local skylines” for records
joining with a record of othe relation Does Dominance checking within local
skyline If a pair is locally dominated then it will
not be in global skyline
Relaxing Join and Selection Queries
16
Pruning Join+ (PJ+) (Contd)
R (A, B, C)S (C, D, E)
Relaxing Join and Selection Queries
17
Pruning Join+ (PJ+) (Contd)
Does local pruning hoping to eliminate some S tuples beforehand
May not be always beneficial, when many tuples are not eliminated
Experimental results show that the tradeoff depends upon factors like number of conditions
Relaxing Join and Selection Queries
18
Sorted Access Join (SAJ)
Constructs sorted list of tuple IDs for each select condition based on relaxation
Go through the lists in round robin fashion
For each record in Li find records which can join -- (p,q)
Stop if all the current records in list Lj (i≠j), have relaxation greater that (p,q)
Relaxing Join and Selection Queries
19
Sorted Access Join (SAJ) (Contd)
Relaxing Join and Selection Queries
20
Relaxing All Conditions
Relaxing join conditions along with selection conditions
Assumes multidimensional indexed structure on selection and join conditions
R-Tree on both relations
Relaxing Join and Selection Queries
21
MIDIR
Traverses both R-Trees top down to identify potential pairs which can be in relaxation skyline
Push such pairs in queue In each iteration, pop one pair and
perform dominance checking If object-object pair, call Update()
Relaxing Join and Selection Queries
22
Dominance checking in MIDIR
Compute lower and upper bound on relaxation Pair p’ dominates p if for each condition Ci
To compute the lower and upper boundfor Ci (selection condition) convert itinto interval
Relaxing Join and Selection Queries
23
Dominance checking in MIDIR (Contd) B: MRB or tuple A: attribute used in B I(B,A): interval of A in B
Selection Condition:
Join Condition
Relaxing Join and Selection Queries
24
Dominance checking in MIDIR (Contd) Minimum distance between two intervals
Then MINDIST = 0
else MINDIST =
Relaxing Join and Selection Queries
25
Variants of Query Relaxation
Top-k answers User can specify weights on the conditions K points that have smallest weighted
summation Store the k best skyline answers Modify Update() in PJ Priority queues in MIDIR
Relaxing Join and Selection Queries
26
Variants of Query Relaxation (Contd)
Queries with multiple joins JF – compute all joins beforehand SAJ – access multiple index structures to
find the joining tuples MIDIR – queue maintains possible vectors
Relaxation of nonnumeric attributes RELAX(r,c) an be modified All algorithms can refer to this computed
value
Relaxing Join and Selection Queries
27
Variants of Query Relaxation (Contd)
Lattice Structure
Relaxing Join and Selection Queries
28
Lattice Traversal At least one result should be returned Examine the reasons for the empty
result – Join condition Cj is strict
Relax Cj with the tuples provided by applying selection conditions
Either CR or Cs is strict Relax the strict selection condition If join still strict then relax join with either of the
selections
Relaxing Join and Selection Queries
29
Lattice Traversal (Contd)
Both CR or Cs are strict Relax both selection conditions If join condition still strict, relax join condition If still no result, relax all togather
Relaxing Join and Selection Queries
30
Experiments
Experimental Settings Two real and three synthetic dabases Three types of correlations
Independent Correlated Anti-correlated
Relaxing Join and Selection Queries
31
Experiments (different data size)
Relaxing Join and Selection Queries
32
Experiments (different data sizes)
Relaxing Join and Selection Queries
33
Experiments (Different join cardinality)
Relaxing Join and Selection Queries
34
Experiments (Different selection conditions)
Relaxing Join and Selection Queries
35
Summary
JF and PJ Almost same performance for relaxation of
selection conditions PJ and PJ+
PJ works faster, but performance gets affected by join cardinality
SAJ Efficient for correlated data and query
conditions
Relaxing Join and Selection Queries
36
Conclusion
Efficient relaxation algorithm for relaxing join and selection queries
Lattice traversal method for minimal relaxation
Relaxing Join and Selection Queries
37
Related Work (ML for query relaxation ) LOQR algorithm proposed which is for
queries in disjunctive normal form Three steps
Exacting domain knowledge Finding the most useful rule Relaxing the failing query
Relaxing Join and Selection Queries
38
Introduction Let the query by the user be
The query fails because Laptops with large screen weights more
than 3 pounds Fast laptops with large HDD cost more than
$2000
Relaxing Join and Selection Queries
39
Extracting domain knowledge Take subset D of the target database Find “typical values” of the other attributes for
each constraints Forms a new dataset Di, with values of the
constraint attribute indicated a Boolean
Relaxing Join and Selection Queries
40
Extracting domain knowledge (contd)
Relaxing Join and Selection Queries
41
Finding the most useful rule
For rules formed using all the constraints individually, find the most similar rule by nearest neighbour search
R1 is the most similar as they differ only on the price attribute
It is guaranteed to give a result as it gives a result on the subset of the target database
Relaxing Join and Selection Queries
42
Relaxing the failing query
Most similar query is combined with the original query to give the relaxed query
Relaxing Join and Selection Queries
43
References
Relaxing Join and selection QueriesNick Koudas, Chen Li, and Anthony K. H. Tung, Rares Vernica
Machine learning for online query relaxationIon Muslea