1
Query Compilation
ParsingLogical Query Plan
Source: our textbook, slides by Hector Garcia-Molina
2
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
{P1,P2,…..}
{(P1,C1),(P2,C2)...}
Pi
answer
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
3
Outline Convert SQL query to a parse tree
Semantic checking: attributes, relation names, types Convert to a logical query plan (relational
algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of
relations Convert to a physical query plan
search the space of physical plans choose order of operations complete the physical query plan
4
Parsing Goal is to convert a text string
containing a query into a parse tree data structure: leaves form the text string (broken into
lexical elements) internal nodes are syntactic categories
Uses standard algorithmic techniques from compilers given a grammar for the language (e.g.,
SQL), process the string and build the tree
5
Example: SQL querySELECT titleFROM StarsInWHERE starName IN (
SELECT nameFROM MovieStarWHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
Assume we have a simplified grammar for SQL.
6
Example: Parse Tree<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Tuple> IN <Query>
title StarsIn <Attribute> ( <Query> )
starName <SFW>
<Attribute> <RelName> <Attribute> LIKE <Pattern>
name MovieStar birthDate ‘%1960’
SELECT <SelList> FROM <FromList> WHERE <Condition>
7
The Preprocessor
replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query)
does semantic checking: are relations and views mentioned in the
schema? are attributes mentioned in the current
scope? are attribute types correct?
8
Outline Convert SQL query to a parse tree
Semantic checking: attributes, relation names, types
Convert to a logical query plan (relational algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of
relations Convert to a physical query plan
search the space of physical plans choose order of operations complete the physical query plan
9
Convert Parse Tree to Relational Algebra
Complete algorithm depends on specific grammar, which determines forms of the parse trees
Here give a flavor of the approach
10
Conversion
Suppose there are no subqueries.
SELECT att-list FROM rel-list WHERE cond
is converted into
PROJatt-list(SELECTcond(PRODUCT(rel-list))), or
att-list(cond( X (rel-list)))
11
SELECT movieTitleFROM StarsIn, MovieStarWHERE starName = name AND birthdate LIKE '%1960';
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> , <FromList> AND <Condition>
movieTitle StarsIn <RelName> <Attribute> LIKE <Pattern>
MovieStar birthdate '%1960'
<Condition>
<Attribute> = <Attribute>
starName name
12
Equivalent Algebraic Expression Tree
movieTitle
starname = name AND birthdate LIKE '%1960'
X
StarsIn MovieStar
13
Handling Subqueries
Recall the (equivalent) query:SELECT titleFROM StarsInWHERE starName IN (
SELECT nameFROM MovieStarWHERE birthdate LIKE ‘%1960’
);
Use an intermediate format called two-argument selection
14
title
StarsIn <condition>
<tuple> IN name
<attribute> birthdate LIKE ‘%1960’
starName MovieStar
Example: Two-Argument Selection
15
Converting Two-Argument Selection
To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression
Different rules depending on the nature of the subquery
Here show example for IN operator and uncorrelated query (subquery computes a relation independent of the tuple being tested)
16
Rules for IN
R <Condition>
t IN S R S
X
C
C is the condition that equatesattributes in t with correspondingattributes in S
17
Example: Logical Query Plan
title
starName=name
StarsIn
birthdate LIKE ‘%1960’
MovieStar
name
18
What if Subquery is Correlated?
Example is when subquery refers to the current tuple of the outer scope that is being tested
More complicated to deal with, since subquery cannot be translated in isolation
Need to incorporate external attributes in the translation
Some details are in textbook
19
Outline Convert SQL query to a parse tree
Semantic checking: attributes, relation names, types
Convert to a logical query plan (relational algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of
relations Convert to a physical query plan
search the space of physical plans choose order of operations complete the physical query plan
20
Improving the Logical Query Plan
There are numerous algebraic laws concerning relational algebra operations
By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently
Next we'll survey some of these laws
21
Associative and Commutative Operations
product natural join set and bag union set and bag intersection
associative: (A op B) op C = A op (B op C)
commutative: A op B = B op A
22
Laws Involving Selection
Selections usually reduce the size of the relation
Usually good to do selections early, i.e., "push them down the tree"
Also can be helpful to break up a complex selection into parts
23
Selection Splitting
C1 AND C2 (R) = C1 ( C2 (R))
C1 OR C2 (R) = ( C1 (R)) Uset ( C2 (R))if R is a set
C1 ( C2 (R)) = C2 ( C1 (R))
24
Selection and Binary Operators
Must push selection to both arguments: C (R U S) = C (R) U C (S)
Must push to first arg, optional for 2nd: C (R - S) = C (R) - S C (R - S) = C (R) - C (S)
Push to at least one arg with all attributes mentioned in C: product, natural join, theta join, intersection e.g., C (R X S) = C (R) X S, if R has all the
atts in C
25
Pushing Selection Up the Tree
Suppose we have relations StarsIn(title,year,starName) Movie(title,year,len,inColor,studioName)
and a view CREATE VIEW MoviesOf1996 AS
SELECT *FROM MovieWHERE year = 1996;
and the query SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
26
The Straightforward Tree
starName,studioName
year=1996 StarsIn
Movie Remember the ruleC(R S) = C(R) S ?
27
The Improved Logical Query Plan
starName,studioName
year=1996 StarsIn
Movie
starName,studioName
year=1996
Movie StarsIn
starName,studioName
year=1996 year=1996
Movie StarsIn
push selectionup tree
push selectiondown tree
28
Laws Involving Projections Consider adding in additional projections Adding a projection lower in the tree can
improve performance, since often tuple size is reduced Usually not as helpful as pushing selections down
If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree Ex: L(R X S) = L(M(R) X N(S)), where M (resp. N) is
all attributes of R (resp. S) that are used in L Another example:
L(R Ubag S) = L(R) Ubag L(S) But watch out for set union!
29
Projection and Union Example: Suppose
R(a,b) = {(1,2)}, S(a,b) = {(1,3)}, and bag projection
a(R Ubag S)
= a({(1,2), (1,3)}) = {1, 1}
= a({(1,2)} Ubag a({(1,3)})
=a(R) Ubag a(S)
a(R Uset S)
= a({(1,2), (1,3)}) = {1, 1}
≠ {1} = {1} Uset {1} = a({(1,2)}) Uset a({(1,3)})
=a(R) Uset a(S)
30
Push Projection Below Selection?
Rule: L(C(R)) = L(C(M(R)))
where M is all attributes used by L or C But is it a good idea?
SELECT starName FROM StarsIn WHERE movieYear = 1996;
starName
movieYear=1996
StarsIn
starName,movieYear
starName
movieYear=1996
StarsIn
extra workto scan throughStarsIn twice
31
Joins and Products Recall from the definitions of relational
algebra: R C S = C(R X S) (theta join) R S = L(C(R X S)) (natural join)
where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates
To improve a logical query plan, replace a product followed by a selection with a join Join algorithms are usually faster than doing
product followed by selection (on large result of the product)
32
Duplicate Elimination Moving down the tree is potentially beneficial
as it can reduce the size of intermediate relations
Can be eliminated if argument has no duplicates a relation with a primary key a relation resulting from a grouping operator
Legal to push through product, join, selection, and bag intersection Ex: (R X S) = (R) X (S)
Cannot push through bag union, bag difference or projection
33
Duplicate Elimination Pitfalls Example: Suppose
R has two copies of tuple t S has one copy of t T contains only (1,2) and (1,3)
bag union: (R Ubag S) has one copy of t (R) Ubag (S) has two copies of t
bag difference: (R S) has one copy of t (R) (S) has no copies of t
bag projection: ((T)) = {1} ((T)) = {1,1}
34
Grouping and Aggregation
Since produces no duplicates: (L(R)) = L(R)
Get rid of useless attributes: L(R) = L(M(R))
where M contains all attributes in L If L contains only MIN and MAX:
L(R) = L((R))
35
Example Suppose we have the relations
MovieStar(name,addr,gender,birthdate)StarsIn(title,year,starName)
and we want to find the youngest star to appear in a movie for each year:SELECT year, MAX(birthdate)FROM MovieStar,StarsInWHERE name = starNameGROUP BY year;
year,MAX(birthdate)
name=starName
X
MovieStar StarsIn
36
Example cont'dyear,MAX(birthdate)
name=starName
X
MovieStar StarsIn
year,MAX(birthdate)
year,birthdate
MovieStar StarsIn
name=starName
year,
starName
birthdate,
name
year,MAX(birthdate)
MovieStar StarsIn
year,birthdate
name=starName
37
Summary of LQP Improvements
Selections: push down tree as far as possible if condition is an AND, split and push separately sometimes need to push up before pushing down
Projections: can be pushed down new ones can be added (but be careful)
Duplicate elimination: sometimes can be removed
Selection/product combinations: can sometimes be replaced with join
38
Outline Convert SQL query to a parse tree
Semantic checking: attributes, relation names, types Convert to a logical query plan (relational
algebra expression) deal with subqueries
Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of
relations Convert to a physical query plan
search the space of physical plans choose order of operations complete the physical query plan
39
Grouping Assoc/Comm Operators
Group together adjacent joins, adjacent unions, and adjacent intersections as siblings in the tree
Sets up the logical QP for future optimization when physical QP is constructed: determine best order for doing a sequence of joins (or unions or intersections)
U D E FU
UA
B C
D E F
A B C
40
Evaluating Logical Query Plans
The transformations discussed so far intuitively seem like good ideas
But how can we evaluate them more scientifically?
Estimate size of relations, also helpful in evaluating physical query plans
Coming up next…