Query Optimization
Dr. Karen C. Davis
Professor, School of Electronic and Computing Systems
School of Computing Sciences and Informatics
Outline
• overview of relational query optimization
• logical optimization
  – algebraic equivalences
  – transformation of trees
• physical optimization
  – selection algorithms
  – join algorithms
• cost-based optimization
• research example using relational algebra
Relational Query Optimization
SQL query → [query optimizer: logical → relational algebra query tree → physical] → access plan (executable)
Learning Outcomes
• translate basic SQL to RA query tree
• perform heuristic optimizations to tree
• use cost-based optimization to select algorithms for tree operators to generate an execution plan
SQL is declarative
• describes what data, not how to retrieve it
select distinct …
from …
where …
• helpful for users, not necessarily good for efficient execution
Relational Algebra is procedural
• specifies operators and the order of evaluation
• steps for query evaluation:
  1. translate SQL to RA operators (query tree)
  2. perform heuristic optimizations:
     a. push RA select operators down the tree
     b. convert select and cross product to join
     c. others based on algebraic transformations
Relational Algebra Operators
name            symbol     evaluation
select          σc R       applies condition c to R
project         πl R       keeps a list (l) of attributes of R
cross product   R X S      appends each tuple of S to each tuple of R, giving all possible combinations
join            R ⋈c S     πl (σc (R X S)), where l is a list of attributes of R and S with duplicate columns removed and c is a join condition
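The four operators above can be sketched over relations stored as lists of dictionaries. This is a minimal illustration, not how a DBMS implements them: the function names and the sample relations A and B are assumptions, project removes duplicate tuples (set semantics), and join is evaluated naively as a select over a cross product without the final duplicate-column projection.

```python
from itertools import product

def select(rows, cond):
    """σ_c R: keep the tuples of R that satisfy condition c."""
    return [r for r in rows if cond(r)]

def project(rows, attrs):
    """π_l R: keep only the attributes in l, dropping duplicate tuples."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def cross(r_rows, s_rows):
    """R X S: pair every tuple of R with every tuple of S."""
    return [{**r, **s} for r, s in product(r_rows, s_rows)]

def join(r_rows, s_rows, cond):
    """R ⋈_c S, evaluated naively as σ_c (R X S)."""
    return select(cross(r_rows, s_rows), cond)

# Hypothetical sample data; attribute names carry a relation prefix
# so that merged tuples never have clashing keys.
A = [{"A.a": 1, "A.x": 20}, {"A.a": 2, "A.x": 5}]
B = [{"B.z": 1}, {"B.z": 2}]
result = project(join(A, B, lambda t: t["A.a"] == t["B.z"]), ["A.x", "B.z"])
```

Here cross(A, B) produces four tuples, of which the join condition keeps two.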
SQL to RA
select distinct … πl
from … X
where … σc
two relations:

    πl
    |
    σc
    |
    X
   / \
  R   S

three relations:

      πl
      |
      σc
      |
      X
     / \
    X   S
   / \
  R   T

four relations:

        πl
        |
        σc
        |
        X
       / \
      X   S
     / \
    X   T
   / \
  R   U
SQL to RA Tree Example
select A.x, A.y, B.z
from A, B
where A.a = B.z and A.x > 10
    πA.x, A.y, B.z
    |
    σA.a = B.z and A.x > 10
    |
    X
   / \
  A   B
evaluated bottom-up left to right; intermediate values are passed up the tree to the next operator
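Bottom-up evaluation can be sketched as a recursive walk that evaluates a node's children (left before right) and then applies the node's operator to their results. The tuple-based tree encoding, the evaluate function, and the sample rows are all assumptions made for illustration; the tree itself is the one for the A, B query above.

```python
from itertools import product

# Hypothetical sample relations; only attributes used by the query are shown.
RELATIONS = {
    "A": [
        {"A.a": 1, "A.x": 20, "A.y": "p"},
        {"A.a": 2, "A.x": 5, "A.y": "q"},
        {"A.a": 3, "A.x": 30, "A.y": "r"},
    ],
    "B": [{"B.z": 1}, {"B.z": 2}],
}

def evaluate(node):
    """Evaluate a query tree bottom-up: children first, then the operator."""
    op = node[0]
    if op == "rel":
        return RELATIONS[node[1]]
    if op == "cross":
        left, right = evaluate(node[1]), evaluate(node[2])  # left, then right
        return [{**l, **r} for l, r in product(left, right)]
    if op == "select":
        return [t for t in evaluate(node[2]) if node[1](t)]
    if op == "project":
        return [{a: t[a] for a in node[1]} for t in evaluate(node[2])]
    raise ValueError(f"unknown operator: {op}")

# π A.x, A.y, B.z ( σ A.a = B.z and A.x > 10 ( A X B ) )
tree = ("project", ["A.x", "A.y", "B.z"],
        ("select", lambda t: t["A.a"] == t["B.z"] and t["A.x"] > 10,
         ("cross", ("rel", "A"), ("rel", "B"))))
rows = evaluate(tree)
```

With this sample data the cross product passes six tuples up to the select, which keeps only one of them; that wasted intermediate work is what the heuristic optimizations target.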
SQL to RA Tree Example
select lname
from employee, works_on, projects
where pname = ‘Aquarius’ and
pnumber = pno and
essn = ssn and
bdate = ‘1985-12-03’
            πlname
            |
            σpname = ‘Aquarius’ and pnumber = pno and essn = ssn and bdate = ‘1985-12-03’
            |
            X
           / \
          X   projects
         / \
 employee   works_on
Simple Heuristic Optimization
1. cascade selects (split them up)
before:

    πl
    |
    σc1 and c2 and c3
    |
    X
   / \
  R   S

after:

    πl
    |
    σc1
    |
    σc2
    |
    σc3
    |
    X
   / \
  R   S
2. Push any single attribute selects down the tree to be just above their relation
before:

    πl
    |
    σc1
    |
    σc2
    |
    σc3
    |
    X
   / \
  R   S

after:

      πl
      |
      σc2
      |
      X
     / \
  σc1   σc3
   |     |
   R     S
3. Convert 2-attribute select and cross product to join
before:

      πl
      |
      σc2
      |
      X
     / \
  σc1   σc3
   |     |
   R     S

after:

      πl
      |
      ⋈c2
     / \
  σc1   σc3
   |     |
   R     S
• pushed-down selects: smaller intermediate results
• join instead of select over cross product: efficient join algorithms
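The three steps can be sketched as a classification of the cascaded conjuncts by how many relations each one references: one relation means the select is pushed down to that relation, two means it becomes a join condition. The function plan_conjuncts and the pairing of each attribute with a relation are assumptions for illustration; the conditions come from the employee, works_on, projects example.

```python
def plan_conjuncts(conjuncts):
    """Classify the conjuncts of a WHERE clause after cascading (step 1).

    conjuncts: list of (text, relations) pairs, where relations is the set
    of relation names whose attributes appear in the condition.
    """
    pushed = {}  # relation -> selects placed just above it (step 2)
    joins = []   # conditions that turn a cross product into a join (step 3)
    for text, relations in conjuncts:
        if len(relations) == 1:
            (rel,) = relations
            pushed.setdefault(rel, []).append(text)
        else:
            joins.append((text, relations))
    return pushed, joins

# Which relation owns each attribute is an assumption about the schema.
where = [
    ("pname = 'Aquarius'", {"projects"}),
    ("pnumber = pno", {"projects", "works_on"}),
    ("essn = ssn", {"works_on", "employee"}),
    ("bdate = '1985-12-03'", {"employee"}),
]
pushed, joins = plan_conjuncts(where)
```

Under these assumptions the two single-relation selects land above projects and employee, and the two remaining conditions replace the two cross products with joins.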
Practice problem: optimize RA tree
select P.pnumber, P.dnum, E.lname, E.bdate
from projects P, department D, employee E
where D.dnumber = P.dnum and // c1
D.mgrssn = E.ssn and // c2
P.plocation = ‘Stafford’; // c3
RA tree to RA expression
      πl
      |
      ⋈c2
     / \
  σc1   σc3
   |     |
   R     S

πl( (σc1 R) ⋈c2 (σc3 S) )
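Reading an expression off a tree amounts to a recursive traversal in which each operator wraps the expressions of its children. The sketch below assumes a tuple encoding of the tree and a function name, to_expression, that are not part of the slides; it parenthesizes more heavily than the hand-written expression.

```python
def to_expression(node):
    """Render a query tree as a fully parenthesized RA expression."""
    op = node[0]
    if op == "rel":
        return node[1]
    if op == "select":
        return f"σ{node[1]} ({to_expression(node[2])})"
    if op == "project":
        return f"π{node[1]} ({to_expression(node[2])})"
    if op == "join":
        left, right = to_expression(node[2]), to_expression(node[3])
        return f"({left}) ⋈{node[1]} ({right})"
    raise ValueError(f"unknown operator: {op}")

# The optimized tree: project over a join of two pushed-down selects.
tree = ("project", "l",
        ("join", "c2",
         ("select", "c1", ("rel", "R")),
         ("select", "c3", ("rel", "S"))))
expr = to_expression(tree)
# expr == "πl ((σc1 (R)) ⋈c2 (σc3 (S)))"
```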
Other Operators in Relational Algebra
SQL:
(select pnumber
 from projects, department, employee
 where dnum = dnumber and mgrssn = ssn and lname = 'Smith')
union
(select pnumber
 from projects, works_on, employee
 where pnumber = pno and essn = ssn and lname = 'Smith');

RA:
π pnumber (σ lname = ‘Smith’ employee ⋈ssn=mgrssn department ⋈ dnumber = dnum projects)
⋃
π pnumber (σ lname = ‘Smith’ employee ⋈ssn=essn works_on ⋈ pnumber = pno projects)
Selection Algorithms
• linear search
• binary search
• primary index or hash for point query
• primary index for range query
• clustering index
• secondary index
• conjunctives
  – individual index
  – composite index or hash
  – intersection of record pointers for multiple indexes
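A few of these access paths can be sketched in memory. In this illustration a sorted Python list stands in for a file ordered on its key (or an ordered primary index), and a dict stands in for a hash structure; the records, the ssn key, and the function names are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical file of records, sorted on the key attribute "ssn".
records = [{"ssn": k, "lname": f"emp{k}"} for k in (3, 8, 15, 23, 42)]
keys = [r["ssn"] for r in records]

def linear_search(pred):
    """Scan every record; works for any condition, costs a full pass."""
    return [r for r in records if pred(r)]

def binary_search(key):
    """Point query on the ordering attribute of the sorted file."""
    i = bisect_left(keys, key)
    return records[i] if i < len(keys) and keys[i] == key else None

def range_query(low, high):
    """Range query low <= ssn <= high, using the sort order."""
    return records[bisect_left(keys, low):bisect_right(keys, high)]

# A dict stands in for a hash on ssn: fast point queries, no range support.
hash_index = {r["ssn"]: r for r in records}
```

For example, range_query(8, 23) returns the records with keys 8, 15, and 23 without scanning the whole file, while the hash can only answer lookups such as hash_index[42].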
Join Algorithms
• nested loop join
• single-scan join
• sort-merge join
• hash join
http://docs.oracle.com/cd/E13085_01/doc/timesten.1121/e14261/query.htm
[figure: sort-merge using indexes]
[figure: example execution plan]
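Three of the join algorithms listed above (nested loop, sort-merge, hash) can be sketched as in-memory equijoins. This is a simplified illustration that ignores disk blocks and buffering; the function names and the sample employee and works_on rows are assumptions.

```python
from collections import defaultdict

def nested_loop_join(r, s, r_key, s_key):
    """Compare every tuple of R with every tuple of S."""
    return [{**a, **b} for a in r for b in s if a[r_key] == b[s_key]]

def hash_join(r, s, r_key, s_key):
    """Build a hash table on R, then probe it once per tuple of S."""
    buckets = defaultdict(list)
    for a in r:
        buckets[a[r_key]].append(a)
    return [{**a, **b} for b in s for a in buckets.get(b[s_key], [])]

def sort_merge_join(r, s, r_key, s_key):
    """Sort both inputs on the join attribute, then merge them."""
    r = sorted(r, key=lambda t: t[r_key])
    s = sorted(s, key=lambda t: t[s_key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][r_key] < s[j][s_key]:
            i += 1
        elif r[i][r_key] > s[j][s_key]:
            j += 1
        else:
            # Pair r[i] with the whole run of equal keys in S.
            k = j
            while k < len(s) and s[k][s_key] == r[i][r_key]:
                out.append({**r[i], **s[k]})
                k += 1
            i += 1
    return out

# Hypothetical sample rows for the join condition essn = ssn.
employee = [{"ssn": 1, "lname": "Smith"}, {"ssn": 2, "lname": "Wong"}]
works_on = [{"essn": 2, "pno": 10}, {"essn": 1, "pno": 20}, {"essn": 1, "pno": 30}]
nested = nested_loop_join(employee, works_on, "ssn", "essn")
hashed = hash_join(employee, works_on, "ssn", "essn")
merged = sort_merge_join(employee, works_on, "ssn", "essn")
```

All three return the same set of joined tuples, possibly in a different order; they differ in how many comparisons they need to get there.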
Multiple View Processing Plan (MVPP)
view chromosome: 101100010100001
index chromosome: 1100110
Fitness: sum of query processing costs of individual queries using the views and indexes selected
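The chromosomes can be read as bit strings in which a 1 marks a selected candidate. The sketch below assumes that bit i, counting from the left and starting at 1, corresponds to view v_i (or to the i-th candidate index); the slides do not state this mapping. The index names i1, i2, ... and the per-query costs are hypothetical placeholders, not results from the thesis.

```python
def decode(chromosome, prefix):
    """List the candidates whose bit is 1 (assumed mapping: bit i -> item i)."""
    return [f"{prefix}{i}"
            for i, bit in enumerate(chromosome, start=1) if bit == "1"]

def fitness(query_costs):
    """Sum of the processing costs of the individual queries."""
    return sum(query_costs.values())

views = decode("101100010100001", "v")   # 15 bits, one per view v1..v15
indexes = decode("1100110", "i")         # 7 bits, one per candidate index

# Hypothetical costs of Q1..Q3 when run against the selected views and indexes.
example_costs = {"Q1": 120, "Q2": 80, "Q3": 45}
total = fitness(example_costs)
```

Under the assumed mapping, the view chromosome selects v1, v3, v4, v8, v10, and v15 for materialization.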
[figure: MVPP graph over the base relations, with queries Q1, Q2, Q3 at the top]

base relations: Customer (C), Orders (O), Lineitem (L), Nation (N), Part (P)

nodes:
v1:  π name, address, phone, acctbal, nationkey, custkey, mktsegment
v2:  π orderkey, orderdate, custkey, shippriority
v3:  π partkey, orderkey, shipdate, extendedprice
v4:  π nationkey, name
v5:  π partkey, type
v6:  ⋈ custkey
v7:  ⋈ orderkey
v8:  σ C.mktsegment = “building” and L.shipdate = “1995-03-15”
v9:  π O.orderkey, O.shippriority
v10: ⋈ nationkey
v11: σ O.orderdate = “1994-10-01”
v12: π C.custkey, C.name, C.acctbal, N.name, C.address, C.phone
v13: ⋈ partkey
v14: σ L.shipdate = “1995-09-01”
v15: π P.type, L.extendedprice
thesis defense of Sirisha Machiraju: Space Allocation for Materialized Views and Indexes Using Genetic Algorithms, June 2002