Date post: | 01-Jan-2016 |
Upload: | jayson-townsend |
Query Optimization
Chap. 19
Evaluation of SQL
• Conceptual order of evaluation
  – Cartesian product of all tables in from clause
  – Rows not satisfying where clause eliminated
  – Rows grouped according to group by clause
  – Groups not satisfying having clause eliminated
  – Select clause target list evaluated
  – If distinct, duplicate rows eliminated
  – Union taken after each subselect evaluated
  – Rows sorted according to order by
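The conceptual order can be traced on toy data. The sketch below is illustrative only: the tables, rows, and the sample query are hypothetical, and a real DBMS would not evaluate a query this way (the optimizer picks the actual order).

```python
from itertools import product

# Hypothetical toy tables; names and rows are illustrative only.
employee = [
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 5},
    {"ssn": 3, "lname": "Zelaya", "dno": 4},
]
department = [
    {"dnumber": 4, "dname": "Admin"},
    {"dnumber": 5, "dname": "Research"},
]

def conceptual_eval():
    # Sample query (assumed for illustration):
    #   select distinct dname, count(*) from employee, department
    #   where dno = dnumber group by dname having count(*) > 1 order by dname
    # 1. Cartesian product of all tables in the from clause
    rows = [{**e, **d} for e, d in product(employee, department)]
    # 2. Eliminate rows not satisfying the where clause
    rows = [r for r in rows if r["dno"] == r["dnumber"]]
    # 3. Group rows
    groups = {}
    for r in rows:
        groups.setdefault(r["dname"], []).append(r)
    # 4. Eliminate groups not satisfying having
    groups = {k: g for k, g in groups.items() if len(g) > 1}
    # 5. Evaluate the select target list
    result = [(k, len(g)) for k, g in groups.items()]
    # 6. distinct: drop duplicate rows (keeps first occurrence)
    result = list(dict.fromkeys(result))
    # 7. order by
    return sorted(result)

answer = conceptual_eval()
```

With these rows the Cartesian product has 6 rows, of which 3 survive the where clause.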
Actual Order of Evaluation
• Order chosen by query optimizer
  – Determines an efficient way to evaluate the query
• Steps to optimization:
  – Syntax checking phase
    • scan - identify tokens of text
    • parse - check syntax
    • validate - check attribute and relation names
  – Query optimization phase
    • create internal representation - Query Tree
    • identify execution strategies for Query Plan
      – Maintains statistics for tables and columns, indexes
    • choose suitable Query Plan to optimize query, e.g. order of execution of ops, use of indexes, etc.
How to produce an execution plan
Oracle’s example
Evaluation cont’d
  – Execution phase
    • query optimizer produces execution plan
    • code generator generates code
    • runtime db processor runs query code
  – To minimize run time
    • chosen strategy is NOT necessarily optimal, but reasonably efficient
For procedural languages limited need for query optimization
Strategy
• How would you optimize an SQL query?
Heuristics in Query Optimization
• Apply select and project before join or other binary operations. Why?
  – select and project reduce size of intermediate results
• Strategy is obvious, but the challenge was to show it could be done with rules
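A small experiment makes the size argument concrete. This is a sketch on made-up data (table contents, sizes, and the predicate dno = 3 are all assumptions): both strategies return the same rows, but selecting first keeps the intermediate join result much smaller.

```python
# Hypothetical tables: 1,000 employees, 3,000 works_on rows.
employees = [{"ssn": i, "dno": i % 10} for i in range(1000)]
works_on = [{"essn": i % 1000, "pno": i % 30} for i in range(3000)]

def hash_join(left, right, lkey, rkey):
    """Equi-join: build a hash table on left, probe with right."""
    index = {}
    for l in left:
        index.setdefault(l[lkey], []).append(l)
    return [{**l, **r} for r in right for l in index.get(r[rkey], [])]

def join_then_select():
    joined = hash_join(employees, works_on, "ssn", "essn")
    result = [t for t in joined if t["dno"] == 3]
    return len(joined), result      # size of intermediate result, final rows

def select_then_join():
    selected = [e for e in employees if e["dno"] == 3]
    joined = hash_join(selected, works_on, "ssn", "essn")
    return len(joined), joined      # the join output is already the final result

n_late, late = join_then_select()
n_early, early = select_then_join()
```

Here the join-first plan builds 3,000 intermediate rows while the select-first plan builds 300.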
Create Internal Representation – Query Tree
Converting Query Trees into Query Plans
• An execution plan for a query tree includes information about access methods for each relation and algorithms for operators
• For example:
  – For σ operation
    • Choose an index
    • Use a table scan
  – For |X|
    • Use nested loop, sort-merge or hash join
  – For π
    • Scan result of join, or combine with |X| when writing out result from join
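Two of the join algorithms a plan can choose between can be sketched in a few lines. This is a simplified in-memory illustration, not how a DBMS implements them (no blocks, buffers, or disk I/O); the tuple layouts and sample rows are hypothetical.

```python
def nested_loop_join(r, s, a, b):
    """Compare every pair of tuples: about |R| * |S| comparisons."""
    return [(x, y) for x in r for y in s if x[a] == y[b]]

def hash_join(r, s, a, b):
    """Build a hash table on R, then probe it once per S tuple."""
    buckets = {}
    for x in r:
        buckets.setdefault(x[a], []).append(x)
    return [(x, y) for y in s for x in buckets.get(y[b], [])]

# Hypothetical rows: project(pname, pnumber), works_on(essn, pno)
project_rows = [("ProductX", 1), ("Aquarius", 2)]
works_on_rows = [(101, 1), (102, 2), (103, 2)]

nl = nested_loop_join(project_rows, works_on_rows, 1, 1)
hj = hash_join(project_rows, works_on_rows, 1, 1)
```

Both functions compute the same equi-join; they differ only in how much work they do to find the matching pairs.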
Query Optimization
• Canonical form (initial query tree - conceptual order of evaluation)
  – Leaf nodes are tables
  – Internal nodes are operations
  – Begin by separating select conditions from joins (end up with X)
  – Combine all selects, then all projects
• Transform to final query tree using general transformation rules for relational algebra
Query tree for SQL query
Select lname
From employee, works_on, project
Where pname='Aquarius' and pnumber=pno and essn=ssn and bdate > '1987-12-31'

Write as canonical query tree
Write as relational algebra expression
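One way to write the canonical form as a relational algebra expression, assuming the conceptual order of evaluation (Cartesian product of the from-clause tables, then a single select, then a single project):

```latex
\pi_{lname}\Bigl(
  \sigma_{pname='Aquarius' \,\wedge\, pnumber=pno \,\wedge\, essn=ssn \,\wedge\, bdate>'1987\text{-}12\text{-}31'}
  \bigl(employee \times works\_on \times project\bigr)
\Bigr)
```

Read as a tree, the three tables are the leaves, the two products are the lower internal nodes, and the select and project sit at the top.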
General Transformation Rules for Relational Algebra
1. Cascade of σ
2. Commutativity of σ
3. Cascade of π
4. Commuting σ with π
5. Commutativity of |X| (and X)
6. Commuting σ with |X| (or X)
7. Commuting π with |X| (or X)
General Transformation Rules for Relational Algebra cont’d
8. Commutativity of set operations U and ∩
9. Associativity of |X|, X, U and ∩
10. Commuting σ with set operations
11. The π operation commutes with U
12. Converting a (σ, X) sequence into |X|
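For reference, four of these rules written out as equivalences. The notation (c, c1, c2 for conditions, R and S for relations) is chosen here for illustration:

```latex
\begin{align*}
\text{Rule 1:}\quad  & \sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}\bigl(\sigma_{c_2}(R)\bigr) \\
\text{Rule 2:}\quad  & \sigma_{c_1}\bigl(\sigma_{c_2}(R)\bigr) \equiv \sigma_{c_2}\bigl(\sigma_{c_1}(R)\bigr) \\
\text{Rule 6:}\quad  & \sigma_{c}(R \bowtie S) \equiv \sigma_{c}(R) \bowtie S
                       \quad \text{if } c \text{ involves only attributes of } R \\
\text{Rule 12:}\quad & \sigma_{c}(R \times S) \equiv R \bowtie_{c} S
\end{align*}
```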
Outline of a Heuristic Algebraic Optimization Algorithm
• Use Rule 1 to break up conjunctive σ’s into cascades of σ’s
• Use Rules 2, 4, 6, 10 for commutativity of σ to:
  – move σ’s as far down tree as possible
• Use Rules 5 and 9 for commutativity and associativity of binary operations to:
  – Place most restrictive σ (and |X|) so they are executed first
    • fewest tuples, smallest absolute size or smallest selectivity
    • But make sure no cartesian products result
Outline of a Heuristic Algebraic Optimization Algorithm cont’d
• Use Rule 12 to combine Cartesian product with σ to:
  – create |X|
• Use Rules 3, 4, 7, 11 concerning cascade of π’s and commuting π with other ops to:
  – move π down tree as far as possible
• Identify subtrees that represent groups of operations that can be executed by a single algorithm
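Part of this outline can be sketched as a tree rewrite. The code below is a minimal illustration under several assumptions: the tuple-based tree representation and function names are hypothetical, the conjunctive condition is supplied already split into simple conditions (Rule 1), and only select push-down (Rules 2 and 6) and select/product merging (Rule 12) are covered. Join reordering and project push-down are left out.

```python
# Hypothetical tuple-based query tree (not from the slides):
#   ("table", name, attrs)         leaf
#   ("product", left, right)       X
#   ("join", cond, left, right)    |X|
#   ("select", cond, child)        select

def attrs(node):
    """Set of attribute names available at a node."""
    if node[0] == "table":
        return set(node[2])
    if node[0] in ("product", "join"):
        return attrs(node[-2]) | attrs(node[-1])
    return attrs(node[-1])  # select: same attributes as its child

def push_select(cond, node):
    """Move one simple condition as far down the tree as possible."""
    needed, text = cond
    if node[0] in ("product", "join"):
        left, right = node[-2], node[-1]
        if needed <= attrs(left):        # condition only touches the left subtree
            return node[:-2] + (push_select(cond, left), right)
        if needed <= attrs(right):       # condition only touches the right subtree
            return node[:-2] + (left, push_select(cond, right))
        if node[0] == "product":         # Rule 12: select over X becomes |X|
            return ("join", text, left, right)
    return ("select", text, node)

def optimize(conds, tree):
    # Push single-table conditions first, then the join conditions.
    for cond in sorted(conds, key=lambda c: len(c[0])):
        tree = push_select(cond, tree)
    return tree

employee = ("table", "employee", ("ssn", "lname", "bdate"))
works_on = ("table", "works_on", ("essn", "pno"))
project = ("table", "project", ("pname", "pnumber"))

canonical = ("product", ("product", employee, works_on), project)
conds = [
    ({"pname"}, "pname='Aquarius'"),
    ({"pnumber", "pno"}, "pnumber=pno"),
    ({"essn", "ssn"}, "essn=ssn"),
    ({"bdate"}, "bdate>'1987-12-31'"),
]
plan = optimize(conds, canonical)
```

For this query the result has both selects sitting directly on their tables and both Cartesian products replaced by joins; the final project on lname would be applied on top.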
Summary of Heuristics
• Apply first operations that reduce size of intermediate results
  – Perform σ’s and π’s as early as possible (move down tree as far as possible)
  – Execute most restrictive σ and |X| first (reorder leaf nodes but avoid cartesian products)
Multiple table joins
• Multiple table joins
  – Query plan identifies most efficient ordering of joins
  – Uses dynamic programming
Order of joins - Oracle
• Much more complicated – when determining order of joins, keep track of resulting sort order (interesting order)
• Using dynamic programming considers order of result – can avoid redundant sort operation later and/or speed up subsequent join
• Can flatten nested joins – dynamic programming can escalate optimization time, so use rules instead
• Estimating cost is difficult
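The dynamic programming idea can be sketched as follows. This is a simplified illustration, not Oracle's algorithm: it considers only left-deep plans, skips Cartesian products, ignores interesting orders, and uses an assumed cost model (sum of estimated intermediate result sizes). The cardinalities and join selectivities are made up.

```python
from itertools import combinations

def best_join_order(card, sel):
    """card: {table: estimated rows}; sel: {frozenset({a, b}): join selectivity}.
    Returns (cost, order) of the cheapest left-deep plan, or None if the
    tables cannot all be joined without a Cartesian product.
    """
    def size(tables):
        n = 1.0
        for t in tables:
            n *= card[t]
        for pair, s in sel.items():
            if pair <= tables:
                n *= s
        return n

    # Best plan for every single table costs nothing (no join yet).
    best = {frozenset([t]): (0.0, (t,)) for t in card}
    for k in range(2, len(card) + 1):
        for combo in combinations(sorted(card), k):
            subset = frozenset(combo)
            for t in combo:              # t is the table joined last
                rest = subset - {t}
                if rest not in best:
                    continue
                if not any(frozenset({t, u}) in sel for u in rest):
                    continue             # would require a Cartesian product
                cost = best[rest][0] + size(subset)
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, best[rest][1] + (t,))
    return best.get(frozenset(card))

# Hypothetical statistics; project is assumed already filtered to 1 row.
card = {"employee": 10_000, "works_on": 30_000, "project": 1}
sel = {
    frozenset({"employee", "works_on"}): 1 / 10_000,
    frozenset({"works_on", "project"}): 1 / 100,
}
cost, order = best_join_order(card, sel)
```

With these numbers the cheapest order joins works_on with the filtered project first and brings in employee last, because that keeps the first intermediate result at about 300 rows instead of 30,000.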
Joins
– May not have to materialize actual table resulting from join
– Instead use pipelining - successive rows output from one step fed into next plan
Converting trees into Query plans
• pipelined evaluation
  – Forward result from an operation directly to next operation
    • Result from σ placed in buffer
    • |X| consumes tuples from buffer
    • Result from |X| pipelined to π
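Python generators give a rough feel for pipelining: each operator pulls rows from the one below it, so no intermediate table is materialized between select, join, and project. This is a sketch with hypothetical operator names and sample rows; the join still builds an in-memory hash table on its inner input.

```python
def scan(table):
    for row in table:
        yield row

def select(rows, pred):
    for row in rows:
        if pred(row):
            yield row                    # passed on one row at a time

def join(outer, inner_table, outer_key, inner_key):
    index = {}
    for r in inner_table:                # inner input is hashed once
        index.setdefault(r[inner_key], []).append(r)
    for o in outer:                      # outer rows are consumed as they arrive
        for r in index.get(o[outer_key], []):
            yield {**o, **r}

def project(rows, cols):
    for row in rows:
        yield tuple(row[c] for c in cols)

# Hypothetical rows
project_rows = [{"pname": "Aquarius", "pnumber": 2},
                {"pname": "ProductX", "pnumber": 1}]
works_on_rows = [{"essn": 101, "pno": 1},
                 {"essn": 102, "pno": 2},
                 {"essn": 103, "pno": 2}]

plan = project(
    join(select(scan(project_rows), lambda r: r["pname"] == "Aquarius"),
         works_on_rows, "pnumber", "pno"),
    ["essn"],
)
result = list(plan)   # rows flow through all three operators on demand
```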
Query Tree Question
• Should we do a π pname, pnumber, then σ pname = ‘Aquarius’, then π pnumber?
• No, since the operations are done together
  – the processor would read a row of project, see if pname = ‘Aquarius’, then use pnumber to perform the join.
Algorithms
• DBMS has general access algorithms to implement select, join, or combinations of ops
• Don't have separate access routines for each op
  – Creating temporary files is inefficient
  – Generate algorithms for combinations of operations
  – join, select, project - code is created dynamically to implement multiple operations
Materialized table
• Think about what operations require utilizing a materialized table
  – Input to select?
  – Input to project?
  – Input to join?
Identify Execution Strategies and Suitable Query Plans
Cost
• Optimizers combine:
  – Heuristic rules for ordering ops
  – Systematic estimates of the cost of different execution strategies - choose lowest cost
    • E.g. nested loop or hash join?
    • CPU cost usually similar for the Query Plans
Cost of Query Plans
• Cost components for query execution
  – Computation cost:
    • in-memory ops on data buffers, e.g. sorting, searching, implementation/order of operations
    • CPU cost usually similar for the Query Plans
  – Memory usage cost:
    • Number of memory buffers needed
  – Access cost to secondary storage:
    • cost of search for hashing, indexes, R/W
    • contiguous versus scattered storage on disk
  – Storage cost
    • if intermediate tables are stored
  – Communication cost:
    • cost to ship query and results from DB site to request site
Cost cont’d
• For small DB's, minimize computation cost
• For large DB's, minimize cost to secondary storage, e.g. block transfers between disk and main memory
• For distributed DB's, minimize communication cost
• Workload
  – Mix of queries and frequencies of queries
  – Given workload and query execution plans, can determine CPU and I/O resource needs
Statistics
• Statistics
  – Maintain statistics about tables
    • # rows, # columns, domains
    • SYSCOLUMNS: col_name, table_name, # of values, High, Low
  – Statistics on columns that deviate strongly from the uniform assumption
  – Selectivity of values
  – Execute special command to gather info
    • RUNSTATS in DB2
    • ANALYZE in Oracle
Cost function information
• Number of tuples or records (r)
• Record size (R)
• Number of blocks (b)
• Blocking factor – records per block (bfr)
• Number of distinct values (d)
• Selectivity (sl)
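These quantities are related in a simple way when records are fixed-length and do not span blocks. The sketch below assumes that layout; the block size parameter and the sample numbers are not from the slides.

```python
import math

def blocking_factor(block_size, record_size):
    """bfr: whole records that fit in one block (unspanned, fixed-length records assumed)."""
    return block_size // record_size

def num_blocks(r, bfr):
    """b: blocks needed to hold r records."""
    return math.ceil(r / bfr)

# Hypothetical file: r = 10,000 records of R = 100 bytes in 4,096-byte blocks
bfr = blocking_factor(4096, 100)
b = num_blocks(10_000, bfr)
```

Under these assumptions a full table scan reads all b blocks, which is the baseline an index-based access path has to beat.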
Selectivity sl a.k.a Filter Factor FF
• Fraction of rows with specified value(s) for a specific attribute that result from the predicate restriction
• How many tuples satisfy predicate
• Hopefully only need to access those tuples + index
Selectivity sl, Filter Factor FF
• sl = (# records satisfying condition c) / (total # of records in relation)
• Estimate for an attribute with i distinct values:
  – Assume |R| is # rows in table R
  – FF = sl = (|R|/i) / |R| = 1/i = 1/col_cardinality
  – E.g. 10,000 rows and 2 distinct values: FF = sl = (10,000/2)/10,000 = 1/2
Examples of sl
• If SQL statement specifies:
  – col = const
    • DB2 assumes sl = 1/col_cardinality
  – col between const1 and const2
    • DB2 assumes sl = (const2 - const1)/(High - Low)
• For some predicates, sl not predictable by simple formula
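The two formulas translate directly into code. This sketch only restates the estimates above; the function names and sample statistics are hypothetical, and no clamping or histogram handling is attempted.

```python
def eq_selectivity(col_cardinality):
    """col = const: assume values are uniformly distributed."""
    return 1 / col_cardinality

def between_selectivity(const1, const2, low, high):
    """col between const1 and const2: fraction of the [Low, High] range covered."""
    return (const2 - const1) / (high - low)

def estimated_rows(num_rows, sl):
    return num_rows * sl

# Hypothetical column with 2 distinct values in a 10,000-row table
rows_eq = estimated_rows(10_000, eq_selectivity(2))
# Hypothetical column with Low = 0, High = 100
sl_between = between_selectivity(20, 30, 0, 100)
```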
Assumptions
• Uniform distribution of column values
• Attribute values independent
• Independent distribution of values from any 2 columns C1 and C2
  – sl(C1 and C2) = sl(C1) * sl(C2)
  – e.g. 1/2 (gender) * 1/4 (class) = 1/8 of undergrads are female freshmen
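Under the independence assumption, the selectivity of a conjunction is just the product of the individual selectivities. A small sketch using exact fractions; the helper name and the 8,000-student table are assumptions for illustration.

```python
from fractions import Fraction
from functools import reduce

def combined_selectivity(*selectivities):
    """Selectivity of C1 and C2 and ... assuming the columns are independent."""
    return reduce(lambda a, b: a * b, selectivities, Fraction(1))

sl = combined_selectivity(Fraction(1, 2), Fraction(1, 4))   # gender, class
expected_rows = sl * 8000                                   # hypothetical 8,000 undergrads
```

If the columns are in fact correlated, this product can be far from the true fraction, which is one reason cost estimation is difficult.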
Cost of a Join
• How to determine join selectivity (js)
  – js = |(R |X|c S)| / |(R X S)| = |(R |X|c S)| / (|R| * |S|)
• If no join condition, js = ?
  – js = 1
• If no matching tuples, js = ?
  – js = 0
Cost of a Join cont’d
• Assuming R.A = S.B is join condition
  – if A is a key of R, js = ?
    • js ≤ (1/|R|)
  – Unless B is a foreign key and NOT NULL, then js = ?
    • js = (1/|R|)
  – if B is a key of S, js = ?
    • js ≤ (1/|S|)
• Cost of different implementations of join:
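The join selectivity cases above can be checked by brute force on toy data. The tables below are hypothetical: dnumber is a key of department, and dno in employee is a non-null foreign key referencing it, so js should come out as exactly 1/|R|.

```python
from fractions import Fraction

def join_selectivity(r, s, a, b):
    """js = |R join S| / (|R| * |S|), counted by comparing every pair."""
    matches = sum(1 for x in r for y in s if x[a] == y[b])
    return Fraction(matches, len(r) * len(s))

# Hypothetical rows
department = [{"dnumber": 1}, {"dnumber": 4}, {"dnumber": 5}]            # R, key dnumber
employee = [{"ssn": i, "dno": d} for i, d in enumerate([5, 5, 4, 4, 5, 1])]  # S, FK dno

js_fk = join_selectivity(department, employee, "dnumber", "dno")

# No matching tuples at all: js = 0
js_none = join_selectivity(department, [{"ssn": 9, "dno": 7}], "dnumber", "dno")

# Estimated join size from js: js * |R| * |S|
est_size = js_fk * len(department) * len(employee)
```

With a non-null foreign key every employee row matches exactly one department row, so the estimated join size equals |S|.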
You Can Retrieve Query Plan
Explain plan set queryno=1000 for
  select * from customers where city = 'Boston';

Select * from plan_table where queryno=1000;

Gives access type (index or not), columns, etc.
Information for Optimization
• When using indexes
  – Cluster Ratio
    • how well clustering property holds for rows with respect to a given index
    • if cluster ratio is 80% or more, use sequential prefetch