Date post: | 30-Dec-2015 |
Upload: | alexis-wood |
Query evaluation
Query → Parse, compile → Relational algebra → Optimize → Execution plan → Evaluate → Query output
(Statistics feed the Optimize step; Data feeds the Evaluate step)
• Query – SQL
• Parse – Is the SQL query correct?
• Relational algebra – A form the computer can process
• Optimize – Based on what?
• Execution plan – If several plans give the same result, which is the best?
• Evaluate – Find the matching data records
• Query output – Return the answer to the user
Optimization example
• Data of a bank
• select balance from account where balance<2500
• Two relational algebra representations:
– σbalance<2500(πbalance(account))
– πbalance(σbalance<2500(account))
• The cost of an operation depends on the algorithms we can use: e.g. an index speeds up the selection
• Primitive: elementary operation (projection, selection, …)
• Pipeline: building blocks for evaluations and statistics
• Input of a primitive = output of the previous primitive
Catalogue cost approximation
• To choose the proper strategy, an approximation of the cost is needed
• Cost approximation can be based on several attributes
– Space
– Time
• Statistics are stored in the catalogue
Content of the catalogue
• Number of records in relation r: nr
• Number of blocks used for relation r: br
• Size of one record in relation r: sr
• Number of records in one block: fr
• Number of different values of attribute A in relation r: V(A,r) = |πA(r)|
• Average number of records that fulfil an equality selection on attribute A: SC(A,r)
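The catalogue values are related to each other, which a small sketch can make concrete. The formulas used here are the usual textbook relationships and are not stated on the slide: fr is the block size divided by sr (rounded down), br is nr divided by fr (rounded up), and SC(A,r) is nr divided by V(A,r) if the values are uniformly distributed. The block size and the example numbers are hypothetical.

```python
import math

def blocking_factor(block_size: int, record_size: int) -> int:
    """f_r: how many whole records of size s_r fit into one block."""
    return block_size // record_size

def blocks_needed(n_r: int, f_r: int) -> int:
    """b_r: blocks needed when the records are packed into one file."""
    return math.ceil(n_r / f_r)

def selection_cardinality(n_r: int, distinct_values: int) -> float:
    """SC(A,r), assuming a uniform distribution: n_r / V(A,r)."""
    return n_r / distinct_values

# Hypothetical relation: 10,000 records of 200 bytes, 4,096-byte blocks,
# 50 distinct values of attribute A.
f_r = blocking_factor(4096, 200)
b_r = blocks_needed(10_000, f_r)
sc = selection_cardinality(10_000, 50)
print(f_r, b_r, sc)
```

With these assumed numbers the relation holds 20 records per block, occupies 500 blocks, and an equality selection on A returns 200 records on average.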
Catalogue information about indexes
• Hash tables are considered as special indexes
• Average number of pointers in one node (average number of children): fi
• Height of tree i: HTi = ⌈log_fi V(A,r)⌉, or in case of hash, HTi = 1
• Lowest-level index blocks (number of leaf nodes): LBi
• Statistics should be updated after every modification → expensive
• Updated when the DB has time
• Not always consistent, but gives a good approximation
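As an illustration of the height estimate, the following sketch computes the smallest number of levels whose fan-out covers V(A,r) distinct values, which equals the rounded-up logarithm for V(A,r) > 1. It uses integer arithmetic instead of a floating-point logarithm to avoid rounding surprises. The function name and the example numbers are hypothetical.

```python
def index_height(fanout: int, distinct_values: int, is_hash: bool = False) -> int:
    """Estimate HT_i: 1 for a hash index, otherwise the smallest h
    with fanout**h >= distinct_values (at least one level)."""
    if is_hash:
        return 1
    height, reach = 1, fanout
    while reach < distinct_values:
        reach *= fanout   # one more level multiplies the reachable values
        height += 1
    return height

print(index_height(20, 50))               # tree index, 50 distinct values
print(index_height(20, 8000))             # tree index, 8000 distinct values
print(index_height(20, 50, is_hash=True)) # hash index
```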
Equality selection
• Full scan: br
• Binary search:
– Blocks are sequential on the disk
– File is ordered by attribute A
– Just for equality search
• For clustered index on search key: HTi+1
• For clustered index not on search key: HTi+⌈SC(A,r)/fr⌉
• For unclustered index: HTi+SC(A,r)
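A rough comparison of these strategies can be sketched in a few lines. Two formulas are assumptions taken from the usual textbook treatment rather than from the slide: the binary search cost ⌈log2(br)⌉ + ⌈SC(A,r)/fr⌉ − 1, and the clustered non-key cost HTi + ⌈SC(A,r)/fr⌉. All input numbers are hypothetical.

```python
import math

def cost_full_scan(b_r):
    return b_r

def cost_binary_search(b_r, sc, f_r):
    # Assumed textbook estimate: find the first match, then read the
    # remaining blocks that hold matching records.
    return math.ceil(math.log2(b_r)) + math.ceil(sc / f_r) - 1

def cost_clustered_key(ht):
    return ht + 1

def cost_clustered_nonkey(ht, sc, f_r):
    # Assumed textbook estimate: matching records are stored contiguously.
    return ht + math.ceil(sc / f_r)

def cost_unclustered(ht, sc):
    # Each matching record may sit in a different block.
    return ht + sc

# Hypothetical catalogue values.
b_r, f_r, sc, ht = 500, 20, 200, 2
costs = {
    "full scan": cost_full_scan(b_r),
    "binary search": cost_binary_search(b_r, sc, f_r),
    "clustered index, non-key": cost_clustered_nonkey(ht, sc, f_r),
    "unclustered index": cost_unclustered(ht, sc),
}
best = min(costs, key=costs.get)
print(costs, best)
```

With these numbers the clustered index wins with 12 block reads, while the unclustered index needs 202.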
Range selection
• Selection: σA≤v(r)
– If v is unknown: nr/2
– If v is known (with uniform distribution): nr·(v − min(A,r))/(max(A,r) − min(A,r)), where min(A,r) and max(A,r) are the smallest and largest values of A in r
• With clustered index
– If v is unknown: HTi+br/2
– If v is known: HTi+⌈c/fr⌉, where c is the number of records that fulfil A≤v
• With unclustered index: HTi+LBi/2+nr/2
– Sometimes better to use full scan
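To see why a full scan can beat an unclustered index, the estimates can be evaluated for a concrete case. This is a sketch under assumptions: the size estimate for a known v uses the standard uniform-distribution formula with the minimum and maximum of A (values the slide does not list in the catalogue), the clustered cost for a known v is taken as HTi + ⌈c/fr⌉, and all numbers are hypothetical.

```python
import math

def estimate_range_size(n_r, v=None, min_a=None, max_a=None):
    """Estimated number of records with A <= v."""
    if v is None:
        return n_r / 2                      # nothing known about v
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)  # uniform distribution assumed

def cost_clustered_range(ht, b_r, f_r, c=None):
    if c is None:
        return ht + b_r / 2                 # v unknown: half of the file
    return ht + math.ceil(c / f_r)          # v known: only the matching blocks

def cost_unclustered_range(ht, lb, n_r):
    return ht + lb / 2 + n_r / 2

# Hypothetical values: balances between 0 and 10,000, query balance <= 2500.
n_r, b_r, f_r, ht, lb = 10_000, 500, 20, 2, 50
c = estimate_range_size(n_r, v=2500, min_a=0, max_a=10_000)
clustered = cost_clustered_range(ht, b_r, f_r, c)
unclustered = cost_unclustered_range(ht, lb, n_r)
print(c, clustered, unclustered, b_r)
```

Here the unclustered index is estimated at 5027 block reads against 500 for the full scan.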
Types of join
• Relations r and s
• Natural join: r ⋈ s
• Outer join
– Left join: r ⟕ s
– Right join: r ⟖ s
– Full join: r ⟗ s
• Theta join: r ⋈Θ s
Nested-loop join
• Nested loop join of relations r and s:
FOR each tr ∈ r DO BEGIN
  FOR each ts ∈ s DO BEGIN
    test (tr, ts) if they fulfil Θ
    IF yes THEN add (tr, ts) to the result
  END
END
• Worst case cost: nr*bs+br
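The worst-case cost can be checked with a small simulation. In this sketch a relation is modelled as a list of blocks and each block as a list of tuples; the counter increases whenever a block would be read from disk, assuming memory holds only one block of each relation. The data and names are hypothetical.

```python
def nested_loop_join(r_blocks, s_blocks, theta):
    """Tuple-at-a-time nested-loop join that also counts block reads."""
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for t_r in r_block:
            for s_block in s_blocks:
                block_reads += 1          # s is re-read for every tuple of r
                for t_s in s_block:
                    if theta(t_r, t_s):
                        result.append((t_r, t_s))
    return result, block_reads

# Hypothetical data: n_r = 4, b_r = 2, b_s = 3.
r_blocks = [[1, 2], [3, 4]]
s_blocks = [[2, 3], [4, 5], [6, 7]]
result, block_reads = nested_loop_join(r_blocks, s_blocks, lambda a, b: a == b)
print(result, block_reads)
```

The counter ends at 14, which matches nr*bs+br = 4*3+2.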
Nested with blocks
FOR each block Br of r DO BEGIN
  FOR each block Bs of s DO BEGIN
    FOR each tr ∈ Br DO BEGIN
      FOR each ts ∈ Bs DO BEGIN
        test (tr, ts) if they fulfil Θ
        IF yes THEN add (tr, ts) to the result
      END
    END
  END
END
Worst case: br*bs+br
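The same kind of simulation shows the saving of the block-based variant. As before, this is a sketch with hypothetical data: relations are lists of blocks, and the counter increases for every block that would be read from disk.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested-loop join that also counts block reads."""
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for s_block in s_blocks:
            block_reads += 1              # s is re-read once per block of r
            for t_r in r_block:
                for t_s in s_block:
                    if theta(t_r, t_s):
                        result.append((t_r, t_s))
    return result, block_reads

# Hypothetical data: b_r = 2, b_s = 3.
r_blocks = [[1, 2], [3, 4]]
s_blocks = [[2, 3], [4, 5], [6, 7]]
result, block_reads = block_nested_loop_join(r_blocks, s_blocks, lambda a, b: a == b)
print(result, block_reads)
```

The counter ends at 8, which matches br*bs+br = 2*3+2, compared with 14 for the tuple-at-a-time loop on the same data.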
Indexed nested-loop join
• If one of the relations is indexed
• No need for full scan
• Cost: br + nr*c, where c is the cost of a selection on s
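A minimal sketch of the idea, assuming an equality join and using a Python dictionary as a stand-in for a hash index on s. The lookup cost c is passed in as a parameter because it depends on the index type; all names and data are hypothetical.

```python
from collections import defaultdict

def build_hash_index(s_tuples, key):
    """Stand-in for an existing index on s: key value -> matching tuples."""
    index = defaultdict(list)
    for t in s_tuples:
        index[key(t)].append(t)
    return index

def indexed_nested_loop_join(r_blocks, s_index, r_key, lookup_cost=1):
    """Join r with s through the index; returns (result, estimated cost)."""
    result, cost = [], 0
    for r_block in r_blocks:
        cost += 1                          # read one block of r
        for t_r in r_block:
            cost += lookup_cost            # one index selection on s per tuple
            for t_s in s_index.get(r_key(t_r), []):
                result.append((t_r, t_s))
    return result, cost

# Hypothetical data: n_r = 3, b_r = 2.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c")]]
s = [(2, "x"), (3, "y"), (3, "z")]
s_index = build_hash_index(s, key=lambda t: t[0])
result, cost = indexed_nested_loop_join(r_blocks, s_index, r_key=lambda t: t[0])
print(result, cost)
```

With a lookup cost of 1 the estimate is br + nr*c = 2 + 3*1 = 5.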
Merge join
• First sort the relations based on the join attributes
• Reading the relations once is enough
• Cost: cost of sorting+br+bs
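The merge step can be sketched as follows for an equality join on one attribute. This is an in-memory illustration, not a disk-based implementation: both inputs are sorted first, then each is traversed once, with runs of equal keys paired up. The function name and data are hypothetical.

```python
def merge_join(r, s, key=lambda t: t[0]):
    """Sort-merge equi-join of two lists of tuples on key(t)."""
    r, s = sorted(r, key=key), sorted(s, key=key)
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        k_r, k_s = key(r[i]), key(s[j])
        if k_r < k_s:
            i += 1
        elif k_r > k_s:
            j += 1
        else:
            # Find the run of s tuples that share this key.
            j_end = j
            while j_end < len(s) and key(s[j_end]) == k_r:
                j_end += 1
            # Pair every r tuple with the same key against that run.
            while i < len(r) and key(r[i]) == k_r:
                for t_s in s[j:j_end]:
                    result.append((r[i], t_s))
                i += 1
            j = j_end
    return result

r = [(3, "c"), (1, "a"), (2, "b"), (2, "bb")]
s = [(2, "x"), (4, "w"), (2, "y"), (3, "z")]
joined = merge_join(r, s)
print(joined)
```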
Other operations
• Filter repetition (distinct)
– Sort
– Delete
– Cost: cost of sorting
• Projection: br, plus the cost of sorting if repetitions are filtered
• Union: sort relations + merge + filter repetition
• Intersection: sort both + select common rows
• Difference: sort + delete the rows that appear in the 2nd relation
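These sort-based operations can be sketched on plain Python lists. The sketch works in memory and treats a relation as a list of comparable rows; the function names and data are hypothetical.

```python
def distinct_sorted(rows):
    """Sort, then drop a row when it equals its predecessor."""
    out = []
    for row in sorted(rows):
        if not out or out[-1] != row:
            out.append(row)
    return out

def union(r, s):
    return distinct_sorted(list(r) + list(s))

def intersection(r, s):
    r, s = distinct_sorted(r), distinct_sorted(s)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            i += 1
        elif r[i] > s[j]:
            j += 1
        else:                      # common row
            out.append(r[i])
            i += 1
            j += 1
    return out

def difference(r, s):
    r, s = distinct_sorted(r), distinct_sorted(s)
    out, j = [], 0
    for row in r:
        while j < len(s) and s[j] < row:
            j += 1
        if j == len(s) or s[j] != row:   # row does not appear in s
            out.append(row)
    return out

r, s = [3, 1, 2, 3], [2, 4, 2]
print(distinct_sorted(r), union(r, s), intersection(r, s), difference(r, s))
```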
Evaluation - Materialization
• Tree of operations
• Leaves: relations
• Nodes: operations
• Cost: storing temporary relations + cost of operations
• Parallel processing
Pipelining
• Temporary storage is reduced
• Result records are passed to the next operation and not stored any more
• Saves memory (records are stored, not relations)
• Sorting is not possible
• Demand-driven pipeline: the system requests data when needed
• Data-driven pipeline: operations push data into the pipeline without request until the buffer gets full
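A demand-driven pipeline maps naturally onto Python generators: each operator pulls one record at a time from its input only when its consumer asks for the next result, so no intermediate relation is stored. This is a minimal sketch; the operator names, the account records, and the attribute names are hypothetical.

```python
def scan(relation):
    """Leaf operator: hands out the stored records one by one."""
    for record in relation:
        yield record

def select(records, predicate):
    """Passes on only the records that fulfil the predicate."""
    for record in records:
        if predicate(record):
            yield record

def project(records, *attributes):
    """Keeps only the listed attributes of each record."""
    for record in records:
        yield tuple(record[a] for a in attributes)

# Hypothetical account relation.
account = [
    {"account_no": "A-101", "balance": 500},
    {"account_no": "A-102", "balance": 4000},
    {"account_no": "A-201", "balance": 700},
]

# select balance from account where balance < 2500
plan = project(select(scan(account), lambda t: t["balance"] < 2500), "balance")
first = next(plan)      # pulls records through the pipeline until one result exists
rest = list(plan)       # pulls the remaining results
print(first, rest)
```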
Pipeline evaluation
• Records arrive one after another
• Merge cannot be used
• Indexed nested-loop join can be used
Transformation of relational expressions
• Transform to equivalent expressions with smaller evaluation time
• Example: give the names of customers who have an account in Brooklyn
• Time consuming: selection after joining 3 tables
• Much better: selection before the join
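The effect of the transformation can be tried on a toy database. The three relations and their attribute names below are hypothetical stand-ins for the bank example; the sketch only shows that both orders give the same answer while the early selection produces a smaller intermediate result.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on their common attributes."""
    out = []
    for t_r in r:
        for t_s in s:
            common = t_r.keys() & t_s.keys()
            if all(t_r[a] == t_s[a] for a in common):
                out.append({**t_r, **t_s})
    return out

# Hypothetical bank data.
branch = [{"branch": "Downtown", "city": "Brooklyn"},
          {"branch": "Perryridge", "city": "Horseneck"}]
account = [{"account_no": "A-101", "branch": "Downtown"},
           {"account_no": "A-102", "branch": "Perryridge"},
           {"account_no": "A-201", "branch": "Perryridge"}]
depositor = [{"customer": "Johnson", "account_no": "A-101"},
             {"customer": "Hayes", "account_no": "A-102"},
             {"customer": "Smith", "account_no": "A-201"}]

acc_dep = natural_join(account, depositor)

# Selection after joining all three relations.
all_joined = natural_join(branch, acc_dep)
late = {t["customer"] for t in all_joined if t["city"] == "Brooklyn"}

# Selection pushed down to the branch relation before the join.
brooklyn = [b for b in branch if b["city"] == "Brooklyn"]
early_joined = natural_join(brooklyn, acc_dep)
early = {t["customer"] for t in early_joined}

print(late, early, len(all_joined), len(early_joined))
```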
Equivalence rules
• Predicates: Θ, Θ1, Θ2
• Attributes: L1, L2, L3
• Relational algebra expression: E, E1, E2
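• Cascade selection: σ_{Θ1∧Θ2}(E) = σ_{Θ1}(σ_{Θ2}(E))
• Commutativity: σ_{Θ1}(σ_{Θ2}(E)) = σ_{Θ2}(σ_{Θ1}(E))
• Cascade projection: π_{L1}(π_{L2}(…(π_{Ln}(E))…)) = π_{L1}(E)
• Connection of join and Cartesian product: σ_{Θ}(E1 × E2) = E1 ⋈_{Θ} E2
• Commutativity of theta-join: E1 ⋈_{Θ} E2 = E2 ⋈_{Θ} E1
• Associativity of natural join: (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)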
• Distributivity of selection on join
– σ_{Θ0}(E1 ⋈_{Θ} E2) = (σ_{Θ0}(E1)) ⋈_{Θ} E2, if Θ0 contains attributes only from E1
• Distributivity of projection on theta-join
– π_{L1∪L2}(E1 ⋈_{Θ} E2) = (π_{L1}(E1)) ⋈_{Θ} (π_{L2}(E2)), if L1 and L2 contain attributes from E1 and E2 and the join condition uses attributes only from L1 ∪ L2
– π_{L1∪L2}(E1 ⋈_{Θ} E2) = π_{L1∪L2}((π_{L1∪L3}(E1)) ⋈_{Θ} (π_{L2∪L4}(E2))), if L1 and L2 contain attributes from E1 and E2, and L3 and L4 contain the attributes of E1 and E2 that appear in the join condition but not in L1 ∪ L2
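• Commutativity of union and intersection: E1 ∪ E2 = E2 ∪ E1, E1 ∩ E2 = E2 ∩ E1
• Associativity of union and intersection: (E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3), (E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
• Distributivity of selection on union, intersection, and difference: σ_{Θ}(E1 − E2) = σ_{Θ}(E1) − σ_{Θ}(E2), and likewise for ∪ and ∩
• Distributivity of projection on union: π_{L}(E1 ∪ E2) = π_{L}(E1) ∪ π_{L}(E2)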
Choosing evaluation plan
• Choose an algorithm for each operation of the expression
• Define the order of the operations
• Organize them into processes
• Example: [Figure: evaluation plan tree annotated with "pipeline", "use index 1", "use linear scan", and "sort to filter repetition"]
Cost-based optimization
• List all the equivalent expressions
• Assign an execution plan to every expression
• Calculate the cost of every plan
• Choose the cheapest (based on approximations and statistics)
• Disadvantage: if there are too many plans, too many calculations are needed in advance
Example
• Joining 3 relations: 6 orders, each parenthesized in two ways, so 12 plans; in general (2(n-1))! / (n-1)!
• If n=10, then more than 17.6 billion plans…
• Solution: use some heuristics
• Consider (r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5
• First find the optimal join for the first 3 relations, then join with the rest:
• 12+12 plans remain → not good!
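The growth of the number of join orders is easy to check numerically. This sketch evaluates (2(n-1))! / (n-1)! with exact integer arithmetic; the function name is hypothetical.

```python
from math import factorial

def join_orders(n: int) -> int:
    """Number of different join orders for n relations: (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 5, 7, 10):
    print(n, join_orders(n))
```

For n=3 this gives 12, and for n=10 it gives 17,643,225,600.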
Rules for heuristics
• Do the selection at the beginning to reduce the number of rows
• Do the projection as soon as possible to reduce the size of rows
• Split the conjunction of selections into a sequence of selections (use only one selection at a time)
• Push down the selections on the tree
• Use the selection or join which results in the least number of rows → use associativity of join
• If a join is equivalent to a Cartesian product and a selection comes next, then merge them into a join operation: fewer records are generated
• Break the projection lists, push them down on the tree (sometimes new projections can be generated)
• Search subtrees where pipeline can be applied
1. By applying the rules, several trees are obtained
2. Calculate the cost
3. Apply the cheapest
• The optimization adds a cost → optimize it
• The optimal optimizer optimizes the cost of its own work and the execution too.