Date post: | 28-Mar-2015 |
Upload: | brandon-dunlap |
Query optimisation
Example - hospital database
Doctors (100 tuples)
Name         Office      …
L. Johnson   Sur1-left   …
P. Thomson   IC100       …
…            …           …
C. Craig     Int-100     …

Patients (2500 tuples)
Name       Disease       D_name       …
C. Reed    prk11         P. Thomson   …
M. Fox     blood press   L. Johnson   …
C. Fish    stomach-ul    L. Johnson   …
…          …             …            …
P. Wolf    kidn-fail     U. Ulrich    …
Query
Get the name of the doctors who treat patients suffering from the prk11 disease
SELECT Doctors.Name
FROM Doctors, Patients
WHERE Disease = 'prk11' AND D_name = Doctors.Name
Evaluation #1
restrict Patients to those who suffer from prk11
  • read: 2500 tuples; result: an estimated 50 tuples; small enough that the intermediate result need not be written to disk
join the above result with Doctors
  • read: 100 tuples (Doctors); result: 50 tuples; again no need to write the intermediate result to disk
project the result over Doctors.Name
  • the desired result is in memory
estimated cost (read and write): 2500 + 100 = 2600
Evaluation #2
suppose the internal memory holds only some 350 tuples
join Patients with Doctors
  • read Patients in batches of 250 tuples, and therefore read Doctors 10 times
  • read: 2500 + 10 × 100 = 3500
  • write the intermediate result (too big for memory) to disk: 2500
restrict the above result
  • read: 2500; result: an estimated 50 tuples
project
cost (read and write): 3500 + 2500 + 2500 = 8500
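The arithmetic behind the two evaluations can be replayed in a few lines. This is only a sketch of the slides' tuple-counting cost model; the function and constant names are made up for the illustration, and the costs count tuples read or written, as the slides do.

```python
import math

PATIENTS = 2500   # tuples in Patients
DOCTORS = 100     # tuples in Doctors
BATCH = 250       # Patients tuples read per batch in Evaluation #2


def cost_restrict_first() -> int:
    """Evaluation #1: restrict Patients, then join the small result with Doctors."""
    read_patients = PATIENTS   # one scan to apply Disease = 'prk11'
    read_doctors = DOCTORS     # one scan of Doctors for the join
    # Intermediate results (about 50 tuples) stay in memory: no writes.
    return read_patients + read_doctors


def cost_join_first() -> int:
    """Evaluation #2: join Patients with Doctors first, restrict afterwards."""
    batches = math.ceil(PATIENTS / BATCH)        # 10 batches of Patients
    join_reads = PATIENTS + batches * DOCTORS    # 2500 + 10 * 100 = 3500
    write_join = PATIENTS                        # join result spilled to disk
    read_join = PATIENTS                         # read back for the restriction
    return join_reads + write_join + read_join


print(cost_restrict_first())  # 2600
print(cost_join_first())      # 8500
```

Under this model the second strategy costs a little more than three times as much as the first, for the same query and the same data.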
Intermediate conclusions
the evaluation strategy (the procedural aspect) can lead to very large differences in computation time for the same query
  • computation time is dominated by reading from and writing to disk
  • processor time also contributes
the actual evaluation procedures are far more complex than in the previous introductory example
Optimisation - what
deciding upon the best strategy of evaluating a query
it is performed automatically by the optimiser of the DBMS
not just for data retrieval operations, but for updating operations as well (e.g. UPDATE)
not guaranteed to give the best result
Optimisation - how
based on statistical information about the specific database (not necessarily, though)
perform expression transformation (cast the query into some internal form and convert it to the respective canonical form)
select candidate low-level procedures
generate query plans and select one of them
statistical information - could you think of examples?
  • cardinality of base relations, indexes, ...
Cast (transform) query in some internal form
internal format
  • more suitable for automatic processing
  • trees (syntax tree or query tree)
from a conceptual point of view it is easier to assume that the internal format is relational algebra
Convert to canonical form
the initial expression is transformed into an equivalent but more efficient form
  • "efficient form" = efficient when executed
  • these transformations are performed independently of actual data values and access paths
Expression transformation
examples (each expression is equivalent to the one written below it)
(A WHERE condition#1) WHERE condition#2
  ≡ A WHERE condition#1 AND condition#2
(A [projection#1]) [projection#2]
  ≡ A [projection#2]
(A [projection]) WHERE condition
  ≡ (A WHERE condition) [projection]
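The three equivalences can be checked on a tiny relation. The sketch below is an illustration only: the helpers restrict, project and as_set are hypothetical names, relations are modelled as lists of dictionaries, and the sample tuples are taken from the Patients table of the hospital example. Note that the third equivalence relies on the condition mentioning only attributes kept by the projection.

```python
def restrict(rel, pred):
    """A WHERE condition: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """A [attrs]: keep only the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def as_set(rel):
    """Order-independent view of a relation, used to compare results."""
    return {tuple(sorted(t.items())) for t in rel}


A = [
    {"Name": "C. Reed", "Disease": "prk11", "D_name": "P. Thomson"},
    {"Name": "M. Fox", "Disease": "blood press", "D_name": "L. Johnson"},
    {"Name": "C. Fish", "Disease": "stomach-ul", "D_name": "L. Johnson"},
]


def c1(t):
    return t["D_name"] == "L. Johnson"


def c2(t):
    return t["Disease"] == "stomach-ul"


# (A WHERE c1) WHERE c2  versus  A WHERE c1 AND c2
lhs1 = restrict(restrict(A, c1), c2)
rhs1 = restrict(A, lambda t: c1(t) and c2(t))

# (A [Disease, D_name]) [D_name]  versus  A [D_name]
lhs2 = project(project(A, ["Disease", "D_name"]), ["D_name"])
rhs2 = project(A, ["D_name"])

# (A [Name, D_name]) WHERE c1  versus  (A WHERE c1) [Name, D_name]
lhs3 = restrict(project(A, ["Name", "D_name"]), c1)
rhs3 = project(restrict(A, c1), ["Name", "D_name"])

print(as_set(lhs1) == as_set(rhs1))
print(as_set(lhs2) == as_set(rhs2))
print(as_set(lhs3) == as_set(rhs3))
```

In each pair the second form does less work: one pass instead of two, or a restriction applied before the projection has to handle every tuple.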
Expression transformation
distributivity
commutativity and associativity
idempotence
scalar expressions
conditional expressions
semantic transformation
Set level operations
the operators of relational algebra are set level
• i.e. they manipulate sets (relations) and not individual tuples
however, these operators are implemented by internal (DBMS) procedures
• these procedures, inherently, need tuple-access (in fact, they need access to scalar values)
Choose candidate low-level procedures
the optimiser decides how to execute the query (expressed in canonical form)
• access paths are relevant at this stage
in the main, each basic operation (join, restriction, …) has a set of procedures that implement it
  • e.g. RESTRICTION - (1) on candidate key; (2) on indexed key; (3) on other attributes …
  • each procedure has an associated cost function (usually based on the required disk I/O operations); these functions are used in the next stage
Implementing JOIN - examples
R and P - the two relations to be joined
J - the attribute on which the (natural) join is performed
R[i] and P[j] mean the i-th tuple of R and the j-th tuple of P, respectively
R[i].J means the value of the attribute J for the i-th tuple of the relation R
R has M tuples and P has N tuples
Implementing JOIN - brute force
for i := 1 to M do
  for j := 1 to N do
    if R[i].J = P[j].J then
      add joined tuple R[i] * P[j] to result
    end
  end
end
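A runnable counterpart of the brute-force procedure might look as follows. It is a sketch only: relations are modelled as lists of dictionaries, the function name is hypothetical, and the sample data (including the office values) is invented for the illustration. The comparison counter shows that the procedure always examines M × N pairs of tuples.

```python
def nested_loop_join(R, P, J):
    """Brute-force natural join of R and P on attribute J."""
    result = []
    comparisons = 0
    for r in R:                      # for i := 1 to M
        for p in P:                  # for j := 1 to N
            comparisons += 1
            if r[J] == p[J]:
                result.append({**r, **p})   # joined tuple R[i] * P[j]
    return result, comparisons


# Hypothetical sample data; J is the doctor's name, stored as "D_name".
doctors = [
    {"D_name": "Johnson", "Office": "office-1"},
    {"D_name": "Thomson", "Office": "office-2"},
    {"D_name": "Ulrich", "Office": "office-3"},
]
patients = [
    {"Name": "C. Reed", "Disease": "prk11", "D_name": "Thomson"},
    {"Name": "M. Fox", "Disease": "blood press", "D_name": "Johnson"},
    {"Name": "C. Fish", "Disease": "stomach-ul", "D_name": "Johnson"},
    {"Name": "P. Wolf", "Disease": "kidn-fail", "D_name": "Ulrich"},
]

joined, comparisons = nested_loop_join(doctors, patients, "D_name")
print(len(joined))    # 4 joined tuples, one per patient
print(comparisons)    # 12 = 3 * 4 comparisons
```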
Index lookup
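index X on Patients.D_name

Patients
Name       Disease       D_name    …
C. Reed    prk11         Thomson   …
M. Fox     blood press   Johnson   …
C. Fish    stomach-ul    Johnson   …
M. Maria   ear           Thomson   …
P. Bosh    nose          Johnson   …
P. Wolf    kidn-fail     Ulrich    …

Index X (sorted on D_name; each entry points to one Patients tuple)
D_name    Pointer
Johnson   → M. Fox
Johnson   → C. Fish
Johnson   → P. Bosh
Thomson   → C. Reed
Thomson   → M. Maria
Ulrich    → P. Wolf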
Implementing JOIN - index lookup
/* index X on P.J */
for i := 1 to M do
  /* K[i] is the number of entries in X whose key equals R[i].J */
  for j := 1 to K[i] do
    /* PK[j] represents the tuple of P that the j-th such entry points to */
    add joined tuple R[i] * PK[j] to result
  end
end
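The index-lookup variant can be sketched in the same style. Here the index X is approximated by an in-memory dictionary that maps each value of J to the positions of the matching tuples of P; a real DBMS index lives on disk and is usually a tree or hash structure, so this is an assumption made purely for illustration, as are the function names and the office values.

```python
from collections import defaultdict


def build_index(P, J):
    """Index X on P.J: maps each value of J to the positions of matching tuples."""
    index = defaultdict(list)
    for pos, p in enumerate(P):
        index[p[J]].append(pos)
    return index


def index_lookup_join(R, P, J, index):
    """Join R with P on J, finding the matching tuples of P through the index."""
    result = []
    for r in R:                               # for i := 1 to M
        for pos in index.get(r[J], []):       # only the entries whose key is R[i].J
            result.append({**r, **P[pos]})    # joined tuple R[i] * PK[j]
    return result


# Hypothetical sample data, following the Patients table of the index example.
doctors = [
    {"D_name": "Johnson", "Office": "office-1"},
    {"D_name": "Thomson", "Office": "office-2"},
    {"D_name": "Ulrich", "Office": "office-3"},
]
patients = [
    {"Name": "C. Reed", "Disease": "prk11", "D_name": "Thomson"},
    {"Name": "M. Fox", "Disease": "blood press", "D_name": "Johnson"},
    {"Name": "C. Fish", "Disease": "stomach-ul", "D_name": "Johnson"},
    {"Name": "M. Maria", "Disease": "ear", "D_name": "Thomson"},
    {"Name": "P. Bosh", "Disease": "nose", "D_name": "Johnson"},
    {"Name": "P. Wolf", "Disease": "kidn-fail", "D_name": "Ulrich"},
]

X = build_index(patients, "D_name")
joined = index_lookup_join(doctors, patients, "D_name", X)
print(len(joined))      # 6 joined tuples
print(X["Johnson"])     # [1, 2, 4]
```

Unlike the brute-force procedure, the inner loop touches only the tuples of P that actually match, so the work is proportional to M plus the size of the result rather than to M × N.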
Choose the cheapest query plan
construct query plans (query evaluation plans)
  • combine candidate low-level procedures
  • choose the cheapest
  • total cost = the sum of the individual costs
  • individual costs depend on the actual data values; estimates are used instead, based on statistical data
  • usually not all possible evaluation procedures are generated; the search space is reduced by applying heuristics
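As a toy illustration of "combine candidate procedures, sum the individual costs, choose the cheapest", the sketch below enumerates every combination of one restriction procedure and one join procedure. All procedure names and cost figures are hypothetical placeholders, not measurements, and the exhaustive enumeration is exactly what a real optimiser avoids by pruning the search space with heuristics.

```python
from itertools import product

# Hypothetical estimated costs (in disk I/O units) per candidate procedure.
RESTRICT_PROCEDURES = {"full scan": 2500, "index on Disease": 60}
JOIN_PROCEDURES = {"brute force": 5000, "index lookup": 150}


def cheapest_plan(*stages):
    """Return (procedure names, total cost) of the cheapest combination."""
    best = None
    for combo in product(*(stage.items() for stage in stages)):
        total = sum(cost for _, cost in combo)      # total = sum of individual costs
        names = tuple(name for name, _ in combo)
        if best is None or total < best[1]:
            best = (names, total)
    return best


plan, cost = cheapest_plan(RESTRICT_PROCEDURES, JOIN_PROCEDURES)
print(plan)   # ('index on Disease', 'index lookup')
print(cost)   # 210
```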
Database statistics - in the data dictionary
for each base table
  • cardinality
  • space occupied
  • etc.
for each column of each base table
  • number of distinct values
  • maximum, minimum and average values
  • histogram of values
  • …
...
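One common way such statistics feed a cost estimate is to assume that values are uniformly distributed, so that an equality restriction on a column returns roughly cardinality divided by the number of distinct values. The sketch below applies this to the hospital example; the figure of 50 distinct diseases is an assumption chosen so that the result matches the "estimated 50 tuples" quoted earlier, and the function name is hypothetical.

```python
def estimate_restriction_size(cardinality: int, distinct_values: int) -> float:
    """Estimated result size of `column = constant`, assuming uniform values."""
    return cardinality / distinct_values


# Patients has 2500 tuples; 50 distinct diseases is an assumed statistic.
print(estimate_restriction_size(2500, 50))  # 50.0
```

A histogram of values, where available, lets the optimiser replace the uniformity assumption with the observed frequency of the particular constant.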
An optimiser is never perfect
the following is a real-life example
suppose a Postgres definition for
  • base relation: Treatment(Patient, Drug, Disease, …)
the query
  • get all the drugs that are taken by patients who suffer from prk11
  • (all the drugs, not only those for prk11)
SELECT DISTINCT Drug FROM Treatment
WHERE Patient IN
  (SELECT Patient FROM Treatment
   WHERE Disease = 'prk11');
the query is far slower than the equivalent one (next) ...
An optimiser is never perfect
/* this query is faster than the previous one, even though it seems to be performing more computations - Patient is not unique! */
CREATE VIEW V_Treatment AS SELECT * FROM Treatment;
SELECT DISTINCT Treatment.Drug
FROM Treatment, V_Treatment
WHERE Treatment.Patient = V_Treatment.Patient
  AND V_Treatment.Disease = 'prk11';
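That the two formulations return the same drugs can be checked on a small table. The sketch below uses Python's built-in sqlite3 module rather than Postgres, so it says nothing about the speed difference described on the slides; it only compares the results. The sample rows are invented, and the Disease condition is qualified with V_Treatment, which is an assumption about the intended meaning, since an unqualified Disease would be ambiguous between the table and the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Treatment (Patient TEXT, Drug TEXT, Disease TEXT);
INSERT INTO Treatment VALUES
  ('C. Reed', 'drug-a', 'prk11'),
  ('C. Reed', 'drug-b', 'flu'),
  ('M. Fox',  'drug-c', 'flu'),
  ('P. Wolf', 'drug-a', 'prk11');
CREATE VIEW V_Treatment AS SELECT * FROM Treatment;
""")

# Formulation 1: nested subquery.
subquery_form = """
SELECT DISTINCT Drug FROM Treatment
WHERE Patient IN
  (SELECT Patient FROM Treatment WHERE Disease = 'prk11')
"""

# Formulation 2: join of the table with a view over itself.
join_form = """
SELECT DISTINCT Treatment.Drug
FROM Treatment, V_Treatment
WHERE Treatment.Patient = V_Treatment.Patient
  AND V_Treatment.Disease = 'prk11'
"""

drugs_subquery = sorted(row[0] for row in conn.execute(subquery_form))
drugs_join = sorted(row[0] for row in conn.execute(join_form))

print(drugs_subquery)  # ['drug-a', 'drug-b']
print(drugs_join)      # ['drug-a', 'drug-b']
```

Note that drug-b is included even though it is prescribed for a different disease, which is what "all the drugs, not only those for prk11" asks for.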