+ All Categories
Home > Documents > Algebraic Laws

Algebraic Laws

Date post: 04-Jan-2016
Category:
Upload: samuru
View: 30 times
Download: 0 times
Share this document with a friend
Description:
Algebraic Laws. For the binary operators, we push the selection only if all attributes in the condition C are in R. Example:. Consider relation schemas R ( A , B ) and S ( B , C ) and the expression below:  ( A =1 OR A =3) AND B < C ( R  S ) - PowerPoint PPT Presentation
23
Algebraic Laws ) ( R D C ) ( S R C ) ( S R C ) ( S R D C For the binary operators, we push the selection only if all attributes in the condition C are in R. )) ( ( R D C S R C S R C ) ( S R D C ) (
Transcript
Page 1: Algebraic Laws

Algebraic Laws )(RDC

)( SRC

)( SRC

)( SR DC

For the binary operators, we push the selection only if all attributes in the condition C are in R.

))(( RDC

SR C

SRC )(

SR DC )(

Page 2: Algebraic Laws

Example:

• Consider relation schemas R(A,B) and S(B,C) and the expression below:

(A=1 OR A=3) AND B<C(R S)

1. Splitting AND A=1 OR A=3 (B < C(R S))

2. Push to S A=1 OR A=3 (R B < C(S))

3. Push to R A=1 OR A=3 (R) B < C(S)

Page 3: Algebraic Laws

Pushing selections• Usually selections are pushed down the expression tree.

• The following example shows that it is sometimes useful to pull selection up in the tree.

StarsIn(title,year,starName)Movie(title,year,length,studioName)

CREATE VIEW MoviesOf1996 AS SELECT * FROM MOVIE WHERE year=1996;

Query: Which stars worked for which studios in 1996?

SELECT starName,studioName FROM MoviesOf1996 NATURAL JOIN StarsIN;

Page 4: Algebraic Laws

pull selection upthen push down

Page 5: Algebraic Laws

Laws for (bag) Projection• A simple law: Project out attributes that are not needed later.

– I.e. keep only the input attr. and any join attribute.

)( SRL

)( SR CL

)( SRL

))()(( SR NML

))()(( SR NCML

))()(( SR NML

RCL RMC

Page 6: Algebraic Laws

Examples for pushing projection

Schema R(a,b,c), S(c,d,e)

)( SRxea ))(( , SR ecxea

)(, SRyedxba ))()(( ,,, SR cyedcxbayx

Page 7: Algebraic Laws

Example: Pushing Projection• Schema: StarsIn(title,year,starName) • Query: SELECT starName FROM StarsIn WHERE year = 1996;

Should we transform to ?Depends!Is StarsIn stored or computed?

starName

year=1996

StarsIn

starName

year=1996

StarsIn

starName,year

Page 8: Algebraic Laws

Reasons for not pushing the projection

• If StarsIn is stored, the for the projection we have to scan the relation.

• If the relation is pipelined from some previous computation, then yes, we better do the projection (on the fly).

• Also, if for example there is an index on year for StarsIn, such index is useless in the projected relation starName,year(StarsIn)

– While such an index is very useful for the selection on “year=1996”

Page 9: Algebraic Laws

Laws for duplicate elimination and grouping

• Try to move in a position where it can be eliminated altogether

E.g. when is applied on • A stored relation with a declared primary key• A relation that is the result of a operation, since grouping creates a relation

with no duplicates.

)())(( RR LL

))(()( RR MLL

absorbs

Also:

What’s M?

Page 10: Algebraic Laws

Improving logical query plans

• Push as far down as possible (sometimes pull them up first).

• Do splitting of complex conditions in in order to push even further.

• Push as far down as possible, introduce new early (but take care for exceptions)

• Combine with to produce -joins or equi-joins

• Choose an order for joins

Page 11: Algebraic Laws

Example of improvement

SELECT titleFROM StarsIn, MovieStarWHERE starName = name AND birthdate LIKE ‘%1960’;

title

starname=name AND birthdate LIKE ‘%1960’

StarsIn

MovieStar

title

StarsIn MovieStar

starName=name

birthdate LIKE ‘%1960’

Page 12: Algebraic Laws

title

StarsIn

MovieStar

starName=name

birthdate LIKE ‘%1960’

name

And a better plan introducing a projection to filter out useless attributes:

Page 13: Algebraic Laws

Estimating the Cost of Operations• We don’t want to execute the query in order to learn the costs.

So, we need to estimate the costs.

• What’s the cost?

• The number of I/O’s to needed to manage the intermediate relations.

• This number will be a function of the size of intermediate relations, – i.e. number of their tuples times the number of bytes per tuple

• How can we estimate the number of tuples in an intermediate relation?

Rules about estimation formulas:

1. Give (somehow) accurate estimates

2. Easy to compute

Page 14: Algebraic Laws

Projection

• Projection retains duplicates, so the number of tuples in the result is the same as in the input.

• Result tuples are usually shorter than the input tuples.

• The size of a projection is the only one we can compute exactly.

Page 15: Algebraic Laws

SelectionLet S = A=c (R)

We can estimate

T(S) = T(R) / V(R,A)

Let S = A<c (R)

On average, T(S) would be T(R)/2, but more properly: T(R)/3

Let S = Ac (R),

Then, an estimate is:

T(S) = T(R)*[(V(R,A)-1)/V(R,A)], or simply

T(S) = T(R)

Page 16: Algebraic Laws

Selection ...Let

S = C AND D(R) = C(D(R)) and

U = D(R).

First estimate T(U) and then use this to estimate T(S).

Example

S = a=10 AND b<20(R)T(R) = 10,000, V(R,a) = 50T(S) = (1/50)* (1/3) * T(R) = 67

Note: Watch for selections like:

a=10 AND a>20(R)

Page 17: Algebraic Laws

Selection ...• Let S = C OR D(R). • Simple estimate: T(S) = T(C(R)) + T(D(R)).• Problem: It is possible that T(S)T(R)!

A more accurate estimate• Let:

– T(R)=n, – m1 = size of selection on C, and – m2 = size of selection on D.

• Then T(S) = n(1-(1-m1/n)(1-m2/n)) Why?

• Example: S = a=10 OR b<20(R). T(R) = 10,000, V(R,a) =50• Simple estimation: T(S) = 3533• More accurate: T(S) = 3466

Page 18: Algebraic Laws

Natural Join R(X,Y) S(Y,Z)

• Anything could happen!

• No tuples join: T(R S) = 0

• Y is the key in S and a foreign key in R (i.e., R.Y refers to S.Y): Then,

T(R S) = T(R)

• All tuples join: i.e. R.Y=S.Y = a.

Then,

T(R S) = T(R)*T(S)

Page 19: Algebraic Laws

Two AssumptionsContainment of value sets

• If V(R,Y) ≤ V(S,Y), then every Y-value in R is assumed to occur as a Y-value in S

• When such thing can happen?

• For example when: Y is foreign key in R, and key in S

Preservation of set values

• If A is an attribute of R but not S, then is assumed that

V(R S, A)=V(R, A)

• This may be violated when there are dangling tuples in R

• There is no violation when: Y is foreign key in R, and key in S

Page 20: Algebraic Laws

Natural Join size estimation• Let, R(X,Y) and S(Y,Z), where Y is a single attribute.

• What’s the size of T(R S)?

• Let r be a tuple in R and s be a tuple in S. What’s the probability that r and s join?

• Suppose V(R,Y) V(S,Y)

• By the containment of set values we infer that:

• Every Y’s value in R appears in S.

• So, the tuple r of R surely is going match with some tuples of S, but what’s the probability it matches with s?

• It’s 1/V(S,Y).

• Hence, T(R S) = T(R)*T(S)/V(S,Y)

– When V(R,Y) V(S,Y)• By a similar reasoning, for the case when V(S,Y) V(R,Y), we get T(R S) =

T(R)*T(S)/V(S,Y).

• So, sumarizing we have as an estimate:

T(R S) = T(R)*T(S)/max{V(R,Y),V(S,Y)}

Page 21: Algebraic Laws

• Remember:T(R S) = T(R)*T(S)/max{V(R,Y),V(S,Y)}

• Example: R(a,b), T(R)=1000, V(R,b)=20 S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100 U(c,d), T(U)=5000, V(U,c)=500

• Estimate the size of R S U• T(R S) =

40,000, • T((R S) U) =

400,000• T(S U) =

20,000, • T(R (S U)) =

400,000

• The equality of results is not a coincidence.

• Note 1: estimate of final result should not depend on the evaluation order

• Note 2: intermediate results could be of different sizes

Page 22: Algebraic Laws

Natural join with multiple join attrib.• R(x,y1,y2) S(y1,y2,z)• T(R S) = T(R)*T(S)/m1*m2, where

m1 = max{V(R,y1),V(S,y1)}

m2 = max{V(R,y2),V(S,y2)} • Why?• Let r be a tuple in R and s be a tuple in S. What’s the probability that r and s agree on

y1?

From the previous reasoning, it’s 1/max{V(R,y1),V(S,y1)}• Similarly, what’s the probability that r and s agree on y2?

It’s 1/max{V(R,y2),V(S,y2)}• Assuming that aggrements on y1 and y2 are independent we estimate:

T(R S) =

T(R)*T(S)/[max{V(R,y1),V(S,y1)} * max{V(R,y2),V(S,y2)}]Example:

T(R)=1000, V(R,b)=20, V(R,c)=100T(S)=2000, V(S,d)=50, V(S,e)=50

R(a,b,c) R.b=S.d AND R.c=S.e S(d,e,f)T(R S) = (1000*2000)/(50*100)=400

Page 23: Algebraic Laws

• Another example: (one of the previous) R(a,b), T(R)=1000, V(R,b)=20 S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100 U(c,d), T(U)=5000, V(U,c)=500

• Estimate the size of R S U

• Observe that R S U = (R U) S

• T(R U) = 1000*5000 = 5,000,000

• Note that the number of b’s in the product is 20 (V(R,b)), and the number of c’s is 500 (V(U,c)).

• T((R U) S) = 5,000,000 * 2000 / (50 * 500) = 400,000


Recommended