3
Relational Algebra
• Formalism for creating new relations
from existing ones
• Its place in the big picture:
Declarative
query
language
Algebra Implementation
SQL,
relational calculus
Relational algebra
4
Relational Algebra• Five operators:
– Union: !
– Difference: -
– Selection: s
– Projection: P
– Cartesian Product: "
• Derived or auxiliary operators:– Intersection, complement
– Joins (natural,equi-join, theta join, semi-join)
– Renaming: r
5
1. Union and 2. Difference
• R1 ! R2
Example:
– ActiveEmployees ! RetiredEmployees
• R1 – R2
Example:
– AllEmployees ! RetiredEmployees
6
What about Intersection?
• It is a derived operatorR1 # R2 = R1 – (R1 – R2)
• Also expressed as a join (will see later)
Example
– UnionizedEmployees # RetiredEmployees
7
3. Selection
• Returns all tuples which satisfy a condition
• Notation: sc(R) (sometimes also !)
• Examples
– s Salary > 40000 (Employee)
– s name = “Smith” (Employee)
• The condition c can be =, <, $, >, %, <>
[in SQL: SELECT * FROM Employee
WHERE Salary > 40000]
condition
8
Selection Example
Employee
SSN Name DepartmentID Salary
999999999 John 1 30,000
777777777 Tony 1 32,000
888888888 Alice 2 45,000
SSN Name DepartmentID Salary
888888888 Alice 2 45,000
Find all employees with salary more than $40,000.s Salary > 40000 (Employee)
9
4. Projection
• Eliminates columns, then removes duplicates
• Notation: P A1,…,An (R) (sometimes !)
• Example:
– project to social-security number and names:
– P SSN, Name (Employee)
– Output schema: Answer(SSN, Name)
[In SQL: SELECT DISTINCT SSN, Name FROM Employee]
attributes
10
Projection Example
Employee
SSN Name DepartmentID Salary
999999999 John 1 30,000
777777777 Tony 1 32,000
888888888 Alice 2 45,000
SSN Name
999999999 John
777777777 Tony
888888888 Alice
P SSN, Name (Employee)
11
5. Cartesian Product
• Combine each tuple in R1 with each tuple inR2
• Notation: R1 " R2
• Example:– Employee " Dependents
• Very rare in practice; mainly used to expressjoins
[In SQL: SELECT * FROM R1, R2]
12
Cartesian Product Example
Employee
Name SSN
John 999999999
Tony 777777777
Dependents
EmployeeSSN Dname
999999999 Emily
777777777 Joe
Employee x Dependents
Name SSN EmployeeSSN Dname
John 999999999 999999999 Emily
John 999999999 777777777 Joe
Tony 777777777 999999999 Emily
Tony 777777777 777777777 Joe
Note the output
13
Cartesian Product and SQLselect * from Employee, Dependents;
+------+-----------+-------------+-------+
| Name | SSN | EmployeeSSN | Dname |
+------+-----------+-------------+-------+
| John | 999999999 | 999999999 | Emily |
| Tony | 777777777 | 999999999 | Emily |
| John | 999999999 | 777777777 | Joe |
| Tony | 777777777 | 777777777 | Joe |
+------+-----------+-------------+-------+
select * from Employee, Dependents where SSN=EmpoyeeSSN;
+------+-----------+-------------+-------+
| Name | SSN | EmployeeSSN | Dname |
+------+-----------+--------------+-------+
| John | 999999999 | 999999999 | Emily |
| Tony | 777777777 | 777777777 | Joe |
+------+-----------+-------------+-------+
14
Relational Algebra• Five operators:
– Union: !
– Difference: -
– Selection: s
– Projection: P
– Cartesian Product: "
• Derived or auxiliary operators:– Intersection, complement
– Joins (natural,equi-join, theta join, semi-join)
– Renaming: r
15
Renaming
• Changes the schema, not the instance
• Schema: R(A1, …, An )
• Notation: r B1,…,Bn (R)
• Example:
– r LastName, SocSocNo (Employee)
– Output schema: Answer(LastName, SocSocNo)
[in SQL: SELECT Name AS LastName, SSN AS SocSocNo
FROM Employee]
16
Renaming Example
Employee
Name SSNJohn 999999999Tony 777777777
LastName SocSocNoJohn 999999999Tony 777777777
r LastName, SocSocNo (Employee)
17
Natural Join
• Notation: R1 ! R2
• Meaning: R1 ! R2 = PA(sC(R1 " R2))– Equality on common attributes names
• Where:– The selection sC checks equality of all common
attributes
– The projection eliminates the duplicate commonattributes
[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2WHERE R1.B = R2.BSchema: R1(A,B), R2(B,C)]
18
Natural Join Example
Employee
Name SSNJohn 999999999Tony 777777777
Dependents
SSN Dname999999999 Emily777777777 Joe
Name SSN DnameJohn 999999999 EmilyTony 777777777 Joe
Employee Dependents =
PName, SSN, Dname(s SSN=SSN2(Employee x r SSN2, Dname(Dependents))
19
Natural Join
• R= S=
• R ! S=
VZ
ZY
ZX
YX
BA
VZ
WV
UZ
CB
WVZ
VZY
UZY
VZX
UZX
CBA
20
Natural Join
• Given the schemas R(A, B, C, D), S(A, C,
E), what is the schema of R ! S ?
• Given R(A, B, C), S(D, E), what is R ! S ?
• Given R(A, B), S(A, B), what is R ! S ?
21
Theta Join
• A join that involves a predicate
• R1 ! q R2 = sq (R1 " R2)
• Here q can be any condition
22
Eq-join
• A theta join where q is an equality
R1 !A=B R2 = s A=B (R1 " R2)
Equality on two different attributes
• Example:
– Employee !SSN=SSN Dependents
• Most useful join in practice(difference to natural join?)
23
Semijoin
• R " S = P A1,…,An (R ! S)
• Where A1, …, An are the attributes in R
• Example:
– Employee " Dependents
24
EmployeeName EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
DeptDeptName Manager
Sales Harriet
Production Charles
Semijoin - Example
Employee ! DeptName EmpId DeptName
Sally 2241 Sales
Harriet 2202 Sales
Only attributes of Employee are selected
25
Semijoins in Distributed
Databases• Semijoins are used in distributed databases
. . .. . .
NameSSN
Dname
. . .. . .
AgeSSNEmployee
Dependents
network
Employee !ssn=ssn (s age>71 (Dependents))
T = P SSN s age>71 (Dependents)R = Employee "!T
Answer = R ! Dependents
26
Complex RA Expressions
Person Purchase Person Product
sname=fred sname=gizmo
P pidP ssn
seller-ssn=ssn
pid=pid
buyer-ssn=ssn
P name
27
Application:
Query Rewriting for Optimization
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
Reserves Sailors
sid=sid
bid=100
sname
rating > 5(Scan;write to temp T1)
(Scan;write totemp T2)
The earlier we process selections, the fewer tuples we need to
manipulate higher up in the tree (predicate pushdown)
Disadvantages?
28
Algebraic Laws (Examples)
• Commutative and Associative Laws– R " S = S " R, R " (S " T) = (R " S) " T
– R S = S R, R (S T) = (R S) T
• Laws involving selection– s C AND C"(R) = s C(s C"(R)) = s C(R) " s C"(R)
– s C (R S) = s C (R) S• When C involves only attributes of R
• Laws involving projections– PM(PN(R)) = PM,N(R)
! ! ! ! ! !
! !
29
Operations on Bags
A bag = a set with repeated elements
All operations need to be defined carefully on bags
• {a,b,b,c}!{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}
• sC(R): preserve the number of occurrences
• PA(R): no duplicate elimination
• Cartesian product, join: no duplicate elimination
Important: Relational Engines work on bags, not sets!
30
Finally: RA has Limitations !
• Cannot compute “transitive closure”
• Find all direct and indirect relatives of Fred
• Cannot express in RA !!! Need to write Cprogram
SisterLouNancy
SpouseBillMary
CousinJoeMary
FatherMaryFred
RelationshipName2Name1