1
Lecture 07: Relational Algebra
2
Outline
• Relational Algebra (Section 6.1)
3
Relational Algebra
• Formalism for creating new relations from existing ones
• Its place in the big picture:
Declarative query language (SQL, relational calculus) → Algebra (relational algebra) → Implementation
4
Relational Algebra
• Five operators:
– Union: ∪
– Difference: −
– Selection: σ
– Projection: π
– Cartesian Product: ×
• Derived or auxiliary operators:
– Intersection, complement
– Joins (natural, equi-join, theta join, semi-join)
– Renaming: ρ
5
1. Union and 2. Difference
• R1 ∪ R2
Example:
– ActiveEmployees ∪ RetiredEmployees
• R1 − R2
Example:
– AllEmployees − RetiredEmployees
6
What about Intersection ?
• It is a derived operator:
R1 ∩ R2 = R1 − (R1 − R2)
• Also expressed as a join (will see later)
Example:
– UnionizedEmployees ∩ RetiredEmployees
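The identity R1 ∩ R2 = R1 − (R1 − R2) can be checked on small sets. The following is a minimal sketch using Python's built-in sets; the employee names are hypothetical sample data, not taken from the lecture.

```python
# Hypothetical sample data (illustrative only).
unionized = {"Alice", "Bob", "Carol"}
retired = {"Bob", "Carol", "Dave"}

# Intersection derived from difference alone: R1 ∩ R2 = R1 − (R1 − R2)
derived = unionized - (unionized - retired)

# Python's built-in set intersection, for comparison
builtin = unionized & retired

print(derived == builtin)
```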
7
3. Selection
• Returns all tuples which satisfy a condition
• Notation: σ_c(R)
• Examples:
– σ_{Salary > 40000}(Employee)
– σ_{name = “Smith”}(Employee)
• The condition c can use =, <, ≤, >, ≥, <>
[In SQL: SELECT * FROM Employee WHERE Salary > 40000]
8
Selection Example
Find all employees with salary more than $40,000: σ_{Salary > 40000}(Employee)
Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000
Result:
SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000
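The selection example can be sketched in Python by treating a relation as a list of dictionaries. The helper name `select` and this representation are assumptions made for illustration; the rows are the ones from the slide.

```python
# Employee relation from the slide, one dict per tuple.
employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]


def select(relation, condition):
    """Selection: keep exactly the tuples for which condition(row) is true."""
    return [row for row in relation if condition(row)]


high_paid = select(employee, lambda row: row["Salary"] > 40000)
print(high_paid)
```

The schema of the result is the same as the schema of the input; only the set of tuples shrinks.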
9
4. Projection
• Eliminates columns, then removes duplicates
• Notation: π_{A1,…,An}(R)
• Example: project to social-security number and names:
– π_{SSN, Name}(Employee)
– Output schema: Answer(SSN, Name)
[In SQL: SELECT DISTINCT SSN, Name FROM Employee]
10
Projection Example
π_{SSN, Name}(Employee)
Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000
Result:
SSN        Name
999999999  John
777777777  Tony
888888888  Alice
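A minimal Python sketch of projection under set semantics follows. The helper name `project` is an assumption; the data is the Employee relation from the slide. Projecting onto DepartmentID shows the duplicate-removal step, since two employees share department 1.

```python
employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes, then drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each tuple appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


ssn_name = project(employee, ["SSN", "Name"])
departments = project(employee, ["DepartmentID"])
print(ssn_name)
print(departments)
```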
11
5. Cartesian Product
• Combine each tuple in R1 with each tuple in R2
• Notation: R1 × R2
• Example:
– Employee × Dependents
• Very rare in practice; mainly used to express joins
[In SQL: SELECT * FROM R1, R2]
12
Cartesian Product Example
Employee
Name  SSN
John  999999999
Tony  777777777
Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe
Employee × Dependents
Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe
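The product above can be reproduced with a short Python sketch. The helper name `cartesian_product` is an assumption; it relies on the two relations having disjoint attribute names, as they do in this example.

```python
from itertools import product

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"EmployeeSSN": "999999999", "Dname": "Emily"},
    {"EmployeeSSN": "777777777", "Dname": "Joe"},
]


def cartesian_product(r1, r2):
    """Pair every tuple of r1 with every tuple of r2 (attribute names must be disjoint)."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


pairs = cartesian_product(employee, dependents)
for row in pairs:
    print(row)
```

The result has |R1| · |R2| tuples, which is why the product is rarely computed on its own in practice.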
13
Relational Algebra
• Five operators:
– Union: ∪
– Difference: −
– Selection: σ
– Projection: π
– Cartesian Product: ×
• Derived or auxiliary operators:
– Intersection, complement
– Joins (natural, equi-join, theta join, semi-join)
– Renaming: ρ
14
Renaming
• Changes the schema, not the instance
• Schema: R(A1, …, An )
• Notation: ρ_{B1,…,Bn}(R)
• Example:
– ρ_{LastName, SocSocNo}(Employee)
– Output schema: Answer(LastName, SocSocNo)
[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]
15
Renaming Example
ρ_{LastName, SocSocNo}(Employee)
Employee
Name  SSN
John  999999999
Tony  777777777
Result:
LastName  SocSocNo
John      999999999
Tony      777777777
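Renaming can be sketched in Python as a positional relabeling of attributes. The helper name `rename` is an assumption, and the sketch relies on Python dictionaries preserving insertion order (Python 3.7 and later).

```python
employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]


def rename(relation, new_names):
    """Renaming: replace attribute names positionally; the values are untouched."""
    return [dict(zip(new_names, row.values())) for row in relation]


renamed = rename(employee, ["LastName", "SocSocNo"])
print(renamed)
```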
16
Natural Join
• Notation: R1 ⋈ R2
• Meaning: R1 ⋈ R2 = π_A(σ_C(R1 × R2))
• Where:
– The selection σ_C checks equality of all common attributes
– The projection π_A eliminates the duplicate common attributes
[in SQL: SELECT DISTINCT R1.A, R1.B, R2.C FROM R1, R2
WHERE R1.B = R2.B
Schema: R1(A,B), R2(B,C)]
17
Natural Join Example
Employee
Name  SSN
John  999999999
Tony  777777777
Dependents
SSN        Dname
999999999  Emily
777777777  Joe
Result:
Name  SSN        Dname
John  999999999  Emily
Tony  777777777  Joe
Employee ⋈ Dependents = π_{Name, SSN, Dname}(σ_{SSN=SSN2}(Employee × ρ_{SSN2, Dname}(Dependents)))
18
Natural Join
• R =
A  B
X  Y
X  Z
Y  Z
Z  V
• S =
B  C
Z  U
V  W
Z  V
• R ⋈ S =
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
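The R ⋈ S example can be checked with a small Python sketch of the natural join. The helper name `natural_join` and the list-of-dictionaries representation are assumptions for illustration; the tuples are the ones from the slide.

```python
r = [{"A": "X", "B": "Y"}, {"A": "X", "B": "Z"}, {"A": "Y", "B": "Z"}, {"A": "Z", "B": "V"}]
s = [{"B": "Z", "C": "U"}, {"B": "V", "C": "W"}, {"B": "Z", "C": "V"}]


def natural_join(r1, r2):
    """Join tuples that agree on all common attributes; keep one copy of each."""
    result = []
    for t in r1:
        for u in r2:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                merged = {**t, **u}  # common attributes appear only once
                if merged not in result:
                    result.append(merged)
    return result


joined = natural_join(r, s)
for row in joined:
    print(row["A"], row["B"], row["C"])
```

The tuple (X, Y) of R has no partner in S, so it does not contribute to the result.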
19
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?
• Given R(A, B, C), S(D, E), what is R ⋈ S ?
• Given R(A, B), S(A, B), what is R ⋈ S ?
20
Theta Join
• A join that involves a predicate
• R1 ⋈_θ R2 = σ_θ(R1 × R2)
• Here θ can be any condition
21
Equi-join
• A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
• Example:
– Employee ⋈_{SSN=SSN} Dependents
• Most useful join in practice (how does it differ from the natural join?)
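A theta join can be sketched as a filtered product. In the sketch below, the helper name `theta_join` and the choice to prefix each attribute with its relation name are assumptions made so that both SSN columns can coexist in the output; the equi-join is then just the special case where the predicate is an equality.

```python
employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"SSN": "999999999", "Dname": "Emily"},
    {"SSN": "777777777", "Dname": "Joe"},
]


def theta_join(r1, r2, theta, name1, name2):
    """Keep every pair (t, u) from the product for which theta(t, u) holds."""
    result = []
    for t in r1:
        for u in r2:
            if theta(t, u):
                row = {f"{name1}.{a}": v for a, v in t.items()}
                row.update({f"{name2}.{a}": v for a, v in u.items()})
                result.append(row)
    return result


# Equi-join: theta is the equality Employee.SSN = Dependents.SSN
equi = theta_join(employee, dependents,
                  lambda e, d: e["SSN"] == d["SSN"],
                  "Employee", "Dependents")
print(equi)
```

In this sketch the output keeps both `Employee.SSN` and `Dependents.SSN`, since a theta join only selects from the product and does not project anything away.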
22
Semijoin
• R ⋉ S = π_{A1,…,An}(R ⋈ S)
• Where A1, …, An are the attributes in R
• Example:
– Employee ⋉ Dependents
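A semijoin keeps the tuples of R that have at least one join partner in S, without adding any attributes of S. The following is a minimal sketch; the helper name `semijoin` is an assumption, and the employee "Alice" without dependents is hypothetical data added to make the filtering visible.

```python
employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
    {"Name": "Alice", "SSN": "888888888"},  # hypothetical: has no dependents
]
dependents = [
    {"SSN": "999999999", "Dname": "Emily"},
    {"SSN": "777777777", "Dname": "Joe"},
]


def semijoin(r, s):
    """Return the tuples of r that agree with some tuple of s on all common attributes."""
    result = []
    for t in r:
        has_partner = any(
            all(t[a] == u[a] for a in t.keys() & u.keys()) for u in s
        )
        if has_partner and t not in result:
            result.append(t)
    return result


with_dependents = semijoin(employee, dependents)
print(with_dependents)
```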
23
Semijoins in Distributed Databases
• Semijoins are used in distributed databases
Employee(SSN, Name) and Dependents(SSN, Dname, Age) are stored at different sites, connected by a network
Query: Employee ⋈_{ssn=ssn} (σ_{age>71}(Dependents))
T = π_SSN(σ_{age>71}(Dependents))
R = Employee ⋉ T
Answer = R ⋈ σ_{age>71}(Dependents)
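The semijoin reduction can be simulated in Python to see that it returns the same answer as evaluating the query directly while shipping fewer tuples. This is only a sketch under stated assumptions: the data is hypothetical, and the choice of which step runs at which site is an assumption, not something the slide specifies. The final join uses the age-filtered dependents so that the result matches the query.

```python
# Hypothetical data held at two different sites.
employee = [
    {"SSN": "1", "Name": "John"},
    {"SSN": "2", "Name": "Tony"},
    {"SSN": "3", "Name": "Alice"},
]
dependents = [
    {"SSN": "1", "Dname": "Emily", "Age": 80},
    {"SSN": "1", "Dname": "Sam", "Age": 10},
    {"SSN": "2", "Dname": "Joe", "Age": 12},
]

# Dependents older than 71 (selection done where Dependents is stored).
old = [d for d in dependents if d["Age"] > 71]

# T: the SSNs of those dependents; only this small set is shipped.
t = {d["SSN"] for d in old}

# R = Employee semijoin T: employees that will take part in the join.
r = [e for e in employee if e["SSN"] in t]

# Final join of the reduced Employee relation with the selected dependents.
answer = [{**e, **d} for e in r for d in old if e["SSN"] == d["SSN"]]

# Direct evaluation of the query, for comparison.
direct = [{**e, **d} for e in employee for d in old if e["SSN"] == d["SSN"]]

print(answer == direct, len(r), "of", len(employee), "employee tuples shipped")
```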
24
Complex RA Expressions
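[Query tree, flattened in extraction]
• Leaves: Person, Purchase, Person, Product
• Selections: σ_{name=fred}, σ_{name=gizmo}
• Projections: π_pid, π_ssn
• Joins: ⋈_{seller-ssn=ssn}, ⋈_{pid=pid}, ⋈_{buyer-ssn=ssn}
• Root: π_name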
25
Application: Query Rewriting for Optimization
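Plan 1 (selections after the join):
π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors))
Plan 2 (selections pushed below the join):
π_sname(σ_{bid=100}(Reserves) ⋈_{sid=sid} σ_{rating>5}(Sailors))
– Each selection scans its relation and writes the result to a temporary relation (T1, T2)
• The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (predicate pushdown)
• Disadvantages?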
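The effect of predicate pushdown can be illustrated by evaluating both plans on a tiny instance and counting how many tuple pairs the join has to consider. The rows below are hypothetical sample data, and the nested-loop join is a simplifying assumption; real engines use other join algorithms.

```python
# Hypothetical instances of Reserves(sid, bid) and Sailors(sid, sname, rating).
reserves = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 101}, {"sid": 3, "bid": 100}]
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7},
    {"sid": 2, "sname": "Bo", "rating": 9},
    {"sid": 3, "sname": "Cy", "rating": 3},
]


def join_on_sid(r, s):
    """Nested-loop equi-join on sid."""
    return [{**x, **y} for x in r for y in s if x["sid"] == y["sid"]]


# Plan 1: join first, then apply both selections, then project sname.
joined = join_on_sid(reserves, sailors)
plan1 = {row["sname"] for row in joined if row["bid"] == 100 and row["rating"] > 5}
pairs_plan1 = len(reserves) * len(sailors)

# Plan 2: apply each selection to its own relation first, then join.
reserves_sel = [x for x in reserves if x["bid"] == 100]
sailors_sel = [y for y in sailors if y["rating"] > 5]
plan2 = {row["sname"] for row in join_on_sid(reserves_sel, sailors_sel)}
pairs_plan2 = len(reserves_sel) * len(sailors_sel)

print(plan1 == plan2, pairs_plan1, pairs_plan2)
```

Both plans return the same names, but the second plan's join sees smaller inputs. The price is the extra scans and the temporary results that have to be materialized.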
26
Algebraic Laws (Examples)
• Commutative and Associative Laws
– R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T
– R ∪ S = S ∪ R, R ∪ (S ∪ T) = (R ∪ S) ∪ T
• Laws involving selection
– σ_{C AND C’}(R) = σ_C(σ_{C’}(R)) = σ_C(R) ∩ σ_{C’}(R)
– σ_C(R × S) = σ_C(R) × S
• When C involves only attributes of R
• Laws involving projections
– π_M(π_N(R)) = π_M(R), when M ⊆ N
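The selection laws can be spot-checked on a small instance. The sketch below represents relations as Python sets of tuples; the data, the two conditions, and the helper name `select` are hypothetical. A check on one instance illustrates the laws but does not prove them.

```python
# Hypothetical relation R(id, label) and a one-attribute relation S.
r = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
s = {("x",), ("y",)}


def select(relation, condition):
    """Selection under set semantics."""
    return {t for t in relation if condition(t)}


def c1(t):
    return t[0] > 1  # refers only to the first attribute of R


def c2(t):
    return t[0] < 4


# Conjunction = cascade of selections = intersection of selections
conjunction = select(r, lambda t: c1(t) and c2(t))
cascade = select(select(r, c2), c1)
intersection = select(r, c1) & select(r, c2)

# A selection on attributes of R only can be applied before the product
product_then_select = select({t + u for t in r for u in s}, c1)
select_then_product = {t + u for t in select(r, c1) for u in s}

print(conjunction == cascade == intersection)
print(product_then_select == select_then_product)
```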
27
Operations on Bags
A bag = a set with repeated elements
All operations need to be defined carefully on bags
• {a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
• {a,b,b,b,c,c} − {b,c,c,c,d} = {a,b,b}
• σ_C(R): preserves the number of occurrences
• π_A(R): no duplicate elimination
• Cartesian product, join: no duplicate elimination
Important! Relational engines work on bags, not sets!
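Bag union and bag difference can be tried out with `collections.Counter` from the Python standard library, which stores a multiplicity per element. Using `Counter` as a stand-in for a bag is an assumption of this sketch: `+` adds multiplicities, and `-` subtracts them and drops elements whose count falls to zero or below.

```python
from collections import Counter

# Bag union: multiplicities add up.
union = Counter("abbc") + Counter("abbbeff")

# Bag difference: multiplicities are subtracted; nothing goes below zero.
difference = Counter("abbbcc") - Counter("bcccd")

print(sorted(union.elements()))
print(sorted(difference.elements()))
```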
28
Finally: RA has Limitations !
• Cannot compute “transitive closure”
• Find all direct and indirect relatives of Fred
• Cannot be expressed in RA! Need to write a C program
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister
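The kind of program the slide alludes to can be sketched as a fixpoint loop. Each pass of the loop plays the role of one more join with the relation, and the number of passes needed depends on the data, which is why no single fixed RA expression covers every instance. The function name `all_relatives` is hypothetical, and treating each pair as symmetric (a relative of Mary is also related to by Mary) is an assumption of this sketch.

```python
# (Name1, Name2) pairs from the table; the Relationship column is not needed here.
relatives = [("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")]


def all_relatives(person, pairs):
    """Collect everyone reachable from person by following pairs in either direction."""
    reached = {person}
    changed = True
    while changed:  # repeat until a full pass adds nobody new (fixpoint)
        changed = False
        for a, b in pairs:
            for x, y in ((a, b), (b, a)):
                if x in reached and y not in reached:
                    reached.add(y)
                    changed = True
    return reached - {person}


print(sorted(all_relatives("Fred", relatives)))
```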
29
Formulating queries in RA
• Consider a database for student enrollment for courses, and books used in the courses
– STUDENT (SSN, Name, Major, Bdate)
– COURSE (Course#, Cname, Dept)
– ENROLL (SSN, Course#, Quarter, Grade)
– BOOK_ADOPTION (Course#, Quarter, Book_ISBN)
– TEXT (Book_ISBN, Book_Title, Publisher, Author)
30
Formulating queries in RA
• Specify the following queries in relational algebra:
– List the course numbers (Course#) of the courses taken by students named ‘John Smith’ in Winter 1999 (i.e., Quarter = W99)
– List any department which has all its adopted books published by ‘BC Publishing’
31
Formulating Queries in RA
π_{Course#}(σ_{Quarter=‘W99’}(σ_{Name=‘John Smith’}(STUDENT) ⋈ ENROLL))
OtherDept = π_Dept((σ_{Publisher <> ‘BC Publishing’}(BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE)
AllDept = π_Dept(BOOK_ADOPTION ⋈ COURSE)
Answer = AllDept − OtherDept
And how will you express it in SQL?
WHY?
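The department query can be traced step by step in Python: compute the departments that adopt at least one book from another publisher, then subtract them from all departments that adopt books. All rows below are hypothetical sample data, the helper names are assumptions, and the publisher is the one named in the question ('BC Publishing').

```python
def natural_join(r, s):
    """Join tuples that agree on all common attributes."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                row = {**t, **u}
                if row not in out:
                    out.append(row)
    return out


def project(r, attrs):
    """Keep the listed attributes and drop duplicate tuples."""
    out = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


# Hypothetical instance of the schema from the slides.
course = [
    {"Course#": "C1", "Cname": "Databases", "Dept": "CS"},
    {"Course#": "C2", "Cname": "Algebra", "Dept": "Math"},
]
book_adoption = [
    {"Course#": "C1", "Quarter": "W99", "Book_ISBN": "111"},
    {"Course#": "C2", "Quarter": "W99", "Book_ISBN": "222"},
    {"Course#": "C2", "Quarter": "W99", "Book_ISBN": "333"},
]
text = [
    {"Book_ISBN": "111", "Book_Title": "T1", "Publisher": "BC Publishing", "Author": "A"},
    {"Book_ISBN": "222", "Book_Title": "T2", "Publisher": "BC Publishing", "Author": "B"},
    {"Book_ISBN": "333", "Book_Title": "T3", "Publisher": "Other Press", "Author": "C"},
]

# OtherDept: departments with at least one adopted book from another publisher.
adopted = natural_join(book_adoption, text)
not_bc = [t for t in adopted if t["Publisher"] != "BC Publishing"]
other_dept = project(natural_join(not_bc, course), ["Dept"])

# AllDept: departments that adopt any book at all.
all_dept = project(natural_join(book_adoption, course), ["Dept"])

# Answer = AllDept − OtherDept
answer = [d for d in all_dept if d not in other_dept]
print(answer)
```

The set difference is what turns "no adopted book from another publisher" into "all adopted books from this publisher".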