Date post: 13-Mar-2020

3/28/14


Chapter 6: The Relational Algebra and Relational Calculus

Anis Koubaa

Advanced Databases (SE487) Prince Sultan University

College of Computer and Information Sciences

SE487: Advanced Databases by Dr. Anis Koubaa
Textbook: Fundamentals of Database Systems, 6th Edition, Elmasri & Navathe, (c) Pearson Education - Addison-Wesley, 2011

Chapter 6 Outline

- Unary Relational Operations: SELECT and PROJECT
- Relational Algebra Operations from Set Theory
- Binary Relational Operations: JOIN and DIVISION
- Additional Relational Operations
- Examples of Queries in Relational Algebra
- The Tuple Relational Calculus
- The Domain Relational Calculus


Why Relational Algebra/Calculus?

Formal Relational Query Languages

- Two mathematical query languages form the basis for “real” languages (e.g. SQL) and for implementation:
  - Relational Algebra: more operational; very useful for representing execution plans.
  - Relational Calculus: lets users describe what they want rather than how to compute it (non-procedural, declarative).

Understanding algebra and calculus is key to understanding SQL and query processing!


Preliminaries

- A query is applied to relation instances, and the result of a query is also a relation instance.
  - Schemas of input relations for a query are fixed (but the query will run over any legal instance).
  - The schema for the result of a given query is also fixed. It is determined by the definitions of the query language constructs.
- Positional vs. named-field notation:
  - Positional notation is easier for formal definitions; named-field notation is more readable.
  - Both are used in SQL, though positional notation is not encouraged.
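The difference between positional and named-field notation can be illustrated with a small Python sketch. The `Employee` type and its two attributes are a hypothetical slice of the EMPLOYEE relation, chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical two-attribute slice of EMPLOYEE, for illustration only.
Employee = namedtuple("Employee", ["Lname", "Salary"])

row = Employee("Wong", 40000)

# Positional notation: refer to a field by its index in the schema.
salary_positional = row[1]

# Named-field notation: refer to the same field by its attribute name.
salary_named = row.Salary

print(salary_positional, salary_named)
```

Both expressions read the same value. The named form keeps working if the column order changes, which is one reason it tends to be more readable.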

The Relational Algebra and Relational Calculus

- Relational algebra
  - Basic set of operations for the relational model
- Relational algebra expression
  - Sequence of relational algebra operations
- Relational calculus
  - Higher-level declarative language for specifying relational queries


Relational Algebra: 5 Basic Operations

- Selection ( σ ): selects a subset of rows from a relation (horizontal).
- Projection ( π ): retains only the wanted columns from a relation (vertical).
- Cross-product ( × ): allows us to combine two relations.
- Set-difference ( − ): tuples in r1, but not in r2.
- Union ( ∪ ): tuples in r1 and/or in r2.

Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)
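As a rough sketch of the five basic operations and of closure, relations can be modelled as Python sets of tuples in positional notation. The function names and the sample relations `r1` and `r2` are assumptions made for this illustration, not part of the course material.

```python
from itertools import product

def select(r, pred):
    """sigma: keep the tuples of r for which pred(t) is true."""
    return {t for t in r if pred(t)}

def project(r, positions):
    """pi: keep only the columns at the given positions."""
    return {tuple(t[i] for i in positions) for t in r}

def cross(r1, r2):
    """Cross-product: concatenate every tuple of r1 with every tuple of r2."""
    return {a + b for a, b in product(r1, r2)}

def difference(r1, r2):
    """Set-difference: tuples in r1 but not in r2."""
    return r1 - r2

def union(r1, r2):
    """Union: tuples in r1 and/or in r2."""
    return r1 | r2

# Hypothetical sample relations with the same two-column schema.
r1 = {(1, "a"), (2, "b")}
r2 = {(2, "b"), (3, "c")}

# Closure: every operation returns a relation, so calls can be nested.
result = project(select(union(r1, r2), lambda t: t[0] >= 2), [1])
print(sorted(result))
```

Because each function takes sets of tuples and returns a set of tuples, any output can feed any other operation, which is the sense in which the algebra is closed.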

Unary Relational Operations: SELECT and PROJECT

The SELECT Operation


SELECT Operation

- The SELECT operation is used to choose a subset of the tuples from a relation that satisfies a selection condition (filter).
- Example:

... calculus.[2] In tuple relational calculus, variables range over tuples, whereas in domain relational calculus, variables range over the domains (values) of attributes. In Appendix C we give an overview of the Query-By-Example (QBE) language, which is a graphical user-friendly relational language based on domain relational calculus. Section 6.8 summarizes the chapter.

For the reader who is interested in a less detailed introduction to formal relational languages, Sections 6.4, 6.6, and 6.7 may be skipped.

6.1 Unary Relational Operations: SELECT and PROJECT

6.1.1 The SELECT Operation

The SELECT operation is used to choose a subset of the tuples from a relation that satisfies a selection condition.[3] One can consider the SELECT operation to be a filter that keeps only those tuples that satisfy a qualifying condition. Alternatively, we can consider the SELECT operation to restrict the tuples in a relation to only those tuples that satisfy the condition. The SELECT operation can also be visualized as a horizontal partition of the relation into two sets of tuples: those tuples that satisfy the condition and are selected, and those tuples that do not satisfy the condition and are discarded. For example, to select the EMPLOYEE tuples whose department is 4, or those whose salary is greater than $30,000, we can individually specify each of these two conditions with a SELECT operation as follows:

\[ \sigma_{\text{Dno}=4}(\text{EMPLOYEE}) \]
\[ \sigma_{\text{Salary}>30000}(\text{EMPLOYEE}) \]

In general, the SELECT operation is denoted by

\[ \sigma_{\langle\text{selection condition}\rangle}(R) \]

where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is a Boolean expression (condition) specified on the attributes of relation R. Notice that R is generally a relational algebra expression whose result is a relation; the simplest such expression is just the name of a database relation. The relation resulting from the SELECT operation has the same attributes as R.

The Boolean expression specified in <selection condition> is made up of a number of clauses of the form

    <attribute name> <comparison op> <constant value>

or

    <attribute name> <comparison op> <attribute name>

[2] In this chapter no familiarity with first-order predicate calculus (which deals with quantified variables and values) is assumed.
[3] The SELECT operation is different from the SELECT clause of SQL. The SELECT operation chooses tuples from a table, and is sometimes called a RESTRICT or FILTER operation.


SELECT Operation

- The Boolean expression specified in <selection condition> is made up of a number of clauses of the form <attribute name> <comparison op> <constant value> or <attribute name> <comparison op> <attribute name>.
- Example: select the tuples for all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000.


where <attribute name> is the name of an attribute of R, <comparison op> is normally one of the operators {=, <, ≤, >, ≥, ≠}, and <constant value> is a constant value from the attribute domain. Clauses can be connected by the standard Boolean operators and, or, and not to form a general selection condition. For example, to select the tuples for all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000, we can specify the following SELECT operation:

\[ \sigma_{(\text{Dno}=4 \text{ AND } \text{Salary}>25000) \text{ OR } (\text{Dno}=5 \text{ AND } \text{Salary}>30000)}(\text{EMPLOYEE}) \]

The result is shown in Figure 6.1(a).

Notice that all the comparison operators in the set {=, <, ≤, >, ≥, ≠} can apply to attributes whose domains are ordered values, such as numeric or date domains. Domains of strings of characters are also considered to be ordered based on the collating sequence of the characters. If the domain of an attribute is a set of unordered values, then only the comparison operators in the set {=, ≠} can be used. An example of an unordered domain is the domain Color = { ‘red’, ‘blue’, ‘green’, ‘white’, ‘yellow’, ...}, where no order is specified among the various colors. Some domains allow additional types of comparison operators; for example, a domain of character strings may allow the comparison operator SUBSTRING_OF.

In general, the result of a SELECT operation can be determined as follows. The <selection condition> is applied independently to each individual tuple t in R. This is done by substituting each occurrence of an attribute Ai in the selection condition with its value in the tuple t[Ai]. If the condition evaluates to TRUE, then tuple t is ...

(a)
Fname     Minit  Lname    Ssn        Bdate       Address                   Sex  Salary  Super_ssn  Dno
Franklin  T      Wong     333445555  1955-12-08  638 Voss, Houston, TX     M    40000   888665555  5
Jennifer  S      Wallace  987654321  1941-06-20  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  1962-09-15  975 Fire Oak, Humble, TX  M    38000   333445555  5

(b)
Lname    Fname     Salary
Smith    John      30000
Wong     Franklin  40000
Zelaya   Alicia    25000
Wallace  Jennifer  43000
Narayan  Ramesh    38000
English  Joyce     25000
Jabbar   Ahmad     25000
Borg     James     55000

(c)
Sex  Salary
M    30000
M    40000
F    25000
F    43000
M    38000
M    25000
M    55000

Figure 6.1 Results of SELECT and PROJECT operations. (a) \(\sigma_{(\text{Dno}=4 \text{ AND } \text{Salary}>25000) \text{ OR } (\text{Dno}=5 \text{ AND } \text{Salary}>30000)}(\text{EMPLOYEE})\). (b) \(\pi_{\text{Lname, Fname, Salary}}(\text{EMPLOYEE})\). (c) \(\pi_{\text{Sex, Salary}}(\text{EMPLOYEE})\).
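The tuple-by-tuple evaluation described above can be sketched in Python as follows. The `select` helper and the dictionary layout are assumptions for illustration. Names and salaries come from Figure 6.1, while the Dno values of employees that do not appear in Figure 6.1(a) are assumed from the textbook's COMPANY example.

```python
# Reduced EMPLOYEE relation: only the attributes needed for the condition.
# Dno values for rows not shown in Figure 6.1(a) are assumptions.
EMPLOYEE = [
    {"Fname": "John",     "Lname": "Smith",   "Salary": 30000, "Dno": 5},
    {"Fname": "Franklin", "Lname": "Wong",    "Salary": 40000, "Dno": 5},
    {"Fname": "Alicia",   "Lname": "Zelaya",  "Salary": 25000, "Dno": 4},
    {"Fname": "Jennifer", "Lname": "Wallace", "Salary": 43000, "Dno": 4},
    {"Fname": "Ramesh",   "Lname": "Narayan", "Salary": 38000, "Dno": 5},
    {"Fname": "Joyce",    "Lname": "English", "Salary": 25000, "Dno": 5},
    {"Fname": "Ahmad",    "Lname": "Jabbar",  "Salary": 25000, "Dno": 4},
    {"Fname": "James",    "Lname": "Borg",    "Salary": 55000, "Dno": 1},
]

def select(relation, condition):
    """Apply the condition to each tuple independently; keep those that are TRUE."""
    return [t for t in relation if condition(t)]

def condition(t):
    # (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)
    return (t["Dno"] == 4 and t["Salary"] > 25000) or (
        t["Dno"] == 5 and t["Salary"] > 30000
    )

result = select(EMPLOYEE, condition)
selected_names = [t["Lname"] for t in result]
print(selected_names)
```

Under these assumptions the selected last names are Wong, Wallace, and Narayan, the same three employees listed in Figure 6.1(a). The result keeps every attribute of the input, so its degree is unchanged.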


Example


Select the tuples for all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000.

Unary Relational Operations: SELECT and PROJECT (cont'd.)

- <selection condition> is applied independently to each individual tuple t in R
  - If the condition evaluates to TRUE, the tuple is selected
- Boolean conditions AND, OR, and NOT
- The SELECT operator is unary
  - Applied to a single relation

... selected. All the selected tuples appear in the result of the SELECT operation. The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:

- (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise, it is FALSE.
- (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is FALSE.
- (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.

The SELECT operator is unary; that is, it is applied to a single relation. Moreover, the selection operation is applied to each tuple individually; hence, selection conditions cannot involve more than one tuple. The degree of the relation resulting from a SELECT operation (its number of attributes) is the same as the degree of R. The number of tuples in the resulting relation is always less than or equal to the number of tuples in R. That is, \(|\sigma_C(R)| \le |R|\) for any condition C. The fraction of tuples selected by a selection condition is referred to as the selectivity of the condition.

Notice that the SELECT operation is commutative; that is,

\[ \sigma_{\langle\text{cond1}\rangle}(\sigma_{\langle\text{cond2}\rangle}(R)) = \sigma_{\langle\text{cond2}\rangle}(\sigma_{\langle\text{cond1}\rangle}(R)) \]

Hence, a sequence of SELECTs can be applied in any order. In addition, we can always combine a cascade (or sequence) of SELECT operations into a single SELECT operation with a conjunctive (AND) condition; that is,

\[ \sigma_{\langle\text{cond1}\rangle}(\sigma_{\langle\text{cond2}\rangle}(\ldots(\sigma_{\langle\text{condn}\rangle}(R))\ldots)) = \sigma_{\langle\text{cond1}\rangle \text{ AND } \langle\text{cond2}\rangle \text{ AND } \ldots \text{ AND } \langle\text{condn}\rangle}(R) \]

In SQL, the SELECT condition is typically specified in the WHERE clause of a query. For example, the following operation:

\[ \sigma_{\text{Dno}=4 \text{ AND } \text{Salary}>25000}(\text{EMPLOYEE}) \]

would correspond to the following SQL query:

    SELECT *
    FROM EMPLOYEE
    WHERE Dno=4 AND Salary>25000;
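To try the SQL query above, one option is an in-memory SQLite database driven from Python's standard `sqlite3` module. The three-column table is a hypothetical slice of EMPLOYEE. Names and salaries follow Figure 6.1(b), and the Dno values are assumptions taken from the textbook's COMPANY example.

```python
import sqlite3

# Hypothetical slice of EMPLOYEE: (Lname, Salary, Dno). Dno values are assumed.
rows = [
    ("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4),
    ("Wallace", 43000, 4), ("Narayan", 38000, 5), ("English", 25000, 5),
    ("Jabbar", 25000, 4), ("Borg", 55000, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", rows)

# The WHERE clause carries the selection condition.
sql_result = conn.execute(
    "SELECT * FROM EMPLOYEE WHERE Dno=4 AND Salary>25000"
).fetchall()
conn.close()

# The same selection written directly as a filter over the tuples.
algebra_result = [t for t in rows if t[2] == 4 and t[1] > 25000]

print(sql_result)
```

With this sample data both forms return the single Wallace tuple, which illustrates that the WHERE clause plays the role of the selection condition.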

6.1.2 The PROJECT Operation

If we think of a relation as a table, the SELECT operation chooses some of the rows from the table while discarding other rows. The PROJECT operation, on the other hand, selects certain columns from the table and discards the other columns. If we are interested in only certain attributes of a relation, we use the PROJECT operation to project the relation over these attributes only. Therefore, the result of the PROJECT operation can be visualized as a vertical partition of the relation into two relations: one has the needed columns (attributes) and contains the result of the operation, and the other contains the discarded columns. For example, to list each employee’s first and last name and salary, we can use the PROJECT operation as follows:

\[ \pi_{\text{Lname, Fname, Salary}}(\text{EMPLOYEE}) \]
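A minimal Python sketch of PROJECT, assuming the result is treated as a set of tuples so that duplicate rows collapse. That assumption is consistent with Figure 6.1(c), which lists seven rows for eight employees. The `project` helper is hypothetical, and the Sex values are assumptions taken from the textbook's COMPANY example.

```python
# Reduced EMPLOYEE relation; names and salaries follow Figure 6.1(b).
# Sex values are assumptions, chosen to be consistent with Figure 6.1(c).
EMPLOYEE = [
    {"Fname": "John",     "Lname": "Smith",   "Sex": "M", "Salary": 30000},
    {"Fname": "Franklin", "Lname": "Wong",    "Sex": "M", "Salary": 40000},
    {"Fname": "Alicia",   "Lname": "Zelaya",  "Sex": "F", "Salary": 25000},
    {"Fname": "Jennifer", "Lname": "Wallace", "Sex": "F", "Salary": 43000},
    {"Fname": "Ramesh",   "Lname": "Narayan", "Sex": "M", "Salary": 38000},
    {"Fname": "Joyce",    "Lname": "English", "Sex": "F", "Salary": 25000},
    {"Fname": "Ahmad",    "Lname": "Jabbar",  "Sex": "M", "Salary": 25000},
    {"Fname": "James",    "Lname": "Borg",    "Sex": "M", "Salary": 55000},
]

def project(relation, attributes):
    """Keep only the listed attributes; a set drops duplicate tuples."""
    return {tuple(t[a] for a in attributes) for t in relation}

names_and_salaries = project(EMPLOYEE, ["Lname", "Fname", "Salary"])
sex_and_salary = project(EMPLOYEE, ["Sex", "Salary"])

print(len(names_and_salaries), len(sex_and_salary))
```

Projecting on Lname, Fname, and Salary keeps all eight tuples, while projecting on Sex and Salary leaves seven because two employees share the pair (F, 25000).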


Unary Relational Operations: SELECT and PROJECT (cont'd.)

- The fraction of tuples selected by a selection condition is referred to as the selectivity of the condition.
- The SELECT operation is commutative.
- We can always combine a cascade of SELECT operations into a single operation with an AND condition.
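These three properties can be checked on sample data with a short Python sketch. The `select` helper, the tuple layout (Lname, Salary, Dno), and the Dno values are assumptions for illustration. The check only demonstrates the identities on one instance and is not a proof.

```python
def select(relation, condition):
    """Keep the tuples for which the condition is true (set semantics)."""
    return {t for t in relation if condition(t)}

# Hypothetical slice of EMPLOYEE: (Lname, Salary, Dno). Dno values are assumed.
EMPLOYEE = {
    ("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4),
    ("Wallace", 43000, 4), ("Narayan", 38000, 5), ("English", 25000, 5),
    ("Jabbar", 25000, 4), ("Borg", 55000, 1),
}

def in_dept_4(t):
    return t[2] == 4

def earns_over_25000(t):
    return t[1] > 25000

# Commutativity: the order of two SELECTs does not matter.
first_order = select(select(EMPLOYEE, earns_over_25000), in_dept_4)
second_order = select(select(EMPLOYEE, in_dept_4), earns_over_25000)

# Cascade: two SELECTs equal one SELECT with an AND condition.
combined = select(EMPLOYEE, lambda t: in_dept_4(t) and earns_over_25000(t))

# Selectivity: fraction of tuples that satisfy the condition.
selectivity = len(combined) / len(EMPLOYEE)

print(first_order == second_order == combined, selectivity)
```

On this instance all three expressions select only the Wallace tuple, so the selectivity of the combined condition is 1/8.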

6.1 Unary Relational Operations: SELECT and PROJECT 149

selected. All the selected tuples appear in the result of the SELECT operation. TheBoolean conditions AND, OR, and NOT have their normal interpretation, as follows:

! (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; other-wise, it is FALSE.

! (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE;otherwise, it is FALSE.

! (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.

The SELECT operator is unary; that is, it is applied to a single relation. Moreover,the selection operation is applied to each tuple individually; hence, selection condi-tions cannot involve more than one tuple. The degree of the relation resulting froma SELECT operation—its number of attributes—is the same as the degree of R. Thenumber of tuples in the resulting relation is always less than or equal to the numberof tuples in R. That is, |!c (R)| " |R| for any condition C. The fraction of tuplesselected by a selection condition is referred to as the selectivity of the condition.

Notice that the SELECT operation is commutative; that is,

!<cond1>(!<cond2>(R)) = !<cond2>(!<cond1>(R))

Hence, a sequence of SELECTs can be applied in any order. In addition, we canalways combine a cascade (or sequence) of SELECT operations into a singleSELECT operation with a conjunctive (AND) condition; that is,

!<cond1>(!<cond2>(...(!<condn>(R)) ...)) = !<cond1> AND<cond2> AND...AND <condn>(R)

In SQL, the SELECT condition is typically specified in the WHERE clause of a query.For example, the following operation:

!Dno=4 AND Salary>25000 (EMPLOYEE)

would correspond to the following SQL query:

SELECT *FROM EMPLOYEEWHERE Dno=4 AND Salary>25000;

6.1.2 The PROJECT OperationIf we think of a relation as a table, the SELECT operation chooses some of the rowsfrom the table while discarding other rows. The PROJECT operation, on the otherhand, selects certain columns from the table and discards the other columns. If we areinterested in only certain attributes of a relation, we use the PROJECT operation toproject the relation over these attributes only. Therefore, the result of the PROJECToperation can be visualized as a vertical partition of the relation into two relations:one has the needed columns (attributes) and contains the result of the operation,and the other contains the discarded columns. For example, to list each employee’sfirst and last name and salary, we can use the PROJECT operation as follows:

#Lname, Fname, Salary(EMPLOYEE)

6.1 Unary Relational Operations: SELECT and PROJECT 149

selected. All the selected tuples appear in the result of the SELECT operation. TheBoolean conditions AND, OR, and NOT have their normal interpretation, as follows:

! (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; other-wise, it is FALSE.

! (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE;otherwise, it is FALSE.

! (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.

The SELECT operator is unary; that is, it is applied to a single relation. Moreover,the selection operation is applied to each tuple individually; hence, selection condi-tions cannot involve more than one tuple. The degree of the relation resulting froma SELECT operation—its number of attributes—is the same as the degree of R. Thenumber of tuples in the resulting relation is always less than or equal to the numberof tuples in R. That is, |!c (R)| " |R| for any condition C. The fraction of tuplesselected by a selection condition is referred to as the selectivity of the condition.

Notice that the SELECT operation is commutative; that is,

!<cond1>(!<cond2>(R)) = !<cond2>(!<cond1>(R))

Hence, a sequence of SELECTs can be applied in any order. In addition, we canalways combine a cascade (or sequence) of SELECT operations into a singleSELECT operation with a conjunctive (AND) condition; that is,

!<cond1>(!<cond2>(...(!<condn>(R)) ...)) = !<cond1> AND<cond2> AND...AND <condn>(R)

In SQL, the SELECT condition is typically specified in the WHERE clause of a query.For example, the following operation:

!Dno=4 AND Salary>25000 (EMPLOYEE)

would correspond to the following SQL query:

SELECT *FROM EMPLOYEEWHERE Dno=4 AND Salary>25000;

6.1.2 The PROJECT Operation

If we think of a relation as a table, the SELECT operation chooses some of the rows from the table while discarding other rows. The PROJECT operation, on the other hand, selects certain columns from the table and discards the other columns. If we are interested in only certain attributes of a relation, we use the PROJECT operation to project the relation over these attributes only. Therefore, the result of the PROJECT operation can be visualized as a vertical partition of the relation into two relations: one has the needed columns (attributes) and contains the result of the operation, and the other contains the discarded columns. For example, to list each employee’s first and last name and salary, we can use the PROJECT operation as follows:

π_{Lname, Fname, Salary}(EMPLOYEE)

All the selected tuples appear in the result of the SELECT operation. The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:

- (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise, it is FALSE.
- (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is FALSE.
- (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.

The PROJECT Operation
- Selects certain columns from the table and discards the other columns: π_{<attribute list>}(R)
- Degree
  - Number of attributes in <attribute list>
- Duplicate elimination
  - Result of PROJECT operation is a set of distinct tuples

The resulting relation is shown in Figure 6.1(b). The general form of the PROJECT operation is

π_{<attribute list>}(R)

where π (pi) is the symbol used to represent the PROJECT operation, and <attribute list> is the desired sublist of attributes from the attributes of relation R. Again, notice that R is, in general, a relational algebra expression whose result is a relation, which in the simplest case is just the name of a database relation. The result of the PROJECT operation has only the attributes specified in <attribute list> in the same order as they appear in the list. Hence, its degree is equal to the number of attributes in <attribute list>.

If the attribute list includes only nonkey attributes of R, duplicate tuples are likely to occur. The PROJECT operation removes any duplicate tuples, so the result of the PROJECT operation is a set of distinct tuples, and hence a valid relation. This is known as duplicate elimination. For example, consider the following PROJECT operation:

π_{Sex, Salary}(EMPLOYEE)

The result is shown in Figure 6.1(c). Notice that the tuple <‘F’, 25000> appears only once in Figure 6.1(c), even though this combination of values appears twice in the EMPLOYEE relation. Duplicate elimination involves sorting or some other technique to detect duplicates and thus adds more processing. If duplicates are not eliminated, the result would be a multiset or bag of tuples rather than a set. This was not permitted in the formal relational model, but is allowed in SQL (see Section 4.3).
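The difference between a bag result and a set result can be made concrete with a short sketch. The helper names `project_bag` and `project` and the four sample rows are assumptions for illustration. Duplicates are detected here with a hash set, which is one possible technique alongside the sorting mentioned above.

```python
# Illustrative rows; two employees share the combination ('F', 25000).
EMPLOYEE = [
    {"Lname": "Smith", "Sex": "M", "Salary": 30000},
    {"Lname": "Zelaya", "Sex": "F", "Salary": 25000},
    {"Lname": "English", "Sex": "F", "Salary": 25000},
    {"Lname": "Wallace", "Sex": "F", "Salary": 43000},
]


def project_bag(attrs, relation):
    """Keep only the listed attributes; duplicates are kept (bag semantics)."""
    return [tuple(t[a] for a in attrs) for t in relation]


def project(attrs, relation):
    """PROJECT with duplicate elimination (set semantics), first-seen order."""
    seen = set()
    result = []
    for row in project_bag(attrs, relation):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


bag = project_bag(("Sex", "Salary"), EMPLOYEE)
distinct = project(("Sex", "Salary"), EMPLOYEE)
print(len(bag), len(distinct))
```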

The number of tuples in a relation resulting from a PROJECT operation is always less than or equal to the number of tuples in R. If the projection list is a superkey of R (that is, it includes some key of R), the resulting relation has the same number of tuples as R. Moreover,

π_{<list1>}(π_{<list2>}(R)) = π_{<list1>}(R)

as long as <list2> contains the attributes in <list1>; otherwise, the left-hand side is an incorrect expression. It is also noteworthy that commutativity does not hold on PROJECT.
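These three claims (the superkey case, the nested-projection identity, and the failure of commutativity) can be exercised in a few lines. This is a sketch under assumed names and sample data; `project` is a hypothetical helper that raises KeyError when a requested attribute is missing from its input.

```python
def project(attrs, relation):
    """PROJECT over a list of dicts, with duplicate elimination."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}  # KeyError if an attribute is missing
        if row not in result:
            result.append(row)
    return result


# Illustrative rows; Ssn is assumed to be a key of the relation.
EMPLOYEE = [
    {"Ssn": "123456789", "Lname": "Smith", "Sex": "M", "Salary": 30000},
    {"Ssn": "999887777", "Lname": "Zelaya", "Sex": "F", "Salary": 25000},
    {"Ssn": "453453453", "Lname": "English", "Sex": "F", "Salary": 25000},
]

list1 = ["Sex"]
list2 = ["Sex", "Salary"]  # contains every attribute of list1

nested = project(list1, project(list2, EMPLOYEE))
direct = project(list1, EMPLOYEE)

# A projection list that includes a key keeps one result tuple per input tuple.
with_key = project(["Ssn", "Sex"], EMPLOYEE)

# Swapping the two lists is not a valid expression: Salary is already gone.
try:
    project(list2, project(list1, EMPLOYEE))
    swapped_ok = True
except KeyError:
    swapped_ok = False

print(nested == direct, len(with_key), swapped_ok)
```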

In SQL, the PROJECT attribute list is specified in the SELECT clause of a query. For example, the following operation:

π_{Sex, Salary}(EMPLOYEE)

would correspond to the following SQL query:

SELECT DISTINCT Sex, Salary
FROM EMPLOYEE

Notice that if we remove the keyword DISTINCT from this SQL query, then duplicates will not be eliminated. This option is not available in the formal relational algebra.
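The effect of DISTINCT can be observed with Python's built-in sqlite3 module. The reduced table and its three rows below are assumptions chosen so that one (Sex, Salary) combination occurs twice. SQL does not guarantee row order without ORDER BY, so only the row counts are printed.

```python
import sqlite3

# In-memory database with a reduced, assumed EMPLOYEE schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Sex TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("Smith", "M", 30000), ("Zelaya", "F", 25000), ("English", "F", 25000)],
)

# With DISTINCT: set semantics, as in the formal PROJECT operation.
with_distinct = conn.execute(
    "SELECT DISTINCT Sex, Salary FROM EMPLOYEE"
).fetchall()

# Without DISTINCT: bag semantics, duplicates are kept.
without_distinct = conn.execute("SELECT Sex, Salary FROM EMPLOYEE").fetchall()

print(len(with_distinct), len(without_distinct))
conn.close()
```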

where <attribute name> is the name of an attribute of R, <comparison op> is normally one of the operators {=, <, ≤, >, ≥, ≠}, and <constant value> is a constant value from the attribute domain. Clauses can be connected by the standard Boolean operators and, or, and not to form a general selection condition. For example, to select the tuples for all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000, we can specify the following SELECT operation:

σ_{(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)}(EMPLOYEE)

The result is shown in Figure 6.1(a).
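As a rough illustration, the compound condition can be written as a Boolean function that is evaluated once per tuple. The rows below keep only three attributes; the names and salaries follow Figure 6.1(b), while the Dno values for employees who do not appear in Figures 6.1(a) or 6.2 are assumptions, as is the function name.

```python
# Reduced EMPLOYEE relation: one dict per tuple.
EMPLOYEE = [
    {"Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya", "Dno": 4, "Salary": 25000},   # Dno assumed
    {"Lname": "Wallace", "Dno": 4, "Salary": 43000},
    {"Lname": "Narayan", "Dno": 5, "Salary": 38000},
    {"Lname": "English", "Dno": 5, "Salary": 25000},
    {"Lname": "Jabbar", "Dno": 4, "Salary": 25000},   # Dno assumed
    {"Lname": "Borg", "Dno": 1, "Salary": 55000},     # Dno assumed
]


def selection_condition(t):
    """(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000), for one tuple t."""
    return (t["Dno"] == 4 and t["Salary"] > 25000) or (
        t["Dno"] == 5 and t["Salary"] > 30000
    )


# The condition is applied to each tuple on its own; TRUE means "selected".
selected = [t for t in EMPLOYEE if selection_condition(t)]
names = [t["Lname"] for t in selected]
print(names)
```

With these rows the selected employees are Wong, Wallace, and Narayan, the same three shown in Figure 6.1(a).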

Notice that all the comparison operators in the set {=, <, ≤, >, ≥, ≠} can apply to attributes whose domains are ordered values, such as numeric or date domains. Domains of strings of characters are also considered to be ordered based on the collating sequence of the characters. If the domain of an attribute is a set of unordered values, then only the comparison operators in the set {=, ≠} can be used. An example of an unordered domain is the domain Color = { ‘red’, ‘blue’, ‘green’, ‘white’, ‘yellow’, ...}, where no order is specified among the various colors. Some domains allow additional types of comparison operators; for example, a domain of character strings may allow the comparison operator SUBSTRING_OF.

In general, the result of a SELECT operation can be determined as follows. The <selection condition> is applied independently to each individual tuple t in R. This is done by substituting each occurrence of an attribute Ai in the selection condition with its value in the tuple t[Ai]. If the condition evaluates to TRUE, then tuple t is selected.

Figure 6.1 Results of SELECT and PROJECT operations.

(a) σ_{(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)}(EMPLOYEE)

Fname     Minit  Lname    Ssn        Bdate       Address                   Sex  Salary  Super_ssn  Dno
Franklin  T      Wong     333445555  1955-12-08  638 Voss, Houston, TX     M    40000   888665555  5
Jennifer  S      Wallace  987654321  1941-06-20  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  1962-09-15  975 Fire Oak, Humble, TX  M    38000   333445555  5

(b) π_{Lname, Fname, Salary}(EMPLOYEE)

Lname    Fname     Salary
Smith    John      30000
Wong     Franklin  40000
Zelaya   Alicia    25000
Wallace  Jennifer  43000
Narayan  Ramesh    38000
English  Joyce     25000
Jabbar   Ahmad     25000
Borg     James     55000

(c) π_{Sex, Salary}(EMPLOYEE)

Sex  Salary
M    30000
M    40000
F    25000
F    43000
M    38000
M    25000
M    55000

Sequences of Operations and the RENAME Operation
- In-line expression:
- List each employee’s first and last name and salary: π_{Lname, Fname, Salary}(EMPLOYEE)
- π_{<list1>}(π_{<list2>}(R)) = π_{<list1>}(R), as long as <list2> contains the attributes in <list1>; otherwise, the left-hand side is an incorrect expression.
- It is also noteworthy that commutativity does not hold on PROJECT.

Sequences of Operations and the RENAME Operation
- The relations shown in Figure 6.1 that depict operation results do not have any names.
- In general, for most queries, we need to apply several relational algebra operations one after the other. There are two ways:
  - Write the operations as a single relational algebra expression by nesting the operations, or
  - Write one operation at a time and create intermediate result relations. In this case, we must give names to the relations that hold the intermediate results.
- Example: retrieve the first name, last name, and salary of all employees who work in department number 5.

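Sequences of Operations
- Query: retrieve the first name, last name, and salary of all employees who work in department number 5.
- Method 1: In-line expression
  - π_{Fname, Lname, Salary}(σ_{Dno=5}(EMPLOYEE))
- Method 2: Sequence of operations, giving a name to each intermediate relation
  - DEP5_EMPS ← σ_{Dno=5}(EMPLOYEE)
  - RESULT ← π_{Fname, Lname, Salary}(DEP5_EMPS)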

6.1.3 Sequences of Operations and the RENAME Operation

The relations shown in Figure 6.1 that depict operation results do not have any names. In general, for most queries, we need to apply several relational algebra operations one after the other. Either we can write the operations as a single relational algebra expression by nesting the operations, or we can apply one operation at a time and create intermediate result relations. In the latter case, we must give names to the relations that hold the intermediate results. For example, to retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a SELECT and a PROJECT operation. We can write a single relational algebra expression, also known as an in-line expression, as follows:

π_{Fname, Lname, Salary}(σ_{Dno=5}(EMPLOYEE))

Figure 6.2(a) shows the result of this in-line relational algebra expression. Alternatively, we can explicitly show the sequence of operations, giving a name to each intermediate relation, as follows:

DEP5_EMPS ← σ_{Dno=5}(EMPLOYEE)
RESULT ← π_{Fname, Lname, Salary}(DEP5_EMPS)
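Both styles map naturally onto code: the in-line expression becomes a nested function call, and the named intermediate relations become ordinary variables. The sketch below assumes hypothetical `select` and `project` helpers over a list of dictionaries and uses a reduced EMPLOYEE relation with illustrative rows.

```python
def select(cond, relation):
    """SELECT: keep the tuples that satisfy cond."""
    return [t for t in relation if cond(t)]


def project(attrs, relation):
    """PROJECT: keep the listed attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result


# Reduced, illustrative EMPLOYEE relation.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"Fname": "Franklin", "Lname": "Wong", "Salary": 40000, "Dno": 5},
    {"Fname": "Jennifer", "Lname": "Wallace", "Salary": 43000, "Dno": 4},
    {"Fname": "Ramesh", "Lname": "Narayan", "Salary": 38000, "Dno": 5},
    {"Fname": "Joyce", "Lname": "English", "Salary": 25000, "Dno": 5},
]

attrs = ["Fname", "Lname", "Salary"]

# Method 1: in-line (nested) expression.
inline = project(attrs, select(lambda t: t["Dno"] == 5, EMPLOYEE))

# Method 2: one operation at a time, naming each intermediate relation.
DEP5_EMPS = select(lambda t: t["Dno"] == 5, EMPLOYEE)
RESULT = project(attrs, DEP5_EMPS)

print(inline == RESULT, [t["Fname"] for t in RESULT])
```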

It is sometimes simpler to break down a complex sequence of operations by specifying intermediate result relations than to write a single relational algebra expression. We can also use this technique to rename the attributes in the intermediate and result relations.

Figure 6.2 Results of a sequence of operations.

(a) π_{Fname, Lname, Salary}(σ_{Dno=5}(EMPLOYEE))

Fname     Lname    Salary
John      Smith    30000
Franklin  Wong     40000
Ramesh    Narayan  38000
Joyce     English  25000

(b) Using intermediate relations and renaming of attributes.

TEMP

Fname     Minit  Lname    Ssn        Bdate       Address                   Sex  Salary  Super_ssn  Dno
John      B      Smith    123456789  1965-01-09  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  1955-12-08  638 Voss, Houston, TX     M    40000   888665555  5
Ramesh    K      Narayan  666884444  1962-09-15  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  1972-07-31  5631 Rice, Houston, TX    F    25000   333445555  5

R

First_name  Last_name  Salary
John        Smith      30000
Franklin    Wong       40000
Ramesh      Narayan    38000
Joyce       English    25000

6.1 Unary Relational Operations: SELECT and PROJECT 151

6.1.3 Sequences of Operations and the RENAME OperationThe relations shown in Figure 6.1 that depict operation results do not have anynames. In general, for most queries, we need to apply several relational algebraoperations one after the other. Either we can write the operations as a singlerelational algebra expression by nesting the operations, or we can apply one oper-ation at a time and create intermediate result relations. In the latter case, we mustgive names to the relations that hold the intermediate results. For example, toretrieve the first name, last name, and salary of all employees who work in depart-ment number 5, we must apply a SELECT and a PROJECT operation. We can write asingle relational algebra expression, also known as an in-line expression, as follows:

!Fname, Lname, Salary("Dno=5(EMPLOYEE))

Figure 6.2(a) shows the result of this in-line relational algebra expression.Alternatively, we can explicitly show the sequence of operations, giving a name toeach intermediate relation, as follows:

DEP5_EMPS # "Dno=5(EMPLOYEE)RESULT # !Fname, Lname, Salary(DEP5_EMPS)

It is sometimes simpler to break down a complex sequence of operations by specify-ing intermediate result relations than to write a single relational algebra expression.We can also use this technique to rename the attributes in the intermediate and

(a)

Fname     Lname    Salary
John      Smith    30000
Franklin  Wong     40000
Ramesh    Narayan  38000
Joyce     English  25000

(b)

TEMP

Fname     Minit  Lname    Ssn        Bdate       Address                   Sex  Salary  Super_ssn  Dno
John      B      Smith    123456789  1965-01-09  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  1955-12-08  638 Voss, Houston, TX     M    40000   888665555  5
Ramesh    K      Narayan  666884444  1962-09-15  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  1972-07-31  5631 Rice, Houston, TX    F    25000   333445555  5

R

First_name  Last_name  Salary
John        Smith      30000
Franklin    Wong       40000
Ramesh      Narayan    38000
Joyce       English    25000

Figure 6.2: Results of a sequence of operations. (a) π_{Fname, Lname, Salary}(σ_{Dno=5}(EMPLOYEE)). (b) Using intermediate relations and renaming of attributes.


RENAME Operation

}  It is sometimes simpler to break down a complex sequence of operations by specifying intermediate result relations than to write a single relational algebra expression.

}  We can also use this technique to rename the attributes in the intermediate and result relations.


Renaming attributes can be useful in connection with more complex operations such as UNION and JOIN, as we shall see. To rename the attributes in a relation, we simply list the new attribute names in parentheses, as in the following example:

TEMP ← σ_{Dno=5}(EMPLOYEE)
R(First_name, Last_name, Salary) ← π_{Fname, Lname, Salary}(TEMP)

These two operations are illustrated in Figure 6.2(b).

If no renaming is applied, the names of the attributes in the resulting relation of a SELECT operation are the same as those in the original relation and in the same order. For a PROJECT operation with no renaming, the resulting relation has the same attribute names as those in the projection list and in the same order in which they appear in the list.

We can also define a formal RENAME operation, which can rename the relation name, the attribute names, or both, as a unary operator. The general RENAME operation when applied to a relation R of degree n is denoted by any of the following three forms:

ρ_{S(B1, B2, ..., Bn)}(R) or ρ_{S}(R) or ρ_{(B1, B2, ..., Bn)}(R)

where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1, B2, ..., Bn are the new attribute names. The first expression renames both the relation and its attributes, the second renames the relation only, and the third renames the attributes only. If the attributes of R are (A1, A2, ..., An) in that order, then each Ai is renamed as Bi.
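The positional rule (each Ai becomes Bi) can be sketched in Python as follows. The function name `rename` and the list-of-dictionaries representation are assumptions made for this illustration; it relies on Python dictionaries preserving insertion order, and it covers only the attribute-renaming form, since renaming the relation itself corresponds here to binding the result to a new variable.

```python
def rename(relation, new_attributes):
    """RENAME (rho), attribute form: the i-th attribute takes the i-th new name."""
    renamed = []
    for t in relation:
        if len(t) != len(new_attributes):
            raise ValueError("need exactly one new name per attribute")
        # dicts keep insertion order, so values() follows the attribute order
        renamed.append(dict(zip(new_attributes, t.values())))
    return renamed

# Two tuples of the projected relation, used as sample input.
TEMP = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000},
    {"Fname": "Joyce", "Lname": "English", "Salary": 25000},
]

# Corresponds to R(First_name, Last_name, Salary) in the example above.
R = rename(TEMP, ["First_name", "Last_name", "Salary"])
```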

In SQL, a single query typically represents a complex relational algebra expression. Renaming in SQL is accomplished by aliasing using AS, as in the following example:

SELECT E.Fname AS First_name, E.Lname AS Last_name, E.Salary AS Salary
FROM EMPLOYEE AS E
WHERE E.Dno = 5;
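To see the aliases take effect, the query can be run against a throwaway in-memory database with Python's standard `sqlite3` module. This is a sketch under stated assumptions: the table definition and the three sample rows are reduced for illustration, and SQLite stands in for whatever SQL system is actually used.

```python
import sqlite3

# In-memory database with a reduced EMPLOYEE table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Salary INTEGER, Dno INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", 30000, 5),
        ("Alicia", "Zelaya", 25000, 4),
        ("Joyce", "English", 25000, 5),
    ],
)

cursor = conn.execute(
    "SELECT E.Fname AS First_name, E.Lname AS Last_name, E.Salary AS Salary "
    "FROM EMPLOYEE AS E WHERE E.Dno = 5"
)
# The first field of each description entry is the (aliased) column name.
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
conn.close()
```

The result columns carry the alias names rather than the stored attribute names, which is the SQL counterpart of the renaming shown in the algebra.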

6.2 Relational Algebra Operations from Set Theory

6.2.1 The UNION, INTERSECTION, and MINUS Operations

The next group of relational algebra operations are the standard mathematical operations on sets. For example, to retrieve the Social Security numbers of all employees who either work in department 5 or directly supervise an employee who works in department 5, we can use the UNION operation as follows [4]:

[4] As a single relational algebra expression, this becomes Result ← π_{Ssn}(σ_{Dno=5}(EMPLOYEE)) ∪ π_{Super_ssn}(σ_{Dno=5}(EMPLOYEE))


RENAME Operation

}  We can also define a formal RENAME operation, which can rename the relation name, the attribute names, or both, as a unary operator.

}  The general RENAME operation when applied to a relation R of degree n is denoted by any of the following three forms:

ρ_{S(B1, B2, ..., Bn)}(R) or ρ_{S}(R) or ρ_{(B1, B2, ..., Bn)}(R)

}  where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1, B2, ..., Bn are the new attribute names.


Relational Algebra Operations from Set Theory

The UNION, INTERSECTION, and MINUS Operations


The UNION, INTERSECTION, and MINUS Operations

}  The next group of relational algebra operations are the standard mathematical operations on sets.

}  Example: to retrieve the Social Security numbers of all employees who either work in department 5 or directly supervise an employee who works in department 5, we can use the UNION operation as follows:


DEP5_EMPS ← σ_{Dno=5}(EMPLOYEE)
RESULT1 ← π_{Ssn}(DEP5_EMPS)
RESULT2(Ssn) ← π_{Super_ssn}(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2

The relation RESULT1 has the Ssn of all employees who work in department 5, whereas RESULT2 has the Ssn of all employees who directly supervise an employee who works in department 5. The UNION operation produces the tuples that are in either RESULT1 or RESULT2 or both (see Figure 6.3), while eliminating any duplicates. Thus, the Ssn value ‘333445555’ appears only once in the result.
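The same four steps can be traced with Python sets, which eliminate duplicates in the same way that relations do. This is a minimal sketch: each tuple is modeled as a one-element Python tuple, and DEP5_EMPS is cut down to the two attributes the query needs.

```python
# Department 5 employees, reduced to the two attributes used by the query.
DEP5_EMPS = [
    {"Ssn": "123456789", "Super_ssn": "333445555"},
    {"Ssn": "333445555", "Super_ssn": "888665555"},
    {"Ssn": "666884444", "Super_ssn": "333445555"},
    {"Ssn": "453453453", "Super_ssn": "333445555"},
]

# Projections; building a set removes the duplicate Super_ssn values.
RESULT1 = {(t["Ssn"],) for t in DEP5_EMPS}
RESULT2 = {(t["Super_ssn"],) for t in DEP5_EMPS}

# UNION: tuples in either operand, each appearing once.
RESULT = RESULT1 | RESULT2
```

RESULT1 has four tuples and RESULT2 has two, yet RESULT has five, because '333445555' occurs in both operands and is kept only once.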

Several set theoretic operations are used to merge the elements of two sets in various ways, including UNION, INTERSECTION, and SET DIFFERENCE (also called MINUS or EXCEPT). These are binary operations; that is, each is applied to two sets (of tuples). When these operations are adapted to relational databases, the two relations on which any of these three operations are applied must have the same type of tuples; this condition has been called union compatibility or type compatibility. Two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bn) are said to be union compatible (or type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n. This means that the two relations have the same number of attributes and each corresponding pair of attributes has the same domain.
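The compatibility condition is easy to state as a check over two schemas. In the sketch below, a schema is modeled as a list of (attribute name, domain) pairs and Python types stand in for domains; both choices, and the function name `union_compatible`, are assumptions made for illustration only.

```python
def union_compatible(schema_r, schema_s):
    """True if both schemas have the same degree and pairwise equal domains."""
    if len(schema_r) != len(schema_s):
        return False
    return all(dom_a == dom_b
               for (_, dom_a), (_, dom_b) in zip(schema_r, schema_s))

# Hypothetical schemas: attribute names may differ, only the domains matter.
ssn_list = [("Ssn", str)]
super_list = [("Super_ssn", str)]
name_salary = [("Fname", str), ("Salary", int)]
salary_name = [("Salary", int), ("Fname", str)]

same_domains = union_compatible(ssn_list, super_list)       # True
different_degree = union_compatible(ssn_list, name_salary)  # False
wrong_order = union_compatible(name_salary, salary_name)    # False
```

Note that the check is positional: two schemas with the same domains in a different order are not union compatible.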

We can define the three operations UNION, INTERSECTION, and SET DIFFERENCE on two union-compatible relations R and S as follows:

}  UNION: The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.

}  INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that are in both R and S.

}  SET DIFFERENCE (or MINUS): The result of this operation, denoted by R − S, is a relation that includes all tuples that are in R but not in S.

We will adopt the convention that the resulting relation has the same attribute names as the first relation R. It is always possible to rename the attributes in the result using the rename operator.

RESULT1

Ssn
123456789
333445555
666884444
453453453

RESULT2

Ssn
333445555
888665555

RESULT

Ssn
123456789
333445555
666884444
453453453
888665555

Figure 6.3: Result of the UNION operation RESULT ← RESULT1 ∪ RESULT2.


Relational Algebra Operations from Set Theory

}  Several set theoretic operations are used to merge the elements of two sets in various ways, including UNION, INTERSECTION, and SET DIFFERENCE (also called MINUS or EXCEPT).

}  UNION, INTERSECTION, and MINUS
    }  Merge the elements of two sets in various ways
    }  Binary operations
    }  Relations must have the same type of tuples (type compatible)


The UNION, INTERSECTION, and MINUS Operations

}  UNION
    }  R ∪ S
    }  Includes all tuples that are either in R or in S or in both R and S
    }  Duplicate tuples eliminated

}  INTERSECTION
    }  R ∩ S
    }  Includes all tuples that are in both R and S

}  SET DIFFERENCE (or MINUS)
    }  R − S
    }  Includes all tuples that are in R but not in S


The UNION, INTERSECTION, and MINUS Operations

(a) STUDENT

Fn       Ln
Susan    Yao
Ramesh   Shah
Johnny   Kohler
Barbara  Jones
Amy      Ford
Jimmy    Wang
Ernest   Gilbert

INSTRUCTOR

Fname    Lname
John     Smith
Ricardo  Browne
Susan    Yao
Francis  Johnson
Ramesh   Shah

(b)

Fn       Ln
Susan    Yao
Ramesh   Shah
Johnny   Kohler
Barbara  Jones
Amy      Ford
Jimmy    Wang
Ernest   Gilbert
John     Smith
Ricardo  Browne
Francis  Johnson

(c)

Fn       Ln
Susan    Yao
Ramesh   Shah

(d)

Fn       Ln
Johnny   Kohler
Barbara  Jones
Amy      Ford
Jimmy    Wang
Ernest   Gilbert

(e)

Fname    Lname
John     Smith
Ricardo  Browne
Francis  Johnson


Figure 6.4 illustrates the three operations. The relations STUDENT and INSTRUCTOR in Figure 6.4(a) are union compatible and their tuples represent the names of students and the names of instructors, respectively. The result of the UNION operation in Figure 6.4(b) shows the names of all students and instructors. Note that duplicate tuples appear only once in the result. The result of the INTERSECTION operation (Figure 6.4(c)) includes only those who are both students and instructors.

Notice that both UNION and INTERSECTION are commutative operations; that is,

R ∪ S = S ∪ R and R ∩ S = S ∩ R

Both UNION and INTERSECTION can be treated as n-ary operations applicable to any number of relations because both are also associative operations; that is,

R ∪ (S ∪ T) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T)

The MINUS operation is not commutative; that is, in general,

R − S ≠ S − R
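These properties can be checked directly on the STUDENT and INSTRUCTOR relations of Figure 6.4 using Python's built-in set operators. The sketch models each tuple as a (first name, last name) pair; the third relation T is hypothetical and exists only to exercise associativity.

```python
STUDENT = {
    ("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler"),
    ("Barbara", "Jones"), ("Amy", "Ford"), ("Jimmy", "Wang"),
    ("Ernest", "Gilbert"),
}
INSTRUCTOR = {
    ("John", "Smith"), ("Ricardo", "Browne"), ("Susan", "Yao"),
    ("Francis", "Johnson"), ("Ramesh", "Shah"),
}
# Hypothetical third relation, not part of Figure 6.4.
T = {("Amy", "Ford"), ("John", "Smith")}

union = STUDENT | INSTRUCTOR              # Figure 6.4(b)
intersection = STUDENT & INSTRUCTOR       # Figure 6.4(c)
students_only = STUDENT - INSTRUCTOR      # Figure 6.4(d)
instructors_only = INSTRUCTOR - STUDENT   # Figure 6.4(e)

# Commutativity of UNION and INTERSECTION.
commutative = (STUDENT | INSTRUCTOR == INSTRUCTOR | STUDENT
               and STUDENT & INSTRUCTOR == INSTRUCTOR & STUDENT)

# Associativity of UNION and INTERSECTION.
associative = (STUDENT | (INSTRUCTOR | T) == (STUDENT | INSTRUCTOR) | T
               and (STUDENT & INSTRUCTOR) & T == STUDENT & (INSTRUCTOR & T))

# MINUS is not commutative: the two differences are different relations.
minus_differs = students_only != instructors_only
```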

Figure 6.4: The set operations UNION, INTERSECTION, and MINUS. (a) Two union-compatible relations. (b) STUDENT ∪ INSTRUCTOR. (c) STUDENT ∩ INSTRUCTOR. (d) STUDENT − INSTRUCTOR. (e) INSTRUCTOR − STUDENT.

}  Figure 6.4 illustrates the three operations.

}  The relations STUDENT and INSTRUCTOR are union compatible and their tuples represent the names of students and the names of instructors, respectively.

}  The result of the UNION operation in Figure 6.4(b) shows the names of all students and instructors.

}  Note that duplicate tuples appear only once in the result.

}  The result of the INTERSECTION operation (Figure 6.4(c)) includes only those who are both students and instructors.


The CARTESIAN PRODUCT (CROSS PRODUCT) Operation


The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

}  CARTESIAN PRODUCT
    }  CROSS PRODUCT or CROSS JOIN
    }  Denoted by ×
    }  Binary set operation
    }  Relations do not have to be union compatible
    }  Useful when followed by a selection that matches values of attributes


The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

}  The CARTESIAN PRODUCT operation applied by itself is generally meaningless.
    }  It is mostly useful when followed by a selection that matches values of attributes coming from the component relations.

}  Example: suppose that we want to retrieve a list of names of each female employee’s dependents.


Figure 6.4(d) shows the names of students who are not instructors, and Figure 6.4(e) shows the names of instructors who are not students.

Note that INTERSECTION can be expressed in terms of union and set difference as follows:

R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
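The identity can be confirmed on the Figure 6.4 data with a short Python check. Removing R − S from R ∪ S leaves exactly S, and removing S − R from S leaves the tuples common to both. The function name below is hypothetical and the relations are modeled as sets of (first name, last name) pairs.

```python
def intersection_via_union_and_difference(r, s):
    """Compute R ∩ S as ((R ∪ S) − (R − S)) − (S − R)."""
    return ((r | s) - (r - s)) - (s - r)

STUDENT = {
    ("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler"),
    ("Barbara", "Jones"), ("Amy", "Ford"), ("Jimmy", "Wang"),
    ("Ernest", "Gilbert"),
}
INSTRUCTOR = {
    ("John", "Smith"), ("Ricardo", "Browne"), ("Susan", "Yao"),
    ("Francis", "Johnson"), ("Ramesh", "Shah"),
}

derived = intersection_via_union_and_difference(STUDENT, INSTRUCTOR)
direct = STUDENT & INSTRUCTOR
```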

In SQL, there are three operations (UNION, INTERSECT, and EXCEPT) that correspond to the set operations described here. In addition, there are multiset operations (UNION ALL, INTERSECT ALL, and EXCEPT ALL) that do not eliminate duplicates (see Section 4.3.4).

6.2.2 The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

Next, we discuss the CARTESIAN PRODUCT operation, also known as CROSS PRODUCT or CROSS JOIN, which is denoted by ×. This is also a binary set operation, but the relations on which it is applied do not have to be union compatible. In its binary form, this set operation produces a new element by combining every member (tuple) from one relation (set) with every member (tuple) from the other relation (set). In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q of degree n + m with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order. The resulting relation Q has one tuple for each combination of tuples, one from R and one from S. Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R × S will have nR * nS tuples.
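The degree and cardinality rules can be observed with `itertools.product` from the Python standard library. This is a small sketch with made-up relations: R has degree 2 and 3 tuples, S has degree 3 and 2 tuples, and the function name `cartesian_product` is an assumption for the example.

```python
from itertools import product

def cartesian_product(r, s):
    """R × S: concatenate every tuple of r with every tuple of s."""
    return [t_r + t_s for t_r, t_s in product(r, s)]

# Hypothetical relations, modeled as lists of Python tuples.
R = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]  # degree n = 2, nR = 3
S = [(1, 2, 3), (4, 5, 6)]                      # degree m = 3, nS = 2

Q = cartesian_product(R, S)
degree_of_Q = len(Q[0])   # n + m
tuples_in_Q = len(Q)      # nR * nS
```

Here Q has degree 2 + 3 = 5 and 3 * 2 = 6 tuples, with the attributes of R preceding those of S in every tuple.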

The n-ary CARTESIAN PRODUCT operation is an extension of the above concept, which produces new tuples by concatenating all possible combinations of tuples from n underlying relations.

The CARTESIAN PRODUCT operation applied by itself is generally meaningless. It is mostly useful when followed by a selection that matches values of attributes coming from the component relations. For example, suppose that we want to retrieve a list of names of each female employee’s dependents. We can do this as follows:

FEMALE_EMPS ← σ_{Sex='F'}(EMPLOYEE)
EMPNAMES ← π_{Fname, Lname, Ssn}(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ_{Ssn=Essn}(EMP_DEPENDENTS)
RESULT ← π_{Fname, Lname, Dependent_name}(ACTUAL_DEPENDENTS)
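The five steps translate almost line for line into Python. The sketch below is illustrative only: relations are lists of plain tuples, EMPLOYEE and DEPENDENT are reduced to the attributes the query touches, and attribute positions are tracked by hand in the comments.

```python
from itertools import product

# Reduced sample data. EMPLOYEE columns: Fname, Lname, Ssn, Sex
EMPLOYEE = [
    ("John", "Smith", "123456789", "M"),
    ("Franklin", "Wong", "333445555", "M"),
    ("Alicia", "Zelaya", "999887777", "F"),
    ("Jennifer", "Wallace", "987654321", "F"),
    ("Joyce", "English", "453453453", "F"),
]
# DEPENDENT columns: Essn, Dependent_name
DEPENDENT = [
    ("333445555", "Alice"), ("333445555", "Theodore"), ("333445555", "Joy"),
    ("987654321", "Abner"),
    ("123456789", "Michael"), ("123456789", "Alice"), ("123456789", "Elizabeth"),
]

# Selection on Sex = 'F'
FEMALE_EMPS = [e for e in EMPLOYEE if e[3] == "F"]
# Projection on Fname, Lname, Ssn
EMPNAMES = [(fname, lname, ssn) for fname, lname, ssn, _ in FEMALE_EMPS]
# Cartesian product; columns: Fname, Lname, Ssn, Essn, Dependent_name
EMP_DEPENDENTS = [e + d for e, d in product(EMPNAMES, DEPENDENT)]
# Selection on Ssn = Essn keeps only the meaningful combinations
ACTUAL_DEPENDENTS = [t for t in EMP_DEPENDENTS if t[2] == t[3]]
# Projection on Fname, Lname, Dependent_name
RESULT = [(t[0], t[1], t[4]) for t in ACTUAL_DEPENDENTS]
```

The product has 3 * 7 = 21 tuples, but only one of them survives the selection, because Jennifer Wallace is the only female employee in this sample with a dependent.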

The resulting relations from this sequence of operations are shown in Figure 6.5.


The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

FEMALE_EMPS

Fname     Minit  Lname    Ssn        Bdate       Address                  Sex  Salary  Super_ssn  Dno
Alicia    J      Zelaya   999887777  1968-07-19  3321 Castle, Spring, TX  F    25000   987654321  4
Jennifer  S      Wallace  987654321  1941-06-20  291 Berry, Bellaire, TX  F    43000   888665555  4
Joyce     A      English  453453453  1972-07-31  5631 Rice, Houston, TX   F    25000   333445555  5

EMPNAMES

Fname     Lname    Ssn
Alicia    Zelaya   999887777
Jennifer  Wallace  987654321
Joyce     English  453453453

EMP_DEPENDENTS

Fname     Lname    Ssn        Essn       Dependent_name  Sex  Bdate       . . .
Alicia    Zelaya   999887777  333445555  Alice           F    1986-04-05  . . .
Alicia    Zelaya   999887777  333445555  Theodore        M    1983-10-25  . . .
Alicia    Zelaya   999887777  333445555  Joy             F    1958-05-03  . . .
Alicia    Zelaya   999887777  987654321  Abner           M    1942-02-28  . . .
Alicia    Zelaya   999887777  123456789  Michael         M    1988-01-04  . . .
Alicia    Zelaya   999887777  123456789  Alice           F    1988-12-30  . . .
Alicia    Zelaya   999887777  123456789  Elizabeth       F    1967-05-05  . . .
Jennifer  Wallace  987654321  333445555  Alice           F    1986-04-05  . . .
Jennifer  Wallace  987654321  333445555  Theodore        M    1983-10-25  . . .
Jennifer  Wallace  987654321  333445555  Joy             F    1958-05-03  . . .
Jennifer  Wallace  987654321  987654321  Abner           M    1942-02-28  . . .
Jennifer  Wallace  987654321  123456789  Michael         M    1988-01-04  . . .
Jennifer  Wallace  987654321  123456789  Alice           F    1988-12-30  . . .
Jennifer  Wallace  987654321  123456789  Elizabeth       F    1967-05-05  . . .
Joyce     English  453453453  333445555  Alice           F    1986-04-05  . . .
Joyce     English  453453453  333445555  Theodore        M    1983-10-25  . . .
Joyce     English  453453453  333445555  Joy             F    1958-05-03  . . .
Joyce     English  453453453  987654321  Abner           M    1942-02-28  . . .
Joyce     English  453453453  123456789  Michael         M    1988-01-04  . . .
Joyce     English  453453453  123456789  Alice           F    1988-12-30  . . .
Joyce     English  453453453  123456789  Elizabeth       F    1967-05-05  . . .

ACTUAL_DEPENDENTS

Fname     Lname    Ssn        Essn       Dependent_name  Sex  Bdate       . . .
Jennifer  Wallace  987654321  987654321  Abner           M    1942-02-28  . . .

RESULT

Fname     Lname    Dependent_name
Jennifer  Wallace  Abner

Figure 6.5: The Cartesian Product (Cross Product) operation.

6.2 Relational Algebra Operations from Set Theory 155

Figure 6.4(d) shows the names of students who are not instructors, and Figure6.4(e) shows the names of instructors who are not students.

Note that INTERSECTION can be expressed in terms of union and set difference asfollows:

R ! S = ((R " S ) # (R # S )) # (S # R)

In SQL, there are three operations—UNION, INTERSECT, and EXCEPT—that corre-spond to the set operations described here. In addition, there are multiset opera-tions (UNION ALL, INTERSECT ALL, and EXCEPT ALL) that do not eliminateduplicates (see Section 4.3.4).

6.2.2 The CARTESIAN PRODUCT (CROSS PRODUCT)Operation

Next, we discuss the CARTESIAN PRODUCT operation—also known as CROSSPRODUCT or CROSS JOIN—which is denoted by $. This is also a binary set opera-tion, but the relations on which it is applied do not have to be union compatible. Inits binary form, this set operation produces a new element by combining everymember (tuple) from one relation (set) with every member (tuple) from the otherrelation (set). In general, the result of R(A1, A2, ..., An) $ S(B1, B2, ..., Bm) is a rela-tion Q with degree n + m attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.The resulting relation Q has one tuple for each combination of tuples—one from Rand one from S. Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples,then R $ S will have nR * nS tuples.

The n-ary CARTESIAN PRODUCT operation is an extension of the above concept, which produces new tuples by concatenating all possible combinations of tuples from n underlying relations.

The CARTESIAN PRODUCT operation applied by itself is generally meaningless. It is mostly useful when followed by a selection that matches values of attributes coming from the component relations. For example, suppose that we want to retrieve a list of names of each female employee's dependents. We can do this as follows:

FEMALE_EMPS ← σ_{Sex='F'}(EMPLOYEE)
EMPNAMES ← π_{Fname, Lname, Ssn}(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ_{Ssn=Essn}(EMP_DEPENDENTS)
RESULT ← π_{Fname, Lname, Dependent_name}(ACTUAL_DEPENDENTS)

The resulting relations from this sequence of operations are shown in Figure 6.5. The EMP_DEPENDENTS relation is the result of applying the CARTESIAN PRODUCT operation to EMPNAMES from Figure 6.5 with DEPENDENT from Figure 3.6. In EMP_DEPENDENTS, every tuple from EMPNAMES is combined with every tuple from DEPENDENT, giving a result that is not very meaningful (every dependent is combined with every female employee). We want to combine a female employee tuple only with her particular dependents, namely the DEPENDENT tuples whose Essn values match the Ssn value of the EMPLOYEE tuple. The ACTUAL_DEPENDENTS relation accomplishes this. The EMP_DEPENDENTS relation is a good example of the case where relational algebra can be correctly applied to yield results that make no sense at all. It is the responsibility of the user to make sure to apply only meaningful operations to relations.
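The five-step sequence can be traced in a short sketch. The helper names `select`, `project`, and `cartesian_product` are hypothetical, relations are modeled as lists of dictionaries, and the rows are a small illustrative sample rather than the full EMPLOYEE and DEPENDENT relations. The product helper assumes the two relations have no attribute names in common.

```python
from itertools import product

# Illustrative sample rows only.
EMPLOYEE = [
    {"Fname": "Franklin", "Lname": "Wong", "Ssn": "333445555", "Sex": "M"},
    {"Fname": "Jennifer", "Lname": "Wallace", "Ssn": "987654321", "Sex": "F"},
    {"Fname": "Alicia", "Lname": "Zelaya", "Ssn": "999887777", "Sex": "F"},
]
DEPENDENT = [
    {"Essn": "333445555", "Dependent_name": "Alice"},
    {"Essn": "987654321", "Dependent_name": "Abner"},
]

def select(relation, predicate):
    """SELECT: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """PROJECT: keep the listed attributes and eliminate duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def cartesian_product(r, s):
    """CARTESIAN PRODUCT: pair every tuple of r with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]

FEMALE_EMPS = select(EMPLOYEE, lambda t: t["Sex"] == "F")
EMPNAMES = project(FEMALE_EMPS, ["Fname", "Lname", "Ssn"])
EMP_DEPENDENTS = cartesian_product(EMPNAMES, DEPENDENT)
ACTUAL_DEPENDENTS = select(EMP_DEPENDENTS, lambda t: t["Ssn"] == t["Essn"])
RESULT = project(ACTUAL_DEPENDENTS, ["Fname", "Lname", "Dependent_name"])
print(RESULT)
```

In this sample, EMP_DEPENDENTS holds every female employee paired with every dependent, and only the selection on Ssn = Essn reduces it to the pairs that actually belong together.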


Binary Relational Operations: JOIN and DIVISION


JOIN Operation

}  The JOIN Operation
   }  Denoted by ⋈
   }  Combines related tuples from two relations into single “longer” tuples
   }  General join condition of the form <condition> AND <condition> AND ... AND <condition>
}  Example: retrieve the name of the manager of each department.


The CARTESIAN PRODUCT creates tuples with the combined attributes of two relations. We can SELECT related tuples only from the two relations by specifying an appropriate selection condition after the Cartesian product, as we did in the preceding example. Because this sequence of CARTESIAN PRODUCT followed by SELECT is quite commonly used to combine related tuples from two relations, a special operation, called JOIN, was created to specify this sequence as a single operation. We discuss the JOIN operation next.

In SQL, CARTESIAN PRODUCT can be realized by using the CROSS JOIN option in joined tables (see Section 5.1.6). Alternatively, if there are two tables in the FROM clause and there is no corresponding join condition in the WHERE clause, the result will also be the CARTESIAN PRODUCT of the two tables (see Q10 in Section 4.3.3).
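Both SQL forms can be tried with Python's standard `sqlite3` module. This is a minimal sketch: the table names `emp` and `dept` and their rows are assumptions for illustration, and SQLite stands in for whatever SQL system is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ssn TEXT, lname TEXT);
    CREATE TABLE dept (dname TEXT);
    INSERT INTO emp VALUES ('333445555', 'Wong'), ('987654321', 'Wallace');
    INSERT INTO dept VALUES ('Research'), ('Administration'), ('Headquarters');
""")

# Explicit CROSS JOIN in a joined table.
explicit = conn.execute("SELECT * FROM emp CROSS JOIN dept").fetchall()

# Two tables listed with no join condition at all.
implicit = conn.execute("SELECT * FROM emp, dept").fetchall()
conn.close()

same_rows = sorted(explicit) == sorted(implicit)
print(len(explicit), same_rows)
```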

6.3 Binary Relational Operations: JOIN and DIVISION

6.3.1 The JOIN Operation

The JOIN operation, denoted by ⋈, is used to combine related tuples from two relations into single “longer” tuples. This operation is very important for any relational database with more than a single relation because it allows us to process relationships among relations. To illustrate JOIN, suppose that we want to retrieve the name of the manager of each department. To get the manager's name, we need to combine each department tuple with the employee tuple whose Ssn value matches the Mgr_ssn value in the department tuple. We do this by using the JOIN operation and then projecting the result over the necessary attributes, as follows:

DEPT_MGR ← DEPARTMENT ⋈_{Mgr_ssn=Ssn} EMPLOYEE
RESULT ← π_{Dname, Lname, Fname}(DEPT_MGR)

The first operation is illustrated in Figure 6.6. Note that Mgr_ssn is a foreign key of the DEPARTMENT relation that references Ssn, the primary key of the EMPLOYEE relation. This referential integrity constraint plays a role in having matching tuples in the referenced relation EMPLOYEE.
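As a rough sketch of the manager query, the join and projection can be written as two list comprehensions over lists of dictionaries. The three department rows follow the values shown for DEPT_MGR in Figure 6.6; the fourth employee row is an extra illustrative row added to show that non-matching tuples drop out.

```python
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
    {"Dname": "Headquarters", "Dnumber": 1, "Mgr_ssn": "888665555"},
]
EMPLOYEE = [
    {"Fname": "Franklin", "Minit": "T", "Lname": "Wong", "Ssn": "333445555"},
    {"Fname": "Jennifer", "Minit": "S", "Lname": "Wallace", "Ssn": "987654321"},
    {"Fname": "James", "Minit": "E", "Lname": "Borg", "Ssn": "888665555"},
    {"Fname": "John", "Minit": "B", "Lname": "Smith", "Ssn": "123456789"},  # not a manager
]

# DEPT_MGR <- DEPARTMENT JOIN(Mgr_ssn = Ssn) EMPLOYEE
DEPT_MGR = [
    {**d, **e}
    for d in DEPARTMENT
    for e in EMPLOYEE
    if d["Mgr_ssn"] == e["Ssn"]
]

# RESULT <- PROJECT(Dname, Lname, Fname)(DEPT_MGR); no duplicates arise in this sample.
RESULT = [(t["Dname"], t["Lname"], t["Fname"]) for t in DEPT_MGR]
print(RESULT)
```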

The JOIN operation can be specified as a CARTESIAN PRODUCT operation followed by a SELECT operation. However, JOIN is very important because it is used very frequently when specifying database queries. Consider the earlier example illustrating CARTESIAN PRODUCT, which included the following sequence of operations:

EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ_{Ssn=Essn}(EMP_DEPENDENTS)

DEPT_MGR

Dname           Dnumber  Mgr_ssn    ...  Fname     Minit  Lname    Ssn        ...
Research        5        333445555  ...  Franklin  T      Wong     333445555  ...
Administration  4        987654321  ...  Jennifer  S      Wallace  987654321  ...
Headquarters    1        888665555  ...  James     E      Borg     888665555  ...

Figure 6.6 Result of the JOIN operation DEPT_MGR ← DEPARTMENT ⋈_{Mgr_ssn=Ssn} EMPLOYEE.

These two operations can be replaced with a single JOIN operation as follows:

ACTUAL_DEPENDENTS ← EMPNAMES ⋈_{Ssn=Essn} DEPENDENT

The general form of a JOIN operation on two relations⁵ R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is

R ⋈_{<join condition>} S

The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, ..., An, B1, B2, ..., Bm) in that order; Q has one tuple for each combination of tuples, one from R and one from S, whenever the combination satisfies the join condition. This is the main difference between CARTESIAN PRODUCT and JOIN. In JOIN, only combinations of tuples satisfying the join condition appear in the result, whereas in the CARTESIAN PRODUCT all combinations of tuples are included in the result. The join condition is specified on attributes from the two relations R and S and is evaluated for each combination of tuples. Each tuple combination for which the join condition evaluates to TRUE is included in the resulting relation Q as a single combined tuple.

A general join condition is of the form

<condition> AND <condition> AND...AND <condition>

where each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj is an attribute of S, Ai and Bj have the same domain, and θ (theta) is one of the comparison operators {=, <, ≤, >, ≥, ≠}. A JOIN operation with such a general join condition is called a THETA JOIN. Tuples whose join attributes are NULL or for which the join condition is FALSE do not appear in the result. In that sense, the JOIN operation does not necessarily preserve all of the information in the participating relations, because tuples that do not get combined with matching ones in the other relation do not appear in the result.
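A THETA JOIN with a conjunctive condition can be sketched as follows. The function name `theta_join`, the operator spellings in the `THETA` table, and the relations R and S are all assumptions made for illustration; NULL is modeled as Python's `None`, and any pair with a NULL join attribute is dropped.

```python
import operator
from itertools import product

# The six comparison operators that theta may stand for.
THETA = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
         ">": operator.gt, ">=": operator.ge, "!=": operator.ne}

def theta_join(r, s, conditions):
    """Join r and s on a list of (Ai, theta, Bj) triples combined with AND."""
    def satisfied(a, b):
        for ai, theta, bj in conditions:
            left, right = a[ai], b[bj]
            if left is None or right is None:
                return False  # NULL join attribute: the pair does not qualify
            if not THETA[theta](left, right):
                return False
        return True
    return [{**a, **b} for a, b in product(r, s) if satisfied(a, b)]

# Hypothetical relations with disjoint attribute names.
R = [{"A1": 1}, {"A1": 5}, {"A1": None}]
S = [{"B1": 3}, {"B1": 7}]

Q = theta_join(R, S, [("A1", "<", "B1")])
print(Q)
```

The tuple of R whose A1 is NULL contributes nothing to Q, which is the loss of information the paragraph above describes.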

⁵ Again, notice that R and S can be any relations that result from general relational algebra expressions.


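JOIN Operation

}  The JOIN operation can be specified as a CARTESIAN PRODUCT operation followed by a SELECT operation.
}  Example: retrieve a list of names of each female employee's dependents

   EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
   ACTUAL_DEPENDENTS ← σ_{Ssn=Essn}(EMP_DEPENDENTS)

}  These two operations can be replaced with a single JOIN operation as follows:

   ACTUAL_DEPENDENTS ← EMPNAMES ⋈_{Ssn=Essn} DEPENDENT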


JOIN Operation

}  The general form of a JOIN operation on two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is

   R ⋈_{<join condition>} S

}  A general join condition is of the form

   <condition> AND <condition> AND ... AND <condition>


THETA JOIN Operation

}  THETA JOIN
   }  Each <condition> of the form Ai θ Bj
   }  θ (theta) is one of the comparison operators: {=, <, ≤, >, ≥, ≠}
   }  Ai is an attribute of R
   }  Bj is an attribute of S
   }  Ai and Bj have the same domain

Variations of JOIN: The EQUIJOIN and NATURAL JOIN


The EQUIJOIN

}  EQUIJOIN
   }  Only = comparison operator used
   }  Always have one or more pairs of attributes that have identical values in every tuple


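NATURAL JOIN

}  NATURAL JOIN
   }  Denoted by *
   }  Removes second (superfluous) attribute in an EQUIJOIN condition

   PROJ_DEPT ← PROJECT * ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)

}  Or

   DEPT ← ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)
   PROJ_DEPT ← PROJECT * DEPT

}  The attribute Dnum is called the join attribute for the NATURAL JOIN operation, because it is the only attribute with the same name in both relations.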


6.3.2 Variations of JOIN: The EQUIJOIN and NATURAL JOIN

The most common use of JOIN involves join conditions with equality comparisons only. Such a JOIN, where the only comparison operator used is =, is called an EQUIJOIN. Both previous examples were EQUIJOINs. Notice that in the result of an EQUIJOIN we always have one or more pairs of attributes that have identical values in every tuple. For example, in Figure 6.6, the values of the attributes Mgr_ssn and Ssn are identical in every tuple of DEPT_MGR (the EQUIJOIN result) because the equality join condition specified on these two attributes requires the values to be identical in every tuple in the result. Because one of each pair of attributes with identical values is superfluous, a new operation called NATURAL JOIN, denoted by *, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition.⁶ The standard definition of NATURAL JOIN requires that the two join attributes (or each pair of join attributes) have the same name in both relations. If this is not the case, a renaming operation is applied first.

Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple that controls the project. In the following example, first we rename the Dnumber attribute of DEPARTMENT to Dnum, so that it has the same name as the Dnum attribute in PROJECT, and then we apply NATURAL JOIN:

PROJ_DEPT ← PROJECT * ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)

The same query can be done in two steps by creating an intermediate table DEPT as follows:

DEPT ← ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)
PROJ_DEPT ← PROJECT * DEPT

The attribute Dnum is called the join attribute for the NATURAL JOIN operation, because it is the only attribute with the same name in both relations. The resulting relation is illustrated in Figure 6.7(a). In the PROJ_DEPT relation, each tuple combines a PROJECT tuple with the DEPARTMENT tuple for the department that controls the project, but only one join attribute value is kept.
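The rename-then-join steps can be sketched in a few lines. The helpers `rename` and `natural_join` are hypothetical names, relations are lists of dictionaries, and the rows (including the dates) are illustrative assumptions. The sketch relies on Python dictionaries preserving insertion order, which holds from Python 3.7 onward.

```python
def rename(relation, new_names):
    """RENAME: relabel attributes positionally, keeping their original order."""
    return [dict(zip(new_names, t.values())) for t in relation]

def natural_join(r, s):
    """Equate every same-named attribute and keep a single copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

# Illustrative sample rows.
PROJECT = [
    {"Pname": "ProductX", "Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5},
    {"Pname": "Computerization", "Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555", "Mgr_start_date": "1988-05-22"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321", "Mgr_start_date": "1995-01-01"},
]

# DEPT <- RENAME(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)
DEPT = rename(DEPARTMENT, ["Dname", "Dnum", "Mgr_ssn", "Mgr_start_date"])

# PROJ_DEPT <- PROJECT * DEPT
PROJ_DEPT = natural_join(PROJECT, DEPT)
print(list(PROJ_DEPT[0]))
```

Each result tuple carries Dnum only once, whereas an EQUIJOIN on Dnum = Dnumber would have kept both columns.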

If the attributes on which the natural join is specified already have the same names in both relations, renaming is unnecessary. For example, to apply a natural join on the Dnumber attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write

DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

The resulting relation is shown in Figure 6.7(b), which combines each department with its locations and has one tuple for each location. In general, the join condition for NATURAL JOIN is constructed by equating each pair of join attributes that have the same name in the two relations and combining these conditions with AND. There can be a list of join attributes from each relation, and each corresponding pair must have the same name.
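To make the general rule concrete, the sketch below builds the join condition from all attribute names the two relations share and combines the equalities with AND. The helper name `natural_join`, the sample rows, and the relations R and S with two shared attributes X and Y are assumptions for illustration.

```python
def natural_join(r, s):
    """NATURAL JOIN over lists of dictionaries (one dictionary per tuple)."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]          # every same-named attribute
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]    # equalities combined with AND

# One shared attribute (Dnumber): no renaming is needed.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Headquarters", "Dnumber": 1},
]
DEPT_LOCATIONS = [
    {"Dnumber": 1, "Dlocation": "Houston"},
    {"Dnumber": 5, "Dlocation": "Bellaire"},
    {"Dnumber": 5, "Dlocation": "Sugarland"},
]
DEPT_LOCS = natural_join(DEPARTMENT, DEPT_LOCATIONS)

# Two shared attributes (X and Y): both pairs must match.
R = [{"X": 1, "Y": 2, "P": "a"}, {"X": 1, "Y": 3, "P": "b"}]
S = [{"X": 1, "Y": 2, "Q": "c"}]
RS = natural_join(R, S)

print(len(DEPT_LOCS), RS)
```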

⁶ NATURAL JOIN is basically an EQUIJOIN followed by the removal of the superfluous attributes.


NATURAL JOIN

}  NATURAL JOIN

6.3 Binary Relational Operations: JOIN and DIVISION 159

6.3.2 Variations of JOIN: The EQUIJOIN and NATURAL JOIN

The most common use of JOIN involves join conditions with equality comparisonsonly. Such a JOIN, where the only comparison operator used is =, is called anEQUIJOIN. Both previous examples were EQUIJOINs. Notice that in the result of anEQUIJOIN we always have one or more pairs of attributes that have identical valuesin every tuple. For example, in Figure 6.6, the values of the attributes Mgr_ssn andSsn are identical in every tuple of DEPT_MGR (the EQUIJOIN result) because theequality join condition specified on these two attributes requires the values to beidentical in every tuple in the result. Because one of each pair of attributes withidentical values is superfluous, a new operation called NATURAL JOIN—denoted by*—was created to get rid of the second (superfluous) attribute in an EQUIJOIN con-dition.6 The standard definition of NATURAL JOIN requires that the two join attrib-utes (or each pair of join attributes) have the same name in both relations. If this isnot the case, a renaming operation is applied first.

Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple thatcontrols the project. In the following example, first we rename the Dnumber attributeof DEPARTMENT to Dnum—so that it has the same name as the Dnum attribute inPROJECT—and then we apply NATURAL JOIN:

PROJ_DEPT ! PROJECT * "(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)

The same query can be done in two steps by creating an intermediate table DEPT asfollows:

DEPT ! "(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)PROJ_DEPT ! PROJECT * DEPT

The attribute Dnum is called the join attribute for the NATURAL JOIN operation,because it is the only attribute with the same name in both relations. The resultingrelation is illustrated in Figure 6.7(a). In the PROJ_DEPT relation, each tuple com-bines a PROJECT tuple with the DEPARTMENT tuple for the department that con-trols the project, but only one join attribute value is kept.

If the attributes on which the natural join is specified already have the same names in both relations, renaming is unnecessary. For example, to apply a natural join on the Dnumber attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write

DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

The resulting relation is shown in Figure 6.7(b), which combines each department with its locations and has one tuple for each location. In general, the join condition for NATURAL JOIN is constructed by equating each pair of join attributes that have the same name in the two relations and combining these conditions with AND. There can be a list of join attributes from each relation, and each corresponding pair must have the same name.

Footnote 6: NATURAL JOIN is basically an EQUIJOIN followed by the removal of the superfluous attributes.

Figure 6.7 Results of two NATURAL JOIN operations.

(a) PROJ_DEPT ← PROJECT * DEPT

Pname            Pnumber  Plocation  Dnum  Dname           Mgr_ssn    Mgr_start_date
ProductX         1        Bellaire   5     Research        333445555  1988-05-22
ProductY         2        Sugarland  5     Research        333445555  1988-05-22
ProductZ         3        Houston    5     Research        333445555  1988-05-22
Computerization  10       Stafford   4     Administration  987654321  1995-01-01
Reorganization   20       Houston    1     Headquarters    888665555  1981-06-19
Newbenefits      30       Stafford   4     Administration  987654321  1995-01-01

(b) DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

Dname           Dnumber  Mgr_ssn    Mgr_start_date  Location
Headquarters    1        888665555  1981-06-19      Houston
Administration  4        987654321  1995-01-01      Stafford
Research        5        333445555  1988-05-22      Bellaire
Research        5        333445555  1988-05-22      Sugarland
Research        5        333445555  1988-05-22      Houston

A more general, but nonstandard, definition for NATURAL JOIN is

Q ← R *_{(<list1>),(<list2>)} S

In this case, <list1> specifies a list of i attributes from R, and <list2> specifies a list of i attributes from S. The lists are used to form equality comparison conditions between pairs of corresponding attributes, and the conditions are then ANDed together. Only the list corresponding to attributes of the first relation R, <list1>, is kept in the result Q.
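A rough Python sketch of this list-based variant is given below. The function name `natural_join_lists` and the sample rows are assumptions made for illustration, and the sketch further assumes that the non-join attributes of the two relations have distinct names.

```python
def natural_join_lists(r, s, list1, list2):
    """Join r and s on the pairs (list1[i], list2[i]); keep only the list1 copies."""
    if len(list1) != len(list2):
        raise ValueError("list1 and list2 must have the same length")
    result = []
    for t in r:
        for u in s:
            # Equality conditions on corresponding attribute pairs, ANDed together.
            if all(t[a] == u[b] for a, b in zip(list1, list2)):
                # Drop the join attributes of s; the list1 attributes of r remain.
                kept = {k: v for k, v in u.items() if k not in list2}
                result.append({**t, **kept})
    return result


# Illustrative sample rows (assumed for this sketch).
project = [{"Pname": "ProductX", "Dnum": 5}, {"Pname": "Reorganization", "Dnum": 1}]
department = [{"Dname": "Research", "Dnumber": 5}, {"Dname": "Headquarters", "Dnumber": 1}]

q = natural_join_lists(project, department, ["Dnum"], ["Dnumber"])
```

With this form no prior RENAME is needed, since the pairing of attributes is given explicitly by the two lists.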

Notice that if no combination of tuples satisfies the join condition, the result of a JOIN is an empty relation with zero tuples. In general, if R has nR tuples and S has nS tuples, the result of a JOIN operation R ⋈<join condition> S will have between zero and nR * nS tuples. The expected size of the join result divided by the maximum size nR * nS leads to a ratio called join selectivity, which is a property of each join condition. If there is no join condition, all combinations of tuples qualify and the JOIN degenerates into a CARTESIAN PRODUCT, also called CROSS PRODUCT or CROSS JOIN.
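The size bounds can be checked with a small Python sketch. Note the assumption: join selectivity is defined on the expected result size, whereas this sketch simply measures the observed ratio on one sample instance. The helper names and sample rows are illustrative.

```python
def theta_join(r, s, condition):
    """Return all (t, u) pairs from r x s that satisfy the join condition."""
    return [(t, u) for t in r for u in s if condition(t, u)]


def observed_selectivity(r, s, condition):
    """Size of the join result divided by the maximum size nR * nS."""
    max_size = len(r) * len(s)
    if max_size == 0:
        return 0.0
    return len(theta_join(r, s, condition)) / max_size


# Illustrative sample rows (assumed for this sketch): nR = 4, nS = 3.
project = [{"Pnumber": 1, "Dnum": 5}, {"Pnumber": 2, "Dnum": 5},
           {"Pnumber": 10, "Dnum": 4}, {"Pnumber": 20, "Dnum": 1}]
department = [{"Dnumber": 5}, {"Dnumber": 4}, {"Dnumber": 1}]

# Equality condition: each project matches exactly one department (4 of 12 pairs).
equi = observed_selectivity(project, department,
                            lambda t, u: t["Dnum"] == u["Dnumber"])
# No effective condition: the join degenerates into the CARTESIAN PRODUCT.
cross = observed_selectivity(project, department, lambda t, u: True)
# A condition that no pair satisfies: the result is the empty relation.
empty = observed_selectivity(project, department,
                             lambda t, u: t["Dnum"] == -1)
```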

As we can see, a single JOIN operation is used to combine data from two relations so that related information can be presented in a single table. These operations are also known as inner joins, to distinguish them from a different join variation called

Variations of JOIN: The EQUIJOIN and NATURAL JOIN (cont'd.)

- In general, if R has nR tuples and S has nS tuples, the result of a JOIN operation will have between zero and nR * nS tuples; nR * nS is the maximum size of the join.
- Join selectivity: the ratio of the expected size of the join result to the maximum size nR * nS.
- Inner joins: a type of match-and-combine operation, defined formally as a combination of CARTESIAN PRODUCT and SELECTION.

A Complete Set of Relational Algebra Operations

- It has been shown that the set of relational algebra operations {σ, π, ∪, ρ, –, ×} is a complete set.
  - This means that any relational algebra operation can be expressed as a sequence of operations from this set.
  - For example, the INTERSECTION operation can be expressed by using UNION and MINUS as follows:
    R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))
- A JOIN operation can be specified as a CARTESIAN PRODUCT followed by a SELECT operation.

outer joins (see Section 6.4.4). Informally, an inner join is a type of match and combine operation defined formally as a combination of CARTESIAN PRODUCT and SELECTION. Note that sometimes a join may be specified between a relation and itself, as we will illustrate in Section 6.4.3. The NATURAL JOIN or EQUIJOIN operation can also be specified among multiple tables, leading to an n-way join. For example, consider the following three-way join:

((PROJECT ⋈_{Dnum=Dnumber} DEPARTMENT) ⋈_{Mgr_ssn=Ssn} EMPLOYEE)

This combines each project tuple with its controlling department tuple into a single tuple, and then combines that tuple with an employee tuple that is the department manager. The net result is a consolidated relation in which each tuple contains this project-department-manager combined information.
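As a sketch of how the three-way join composes, the following Python applies a two-way equijoin twice. The `equijoin` helper and the sample rows are assumptions for illustration only.

```python
def equijoin(r, s, r_attr, s_attr):
    """Pair every tuple of r with every tuple of s whose join attributes are equal."""
    return [{**t, **u} for t in r for u in s if t[r_attr] == u[s_attr]]


# Illustrative sample rows (assumed for this sketch).
project = [{"Pname": "ProductX", "Dnum": 5}, {"Pname": "Reorganization", "Dnum": 1}]
department = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Headquarters", "Dnumber": 1, "Mgr_ssn": "888665555"},
]
employee = [
    {"Ssn": "333445555", "Lname": "Wong"},
    {"Ssn": "888665555", "Lname": "Borg"},
    {"Ssn": "123456789", "Lname": "Smith"},
]

# Inner join first (PROJECT with DEPARTMENT), then join the result with EMPLOYEE.
proj_dept = equijoin(project, department, "Dnum", "Dnumber")
proj_dept_mgr = equijoin(proj_dept, employee, "Mgr_ssn", "Ssn")

summary = [(t["Pname"], t["Dname"], t["Lname"]) for t in proj_dept_mgr]
```

The intermediate relation `proj_dept` is itself an ordinary relation, which is why joins can be chained to any number of tables.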

In SQL, JOIN can be realized in several different ways. The first method is to specify the <join conditions> in the WHERE clause, along with any other selection conditions. This is very common, and is illustrated by queries Q1, Q1A, Q1B, Q2, and Q8 in Sections 4.3.1 and 4.3.2, as well as by many other query examples in Chapters 4 and 5. The second way is to use a nested relation, as illustrated by queries Q4A and Q16 in Section 5.1.2. Another way is to use the concept of joined tables, as illustrated by the queries Q1A, Q1B, Q8B, and Q2A in Section 5.1.6. The construct of joined tables was added to SQL2 to allow the user to specify explicitly all the various types of joins, because the other methods were more limited. It also allows the user to clearly distinguish join conditions from the selection conditions in the WHERE clause.
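The three styles can be tried with Python's built-in `sqlite3` module. The queries below are illustrative stand-ins written for this sketch, not the textbook's Q1, Q4A, or Q8B, and the two-row tables are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER, Mgr_ssn TEXT);
    CREATE TABLE PROJECT (Pname TEXT, Pnumber INTEGER, Dnum INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5, '333445555');
    INSERT INTO DEPARTMENT VALUES ('Headquarters', 1, '888665555');
    INSERT INTO PROJECT VALUES ('ProductX', 1, 5);
    INSERT INTO PROJECT VALUES ('Reorganization', 20, 1);
""")

# Method 1: join condition written in the WHERE clause.
where_rows = conn.execute(
    "SELECT Pname, Dname FROM PROJECT, DEPARTMENT "
    "WHERE Dnum = Dnumber ORDER BY Pnumber"
).fetchall()

# Method 2: nested query; the inner SELECT supplies the matching department numbers.
nested_rows = conn.execute(
    "SELECT Pname FROM PROJECT WHERE Dnum IN "
    "(SELECT Dnumber FROM DEPARTMENT WHERE Dname = 'Research') "
    "ORDER BY Pnumber"
).fetchall()

# Method 3: joined table; the join condition is kept apart from any WHERE clause.
joined_rows = conn.execute(
    "SELECT Pname, Dname FROM PROJECT JOIN DEPARTMENT ON Dnum = Dnumber "
    "ORDER BY Pnumber"
).fetchall()

conn.close()
```

Methods 1 and 3 return the same rows here; the joined-table form only makes the join condition syntactically distinct from selection conditions.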

6.3.3 A Complete Set of Relational Algebra Operations

It has been shown that the set of relational algebra operations {σ, π, ∪, ρ, –, ×} is a complete set; that is, any of the other original relational algebra operations can be expressed as a sequence of operations from this set. For example, the INTERSECTION operation can be expressed by using UNION and MINUS as follows:

R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))

Although, strictly speaking, INTERSECTION is not required, it is inconvenient to specify this complex expression every time we wish to specify an intersection. As another example, a JOIN operation can be specified as a CARTESIAN PRODUCT followed by a SELECT operation, as we discussed:

R ⋈<condition> S ≡ σ<condition>(R × S)

Similarly, a NATURAL JOIN can be specified as a CARTESIAN PRODUCT preceded by RENAME and followed by SELECT and PROJECT operations. Hence, the various JOIN operations are also not strictly necessary for the expressive power of the relational algebra. However, they are important to include as separate operations because they are convenient to use and are very commonly applied in database applications. Other operations have been included in the basic relational algebra for convenience rather than necessity. We discuss one of these, the DIVISION operation, in the next section.
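Both equivalences can be checked on small examples in Python. This sketch uses built-in sets for the INTERSECTION identity and tuples for the join; the sample values are arbitrary and chosen only for illustration.

```python
from itertools import product

# INTERSECTION expressed with UNION and MINUS only.
R = {1, 2, 3, 4}
S = {3, 4, 5}
via_union_minus = (R | S) - ((R - S) | (S - R))

# JOIN expressed as CARTESIAN PRODUCT followed by SELECT.
# Tuples are (Pname, Dnum) and (Dnumber, Dname); sample rows are assumed.
projects = [("ProductX", 5), ("Reorganization", 1)]
departments = [(5, "Research"), (1, "Headquarters"), (4, "Administration")]

cartesian = list(product(projects, departments))            # R x S: all 2 * 3 pairs
joined = [(p, d) for p, d in cartesian if p[1] == d[0]]     # select Dnum = Dnumber
```

Here `via_union_minus` equals the built-in intersection `R & S`, and `joined` keeps only the pairs of the product that satisfy the join condition.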

The DIVISION Operation

- Denoted by ÷
- Example: retrieve the names of employees who work on all the projects that ‘John Smith’ works on.
- To express this query using the DIVISION operation, proceed as follows. First, retrieve the list of project numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:
    SMITH ← σ_{Fname=‘John’ AND Lname=‘Smith’}(EMPLOYEE)
    SMITH_PNOS ← π_{Pno}(WORKS_ON ⋈_{Essn=Ssn} SMITH)
- Next, create a relation that includes a tuple <Pno, Essn> whenever the employee whose Ssn is Essn works on the project whose number is Pno in the intermediate relation SSN_PNOS:
    SSN_PNOS ← π_{Essn, Pno}(WORKS_ON)
- Finally, apply the DIVISION operation to the two relations, which gives the desired employees’ Social Security numbers:
    SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
    RESULT ← π_{Fname, Lname}(SSNS * EMPLOYEE)

Figure 6.8 The DIVISION operation. (a) Dividing SSN_PNOS by SMITH_PNOS. (b) T ← R ÷ S.

(a) SSN_PNOS

Essn       Pno
123456789  1
123456789  2
666884444  3
453453453  1
453453453  2
333445555  2
333445555  3
333445555  10
333445555  20
999887777  30
999887777  10
987987987  10
987987987  30
987654321  30
987654321  20
888665555  20

SMITH_PNOS

Pno
1
2

SSNS

Ssn
123456789
453453453

(b) R

A   B
a1  b1
a2  b1
a3  b1
a4  b1
a1  b2
a3  b2
a2  b3
a3  b3
a4  b3
a1  b4
a2  b4
a3  b4

S

A
a1
a2
a3

T

B
b1
b4

6.3.4 The DIVISION Operation

The DIVISION operation, denoted by ÷, is useful for a special kind of query that sometimes occurs in database applications. An example is Retrieve the names of employees who work on all the projects that ‘John Smith’ works on. To express this query using the DIVISION operation, proceed as follows. First, retrieve the list of project numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:

SMITH ← σ_{Fname=‘John’ AND Lname=‘Smith’}(EMPLOYEE)
SMITH_PNOS ← π_{Pno}(WORKS_ON ⋈_{Essn=Ssn} SMITH)

Next, create a relation that includes a tuple <Pno, Essn> whenever the employee whose Ssn is Essn works on the project whose number is Pno in the intermediate relation SSN_PNOS:

SSN_PNOS ← π_{Essn, Pno}(WORKS_ON)

Finally, apply the DIVISION operation to the two relations, which gives the desired employees’ Social Security numbers:

SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
RESULT ← π_{Fname, Lname}(SSNS * EMPLOYEE)

The preceding operations are shown in Figure 6.8(a).
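The DIVISION step can be sketched in Python as below. This is one possible implementation under stated assumptions: relations are lists of dicts, `divide` is a hypothetical helper that takes the divisor's attribute names explicitly, and the data is the pair of examples from Figure 6.8.

```python
def divide(r, s, s_attrs):
    """r ÷ s: tuples over r's remaining attributes that pair with every tuple of s."""
    divisor = {tuple(u[a] for a in s_attrs) for u in s}
    groups = {}
    for t in r:
        # Group r's tuples by their non-divisor attributes.
        key = tuple(sorted((k, v) for k, v in t.items() if k not in s_attrs))
        groups.setdefault(key, set()).add(tuple(t[a] for a in s_attrs))
    # Keep a group only if it covers every tuple of the divisor.
    return [dict(key) for key, seen in groups.items() if divisor <= seen]


# Figure 6.8(a): SSN_PNOS ÷ SMITH_PNOS.
ssn_pnos = [{"Essn": e, "Pno": p} for e, p in [
    ("123456789", 1), ("123456789", 2), ("666884444", 3), ("453453453", 1),
    ("453453453", 2), ("333445555", 2), ("333445555", 3), ("333445555", 10),
    ("333445555", 20), ("999887777", 30), ("999887777", 10), ("987987987", 10),
    ("987987987", 30), ("987654321", 30), ("987654321", 20), ("888665555", 20),
]]
smith_pnos = [{"Pno": 1}, {"Pno": 2}]
ssns = divide(ssn_pnos, smith_pnos, ["Pno"])

# Figure 6.8(b): T = R ÷ S.
r = [{"A": a, "B": b} for a, b in [
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
]]
s = [{"A": "a1"}, {"A": "a2"}, {"A": "a3"}]
t = divide(r, s, ["A"])
```

Employee 333445555 works on project 2 but not project 1, so that Essn is excluded; only employees whose project set covers all of SMITH_PNOS remain.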

162 Chapter 6 The Relational Algebra and Relational Calculus

Essn

SSN_PNOS(a)

123456789

123456789

666884444453453453

453453453

333445555333445555333445555333445555999887777

999887777987987987987987987987654321987654321888665555

3

1

2

2

1

2

30

3030

1010

31020

2020

Pno A

R(b)

a1

a2

a3a4

a1

a3a2a3a4a1

a2a3

b1

b1

b1

b2

b1

b2

b4

b4b4

b3b3b3

B

SMITH_PNOS

1

2

Pno

S

a1

a2

a3

A

T

b1

b4

B

SSNS

123456789

453453453

Ssn

Figure 6.8The DIVISION operation. (a) Dividing SSN_PNOS by SMITH_PNOS. (b) T ! R ÷ S.

6.3.4 The DIVISION OperationThe DIVISION operation, denoted by ÷, is useful for a special kind of query thatsometimes occurs in database applications. An example is Retrieve the names ofemployees who work on all the projects that ‘John Smith’ works on. To express thisquery using the DIVISION operation, proceed as follows. First, retrieve the list ofproject numbers that ‘John Smith’ works on in the intermediate relationSMITH_PNOS:

SMITH ! "Fname=‘John’ AND Lname=‘Smith’(EMPLOYEE)SMITH_PNOS ! #Pno(WORKS_ON Essn=SsnSMITH)

Next, create a relation that includes a tuple <Pno, Essn> whenever the employeewhose Ssn is Essn works on the project whose number is Pno in the intermediaterelation SSN_PNOS:

SSN_PNOS ! #Essn, Pno(WORKS_ON)

Finally, apply the DIVISION operation to the two relations, which gives the desiredemployees’ Social Security numbers:

SSNS(Ssn) ! SSN_PNOS ÷ SMITH_PNOSRESULT ! #Fname, Lname(SSNS * EMPLOYEE)

The preceding operations are shown in Figure 6.8(a).


In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the attributes of S are a subset of the attributes of R; that is, X ⊆ Z. Let Y be the set of attributes of R that are not attributes of S; that is, Y = Z – X (and hence Z = X ∪ Y). The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S. This means that, for a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S. Note that in the formulation of the DIVISION operation, the tuples in the denominator relation S restrict the numerator relation R by selecting those tuples in the result that match all values present in the denominator. It is not necessary to know what those values are, as they can be computed by another operation, as illustrated in the SMITH_PNOS relation in the above example.

Figure 6.8(b) illustrates a DIVISION operation where X = {A}, Y = {B}, and Z = {A, B}. Notice that the tuples (values) b1 and b4 appear in R in combination with all three tuples in S; that is why they appear in the resulting relation T. All other values of B in R do not appear with all the tuples in S and are not selected: b2 does not appear with a2, and b3 does not appear with a1.

The DIVISION operation can be expressed as a sequence of π, ×, and – operations as follows:

T1 ← π_{Y}(R)
T2 ← π_{Y}((S × T1) – R)
T ← T1 – T2
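The three steps above translate almost line for line into set operations. The following Python sketch assumes R is a set of (A, B) pairs and S is a set of A values, as in Figure 6.8(b). The tuples in `r` are sample values chosen to agree with the description of that figure (b2 never appears with a2, b3 never appears with a1); they are an assumption, not a transcription of the figure.

```python
from itertools import product


def divide_by_algebra(r, s):
    """Compute R(A, B) ÷ S(A) using only projection, product, and difference."""
    t1 = {b for (_, b) in r}              # T1 ← π_Y(R): every candidate B value
    missing = set(product(s, t1)) - r     # (S × T1) – R: required pairs absent from R
    t2 = {b for (_, b) in missing}        # T2 ← π_Y(...): candidates that fail
    return t1 - t2                        # T ← T1 – T2


# Assumed sample relation R(A, B), consistent with the text's description.
r = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
s = {"a1", "a2", "a3"}

t = divide_by_algebra(r, s)
print(sorted(t))  # ['b1', 'b4']
```

Here `missing` contains exactly ("a2", "b2") and ("a1", "b3"), which is why b2 and b3 end up in T2 and are subtracted from T1.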

The DIVISION operation is defined for convenience for dealing with queries that involve universal quantification (see Section 6.6.7) or the all condition. Most RDBMS implementations with SQL as the primary query language do not directly implement division. SQL has a roundabout way of dealing with the type of query illustrated above (see Section 5.1.4, queries Q3A and Q3B). Table 6.1 lists the various basic relational algebra operations we have discussed.
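One common roundabout formulation in SQL is a double NOT EXISTS: select an employee if there is no project of Smith's that the employee does not work on. The sketch below runs that pattern through Python's built-in sqlite3 module. It is an illustration of the general idea and is not claimed to match the queries in the cited section. The table name `works_on`, its lowercase columns, the sample rows, and the use of 123456789 as Smith's Ssn are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works_on (essn TEXT, pno INTEGER)")
conn.executemany(
    "INSERT INTO works_on VALUES (?, ?)",
    [
        ("123456789", 1), ("123456789", 2),   # assumed to be Smith's rows
        ("666884444", 3),
        ("453453453", 1), ("453453453", 2),
        ("333445555", 2), ("333445555", 3),
    ],
)

# "No project of Smith's exists that employee w does not work on."
query = """
SELECT DISTINCT w.essn
FROM works_on AS w
WHERE NOT EXISTS (
    SELECT 1
    FROM works_on AS s
    WHERE s.essn = ?
      AND NOT EXISTS (
          SELECT 1
          FROM works_on AS c
          WHERE c.essn = w.essn AND c.pno = s.pno
      )
)
ORDER BY w.essn
"""
rows = [row[0] for row in conn.execute(query, ("123456789",))]
print(rows)  # ['123456789', '453453453']
conn.close()
```

The result includes Smith's own Ssn, just as the algebraic division does.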

6.3.5 Notation for Query Trees
In this section we describe a notation typically used in relational systems to represent queries internally. The notation is called a query tree or sometimes it is known as a query evaluation tree or query execution tree. It includes the relational algebra operations being executed and is used as a possible data structure for the internal representation of the query in an RDBMS.

A query tree is a tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes. An execution of the query tree consists of executing an internal node operation whenever its operands (represented by its child nodes) are available, and then replacing that internal node by the relation that results from executing the operation. The execution terminates when the root node is executed and produces the result relation for the query.


Table 6.1 Operations of Relational Algebra

SELECT
  Purpose: Selects all tuples that satisfy the selection condition from a relation R.
  Notation: σ_{<selection condition>}(R)

PROJECT
  Purpose: Produces a new relation with only some of the attributes of R, and removes duplicate tuples.
  Notation: π_{<attribute list>}(R)

THETA JOIN
  Purpose: Produces all combinations of tuples from R1 and R2 that satisfy the join condition.
  Notation: R1 ⋈_{<join condition>} R2

EQUIJOIN
  Purpose: Produces all the combinations of tuples from R1 and R2 that satisfy a join condition with only equality comparisons.
  Notation: R1 ⋈_{<join condition>} R2, OR R1 ⋈_{(<join attributes 1>), (<join attributes 2>)} R2

NATURAL JOIN
  Purpose: Same as EQUIJOIN except that the join attributes of R2 are not included in the resulting relation; if the join attributes have the same names, they do not have to be specified at all.
  Notation: R1 *_{<join condition>} R2, OR R1 *_{(<join attributes 1>), (<join attributes 2>)} R2, OR R1 * R2

UNION
  Purpose: Produces a relation that includes all the tuples in R1 or R2 or both R1 and R2; R1 and R2 must be union compatible.
  Notation: R1 ∪ R2

INTERSECTION
  Purpose: Produces a relation that includes all the tuples in both R1 and R2; R1 and R2 must be union compatible.
  Notation: R1 ∩ R2

DIFFERENCE
  Purpose: Produces a relation that includes all the tuples in R1 that are not in R2; R1 and R2 must be union compatible.
  Notation: R1 – R2

CARTESIAN PRODUCT
  Purpose: Produces a relation that has the attributes of R1 and R2 and includes as tuples all possible combinations of tuples from R1 and R2.
  Notation: R1 × R2

DIVISION
  Purpose: Produces a relation R(X) that includes all tuples t[X] in R1(Z) that appear in R1 in combination with every tuple from R2(Y), where Z = X ∪ Y.
  Notation: R1(Z) ÷ R2(Y)

Figure 6.9 shows a query tree for Query 2 (see Section 4.3.1): For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address, and birth date. This query is specified on the relational schema of Figure 3.5 and corresponds to the following relational algebra expression:

π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation=‘Stafford’}(PROJECT)) ⋈_{Dnum=Dnumber} (DEPARTMENT)) ⋈_{Mgr_ssn=Ssn} (EMPLOYEE))

In Figure 6.9, the three leaf nodes P, D, and E represent the three relations PROJECT, DEPARTMENT, and EMPLOYEE. The relational algebra operations in the expression



Figure 6.9 Query tree corresponding to the relational algebra expression for Q2.

π_{P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate}
  (3) ⋈_{D.Mgr_ssn=E.Ssn}
    (2) ⋈_{P.Dnum=D.Dnumber}
      (1) σ_{P.Plocation=‘Stafford’}
        P (PROJECT)
      D (DEPARTMENT)
    E (EMPLOYEE)

are represented by internal tree nodes. The query tree signifies an explicit order of execution in the following sense. In order to execute Q2, the node marked (1) in Figure 6.9 must begin execution before node (2) because some resulting tuples of operation (1) must be available before we can begin to execute operation (2). Similarly, node (2) must begin to execute and produce results before node (3) can start execution, and so on. In general, a query tree gives a good visual representation and understanding of the query in terms of the relational operations it uses and is recommended as an additional means for expressing queries in relational algebra. We will revisit query trees when we discuss query processing and optimization in Chapter 19.
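The bottom-up order described here can be illustrated with a small recursive evaluator. The sketch below is a simplification under stated assumptions: each node fully materializes its children before it runs, whereas the text only requires that some result tuples be available. The class names (`Leaf`, `Select`, `Join`, `Project`), the `execute` function, and all sample rows are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Leaf:            # an input relation
    rows: list

@dataclass
class Select:          # σ with a predicate over one tuple
    pred: Callable
    child: Any

@dataclass
class Join:            # ⋈ with a predicate over the combined tuple
    pred: Callable
    left: Any
    right: Any

@dataclass
class Project:         # π with duplicate elimination
    attrs: list
    child: Any


def execute(node):
    """Evaluate children first, then replace the node by its result relation."""
    if isinstance(node, Leaf):
        return node.rows
    if isinstance(node, Select):
        return [t for t in execute(node.child) if node.pred(t)]
    if isinstance(node, Join):
        left, right = execute(node.left), execute(node.right)
        combined = ({**l, **r} for l in left for r in right)
        return [t for t in combined if node.pred(t)]
    if isinstance(node, Project):
        seen, out = set(), []
        for t in execute(node.child):
            key = tuple(t[a] for a in node.attrs)
            if key not in seen:
                seen.add(key)
                out.append(dict(zip(node.attrs, key)))
        return out
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical sample rows for the three leaf relations.
project = [
    {"Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
    {"Pnumber": 30, "Plocation": "Stafford", "Dnum": 4},
    {"Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5},
]
department = [{"Dnumber": 4, "Mgr_ssn": "987654321"},
              {"Dnumber": 5, "Mgr_ssn": "333445555"}]
employee = [
    {"Ssn": "987654321", "Lname": "Wallace", "Address": "addr-1", "Bdate": "1941-06-20"},
    {"Ssn": "333445555", "Lname": "Wong", "Address": "addr-2", "Bdate": "1955-12-08"},
]

q2 = Project(
    ["Pnumber", "Dnum", "Lname", "Address", "Bdate"],
    Join(lambda t: t["Mgr_ssn"] == t["Ssn"],                              # node (3)
         Join(lambda t: t["Dnum"] == t["Dnumber"],                        # node (2)
              Select(lambda t: t["Plocation"] == "Stafford", Leaf(project)),  # node (1)
              Leaf(department)),
         Leaf(employee)),
)
result = execute(q2)
for row in result:
    print(row)
```

Because `execute` recurses into child nodes before applying the operation at the current node, node (1) always finishes before node (2), and node (2) before node (3).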

6.4 Additional Relational Operations
Some common database requests, which are needed in commercial applications for RDBMSs, cannot be performed with the original relational algebra operations described in Sections 6.1 through 6.3. In this section we define additional operations to express these requests. These operations enhance the expressive power of the original relational algebra.

6.4.1 Generalized Projection
The generalized projection operation extends the projection operation by allowing functions of attributes to be included in the projection list. The generalized form can be expressed as:

π_{F1, F2, ..., Fn}(R)



where F1, F2, ..., Fn are functions over the attributes in relation R and may involve arithmetic operations and constant values. This operation is helpful when developing reports where computed values have to be produced in the columns of a query result.

As an example, consider the relation

EMPLOYEE (Ssn, Salary, Deduction, Years_service)

A report may be required to show

Net Salary = Salary – Deduction,
Bonus = 2000 * Years_service, and
Tax = 0.25 * Salary.

Then a generalized projection combined with renaming may be used as follows:

REPORT ← ρ_{(Ssn, Net_salary, Bonus, Tax)}(π_{Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary}(EMPLOYEE)).
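As a rough illustration, the REPORT expression corresponds to computing one output tuple per EMPLOYEE tuple, with each output column given by a function of the input attributes. The Python below is a minimal sketch; the two employee rows are made-up sample values, and representing a relation as a list of dictionaries is an assumption of this example.

```python
# Hypothetical EMPLOYEE(Ssn, Salary, Deduction, Years_service) tuples.
employee = [
    {"Ssn": "111111111", "Salary": 30000, "Deduction": 2000, "Years_service": 5},
    {"Ssn": "222222222", "Salary": 40000, "Deduction": 3500, "Years_service": 10},
]

# Generalized projection plus renaming: each column is a function of attributes.
report = [
    {
        "Ssn": e["Ssn"],
        "Net_salary": e["Salary"] - e["Deduction"],
        "Bonus": 2000 * e["Years_service"],
        "Tax": 0.25 * e["Salary"],
    }
    for e in employee
]

for row in report:
    print(row)
```

Since Ssn is kept in the projection list and identifies each employee, no duplicate tuples can arise here; a projection that dropped Ssn would need an explicit duplicate-elimination step to stay a relation.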

6.4.2 Aggregate Functions and Grouping
Another type of request that cannot be expressed in the basic relational algebra is to specify mathematical aggregate functions on collections of values from the database. Examples of such functions include retrieving the average or total salary of all employees or the total number of employee tuples. These functions are used in simple statistical queries that summarize information from the database tuples. Common functions applied to collections of numeric values include SUM, AVERAGE, MAXIMUM, and MINIMUM. The COUNT function is used for counting tuples or values.

Another common type of request involves grouping the tuples in a relation by the value of some of their attributes and then applying an aggregate function independently to each group. An example would be to group EMPLOYEE tuples by Dno, so that each group includes the tuples for employees working in the same department. We can then list each Dno value along with, say, the average salary of employees within the department, or the number of employees who work in the department.

We can define an AGGREGATE FUNCTION operation, using the symbol ℑ (pronounced script F)⁷, to specify these types of requests as follows:

<grouping attributes> ℑ <function list> (R)

where <grouping attributes> is a list of attributes of the relation specified in R, and <function list> is a list of (<function> <attribute>) pairs. In each such pair, <function> is one of the allowed functions (such as SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT) and <attribute> is an attribute of the relation specified by R. The

⁷ There is no single agreed-upon notation for specifying aggregate functions. In some cases a “script A” is used.


Figure 6.10 The aggregate function operation.
a. ρ_{R(Dno, No_of_employees, Average_sal)}(Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)).
b. Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE).
c. ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE).

(a) R
Dno   No_of_employees   Average_sal
5     4                 33250
4     3                 31000
1     1                 55000

(b)
Dno   Count_ssn   Average_salary
5     4           33250
4     3           31000
1     1           55000

(c)
Count_ssn   Average_salary
8           35125

resulting relation has the grouping attributes plus one attribute for each element in the function list. For example, to retrieve each department number, the number of employees in the department, and their average salary, while renaming the resulting attributes as indicated below, we write:

ρ_{R(Dno, No_of_employees, Average_sal)}(Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE))

The result of this operation on the EMPLOYEE relation of Figure 3.6 is shown in Figure 6.10(a).

In the above example, we specified a list of attribute names (between parentheses in the RENAME operation) for the resulting relation R. If no renaming is applied, then the attributes of the resulting relation that correspond to the function list will each be the concatenation of the function name with the attribute name in the form <function>_<attribute>.⁸ For example, Figure 6.10(b) shows the result of the following operation:

Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)

If no grouping attributes are specified, the functions are applied to all the tuples in the relation, so the resulting relation has a single tuple only. For example, Figure 6.10(c) shows the result of the following operation:

ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)

It is important to note that, in general, duplicates are not eliminated when an aggregate function is applied; this way, the normal interpretation of functions such as

⁸ Note that this is an arbitrary notation we are suggesting. There is no standard notation.
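The grouping and aggregation behavior described in this section can be sketched in Python as follows. This is an illustration under assumptions: the helper name `aggregate`, the `FUNCTIONS` table, and the eight sample EMPLOYEE rows are not from the source. The salaries were chosen so that the outputs agree with the values shown in Figure 6.10(b) and (c), and result attributes follow the <function>_<attribute> naming suggested in the text.

```python
from collections import defaultdict

FUNCTIONS = {
    "COUNT": len,
    "SUM": sum,
    "AVERAGE": lambda values: sum(values) / len(values),
    "MAXIMUM": max,
    "MINIMUM": min,
}


def aggregate(rows, grouping_attributes, function_list):
    """Group rows by the grouping attributes, then apply each (function, attribute) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in grouping_attributes)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(grouping_attributes, key))
        for function, attribute in function_list:
            values = [m[attribute] for m in members]   # duplicates are kept
            name = f"{function.capitalize()}_{attribute.lower()}"
            out[name] = FUNCTIONS[function](values)
        result.append(out)
    return result


# Assumed sample EMPLOYEE rows (Ssn, Salary, Dno).
employee = [
    {"Ssn": "123456789", "Salary": 30000, "Dno": 5},
    {"Ssn": "333445555", "Salary": 40000, "Dno": 5},
    {"Ssn": "999887777", "Salary": 25000, "Dno": 4},
    {"Ssn": "987654321", "Salary": 43000, "Dno": 4},
    {"Ssn": "666884444", "Salary": 38000, "Dno": 5},
    {"Ssn": "453453453", "Salary": 25000, "Dno": 5},
    {"Ssn": "987987987", "Salary": 25000, "Dno": 4},
    {"Ssn": "888665555", "Salary": 55000, "Dno": 1},
]
function_list = [("COUNT", "Ssn"), ("AVERAGE", "Salary")]

by_dept = aggregate(employee, ["Dno"], function_list)   # grouped by Dno
overall = aggregate(employee, [], function_list)        # no grouping: one tuple
print(by_dept)
print(overall)
```

With an empty grouping list every row falls into the same group, which is why `overall` contains a single tuple. Note that department 4 has two employees earning 25000 and both values contribute to the average.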


Recursive Closure Operations
}  Operation applied to a recursive relationship between tuples of same type


Figure 6.11 A two-level recursive query.

SUPERVISION (Borg’s Ssn is 888665555)
Ssn1 (Ssn)   Ssn2 (Super_ssn)
123456789    333445555
333445555    888665555
999887777    987654321
987654321    888665555
666884444    333445555
453453453    333445555
987987987    987654321
888665555    null

RESULT1 (Supervised by Borg)
Ssn
333445555
987654321

RESULT2 (Supervised by Borg’s subordinates)
Ssn
123456789
999887777
666884444
453453453
987987987

RESULT (RESULT1 ∪ RESULT2)
Ssn
123456789
999887777
666884444
453453453
987987987
333445555
987654321

6.4.4 OUTER JOIN Operations

Next, we discuss some additional extensions to the JOIN operation that are necessary to specify certain types of queries. The JOIN operations described earlier match tuples that satisfy the join condition. For example, for a NATURAL JOIN operation R * S, only tuples from R that have matching tuples in S (and vice versa) appear in the result. Hence, tuples without a matching (or related) tuple are eliminated from the JOIN result. Tuples with NULL values in the join attributes are also eliminated. This type of join, where tuples with no match are eliminated, is known as an inner join. The join operations we described earlier in Section 6.3 are all inner joins. This amounts to the loss of information if the user wants the result of the JOIN to include all the tuples in one or more of the component relations.

A set of operations, called outer joins, were developed for the case where the user wants to keep all the tuples in R, or all those in S, or all those in both relations in the result of the JOIN, regardless of whether or not they have matching tuples in the other relation. This satisfies the need of queries in which tuples from two tables are


SUM and AVERAGE is computed [9]. It is worth emphasizing that the result of applying an aggregate function is a relation, not a scalar number, even if it has a single value. This makes the relational algebra a closed mathematical system.

6.4.3 Recursive Closure Operations

Another type of operation that, in general, cannot be specified in the basic original relational algebra is recursive closure. This operation is applied to a recursive relationship between tuples of the same type, such as the relationship between an employee and a supervisor. This relationship is described by the foreign key Super_ssn of the EMPLOYEE relation in Figures 3.5 and 3.6, and it relates each employee tuple (in the role of supervisee) to another employee tuple (in the role of supervisor). An example of a recursive operation is to retrieve all supervisees of an employee e at all levels; that is, all employees e′ directly supervised by e, all employees e″ directly supervised by each employee e′, all employees e‴ directly supervised by each employee e″, and so on.

It is relatively straightforward in the relational algebra to specify all employees supervised by e at a specific level by joining the table with itself one or more times. However, it is difficult to specify all supervisees at all levels. For example, to specify the Ssns of all employees e′ directly supervised (at level one) by the employee e whose name is ‘James Borg’ (see Figure 3.6), we can apply the following operation:

BORG_SSN ← π_{Ssn}(σ_{Fname=‘James’ AND Lname=‘Borg’}(EMPLOYEE))
SUPERVISION(Ssn1, Ssn2) ← π_{Ssn, Super_ssn}(EMPLOYEE)
RESULT1(Ssn) ← π_{Ssn1}(SUPERVISION ⋈_{Ssn2=Ssn} BORG_SSN)

To retrieve all employees supervised by Borg at level 2 (that is, all employees e″ supervised by some employee e′ who is directly supervised by Borg), we can apply another JOIN to the result of the first query, as follows:

RESULT2(Ssn) ← π_{Ssn1}(SUPERVISION ⋈_{Ssn2=Ssn} RESULT1)

To get both sets of employees supervised at levels 1 and 2 by ‘James Borg’, we can apply the UNION operation to the two results, as follows:

RESULT ← RESULT2 ∪ RESULT1
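The three assignments above can be mirrored with Python sets. This is a minimal sketch: the helper name `supervisees_of` is an assumption introduced here, and the SUPERVISION pairs are the ones listed in Figure 6.11.

```python
# SUPERVISION(Ssn1, Ssn2): the employee Ssn1 is supervised by Ssn2 (Figure 6.11).
SUPERVISION = {
    ("123456789", "333445555"),
    ("333445555", "888665555"),
    ("999887777", "987654321"),
    ("987654321", "888665555"),
    ("666884444", "333445555"),
    ("453453453", "333445555"),
    ("987987987", "987654321"),
    ("888665555", None),  # Borg has no supervisor (null)
}
BORG_SSN = {"888665555"}


def supervisees_of(supervisors):
    """Join SUPERVISION with `supervisors` on Ssn2 = Ssn, then project Ssn1."""
    return {ssn1 for ssn1, ssn2 in SUPERVISION if ssn2 in supervisors}


RESULT1 = supervisees_of(BORG_SSN)   # level 1: supervised directly by Borg
RESULT2 = supervisees_of(RESULT1)    # level 2: supervised by Borg's subordinates
RESULT = RESULT2 | RESULT1           # UNION of the two levels
```

Each extra level costs one more call to the helper, which is why the number of levels has to be known in advance for this style of query.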

The results of these queries are illustrated in Figure 6.11. Although it is possible to retrieve employees at each level and then take their UNION, we cannot, in general, specify a query such as “retrieve the supervisees of ‘James Borg’ at all levels” without utilizing a looping mechanism unless we know the maximum number of levels [10].

An operation called the transitive closure of relations has been proposed to compute the recursive relationship as far as the recursion proceeds.
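The looping mechanism mentioned above might look like the following Python sketch, which repeats the join-and-union step until no new supervisee appears. The function name `all_supervisees` and the small chain used in the usage lines are assumptions for illustration; this is one possible way to compute the closure, not a standard operator definition.

```python
def all_supervisees(supervision, root):
    """Return every Ssn1 reachable from `root` through (Ssn1, Ssn2) pairs."""
    found = set()
    frontier = {root}
    while frontier:
        # One more level: employees whose supervisor is in the current frontier.
        next_level = {s1 for s1, s2 in supervision if s2 in frontier}
        # Keep only newly discovered employees so the loop stops at a fixed point,
        # even if the data accidentally contains a cycle.
        frontier = next_level - found
        found |= frontier
    return found


# Hypothetical four-level chain: d reports to c, c to b, b to a.
chain = {("b", "a"), ("c", "b"), ("d", "c")}
closure = all_supervisees(chain, "a")
```

Unlike the two-level query, the number of iterations is decided by the data rather than by the query text.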

[9] In SQL, the option of eliminating duplicates before applying the aggregate function is available by including the keyword DISTINCT (see Section 4.4.4).
[10] The SQL3 standard includes syntax for recursive closure.


OUTER JOIN Operations
}  Outer Joins
}  Keep all tuples in R, or all those in S, or all those in both relations regardless of whether or not they have matching tuples in the other relation
}  Types
•  LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN
}  Example:


The OUTER UNION Operation
}  Take union of tuples from two relations that have some common attributes
}  Not union (type) compatible
}  Partially compatible
}  All tuples from both relations included in the result
}  But tuples with the same value combination will appear only once


Figure 6.12 The result of a LEFT OUTER JOIN operation.

RESULT
Fname      Minit   Lname     Dname
John       B       Smith     NULL
Franklin   T       Wong      Research
Alicia     J       Zelaya    NULL
Jennifer   S       Wallace   Administration
Ramesh     K       Narayan   NULL
Joyce      A       English   NULL
Ahmad      V       Jabbar    NULL
James      E       Borg      Headquarters

to be combined by matching corresponding rows, but without losing any tuples for lack of matching values. For example, suppose that we want a list of all employee names as well as the name of the departments they manage if they happen to manage a department; if they do not manage one, we can indicate it with a NULL value. We can apply an operation LEFT OUTER JOIN, denoted by ⟕, to retrieve the result as follows:

TEMP ← (EMPLOYEE ⟕_{Ssn=Mgr_ssn} DEPARTMENT)

RESULT ← π_{Fname, Minit, Lname, Dname}(TEMP)

The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in R ⟕ S; if no matching tuple is found in S, then the attributes of S in the join result are filled or padded with NULL values. The result of these operations is shown in Figure 6.12.
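The padding behaviour can be sketched in Python as below. The function `left_outer_join`, its positional-key parameters, and the abridged sample rows are assumptions made for illustration; `None` stands in for NULL, and a NULL join value is treated as matching nothing.

```python
# Abridged, assumed sample rows.
EMPLOYEE = [  # (Fname, Minit, Lname, Ssn)
    ("John", "B", "Smith", "123456789"),
    ("Franklin", "T", "Wong", "333445555"),
    ("James", "E", "Borg", "888665555"),
]
DEPARTMENT = [  # (Dname, Mgr_ssn)
    ("Research", "333445555"),
    ("Headquarters", "888665555"),
]


def left_outer_join(left, right, left_key, right_key, right_width):
    """Keep every left tuple; pad with None when no right tuple matches."""
    result = []
    for l_row in left:
        matches = [
            r_row for r_row in right
            if l_row[left_key] is not None and r_row[right_key] == l_row[left_key]
        ]
        if matches:
            result.extend(l_row + r_row for r_row in matches)
        else:
            result.append(l_row + (None,) * right_width)
    return result


# TEMP: EMPLOYEE left-outer-joined with DEPARTMENT on Ssn = Mgr_ssn.
TEMP = left_outer_join(EMPLOYEE, DEPARTMENT, left_key=3, right_key=1, right_width=2)
# RESULT: project Fname, Minit, Lname, Dname.
RESULT = [(fname, minit, lname, dname) for fname, minit, lname, _, dname, _ in TEMP]
```

Swapping the two arguments gives the effect of a RIGHT OUTER JOIN, up to the order of the columns.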

A similar operation, RIGHT OUTER JOIN, denoted by ⟖, keeps every tuple in the second, or right, relation S in the result of R ⟖ S. A third operation, FULL OUTER JOIN, denoted by ⟗, keeps all tuples in both the left and the right relations when no matching tuples are found, padding them with NULL values as needed. The three outer join operations are part of the SQL2 standard (see Section 5.1.6). These operations were provided later as an extension of relational algebra in response to the typical need in business applications to show related information from multiple tables exhaustively. Sometimes a complete reporting of data from multiple tables is required whether or not there are matching values.

6.4.5 The OUTER UNION Operation

The OUTER UNION operation was developed to take the union of tuples from two relations that have some common attributes, but are not union (type) compatible. This operation will take the UNION of tuples in two relations R(X, Y) and S(X, Z) that are partially compatible, meaning that only some of their attributes, say X, are union compatible. The attributes that are union compatible are represented only once in the result, and those attributes that are not union compatible from either


relation are also kept in the result relation T(X, Y, Z). It is therefore the same as a FULL OUTER JOIN on the common attributes.

Two tuples t1 in R and t2 in S are said to match if t1[X] = t2[X]. These will be combined (unioned) into a single tuple in T. Tuples in either relation that have no matching tuple in the other relation are padded with NULL values. For example, an OUTER UNION can be applied to two relations whose schemas are STUDENT(Name, Ssn, Department, Advisor) and INSTRUCTOR(Name, Ssn, Department, Rank). Tuples from the two relations are matched based on having the same combination of values of the shared attributes: Name, Ssn, Department. The resulting relation, STUDENT_OR_INSTRUCTOR, will have the following attributes:

STUDENT_OR_INSTRUCTOR(Name, Ssn, Department, Advisor, Rank)

All the tuples from both relations are included in the result, but tuples with the same (Name, Ssn, Department) combination will appear only once in the result. Tuples appearing only in STUDENT will have a NULL for the Rank attribute, whereas tuples appearing only in INSTRUCTOR will have a NULL for the Advisor attribute. A tuple that exists in both relations, which represents a student who is also an instructor, will have values for all its attributes [11].

Notice that the same person may still appear twice in the result. For example, we could have a graduate student in the Mathematics department who is an instructor in the Computer Science department. Although the two tuples representing that person in STUDENT and INSTRUCTOR will have the same (Name, Ssn) values, they will not agree on the Department value, and so will not be matched. This is because Department has two different meanings in STUDENT (the department where the person studies) and INSTRUCTOR (the department where the person is employed as an instructor). If we wanted to apply the OUTER UNION based on the same (Name, Ssn) combination only, we should rename the Department attribute in each table to reflect that they have different meanings and designate them as not being part of the union-compatible attributes. For example, we could rename the attributes as MajorDept in STUDENT and WorkDept in INSTRUCTOR.
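A small Python sketch of the OUTER UNION described above follows. The function name `outer_union` and all rows are hypothetical, and the sketch assumes each (Name, Ssn, Department) combination occurs at most once per relation. The sample includes a person who studies in one department and teaches in another, so that person appears twice in the result.

```python
# Hypothetical rows for illustration only.
STUDENT = [  # (Name, Ssn, Department, Advisor)
    ("Ann", "111", "Math", "Dr. Lee"),
    ("Bob", "222", "CS", "Dr. Kim"),
]
INSTRUCTOR = [  # (Name, Ssn, Department, Rank)
    ("Bob", "222", "CS", "Lecturer"),
    ("Ann", "111", "CS", "Assistant"),
    ("Eve", "333", "Physics", "Professor"),
]


def outer_union(students, instructors):
    """Build STUDENT_OR_INSTRUCTOR(Name, Ssn, Department, Advisor, Rank)."""
    merged = {}
    for name, ssn, dept, advisor in students:
        merged[(name, ssn, dept)] = [advisor, None]   # Rank padded with None
    for name, ssn, dept, rank in instructors:
        # Matching tuples are combined; unmatched ones get Advisor = None.
        merged.setdefault((name, ssn, dept), [None, None])[1] = rank
    return [key + tuple(values) for key, values in merged.items()]


STUDENT_OR_INSTRUCTOR = outer_union(STUDENT, INSTRUCTOR)
```

Matching on (Name, Ssn) alone would only require changing the dictionary key and carrying the two department values as separate columns, in the spirit of the MajorDept and WorkDept renaming.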

6.5 Examples of Queries in Relational Algebra

The following are additional examples to illustrate the use of the relational algebra operations. All examples refer to the database in Figure 3.6. In general, the same query can be stated in numerous ways using the various operations. We will state each query in one way and leave it to the reader to come up with equivalent formulations.

Query 1. Retrieve the name and address of all employees who work for the ‘Research’ department.

[11] Note that OUTER UNION is equivalent to a FULL OUTER JOIN if the join attributes are all the common attributes of the two relations.


RESEARCH_DEPT ← σ_{Dname=‘Research’}(DEPARTMENT)
RESEARCH_EMPS ← (RESEARCH_DEPT ⋈_{Dnumber=Dno} EMPLOYEE)
RESULT ← π_{Fname, Lname, Address}(RESEARCH_EMPS)

As a single in-line expression, this query becomes:

π_{Fname, Lname, Address}(σ_{Dname=‘Research’}(DEPARTMENT ⋈_{Dnumber=Dno} EMPLOYEE))

This query could be specified in other ways; for example, the order of the JOIN and SELECT operations could be reversed, or the JOIN could be replaced by a NATURAL JOIN after renaming one of the join attributes to match the other join attribute name.
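To make the two orderings concrete, here is a hedged Python sketch of Query 1 that evaluates SELECT before JOIN and then JOIN before SELECT. The rows are abridged, the addresses are placeholders, and all variable names are assumptions introduced for this example.

```python
# Assumed, abridged sample rows; addresses are placeholders.
DEPARTMENT = [("Research", 5), ("Administration", 4)]  # (Dname, Dnumber)
EMPLOYEE = [  # (Fname, Lname, Address, Dno)
    ("John", "Smith", "addr-1", 5),
    ("Alicia", "Zelaya", "addr-2", 4),
    ("Franklin", "Wong", "addr-3", 5),
]

# SELECT first, then JOIN on Dnumber = Dno, then PROJECT.
research_dept = [d for d in DEPARTMENT if d[0] == "Research"]
research_emps = [d + e for d in research_dept for e in EMPLOYEE if d[1] == e[3]]
result = {(fname, lname, addr) for _, _, fname, lname, addr, _ in research_emps}

# JOIN first, then SELECT on Dname, then PROJECT.
joined = [d + e for d in DEPARTMENT for e in EMPLOYEE if d[1] == e[3]]
result_alt = {
    (fname, lname, addr)
    for dname, _, fname, lname, addr, _ in joined
    if dname == "Research"
}
```

Both orderings produce the same set of tuples; selecting first simply keeps the intermediate join smaller.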

Query 2. For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address, and birth date.

STAFFORD_PROJS ← σ_{Plocation=‘Stafford’}(PROJECT)
CONTR_DEPTS ← (STAFFORD_PROJS ⋈_{Dnum=Dnumber} DEPARTMENT)
PROJ_DEPT_MGRS ← (CONTR_DEPTS ⋈_{Mgr_ssn=Ssn} EMPLOYEE)
RESULT ← π_{Pnumber, Dnum, Lname, Address, Bdate}(PROJ_DEPT_MGRS)

In this example, we first select the projects located in Stafford, then join them with their controlling departments, and then join the result with the department managers. Finally, we apply a project operation on the desired attributes.

Query 3. Find the names of employees who work on all the projects controlled by department number 5.

DEPT5_PROJS ← ρ_{(Pno)}(π_{Pnumber}(σ_{Dnum=5}(PROJECT)))
EMP_PROJ ← ρ_{(Ssn, Pno)}(π_{Essn, Pno}(WORKS_ON))
RESULT_EMP_SSNS ← EMP_PROJ ÷ DEPT5_PROJS
RESULT ← π_{Lname, Fname}(RESULT_EMP_SSNS * EMPLOYEE)

In this query, we first create a table DEPT5_PROJS that contains the project numbers of all projects controlled by department 5. Then we create a table EMP_PROJ that holds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamed the attributes so that they will be correctly used in the division operation. Finally, we join the result of the division, which holds only Ssn values, with the EMPLOYEE table to retrieve the desired attributes from EMPLOYEE.
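The division step in Query 3 can be sketched in Python as follows. The function name `divide` and the sample values (employee identifiers `e1` to `e3` and the project numbers) are hypothetical; the sketch only illustrates the idea that an Ssn qualifies when its set of projects contains every project in the divisor.

```python
def divide(emp_proj, dept_projs):
    """EMP_PROJ(Ssn, Pno) ÷ DEPT5_PROJS(Pno): Ssns paired with every Pno."""
    required = set(dept_projs)
    worked = {}
    for ssn, pno in emp_proj:
        worked.setdefault(ssn, set()).add(pno)
    # An employee qualifies when the required projects are a subset of theirs.
    return {ssn for ssn, pnos in worked.items() if required <= pnos}


# Hypothetical data: department 5 controls projects 1, 2 and 3.
DEPT5_PROJS = [1, 2, 3]
EMP_PROJ = [
    ("e1", 1), ("e1", 2), ("e1", 3),
    ("e2", 1), ("e2", 2),
    ("e3", 1), ("e3", 2), ("e3", 3), ("e3", 10),
]
RESULT_EMP_SSNS = divide(EMP_PROJ, DEPT5_PROJS)
```

Extra projects, such as project 10 for `e3`, do not disqualify an employee; only missing ones do.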

Query 4. Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project.

SMITHS(Essn) ← π_{Ssn}(σ_{Lname=‘Smith’}(EMPLOYEE))
SMITH_WORKER_PROJS ← π_{Pno}(WORKS_ON * SMITHS)
MGRS ← π_{Lname, Dnumber}(EMPLOYEE ⋈_{Ssn=Mgr_ssn} DEPARTMENT)
SMITH_MANAGED_DEPTS(Dnum) ← π_{Dnumber}(σ_{Lname=‘Smith’}(MGRS))
SMITH_MGR_PROJS(Pno) ← π_{Pnumber}(SMITH_MANAGED_DEPTS * PROJECT)
RESULT ← (SMITH_WORKER_PROJS ∪ SMITH_MGR_PROJS)

Page 33: Advanced Databases (SE487)...3/28/14 3 SE487: Advanced Databases by Dr. Anis Koubaa Textbook: Fundamentals of Database Systems, 6th Edition Elmasri & Navathe, (c) Pearson Education

3/28/14

33

SE487: Advanced Databases by Dr. Anis Koubaa Textbook: Fundamentals of Database Systems, 6th Edition Elmasri & Navathe, (c) Pearson Education - Addison-Wesley, 2011 65

Examples of Queries in Relational Algebra (cont’d.)

172 Chapter 6 The Relational Algebra and Relational Calculus

RESEARCH_DEPT ! "Dname=‘Research’(DEPARTMENT)RESEARCH_EMPS ! (RESEARCH_DEPT Dnumber=DnoEMPLOYEE)RESULT ! #Fname, Lname, Address(RESEARCH_EMPS)

As a single in-line expression, this query becomes:

#Fname, Lname, Address ("Dname=‘Research’(DEPARTMENT Dnumber=Dno(EMPLOYEE))

This query could be specified in other ways; for example, the order of the JOIN andSELECT operations could be reversed, or the JOIN could be replaced by a NATURALJOIN after renaming one of the join attributes to match the other join attributename.

Query 2. For every project located in ‘Stafford’, list the project number, thecontrolling department number, and the department manager’s last name,address, and birth date.

STAFFORD_PROJS ! "Plocation=‘Stafford’(PROJECT)CONTR_DEPTS ! (STAFFORD_PROJS Dnum=DnumberDEPARTMENT)PROJ_DEPT_MGRS ! (CONTR_DEPTS Mgr_ssn=SsnEMPLOYEE)RESULT ! #Pnumber, Dnum, Lname, Address, Bdate(PROJ_DEPT_MGRS)

In this example, we first select the projects located in Stafford, then join them withtheir controlling departments, and then join the result with the department man-agers. Finally, we apply a project operation on the desired attributes.

Query 3. Find the names of employees who work on all the projects controlledby department number 5.

DEPT5_PROJS ! $(Pno)(#Pnumber("Dnum=5(PROJECT)))EMP_PROJ ! $(Ssn, Pno)(#Essn, Pno(WORKS_ON))RESULT_EMP_SSNS ! EMP_PROJ ÷ DEPT5_PROJSRESULT ! #Lname, Fname(RESULT_EMP_SSNS * EMPLOYEE)

In this query, we first create a table DEPT5_PROJS that contains the project numbersof all projects controlled by department 5. Then we create a table EMP_PROJ thatholds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamedthe attributes so that they will be correctly used in the division operation. Finally, wejoin the result of the division, which holds only Ssn values, with the EMPLOYEEtable to retrieve the desired attributes from EMPLOYEE.

Query 4. Make a list of project numbers for projects that involve an employeewhose last name is ‘Smith’, either as a worker or as a manager of the departmentthat controls the project.

SMITHS(Essn) ! #Ssn ("Lname=‘Smith’(EMPLOYEE))SMITH_WORKER_PROJS ! #Pno(WORKS_ON * SMITHS)MGRS ! #Lname, Dnumber(EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)SMITH_MANAGED_DEPTS(Dnum) ! #Dnumber ("Lname=‘Smith’(MGRS))SMITH_MGR_PROJS(Pno) ! #Pnumber(SMITH_MANAGED_DEPTS * PROJECT)RESULT ! (SMITH_WORKER_PROJS % SMITH_MGR_PROJS)

172 Chapter 6 The Relational Algebra and Relational Calculus

RESEARCH_DEPT ! "Dname=‘Research’(DEPARTMENT)RESEARCH_EMPS ! (RESEARCH_DEPT Dnumber=DnoEMPLOYEE)RESULT ! #Fname, Lname, Address(RESEARCH_EMPS)

As a single in-line expression, this query becomes:

#Fname, Lname, Address ("Dname=‘Research’(DEPARTMENT Dnumber=Dno(EMPLOYEE))

This query could be specified in other ways; for example, the order of the JOIN andSELECT operations could be reversed, or the JOIN could be replaced by a NATURALJOIN after renaming one of the join attributes to match the other join attributename.

Query 2. For every project located in ‘Stafford’, list the project number, thecontrolling department number, and the department manager’s last name,address, and birth date.

STAFFORD_PROJS ! "Plocation=‘Stafford’(PROJECT)CONTR_DEPTS ! (STAFFORD_PROJS Dnum=DnumberDEPARTMENT)PROJ_DEPT_MGRS ! (CONTR_DEPTS Mgr_ssn=SsnEMPLOYEE)RESULT ! #Pnumber, Dnum, Lname, Address, Bdate(PROJ_DEPT_MGRS)

In this example, we first select the projects located in Stafford, then join them withtheir controlling departments, and then join the result with the department man-agers. Finally, we apply a project operation on the desired attributes.

Query 3. Find the names of employees who work on all the projects controlledby department number 5.

DEPT5_PROJS ! $(Pno)(#Pnumber("Dnum=5(PROJECT)))EMP_PROJ ! $(Ssn, Pno)(#Essn, Pno(WORKS_ON))RESULT_EMP_SSNS ! EMP_PROJ ÷ DEPT5_PROJSRESULT ! #Lname, Fname(RESULT_EMP_SSNS * EMPLOYEE)

In this query, we first create a table DEPT5_PROJS that contains the project numbersof all projects controlled by department 5. Then we create a table EMP_PROJ thatholds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamedthe attributes so that they will be correctly used in the division operation. Finally, wejoin the result of the division, which holds only Ssn values, with the EMPLOYEEtable to retrieve the desired attributes from EMPLOYEE.

Query 4. Make a list of project numbers for projects that involve an employeewhose last name is ‘Smith’, either as a worker or as a manager of the departmentthat controls the project.

SMITHS(Essn) ! #Ssn ("Lname=‘Smith’(EMPLOYEE))SMITH_WORKER_PROJS ! #Pno(WORKS_ON * SMITHS)MGRS ! #Lname, Dnumber(EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)SMITH_MANAGED_DEPTS(Dnum) ! #Dnumber ("Lname=‘Smith’(MGRS))SMITH_MGR_PROJS(Pno) ! #Pnumber(SMITH_MANAGED_DEPTS * PROJECT)RESULT ! (SMITH_WORKER_PROJS % SMITH_MGR_PROJS)

SE487: Advanced Databases by Dr. Anis Koubaa Textbook: Fundamentals of Database Systems, 6th Edition Elmasri & Navathe, (c) Pearson Education - Addison-Wesley, 2011 66

Examples of Queries in Relational Algebra (cont’d.)

172 Chapter 6 The Relational Algebra and Relational Calculus

RESEARCH_DEPT ! "Dname=‘Research’(DEPARTMENT)RESEARCH_EMPS ! (RESEARCH_DEPT Dnumber=DnoEMPLOYEE)RESULT ! #Fname, Lname, Address(RESEARCH_EMPS)

As a single in-line expression, this query becomes:

#Fname, Lname, Address ("Dname=‘Research’(DEPARTMENT Dnumber=Dno(EMPLOYEE))

This query could be specified in other ways; for example, the order of the JOIN andSELECT operations could be reversed, or the JOIN could be replaced by a NATURALJOIN after renaming one of the join attributes to match the other join attributename.

Query 2. For every project located in ‘Stafford’, list the project number, thecontrolling department number, and the department manager’s last name,address, and birth date.

STAFFORD_PROJS ! "Plocation=‘Stafford’(PROJECT)CONTR_DEPTS ! (STAFFORD_PROJS Dnum=DnumberDEPARTMENT)PROJ_DEPT_MGRS ! (CONTR_DEPTS Mgr_ssn=SsnEMPLOYEE)RESULT ! #Pnumber, Dnum, Lname, Address, Bdate(PROJ_DEPT_MGRS)

In this example, we first select the projects located in Stafford, then join them withtheir controlling departments, and then join the result with the department man-agers. Finally, we apply a project operation on the desired attributes.

Query 3. Find the names of employees who work on all the projects controlledby department number 5.

DEPT5_PROJS ! $(Pno)(#Pnumber("Dnum=5(PROJECT)))EMP_PROJ ! $(Ssn, Pno)(#Essn, Pno(WORKS_ON))RESULT_EMP_SSNS ! EMP_PROJ ÷ DEPT5_PROJSRESULT ! #Lname, Fname(RESULT_EMP_SSNS * EMPLOYEE)

In this query, we first create a table DEPT5_PROJS that contains the project numbersof all projects controlled by department 5. Then we create a table EMP_PROJ thatholds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamedthe attributes so that they will be correctly used in the division operation. Finally, wejoin the result of the division, which holds only Ssn values, with the EMPLOYEEtable to retrieve the desired attributes from EMPLOYEE.

Query 4. Make a list of project numbers for projects that involve an employeewhose last name is ‘Smith’, either as a worker or as a manager of the departmentthat controls the project.

SMITHS(Essn) ! #Ssn ("Lname=‘Smith’(EMPLOYEE))SMITH_WORKER_PROJS ! #Pno(WORKS_ON * SMITHS)MGRS ! #Lname, Dnumber(EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)SMITH_MANAGED_DEPTS(Dnum) ! #Dnumber ("Lname=‘Smith’(MGRS))SMITH_MGR_PROJS(Pno) ! #Pnumber(SMITH_MANAGED_DEPTS * PROJECT)RESULT ! (SMITH_WORKER_PROJS % SMITH_MGR_PROJS)

172 Chapter 6 The Relational Algebra and Relational Calculus

RESEARCH_DEPT ! "Dname=‘Research’(DEPARTMENT)RESEARCH_EMPS ! (RESEARCH_DEPT Dnumber=DnoEMPLOYEE)RESULT ! #Fname, Lname, Address(RESEARCH_EMPS)

As a single in-line expression, this query becomes:

#Fname, Lname, Address ("Dname=‘Research’(DEPARTMENT Dnumber=Dno(EMPLOYEE))

This query could be specified in other ways; for example, the order of the JOIN andSELECT operations could be reversed, or the JOIN could be replaced by a NATURALJOIN after renaming one of the join attributes to match the other join attributename.

Query 2. For every project located in ‘Stafford’, list the project number, thecontrolling department number, and the department manager’s last name,address, and birth date.

STAFFORD_PROJS ! "Plocation=‘Stafford’(PROJECT)CONTR_DEPTS ! (STAFFORD_PROJS Dnum=DnumberDEPARTMENT)PROJ_DEPT_MGRS ! (CONTR_DEPTS Mgr_ssn=SsnEMPLOYEE)RESULT ! #Pnumber, Dnum, Lname, Address, Bdate(PROJ_DEPT_MGRS)

In this example, we first select the projects located in Stafford, then join them withtheir controlling departments, and then join the result with the department man-agers. Finally, we apply a project operation on the desired attributes.

Query 3. Find the names of employees who work on all the projects controlledby department number 5.

DEPT5_PROJS ! $(Pno)(#Pnumber("Dnum=5(PROJECT)))EMP_PROJ ! $(Ssn, Pno)(#Essn, Pno(WORKS_ON))RESULT_EMP_SSNS ! EMP_PROJ ÷ DEPT5_PROJSRESULT ! #Lname, Fname(RESULT_EMP_SSNS * EMPLOYEE)

In this query, we first create a table DEPT5_PROJS that contains the project numbersof all projects controlled by department 5. Then we create a table EMP_PROJ thatholds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamedthe attributes so that they will be correctly used in the division operation. Finally, wejoin the result of the division, which holds only Ssn values, with the EMPLOYEEtable to retrieve the desired attributes from EMPLOYEE.

Query 4. Make a list of project numbers for projects that involve an employeewhose last name is ‘Smith’, either as a worker or as a manager of the departmentthat controls the project.

SMITHS(Essn) ! #Ssn ("Lname=‘Smith’(EMPLOYEE))SMITH_WORKER_PROJS ! #Pno(WORKS_ON * SMITHS)MGRS ! #Lname, Dnumber(EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)SMITH_MANAGED_DEPTS(Dnum) ! #Dnumber ("Lname=‘Smith’(MGRS))SMITH_MGR_PROJS(Pno) ! #Pnumber(SMITH_MANAGED_DEPTS * PROJECT)RESULT ! (SMITH_WORKER_PROJS % SMITH_MGR_PROJS)

Page 34: Advanced Databases (SE487)...3/28/14 3 SE487: Advanced Databases by Dr. Anis Koubaa Textbook: Fundamentals of Database Systems, 6th Edition Elmasri & Navathe, (c) Pearson Education

3/28/14

34

SE487: Advanced Databases by Dr. Anis Koubaa Textbook: Fundamentals of Database Systems, 6th Edition Elmasri & Navathe, (c) Pearson Education - Addison-Wesley, 2011 67

Examples of Queries in Relational Algebra (cont’d.)

172 Chapter 6 The Relational Algebra and Relational Calculus

RESEARCH_DEPT ! "Dname=‘Research’(DEPARTMENT)RESEARCH_EMPS ! (RESEARCH_DEPT Dnumber=DnoEMPLOYEE)RESULT ! #Fname, Lname, Address(RESEARCH_EMPS)

As a single in-line expression, this query becomes:

#Fname, Lname, Address ("Dname=‘Research’(DEPARTMENT Dnumber=Dno(EMPLOYEE))

This query could be specified in other ways; for example, the order of the JOIN andSELECT operations could be reversed, or the JOIN could be replaced by a NATURALJOIN after renaming one of the join attributes to match the other join attributename.

Query 2. For every project located in ‘Stafford’, list the project number, thecontrolling department number, and the department manager’s last name,address, and birth date.

STAFFORD_PROJS ! "Plocation=‘Stafford’(PROJECT)CONTR_DEPTS ! (STAFFORD_PROJS Dnum=DnumberDEPARTMENT)PROJ_DEPT_MGRS ! (CONTR_DEPTS Mgr_ssn=SsnEMPLOYEE)RESULT ! #Pnumber, Dnum, Lname, Address, Bdate(PROJ_DEPT_MGRS)

In this example, we first select the projects located in Stafford, then join them withtheir controlling departments, and then join the result with the department man-agers. Finally, we apply a project operation on the desired attributes.

Query 3. Find the names of employees who work on all the projects controlledby department number 5.

DEPT5_PROJS ! $(Pno)(#Pnumber("Dnum=5(PROJECT)))EMP_PROJ ! $(Ssn, Pno)(#Essn, Pno(WORKS_ON))RESULT_EMP_SSNS ! EMP_PROJ ÷ DEPT5_PROJSRESULT ! #Lname, Fname(RESULT_EMP_SSNS * EMPLOYEE)

In this query, we first create a table DEPT5_PROJS that contains the project numbers of all projects controlled by department 5. Then we create a table EMP_PROJ that holds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamed the attributes so that they will be correctly used in the division operation. Finally, we join the result of the division, which holds only Ssn values, with the EMPLOYEE table to retrieve the desired attributes from EMPLOYEE.
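
The division step is the least familiar one, so a minimal sketch may help. It assumes EMP_PROJ is a set of (Ssn, Pno) pairs and DEPT5_PROJS a set of project numbers; the function name `divide` and the sample values are hypothetical.

```python
def divide(r, s):
    """Relational division R(x, y) / S(y): the x values paired in R with every y in S."""
    candidates = {x for (x, _) in r}
    return {x for x in candidates if all((x, y) in r for y in s)}


# Illustrative (Ssn, Pno) pairs; the values are assumptions for the example.
EMP_PROJ = {
    ("123", 1), ("123", 2),
    ("453", 1),
    ("333", 1), ("333", 2), ("333", 3),
}
DEPT5_PROJS = {1, 2}

# Employees who work on all the projects listed in DEPT5_PROJS.
RESULT_EMP_SSNS = divide(EMP_PROJ, DEPT5_PROJS)
```

Employee "453" is excluded because no ("453", 2) pair exists, which is exactly the "all the projects" condition in the query.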

Query 4. Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project.

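$$
\begin{aligned}
\text{SMITHS(Essn)} &\leftarrow \pi_{\text{Ssn}}(\sigma_{\text{Lname}=\text{'Smith'}}(\text{EMPLOYEE})) \\
\text{SMITH\_WORKER\_PROJS} &\leftarrow \pi_{\text{Pno}}(\text{WORKS\_ON} * \text{SMITHS}) \\
\text{MGRS} &\leftarrow \pi_{\text{Lname, Dnumber}}(\text{EMPLOYEE} \bowtie_{\text{Ssn}=\text{Mgr\_ssn}} \text{DEPARTMENT}) \\
\text{SMITH\_MANAGED\_DEPTS(Dnum)} &\leftarrow \pi_{\text{Dnumber}}(\sigma_{\text{Lname}=\text{'Smith'}}(\text{MGRS})) \\
\text{SMITH\_MGR\_PROJS(Pno)} &\leftarrow \pi_{\text{Pnumber}}(\text{SMITH\_MANAGED\_DEPTS} * \text{PROJECT}) \\
\text{RESULT} &\leftarrow (\text{SMITH\_WORKER\_PROJS} \cup \text{SMITH\_MGR\_PROJS})
\end{aligned}
$$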


In this query, we retrieved the project numbers for projects that involve an employee named Smith as a worker in SMITH_WORKER_PROJS. Then we retrieved the project numbers for projects that involve an employee named Smith as manager of the department that controls the project in SMITH_MGR_PROJS. Finally, we applied the UNION operation on SMITH_WORKER_PROJS and SMITH_MGR_PROJS. As a single in-line expression, this query becomes:

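$$
\pi_{\text{Pno}}(\text{WORKS\_ON} \bowtie_{\text{Essn}=\text{Ssn}} \pi_{\text{Ssn}}(\sigma_{\text{Lname}=\text{'Smith'}}(\text{EMPLOYEE}))) \cup \pi_{\text{Pnumber}}(\pi_{\text{Dnumber}}(\sigma_{\text{Lname}=\text{'Smith'}}(\pi_{\text{Lname, Dnumber}}(\text{EMPLOYEE} \bowtie_{\text{Ssn}=\text{Mgr\_ssn}} \text{DEPARTMENT}))) \bowtie_{\text{Dnumber}=\text{Dnum}} \text{PROJECT})
$$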
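
The two branches of this query and their UNION can be sketched with Python sets. All rows below are invented for illustration, and the lowercase variable names simply mirror the intermediate relations named in the text.

```python
# Illustrative rows only; attribute names follow the COMPANY schema used in the text.
EMPLOYEE = [{"Ssn": "1", "Lname": "Smith"}, {"Ssn": "2", "Lname": "Wong"}]
DEPARTMENT = [{"Dnumber": 5, "Mgr_ssn": "2"}, {"Dnumber": 4, "Mgr_ssn": "1"}]
PROJECT = [
    {"Pnumber": 1, "Dnum": 5},
    {"Pnumber": 10, "Dnum": 4},
    {"Pnumber": 20, "Dnum": 5},
]
WORKS_ON = [{"Essn": "1", "Pno": 1}, {"Essn": "2", "Pno": 20}]

# SMITHS: Ssn values of employees whose last name is 'Smith'.
smiths = {e["Ssn"] for e in EMPLOYEE if e["Lname"] == "Smith"}
# Branch 1: projects on which a Smith works.
smith_worker_projs = {w["Pno"] for w in WORKS_ON if w["Essn"] in smiths}
# Branch 2: projects controlled by a department that a Smith manages.
smith_managed_depts = {d["Dnumber"] for d in DEPARTMENT if d["Mgr_ssn"] in smiths}
smith_mgr_projs = {p["Pnumber"] for p in PROJECT if p["Dnum"] in smith_managed_depts}
# UNION of the two branches.
result = smith_worker_projs | smith_mgr_projs
```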

Query 5. List the names of all employees with two or more dependents.

Strictly speaking, this query cannot be done in the basic (original) relational algebra. We have to use the AGGREGATE FUNCTION operation with the COUNT aggregate function. We assume that dependents of the same employee have distinct Dependent_name values.

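$$
\begin{aligned}
\text{T1(Ssn, No\_of\_dependents)} &\leftarrow {}_{\text{Essn}}\,\Im_{\text{COUNT Dependent\_name}}(\text{DEPENDENT}) \\
\text{T2} &\leftarrow \sigma_{\text{No\_of\_dependents} \geq 2}(\text{T1}) \\
\text{RESULT} &\leftarrow \pi_{\text{Lname, Fname}}(\text{T2} * \text{EMPLOYEE})
\end{aligned}
$$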
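
A small sketch of the grouping and counting step, using `collections.Counter` as a stand-in for the AGGREGATE FUNCTION operation. The rows are assumptions made up for the example, and the filter implements "two or more dependents" as stated in the query.

```python
from collections import Counter

# Illustrative rows; distinct Dependent_name values per employee are assumed, as in the text.
DEPENDENT = [
    {"Essn": "333445555", "Dependent_name": "Alice"},
    {"Essn": "333445555", "Dependent_name": "Theodore"},
    {"Essn": "333445555", "Dependent_name": "Joy"},
    {"Essn": "987654321", "Dependent_name": "Abner"},
    {"Essn": "123456789", "Dependent_name": "Michael"},
    {"Essn": "123456789", "Dependent_name": "Alice"},
]
EMPLOYEE = [
    {"Ssn": "333445555", "Lname": "Wong", "Fname": "Franklin"},
    {"Ssn": "987654321", "Lname": "Wallace", "Fname": "Jennifer"},
    {"Ssn": "123456789", "Lname": "Smith", "Fname": "John"},
]

# T1: group DEPENDENT by Essn and count the tuples in each group.
t1 = Counter(d["Essn"] for d in DEPENDENT)
# T2: keep the groups with two or more dependents.
t2 = {ssn for ssn, count in t1.items() if count >= 2}
# RESULT: join T2 with EMPLOYEE and project on (Lname, Fname).
result = {(e["Lname"], e["Fname"]) for e in EMPLOYEE if e["Ssn"] in t2}
```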

Query 6. Retrieve the names of employees who have no dependents.

This is an example of the type of query that uses the MINUS (SET DIFFERENCE) operation.

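$$
\begin{aligned}
\text{ALL\_EMPS} &\leftarrow \pi_{\text{Ssn}}(\text{EMPLOYEE}) \\
\text{EMPS\_WITH\_DEPS(Ssn)} &\leftarrow \pi_{\text{Essn}}(\text{DEPENDENT}) \\
\text{EMPS\_WITHOUT\_DEPS} &\leftarrow (\text{ALL\_EMPS} - \text{EMPS\_WITH\_DEPS}) \\
\text{RESULT} &\leftarrow \pi_{\text{Lname, Fname}}(\text{EMPS\_WITHOUT\_DEPS} * \text{EMPLOYEE})
\end{aligned}
$$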

We first retrieve a relation with all employee Ssns in ALL_EMPS. Then we create a table with the Ssns of employees who have at least one dependent in EMPS_WITH_DEPS. Then we apply the SET DIFFERENCE operation to retrieve the Ssns of employees with no dependents in EMPS_WITHOUT_DEPS, and finally join this with EMPLOYEE to retrieve the desired attributes. As a single in-line expression, this query becomes:

$$
\pi_{\text{Lname, Fname}}((\pi_{\text{Ssn}}(\text{EMPLOYEE}) - \rho_{\text{Ssn}}(\pi_{\text{Essn}}(\text{DEPENDENT}))) * \text{EMPLOYEE})
$$
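
The same set difference can be written with Python's `-` operator on sets. This is only a sketch over invented rows; the variable names mirror the intermediate relations in the text.

```python
# Illustrative rows only.
EMPLOYEE = [
    {"Ssn": "123456789", "Lname": "Smith", "Fname": "John"},
    {"Ssn": "999887777", "Lname": "Zelaya", "Fname": "Alicia"},
    {"Ssn": "333445555", "Lname": "Wong", "Fname": "Franklin"},
]
DEPENDENT = [
    {"Essn": "123456789", "Dependent_name": "Alice"},
    {"Essn": "333445555", "Dependent_name": "Joy"},
]

all_emps = {e["Ssn"] for e in EMPLOYEE}            # projection on Ssn
emps_with_deps = {d["Essn"] for d in DEPENDENT}    # projection on Essn, read as Ssn
emps_without_deps = all_emps - emps_with_deps      # SET DIFFERENCE
# Join back with EMPLOYEE and project on (Lname, Fname).
result = {(e["Lname"], e["Fname"]) for e in EMPLOYEE if e["Ssn"] in emps_without_deps}
```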

Query 7. List the names of managers who have at least one dependent.

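$$
\begin{aligned}
\text{MGRS(Ssn)} &\leftarrow \pi_{\text{Mgr\_ssn}}(\text{DEPARTMENT}) \\
\text{EMPS\_WITH\_DEPS(Ssn)} &\leftarrow \pi_{\text{Essn}}(\text{DEPENDENT}) \\
\text{MGRS\_WITH\_DEPS} &\leftarrow (\text{MGRS} \cap \text{EMPS\_WITH\_DEPS}) \\
\text{RESULT} &\leftarrow \pi_{\text{Lname, Fname}}(\text{MGRS\_WITH\_DEPS} * \text{EMPLOYEE})
\end{aligned}
$$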

In this query, we retrieve the Ssns of managers in MGRS, and the Ssns of employees with at least one dependent in EMPS_WITH_DEPS, then we apply the SET INTERSECTION operation to get the Ssns of managers who have at least one dependent.
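
For comparison, the intersection step can be sketched with Python's `&` operator on sets. The rows are again made up for illustration.

```python
# Illustrative rows only.
EMPLOYEE = [
    {"Ssn": "333445555", "Lname": "Wong", "Fname": "Franklin"},
    {"Ssn": "987654321", "Lname": "Wallace", "Fname": "Jennifer"},
    {"Ssn": "123456789", "Lname": "Smith", "Fname": "John"},
]
DEPARTMENT = [
    {"Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dnumber": 4, "Mgr_ssn": "987654321"},
]
DEPENDENT = [
    {"Essn": "333445555", "Dependent_name": "Joy"},
    {"Essn": "123456789", "Dependent_name": "Alice"},
]

mgrs = {d["Mgr_ssn"] for d in DEPARTMENT}          # Ssns of managers
emps_with_deps = {d["Essn"] for d in DEPENDENT}    # Ssns with at least one dependent
mgrs_with_deps = mgrs & emps_with_deps             # SET INTERSECTION
result = {(e["Lname"], e["Fname"]) for e in EMPLOYEE if e["Ssn"] in mgrs_with_deps}
```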


The Tuple Relational Calculus


- Declarative expression
  - Specifies a retrieval request in a nonprocedural language
- Any retrieval that can be specified in basic relational algebra
  - Can also be specified in relational calculus


Tuple Variables and Range Relations

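- Tuple variables
  - Each ranges over a particular database relation
- A simple query has the form $\{t \mid \text{COND}(t)\}$
  - The result is the set of all tuples t that satisfy COND(t)
- Specify:
  - Range relation R of t
  - Condition to select particular combinations of tuples
  - Set of attributes to be retrieved (requested attributes)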
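
A tuple calculus query reads much like a set comprehension. The sketch below evaluates {t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary > 50000} over a made-up EMPLOYEE relation; the rows and the Salary threshold are assumptions chosen for illustration.

```python
# Illustrative EMPLOYEE relation.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000},
    {"Fname": "James", "Lname": "Borg", "Salary": 55000},
]

# Range relation: "for t in EMPLOYEE" plays the role of EMPLOYEE(t).
# Selection condition: t.Salary > 50000.
# Requested attributes: t.Fname, t.Lname.
result = {(t["Fname"], t["Lname"]) for t in EMPLOYEE if t["Salary"] > 50000}
```

The comprehension states what is wanted rather than the order of operations, which is the nonprocedural flavor of the calculus.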


Expressions and Formulas in Tuple Relational Calculus

- General expression of tuple relational calculus is of the form:
  $\{t_1.A_j, t_2.A_k, \ldots, t_n.A_m \mid \text{COND}(t_1, t_2, \ldots, t_n, t_{n+1}, t_{n+2}, \ldots, t_{n+m})\}$
- Truth value of an atom
  - Evaluates to either TRUE or FALSE for a specific combination of tuples
- Formula (Boolean condition)
  - Made up of one or more atoms connected via the logical operators AND, OR, and NOT


Existential and Universal Quantifiers

- Universal quantifier (∀)
- Existential quantifier (∃)
- Define a tuple variable in a formula as free or bound
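
Python's `any` and `all` behave like the existential and universal quantifiers over a finite relation. The sketch below evaluates {t.Lname | EMPLOYEE(t) AND (∃d)(DEPARTMENT(d) AND d.Dname = 'Research' AND d.Dnumber = t.Dno)} on invented rows.

```python
# Illustrative relations.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Headquarters", "Dnumber": 1},
]
EMPLOYEE = [
    {"Lname": "Smith", "Dno": 5},
    {"Lname": "Borg", "Dno": 1},
]

# t is free: it appears to the left of the bar and is what the query returns.
# d is bound: it exists only inside the quantified subformula (the any(...) call).
result = {
    t["Lname"]
    for t in EMPLOYEE
    if any(d["Dname"] == "Research" and d["Dnumber"] == t["Dno"] for d in DEPARTMENT)
}
```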


Sample Queries in Tuple Relational Calculus


Notation for Query Graphs


Transforming the Universal and Existential Quantifiers

- Transform one type of quantifier into the other with negation (preceded by NOT)
  - AND and OR replace one another
  - Negated formula becomes unnegated
  - Unnegated formula becomes negated
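
These transformation rules can be checked by brute force on a small finite domain. The sketch below enumerates every pair of predicates P and Q over a domain of size n and tests four of the standard equivalences; the function name `quantifier_equivalences_hold` is hypothetical.

```python
from itertools import product


def quantifier_equivalences_hold(n=3):
    """Check quantifier transformations for all predicates P, Q on a domain of size n."""
    domain = range(n)
    # Each predicate is represented by its truth table over the domain.
    tables = list(product([False, True], repeat=n))
    for p, q in product(tables, repeat=2):
        # (forall x)(P(x))  ==  NOT (exists x)(NOT P(x))
        if all(p[x] for x in domain) != (not any(not p[x] for x in domain)):
            return False
        # (exists x)(P(x))  ==  NOT (forall x)(NOT P(x))
        if any(p[x] for x in domain) != (not all(not p[x] for x in domain)):
            return False
        # (forall x)(P(x) AND Q(x))  ==  NOT (exists x)(NOT P(x) OR NOT Q(x))
        lhs = all(p[x] and q[x] for x in domain)
        rhs = not any((not p[x]) or (not q[x]) for x in domain)
        if lhs != rhs:
            return False
        # (exists x)(P(x) OR Q(x))  ==  NOT (forall x)(NOT P(x) AND NOT Q(x))
        lhs = any(p[x] or q[x] for x in domain)
        rhs = not all((not p[x]) and (not q[x]) for x in domain)
        if lhs != rhs:
            return False
    return True
```

A finite check like this is only a sanity test, not a proof, but it shows each rule in action: the quantifier flips, AND and OR swap, and negation moves inside.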


Using the Universal Quantifier in Queries


Safe Expressions

- Guaranteed to yield a finite number of tuples as its result
  - Otherwise, the expression is called unsafe
- Expression is safe
  - If all values in its result are from the domain of the expression
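
A commonly cited illustration, written here with an assumed EMPLOYEE relation: the first expression below is unsafe because it asks for every possible tuple that is not in EMPLOYEE, and there are infinitely many of those. The second is safe because every value in its result comes from the EMPLOYEE relation named in the expression.

```latex
% Unsafe: the result ranges over all tuples in the universe that are not EMPLOYEE tuples.
\{\, t \mid \lnot\,\text{EMPLOYEE}(t) \,\}

% Safe: every value in the result is drawn from EMPLOYEE.
\{\, t \mid \text{EMPLOYEE}(t) \land t.\text{Salary} > 50000 \,\}
```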


The Domain Relational Calculus

- Differs from tuple calculus in the type of variables used in formulas
  - Variables range over single values from domains of attributes
- Formula is made up of atoms
  - Evaluate to either TRUE or FALSE for a specific set of values
    - Called the truth values of the atoms
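
To contrast with tuple variables, the sketch below uses one variable per attribute value. It evaluates {u, v | (∃q)(∃s)(EMPLOYEE(q, s, u, v) AND q = 'John' AND s = 'Smith')} over a simplified four-attribute EMPLOYEE(Fname, Lname, Bdate, Address) relation; the reduced schema and the rows are assumptions for the example.

```python
# EMPLOYEE(Fname, Lname, Bdate, Address) as positional tuples (simplified, assumed schema).
EMPLOYEE = {
    ("John", "Smith", "1965-01-09", "731 Fondren, Houston, TX"),
    ("Joyce", "English", "1972-07-31", "5631 Rice, Houston, TX"),
}

# q, s, u, v are domain variables: each ranges over single attribute values.
# u and v are requested; q and s are existentially quantified and only constrained.
result = {(u, v) for (q, s, u, v) in EMPLOYEE if q == "John" and s == "Smith"}
```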


The Domain Relational Calculus (cont’d.)

- QBE language
  - Based on domain relational calculus


Summary

- Formal languages for relational model of data:
  - Relational algebra: operations, unary and binary operators
  - Some queries cannot be stated with basic relational algebra operations
    - But are important for practical use
- Relational calculus
  - Based on predicate calculus
