Lecture 3

Relational Model: Relational Algebra and Relational Calculus

• T. Connolly and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 5th edition, Addison-Wesley, 2009. ISBN: 0-321-60110-6, ISBN-13: 978-0-321-60110-0 (International Edition).

• T. Connolly and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 4th edition, Addison-Wesley, 2004. ISBN: 0-321-21025-5.

• R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson, 2007. ISBN: 0-321-41506-X.

Walailak University

ITM-661 Database Systems

Walailak - 2013

Objectives

• Overview of the relational model, e.g., how tables represent data.
• The connection between mathematical relations and relations in the relational model.
• Properties of database relations.
• How to identify candidate, primary, and foreign keys.
• The meaning of entity integrity and referential integrity.
• How to form queries in relational algebra.
• How relational calculus queries are expressed.
• The purpose and advantages of views in relational systems.

An Example (Branch and Staff Relations)

• A relation is a table with columns and rows. This applies only to the logical structure of the database, not the physical structure.
• An attribute is a named column of a relation.
• A domain is the set of allowable values for one or more attributes.
• A tuple is a row of a relation.
• The degree of a relation is the number of attributes it contains.
• The cardinality of a relation is the number of tuples it contains.
• A relational database is a collection of normalized relations.

Terminology for Relational Model and Attribute Domain

[Tables not reproduced: Terminology; Attribute Domain]

Mathematical Relations (I)

Mathematical definition of a relation. Consider two sets D1 = {2, 4} and D2 = {1, 3, 5}.

The Cartesian product D1 × D2 is the set of all ordered pairs whose first element is a member of D1 and whose second element is a member of D2. Equivalently, it is every combination of elements with the first taken from D1 and the second from D2.

D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}

Another example: D3 = {“Mr. X”, “Ms. Y”} and D4 = {“M”, “F”} (i.e., M = Male, F = Female)

D3 × D4 = {(“Mr. X”, “M”), (“Mr. X”, “F”), (“Ms. Y”, “M”), (“Ms. Y”, “F”)}

Mathematical Relations (II)

Any subset of a Cartesian product is a relation, for example R = {(2, 1), (4, 1)}.

We may specify which pairs are in the relation by giving a selection condition. For example, requiring the second element to be 1:
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}

Using the same sets, form another relation S in which the first element is always twice the second:
S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y}

Only one ordered pair in the Cartesian product satisfies this condition:
S = {(2, 1)}
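The set-builder definitions of R and S can be checked mechanically. The following is a minimal Python sketch using the sets D1 and D2 from the slides; the variable name `cartesian` is only illustrative.

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product: every ordered pair (x, y) with x from D1 and y from D2.
cartesian = set(product(D1, D2))

# A relation is any subset of the Cartesian product, here chosen by a condition.
R = {(x, y) for (x, y) in cartesian if y == 1}      # second element is 1
S = {(x, y) for (x, y) in cartesian if x == 2 * y}  # first element is twice the second

print(sorted(cartesian))
print(sorted(R))
print(sorted(S))
```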

Mathematical Relations (III)

Consider three sets D1, D2, and D3 with Cartesian product D1 × D2 × D3. For example:

D1 = {1, 3}, D2 = {2, 4}, D3 = {5, 6}

D1 × D2 × D3 = {(1,2,5), (1,2,6), (1,4,5), (1,4,6), (3,2,5), (3,2,6), (3,4,5), (3,4,6)}

Any subset of these ordered triples is a relation.

T = {(x, y, z) | x ∈ D1, y ∈ D2, z ∈ D3, and y = 2x and z = 3y}

T = {(1, 2, 6)}

Mathematical Relations (IV)

To define a general relation on n domains, let D1, D2, …, Dn be n sets with Cartesian product defined as

D1 × D2 × … × Dn = {(d1, d2, …, dn) | d1 ∈ D1, d2 ∈ D2, …, dn ∈ Dn}

usually written as ×_{i=1}^{n} Di.

In defining relations we specify the sets, or domains, from which we choose values.

Database Relations and Their Properties

Relation schema: a named relation defined by a set of attribute and domain name pairs.

Relational database schema: a set of relation schemas, each with a distinct name.

• The relation name is distinct from all other relation names.
• Each cell of a relation contains exactly one atomic (single) value.
• Each attribute has a distinct name.
• Values of an attribute are all from the same domain.
• Order of attributes has no significance.
• Each tuple is distinct; there are no duplicate tuples.
• Order of tuples has no significance, theoretically.

Relational Keys (I)

Superkey
An attribute or a set of attributes that uniquely identifies a tuple within a relation.

Candidate Key
A superkey (K) such that no proper subset is a superkey within the relation.
• In each tuple of R, the values of K uniquely identify that tuple (uniqueness).
• No proper subset of K has the uniqueness property (irreducibility).
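The uniqueness and irreducibility conditions can be tested against a given relation instance. The sketch below is only illustrative: the sample rows and the helper names `is_superkey` and `candidate_keys` are assumptions, not part of the lecture. Note that an instance can only refute a key, never prove one; here `lName` happens to be unique in the sample, but whether it is truly a candidate key depends on the meaning of the data.

```python
from itertools import combinations

# Hypothetical sample rows for a Staff relation.
staff = [
    {"staffNo": "SL21", "lName": "White", "branchNo": "B005"},
    {"staffNo": "SG37", "lName": "Beech", "branchNo": "B003"},
    {"staffNo": "SG14", "lName": "Ford", "branchNo": "B003"},
]

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Irreducibility: keep only superkeys that contain no smaller key."""
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_key = any(set(k) <= set(combo) for k in keys)
            if is_superkey(rows, combo) and not contains_key:
                keys.append(combo)
    return keys

print(candidate_keys(staff))
```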

Relational Keys (II)

Primary Key
The candidate key selected to identify tuples uniquely within a relation.

Alternate Keys
Candidate keys that are not selected to be the primary key.

Foreign Key
An attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.

Relational Integrity (I)

Null
• Represents a value for an attribute that is currently unknown or is not applicable for this tuple.
• Deals with incomplete or exceptional data.
• Represents the absence of a value and is not the same as zero or spaces, which are values.

Relational Integrity (II)

Entity Integrity
In a base relation, no attribute of a primary key can be null.

Referential Integrity
If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.

Enterprise Constraints
Additional rules specified by users or database administrators.
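Both integrity rules can be expressed as simple checks over relation instances. This is a minimal sketch, not how a DBMS enforces constraints; the sample rows and the function names are assumptions made for illustration, and Python's `None` stands in for null.

```python
# Hypothetical sample data; None plays the role of null.
branch = [{"branchNo": "B003"}, {"branchNo": "B005"}]
staff = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SA9", "branchNo": None},    # wholly null foreign key: allowed
    {"staffNo": "SX1", "branchNo": "B999"},  # no matching branch: violation
]

def entity_integrity_violations(rows, primary_key):
    """Rows in which some attribute of the primary key is null."""
    return [r for r in rows if any(r[a] is None for a in primary_key)]

def referential_integrity_violations(child, foreign_key, parent, key):
    """Rows whose foreign key is neither wholly null nor matched in the parent."""
    parent_keys = {tuple(p[a] for a in key) for p in parent}
    bad = []
    for row in child:
        value = tuple(row[a] for a in foreign_key)
        if all(v is None for v in value):
            continue  # wholly null is permitted
        if value not in parent_keys:
            bad.append(row)
    return bad

print(entity_integrity_violations(staff, ["staffNo"]))
print(referential_integrity_violations(staff, ["branchNo"], branch, ["branchNo"]))
```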

Relational Algebra and Calculus

• Relational Algebra
  – Unary Relational Operations
  – Relational Algebra Operations From Set Theory
  – Binary Relational Operations
  – Additional Relational Operations
  – Examples of Queries in Relational Algebra
• Relational Calculus
  – Tuple Relational Calculus
  – Domain Relational Calculus

Relational Algebra and Calculus

Relational algebra and relational calculus are formal languages associated with the relational model.

Informally, relational algebra is a (high-level) procedural language and relational calculus is a non-procedural language.

However, formally both are equivalent to one another.

A language that can produce any relation derivable using relational calculus is relationally complete.

Relational Algebra

Relational algebra operations work on one or more relations to define another relation without changing the original relations.

Thus, both operands and results are relations, so output from one operation can become input to another operation.

This allows expressions to be nested, just as in arithmetic. This property is called closure.

Relational Algebra (Overview)

Relational algebra consists of several groups of operations:

• Unary Relational Operations
  – SELECT (symbol: σ (sigma))
  – PROJECT (symbol: π (pi))
  – RENAME (symbol: ρ (rho))
• Relational Algebra Operations From Set Theory
  – UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – )
  – CARTESIAN PRODUCT ( × )
• Binary Relational Operations
  – JOIN (several variations of JOIN exist)
  – DIVISION
• Additional Relational Operations
  – OUTER JOINS, OUTER UNION
  – AGGREGATE FUNCTIONS: these compute summary information, for example SUM, COUNT, AVG, MIN, MAX

Relational Algebra

There are five basic operations in relational algebra that perform most of the data retrieval operations needed:
• Selection
• Projection
• Cartesian Product
• Union
• Set Difference

There are also operations that can be expressed in terms of the five basic operations:
• Join
• Intersection
• Division

Relational Algebra Operations

[Figures not reproduced]

An Example (Home Rental Database)

[Figures not reproduced]

Selection (or Restriction)

σ_{predicate}(R)

The Selection operation works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).

Ex.: List all staff with a salary greater than 10,000.

σ_{salary > 10000}(Staff)
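As a rough illustration, selection can be sketched in Python as a filter over a list of rows. The sample rows and the helper name `select` are assumptions for this example only.

```python
# Hypothetical sample rows for the Staff relation.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "salary": 30000},
    {"staffNo": "S2", "fName": "Bob", "lName": "Kim", "salary": 9000},
    {"staffNo": "S3", "fName": "Cy", "lName": "Roe", "salary": 12000},
]

def select(relation, predicate):
    """Selection: keep the rows for which the predicate is true."""
    return [row for row in relation if predicate(row)]

# Corresponds to selecting the Staff tuples with salary > 10000.
result = select(staff, lambda row: row["salary"] > 10000)
print([row["staffNo"] for row in result])
```

The result has the same attributes as Staff; only the set of rows shrinks.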

Projection

π_{col1, …, coln}(R)

The Projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.

Ex.: Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and salary details.

π_{staffNo, fName, lName, salary}(Staff)
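A minimal Python sketch of projection follows, including the duplicate elimination mentioned above. The sample rows and the helper name `project` are assumptions for illustration.

```python
# Hypothetical sample rows for the Staff relation.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "S2", "fName": "Bob", "lName": "Kim", "salary": 9000, "branchNo": "B003"},
    {"staffNo": "S3", "fName": "Cy", "lName": "Roe", "salary": 12000, "branchNo": "B003"},
]

def project(relation, attrs):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # duplicates are eliminated
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

salaries = project(staff, ["staffNo", "fName", "lName", "salary"])
branches = project(staff, ["branchNo"])  # B003 appears twice in Staff, once here
print(salaries)
print(branches)
```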

Cartesian Product

R × S

– The Cartesian product operation defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.

– Ex.: List the names and comments of all clients who have viewed a property.

(π_{clientNo, fName, lName}(Client)) × (π_{clientNo, propertyNo, comment}(Viewing))

Example - Cartesian Product and Selection

Use the selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.

σ_{Client.clientNo = Viewing.clientNo}((π_{clientNo, fName, lName}(Client)) × (π_{clientNo, propertyNo, comment}(Viewing)))

Note that Cartesian product and Selection can be reduced to a single operation called a join.

Union

R ∪ S

The union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated.

R and S must be union-compatible.

If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples.

Ex.: List all cities where there is either a branch office or a property for rent.

π_{city}(Branch) ∪ π_{city}(PropertyForRent)

Set Difference

R – S

Defines a relation consisting of the tuples that are in relation R, but not in S.

R and S must be union-compatible.

Ex.: List all cities where there is a branch office but no properties for rent.

π_{city}(Branch) – π_{city}(PropertyForRent)
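Union and set difference map directly onto Python set operations once a relation is modelled as a heading plus a set of tuples. The sketch below is illustrative only: the city values and the (heading, tuples) representation are assumptions, and union-compatibility is approximated by requiring identical headings.

```python
# Hypothetical results of projecting Branch and PropertyForRent on city.
# Each relation is modelled as (heading, set of tuples).
branch_city = (("city",), {("London",), ("Aberdeen",), ("Glasgow",), ("Bristol",)})
property_city = (("city",), {("Aberdeen",), ("London",), ("Glasgow",)})

def check_union_compatible(r, s):
    # Simplification: same attribute names in the same order.
    if r[0] != s[0]:
        raise ValueError("operands must be union-compatible")

def union(r, s):
    check_union_compatible(r, s)
    return (r[0], r[1] | s[1])  # duplicates vanish because the body is a set

def difference(r, s):
    check_union_compatible(r, s)
    return (r[0], r[1] - s[1])

either = union(branch_city, property_city)
branch_only = difference(branch_city, property_city)
print(sorted(either[1]))
print(sorted(branch_only[1]))
```

With I = 4 and J = 3 tuples, the union here has 4 tuples, below the maximum of I + J = 7 because the shared cities are stored once.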

Join Operations

Join is a derivative of Cartesian product.

It is equivalent to performing a selection, using the join predicate as the selection formula, over the Cartesian product of the two operand relations.

It is one of the most difficult operations to implement efficiently in a relational DBMS and one of the reasons why RDBMSs have intrinsic performance problems.

Join Operations

There are various forms of join operation:
• Theta-join
• Equi-join (a particular type of theta-join)
• Natural join
• Outer join
• Semi-join

Theta-join (θ-join)

R ⋈_F S

Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S.

The predicate F is of the form R.ai θ S.bi, where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠).

Theta-join (θ-join)

We can rewrite the theta-join in terms of the basic Selection and Cartesian product operations:

R ⋈_F S = σ_F(R × S)

The degree of a theta-join is the sum of the degrees of the operand relations R and S. If the predicate F contains only equality (=), the term equi-join is used.

Example - Equi-join

List the names and comments of all clients who have viewed a property for rent.

(π_{clientNo, fName, lName}(Client)) ⋈_{Client.clientNo = Viewing.clientNo} (π_{clientNo, propertyNo, comment}(Viewing))
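The rewrite of a theta-join as a selection over a Cartesian product can be sketched directly in Python. This is an illustration under assumptions: the sample rows, the `OPS` table, and the `theta_join` helper are hypothetical, and attribute names are qualified with the relation name so that both clientNo columns survive in the result.

```python
import operator
from itertools import product

# Comparison operators allowed for theta.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

# Hypothetical sample rows with qualified attribute names.
client = [
    {"Client.clientNo": "CR76", "Client.fName": "John", "Client.lName": "Kay"},
    {"Client.clientNo": "CR56", "Client.fName": "Aline", "Client.lName": "Stewart"},
    {"Client.clientNo": "CR74", "Client.fName": "Mike", "Client.lName": "Ritchie"},
]
viewing = [
    {"Viewing.clientNo": "CR56", "Viewing.propertyNo": "PA14", "Viewing.comment": "too small"},
    {"Viewing.clientNo": "CR76", "Viewing.propertyNo": "PG4", "Viewing.comment": "too remote"},
    {"Viewing.clientNo": "CR56", "Viewing.propertyNo": "PG4", "Viewing.comment": None},
]

def theta_join(r, s, left_attr, theta, right_attr):
    """Selection with predicate (left_attr theta right_attr) over R x S."""
    compare = OPS[theta]
    return [{**x, **y} for x, y in product(r, s)
            if compare(x[left_attr], y[right_attr])]

# theta is "=", so this particular theta-join is an equi-join.
result = theta_join(client, viewing, "Client.clientNo", "=", "Viewing.clientNo")
for row in result:
    print(row["Client.fName"], row["Client.lName"], row["Viewing.comment"])
```

Each result row has 3 + 3 = 6 attributes, matching the statement that the degree of a theta-join is the sum of the operand degrees.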

Natural Join

R ⋈ S

The natural join is an equi-join of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.

Ex.: List the names and comments of all clients who have viewed a property for rent.

(π_{clientNo, fName, lName}(Client)) ⋈ (π_{clientNo, propertyNo, comment}(Viewing))
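A natural join can be sketched by matching rows on every attribute name the two relations share and merging the rows so that each common attribute appears once. The sample rows and the helper name `natural_join` are assumptions for illustration.

```python
# Hypothetical sample rows; clientNo is the only common attribute.
client = [
    {"clientNo": "CR76", "fName": "John", "lName": "Kay"},
    {"clientNo": "CR56", "fName": "Aline", "lName": "Stewart"},
    {"clientNo": "CR74", "fName": "Mike", "lName": "Ritchie"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14", "comment": "too small"},
    {"clientNo": "CR76", "propertyNo": "PG4", "comment": "too remote"},
    {"clientNo": "CR56", "propertyNo": "PG4", "comment": None},
]

def natural_join(r, s):
    """Equi-join over all common attributes, keeping one copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                result.append({**x, **y})  # merging keeps one clientNo
    return result

result = natural_join(client, viewing)
for row in result:
    print(row["fName"], row["lName"], row["comment"])
```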

Outer Join

Often in joining two relations, a tuple in one relation has no matching tuple in the other; that is, there is no matching value in the join columns. To display rows in the result that do not have matching values in the join column, we use the outer join.

R ⟕ S

The (left) outer join is a join in which tuples from R that do not have matching values in the common columns of S are also included in the result relation. Missing values in the second relation are set to null.

Example - Left Outer Join

Produce a status report on property viewings.

π_{propertyNo, street, city}(PropertyForRent) ⟕ Viewing
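The left outer join can be sketched as a natural join that also keeps unmatched rows of the left operand, padding the attributes of the right operand with nulls (`None` here). The sample rows and the helper name `left_outer_join` are assumptions for illustration.

```python
# Hypothetical sample rows; PL94 has no viewing.
property_for_rent = [
    {"propertyNo": "PA14", "street": "16 Holhead", "city": "Aberdeen"},
    {"propertyNo": "PG4", "street": "6 Lawrence St", "city": "Glasgow"},
    {"propertyNo": "PL94", "street": "6 Argyll St", "city": "London"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14", "comment": "too small"},
    {"clientNo": "CR76", "propertyNo": "PG4", "comment": "too remote"},
]

def left_outer_join(r, s):
    """Natural join that keeps unmatched rows of r, padded with None."""
    common = [a for a in r[0] if a in s[0]]
    s_only = [a for a in s[0] if a not in common]
    result = []
    for x in r:
        matches = [y for y in s if all(x[a] == y[a] for a in common)]
        if matches:
            result.extend({**x, **y} for y in matches)
        else:
            result.append({**x, **{a: None for a in s_only}})
    return result

report = left_outer_join(property_for_rent, viewing)
for row in report:
    print(row["propertyNo"], row["clientNo"], row["comment"])
```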

Semi-join

R ⋉_F S

The semi-join operation defines a relation that contains the tuples of R that participate in the join of R with S.

We can rewrite the semi-join using Projection and Join:

R ⋉_F S = π_A(R ⋈_F S), where A is the set of all attributes of R.

Ex.: List complete details of all staff who work at the branch in Glasgow.

Staff ⋉_{Staff.branchNo = Branch.branchNo ∧ Branch.city = ‘Glasgow’} Branch
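A semi-join keeps a row of R when at least one row of S satisfies the join predicate with it, and returns only R's attributes. The sketch below uses assumed sample rows and a hypothetical helper `semi_join`.

```python
# Hypothetical sample rows.
staff = [
    {"staffNo": "SG37", "lName": "Beech", "branchNo": "B003"},
    {"staffNo": "SL21", "lName": "White", "branchNo": "B005"},
    {"staffNo": "SG14", "lName": "Ford", "branchNo": "B003"},
]
branch = [
    {"branchNo": "B003", "city": "Glasgow"},
    {"branchNo": "B005", "city": "London"},
]

def semi_join(r, s, predicate):
    """Rows of r that join with at least one row of s; only r's attributes."""
    return [x for x in r if any(predicate(x, y) for y in s)]

glasgow_staff = semi_join(
    staff, branch,
    lambda s, b: s["branchNo"] == b["branchNo"] and b["city"] == "Glasgow",
)
print([row["staffNo"] for row in glasgow_staff])
```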

Intersection

R ∩ S

The intersection operation consists of the set of all tuples that are in both R and S.

R and S must be union-compatible.

Expressed using basic operations:

R ∩ S = R – (R – S)

Ex.: List all cities where there is both a branch office and at least one property for rent.

π_{city}(Branch) ∩ π_{city}(PropertyForRent)
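The identity R ∩ S = R – (R – S) can be checked on a small instance. The city values below are assumed sample data.

```python
# Hypothetical city projections, modelled as sets of 1-tuples.
branch_city = {("London",), ("Aberdeen",), ("Glasgow",), ("Bristol",)}
property_city = {("Aberdeen",), ("London",), ("Glasgow",)}

# Intersection computed directly, and via two set differences.
direct = branch_city & property_city
via_difference = branch_city - (branch_city - property_city)

print(sorted(direct))
print(direct == via_difference)
```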

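Division

R ÷ S

Assume relation R is defined over the attribute set A and relation S over the attribute set B, such that B ⊆ A, and let C = A – B. The division operation defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple in S.

Expressed using basic operations:

T1 = π_C(R)
T2 = π_C((S × T1) – R)
T = T1 – T2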

Example - Division

Identify all clients who have viewed all properties with three rooms.

(π_{clientNo, propertyNo}(Viewing)) ÷ (π_{propertyNo}(σ_{rooms = 3}(PropertyForRent)))
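The three-step expression for division (T1, T2, T) can be followed literally in Python. This sketch assumes R is a set of (clientNo, propertyNo) pairs and S a set of propertyNo values; the sample data and the helper name `divide` are hypothetical.

```python
from itertools import product

# Hypothetical data: viewings as (clientNo, propertyNo) pairs,
# and the property numbers of all three-room properties.
viewing = {("CR56", "PA14"), ("CR76", "PG4"), ("CR56", "PG4"),
           ("CR62", "PA14"), ("CR56", "PG36")}
three_room = {"PG4", "PG36"}

def divide(r, s):
    """r: set of (c, b) pairs; s: set of b values. Returns the c values
    that appear in r together with every b in s."""
    t1 = {c for (c, _) in r}                                   # T1: project R on C
    t2 = {c for (c, b) in product(t1, s) if (c, b) not in r}   # T2: clients missing some pairing
    return t1 - t2                                             # T = T1 - T2

print(divide(viewing, three_room))
```

T2 collects the clients for which some (client, property) combination is absent from Viewing, so removing them from T1 leaves only clients who viewed every three-room property.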

Relational Algebra - Aggregate Function

Aggregate Function Operation

• ℱ_{MAX Salary}(EMPLOYEE) retrieves the maximum salary value from the EMPLOYEE relation.
• ℱ_{MIN Salary}(EMPLOYEE) retrieves the minimum salary value from the EMPLOYEE relation.
• ℱ_{SUM Salary}(EMPLOYEE) retrieves the sum of the salaries from the EMPLOYEE relation.
• ℱ_{COUNT SSN, AVERAGE Salary}(EMPLOYEE) computes the count (number) of employees and their average salary.

Note: COUNT just counts the number of rows, without removing duplicates.
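The aggregate functions listed above correspond to familiar built-ins. The sketch below uses assumed EMPLOYEE rows; it also shows the point of the note, since two employees share a salary but COUNT still returns the number of rows.

```python
# Hypothetical EMPLOYEE rows; two employees share the same salary.
employee = [
    {"Ssn": "111", "Salary": 30000},
    {"Ssn": "222", "Salary": 40000},
    {"Ssn": "333", "Salary": 30000},
]

salaries = [e["Salary"] for e in employee]  # a list, so duplicates are kept

max_salary = max(salaries)
min_salary = min(salaries)
sum_salary = sum(salaries)
count_ssn = len([e["Ssn"] for e in employee])  # counts rows, not distinct values
average_salary = sum_salary / len(salaries)

print(max_salary, min_salary, sum_salary, count_ssn, average_salary)
```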

Relational Algebra - Aggregate Function

[Figure not reproduced: aggregate functions applied with grouping by Dno]

Relational Calculus

A relational calculus query specifies what is to be retrieved rather than how to retrieve it. There is no description of how to evaluate a query.

In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.

When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.

When applied to databases, relational calculus has two forms: tuple-oriented and domain-oriented.

Relational Calculus

If a predicate contains a variable, as in ‘x is a member of staff’, there must be a range for x.

When we substitute some values of this range for x, the proposition may be true; for other values, it may be false.

If P is a predicate, then we write the set of all x such that P is true for x as {x | P(x)}.

Predicates can be connected using ∧ (AND), ∨ (OR), and ~ (NOT).

Tuple-oriented Relational Calculus

We are interested in finding tuples for which a predicate is true. This is based on the use of tuple variables.

A tuple variable is a variable that ‘ranges over’ a named relation: i.e., a variable whose only permitted values are tuples of the relation.

To specify the range of a tuple variable S as the Staff relation, write:

Staff(S)

To find the set of all tuples S such that P(S) is true:

{S | P(S)}

Tuple-oriented Relational Calculus (Examples and quantifiers)

Examples of tuple-oriented relational calculus:

To find details of all staff earning more than £10,000:
{S | Staff(S) ∧ S.salary > 10000}

To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}

We can use two quantifiers to tell how many instances the predicate applies to:
• Existential quantifier ∃ (‘there exists’)
• Universal quantifier ∀ (‘for all’)

Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
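Python set comprehensions read much like tuple calculus expressions: the `for S in staff` clause plays the role of Staff(S) and the `if` clause carries the rest of the predicate. The sample tuples and the namedtuple layout below are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical Staff relation as a set of named tuples.
Staff = namedtuple("Staff", "staffNo fName lName salary")
staff = {
    Staff("SL21", "John", "White", 30000),
    Staff("SG37", "Ann", "Beech", 12000),
    Staff("SA9", "Mary", "Howe", 9000),
}

# {S | Staff(S) and S.salary > 10000}
well_paid = {S for S in staff if S.salary > 10000}

# {S.salary | Staff(S) and S.salary > 10000}
well_paid_salaries = {S.salary for S in staff if S.salary > 10000}

print(sorted(S.staffNo for S in well_paid))
print(sorted(well_paid_salaries))
```

The comprehension states which tuples qualify, not how to find them, although Python will of course evaluate it by a simple scan.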

Existential quantifier (Tuple-oriented Relational Calculus)

The existential quantifier is used in formulae that must be true for at least one instance, such as:

{S | Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = ‘London’)}

This means ‘There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple, S, and it is located in London’.

Universal quantifier (Tuple-oriented Relational Calculus)

The universal quantifier is used in statements about every instance, such as:

(∀B)(B.city ≠ ‘Paris’)

This means ‘For all Branch tuples, the address is not in Paris’.

We can also use ~(∃B)(B.city = ‘Paris’), which means ‘There are no branches with an address in Paris’.
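In Python, `any` behaves like the existential quantifier and `all` like the universal quantifier over a finite collection. The sketch below uses assumed sample tuples to mirror the London and Paris formulae.

```python
from collections import namedtuple

# Hypothetical Branch and Staff relations.
Branch = namedtuple("Branch", "branchNo city")
Staff = namedtuple("Staff", "staffNo branchNo")

branch = {Branch("B005", "London"), Branch("B003", "Glasgow")}
staff = {Staff("SL21", "B005"), Staff("SG37", "B003"), Staff("SL41", "B005")}

# Staff tuples S for which there exists a London Branch tuple B
# with B.branchNo = S.branchNo.
london_staff = {
    S for S in staff
    if any(B.branchNo == S.branchNo and B.city == "London" for B in branch)
}

# (for all B)(B.city != 'Paris') versus not (there exists B)(B.city = 'Paris')
no_paris_for_all = all(B.city != "Paris" for B in branch)
no_paris_not_exists = not any(B.city == "Paris" for B in branch)

print(sorted(S.staffNo for S in london_staff))
print(no_paris_for_all, no_paris_not_exists)
```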

Tuple-oriented Relational Calculus

General form: {S1.a1, S2.a2, …, Sn.an | F(S1, S2, …, Sn)}

Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
• R(Si), where Si is a tuple variable and R is a relation
• Si.a1 θ Sj.a2
• Si.a1 θ c
where θ is one of the comparison operators (<, >, and so on) and c is a constant.

We can recursively build up formulae from atoms:
• An atom is a formula.
• If F1 and F2 are formulae, so are their conjunction, F1 ∧ F2; disjunction, F1 ∨ F2; and negation, ~F1.
• If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.

Tuple-oriented Relational Calculus (An example – I)

List the names of all managers who earn more than £25,000.

{S.fName, S.lName | Staff(S) ∧ S.position = ‘Manager’ ∧ S.salary > 25000}

List the staff who manage properties for rent in Glasgow.

{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = ‘Glasgow’)}

Tuple-oriented Relational Calculus (An example – II)

List the names of staff who currently do not manage any properties.

{S.fName, S.lName | Staff(S) ∧ (~(∃P)(PropertyForRent(P) ∧ (S.staffNo = P.staffNo)))}

Or

{S.fName, S.lName | Staff(S) ∧ ((∀P)(~PropertyForRent(P) ∨ ~(S.staffNo = P.staffNo)))}
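The two formulations above are related by the rule that ~(∃P)(F) is equivalent to (∀P)(~F). A small Python check on assumed sample data is sketched below. Because `P` already ranges over the PropertyForRent tuples in the code, the ~PropertyForRent(P) disjunct of the second formula has no counterpart here.

```python
from collections import namedtuple

# Hypothetical sample data.
Staff = namedtuple("Staff", "staffNo fName lName")
Property = namedtuple("Property", "propertyNo staffNo")

staff = {
    Staff("SL21", "John", "White"),
    Staff("SG37", "Ann", "Beech"),
    Staff("SA9", "Mary", "Howe"),
}
property_for_rent = {Property("PG4", "SG37"), Property("PA14", "SA9")}

# First form: there does not exist a property managed by S.
not_exists = {
    (S.fName, S.lName) for S in staff
    if not any(S.staffNo == P.staffNo for P in property_for_rent)
}

# Second form: for all properties P, P is not managed by S.
for_all = {
    (S.fName, S.lName) for S in staff
    if all(not (S.staffNo == P.staffNo) for P in property_for_rent)
}

print(not_exists)
print(not_exists == for_all)
```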

Tuple-oriented Relational Calculus (An example – III)

List the names of clients who have viewed a property for rent in Glasgow.

{C.fName, C.lName | Client(C) ∧ ((∃V)(∃P)(Viewing(V) ∧ PropertyForRent(P) ∧ (C.clientNo = V.clientNo) ∧ (V.propertyNo = P.propertyNo) ∧ P.city = ‘Glasgow’))}

Tuple-oriented Relational Calculus (An example – IV)

Expressions can generate an infinite set. For example, {S | ~Staff(S)} is the set of all tuples that are not in the Staff relation. Such an expression is said to be unsafe.

To avoid this, add the restriction that all values in the result must be values in the domain of the expression.

Domain-oriented Relational Calculus

Uses variables that take values from domains instead of tuples of relations.

If F(d1, d2, …, dn) stands for a formula composed of atoms and d1, d2, …, dn represent domain variables, then

{d1, d2, …, dn | F(d1, d2, …, dn)}

is a general domain relational calculus expression.

Domain-oriented Relational Calculus (An example – I)

Find the names of all managers who earn more than £25,000.

{fN, lN | (∃sN, posn, sex, DOB, sal, bN)(Staff(sN, fN, lN, posn, sex, DOB, sal, bN) ∧ posn = ‘Manager’ ∧ sal > 25000)}

List the staff who manage properties for rent in Glasgow.

{sN, fN, lN, posn, sex, DOB, sal, bN | (∃sN1, cty)(Staff(sN, fN, lN, posn, sex, DOB, sal, bN) ∧ PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1) ∧ (sN = sN1) ∧ cty = ‘Glasgow’)}
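Domain variables correspond naturally to tuple unpacking in Python: each position of a Staff tuple is bound to its own variable, just as in Staff(sN, fN, lN, posn, sex, DOB, sal, bN). The sample tuples below are assumptions for illustration.

```python
# Hypothetical Staff relation as plain tuples:
# (staffNo, fName, lName, position, sex, DOB, salary, branchNo)
staff = {
    ("SL21", "John", "White", "Manager", "M", "1945-10-01", 30000, "B005"),
    ("SG37", "Ann", "Beech", "Assistant", "F", "1960-11-10", 12000, "B003"),
    ("SG5", "Susan", "Brand", "Manager", "F", "1940-06-03", 24000, "B003"),
}

# Names of all managers earning more than 25000: each domain variable
# is bound by unpacking one position of a Staff tuple.
managers = {
    (fN, lN)
    for (sN, fN, lN, posn, sex, DOB, sal, bN) in staff
    if posn == "Manager" and sal > 25000
}

print(managers)
```

Only fN and lN appear in the result; the other variables are bound inside the comprehension, mirroring the existentially quantified variables in the formula.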

Domain-oriented Relational Calculus (An example – II)

List the names of staff who currently do not manage any properties for rent.

{fN, lN | (∃sN)(Staff(sN, fN, lN, posn, sex, DOB, sal, bN) ∧ (~(∃sN1)(PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1) ∧ (sN = sN1))))}

List the names of clients who have viewed a property for rent in Glasgow.

{fN, lN | (∃cN, cN1, pN, pN1, cty)(Client(cN, fN, lN, tel, pT, mR) ∧ Viewing(cN1, pN1, dt, cmt) ∧ PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN, bN) ∧ (cN = cN1) ∧ (pN = pN1) ∧ cty = ‘Glasgow’)}

Domain-oriented Relational Calculus

When domain relational calculus is restricted to safe expressions, it is equivalent to tuple relational calculus restricted to safe expressions, which is equivalent to relational algebra.

This means every relational algebra expression has an equivalent relational calculus expression, and vice versa.

Other Languages

Transform-oriented languages are non-procedural languages that use relations to transform input data into outputs (e.g. SQL).

Graphical languages provide the user with a picture or illustration of the structure of the relation. The user fills in an example of what is wanted and the system returns the required data in that format (e.g. QBE).

Fourth-generation languages (4GLs) can create a complete customized application using a limited set of commands in a user-friendly, often menu-driven environment.

Some systems accept a form of natural language, sometimes called a fifth-generation language (5GL). This development is still in its infancy.

Exercise

The following tables form part of a database held in a relational DBMS:

Hotel: (hotelNo, hotelName, hotelAddress)
Room: (roomNo, hotelNo, Type, Price)
Booking: (hotelNo, guestNo, dateFrom, dateTo, roomNo)
Guest: (guestNo, guestName, guestAddress)

Where the primary keys are underlined.

Generate the relational algebra, tuple-oriented, and domain-oriented calculus expressions for the following queries:

1. List all hotels.
2. List all single rooms with a price below 20 per night.
3. List the names and addresses of all guests.
4. List the price and type of all rooms at the Grosvenor Hotel.
5. List all guests currently staying at the Grosvenor Hotel.
6. List the details of all rooms at the Grosvenor Hotel, including the name of the guest staying in the room, if the room is occupied.
7. List the guest details (guestNo, guestName, and guestAddress) of all guests staying at the Grosvenor Hotel.