2 December 2005
Introduction to Databases Relational Model and Relational Algebra
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
Beat Signer - Department of Computer Science - [email protected]
2 February 28, 2014
Relational Model
Theory for data management developed
by Edgar F. Codd while working at IBM Edgar F. Codd, A Relational Model of Data for
Large Shared Data Banks, Communications of the ACM 13(6), June 1970
data independence
- between logical and physical level
set-based query language
- relational query language
normalisation
- avoid redundancy
IBM first did not implement the relational model in order
to "protect" their IMS/DB revenues
Edgar F. Codd
Beat Signer - Department of Computer Science - [email protected]
3 February 28, 2014
Relational Model …
IBM's System R (1974) was a DBMS prototype
implementing Codd's relational model first implementation of the Structured English Query
Language (SEQUEL)
- later renamed to Structured Query Language (SQL)
The System R prototype finally led to the development of
different commercial DBMSs including IBM's DB2,
Oracle or Microsoft SQL Server
Beat Signer - Department of Computer Science - [email protected]
4 February 28, 2014
Relational Database
A relational database consists of a number of tables each table row defines a relationship between a set of values
There is an analogy between the concept of a table
(collection of relationships) and the mathematical
concept of a relation in the following we therefore talk about relations instead of tables
a relational database consists of a collection of relations
customerID name street postcode city
1 Max Frisch Bahnhofstrasse 7 8001 Zurich
2 Eddy Merckx Pleinlaan 25 1050 Brussels
... ... ... ... ...
Beat Signer - Department of Computer Science - [email protected]
5 February 28, 2014
Relational Database ...
Information is normally partitioned into different relations
since the storage in a single relation would lead to a replication of information (redundancy)
a large number of necessary null values
While tables are used at the logical level, different
storage structures can be used at the physical level data independence
Beat Signer - Department of Computer Science - [email protected]
6 February 28, 2014
Relation
The column headers of a table are called attributes and
for each attribute ai there is a set of permitted values
called the domain Di of ai
Given the domains D1, D2,..., Dn, a relation r is defined as
a subset of the cartesian product D1D2...Dn
name street ... city
Max Frisch Bahnhofstrasse 7 ... Zurich
... ... ... ...
D1 D2 Dn
t1
tm
a1 a2 an attributes
tuples
relation r
degree
cardinality
Beat Signer - Department of Computer Science - [email protected]
7 February 28, 2014
Relation ...
A relation r is a set of n-ary tuples ti = (a1, a2,..., an) where
each aiDi the number of attributes is called a relation's degree
the number of tuples is called a relation's cardinality
Since a relation is a set of tuples, the order of the tuples
is irrelevant each tuple is distinctive (no duplicate tuples)
The order of attributes is irrelevant
The domain of each attribute has to be atomic all members of the domain have to be indivisible units
a relation with only atomic values is normalised (first normal form)
Beat Signer - Department of Computer Science - [email protected]
8 February 28, 2014
Relation ...
Multiple attributes can have the same domain Di
we will see later how the domain types can be defined (in SQL)
A tuple variable is a variable that stands for a tuple its domain is defined by the set of all tuples
The special null value is part of any domain Di used to represent an unknown or non-existing value
null values are not easy to handle (not the same as empty string)
- avoid null values whenever possible
Beat Signer - Department of Computer Science - [email protected]
9 February 28, 2014
Example of a Relation
Let us assume that we have the following attributes with
their domains name = {Merckx, Frisch, Botta, ...}
street = {Bahnhofstrasse, Pleinlaan, Via Nassa, ...}
city = {Zurich, Brussels, Lugano, Paris, ...}
Then r = {(Merckx, Pleinlaan, Brussels),
(Frisch, Bahnhofstrasse, Zurich), (Botta, Via Nassa, Lugano), (Botta, Bahnhofstrasse, Lugano)} forms a relation over namestreetcity
Beat Signer - Department of Computer Science - [email protected]
10 February 28, 2014
Relational Database Example
customerID name street postcode city
1 Max Frisch Bahnhofstrasse 7 8001 Zurich
2 Eddy Merckx Pleinlaan 25 1050 Brussels
5 Claude Debussy 12 Rue Louise 75008 Paris
53 Albert Einstein Bergstrasse 18 8037 Zurich
8 Max Frisch ETH Zentrum 8092 Zurich
cdID name duration price year
1 Falling into Place 2007 17.90 2007
2 Carcassonne 3156 15.50 1993
customer
cd
Beat Signer - Department of Computer Science - [email protected]
11 February 28, 2014
Relational Database Example ...
orderID customerID cdID date amount status
1 53 2 13.02.2010 2 open
2 2 1 15.02.2010 1 delivered
order
supplierID name city
5 Max Frisch Zurich
2 Mario Botta Lugano
supplier
Customer (customerID, name, street, postcode, city) CD (cdID, name, duration, price, year) Order (orderId, customerID, cdID, date, amount, status) Supplier (supplierID, name, city)
relational database schema
Beat Signer - Department of Computer Science - [email protected]
12 February 28, 2014
Database Schema
The logical design of a database is defined by the
database schema
A database instance is a snapshot of the data stored in a
database at a given time
A relation schema R = (A1, A2,..., An) is defined by a list of
attributes by convention, the name of a relation schema starts with an
uppercase letter (in contrast to relations)
- e.g. Customer = (customerID, name, street, postcode, city)
a relation can be defined based on the relation schema
- e.g. customer = (Customer)
a relation instance contains the relation's actual values (tuples)
Beat Signer - Department of Computer Science - [email protected]
13 February 28, 2014
Keys
For the keys we use the same terminology as
introduced earlier for the ER model
K R is a superkey of R if the values of K uniquely
identify any tuple of possible relation instances r(R) t1 and t2 r and t1 t2 t1[K] t2[K]
e.g. {cdID} and {cdID, name} are both superkeys of CD
K is a candidate key if K is minimal e.g. {cdID} is a candidate key of CD
The DB designer has to choose one of the candidate
keys for each relation schema as primary key if possible, the value of a primary key should not change or only in
very rare cases
Beat Signer - Department of Computer Science - [email protected]
14 February 28, 2014
Foreign Keys
A relation schema may have one or multiple attributes
that correspond to the primary key of another relation
schema and are called foreign keys e.g. customerID is a foreign key of the order relation schema
note that a foreign key of the referencing relation can only contain values that occur in the primary key of the referenced relation or it must be null referential integrity
customerID
name street postcode city
cdID
name duration price year
orderID
customerID cdID date amount status
customer cd order
schema diagram
Beat Signer - Department of Computer Science - [email protected]
15 February 28, 2014
A query language is a language that is used to access
(read) information stored in a database as well as to
create, update and delete information (CRUD)
There are different types of query languages procedural
- e.g. relational algebra
declarative
- e.g. Structured Query Language (SQL)
The relational algebra as well as the tuple and domain
relational calculus form the basis for "higher level" query
languages (e.g. SQL)
Query Language
Beat Signer - Department of Computer Science - [email protected]
16 February 28, 2014
Relational Algebra
The relational algebra consists of six fundamental
operations
unary operations selection: s
projection: p
rename: r
binary operations union:
set difference: -
cartesian product:
An operator takes one (unary) or two (binary) relations
as input and returns a new single relation as output
Beat Signer - Department of Computer Science - [email protected]
17 February 28, 2014
Relational Algebra ...
Based on the six fundamental operations, we can define
additional relational operations (add no additional power) set intersection:
natural join: ⋈
theta join: q
equijoin: q
semijoin: ⋉
division:
assignment:
Extended operations with additional expressiveness generalised projection
aggregate function
outer join
Beat Signer - Department of Computer Science - [email protected]
18 February 28, 2014
Selection (s)
Goal: Select specific tuples (rows) from a relation (table)
sp(r) = {t | t r and p(t)} p is a selection predicate
- consists of terms pi connected by an and (), or () or not ()
a term pi has the form attributem operator attributen or constant
- the available operators are: =, , >, <, ,
Examples "Find all tuples in the customer relation that are in the city of
Zurich and have a postcode greater than 8010."
scity = "Zurich" postcode > 8010 (customer)
customerID name street postcode city
53 Albert Einstein Bergstrasse 18 8037 Zurich
8 Max Frisch ETH Zentrum 8092 Zurich
Beat Signer - Department of Computer Science - [email protected]
19 February 28, 2014
Selection (s) ...
Examples ... A selection predicate may contain a comparison between two
attributes
"Find all tuples in the cd relation with a value for the duration attribute that is equal to the year of release."
sduration = year (cd)
Attention: The selection operation in relational algebra
has a different meaning than the SELECT statement used
in SQL SELECT in SQL corresponds to a projection in relational algebra
cdID name duration price year
1 Falling into Place 2007 17.90 2007
Beat Signer - Department of Computer Science - [email protected]
20 February 28, 2014
Projection (p)
Goal: Select specific attributes (columns) from a relation
pA1, A2,..., Am(r)
returns a relation instance that only contains the columns for which an attribute Ai has been listed
duplicate tuples are removed from the resulting relation (set)
Example "Return the name and city of all tuples in the customer relation."
pname,city (customer)
name city
Max Frisch Zurich
Eddy Merckx Brussels
Claude Debussy Paris
Albert Einstein Zurich
Beat Signer - Department of Computer Science - [email protected]
21 February 28, 2014
Composition of Relational Operations
Since the result of a relational operation is a new single
relation instance, multiple operations can be combined
Example "Find the names of all customers who live in Zurich."
pname (scity = "Zurich" (customer))
name
Max Frisch
Albert Einstein
Beat Signer - Department of Computer Science - [email protected]
22 February 28, 2014
Union ()
Goal: Unify tuples from two relations
r s = {t | t r or t s} r and s must have the same degree (same number of attributes)
the corresponding attribute domains must have a compatible type
Example "Find the names of persons who are either customers or suppliers."
pname (customer) pname (supplier)
name
Max Frisch
Eddy Merckx
Claude Debussy
Albert Einstein
Mario Botta
Beat Signer - Department of Computer Science - [email protected]
23 February 28, 2014
Set Difference (-)
Goal: Find tuples that are in one relation but not in
another relation
r - s = {t | t r and t s} r and s must have the same degree
the corresponding attribute domains must have a compatible type
finds tuples that are in a relation r but not in another relation s
Example "Find the names of suppliers who are no customers."
pname (supplier) - pname (customer)
name Mario Botta
Beat Signer - Department of Computer Science - [email protected]
24 February 28, 2014
Cartesian Product ()
Goal: Combine information from any two relations
r s = {tu | t r and u s} the attribute names of r(R) and s(S) have to be distinct
- attributes with the same name have to be renamed via the rename operator r
Example pname (customer) pcity (customer)
name city
Max Frisch Zurich
Max Frisch Brussels
Max Frisch Paris
Eddy Merckx Zurich
Eddy Merckx Brussels
... ...
Beat Signer - Department of Computer Science - [email protected]
25 February 28, 2014
Cartesian Product () ...
Example ... "List the names of all customers with at least one order."
pname( scustomer.customerID = order.customerID (customer order) )
Note that we will later see another operator (natural join)
for the combination of two tables based on common
attributes
name
Eddy Merckx
Albert Einstein
Beat Signer - Department of Computer Science - [email protected]
26 February 28, 2014
Rename (r)
Goal: Rename a relation or an attribute
rx(E) renames the result of expression E to x
rx(A1,A2,..., An) (E)
renames the result of expression E to x and renames the attributes to A1,A2,...,An
Example rperson(name,location) (pname,city (customer))
name location
Max Frisch Zurich
... ...
person
Beat Signer - Department of Computer Science - [email protected]
27 February 28, 2014
Rename (r) ...
Example "Find the price of the most expensive CD in the cd relation."
Pprice (cd) - Pcd.price(scd.price < d.price (cd rd (cd)))
We will later see another operator (aggregate function)
that can be used to simplify this type of query for a max
value
price
17.90
Beat Signer - Department of Computer Science - [email protected]
28 February 28, 2014
Formal Definition
A basic relational algebra expression is either a relation
from the database or a constant relation
The set of all relational algebra expressions is defined by E1 E2
E1 - E2
E1 E2
sp(E1)
pA1,A2,..., Am(E1)
rx(A1,A2,..., An) (E1)
Beat Signer - Department of Computer Science - [email protected]
29 February 28, 2014
Set Intersection
r s = {t | t r and t s} r and s must have the same degree
the corresponding attribute domains must have a compatible type
Note that the set intersection can be implemented via a
pair of set difference operations r s = r - (r - s)
Example "Find the names of people who are customers and suppliers."
pname (customer) pname (supplier)
name
Max Frisch
Beat Signer - Department of Computer Science - [email protected]
30 February 28, 2014
Natural Join (⋈)
r ⋈ s = pR S (sr.A1= s.A1 r.A2= s.A2 ... r.An= s.An(r s))
where R S = {A1, A2,..., An}
Keep all tuples of the cartesian product r s that have the
same value for the shared attributes of r(R) and s(S) the natural join is an associative operation: (r ⋈ s) ⋈ t = r ⋈ (s ⋈ t)
Example "List the name and street of customers whose order is still open."
pname, street(sstatus="open"(order ⋈ customer))
note that we could do the selection first (more efficient)
- we will review this later when discussing query optimisation
name street
Albert Einstein Bergstrasse 18
Beat Signer - Department of Computer Science - [email protected]
31 February 28, 2014
Theta Join (q) and Equijoin
r ⋈p s = sp (r s) where the predicate p is of the form r.Ai q s.Aj and q is a
comparison operator (=, , >, <, , )
Example customer ⋈postcode duration (rcd(cdID,cdName,duration,price,year) (cd))
An equijoin (r ⋈p s) is a special form of a theta join where
only the equality operator (=) may be used note that a natural join is a equijoin over all common attributes of
a relation r and s
customerID name ... cdID cdName ...
2 Eddy Merckx ... 1 Falling into Place ...
2 Eddy Merckx ... 2 Carcassonne ...
Beat Signer - Department of Computer Science - [email protected]
32 February 28, 2014
Semijoin (⋉)
r ⋉ s = pA1, A2,..., An (r ⋈ s)
where A1, A2,..., An are all the attributes of r
Example "List the customers whose order is still open."
customer ⋉ sstatus="open" (order)
customerID Name Street Postcode city
53 Albert Einstein Bergstrasse 18 8037 Zurich
Beat Signer - Department of Computer Science - [email protected]
33 February 28, 2014
Division ()
r s = {t | t pR-S(r) and " u s tu r} for the relations r(R) and s(S) and S R
suited for queries that include the phrase "for all"
Example r s
A B C D
a 3 a 1
a 1 a 3
a 3 b 3
b 1 a 1
c 4 b 3
b 2 b 2
c 4 a 1
r
C D
b 3
a 1
A B
a 3
c 4
s
r s
Beat Signer - Department of Computer Science - [email protected]
34 February 28, 2014
Assigment
variable E
Works like an assigment in programming languages
Assigments must always be made to temporaray relation
variables no database modification
Example temp1 pname, street(sstatus="open"(order ⋈ customer))
temp2 pname,street (scity = "Brussels" (customer))
result temp1 temp2
name street
Eddy Merckx Pleinlaan 25
Albert Einstein Bergstrasse 18
Beat Signer - Department of Computer Science - [email protected]
35 February 28, 2014
Generalised Projection (p)
pF1, F2,..., Fm(E)
generalisation of the project operation that supports the projection to arithmetic expressions F1, F2,..., Fm
Example "Show a list of the names of all CDs together with a reduced
price (80% of the original price)."
pname, price 0.8 as reducedPrice (cd)
name reducedPrice
Falling into Place 14.32
Carcassonne 12.72
Beat Signer - Department of Computer Science - [email protected]
36 February 28, 2014
Aggregate Function
G1, G2,..., GmG F1(A1), F2(A2),..., Fn(An) (E)
takes a collection of values as input and returns a single value based on the following operations
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of value
- avg: average value
G1, G2,..., Gm lists the attributes on which to group
- can be empty
Fi are the aggregate functions
by default, duplicates are not elimiated before aggregating
- can use the keyword distict to eliminate duplicates (e.g. sum-distinct)
Beat Signer - Department of Computer Science - [email protected]
37 February 28, 2014
Aggregate Function ...
Examples "List the average amount of items in an order."
G avg(amount)(order)
"List the number of customers for each city."
cityG count(customerID) as customerNo (customer)
avg(amount)
1.5
city customerNo
Zurich 3
Brussels 1
Paris 1
Beat Signer - Department of Computer Science - [email protected]
38 February 28, 2014
Outer Joins
The left outer join (⟕), right outer join (⟖) and full outer
join (⟗) are extensions of the natural join operation
Computes the natural join and then adds the tuples from
one relation that do not match the other relation filled up with null values
Example customer ⟕ order
customerID name ... orderID cdID ...
53 Albert Einstein ... 1 2 ...
2 Eddy Merckx ... 2 1 ...
5 Claude Debussy ... null null ...
... ... ... ... ... ...
Beat Signer - Department of Computer Science - [email protected]
39 February 28, 2014
Outer Joins ...
left outer join (⟕) keeps all tuples of the left relation
non-matching parts filled up with null values
right outer join (⟖) keeps all tuples of the right relation
non-matching parts filled up with null values
full outer join (⟗) keeps all tuples of both relations
non-matching parts filled up with null values
Beat Signer - Department of Computer Science - [email protected]
40 February 28, 2014
Null Values
A null value means that the value is unknown or
nonexistent
Primary key attributes can never be null (entity integrity)
The result of an arithmetic operation that involves a null
value is null
For grouping and duplicate elimination null values are
treated like other values (two null values are the same)
Null values are ignored in aggregate functions
Beat Signer - Department of Computer Science - [email protected]
41 February 28, 2014
Database Modifications
A database may be modified by using one of the
following operations insert
update
delete
These operations are defined via the assigment operator
Beat Signer - Department of Computer Science - [email protected]
42 February 28, 2014
Insert
r r E
To insert new data into a relation we can explicitly specify the tuples to be inserted
write a query and insert the result tuple set
Example cd cd {(3, "Chromatic", 3012, 16.50, 1994)}
cdID name duration price year
1 Falling into Place 2007 17.90 2007
2 Carcassonne 3156 15.50 1993
3 Chromatic 3012 16.50 1994
Beat Signer - Department of Computer Science - [email protected]
43 February 28, 2014
Update
r pF1, F2,..., Fn(r)
Update specific values in a tuple r
Fi is either the old value of r or a new value if the
attribute has to be updated
Example "Increase the price of all CDs by 10%."
cd pcdID, name, duration, price 1.1, year (cd)
cdID name duration price year
1 Falling into Place 2007 19.69 2007
2 Carcassonne 3156 17.05 1993
3 Chromatic 3012 18.15 1994
Beat Signer - Department of Computer Science - [email protected]
44 February 28, 2014
Delete
r r - E
A delete is expressed similar to a query except that the
result is removed from the database
Cannot remove single attributes
Example "Remove the CD with the name 'Carcassonne' from the
database."
cd cd - sname = "Carcassonne" (cd)
cdID name duration price year
1 Falling into Place 2007 19.69 2007
3 Chromatic 3012 18.15 1994
Beat Signer - Department of Computer Science - [email protected]
45 February 28, 2014
Homework
Study the following chapters of the
Database System Concepts book chapter 2
- Relational Model
chapter 6
- section 6.1
- Relational Algebra
Beat Signer - Department of Computer Science - [email protected]
46 February 28, 2014
Exercise 3
Relational model
Relational algebra
Beat Signer - Department of Computer Science - [email protected]
47 February 28, 2014
References
A. Silberschatz, H. Korth and S. Sudarshan,
Database System Concepts (Sixth Edition),
McGraw-Hill, 2010
E. F. Codd, A Relational Model of Data for Large Shared
Data Banks, Communications of the ACM 13(6),
June 1970
2 December 2005
Next Week Relational Database Design