Foundation of Relational Databases - DREAM...

Post on 25-Jun-2020


1

Foundation of Relational Databases

Yanlei Diao

Slides Courtesy of R. Ramakrishnan and J. Gehrke

2

Databases and DBMS’s

- A database is a large, integrated collection of data.
- A database management system (DBMS) is a software system designed to store and manage a large amount of data.
  - Declarative interface to define, add/update, and query data
  - Efficient querying
  - Concurrent users
  - Crash recovery
  - Access control
  - ...

[Figure: DBMS and DB]

3

What Type of Data is stored?

- All critical business data!
  - Banking
  - Ticketing
  - Retail
  - Electronic commerce
  - Insurance
  - Healthcare
  - Enterprise HR
  - Government
  - Telecommunications
  - Social networks

4

Early DBMS’s

- Many small data items, many queries and updates
  - Banking, airline reservations
- 1960s: navigational DBMS
  - Tree / graph-based data model
  - Manual navigation to find what you want
  - Support for "search" = "programming"
- 1973 Turing Award winner
  - Charles William Bachman
  - "The Programmer as Navigator"
  - The network data model

5

Relational DBMS’s

- Relational model (1970)
  - Data independence: hides details of physical storage from users
  - Declarative query language: say what you want, not how to compute it
  - Mathematical foundation: what queries mean, possible implementations
- 1981 Turing Award winner
  - Edgar F. ("Ted") Codd
  - Mathematically inclined researcher
  - Legitimized DBMS’s as a theoretically respectable research field in CS

6

Relational DBMS

- Query optimization (1970s until now)
  - Earliest: System R at IBM, INGRES at UC Berkeley
  - Queries can be executed efficiently despite data independence and declarative queries!
- 2014 Turing Award winner
  - Michael Stonebraker (Berkeley / MIT)
  - "For fundamental contributions to modern database systems"

1974 Debate at an ACM Workshop

7

Evolution of DBMS’s

[Figure: lineage of relational DBMS's. Systems shown: System R (IBM San Jose; Gray, Selinger et al.), INGRES (UC Berkeley; Stonebraker et al.), Sybase, Informix, Postgres, MS SQL Server, IBM DB2, Oracle, MySQL]

8

The Picture Today (Gartner 2015)

9

Foundation of Relational Databases

10

Foundation of Relational Databases

- Relational Model
- Formal Query Languages
  - Relational Algebra
  - Relational Calculus
  - Language Theory

11

Relational Model

- A relational database is a set of relations.
- Each relation has:
  - Schema: specifies the name of the relation, plus the name and type (domain) of each attribute.
    - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Instance: a table with rows (tuples) and columns (attributes, fields). Cardinality = #rows, degree / arity = #columns.
- A relation is a set of tuples (in theory).
  - All rows must be distinct, no duplicates.

12

Example Instance of Students Relation

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

- Cardinality = 3, degree = 5
- All rows are distinct.
- Some columns of two rows can be the same.
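The set-of-tuples view can be sketched in Python. This is only an illustration: the `Student` type and the deliberately repeated row are assumptions added here, not part of the slides.

```python
from typing import NamedTuple

class Student(NamedTuple):
    # Schema: attribute names and their domains
    sid: str
    name: str
    login: str
    age: int
    gpa: float

# An instance is a set of tuples, so a repeated row collapses automatically.
students = {
    Student("53666", "Jones", "jones@cs", 18, 3.4),
    Student("53688", "Smith", "smith@eecs", 18, 3.2),
    Student("53650", "Smith", "smith@math", 19, 3.8),
    Student("53650", "Smith", "smith@math", 19, 3.8),  # duplicate, absorbed by the set
}

cardinality = len(students)    # number of rows
degree = len(Student._fields)  # number of columns
print(cardinality, degree)
```

Two rows may still share some column values (both Smiths are 18 or 19 with the same name), as long as they differ somewhere.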

13

Creating Relations in SQL

    CREATE TABLE Students (
        sid   CHAR(20),
        name  CHAR(20),
        login CHAR(10),
        age   INTEGER,
        gpa   REAL
    );

    CREATE TABLE Enrolled (
        sid   CHAR(20),
        cid   CHAR(20),
        grade CHAR(2)
    );

- Create the Students relation.
- Specify domain constraints:
  - the type of each field
  - later enforced by the DBMS upon tuple insertion or update.

15

Adding Tuples

- Can insert a single tuple using:

    INSERT INTO Students (sid, name, login, age, gpa)
    VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2);

☛ Powerful variants of these commands are available; more later!
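As a rough way to try these statements, the sketch below runs them through Python's built-in `sqlite3` module against an in-memory database. The choice of SQLite is an assumption made here for convenience; note that SQLite applies type affinity rather than strict domain checking, so it does not demonstrate domain-constraint enforcement.

```python
import sqlite3

# In-memory database, chosen only so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students(sid CHAR(20), name CHAR(20), "
    "login CHAR(10), age INTEGER, gpa REAL)"
)

# Insert a single tuple, naming the target columns explicitly.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
    ("53688", "Smith", "smith@ee", 18, 3.2),
)

rows = conn.execute("SELECT sid, name, login, age, gpa FROM Students").fetchall()
print(rows)
```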

16

Integrity Constraints

- Integrity constraints (ICs): conditions that must be true for any instance of the database.
  - Domain constraint
  - Primary key constraint
  - Foreign key constraint
  - Specified when the schema is defined.
- DBMS enforces ICs.
  - Stored data is faithful to real-world meaning.
  - Avoids data entry errors, too!

17

Primary Key Constraints

- Key of a relation: a minimal set of attributes that uniquely identifies each tuple.
  1. No two tuples can have the same values in all key fields.
  2. This is not true for any proper subset of the key.
  - If part 2 is false, the set is a superkey.
  - If a relation has more than one key, they are called candidate keys.
  - One of the candidate keys is chosen to be the primary key.
- E.g., Students(sid, name, login, age, gpa)

18

Primary and Candidate Keys in SQL

- Specify candidate keys using UNIQUE.
- Choose one candidate key as the primary key.

"For a given student and course, there is a single grade."

    CREATE TABLE Enrolled (
        sid   CHAR(20),
        cid   CHAR(20),
        grade CHAR(2),
        PRIMARY KEY (sid, cid)
    );

"... and no two students in a course receive the same grade."

    CREATE TABLE Enrolled (
        sid   CHAR(20),
        cid   CHAR(20),
        grade CHAR(2),
        PRIMARY KEY (sid, cid),
        UNIQUE (cid, grade)
    );
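The effect of the two key constraints can be observed with a small experiment. The sketch below assumes SQLite via Python's `sqlite3` module; the helper `try_insert` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrolled(
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        UNIQUE (cid, grade))
""")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'Reggae203', 'B')")

def try_insert(row):
    """Hypothetical helper: True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

same_student_course = try_insert(("53666", "Reggae203", "A"))  # clashes with PRIMARY KEY (sid, cid)
same_course_grade = try_insert(("53650", "Reggae203", "B"))    # clashes with UNIQUE (cid, grade)
different_grade = try_insert(("53650", "Reggae203", "A"))      # satisfies both constraints
print(same_student_course, same_course_grade, different_grade)
```

The second rejection shows how an overly strong candidate key can block legitimate data: two students can no longer earn the same grade in one course.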

19

Foreign Keys

- Foreign key: set of fields used to "refer" to the primary key of another relation.
  - Like a "logical pointer".
- E.g., Enrolled(sid: string, cid: string, grade: string):
  - sid is a foreign key referring to Students.

20

Foreign Keys in SQL

- Only students listed in the Students relation should be allowed to enroll for courses.

    CREATE TABLE Enrolled (
        sid   CHAR(20),
        cid   CHAR(20),
        grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students
    );

Students

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Enrolled

sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

21

Referential Integrity

- Referential integrity: any foreign key value must have a matching primary key value in the referenced relation.
  - E.g., every sid value in Enrolled must appear in Students.
  - No dangling references.
- Can you name a data model without referential integrity?

22

Enforcing Referential Integrity

- What if an Enrolled tuple with a non-existent student id is inserted?
  - Reject it!
- What if a Students tuple is deleted?
  - CASCADE: delete all Enrolled tuples that refer to it.
  - NO ACTION: disallow the deletion if the Students tuple is referred to.
  - SET DEFAULT: set the foreign key to a default sid.
  - SET NULL: set the foreign key to a special value null, denoting "unknown" or "inapplicable".
- Updates to sid in Students are treated similarly.

23

Referential Integrity in SQL

    CREATE TABLE Enrolled (
        sid   CHAR(20),
        cid   CHAR(20),
        grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students (sid)
            ON DELETE CASCADE
            ON UPDATE NO ACTION
    );
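A minimal sketch of this behaviour, assuming SQLite through Python's `sqlite3` module. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the two-column Students table and the sid '99999' are simplifications invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE Students(sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute("""
    CREATE TABLE Enrolled(
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students (sid)
            ON DELETE CASCADE ON UPDATE NO ACTION)
""")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C')")

# Inserting an Enrolled tuple for a non-existent student should be rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'Reggae203', 'B')")
    dangling_accepted = True
except sqlite3.IntegrityError:
    dangling_accepted = False

# Deleting the student cascades to the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(dangling_accepted, remaining)
```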

24

Where do ICs Come From?

- Based upon real-world business logic.
- Can check violation in a database instance, but can NEVER infer an IC by looking at an instance.
  - An IC is a statement about all possible instances!
  - E.g., name of the Students relation.
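The asymmetry between checking and inferring can be illustrated with a short sketch. The function `violates_key` is a hypothetical helper written for this example; it can show that an instance breaks a proposed key, but a clean result says nothing about other instances.

```python
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]

def violates_key(instance, positions):
    """Return True if two rows agree on every attribute at the given positions."""
    seen = set()
    for row in instance:
        key = tuple(row[i] for i in positions)
        if key in seen:
            return True
        seen.add(key)
    return False

name_violated = violates_key(students, [1])   # two students are named Smith
login_violated = violates_key(students, [2])  # no clash in this instance
print(name_violated, login_violated)
```

The instance refutes "name is a key", but the absence of a clash on login does not establish that login is a key; that has to come from the business logic.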

28

Outline

- Relational Model
- Formal Query Languages
  - Relational Algebra
  - Relational Calculus
  - Language Theory

29

Relational Query Languages

- Relational model allows simple, powerful querying of data.
- Relational query languages
  - Declarative: say "what you want", not "how to get it"
  - Formal mathematical foundation
  - Query optimization

30

Formal Relational Query Languages

- Two mathematical languages form the basis for the "real" one, SQL, and for implementation:
  - Relational Algebra: operational, useful for representing execution plans.
  - Relational Calculus: declarative, useful for defining query semantics.

32

What is "Relational Algebra"?

- Relational algebra:
  - Operands are relations.
  - Operators each take 1 or 2 relations and produce a relation.
- Closure property: relational algebra is closed under the relational model.
  - Relational operators can be arbitrarily composed!

33

Relational Algebra

- Basic operations:
  - Selection ( σ ): selects a subset of rows from a relation.
  - Projection ( π ): retains a subset of columns in a relation.
  - Cross-product ( × ): allows us to combine two relations.
  - Set-difference ( − ): tuples in relation 1, but not in relation 2.
  - Union ( ∪ ): tuples in relation 1 or in relation 2.
- Additional operations:
  - Join ( ⋈ ), Intersection ( ∩ ), Division ( / ), Renaming ( ρ )
  - Can be derived from the basic operators. Not essential, but useful!

34

Example Instances

S1 (Sailors)

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2 (Sailors)

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

R1 (Reserves)

sid  bid  day
22   101  10/10/96
58   103  11/12/96

35

Projection

π_{sname, rating}(S2)

sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

- Retain only the attributes in the projection list; delete the others.
- Schema of the result contains exactly the fields in the projection list.

S2

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

36

Projection (contd.)

π_{age}(S2)

age
35.0
55.5

- Projection operator has to eliminate duplicates!
  - SQL (real) systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)

Input: S2, as listed under Example Instances.

37

Selection

σ_{rating>8}(S2)

sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

- Select rows that satisfy the selection condition; discard others.
- Schema of the result is identical to the schema of the input.

Input: S2, as listed under Example Instances.

38

Selection (contd.)

σ_{rating>8}(S2)

sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

π_{sname, rating}(σ_{rating>8}(S2))

sname  rating
yuppy  9
rusty  10

- Composition: the result relation of one operator can be the input to another operator.
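Selection, projection, and their composition can be sketched over Python sets. The `Sailor` type and the function names `select` and `project` are illustrative choices, not an established API.

```python
from typing import NamedTuple

class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float

S2 = {
    Sailor(28, "yuppy", 9, 35.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(44, "guppy", 5, 35.0),
    Sailor(58, "rusty", 10, 35.0),
}

def select(relation, predicate):
    """sigma: keep the rows that satisfy the predicate; the schema is unchanged."""
    return {row for row in relation if predicate(row)}

def project(relation, *attrs):
    """pi: keep only the named columns; building a set eliminates duplicates."""
    return {tuple(getattr(row, a) for a in attrs) for row in relation}

ages = project(S2, "age")
# Composition: the output of select is fed straight into project.
high_rated = project(select(S2, lambda r: r.rating > 8), "sname", "rating")
print(ages, high_rated)
```

Because every operator returns a set of tuples, operators nest freely, which is the closure property at work.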

39

Union, Intersection, Set-Difference

- Set operations:
  - Union ( ∪ )
  - Intersection ( ∩ )
  - Set difference ( − )
- Two input relations must be union-compatible:
  - Same number of fields.
  - Corresponding fields have the same type.
- What is the schema of the result?

Inputs: S1 and S2, as listed under Example Instances.

40

Example Set Operations

S1 ∪ S2

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

Duplicate elimination: remove tuples that have the same values in all attributes.

Inputs: S1 and S2, as listed under Example Instances.

41

Example Set Operations

S1 ∩ S2

sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0

S1 − S2

sid  sname   rating  age
22   dustin  7       45.0

Inputs: S1 and S2, as listed under Example Instances.

Duplicates?
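Since relations are sets of tuples, the three set operations map directly onto Python's set operators. The sketch below represents each row as a plain tuple, which is an assumption made for brevity.

```python
# Rows follow the schema (sid, sname, rating, age); both relations are union-compatible.
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

union = S1 | S2         # tuples in S1 or in S2, duplicates removed
intersection = S1 & S2  # tuples in both
difference = S1 - S2    # tuples in S1 but not in S2
print(len(union), intersection, difference)
```

Only union can introduce duplicates that need eliminating; intersection and difference return subsets of an input that is already duplicate-free.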

42

Cross (Cartesian) Product

- S1 × R1: each row of S1 is paired with each row of R1.

S1 × R1

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96

Inputs: S1 and R1, as listed under Example Instances.

43

Cross-Product (contd.)

- Result schema inherits all fields of S1 and R1.
  - Conflict: both S1 and R1 have a field called sid.
- Renaming operator:

    ρ(C(1 → sid1, 5 → sid2), S1 × R1)

44

Joins

- Condition (theta) join:

    R ⋈_c S = σ_c(R × S)

  - Result schema is the same as that of the cross-product.
  - But often fewer tuples, so more efficient to compute.

S1 ⋈_{S1.sid < R1.sid} R1

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96

45

Joins

- Equi-join: a special case of condition join where the condition c contains only equalities.
  - Result schema contains only one copy of the fields for which equality is specified.
- Natural join ( R ⋈ S ): equi-join on all common fields.

S1 ⋈_{sid} R1

sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96
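The join variants follow from cross-product plus selection, as the definition R ⋈_c S = σ_c(R × S) suggests. The sketch below uses tuples addressed by position; the helper names `cross` and `theta_join` are illustrative.

```python
from itertools import product

S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def cross(r, s):
    """Cross-product: pair every row of r with every row of s."""
    return {a + b for a, b in product(r, s)}

def theta_join(r, s, cond):
    """Condition join, computed literally as a selection over the cross-product."""
    return {row for row in cross(r, s) if cond(row)}

# Positions in S1 x R1: 0=S1.sid, 1=sname, 2=rating, 3=age, 4=R1.sid, 5=bid, 6=day
pairs = cross(S1, R1)
less_than = theta_join(S1, R1, lambda t: t[0] < t[4])
# Equi-join on sid, keeping only one copy of the sid column.
equi = {t[:4] + t[5:] for t in theta_join(S1, R1, lambda t: t[0] == t[4])}
print(len(pairs), less_than, equi)
```

A real system would avoid materializing the full cross-product; this version only mirrors the algebraic definition.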

50

Example Schema

- Sailors(sid: integer, sname: string, rating: integer, age: integer)
- Boats(bid: integer, color: string)
- Reserves(sid: integer, bid: integer, day: date)

51

Find names of sailors who've reserved boat #103

- Sailors(sid: integer, sname: string, rating: integer, age: integer)
- Boats(bid: integer, color: string)
- Reserves(sid: integer, bid: integer, day: date)

1) How many relations do we need?
2) How do we compare two relations?
3) Where do we place the selection?
4) Which attributes are retained for output?

52

Find names of sailors who've reserved boat #103

- Solution 1:

    π_{sname}((σ_{bid=103} Reserves) ⋈ Sailors)

- Solution 2:

    ρ(Temp1, σ_{bid=103} Reserves)
    ρ(Temp2, Temp1 ⋈ Sailors)
    π_{sname}(Temp2)

- Solution 3:

    π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))

Algebraic equivalence!
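The equivalence of Solutions 1 and 3 can be checked on the example instances. This sketch reuses S1 as Sailors and R1 as Reserves; `join_on_sid` is a hypothetical helper that hard-codes the natural join on the single shared field sid.

```python
sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
reserves = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def join_on_sid(res, sail):
    """Natural join of Reserves and Sailors; sid is their only common field."""
    return {
        (sid, bid, day, sname, rating, age)
        for (sid, bid, day) in res
        for (sid2, sname, rating, age) in sail
        if sid == sid2
    }

# Solution 1: select on bid first, then join, then project sname (position 3).
solution1 = {row[3] for row in join_on_sid({r for r in reserves if r[1] == 103}, sailors)}
# Solution 3: join first, then select on bid (position 1), then project sname.
solution3 = {row[3] for row in join_on_sid(reserves, sailors) if row[1] == 103}
print(solution1, solution3)
```

Both plans return the same answer, but Solution 1 joins a smaller intermediate relation, which is the kind of choice a query optimizer makes.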

53

Find names of sailors who've reserved a red boat

- Boat color is only available in Boats, so we need an extra join:

    π_{sname}((σ_{color='red'} Boats) ⋈ Reserves ⋈ Sailors)

- Sailors(sid: integer, sname: string, rating: integer, age: integer)
- Boats(bid: integer, color: string)
- Reserves(sid: integer, bid: integer, day: date)

54

Find sailors who've reserved a red or a green boat

- Can identify all red or green boats, then find sailors who've reserved one of these boats:

    ρ(Tempboats, (σ_{color='red' ∨ color='green'} Boats))
    π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)

- Can also define Tempboats using union. How?

55

Find sailors who've reserved a red and a green boat

- Will a single selection work?
- Instead, intersect sailors who've reserved red boats and sailors who've reserved green boats.

    ρ(Tempred, π_{sid}((σ_{color='red'} Boats) ⋈ Reserves))
    ρ(Tempgreen, π_{sid}((σ_{color='green'} Boats) ⋈ Reserves))
    π_{sname}((Tempred ∩ Tempgreen) ⋈ Sailors)

sid is a key for Sailors

56

Find the names of sailors who've reserved all boats

- Sailors(sid: integer, sname: string, rating: integer, age: integer)
- Boats(bid: integer, color: string)
- Reserves(sid: integer, bid: integer, day: date)

Relational algebra

- Selection ( σ ): selects a subset of rows from a relation.
- Projection ( π ): retains a subset of columns in a relation.
- Cross-product ( × ): allows us to combine two relations.
- Set-difference ( − ): tuples in relation 1, but not in relation 2.
- Union ( ∪ ): tuples in relation 1 or in relation 2.
- Join ( ⋈ ), Intersection ( ∩ ), Division ( / ), Renaming ( ρ )

58

Outline

- Relational Model
- Formal Query Languages
  - Relational Algebra
  - Relational Calculus
  - Language Theory

59

Relational Calculus

- Query has the form:

    {⟨x1, x2, ..., xn⟩ | p(⟨x1, x2, ..., xn⟩)}

  - x1, x2, ..., xn: domain variables, or constants
  - p(⟨x1, x2, ..., xn⟩): formula
- Answer includes all tuples ⟨x1, x2, ..., xn⟩ that make the formula p(⟨x1, x2, ..., xn⟩) true.

60

Formulas

- Formula is recursively defined:
  - Atomic formulas:
    - getting tuples from relations, or
    - making comparisons of values
  - Logical connectives: ¬, ∧, ∨
  - Quantifiers: ∃, ∀

61

Free and Bound Variables

- The use of quantifiers ∃X and ∀X in a formula is said to bind X.
  - A variable that is not bound is free.
- Let us revisit the definition of a query:

    {⟨x1, x2, ..., xn⟩ | p(⟨x1, x2, ..., xn⟩)}

- There is an important restriction: the variables x1, ..., xn that appear to the left of ‘|’ must be the only free variables in the formula p(...).

62

Find names of sailors rated>7 who have reserved boat #103

Relational Algebra:

    π_{sname}((σ_{bid=103} Reserves) ⋈ (σ_{rating>7} Sailors))

Relational Calculus:

    {Xsname | ∃ Xsid, Xrating, Xage Sailors(Xsid, Xsname, Xrating, Xage) ∧ Xrating > 7 ∧ ∃ Xbid, Xday Reserves(Xsid, Xbid, Xday) ∧ Xbid = 103}

- Where is the join?
  - Use ∃ to find a tuple in Reserves that ‘joins with’ the Sailors tuple under consideration.
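A set comprehension reads much like the calculus query: the outer loop ranges over Sailors tuples and `any` plays the role of the existential quantifier over Reserves. The sketch reuses the S1 and R1 instances as Sailors and Reserves, which is an assumption made for illustration.

```python
sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
reserves = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

answer = {
    sname
    for (sid, sname, rating, age) in sailors  # exists Xsid, Xrating, Xage in Sailors
    if rating > 7
    and any(                                   # exists Xbid, Xday in Reserves
        r_sid == sid and bid == 103
        for (r_sid, bid, day) in reserves
    )
}
print(answer)
```

The shared variable `sid` inside the `any` is where the join happens: it ties each Reserves tuple to the Sailors tuple under consideration.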

63

Find names of sailors who’ve reserved all boats

    {Xsname | ∃ Xsid, Xrating, Xage ⟨Xsid, Xsname, Xrating, Xage⟩ ∈ Sailors ∧
              ∀ ⟨Xbid, Xcolor⟩ ∈ Boats
              (∃ Xday ⟨Xsid, Xbid, Xday⟩ ∈ Reserves)}

- To find sailors who've reserved all red boats:

    {Xsname | ∃ Xsid, Xrating, Xage ⟨Xsid, Xsname, Xrating, Xage⟩ ∈ Sailors ∧
              ∀ ⟨Xbid, Xcolor⟩ ∈ Boats
              (Xcolor ≠ 'red' ∨ ∃ Xday ⟨Xsid, Xbid, Xday⟩ ∈ Reserves)}

    p → q ≡ ¬p ∨ q

65

Find names of sailors who’ve reserved all boats

    {Xsname | ∃ Xsid, Xrating, Xage ⟨Xsid, Xsname, Xrating, Xage⟩ ∈ Sailors ∧
              ∀ ⟨Xbid, Xcolor⟩ ∈ Boats
              (∃ Xday ⟨Xsid, Xbid, Xday⟩ ∈ Reserves)}

    {Xsname | ∃ Xsid, Xrating, Xage ⟨Xsid, Xsname, Xrating, Xage⟩ ∈ Sailors ∧
              ¬∃ ⟨Xbid, Xcolor⟩ ∈ Boats
              (¬∃ Xday ⟨Xsid, Xbid, Xday⟩ ∈ Reserves)}

    ∀x∈R F(x) ≡ ¬∃x∈R ¬F(x)

How do we write it in Relational Algebra?

66

Find the names of sailors who’ve reserved all boats

- Step 1: find all sailors such that there exists a boat that the sailor has not reserved (called formula F).

    ρ(S_neg, π_{sid}((π_{sid} Reserves) × (π_{bid} Boats) − (π_{sid,bid} Reserves)))

- Step 2: find sailors for which F is not true and retrieve their names.

    π_{sname}(((π_{sid} Reserves) − S_neg) ⋈ Sailors)

"−": the only way to express negation in relational algebra!
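The two steps can be traced on a small instance. In the sketch below the Boats rows and the extra reservation for sailor 22 are made-up data (the slides give no Boats instance), chosen so that exactly one sailor has reserved every boat. The last expression recomputes the answer in the calculus style with `all` and `any` as a cross-check.

```python
sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
boats = {(101, "red"), (103, "green")}  # hypothetical Boats instance
reserves = {(22, 101, "10/10/96"), (22, 103, "10/11/96"), (58, 103, "11/12/96")}

# Step 1: sailors for whom some boat is missing from their reservations.
reserving_sids = {sid for (sid, bid, day) in reserves}                        # pi_sid Reserves
all_pairs = {(sid, bid) for sid in reserving_sids for (bid, color) in boats}  # x pi_bid Boats
actual_pairs = {(sid, bid) for (sid, bid, day) in reserves}                   # pi_sid,bid Reserves
s_neg = {sid for (sid, bid) in all_pairs - actual_pairs}

# Step 2: remove them with set difference, then join with Sailors for the names.
names = {sname for (sid, sname, rating, age) in sailors if sid in reserving_sids - s_neg}

# Calculus-style formulation: for all boats there exists a matching reservation.
names_calculus = {
    sname
    for (sid, sname, rating, age) in sailors
    if all(
        any(r_sid == sid and r_bid == bid for (r_sid, r_bid, day) in reserves)
        for (bid, color) in boats
    )
}
print(s_neg, names, names_calculus)
```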