Temple University – CIS Dept. CIS331– Principles of Database Systems

Date post: 22-Mar-2016
Page 1: Temple University – CIS Dept. CIS331– Principles of Database Systems

Temple University – CIS Dept. CIS331 – Principles of Database Systems

V. Megalooikonomou

Database Design and Normalization

(based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)

Page 2: Temple University – CIS Dept. CIS331– Principles of Database Systems

Overview

Relational model
    formal query languages
    commercial query languages (SQL)
Integrity constraints
    domain I.C., foreign keys
    functional dependencies
Functional Dependencies
DB design and normalization

Page 3: Temple University – CIS Dept. CIS331– Principles of Database Systems

Overview - detailed

DB design and normalization
    pitfalls of bad design
    decomposition
    normal forms

Page 4: Temple University – CIS Dept. CIS331– Principles of Database Systems

Goal

Design ‘good’ tables
    sub-goal #1: define what ‘good’ means
    sub-goal #2: fix ‘bad’ tables

in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key”

Let’s see why, and how:

Page 5: Temple University – CIS Dept. CIS331– Principles of Database Systems

Pitfalls

takes1 (ssn, c-id, grade, name, address)

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main

Page 6: Temple University – CIS Dept. CIS331– Principles of Database Systems

Pitfalls

‘Bad’ - why? because: ssn -> address, name

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
123   cs211   A       smith   Main
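Whether a functional dependency such as ssn -> address, name holds in a given table instance can be checked mechanically. The following is a minimal Python sketch, assuming rows are stored as plain dicts; the helper name `fd_holds` and the key spelling `c_id` are illustrative choices, not part of the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


takes1 = [
    {"ssn": 123, "c_id": "cs331", "grade": "A", "name": "smith", "address": "Main"},
    {"ssn": 123, "c_id": "cs351", "grade": "B", "name": "smith", "address": "Main"},
    {"ssn": 123, "c_id": "cs211", "grade": "A", "name": "smith", "address": "Main"},
]

# ssn alone determines name and address, so those values repeat on every row.
ssn_determines_person = fd_holds(takes1, ["ssn"], ["name", "address"])
# ssn alone does not determine grade; the course id is needed as well.
ssn_determines_grade = fd_holds(takes1, ["ssn"], ["grade"])
print(ssn_determines_person, ssn_determines_grade)
```

An instance can only refute a dependency, never prove it for the schema: the check says the data is consistent with the FD, not that the FD is a design constraint.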

Page 7: Temple University – CIS Dept. CIS331– Principles of Database Systems

Pitfalls

Redundancy
    space
    (inconsistencies)
    insertion/deletion anomalies:

Page 8: Temple University – CIS Dept. CIS331– Principles of Database Systems

Pitfalls

insertion anomaly:
“jones” registers, but takes no class - no place to store his address!

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
…     …       …       …       …
234   null    null    jones   Forbes

Page 9: Temple University – CIS Dept. CIS331– Principles of Database Systems

Pitfalls

deletion anomaly:
delete the last record of ‘smith’ (we lose his address!)

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
123   cs211   A       smith   Main

Page 10: Temple University – CIS Dept. CIS331– Principles of Database Systems

Solution: decomposition

split offending table in two (or more), e.g.:

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
123   cs211   A       smith   Main

?     ?

Page 11: Temple University – CIS Dept. CIS331– Principles of Database Systems

Overview - detailed

DB design and normalization
    pitfalls of bad design
    decomposition
        lossless join
        dependency preserving
    normal forms

Page 12: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions

there are ‘bad’ decompositions
we want:
    lossless and
    dependency preserving

Page 13: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions - lossy:

R1(ssn, grade, name, address)   R2(c-id, grade)

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
234   cs211   A       jones   Forbes

ssn -> name, address
ssn, c-id -> grade

R1:
Ssn   Grade   Name    Address
123   A       smith   Main
123   B       smith   Main
234   A       jones   Forbes

R2:
c-id    Grade
cs331   A
cs351   B
cs211   A

Page 14: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions - lossy:

cannot recover original table with a join!

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
234   cs211   A       jones   Forbes

ssn -> name, address
ssn, c-id -> grade

R1:
Ssn   Grade   Name    Address
123   A       smith   Main
123   B       smith   Main
234   A       jones   Forbes

R2:
c-id    Grade
cs331   A
cs351   B
cs211   A
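The loss can be made concrete by projecting the table onto R1 and R2 and joining the projections back on the shared attribute (grade). The following is a small Python sketch under the assumption that relations are lists of dicts with set semantics; `project`, `natural_join`, and the `c_id` key are hypothetical names used only for illustration.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


takes1 = [
    {"ssn": 123, "c_id": "cs331", "grade": "A", "name": "smith", "address": "Main"},
    {"ssn": 123, "c_id": "cs351", "grade": "B", "name": "smith", "address": "Main"},
    {"ssn": 234, "c_id": "cs211", "grade": "A", "name": "jones", "address": "Forbes"},
]

r1 = project(takes1, ["ssn", "grade", "name", "address"])
r2 = project(takes1, ["c_id", "grade"])
joined = natural_join(r1, r2)

# Rows in the join that were never in the original table.
spurious = [t for t in joined if t not in takes1]
print(len(takes1), len(joined), len(spurious))
```

The join returns every original row plus extra ones (for example, smith paired with cs211), so "lossy" here means that the information about who took which course is lost, not that rows disappear.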

Page 15: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions – lossy: Another example

Decomposition of R = (A, B) into R1 = (A), R2 = (B)

r:
A   B
α   1
α   2
β   1

$\Pi_A(r)$:
A
α
β

$\Pi_B(r)$:
B
1
2

$\Pi_A(r) \bowtie \Pi_B(r)$:
A   B
α   1
α   2
β   1
β   2

Page 16: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions

example of non-dependency preserving

S#    address   status
123   London    E
125   Paris     E
234   Pitts.    A

S# -> address, status
address -> status

S#    status
123   E
125   E
234   A

S#    address
123   London
125   Paris
234   Pitts.

S# -> address
S# -> status

Page 17: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions

is it lossless?

S#    address   status
123   London    E
125   Paris     E
234   Pitts.    A

S# -> address, status
address -> status

S#    status
123   E
125   E
234   A

S#    address
123   London
125   Paris
234   Pitts.

S# -> address
S# -> status

Page 18: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decompositions - lossless

Definition: Consider schema R, with FD ‘F’. R1, R2 is a lossless join decomposition of R if we always have:

$r_1 \bowtie r_2 = r$

(where r1 and r2 are the projections of an instance r of R onto R1 and R2)

An easier criterion?

Page 19: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - lossless

Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables

Formally:

$R_1 \cap R_2 \to R_1$ or $R_1 \cap R_2 \to R_2$
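The criterion above reduces to one attribute-closure computation: take the common attributes of R1 and R2 and check whether their closure covers R1 or R2. The following is a minimal Python sketch, assuming FDs are given as (lhs, rhs) pairs of sets; `closure` and `is_lossless_binary` are illustrative names, not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 follows from fds."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})]

# Joining attribute is grade, which determines nothing else.
lossy = is_lossless_binary({"ssn", "grade", "name", "address"}, {"c_id", "grade"}, fds)
# Joining attribute is ssn, which is a key of (ssn, name, address).
lossless = is_lossless_binary({"ssn", "c_id", "grade"}, {"ssn", "name", "address"}, fds)
print(lossy, lossless)
```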

Page 20: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - lossless

example:

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
234   cs211   A       jones   Forbes

ssn -> name, address
ssn, c-id -> grade

R1:
Ssn   c-id    Grade
123   cs331   A
123   cs351   B
234   cs211   A

ssn, c-id -> grade

R2:
Ssn   Name    Address
123   smith   Main
234   jones   Forbes

ssn -> name, address

Page 21: Temple University – CIS Dept. CIS331– Principles of Database Systems

Overview - detailed

DB design and normalization
    pitfalls of bad design
    decomposition
        lossless join decomp.
        dependency preserving
    normal forms

Page 22: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - depend. pres.

informally: we don’t want the original FDs to span two tables - counter-example:

S#    address   status
123   London    E
125   Paris     E
234   Pitts.    A

S# -> address, status
address -> status

S#    status
123   E
125   E
234   A

S#    address
123   London
125   Paris
234   Pitts.

S# -> address
S# -> status

Page 23: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - depend. pres.

dependency preserving decomposition:

S#    address   status
123   London    E
125   Paris     E
234   Pitts.    A

S# -> address, status
address -> status

S#    address
123   London
125   Paris
234   Pitts.

S# -> address

address   status
London    E
Paris     E
Pitts.    A

address -> status

(but: S#->status ?)

Page 24: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - depend. pres.

informally: we don’t want the original FDs to span two tables

more specifically: … the FDs of the canonical cover

Let Fi be the set of dependencies in F+ that include only attributes in Ri.

Preferably the decomposition should be dependency preserving, that is, $(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$

Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive

Page 25: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - depend. pres.

why is dependency preservation good?

S#    address
123   London
125   Paris
234   Pitts.

address   status
London    E
Paris     E
Pitts.    A

S# -> address
address -> status

versus:

S#    status
123   E
125   E
234   A

S#    address
123   London
125   Paris
234   Pitts.

S# -> address
S# -> status
(address->status: ‘lost’)

Page 26: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - depend. pres.

A: e.g., record that ‘Philly’ has status ‘A’

S#    address
123   London
125   Paris
234   Pitts.

address   status
London    E
Paris     E
Pitts.    A

S# -> address
address -> status

versus:

S#    status
123   E
125   E
234   A

S#    address
123   London
125   Paris
234   Pitts.

S# -> address
S# -> status
(address->status: ‘lost’)

Page 27: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - depend. pres.

To check if a dependency a -> b is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done w.r.t. F)

    result = a
    while (changes to result) do
        for each Ri in the decomposition
            t = $(result \cap R_i)^+ \cap R_i$
            result = $result \cup t$

If result contains all attributes in b, then the functional dependency a -> b is preserved

We apply the test on all dependencies in F to check if a decomposition is dependency preserving

The test takes polynomial time, whereas computing F+ and $(F_1 \cup F_2 \cup \dots \cup F_n)^+$ needs exponential time
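The test above translates almost line for line into code. The following is a Python sketch, assuming FDs are (lhs, rhs) pairs of sets and a decomposition is a list of attribute sets; the function names and the attribute name `s` (standing in for S#) are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def fd_preserved(lhs, rhs, decomposition, fds):
    """Polynomial-time test that lhs -> rhs can be enforced table by table."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # Attributes derivable from what we know, restricted to Ri.
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(decomposition, fds):
    return all(fd_preserved(l, r, decomposition, fds) for l, r in fds)


fds = [({"s"}, {"address", "status"}), ({"address"}, {"status"})]

# (S#, status) and (S#, address): address -> status spans both tables.
not_preserving = is_dependency_preserving([{"s", "status"}, {"s", "address"}], fds)
# (S#, address) and (address, status): every FD is checkable locally.
preserving = is_dependency_preserving([{"s", "address"}, {"address", "status"}], fds)
print(not_preserving, preserving)
```

Note that S# -> status is still preserved in the second decomposition even though no single table contains both attributes: it follows from S# -> address and address -> status.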

Page 28: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition - conclusions

decompositions should always be lossless
    joining attribute -> superkey

whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)

Page 29: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normalization using FD

When decomposing a relation schema R with a set of functional dependencies F into R1, R2,…, Rn we want:

Lossless-join decomposition: otherwise … information loss

No redundancy: relations Ri preferably should be in either Boyce-Codd Normal Form or Third Normal Form

Dependency preservation: Let Fi be the set of dependencies in F+ that include only attributes in Ri.

Preferably the decomposition should be dependency preserving, i.e., $(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$

Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive

Page 30: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normalization using FD - Example

R = (A, B, C)
F = {A -> B, B -> C}

R1 = (A, B), R2 = (B, C)
    Lossless-join decomposition: $R_1 \cap R_2 = \{B\}$ and B -> BC
    Dependency preserving

R1 = (A, B), R2 = (A, C)
    Lossless-join decomposition: $R_1 \cap R_2 = \{A\}$ and A -> AB
    Not dependency preserving (cannot check B -> C without computing $R_1 \bowtie R_2$)
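Both lossless-join claims can be checked by computing the closure of the common attribute under F and applying the superkey criterion stated earlier. A short worked sketch, with the first line covering R1 = (A, B), R2 = (B, C) and the second covering R1 = (A, B), R2 = (A, C):

```latex
\begin{align*}
B^+ &= \{B, C\} \supseteq R_2 = (B, C)
    &&\Rightarrow\; R_1 \cap R_2 \to R_2 \\
A^+ &= \{A, B, C\} \supseteq R_1 = (A, B)
    &&\Rightarrow\; R_1 \cap R_2 \to R_1
\end{align*}
```

In the second decomposition B and C never appear in the same table, and neither table on its own lets us derive C from B, which is why B -> C cannot be checked without the join.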

Page 31: Temple University – CIS Dept. CIS331– Principles of Database Systems

Overview - detailed

DB design and normalization
    pitfalls of bad design
    decomposition (how to fix the problem)
    normal forms (how to detect the problem)
        BCNF, 3NF, (1NF, 2NF)

Page 32: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema?

Answer: ‘good’, if it obeys a ‘normal form’, i.e., a set of rules

Typically: Boyce-Codd Normal Form (BCNF)

Page 33: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

Defn.: Rel. R is in BCNF w.r.t. F, if

informally: everything depends on the full key, and nothing but the key

semi-formally: every determinant (of the cover) is a candidate key

Page 34: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

Example and counter-example:

Ssn   Name    Address
123   smith   Main
234   jones   Forbes

ssn -> name, address

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
234   cs211   A       jones   Forbes

ssn -> name, address
ssn, c-id -> grade

Page 35: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

Formally: for every FD a->b in F+
    a->b is trivial (a is a superset of b) or
    a is a superkey
(or both)
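The formal condition can be tested with attribute closure. The sketch below relies on a standard shortcut that the slides do not state: when testing the original schema R against its own FD set F, it is enough to examine the FDs in F rather than all of F+ (this shortcut does not carry over to sub-schemas produced by a decomposition). The function names and the `c_id` spelling are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(schema, fds):
    """True if every FD in fds is trivial or has a superkey on the left."""
    schema = set(schema)
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= schema
        if not trivial and not superkey:
            return False
    return True


student_ok = is_bcnf(
    {"ssn", "name", "address"},
    [({"ssn"}, {"name", "address"})],
)
takes1_ok = is_bcnf(
    {"ssn", "c_id", "grade", "name", "address"},
    [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})],
)
print(student_ok, takes1_ok)
```

TAKES1 fails because ssn determines name and address but is not a superkey of the whole table.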

Page 36: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R1, … Rn, so that
    R1, … Rn are in BCNF and
    the decomposition is lossless

(…but, some decomp. might lose dependencies)

Page 37: Temple University – CIS Dept. CIS331– Principles of Database Systems

BCNF Decomposition

How? … essentially, break off FDs of the cover

e.g. TAKES1(ssn, c-id, grade, name, address)
    ssn -> name, address
    ssn, c-id -> grade

Page 38: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

e.g. TAKES1(ssn, c-id, grade, name, address)
    ssn -> name, address
    ssn, c-id -> grade

[Diagram: FD graph over ssn, c-id, grade, name, address]

Page 39: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

Ssn   c-id    Grade   Name    Address
123   cs331   A       smith   Main
123   cs351   B       smith   Main
234   cs211   A       jones   Forbes

ssn -> name, address
ssn, c-id -> grade

Ssn   c-id    Grade
123   cs331   A
123   cs351   B
234   cs211   A

ssn, c-id -> grade

Ssn   Name    Address
123   smith   Main
234   jones   Forbes

ssn -> name, address

Page 40: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

pictorially: we want a ‘star’ shape

[Diagram: FD graph over ssn, c-id, grade, name, address: not in BCNF]

Page 41: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

pictorially: we want a ‘star’ shape

[Diagram: two alternative ‘star’-shaped FD graphs over attributes A, B, C, D, E, F, G, H]

Page 42: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

or a star-like: (e.g., 2 cand. keys):

STUDENT(ssn, st#, name, address)

[Diagram: two equivalent FD graphs over ssn, st#, name, address, joined by ‘=’]

Page 43: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - BCNF

but not:

[Diagram: two alternative FD graphs that are not star-shaped, over attributes A, B, C, D, E, F, G, H]

Page 44: Temple University – CIS Dept. CIS331– Principles of Database Systems

BCNF Decomposition

    result := {R};
    done := false;
    compute F+;
    while (not done) do
        if (there is a schema Ri in result that is not in BCNF)
        then begin
            let a -> b be a nontrivial functional dependency that holds on Ri
                such that a -> Ri is not in F+, and $a \cap b = \emptyset$;
            $result := (result - R_i) \cup (R_i - b) \cup (a, b)$;
        end
        else done := true;

Note: each Ri is in BCNF, and decomposition is lossless-join
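A runnable version of this loop needs a way to find a violating dependency that holds on Ri. The Python sketch below does this by brute force: it enumerates subsets a of Ri and uses the closure of a (restricted to Ri) instead of materializing F+. That search is exponential in the size of Ri and is an assumption of this sketch, not something prescribed by the slides; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(ri, fds):
    """Return (a, b) with a -> b nontrivial on ri and a not a superkey of ri."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            a = set(combo)
            a_plus = closure(a, fds) & ri   # what a determines inside ri
            b = a_plus - a                  # disjoint from a by construction
            if b and a_plus != ri:
                return a, b
    return None


def bcnf_decompose(schema, fds):
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                a, b = violation
                # Replace Ri by (Ri - b) and (a, b), as in the pseudocode.
                result.remove(ri)
                result.append(frozenset(ri - b))
                result.append(frozenset(a | b))
                done = False
                break
    return set(result)


takes1_parts = bcnf_decompose(
    {"ssn", "c_id", "grade", "name", "address"},
    [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})],
)
stj_parts = bcnf_decompose(
    {"S", "T", "J"},
    [({"T"}, {"J"}), ({"S", "J"}, {"T"})],
)
print(takes1_parts)
print(stj_parts)
```

Each split produces two strictly smaller schemas, so the loop terminates. Which decomposition comes out can depend on the order in which violations are found.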

Page 45: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

consider the ‘classic’ case: STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

is it BCNF?

[Diagram: FD graph over S, T, J]

Page 46: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

How to decompose it to BCNF?

[Diagram: FD graph over S, T, J]

Page 47: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? )

2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

Page 48: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N )

2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )

Page 49: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

in this case: impossible to have both
    BCNF and
    dependency preservation

Welcome 3NF (…a weaker normal form)!

Page 50: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

[Diagram: FD graph over S, J, T, with one arrow drawn in red]

informally, 3NF ‘forgives’ the red arrow in the can. cover

Page 51: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

STJ(Student, Teacher, subJect)
    T -> J
    S,J -> T

[Diagram: FD graph over S, J, T]

Formally, a rel. R with FDs ‘F’ is in 3NF if: for every a->b in F+:
• it is trivial or
• a is a superkey or
• each attribute in b - a is part of a cand. key

Page 52: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

R = (J, K, L)
F = {JK -> L, L -> K}
Two candidate keys: JK and JL

R is not in BCNF
Any decomposition of R will fail to preserve JK -> L
    BCNF decomposition has (JL) and (LK)
    Testing for JK -> L requires a join

R is in 3NF
    JK -> L: JK is a superkey
    L -> K: K is contained in a candidate key

There is some redundancy in this schema…

Page 53: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

TESTING FOR 3NF

Optimization: Need to check only FDs in F, need not check all FDs in F+

Use attribute closure to check, for each dependency a -> b, if a is a superkey

If a is not a superkey, we have to verify if each attribute in b is contained in a candidate key of R
    this test is more expensive; it involves finding candidate keys
    testing for 3NF is NP-hard

Interestingly, decomposition into 3NF (described shortly) can be done in polynomial time
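The test described above can be sketched in Python as follows. Candidate keys are found by brute-force enumeration of attribute subsets, which is exponential and only suitable for small schemas, in line with the remark that this part of the test is expensive. The function names are illustrative, not from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys, smallest first (brute force)."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def is_3nf(schema, fds):
    schema = set(schema)
    # Attributes that belong to at least one candidate key.
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial
        if closure(lhs, fds) >= schema:
            continue  # left side is a superkey
        if not (rhs - lhs) <= prime:
            return False
    return True


jkl_fds = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]
jkl_keys = candidate_keys({"J", "K", "L"}, jkl_fds)
jkl_3nf = is_3nf({"J", "K", "L"}, jkl_fds)
takes1_3nf = is_3nf(
    {"ssn", "c_id", "grade", "name", "address"},
    [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})],
)
print(jkl_keys, jkl_3nf, takes1_3nf)
```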

Page 54: Temple University – CIS Dept. CIS331– Principles of Database Systems

Decomposition into 3NF

    Let Fc be a canonical cover for F;
    i := 0;
    for each functional dependency a -> b in Fc do
        if none of the schemas Rj, $1 \le j \le i$ contains (a, b)
        then begin
            i := i + 1;
            Ri := (a, b)
        end
    if none of the schemas Rj, $1 \le j \le i$ contains a candidate key for R
    then begin
        i := i + 1;
        Ri := any candidate key for R;
    end
    return (R1, R2, ..., Ri)

The dependencies are preserved by building explicitly a schema for each given dependency

Guarantees a lossless-join decomposition by having at least one schema containing a candidate key for the schema being decomposed
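The synthesis procedure can be sketched in Python as below. The sketch assumes its input is already a canonical cover (computing one is not shown), and it uses the fact that a schema contains a candidate key of R exactly when it is a superkey of R. The helper `find_candidate_key`, which shrinks the full attribute set greedily, is one hypothetical way to pick "any candidate key".

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_candidate_key(schema, fds):
    """Drop attributes one by one while the remainder is still a superkey."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.remove(a)
    return key


def synthesize_3nf(schema, canonical_cover):
    schema = set(schema)
    result = []
    # One schema per dependency, unless an existing schema already covers it.
    for lhs, rhs in canonical_cover:
        ri = frozenset(lhs) | frozenset(rhs)
        if not any(ri <= rj for rj in result):
            result.append(ri)
    # Ensure some schema contains a candidate key of R (keeps the join lossless).
    if not any(closure(rj, canonical_cover) >= schema for rj in result):
        result.append(frozenset(find_candidate_key(schema, canonical_cover)))
    return result


takes1_3nf = synthesize_3nf(
    {"ssn", "c_id", "grade", "name", "address"},
    [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})],
)
# Here no dependency mentions C, so a key schema has to be added.
abc_3nf = synthesize_3nf({"A", "B", "C"}, [({"A"}, {"B"})])
print(takes1_3nf)
print(abc_3nf)
```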

Page 55: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF

how to bring a schema to 3NF?

In short … for each FD in the cover, put it in a table

Page 56: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - 3NF vs BCNF

If ‘R’ is in BCNF, it is always in 3NF (but not the reverse)

In practice, aim for
    BCNF; lossless join; and dep. preservation

if impossible, we accept
    3NF; but insist on lossless join and dep. preservation

3NF has problems with transitive dependencies

Page 57: Temple University – CIS Dept. CIS331– Principles of Database Systems

3NF vs BCNF (cont.)

Example of problems due to redundancy in 3NF

R = (J, K, L)
F = {JK -> L, L -> K}

A schema that is in 3NF but not in BCNF has the problems of
    repetition of information (e.g., the relationship l1, k1)
    need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J).

J      L    K
j1     l1   k1
j2     l1   k1
j3     l1   k1
null   l2   k2

Page 58: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - more details

why ‘3’NF? what is 2NF? 1NF?

1NF: attributes are atomic (i.e., no set-valued attr., a.k.a. ‘repeating groups’)

Ssn   Name    Dependents
123   Smith   Peter
              Mary
              John
234   Jones   Ann
              Michael

not 1NF
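One way to remove the repeating group is to move each dependent into its own row of a separate table keyed by ssn. The following Python sketch illustrates the idea under the assumption that the nested table is held as a list of dicts; the table and key names (`person`, `dependent`) are made up for this example.

```python
# Not 1NF: "dependents" holds a set of values per row.
nested = [
    {"ssn": 123, "name": "Smith", "dependents": ["Peter", "Mary", "John"]},
    {"ssn": 234, "name": "Jones", "dependents": ["Ann", "Michael"]},
]

# One row per person, atomic attributes only.
person = [{"ssn": p["ssn"], "name": p["name"]} for p in nested]

# One row per (person, dependent) pair, again with atomic attributes only.
dependent = [
    {"ssn": p["ssn"], "dependent": d}
    for p in nested
    for d in p["dependents"]
]

for row in dependent:
    print(row["ssn"], row["dependent"])
```

Keeping the name in the `person` table only, rather than repeating it on every dependent row, avoids reintroducing the redundancy discussed under the pitfalls.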

Page 59: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - more details

2NF: 1NF and non-key attr. fully depend on the key

counter-example: TAKES1(ssn, c-id, grade, name, address)
    ssn -> name, address
    ssn, c-id -> grade

[Diagram: FD graph over ssn, c-id, grade, name, address]

Page 60: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - more details

3NF: 2NF and no transitive dependencies

counter-example:

[Diagram: FD graph over A, B, C, D]

in 2NF, but not in 3NF

Page 61: Temple University – CIS Dept. CIS331– Principles of Database Systems

Normal forms - more details

4NF, multivalued dependencies etc: later…

in practice, E-R diagrams usually lead to tables in BCNF

Page 62: Temple University – CIS Dept. CIS331– Principles of Database Systems

Overview - conclusions

DB design and normalization
    pitfalls of bad design
    decompositions (lossless, dep. preserving)
    normal forms (BCNF or 3NF)

“everything should depend on the key, the whole key, and nothing but the key”

