Temple University – CIS Dept. CIS331 – Principles of Database Systems
V. Megalooikonomou
Database Design and Normalization
(based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)
Overview
  Relational model
    formal query languages
    commercial query languages (SQL)
  Integrity constraints
    domain I.C., foreign keys
    functional dependencies
  Functional Dependencies
  DB design and normalization
Overview - detailed
  DB design and normalization
    pitfalls of bad design
    decomposition
    normal forms
Goal
  Design ‘good’ tables
    sub-goal #1: define what ‘good’ means
    sub-goal #2: fix ‘bad’ tables
  in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key”
  Let’s see why, and how:
Pitfalls
  takes1(ssn, c-id, grade, name, address)
  Ssn  c-id   Grade  Name   Address
  123  cs331  A      smith  Main
Pitfalls
  ‘Bad’ - why? Because: ssn -> address, name
Ssn c-id Grade Name Address
123 cs331 A smith Main
123 cs351 B smith Main
123 cs211 A smith Main
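A functional dependency such as ssn -> name, address can be checked against a table instance by making sure that no two rows agree on the left-hand side but differ on the right-hand side. The following is a minimal Python sketch of that check; the helper name `fd_holds` and the attribute spelling `c_id` (for c-id) are assumptions made for illustration only.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs value, different rhs value
    return True

takes1 = [
    {"ssn": 123, "c_id": "cs331", "grade": "A", "name": "smith", "address": "Main"},
    {"ssn": 123, "c_id": "cs351", "grade": "B", "name": "smith", "address": "Main"},
    {"ssn": 123, "c_id": "cs211", "grade": "A", "name": "smith", "address": "Main"},
]

print(fd_holds(takes1, ["ssn"], ["name", "address"]))  # True
print(fd_holds(takes1, ["ssn"], ["grade"]))            # False
```

Because ssn alone determines name and address, those two values are repeated once per course taken, which is the redundancy the table suffers from.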
Pitfalls
  Redundancy
    wasted space
    (potential inconsistencies)
    insertion/deletion anomalies:
Pitfalls
  insertion anomaly:
  “jones” registers, but takes no class - no place to store his address!
Ssn c-id Grade Name Address
123 cs331 A smith Main
… … … … …
234 null null jones Forbes
Pitfalls
  deletion anomaly:
  delete the last record of ‘smith’ (we lose his address!)
Ssn c-id Grade Name Address
123 cs331 A smith Main
123 cs351 B smith Main
123 cs211 A smith Main
Solution: decomposition
  split the offending table in two (or more), e.g.:
  Ssn  c-id   Grade  Name   Address
  123  cs331  A      smith  Main
  123  cs351  B      smith  Main
  123  cs211  A      smith  Main
  ?    ?
Overview - detailed
  DB design and normalization
    pitfalls of bad design
    decomposition
      lossless join
      dependency preserving
    normal forms
Decompositions
  there are ‘bad’ decompositions
  we want:
    lossless and
    dependency preserving
Decompositions - lossy:
  R1(ssn, grade, name, address)   R2(c-id, grade)
  Original table:
  Ssn  c-id   Grade  Name   Address
  123  cs331  A      smith  Main
  123  cs351  B      smith  Main
  234  cs211  A      jones  Forbes
  ssn -> name, address
  ssn, c-id -> grade
  R1:
  Ssn  Grade  Name   Address
  123  A      smith  Main
  123  B      smith  Main
  234  A      jones  Forbes
  R2:
  c-id   Grade
  cs331  A
  cs351  B
  cs211  A
Decompositions - lossy:
  cannot recover the original table with a join!
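To see concretely why the join fails to recover the original table, the projections R1 and R2 can be computed and re-joined on their common attribute, grade. The sketch below is only an illustration; the helper names `project` and `natural_join` and the spelling `c_id` are assumptions, and tuples are modelled as frozensets of (attribute, value) pairs.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, eliminating duplicates."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two relations whose tuples are frozensets of (attr, value)."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # tuples match when they agree on every shared attribute
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

takes = [
    {"ssn": 123, "c_id": "cs331", "grade": "A", "name": "smith", "address": "Main"},
    {"ssn": 123, "c_id": "cs351", "grade": "B", "name": "smith", "address": "Main"},
    {"ssn": 234, "c_id": "cs211", "grade": "A", "name": "jones", "address": "Forbes"},
]

original = project(takes, ["ssn", "c_id", "grade", "name", "address"])
r1 = project(takes, ["ssn", "grade", "name", "address"])
r2 = project(takes, ["c_id", "grade"])
rejoined = natural_join(r1, r2)
spurious = rejoined - original

print(len(original), len(rejoined), len(spurious))  # 3 5 2
```

The join returns every original tuple plus two spurious ones (for example, jones taking cs331), so the information about who took which course is lost.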
Decompositions – lossy: Another example
  Decomposition of R = (A, B) into R1 = (A), R2 = (B)
  r:
  A  B
  α  1
  α  2
  β  1
  Π_A(r):
  A
  α
  β
  Π_B(r):
  B
  1
  2
  Π_A(r) ⋈ Π_B(r):
  A  B
  α  1
  α  2
  β  1
  β  2
Decompositions
  example of non-dependency preserving
  Original table:
  S#   address  status
  123  London   E
  125  Paris    E
  234  Pitts.   A
  S# -> address, status
  address -> status
  Decomposed into:
  S#   status
  123  E
  125  E
  234  A
  S# -> status

  S#   address
  123  London
  125  Paris
  234  Pitts.
  S# -> address
Decompositions
  is it lossless?
  Original table:
  S#   address  status
  123  London   E
  125  Paris    E
  234  Pitts.   A
  S# -> address, status
  address -> status
  Decomposed into:
  S#   status
  123  E
  125  E
  234  A
  S# -> status

  S#   address
  123  London
  125  Paris
  234  Pitts.
  S# -> address
Decompositions - lossless
  Definition: Consider schema R, with FDs ‘F’. R1, R2 is a lossless join decomposition of R if we always have:
    r1 ⋈ r2 = r, where r1 = Π_R1(r) and r2 = Π_R2(r)
  An easier criterion?
Decomposition - lossless
  Theorem: a decomposition is a lossless join decomposition if the joining attribute is a superkey in at least one of the new tables
  Formally:
    R1 ∩ R2 -> R1   or   R1 ∩ R2 -> R2
Decomposition - lossless
  example:
  Original table:
  Ssn  c-id   Grade  Name   Address
  123  cs331  A      smith  Main
  123  cs351  B      smith  Main
  234  cs211  A      jones  Forbes
  ssn -> name, address
  ssn, c-id -> grade
  R1:
  Ssn  c-id   Grade
  123  cs331  A
  123  cs351  B
  234  cs211  A
  ssn, c-id -> grade
  R2:
  Ssn  Name   Address
  123  smith  Main
  234  jones  Forbes
  ssn -> name, address
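The superkey criterion can be tested mechanically with attribute closure: compute the closure of the joining attributes R1 ∩ R2 and see whether it covers R1 or R2. The following Python sketch assumes FDs are given as (left-hand side, right-hand side) pairs of sets; the function names `closure` and `is_lossless` and the spelling `c_id` are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})]

# decomposition from the lossless example: joining attribute is ssn
print(is_lossless({"ssn", "c_id", "grade"}, {"ssn", "name", "address"}, fds))  # True
# decomposition from the lossy example: joining attribute is grade
print(is_lossless({"ssn", "grade", "name", "address"}, {"c_id", "grade"}, fds))  # False
```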
Decomposition - depend. pres.
  informally: we don’t want the original FDs to span two tables - counter-example:
  Original table:
  S#   address  status
  123  London   E
  125  Paris    E
  234  Pitts.   A
  S# -> address, status
  address -> status
  Decomposed into:
  S#   status
  123  E
  125  E
  234  A
  S# -> status

  S#   address
  123  London
  125  Paris
  234  Pitts.
  S# -> address
Decomposition - depend. pres.
  dependency preserving decomposition:
  Original table:
  S#   address  status
  123  London   E
  125  Paris    E
  234  Pitts.   A
  S# -> address, status
  address -> status
  Decomposed into:
  S#   address
  123  London
  125  Paris
  234  Pitts.
  S# -> address

  address  status
  London   E
  Paris    E
  Pitts.   A
  address -> status
  (but: S#->status ?)
Decomposition - depend. pres.
informally: we don’t want the original FDs to span two tables
more specifically: … the FDs of the canonical cover
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
Preferably the decomposition should be dependency preserving, that is, (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive
Decomposition - depend. pres.
  why is dependency preservation good?
  Dependency preserving decomposition:
  S#   address
  123  London
  125  Paris
  234  Pitts.
  S# -> address

  address  status
  London   E
  Paris    E
  Pitts.   A
  address -> status
  Non-dependency preserving decomposition:
  S#   status
  123  E
  125  E
  234  A
  S# -> status

  S#   address
  123  London
  125  Paris
  234  Pitts.
  S# -> address
  (address->status: ‘lost’)
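Decomposition - depend. pres.
  A: e.g., to record that ‘Philly’ has status ‘A’: the (address, status) table can store this fact directly, while the decomposition that lost address -> status has no table for it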
Decomposition - depend. pres.
  To check if a dependency α -> β is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done w.r.t. F)
    result = α
    while (changes to result) do
      for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
  If result contains all attributes in β, then the functional dependency α -> β is preserved
  We apply the test on all dependencies in F to check if a decomposition is dependency preserving
  The test takes polynomial time, whereas computing F+ and (F1 ∪ F2 ∪ … ∪ Fn)+ needs exponential time
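The test above translates almost line by line into code. The sketch below is one possible Python rendering, applied to the S#/address/status example from the earlier slides; the names `closure` and `is_preserved` and the set-pair encoding of FDs are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs is preserved, without computing F+."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # t = (result ∩ Ri)+ ∩ Ri, with closure taken w.r.t. F
            t = closure(result & set(ri), fds) & set(ri)
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

fds = [({"S#"}, {"address", "status"}), ({"address"}, {"status"})]
lost = [{"S#", "status"}, {"S#", "address"}]
kept = [{"S#", "address"}, {"address", "status"}]

print(is_preserved({"address"}, {"status"}, lost, fds))  # False
print(all(is_preserved(l, r, kept, fds) for l, r in fds))  # True
```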
Decomposition - conclusions
  decompositions should always be lossless
    joining attribute -> superkey
  whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)
Normalization using FD
  When decomposing a relation schema R with a set of functional dependencies F into R1, R2, …, Rn we want:
    Lossless-join decomposition: otherwise … information loss
    No redundancy: relations Ri preferably should be in either Boyce-Codd Normal Form or Third Normal Form
    Dependency preservation: Let Fi be the set of dependencies in F+ that include only attributes in Ri.
      Preferably the decomposition should be dependency preserving, i.e., (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
      Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive
Normalization using FD - Example
  R = (A, B, C)
  F = {A -> B, B -> C}
  R1 = (A, B), R2 = (B, C)
    Lossless-join decomposition: R1 ∩ R2 = {B} and B -> BC
    Dependency preserving
  R1 = (A, B), R2 = (A, C)
    Lossless-join decomposition: R1 ∩ R2 = {A} and A -> AB
    Not dependency preserving (cannot check B -> C without computing R1 ⋈ R2)
Overview - detailed
  DB design and normalization
    pitfalls of bad design
    decomposition (how to fix the problem)
    normal forms (how to detect the problem)
      BCNF, 3NF, (1NF, 2NF)
Normal forms - BCNF
  We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema?
  Answer: ‘good’, if it obeys a ‘normal form’, i.e., a set of rules
  Typically: Boyce-Codd Normal Form (BCNF)
Normal forms - BCNF
  Defn.: Rel. R is in BCNF w.r.t. F, if
    informally: everything depends on the full key, and nothing but the key
    semi-formally: every determinant (of the cover) is a candidate key
Normal forms - BCNF
  Example and counter-example:
  Example:
  Ssn  Name   Address
  123  smith  Main
  123  smith  Main
  234  jones  Forbes
  ssn -> name, address
  Counter-example:
  Ssn  c-id   Grade  Name   Address
  123  cs331  A      smith  Main
  123  cs351  B      smith  Main
  234  cs211  A      jones  Forbes
  ssn -> name, address
  ssn, c-id -> grade
Normal forms - BCNF
  Formally: for every FD a->b in F+
    a->b is trivial (a is a superset of b) or
    a is a superkey (or both)
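The formal condition can be checked with attribute closure. When testing the original schema R against its own FD set F, it is sufficient to examine the FDs in F rather than all of F+ (this shortcut does not carry over to sub-schemas produced by a decomposition). The Python sketch below relies on that fact; the names `closure` and `bcnf_violations` and the spelling `c_id` are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the FDs in fds that are neither trivial nor have a superkey lhs."""
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= set(schema)
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad

takes1 = {"ssn", "c_id", "grade", "name", "address"}
takes1_fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})]
print(bcnf_violations(takes1, takes1_fds))  # only ssn -> name, address is reported

student = {"ssn", "name", "address"}
student_fds = [({"ssn"}, {"name", "address"})]
print(bcnf_violations(student, student_fds))  # []
```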
Normal forms - BCNF
  Theorem: given a schema R and a set of FDs ‘F’, we can always decompose it to schemas R1, … Rn, so that
    R1, … Rn are in BCNF and
    the decomposition is lossless
  (…but, some decomp. might lose dependencies)
BCNF Decomposition
  How? ….essentially, break off FDs of the cover
  e.g., TAKES1(ssn, c-id, grade, name, address)
    ssn -> name, address
    ssn, c-id -> grade
  [Figure: dependency diagram over ssn, c-id, grade, name, address]
Normal forms - BCNF
  Original table:
  Ssn  c-id   Grade  Name   Address
  123  cs331  A      smith  Main
  123  cs351  B      smith  Main
  234  cs211  A      jones  Forbes
  ssn -> name, address
  ssn, c-id -> grade
  Decomposed into:
  Ssn  c-id   Grade
  123  cs331  A
  123  cs351  B
  234  cs211  A
  ssn, c-id -> grade

  Ssn  Name   Address
  123  smith  Main
  123  smith  Main
  234  jones  Forbes
  ssn -> name, address
Normal forms - BCNF
  pictorially: we want a ‘star’ shape
  [Figure: dependency diagram over ssn, c-id, grade, name, address: not in BCNF]
Normal forms - BCNF
  pictorially: we want a ‘star’ shape
  [Figure: two alternative star-shaped dependency diagrams over attributes A through H]
Normal forms - BCNF
  or a star-like: (e.g., 2 cand. keys):
  STUDENT(ssn, st#, name, address)
  [Figure: two equivalent dependency diagrams over ssn, st#, name, address]
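Normal forms - BCNF
  but not:
  [Figure: two alternative non-star dependency diagrams, one over attributes A, B, C, D and one over attributes D, E, F, G, H]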
BCNF Decomposition
  result := {R};
  done := false;
  compute F+;
  while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
      let α -> β be a nontrivial functional dependency that holds on Ri
        such that α -> Ri is not in F+, and α ∩ β = ∅;
      result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
  Note: each Ri is in BCNF, and the decomposition is lossless-join
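The decomposition loop can be sketched in Python as follows. Instead of computing F+ up front, this sketch looks for a violating α by brute force over subsets of each Ri and uses attribute closure, which is exponential in the number of attributes and only meant for small classroom examples. The names `closure`, `find_violation` and `bcnf_decompose`, and the spelling `c_id`, are assumptions for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(schema, fds):
    """Return (alpha, beta) violating BCNF on schema, or None if schema is in BCNF."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = set(combo)
            plus = closure(alpha, fds)
            beta = (plus & schema) - alpha      # nontrivial part, disjoint from alpha
            if beta and not schema <= plus:     # alpha is not a superkey of schema
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Split schemas on violating FDs until every schema is in BCNF."""
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [frozenset(ri - beta), frozenset(alpha | beta)]
                done = False
                break  # restart the scan over the updated result
    return set(result)

fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})]
for r in bcnf_decompose({"ssn", "c_id", "grade", "name", "address"}, fds):
    print(sorted(r))
```

On TAKES1 this yields the two schemas (ssn, c-id, grade) and (ssn, name, address), matching the decomposition shown on the earlier slide.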
Normal forms - 3NF
  consider the ‘classic’ case: STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  is it BCNF?
  [Figure: dependency diagram over S, T, J]
Normal forms - 3NF
  STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  How to decompose it to BCNF?
  [Figure: dependency diagram over S, T, J]
Normal forms - 3NF
  STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.?)
  2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.?)
Normal forms - 3NF
  STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N)
  2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N)
Normal forms - 3NF
  STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  in this case: impossible to have both BCNF and dependency preservation
  Welcome 3NF (…a weaker normal form)!
Normal forms - 3NF
  STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  [Figure: dependency diagram over S, J, T, with one arrow drawn in red]
  informally, 3NF ‘forgives’ the red arrow in the can. cover
Normal forms - 3NF
  STJ(Student, Teacher, subJect)
    T -> J
    S, J -> T
  [Figure: dependency diagram over S, J, T]
  Formally, a rel. R with FDs ‘F’ is in 3NF if: for every a->b in F+:
    it is trivial or
    a is a superkey or
    each attribute in b - a is part of a cand. key
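The three-way condition can be checked by enumerating the candidate keys and collecting the attributes that appear in any of them. The sketch below does this by brute force over attribute subsets, so it is only suitable for small schemas, and it inspects only the FDs in F, which is sufficient for this test. The names `closure`, `candidate_keys` and `is_3nf` are assumptions for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys of schema, found smallest-first."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def is_3nf(schema, fds):
    """Each FD must be trivial, have a superkey lhs, or add only key attributes."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= schema:
            continue
        if not (rhs - lhs) <= prime:
            return False
    return True

stj = {"S", "T", "J"}
stj_fds = [({"T"}, {"J"}), ({"S", "J"}, {"T"})]
print(candidate_keys(stj, stj_fds))  # two keys: {J, S} and {S, T}
print(is_3nf(stj, stj_fds))          # True
```

For STJ, the FD T -> J has a non-superkey left-hand side, but J belongs to the candidate key (S, J), so the schema is in 3NF even though it is not in BCNF.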
Normal forms - 3NF
  R = (J, K, L)
  F = {JK -> L, L -> K}
  Two candidate keys = JK and JL
  R is not in BCNF
  Any decomposition of R will fail to preserve JK -> L
    BCNF decomposition has (JL) and (LK)
    Testing for JK -> L requires a join
  R is in 3NF
    JK -> L: JK is a superkey
    L -> K: K is contained in a candidate key
  There is some redundancy in this schema…
Normal forms - 3NF
  TESTING FOR 3NF
  Optimization: need to check only FDs in F, need not check all FDs in F+
  Use attribute closure to check, for each dependency α -> β, if α is a superkey
  If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R
    this test is more expensive; it involves finding candidate keys
    testing for 3NF is NP-hard
  Interestingly, decomposition into 3NF (described shortly) can be done in polynomial time
Decomposition into 3NF
  Let Fc be a canonical cover for F;
  i := 0;
  for each functional dependency α -> β in Fc do
    if none of the schemas Rj, 1 ≤ j ≤ i contains α β
    then begin
      i := i + 1;
      Ri := α β
    end
  if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
  then begin
    i := i + 1;
    Ri := any candidate key for R;
  end
  return (R1, R2, ..., Ri)
  The dependencies are preserved by building explicitly a schema for each given dependency
  The algorithm guarantees a lossless-join decomposition by having at least one schema containing a candidate key for the schema being decomposed
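A compact Python sketch of this synthesis procedure follows. It assumes the input FD list is already a canonical cover (computing the cover is not shown), and it tests "contains a candidate key" by checking whether some schema is a superkey of R, which is equivalent. The names `closure`, `candidate_key` and `synthesize_3nf`, and the spelling `c_id`, are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(schema, fds):
    """Shrink the full schema to one candidate key by dropping redundant attributes."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= schema:
            key.discard(a)
    return key

def synthesize_3nf(schema, fc):
    """3NF synthesis; fc is assumed to be a canonical cover already."""
    result = []
    for lhs, rhs in fc:
        ab = frozenset(lhs | rhs)
        if not any(ab <= rj for rj in result):
            result.append(ab)            # one schema per dependency
    if not any(closure(rj, fc) >= schema for rj in result):
        result.append(frozenset(candidate_key(schema, fc)))
    return result

jkl = synthesize_3nf({"J", "K", "L"}, [({"J", "K"}, {"L"}), ({"L"}, {"K"})])
print([sorted(r) for r in jkl])  # [['J', 'K', 'L']]

takes1_fc = [({"ssn"}, {"name", "address"}), ({"ssn", "c_id"}, {"grade"})]
takes1 = synthesize_3nf({"ssn", "c_id", "grade", "name", "address"}, takes1_fc)
print([sorted(r) for r in takes1])
# [['address', 'name', 'ssn'], ['c_id', 'grade', 'ssn']]
```

For R = (J, K, L) the schema stays in one piece, since JKL already covers both dependencies and contains a candidate key.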
Normal forms - 3NF
  how to bring a schema to 3NF?
  In short …. for each FD in the cover, put it in a table
Normal forms - 3NF vs BCNF
  If ‘R’ is in BCNF, it is always in 3NF (but not the reverse)
  In practice, aim for BCNF; lossless join; and dep. preservation
  if impossible, we accept 3NF; but insist on lossless join and dep. preservation
  3NF has problems with transitive dependencies
3NF vs BCNF (cont.)
  Example of problems due to redundancy in 3NF
    R = (J, K, L)
    F = {JK -> L, L -> K}
    J     L   K
    j1    l1  k1
    j2    l1  k1
    j3    l1  k1
    null  l2  k2
  A schema that is in 3NF but not in BCNF has the problems of
    repetition of information (e.g., the relationship l1, k1)
    need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J).
Normal forms - more details
  why ‘3’NF? what is 2NF? 1NF?
  1NF: attributes are atomic (i.e., no set-valued attr., a.k.a. ‘repeating groups’)
  Ssn  Name   Dependents
  123  Smith  Peter, Mary, John
  234  Jones  Ann, Michael
  not 1NF
Normal forms - more details
  2NF: 1NF and non-key attr. fully depend on the key
  counter-example: TAKES1(ssn, c-id, grade, name, address)
    ssn -> name, address
    ssn, c-id -> grade
  [Figure: dependency diagram over ssn, c-id, grade, name, address]
Normal forms - more details
  3NF: 2NF and no transitive dependencies
  counter-example:
  [Figure: dependency diagram over attributes A, B, C, D]
  in 2NF, but not in 3NF
Normal forms - more details
  4NF, multivalued dependencies etc: later…
  in practice, E-R diagrams usually lead to tables in BCNF
Overview - conclusions
  DB design and normalization
    pitfalls of bad design
    decompositions (lossless, dep. preserving)
    normal forms (BCNF or 3NF)
  “everything should depend on the key, the whole key, and nothing but the key”