Date post: | 28-Dec-2015 |
Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015
Chapter 3
Database Normalization
Database Design Theory
Different Levels of Anomaly Problems
Normalization
Anomaly Problems
Initial
S # Salary STATUS CITY P # QTY
S1 40000 20 LONDON P1 300
S1 40000 20 LONDON P2 200
S1 40000 20 LONDON P3 400
S1 40000 20 LONDON P4 200
S1 40000 20 LONDON P5 100
S1 40000 20 LONDON P6 100
S2 30000 10 PARIS P1 300
S2 30000 10 PARIS P2 400
S3 30000 10 PARIS P2 200
S4 40000 20 LONDON P2 200
S4 40000 20 LONDON P4 300
S4 40000 20 LONDON P5 400
Deletion/insertion anomaly
S # Salary STATUS CITY P # QTY
S1 40000 20 LONDON P1 300
S1 40000 20 LONDON P2 200
S1 40000 20 LONDON P3 400
S1 40000 20 LONDON P4 200
S1 40000 20 LONDON P5 100
S1 40000 20 LONDON P6 100
S2 30000 10 PARIS P1 300
S2 30000 10 PARIS P2 400
S3 30000 10 PARIS P2 200
S4 40000 20 LONDON P2 200
S4 40000 20 LONDON P4 300
S4 40000 20 LONDON P5 400
S5 60000 30 ATHENS -
Insertion/update anomaly
S # Salary STATUS CITY P # QTY
S1 40000 20 LONDON P1 300
S1 40000 20 LONDON P2 200
S1 40000 20 LONDON P3 400
S1 40000 20 LONDON P4 200
S1 40000 20 LONDON P5 100
S1 40000 20 LONDON P6 100
S2 30000 10 PARIS P1 300
S2 30000 10 PARIS P2 400
S3 30000 10 PARIS P2 200
S4 40000 20 LONDON P2 200
S4 40000 20 LONDON P4 300
S4 40000 20 LONDON P5 400
Further Normalization
Database design is the problem of choosing a suitable logical structure for the data: deciding which relations are needed and which attributes each should have.
Codd defined three normal forms (1NF, 2NF, 3NF) to remove some undesirable properties from relations. Later, Boyce and Codd defined a stronger normal form, Boyce-Codd Normal Form (BCNF). Fagin then introduced 4NF and, finally, 5NF (PJ/NF).
NORMAL FORMS
[Figure: nested regions, each contained in the one before it: universe of relations (normalized and unnormalized) ⊃ 1NF relations (normalized relations) ⊃ 2NF relations ⊃ 3NF relations ⊃ BCNF relations ⊃ 4NF relations ⊃ 5NF (PJ/NF) relations]
Functional Dependencies (FD)
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X-value in R has associated with it precisely one Y-value in R (at any one time).
(No X-value is mapped to two different Y-values.)
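The definition can be tested mechanically against a single tabulation. The following is a minimal sketch, assuming rows are stored as Python dicts; the helper name `fd_holds` and the three sample rows (taken from the supplier table) are chosen here for illustration only.

```python
def fd_holds(rows, x_attrs, y_attrs):
    """Return True if no X-value in rows is paired with two different Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in x_attrs)
        y = tuple(row[a] for a in y_attrs)
        # setdefault stores y the first time x is seen, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


rows = [
    {"S#": "S1", "CITY": "LONDON", "P#": "P1", "QTY": 300},
    {"S#": "S1", "CITY": "LONDON", "P#": "P2", "QTY": 200},
    {"S#": "S2", "CITY": "PARIS", "P#": "P1", "QTY": 300},
]

print(fd_holds(rows, ["S#"], ["CITY"]))  # True
print(fd_holds(rows, ["S#"], ["QTY"]))   # False: S1 appears with 300 and 200
```

A check like this looks at one tabulation only, so it can refute a dependency but cannot establish that it holds for every legal tabulation.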
A functional dependency is a special form of integrity constraint: every legal extension (tabulation) of the relation satisfies it.
An attribute Y is said to be fully functionally dependent on X if Y is functionally dependent on X but not on any proper subset of X. From now on, FD means full FD.
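Full functional dependence can be illustrated the same way on one tabulation. This is a sketch under assumptions: `fd_holds` and `fully_dependent` are hypothetical helper names, the rows are invented for the example, and only non-empty proper subsets of X are tried.

```python
from itertools import combinations


def fd_holds(rows, x_attrs, y_attrs):
    """Return True if no X-value in rows is paired with two different Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in x_attrs)
        y = tuple(row[a] for a in y_attrs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def fully_dependent(rows, x_attrs, y_attrs):
    """True if X -> Y holds in rows and no non-empty proper subset of X determines Y."""
    if not fd_holds(rows, x_attrs, y_attrs):
        return False
    for size in range(1, len(x_attrs)):
        for subset in combinations(x_attrs, size):
            if fd_holds(rows, list(subset), y_attrs):
                return False
    return True


rows = [
    {"S#": "S1", "CITY": "LONDON", "P#": "P1", "QTY": 300},
    {"S#": "S1", "CITY": "LONDON", "P#": "P2", "QTY": 200},
    {"S#": "S2", "CITY": "PARIS", "P#": "P1", "QTY": 300},
    {"S#": "S2", "CITY": "PARIS", "P#": "P2", "QTY": 400},
]

print(fully_dependent(rows, ["S#", "P#"], ["QTY"]))   # True
print(fully_dependent(rows, ["S#", "P#"], ["CITY"]))  # False: S# alone determines CITY
```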
First Normal Form Relations (1NF)
A relation is said to be in 1NF if all of its underlying domains contain atomic values only, so any normalized relation is in 1NF.
G # SNAME STATUS CITY
G1 SMITH,ADAMS 20,30 LONDON,ATHENS
G2 JONES,BLAKE 10,30 PARIS
G3 BLAKE 30 PARIS
G4 CLARK 20 LONDON
G5 ADAMS 30 ATHENS
First Normal Form Relations (1NF): Normalized
G # SNAME STATUS CITY
G1 SMITH 20 LONDON
G1 SMITH 20 ATHENS
G1 SMITH 30 LONDON
G1 SMITH 30 ATHENS
G1 SMITH,ADAMS 20,30 LONDON,ATHENS
G2 JONES,BLAKE 10,30 PARIS
G3 BLAKE 30 PARIS
G4 CLARK 20 LONDON
G5 ADAMS 30 ATHENS
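One way to obtain atomic-valued rows from such a table is sketched below. It assumes that every multi-valued cell is expanded into all combinations, which is what the four G1 SMITH rows suggest; the function name `to_1nf` and the list-based encoding of multi-valued cells are illustrative choices, not part of the slides.

```python
from itertools import product


def to_1nf(rows):
    """Expand list-valued cells so that every cell holds one atomic value."""
    flat = []
    for row in rows:
        # Wrap atomic values in a one-element list so product() treats all cells alike.
        columns = [v if isinstance(v, list) else [v] for v in row.values()]
        for combo in product(*columns):
            flat.append(dict(zip(row.keys(), combo)))
    return flat


unnormalized = [
    {"G#": "G1", "SNAME": ["SMITH", "ADAMS"], "STATUS": [20, 30],
     "CITY": ["LONDON", "ATHENS"]},
    {"G#": "G3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"},
]

flat = to_1nf(unnormalized)
print(len(flat))  # 9: eight combinations for G1 plus one row for G3
print(flat[0])    # {'G#': 'G1', 'SNAME': 'SMITH', 'STATUS': 20, 'CITY': 'LONDON'}
```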
All relations will be in 1NF
First
S # STATUS CITY P # QTY
S1 20 LONDON P1 300
S1 20 LONDON P2 200
S1 20 LONDON P3 400
S1 20 LONDON P4 200
S1 20 LONDON P5 100
S1 20 LONDON P6 100
S2 10 PARIS P1 300
S2 10 PARIS P2 400
S3 10 PARIS P2 200
S4 20 LONDON P2 200
S4 20 LONDON P4 300
S4 20 LONDON P5 400
Functional Dependencies In The Relation First
An FD can be checked against a tabulation with SQL, but passing the check is merely a NECESSARY condition: one tabulation can refute an FD, never prove it (see "group by" in Ch. 6).
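A possible form of such a SQL check is sketched here with Python's built-in `sqlite3` module. The table name `first_rel`, the column names `s_no` and `p_no` (standing in for S# and P#), and the helper `fd_violations` are assumptions made for the example, and only five rows of First are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE first_rel "
    "(s_no TEXT, status INTEGER, city TEXT, p_no TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO first_rel VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", 20, "LONDON", "P1", 300),
        ("S1", 20, "LONDON", "P2", 200),
        ("S2", 10, "PARIS", "P1", 300),
        ("S3", 10, "PARIS", "P2", 200),
        ("S4", 20, "LONDON", "P2", 200),
    ],
)


def fd_violations(conn, lhs, rhs):
    """Return the lhs values that occur with more than one distinct rhs value."""
    # lhs and rhs are trusted column names here, never user input.
    query = (
        f"SELECT {lhs} FROM first_rel "
        f"GROUP BY {lhs} HAVING COUNT(DISTINCT {rhs}) > 1"
    )
    return conn.execute(query).fetchall()


print(fd_violations(conn, "s_no", "city"))    # []
print(fd_violations(conn, "city", "status"))  # []
print(fd_violations(conn, "s_no", "qty"))     # [('S1',)]
```

An empty result means only that this tabulation does not contradict the FD.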
[Figure: FD diagram for First over the attributes S#, P#, QTY, CITY and STATUS]
Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every non-key attribute (one that is not part of any candidate key, CK) is fully functionally dependent (ffd) on the primary key.
W = a * sin X + b * cos Y
(a and b are two parameters)
W is ffd on X and Y if both a and b are nonzero.
W is not ffd on X and Y if either a or b is zero, e.g. W = 0 * sin X + b * cos Y.
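For relations, the 2NF condition can be tested from a declared set of FDs by looking for non-key attributes that depend on only part of the key. The sketch below assumes the FDs (S#, P#) → QTY, S# → CITY and CITY → STATUS for the relation First, inferred from the decomposition used later in the slides; `closure` and `partial_dependencies` are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result.update(rhs)
                changed = True
    return result


def partial_dependencies(key, nonkey_attrs, fds):
    """List (part_of_key, attribute) pairs where a proper part of the key determines a non-key attribute."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in sorted(closure(part, fds) & set(nonkey_attrs)):
                found.append((part, attr))
    return found


# Assumed FDs for First(S#, STATUS, CITY, P#, QTY)
fds = [(("S#", "P#"), ("QTY",)), (("S#",), ("CITY",)), (("CITY",), ("STATUS",))]

print(partial_dependencies(("S#", "P#"), ["STATUS", "CITY", "QTY"], fds))
# [(('S#',), 'CITY'), (('S#',), 'STATUS')]  so First is not in 2NF
print(partial_dependencies(("S#", "P#"), ["QTY"], fds))
# []  so (S#, P#, QTY) on its own has no partial dependency
```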
BCNF (Boyce-Codd Normal Form)
For relations with one or more candidate keys:
A relation R is said to be in BCNF if and only if every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.
2NF And SP
[Figure: FD diagrams for the two relations, one over S#, STATUS, CITY and one over S#, P#, QTY]
2NF and SP
S # STATUS CITY
S1 20 LONDON
S2 10 PARIS
S3 10 PARIS
S4 20 LONDON
S5 30 ATHENS

S # P # QTY
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400
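The two tabulations are projections of First, and joining them on S# gives First back. A minimal sketch of that round trip follows, using four rows of First for brevity; `project`, `natural_join` and `as_set` are hypothetical helpers written for this example.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two lists of dict rows on all attributes they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows):
    """Order-insensitive view of a list of dict rows."""
    return {tuple(sorted(row.items())) for row in rows}


first = [
    {"S#": "S1", "STATUS": 20, "CITY": "LONDON", "P#": "P1", "QTY": 300},
    {"S#": "S1", "STATUS": 20, "CITY": "LONDON", "P#": "P2", "QTY": 200},
    {"S#": "S2", "STATUS": 10, "CITY": "PARIS", "P#": "P1", "QTY": 300},
    {"S#": "S3", "STATUS": 10, "CITY": "PARIS", "P#": "P2", "QTY": 200},
]

second = project(first, ["S#", "STATUS", "CITY"])
sp = project(first, ["S#", "P#", "QTY"])

print(len(second), len(sp))                               # 3 4
print(as_set(natural_join(second, sp)) == as_set(first))  # True
```

Supplier facts now appear once per supplier instead of once per shipment, which is why a supplier such as S5 can be recorded without any shipment.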
2NF and SP
S # STATUS CITY
S1 20 LONDON
S2 10 PARIS
S3 10 PARIS
S4 20 LONDON
S5 30 ATHENS
(new CITY value shown on the slide: AMSTERDAM)
Insertion anomaly is fixed
Update anomaly is fixed
2NF and SP
S # P # QTY
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400
Deletion anomaly is fixed
“Degree Two” Problems
Second (Update, deletion and insertion anomaly)
S # STATUS CITY
S1 20 LONDON
S2 10 PARIS
S3 10 PARIS
S4 20 LONDON
- 60 ROME

S # P # QTY
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400
Functional Dependencies In The Third Normal Form (3NF)
Definition 1: A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the candidate key.
Definition 2: A relation is in 3NF if, for every non-trivial FD, either its left-hand side is a superkey or its right-hand side is part of a CK.
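Definition 2 translates fairly directly into a check over a declared FD set. The sketch below is an illustration under assumptions: it inspects only the listed FDs, finds candidate keys by brute force (fine for a handful of attributes), and uses the FDs S# → CITY and CITY → STATUS for the supplier relation; all function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result.update(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(key).issubset(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attrs):
                keys.append(combo)
    return keys


def violates_3nf(attrs, fds):
    """Return the listed FDs that neither start from a superkey nor end inside a CK."""
    prime = {a for key in candidate_keys(attrs, fds) for a in key}
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # ignore the trivial part of the FD
        if not extra:
            continue
        if closure(lhs, fds) == set(attrs):
            continue  # left-hand side is a superkey
        if extra.issubset(prime):
            continue  # right-hand side lies inside a candidate key
        bad.append((lhs, rhs))
    return bad


fds = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",))]
print(violates_3nf(["S#", "STATUS", "CITY"], fds))   # [(('CITY',), ('STATUS',))]
print(violates_3nf(["S#", "CITY"], fds[:1]))         # []
print(violates_3nf(["CITY", "STATUS"], fds[1:]))     # []
```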
Definition 3: A relation R is in 3NF iff the non-key attributes of R are
a) mutually independent
b) fully dependent on the primary key of R.
Definition 3 (in other words): A relation R is in 3NF if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.
Sample Tabulations of SC and CS
SC
S # CITY
S1 LONDON
S2 PARIS
S3 PARIS
S4 LONDON
S5 ATHENS

CS
CITY STATUS
ATHENS 30
LONDON 20
PARIS 10
Functional Dependencies in the Relations SC and CS
SC: S# → CITY
CS: CITY → STATUS
Another set of examples (Skip 2012)
[Figure: (a) LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE), with functional dependencies fd1, fd2, fd3 and fd4. (b) LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE), with fd1, fd2 and fd4, and LOTS2 (COUNTY_NAME, TAX_RATE), with fd3]
[Figure: (c) LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE) decomposed into LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA), with fd1 and fd2, and LOTS1B (AREA, PRICE), with fd4. (d) Summary: LOTS (1NF) → LOTS1, LOTS2 (2NF) → LOTS1A, LOTS1B, LOTS2 (3NF)]
Figure 13.11 Example to illustrate normalization to 2NF and 3NF.
(a) The LOTS relation schema and its functional dependencies fd1 through fd4.
(b) Decomposing LOTS into the 2NF relations LOTS1 and LOTS2.
(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
(d) Summary of normalization of LOTS.
Boyce-Codd Normal Form (BCNF)
Codd's original definition of 3NF did not deal satisfactorily with the case of a relation that
(a) had multiple CKs, where
(b) the CKs were composite, and
(c) the CKs overlapped.
3NF was subsequently replaced by the stronger BCNF.
For relations with one or more candidate keys:
A relation R is said to be in BCNF iff every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.
Consider a relation SJT with attributes S (student), J (subject), and T (teacher). The meaning of the tuple (s, j, t) is that student s is taught subject j by teacher t. Suppose, in addition, that the following constraints apply:
For each subject, each student of that subject is taught by only one teacher.
Each teacher teaches only one subject.
Each subject is taught by several teachers.
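The first two constraints can be read as the FDs (S, J) → T and T → J, and the BCNF condition can then be tested mechanically. The following is a sketch under that reading; `closure`, `candidate_keys` and `bcnf_violations` are hypothetical helpers, and the candidate-key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result.update(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(key).issubset(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attrs):
                keys.append(combo)
    return keys


def bcnf_violations(attrs, fds):
    """Return the listed non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs).issubset(lhs) and closure(lhs, fds) != set(attrs)
    ]


sjt = ["S", "J", "T"]
fds = [(("S", "J"), ("T",)), (("T",), ("J",))]

print(candidate_keys(sjt, fds))   # [('J', 'S'), ('S', 'T')]
print(bcnf_violations(sjt, fds))  # [(('T',), ('J',))]
```

T is a determinant but not a candidate key, so SJT is not in BCNF even though its candidate keys overlap in S.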
Problem: If we delete the tuple for student 'Jones' and subject 'Physics', we lose the information that 'Brown' teaches 'Physics' (does the professor get fired?).
Solution: Split SJT into ST (S, T) and TJ (T, J).
This decomposition avoids the above problem but introduces different ones. What are they? What are the candidate keys?
Sample Tabulation Of The Relation SJT
S J T
SMITH MATH Prof. WHITE
SMITH PHYSICS Prof. GREEN
JONES MATH Prof. WHITE
JONES PHYSICS Prof. BROWN
[Figure: FD diagram for SJT over S, J and T]
Sample Tabulations
JT
J T
MATH Prof. WHITE
PHYSICS Prof. GREEN
PHYSICS Prof. BROWN

ST
S T
SMITH Prof. WHITE
SMITH Prof. GREEN
JONES Prof. WHITE
JONES Prof. BROWN
Boyce-Codd Normal Form (BCNF)
Consider the relation EXAM with overlapping candidate keys (S, J) and (J, P), and with attributes S (student), J (subject), and P (position). The meaning of an EXAM tuple (s, j, p) is that student s was examined in subject j and achieved position p in the class list. Let us assume that the following constraint holds.
There are no ties; that is, no two students obtained the same position in the same subject.
Note that update anomalies such as those associated with relation SJT do not apply to relation EXAM. Why?
Overlapping candidate keys do not necessarily lead to problems. In what normal form is relation EXAM?
Sample Tabulation Of SJP
S was examined in subject J and achieved position P.
There are no ties; no two students obtained the same position in the same subject.
S J P
SMITH MATH FIRST (M)
SMITH PHYSICS FIRST (P)
JONES MATH SECOND (M)
JONES PHYSICS SECOND (P)
Boyce-Codd Normal Form (BCNF)
Illustrating BCNF: (a) BCNF normalization with the dependency fd2 being "lost" in the decomposition. (b) A relation R in 3NF but not in BCNF.
[Figure: (a) LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA), with fd1, fd2 and fd5, BCNF-normalized into LOTS1AX (PROPERTY_ID#, AREA, LOT#) and LOTS1AY (AREA, COUNTY_NAME). (b) R (A, B, C), with fd1 and fd2]
Good and Bad Decomposition
In decomposition (A), the two projections are independent of one another, in the following sense:
Updates can be made to either one without regard for the other, provided that the update does not violate the primary key uniqueness constraint for that projection.
Actually, if attribute CITY of relation SC is regarded as a foreign key matching the primary key CITY of relation CS, then a certain amount of cross-checking between the two projections will be required on updates after all.
Independent Components
In decomposition (B), by contrast, updates to either of the two projections must be monitored to ensure that the FD CITY → STATUS is not violated (if two suppliers have the same city, they must have the same status).
Consider, for example: what is involved in decomposition (B) in moving supplier S1 from London to Paris?
In decomposition (B), the FD CITY → STATUS has become an inter-relational constraint.
Rissanen shows that projections R1 and R2 of a relation R are independent if and only if
1. Every FD in R can be logically deduced from those in R1 and R2, and
2. The common attributes of R1 and R2 form a candidate key for at least one of the pair.
Recall the relation SJT with its two projections ST (S, T) and TJ (T, J).
These two projections are not independent by Rissanen's theorem: the FD (S, J) → T cannot be deduced from the FD T → J.
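Both conditions can be tested from a declared FD set. The sketch below is one possible reading, not a reference implementation: FDs are projected onto each component by brute force over attribute subsets, and the names `project_fds`, `is_candidate_key` and `independent` are hypothetical. The FD sets for SJT and for the supplier relation are the ones used in the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result.update(rhs)
                changed = True
    return result


def project_fds(rel, fds):
    """Non-trivial FDs that hold inside the projection rel."""
    projected = []
    for size in range(1, len(rel) + 1):
        for lhs in combinations(sorted(rel), size):
            rhs = (closure(lhs, fds) & set(rel)) - set(lhs)
            if rhs:
                projected.append((lhs, tuple(sorted(rhs))))
    return projected


def is_candidate_key(attrs, rel, fds):
    """True if attrs determines all of rel and no attribute of attrs can be dropped."""
    attrs = set(attrs)
    if not attrs or not set(rel).issubset(closure(attrs, fds)):
        return False
    return all(not set(rel).issubset(closure(attrs - {a}, fds)) for a in attrs)


def independent(r1, r2, fds):
    """Check Rissanen's two conditions for projections r1 and r2."""
    local = project_fds(r1, fds) + project_fds(r2, fds)
    preserved = all(set(rhs).issubset(closure(lhs, local)) for lhs, rhs in fds)
    common = set(r1) & set(r2)
    keyed = is_candidate_key(common, r1, fds) or is_candidate_key(common, r2, fds)
    return preserved and keyed


sjt_fds = [(("S", "J"), ("T",)), (("T",), ("J",))]
print(independent(("S", "T"), ("T", "J"), sjt_fds))  # False: (S, J) -> T is lost

supplier_fds = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",))]
print(independent(("S#", "CITY"), ("CITY", "STATUS"), supplier_fds))  # True
```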
Relations which cannot be decomposed into independent components are said to be atomic. Thus, SJT is atomic, even though it is not in BCNF.
Unfortunately, we are forced to the unpleasant conclusion that the two objectives of decomposing a relation into BCNF components and decomposing it into independent components may occasionally be in conflict.