Term 2, 2007, Lectures 2/3, Normalisation D. Tidhar (based on M. Ursu) Department of Computing, Goldsmiths College
Normalisation
Outline
• Revision: FD, 2NF, 3NF
• Boyce-Codd Normal Form (BCNF)
• normalisation
– non-loss decomposition
– Heath’s theorem
– normalisation process: semantic assumptions and FDs, CKs, decomposition
• normalisation vs dependency preservation
– a decomposition may yield a better solution than another one
– either-or situations: normalise or preserve FDs
Functional dependency (FD)
R - relation; X and Y - subsets of the attributes of R
X → Y (Y is functionally dependent on X) iff
in every possible legal value of R, each X-value has exactly one Y-value associated with it
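As an illustration (not part of the original notes), a minimal Python sketch can check whether X → Y holds in one particular value of R, represented as a list of dicts. It can only refute an FD: the FD itself is a constraint on every possible legal value, which no single value can prove. The function name `fd_holds` and the module data are hypothetical.

```python
def fd_holds(rows, X, Y):
    """Return True if X -> Y holds in this particular value of R.

    rows: list of dicts mapping attribute names to values (one dict per tuple).
    X, Y: lists of attribute names.
    """
    seen = {}  # X-value -> the single Y-value observed with it
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # the same X-value appears with two different Y-values
    return True


# Hypothetical value of Modules(M_id, Type, Value)
modules = [
    {"M_id": "m1", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m2", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m3", "Type": "3 semester", "Value": 1.5},
]
print(fd_holds(modules, ["Type"], ["Value"]))  # True
print(fd_holds(modules, ["Value"], ["M_id"]))  # False: 0.5 goes with m1 and m2
```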
Functional dependency (FD)
In particular:
Every attribute of R is functionally dependent on every CK of R
Full functional dependency (FFD)
R - relation; X and Y - subsets of the attributes of R
Y is fully functionally dependent on X iff
(1) X → Y, and
(2) there is no proper subset X’ of X such that X’ → Y
Alternative formulation: the dependency X → Y is left-irreducible
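Whether a dependency is full can be checked mechanically from a set of FDs by computing attribute closures (all attributes a set X determines). A sketch, assuming FDs are given as Python pairs of sets; `closure` and `is_fully_dependent` are illustrative names, and the FDs are those of the project example in these notes:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_fully_dependent(X, Y, fds):
    """True if X -> Y follows from fds and no proper subset X' of X has X' -> Y."""
    X = set(X)
    if not set(Y) <= closure(X, fds):
        return False  # condition (1) fails
    for size in range(len(X)):  # proper subsets only, including the empty set
        for sub in combinations(sorted(X), size):
            if set(Y) <= closure(sub, fds):
                return False  # condition (2) fails: X -> Y is left-reducible
    return True


fds = [
    ({"project", "task"}, {"max_budget", "duration"}),
    ({"task", "max_budget", "duration"}, {"payment_rate"}),
    ({"project", "task", "contractor"}, {"contr_time"}),
]
print(is_fully_dependent({"project", "task", "contractor"}, {"contr_time"}, fds))  # True
print(is_fully_dependent({"project", "task", "contractor"}, {"max_budget"}, fds))  # False
```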
Full functional dependency (FFD)
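Is every attribute fully functionally dependent on every CK?
Not necessarily: that every non-key attribute is fully functionally dependent on every CK is precisely the 2NF condition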
2NF - “simple” definitions
Class: (for a relation with only one CK) - no FD from a proper subset of the PK to a non-key attribute
Book: (“based only on PK”) - every non-primary-key attribute is fully functionally dependent on the PK
Are they equivalent?
2NF - general definitions
Class: all non-key attributes are irreducibly dependent on the candidate keys
Book: every non-key attribute is fully functionally dependent on every CK
Are they equivalent?
Transitive FD
If A, B, C are attributes of a relation such that A → B and B → C, then C is transitively dependent on A via B (provided that neither B → A nor C → A holds).
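Using the same hypothetical representation (FDs as pairs of sets), the definition translates directly into closure tests. A sketch with the FDs of the Modules example from these notes; `determines` and `is_transitive` are illustrative names:

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def determines(X, Y, fds):
    """True if X -> Y follows from fds."""
    return set(Y) <= closure(X, fds)


def is_transitive(A, B, C, fds):
    """C transitively dependent on A via B: A -> B, B -> C, and neither B -> A nor C -> A."""
    return (determines(A, B, fds) and determines(B, C, fds)
            and not determines(B, A, fds) and not determines(C, A, fds))


fds = [({"M_id"}, {"M_name", "Type", "Value"}), ({"Type"}, {"Value"})]
print(is_transitive({"M_id"}, {"Type"}, {"Value"}, fds))    # True
print(is_transitive({"M_id"}, {"M_name"}, {"Value"}, fds))  # False: M_name does not determine Value
```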
3NF - “simple” definitions
Class: (for a relation with only one CK) - no FD between non-key attributes
Book: (“based only on PK”) - no non-primary-key attribute is transitively dependent on the PK
Are they equivalent?
3NF - general definition
No non-key attribute is transitively dependent on any CK
2NF and 3NF
2NF (optional)
• a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys
3NF (Zaniolo)
• R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions:
– X contains A
– X contains a candidate key of R
– A is contained in a candidate key of R
• if at least one of the three is true for every FD X → A, then R is in 3NF
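A brute-force sketch of both tests, assuming FDs given as pairs of sets and a relation small enough to enumerate all attribute subsets. Candidate keys are found by increasing size. For 3NF it is enough to test the listed FDs with their right-hand sides split into single attributes: if some FD in the closure violates Zaniolo's conditions, then the listed FD that first adds the offending attribute during the closure computation violates them too. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets that determine every attribute (brute force)."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys


def is_2nf(attrs, fds):
    """No non-key attribute is determined by a proper subset of any CK."""
    keys = candidate_keys(attrs, fds)
    non_key = set(attrs) - set().union(*keys)
    for key in keys:
        for size in range(len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_key:
                    return False  # partial dependency on a CK
    return True


def is_3nf(attrs, fds):
    """Zaniolo's test on every listed FD X -> A (right-hand sides split)."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)  # attributes contained in some CK
    for lhs, rhs in fds:
        for a in rhs:
            if a in lhs or any(k <= set(lhs) for k in keys) or a in prime:
                continue  # one of Zaniolo's three conditions holds
            return False
    return True


modules_attrs = {"M_id", "M_name", "Type", "Value"}
modules_fds = [({"M_id"}, {"M_name", "Type", "Value"}), ({"Type"}, {"Value"})]
print(is_2nf(modules_attrs, modules_fds), is_3nf(modules_attrs, modules_fds))  # True False

patient = {"Patient", "Disease", "Doctor"}
patient_fds = [({"Patient", "Disease"}, {"Doctor"}), ({"Doctor"}, {"Disease"})]
print(is_3nf(patient, patient_fds))  # True

project_attrs = {"project", "task", "max_budget", "duration",
                 "payment_rate", "contractor", "contr_time"}
project_fds = [
    ({"project", "task"}, {"max_budget", "duration"}),
    ({"task", "max_budget", "duration"}, {"payment_rate"}),
    ({"project", "task", "contractor"}, {"contr_time"}),
]
print(is_2nf(project_attrs, project_fds))  # False
```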
BCNF
a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant
informally the determinant of each relevant FD is a CK
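A sketch of a BCNF test for R itself, under the same assumptions. It uses the common equivalent form "the determinant of every non-trivial FD is a superkey": if an irreducible FD has a superkey as its determinant, that determinant must be a CK, since a smaller key inside it would make the FD reducible. The Patient/Disease/Doctor FDs are those of the example later in these notes; `is_bcnf` is a hypothetical name.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attrs, fds):
    """Every non-trivial listed FD must have a determinant that is a superkey of R."""
    attrs = set(attrs)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, ignored
        if not closure(lhs, fds) >= attrs:
            return False  # the determinant does not determine the whole relation
    return True


patient = {"Patient", "Disease", "Doctor"}
patient_fds = [({"Patient", "Disease"}, {"Doctor"}), ({"Doctor"}, {"Disease"})]
print(is_bcnf(patient, patient_fds))  # False: Doctor -> Disease, Doctor is not a CK
print(is_bcnf({"Type", "Value"}, [({"Type"}, {"Value"})]))  # True
```

The test must be given the FDs of the relation being tested; for a relation produced by decomposition, its FDs first have to be derived from those of the original relation.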
Example
devise examples in class:
• relations in BCNF
• relations not in BCNF
BCNF
• any relation can be non-loss decomposed into an equivalent set of BCNF relations
• BCNF ⇒ 3NF ⇒ 2NF ⇒ 1NF (a relation in BCNF is also in 3NF, and so on)
• a relation in BCNF is still not guaranteed to be free of update anomalies
Normalisation
• the process of transforming a relation with redundancies into an “equivalent” set of relations with fewer redundancies
• “transformation”: projection
– input: one relation, say R
– output: several relations, say R1, …, Rn
• “equivalent”: non-loss decomposition, i.e. R1 join R2 join … join Rn = R
• R1, …, Rn should be in normal forms higher than or equal to that of R
Projection (Relational Algebra)
If R = (A1, A2, ..., An) (i.e. R is a relation with attributes A1, ..., An) and X is a subset of {A1, A2, ..., An}, then we obtain the projection of R on X by keeping only the values of the attributes in X and removing the other attributes from the relation.
Trivially, the degree of the projection is the number of elements in X. What about the cardinality of the projection?
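A minimal sketch of projection over a relation stored as a list of dicts (a hypothetical representation, with made-up module data). Because a relation is a set, duplicate tuples are removed, which is why the cardinality of a projection can be smaller than that of R:

```python
def project(rows, X):
    """Projection of a relation (a list of dicts) on the attribute list X.

    A relation is a set of tuples, so duplicates created by the projection are dropped.
    """
    result, seen = [], set()
    for row in rows:
        values = tuple(row[a] for a in X)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(X, values)))
    return result


# Hypothetical value of Modules(M_id, Type, Value)
modules = [
    {"M_id": "m1", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m2", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m3", "Type": "3 semester", "Value": 1.5},
]
type_value = project(modules, ["Type", "Value"])
print(len(type_value[0]), len(modules), len(type_value))  # 2 3 2: degree 2, cardinality 3 -> 2
```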
Natural Join (Relational Algebra)
If R and S are relations, we can join them by combining their tuples into one table that contains all the attributes of both:
• (horizontally) we include only the combinations of tuples in which all common attributes have equal values
• (vertically) we include only one column for each common attribute (i.e. repeated columns are discarded)
The resulting table defines a relation, the natural join of R and S. We may write T = R join S.
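Natural join, sketched over the same hypothetical list-of-dicts representation; the module names and values are made up for illustration:

```python
def natural_join(r, s):
    """Natural join of relations r and s, each a list of dicts with the same keys per tuple."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    result = []
    for tr in r:
        for ts in s:
            # horizontally: keep only combinations that agree on every common attribute
            if all(tr[a] == ts[a] for a in common):
                # vertically: merging the dicts keeps a single column per common attribute
                result.append({**tr, **ts})
    return result


modules_descr = [
    {"M_id": "m1", "M_name": "Databases", "Type": "1 semester"},
    {"M_id": "m3", "M_name": "Project", "Type": "3 semester"},
]
type_val = [
    {"Type": "1 semester", "Value": 0.5},
    {"Type": "3 semester", "Value": 1.5},
]
t = natural_join(modules_descr, type_val)
for row in t:
    print(row)
```

With no common attributes, the condition is always true and the result is the Cartesian product, which matches the usual definition of natural join.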
Non-loss decomposition
“Split” a given relation into projections such that the natural join of these projections is equal to the original relation itself.
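Projection and natural join, as defined above, give a direct check on one value of a relation. A sketch with hypothetical data following the Patient/Disease/Doctor example later in these notes; note that a single value can only show that a decomposition is lossy, while proving non-loss for every legal value needs an argument such as Heath’s theorem:

```python
def project(rows, X):
    """Projection on attribute list X, with duplicate tuples removed."""
    result, seen = [], set()
    for row in rows:
        values = tuple(row[a] for a in X)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(X, values)))
    return result


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(row.items()) for row in rows}


def is_nonloss(rows, X1, X2):
    """True if joining the projections on X1 and X2 gives back exactly `rows`."""
    return as_set(natural_join(project(rows, X1), project(rows, X2))) == as_set(rows)


# Hypothetical value of (Patient, Disease, Doctor)
rows = [
    {"Patient": "P1", "Disease": "flu", "Doctor": "D1"},
    {"Patient": "P1", "Disease": "asthma", "Doctor": "D2"},
    {"Patient": "P2", "Disease": "flu", "Doctor": "D1"},
]
print(is_nonloss(rows, ["Patient", "Doctor"], ["Doctor", "Disease"]))   # True
print(is_nonloss(rows, ["Patient", "Doctor"], ["Patient", "Disease"]))  # False: spurious (P1, asthma, D1)
```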
Non-loss decomposition
(Patient, Symptom, Doctor, Office, Diagnosis)
• semantic assumptions
• exercise
Lossy Decomposition
(Patient, Symptom, Doctor, Office, Diagnosis)
• semantic assumptions
• exercise
Heath’s theorem
• can be used as the basis for normalisation
• suppose
– R = (A, B, C), where A, B and C are disjoint sets of attributes
– A → B
• then
– R = (A, B) join (A, C), i.e. R is equal to the join of its projections on (A, B) and on (A, C)
• state the theorem “in English”; prove it
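In English: if A functionally determines B, then R can be split into its projections on (A, B) and on (A, C) without losing or gaining tuples. A sketch of the standard two-inclusion proof, writing R[A, B] for the projection of R on A and B, and ⋈ for natural join:

```latex
\textbf{Claim.} If $A \to B$ holds in $R(A,B,C)$, then $R = R[A,B] \bowtie R[A,C]$.

\textbf{($\subseteq$)} Let $(a,b,c) \in R$. Then $(a,b) \in R[A,B]$ and $(a,c) \in R[A,C]$;
they agree on $A$, so $(a,b,c) \in R[A,B] \bowtie R[A,C]$.

\textbf{($\supseteq$)} Let $(a,b,c) \in R[A,B] \bowtie R[A,C]$. Then $(a,b) \in R[A,B]$ and
$(a,c) \in R[A,C]$, so there exist $c'$ and $b'$ such that
\[
  (a,b,c') \in R \quad\text{and}\quad (a,b',c) \in R .
\]
Both tuples have the same $A$-value $a$, so $A \to B$ gives $b' = b$; hence
$(a,b,c) = (a,b',c) \in R$. \qquad $\blacksquare$
```

Only the second inclusion uses A → B; the first holds for any pair of projections that share A, which is why a lossy decomposition gains spurious tuples rather than losing tuples.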
Normalisation – rules of thumb
• take a “problem” FD as the basis for normalisation (i.e. for applying Heath’s theorem)
• when applying Heath’s theorem on the basis of A → B, maximise B
• try to maintain a one-to-one correspondence with real-life entities
Normalisation
steps:
• semantic assumptions
• FDs
• CKs
• decomposition
Simple example
(M_id, M_name, Type, Value)
• M_id → M_name
• M_id → Type
• M_id → Value
• Type → Value
• not in BCNF: Type is a determinant but not a CK
Heath’s theorem applied to Type → Value results in
• (Type, Value)
• (M_id, M_name, Type)
both relations are now in BCNF
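After a decomposition, the FDs that hold in a component are those in the closure of the original FDs restricted to its attributes. A sketch of the BCNF test for a component therefore enumerates every subset of its attributes rather than only the listed FDs (exponential, but fine at this size). `is_bcnf_sub` is a hypothetical name; the FDs are those of the example above:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf_sub(sub, fds):
    """BCNF test for a relation with attribute set `sub`, using the FDs of the original relation.

    For every proper subset X of sub: if X determines some attribute of sub outside X,
    then X must determine all of sub (i.e. X must be a superkey of sub).
    """
    sub = set(sub)
    for size in range(len(sub)):
        for combo in combinations(sorted(sub), size):
            x_plus = closure(combo, fds) & sub
            if x_plus != set(combo) and x_plus != sub:
                return False
    return True


fds = [({"M_id"}, {"M_name", "Type", "Value"}), ({"Type"}, {"Value"})]
print(is_bcnf_sub({"M_id", "M_name", "Type", "Value"}, fds))  # False
print(is_bcnf_sub({"Type", "Value"}, fds))                   # True
print(is_bcnf_sub({"M_id", "M_name", "Type"}, fds))          # True
```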
Example (R)
R = (project, task, max_budget, duration, payment_rate, contractor, contr_time)
FDs:
• (project, task) → (max_budget, duration)
• (task, max_budget, duration) → payment_rate
• (project, task, contractor) → contr_time
Example – decomposition for R
Heath’s theorem applied to R (the initial relation), based on (task, max_budget, duration) → payment_rate, leads to:
• R1 (task, max_budget, duration, payment_rate)
• R2 (project, task, max_budget, duration, contractor, contr_time)
R1 is in BCNF
R2 is not in BCNF, due to (project, task) → (max_budget, duration)
Example – decomposition for R2
Heath’s theorem applied to R2, based on (project, task) → (max_budget, duration), leads to:
• R21 (project, task, max_budget, duration)
• R22 (project, task, contractor, contr_time)
R21 is in BCNF
R22 is in BCNF
Example – solution
• (task, max_budget, duration, payment_rate)
• (project, task, max_budget, duration)
• (project, task, contractor, contr_time)
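The process can be automated as repeated application of Heath’s theorem. A sketch, with hypothetical names and a brute-force subset search: find an X whose closure within the current relation is neither X itself nor the whole relation (a “problem” FD), take B as everything else X determines there (maximising B), and split into (X, B) and (X, the remaining attributes):

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(s, fds):
    """Return (X, closure of X within s) for the first BCNF-violating X, or None."""
    for size in range(len(s)):
        for combo in combinations(sorted(s), size):
            x_plus = closure(combo, fds) & s
            if x_plus != set(combo) and x_plus != s:
                return set(combo), x_plus
    return None


def bcnf_decompose(attrs, fds):
    """Split on problem FDs X -> B (B maximised) until every relation is in BCNF."""
    result, todo = [], [set(attrs)]
    while todo:
        s = todo.pop()
        violation = find_violation(s, fds)
        if violation is None:
            result.append(s)
        else:
            x, x_plus = violation
            todo.append(x_plus)            # (X, B)
            todo.append(s - (x_plus - x))  # (X, rest of s)
    return result


R = {"project", "task", "max_budget", "duration", "payment_rate", "contractor", "contr_time"}
fds = [
    ({"project", "task"}, {"max_budget", "duration"}),
    ({"task", "max_budget", "duration"}, {"payment_rate"}),
    ({"project", "task", "contractor"}, {"contr_time"}),
]
parts = bcnf_decompose(R, fds)
for rel in parts:
    print(sorted(rel))
```

On these FDs the sketch returns the same three relations as the solution above, although it first splits on (project, task) → (max_budget, duration) rather than on the FD used in the slides. In general, which relations come out can depend on the order in which problem FDs are chosen, the non-determinism mentioned in the conclusions.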
Decomposition – 2 or more solutions
in the normalisation process, one (non-loss) decomposition may yield a better solution than another
Decomposition – 2 solutions – example
Modules(M_id, M_name, Type, Value)
solution #1
• Modules_Descr(M_id, M_name, Type)
• Type_Val(Type, Value)
solution #2
• Modules_Descr(M_id, M_name, Type)
• Module_Val(M_id, Value)
are they both non-loss? (apply Heath’s theorem) is one better than the other?
Solution #1 vs Solution #2
updates
• u1: insert the fact that a 3-semester module is worth 1.5 cu
• u2: modify 1-semester modules: they are no longer worth 0.5 cu, but 0.75 cu
• u3: change the type of a module but forget to change its value
solution #2
• u1 and u2 are impossible or very difficult to perform
• u3 is allowed
solution #1
• u1 and u2 are straightforward
• u3 is not allowed
Solution #1 vs Solution #2
solution #1 is more expressive
• certain facts cannot be expressed in solution #2, e.g. the value of a new type
• in solution #1, updates can be performed independently on the two component relations (i.e. all constraints are properly expressed)
• in solution #2, Type → Value is lost, so this constraint must be enforced by procedural code written by the user
independent projections
• updates can be performed independently on each projection, without the danger of ending up with inconsistent data
Independent projections
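FDs: M_id → M_name, M_id → Type, Type → Value (hence, transitively, M_id → Value)
Solution #1: (M_id, M_name, Type) and (Type, Value)
• all direct FDs are intra-relation
• the transitive FD M_id → Value is inter-relation (it follows from M_id → Type and Type → Value)
Solution #2: (M_id, M_name, Type) and (M_id, Value)
• the transitive FD M_id → Value is intra-relation
• one direct FD, Type → Value, is lost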
Independent projections - Rissanen
R1 and R2 are two projections of R; R1 and R2 are independent if and only if
• every FD in R is a logical consequence of the FDs in R1 and R2
• the common attributes of R1 and R2 form a candidate key for at least one of R1 or R2
atomic relation
• a relation that cannot be decomposed into independent projections
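A sketch of Rissanen’s two conditions for a pair of projections, with hypothetical helper names. The FDs holding in a projection are computed as X → (closure of X restricted to the projection) for every subset X, and the CK test checks both the superkey property and minimality. Applied to the two Modules solutions:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(sub, fds):
    """FDs holding in the projection on `sub`: X -> (X+ within sub) for every subset X."""
    sub = set(sub)
    out = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            rhs = closure(combo, fds) & sub
            if rhs != set(combo):  # skip trivial FDs
                out.append((set(combo), rhs))
    return out


def is_candidate_key(k, sub, fds):
    """k determines all of `sub`, and no attribute can be dropped from k."""
    k = set(k)
    if not closure(k, fds) >= sub:
        return False
    return all(not closure(k - {a}, fds) >= sub for a in k)


def independent(r1, r2, fds):
    """Rissanen's two conditions for projections r1 and r2 of R."""
    r1, r2 = set(r1), set(r2)
    g = project_fds(r1, fds) + project_fds(r2, fds)
    # (1) every FD of R follows from the FDs of R1 and R2
    preserved = all(set(rhs) <= closure(lhs, g) for lhs, rhs in fds)
    # (2) the common attributes form a CK of R1 or of R2
    common = r1 & r2
    key_ok = is_candidate_key(common, r1, fds) or is_candidate_key(common, r2, fds)
    return preserved and key_ok


fds = [({"M_id"}, {"M_name", "Type", "Value"}), ({"Type"}, {"Value"})]
descr = {"M_id", "M_name", "Type"}
print(independent(descr, {"Type", "Value"}, fds))  # True: solution #1
print(independent(descr, {"M_id", "Value"}, fds))  # False: solution #2
```

In solution #2 the second condition holds (M_id is a CK of both projections), but the first fails because Type → Value cannot be derived from the FDs of the two projections.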
Dependency preservation
R was decomposed (normalisation) into R1, …, Rn
• S - the set of FDs for R
• S1, …, Sn - the sets of FDs for R1, …, Rn (each Si refers only to the attributes of Ri)
• S’ = S1 ∪ … ∪ Sn (in general, S’ ≠ S)
the decomposition is dependency preserving if S’+ = S+ (where + denotes the closure of a set of FDs)
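A sketch of the test S’+ = S+, with hypothetical helper names. Each Si is computed as the FDs of S+ restricted to Ri, so every FD in S’ already follows from S and S’+ ⊆ S+ always holds; it then suffices to check that every FD in S follows from S’. Applied to the BCNF decomposition of the Patient/Disease/Doctor example later in these notes:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(sub, fds):
    """Si: the FDs of S+ that refer only to attributes of `sub`."""
    sub = set(sub)
    out = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            rhs = closure(combo, fds) & sub
            if rhs != set(combo):  # skip trivial FDs
                out.append((set(combo), rhs))
    return out


def preserves_dependencies(components, fds):
    """True if S'+ = S+, where S' is the union of the FDs projected onto each component."""
    s_prime = [fd for comp in components for fd in project_fds(comp, fds)]
    return all(set(rhs) <= closure(lhs, s_prime) for lhs, rhs in fds)


patient_fds = [({"Patient", "Disease"}, {"Doctor"}), ({"Doctor"}, {"Disease"})]
print(preserves_dependencies([{"Patient", "Doctor"}, {"Doctor", "Disease"}], patient_fds))  # False
print(preserves_dependencies([{"Patient", "Disease", "Doctor"}], patient_fds))             # True
```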
Normalisation vs dependency preservation
there are cases of an either-or situation between normalisation and the preservation of functional dependencies: either the relation is normalised and some FDs are lost, or no FDs are lost (they are expressed in the original relation) but the relation is not in the highest normal form possible
in this case, neither solution is better than the other; other criteria have to be considered to judge which is better
Normalisation vs dependency preservation: Example
(Patient, Disease, Doctor)
• a patient is treated by a single doctor for a certain disease
• each doctor only treats one kind of disease
• a doctor can treat more than one patient
FDs: (Patient, Disease) → Doctor; Doctor → Disease
is this relation in BCNF? can you identify update anomalies?
consider also (Patient, Disease, Doctor, Treatment), with (Patient, Disease) → Treatment
Possible decompositions
Possible decompositions (for each one, choose PKs):
• (Patient, Doctor) and (Patient, Disease): non-loss?
• (Patient, Doctor) and (Doctor, Disease): Heath’s theorem, based on Doctor → Disease
• (Disease, Doctor) and (Patient, Disease): non-loss?
BCNF vs dependency preservation
(Patient, Doctor) and (Doctor, Disease) do not enforce an FD existing in the original specification, namely (Patient, Disease) → Doctor
• e.g. a patient can be given two doctors who treat the same disease (the system will not disallow this); the constraint would have to be maintained by procedural code
BCNF vs dependency preservation
not every FD is expressible through normalisation
when the relation was in its original form (3NF):
• (Patient, Disease) → Doctor was expressed: a patient could not be assigned more than one doctor for the same disease
• Doctor → Disease was not expressed, which generated update anomalies
in BCNF (decomposed):
• Doctor → Disease was expressed
• (Patient, Disease) → Doctor was not expressed, which generated update anomalies (see the previous slide); this FD would not have been expressed even if the decomposition into all three 2-attribute relations had been considered
Conclusions
• normal forms: a formalisation of common sense
• from art to engineering: possibility of automation; difficult, because of non-determinism (more than one choice at a given step)
• BCNF is always achievable, but not always free of update anomalies, because it cannot always express all the FDs existing in the problem
• there are higher normal forms (4NF, 5NF), defined on the basis of other concepts (not FDs)
© Pearson Education Limited 1995, 2005
Hierarchy of normal forms