Normalization 02
CSE3421 notes
Closure of a set of attributes
Given a set of FDs F, find the set of attributes functionally determined by a given set of attributes X. This is called the closure of X and is denoted X+.
Given F and X: X → ? (i.e., X+ = ?)
Example: F = { A → B, B → C }. Closure of A: A+ = ABC.
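The closure can be computed by starting from X and repeatedly adding the RHS of every FD whose LHS is already contained in the result, until a full pass over F adds nothing. A minimal Python sketch, assuming single-letter attribute names and FDs written as (LHS, RHS) string pairs; the function name `closure` and this encoding are illustrative choices, not part of the notes:

```python
def closure(attrs, fds):
    """Return X+ for X = attrs under the FDs in fds.

    attrs is a string of single-letter attributes, e.g. "AB";
    fds is a list of (lhs, rhs) string pairs, e.g. [("A", "B")] for A -> B.
    """
    result = set(attrs)
    changed = True
    while changed:  # rescan F until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # An FD "fires" when its whole LHS is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return "".join(sorted(result))


F = [("A", "B"), ("B", "C")]
print(closure("A", F))  # ABC
print(closure("B", F))  # BC
```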
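Example
F = { C → A,
      BC → D,
      ACD → B,
      D → EG,
      AB → C }
(AB)+ = ?
AB ⇒ ABC (from AB → C)
   ⇒ ABCD (C → A is redundant: A is already present; D comes from BC → D)
   ⇒ ABCDEG (ACD → B is redundant: B is already present; E and G come from D → EG)
Therefore, (AB)+ = ABCDEG.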
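The same computation can be traced to show which FD contributes which attributes. A sketch under the same assumptions (single-letter attributes, FDs as string pairs; `traced_closure` is a hypothetical helper). It scans F in the order listed on this slide, so the order in which FDs fire depends on that listing, while the final closure does not:

```python
# Example F from this slide, as (lhs, rhs) string pairs.
F = [("C", "A"), ("BC", "D"), ("ACD", "B"), ("D", "EG"), ("AB", "C")]


def traced_closure(attrs, fds):
    """Compute attrs+ and record which FD added which attributes."""
    result = set(attrs)
    steps = []  # (FD as text, attributes it newly added)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result:
                new = set(rhs) - result
                if new:  # FDs that add nothing (e.g. C -> A here) are redundant
                    result |= new
                    steps.append((f"{lhs} -> {rhs}", "".join(sorted(new))))
                    changed = True
    return "".join(sorted(result)), steps


result, steps = traced_closure("AB", F)
for fd_text, added in steps:
    print(f"{fd_text:8} adds {added}")
print("(AB)+ =", result)  # (AB)+ = ABCDEG
```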
Closure of a set of FDs
• Given a set of FDs F, find all FDs that can be derived from (i.e., are implied by) F. This is called the closure of F and is denoted F+.
• F+ can be found by repeatedly applying the Armstrong Axioms.
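Applying the axioms by hand does not scale, but since the Armstrong axioms are sound and complete, X → Y is in F+ exactly when Y ⊆ X+. That gives a brute-force way to list F+ for a small schema: for every subset X of the attributes, emit X → Y for every subset Y of X+. A Python sketch, assuming single-letter attributes and FDs as string pairs (`fd_closure` and `subsets` are hypothetical helpers); it is exponential in the number of attributes and is meant only for toy examples like the one on the following slides:

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where each FD is an (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def subsets(attrs):
    """All subsets of attrs as sorted strings, from "" up to attrs itself."""
    items = sorted(attrs)
    return ["".join(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def fd_closure(schema, fds):
    """F+ over schema: every X -> Y with Y a subset of X+ (trivial FDs included)."""
    return {(x, y) for x in subsets(schema) for y in subsets(closure(x, fds))}


F = [("A", "B"), ("B", "C")]
F_plus = fd_closure("ABC", F)
print(len(F_plus))           # 43
print(("A", "C") in F_plus)  # True
print(("C", "A") in F_plus)  # False
```

The count of 43 includes 27 trivial FDs (those with Y ⊆ X, including ones with an empty LHS or RHS), which is why the notes list only the non-trivial ones explicitly.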
Armstrong Axioms
• (A1) Reflexivity (this produces all trivial FDs)
  $X \supseteq Y \Rightarrow X \to Y$
• (A2) Augmentation
  $X \to Y \Rightarrow XZ \to YZ, \ \forall Z$
• (A3) Transitivity
  $X \to Y, \ Y \to Z \Rightarrow X \to Z$
Additional rules
• Union
  $X \to Y, \ X \to Z \Rightarrow X \to YZ$
• Decomposition
  $X \to YZ \Rightarrow X \to Y \text{ and } X \to Z$
By applying (A1), (A2), (A3) repeatedly, we get F+.
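Union and decomposition are not new axioms: both can be derived from (A1)-(A3) alone. Union: augment X → Y with X to get X → XY, augment X → Z with Y to get XY → YZ, then apply transitivity. Decomposition: reflexivity gives YZ → Y, and transitivity with X → YZ gives X → Y. The sketch below encodes each axiom as a Python function on FDs (pairs of frozensets) and replays these derivations; the function names are hypothetical, and X, Y, Z are treated as single attributes purely for illustration:

```python
# An FD is a pair (lhs, rhs) of frozensets of attribute names.
FD = tuple[frozenset, frozenset]


def reflexivity(x: str, y: str) -> FD:
    """(A1): if Y is a subset of X, then X -> Y."""
    lhs, rhs = frozenset(x), frozenset(y)
    if not rhs <= lhs:
        raise ValueError("reflexivity requires Y to be a subset of X")
    return lhs, rhs


def augmentation(f: FD, z: str) -> FD:
    """(A2): from X -> Y infer XZ -> YZ."""
    lhs, rhs = f
    return lhs | frozenset(z), rhs | frozenset(z)


def transitivity(f: FD, g: FD) -> FD:
    """(A3): from X -> Y and Y -> Z infer X -> Z."""
    (x, y), (y2, z) = f, g
    if y != y2:
        raise ValueError("transitivity requires the RHS of f to equal the LHS of g")
    return x, z


# Union: from X -> Y and X -> Z derive X -> YZ using only (A1)-(A3).
x_to_y = (frozenset("X"), frozenset("Y"))
x_to_z = (frozenset("X"), frozenset("Z"))
x_to_xy = augmentation(x_to_y, "X")        # X -> XY
xy_to_yz = augmentation(x_to_z, "Y")       # XY -> YZ
union = transitivity(x_to_xy, xy_to_yz)    # X -> YZ

# Decomposition: from X -> YZ derive X -> Y.
yz_to_y = reflexivity("YZ", "Y")           # YZ -> Y (trivial)
decomposed = transitivity(union, yz_to_y)  # X -> Y
```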
Example
• Let R: (A, B, C) and F = { A → B, B → C }. F+ = ?
• Apply the Armstrong Axioms.
• (A1) (reflexivity) produces all trivial FDs, i.e., those whose left-hand side (LHS) is a superset of the right-hand side (RHS). For example, AB → A, AB → B, etc.
• Non-trivial FDs:
  – Apply (A3), transitivity:
    • A → B with B → C generates A → C.
    • (A3) cannot be applied any further.
  – Apply (A2), augmentation, adding to each FD the attribute it does not already mention:
    • For A → B: the remaining attribute is C. Therefore, AC → BC is in F+.
    • For B → C: add A and get BA → CA. Therefore, AB → AC is in F+.
    • For A → C: add B and get AB → CB. Therefore, AB → BC is in F+.
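So, F+ includes { A → B, B → C, all trivial FDs, A → C, AC → BC, AB → AC, AB → BC },
where A → C comes from (A3), and AC → BC, AB → AC, AB → BC come from (A2).
This list is not exhaustive: for example, applying the union rule to A → B and A → C also gives A → BC.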
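The membership test "X → Y is in F+ exactly when Y ⊆ X+" makes it easy to double-check the FDs derived above without enumerating F+. A sketch, assuming FDs as string pairs and a hypothetical `implies` helper:

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is an (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. every attribute of rhs is in lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("B", "C")]
listed = [("A", "B"), ("B", "C"), ("A", "C"), ("AC", "BC"), ("AB", "AC"), ("AB", "BC")]
print(all(implies(F, x, y) for x, y in listed))  # True
print(implies(F, "A", "BC"))  # True: also in F+ (union rule)
print(implies(F, "C", "A"))   # False: not in F+
```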
Where are we …
• X+: closure of a set of attributes
• F+: closure of a set of FDs
• MC: minimal cover of F
• Lossless join property
• Algorithm to compute 3NF
• Preservation of dependencies
Lossless join
• If a relation R is decomposed into R1, R2, … (each the projection of R onto its attributes) and the natural join R1 join R2 join … is exactly equal to R, then the decomposition is said to have the lossless join property.
Example
R
A  B  C
a1 b1 c1
a2 b2 c2

R1:(A,B)
A  B
a1 b1
a2 b2

R2:(B,C)
B  C
b1 c1
b2 c2

R1 join R2
A  B  C
a1 b1 c1
a2 b2 c2
Exactly equal to R, so the decomposition has the lossless join property.
Example
R
A  B  C
a1 b1 c1
a2 b1 c2

R1:(A,B)
A  B
a1 b1
a2 b1

R2:(B,C)
B  C
b1 c1
b1 c2

R1 join R2
A  B  C
a1 b1 c1
a1 b1 c2
a2 b1 c1
a2 b1 c2
Not the same as R: the join contains the extra tuples (a1, b1, c2) and (a2, b1, c1). So the decomposition does not have the lossless join property (it has a lossy join).
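Both examples can be replayed mechanically: project R onto the attributes of R1 and R2, join the projections on their shared attributes, and compare the result with R as a set of tuples. A Python sketch with hypothetical helper names, storing rows as dicts. It checks one particular instance only: a single instance can expose a lossy decomposition, but it cannot prove that the decomposition is lossless for every instance allowed by F.

```python
def project(rows, attrs):
    """pi_attrs(R): keep only the listed attributes, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(r1, r2):
    """Join r1 and r2 on all attributes they share (simple nested-loop join)."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]


def as_set(rows):
    """Compare relations as sets of tuples, ignoring row order."""
    return {frozenset(row.items()) for row in rows}


def join_of_projections(rows, attrs1, attrs2):
    return natural_join(project(rows, attrs1), project(rows, attrs2))


R_first = [{"A": "a1", "B": "b1", "C": "c1"}, {"A": "a2", "B": "b2", "C": "c2"}]
R_second = [{"A": "a1", "B": "b1", "C": "c1"}, {"A": "a2", "B": "b1", "C": "c2"}]

for name, R in [("first", R_first), ("second", R_second)]:
    joined = join_of_projections(R, "AB", "BC")
    print(name, len(joined), as_set(joined) == as_set(R))
# first 2 True
# second 4 False
```

Since R is always contained in the join of its own projections, the comparison is really a search for extra (spurious) tuples.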
Proposition
A decomposition $\rho = (R_1, R_2)$ has a lossless join with respect to F if and only if either
$[R_1 \cap R_2 \to R_1 - R_2] \in F^+$
or
$[R_1 \cap R_2 \to R_2 - R_1] \in F^+$
Example: Assume R1 = {A, B}, R2 = {B, C} is a decomposition of R. Then,
$R_1 \cap R_2 = \{B\}$
$R_1 - R_2 = \{A\}$
$R_2 - R_1 = \{C\}$
For the decomposition to be lossless, either B → A or B → C must be in F+. But to determine whether this is the case, we need F first (which this example does not give) … stay tuned.
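Once F is known, the proposition reduces to one attribute-closure computation: compute (R1 ∩ R2)+ and check whether it contains R1 − R2 or R2 − R1. A Python sketch with a hypothetical `is_lossless_binary` helper; the FD sets in the usage lines are hypothetical, since this example does not give F:

```python
def closure(attrs, fds):
    """X+ under fds; attrs is any iterable of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2)+ contains R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


# Hypothetical FD sets for R1 = (A, B), R2 = (B, C):
print(is_lossless_binary("AB", "BC", [("B", "C")]))  # True: B -> C
print(is_lossless_binary("AB", "BC", [("A", "B")]))  # False: B+ = B
```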
Proposition
If $X \to Y$ is in F of R and $X \cap Y = \emptyset$, then
the decomposition of R into R1: R − Y and R2: XY is lossless.
Proof
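$R_1 \cap R_2 = (R - Y) \cap XY$
$\qquad = (XYW - Y) \cap XY$, for some W (namely $W = R - XY$, so $R = XYW$ with X, Y, W pairwise disjoint)
$\qquad = XW \cap XY$
$\qquad = X$
$\Rightarrow R_1 \cap R_2 = X$
Now we need to show that
X → R1 (i.e., X → R − Y), or
X → R2 (i.e., X → XY).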
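Proof (cont.) Since
X → Y (assumption), and
X → X (trivial),
we have X → XY (by the union rule), which is X → R2.
Therefore, X → R2; by decomposition this also gives X → R2 − R1 = Y, which is the condition required by the lossless-join proposition. (done)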
Preservation of dependencies
Definition: A decomposition $\rho = (R_1, R_2, \ldots, R_N)$
preserves a set of FDs F if
$\left( \bigcup_{i=1}^{N} \pi_{R_i}(F^+) \right)^+ = F^+$
where $\pi_{R_i}(F^+)$ is the set of all dependencies in F+ that involve only attributes of Ri.
Note: for simplicity, we may denote $\pi_{R_i}(F^+)$ as $F_i^+$.
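Computing F+ and projecting it is exponential. A commonly used alternative tests each X → Y in F directly: start with Z = X and, for each Ri, add (Z ∩ Ri)+ ∩ Ri to Z (the closure taken under F), repeating until Z stops growing; X → Y follows from the union of the Fi+ exactly when Y ⊆ Z. This is a swapped-in technique, not the method used in these notes. A Python sketch with a hypothetical `preserves` helper, checked against the two examples that follow and one hypothetical decomposition that loses an FD:

```python
def closure(attrs, fds):
    """X+ under fds; attrs is any iterable of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD X -> Y in fds is implied by the union of the F_i+."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What Z can reach using only dependencies that live inside Ri.
                gained = closure(z & set(ri), fds) & set(ri)
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False  # X -> Y is lost by this decomposition
    return True


print(preserves([("A", "C"), ("B", "C"), ("A", "B")], ["AB", "BC"]))  # True (Example)
print(preserves([("A", "B"), ("B", "C"), ("C", "A")], ["AB", "BC"]))  # True (Example 2)
# Hypothetical case: AB -> C cannot be recovered from R1 = (A, C), R2 = (B, C).
print(preserves([("AB", "C"), ("C", "B")], ["AC", "BC"]))  # False
```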
Example
R:(A, B, C)
F = { A → C,
      B → C,
      A → B }
Assume R1 = (A, B) and R2 = (B, C).
Does the decomposition of R into R1, R2 preserve dependencies?
Example …/
Check $F_1 \cup F_2$ first (and if that does not succeed, we have to try $F_1^+ \cup F_2^+$).
$F_1 \cup F_2 = \{A \to B, B \to C\}$
We need only show that the 1st dependency of F, A → C, can be generated by $F_1 \cup F_2$.
Since A → B and B → C, we have A → C by transitivity. (done)
Therefore, the decomposition preserves dependencies.
Example 2
R:(A, B, C)
F = { A → B,
      B → C,
      C → A }
$\rho = (R_1, R_2)$, where R1 = (A, B), R2 = (B, C)
Therefore, F1 = {A → B} and F2 = {B → C}.
We still need C → A. (Note: A → B and B → C imply A → C, but not C → A.)
Example 2 …
We have to find $F_1^+$ and $F_2^+$.
$F_1^+$: we have to calculate F+ and then project it onto the attributes of R1 (and similarly onto R2 for $F_2^+$).
To calculate F+, apply the Armstrong axioms to F.
Example 2 …
F:
A → B (1)
B → C (2)
C → A (3)
(1), (2) ⇒ A → C, in F+.
(2), (3) ⇒ B → A, in F+ (and also in F1+).
(3), (1) ⇒ C → B, in F+ (and also in F2+).
Example 2 …/ So far …
F1+: A → B (i), B → A (ii)
F2+: B → C (iii), C → B (iv)
We still need C → A. Notice that (iv), (ii) ⇒ C → A, i.e., C → A is in $(F_1^+ \cup F_2^+)^+$ ⇒ done. Therefore, the decomposition preserves dependencies.
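The projection step used in this example can also be sketched directly: for every non-empty subset X of Ri, compute X+ under F and keep X → a for each attribute a of Ri that is in X+ but not in X. Restricting to single-attribute RHSs loses nothing, since the union rule recovers the rest. A Python sketch with a hypothetical `project_fds` helper, applied to Example 2:

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds; attrs is any iterable of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, ri):
    """Non-trivial FDs X -> a of F+ with X and a inside Ri (single-attribute RHS)."""
    attrs = sorted(ri)
    out = []
    for r in range(1, len(attrs) + 1):
        for combo in combinations(attrs, r):
            x = set(combo)
            for a in sorted((closure(x, fds) & set(ri)) - x):
                out.append(("".join(combo), a))
    return out


F = [("A", "B"), ("B", "C"), ("C", "A")]
F1 = project_fds(F, "AB")
F2 = project_fds(F, "BC")
print(F1)  # [('A', 'B'), ('B', 'A')]
print(F2)  # [('B', 'C'), ('C', 'B')]
print("A" in closure("C", F1 + F2))  # True: C -> A is in (F1+ ∪ F2+)+
```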
Minimal cover for a set of FDs
• A minimal cover G for a set of FDs F, is an equivalent set of FDs that is minimal in the following sense:
1. Every dependency in G is as small as possible (i.e., each LHS of each FD has as few attributes as possible, and the RHS has only one attribute).
2. Every FD in G is required for the closure G+ to be equal to the closure F+.
How to calculate G?
• Given F, how do we obtain the minimal cover G?
1. Put F in standard form (i.e., every FD in F has a single attribute on its RHS).
2. Eliminate extraneous attributes from LHS.
3. Eliminate redundant FDs.
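The three steps can be sketched in Python as follows, assuming FDs as string pairs with single-letter attributes; `minimal_cover` is a hypothetical helper. Step 2 scans each LHS in alphabetical order and step 3 scans the FDs in the order given, so a different scan order may produce a different, equally valid, minimal cover. The usage lines run it on the example F that follows; these notes carry that example only through step 2, so the printed cover also reflects the sketch's own step 3:

```python
def closure(attrs, fds):
    """X+ under fds; lhs/rhs may be strings or frozensets of single letters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, X -> A1...Ak becomes X -> A1, ..., X -> Ak.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: remove extraneous LHS attributes, scanning each LHS in sorted order.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            # Never shrink an LHS to the empty set.
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
                g[i] = (lhs, a)
    # Step 3: drop exact duplicates, then any FD implied by the others.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [h for h in g if h != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "EG")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
# A -> B
# ACD -> E
# EF -> G
# EF -> H
```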
Important !!!
1. The above steps should be performed in order (1)-(2)-(3), or else the result may not be a minimal cover.
2. The order in which we process the FDs may result in different minimal covers (which is fine); i.e., a set F may have more than one minimal cover.
Example
F: A → B (1)
   ABCD → E (2)
   EF → G (3)
   EF → H (4)
   ACDF → EG (5)
Calculate the minimal cover of F.
Step 1: Put F in standard form
• FDs (1)…(4) are already in standard form.
• For FD (5):
  – ACDF → E (5.1)
  – ACDF → G (5.2)
Step 2: eliminate extraneous attributes from LHS (minimize LHSs)
(1) A → B: nothing to eliminate.
(2) ABCD → E.
  • If we delete A, we will have BCD → E.
  • Is this LHS good enough?
  • It is if F+ contains either
    • BCD → E, or
    • BCD → W, such that W contains ABCD (the original LHS).
Note, (BCD)+ = BCD. Therefore, we cannot delete A.
Can we delete B? If so, then we will have
ACD → E. Is this LHS good enough? It is if either
ACD → E, or ACD → W that contains ABCD.
Note, ACD ⇒ ACD ⇒ (from (1)) ABCD. Therefore, ACD → W with W = ABCD, and thus B can be eliminated!
So, ABCD → E becomes ACD → E.
Can I delete any more, i.e., C or D? If we delete C, then ACD → E becomes AD → E. Test:
AD ⇒ AD ⇒ (from (1)) ABD, which contains neither E nor ACD. Therefore, we cannot delete C.
Similarly, we cannot delete D.
Since we have finished scanning the entire LHS of this FD, step 2 is finished for this FD, and the resulting FD is
ACD → E (2.1), which replaces (2) of the original F.
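The test used in step 2 can be stated compactly. The slide's second condition already implies the first: if the reduced LHS determines some W containing the original LHS ABCD, then it also determines E, because ABCD → E. So it is enough to check whether E is in (reduced LHS)+. A Python sketch with a hypothetical `is_extraneous` helper, using F in standard form:

```python
def closure(attrs, fds):
    """X+ under fds; attrs is any iterable of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_extraneous(b, lhs, rhs, fds):
    """True if attribute b can be dropped from lhs -> rhs, i.e. rhs ⊆ (lhs - b)+."""
    return set(rhs) <= closure(set(lhs) - {b}, fds)


# F after step 1 (standard form).
F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "E"), ("ACDF", "G")]
print(is_extraneous("A", "ABCD", "E", F))  # False: (BCD)+ = BCD
print(is_extraneous("B", "ABCD", "E", F))  # True: (ACD)+ = ABCDE

# After replacing (2) by (2.1) ACD -> E, neither C nor D is extraneous.
F21 = [("A", "B"), ("ACD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "E"), ("ACDF", "G")]
print(is_extraneous("C", "ACD", "E", F21))  # False: (AD)+ = ABD
print(is_extraneous("D", "ACD", "E", F21))  # False: (AC)+ = ABC
```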
Repeat the above process for FDs (3), (4), (5.1), (5.2).
For (3): EF → G.
1. Can I delete E? If so, we will have F → G: not possible, since (F)+ = F.
2. Can I delete F? If so, we will have E → G: not possible, since (E)+ = E.
Therefore, there is no change in (3).
For (4): EF → H. Again, nothing can be deleted from the LHS.
For (5.1) [ACDF → E].
Delete A?
• CDF → E? Or,
• CDF → W that contains ACDF?
(CDF)+ = CDF, which contains neither E nor ACDF.
⇒ Cannot delete A.
For (5.1) [ACDF → E] …
Delete C?
• ADF → E, or
• ADF → W that contains ACDF.
ADF ⇒ ADF ⇒ (from (1)) ABDF, which contains neither E nor ACDF.
⇒ Cannot delete C.
For (5.1) [ACDF → E] …
Delete D?
• ACF → E, or
• ACF → W that contains ACDF.
ACF ⇒ ACF ⇒ (from (1)) ABCF, which contains neither E nor ACDF.
⇒ Cannot delete D.
For (5.1) [ACDF → E] …
Delete F?
• ACD → E, or
• ACD → W that contains ACDF.
ACD ⇒ ACD ⇒ (from (1)) ABCD ⇒ (from (2)) ABCDE, which contains E !!
Therefore, F can be deleted from the LHS of (5.1), and (5.1) becomes
ACD → E (5.1.1)
Finished step 2 of (5.1)!! On to (5.2) …
ACDF → G (5.2)
• Repeating the above process, we find that nothing can be eliminated from the LHS of (5.2).
• So step 2 of the minimal cover computation is finished (we have minimized the LHSs of all FDs).
The resulting FDs from step 2 are:
• A → B (1) ---------- (1)
• ACD → E (2.1) ------- (2)
• EF → G (3) --------- (3)
• EF → H (4) --------- (4)
• ACD → E (5.1.1) ---- same as (2) !!
• ACDF → G (5.2) ------- (5)
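Step 3 then removes redundant FDs: X → a is redundant if a is in X+ computed from the other FDs. A Python sketch with hypothetical helper names, applied to the step-2 list above (this part of the notes does not carry the example through step 3). Besides the duplicate ACD → E, the sketch also drops ACDF → G, since from ACDF the remaining FDs give B by (1), then E by (2), then G by (3):

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is an (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_redundant(fd, fds):
    """X -> a is redundant if a is in X+ computed without (one copy of) that FD."""
    rest = list(fds)
    rest.remove(fd)  # removes only the first occurrence, so a duplicate remains
    lhs, a = fd
    return a in closure(lhs, rest)


def eliminate_redundant(fds):
    g = list(fds)
    for fd in list(g):  # iterate over a snapshot while removing from g
        if is_redundant(fd, g):
            g.remove(fd)
    return g


step2 = [("A", "B"), ("ACD", "E"), ("EF", "G"), ("EF", "H"), ("ACD", "E"), ("ACDF", "G")]
print(eliminate_redundant(step2))
# [('A', 'B'), ('EF', 'G'), ('EF', 'H'), ('ACD', 'E')]
```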
End of Normalization 02