Relation: A B C D E F
1NF? 2NF? 3NF?
Relation1: A B
Relation2: A* C D E*
Relation3: E F
Help me Codd!!
Normalisation
Reading: Connolly and Begg, chapters 13 and 14 (4th ed.)
Normalisation
regno  student_name  sex     student_address  code  result  title                    requires  id     lecturer_name  lecturer_address  qual  position
43414  Jones         Female  Edinburgh        -     -       -                        -         -      -              -                 -     -
-      -             -       -                -     -       -                        -         55101  Smith          Edinburgh         BSc   Lecturer
-      -             -       -                -     -       -                        -         55144  Brown          London            BSc   Lecturer
40986  Jones         Male    Oxford           3011  65      Data Structures          3005      55633  Brown          Abingdon          PhD   Reader
42331  Smith         Female  London           3011  72      Data Structures          3005      55633  Brown          Abingdon          PhD   Reader
40986  Jones         Male    Oxford           3080  -       Spreadsheets             3011      55633  Brown          Abingdon          PhD   Reader
40986  Jones         Male    Oxford           3025  78      Databases                3011      55981  Adams          London            Meng  Lecturer
42331  Smith         Female  London           3025  81      Databases                3011      55981  Adams          London            Meng  Lecturer
40986  Jones         Male    Oxford           3081  76      Artificial Intelligence  2080      55981  Adams          London            Meng  Lecturer
42331  Smith         Female  London           3081  -       Artificial Intelligence  2080      55981  Adams          London            Meng  Lecturer
-      -             -       -                3082  -       Software Engineering     -         55981  Adams          London            Meng  Lecturer
(- marks an empty cell)
From this…
…to this
In 3+ easy(?) steps
What is normalisation?
A method for database design
– Theory examines how “good” a schema is
– Transforms non-normalised schemas
– Minimises storage
Takes a set of attributes and derives the relational model
– By separating out the required tables
A completely different approach to ERM
– But should give the same result
A minimum of 3 steps is used
– At each stage the normal form gets stronger (i.e. removes more redundancy), so the relations are less open to update anomalies
All based on functional dependencies
Functional Dependency
Underpins the normalisation process
If every value of column A uniquely determines the value in column B, then
– B is functionally dependent on A (B depends on A)
– A determines B, or, formally, A → B (A is called the determinant)
For example,
– EmpID → Age, Dept (A → B, C)
– EmpID, Project → Role (X, Y → Z)
– Note that multiple attributes are often involved
EmpID Project Age Dept Dsize Budget Role
Rules for functional dependency
A → B does NOT automatically mean B → A
– E.g. student ID → name, but not name → student ID
Transitive dependency: if A → B and B → C, then A → C
Many other rules
– E.g. X, Y → Z holds, but X → Z holds as well
– In this case Z is partially dependent on X, Y
“Transitive” and “partial” dependency are two key concepts of the normalisation process
A Question for you!
EmpID  Project  Age  Dept  Dsize  Budget  Role
E1     P2       33   D2    10     100     Analyst
E1     P1       33   D2    10     200     Prog.
E2     P1       34   D5    10     200     Prog.
E2     P2       34   D5    20     100     Analyst
Which functional dependency is violated by the data?
ABCD
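The question can also be checked mechanically. The following is a minimal Python sketch, assuming the four data rows shown on this slide. The candidate dependencies listed in CANDIDATES are an assumption read off the column names and the earlier examples, and the helper name fd_holds is illustrative only. A dependency X → Y is violated when two rows agree on X but differ on Y.

```python
COLUMNS = ["EmpID", "Project", "Age", "Dept", "Dsize", "Budget", "Role"]
ROWS = [
    ("E1", "P2", 33, "D2", 10, 100, "Analyst"),
    ("E1", "P1", 33, "D2", 10, 200, "Prog."),
    ("E2", "P1", 34, "D5", 10, 200, "Prog."),
    ("E2", "P2", 34, "D5", 20, 100, "Analyst"),
]

# Candidate dependencies (assumed for illustration, not listed on the slide).
CANDIDATES = [
    (("EmpID",), ("Age", "Dept")),
    (("Dept",), ("Dsize",)),
    (("Project",), ("Budget",)),
    (("EmpID", "Project"), ("Role",)),
]


def fd_holds(rows, columns, lhs, rhs):
    """Return True if lhs -> rhs holds for every pair of rows."""
    index = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        key = tuple(row[index[a]] for a in lhs)
        value = tuple(row[index[a]] for a in rhs)
        # The first row with this key fixes the expected right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True


violated = [
    (lhs, rhs) for lhs, rhs in CANDIDATES
    if not fd_holds(ROWS, COLUMNS, lhs, rhs)
]
for lhs, rhs in violated:
    print(", ".join(lhs), "->", ", ".join(rhs), "is violated by the data")
```

With these rows the only candidate that fails is Dept → Dsize, because D5 appears with Dsize 10 and with Dsize 20.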
Unnormalised Form
Relation contains:
– non-atomic attribute values
ID  Employee  Salary  Project
1   Grey      31000   A
2   Brown     35000   B,C
3   White     55000   A,B,C
4   Black     47000   A,C
The Project column holds non-atomic values: a violation of 1NF
First Normal Form
Permits only single (atomic) attribute values
ID  Employee  Salary  Project  Budget
1   Grey      31000   A        10
2   Brown     35000   B        5
2   Brown     35000   C        5
3   White     55000   A        5
3   White     55000   B        5
3   White     55000   C        5
4   Black     47000   A        10
4   Black     47000   C        5
Flattening the repeating group (Project, Budget) gives one row per value but introduces redundancy: ID, Employee and Salary are repeated.
Remove the repeating group, along with a copy of the primary key, into a separate table:

ID  Employee  Salary
1   Grey      31000
2   Brown     35000
3   White     55000
4   Black     47000

ID (fk)  Project  Budget
1        A        10
2        B        5
2        C        5
3        A        5
3        B        5
3        C        5
4        A        10
4        C        5
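The move to 1NF can be sketched in a few lines of Python. This is a minimal illustration, assuming the unnormalised rows from the earlier slide, where Project holds a comma-separated list. The function name to_1nf is hypothetical, and the Budget column is left out because it does not appear in the unnormalised table.

```python
UNNORMALISED = [
    (1, "Grey", 31000, "A"),
    (2, "Brown", 35000, "B,C"),
    (3, "White", 55000, "A,B,C"),
    (4, "Black", 47000, "A,C"),
]


def to_1nf(rows):
    """Split the repeating Project group into its own table.

    Returns (employees, assignments): employees keeps one row per ID,
    assignments holds one atomic (ID, Project) pair per row.
    """
    employees = [(emp_id, name, salary) for emp_id, name, salary, _ in rows]
    assignments = [
        (emp_id, project.strip())
        for emp_id, _, _, projects in rows
        for project in projects.split(",")
    ]
    return employees, assignments


employees, assignments = to_1nf(UNNORMALISED)
print(employees)
print(assignments)
```

The assignments table carries ID as a foreign key back to the employee table, so every value is atomic and no employee detail is repeated.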
Full Functional Dependency (FFD)
X → Y is a full functional dependency
– if removal of any attribute from X removes the dependency
X → Y is a partial dependency
– if removal of an attribute from X leaves the dependency intact
2NF test
– involves testing for partial dependency on the PK (therefore the PK MUST be composite to test for 2NF)
Relation R is in 2NF if:
– every non-primary-key attribute in R is FFD on the primary key of R
Second Normal Form
So which FDs violate 2NF?
“Second normalised” by:
– removing the partially dependent non-primary-key attributes into a new relation, together with the part of the primary key on which they are fully functionally dependent
EmpID  Project  Age  Dept  Dsize  Budget  Role
2NF:
{EmpID, Age, Dept, Dsize}
{EmpID*, Project*, Role}
{Project, Budget}
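The 2NF test can be sketched in Python using attribute closure. This is a minimal illustration, not a full normalisation algorithm. The dependency set is an assumption inferred from the examples in these slides (EmpID → Age, Dept; Dept → Dsize; Project → Budget; EmpID, Project → Role), and the helper names closure and partial_dependencies are illustrative.

```python
from itertools import combinations

KEY = frozenset({"EmpID", "Project"})

# Functional dependencies assumed from the slides' example.
FDS = [
    (frozenset({"EmpID"}), {"Age", "Dept"}),
    (frozenset({"Dept"}), {"Dsize"}),
    (frozenset({"Project"}), {"Budget"}),
    (frozenset({"EmpID", "Project"}), {"Role"}),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper, non-empty part of the key to the non-key
    attributes it determines on its own (the 2NF violations)."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) - key
            if determined:
                found[part] = determined
    return found


partials = partial_dependencies(KEY, FDS)
for part, attrs in partials.items():
    print(part, "->", sorted(attrs))
```

Each entry corresponds to one relation that is split off: EmpID takes Age, Dept and Dsize with it, Project takes Budget, and Role stays with the full key.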
Third Normal Form
Remove transitive dependency
Conditions
– A non-primary-key attribute Z is transitively dependent on primary key X if:
  X → Y and Y → Z (attribute Y provides the transition from the PK)
A  [EmpID*, Project*, Role]
B  [Project, Budget]
C  [EmpID, Age, Dept, Dsize]
D  None of the above
Which of the above could have a transitive dependency?
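A check for transitive dependency can be sketched as follows. This is a minimal Python illustration that assumes each relation is already in 2NF and uses the simplified, PK-based definition from these slides. The dependency set is again an assumption inferred from the examples (in particular Dept → Dsize), and the function name transitive_violations is illustrative.

```python
# Functional dependencies assumed from the slides' example.
FDS = [
    ({"EmpID"}, {"Age", "Dept"}),
    ({"Dept"}, {"Dsize"}),
    ({"Project"}, {"Budget"}),
    ({"EmpID", "Project"}, {"Role"}),
]

# Each 2NF relation: (attributes, primary key).
RELATIONS = {
    "A": ({"EmpID", "Project", "Role"}, {"EmpID", "Project"}),
    "B": ({"Project", "Budget"}, {"Project"}),
    "C": ({"EmpID", "Age", "Dept", "Dsize"}, {"EmpID"}),
}


def transitive_violations(attrs, key, fds):
    """List dependencies inside the relation whose determinant does not
    contain the primary key but which determine a non-key attribute."""
    violations = []
    for lhs, rhs in fds:
        dependents = (rhs & attrs) - key - lhs
        if lhs <= attrs and not key <= lhs and dependents:
            violations.append((sorted(lhs), sorted(dependents)))
    return violations


report = {
    name: transitive_violations(attrs, key, FDS)
    for name, (attrs, key) in RELATIONS.items()
}
for name, violations in report.items():
    print(name, violations)
```

Under these assumed dependencies only relation C is flagged, through EmpID → Dept → Dsize.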
Here is an un-normalised Table
Ord#  Date     Cust#  Name   Prod#  Desc   Qty  Supplier  Tel
1     12/1/01  1      Jones  1      Disk   3    X         101
1     12/1/01  1      Jones  2      CD     5    Y         223
2     13/1/01  2      Black  1      Disk   1    X         101
2     13/1/01  2      Black  2      CD     1    Y         223
2     13/1/01  2      Black  3      Mouse  1    X         101
3     13/1/01  1      Jones  3      Mouse  1    X         101
Normalise it to 1NF
Ord#  Date     Cust#  Name
1     12/1/01  1      Jones
2     13/1/01  2      Black
3     13/1/01  1      Jones

Ord# (fk)  Prod#  Desc   Qty  Supplier  Tel
1          1      Disk   3    X         101
1          2      CD     5    Y         223
2          1      Disk   1    X         101
2          2      CD     1    Y         223
2          3      Mouse  1    X         101
3          3      Mouse  1    X         101
Now we normalise this to 2NF, remembering to test on the PK for any partial dependency

Ord#  Date     Cust#  Name
1     12/1/01  1      Jones
2     13/1/01  2      Black
3     13/1/01  1      Jones
Already in 2NF (the PK, Ord#, is not composite)

Prod#  Desc   Supplier  Tel
1      Disk   X         101
2      CD     Y         223
3      Mouse  X         101

Ord# (fk)  Prod# (fk)  Qty
1          1           3
1          2           5
2          1           1
2          2           1
2          3           1
3          3           1
So, any transitive dependency?
Yes! But not in all of them.

Product splits into Product and Supplier (Prod# → Supplier → Tel):
Prod#  Desc   Supplier (fk)
1      Disk   X
2      CD     Y
3      Mouse  X

Supplier  Tel
X         101
Y         223

Order splits into Order and Customer (Ord# → Cust# → Name):
Ord#  Date     Cust# (fk)
1     12/1/01  1
2     13/1/01  2
3     13/1/01  1

Cust#  Name
1      Jones
2      Black

The order-line table has no transitive dependency. OK!
Ord#  Prod#  Qty
1     1      3
1     2      5
2     1      1
2     2      1
2     3      1
3     3      1
Final Decomposition
Ord# {fk}  Prod# {fk}  Qty
1          1           3
1          2           5
2          1           1
2          2           1
2          3           1
3          3           1

Ord#  Date     Cust# (fk)
1     12/1/01  1
2     13/1/01  2
3     13/1/01  1

Cust#  Name
1      Jones
2      Black

Prod#  Desc   Supplier (fk)
1      Disk   X
2      CD     Y
3      Mouse  X

Supplier  Tel
X         101
Y         223
Now in 3NF
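One way to convince yourself that nothing was lost is to join the five tables back together and compare the result with the original un-normalised table. The following is a minimal Python sketch of that check, using the data from the slides. The variable names and the rejoin function are illustrative, and dictionaries stand in for the tables that have a single-attribute primary key.

```python
# Final 3NF tables from the decomposition above.
ORDER_LINE = [(1, 1, 3), (1, 2, 5), (2, 1, 1), (2, 2, 1), (2, 3, 1), (3, 3, 1)]  # Ord#, Prod#, Qty
ORDERS = {1: ("12/1/01", 1), 2: ("13/1/01", 2), 3: ("13/1/01", 1)}  # Ord# -> (Date, Cust#)
CUSTOMERS = {1: "Jones", 2: "Black"}                                # Cust# -> Name
PRODUCTS = {1: ("Disk", "X"), 2: ("CD", "Y"), 3: ("Mouse", "X")}    # Prod# -> (Desc, Supplier)
SUPPLIERS = {"X": 101, "Y": 223}                                    # Supplier -> Tel

# The original un-normalised table:
# Ord#, Date, Cust#, Name, Prod#, Desc, Qty, Supplier, Tel
ORIGINAL = [
    (1, "12/1/01", 1, "Jones", 1, "Disk", 3, "X", 101),
    (1, "12/1/01", 1, "Jones", 2, "CD", 5, "Y", 223),
    (2, "13/1/01", 2, "Black", 1, "Disk", 1, "X", 101),
    (2, "13/1/01", 2, "Black", 2, "CD", 1, "Y", 223),
    (2, "13/1/01", 2, "Black", 3, "Mouse", 1, "X", 101),
    (3, "13/1/01", 1, "Jones", 3, "Mouse", 1, "X", 101),
]


def rejoin():
    """Follow each foreign key from the order-line table to rebuild the wide rows."""
    rows = []
    for ord_no, prod_no, qty in ORDER_LINE:
        date, cust_no = ORDERS[ord_no]
        desc, supplier = PRODUCTS[prod_no]
        rows.append((ord_no, date, cust_no, CUSTOMERS[cust_no],
                     prod_no, desc, qty, supplier, SUPPLIERS[supplier]))
    return rows


rejoined = rejoin()
print(rejoined == ORIGINAL)
```

Each customer name, product description and supplier telephone number is now stored once, yet the join reproduces all six original rows.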
The underlying E-R Model …..
[E-R diagram for the order table: entities Customer, Order, Product and Supplier; relationships makes, has and despatches; multiplicities shown: 0..*, 1..1, 0..*, 0..*, 1..1, 1..*]
How many tables would you get from mapping?
So Normalisation to 3NF is Normal!!
Remember, 2NF and 3NF disallow partial and transitive dependencies on the PK respectively; relations that retain them are open to update anomalies
But even in 3NF, a relation may on rare occasions still contain redundancy and so remain open to update anomalies
So we look briefly at two stronger forms
– Boyce-Codd (BCNF)
– 4NF
Boyce-Codd NF
A stronger normal form than 3NF
Definition: a relation is in BCNF if and only if every determinant is a candidate key
Remember that a candidate key is any key that could become the PK of the relation (i.e. there may be competition for it!)
Potential to violate BCNF comes from:
– A relation containing at least 2 composite candidate keys
– Or candidate keys overlapping (i.e. they have at least one attribute in common)
BCNF Example
Consider the candidate keys for:
Adapted from Connolly and Begg, 2005, 4th ed. Page 420
clientNo interviewDate interviewTime staffNo roomNo
CR76 13/5/08 10.30 SG5 G101
CR56 13/5/08 12.00 SG5 G101
CR74 13/5/08 12.00 SG37 G102
CR56 1/7/08 10.30 SG5 G102
FD1 {PK}: clientNo, interviewDate → interviewTime, staffNo, roomNo
FD2 {CK}: staffNo, interviewDate, interviewTime → clientNo
FD3 {CK}: roomNo, interviewDate, interviewTime → staffNo, clientNo
FD4: staffNo, interviewDate → roomNo
PK is primary key and CK is candidate key. But what about FD4? Its determinant (staffNo, interviewDate) is not a candidate key, so the relation is not in BCNF
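The BCNF test on FD1 to FD4 can be sketched in Python with attribute closure. This is a minimal illustration under one simplifying assumption: a determinant is accepted when its closure covers every attribute of the relation (i.e. it is a superkey), without also checking that it is minimal. The names closure and bcnf_violations are illustrative.

```python
ALL_ATTRS = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}

# The four functional dependencies listed above: name -> (determinant, dependents).
FDS = {
    "FD1": ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),
    "FD2": ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),
    "FD3": ({"roomNo", "interviewDate", "interviewTime"}, {"staffNo", "clientNo"}),
    "FD4": ({"staffNo", "interviewDate"}, {"roomNo"}),
}


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.values():
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A dependency breaks BCNF when its determinant does not determine the whole relation.
bcnf_violations = [
    name for name, (lhs, _) in FDS.items()
    if closure(lhs, FDS) != ALL_ATTRS
]
print(bcnf_violations)
```

Only FD4 is reported: staffNo and interviewDate determine roomNo but nothing else, so that determinant cannot act as a key for the whole relation.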
So new decomposition?
clientNo interviewDate* interviewTime staffNo*
CR76 13/5/08 10.30 SG5
CR56 13/5/08 12.00 SG5
CR74 13/5/08 12.00 SG37
CR56 1/7/08 10.30 SG5
interviewDate staffNo roomNo
13/5/08 SG5 G101
13/5/08 SG37 G102
1/7/08 SG5 G102
So the duplication of the room number is now eradicated
4NF
Arises from 2 multi-valued attributes in a relation
E.g. for each value of A there is a set of values for B and a set of values for C, while B and C remain independent of each other
Branch entity: branchNo, staffName[1..*], ownerName[1..*]
So if you model your databases from ERMs, this type of dependency should not arise.
Example of 4NF
branchNo staffName ownerName
C003 Anne Carol
C003 David Carol
C003 Anne Tina
C003 David Tina
branchNo* staffName
C003 Anne
C003 David
branchNo* ownerName
C003 Carol
C003 Tina
Note: if step 9 of the mapping (for multi-valued attributes) is applied, then we map this correctly and avoid such redundancy, as the two smaller tables above are exactly what the mapping produces!
Adapted from Connolly and Begg, 2005, 4th ed. Page 428
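That the split loses nothing can be checked by projecting the three-column table onto its two halves and joining them back on branchNo. The following is a minimal Python sketch using the four rows above; the function names decompose and rejoin are illustrative.

```python
BRANCH = {
    ("C003", "Anne", "Carol"),
    ("C003", "David", "Carol"),
    ("C003", "Anne", "Tina"),
    ("C003", "David", "Tina"),
}


def decompose(rows):
    """Project onto (branchNo, staffName) and (branchNo, ownerName)."""
    staff = {(branch, staff_name) for branch, staff_name, _ in rows}
    owners = {(branch, owner_name) for branch, _, owner_name in rows}
    return staff, owners


def rejoin(staff, owners):
    """Natural join of the two projections on branchNo."""
    return {
        (branch, staff_name, owner_name)
        for branch, staff_name in staff
        for other_branch, owner_name in owners
        if branch == other_branch
    }


staff, owners = decompose(BRANCH)
lossless = rejoin(staff, owners) == BRANCH
print(sorted(staff), sorted(owners), lossless)
```

The join gives back exactly the original four rows because staff and owners are independent of each other: the wide table stores every staff and owner combination for a branch, which is the redundancy 4NF removes.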
Normal Form Summary
A relation’s degree of normalisation
Stronger in format at each stage
– less vulnerable to update anomalies
First Normal Form (1NF)
– The relation has no non-atomic values
– Or the relation has “no repeating group”
2nd Normal Form (2NF)
– The relation has no partial dependencies
– All non-key attributes are fully functionally dependent on the PK
3rd Normal Form (3NF)
– The relation has no transitive dependencies
Boyce-Codd (BCNF)
– Every determinant is a candidate key
4NF
– No non-trivial multi-valued dependencies