Normalisation

Lecture 3 (Part 2)

Akhtar Ali

10/14/2014 1

Learning Objectives

1. To consider the process of Normalisation

2. To consider the definition and application of 1NF

3. To consider the definition and application of 2NF

4. To consider the definition and application of 3NF


NORMALISATION PRINCIPLES


Normalisation

• Definition : a systematic method that takes pre-existing relations and produces a canonical set of relations.

– By canonical is meant well-designed, sound, or a recognised and lawful form.

• It can be used both for:

– designing canonical relations,

– checking existing relations to ensure they are canonical.


How Normalisation Supports Database Design


Normal forms

1NF, 2NF, 3NF

Normalisation uses the concept of Normal Forms. They are organised in a sequence, each successive normal form being higher than the one before.

A normal form is higher because it applies more stringent constraints to a relation than a lower normal form.

A relation is said to be in a certain “normal form” if it conforms to the constraints of that normal form.

Normalisation as a Relational Design Tool

• Sometimes, we need to use normalisation for designing relations.

– For example, when ER modelling is not feasible or if we deal with a small number of attributes.

– So we need to learn normalisation.

• 1NF stands for First Normal Form, 2NF for Second Normal Form, and so on.

• The constraints of a particular normal form are those of the previous normal form

– plus the additional constraint(s) peculiar to this particular normal form.


The Normalisation Procedure

• The normalisation procedure starts with a set of relations, each of which is presumed to be un-normalised, i.e. in 0NF.

DO FOR xNF = 1NF, ..., 5NF
    DO FOR each relation that exists
        IF relation already conforms to xNF
            THEN it is in xNF, so do nothing
            ELSE create 2 or more replacement relations from it that do conform to xNF
    END-LOOP
END-LOOP

• 5NF is the highest normal form reachable by this procedure, i.e. by replacing a relation with projections of it.

• In practice, 3NF is the highest normal form usually reached.
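The nested loop in the procedure can be sketched in Python. This is only an illustration under stated assumptions: the names `NormalForm`, `conforms`, `decompose` and `normalise` are hypothetical, and each normal form is assumed to supply its own definition (a test) and method (a replacement rule), as described on the next slide.

```python
from typing import Callable, List, NamedTuple


class NormalForm(NamedTuple):
    # Hypothetical container: one entry per normal form (1NF, 2NF, ...).
    name: str
    conforms: Callable[[object], bool]          # the definition: is the relation in this NF?
    decompose: Callable[[object], List[object]]  # the method: replacement relations


def normalise(relations, normal_forms):
    """Apply each normal form in sequence to every relation that exists."""
    for nf in normal_forms:                      # DO FOR xNF = 1NF, ..., 5NF
        next_round = []
        for rel in relations:                    # DO FOR each relation that exists
            if nf.conforms(rel):
                next_round.append(rel)           # already in xNF, so do nothing
            else:
                next_round.extend(nf.decompose(rel))  # 2 or more replacements
        relations = next_round
    return relations
```

The sketch relies on `decompose` returning relations that already conform to the current normal form, which is what the procedure assumes.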


What is a Normal Form?

• Each Normal Form has two parts

– A definition that specifies exactly what constraints apply to a relation in that normal form.

• This is used to check whether any given relation is already in that normal form or not.

– A method to be used to replace the relation with 2 or more that will be in that normal form.

• The method assumes that the relation-to-be-replaced is in the previous normal form.


Normalising : Possibilities

[Figure: a given relation placed against the set of all un-normalised relations, the set of all relations in 1NF, the set of all relations in 2NF, and so on. If the relation is already in 1NF, there is nothing to do for 1NF; if it is already in 2NF, there is nothing to do for 2NF.]

Consequences of Normalisation

• If new, replacement relations are created, then they must be projections of the original.

– New-Relation = π_{set of attributes}(Original-relation)

• The symbol π denotes projection of a set of attributes from a relation.

• Normalisation always creates new relations such that

– Original-relation = New-Rel-1 ⋈ New-Rel-2

• The symbol ⋈ denotes a join between two relations.

• This ensures that no information is ever lost.
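The projection and join property above can be tried out with a minimal Python sketch. The relation `staff` and its attributes (Emp, Dept, Site) are invented purely for illustration, and a relation is modelled here as a list of dictionaries; neither comes from the lecture.

```python
def project(relation, attrs):
    """π: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left, right):
    """⋈: pair up tuples that agree on every attribute the relations share."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


def same_relation(r1, r2):
    """Compare two relations as sets of tuples (order is irrelevant)."""
    as_set = lambda rel: {frozenset(row.items()) for row in rel}
    return as_set(r1) == as_set(r2)


# Hypothetical data: each employee is in one department, each department on one site.
staff = [
    {"Emp": "E1", "Dept": "D1", "Site": "S1"},
    {"Emp": "E2", "Dept": "D2", "Site": "S1"},
]

good = natural_join(project(staff, ["Emp", "Dept"]), project(staff, ["Dept", "Site"]))
bad = natural_join(project(staff, ["Emp", "Site"]), project(staff, ["Dept", "Site"]))
print(same_relation(good, staff))  # True: the original is re-created
print(same_relation(bad, staff))   # False: spurious tuples appear
```

The second split shows why the choice of projections matters: joining on Site pairs every employee with every department, so the original relation is not recovered.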


FIRST NORMAL FORM (1NF)


Definition of 1NF

• A relation is in 1NF if and only if every attribute value it can ever contain is an atomic value

• Question : What is an atomic value ?

• Answer : A value that cannot meaningfully be broken down into two or more constituent parts.


Example : Purchase Order Relation

The following relation holds data about purchase orders placed on suppliers for parts:

Ord  Sno  Sname   Saddr  Date    Part        Pname            Qty          Tot  Price
L6   315  Bloggs  D'ham  8 June  P3, Q7      Pump, Motor      5, 5         400  150, 250
L5   127  Smith   N'cle  7 May   N8, B6, L4  Nut, Bolt, Nail  70, 60, 100  12   4, 5, 3

Ord Order number that uniquely identifies every purchase order.

Sno Supplier number that uniquely identifies any supplier.

Sname The name of a supplier.

Saddr The address of a supplier.

Date The date on which the order was placed.

Part Part number that uniquely identifies every kind of part used by the company.

Pname The name of a particular kind of part.

Qty The quantity of a particular kind of part ordered on a purchase order.

Price The price of that quantity of that particular kind of part.

Tot The total price to be paid for the whole order.

Not in 1NF

• Attributes Ord, Sno, Sname, Saddr, Date and Tot currently contain only atomic values, and in fact can only ever contain atomic values.

• Attributes Part, Pname, Qty and Price currently contain non-atomic values, and in fact may often contain non-atomic values.

• Therefore the relation is not in 1NF.
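The check above can be sketched in Python, assuming a non-atomic value is modelled as a Python list (an encoding chosen here for illustration only). The function names are hypothetical. Note that the sketch inspects only the values currently stored, whereas the definition of 1NF concerns every value an attribute can ever contain.

```python
# The purchase order relation, with repeating values encoded as lists (an assumption).
purchase_order = [
    {"Ord": "L6", "Sno": 315, "Sname": "Bloggs", "Saddr": "D'ham", "Date": "8 June",
     "Part": ["P3", "Q7"], "Pname": ["Pump", "Motor"], "Qty": [5, 5],
     "Price": [150, 250], "Tot": 400},
    {"Ord": "L5", "Sno": 127, "Sname": "Smith", "Saddr": "N'cle", "Date": "7 May",
     "Part": ["N8", "B6", "L4"], "Pname": ["Nut", "Bolt", "Nail"], "Qty": [70, 60, 100],
     "Price": [4, 5, 3], "Tot": 12},
]


def non_atomic_attributes(relation):
    """Return the attributes that hold at least one collection-valued entry."""
    collections = (list, tuple, set, frozenset)
    return sorted({attr for row in relation for attr, value in row.items()
                   if isinstance(value, collections)})


def is_1nf(relation):
    """A relation instance passes if no attribute holds a collection."""
    return not non_atomic_attributes(relation)


print(non_atomic_attributes(purchase_order))  # ['Part', 'Pname', 'Price', 'Qty']
print(is_1nf(purchase_order))                 # False
```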


Putting Purchase Order into 1NF

• Separate out the atomic and non-atomic attributes

• Put all the atomic attributes in a new replacement relation, which then by definition is in 1NF.

Date    Saddr  Sname   Sno  Ord  Tot
8 June  D'ham  Bloggs  315  L6   400
7 May   N'cle  Smith   127  L5   12

The Non-Atomic Attributes

• We can’t just throw away this data because it is a nuisance to store!

• The values in all these attributes repeat together.

– If a part is removed from an order, its values must be removed from all 4 attributes.

– If another part is placed on an order, there must be a value for that part in all 4 attributes.


Repeating Together

• Thus a set of values that repeat together should become a tuple in a new relation.

• Now the attributes in these tuples contain only atomic data !

• Thus we form another new replacement relation to hold the tuples of data that repeat together.

• There is no intrinsic reason why all the non-atomic attributes in an un-normalised relation should always repeat together.


Part Pname Qty Price

N8 Nut 70 4

B6 Bolt 60 5

L4 Nail 100 3

P3 Pump 5 150

Q7 Motor 5 250

Foreign Keys

• The problem with this relation is that the part data is no longer associated with its order data.

• We no longer know which part type was ordered on which purchase order.

• We can solve this problem by adding the (purchase) order number attribute to this relation.

• In general, we must add the attribute(s) which formed a candidate key in the original relation, to this relation as a foreign key. This retains the relationship information.


Ord Part Pname Qty Price

L5 N8 Nut 70 4

L5 B6 Bolt 60 5

L5 L4 Nail 100 3

L6 P3 Pump 5 150

L6 Q7 Motor 5 250
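The two steps so far (separate the atomic attributes, then turn each set of values that repeat together into a tuple carrying the order number) can be sketched as follows. The function `to_1nf` and the list encoding of repeating values are assumptions made for this illustration.

```python
# Un-normalised purchase orders; repeating values are encoded as lists (an assumption).
purchase_order = [
    {"Ord": "L5", "Sno": 127, "Sname": "Smith", "Saddr": "N'cle", "Date": "7 May",
     "Part": ["N8", "B6", "L4"], "Pname": ["Nut", "Bolt", "Nail"], "Qty": [70, 60, 100],
     "Price": [4, 5, 3], "Tot": 12},
    {"Ord": "L6", "Sno": 315, "Sname": "Bloggs", "Saddr": "D'ham", "Date": "8 June",
     "Part": ["P3", "Q7"], "Pname": ["Pump", "Motor"], "Qty": [5, 5],
     "Price": [150, 250], "Tot": 400},
]


def to_1nf(relation, key, repeating):
    """Split a relation into its atomic part and its repeating group.

    key       -- candidate key of the original relation, copied in as a foreign key
    repeating -- attributes whose values repeat together
    """
    atomic_rel, group_rel = [], []
    for row in relation:
        atomic_rel.append({a: v for a, v in row.items() if a not in repeating})
        # zip lines up the i-th value of every repeating attribute into one tuple
        for values in zip(*(row[a] for a in repeating)):
            foreign_key = {k: row[k] for k in key}
            group_rel.append({**foreign_key, **dict(zip(repeating, values))})
    return atomic_rel, group_rel


orders, items = to_1nf(purchase_order, ["Ord"], ["Part", "Pname", "Qty", "Price"])
for item in items:
    print(item)
```

Each printed dictionary corresponds to one row of the table above, with Ord acting as the foreign key back to the order.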


Candidate Keys for Relations

The candidate key is Ord:

Date    Saddr  Sname   Sno  Ord  Tot
8 June  D'ham  Bloggs  315  L6   400
7 May   N'cle  Smith   127  L5   12

Extend the candidate key to Ord and Part, including the foreign key Ord*:

Ord*  Part  Pname  Qty  Price
L5    N8    Nut    70   4
L5    B6    Bolt   60   5
L5    L4    Nail   100  3
L6    P3    Pump   5    150
L6    Q7    Motor  5    250

SECOND NORMAL FORM (2NF)


Definition of 2NF

A relation is in 2NF if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the candidate key.

The extra constraint applied by 2NF

Note that 2NF is more strict than 1NF because it requires the relation to conform to the additional “full functional dependency” constraint.

Fully Functionally Dependent

• Question : What does fully functionally dependent mean?

• We will first consider the principle of functional dependency, and then see

– what full functional dependency means,

– the application to achieve 2NF.


Example of Functional Dependency

Account Number → Payment Due

The arrow indicates a functional dependency.

Assume some kind of loan account where payments of a certain amount have to be made on a regular basis to pay off the loan.

This means:

• A given account number determines what payment is due.

• In principle, given an account number, one can find out what regular payment is due. (This may not always be easy or feasible in practice.)

Terminology

• The Account Number is said to functionally determine the Payment Due.

• The Payment Due is said to be functionally dependent on the Account Number.

• Both are equally good means of expression, and convenience and emphasis usually determine which of the two is preferred in any particular situation.


Definition of Functional Dependency (FD)

A set of attributes Y in a relation is functionally dependent on a set of attributes X in the same relation if and only if a given set of attribute values in X determines a specific set of attribute values in Y for every instant of time.

Relationship X:Y in FD is many:1

• For any given set of values X, there is just one corresponding set of values Y.

• It is possible that there may be many sets of values X for which there is just one set of values Y.

• A functional dependency is a permanent association between attributes.
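The many:1 reading of an FD can be tested against stored data with a short Python sketch. The function `fd_holds` and the `accounts` data are hypothetical. Because an FD is a permanent association, a check on one snapshot can only refute a dependency; it cannot prove that the dependency holds at every instant of time.

```python
def fd_holds(relation, x, y):
    """Return False if some X value is paired with two different Y values."""
    seen = {}
    for row in relation:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault records the first Y seen for this X and returns it afterwards
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical loan accounts: two accounts happen to share the same payment.
accounts = [
    {"Account": "A1", "Payment": 100},
    {"Account": "A2", "Payment": 100},
    {"Account": "A3", "Payment": 250},
]

print(fd_holds(accounts, ["Account"], ["Payment"]))  # True: each account has one payment
print(fd_holds(accounts, ["Payment"], ["Account"]))  # False: payment 100 maps to A1 and A2
```

The second call fails because the relationship is many:1 from Account to Payment, not the other way round.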


Further FD Examples

Example 1:

Supplier Number → Supplier Name, Supplier Address, Supplier Telephone No.

A set containing one attribute determining a set of three attributes.

Example 2:

Customer Name, Customer Address → Customer Telephone No.

A set of two attributes determining a set containing one attribute.

Full Functional Dependency & 2NF

• The definition of 2NF requires not merely functional dependency, but full functional dependency.

Definition of FULL Functional Dependency:

A set of attributes Y is fully functionally dependent on a set of attributes X if and only if Y is functionally dependent on all the attributes of X and not on just a proper subset of them.

Condition for 2NF

Thus, to be in 2NF means that all attributes not in the candidate key are fully FD on all those attributes that are in the candidate key.

Examples: Purchase Order Relations

P_ORDER_1

Date    Saddr  Sname   Sno  Ord  Tot
8 June  D'ham  Bloggs  315  L6   400
7 May   N'cle  Smith   127  L5   12

P_ORDER_1: FD Diagram

The functional dependencies of the non-key attributes in P_ORDER_1 on its candidate key can be represented by the following FD diagram:

Ord → Sno
Ord → Sname
Ord → Saddr
Ord → Date
Ord → Tot

As they are all fully FD on Ord, the relation is already in 2NF.

P_ITEM_1

Ord*  Part  Pname  Qty  Price
L5    N8    Nut    70   4
L5    B6    Bolt   60   5
L5    L4    Nail   100  3
L6    P3    Pump   5    150
L6    Q7    Motor  5    250

P_ITEM_1: FD Diagram

(Ord, Part) → Qty
(Ord, Part) → Price
Part → Pname

Reason for non-2NF

• Attributes Price and Qty depend on the full key.

– They depend not only on what kind of part they refer to, but also on the order itself

• the quantity of a part type ordered will vary with & depend on the order, as will the price since it depends on the quantity.

• However Pname depends solely on the type of part.

– A particular kind of part will have the same name on every order on which it appears.
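A partial dependency of this kind can be looked for mechanically. The sketch below, with hypothetical function names, tries every proper subset of the candidate key against every non-key attribute. The data is P_ITEM_1 plus the extra order L7 that appears on a later slide; without a second order for Q7, Qty and Price would also appear to depend on Part alone, which illustrates that a snapshot of data can suggest an FD but not establish it.

```python
from itertools import combinations


def fd_holds(relation, x, y):
    """Return False if some X value is paired with two different Y values."""
    seen = {}
    for row in relation:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


def partial_dependencies(relation, key):
    """List (subset of key, attribute) pairs where the attribute is FD on the subset."""
    non_key = [a for a in relation[0] if a not in key]
    found = []
    for size in range(1, len(key)):              # proper, non-empty subsets only
        for subset in combinations(key, size):
            for attr in non_key:
                if fd_holds(relation, subset, [attr]):
                    found.append((subset, attr))
    return found


p_item_1 = [
    {"Ord": "L5", "Part": "N8", "Pname": "Nut", "Qty": 70, "Price": 4},
    {"Ord": "L5", "Part": "B6", "Pname": "Bolt", "Qty": 60, "Price": 5},
    {"Ord": "L5", "Part": "L4", "Pname": "Nail", "Qty": 100, "Price": 3},
    {"Ord": "L6", "Part": "P3", "Pname": "Pump", "Qty": 5, "Price": 150},
    {"Ord": "L6", "Part": "Q7", "Pname": "Motor", "Qty": 5, "Price": 250},
    {"Ord": "L7", "Part": "Q7", "Pname": "Motor", "Qty": 2, "Price": 100},
]

print(partial_dependencies(p_item_1, ["Ord", "Part"]))  # [(('Part',), 'Pname')]
```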


Three Problems of a Non-2NF Relation

• Redundant data may be stored.

• Update anomalies

– there can be problems in inserting, deleting and amending some of the data.

• Semantic problems.

– relation does not reflect the real-world meaning of the data, leading to problems in its use.


Redundant Data

Example: Pname is unnecessarily repeated.

Every time a part type appears on an order (say Q7), its name (Motor) also appears.

N.B. The part number (say Q7) is enough to identify the part type.

Motor is repeated in orders L6 and L7. One occurrence is sufficient to give us the name, so the other Pname value is redundant.

P_ITEM_1

Ord*  Part  Pname  Qty  Price
L5    N8    Nut    70   4
L5    B6    Bolt   60   5
L5    L4    Nail   100  3
L6    P3    Pump   5    150
L6    Q7    Motor  5    250
L7    Q7    Motor  2    100

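Update Anomalies

Example: Part type details (Part and Pname) cannot always be updated. The new part type F5 (Flange) cannot be inserted until it appears on an order, because Ord is part of the candidate key and has no value for it.

P_ITEM_1

Ord*  Part  Pname   Qty  Price
L5    N8    Nut     70   4
L5    B6    Bolt    60   5
L5    L4    Nail    100  3
L6    P3    Pump    5    150
L6    Q7    Motor   5    250
L7    Q7    Engine  2    100
??    F5    Flange  ?    ??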

Semantic Problems

Q7 now has two different names.

P_ITEM_1

Ord*  Part  Pname   Qty  Price
L5    N8    Nut     70   4
L5    B6    Bolt    60   5
L5    L4    Nail    100  3
L6    P3    Pump    5    150
L6    Q7    Motor   5    250
L7    Q7    Engine  2    100

Putting P_ITEM_1 into 2NF (1)

The problem is caused by Pname being FD on just Part, not the whole of the candidate key. The solution is to separate out each determinant and its dependents. Create 2 replacement relations based on these FDs.

Original FD diagram:

(Ord, Part) → Qty
(Ord, Part) → Price
Part → Pname

Replacement FD diagrams:

(Ord, Part) → Qty
(Ord, Part) → Price

Part → Pname

Satisfaction of 2NF

• A relation created with a determinant as its candidate key, and with non-key attributes that are fully functionally dependent on that candidate key, must be in 2NF by definition.

• Note that a determining attribute - Part in the above example - can appear in more than one complete determinant.

– This is perfectly acceptable. It just depends what attributes form determinants.


Putting P_ITEM_1 into 2NF (2)

Ord* Part Qty Price

L5 N8 70 4

L5 B6 60 5

L5 L4 100 3

L6 P3 5 150

L6 Q7 5 250

L7 Q7 2 100

P-ITEM-2


Putting P_ITEM_1 into 2NF (3)

Part Pname

N8 Nut

B6 Bolt

L4 Nail

P3 Pump

Q7 Motor

PART_2

Benefits of 2NF

• No information has been lost.

– A natural join of P_ITEM_2 and PART_2 on attribute Part will re-create the original relation P_ITEM_1.

• Problems Solved:

– Redundant data removed: each Pname is stored once

– Update anomalies: no side effects in operations

– Semantic problems: each part type has just one name

THIRD NORMAL FORM (3NF)


Definition of 3NF

A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively fully FD on the candidate key.

The extra constraint applied by 3NF

Note that 3NF is more stringent than 2NF, as it requires that the relation not only have full functional dependencies on the candidate key, but that these dependencies must now additionally be “non-transitive”.

Question : What does non-transitively mean ?

Transitivity

Assume there are three sets of attributes, A, B and C. If A determines B, and B determines C, then logically A determines C, but transitively via B.

Let A → B and B → C.

These FDs are non-transitive, i.e. direct, because they do not go via any other sets of attributes.

Then A → C.

This FD is transitive, because it is via another set of attributes, in this case B.

Example of Transitive FD

• Suppose pilots always fly the same aircraft

– then if we know the pilot, we know the aircraft; so pilot functionally determines aircraft.

• If we know the aircraft, then we know the airline that owns it

– so aircraft functionally determines airline.

• Putting these two dependencies together

– then pilot functionally determines airline.

• But the functional dependency of airline on pilot is transitive, because it goes via aircraft.
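The chain in this example can be made concrete by treating each FD as a lookup table and composing the two. The pilot names, aircraft registrations and airline name below are invented for illustration, as is the helper `compose`.

```python
# Hypothetical data: pilot → aircraft and aircraft → airline as lookup tables.
pilot_aircraft = {"Jones": "G-ABCD", "Patel": "G-WXYZ"}
aircraft_airline = {"G-ABCD": "AirNorth", "G-WXYZ": "AirNorth"}


def compose(first, second):
    """Follow `first` and then `second`: the transitive dependency."""
    return {x: second[middle] for x, middle in first.items()}


# pilot → airline exists only because it goes via aircraft.
pilot_airline = compose(pilot_aircraft, aircraft_airline)
print(pilot_airline)  # {'Jones': 'AirNorth', 'Patel': 'AirNorth'}
```

Nothing in `pilot_airline` is stored independently: it is derived entirely from the two direct dependencies, which is what makes it transitive.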


Non-Transitive Full FD & 3NF

So, to be in 3NF means that all attributes not in the candidate key are non-transitively (i.e. directly) fully FD on all those attributes that are in the candidate key, and not FD on the candidate key via some other non-key attribute.

Reviewing the Definition of 3NF

1. R1( Key, NK1, NK2, NK3 )

Key → NK1 → NK2 → NK3

R1's FD diagram shows a “chain of dependencies”. It is not in 3NF.

2. R2( Key, NK1, NK2, NK3 )

Key → NK1
Key → NK2
Key → NK3

R2's FD diagram shows no “chain of dependencies”. It is in 3NF.

Example: P_ITEM_2

Neither Price nor Qty is FD on the candidate key via the other; both are non-transitively FD on the key.

Thus P_ITEM_2 is already in 3NF.

Ord* Part Qty Price

L5 N8 70 4

L5 B6 60 5

L5 L4 100 3

L6 P3 5 150

L6 Q7 5 250

L7 Q7 2 100

P-ITEM-2


Example : PART_2

If a 2NF relation has only one non-key attribute, then it must already be in 3NF, as there is no other non-key attribute via which a transitive dependency can occur.

Thus PART_2 is already in 3NF.

Part Pname

N8 Nut

B6 Bolt

L4 Nail

P3 Pump

Q7 Motor

PART_2


Example : P_ORDER_1

As we have already seen, its FD diagram is:

Ord → Sno
Ord → Sname
Ord → Saddr
Ord → Date
Ord → Tot

P_ORDER_1

Date    Saddr  Sname   Sno  Ord  Tot
8 June  D'ham  Bloggs  315  L6   400
7 May   N'cle  Smith   127  L5   12

However, not all of these FDs are non-transitive FDs (= NTFDs). Taking account now of transitivity, the FD diagram can be re-drawn as:

Ord → Date
Ord → Tot
Ord → Sno
Sno → Sname
Sno → Saddr

Hence P_ORDER_1 is not in 3NF.
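Given the direct FDs of P_ORDER_1 (Ord determines Date, Tot and Sno; Sno determines Sname and Saddr), the chains that break 3NF can be picked out with a small Python sketch. The function `transitive_chains` and the encoding of each FD as a (determinant, dependent) pair are assumptions made for this illustration.

```python
def transitive_chains(fds, key):
    """Return (key, via, attribute) for each attribute reached via a non-key determinant.

    fds -- direct FDs as (determinant tuple, dependent attribute) pairs
    key -- the candidate key as a list of attribute names
    """
    key = frozenset(key)
    # attributes that depend directly on the candidate key
    direct = {dep for det, dep in fds if frozenset(det) == key}
    return [(tuple(sorted(key)), det, dep)
            for det, dep in fds
            if frozenset(det) != key and set(det) <= direct]


p_order_1_fds = [
    (("Ord",), "Date"),
    (("Ord",), "Tot"),
    (("Ord",), "Sno"),
    (("Sno",), "Sname"),
    (("Sno",), "Saddr"),
]

for chain in transitive_chains(p_order_1_fds, ["Ord"]):
    print(chain)
# (('Ord',), ('Sno',), 'Sname')
# (('Ord',), ('Sno',), 'Saddr')
```

An empty result would mean that every non-key attribute is directly FD on the key, with no chain of dependencies.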


Putting P_ORDER_1 into 3NF (1)

The problem is caused by Sname and Saddr being only transitively FD on the candidate key.

Solution: separate out each determinant and its NTFD dependents, and create 2 replacement relations based on them.

Original FD diagram:

Ord → Date
Ord → Tot
Ord → Sno
Sno → Sname
Sno → Saddr

Replacement FD diagrams:

Ord → Date
Ord → Tot
Ord → Sno

Sno → Sname
Sno → Saddr

Putting P_ORDER_1 into 3NF (2)

Ord → Date
Ord → Tot
Ord → Sno

The corresponding relation is:

P_ORDER_3

Date    Tot  Sno  Ord
8 June  400  315  L6
7 May   12   127  L5

Putting P_ORDER_1 into 3NF (3)

Sno → Sname
Sno → Saddr

The corresponding relation is:

SUPPLIER_3

Saddr  Sname   Sno
D'ham  Bloggs  315
N'cle  Smith   127

Benefits

• No information has been lost.

– A natural join of P_ORDER_3 and SUPPLIER_3 on attribute Sno will re-create the original relation P_ORDER_1.

• Problems Solved:

– Redundant data removed: each Sname is stored once

– Update anomalies: no side effects in operations

– Semantic problems: each supplier has just one name
