
Database Normalization CP3410 Daryle Niedermayer, I.S.P., PMP.

Date post: 20-Jan-2016
Page 1: Database Normalization CP3410 Daryle Niedermayer, I.S.P., PMP.

Database Normalization

CP3410

Daryle Niedermayer, I.S.P., PMP

Page 2

What is Normalization

Normalization allows us to organize data so that it:
• Allows faster access (dependencies make sense)
• Takes up less space (less redundancy)

Page 3

Normal Forms

Normalization is done by transforming data into various Normal Forms.

There are 5 Normal Forms, but we almost never use 4NF or 5NF.

We will only be concerned with 1NF, 2NF, and 3NF.

Page 4

For a database to be in a normal form, it must meet all requirements of the previous forms:
• E.g., for a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF.

Page 5

Sample Data

Manager     Employees
Fatma       Sayed, Tariq
Abdulaziz   Tafla, Mohammed
Ali         Sarai, Miriam

This data has some problems:
• The Employees column is not atomic.

A column must be atomic, meaning that it can only hold a single item of data. This column holds more than one employee name.

Page 6

Manager     Employees
Fatma       Sayed, Tariq
Abdulaziz   Tafla, Mohammed
Ali         Sarai, Miriam

Data that is not atomic means:
• We can’t easily sort the data
• We can’t easily search or index the data
• We can’t easily change the data
• We can’t easily reference the data in other tables

Page 7

Manager     Employee1   Employee2
Fatma       Sayed       Tariq
Abdulaziz   Tafla       Mohammed
Ali         Sarai       Miriam

Breaking the Employee column into more than 1 column doesn’t solve our problems:
• The data may look atomic, but only because we have many identical columns storing a single piece of data instead of a single column storing many pieces of data.

Page 8

• We still can’t easily sort, search, or index our employees.
• What if a manager has more than 2 employees, 10 employees, 100 employees? We’d need to add columns to our database just for these cases.
• It is still hard to reference our employees in other tables.

Page 9

Manager     Employee1   Employee2
Fatma       Sayed       Tariq
Abdulaziz   Tafla       Mohammed
Ali         Sarai       Miriam

By the way, what would be a good choice of a Primary Key for this table?

Page 10

First Normal Form

1NF means that we must:
• Eliminate duplicate columns from the same table, and
• Create a separate table for each group of related data, each with a unique row identifier (primary key)

Let’s get started by making our columns atomic…

Page 11

Atomic Data

By breaking each tuple of our table into an entry for each employee, we have made our data atomic.

What would be the primary key?

Manager     Employee
Fatma       Sayed
Fatma       Tariq
Abdulaziz   Tafla
Abdulaziz   Mohammed
Ali         Sarai
Ali         Miriam
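
The split described above can be sketched in a few lines of Python. This is only an illustration: the names raw_rows and to_atomic_rows are made up for this example, and it assumes employee names are separated by commas as in the sample data.

```python
# Rows as they appear in the original, non-atomic table.
raw_rows = [
    ("Fatma", "Sayed, Tariq"),
    ("Abdulaziz", "Tafla, Mohammed"),
    ("Ali", "Sarai, Miriam"),
]


def to_atomic_rows(rows):
    """Return one (manager, employee) pair per employee name."""
    atomic = []
    for manager, employees in rows:
        # Assumption: names are comma-separated, as in the sample data.
        for name in employees.split(","):
            atomic.append((manager, name.strip()))
    return atomic


atomic_rows = to_atomic_rows(raw_rows)
for manager, employee in atomic_rows:
    print(manager, employee)
```

Each output row now holds exactly one employee, so the Employee values can be sorted, searched, and referenced individually.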

Page 12

Primary Key

The best primary key would be the Employee column.

Every employee only has one manager, therefore an employee is unique.

Employee    Manager
Sayed       Fatma
Tariq       Fatma
Tafla       Abdulaziz
Mohammed    Abdulaziz
Sarai       Ali
Miriam      Ali
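
A quick way to sanity-check a candidate key against sample data is to test whether its values are unique. The sketch below uses a hypothetical helper, is_unique. Note that uniqueness in a sample is only a necessary condition: the real justification is the business rule that every employee has one manager.

```python
# (Employee, Manager) rows from the table above.
rows = [
    ("Sayed", "Fatma"),
    ("Tariq", "Fatma"),
    ("Tafla", "Abdulaziz"),
    ("Mohammed", "Abdulaziz"),
    ("Sarai", "Ali"),
    ("Miriam", "Ali"),
]


def is_unique(rows, column_index):
    """Return True if no value in the given column appears twice."""
    values = [row[column_index] for row in rows]
    return len(values) == len(set(values))


employee_is_candidate = is_unique(rows, 0)  # True: each employee appears once
manager_is_candidate = is_unique(rows, 1)   # False: each manager appears twice
print(employee_is_candidate, manager_is_candidate)
```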

Page 13

First Normal Form

Congratulations! The fact that all our data and columns are atomic and we have a primary key means that we are in 1NF!

Employee    Manager
Sayed       Fatma
Tariq       Fatma
Tafla       Abdulaziz
Mohammed    Abdulaziz
Sarai       Ali
Miriam      Ali

Page 14

First Normal Form Revised

Of course there may come a day when we hire a second employee or manager with the same name. To avoid this, let’s use an employee ID instead of their name.

ID   Employee    ManagerID
1    Sayed       7
2    Tariq       7
3    Tafla       8
4    Mohammed    8
5    Sarai       9
6    Miriam      9
7    Fatma
8    Abdulaziz
9    Ali
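
One way to express this table is as a self-referencing table, where ManagerID points back at the ID column of the same table. The sketch below uses Python's built-in sqlite3 module. The table name employee and the choice of NULL for managers with no manager of their own are assumptions made for the example, not something the slides specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        ID        INTEGER PRIMARY KEY,
        Employee  TEXT NOT NULL,
        ManagerID INTEGER REFERENCES employee(ID)  -- NULL: no manager recorded
    )
    """
)
# Managers are inserted first so every ManagerID refers to an existing row.
conn.executemany(
    "INSERT INTO employee (ID, Employee, ManagerID) VALUES (?, ?, ?)",
    [
        (7, "Fatma", None),
        (8, "Abdulaziz", None),
        (9, "Ali", None),
        (1, "Sayed", 7),
        (2, "Tariq", 7),
        (3, "Tafla", 8),
        (4, "Mohammed", 8),
        (5, "Sarai", 9),
        (6, "Miriam", 9),
    ],
)

# Join the table to itself to turn each ManagerID back into a name.
pairs = conn.execute(
    """
    SELECT e.Employee, m.Employee
    FROM employee AS e
    JOIN employee AS m ON e.ManagerID = m.ID
    ORDER BY e.ID
    """
).fetchall()

for employee, manager in pairs:
    print(employee, "reports to", manager)
```

Because names are now looked up through the ID, two people with the same name no longer collide.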

Page 15

1NF: Before and After

Before:

Manager     Employees
Fatma       Sayed, Tariq
Abdulaziz   Tafla, Mohammed
Ali         Sarai, Miriam

After:

ID   Employee    ManagerID
1    Sayed       7
2    Tariq       7
3    Tafla       8
4    Mohammed    8
5    Sarai       9
6    Miriam      9
7    Fatma
8    Abdulaziz
9    Ali

Page 16

Moving to Second Normal Form

A database in 2NF must also be in 1NF:
• Data must be atomic
• Every row (or tuple) must have a unique primary key

Plus:
• Subsets of data that apply to multiple rows (repeating data) are moved to separate tables

Page 17

CustID  FirstName  LastName    Address           City       State  Zip
1       Bob        Smith       123 Main St.      Tucson     AZ     12345
2       John       Brown       555 2nd Ave.      St. Paul   MN     54355
3       Sandy      Jessop      4256 James St.    Chicago    IL     43555
4       Maria      Hernandez   4599 Columbia     Vancouver  BC     V5N 1M0
5       Gameil     Hintz       569 Summit St.    St. Paul   MN     54355
6       James      Richardson  12 Cameron Bay    Regina     SK     S4T 2V8
7       Shiela     Green       12 Michigan Ave.  Chicago    IL     43555
8       Ian        Sampson     56 Manitoba St.   Winnipeg   MB     M5W 9N7
9       Ed         Rodgers     15 Athol St.      Regina     SK     S4T 2V9

This data is in 1NF: all fields are atomic and the CustID serves as the primary key.

Page 18

But let’s pay attention to the City, State, and Zip fields:
• There are 2 rows of repeating data: one for Chicago, and one for St. Paul.
• Each repeats the same city, state and zip code as an earlier row

City       State  Zip
Tucson     AZ     12345
St. Paul   MN     54355
Chicago    IL     43555
Vancouver  BC     V5N 1M0
St. Paul   MN     54355
Regina     SK     S4T 2V8
Chicago    IL     43555
Winnipeg   MB     M5W 9N7
Regina     SK     S4T 2V9

Page 19

The CustID determines all the data in the row, but U.S. Zip codes determine the City and State. (E.g., a given Zip code can only belong to one city and state, so storing Zip codes with a City and State is redundant.)

This means that City and State are Functionally Dependent on the value in Zip code and not only on the primary key.
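
A functional dependency can be checked against sample data: X determines Y if no value of X ever appears with two different values of Y. The sketch below uses a hypothetical helper named holds and only the City, State, and Zip columns of the customer data. A sample can disprove a dependency but cannot prove it in general.

```python
def holds(rows, determinant, dependent):
    """Return True if each `determinant` value maps to one `dependent` value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


customers = [
    {"CustID": 1, "City": "Tucson", "State": "AZ", "Zip": "12345"},
    {"CustID": 2, "City": "St. Paul", "State": "MN", "Zip": "54355"},
    {"CustID": 3, "City": "Chicago", "State": "IL", "Zip": "43555"},
    {"CustID": 4, "City": "Vancouver", "State": "BC", "Zip": "V5N 1M0"},
    {"CustID": 5, "City": "St. Paul", "State": "MN", "Zip": "54355"},
    {"CustID": 6, "City": "Regina", "State": "SK", "Zip": "S4T 2V8"},
    {"CustID": 7, "City": "Chicago", "State": "IL", "Zip": "43555"},
    {"CustID": 8, "City": "Winnipeg", "State": "MB", "Zip": "M5W 9N7"},
    {"CustID": 9, "City": "Regina", "State": "SK", "Zip": "S4T 2V9"},
]

zip_determines_city = holds(customers, ["Zip"], ["City", "State"])
city_determines_zip = holds(customers, ["City", "State"], ["Zip"])
print(zip_determines_city)  # True
print(city_determines_zip)  # False: Regina appears with two postal codes
```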

Page 20

To be in 2NF, this repeating data must be in its own table.

So:
• Let’s create a Zip code table that maps Zip codes to their City and State.
• Note that Canadian Postal Codes are different: the same city and state can have many different postal codes.

Page 21

Our Data in 2NF

Customer Table

CustID  FirstName  LastName    Address           Zip
1       Bob        Smith       123 Main St.      12345
2       John       Brown       555 2nd Ave.      54355
3       Sandy      Jessop      4256 James St.    43555
4       Maria      Hernandez   4599 Columbia     V5N 1M0
5       Gameil     Hintz       569 Summit St.    54355
6       James      Richardson  12 Cameron Bay    S4T 2V8
7       Shiela     Green       12 Michigan Ave.  43555
8       Ian        Sampson     56 Manitoba St.   M5W 9N7
9       Ed         Rodgers     15 Athol St.      S4T 2V9

Zip Code Table

Zip      City       State
12345    Tucson     AZ
54355    St. Paul   MN
43555    Chicago    IL
V5N 1M0  Vancouver  BC
S4T 2V8  Regina     SK
M5W 9N7  Winnipeg   MB
S4T 2V9  Regina     SK

• We see that we can actually save 2 rows in the Zip Code table by removing these redundancies: 9 customer records only need 7 Zip code records.
• Zip code becomes a foreign key in the customer table linked to the primary key in the Zip code table
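
The split into a customer table and a Zip code table can be reproduced mechanically. The following is a minimal sketch in plain Python. The variable names are invented for the example, and a dict keyed by Zip stands in for the Zip code table's primary key.

```python
# The 1NF rows: (CustID, FirstName, LastName, Address, City, State, Zip).
customers_1nf = [
    (1, "Bob", "Smith", "123 Main St.", "Tucson", "AZ", "12345"),
    (2, "John", "Brown", "555 2nd Ave.", "St. Paul", "MN", "54355"),
    (3, "Sandy", "Jessop", "4256 James St.", "Chicago", "IL", "43555"),
    (4, "Maria", "Hernandez", "4599 Columbia", "Vancouver", "BC", "V5N 1M0"),
    (5, "Gameil", "Hintz", "569 Summit St.", "St. Paul", "MN", "54355"),
    (6, "James", "Richardson", "12 Cameron Bay", "Regina", "SK", "S4T 2V8"),
    (7, "Shiela", "Green", "12 Michigan Ave.", "Chicago", "IL", "43555"),
    (8, "Ian", "Sampson", "56 Manitoba St.", "Winnipeg", "MB", "M5W 9N7"),
    (9, "Ed", "Rodgers", "15 Athol St.", "Regina", "SK", "S4T 2V9"),
]

# Zip code table: one entry per distinct Zip, keyed by Zip (its primary key).
zip_table = {}
for _id, _first, _last, _addr, city, state, zip_code in customers_1nf:
    zip_table[zip_code] = (city, state)

# Customer table: City and State are dropped; Zip remains as a foreign key.
customer_table = [
    (cust_id, first, last, addr, zip_code)
    for cust_id, first, last, addr, _city, _state, zip_code in customers_1nf
]

# Joining on Zip reproduces the original rows, so nothing was lost.
rejoined = [
    (cust_id, first, last, addr, *zip_table[zip_code], zip_code)
    for cust_id, first, last, addr, zip_code in customer_table
]

print(len(customer_table), "customers,", len(zip_table), "Zip code rows")
```

The final join is the important part: a decomposition is only safe if the original table can be rebuilt from the pieces.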

Page 22

Advantages of 2NF

Saves space in the database by reducing redundancies

If a customer calls, you can just ask them for their Zip code and you’ll know their city and state! (No more spelling mistakes)

If a City name changes, we only need to make one change to the database.

Page 23

Summary So Far…

1NF:
• All data is atomic
• All rows have a unique primary key

2NF:
• Data is in 1NF
• Subsets of data in multiple columns are moved to a new table
• These new tables are related using foreign keys

Page 24

Moving to 3NF

To be in 3NF, a database must:
• Be in 2NF
• Have all columns fully functionally dependent on the primary key (there are no transitive dependencies)

Page 25

In this table:
• CustID and ProdID depend on the OrderID and no other column (good)
• Stated another way, “If you know the OrderID, you know the CustID and the ProdID”

So: OrderID → CustID, ProdID

OrderID  CustID  ProdID  Price  Quantity  Total
1        1001    AB-111  50     1,000     50,000
2        1002    AB-111  60     500       30,000
3        1001    ZA-245  35     100       3,500
4        1003    MB-153  82     25        2,050
5        1004    ZA-245  42     10        420
6        1002    ZA-245  40     50        2,000
7        1001    AB-111  75     100       7,500

Page 26

But there are some fields that are not dependent on OrderID:
• Total is the simple product of Price*Quantity. As such, it has a transitive dependency on Price and Quantity.
• Because it is a calculated value, it doesn’t need to be included at all.

OrderID  CustID  ProdID  Price  Quantity  Total
1        1001    AB-111  50     1,000     50,000
2        1002    AB-111  60     500       30,000
3        1001    ZA-245  35     100       3,500
4        1003    MB-153  82     25        2,050
5        1004    ZA-245  42     10        420
6        1002    ZA-245  40     50        2,000
7        1001    AB-111  75     100       7,500

Page 27

Also, we can see that Price isn’t really dependent on ProdID, or OrderID. Customer 1001 bought AB-111 for $50 (in order 1) and for $75 (in order 7), while 1002 spent $60 for each item in order 2.

OrderID  CustID  ProdID  Price  Quantity  Total
1        1001    AB-111  50     1,000     50,000
2        1002    AB-111  60     500       30,000
3        1001    ZA-245  35     100       3,500
4        1003    MB-153  82     25        2,050
5        1004    ZA-245  42     10        420
6        1002    ZA-245  40     50        2,000
7        1001    AB-111  75     100       7,500
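
Grouping the prices by product makes the observation above easy to verify: if ProdID alone determined Price, each product would have exactly one price. A small sketch (the variable names are chosen for illustration):

```python
from collections import defaultdict

# (OrderID, CustID, ProdID, Price, Quantity) from the order table.
orders = [
    (1, 1001, "AB-111", 50, 1000),
    (2, 1002, "AB-111", 60, 500),
    (3, 1001, "ZA-245", 35, 100),
    (4, 1003, "MB-153", 82, 25),
    (5, 1004, "ZA-245", 42, 10),
    (6, 1002, "ZA-245", 40, 50),
    (7, 1001, "AB-111", 75, 100),
]

# Collect the distinct prices charged for each product.
prices_by_product = defaultdict(set)
for _order_id, _cust_id, prod_id, price, _quantity in orders:
    prices_by_product[prod_id].add(price)

for prod_id, prices in sorted(prices_by_product.items()):
    print(prod_id, sorted(prices))
# AB-111 [50, 60, 75]
# MB-153 [82]
# ZA-245 [35, 40, 42]
```

AB-111 and ZA-245 each sold at three different prices, so something besides ProdID must be influencing Price.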

Page 28

Maybe price is dependent on the ProdID and Quantity: The more you buy of a given product the cheaper that product becomes!

So we ask the business manager and she tells us that this is the case.

OrderID  CustID  ProdID  Price  Quantity  Total
1        1001    AB-111  50     1,000     50,000
2        1002    AB-111  60     500       30,000
3        1001    ZA-245  35     100       3,500
4        1003    MB-153  82     25        2,050
5        1004    ZA-245  42     10        420
6        1002    ZA-245  40     50        2,000
7        1001    AB-111  75     100       7,500

Page 29

We say that Price has a transitive dependency on ProdID and Quantity.
• This means that Price isn’t just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course what is ordered).

OrderID  CustID  ProdID  Price  Quantity  Total
1        1001    AB-111  50     1,000     50,000
2        1002    AB-111  60     500       30,000
3        1001    ZA-245  35     100       3,500
4        1003    MB-153  82     25        2,050
5        1004    ZA-245  42     10        420
6        1002    ZA-245  40     50        2,000
7        1001    AB-111  75     100       7,500

Page 30

Let’s diagram the dependencies.

We can see that all fields are dependent on OrderID, the Primary Key (white lines)

(Diagram: OrderID, CustID, ProdID, Price, Quantity, Total)

Page 31

But Total is also determined by Price and Quantity (yellow lines)
• This is a derived field (Price x Quantity = Total)
• We can save a lot of space by getting rid of it altogether and just calculating total when we need it

(Diagram: OrderID, CustID, ProdID, Price, Quantity, Total)
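
Calculating the total when it is needed can be done with a view, so the value is never stored. The sketch below uses Python's built-in sqlite3 module. The table name orders and the view name order_totals are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " OrderID INTEGER PRIMARY KEY, CustID INTEGER, ProdID TEXT,"
    " Price INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1001, "AB-111", 50, 1000),
        (2, 1002, "AB-111", 60, 500),
        (3, 1001, "ZA-245", 35, 100),
        (4, 1003, "MB-153", 82, 25),
        (5, 1004, "ZA-245", 42, 10),
        (6, 1002, "ZA-245", 40, 50),
        (7, 1001, "AB-111", 75, 100),
    ],
)

# Total is not a stored column; the view derives it each time it is queried.
conn.execute(
    "CREATE VIEW order_totals AS"
    " SELECT OrderID, Price * Quantity AS Total FROM orders"
)

totals = conn.execute(
    "SELECT OrderID, Total FROM order_totals ORDER BY OrderID"
).fetchall()
for order_id, total in totals:
    print(order_id, total)
```

Because the total is derived from Price and Quantity on every read, it can never disagree with them.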

Page 32

Price is also determined by both ProdID and Quantity rather than the primary key (red lines). This is called a transitive dependency. We must get rid of transitive dependencies to have 3NF.

(Diagram: OrderID, CustID, ProdID, Price, Quantity)

Page 33

We do this by moving the transitive dependency into a second table…

(Diagram: OrderID, CustID, ProdID, Price, Quantity)

Page 34

By splitting out the table, we can quickly adjust our price table to match a competitor, or when our suppliers’ prices change.

(Diagram, first table: OrderID, CustID, ProdID, Quantity)

(Diagram, second table: ProdID, Quantity, Price)

Page 35

The second table is our pricing list.

Think of Quantity as a range:
• AB-111: 1-100, 101-500, 501 and more
• ZA-245: 1-10, 11-50, 51 and more

The primary Key for this second table is a composite of ProdID and Quantity.

OrderID  CustID  ProdID  Quantity
1        1001    AB-111  1,000
2        1002    AB-111  500
3        1001    ZA-245  100
4        1003    MB-153  25
5        1004    ZA-245  10
6        1002    ZA-245  50
7        1001    AB-111  100

ProdID  Quantity  Price
AB-111  1         75
AB-111  101       60
AB-111  501       50
ZA-245  1         42
ZA-245  11        40
ZA-245  51        35
MB-153  1         82
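
To show how the two tables work together, here is a sketch using Python's built-in sqlite3 module. The table names orders and price_list and the helper price_for are hypothetical. Following the "range" reading above, the Quantity stored in the price list is treated as the lowest quantity at which a price applies, so the lookup picks the highest tier that does not exceed the ordered quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE price_list (
        ProdID   TEXT    NOT NULL,
        Quantity INTEGER NOT NULL,  -- assumed: lowest quantity of the range
        Price    INTEGER NOT NULL,
        PRIMARY KEY (ProdID, Quantity)  -- composite primary key
    );
    CREATE TABLE orders (
        OrderID  INTEGER PRIMARY KEY,
        CustID   INTEGER NOT NULL,
        ProdID   TEXT    NOT NULL,
        Quantity INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO price_list VALUES (?, ?, ?)",
    [
        ("AB-111", 1, 75), ("AB-111", 101, 60), ("AB-111", 501, 50),
        ("ZA-245", 1, 42), ("ZA-245", 11, 40), ("ZA-245", 51, 35),
        ("MB-153", 1, 82),
    ],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1001, "AB-111", 1000), (2, 1002, "AB-111", 500),
        (3, 1001, "ZA-245", 100), (4, 1003, "MB-153", 25),
        (5, 1004, "ZA-245", 10), (6, 1002, "ZA-245", 50),
        (7, 1001, "AB-111", 100),
    ],
)


def price_for(prod_id, quantity):
    """Return the unit price of the highest tier not exceeding `quantity`."""
    row = conn.execute(
        """
        SELECT Price FROM price_list
        WHERE ProdID = ? AND Quantity <= ?
        ORDER BY Quantity DESC
        LIMIT 1
        """,
        (prod_id, quantity),
    ).fetchone()
    return row[0] if row else None


order_rows = conn.execute(
    "SELECT OrderID, ProdID, Quantity FROM orders ORDER BY OrderID"
).fetchall()
order_prices = {
    order_id: price_for(prod_id, quantity)
    for order_id, prod_id, quantity in order_rows
}
print(order_prices)
```

The looked-up prices match the Price column of the original order table (50, 60, 35, 82, 42, 40, 75), so no information was lost by moving Price into the pricing list.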

Page 36

Congratulations! We’re now in 3NF!

We can also quickly figure out what price to offer our customers for any quantity they want.

OrderID  CustID  ProdID  Quantity
1        1001    AB-111  1,000
2        1002    AB-111  500
3        1001    ZA-245  100
4        1003    MB-153  25
5        1004    ZA-245  10
6        1002    ZA-245  50
7        1001    AB-111  100

ProdID  Quantity  Price
AB-111  1         75
AB-111  101       60
AB-111  501       50
ZA-245  1         42
ZA-245  11        40
ZA-245  51        35
MB-153  1         82

Page 37

To Summarize (again)

A database is in 3NF if:
• It is in 2NF
• It has no transitive dependencies

A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field)

We move fields with a transitive dependency to a new table and link them by a foreign key.

Page 38

Summarizing

A database is in 2NF if:
• It is in 1NF
• There is no repeating data in its tables.

Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key.

Page 39

And Finally…

A database is in 1NF if:
• All its attributes are atomic (meaning they contain only a single unit or type of data), and
• All rows have a unique primary key.

