
Wolf-Tilo Balke

Christoph Lofi

Institut für Informationssysteme

Technische Universität Braunschweig

http://www.ifis.cs.tu-bs.de

Relational Database Systems 1

• Functional Dependencies

– Definition

– Functional Closure

• Normalization

– 1-NF

– 2-NF

– 3-NF

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

13. Normalization

• We have considered all stages during the life

cycle of a database in this lecture…

– Modeling and implementation of the model

– Querying and manipulating data

– Application programming with databases

– Setting constraints and enforcing

access control


13. Introduction

• But… how do you know whether your design is

sensible?

– Trade-off: redundancy vs. data access speed

• General design: avoid redundancy wherever possible,

because redundancies often lead to inconsistent states

• Example for an exception: materialized views – expensive

to maintain, but boost read efficiency, if often used

– Normal forms can help!

• Functional dependencies measure the amount of

redundancy in the table design


• In this lecture, you learned the basics of how to

use relational databases

– How do I model data?

– How do I query data?

– What theories are behind queries?

– How do I use a DB in my application?

• But we did not tell you how all this stuff really

works!


Relational Databases 2

• That's what we will do in Relational Databases 2

– What is the architecture of a DBMS?

– How do you store data on hard disks?

– How does an index work? Why is it so fast?

– How does the DBMS evaluate a query? How can the

evaluation be optimized?

– How are transactions and the ACID principles

enforced?

– What happens to your data if your computer

explodes?


• Data structures for indexes!


• Query Optimization!

Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8


• Implementing Transactions!


[Figure labels: Scheduler, Storage Manager, Transaction Manager]

• Relational Databases 2

– Coming to your lecture hall in Summer Semester

2009

• Featuring

– Learn to build a DB yourself!

– Discover all the secrets we

skipped this semester!

– See the wonders of tuning

and optimization!


• Remember lecture 5: data in relational databases is represented using the relational model

– Relation R(A1:D1,…, An:Dn)

• A1…An are attributes

• D1…Dn are domains of the attributes

• A relational database schema consists of

– A set of relation schemas

– A set of integrity constraints

• A relational database instance (or extension) is

– A set of tuples adhering to the respective schemas and respecting all integrity constraints


13.1 Functional Dependency

• For the sake of the argument, let us assume that a database is represented by a single universal relation R(A1,…, An)

• Based on this relation, we can introduce the concept of functional dependencies

– Functional dependency is crucial to formally introduce data normalization

– In short, functional dependency can be informally described as

• “If some attribute B is dependent on another attribute A, and two tuples have the same value for the attribute A, they also have the same value for attribute B”


Definition:

• X and Y are subsets of the attributes of R

• There is a functional dependency between X and Y (denoted as X → Y), iff…

– … for any two tuples t1 and t2 within any instance of R, the following holds: if t1[X] = t2[X], then also t1[Y] = t2[Y]
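The definition can be tried out on a concrete instance. The following is a minimal Python sketch; the function name `holds` and the sample rows are illustrative assumptions, not part of the lecture. Note that a single instance can only refute a dependency, because the definition requires it to hold in every instance of R.

```python
def holds(rows, X, Y):
    """Return True if the dependency X -> Y is satisfied by `rows`.

    rows: list of dicts (one dict per tuple); X, Y: lists of attribute names.
    """
    seen = {}  # maps the X-projection of a tuple to its Y-projection
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples that agree on X but differ on Y violate X -> Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical sample instance
heroes = [
    {"hero_id": 1, "team_id": 1, "hero_name": "Thor", "team_name": "The Avengers"},
    {"hero_id": 3, "team_id": 1, "hero_name": "Iron Man", "team_name": "The Avengers"},
    {"hero_id": 2, "team_id": 2, "hero_name": "Mister Fantastic", "team_name": "Fantastic Four"},
]

print(holds(heroes, ["team_id"], ["team_name"]))  # True
print(holds(heroes, ["team_id"], ["hero_name"]))  # False
```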


• If X → Y then

– Y is functionally dependent on X

– X is called the head of the dependency, Y the body

– The values of the attributes in Y are determined by

those in X

• Note the following:

– If X represents a candidate key (i.e. value

combinations are unique within an instance) then

• X → Y for any Y

– X → Y does not imply Y → X


• Functional dependencies are properties of the

semantics of attributes

– Semantics are given by the understanding of the

domain

– The designer is responsible for identifying those

semantics

• Functional dependencies further

restrict possible schema extensions

– All extensions respecting the functional dependencies

are called legal extensions


• Let F be a set of functional dependencies on the

relation R

• Examples:

– A relation containing students

• Semantics: matrikelnummer is unique

• {matrikelnummer} → {firstName, lastName, birthdate}

[Table schema: matrikelnummer, firstName, lastName, birthdate]

– A relation containing real names and aliases of heroes

• {alias} → {realName}, iff each alias is used by exactly one hero

– A relation containing license plates and the type of the

respective car

• {areaCode, characterCode, numberCode} → {carType}

[Table schemas: (alias, realName); (areaCode, characterCode, numberCode, carType)]

• However, often not all possible functional dependencies are explicitly modeled

• Example:

– Heroes have some unique id, a realName, and an alias

– Hero teams have a unique id and a unique team leader identified by its id; a hero is the leader of only one team

– F = {{hero_id} → {realName, alias}, {team_id} → {leader_id}, {leader_id} → {hero_id}}

– However, also the following dependencies hold

• {hero_id} → {hero_id}, {team_id} → {realName}, {team_id} → {alias}, etc.

• Those dependencies can be inferred from F

[Figure labels: primary key dependency, foreign key dependency]

• Definition: The set F+ is called the closure of F and contains all dependencies which can be inferred from F

– A dependency X → Y is inferred from a set of dependencies F, iff it holds for any legal extension of R

– If X → Y can be inferred from F, this is denoted as

F ⊨ X →Y

• Given a set of dependencies F, the full closure F+

can be inferred automatically by inference rules


• Reflexive Rule (R1)

– ⊨{X} → {Y}, Y ⊆ X

• Augmentation Rule (R2)

– {{X} → {Y}} ⊨ {X,Z} → {Y,Z}

• Transitive Rule (R3)

– {{X} → {Y}, {Y} → {Z}} ⊨ {X} → {Z}

• It was shown that the rules R1-R3 are sound and complete

– W. W. Armstrong: “Dependency Structures of Data Base Relationships”. In: IFIP Congress, 1974

It's that simple!

• Projection Rule (R4)

– {{X} → {Y, Z}} ⊨ {X} → {Y}

• Union Rule (R5)

– {{X} → {Y}, {X} → {Z}} ⊨ {X} → {Y, Z}

• Pseudo-Transitive Rule (R6)

– {{X} → {Y}, {W, Y} → {Z}} ⊨ {W, X} → {Z}

• Rules R4-R6 can be concluded from R1-R3

– R1-R3 are axioms, R4-R6 are not

– They just allow for easier and faster inference
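To see why R4-R6 add no inference power, each of them can be derived in a few steps from R1-R3. The derivations below are a sketch written for this transcript; set union is written by juxtaposition (XY stands for X ∪ Y).

```latex
\begin{align*}
\textbf{R4 (projection):}\quad
  & X \to YZ \ \text{(given)}, \qquad YZ \to Y \ \text{(R1, since } Y \subseteq YZ\text{)} \\
  & \Rightarrow\ X \to Y \ \text{(R3)} \\[4pt]
\textbf{R5 (union):}\quad
  & X \to Y \ \text{(given)} \ \Rightarrow\ X \to XY \ \text{(R2, augmenting with } X\text{; } XX = X\text{)} \\
  & X \to Z \ \text{(given)} \ \Rightarrow\ XY \to YZ \ \text{(R2, augmenting with } Y\text{)} \\
  & \Rightarrow\ X \to YZ \ \text{(R3)} \\[4pt]
\textbf{R6 (pseudo-transitivity):}\quad
  & X \to Y \ \text{(given)} \ \Rightarrow\ WX \to WY \ \text{(R2, augmenting with } W\text{)} \\
  & WY \to Z \ \text{(given)} \ \Rightarrow\ WX \to Z \ \text{(R3)}
\end{align*}
```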


• Usually, database designers specify all dependencies within F, which can be extracted easily from the semantics of the domain

– All other dependencies of the closure F+ can be computed automatically

• The algorithm is based on computing the attribute closure X+ of an attribute set X under F

– X+ is the maximal set of attributes which depends on X, if all dependencies in F hold

– Use rules R1-R6 for this

– E.g., {X → X’} implies X’ ⊆ X+


• Compute X+ under F

• Perform this algorithm for all X ⊆ {Head(F)}

X+ := X;
repeat
    oldX+ := X+;
    for each dependency Y → Z in F do
        if (X+ ⊇ Y) then X+ := X+ ∪ Z;
until (X+ == oldX+)

• Example:

– F = {{hero_id} → {real_name, alias}, {team_id} → {leader_id, battle_parole} , {leader_id} → {hero_id},{hero_id, team_id} → {join_date}}

– Attribute Closures

• {hero_id}+ = {hero_id, real_name, alias}

• {team_id}+ = {team_id, leader_id, battle_parole, hero_id, real_name, alias}

• {hero_id, team_id}+ = {hero_id, team_id, join_date, real_name, alias, leader_id, battle_parole}

• etc…


• To compute the dependency closure F+, create

dependencies for all subsets of the attribute closures

– ∀ X ∈ Head(F) ∀ Y ∈ ℘(X+) : create a dependency X → Y

– E.g., {hero_id, team_id}+ = {hero_id, team_id, join_date, real_name, alias, leader_id, battle_parole}

• {hero_id, team_id} → {hero_id}

• {hero_id, team_id} → {team_id}

• …

• {hero_id, team_id} → {hero_id, team_id, join_date}

• …


• Definition:

– A set of functional dependencies F covers a set E, iff

E ⊆ F+

• Or: ∀ d ∈ E : F ⊨ d

• Definition:

– Two sets of functional dependencies E and F are

equivalent if E + = F+

• Or: E covers F and F covers E
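Both definitions can be tested without materializing F+, because F ⊨ X → Y holds exactly when Y ⊆ X+ under F. The following Python sketch relies on that fact; the helper names (`implies`, `covers`, `equivalent`) and the sample sets E, F, G are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (head, body) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for head, body in fds:
            if head <= result and not body <= result:
                result |= body
                changed = True
    return result


def implies(fds, head, body):
    """F |= head -> body  iff  body is contained in the closure of head."""
    return set(body) <= closure(head, fds)


def covers(f, e):
    """F covers E iff every dependency in E can be inferred from F."""
    return all(implies(f, head, body) for head, body in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other."""
    return covers(e, f) and covers(f, e)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
E = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
G = [({"A"}, {"C"})]

print(equivalent(E, F))            # True
print(covers(F, G), covers(G, F))  # True False
```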


• Now, we can define the minimal cover

– Informally: F minimally covers E, iff…

• F covers E

• F would no longer cover E if any dependency of F were removed or weakened

– Formally: F minimally covers E, iff…

• F covers E

• Every dependency in F has a single attribute as its body

• No dependency in F with X → A can be replaced by Y → A with Y⊂ X such that the resulting dependency set is still equivalent to F

• No dependency in F can be removed such that the resulting dependency set is still equivalent to F

– Thus, F is in canonical form and without redundancies
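The three formal conditions suggest a straightforward procedure for computing a minimal cover of a given dependency set: split the bodies, shrink the heads, then drop redundant dependencies. The Python sketch below is one common way to do this and is not taken from the lecture; the function names and the sample set F are assumptions. The result is a minimal cover, not necessarily the only one.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (head, body) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for head, body in fds:
            if head <= result and not body <= result:
                result |= body
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split bodies so each dependency has a single attribute as body.
    g = [(frozenset(h), frozenset({a})) for h, b in fds for a in sorted(b)]

    # Step 2: shrink heads. Attribute x can be dropped from X -> A
    # if A is still in the closure of X without x.
    reduced = []
    for head, body in g:
        h = set(head)
        for x in sorted(head):
            if body <= closure(h - {x}, g):
                h.discard(x)
        reduced.append((frozenset(h), body))
    result = list(dict.fromkeys(reduced))  # drop duplicates, keep order

    # Step 3: drop every dependency that follows from the remaining ones.
    for fd in list(result):
        rest = [d for d in result if d != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})]
print([(sorted(h), sorted(b)) for h, b in minimal_cover(F)])
# [(['A'], ['B']), (['B'], ['C'])]
```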

• Since ancient times, people dream of intelligent machines

– Golden robots of Hephaestus

– Archytas' wooden pigeon (400 BC)

– Leonardo da Vinci's mechanical knight (1495)

– The Turk of Wolfgang von Kempelen (1770)

– …

• In computer science, this gave birth to the field of Artificial Intelligence

Knowledge-based Systems and Deductive DB

• In the initial phase of A.I. research, people were highly motivated and full of visions

– High amount of research money available, mainly from the military (DARPA)

• In the mid seventies, the great visions died…

– A long series of failures took its toll

– The A.I. winter – funding stopped

• Change of research direction

– Do not imitate the full human brain, but find intelligent algorithms for solving particular difficult problems

– Today the basic ideas are part of the Semantic Web efforts


• Main critique – Hubert Dreyfus (UC Berkeley, USA)

– Expertise cannot readily be extracted

from human experts

– Much knowledge is not explicit, but

somehow embodied

• The brain is not simply hardware running a

program based on discrete symbolic

calculations


• In the 1980s, A.I. focused on well-defined problem domains, building the first commercially successful systems

– Knowledge-based systems or ‘expert systems’

• Idea: Create a system which can draw conclusions and thus support people in difficult decisions

– Simulate a human expert

– Main idea: extract knowledge of experts and just cheaply copy it to all places you might need it


• Expert Systems were supposed to be

especially useful in

– Medical diagnosis

• Great failure up to now

– Production and machine failure diagnosis

• Works quite well

– Financial services

• Widely used and successful


• Usually this is based on inference rules and specific problem data

– Rule: All frogs are green

– Fact: Hektor is a frog

– Implies new fact: Hektor is green

• Also, uncertainty can be supported

– Rule: Almost all birds can fly except ostriches, chickens and penguins

– Fact: Tweety is a bird

– Query: Can Tweety fly?

• Only few species are ostriches, chickens or penguins

• Tweety can fly with high probability


• MYCIN

– Developed 1970 at Stanford University, USA

– Medical expert system for treating infections

• Diagnosis of infection types and recommended antibiotics

(antibiotics names usually end with ~mycin)

– Around 600 rules (also supporting uncertainty)

– MYCIN was treated as a success by the project team

• Experiments showed good results, especially with rare infections

– … but was never used in practice

• Too clumsy

• Technological constraints


• NASA Shine

– Spacecraft Health Inference Engine

– Development started in mid 70s by NASA and JPL (Jet Propulsion Laboratory) for the Deep Space Network

• Commercially used by ViaSpace

– Multi-purpose inference system

– Detects system failures within complex mission critical machineries

– Designed to run in real-time in embedded and distributed systems


• Knowledge-based Systems and Deductive Databases

– Coming to your lecture hall in Summer Semester 2009

• Featuring

– Fun with logics

– Really clever systems

– Databases which can cure infections, repair spacecraft and drill for oil

– And of course the Semantic Web


• Functional dependencies may be used to further

specify semantic properties of a relational

schema

• We assume that a schema is given by

– Some relations, their attributes and domains

– For each relation, there is a primary key

– For each relation, there is a set

of functional dependencies


13.2 Normal Forms

• Schemas can be classified to adhere to a certain

normal form

• Part of a schema design process is to choose a

desired normal form and convert the schema into

that form

– There are 5 normal forms (1-NF to 5-NF)

• The higher the number, the stricter the properties

– Schemas which do not follow any of the normal forms

may show very anomalous behavior


• The normalization process was first introduced by E. F. Codd in 1972

– Tests whether a relational schema satisfies the

conditions of a given normal form

– If not, the schema can be modified such that the

condition is fulfilled

– Normalization increases the quality and stability of

the schema design

• Normal forms remove redundancy


• What problems may arise if you do not normalize?

– Example Scenario: A single table for storing heroes and

super teams

• Attention: This schema is not good!

hero_id | team_id | hero_name        | team_name      | join_year
1       | 1       | Thor             | The Avengers   | 1963
2       | 2       | Mister Fantastic | Fantastic Four | 1961
3       | 1       | Iron Man         | The Avengers   | 1963
4       | 1       | Hulk             | The Avengers   | 1963
5       | 1       | Captain America  | The Avengers   | 1964
6       | 2       | Invisible Girl   | Fantastic Four | 1961

• A normalized (good) schema would look like this:

hero_id | team_id | join_year
1       | 1       | 1963
2       | 2       | 1961
3       | 1       | 1963
4       | 1       | 1963
5       | 1       | 1964
6       | 2       | 1961

hero_id | hero_name
1       | Thor
2       | Mister Fantastic
3       | Iron Man
4       | Hulk
5       | Captain America
6       | Invisible Girl

team_id | team_name
1       | The Avengers
2       | Fantastic Four

• Each entity type/ relationship type has its own relation

• In each relation, there are only dependencies from the key to non-key attributes

• In case of badly designed, non-normalized

schemas, several problems occur due to data

redundancy and superfluous dependencies

– Insertion Anomalies

– Deletion Anomalies

– Modification Anomalies

– Superfluous NULL-values

– Spurious Tuples


• Insertion Anomalies

– Imagine you want to add a new hero without a

team

• In schema A (the single table shown earlier), this is not easily possible, as you have to make up a key value for the team id and fill all team-related attributes with NULL

• Within schema B (the normalized tables), this is no problem at all

Schema A:
hero_id | team_id | hero_name | team_name | join_year
7       | -1      | Spiderman | NULL      | NULL

Schema B:
hero_id | hero_name
7       | Spiderman

– You want to add a new hero to the “Fantastic Four”

• In schema A, you need to replicate the team name

attribute to avoid consistency problems

• In schema B, no replication is needed. Just add a new hero

and a new hero-team-assignment

Schema A:
hero_id | team_id | hero_name | team_name      | join_year
8       | 2       | The Thing | Fantastic Four | 1961

Schema B:
hero_id | hero_name
8       | The Thing

hero_id | team_id | join_year
8       | 2       | 1961

• Deletion Anomalies

– Similar to insert anomalies

– What happens to heroes if you delete a team?

What happens if you delete the last hero of a team?

• Remove them too?

• Introduce NULL values and fake primary keys?

• Modification Anomalies

– During modification, consistency has to be ensured

• e.g. if you change a team name, multiple tuples have to be

changed


• Spurious Tuples

– Spurious Tuples are the result of particularly poor

design

– Characteristics: Two tables have intersecting

attributes such that when applying a natural join,

invalid tuples are generated

• i.e. there are matching attributes which are not in a

foreign key - primary key combination

• Usually, a result of carelessly decomposing

larger relations


• Consider the following (inapt) relations

hero_id | team_id | real_name     | team_name
1       | 1       | Thor Odinson  | The Avengers
2       | 2       | Reed Richards | Fantastic Four
3       | 1       | Tony Stark    | The Avengers
4       | 1       | Bruce Banner  | The Avengers
5       | 1       | Steve Rogers  | The Avengers
6       | 2       | Susan Storm   | Fantastic Four

alias            | team_name
Iron Man         | The Avengers
Invisible Girl   | Fantastic Four
Thor             | The Avengers
Hulk             | The Avengers
Captain America  | The Avengers
Mister Fantastic | Fantastic Four

HeroToTeamNames

• Performing a natural join results in

hero_id | team_id | real_name     | team_name      | alias
1       | 1       | Thor Odinson  | The Avengers   | Iron Man
2       | 2       | Reed Richards | Fantastic Four | Invisible Girl
3       | 1       | Tony Stark    | The Avengers   | Iron Man
4       | 1       | Bruce Banner  | The Avengers   | Iron Man
…       | …       | …             | …              | …

Spurious tuples: the relations share the same attribute (team_name), but there is no foreign key
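The effect is easy to reproduce. The following Python sketch implements a naive natural join over lists of dicts and applies it to a shortened version of the two relations; the function name `natural_join` and the reduced sample data are assumptions for illustration.

```python
def natural_join(r, s):
    """Join two relations (lists of dicts) on all attribute names they share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in common)
    ]


heroes = [
    {"hero_id": 1, "real_name": "Thor Odinson", "team_name": "The Avengers"},
    {"hero_id": 2, "real_name": "Reed Richards", "team_name": "Fantastic Four"},
    {"hero_id": 3, "real_name": "Tony Stark", "team_name": "The Avengers"},
]
aliases = [
    {"alias": "Thor", "team_name": "The Avengers"},
    {"alias": "Iron Man", "team_name": "The Avengers"},
    {"alias": "Mister Fantastic", "team_name": "Fantastic Four"},
]

joined = natural_join(heroes, aliases)
print(len(joined))  # 5 tuples for only 3 heroes
for row in joined:
    print(row["real_name"], "/", row["alias"])
```

Because team_name is the only shared attribute and is a key of neither relation, every Avenger is paired with every Avengers alias, so two of the five result tuples (Thor Odinson / Iron Man and Tony Stark / Thor) are spurious.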

• These observations can be summarized in some

informal design guidelines

• Guideline 1:

– Design a schema such that it is easy to explain. If

possible, a relation should only represent one entity

type or relationship type

• Guideline 2:

– Design the schema such that there are no update,

delete, or insert anomalies


• Guideline 3:

– Avoid attributes in base relations which regularly have

NULL values

• Guideline 4:

– Design relations in such a way that a natural join can

be applied without creating spurious tuples

• These informal guidelines can also be formalized

– This is what normalization does!


• Database technology has eased the handling of relational data and provides efficient querying

– Typical queries

• List the names of all bookstores with more than ten thousand titles

• List the names of the customers with highest sales in the year 2007

• But what about queries with a spatial dimension?

– List all bookstores within ten miles of Hannover

– List the average amounts for purchases of customers who live in Braunschweig and its adjoining area


GIS

• A Geographical Information System (GIS) is

any information system capable of providing

geographically referenced information

– This includes integrating, editing, analyzing, sharing, and

displaying information

• For storing and querying the information a

specialized spatial database is used

– Highly optimized to store and query data related to

objects in space, including points, lines and polygons


• In 1854, John Snow depicted a cholera outbreak in London

– Points on a map represented the locations of individual cases

– The study of the distribution of cholera led to the source of the disease, a contaminated water pump in the middle of the cholera outbreak


• Geo-Information Systems

– Coming to your lecture hall in Summer Semester

2009

• Featuring

– The art of creating and storing

a map

– Finding exotic stuff in places you don't know

– Build a GPS for your car…


• To characterize the normal forms, we will need

the following concepts

– Functional dependencies

– Superkeys (attribute sets that uniquely identify each tuple, not necessarily minimal), candidate keys (minimal superkeys), primary keys, secondary keys

– Prime attributes

• Prime attributes are all those attributes which are contained in at least one candidate key

– Nonprime attributes

• All attributes which are not prime attributes


• The earliest proposed normal forms are 1-NF to 3-NF

– 1972 by Codd

– They are hierarchical

• A schema in 3-NF is also in 2-NF, a schema in 2-NF is also in 1-NF

• This is just by convention, not due to their properties

– 1-NF

• Removes multi-valued attributes

– 2-NF

• Enforces full functional dependency

– 3-NF

• Removes transitive dependencies


13.3 1-NF to 3-NF

• First Normal Form (1-NF)

– Remember lecture 5

– Restricts relations to being “flat”

• Only atomic attributes are allowed

– Multi-valued attributes must be normalized, e.g., by

A) Introducing a new relation for the multi-valued attribute

B) Replicating the tuple for each multi-value

C) If the maximum number is known, introducing a separate attribute for each multi-value

• The first solution is usually considered the best


13.3 1-NF

• Introducing a new relation for the multi-valued

attribute

– Uses old key and multi-attribute as composite key

Before:
hero_id | hero_name | powers
1       | Storm     | weather control, flight
2       | Wolverine | extreme cellular regeneration
3       | Phoenix   | omnipotence, indestructibility, limitless energy manipulation

After:
hero_id | power
1       | weather control
1       | flight
2       | extreme cellular regeneration
3       | omnipotence
3       | indestructibility
3       | limitless energy manipulation

hero_id | hero_name
1       | Storm
2       | Wolverine
3       | Phoenix
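Option A can be sketched in a few lines of Python. The function name `split_multivalued` and the list-valued `powers` field are assumptions for illustration; the point is that the new relation uses the old key together with the single value as a composite key.

```python
def split_multivalued(rows, key, attr, new_attr):
    """Move the multi-valued attribute `attr` into its own relation.

    Returns (base, detail): `base` keeps all other attributes, `detail`
    holds one (key, value) tuple per individual value.
    """
    base = [{k: v for k, v in t.items() if k != attr} for t in rows]
    detail = [{key: t[key], new_attr: v} for t in rows for v in t[attr]]
    return base, detail


heroes = [
    {"hero_id": 1, "hero_name": "Storm", "powers": ["weather control", "flight"]},
    {"hero_id": 2, "hero_name": "Wolverine", "powers": ["extreme cellular regeneration"]},
]

base, hero_powers = split_multivalued(heroes, "hero_id", "powers", "power")
print(base)
print(hero_powers)
```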

• Replicating the tuple for each multi-value

– Uses old key and multi-attribute as composite key

Before:
hero_id | hero_name | powers
1       | Storm     | weather control, flight
2       | Wolverine | extreme cellular regeneration
3       | Phoenix   | omnipotence, indestructibility, limitless energy manipulation

After:
hero_id | hero_name | powers
1       | Storm     | weather control
1       | Storm     | flight
2       | Wolverine | extreme cellular regeneration
3       | Phoenix   | omnipotence
3       | Phoenix   | indestructibility
3       | Phoenix   | limitless energy manipulation

• If the maximum-number is known, introducing an

own attribute for each multi-value

Before:
hero_id | hero_name | powers
1       | Storm     | weather control, flight
2       | Wolverine | extreme cellular regeneration
3       | Phoenix   | omnipotence, indestructibility, limitless energy manipulation

After:
hero_id | hero_name | power1                        | power2            | power3
1       | Storm     | weather control               | flight            | NULL
2       | Wolverine | extreme cellular regeneration | NULL              | NULL
3       | Phoenix   | omnipotence                   | indestructibility | limitless energy manipulation

• The Second Normal Form (2-NF)

– The second normal form is based on the concept of full functional dependencies

– A dependency X→Y is fully functional iff

• ∀ A ∈ X : (X \ {A}) ↛ Y

• i.e. there is no attribute in X such that the dependency still holds after removing it

• A dependency which is not fully functional is called a partial dependency


13.3 2-NF

• Definition: A relation schema is in 2-NF if it is in 1-NF and every non-prime attribute is fully functionally dependent on the primary key

– Non-prime attributes are those which are not part of any candidate key

• If the relation has a non-composite primary key and is in 1-NF, it is always also in 2-NF

– 2-NF is violated if there is a composite primary key and some non-prime attribute depends on only a component of it
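The 2-NF test can be automated with attribute closures: a non-prime attribute that lies in the closure of a proper subset of the key is only partially dependent on the key. The Python sketch below makes two simplifying assumptions that are not stated in the lecture: the primary key is the only candidate key, and the dependency set F for the hero/team table is the one suggested by its column names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (head, body) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for head, body in fds:
            if head <= result and not body <= result:
                result |= body
                changed = True
    return result


def partial_dependencies(attributes, key, fds):
    """List (key_part, attribute) pairs that violate 2-NF.

    Assumes `key` is the only candidate key, so every attribute
    outside `key` is non-prime.
    """
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for part in combinations(sorted(key), size):
            for a in sorted(closure(part, fds) & non_prime):
                found.append((part, a))
    return found


R = {"hero_id", "team_id", "hero_name", "team_name", "join_year"}
F = [  # assumed dependencies for the example table
    ({"hero_id"}, {"hero_name"}),
    ({"team_id"}, {"team_name"}),
    ({"hero_id", "team_id"}, {"join_year"}),
]

print(partial_dependencies(R, {"hero_id", "team_id"}, F))
# [(('hero_id',), 'hero_name'), (('team_id',), 'team_name')]
```

An empty result means the schema passes the 2-NF test under the stated assumptions; each reported pair names a key component and an attribute that should move into its own sub-relation.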


• Normalization into 2-NF is achieved by breaking the relation into sub-relations

– Sub-relations consist of the components of the primary key and all their fully functionally dependent attributes

Before:
hero_id | team_id | hero_name        | team_name      | join_year
1       | 1       | Thor             | The Avengers   | 1963
2       | 2       | Mister Fantastic | Fantastic Four | 1961
3       | 1       | Iron Man         | The Avengers   | 1963

After:
hero_id | team_id | join_year
1       | 1       | 1963
2       | 2       | 1961
3       | 1       | 1963

hero_id | hero_name
1       | Thor
2       | Mister Fantastic
3       | Iron Man

team_id | team_name
1       | The Avengers
2       | Fantastic Four

• The Third Normal Form (3-NF)

– The third normal form removes all transitive dependencies

• A dependency X→Y is transitive if there is a set Z such that Z is neither a candidate key nor a subset of any key and X→Z and Z→Y

– Definition: A relation schema is in 3-NF, if it is in 2-NF and no nonprime attribute is transitively dependent on the primary key

– Alternative Definition: A relation schema R is in 3-NF if, whenever there is a non-trivial functional dependency X→A, then X is a superkey of R or A is a prime attribute of R


13.3 3-NF


• Normalization works by breaking the relation at

the transitive dependency

Before:
hero_id | hero_name   | home_city_id | home_city | home_city_flag_symbol
11      | Professor X | 563          | New York  | Sea, Ships & Sun
12      | Wolverine   | 782          | Alberta   | Crops & Mountains
13      | Cyclops     | 112          | Anchorage | Anchor
14      | Phoenix     | 563          | New York  | Sea, Ships & Sun

After:
hero_id | hero_name   | home_city_id
11      | Professor X | 563
12      | Wolverine   | 782
13      | Cyclops     | 112
14      | Phoenix     | 563

home_city_id | home_city | home_city_flag_symbol
563          | New York  | Sea, Ships & Sun
782          | Alberta   | Crops & Mountains
112          | Anchorage | Anchor

• Also, there is a stricter version of the 3-NF

– Boyce-Codd Normal Form (BCNF) (1974)

• which was actually invented by Ian Heath 3 years before

Boyce-Codd…

– All BCNF schemas are also in 3-NF, and most 3-NF

schemas are also in BCNF

• There are some rare exceptions

– Definition:

A relation schema R is in BCNF if, whenever there is

a non-trivial functional dependency X→A,

then X is a superkey of R


13.3 BCNF

– Thus, the definition looks very similar to the

definition of 3-NF

• Difference: the exception for prime attributes is dropped, i.e., X must be a superkey even if A is a prime attribute

– A schema in 3-NF can fail to be in BCNF only if all of the following conditions hold

• There is more than one candidate key in the relation

• At least two of the candidate keys are composite keys (that is, they are not single attributes)

• These keys are not disjoint, that is, some attributes in the keys are common


• Example: Given a table with students, topics, and the respective advisors

– Let's assume the following dependencies hold:

• {student, topic} → {advisor}

• {advisor} → {topic}

• i.e., for each topic, a student has a specific advisor, and each advisor is responsible for a single specific topic


Student | Topic | Advisor
100 | Math | Gauss
100 | Physics | Einstein
101 | Math | Leibniz
102 | Math | Gauss

– Thus, we have the following candidate keys

• (student, topic) and (student, advisor)

– The relation is in 3-NF, because there are no transitive dependencies of non-prime attributes (every attribute is part of some candidate key)

• However, {advisor} → {topic} is a dependency among prime attributes whose left-hand side is not a superkey: the relation is not in BCNF

• The table still has deletion anomalies
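The difference between the two normal forms shows up when both tests are run on this example. A minimal sketch, taking the dependencies and candidate keys from the example as given (the function and variable names are illustrative):

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

SCHEMA = {"student", "topic", "advisor"}
FDS = [({"student", "topic"}, {"advisor"}), ({"advisor"}, {"topic"})]
CANDIDATE_KEYS = [{"student", "topic"}, {"student", "advisor"}]
PRIME = set().union(*CANDIDATE_KEYS)

def violations(allow_prime_rhs):
    """List dependencies X -> A where X is no superkey.

    With allow_prime_rhs=True a prime attribute A excuses the dependency
    (3-NF test); with allow_prime_rhs=False nothing does (BCNF test).
    """
    found = []
    for lhs, rhs in FDS:
        if closure(lhs, FDS) == SCHEMA:
            continue  # X is a superkey, nothing to report
        for attribute in sorted(rhs - lhs):
            if allow_prime_rhs and attribute in PRIME:
                continue
            found.append((sorted(lhs), attribute))
    return found

violations_3nf = violations(allow_prime_rhs=True)
violations_bcnf = violations(allow_prime_rhs=False)
```

Only {advisor} → {topic} is reported, and only by the BCNF test, because topic is a prime attribute.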


Student | Topic | Advisor
100 | Math | Gauss
100 | Physics | Einstein
101 | Math | Leibniz
102 | Math | Gauss

If you delete the row (101, Math, Leibniz), all information about Leibniz doing math is lost

• Normalized solution: Decompose tables

Student | Advisor
100 | Gauss
100 | Einstein
101 | Leibniz
102 | Gauss

Advisor | Topic
Gauss | Math
Einstein | Physics
Leibniz | Math
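The deletion anomaly and its repair can be replayed on the example data. A short sketch, assuming that student 101 leaves and all rows for that student are deleted:

```python
original = {
    (100, "Math", "Gauss"),
    (100, "Physics", "Einstein"),
    (101, "Math", "Leibniz"),
    (102, "Math", "Gauss"),
}

# Single table: deleting student 101 also deletes the only row mentioning Leibniz.
after_delete = {row for row in original if row[0] != 101}
known_before = {(advisor, topic) for _, topic, advisor in original}
known_after = {(advisor, topic) for _, topic, advisor in after_delete}
lost_facts = known_before - known_after

# Decomposed tables: the advisor/topic fact lives in its own relation.
student_advisor = {(student, advisor) for student, _, advisor in original}
advisor_topic = {(advisor, topic) for _, topic, advisor in original}
student_advisor_after = {row for row in student_advisor if row[0] != 101}
# advisor_topic is not touched by the deletion.
```

In the decomposed design, removing the student only touches the Student/Advisor relation, so the fact that Leibniz advises Math survives.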

• Summary 1-NF to BCNF

Normal Form | Test | Normalization
1-NF | There must be no non-atomic attribute values | Create a new relation for the attribute OR use replication OR introduce new attributes
2-NF | For composite primary keys, there must be no attributes depending on only a key component | Decompose the relation and create a new one for each partial key and its dependent attributes
3-NF | There must be no transitive dependency between a non-key attribute and the key | Decompose and set up a relation that includes those non-key attributes which depend on other non-key attributes
BCNF | There must also be no transitive dependencies among attributes of different candidate keys | Further decompose the relation


13.3 1-NF to BCNF

• The fourth NF prohibits non-trivial multivalued dependencies, unless the left-hand side of the multivalued dependency is a superkey

– There must not be two (or more) independent attributes in a 1:n relationship with the key


13.3 Higher Normal Forms

Student | Advisor | Fav. Color
100 | Gauss | green
100 | Einstein | green
101 | Leibniz | red
102 | Gauss | blue
102 | Gauss | red

Decomposition into 4-NF:

Student | Advisor
100 | Gauss
100 | Einstein
101 | Leibniz
102 | Gauss

Student | Fav. Color
100 | green
101 | red
102 | blue
102 | red
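Advisors and favorite colors are independent of each other, which is why this split is safe. As a sketch on the example data: the relation equals the join of its two projections on student, which is exactly what the multivalued dependency student ↠ advisor demands (the helper name is illustrative).

```python
rows = {
    (100, "Gauss", "green"),
    (100, "Einstein", "green"),
    (101, "Leibniz", "red"),
    (102, "Gauss", "blue"),
    (102, "Gauss", "red"),
}

def join_of_projections(relation):
    """Project onto (student, advisor) and (student, color), then join on student."""
    student_advisor = {(s, a) for s, a, _ in relation}
    student_color = {(s, c) for s, _, c in relation}
    return {
        (s, a, c)
        for s, a in student_advisor
        for s2, c in student_color
        if s == s2
    }

mvd_holds = join_of_projections(rows) == rows

# Counter-example: give student 100 a second color for only one advisor.
# The combination (100, Einstein, blue) is missing, so the dependency breaks.
broken = rows | {(100, "Gauss", "blue")}
mvd_holds_broken = join_of_projections(broken) == broken
```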

• The fifth NF simplifies relations such that the original relation can be restored using projections and joins

– Every additional split of a relation would lose information


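Original relation:

Student | Advisor | Course
100 | Gauss | Math
100 | Einstein | Physics
101 | Gauss | Physics

Projections:

Student | Advisor
100 | Gauss
100 | Einstein
101 | Gauss

Advisor | Course
Gauss | Math
Gauss | Physics
Einstein | Physics

Join of the projections (contains tuples that were not in the original relation):

Student | Advisor | Course
100 | Gauss | Math
100 | Gauss | Physics
100 | Einstein | Physics
101 | Gauss | Math
101 | Gauss | Physics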
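The information loss of such a split can be reproduced directly. A small sketch on the example data, projecting onto (student, advisor) and (advisor, course) and joining the projections again:

```python
original = {
    (100, "Gauss", "Math"),
    (100, "Einstein", "Physics"),
    (101, "Gauss", "Physics"),
}

student_advisor = {(s, a) for s, a, _ in original}
advisor_course = {(a, c) for _, a, c in original}

# Natural join of the two projections on advisor.
joined = {
    (s, a, c)
    for s, a in student_advisor
    for a2, c in advisor_course
    if a == a2
}

# Tuples produced by the join that were never stored.
spurious = joined - original
```

The join contains every original row plus two rows that were never stored, so the original relation cannot be recovered from these two projections.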

• Usually, a schema in a higher normal form is better than one in a lower normal form

– However, sometimes it is a good idea to artificially create lower-form schemas, e.g., to increase read performance

• This is called denormalization

• Denormalization usually increases query speed and decreases update efficiency due to the introduction of redundancy


13.3 Denormalization

• Often, denormalization is facilitated with materialized views

– See lecture 9

– Example: Students and average exam results are regularly needed – create a materialized view!

• Join and aggregation are expensive operations that now can be omitted

• But for every update, the materialized view may also require an update


matNr | firstName | lastName | sex
1005 | Clark | Kent | m
2832 | Louise | Lane | f

matNr | crsNr | result
1005 | 100 | 3.7
2832 | 102 | 2.0
1005 | 101 | 4.0
2832 | 100 | 1.3

student | avg result
Louise Lane | 1.65
Clark Kent | 3.85
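The trade-off can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite has no CREATE MATERIALIZED VIEW, so a snapshot table created with CREATE TABLE ... AS stands in for the materialized view, and the table name student_avg is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (matNr INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, sex TEXT);
CREATE TABLE results (matNr INTEGER, crsNr INTEGER, result REAL);
INSERT INTO students VALUES (1005, 'Clark', 'Kent', 'm'), (2832, 'Louise', 'Lane', 'f');
INSERT INTO results VALUES (1005, 100, 3.7), (2832, 102, 2.0), (1005, 101, 4.0), (2832, 100, 1.3);

-- Stand-in for a materialized view: join and aggregation are computed once.
CREATE TABLE student_avg AS
  SELECT s.firstName || ' ' || s.lastName AS student, AVG(r.result) AS avg_result
  FROM students s JOIN results r ON r.matNr = s.matNr
  GROUP BY s.matNr;
""")

query = "SELECT student, ROUND(avg_result, 2) FROM student_avg ORDER BY student"
snapshot = conn.execute(query).fetchall()

# A later update to the base table is NOT reflected until the snapshot is rebuilt.
conn.execute("INSERT INTO results VALUES (2832, 101, 1.0)")
stale = conn.execute(query).fetchall()
```

Reads from student_avg need neither join nor aggregation, but the snapshot is stale after the last insert. That refresh work is the update cost the slide refers to.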

• For business-oriented data, relational systems build a good foundation, but…

– There is a huge flood of updates in production databases

– For data analysis purposes data often has to be transformed and aggregated

– Reports have to be generated quickly to support important decisions

• Should such stress be put on top of operational database systems’ workloads?

Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 75

Data Warehousing

• Basic idea: Don't put stress on your crucial DBMSs, but use a second independent system

– Data Warehouses

• provide a unified view of business data and

• provide retrieval of data without slowing down the operational systems

• facilitate decision support system applications such as trend reports or market analysis


• A data warehouse provides a common data model for all data of interest regardless of the data's source

– Data is usually scattered over several systems in companies: sales invoices, order receipts, production data, etc.

– For reporting and analysis the data would have to be retrieved from each respective source, transformed into a common model and then aggregated

• Before loading into the warehouse all data can be cleaned

– Inconsistencies can be identified and resolved


• Global enterprise data models are optimized for efficient retrieval needed for timely decision support

– The simplest schema is the star model

– The model consists of a (few) central fact tables that are connected to multiple dimensions

– All dimensions are denormalized, with each dimension being represented by a single table

• If dimensions are normalized into several related tables with minimized redundancy, the snowflake model evolves


• Queries on data warehouses follow the paradigm of online analytical processing (OLAP)

– Exploiting the multidimensional data model of the warehouse allows for complex analytical and ad-hoc queries with a rapid execution time

– The heart of any OLAP system is an OLAP cube consisting of numeric facts called measures that are categorized by dimensions


• Data Warehousing

– Coming to your lecture hall in Summer Semester 2009

• Featuring

– New and exciting ways to store your data!

– State enormous queries!

– Mine your data in seconds!

– Learn stuff which is important for industry!


• A) What is the difference between a UDF, a trigger, and a procedure? (3 P)

– Triggers are called automatically on an event

– UDFs may be used within an SQL statement

– Stored procedures represent full SQL statements or external programs


Exercise 1 - A

• B) Could a trigger be replaced by a constraint or vice versa? (4 P)

– A constraint can be replaced by a trigger, but not all triggers can be replaced by a constraint

– Constraints can only reject actions if a condition is not met, while triggers might perform pretty much any kind of action if an event is triggered


Exercise 1 - B

• C) What types of constraints are there? (4 P)

– Static integrity constraints

• Bound to a correct DB state (e.g., data types, key constraints, value domains)

– Dynamic integrity constraints

• Transitional integrity constraints are bound to a change in the DB state (e.g., update, insert, delete)

• Temporal integrity constraints are bound to a sequence of DB states (e.g., transactions, periodical checks)


Exercise 1 - C

• D) Briefly explain Event-Condition-Action. (3 P)

– In case of a specific event (e.g., DELETE on table heroes), you check whether a specific condition is met (e.g., number of entries < 5). If the condition is met, you perform an action (e.g., reject the statement and return an error)
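The three parts map directly onto a trigger definition. A minimal sketch using SQLite through Python's sqlite3 module (SQLite trigger syntax differs from DB2; the trigger name keep_min_heroes and the table contents are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE heroes (hero_id INTEGER PRIMARY KEY, hero_name TEXT);
INSERT INTO heroes VALUES (11, 'Professor X'), (12, 'Wolverine'), (13, 'Cyclops'), (14, 'Phoenix');

CREATE TRIGGER keep_min_heroes
BEFORE DELETE ON heroes                        -- event
WHEN (SELECT COUNT(*) FROM heroes) < 5         -- condition
BEGIN
  SELECT RAISE(ABORT, 'too few heroes left');  -- action: reject the statement
END;
""")

try:
    conn.execute("DELETE FROM heroes WHERE hero_id = 14")
    outcome = "deleted"
except sqlite3.DatabaseError as exc:
    outcome = str(exc)

remaining = conn.execute("SELECT COUNT(*) FROM heroes").fetchone()[0]
```

With only four heroes stored, the condition holds, the DELETE is rejected, and all four rows remain.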


Exercise 1 - D

• E) What is the basic idea of SQL injection? Very briefly provide the techniques preventing those attacks. (5 P)

– The basic idea is to provide SQL statements to the DBMS as “user input” so that those statements are executed.

– To prevent those attacks, you should:

• Sanitize the input, i.e., restrict the input to values you expect

• Quote and escape the input

• Use strong types (cast every input to its intended type)

• Use prepared statements (parameters are passed separately from the SQL text, so they cannot inject statements)
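The effect of the last point can be seen in a few lines with Python's sqlite3 module. This is a sketch with a hypothetical students table and a made-up malicious input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (matNr INTEGER, lastName TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1005, "Kent"), (2832, "Lane")])

# Input that closes the string literal and appends an always-true condition.
user_input = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text and changes the statement.
unsafe_sql = "SELECT matNr FROM students WHERE lastName = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Prepared statement: the input is bound as a value and compared literally.
safe_sql = "SELECT matNr FROM students WHERE lastName = ?"
safe_rows = conn.execute(safe_sql, (user_input,)).fetchall()
```

The concatenated query returns every student, while the parameterized query returns nothing, because no student has that literal last name.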


Exercise 1 - E

• Provide a constraint such that it is not possible that a given student participates in an exam of one lecture more often than three times.


Exercise 2 - A


• In standard SQL, the solution would look like below

ALTER TABLE Results ADD CONSTRAINT noMoreThan3 CHECK (
  (SELECT max(counts) FROM
    (SELECT count(*) AS counts FROM Results GROUP BY matNr, crsNr) AS attempts
  ) < 4
)

– Unfortunately, this query is too complex for many DBMS (like DB2, which simply forbids subqueries in check clauses)

– In that case, a trigger must be used to solve the problem…
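One possible shape of such a trigger is sketched below. It uses SQLite syntax through Python's sqlite3 module, not DB2 syntax, and it is not the lecture's reference solution. It guards inserts only, so a complete solution would also need a matching UPDATE trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (matNr INTEGER, crsNr INTEGER, result REAL);

CREATE TRIGGER noMoreThan3
BEFORE INSERT ON Results
WHEN (SELECT COUNT(*) FROM Results
      WHERE matNr = NEW.matNr AND crsNr = NEW.crsNr) >= 3
BEGIN
  SELECT RAISE(ABORT, 'more than three attempts');
END;
""")

# Student 1005 tries the exam for course 100 four times.
accepted = 0
for grade in (5.0, 5.0, 5.0, 1.0):
    try:
        conn.execute("INSERT INTO Results VALUES (1005, 100, ?)", (grade,))
        accepted += 1
    except sqlite3.DatabaseError:
        pass  # the fourth attempt is rejected by the trigger

stored = conn.execute("SELECT COUNT(*) FROM Results").fetchone()[0]
```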

• Write a table and some triggers which perform an audit trail on Results.

– i.e., log all changes to the data (updates, inserts, deletes)

Exercise 2 - B

CREATE TABLE ResultAudit (
  date timestamp, type int,
  oldMat int, oldCrs int, oldRes double,
  newMat int, newCrs int, newRes double
)

• Updates

CREATE TRIGGER updateResults
AFTER UPDATE ON Results
REFERENCING NEW AS new OLD AS old
FOR EACH ROW
INSERT INTO ResultAudit VALUES
(current timestamp, 0, old.matNr, old.crsNr, old.result, new.matNr, new.crsNr, new.result)

• Inserts

CREATE TRIGGER insertResults
AFTER INSERT ON Results
REFERENCING NEW AS new
FOR EACH ROW
INSERT INTO ResultAudit VALUES
(current timestamp, 1, null, null, null, new.matNr, new.crsNr, new.result)

• Deletes

CREATE TRIGGER deleteResults
AFTER DELETE ON Results
REFERENCING OLD AS old
FOR EACH ROW
INSERT INTO ResultAudit VALUES
(current timestamp, 2, old.matNr, old.crsNr, old.result, null, null, null)
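To see the audit trail in action without a DB2 installation, the three triggers can be ported to SQLite and run through Python's sqlite3 module. This is a sketch under stated assumptions: SQLite wraps trigger bodies in BEGIN ... END, uses CURRENT_TIMESTAMP, and has no REFERENCING clause; the column changed_at replaces date here purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (matNr INTEGER, crsNr INTEGER, result REAL);
CREATE TABLE ResultAudit (
  changed_at TEXT, type INTEGER,
  oldMat INTEGER, oldCrs INTEGER, oldRes REAL,
  newMat INTEGER, newCrs INTEGER, newRes REAL
);

-- type 0 = update, 1 = insert, 2 = delete (same coding as in the exercise)
CREATE TRIGGER updateResults AFTER UPDATE ON Results FOR EACH ROW
BEGIN
  INSERT INTO ResultAudit VALUES (CURRENT_TIMESTAMP, 0,
    OLD.matNr, OLD.crsNr, OLD.result, NEW.matNr, NEW.crsNr, NEW.result);
END;

CREATE TRIGGER insertResults AFTER INSERT ON Results FOR EACH ROW
BEGIN
  INSERT INTO ResultAudit VALUES (CURRENT_TIMESTAMP, 1,
    NULL, NULL, NULL, NEW.matNr, NEW.crsNr, NEW.result);
END;

CREATE TRIGGER deleteResults AFTER DELETE ON Results FOR EACH ROW
BEGIN
  INSERT INTO ResultAudit VALUES (CURRENT_TIMESTAMP, 2,
    OLD.matNr, OLD.crsNr, OLD.result, NULL, NULL, NULL);
END;
""")

conn.execute("INSERT INTO Results VALUES (1005, 100, 3.7)")
conn.execute("UPDATE Results SET result = 2.3 WHERE matNr = 1005 AND crsNr = 100")
conn.execute("DELETE FROM Results WHERE matNr = 1005")

log = conn.execute(
    "SELECT type, oldRes, newRes FROM ResultAudit ORDER BY rowid"
).fetchall()
```

Each change to Results leaves exactly one row in ResultAudit, in the order insert, update, delete.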

• Write a GRANT statement providing full access rights, including the right to also grant access, to a user called 'Magneto' for the table 'students'.


Exercise 2 - C

GRANT ALL ON TABLE students TO USER Magneto WITH GRANT OPTION

