Database management. Teréz A. Várkonyi varkonyi.teri@nik.uni-obuda.hu +361 666 57 29 Antal Bejczy...

Post on 18-Jan-2016


Database management

• Teréz A. Várkonyi
• varkonyi.teri@nik.uni-obuda.hu
• +361 666 57 29
• http://uni-obuda.hu/users/varkonyi.teri
• Antal Bejczy Center for Intelligent Robotics
• 82 Kiscelli str.

Requirements

• Exercise for the semester
• 3 lab tests
• 2 theoretical tests
• Oral exam
• Moodle: elearning.uni-obuda.hu
• Teacher changes

Brief contents

• Information processing
• Database management systems
• Relational data model
• Basic terms
• Designing a database
• Normalization
• Relation decomposition

Information processing

Information processing

• Information: from which data can be derived
– Oral
– Written or printed
– Electronic

• Needs systematization

Steps of information processing

• Gathering useful facts: what kind of knowledge do we need?

• Encoding: e.g. linguistic, magnetic, electronic, etc. (useful for those who know the code)

• Recording
• Utilizing: searching, sorting, grouping, finding correspondence

Difficulties

• What if the collected data is not correct?
• Information is power – keep it secret!
– Bodyguard?
– Secret code?
– Firewall?

• Data transmission

Advantages of computers

• 19th century: Hollerith punched card
– numerical calculations
– 1890: census in the USA

• 2nd generation: compilers, bigger capacity, first real info processing applications

• 3rd generation: operating systems, bigger storage capacity, parallel computing, more opportunities

Database Management Systems

Database Management Systems

• New concept: gathering all the data and the relationships between them into one integrated database

• Answer the questions with this DB

• Giant data collection (needs storing, processing)
• Basic elements
– Entity: e.g. students, courses, etc.
– Relationships: e.g. David attends Database management

DBMS – what for?

• E.g.: registries, banks, Facebook
• Amount of data in business sector in 2012: 360 GB/person
• Data independence and effective searching
• Data integrity, safety
• Unified administration
• Concurrent access, fault tolerant process (quick reboot after crash)
• Control of replication
• Support of quick application development
• Standardization
– Methods
– Programs
– Access
– Etc.

Why learn database management?

• Variety of tasks it can solve
• Information processing: increased need
• Quantity and heterogeneity of data is growing every day
– Digital libraries, interactive videos, e-trade, sensor nets, telecommunication
• It draws on many areas of computer science
– OS, programming, theory, AI, multimedia, logic

Motivation and tool

• Number of users and tasks increases
• Need for ad hoc service
• Need for uniform service
• Course costs 2000€
• Database approach
– Every data item and relationship in one database
– Serve everyone from the same database
– Access might be limited (not to the whole DB)

Basic principle

• We start from the data we have and
• we collect every entity and their relationships
• into one integrated database.
• Every user can use this database, or a part of it, for answering questions.

Basic terms

• Data model: collection of notions describing the data
• Schema: description of a dataset with a data model
• Relational data model: the most widely used nowadays
– Relation: table
– Every relation has a schema describing the structure of the relation (attributes)

Example – relational data model

• Relational data model: columns (attributes), rows (records), tables (entities), relations between them

• Schema: Table COURSES with attributes NEPTUN, Course name and Teacher

NEPTUN      Course name                                   Teacher
NAIAB0SEND  Database management                           Teréz A. Várkonyi
NAMIK1ERNM  Kinematics and Dynamics of Industrial Robots  Péter Zentay

Data representation

• What are the boundaries of the questions the database can answer?

• Models the real world: mini world with limitations
• 3 levels:
– Conceptual model: a world described by the DB
– Implementation/representation model: a model understandable for the DBMS (structured records, tables, fields, etc.)
– Physical model: DBMS implemented on the computer (files, programs)

Structure of the DBMS

• The user has permission for the smaller part of the DB: View

Conceptual model

Implementation model

Physical model

View1 View2 View3

Example: university DB

• Conceptual model:
– Student (sid: string, name: string, age: integer, cumulative average: real)
– Subject (subid: string, sname: string, credit: integer)
– Registration (sid: string, subid: string, mark: integer, date: date)

Example: university DB

• Implementation model:
    CREATE TABLE subject (
        subid varchar(10) NOT NULL PRIMARY KEY,
        sname varchar(50) NOT NULL,
        credit int NOT NULL
    )
• Physical model: files containing unsorted data
• View: Teachers can see info about their own courses
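The implementation model and the view can be tried out with a minimal sketch using Python's built-in sqlite3 module. The `teacher` column, the view name `varkonyi_courses` and the credit values are assumptions added for illustration; they are not part of the schema on the slide.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subid   VARCHAR(10) NOT NULL PRIMARY KEY,
    sname   VARCHAR(50) NOT NULL,
    credit  INT NOT NULL,
    teacher VARCHAR(50)            -- hypothetical column, not on the slide
);
-- View: one teacher sees only their own courses.
CREATE VIEW varkonyi_courses AS
    SELECT subid, sname, credit FROM subject
    WHERE teacher = 'Teréz A. Várkonyi';
""")

# Course codes and names come from the COURSES example; credits are made up.
conn.executemany(
    "INSERT INTO subject VALUES (?, ?, ?, ?)",
    [
        ("NAIAB0SEND", "Database management", 4, "Teréz A. Várkonyi"),
        ("NAMIK1ERNM", "Kinematics and Dynamics of Industrial Robots", 4, "Péter Zentay"),
    ],
)

own_courses = conn.execute("SELECT subid, sname FROM varkonyi_courses").fetchall()
print(own_courses)
```

The view exposes only a part of the table, which is the idea behind giving each user permission for a smaller part of the DB.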

Relational data model

Relational data model

• 4 types of data model
– Hierarchical data model (data trees, 1:N relations)
– Network data model
– Relational data model
– Object-oriented data model

Relational data model

• Relation: table + constraints
• Column headers: attribute/domain
• Rows: data records/tuples
• Database: set of tables

Relationship

• 1:1 (one to one) relationship
– Person & ID number
– Husband & wife
• Rare in real world

Relationship

• 1:N (one to many) relationship:
– Mother & Children
– Owner & Cars

Person Owns Car

Relationship

• M:N (many to many) relationship
– Actor & Plays
– Teachers & Students

Actor Acts Play

Requirements

• There cannot be two identical rows or columns
• The order of the columns cannot carry information
• Superkey: set of attributes that unambiguously determines the other attributes of every row (NEPTUN + semester for students)
• Key: superkey that cannot be reduced

Keys as frame

• Basic terms: primary key (Person: ID), foreign key (Owns: Owner's ID), simple/composite key (Person: ID / Owns: Owner ID + Car's plate)
• System of keys = frame of database

OWNERS
ID  Name
1   John Doe
2   Jane Doe

CARS
Plate    Type
OMW-123  Porsche
ABC-234  Porsche
DEF-456  Ferrari

OWNS
Plate    Owner
OMW-123  1
ABC-234  1
DEF-456  2

Relational algebra

Mathematics

NOOOOOOO!

Basic terms

• Elements: $a$, $b$, $c$
• Sets: $A$, $B$, $C$
• Defining a set:
– enumeration: $A = \{a, b, c\}$, thus $a \in A$
– rules: $B = \{x \mid x \geq 100 \land x \leq 1000\}$
• Subset: $A \subset B$ if $\forall a \in A: a \in B$
• Ordered set (vector): $q = \langle a, b, c \rangle$
• Cartesian product: $A \times B$, e.g. $A = \{0, 1\}$, $B = \{a, b\}$, then $A \times B = \{\langle 0, a \rangle, \langle 0, b \rangle, \langle 1, a \rangle, \langle 1, b \rangle\}$
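These set notions map directly onto Python's built-in sets. The sketch below is only an illustration; the variable names are chosen here and the elements a and b are represented as strings.

```python
from itertools import product

# Enumeration: A = {0, 1}, B = {a, b}
A = {0, 1}
B = {"a", "b"}

# Rule: all integers x with x >= 100 and x <= 1000
rule_set = {x for x in range(2000) if 100 <= x <= 1000}

# Subset test: every element of {0} is also an element of A
is_subset = {0} <= A

# Ordered set (vector): order matters, so a tuple is used
q = ("a", "b", "c")

# Cartesian product A x B: every ordered pair <element of A, element of B>
cartesian = set(product(A, B))
print(sorted(cartesian))
```

A relation in the relational model is a subset of such a Cartesian product of attribute domains, which is why the rows of a table are called tuples.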

Attributes, dependencies, keys

• Attributes: sets $A$, $B$, $C$, $D$, $E$
• Entities ~ set of attributes: $\{A, B, C, D, E\}$
• Dependency: function $f = \{A, B\} \to \{C, D, E\}$; the "others" $\{C, D, E\}$ depend on the key $\{A, B\}$
• Key: $A$, $B$
– Simple
– Composite
• Secondary attributes: $C$, $D$, $E$

Example

• fworker = {name, institute} → {salary, room}
• Key: name, institute
• Secondary: salary, room
• fworker = {name, institute} → {salary}
• fworker = {name, institute} → {room}

Operations with dependencies

• Unify: dependencies whose left-hand sides are identical can be merged
• Composition: fworker = {name, institute} → {salary, room} is composed of
fworker = {name, institute} → {salary} & fworker = {name, institute} → {room}

Relational schema

• Cartesian product of given attributes and dependencies
• Gives the structure of the database
• Relation: the table with data that fulfills the schema
– Columns: attributes
– Rows: records

Connection of relations, foreign key

• Relation r's attributes can be extended to relation s if the attributes of the key of s ($K_s$) are attributes of r
• $K_s$ is called a foreign key in relation r if
– it is the primary key in s
– the values in r exist in s

• Relationship of s and r is 1:N

Example

OWNERS (S)
ID  Name
1   John Doe
2   Jane Doe

CARS (R)
Plate    Type     Owner
OMW-123  Porsche  1
ABC-234  Porsche  2
DEF-456  Ferrari  1
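The OWNERS/CARS example can be reproduced as a small sketch with sqlite3, assuming lower-case table and column names chosen here. The extra car 'XYZ-789' with owner 3 is a made-up row used only to show that a foreign key value must exist in the referenced table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE owners (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE cars (
    plate TEXT PRIMARY KEY,
    type  TEXT NOT NULL,
    owner INTEGER NOT NULL REFERENCES owners(id)   -- foreign key, 1:N
);
INSERT INTO owners VALUES (1, 'John Doe'), (2, 'Jane Doe');
INSERT INTO cars VALUES
    ('OMW-123', 'Porsche', 1),
    ('ABC-234', 'Porsche', 2),
    ('DEF-456', 'Ferrari', 1);
""")

# Owner 3 does not exist in owners, so this insert should be refused.
try:
    conn.execute("INSERT INTO cars VALUES ('XYZ-789', 'Trabant', 3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

cars_of_john = [
    row[0]
    for row in conn.execute("SELECT plate FROM cars WHERE owner = 1 ORDER BY plate")
]
print(rejected, cars_of_john)
```

One owner can appear in many rows of cars, while each car refers to exactly one owner, which is the 1:N relationship of s and r.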

Anomalies

• Insertion anomaly: Superkey needs too much data, some is missing, cannot be inserted. Solution: Reduce superkey to key.

• Update anomaly: a value exists in several places in the database and has to be modified in each place. Solution: Store the data in one separate table and modify it once.

• Deletion anomaly: By deleting a row, other important information is lost (different objects stored in one table – not good).

Example

• Relation = {product_code, product_name, product_description, price, supplier, supplier_address}
• Update anomaly: change in the address, modify it everywhere
• Insertion anomaly: new product without price
• Deletion anomaly: lost contact with supplier. Shall we delete the products also?

Example no. 2

• Teachers(ID, name, address, telephone, course_name, semester/hours, requirements)
• Two entity sets in one relation
• Solution: divide into relations
– Teachers(ID, name, address, telephone)
– Courses(course_name, semester/hours, requirements)
– Teach(teacher_id, course_name)

Database normalization

Steps of designing a database

• Collect the attributes to be stored!
• Write down the dependencies!
• Know your data model very well!
• To avoid anomalies, normalize!

1 NF

• Every value in every row is a single value
• Does not contain embedded tables/records
• Oracle 8 supports embedded data
• Be careful

Example

In 1 NF:
Name            Field of research
T. A. Várkonyi  Mathematics
T. A. Várkonyi  Computer science

Not in 1 NF:
Name            Field of research
T. A. Várkonyi  Mathematics, Computer science
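Bringing such a table into 1 NF means splitting every multi-valued field into separate rows. A minimal sketch, assuming the values are stored as one comma-separated string (the function name `to_first_normal_form` is chosen here for illustration):

```python
# A table that violates 1 NF: the second field holds several values at once.
non_1nf = [("T. A. Várkonyi", "Mathematics, Computer science")]


def to_first_normal_form(rows):
    """Return one row per single value of the multi-valued second field."""
    return [
        (name, field.strip())
        for name, fields in rows
        for field in fields.split(",")
    ]


in_1nf = to_first_normal_form(non_1nf)
for row in in_1nf:
    print(row)
```

After the split, every value in every row is a single value, at the price of repeating the name.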

2 NF

• 1 NF and no data in the relation depends on only a part of the key (no partial functional dependency)
• Example: Order(date, buyer_ID, product_ID, product_no, product_description, comments)
• Key: date, buyer_ID, product_ID
• product_ID → product_description
• Solution: create a new table for the products (product_ID, product_description)

3 NF

• 2 NF and no secondary attribute in the relation depends on another secondary attribute (no transitive dependency)
• Example: soft_drink(name, bottle_type, manufacturer_name, manufacturer_address)
• Key: name, bottle_type
• manufacturer_name → manufacturer_address
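The 2 NF and 3 NF conditions can be checked mechanically once the key and the dependencies are written down. The sketch below is a simplification under two assumptions: the relation has a single key, and only the explicitly listed dependencies are examined (no inference). The function names are chosen here for illustration.

```python
def partial_dependencies(key, fds):
    """Dependencies where a secondary attribute depends on a proper part of the key."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < key and rhs - key]


def transitive_dependencies(key, fds):
    """Dependencies where a secondary attribute depends on secondary attributes."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs.isdisjoint(key) and rhs - key]


# Order example: product_description depends only on product_ID.
order_key = frozenset({"date", "buyer_ID", "product_ID"})
order_fds = [(frozenset({"product_ID"}), frozenset({"product_description"}))]

# soft_drink example: manufacturer_address depends on manufacturer_name.
drink_key = frozenset({"name", "bottle_type"})
drink_fds = [(frozenset({"manufacturer_name"}), frozenset({"manufacturer_address"}))]

order_partial = partial_dependencies(order_key, order_fds)
drink_partial = partial_dependencies(drink_key, drink_fds)
drink_transitive = transitive_dependencies(drink_key, drink_fds)
print(len(order_partial), len(drink_partial), len(drink_transitive))
```

A non-empty first list means the relation is not in 2 NF; a non-empty second list means it is not in 3 NF.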

BCNF (Boyce-Codd NF)

• 3 NF and no part of a key depends on an attribute of another key or on a secondary attribute
• Example: let's assume that every teacher has only one course to teach: {teacher, year} → course
• Neptun(teacher, year, semester, course, headcount)
• Keys:
– teacher, year, semester
– course, year, semester

Teacher      Year       Semester  Course        Headcount
TA Varkonyi  2014/2015  1         Database m.   25
Zsolt Szabo  2014/2015  1         Database lab  25
TA Varkonyi  2014/2015  2         Database m.   25
Zsolt Szabo  2014/2015  2         Database lab  25
TA Varkonyi  2014/2015  3         Database m.   25

Solution

Year       Semester  Course        Headcount
2014/2015  1         Database m.   25
2014/2015  1         Database lab  25
2014/2015  2         Database m.   25
2014/2015  2         Database lab  25
2014/2015  3         Database m.   25

Teacher      Year       Course
TA Varkonyi  2014/2015  Database m.
Zsolt Szabo  2014/2015  Database lab
TA Varkonyi  2014/2015  Database m.
Zsolt Szabo  2014/2015  Database lab
TA Varkonyi  2014/2015  Database m.

Conclusions

• If a relation is in 0 NF (can be put in tables) and does not contain multi-valued fields, then it is in at least 1 NF
• If a relation is in 1 NF and does not contain partial functional dependencies, then it is in at least 2 NF
• If a relation is in 2 NF and does not contain transitive functional dependencies, then it is in at least 3 NF

3 NF vs. BCNF

• A relation in 3 NF can fail to be in BCNF only if
– there are several possible keys,
– these keys are composite, and
– there is a common attribute in the keys

Decomposition of relations

Motivation

• Decompose the original relation to several relations to avoid anomalies

• Question: does the new database describe the original model?

• Decomposition has to
– be lossless
– preserve dependencies

Finding the key

• Attribute set $K$ is a key-candidate of relation $R$ if
– the functional dependency $K \to R$ holds
– there is no proper subset of $K$ that could determine the other attributes of relation $R$.

Superkey

• Extending the key with secondary attributes
• Not minimal key
• Attribute sets that contain key-candidates

Closure of an attribute set

• To find new relationships
• Closure $X^+$ of attribute set $X$ based on functional dependency set $F$:
– Let's find a dependency $U \to V$ from $F$ so that $U \subseteq X$ but $V \not\subseteq X$. So let's extend: $X := X \cup V$
– Repeat this until there is no possibility to extend $X$.
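The closure procedure above can be sketched in a few lines of Python. Dependencies are represented as (left side, right side) pairs of frozensets; this representation and the function names are choices made here, not something prescribed by the slides. The soft_drink relation from the 3 NF slide is used as data, with the key dependency {name, bottle_type} → {manufacturer_name} written out explicitly.

```python
def closure(attrs, fds):
    """Extend attrs with the right side of every applicable dependency until stable."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Applicable: left side already inside, right side not yet fully inside.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_key_candidate(attrs, all_attrs, fds):
    """attrs determines every attribute, and no attribute of attrs can be dropped."""
    attrs = set(attrs)
    if closure(attrs, fds) != set(all_attrs):
        return False
    return all(closure(attrs - {a}, fds) != set(all_attrs) for a in attrs)


all_attrs = {"name", "bottle_type", "manufacturer_name", "manufacturer_address"}
fds = [
    (frozenset({"name", "bottle_type"}), frozenset({"manufacturer_name"})),
    (frozenset({"manufacturer_name"}), frozenset({"manufacturer_address"})),
]

key_closure = closure({"name", "bottle_type"}, fds)
manufacturer_closure = closure({"manufacturer_name"}, fds)
print(sorted(key_closure))
print(sorted(manufacturer_closure))
print(is_key_candidate({"name", "bottle_type"}, all_attrs, fds))
```

Dropping single attributes is enough for the minimality test, because the closure of a smaller set is always contained in the closure of a larger one.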

Armstrong axioms

• To find new dependencies in a relation.
• $X$, $Y$, and $Z$ are attribute sets
• A functional dependency is reflexive: If $Y \subseteq X$ then $X \to Y$ (a key defines its own attributes)
• A functional dependency is transitive: If $X \to Y$ and $Y \to Z$ then $X \to Z$
• A functional dependency is augmentive: If $X \to Y$ then $\{X, Z\} \to \{Y, Z\}$

Dependency preservation

• After decomposition, the original dependencies can be inferred from the new relations' dependencies.
• Def.: A decomposition of relation $R$ preserves dependencies with respect to dependency set $F$ if $F$ can be logically deduced from the union of the dependencies of the decomposed relations (e.g. by Armstrong axioms or closure).

Lossless decomposition

• By joining the decomposed tables, the original tables before normalization can be recreated

• 3 NF and BCNF are always lossless

BUT!

• BCNF does not always preserve dependencies

Preserve dependencies - check

Wrong example

• ,
• Decomposition of :
• Non-trivial dependency in : (transitive, see Armstrong axioms)
• Non-trivial dependency in :
• By uniting dependency sets and : .
• {} cannot be deduced!

Good example

• ,
• Decomposition of :
• Non-trivial dependency in :
• Non-trivial dependency in :
• By uniting dependency sets and : , original is gained

Example - BCNF

• (City, Street, Postal code)
– not BCNF because key attribute C (City) depends on non-key attribute P (Postal code)
• Decomposition of to BCNF: ,
• Lossless (see the proof later)
• Non-trivial dependency in :
• Non-trivial dependency in :
• First dependency is lost.
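Dependency preservation can be checked without listing the dependencies of each sub-relation: for every original dependency, the left side is repeatedly extended using only what can be derived inside one sub-relation at a time. The sketch below applies this to the City/Street/Postal code example. The dependency set {City, Street} → {Postal code}, {Postal code} → {City} and the decomposition into (Postal code, City) and (Street, Postal code) are the usual textbook form of this example and are assumed here, since the formulas on the slide are not legible.

```python
def closure(attrs, fds):
    """Attribute closure: extend attrs with every applicable dependency."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lost_dependencies(decomposition, fds):
    """Return the dependencies that cannot be deduced from the sub-relations."""
    lost = []
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for relation in decomposition:
                # Only attributes of this sub-relation may be used and gained.
                gained = closure(reachable & relation, fds) & relation
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            lost.append((lhs, rhs))
    return lost


fds = [
    (frozenset({"City", "Street"}), frozenset({"Postal code"})),
    (frozenset({"Postal code"}), frozenset({"City"})),
]
decomposition = [
    frozenset({"Postal code", "City"}),    # assumed first sub-relation
    frozenset({"Street", "Postal code"}),  # assumed second sub-relation
]

lost = lost_dependencies(decomposition, fds)
print(lost)
```

Under these assumptions only {City, Street} → {Postal code} is reported as lost, which agrees with the last bullet of the slide.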

Conclusions

• BCNF does not always preserve dependencies
• 3 NF always preserves dependencies and is always lossless
• Use 3 NF and check if BCNF preserves dependency

Lossless - check

Lossless

• By joining the decomposed tables, the original tables before normalization can be recreated
• The decomposition cannot lead to bad database structure:
– 3 NF and BCNF are always lossless, otherwise there's no reason to normalize

Example – information loss

$R$
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon

$R_1$
Model Name  Category
a11         Canon
s20         Nikon
a70         Canon

$R_2$
Price  Category
100    Canon
200    Nikon
150    Canon

Recomposition

• Rows marked with * (red lines on the slide) are not in the original relation
• How could we separate?

$R_1 \bowtie R_2$ (join)
Model Name  Price  Category
a11         100    Canon
a11         150    Canon     *
s20         200    Nikon
a70         100    Canon     *
a70         150    Canon

$R$
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
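The information loss can be reproduced with plain Python sets: project the original relation onto the two sub-relations, join them again on Category, and compare with the original. The variable names are chosen here for illustration.

```python
# Original relation R(Model Name, Price, Category)
R = {("a11", 100, "Canon"), ("s20", 200, "Nikon"), ("a70", 150, "Canon")}

# Projections: R1(Model Name, Category) and R2(Price, Category)
R1 = {(model, category) for model, _, category in R}
R2 = {(price, category) for _, price, category in R}

# Natural join on the common attribute Category
joined = {
    (model, price, category1)
    for model, category1 in R1
    for price, category2 in R2
    if category1 == category2
}

# Rows created by the join that were never in the original relation
spurious = joined - R
print(sorted(joined))
print(sorted(spurious))
```

Category does not determine either Model Name or Price, so the join cannot tell which price belonged to which Canon model.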

Check losslessness of a decomposition

• Let the decomposition of relation $R$ be $D = \{R_1, \dots, R_m\}$ and let $F$ be their dependency set. Let's create table $T$:
– Number of rows := number of relations in $D$ ($m$). 1 row/1 relation.
– Number of columns := number of attributes in the original relation.
• $T(i,k) = a(k)$, if the $k$th attribute exists in the $i$th relation
• $T(i,k) = b(i,k)$, otherwise.

Solution – cont.

• Iteration: Let's apply the elements of dependency set $F$: $X \to Y$
– In table $T$, if two rows are identical in the columns of $X$, then let's modify the columns of $Y$: for each column, if one of the (two) values is $a(k)$, then its pair has to be modified to $a(k)$. If both are $b(i,k)$, then modify one of them to be equal to its pair.
• Decision: Finally, if there is at least one row which contains only $a$ values, then the decomposition is lossless. Otherwise, it is not.

Example

Creating table T

First dependency

Second dependency

Third dependency (unnecessary)

BCNF - example

• Fproduct: {ID} → {Name, Price, VATtype}
• Forder: {OrderID} → {Address}
• Fquantities: {ID, OrderID} → {Quantity}
• FVAT: {VATtype} → {VAT %}

ID Name Price VATtype VAT % OrderID Quant. Address

Example – cont.

Initial table (only b values):
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
B(1,1)  B(1,2)  B(1,3)  B(1,4)   B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  B(2,6)   B(2,7)  B(2,8)
B(3,1)  B(3,2)  B(3,3)  B(3,4)   B(3,5)  B(3,6)   B(3,7)  B(3,8)
B(4,1)  B(4,2)  B(4,3)  B(4,4)   B(4,5)  B(4,6)   B(4,7)  B(4,8)

After marking the attributes of each relation with a values:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    B(3,2)  B(3,3)  B(3,4)   B(3,5)  A(6)     A(7)    B(3,8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

Example – cont.

• Fproduct: {ID} → {Name, Price, VATtype}

Before:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    B(3,2)  B(3,3)  B(3,4)   B(3,5)  A(6)     A(7)    B(3,8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

After:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    A(2)    A(3)    A(4)     B(3,5)  A(6)     A(7)    B(3,8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

Example – cont.

• Forder: {OrderID} → {Address}

Before:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    A(2)    A(3)    A(4)     B(3,5)  A(6)     A(7)    B(3,8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

After:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    A(2)    A(3)    A(4)     B(3,5)  A(6)     A(7)    A(8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

Example – cont.

• Fquantities: {ID, OrderID} → {Quantity}
– No two rows are identical in both ID and OrderID, so the table does not change

ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    A(2)    A(3)    A(4)     B(3,5)  A(6)     A(7)    A(8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

Example – cont.

• FVAT: {VATtype} → {VAT %}

Before:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     B(1,5)  B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    A(2)    A(3)    A(4)     B(3,5)  A(6)     A(7)    A(8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)

After:
ID      Name    Price   VATtype  VAT %   OrderID  Quant.  Addr.
A(1)    A(2)    A(3)    A(4)     A(5)    B(1,6)   B(1,7)  B(1,8)
B(2,1)  B(2,2)  B(2,3)  B(2,4)   B(2,5)  A(6)     B(2,7)  A(8)
A(1)    A(2)    A(3)    A(4)     A(5)    A(6)     A(7)    A(8)
B(4,1)  B(4,2)  B(4,3)  A(4)     A(5)    B(4,6)   B(4,7)  B(4,8)
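The table method walked through above can be sketched in Python. The four sub-relations are read off the a values in the rows of the example table, which is an inference made here rather than something stated on the slides. One deliberate difference from the wording of the slide: when two symbols are made equal, every occurrence of the replaced symbol in that column is updated, not only the one in the compared row. Indices in the code start at 0, while the slides count from 1.

```python
def is_lossless(attributes, decomposition, fds):
    """Table test for a lossless decomposition; returns (verdict, final table)."""
    col = {attr: k for k, attr in enumerate(attributes)}
    # Cell is ("a", k) if attribute k belongs to relation i, otherwise ("b", i, k).
    table = [
        [("a", k) if attr in rel else ("b", i, k) for k, attr in enumerate(attributes)]
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    # Rows must be identical in all columns of the left side.
                    if not all(table[i][col[x]] == table[j][col[x]] for x in lhs):
                        continue
                    for y in rhs:
                        k = col[y]
                        u, v = table[i][k], table[j][k]
                        if u == v:
                            continue
                        # Prefer the "a" symbol; between two "b" symbols keep v.
                        keep, drop = (u, v) if u[0] == "a" else (v, u)
                        for row in table:
                            if row[k] == drop:
                                row[k] = keep
                        changed = True
    lossless = any(all(cell[0] == "a" for cell in row) for row in table)
    return lossless, table


attributes = ["ID", "Name", "Price", "VATtype", "VAT %", "OrderID", "Quantity", "Address"]
decomposition = [
    {"ID", "Name", "Price", "VATtype"},  # product
    {"OrderID", "Address"},              # order
    {"ID", "OrderID", "Quantity"},       # quantities
    {"VATtype", "VAT %"},                # VAT
]
fds = [
    ({"ID"}, {"Name", "Price", "VATtype"}),
    ({"OrderID"}, {"Address"}),
    ({"ID", "OrderID"}, {"Quantity"}),
    ({"VATtype"}, {"VAT %"}),
]

lossless, final_table = is_lossless(attributes, decomposition, fds)
print(lossless)
```

With these inputs the third row of the final table ends up containing only a values, as in the last table of the example, so the decomposition is reported as lossless.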

Thank you for your attention!