Database Modelling Week 2: Directed Reading Relational Database Design Nick Rossiter October 3,...

Database Modelling

Week 2: Directed Reading

Relational Database Design

Nick Rossiter

April 11, 2023 1

Learning Objectives

1. To consider atomic data in relations

2. To consider data types in a relation

3. To consider missing data & NULLs in relations

4. To consider Integrity Constraints

5. To consider the principles of Candidate Keys

6. To consider Candidate Keys in SQL


The Information Principle

• A relation contains only data values, i.e. facts.

• Each attribute-in-a-tuple contains just a single data value.

• In particular there are no pointers or OIDs.

– Pointers point to variables (because they reference the storage location of the variable) and thus are different in nature to data values.

• The reason for this principle is to keep relations simple.

– Following work by E. F. Codd (1970).

Efficiency and Human Productivity

• Handling of relations is a simple abstraction:

– of what is going on at the physical level of a computer

– queries that used to take many hours to write for earlier types of DB can be written in a few minutes in a relational language.

• It would be foolish to throw away this productivity.

– Use computer power to increase human productivity.

Use of Pointers

• Pointers are often advocated as a means to make DBs more efficient.

• But their place is in the implementation of the relational model that the user uses, not in the model itself, where they complicate things.

• It is important to distinguish between the relational model and its implementation.

• Object-based DBMS provide pointers at the logical level.


Atomic Data

• Since it is now known that an atom consists of component parts (a nucleus of protons and neutrons with electrons orbiting round it), perhaps atom is no longer the best word to describe this concept.

• However, until the 20th century an atom was regarded as the smallest, indivisible particle of matter possible.

– The tradition of using atom/atomic to describe this concept has remained.

Definition of Atomic Value

• Definition: An atomic value is a single, indivisible value, not a composite value or a collection of values.

• The data in one attribute of one tuple must be atomic.


Example of Atomic Data

ENo   EName      M-S   Sal
E3    Smith      S     12,500
E5    Mitchell   M     21,000
E1    Robson     D     32,500
E6    Blake      M     54,000
E8    Jones      W     68,000

Example: In this EMPLOYEE relation there is only one value in each attribute-in-a-tuple.

Atomicity Requirement

• Maintains the inherent simplicity of relations, and is of great practical benefit.

• Occasionally it is not obvious whether a value is atomic or not.

• For example, is a date atomic?

– On the one hand, it represents a particular day in the calendar, and thus should be atomic.

– On the other hand, it has 3 components, day-of-the-month, month and year.

• Another example is a person’s full name

– should this be atomic?

– or split into its component parts, (set of) forename(s) and surname?

Designer’s choice

• The DB designer must decide

– based on what precisely the data means in the context of the database and how it will be used.

• Typically a date will be regarded as atomic, through a date type predefined in the SQL database standard.

• Names are more diverse and are usually handled through definitions to suit the application.


Non-Atomic Data

Part      Component                     Quantity
Bicycle   Frame, Wheel                  1, 2
Frame     A-frame, Handlebars, Saddle   1, 1, 1
Wheel     Rim, Axle, Spoke              1, 1, 40

The attributes ‘Component’ and ‘Quantity’ contain non-atomic data (also sometimes called repeating data).

The attribute ‘Part’ does contain atomic data.

Example: A “relation” containing non-atomic data.

Multiple Atomic Values

• The attributes ‘Component’ and ‘Quantity’ clearly contain multiple atomic values.

• Although not shown in the above example, it is possible that some tuples could contain single values of ‘Component’ and ‘Quantity’.

• The traditional prohibition against repeating values has weakened in the last few years,

– allowing relations and other “container types” such as arrays, records, etc. to be stored as the single atomic value in an attribute.

Data Types

• Definition: A data type has a set of permissible values.

• Each attribute value must be drawn from the set of permissible values of the data type specified for that attribute.

• More than one attribute in a relation may draw its values from the same data type.


Example of Type Assignment

[Figure: a relation with attributes A, B, C and D, each drawing its values from a data type: Type A, Type B, and Type C-D, a single type shared by attributes C and D.]

Every attribute must have a data type, just as every variable in a program must have a type.

Example of Using Data Types

Attribute   Data type                       Top tuple   Bottom tuple   Comment on bottom tuple
ENo         Text(2)                         E3          E51            Too long, 3 characters.
EName       any Text                        Smith       X47/35         Any text. OK!
M-S         { ‘S’, ‘M’, ‘W’, ‘D’ }          S           A              Wrong letter, not S, M, W or D.
Sal         { No > 999 AND No < 100,000 }   12,500      500            Too small, less than 999.

Comments on Example

• All the attribute values in the top tuple are drawn from their respective data types.

– The tuple would not be acceptable otherwise.

• Only 1 out of 4 values in the bottom tuple is drawn from its data type, so that tuple cannot be stored in the relation.

• If even one attribute has a value that is not drawn from its type’s set of permissible values

– that tuple cannot be in the relation.
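The check applied to the two example tuples can be sketched in Python. This is only an illustration under assumptions: the names TYPES and violations are hypothetical and not part of any DBMS, and each data type is modelled simply as a membership test on its set of permissible values.

```python
# Each attribute's data type is modelled as a predicate that is True
# exactly when a value belongs to the type's set of permissible values.
TYPES = {
    "ENo": lambda v: isinstance(v, str) and len(v) <= 2,        # Text(2)
    "EName": lambda v: isinstance(v, str),                      # any Text
    "M-S": lambda v: v in {"S", "M", "W", "D"},                 # enumerated set
    "Sal": lambda v: isinstance(v, int) and 999 < v < 100_000,  # rule-based set
}


def violations(row):
    """Return the names of the attributes whose value is not in its type."""
    return [name for name, is_member in TYPES.items() if not is_member(row[name])]


top = {"ENo": "E3", "EName": "Smith", "M-S": "S", "Sal": 12_500}
bottom = {"ENo": "E51", "EName": "X47/35", "M-S": "A", "Sal": 500}

print(violations(top))     # empty list: the tuple can be stored
print(violations(bottom))  # three attributes fail, so the tuple is rejected
```

A single entry in the returned list is enough to keep the tuple out of the relation.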


Data Types -- Operators

• Definition: A Data Type has a set of permissible operators.

• Examples:

– Number: +, -, /, x, <, > apply to all possible numbers

– Text: length( ), sub( ), concat( ) apply to all possible texts

Data Type Implementation

• A data type needs logical representations for its values and operators.

– These are what the user uses.

– They become part of the logical model.

• A data type needs physical representations for its values and executable code for each of its operators.

– These form the implementation of the logical model.

• We need Strong Typing.

• Object classes as described in traditional object-oriented programming meet precisely these criteria.

• Thus object classes are data types.

Built-in Data Types

• Commonly needed data types are Built-In, i.e. provided with the DBMS.

• Examples: Number, Text (and usually Date).

• Text appears as char, varchar in SQL.

• Also have data types for multimedia (blob, clob in SQL).

• Can also add new data types (later in module).

Underlying/Specific Data Types

• When specifying an attribute’s type, there are typically 2 stages:

– Specify the general kind of data required, e.g. numbers, text, dates.

• This is the Underlying Type.

– Specify that subset of it that the attribute values must be limited to, e.g. only the integer numbers 1 .. 1,000.

• This is the Specific Type.

• It uses the same operators as the underlying type.

Example of Specific Type

Underlying Type

Specific Type

Note that data types have sets of values that are permitted to be used.

Specific Type allows a subset of those in Underlying Type

Ways to Define a Mathematical Set

• Set enumeration

– the values in the set are individually specified;

– useful for comparatively small sets of values.

• Set comprehension

– a constraint or rule applied to any potential member of the set to see if it is in fact a member of the set.

• Both these methods can be used to constrain the underlying type to get the specific type.

CHECK constraint

• Some underlying types have parameters

– which can be used to constrain them to the specific type.

• SQL has a Check constraint option, which allows

– the enumeration of the specific type within the underlying type

– and more ad hoc constraints on the underlying type.

SQL Data Types

Examples: In Oracle SQL :-

Char(x) = text of exactly x bytes, i.e. x characters

Varchar2(x) = variable length text of up to x characters/bytes maximum

Integer = Integer number of standard precision

Number = Floating point number

Floating point is a physical storage format.

Bytes are a unit of physical storage.

SQL data types are often based on their physical storage.

Explanation of Types

• The “2” in Varchar2(x) indicates that it is Oracle’s second version of the variable length text type.

– It has nothing to do with the number of characters in the text.

– It is the standard Oracle variable length text type now.

• The text type is often known as the character string type.

• SQL attributes/columns are assigned their types when the relation/table is created with the Create Table statement.

• The SQL Create Table statement is the statement used to specify all aspects of a table.

– The Alter Table statement can be used to add, delete and modify any aspect of a table already created with a Create Table.

SQL : Example

Create the EMPLOYEE relation :-

Create Table EMPLOYEE (
    ENo   Char(2),
    EName Varchar2(30),
    "M-S" Char(1) Check( "M-S" In ('S', 'M', 'W', 'D') ),
    Sal   Number Check( Sal > 999 AND Sal < 100000 )
) ;

The name M-S contains a hyphen, so in SQL it has to be written as the quoted identifier "M-S".

Char(2), Varchar2(30), Char(1): parameters used to constrain the underlying type to the desired specific type.

Number: underlying domain.

Check on M-S: set enumeration applied via CHECK.

Check on Sal: set comprehension applied via CHECK.

The SQL Create Table statement creates a table type with the required columns of specified names and specific types (which will use the operators of the underlying types), and creates the table variable EMPLOYEE of that type.
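As a rough, runnable illustration of how the two CHECK constraints behave, the sketch below uses SQLite through Python's built-in sqlite3 module. This is an assumption made for convenience and is not Oracle: Varchar2 and Number are replaced by TEXT and INTEGER, SQLite does not enforce the declared length of CHAR(2), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE EMPLOYEE (
        ENo   CHAR(2),
        EName TEXT,
        "M-S" CHAR(1) CHECK ("M-S" IN ('S', 'M', 'W', 'D')),  -- set enumeration
        Sal   INTEGER CHECK (Sal > 999 AND Sal < 100000)      -- set comprehension
    )
""")

# Every value is drawn from its specific type, so the row is accepted.
conn.execute("INSERT INTO EMPLOYEE VALUES ('E3', 'Smith', 'S', 12500)")

# 'A' is not in the enumerated set, so the whole row is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('E5', 'Mitchell', 'A', 21000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(rejected, row_count)
```

Only the first row ends up in the table; the failed insert leaves the relation unchanged.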

Missing Data

• In the real world, data is not always available to put in the DB.

• Reasons for this include the value being:

– unknown

– not available

– not applicable

– not yet known

– undefined

– to be announced

– does not exist

– not supplied, etc.

• One paper recorded 42 different reasons!

Solutions to Missing Data

• Use a special value to represent missing data.

Example: ‘N/A’, ‘T.B.A.’

It gives the reason why the data is missing.

• The special value must have the same type as the data that is missing, so it can be stored with the data that is known.

– So cannot use ‘N/A’ in a numeric field.

Use of NULL

NULL is the absence of a value.

NULL does NOT equal 0.

NULL does NOT equal ‘ ’.

NULL is not a member of any type, because there is no value.

Null requires special support from the DBMS. SQL DBMSs provide this support.

But which reason does it represent?

In reality, no-one has yet come up with a good way of handling missing data that is generally accepted. These options are the main ones used in practice, but there is considerable controversy about them.

NULL terminology

• We might say:

– “The attribute ‘Sal’ in a tuple has a null value.”

• This is wrong! Null is the absence of a value, so it can’t be a value.

– “The attribute ‘Sal’ in a tuple contains a null.”

• This is better, but not very precise.

– “The attribute ‘Sal’ in a tuple is null.”

• This is accurately stated.

• ‘Sal’ is null does not mean that there is no such value or that the salary is £0.

2-Valued Logic

• Normal Boolean logic only has the truth values true and false.

• The result (or value) of any comparison will be true or false.

• Thus, the Boolean logic operators AND, OR and NOT can be used to combine comparisons, forming an expression.

– Example:- ( X = 3 AND Y < X ) OR NOT ( Y > 0 )

• The value of this expression will be true or false, and can be derived if the values of X and Y are known.
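To make the example expression concrete, here is a minimal Python sketch that evaluates it for a few pairs of values. The function name expression and the chosen values of X and Y are illustrative assumptions only.

```python
def expression(x, y):
    """Evaluate ( X = 3 AND Y < X ) OR NOT ( Y > 0 ) in 2-valued logic."""
    return (x == 3 and y < x) or not (y > 0)


first = expression(3, 1)    # X = 3 and Y < X, so the left operand of OR is true
second = expression(5, -2)  # Y is not greater than 0, so the right operand is true
third = expression(5, 2)    # neither operand of OR is true
print(first, second, third)
```

Once X and Y are known, the result is always exactly one of true or false.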


Truth Table

AND   T   F
T     T   F
F     F   F

OR    T   F
T     T   T
F     T   F

NOT
T     F
F     T

The rules which the logic operators apply to their operands to yield a Boolean result are represented by truth tables.

In the tables, T and F stand for true and false respectively.

The 3 operators yield commonsense results. AND yields false unless both its operands are true, OR yields true unless both operands are false, and NOT reverses the truth value of its operand.

3-valued Logic

• With NULL, results of queries can be maybe.

– For example, for X = Y, if both are NULL, the result is unclear.

• We still want to use logical expressions. Therefore AND, OR and NOT must be extended to cope with maybe.

• This logic offends Gödel’s theorem on decidability.


3-valued Logic Truth Tables

AND   T   M   F
T     T   M   F
M     M   M   F
F     F   F   F

OR    T   M   F
T     T   T   T
M     T   M   M
F     T   M   F

NOT
T     F
M     M
F     T

In the tables, M stands for maybe.
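The three tables can be written out as a small Python sketch. It is an illustration under assumptions: None stands for maybe, and the function names and3, or3, not3 and eq3 are hypothetical, chosen only for this example.

```python
# Truth values: True, False, and None standing for "maybe".

def and3(a, b):
    if a is False or b is False:
        return False   # one false operand is enough to decide the result
    if a is None or b is None:
        return None    # otherwise any maybe leaves the result open
    return True


def or3(a, b):
    if a is True or b is True:
        return True    # one true operand is enough to decide the result
    if a is None or b is None:
        return None
    return False


def not3(a):
    return None if a is None else not a


def eq3(x, y):
    """Comparison X = Y, where None models a NULL operand."""
    return None if x is None or y is None else x == y


print(and3(True, None), or3(True, None), not3(None), eq3(None, None))
```

The last call shows the case from the 3-valued logic slide: comparing two NULLs gives maybe, not true.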


SQL : Not NULL Constraint

Example :-

Create Table EMPLOYEE (
    EmpNo Char(2),
    EName Varchar2(30) NOT NULL,
    "M-S" Char(1),
    Sal   Number
) ;

A name must always be provided.

Sometimes we want to prevent a column from ever holding NULLs.

To achieve this in SQL, add a “Not NULL” constraint to the definition of the column.

Advantages of NOT NULL

• Because 3-valued logic is more complex than 2-valued

– it sometimes gives unexpected results.

– It is desirable to avoid having missing data if at all possible.

– It would be better to stick to 2-valued logic.

• Thus we often specify a “not NULL” constraint for a column to ensure that it always contains a data value in every row.
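A small sketch of both points, the constraint itself and the kind of unexpected result that NULLs cause, again using SQLite through Python's sqlite3 module as a stand-in for Oracle. That substitution, the TEXT and INTEGER types, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EmpNo CHAR(2),
        EName TEXT NOT NULL,   -- a name must always be provided
        "M-S" CHAR(1),
        Sal   INTEGER
    )
""")

# Sal may be missing, because that column has no NOT NULL constraint.
conn.execute("INSERT INTO EMPLOYEE VALUES ('E3', 'Smith', 'S', NULL)")

# A missing name violates the NOT NULL constraint.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('E5', NULL, 'M', 21000)")
    name_rejected = False
except sqlite3.IntegrityError:
    name_rejected = True

# With Sal null, neither a condition nor its negation selects the row.
high = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE Sal > 1000").fetchone()[0]
not_high = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE NOT (Sal > 1000)").fetchone()[0]
print(name_rejected, high, not_high)
```

Smith is returned by neither query, because both conditions evaluate to maybe rather than true.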


Integrity Constraints

Definition: An Integrity Constraint is a constraint on the values that a given DB relation is permitted to hold.


Purpose

• To try to ensure that the relations in the DB only hold data that is true, accurate and up-to-date.

• A constraint is a validation check that the DBMS automatically applies when a relation’s value is altered.

• The requirement that a data type be assigned to every attribute is itself an integrity constraint.

• However, more integrity constraints are possible.


Applying Constraints in SQL

Create Table EMPLOYEE ( );

Constraints are inserted between the parentheses, with the other definitions.

In SQL, integrity constraints are usually applied when the relation is created using the Create Table statement.

They can also be added later with the Alter Table statement.

Categories of Integrity Constraints

• Attribute type constraints:

– already considered.

• Candidate Key constraints:

– these apply to an individual relation.

• Referential Integrity constraints:

– these correlate two relations.

• Ad hoc constraints:

– these apply to one or more relations.

• This is not the only possible categorisation of integrity constraints

– but it is a convenient and practical one.

Candidate Keys

• There are no duplicate tuples in a relation, because it is a set of tuples. So every tuple must be unique.

• Often, indeed typically, the values of only one attribute, or a small number of attributes, in a relation are sufficient to make each tuple in it unique.

• Whether it requires one attribute, several attributes, or all the attributes in a relation to make each tuple unique, that set of 1 or more attribute(s) is called a Candidate Key.

Candidate Keys for ID

• The candidate key attribute(s) can also be considered as uniquely identifying each tuple in the relation.

• A relation may contain two or more candidate keys.

• Often a candidate key consists of just one attribute.

– For example passport number.

• Nevertheless it is quite normal for a candidate key to consist of 2 or more attributes.

– For example student number + module number in marks.

• Occasionally, in an extreme case

– all of a relation’s attributes form the candidate key.

• If there is only one key, it is still called the Candidate Key.

Properties of a Candidate Key

Example :- Relation ( A, B, C, D, E, F, G )

A proposed key that contains A and B together with other attributes is reducible if ( A, B ) alone is unique per tuple.

Uniqueness: No two distinct tuples may contain the same key value.

Irreducibility: No attribute can be removed from the set forming the key, and the remaining attributes still possess the uniqueness property.

Underlining attribute(s) is a common way of specifying a candidate key.

Benefits of these two Properties

• Candidate keys provide (the only) guaranteed way to find a particular tuple.

• Checks on the uniqueness of tuples can be limited to the candidate key attribute(s), giving greater efficiency.
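The two properties can be tested mechanically against sample data, as in the Python sketch below. Everything in it is an assumption for illustration: the function names, and the made-up marks rows based on the student number + module number example. Note that a test on sample rows can only show that a set of attributes is not a key; whether it really is a key depends on what the data means, not on one snapshot.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """Uniqueness plus irreducibility, tested against the given rows only."""
    if not is_unique(rows, attrs):
        return False
    # Irreducible: no smaller, non-empty subset of attrs is itself unique.
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical sample rows for a marks relation.
marks = [
    {"StudentNo": "S1", "ModuleNo": "M1", "Mark": 60},
    {"StudentNo": "S1", "ModuleNo": "M2", "Mark": 60},
    {"StudentNo": "S2", "ModuleNo": "M1", "Mark": 60},
    {"StudentNo": "S2", "ModuleNo": "M2", "Mark": 75},
]

print(is_candidate_key(marks, ["StudentNo"]))                      # not unique
print(is_candidate_key(marks, ["StudentNo", "ModuleNo"]))          # unique and irreducible
print(is_candidate_key(marks, ["StudentNo", "ModuleNo", "Mark"]))  # unique but reducible
```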


Candidate Keys: Example (1)

Example:- EMPLOYEE ( ENo, EName, M-S, Sal )

The relation has just one candidate key, ‘ENo’.

Example: CAR holds details of employees’ cars that are entitled to park in the company’s car park.

CAR ( RegNo, Owner, Type )

RegNo = registration number of car, Owner = car owner, identified by an employee number, Type = type of car.

Each car could equally well be identified by its ‘RegNo’ or its ‘Owner’. So they each individually form a candidate key.

Rationale for Candidate Keys

• Relation EMPLOYEE:

– Employee names are not guaranteed to be unique.

– Many employees may be on the same salary.

– So none of the attributes apart from ‘ENo’, either on their own or taken in combination, is sufficient to form a candidate key for EMPLOYEE.

• Relation CAR:

– A car’s registration number is guaranteed to be unique, and so can be a candidate key.

– As ‘Owner’ is an employee number which the company will ensure is unique, that can also be a candidate key.

Candidate Keys : Example (2)

CAR ( RegNo, ENo, Type )

One 2-attribute key, indicated by both attributes having the same level of underlining.

Re-consider CAR. Let us change the assumptions. The company now has a “share and park” scheme whereby a group of employees can share a car to work; a group may use several of the members’ cars.

Now neither ‘RegNo’ nor ‘ENo’ on its own is sufficient to identify a tuple in CAR.

However, both together will identify any tuple. Therefore, they jointly become the only candidate key.

2-attribute candidate key

• The ‘ENo’ attribute need not represent the owner of the car in question;

– s/he need not even own a car.

• We simply need to know which cars each employee may come to work in, or alternatively which employees may be allowed to come to work in each car.

• It is essential to be able to distinguish:

– two 1-attribute candidate keys and

– one 2-attribute candidate key.

Candidate Keys in SQL

SQL has Primary and Alternate Keys.

If there is only one candidate key, it becomes the primary key; there are no alternate keys.

If there is more than one candidate key, choose one as the primary key; the rest become the alternate keys.

Making the choice

• Any candidate key can become the primary key.

• So choose one that makes the most practical sense.

– Usually the shortest: easiest for the user, most efficient for the computer.

• While SQL defines a Primary Key using the phrase Primary Key, it defines an Alternate Key using the word Unique!

• SQL does not make specifying at least one candidate key mandatory!

Define a Primary Key

• SQL does not insist on a key, so bags (tables containing duplicate rows) can occur.

• To insist on a set

– ensure every SQL table has at least a primary key.
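The difference between a bag and a set can be seen in a short sketch, using SQLite through Python's sqlite3 module as an assumed stand-in for Oracle. The table names NO_KEY and WITH_KEY and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NO_KEY (ENo CHAR(2), EName TEXT)")
conn.execute("CREATE TABLE WITH_KEY (ENo CHAR(2) PRIMARY KEY, EName TEXT)")

# Store the same row once in each table.
for table in ("NO_KEY", "WITH_KEY"):
    conn.execute(f"INSERT INTO {table} VALUES ('E3', 'Smith')")

# Without a key the identical row is accepted again: the table is now a bag.
conn.execute("INSERT INTO NO_KEY VALUES ('E3', 'Smith')")

# With a primary key the duplicate is refused: the table stays a set.
try:
    conn.execute("INSERT INTO WITH_KEY VALUES ('E3', 'Smith')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

bag_size = conn.execute("SELECT COUNT(*) FROM NO_KEY").fetchone()[0]
set_size = conn.execute("SELECT COUNT(*) FROM WITH_KEY").fetchone()[0]
print(duplicate_rejected, bag_size, set_size)
```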


SQL Key Requirements

• SQL requires that a primary key never be null.

– However, it will let an alternate key be null, thereby permitting duplicate null keys, a contradiction in terms.

– To prevent this, a “not NULL” constraint can be added to an alternate key.

• It is desirable to specify any other candidate keys that happen to exist as alternate keys.
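The behaviour of a nullable alternate key can be sketched as follows, with SQLite through Python's sqlite3 module assumed as a stand-in for Oracle. The table name CAR_STRICT and the registration numbers are made up. One SQLite-specific detail: it does not by itself forbid NULL in a non-integer primary key, so NOT NULL is spelled out on RegNo to get the behaviour the SQL standard requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CAR (
        RegNo CHAR(9) PRIMARY KEY NOT NULL,
        Owner CHAR(2) UNIQUE,            -- alternate key, NULL allowed
        Type  TEXT
    )
""")
conn.execute("""
    CREATE TABLE CAR_STRICT (
        RegNo CHAR(9) PRIMARY KEY NOT NULL,
        Owner CHAR(2) NOT NULL UNIQUE,   -- alternate key, NULL forbidden
        Type  TEXT
    )
""")

# Two rows whose alternate key is null are both accepted.
conn.execute("INSERT INTO CAR VALUES ('AB12 CDE', NULL, 'Saloon')")
conn.execute("INSERT INTO CAR VALUES ('XY34 ZZZ', NULL, 'Estate')")
null_owner_rows = conn.execute(
    "SELECT COUNT(*) FROM CAR WHERE Owner IS NULL").fetchone()[0]

# With the extra NOT NULL constraint, a null alternate key is refused.
try:
    conn.execute("INSERT INTO CAR_STRICT VALUES ('AB12 CDE', NULL, 'Saloon')")
    null_owner_rejected = False
except sqlite3.IntegrityError:
    null_owner_rejected = True

print(null_owner_rows, null_owner_rejected)
```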


Entity Integrity

• This is an additional constraint for relational DBs that allow nulls.

• Definition: Entity Integrity requires that no attribute in a primary key can ever be null.

Rationale

• Each tuple represents an entity in the real world.

• Each entity must be identifiable by definition.

• Primary keys serve as identifiers of tuples.

• Therefore a primary key can never be partly or wholly null, to ensure that it does identify each tuple.

• Note entity integrity does not apply to alternate keys

– which can be null, either wholly or in part.

Constraint Names

• Most relational DBMSs give integrity constraints a name when they store the constraint in the DB.

– Thus a candidate key constraint would receive a name.

• If the user does not supply a name when assigning the constraint, a unique default name is created for it by the DBMS.

Use of naming Constraints

• The naming of integrity constraints can be of great practical use:

– It helps users find out about what integrity constraints have been assigned to relations.

– It allows more meaningful error messages to be provided to the user if there is an attempt to break an integrity constraint.

Constraints in Oracle

• Oracle allows the user to provide integrity constraint names, and generates default names if the user does not provide them.

• Oracle default names are not very user-friendly.

– Therefore assign your own names to integrity constraints.

• Constraint names must be unique within the user’s entire DB, not just within a single relation.

Specifying Candidate Keys in SQL

• Keys (primary and alternate) can be assigned in two places:

– in the same sub-statement in which an attribute is assigned its type,

– in a separate sub-statement at the end of a Create Table statement.

• The first method is only possible if the SQL key consists of one attribute.

• The second method must be used if the key consists of two or more attributes.

Constraint Definition

• In either case, starting the assignment of an SQL key with the phrase

Constraint constraint-name

assigns a name to the key constraint as well.

• In Oracle, naming a primary or alternate key constraint is optional in both places, including when it is assigned in its own sub-statement at the end of the Create Table statement; if no name is given, Oracle generates a default name.

Examples of SQL Keys (1)

In Oracle, note that both examples are permissible. No name has been assigned to the key constraint in either example, so a default name is generated.

Create Table EMPLOYEE (
    ENo   Char(2) Primary Key,
    EName Varchar2(30),
    "M-S" Char(1) Check( "M-S" In ('S', 'M', 'W', 'D') ),
    Sal   Number Check( Sal > 999 And Sal < 100000 )
) ;

Create Table EMPLOYEE (
    ENo   Char(2),
    EName Varchar2(30),
    "M-S" Char(1) Check( "M-S" In ('S', 'M', 'W', 'D') ),
    Sal   Number Check( Sal > 999 And Sal < 100000 ),
    Primary Key( ENo )
) ;

These 2 versions of EMPLOYEE have a default constraint name.

Examples of SQL Keys (2)

Create Table EMPLOYEE (
    ENo   Char(2) Constraint PKEY Primary Key,
    EName Varchar2(30),
    "M-S" Char(1) Check( "M-S" In ('S', 'M', 'W', 'D') ),
    Sal   Number Check( Sal > 999 And Sal < 100000 )
) ;

Create Table EMPLOYEE (
    ENo   Char(2),
    EName Varchar2(30),
    "M-S" Char(1) Check( "M-S" In ('S', 'M', 'W', 'D') ),
    Sal   Number Check( Sal > 999 And Sal < 100000 ),
    Constraint PKEY Primary Key( ENo )
) ;

These 2 versions of EMPLOYEE have the constraint name “PKEY”.

In Oracle, both examples are permissible.

Examples of SQL Keys (3)

Relation CAR with two 1-attribute keys :-

Create Table CAR (
    RegNo Char(9) Constraint PKEY Primary Key,
    Owner Char(2) Constraint AKEY Unique,
    Type  Varchar2(30)
) ;

Either attribute could have been the primary or alternate key.

Relation CAR with one 2-attribute key :-

Create Table CAR (
    RegNo Char(9),
    Owner Char(2),
    Type  Varchar2(30),
    Constraint BKEY Primary Key( RegNo, Owner )
) ;

Note that, in any version of SQL, only the second method is permissible if a primary or alternate key consists of 2 or more attributes.
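The effect of the 2-attribute key can be tried out in a short sketch. It assumes SQLite through Python's sqlite3 module in place of Oracle, replaces Varchar2 with TEXT, and uses made-up registration numbers and employee numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CAR (
        RegNo CHAR(9) NOT NULL,
        Owner CHAR(2) NOT NULL,
        Type  TEXT,
        CONSTRAINT BKEY PRIMARY KEY (RegNo, Owner)
    )
""")

conn.execute("INSERT INTO CAR VALUES ('AB12 CDE', 'E1', 'Saloon')")
conn.execute("INSERT INTO CAR VALUES ('AB12 CDE', 'E2', 'Saloon')")  # same car, another employee
conn.execute("INSERT INTO CAR VALUES ('XY34 ZZZ', 'E1', 'Estate')")  # same employee, another car

# Repeating a complete (RegNo, Owner) pair breaks the key constraint.
try:
    conn.execute("INSERT INTO CAR VALUES ('AB12 CDE', 'E1', 'Saloon')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

car_rows = conn.execute("SELECT COUNT(*) FROM CAR").fetchone()[0]
print(duplicate_rejected, car_rows)
```

Neither attribute has to be unique on its own; only the combination is checked.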
