
dbms1

Date post: 23-Nov-2014
Upload: sharath-chandra-ponugoti
© SQL Star International Ltd.

Contents

Chapter 1: Database Management System (DBMS)
• Database
• File Systems and Associated Problems
• Benefits of Database Approach
• Database Management System
• DBMS Functions
• Database System
• Functions of a Database Administrator (DBA)
• Components of DBMS
• Data Model
• Database Architecture
• Schema
• Types of Database Models

Chapter 2: Introduction to Relational Databases (RDBMS)
• Evolution of RDBMS
• What is a Relational Database?
• What is an RDBMS?
• Features of RDBMS
• Basic Relational Database Terminology
• Keys and their Use
• Referential Integrity

Chapter 3: Conceptual Design using Entity-Relationship Model
• Overview of Database Design
• E-R Modeling
• Degree of Relationship
• Cardinality
• Keys
• E-R Model example
• Constraints on E-R Model
• ISA Hierarchies
• Aggregation
• Conceptual Design using E-R Model
• Constraints beyond E-R Model

Chapter 4: Schema Refinement and Normalization
• Normalization
• Why Normalization?
• What is a Normal Form?
• Types of Normal Forms
• First Normal Form (1NF)
• Functional Dependencies
• Second Normal Form (2NF)
• Transitive Dependency
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Multivalued Dependency
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)

Chapter 5: Supertypes and Subtypes
• Supertype
• Subtype
• Inheritance
• Relationships and Subtypes
• Supertype/Subtype Notation
• Generalization and Specialization
• Constraints in Supertype
• Constraints in Supertype/Subtype
• Supertype/Subtype Hierarchy
• Domains
• Domain Integrity Constraints

Exercises

Chapter 1

Database Management System (DBMS)

• Database
• File Systems and Associated Problems
• Benefits of Database Approach
• Database Management System
• DBMS Functions
• Database System
• Users
• Functions of a Database Administrator (DBA)
• Components of DBMS
• Data Model
• Database Architecture
• Schema
• Types of Database Models

Objectives

In this chapter, we will discuss:

• What is a Database?

• File System Vs Database Approach

• Benefits of database approach

• What is a DBMS?

• Various functions of a DBMS

• Database system

• Role of a DBA

• Components of DBMS

• Data models and their types

• Database Architecture

• Types of Database Models


Data

Data are known facts that can be recorded and that have implicit meaning.

Database

A database is a logical collection of related data. It is designed to offer an organized mechanism for storing, managing and retrieving stored information. A ledger, a telephone directory or an address book can be called a database because they all store related data in a structured way.

Traditionally, data accessed through computers has been stored on different storage media in the form of individual files. Files proved to be quite satisfactory as long as computerization was limited to a few application areas and the use of computers restricted to a privileged few. However, as actual users grew in number, especially with the advent of online time-sharing systems, the file systems gave rise to many serious problems. The discipline of database systems evolved in response to these problems. Let us first consider what these problems are so as to understand the different features of database systems more clearly.

File Systems and the Associated Problems

Most data processing systems in existence today, especially in India, use files for storing, accessing and manipulating data. Files are typically stored on magnetic tapes and disks.

Most of the problems with files arise out of the fact that files are specific to an application, e.g., a set of files may be designed for the sales analysis system of a company.

Programs of the same application system can use these files. However, if some other new application needs source data from this system, there may be difficulties. Therefore, in many cases, new files, with considerable data in common with the existing files, may have to be designed for the new applications.

As applications proliferate, the total number of computerized files grows considerably. Also, as the number of actual users of the computer grows, the number of applications increases, which in turn increases the number of files. A large number of files gives rise to the following problems:

• Files involve a high level of redundancy in data. As we have mentioned above, proliferation of files results in the same data item being stored at many different places.

• Redundancy in data often results in inconsistency. The same data item being used by different applications may exist in different versions. What is worse, it may exist in different stages of update at different places and thus may have different values. This may ultimately result in inconsistencies amongst reports generated by the two application systems.

• Individual files are not amenable to rapid changes, especially with respect to the way the data items are structured within the file. If an application wants data from an existing file but structured differently, it cannot be provided quickly and easily. For this purpose, either conversion programs have to be written or new files have to be created.

• Because of the inflexibility of files, many ad hoc queries cannot be answered.

• Yet another consequence of the inflexibility of files is that it is usually expensive to make changes to a file system. It is also a very slow process. It may even involve modification of application programs.

• What is worse, a modification in one program may require modifications in other programs that interface with it. This process may set off a chain reaction of modifications.
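The redundancy and inconsistency problems listed above can be seen in a very small sketch. The two "files" below are plain Python dictionaries, and the customer data and the `find_inconsistencies` helper are hypothetical names made up for this illustration, not part of any real system.

```python
# Two application-specific "files", each keeping its own copy of the same customer data.
# All names and values here are hypothetical.
sales_file = {"C001": {"name": "Asha Rao", "city": "Pune"}}
billing_file = {"C001": {"name": "Asha Rao", "city": "Pune"}}

# The sales application records a change of address; the billing file is never updated.
sales_file["C001"]["city"] = "Mumbai"


def find_inconsistencies(file_a, file_b, field):
    """Return the keys present in both files whose value for `field` differs."""
    common_keys = file_a.keys() & file_b.keys()
    return sorted(k for k in common_keys if file_a[k][field] != file_b[k][field])


conflicts = find_inconsistencies(sales_file, billing_file, "city")
print(conflicts)
```

Because the same data item is stored twice, the two applications now disagree about it, which is the situation the redundancy and inconsistency bullets describe.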

The above problems give rise to further difficulties detailed below:

• The Management Information Systems (MIS) group finds it difficult to control data, especially when the actual users develop applications on their own.

• Major changes required by the system while modifying files increase the maintenance load on Data Processing (DP) professionals substantially, thus making them unavailable for development of new systems.

• High-level data redundancy entails repetitive data entry and redundant storage with the accompanying costs.

As already stated, database systems provide an effective solution to the above problems. Let us see how.

As you have just seen, files give rise to several problems because they are application-specific. Consequently, the applications become data-dependent, that is, they depend upon the organization of and access method for the data on secondary storage. This happens because, with conventional application development tools such as COBOL, the application logic incorporates knowledge of the data organization and access methods. Therefore, most changes in data organization or access methods affect the application logic substantially. If the problems arising from this are to be avoided, the data organization (and the access method) and the application logic have to be made independent of each other. Database systems do precisely this.
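Data dependence can be illustrated with a minimal sketch in which the application hard-codes the layout of a fixed-width record. The layout (a 10-character name followed by an 8-character city) and the function name `read_city_v1` are assumptions invented for this example.

```python
# Hypothetical fixed-width layout known only to the application:
# name in columns 0-9, city in columns 10-17.
def read_city_v1(record: str) -> str:
    """Extract the city using column positions baked into the application."""
    return record[10:18].strip()


old_record = "Asha Rao".ljust(10) + "Pune".ljust(8)

# The file is reorganized: a 4-character customer id is now stored in front.
new_record = "C001" + old_record

print(read_city_v1(old_record))  # correct for the old layout
print(read_city_v1(new_record))  # no longer the city: the program must be rewritten
```

The program logic did not change, yet a change in how the data is organized breaks it. This is the coupling that separating logical from physical representation is meant to remove.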

The first step towards this goal is to distinguish between data as it is actually stored (called the physical representation of data) and data as it is presented to an individual user (called the logical representation of data).

Physical Representation of Data

The smallest named unit of data physically stored in the database is known as a “stored field”, and a named collection of associated stored fields is known as a “stored or physical record”. The named collection of all occurrences of one type of physical record is known as a “stored or a physical file”. This concept will be clearer after we discuss the logical representation of data.


Logical Representation of Data

A logical field, record or file is the field, record or file as it appears to the users, that is, as it is defined in the user’s application programs. In all traditional systems, the logical and physical data are practically the same, which is the root cause of all the major problems with traditional files. This is not the case with database systems.

The structure of stored and logical records can also be different. A logical record type may be obtained by selectively combining fields from different stored records.

The logical and physical views of a file could also be different in terms of, say, the key fields for sequencing the records in the file.

With such a separation of the logical and physical data, the database can be modified and developed without affecting existing applications. The database architecture achieves this separation.

Finally, the Library DBMS must have mechanisms to handle system failure (e.g., failure of power, disk crash, etc.) so that the database can be recovered to a consistent state.
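Returning to a consistent state can be illustrated at small scale with a transaction rollback. The sketch below uses Python's standard `sqlite3` module as a stand-in for a library DBMS; the `book` table and the `issue_two_copies` function are hypothetical, and a rollback after a failed update is only a simplified analogue of recovery after a power failure or disk crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids a negative number of copies.
conn.execute(
    "CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, "
    "copies INTEGER CHECK (copies >= 0))"
)
conn.execute("INSERT INTO book VALUES (1, 'Database Basics', 1)")
conn.commit()


def issue_two_copies(connection):
    """Try to issue two copies in one transaction; only one copy exists."""
    with connection:  # commits on success, rolls back if an exception is raised
        connection.execute("UPDATE book SET copies = copies - 1 WHERE book_id = 1")
        # The second update would make copies negative and violates the CHECK.
        connection.execute("UPDATE book SET copies = copies - 1 WHERE book_id = 1")


try:
    issue_two_copies(conn)
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the first update, has been undone

copies = conn.execute("SELECT copies FROM book WHERE book_id = 1").fetchone()[0]
print(copies)
```

The first update succeeded on its own, but because the transaction as a whole failed, the database is left in the state it had before the transaction began.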

Benefits of the Database Approach

• Redundancy can be reduced - Because of the relational approach towards data organization, data need not be stored in more than one location, so repetition of information is avoided.

• Inconsistency can be avoided - With the use of a database, it is assured that all users access a true picture of the information present in the database.

• Data can be shared - Multiple users can log in to the database to access information, and each of them is granted access to the database. They can manipulate the database in a controlled environment.

• With centralized control of data, the database system may be designed for overall optimal performance from the viewpoint of the entire organization.

• Standards can be enforced - Standards can be enforced on the database to regulate access to it.

• Security restrictions can be applied - Security is the process of limiting actual access to the database server itself. It is the most important aspect of security and needs to be carefully planned.

• Integrity can be maintained - Through integrity, one can ensure that only accurate data is stored within the database.

• Data independence can be provided - Users need not know the technical aspects of the database to access it. Their access is independent of both its physical and its logical organization.

• New applications may be developed using the existing database.

Database Management System

A DBMS is a computerized record-keeping system.

Modern day Computer-based Information Systems (IS) are capable of serving a variety of complex tasks in a coordinated manner. Such systems handle large volumes of data, multiple users and several applications for activities occurring in a central and/or distributed environment.

The heart of an IS is database management, because most information systems have to handle massive amounts of data. This core module of an IS is called the Database Management System (DBMS). A DBMS provides for the storage, retrieval and updating of data in an organized manner.

A request for data is handled in the following steps:

1. The user requests a data item.

2. The DBMS intercepts and interprets the request.

3. The DBMS retrieves the data from the physical database.

4. It constructs the conceptual record using the physical/conceptual mapping.

5. It applies the relevant conceptual/external mapping to the conceptual record.

6. It derives the required external record from the conceptual record.
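The steps above can be sketched as a chain of small functions. This is only an illustration: the stored-record layout, the in-memory `physical_db` dictionary and all function names are assumptions made up for the example, and a real DBMS does far more at every step.

```python
import struct

# Hypothetical stored-record layout: 4-byte integer id, 20-byte title, 1-byte issued flag.
STORED_FORMAT = "<i20s?"

# Stand-in for the physical database: stored records kept as raw bytes.
physical_db = {1: struct.pack(STORED_FORMAT, 1, b"Database Basics", True)}


def fetch_stored(book_id):
    """Step 3: retrieve the stored record from the physical database."""
    return physical_db[book_id]


def to_conceptual(stored):
    """Step 4: apply the physical/conceptual mapping."""
    book_id, title, issued = struct.unpack(STORED_FORMAT, stored)
    return {
        "book_id": book_id,
        "title": title.rstrip(b"\x00").decode("ascii"),
        "issued": issued,
    }


def to_external(conceptual, fields):
    """Steps 5 and 6: apply the conceptual/external mapping for one user view."""
    return {name: conceptual[name] for name in fields}


def handle_request(book_id, fields):
    """Steps 1 and 2: accept the request and drive the remaining steps."""
    return to_external(to_conceptual(fetch_stored(book_id)), fields)


borrower_view = handle_request(1, ["title", "issued"])
print(borrower_view)
```

The caller only names the fields it wants. How the record is packed on storage is known to `to_conceptual` alone, so the layout could change without touching the request itself.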

An example: Consider the situation in a library. Here, we have data corresponding to books, authors, suppliers, borrowers, etc. The total volume of data stored and handled in a library may be quite large. The Library DBMS may be required to support several operations, such as the issue, return or purchase of books, and to handle queries relating to book information, borrowing information, etc. Moreover, there are different types of users who carry out various stages or activities. For instance, a borrower may merely view certain information, whereas an issuer may be allowed to update the status of a book during issue or return. The library staff may, on the other hand, add new books, their supplier, price and other information to the database. Each user category has different access rights on both the data and the processing capabilities. Multiple users may operate the Library DBMS concurrently, performing several tasks at the same time. They may even try to access the same data simultaneously. It is the job of a DBMS to handle the data and its processing in an integrated, coordinated and consistent manner.

DBMS Functions

• Data Definition - The DBMS allows us to define our own data in the simplest possible way.

• Data Manipulation - The DBMS allows us to manipulate, i.e., insert, update and delete, information.

• Data Security and Integrity - The DBMS allows us to secure the data, and users accessing it are given a true picture of the data.

• Data Recovery and Concurrency - We can always get back to a previously defined consistent state of the database in case of a crash, and multiple users can access the database at the same time.

• Data Dictionary - The DBMS maintains meta information in its dictionaries. This helps it identify the information needed to answer user queries.

• Performance - The DBMS is expected to maintain performance irrespective of the load it takes, in terms of the number of users accessing the database.

Database System

A DBMS is a complex piece of software that usually consists of a number of modules. It may be considered as an agent that allows communication between the various types of users and the physical database and operating system, without the users being aware of every detail of how it is done. To fulfill its tasks, the DBMS must maintain information about the data that is stored in the system. This information would normally include what data is stored, how it is stored, who has access to what parts of it, and so on.

The information about the data in a database is called the metadata (data about data). In addition to information listed above, some information regarding the use of a database is often collected to monitor the system's performance. This metadata helps management in maintaining an effective and efficient database system.
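As a minimal sketch of metadata in practice, the snippet below asks SQLite (through Python's standard `sqlite3` module) what it knows about a table. SQLite is used here only as a convenient example engine, and the `borrower` table is hypothetical. Other DBMS products expose their data dictionary through different catalog tables or views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to have something to describe.
conn.execute(
    "CREATE TABLE borrower (borrower_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# SQLite keeps the definition of every table in its own catalog, sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(borrower)")]

print(tables)
print(columns)
```

The catalog is itself queried like data, which is what "data about data" means in practice.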

[Figure: A database system. Labels: Users; Application Programs/Queries; Software to Process Programs/Queries; Software to Access Stored Data; Meta-Data; Stored Database.]

Users

The three broad classes of users are as follows:

• Application programmers - Responsible for writing application programs that use the database.

• End users - Interact with the system from workstations or terminals. A given end user can access the database via one of the applications, or can use an interface provided as an integral part of the database system software. Such interfaces are also supported by means of applications, but those applications are built-in rather than user-written (e.g., a query language processor).

• Database Administrator (DBA) - Creates the actual database and implements the technical controls needed to enforce various policy decisions. The DBA is also responsible for ensuring that the system operates with adequate performance and for providing a variety of other related technical services.

Functions of Database Administrator (DBA)

The database administrator is responsible for the overall planning of the company’s data resources, for the design of data, and for the day-to-day operational aspects of data management. The overall planning of corporate data is the strategic aspect of the database administration function and involves company-wide planning of existing data and assessment of organization-wide data standards.

Some of the design aspects of database administration work are:

• Deciding on the storage structures and access methods

• Selecting database software and hardware

• Designing restart and recovery procedures to take care of system outages or crashes

• Designing means of reconstructing data in the event of abnormal loss of the same

• Designing schema

• Designing the means of reorganizing or tuning databases periodically

• Designing database searching strategies

• Designing authorization checks and validation procedures

• Specifying techniques for monitoring database performance

The operations management aspect of database administration deals with data problems arising on a day-to-day basis. Specifically, the responsibilities include:

• Investigation of errors found in the data

• Supervision of restart and recovery procedures in the event of a failure

• Supervision of reorganization of databases

• Initiation and control of all periodic dumps of data.

In addition, this aspect of database administration includes maintenance of data security, which involves maintaining security authorization tables, conducting periodic security audits and investigating all known security breaches.

To carry out all these functions, it is crucial that the DBA has accurate information about the company’s data readily on hand. For this purpose the DBA maintains a data dictionary. The data dictionary contains definitions of all data items and structures, the various schemas, the relevant authorization and validation checks and the different mapping definitions. It should also have information about the source and destination of a data item and the flow of a data item as it is used by a system. This type of information is a great help to the DBA in maintaining centralized control of data.

Components of DBMS

The main components of DBMS are:

A Query Language and a Data Description Language (DDL) to provide users the access to the database.

Query processor - translates statements in a query language (or DML) into low-level instructions that the DB manager understands.

Database manager - provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

File manager - manages the allocation of space on disk storage and the data structures used to represent information stored on the disk.

The physical database – This is the data collection on drives.

The metadata - This describes the data so that the DBMS can provide users with access to it.

The above listing of DBMS components is not exhaustive. A DBMS also includes some very important components, such as the concurrency controller and the recovery manager, which have not been shown in order to keep the architecture relatively simple.

Data Model

One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. A data model is the main tool for providing this abstraction. A data model is a set of concepts that can be used to describe the structure of a database. It is a collection of high-level data description constructs that hide many low-level storage details.

Categories of Data Models

Many data models have been proposed. We can categorize data models based on the types of concepts they provide to describe the data structure.

High-level or conceptual data models: Provide concepts that are close to the way many users perceive data. They use concepts such as entities, attributes and relationships, where:

• An entity represents a real-world object (e.g., student, employee) or concept (e.g., course, company);

• An attribute represents a property that describes an object (e.g., color, name);

• A relationship represents an interaction or link among entities (e.g., works-on, is-a, has).

Low-level or physical data models: Provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users. They represent information such as record formats, record orderings and access paths (structures that make the search for particular database records efficient, i.e., indexing).

Representational or implementation data models: Between the above two extremes is a class of representational (or implementation) data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Representational data models hide some details of data storage, but can be implemented on a computer system in a direct way.

The three important characteristics of the database approach are:

(a) Insulation of programs and data (program-data and program-operation independence).

(b) Support of multiple user views.

(c) Use of a catalog to store database description.

The three-schema architecture was proposed to achieve these characteristics.

Architecture

The goal of the three-schema architecture is to separate the user applications from the physical database. The three levels of the architecture are:

Internal Level

The internal level is the one closest to the physical storage, i.e., it is the one concerned with the way data is physically stored. The internal (or physical) database is stored on secondary storage devices, mainly the magnetic disk. It itself can be conceptually viewed at different levels of abstraction.

At its lowest level, it is stored in the form of bits with the associated physical addresses on the secondary storage device.

At its highest level, it can be viewed in the form of files and simple data structures. It is this level that we shall study when we discuss the physical organization for databases in later chapters. The physical database is described by means of a physical schema or an internal schema. It essentially describes the various types of stored records, the different indexes that are employed for accessing these and the representations for different stored fields. It is also called the “storage structure definition”.

External Level


The external level is the one closest to the user, i.e., it is the one concerned with the way data is viewed by individual users. The external model (or view) is application-specific. The user therefore views the database through an external model, and there are as many external views as there are applications. External views are the proper interface between the user and the database, as an individual user can hardly be expected to be interested in the entire database. Generally, an external model consists of multiple occurrences of multiple types of external record. An example of an external record is the record of a file as defined in the data division of a COBOL program.

Each external model is defined by means of an external schema, which describes each external record type in the external model.

The external model is derived from the conceptual model. For this purpose, the correspondence between a particular external model and the conceptual model has to be defined. An external/conceptual mapping, similar to the conceptual/physical mapping, does this. However, a separate mapping has to be defined for each external view.

An explicit definition of the mapping should be documented, preferably in the corresponding external schema.

The user interacts with the database through a high-level language such as COBOL, PL/I or some special purpose language. This language is known as the host language. It includes a data sub-language (DSL). The user carries out the retrieval and storage operations on the database through the DSL.

In fact, the Database Task Group (DBTG) report, published in April 1971, contains proposals for three distinct languages, two of which relate very closely to the concept of a DSL. These are the subschema Data Description Language (DDL) and the Data Manipulation Language (DML). The subschema DDL is used for defining the external views, while the DML is used for carrying out operations on the database.

In addition to the languages, the user is also supposed to be provided with a workspace. This workspace is an area meant for receiving or transmitting all data transferred between the user and the database. This is simply the input-output area for a program.

Conceptual Level

The conceptual level is a level of indirection between the other two. The conceptual model, also called the data model, represents the information content of the database in its entirety, but is abstract with respect to the physical database. Broadly speaking, the conceptual model provides a view of the data as it really is.

This model consists of multiple occurrences of multiple types of a conceptual record. A conceptual record represents relevant information content only. In this sense, it is much closer to the external record than a stored record is.

However, it is not the same as the external record. It contains all the information needed to build relevant external records. For example, a conceptual stock record may consist of the quantity of material and the buying rate but not its value, yet the user’s external record may consist of the value of the stock. A conceptual model may consist of occurrences of such stock records, a collection of supplier record occurrences and a collection of assembly records.
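The stock example can be written out as a small sketch of an external record derived from a conceptual one. The class name `ConceptualStock`, the field names and the sample figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConceptualStock:
    """Conceptual stock record: holds quantity and buying rate, but not the value."""
    item: str
    quantity: int
    buying_rate: float


def to_external(stock: ConceptualStock) -> dict:
    """Conceptual/external mapping: the user's record shows the value of the stock."""
    return {"item": stock.item, "value": stock.quantity * stock.buying_rate}


external_record = to_external(ConceptualStock("bolt", 200, 1.5))
print(external_record)
```

The value is never stored. It is computed by the mapping each time the external record is built, which is why the conceptual record only needs to contain the information from which external records can be derived.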

Obviously, the conceptual model is derived from the physical model. For this, the database needs a conceptual/physical mapping, which specifies how conceptual records and fields map into their counterparts in the physical database. The conceptual database is described by means of a conceptual schema. Needless to say, the conceptual schema is independent of the physical characteristics of data, such as storage structures, physical sequences, stored field representations etc. Ideally the conceptual schema should include many features in addition to just the definitions of conceptual records. These may include relevant authorization checks and validation procedures, the uses of data, the source and destination of data etc.

The conceptual database is a real-world view of data from the organization point of view. As the real world changes, changes have to be made to the conceptual database and schema as well. In such a case, it is usually possible to limit the corresponding changes to only those external schemas, which use the conceptual elements that are changed. There will be many distinct external views, each consisting of a more or less abstract representation of some portion of the total database, and there will be one conceptual view, consisting of a similarly abstract representation of the database in its entirety. Likewise, there will be precisely one internal view, representing the total database as physically stored.

[Figure: Database Architecture, showing the external level (individual user views), the conceptual level (community user view), the internal level (storage view) and the database.]

An example of the three levels is shown below:

Internal View

    -- Oracle-style definition of the stored STAFF table
    CREATE TABLE STAFF
    (
        Sno       NUMBER(3),
        Fname     VARCHAR2(20),
        Lname     VARCHAR2(20),
        Age       NUMBER,
        Salary    NUMBER(8,2),
        BranchNo  NUMBER(6)
    );

Schema

A description of data in terms of a data model is called a schema. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.

The Internal View/ Schema:

The internal view (or stored database) is a low-level representation of the entire database. The internal view is defined by the internal schema, which defines the various stored record types and specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in, and so on.

The Conceptual View / Schema:

The conceptual view is a representation of the entire content of the database, in a form that is more or less abstract in comparison with the way in which the data is physically stored. The conceptual view is defined by means of the conceptual schema, which includes definitions of each of the various conceptual record types.

[Figure: Conceptual Model (SNo, FName, LName, Age, Salary, BranchNo) with External View 1 (SNo, FName, LName, Age, Salary) and External View 2 (SNo, LName, BranchNo).]

The External View / Schema:

Each external view is defined by means of an external schema. An external schema consists of definitions of each of the various external record types in that external view. There must be a definition of the mapping between the external schema and the underlying conceptual schema.
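As a rough sketch, the two external views of the STAFF example can be expressed as SQL views over a single table. The snippet runs against SQLite through Python's `sqlite3` module, so the column types differ from the Oracle-style definition above. The view names `staff_view1` and `staff_view2` and the sample row are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: one table describing all staff data.
    CREATE TABLE STAFF (
        Sno INTEGER, Fname TEXT, Lname TEXT,
        Age INTEGER, Salary REAL, BranchNo INTEGER
    );
    INSERT INTO STAFF VALUES (1, 'Anita', 'Shah', 34, 52000.0, 101);

    -- External level: each view is defined by a mapping onto the conceptual table.
    CREATE VIEW staff_view1 AS SELECT Sno, Fname, Lname, Age, Salary FROM STAFF;
    CREATE VIEW staff_view2 AS SELECT Sno, Lname, BranchNo FROM STAFF;
""")

# Column names exposed by the first external view.
view1_columns = [d[0] for d in conn.execute("SELECT * FROM staff_view1").description]
# Rows as seen through the second external view.
view2_rows = conn.execute("SELECT * FROM staff_view2").fetchall()

print(view1_columns)
print(view2_rows)
```

Each `CREATE VIEW` statement plays the role of an external schema together with its external/conceptual mapping: it names the fields a user sees and states how they are obtained from the conceptual record.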

Data Independence

Data independence is the ability to change the schema at one level of a database system without having to change the schema at the next higher level.

The three-level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout.

A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is data independence.

Physical data independence: The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence.

Logical data independence: Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.
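Both kinds of data independence can be demonstrated in a small sketch, assuming SQLite (via Python's `sqlite3` module) as the example engine. The `staff` table, the `branch_staff` view and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (sno INTEGER, lname TEXT, branch_no INTEGER);
    INSERT INTO staff VALUES (1, 'Shah', 101), (2, 'Iyer', 102);
    CREATE VIEW branch_staff AS SELECT sno, lname, branch_no FROM staff;
""")

# The application's query is written once, against the external view.
query = "SELECT sno, lname FROM branch_staff WHERE branch_no = 101"
before = conn.execute(query).fetchall()

# Physical change: a new index alters how data is accessed, not what it means.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch_no)")
after_index = conn.execute(query).fetchall()

# Conceptual change: a new column is added to the table; the view does not expose it.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")
after_column = conn.execute(query).fetchall()

print(before, after_index, after_column)
```

The same query text returns the same rows before and after both changes. The index illustrates physical data independence and the added column illustrates logical data independence, within the limits of this simplified example.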

Types of Database Models

The most well-known record-based models are the hierarchical model, the network model and the relational model.

Hierarchical Model: This model represents data as a hierarchical tree. It is a special kind of a network model in which the relationship is essentially a tree-like structure, where one parent may have many children, but one child cannot have more than one parent. The relationship borrower to books in a library system satisfies this condition. One of the popular DBMS based on hierarchical model is Information Management System (IMS) from IBM.
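The one-parent rule of the hierarchical model can be sketched with a plain dictionary that maps each child to its parent. The borrower and book names are hypothetical, and this shows only the shape of the data, not how a product such as IMS stores it.

```python
# child -> parent: a dictionary key holds exactly one value,
# so each child record can have only one parent.
parent_of = {
    "Book A": "Borrower 1",
    "Book B": "Borrower 1",
    "Book C": "Borrower 2",
}


def children_of(parent):
    """Return all child records of a parent; one parent may have many children."""
    return sorted(child for child, p in parent_of.items() if p == parent)


print(children_of("Borrower 1"))
print(parent_of["Book C"])
```

A borrower can hold several books, while each book is held by at most one borrower at a time. This is why the borrower-to-books relationship fits a tree.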


Figure: Hierarchical model

Network model: This model represents data as record types. Explicit linkages, expressed in the form of pointers, relate the various records. Each record has a link field for every relationship in which it participates. IDS (Integrated Data Store) is one of the DBMS products based on the network model.

Figure: Network model

Relational model: In this model, each database item is viewed as a record with attributes. A set of records with similar attributes is called a table. Most of the popular commercial DBMS products like Oracle, Sybase, MySQL, etc. are based on relational model.

Figure: Relational model

Object Relational model

Hierarchical, network and relational database models have been quite successful in storing data for traditional business applications. Object-oriented databases, however, evolved to handle more complex applications such as databases for scientific experiments, geographic information systems, engineering design and manufacturing. An object-oriented database stores data, their relationships and the way they interact with other data. This model draws its concepts from real-world objects. Whereas the relational approach deals with data at the lowest level, that is, columns and rows, the object-oriented approach deals with data at a higher level, that is, with the objects surrounding the data. This model represents the database in terms of objects, their attributes and their behaviors.


Summary

In this chapter, we have discussed:

• What a database is

• The file system approach versus the database approach

• Benefits of the database approach

• What a DBMS is

• Functions of a DBMS

• Database system

• Components of a DBMS

• Data models and their types

• Database architecture

• Role of the DBA

• Types of database models


Chapter 2

Introduction to Relational Databases (RDBMS)

Evolution of RDBMS

What is a Relational Database?

What is a RDBMS?

Features of RDBMS

Basic Relational Database Terminology

Keys and their Use

Referential Integrity


Objectives

In this chapter, we will discuss:

• Evolution of RDBMS

• Relational Database

• Relational Database Management System (RDBMS)

• Features of an RDBMS

• Important terms related to RDBMS

• Different types of keys and their use

• Referential integrity


RDBMS

Dr. E. F. Codd outlined the principles of the relational model, which formed the basis for the evolution of the Relational Database Management System.

A relational database is a collection of tables related to each other through common values; a Relational Database Management System (RDBMS) is the software that manages such a database.

Evolution of RDBMS

Before the acceptance of Codd’s relational model, database management systems were ad hoc collections of data designed to solve a particular type of problem and later extended to serve more general purposes. This led to complex systems that were difficult to understand, install, maintain and use. These database systems were plagued by the following problems:

• They required large budgets and staffs of people with special skills that were in short supply.

• Database administration staff and application developers needed prior preparation before they could access these database systems.

• End-user access to the data was rarely provided.

• These database systems did not support the implementation of business logic as a DBMS responsibility.

Hence, the objective of developing the relational model was to address each of the shortcomings that plagued the systems that existed at the end of the 1960s, and to make DBMS products more widely appealing to all kinds of users.

The existing relational database management systems offer powerful, yet simple solutions for a wide variety of commercial and scientific application problems. Almost every industry uses relational systems to store, update and retrieve data for operational, transaction, as well as decision support systems.

What is a Relational Database?

A relational database is a database system in which the data is organized and accessed according to the relationships between data items, without any need to consider how the data is physically arranged. Relationships between data items are expressed by means of tables. It is a tool that can help you store, manage and disseminate information of various kinds. It is a collection of inter-related objects, such as tables, queries, forms, reports and macros, all stored together by a computer program.


It is a method of structuring data in the form of records, so that relations between different entities and attributes can be used for data access and transformation.

What is a Relational Database Management System?

A Relational Database Management System (RDBMS) is a system that allows us to perceive data as tables (and nothing but tables) and that places the operators necessary to manipulate that data at the user’s disposal.

Features of an RDBMS

The features of an RDBMS are as follows:

• The ability to create multiple relations (tables) and enter data into them

• An interactive query language

• Retrieval of information stored in more than one table

• A catalog or dictionary, which itself consists of tables (called system tables)

Basic Relational Database Terminology

Catalog: A catalog holds all the information about the various schemas (external, conceptual and internal) and all of the corresponding mappings (external/conceptual, conceptual/internal). It contains detailed information about the various objects that are of interest to the system itself, e.g., tables, views, indexes, users, integrity rules and security rules.

In a relational database, the entities of the ERD are represented as tables, and their attributes as the columns of their respective tables, in a database schema. Some important terms are:

• Table: Tables are the basic storage structures of a database where data about something in the real world is stored. It is also called a relation or an entity.

• Row: A row represents the collection of data recorded for one particular entity. To identify each row uniquely there should be a unique identifier, called the primary key, which allows no duplicate rows. For example, in a library every member is unique and is therefore given a membership number that uniquely identifies that member. A row is also called a record or a tuple.


• Column: Columns represent characteristics or attributes of an entity. Each attribute maps onto a column of a table. Hence, a column is also known as an attribute.

• Relationship: Relationships represent a logical link between two tables. A relationship is depicted by a foreign key column.

• Degree: The number of attributes (columns) in a relation.

• Cardinality: The number of tuples (rows) in a relation.

• Domain: An attribute of an entity has a particular value. The set of possible values that a given attribute can have is called its domain. For example, the set of values that the attribute EMPLOYEE.id can assume is the set of positive integers of 5 digits.
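As a rough illustration of degree, cardinality and domain, the sketch below builds a small table with Python's sqlite3 module. The column names and sample rows are assumptions made for the example; the five-digit domain of id is approximated with a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        -- domain of id: positive integers of 5 digits
        id         INTEGER PRIMARY KEY CHECK (id BETWEEN 10000 AND 99999),
        name       TEXT,
        department TEXT
    );
    INSERT INTO employee VALUES (10001, 'Smith', 'Sales'),
                                (10002, 'Jones', 'HR');
""")

cursor = conn.execute("SELECT * FROM employee")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)


def in_domain(candidate_id):
    """Return True if the DBMS accepts candidate_id as an EMPLOYEE.id value."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, 'Temp', 'Temp')", (candidate_id,)
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.rollback()  # the probe row is never kept


print(degree, cardinality, in_domain(12345), in_domain(42))
```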

Keys and Their Use

Key: An attribute or set of attributes whose values uniquely identify each entity in an entity set is called a key for that entity set.
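The uniqueness requirement in this definition can be checked mechanically on sample data. The following is a minimal sketch; the function name is_key and the employee rows are hypothetical. Note that a key is a property of every allowable state of the entity set, so a check on one sample can only show that a set of attributes is not a key, never prove that it is one.

```python
def is_key(rows, attributes):
    """Return True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attributes)
        if value in seen:
            return False  # two entities share this value: not a key
        seen.add(value)
    return True


# Hypothetical sample of an EMPLOYEE entity set.
employees = [
    {"id": 10001, "name": "Smith", "department": "Sales"},
    {"id": 10002, "name": "Jones", "department": "Sales"},
    {"id": 10003, "name": "Smith", "department": "HR"},
]

print(is_key(employees, ["id"]))                  # id values are all distinct
print(is_key(employees, ["name"]))                # 'Smith' occurs twice
print(is_key(employees, ["name", "department"]))  # distinct in this sample
```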

Super Key: If we add additional attributes to a key, the resulting combination would still uniquely identify an instance of the entity set. Such augmented keys are called super keys.

Primary Key: It is a minimal super key, that is, a super key from which no attribute can be removed without losing uniqueness, chosen as the unique identifier for the table (a column or a column combination with the property that at any given time no two rows of the table contain the same value in that column or column combination).

Candidate Key: There may be two or more attributes or combinations of attributes that uniquely identify an instance of an entity set. These attributes or combinations of attributes are called candidate keys. In such a case, we must decide which of the candidate keys will be used as the primary key. The remaining candidate keys would be considered alternate keys.

Secondary Key: A secondary key is an attribute or combination of attributes that may not be a candidate key, but that classifies the entity set on a particular characteristic.


A case in point is the entity set EMPLOYEE with the attribute department, whose value identifies all instances of EMPLOYEE that belong to a given department.

Any key consisting of a single attribute is called a simple key, while one consisting of a combination of attributes is called a composite key.

Referential Integrity

Referential integrity can be defined as an integrity constraint specifying that the value (or existence) of an attribute in one relation depends on the value (or existence) of an attribute in the same or another relation.

Referential integrity in a relational database is consistency between coupled tables. It is usually enforced by the combination of a primary key and a foreign key. For referential integrity to hold, any field in a table that is declared a foreign key can contain only values from a parent table's primary key field. For instance, deleting a record that contains a value referred to by a foreign key in another table would break referential integrity.

Course (Course code is the primary key)

Course code    Course Name
E01            ELECTRONICS
M02            MATHS
A03            ACCOUNTS
B04            BIOLOGY

Student (Course code is a foreign key referring to Course)

Student No    Name     Course code
101           Annie    B04
102           Julie    E01
103           Rita     A03
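The Course and Student tables above can be used to sketch how a DBMS enforces referential integrity. The example uses Python's sqlite3 module; the lower-case table and column names, the helper violates and the extra student Mona are assumptions made for the illustration. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE course (
        course_code TEXT PRIMARY KEY,
        course_name TEXT
    );
    CREATE TABLE student (
        student_no  INTEGER PRIMARY KEY,
        name        TEXT,
        course_code TEXT REFERENCES course (course_code)
    );
    INSERT INTO course VALUES ('E01', 'ELECTRONICS'), ('M02', 'MATHS'),
                              ('A03', 'ACCOUNTS'),    ('B04', 'BIOLOGY');
    INSERT INTO student VALUES (101, 'Annie', 'B04'), (102, 'Julie', 'E01'),
                               (103, 'Rita', 'A03');
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.rollback()  # leave the sample data unchanged either way


# A student cannot refer to a course code that does not exist.
print(violates("INSERT INTO student VALUES (104, 'Mona', 'Z99')"))
# A course that a student still refers to cannot be deleted.
print(violates("DELETE FROM course WHERE course_code = 'B04'"))
# A course that nobody refers to can be deleted.
print(violates("DELETE FROM course WHERE course_code = 'M02'"))
```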


Summary

In this chapter, we have discussed:

• Evolution of RDBMS

• Relational Database

• Relational Database Management System (RDBMS)

• Features of an RDBMS

• Important terms related to RDBMS

• Different types of keys and their use

• Referential Integrity


Chapter 3

Conceptual Design Using the Entity-Relationship Model

Overview of Database Design

E-R Modeling

Degree of Relationship

Cardinality

Keys

E-R Model example

Constraints on E-R Model

ISA Hierarchies

Aggregation

Conceptual Design using E-R Model

Constraints beyond E-R Model


Objectives

In this chapter, we will discuss:

• Process of designing a database

• Components of an E-R model

• Drawing E-R diagrams

• Designing E-R Diagrams with key constraints

• Aggregation

• Conceptual Design using the ER Model

• Constraints beyond the E-R Model


Overview of Database Design

The database design process comprises the following steps:

• Requirement Analysis

• Conceptual Design (ER Model is used at this stage)

• Logical Design

• Schema Refinement (Normalization)

• Physical Database Design and Tuning

Requirement Collection & Analysis: The database designers interview prospective database users to understand and document their data requirements. The result of this step is a concisely written set of user requirements. The user-defined operations that will be applied to the database, including both retrievals and updates, are also identified at this stage.

Conceptual Design: It is a concise description of the data requirements of the users and includes detailed descriptions of the entity types, relationships and constraints. They are expressed using the concepts provided by the high-level data model.

Logical Design: The conceptual design is mapped to the data model of the chosen system here: RDBMS / DBMS / Object Model.

Schema Refinement (Normalization): The relational schema is checked for redundancies and related anomalies.

Physical Design: Here, the internal storage structures, access paths and file organizations for the database files are specified. In parallel with these activities, application programs are designed and implemented as database transactions corresponding to the high-level specifications.

E-R Modeling

The Entity-Relationship model (E-R model for short) is a graphical design tool used when implementing database systems. It provides a common, informal and convenient model for communication between users and the DBA for the purpose of modeling the structure of data.

The following components are used in developing an E-R model:

• Entity

• Entity Set

• Instance

• Attribute

• Relationship

• Cardinality

• Keys


Entity: An entity is anything that exists and is distinguishable. For example, each chair is an entity. So is each person and each automobile. Entities can have concrete existence or constitute ideas or concepts. Concepts like love and hate are entities.

Entities can be classified as regular entities and weak entities.

A regular (independent) entity does not depend on any other entity for its existence. For example, Employee is a regular entity. A regular entity is depicted using a rectangle.

An entity whose existence depends on the existence of another entity is called a weak (or dependent) entity. For example, the dependent of an employee is a weak entity, whose existence depends on the entity Employee. A dependent entity is depicted in a double-lined box or a darkened rectangle.

During the design phase, an entity is processed further as a table.

Entity Set: A group of similar entities forms an entity set. Examples of entity sets are:

1. All persons

2. All automobiles

3. All emotions

Instance: A specific occurrence of an entity is called an instance. For example, Smith, Jones and Ally are all instances of the entity Employee.

Attributes: Attributes are the properties that characterize an entity set. For example, employees of an organization are modeled by the entity set EMPLOYEE. We must include in the model the properties of the employees that may be useful to the organization. Some of these properties are name, address, skill, etc. An attribute is denoted by an ellipse with its name written inside, attached to its entity.

Figure: Two alternative depictions of the entity Employees.


An attribute is attached to its entity in the following manner.

Figure: Attributes (for example, Name) attached to the entity Employees.

During the design phase, an attribute is processed further as a column of a table.

Relationship: It is an association between two or more entities, or between entities of the same entity set. For example, we may have the relationship that an employee works in a department. The same entity set could participate in different relationship sets, or in different “roles” in the same set. A relationship is depicted by a diamond containing the name of the relationship type. A relationship can be of the following types:

Strong relationship: A strong relationship can have attributes.

Weak relationship: A weak relationship cannot have any attributes.

Degree of Relationship:


The number of participating entities in a relationship is known as the degree of the relationship. According to the degree of relationship, there can be four types of relationships:

• Unary Relationship

• Binary Relationship

• Ternary Relationship

• N-ary Relationship

Unary Relationship: A relationship in which only one entity participates, in more than one role, is called a unary relationship. Example: Employee manages Employee.

Binary Relationship: A relationship in which two entities participate is called a binary relationship. Example: Manager manages Employee.

Ternary Relationship: A relationship in which three entity types are involved is called a ternary relationship. Example: Sales Assistant sells Product to Customer.


N-ary Relationship: An n-ary relationship set R relates n entity sets E1, ..., En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.

Cardinality: It defines the numeric relationship between occurrences of the entities on either end of the relationship line. Relationships can be classified into three types based on cardinality:

One-to-one: One student is issued only one card (and vice versa).

Figure: Student (1) Issued (1) Card.

One-to-many (or many-to-one): One student can enroll for only one course, but one course can be offered to many students.

Figure: Course (1) enroll (m) Student, shown in Chen notation and in crow’s foot notation.

Many-to-many: One student can take many tests, and one test can be taken by many students.

Figure: Student Write Test.
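The three cardinality types can be told apart by looking at which entities occur more than once in the relationship. The sketch below is only an illustration: the function classify_cardinality and the sample pairs are hypothetical, and a classification derived from sample data describes that sample rather than the rule the designer intends.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a binary relationship given as (left, right) entity pairs."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    # Does some left entity relate to several right entities, and vice versa?
    left_has_many = any(len(v) > 1 for v in left_to_right.values())
    right_has_many = any(len(v) > 1 for v in right_to_left.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


issued = [("S1", "Card1"), ("S2", "Card2")]                     # student, card
enroll = [("S1", "Maths"), ("S2", "Maths"), ("S3", "Biology")]  # student, course
writes = [("S1", "T1"), ("S1", "T2"), ("S2", "T1")]             # student, test

print(classify_cardinality(issued))
print(classify_cardinality(enroll))
print(classify_cardinality(writes))
```

Read from the student side, enroll is many-to-one; read from the course side, the same relationship is one-to-many.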


Keys:

Keys are data items used to uniquely identify individual occurrences of an entity type.

Candidate Key: It is a minimal set of attributes used to uniquely identify individual occurrences of an entity type. Each table may have more than one candidate key.

Primary Key: One of the candidate keys is selected to be the primary key.

Composite Key: A candidate key with more than one attribute is called a composite key.

E-R Model Example:

Let us now see how an E-R model is built using the notation discussed above. Consider an employee who works in a department; the details stored in the database include the employee's id and name and the department's name and id.


In the above figure, we show the relationship set Works_In, in which each relationship indicates a department in which an employee works. The entities are described by a set of attributes and identified by primary keys, which are shown underlined.

Entities used in the above diagram are:

Entity name: Employees
Attributes: Ssn, Name, Lot
Primary Key: Ssn

Entity name: Department
Attributes: Did, Dname, Budget
Primary Key: Did

The entity sets that participate in a relationship set need not be distinct; sometimes a relationship might involve two entities in the same entity set. For example, in the Reports_To relationship set, every relationship is of the form (emp1, emp2).

The Works_In relationship shows that an employee can work in many departments and a department can have many employees.

Relationship sets can also have descriptive attributes (e.g., the since attribute of Works_In). A relationship must be uniquely identified by the participating entities, without reference to the descriptive attributes. In the Works_In relationship set, for example, each Works_In relationship must be uniquely identified by the combination of employee ssn and department did. Thus, for a given employee-department pair, we cannot have more than one associated since value.


Thus, in translating a relationship set to a relation, the attributes of the relation must include:

• Keys for each participating entity set (as foreign keys). This set of attributes forms a superkey for the relation.

• All descriptive attributes.

Constraints on E-R model

Key Constraints:

A key constraint between an entity set S and a relationship set restricts instances of the relationship set by requiring that each entity of S participate in at most one relationship. Look at an example:

Consider the relationship Manages: each department has at most one manager, according to the key constraint on the ‘Manages’ relationship. The arrow from Department to Manages indicates that each Department entity appears in at most one ‘Manages’ relationship in any allowable instance of ‘Manages’. Thus, given a Department entity, we can uniquely determine the ‘Manages’ relationship in which it appears.

Translating ER Diagrams with Key Constraints:

Map the relationship to a table. Note that did alone is the key now. Since each department has a unique manager, we could instead combine ‘Manages’ and Departments.

Manages table with the key constraint:

CREATE TABLE Manages (
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments )

Key Constraints for Ternary Relationships

The following figure shows the relationship between employees, departments and locations. Since three entities are involved in the relationship, with a key constraint on the Employee entity, it is known as a key constraint for a ternary relationship.

In the above figure, Ssn, Did and Address are the primary keys of the Employee, Department and Location entities respectively. An arrow drawn from the Employee entity indicates that an employee can work in at most one department at a single location.

Participation Constraints:

The key constraint on ‘Manages’ tells us that a Department has at most one Manager (indicated by the arrow). A participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type. Participation constraints can be of two types:

• Total participation

• Partial participation


Total Participation constraint: Does every department have a manager? If so, this is a participation constraint: the participation of Departments in Manages is total. Total participation is indicated by a dark line between the entity and the relationship.

Partial Participation constraint: A participation that is not total is said to be partial, e.g., the participation of Employee in Manages is partial.

In the above example, the participation of Departments in Manages is total, whereas the participation of Employee in Manages is partial. A participation constraint between an entity set S and a relationship set restricts instances of the relationship set by requiring that each entity of S participate in at least one relationship. Every did value in the Department table must appear in a row of the Manages table (with a non-null ssn value). Similarly, every ssn value in the Employee table must appear in a row of the Works_In table.

Participation Constraints in SQL:

We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints).

CREATE TABLE Dept_Mgr (
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    ssn CHAR(11) NOT NULL,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE NO ACTION )
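The effect of the Dept_Mgr definition can be tried out with a short sketch using Python's sqlite3 module. The Employees columns, the sample rows and the helper rejected are assumptions made for the illustration. The primary key on did gives each department at most one manager, NOT NULL on ssn gives it at least one, and ON DELETE NO ACTION stops a manager from being deleted while still managing a department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Dept_Mgr (
        did    INTEGER,
        dname  CHAR(20),
        budget REAL,
        ssn    CHAR(11) NOT NULL,     -- total participation of the department
        since  DATE,
        PRIMARY KEY (did),            -- key constraint: one row per department
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION
    );
    INSERT INTO Employees VALUES ('111-11-1111', 'Smith', 10);
    INSERT INTO Dept_Mgr  VALUES (1, 'Sales', 1000.0, '111-11-1111', '2014-01-01');
""")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.rollback()  # keep the sample data unchanged


# A department without a manager is not allowed.
print(rejected("INSERT INTO Dept_Mgr VALUES (2, 'HR', 500.0, NULL, '2014-01-01')"))
# A second manager row for department 1 is not allowed.
print(rejected("INSERT INTO Dept_Mgr VALUES (1, 'Sales', 1000.0, '111-11-1111', '2015-01-01')"))
# The manager cannot be deleted while department 1 refers to him.
print(rejected("DELETE FROM Employees WHERE ssn = '111-11-1111'"))
```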


Weak entity

A weak entity’s existence depends on another (owner) entity. Hence, a weak entity does not have its own key. It can be identified uniquely only by considering the primary key of its owner entity.

• The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).

• The weak entity set must have total participation in this identifying relationship set.

Translating Weak Entity Sets: The weak entity set and its identifying relationship set are translated into a single table. When the owner entity is deleted, all owned weak entities must also be deleted.

For example: If the employee quits, any policy owned by the employee is terminated. All the relevant policy and dependent information is also deleted from the database. To indicate that Dependent is a weak entity and Policy is its identifying relationship, we draw both with dark lines.

CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE )
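A short sketch with Python's sqlite3 module shows the cascading delete at work. The Employees columns and all sample rows are made up for the illustration. Note that the dependent name Joe occurs under two different employees, which is allowed because pname is unique only in combination with the owner's ssn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Dep_Policy (
        pname CHAR(20),
        age   INTEGER,
        cost  REAL,
        ssn   CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),     -- partial key plus the owner's key
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE
    );
    INSERT INTO Employees  VALUES ('111-11-1111', 'Smith', 10),
                                  ('222-22-2222', 'Jones', 20);
    INSERT INTO Dep_Policy VALUES ('Joe', 5, 100.0, '111-11-1111'),
                                  ('Ann', 9,  90.0, '111-11-1111'),
                                  ('Joe', 7, 120.0, '222-22-2222');
""")


def dependents():
    """List the (pname, ssn) pairs currently stored."""
    return conn.execute(
        "SELECT pname, ssn FROM Dep_Policy ORDER BY pname, ssn"
    ).fetchall()


before = dependents()
# The employee quits: deleting the owner removes the owned weak entities too.
conn.execute("DELETE FROM Employees WHERE ssn = '111-11-1111'")
conn.commit()
after = dependents()

print(before)
print(after)
```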


ISA (‘is a’) Hierarchies

An ISA hierarchy is the formation of a new entity set as the union of two or more entity sets. The process is also known as generalization. Here, an employee can be an hourly employee or a contract employee. Attributes are inherited.

ISA Constraints:

There are two types of ISA constraints:

• Overlap constraints: Can Joe be an Hourly_Emp as well as a Contract_Emp entity? (Allowed / disallowed)

• Covering constraints: Does every Employee entity also have to be an Hourly_Emp or a Contract_Emp entity? (Yes / no)

Reasons for using ISA:

• To add descriptive attributes specific to a subclass

• To identify entities that participate in a relationship

Translating ISA hierarchies to relations:

General approach:

Three relations: Employee, Hourly_Emp and Contract_Emp.

Every employee is recorded in Employee. For hourly employees, extra information is recorded in Hourly_Emp (hourly_wages, hours_worked, ssn); the Hourly_Emp tuple must be deleted if the referenced Employee tuple is deleted. Queries involving all employees are easy; those involving just Hourly_Emp require a join to get some attributes.

Alternative:

Just Hourly_Emp and Contract_Emp.

Hourly_Emp: ssn, name, lot, hourly_wages, hours_worked.

Contract_Emp: ssn, name, lot, contractid.

Each employee must be in one of these two subclasses.
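The general approach with three relations can be sketched as follows, using Python's sqlite3 module. The column types and the sample rows are assumptions made for the illustration. The sketch shows that a query over all employees needs only Employee, while a query that wants the name of an hourly employee has to join Hourly_Emp with Employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Hourly_Emp (
        ssn          CHAR(11) PRIMARY KEY,
        hourly_wages REAL,
        hours_worked INTEGER,
        FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
    );
    CREATE TABLE Contract_Emp (
        ssn        CHAR(11) PRIMARY KEY,
        contractid INTEGER,
        FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
    );
    INSERT INTO Employee     VALUES ('111-11-1111', 'Smith', 10),
                                    ('222-22-2222', 'Jones', 20);
    INSERT INTO Hourly_Emp   VALUES ('111-11-1111', 12.5, 40);
    INSERT INTO Contract_Emp VALUES ('222-22-2222', 7001);
""")

# All employees: a single table is enough.
all_names = [row[0] for row in conn.execute("SELECT name FROM Employee ORDER BY ssn")]

# Hourly employees with their names: a join is required.
hourly = conn.execute("""
    SELECT e.name, h.hourly_wages, h.hours_worked
    FROM Hourly_Emp AS h JOIN Employee AS e ON e.ssn = h.ssn
    ORDER BY e.ssn
""").fetchall()

# Deleting the Employee tuple also deletes the Hourly_Emp tuple.
conn.execute("DELETE FROM Employee WHERE ssn = '111-11-1111'")
remaining_hourly = conn.execute("SELECT COUNT(*) FROM Hourly_Emp").fetchone()[0]

print(all_names, hourly, remaining_hourly)
```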

Aggregation

Aggregation is meant to represent a relationship between a whole object and its component parts. It is used when we have to model a relationship that involves both entity sets and a relationship set. Aggregation allows us to treat a relationship set as an entity set for purposes of participation in other relationships. For example, a Project is sponsored by a Department. This is a simple relationship. An Employee monitors this sponsorship (and not the Project or the Department alone). This is aggregation. Monitors is mapped to a table like any other relationship set.


Aggregation vs. Ternary Relationship:

Can we express relationships involving other relationships without using aggregation? The choice between aggregation and a ternary relationship may be guided by certain integrity constraints. For example, we can impose a constraint that each sponsorship is monitored by at most one employee, which is not possible without aggregation.

Conceptual Design Using the E-R Model

The design choices are:

♦ Should a concept be modeled as an entity or an attribute?

♦ Should a concept be modeled as an entity or a relationship?

♦ Identifying relationships: binary or ternary? Aggregation?

Entity vs. Attribute

Should address be an attribute of Employees or an entity (connected to Employees by a relationship)? It all depends upon the use we want to make of address information, and the semantics of the data. If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued). If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic). Otherwise, address can be used as an attribute of Employee.

Works_In does not allow an employee to work in a department for two or more periods. Why? The situation is similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.


Consider an employee who works in a given department over more than one period. This possibility is ruled out by the semantics of the previous ER diagram. The problem is that we want to record several values of the descriptive attributes for each instance of the Works_In relationship. We can address this problem by introducing an entity set called Duration, with attributes from and to.

Entity vs. Relationship

The ER diagram above is OK if a manager gets a separate discretionary budget for each department. But what if a manager gets a discretionary budget that covers all managed departments? The following problems arise:

• Redundancy: dbudget is stored for each department managed by the manager.

• Misleading: the diagram suggests that dbudget is tied to the managed department.

One possible design that resolves both issues of the previous ER diagram:


We model the appointment as an entity set, say Mgr_appt, and use a ternary relationship, say Manages, to relate a manager, an appointment and a department. The budget is now associated with the appointment of the employee as manager of a group of departments. The details of an appointment (such as the discretionary budget) are no longer repeated for each department included in the appointment, although there is still one Manages relationship instance per such department.

The figure below models a situation in which an employee can own several policies, each policy can be owned by several employees, and each dependent can be covered by several policies.


Suppose we have the following constraint: each policy is owned by just one employee. A key constraint on Policy would then also mean that a policy can cover only one dependent!

Binary vs. Ternary Relationship - A Better Design

The key constraints allow us to combine Purchaser with Policy and Beneficiary with Dependent.

Participation constraints lead to NOT NULL constraints.

CREATE TABLE Policy (
    policyid INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employee
        ON DELETE CASCADE )

CREATE TABLE Dependent (
    pname CHAR(20),
    age INTEGER,
    policyid INTEGER,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policy
        ON DELETE CASCADE )

Constraints Beyond the ER Model

The following points apply to constraints in the ER model:

• A lot of data semantics can (and should) be captured.

• But some constraints cannot be captured in ER diagrams. Hence, the schema needs further refinement.

The relational schema obtained from an ER diagram is a good first step. But ER design is subjective and cannot express certain constraints, so this relational schema may need refinement.

Functional Dependencies

For example, a department can’t order two distinct parts from the same supplier. We cannot express this constraint with the ternary Contracts relationship. Normalization refines the ER design by considering functional dependencies (FDs). The next chapter deals with normalization as a way to refine the Entity-Relationship design.
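The Contracts constraint above can be read as a functional dependency: the pair (department, supplier) determines the part. The following is a minimal sketch of how such a dependency could be checked on sample data; the function fd_holds, the attribute names did, sid and pid, and the rows are all hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical Contracts instances: department, supplier, part.
contracts = [
    {"did": 1, "sid": 10, "pid": 100},
    {"did": 1, "sid": 11, "pid": 100},
    {"did": 2, "sid": 10, "pid": 101},
]
# Department 1 now orders a second, distinct part from supplier 10.
violating = contracts + [{"did": 1, "sid": 10, "pid": 102}]

print(fd_holds(contracts, ["did", "sid"], ["pid"]))
print(fd_holds(violating, ["did", "sid"], ["pid"]))
```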

Summary

In this chapter, we have discussed:

• Process of designing a database

• List the components of an E-R model

• Drawing E-R diagrams

• Designing E-R Diagrams with key constraints

• Aggregation

• Conceptual Design using the ER Model

• Constraints beyond the E-R Model


Chapter 4

Schema Refinement and Normalization

Normalization

Why Normalization?

What is a Normal Form?

Types of Normal Forms

First Normal Form

Functional Dependencies

Second Normal Form

Transitive Dependency

Third Normal Form

Boyce-Codd Normal Form

Multivalued Dependency

Fourth Normal Form

Fifth Normal Form


Objectives

In this chapter, we will discuss:

• Normalization

• Reasons for Normalization

• Refining a database

• Defining Normal Form

• Types of Normal Forms


Normalization

Normalization is the process of designing a consistent database by minimizing redundancy and ensuring data integrity through the principle of non-loss decomposition.

Why Normalization?

To produce a good database design, we should ask questions such as:

a. Does the design ensure that all database operations are performed efficiently, and does it avoid making the DBMS perform expensive consistency checks that could be avoided?

b. Is information unnecessarily replicated?

Unless these issues are properly handled, difficulties such as redundancy and loss of information may arise. There are several ways to avoid these problems. One of them is database decomposition through normalization, which tries to minimize redundancy and the effort of checking constraints and dependencies.

Database normalization ensures data integrity

Data integrity is the correctness of the data stored within the database. It is achieved by imposing integrity constraints. An integrity constraint is a rule that restricts the values present in the database.

There are three kinds of integrity constraints:

♦ Entity constraints: The entity integrity rule states that the value of the primary key can never be null (a null is the absence of a value and is not the same as a blank). Because a primary key is used to identify a unique row in a relational table, its value must always be specified and should never be unknown. The integrity rule requires that insert, update and delete operations maintain the uniqueness and existence of all primary keys.

♦ Domain constraints: Only permissible values of an attribute are allowed in a relation.

♦ Referential integrity constraints: The referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key must either be null or match a value in the relational table in which that foreign key is a primary key.
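The three kinds of constraints can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient test bed; the table layout, the 1 to 40 range for hours and the helper name rejected are assumptions made for illustration, not part of the course material. Two SQLite-specific details matter here: foreign keys are enforced only after PRAGMA foreign_keys = ON, and a primary key needs an explicit NOT NULL to reject nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE Faculty (
    faculty_code INT NOT NULL PRIMARY KEY,                  -- entity integrity
    faculty_name TEXT NOT NULL
);
CREATE TABLE Subject (
    sno          INT NOT NULL PRIMARY KEY,
    faculty_code INT REFERENCES Faculty(faculty_code),      -- referential integrity
    subject      TEXT NOT NULL,
    hours        INT CHECK (hours BETWEEN 1 AND 40)         -- domain constraint (assumed range)
);
INSERT INTO Faculty VALUES (100, 'Smith');
INSERT INTO Subject VALUES (1, 100, 'Java', 16);
""")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_key      = rejected("INSERT INTO Faculty VALUES (NULL, 'Nobody')")     # null primary key
duplicate_key = rejected("INSERT INTO Faculty VALUES (100, 'Duplicate')")   # repeated primary key
bad_domain    = rejected("INSERT INTO Subject VALUES (2, 100, 'Linux', 400)")  # outside the range
dangling_fk   = rejected("INSERT INTO Subject VALUES (3, 999, 'Linux', 8)")    # no such faculty
null_fk       = rejected("INSERT INTO Subject VALUES (4, NULL, 'Linux', 8)")   # null is allowed
```

In this sketch the first four statements are refused, while the last one is accepted, because a foreign key may be null.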

Database normalization prevents redundancy in data

A non-normalized database that stores data redundantly is vulnerable to data anomalies. If data is stored in two locations but later updated in only one of them, the data becomes inconsistent; this is referred to as an "update anomaly". A normalized database stores non-primary-key data in only one location.

Redundancy can be:

♦ Direct redundancy: Direct redundancy results from the presence of the same data in two different locations, which leads to anomalies when the data is read, written, updated or deleted.

♦ Indirect redundancy: Indirect redundancy results from storing information that can be computed from other data items stored within the database.

Normalized databases have a design that reflects the true dependencies between tracked quantities, allowing quick updates to data with little risk of introducing inconsistencies. There are formal methods for quantifying "how normalized" a relational database is, and these classifications are called Normal Forms (NF).
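The update anomaly described above can be illustrated with a few lines of Python. This is only a sketch: the rows are plain dictionaries rather than database tables, and the helper name inconsistent_subjects is hypothetical.

```python
# Redundant design: the hours for a subject are repeated on every row.
rows = [
    {"faculty_code": 100, "subject": "Java", "hours": 16},
    {"faculty_code": 101, "subject": "Java", "hours": 16},
    {"faculty_code": 102, "subject": "Java", "hours": 16},
]

# Update only one of the three copies.
rows[0]["hours"] = 20

def inconsistent_subjects(rows):
    """Return the subjects that now carry more than one hours value."""
    seen = {}
    for row in rows:
        seen.setdefault(row["subject"], set()).add(row["hours"])
    return sorted(subject for subject, hours in seen.items() if len(hours) > 1)

anomalies = inconsistent_subjects(rows)

# Normalized design: the hours for a subject are stored exactly once,
# so a single update leaves no other copy to disagree with.
subject_hours = {"Java": 16}
subject_hours["Java"] = 20
```

After the partial update, the redundant rows disagree about the hours for Java, whereas the single-copy dictionary cannot become inconsistent in this way.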


What is a Normal Form?

Normal forms are designed to address potential problems, such as inconsistencies and redundancy, in the information stored in the database.

A database is said to be in a given normal form if it satisfies the rules required by that form as well as those of all previous forms; it then does not suffer from any of the problems addressed by that form.

Types of Normal Forms

Several normal forms have been identified, the most important and widely used of which are:

First normal form (1NF)

Second normal form (2NF)

Third normal form (3NF)

Boyce-Codd normal form (BCNF)

Fourth normal form (4NF)

Fifth Normal Form (5NF)

A relation is said to be in a particular normal form only if it also satisfies the previous normal forms.

First Normal Form (1NF)

A relation is in 1NF if every row contains exactly one value for each attribute. Let us understand this with an example. Consider a table ‘Faculty’, which holds information about the faculty, the subjects they teach, and the number of hours allotted to each subject.

Faculty:

Faculty code  Faculty Name  Date of Birth  Subject  Hours
100           Smith         17/07/64       Java     16
                                           PL/SQL   8
                                           Linux    8
101           Jones         24/12/72       Java     16
                                           Forms    8
                                           Reports  12
102           Fred          03/02/80       SQL      10
                                           Linux    8
                                           Java     16
103           Robert        28/11/66       SQL      10
                                           PL/SQL   8
                                           Forms    8

Anomalies: The ‘Subject’ and ‘Hours’ columns of the above table do not hold atomic values; each faculty row carries a repeating group of subjects. Hence, it is called an un-normalized table. Inserting, updating and deleting rows would be a problem in such a table, so it has to be normalized.

For the above table to be in first normal form, each row should have atomic values. Hence, let us reconstruct the data in the table. An ‘SNO’ column is included in the table to uniquely identify each row.

SNO  Faculty code  Faculty Name  Date of Birth  Subject  Hours
1    100           Smith         17/07/64       Java     16
2    100           Smith         17/07/64       PL/SQL   8
3    100           Smith         17/07/64       Linux    8
4    101           Jones         24/12/72       Java     16
5    101           Jones         24/12/72       Forms    8
6    101           Jones         24/12/72       Reports  12
7    102           Fred          03/02/80       SQL      10
8    102           Fred          03/02/80       Linux    8
9    102           Fred          03/02/80       Java     16
10   103           Robert        28/11/66       SQL      10
11   103           Robert        28/11/66       PL/SQL   8
12   103           Robert        28/11/66       Forms    8

This table shows the same data as the previous table, but we have eliminated the repeating groups. Hence, the table is now said to be in First Normal Form (1NF). However, we have now introduced redundancy into the table. This can be eliminated using Second Normal Form (2NF).

Functional Dependencies (FDs)


A functional dependency describes how the value of one attribute is determined by the value of another attribute. It is denoted by A -> B, i.e., B is functionally dependent on A, or A determines B.

Functional dependencies can be of two types:

Full Functional Dependency
Partial Functional Dependency

Full Functional Dependency: A functional dependency A -> B is a full functional dependency if removing any attribute x from A means that the dependency no longer holds.

{Empno, Project_no} -> Hours

Full functional dependency: neither Empno -> Hours nor Project_no -> Hours holds on its own.

In the above example, Hours is fully functionally dependent on {Empno, Project_no}. Why? The number of hours spent on a project by a particular employee cannot be determined from the project number (Project_no) alone; it needs the employee number (Empno) as well.

Partial Dependency: An FD A -> B is a partial dependency if there is some attribute x ∈ A that can be removed from A and the dependency will still hold.

{Empno, Project_no} -> Ename

Partial dependency: Empno -> Ename holds.

In the above example, Ename is partially dependent on {Empno, Project_no}, because the employee name (Ename) can be determined from the employee number (Empno) alone, even if Project_no is removed from the determinant. For a table to be in second normal form, there should be no partial dependencies.
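The difference between full and partial dependencies can be checked mechanically on sample data. The sketch below is an illustration only: the rows in works_on are made-up values, and the function names holds and is_full are hypothetical. Note also that a table instance can only refute a dependency; whether an FD truly holds is a property of the business rules, not of one sample.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if the dependency lhs -> rhs is not violated by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent value
    return True

def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False  # a part of the key is enough: partial dependency
    return True

# Hypothetical sample data for the Empno / Project_no example.
works_on = [
    {"empno": 1, "project_no": 10, "ename": "Smith", "hours": 12},
    {"empno": 1, "project_no": 20, "ename": "Smith", "hours": 5},
    {"empno": 2, "project_no": 10, "ename": "Jones", "hours": 8},
    {"empno": 2, "project_no": 20, "ename": "Jones", "hours": 5},
]

hours_is_full = is_full(works_on, ["empno", "project_no"], ["hours"])
ename_is_full = is_full(works_on, ["empno", "project_no"], ["ename"])
```

On this sample, Hours is fully dependent on the composite key, while Ename is only partially dependent because Empno alone determines it.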

Second Normal Form (2NF)

A relation is in 2NF, if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key of the relation.

2NF prohibits partial dependencies.

The steps for converting a database to 2NF are as follows:

• Find and remove attributes that are related to only a part of the key.

• Group the removed items in another table.

• Assign the new table a key that consists of that part of the old composite key.

If a relation is not in 2NF, it can be further normalized into a number of 2NF relations.

Let us consider the table we obtained after first normalization.

SNO  Faculty code  Faculty Name  Date of Birth  Subject  Hours
1    100           Smith         17/07/64       Java     16
2    100           Smith         17/07/64       PL/SQL   8
3    100           Smith         17/07/64       Linux    8
4    101           Jones         24/12/72       Java     16
5    101           Jones         24/12/72       Forms    8
6    101           Jones         24/12/72       Reports  12
7    102           Fred          03/02/80       SQL      10
8    102           Fred          03/02/80       Linux    8
9    102           Fred          03/02/80       Java     16
10   103           Robert        28/11/66       SQL      10
11   103           Robert        28/11/66       PL/SQL   8
12   103           Robert        28/11/66       Forms    8

While eliminating the repeating groups, we have introduced redundancy into the table. Faculty code, name and date of birth are repeated because the same faculty member teaches several subjects. To eliminate this, let us split the table into two parts: one for the non-repeating groups and the other for the repeating groups.

Faculty:

Faculty code  Faculty Name  Date of Birth
100           Smith         17/07/64
101           Jones         24/12/72
102           Fred          03/02/80
103           Robert        28/11/66

Faculty_code -> Faculty_name, Date_of_Birth

The other table holds the repeating groups.

Subject:

SNO  Faculty code  Subject  Hours
1    100           Java     16
2    100           PL/SQL   8
3    100           Linux    8
4    101           Java     16
5    101           Forms    8
6    101           Reports  12
7    102           SQL      10
8    102           Linux    8
9    102           Java     16
10   103           SQL      10
11   103           PL/SQL   8
12   103           Forms    8

Faculty code is the only key needed to identify the faculty name and the date of birth.

Hence, Faculty code is the primary key in the first table and a foreign key in the second table.

Faculty code is repeated in the Subject table. Hence, we have to take the ‘SNO’ into account to form a composite key in the Subject table. Now, SNO + Faculty code can uniquely identify each row in this table.


Hence, the relation is now in Second Normal form.
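The split into Faculty and Subject can be reproduced, and checked for loss of information, with a short Python sketch. The data is the 1NF table from the text; the variable names are chosen for illustration only.

```python
# The 1NF table: (SNO, faculty code, faculty name, date of birth, subject, hours)
first_nf = [
    (1, 100, "Smith", "17/07/64", "Java", 16),
    (2, 100, "Smith", "17/07/64", "PL/SQL", 8),
    (3, 100, "Smith", "17/07/64", "Linux", 8),
    (4, 101, "Jones", "24/12/72", "Java", 16),
    (5, 101, "Jones", "24/12/72", "Forms", 8),
    (6, 101, "Jones", "24/12/72", "Reports", 12),
    (7, 102, "Fred", "03/02/80", "SQL", 10),
    (8, 102, "Fred", "03/02/80", "Linux", 8),
    (9, 102, "Fred", "03/02/80", "Java", 16),
    (10, 103, "Robert", "28/11/66", "SQL", 10),
    (11, 103, "Robert", "28/11/66", "PL/SQL", 8),
    (12, 103, "Robert", "28/11/66", "Forms", 8),
]

# Faculty: the attributes that depend on the faculty code alone, one row per code.
faculty = sorted({(code, name, dob) for _, code, name, dob, _, _ in first_nf})

# Subject: the repeating group, with the faculty code kept as a foreign key.
subject = [(sno, code, subj, hours) for sno, code, _, _, subj, hours in first_nf]

# Joining the two tables on the faculty code rebuilds the original rows.
by_code = {code: (name, dob) for code, name, dob in faculty}
rejoined = [(sno, code, *by_code[code], subj, hours)
            for sno, code, subj, hours in subject]
```

Because the join returns exactly the twelve original rows, the decomposition loses no information, while each faculty name and date of birth is now stored once.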

Anomalies in 2NF: This design could lead to the following problems:

• Insertion: Inserting the records of several faculty members who teach the same subject repeats the hours information for that subject.

• Update: The number of hours allotted to a subject is repeated for every faculty member who teaches it. Hence, if the number of hours has to be changed, the change must be recorded in every instance of that subject. Any omission will lead to inconsistencies.

• Deletion: If a faculty member leaves the organization and was the only one teaching a subject, the information about the hours allotted to that subject is lost.

The Subject table should therefore be decomposed further, without any loss of information, as:

Fac_Sub (SNO, Faculty code, Subject)

Sub_Hrs (Subject, Hours)

Transitive Dependency

Transitive dependencies arise:

• When one non-key attribute is functionally dependent on another non-key attribute.

• FD: non-key attribute -> non-key attribute

• And when there is redundancy in the database.

Third Normal Form

A relation is in 3NF if it is in 2NF and no non-key attribute of the relation is transitively dependent on the primary key.

3NF prohibits transitive dependencies. To remove the anomalies that arose in second normal form, and to remove any transitive dependencies, we have to perform third normalization.

Now let us see how to normalize the second table obtained after 2NF.


Subject:

SNO  Faculty code  Subject  Hours
1    100           Java     16
2    100           PL/SQL   8
3    100           Linux    8
4    101           Java     16
5    101           Forms    8
6    101           Reports  12
7    102           SQL      10
8    102           Linux    8
9    102           Java     16
10   103           SQL      10
11   103           PL/SQL   8
12   103           Forms    8

In this table, Hours depends on Subject, and Subject depends on Faculty code and SNO. Hours, however, depends on neither Faculty code nor SNO directly. Hence, there exists a transitive dependency between SNO, Subject and Hours.

If a faculty code is deleted, then because of this transitive dependency the information about the subject and the hours allotted to it will be lost.

For a table to be in 3rd Normal form, transitive dependencies must be eliminated.

So, we need to decompose the table further to normalize it.

Fac_Sub:

SNO  Faculty code  Subject
1    100           Java
2    100           PL/SQL
3    100           Linux
4    101           Java
5    101           Forms
6    101           Reports
7    102           SQL
8    102           Linux
9    102           Java
10   103           SQL
11   103           PL/SQL
12   103           Forms

Sub_Hrs:

Subject  Hours
Java     16
PL/SQL   8
Linux    8
Forms    8
Reports  12
SQL      10

After decomposing the ‘Subject’ table, we now have the ‘Fac_Sub’ and ‘Sub_Hrs’ tables. This addresses the following anomalies:

Insertion: Subject and hours data is no longer repeated when records are inserted.

Update: Subject and hours are stored in a separate table, so updates become much easier because the data is not repeated.

Deletion: Even if a faculty member leaves the organization, the hours allotted to a particular subject can still be retrieved from the Sub_Hrs table.
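The update and deletion cases can be tried out on the two 3NF tables. The following sketch loads the Fac_Sub and Sub_Hrs data from the text into an in-memory SQLite database through Python's sqlite3 module; the column names and types are assumptions, and the new value of 20 hours is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sub_Hrs (
    subject TEXT NOT NULL PRIMARY KEY,
    hours   INT  NOT NULL
);
CREATE TABLE Fac_Sub (
    sno          INT  NOT NULL PRIMARY KEY,
    faculty_code INT  NOT NULL,
    subject      TEXT NOT NULL REFERENCES Sub_Hrs(subject)
);
INSERT INTO Sub_Hrs VALUES
    ('Java', 16), ('PL/SQL', 8), ('Linux', 8),
    ('Forms', 8), ('Reports', 12), ('SQL', 10);
INSERT INTO Fac_Sub VALUES
    (1, 100, 'Java'),  (2, 100, 'PL/SQL'),  (3, 100, 'Linux'),
    (4, 101, 'Java'),  (5, 101, 'Forms'),   (6, 101, 'Reports'),
    (7, 102, 'SQL'),   (8, 102, 'Linux'),   (9, 102, 'Java'),
    (10, 103, 'SQL'),  (11, 103, 'PL/SQL'), (12, 103, 'Forms');
""")

# Update: the hours for Java are changed in exactly one row ...
conn.execute("UPDATE Sub_Hrs SET hours = 20 WHERE subject = 'Java'")
# ... and every faculty member teaching Java sees the same new value.
java_hours = conn.execute("""
    SELECT DISTINCT h.hours
    FROM Fac_Sub f JOIN Sub_Hrs h ON h.subject = f.subject
    WHERE f.subject = 'Java'
""").fetchall()

# Deletion: faculty 101 is the only one teaching Reports.
conn.execute("DELETE FROM Fac_Sub WHERE faculty_code = 101")
reports_hours = conn.execute(
    "SELECT hours FROM Sub_Hrs WHERE subject = 'Reports'"
).fetchone()[0]
```

In this sketch the single update is visible to all three Java rows, and the hours for Reports survive the removal of the only faculty member who taught it.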

Boyce–Codd Normal Form (BCNF)

BCNF was introduced because 3NF does not satisfactorily handle the case of a relation possessing two or more composite or overlapping candidate keys. A relation R is said to be in BCNF if and only if every determinant is a candidate key.

In most cases, third normal form is a sufficient level of decomposition. However, some cases require the design to be normalized further, up to the fourth or even the fifth normal form. These are based on the concept of multivalued dependency, which is introduced next.

Multivalued Dependency:

A multivalued dependency X ->> Y is said to hold for a relation R(X, Y, Z) if, for a given set of values for X, there is a set of associated values for the attribute set Y, and these Y values depend only on the X values and have no dependence on the set of attributes Z.

Fourth Normal Form (4NF)

A relation is said to be in fourth normal form if each table contains no more than one multivalued dependency per key attribute. Put more formally, for every non-trivial multivalued dependency X ->> Y that holds in the relation, X must be a superkey.

Seminar  Faculty  Topic
DBP-1    Brown    Database Principles
DAT-2    Brown    Database Advanced Techniques
DBP-1    Brown    Data Modeling Techniques
DBP-1    Robert   Database Principles
DBP-1    Robert   Data Modeling Techniques
DAT-2    Maria    Database Advanced Techniques

In the above example, the same topic is taught in a seminar by more than one faculty member, and each faculty member takes up different topics in the same seminar. Hence, topic names are repeated several times. This is an example of multivalued dependency. For a table to be in fourth normal form, such multivalued dependencies must be avoided.

To eliminate the multivalued dependency, split the table so that no resulting table contains it.
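One possible split, sketched in Python below, keeps the seminar's faculty and the seminar's topics in separate tables. The variable names are chosen for illustration; the data is the seminar table from the text. Joining the two projections on Seminar shows whether the split loses or invents any rows.

```python
# The seminar table from the text: (seminar, faculty, topic)
seminar = [
    ("DBP-1", "Brown", "Database Principles"),
    ("DAT-2", "Brown", "Database Advanced Techniques"),
    ("DBP-1", "Brown", "Data Modeling Techniques"),
    ("DBP-1", "Robert", "Database Principles"),
    ("DBP-1", "Robert", "Data Modeling Techniques"),
    ("DAT-2", "Maria", "Database Advanced Techniques"),
]

# Two independent facts about a seminar, one table each.
seminar_faculty = {(s, f) for s, f, _ in seminar}
seminar_topic = {(s, t) for s, _, t in seminar}

# Natural join of the two projections on the seminar code.
rejoined = {
    (s, f, t)
    for s, f in seminar_faculty
    for s2, t in seminar_topic
    if s == s2
}

lossless = rejoined == set(seminar)
```

For this data the join gives back exactly the six original rows, so each topic name is stored once per seminar without losing information.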

Fifth Normal Form

A relation is said to be in 5NF if and only if it is in 4NF and every join dependency in it is implied by the candidate keys.

Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. It emphasizes lossless decomposition.

Consider the following example:

Faculty  Seminar  Location
Brown    DBP-1    New York
Brown    DAT-2    Chicago
Robert   DBP-1    Chicago

Seminar  Topic
DBP-1    Database Principles
DAT-2    Database Advanced Techniques
DBP-1    Data Modeling Techniques

Seminar  Faculty
DBP-1    Brown
DAT-2    Brown
DBP-1    Robert
DAT-2    Maria

If we were to add the seminar DAT-2 to New York, we would have to add a line to the table for each instructor located in New York.

With this information added, the table would look as shown below:

Faculty  Seminar  Location
Brown    DBP-1    New York
Brown    DAT-2    Chicago
Robert   DBP-1    Chicago
Brown    DAT-2    New York
Robert   DAT-2    New York

From the above table, we observe that Brown’s information is stored redundantly. To eliminate this redundancy, we have to perform a non-loss decomposition of the table.

Consider the following decomposition of the above table into fifth normal form:

Seminar  Location
DBP-1    New York
DAT-2    Chicago
DBP-1    Chicago
DAT-2    New York

Faculty  Location
Brown    New York
Brown    Chicago
Robert   Chicago
Robert   New York

Faculty  Seminar
Brown    DBP-1
Brown    DAT-2
Robert   DBP-1
Robert   DAT-2


Generally, a table is in fifth normal form when its information content cannot be reconstructed from several smaller tables, i.e., from tables having fewer fields than the original table, each table having different keys. In the normalized form, the fact that ‘Brown’ travels to ‘New York’ is recorded only once, whereas in the unnormalized form it may be repeated many times.

An attempt has been made to explain the normal forms in a simple and understandable manner. Some redundancies are unavoidable. One should take care while normalizing a table that data integrity is not compromised for the sake of removing redundancies.

Summary

In this chapter, we have discussed:

Normalization

Reasons for Normalization

Refining a database

Normal Form

Types of Normal Forms


Chapter 5

Supertypes and Subtypes

Supertype
Subtype
Inheritance
Relationships and Subtypes
Supertype/Subtype Notation
Generalization and Specialization
Constraints in Supertype
Constraints in Supertype/Subtype
Supertype/Subtype Hierarchy
Domains
Domain Integrity Constraints


Objectives

In this chapter, we will discuss:

Advanced concepts of database design

Defining Subtypes and Supertypes

Generalization and Specialization

Using Constraints in Supertype

Using Constraints in Supertype/Subtype Discriminators

Supertype/Subtype Hierarchy

Domains


Basics

Supertype

A supertype is a generic parent entity that contains generalized attributes and the key. It is a generic entity type that has a relationship with one or more subtypes.

Subtype

A subtype is a subgrouping of the entities in an entity type, with attributes that are distinct from those in other subgroupings. Subtypes are category entities that inherit the attributes, keys and relationships of the supertype entity. Each subtype entity contains the migrated foreign key and only those attributes that pertain to the category type.

Inheritance

An instance of a subtype is also an instance of the supertype. Because of this important property, subtype entities inherit the values of all attributes of the supertype, which makes it unnecessary to include supertype attributes redundantly with the subtypes.

[Figure: a SUPERTYPE (general entity type) holding the attributes shared by all entities, with Subtype 1, Subtype 2 and so forth as specialized versions of the supertype, each holding the attributes unique to that subtype.]

Figure 1: Basic notation for supertype/subtype relationships

Example: The following figure shows an Employee supertype with three subtypes.

[Figure: EMPLOYEE supertype with attributes Employee name, Address, Employee_no and Date_hired; subtypes Hourly Employee, Salaried employee and Consultant; subtype attributes Hourly_rate, annual_salary, Stock_option, contact_numbe and Billing_rate. All Employee subtypes will have Emp name, number, date_hired and address. Each Employee subtype will also have its own attributes.]

Figure 2: An Employee supertype with three subtypes

Relationships and Subtypes

a) Relationships at the supertype level indicate that all subtypes will participate in the relationship.

b) The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level.
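One common way to carry the Employee example into tables is to give each subtype its own table whose primary key is the migrated key of the supertype. The sketch below shows this for a single subtype, using Python's sqlite3 module as a test bed. The column types, the sample row and the placement of Hourly_rate on the Hourly Employee subtype are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
-- Supertype: the attributes shared by every employee.
CREATE TABLE Employee (
    employee_no   INT  NOT NULL PRIMARY KEY,
    employee_name TEXT NOT NULL,
    address       TEXT,
    date_hired    TEXT
);
-- Subtype: the migrated key plus only the attributes unique to the subtype.
CREATE TABLE Hourly_Employee (
    employee_no INT  NOT NULL PRIMARY KEY
                REFERENCES Employee(employee_no) ON DELETE CASCADE,
    hourly_rate REAL NOT NULL
);
INSERT INTO Employee VALUES (1, 'Smith', '23, Main St.', '2008-01-15');
INSERT INTO Hourly_Employee VALUES (1, 12.5);
""")

# The subtype row "inherits" the shared attributes through a join on the key.
row = conn.execute("""
    SELECT e.employee_name, e.date_hired, h.hourly_rate
    FROM Hourly_Employee h JOIN Employee e USING (employee_no)
""").fetchone()

# A subtype instance must also be a supertype instance.
try:
    conn.execute("INSERT INTO Hourly_Employee VALUES (2, 9.0)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
```

The shared attributes are stored once in Employee, and the foreign key prevents a subtype row that has no matching supertype row.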


Example: The following figure shows the supertype/subtype relationships in a hospital:

Figure 3: supertype/subtype relationships in a hospital

Supertype/Subtype Notation

The hierarchy of the supertype/subtype notation is as follows:

[Figure: a SUPERTYPE holding the attributes shared by all entities, with SUBTYPE 1, SUBTYPE 2 and SUBTYPE 3 below it, each holding the attributes unique to that subtype.]


Generalization and Specialization

Generalization

In general, an object can be described by its shared characteristics, the attributes. For example, we can characterize an employee by their employee id, name, job title and skill set.

Another method of characterizing entities is by both similarities and differences. For example, suppose an organization categorizes the work it does, into internal and external projects. Internal projects are done on behalf of some unit within the organization. External projects are done for entities outside of the organization. We can recognize that both types of projects are similar in that each involves work done by employees of the organization within a given schedule. Yet, we also recognize that there are differences between them. External projects have unique attributes, such as a customer identifier and the fee charged to the customer.

This process of categorizing entities by their similarities and differences is known as generalization.

Generalization hierarchies should be used when:

• A large number of entities appear to be of the same type

• Attributes are repeated for multiple entities

• The model is continually evolving

Rules for Generalization

The primary rule of generalization hierarchies is that each instance of the supertype entity must appear in at least one subtype; likewise, an instance of the subtype must appear in the supertype.

Subtypes can be a part of only one generalization hierarchy. That is, a subtype cannot be related to more than one supertype. However, generalization hierarchies may be nested by having the subtype of one hierarchy be the supertype for another.

Subtypes may be the parent entity in a relationship, but not the child. If this were allowed, the subtype would inherit two primary keys.

The following figure shows three entity types: CAR, TRUCK and MOTORCYCLE.


Figure 4: Example of Generalization

Specialization

Specialization is the process of defining one or more subtypes of the supertype and forming supertype/subtype relationships top-down.

Constraints in Supertype

Completeness Constraints

The completeness constraint addresses the question of whether an instance of a supertype must also be a member of at least one subtype. There are two possible rules:

a) Total Specialization Rule

The total specialization rule specifies that each entity instance of the supertype must be a member of some subtype in the relationship. For example: all STUDENTS are either UNDERGRADUATE or GRADUATE students.

It is denoted by a double line.

b) Partial Specialization Rule

The partial specialization rule specifies that an entity instance of the supertype is allowed not to belong to any subtype. For example: FACULTY and STAFF are not the only possible members of the entity EMPLOYEE.

It is denoted by a single line.

Following are examples of completeness constraints.

Figure 5: Total specialization rule

Figure 6: Partial specialization rule

Disjointness Constraints

The disjointness constraint addresses the question of whether an instance of a supertype may simultaneously be a member of two (or more) subtypes.

There are two possible rules:

a) Disjoint Rule

The disjoint rule specifies that if an entity instance is a member of one subtype, it cannot simultaneously be a member of any other subtype. For example: all PERSONS are either MALE or FEMALE.

It is denoted by the letter “d”.

b) Overlap Rule

The overlap rule specifies that an entity instance can simultaneously be a member of two (or more) subtypes. For example: an ATHLETE can be both a RUNNER and a JUMPER.

It is denoted by the letter “O”.

Figure 7: An example of Disjoint Rule


Figure 8: An example of Overlap Rule

Constraints in Supertype/Subtype Discriminators

Subtype Discriminator

The subtype discriminator is “an attribute of the supertype whose values determine the target subtype(s)”. It is used to direct into which of the subtypes (if any) a new instance of the supertype should be inserted.

• Disjoint - a simple attribute with alternative values to indicate the possible subtypes.

• Overlapping - a composite attribute whose subparts pertain to different subtypes. Each subpart contains a Boolean value to indicate whether or not the instance belongs to the associated subtype.

The following figures introduce subtype discriminators for the disjoint rule and the overlap rule.


Figure 9: Introducing a subtype discriminator (Disjoint Rule)

Figure 10: Introducing a subtype discriminator (Overlap Rule)
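The two kinds of discriminator can be sketched as two small Python functions. The single-letter codes H, S and C and the function names are hypothetical; the subtype names follow the Employee and ATHLETE examples used in this chapter.

```python
def subtypes_disjoint(employee_type):
    """Disjoint rule: one simple attribute selects at most one subtype."""
    mapping = {
        "H": "HOURLY EMPLOYEE",     # hypothetical discriminator codes
        "S": "SALARIED EMPLOYEE",
        "C": "CONSULTANT",
    }
    # An unknown or missing code means the instance belongs to no subtype.
    return [mapping[employee_type]] if employee_type in mapping else []

def subtypes_overlap(flags):
    """Overlap rule: a composite attribute with one Boolean per subtype."""
    return [subtype for subtype, is_member in flags.items() if is_member]

salaried = subtypes_disjoint("S")
no_subtype = subtypes_disjoint(None)
both = subtypes_overlap({"RUNNER": True, "JUMPER": True})
jumper_only = subtypes_overlap({"RUNNER": False, "JUMPER": True})
```

A disjoint discriminator can never return more than one subtype, while the Boolean subparts of an overlapping discriminator can select several subtypes at once.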


Supertype/Subtype Hierarchy

A supertype/subtype hierarchy is “a hierarchical arrangement of supertypes and subtypes, where each subtype has only one supertype”.

In this hierarchy, attributes are assigned at the highest logical level that is possible in the hierarchy. Subtypes that are lower in the hierarchy inherit attributes not only from their immediate supertype, but also from all supertypes higher in the hierarchy, up to the root. The following figure shows the supertype/subtype hierarchy:

Figure 11: Example of supertype/subtype hierarchy


Domains

A domain is a conceptual pool of values from which one or more attributes draw their actual values.

Examples:

DOMAIN AGE RANGE 0-127
ATTRIBUTE EMPLOYEE.AGE 16-65
ATTRIBUTE DEPENDENT.AGE 0-60

Two values can only be compared if they come from the same domain.

Defining a Domain

The syntax to create a domain in a database is as follows:

CREATE { DOMAIN | DATATYPE } [ AS ] domain-name data-type
    [ [ NOT ] NULL ]
    [ DEFAULT default-value ]
    [ CHECK ( condition ) ]

domain-name: identifier

data-type: built-in data type, with precision and scale

Example: DOMAIN GENDER

- Data Type: Character
- Length: 6 bytes
- Allowable Values: Male, Female, Null
- Storage Format: Uppercase
- Operations Allowed:
- Inherited Operators: String, Unstring, =
- Input Editing: Nil
- Extra Functions: Is_Male, Is_Female, What_Gender

Domain Integrity Constraints

Domains are used in the relational model to define the characteristics of the columns of a table. The domain specifies its own name, data type and logical size. The logical size represents the size as perceived by the user, not how it is implemented internally. For example, for an integer, the logical size represents the number of digits used to display the integer, not the number of bytes used to store it.

The domain integrity constraints are used to specify the valid values that a column defined over the domain can take. You can define the valid values by listing them as a set of values (such as an enumerated data type in a strongly typed programming language), a range of values, or an expression that accepts the valid values. Strictly speaking, only values from the same domain should ever be compared or be integrated through a union operator.

Note that a formal treatment of the domain concept would require the following for all of the domains:

• The ability to specify the complete set of domains that apply to a given database (the result of any operation on any column defined over any domain must then yield a result in one of the specified domains).

• The ability to specify - for every domain, pair of domains, triplet of domains, and so on - which operators can be applied to the values taken from the domains, as well as what the domain of the result must be.

• The ability to specify an ordering of the values in the domain.
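A range-based domain such as the AGE example can be sketched in a few lines of Python. The class IntDomain and its methods are hypothetical names introduced only for this illustration; they model the idea of a domain and are not the syntax of any particular DBMS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntDomain:
    """A pool of integer values described by a name and an inclusive range."""
    name: str
    low: int
    high: int

    def contains(self, value):
        """Domain integrity check: is the value permitted?"""
        return isinstance(value, int) and self.low <= value <= self.high

    def restrict(self, name, low, high):
        """Define an attribute whose range must lie inside this domain."""
        if not (self.low <= low <= high <= self.high):
            raise ValueError(f"{name} range is outside domain {self.name}")
        return IntDomain(name, low, high)

# The ranges given in the text.
AGE = IntDomain("AGE", 0, 127)
EMPLOYEE_AGE = AGE.restrict("EMPLOYEE.AGE", 16, 65)
DEPENDENT_AGE = AGE.restrict("DEPENDENT.AGE", 0, 60)

employee_ok = EMPLOYEE_AGE.contains(30)
employee_too_young = EMPLOYEE_AGE.contains(10)
dependent_ok = DEPENDENT_AGE.contains(10)
```

Both attributes draw their values from the same AGE domain, so comparing an employee's age with a dependent's age is meaningful, while each attribute still enforces its own narrower range.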


Summary

In this chapter, we have discussed:

Advanced concepts of database design

Defining Subtypes and Supertypes

Generalization and Specialization

Using Constraints in Supertype

Using Constraints in Supertype/Subtype Discriminators

Supertype/Subtype Hierarchy

Domains


© SQL Star International Ltd. 2008 77

Exercises


Chapter 6 E-R Diagrams

1. Construct an E-R Diagram for a hospital with a set of patients and a set of medical doctors. A log of the various conducted tests is associated with each patient. Construct the normalized relations from this ER diagram.

2. Construct an E-R Diagram for a car insurance company with a set of customers, each of whom owns a number of cars. Each car has a number of accidents associated with it. Construct the normalized relations from this ER diagram.

3. Consider the following E-R Diagram: Represent the diagram in the relational model by relations (tables).

4. Suppose we have a database consisting of the following 3 relations:

FREQUENTS ( DRINKER, BAR )
SERVES ( BAR, BEER )
LIKES ( DRINKER, BEER )

The first relation indicates the bars each drinker visits, the second tells what beers each bar serves, and the last indicates which beers each drinker likes to drink.

Draw an E-R Diagram for the given relations.

5. An education database contains information about an in-house company education-training scheme. For each training course, the database contains details of all prerequisite courses and all offerings for that course; and for each offering it contains details of all teachers and all student enrollments for that offering. The database also contains information about employees. The relevant relations in outline are as follows:

COURSE ( COURSE#, TITLE )
PREREQ ( SUP_COURSE#, SUB_COURSE# )
OFFERING ( COURSE#, OFF#, OFFDATE, LOCATION )
TEACHER ( COURSE#, OFF#, EMP# )
ENROLLMENT ( COURSE#, OFF#, EMP#, GRADE )
EMPLOYEE ( EMP#, ENAME, JOB )

The meaning of the PREREQ relation is that the superior course (SUP_COURSE#) has the subordinate course (SUB_COURSE#) as an immediate prerequisite.

Draw an E-R Diagram for this education database.


Normalization

6. Consider the table:

Course_No.  Course_Name           Student_Name    Address           Credits
CIS200      Information Systems   John Warner     23, Main St.      5
CIS220      Information Systems   Tim Hoffman     87, River Rd.     5
CIS220      Information Systems   Jenny Lin       18, Wind Circle   5
CIS450      System Ana. and Des.  Alice Chalmers  5483, Ocean Bld.  5
CIS480      Communication N/ws.   John Warner     23, Main St.      5
CIS480      Communication N/ws.   Jenny Lin       18, Wind Circle   5

a) Does the relation contain any repeating groups? Explain.

b) In what normal form is this relation?

c) If the relation is not already in third normal form, develop new relations that meet the requirements of 3NF.

7. The following figure is an un-normalized representation of a collection of information to be recorded in a company personnel database. The figure is intended to be read as follows:

• The company has a set of departments.

• Each department has a set of employees, a set of projects, and a set of offices.

• Each employee has a job history (set of jobs the employee has held). For each such job, the employee also has a salary history (set of salaries received while employed on that job).

• Each office has a set of phones.

The database is to contain the following information:

• For each department: department number (unique), budget, and the department manager’s employee number.

• For each employee: employee number (unique), current project number, office number, and phone number; also, title of each job the employee has held, plus date and salary for each distinct salary received in that job.

• For each project: project number (unique) and budget.

• For each office: office number (unique), area in square feet, and numbers (unique) of all phones in that office.

Design an appropriate set of normalized relations to represent this information. State any assumptions you make concerning the dependencies involved.

Normalization - A sample example

Normalization is summarized briefly but accurately by the following statement:

‘The Key, the Whole Key and Nothing but the Key’!

Typically, the literature on normalization covers many levels of normalization (nine is not uncommon), but this seems to me to be a race amongst academics to identify as many levels as possible. In 99 cases out of 100, three levels of normalization are all that is required.

1st Normal Form: Converting an un-normalized data structure, such as a report or an order form, into 1st Normal Form (1NF) is commonly referred to as removing repeating groups, but may also involve removing complex groups, such as the Address Group described in rule 2. The aim is to ensure that each item is atomic.

2nd Normal Form: Converting a 1NF data structure into 2nd Normal Form (2NF) involves looking at each non-primary key attribute and ensuring that it depends on the whole of the key and not just part of it.

3rd Normal Form: Converting a 2NF data structure into 3rd Normal Form (3NF) involves looking at the interrelationships between non-key attributes to see if any non-key attributes depend only on each other.

This is all best described by looking at an example. Consider the following table, which has been built up by an order entry clerk.

However, this seems to be a clumsy approach and results in a three-part key consisting of Cust#, Ord# and Part#. A simpler approach is to separate the repeating groups out into separate tables.

Step 1: Remove the repeating group of orders:

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date, (Part_Number, Part_Description, Part_Quantity, Part_Price, Supplier_Number, Supplier_Name))

Step 2: Remove the repeating group of parts:

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date)

ORDER_PARTS(Part_Number, Order_Number*, Part_Description, Part_Quantity, Part_Price, Supplier_Number, Supplier_Name)

The structure is now in 1NF, since there are no repeating or complex group items (each item depends on the key). The next step is to convert the structure into 2NF, by examining each non-primary key attribute to ensure that each depends on the whole of the key.
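The two steps above can be sketched in Python. This is only an illustration: the clerk's original table is not reproduced here, so the nested data, the `UNNORMALIZED` name and the `remove_repeating_groups` helper are all hypothetical.

```python
# Hypothetical un-normalized data: the clerk's original table is not shown,
# so every value below is invented for illustration.
UNNORMALIZED = [
    {
        "Customer_Number": 1,
        "Customer_Name": "Acme Ltd",
        "orders": [  # repeating group of orders
            {
                "Order_Number": 100,
                "Order_Date": "2008-01-15",
                "parts": [  # repeating group of parts
                    {"Part_Number": "P1", "Part_Description": "Bolt",
                     "Part_Quantity": 10, "Part_Price": 0.25,
                     "Supplier_Number": "S1", "Supplier_Name": "BoltCo"},
                    {"Part_Number": "P2", "Part_Description": "Nut",
                     "Part_Quantity": 20, "Part_Price": 0.10,
                     "Supplier_Number": "S1", "Supplier_Name": "BoltCo"},
                ],
            },
            {
                "Order_Number": 101,
                "Order_Date": "2008-02-01",
                "parts": [
                    {"Part_Number": "P1", "Part_Description": "Bolt",
                     "Part_Quantity": 4, "Part_Price": 0.25,
                     "Supplier_Number": "S1", "Supplier_Name": "BoltCo"},
                ],
            },
        ],
    },
]


def remove_repeating_groups(customers_nested):
    """Split nested customer data into flat CUSTOMERS, ORDERS and ORDER_PARTS rows."""
    customers, orders, order_parts = [], [], []
    for cust in customers_nested:
        customers.append({
            "Customer_Number": cust["Customer_Number"],
            "Customer_Name": cust["Customer_Name"],
        })
        for order in cust["orders"]:
            # Step 1: each order becomes its own row, linked back to the customer.
            orders.append({
                "Order_Number": order["Order_Number"],
                "Customer_Number": cust["Customer_Number"],
                "Order_Date": order["Order_Date"],
            })
            for part in order["parts"]:
                # Step 2: each part line becomes its own row, linked back to the order.
                order_parts.append({"Order_Number": order["Order_Number"], **part})
    return customers, orders, order_parts


customers, orders, order_parts = remove_repeating_groups(UNNORMALIZED)
```

After the call, every value in the three lists is a single atomic item, and each ORDER_PARTS row is identified by the pair (Part_Number, Order_Number).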

The CUSTOMERS and ORDERS tables each have a single-column primary key and are therefore, by definition, in 2NF. However, looking at the ORDER_PARTS table, it can be seen that Part_Description, Part_Price, Supplier_Number and Supplier_Name depend only on Part_Number, i.e. their values are the same regardless of Order_Number. (Part_Quantity depends on the whole of the key, since different quantities can appear on different orders.) To convert to 2NF, a separate table is created for part descriptions, prices, and supplier details.
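The idea of an attribute depending on only part of the key can be checked mechanically against sample data. The sketch below is a minimal illustration; the `fd_holds` helper and the sample rows are hypothetical. A dependency that holds in one sample is only evidence of a rule, not proof of it, whereas a single counter-example is enough to rule a dependency out.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # The first value seen for a key is remembered; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical 1NF ORDER_PARTS rows (values invented for illustration).
SAMPLE_ORDER_PARTS = [
    {"Part_Number": "P1", "Order_Number": 100, "Part_Description": "Bolt", "Part_Quantity": 10},
    {"Part_Number": "P1", "Order_Number": 101, "Part_Description": "Bolt", "Part_Quantity": 4},
    {"Part_Number": "P2", "Order_Number": 100, "Part_Description": "Nut", "Part_Quantity": 10},
]

# Partial dependency: the description is fixed by Part_Number alone.
partial = fd_holds(SAMPLE_ORDER_PARTS, ["Part_Number"], ["Part_Description"])

# The quantity is not fixed by Part_Number alone, only by the whole key.
quantity_on_part = fd_holds(SAMPLE_ORDER_PARTS, ["Part_Number"], ["Part_Quantity"])
quantity_on_key = fd_holds(
    SAMPLE_ORDER_PARTS, ["Part_Number", "Order_Number"], ["Part_Quantity"]
)
```

Here `partial` and `quantity_on_key` are true while `quantity_on_part` is false, which matches the reasoning above: Part_Description belongs in a table keyed by Part_Number, and Part_Quantity stays with the two-column key.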

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date)

ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)

PARTS(Part_Number,Part_Description,Part_Price,Supplier_Number, Supplier_Name)

The structures are now in 2NF, since every non-primary key attribute depends on the whole of the key. The next step is to convert the structure into 3NF by ensuring that each non-primary key attribute depends on nothing but the key.

The CUSTOMERS table is patently in 3NF, because there is no other non-primary key attribute for Customer_Name to depend on. The ORDERS table is in 3NF, because there is no dependency between Order_Date and Customer_Number (a customer can place different orders on different dates). The ORDER_PARTS table is in 3NF, because its only non-key attribute, the quantity ordered, depends on both the order number and the part number. Looking at the PARTS table, however, it can be seen that the Supplier_Name attribute depends on Supplier_Number and has nothing to do with the part number. To convert the structure into 3NF, a separate table is created containing supplier details.

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date)

ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)

PARTS(Part_Number, Supplier_Number*, Part_Description, Part_Price)

SUPPLIERS(Supplier_Number, Supplier_Name)
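As a rough sketch, the final 3NF structure could be declared in SQL as follows, here run through Python's standard `sqlite3` module. The column types, the NOT NULL choices, the lower-case names and the foreign key from order_parts to parts are assumptions made for illustration; the relation outlines above only fix the column names and keys.

```python
import sqlite3

# Column types and NOT NULL choices are assumptions; the source gives only names and keys.
SCHEMA = """
CREATE TABLE customers (
    customer_number INTEGER PRIMARY KEY,
    customer_name   TEXT NOT NULL
);
CREATE TABLE orders (
    order_number    INTEGER PRIMARY KEY,
    customer_number INTEGER NOT NULL REFERENCES customers(customer_number),
    order_date      TEXT NOT NULL
);
CREATE TABLE suppliers (
    supplier_number INTEGER PRIMARY KEY,
    supplier_name   TEXT NOT NULL
);
CREATE TABLE parts (
    part_number      INTEGER PRIMARY KEY,
    supplier_number  INTEGER NOT NULL REFERENCES suppliers(supplier_number),
    part_description TEXT NOT NULL,
    part_price       REAL NOT NULL
);
CREATE TABLE order_parts (
    part_number   INTEGER NOT NULL REFERENCES parts(part_number),
    order_number  INTEGER NOT NULL REFERENCES orders(order_number),
    part_quantity INTEGER NOT NULL,
    PRIMARY KEY (part_number, order_number)
);
"""


def build_database():
    """Create an in-memory database holding the five 3NF tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def order_lines(conn, order_number):
    """Return (customer, part, quantity, supplier) rows for one order."""
    return conn.execute(
        """
        SELECT c.customer_name, p.part_description, op.part_quantity, s.supplier_name
        FROM order_parts AS op
        JOIN orders    AS o ON o.order_number    = op.order_number
        JOIN customers AS c ON c.customer_number = o.customer_number
        JOIN parts     AS p ON p.part_number     = op.part_number
        JOIN suppliers AS s ON s.supplier_number = p.supplier_number
        WHERE op.order_number = ?
        ORDER BY p.part_number
        """,
        (order_number,),
    ).fetchall()
```

Each supplier name is stored once in suppliers, and the joins in `order_lines` reassemble the clerk's original view of an order when it is needed.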

Sample Example Of E-R Diagram

Company: Organized into departments. Each department has a name, a number and an employee who manages the department. The company keeps track of the date on which that employee started managing the department. A department may have several locations.

Department:

A Department controls a number of Projects each of which has a unique name, number and a single Location.

Employee:

Name, Age, Gender, BirthDate, SSN, Address, Salary. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by that department. The number of hours per week that an employee works on each project is also tracked.

Dependants: The company keeps track of the dependants of each employee for insurance purposes. For each dependant, we keep the first name, gender, date of birth and relationship to the employee.

Example:

Manage: Department and Employee. Partial participation. Relationship attribute: StartDate.

Works For: Department and Employee. Total participation.

Control: Department and Project. Partial participation from Department; total participation from Project. Control Department is a RKA.

Supervisor: Employee and Employee. Partial and recursive.

Works-On: Project and Employee. Total participation. Hours Worked is a RKA.

Dependants of: Employee and Dependant. Dependant is a weak entity. Dependant participation is total; Employee participation is partial.

Summary of Conceptual Design

Conceptual design follows requirements analysis. It yields a high-level description of data to be stored.

The ER model is popular for conceptual design. Its constructs are expressive and close to the way people think about their applications.

Basic constructs: Entities, Relationships, and Attributes (of entities and relationships). Some additional constructs are: Weak entities, ISA hierarchies, and Aggregation.

Note: There are many variations on ER model.

Summary of ER

Several kinds of integrity constraints can be expressed in the ER model: Key constraints, Participation constraints, and Overlap/Covering constraints for ISA hierarchies. Some Foreign key constraints are also implicit in the definition of a relationship set.

• Some of these constraints can be expressed in SQL only if we use general CHECK constraints or assertions.

• Some constraints (notably, functional dependencies) cannot be expressed in the ER model.

• Constraints play an important role in determining the best database design for an enterprise.
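As a hedged illustration of how some ER constraints can carry over to SQL, the sketch below uses the company example: every employee works for exactly one department (a key constraint plus total participation of Employee, expressed as a NOT NULL foreign key), and Dependant is a weak entity whose key includes its owner's key. Table names, column names and types are assumptions, and the code runs on Python's standard `sqlite3` module. The reverse requirement, that every department has at least one employee, is the kind of constraint that cannot be stated with these simple column constraints.

```python
import sqlite3

# Names and types are illustrative assumptions, not taken from the source.
DDL = """
CREATE TABLE department (
    dept_number INTEGER PRIMARY KEY,
    dept_name   TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    ssn         TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    -- Works For: one column (key constraint), never NULL (total participation).
    dept_number INTEGER NOT NULL REFERENCES department(dept_number)
);
CREATE TABLE dependant (
    -- Weak entity: identified by the owning employee plus its own first name.
    employee_ssn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
    first_name   TEXT NOT NULL,
    relationship TEXT NOT NULL,
    PRIMARY KEY (employee_ssn, first_name)
);
"""


def open_company_db():
    """Create an in-memory database with the three tables above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.executescript(DDL)
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT; return False if a declared constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an employee row with no department or with an unknown department number is rejected, and deleting an employee removes that employee's dependants.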

ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:

Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.

Ensuring good database design: Resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.

Case Studies

1. Prescriptions-R-X chain

The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of medicines if you design its database. Given the rising cost of health care, you agree. Here's the information that you gather:

Patients are identified by an SSN, and their names, addresses, and ages must be recorded.

Doctors are identified by an SSN. For each doctor, the name, specialty, and years of experience must be recorded.

Each pharmaceutical company is identified by name and has a phone number.

For each drug, the trade name and formula must be recorded. Each drug is sold by a given pharmaceutical company, and the trade name identifies a drug uniquely from among the products of that company. If a pharmaceutical company is deleted, you no longer need to keep track of its products.

Each pharmacy has a name, address, and phone number.

Every patient has a primary physician. Every doctor has at least one patient.

Each pharmacy sells several drugs and has a price for each. A drug could be sold at several pharmacies, and the price could vary from one pharmacy to another.

Doctors prescribe drugs for patients. A doctor could prescribe one or more drugs for several patients, and a patient could obtain prescriptions from several doctors. Each prescription has a date and a quantity associated with it. You can assume that if a doctor prescribes the same drug for the same patient more than once, only the last such prescription needs to be stored.

Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical company can contract with several pharmacies, and a pharmacy can contract with several pharmaceutical companies. For each contract, you have to store a start date, an end date, and the text of the contract.

Pharmacies appoint a supervisor for each contract. There must always be a supervisor for each contract, but the contract supervisor can change over the lifetime of the contract.

1. Draw an ER diagram that captures the above information. Identify any constraints that are not captured by the ER diagram.

2. How would your design change if each drug must be sold at a fixed price by all pharmacies?

3. How would your design change if the design requirements change as follows: If a doctor prescribes the same drug for the same patient more than once, several such prescriptions may have to be stored.

2. Dane County Airport

The Computer Sciences Department has been frequently complaining to Dane County Airport officials about the poor organization at the airport. As a result, the officials have decided that all information related to the airport should be organized using a DBMS, and you've been hired to design the database. Your first task is to organize the information about all the airplanes that are stationed and maintained at the airport.

The relevant information is as follows:

• Every airplane has a registration number, and each airplane is of a specific model.

• The airport accommodates a number of airplane models, and each model is identified by a model number (e.g., DC-10) and has a capacity and a weight.

• A number of technicians work at the airport. You need to store the name, SSN, address, phone number, and salary of each technician.

• Each technician is an expert on one or more plane model(s), and his or her expertise may overlap with that of other technicians. This information about technicians must also be recorded.

• Traffic controllers must have an annual medical examination. For each traffic controller, you must store the date of the most recent exam.

• All airport employees (including technicians) belong to a union. You must store the union membership number of each employee. You can assume that each employee is uniquely identified by the social security number.

• The airport has a number of tests that are used periodically to ensure that airplanes are still airworthy. Each test has a Federal Aviation Administration (FAA) test number, a name, and a maximum possible score.

• The FAA requires the airport to keep track of each time that a given airplane is tested by a given technician using a given test. For each testing event, the information needed is the date, the number of hours the technician spent doing the test, and the score that the airplane received on the test.

1. Draw an ER diagram for the airport database. Be sure to indicate the various attributes of each entity and relationship set; also specify the key and participation constraints for each relationship set. Specify any necessary overlap and covering constraints as well (in English).

2. The FAA passes a regulation that tests on a plane must be conducted by a technician who is an expert on that model. How would you express this constraint in the ER diagram? If you cannot express it, explain briefly.

3. University Database

Consider the following information about a university database:

• Professors have an SSN, a name, an age, a rank, and a research speciality.

• Projects have a project number, a sponsor name (e.g., NSF), a starting date, an ending date and a budget.

• Graduate students have an SSN, a name, an age and a degree program (e.g., M.S. or Ph.D.).

• Each project is managed by one professor (known as the project's principal investigator).

• Each project is worked on by one or more professors (known as the project's co-investigators).

• Professors can manage and/or work on multiple projects.

• Each project is worked on by one or more graduate students (known as the project's research assistants).

• When graduate students work on a project, a professor must supervise their work on the project. Graduate students can work on multiple projects, in which case they will have a (potentially different) supervisor for each one.

• Departments have a department number, a department name and a main office. Departments have a professor (known as the chairman) who runs the department.

• Professors work in one or more departments, and for each department that they work in, a time percentage is associated with their job.

• Graduate students have one major department in which they are working on their degree.

• Each graduate student has another, more senior graduate student (known as a student advisor) who advises him or her on what courses to take.

Design and draw an ER diagram that captures the information about the university.

Use only the basic ER model here, that is, entities, relationships, and attributes. Be sure to indicate any key and participation constraints.

