Handout Dbms

8/3/2019 Handout Dbms


There are various types of data files. On the basis of the changes that take place in the data, they are classified into Master Files and Transaction Files.

Master Files The master file is the one that stores relatively permanent data. For example, Employee No., Employee Name, Basic Pay, Customer Address, Product Code, Bank Account No., etc. This data is updated only when there is a change in such almost static data. For example, the basic pay needs to be changed only for those employees who are due for an increment in a specific month, and the address needs to be updated only for those customers who have shifted their office.

Transaction Files This file stores data temporarily and is used to update the master file. The data may pertain to creating new employee records, deleting the records of employees who have left, or modifying existing data. It records the day-to-day transactions or activities in the organization.

The above files are stored on magnetic storage devices so that the data can be retrieved and processed using various techniques.

File Organization and Access Methods

File organization refers to the way data is arranged on the magnetic media and the way it is retrieved. The organization method determines the speed of access to any piece of data. The various methods of file organization are:

Sequential File

Direct or Random File

Indexed Sequential File

Sequential File Organization A sequential file is a file with a record structure in which the records are stored one after the other on a storage device and can only be accessed in that sequence. It is also sometimes referred to as serial organization, as it stores records in sequential order of some specific attribute. For example, student records are stored in the order of roll numbers, employee records in the order of employee code, etc. This facilitates access, as most of the time the data is processed in the same sequence. This is similar to an audio cassette recording, on which, to listen to the third song, you have to skip or fast-forward past the first two songs. The popular storage devices used for this organization are magnetic tape and cartridge tape. Other devices such as floppy disk and magnetic disk can also be used. This organization method is useful and less expensive for processing a large number of records, for example, payroll processing, generating accounts receivables, etc. The major disadvantage of this method is that locating or accessing a record is time consuming, i.e. to reach the record of the student with roll no. 55, one has to wait till the first 54 records are skipped. The bigger the file, the longer it takes to access a record.
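
The cost of sequential access can be illustrated with a short Python sketch. The names `records` and `sequential_read` are hypothetical and introduced only for this illustration; an in-memory list stands in for a tape file ordered by roll number.

```python
# Hypothetical sequential file: 100 student records stored in roll-number order.
records = [{"roll_no": n, "name": f"Student {n}"} for n in range(1, 101)]


def sequential_read(records, roll_no):
    """Scan from the first record onwards; return (record, records_read)."""
    for count, record in enumerate(records, start=1):
        if record["roll_no"] == roll_no:
            return record, count
    # The key was not found, so the whole file had to be read.
    return None, len(records)


record, records_read = sequential_read(records, 55)
# The first 54 records are passed over before record 55 is reached.
print(record["name"], "found after reading", records_read, "records")
```

The number of records read grows with the position of the wanted record, which is why access time increases with file size.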

Direct or Random File Organization A direct access file has a symbolic name (key) for each record, from which the position of the record in the file is calculated. This method of organization does not follow any sequence for storing the records. Instead, it retrieves records in a random manner on the basis of their key value, i.e. the data field that uniquely identifies a record. This method of storage and access is useful for applications where records need to be retrieved one by one within a reasonably short time, for example, online updation of bookings in a railway reservation system. Any record can be retrieved almost instantaneously by typing the key value, for example train number, PNR number, item code or customer code. This method is similar to the access of songs stored on a gramophone record: just placing the needle on a track immediately plays that song. The storage devices that provide such access are magnetic disk, floppy disk, CD, etc.

Since there is no fixed way of organizing the data, access to a record depends on linking the key field data with the physical location of the record on the storage device. One popular method used for this is a hashing algorithm, an ingenious and useful form of address calculation. In this method, the record key is converted into a near-random number (by using a mathematical formula), and this number gives the address of the record on the storage device. Sometimes, instead of the record address, it gives the address of a group of records (called a bucket or pocket); the number of logical records in that group is called the bucket capacity. This method is much faster than sequential organization for accessing a specific record but is more expensive due to the need for random access storage devices. It is therefore not cost-effective for processing large volumes of data such as payroll.
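
A minimal sketch of address calculation by hashing is given below. It assumes the division-remainder method as the "mathematical formula" (one common choice, not necessarily the one intended here), and the bucket count, bucket capacity, PNR numbers and function names are all made up for the illustration.

```python
NUM_BUCKETS = 7      # assumed number of buckets on the storage device
BUCKET_CAPACITY = 3  # assumed number of logical records per bucket

buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_address(key):
    """Division-remainder hashing: turn a numeric key into a bucket address."""
    return key % NUM_BUCKETS


def store(key, data):
    bucket = buckets[bucket_address(key)]
    if len(bucket) >= BUCKET_CAPACITY:
        raise OverflowError("bucket full; a real system needs an overflow area")
    bucket.append((key, data))


def fetch(key):
    # Only the single bucket named by the hash is searched, not the whole file.
    for stored_key, data in buckets[bucket_address(key)]:
        if stored_key == key:
            return data
    return None


store(4521, "PNR 4521: Delhi to Mumbai")
store(4528, "PNR 4528: Delhi to Pune")  # hashes to the same bucket as 4521
print(bucket_address(4521), fetch(4528))
```

Two different keys can produce the same address, which is why the address usually names a bucket of several records rather than a single record.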

Indexed Sequential File Organization This file organization method combines the features of the sequential and the random file methods. It permits a file to be used in sequential access mode as well as random access mode. The records are stored in sequential order of the key element, as in sequential files. In addition, an index is created that records the key values and their corresponding physical addresses on the storage device, so an indexed sequential file can also be accessed at random via the index. The index is often organized as a direct access file. The index entries point to a block or a track on disk, and the block or track pointed to is then scanned sequentially for the requested record. This requires the use of random access storage devices such as magnetic disks or CDs. For example, the student file needs to be accessed randomly while updating a change of address, but sequential access is desirable for generating a list of students who have not paid the fees in a particular semester. In a stock master file, updating the quantity issued for specific items requires random access to the item records by their codes, but sequential access would be economical for finding, say, the items having stock value > 1000.

This method is quite useful as it provides dual access to the file, but it is more expensive than sequential access due to the extra storage for the index and slower than the random access method due to the additional index search.
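
The two access modes of an indexed sequential file can be sketched as follows. The item codes, the block size of three records and the function names are assumptions made for the illustration; the index here simply holds the lowest key of each block.

```python
import bisect

# Records kept in key order and grouped into blocks (assumed 3 records per block).
blocks = [
    [(101, "bolt"), (104, "nut"), (109, "washer")],
    [(112, "screw"), (118, "rivet"), (121, "pin")],
    [(130, "gear"), (135, "shaft"), (142, "bearing")],
]
# The index stores the lowest key found in each block.
index = [block[0][0] for block in blocks]


def random_read(key):
    """Use the index to choose a block, then scan only that block sequentially."""
    position = bisect.bisect_right(index, key) - 1
    if position < 0:
        return None  # key is smaller than every key in the file
    for record_key, value in blocks[position]:
        if record_key == key:
            return value
    return None


def sequential_read():
    """Read every record in key order without consulting the index."""
    for block in blocks:
        yield from block


print(random_read(118))
print([key for key, _ in sequential_read()])
```

The extra `index` list is the storage overhead, and the index lookup before each block scan is the additional search step mentioned above.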

The file management system is the software that helps in creating, retrieving and manipulating files on a magnetic storage device. The traditional file management system has certain drawbacks that have affected its utility for developing applications.

Drawbacks of the Traditional File Management System

In the traditional file environment, the operational data of the organization is widely dispersed into separate files. This imposes several limitations that in turn restrict its utility. The major drawbacks are listed below:

Data Redundancy Data redundancy refers to the duplication of the same data fields in many different application files. This results in repetition of data in multiple files in the organization. For example, Employee code and name will appear in the Payroll file, Provident file, Personal file, Income tax file, etc. The availability of same information at several places in

A Database Management System (DBMS) is computer software that manages easy and quick access to databases and allows their manipulation by inserting, deleting, modifying or querying the data. A DBMS therefore stores, processes, and retrieves data from the database. The DBMS also offers a number of services. It defines a method of storing the data and provides services that allow you to retrieve and manipulate that data. It provides a number of users with simultaneous access to a large amount of information stored in the database. The DBMS also ensures that the data stored in the database is accurate and secure.

Figure: Functionality of Database Management System

A database management system integrates the data files into a database and can provide different views of the data to different users depending on their requirements. A DBMS therefore makes it possible to access integrated data across multiple operations, functions and organizational boundaries.

DATABASE CONCEPTS/TERMINOLOGY

The most commonly used terms in database systems are discussed below:

Database

Any organization must store information about its suppliers, customers, employees, sales orders, etc. A database is an organized collection of related information stored at a central location. For example, a company maintains a database of its personnel, a college maintains a database of its students, a hospital maintains a database of its patients' records, etc. Put simply, a database is a collection of data. To manage efficiently a database that stores a large amount of important information, an appropriate database system is needed. A database system, thus, is a computerized record-keeping system, the purpose of which is to maintain data and to make that data available on demand.

Figure: DBMS - A simplified user's view (Users; Application Programs: Payroll, Sales & Inventory, Personnel, Marketing; Database Management System (DBMS); Query Language; Integrated Database)

Entities

An entity is any item about which we collect and store information in the database. The entity may be a tangible object such as a person, an employee, a place, etc. It may also be an intangible object such as an event, concept, condition, etc. In a data processing system, we are generally concerned with collections of similar entities such as employees, customers, students, etc. For example, in a university environment, the entities about which data is stored are STUDENTS, FACULTY, COURSES, EXAMINATIONS, etc. In a hospital environment, the entities are PATIENTS, DOCTORS, NURSES, ROOMS, DRUGS, etc. In a manufacturing system, the various entities are SUPPLIERS, PARTS, CUSTOMERS, ORDERS, SHIPMENTS, etc. For any database to be developed, the selection of entities in the organization is the first step. This depends on the kind of problem to be handled, its relationship with other activities, etc.

Attributes

An attribute is a data field that describes the entity. Every entity therefore has some basic attributes that characterize it. Attributes are the various items on which we record data about the entity. For example, a Student may be described by attributes such as Registration Number, Name, Address, Telephone, Date of Birth, Class, Course, etc. The attributes we select may store any type of data such as text, numbers, graphics, audio, video, etc.


Data Value

A data value is the actual data or information contained in each data field. It is called a data value as it records the specific details or facts of an individual entity. The data field employee name can take values like S K Gupta, V Kumar, Jyoti, etc. These values could be quantitative, qualitative or descriptive, depending on how the data fields describe the entity. For example:

9811034 PRANAV 77 MODEL TOWN, DELHI 7278593 20/09/84 BCA

are the data values that identify a student.

Key Data Field

From among the various data fields that describe an entity, the value of some of the data fields can uniquely identify the values contained in the other data fields of the same entity. For example, knowing the employee code, we can find the employee's personal data like name, date of birth, date of joining, scale of pay, qualifications, etc. The data fields that help in identifying other data fields are called Key Data Fields. A key field contains unique data used to identify a record so that it can be easily retrieved and processed. Some examples of key fields are Customer Code, Vendor Identification, Student Registration No., PAN No., Customer Account No., etc. Sometimes there is more than one such field in an entity. These data fields are candidates for becoming the key data field and are therefore referred to as candidate keys. The selection of the primary or key data field is very important as it may directly affect the database design process.
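
The terms entity, attribute, data value and key data field can be tied together in a small Python sketch. The class and field names are hypothetical, and reading "BCA" as the course is an assumption, since the sample row does not label its values.

```python
from dataclasses import dataclass


@dataclass
class Student:             # the entity
    registration_no: int   # key data field: unique for every student
    name: str              # the remaining fields are ordinary attributes
    address: str
    telephone: str
    date_of_birth: str
    course: str            # assumption: "BCA" is taken to be the course


# The data values of one individual student.
student = Student(9811034, "PRANAV", "77 MODEL TOWN, DELHI",
                  "7278593", "20/09/84", "BCA")

# Records indexed by the key field: knowing the key retrieves all other values.
students_by_key = {student.registration_no: student}
print(students_by_key[9811034].name, students_by_key[9811034].date_of_birth)
```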

OBJECTIVES OF DATA BASE MANAGEMENT SYSTEMS

The decision of an organization to store all its operational data in an integrated database is based on the primary advantage offered by a database, i.e. centralized control of operational data. Therefore, the major objective of a database management system is:

Integrating Databases to provide Centralized Control over data This is required since the data needed for problem solving may reside in different databases. This demands that the databases be integrated. This integration provides centralized control of organizational data.

ADVANTAGES AND DISADVANTAGES OF DBMS

This integration or centralized control of data results in several advantages as well as disadvantages that accrue to an organization. These are:

The amount of redundancy in the stored data can be minimized. In a database, data fields are stored only once instead of being repeated. A single data field can be shared by various applications or users. Duplication of data cannot be eliminated completely, and some duplication is occasionally retained to improve performance; however, it is reduced considerably. This makes the database less expensive, as the excessive data storage needs (due to duplication) are avoided.

Problem of inconsistency in the stored data can be avoided. This advantage accrues from the reduced redundancy of data. Since data is stored at one place, any updation needs to be done only once, at that one place. This results in improved consistency of data.

The sharing of stored data is possible. The database allows sharing of information among many users or applications. This means that data may be stored once and can then be retrieved any number of times for specified purposes by authorized users of the database. This helps in reducing storage requirements and improving consistency.

Higher Program Independence. The maintenance of the database is easy and simple as programs are tied to the database view instead of file formats. The database view is independent of the physical storage media.

Security Restrictions can be Applied. Although several users can share data, access to specific pieces of information can be restricted to selected authorized users. The DBA can ensure that access to the confidential or sensitive data in the database is allowed only to legitimate users after proper authorization checks such as passwords. The access could be permitted for retrieve, update or delete operations, or a combination of these.

Improved Data Integrity. This requires that the database contain accurate data. The accuracy and consistency of data are ensured by reduced redundancy and absence of inconsistency. To further improve the accuracy of data, validation procedures may be used while updating the data. A database that is secure and reliable is said to have data integrity.

Improved User Productivity. DBMSs are quite simple to use and provide flexible ways of generating information from the database by using English-like queries.

Standards can be Enforced. With centralized control of data, industry standards can be adopted throughout the organization in representing the data. For example, instead of using different units of measurement like kilograms, grams and quintals, we may use a single standard unit, kilograms.

Although the database system offers clear advantages, there are certain disadvantages associated with it. The major ones are:

Enterprise Vulnerability. Centralizing all data of an enterprise in one database may mean that the database becomes an indispensable resource. The survival of the enterprise may depend on reliable information being available from its database. The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorized modification of the database.

Data Quality. Since the database is accessible to users remotely, adequate controls are needed over users updating data and over data quality. With an increased number of users accessing data directly, there are many more opportunities for users to damage the data. Unless there are suitable controls, the data quality may be compromised.

Cost. Using a database entails high costs: expensive database management software, large random access disk storage, more memory and more skilled manpower for design, development and maintenance.

Data Integrity. Since a large number of users could be using a database concurrently, technical safeguards are necessary to ensure that the data remain correct during operation. The main threat to data integrity comes from several different users attempting to update the same data at the same time. The database therefore needs to be protected against inadvertent changes by the users.

Confidentiality and Security. When information is centralized and is made available to users from remote locations, the possibilities of abuse are often greater than in a conventional system. To reduce the chances of unauthorized users accessing sensitive information, it is necessary to take technical, administrative and, possibly, legal measures.

Privacy. This relates to the ethical use of the database for specified purposes. Such use should not intrude into people's privacy.

DATABASE ADMINISTRATOR (DBA)

An organization with an integrated database will have some identifiable person who has the central responsibility for the operational data. The Database Administrator (DBA) is a person or group of persons who coordinates and manages all activities and procedures related to the organizational database. The major responsibilities of the DBA include:

Database Planning This involves understanding the information requirements of the organization and the users, selecting the DBMS, and specifying security mechanisms. In effect, this means deciding what information should be stored in the database and the type of data model to be used for representing data in the database.

Database design This includes three functions: defining the conceptual schema, the storage structure and the mappings between the two.

Database creation This involves converting the design framework into an actual database by entering the data and storing it on magnetic devices.

Database implementation and Maintenance This is concerned with meeting changing user needs by modifying the database. This involves changing the structure and organization of the database to meet the changed requirements. The actual deletion or addition of records is the job of the users themselves.

Liaison with the users This is to ensure that users' needs are being met, and involves monitoring how the database is used and what is used, determining user access privileges, etc.

Ensuring database security The DBA is responsible for ensuring the integrity (security and reliability) of data by preventing unauthorized access to the database. To achieve this, authorization checks and validation procedures must be defined using the DDL.

Backup and Recovery Once an organization creates and starts using the database, its operations become dependent on the smooth operation of the system. In the event of any damage to the database (hardware or software errors, fire, etc.), it is essential to be able to recover or rebuild the data with minimum delay. This requires that the DBA develop appropriate backup and recovery procedures in the system, such as periodic dumping of the database on a backup storage device.

Performance monitoring This ensures that the system is serving the users in the best possible way. A general complaint that the system is too slow needs to be taken care of. The DBA should respond to changes in physical storage by modifying the definitions.

Defining Concurrency procedures This refers to the sharing of the database by multiple users. If a piece of data is shared by multiple users, this may lead to a chaotic situation, as two or more users may try to change the same piece of data at the same time. It is therefore essential to have appropriate concurrency procedures defined to avoid the problems that arise from concurrent access to data.
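
One simple concurrency procedure is locking, where only one user at a time may change a shared item. The sketch below uses Python threads to stand in for concurrent users; the seat counter, the thread count and the function name are invented for the illustration, and a real DBMS uses its own locking or transaction mechanisms rather than this code.

```python
import threading

seats_available = 100     # shared piece of data
lock = threading.Lock()   # the concurrency procedure: one updater at a time


def book_seats(count):
    """Each booking is a read-modify-write on shared data, guarded by the lock."""
    global seats_available
    for _ in range(count):
        with lock:
            seats_available -= 1


# Five "users" each book ten seats at the same time.
users = [threading.Thread(target=book_seats, args=(10,)) for _ in range(5)]
for user in users:
    user.start()
for user in users:
    user.join()

print(seats_available)
```

Because every update happens while the lock is held, no booking is lost and the final count is always 100 minus the 50 seats booked.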

TYPES OF DATABASE MANAGEMENT SYSTEMS

The database systems can be organized into four basic categories. These are:

HIERARCHICAL

NETWORK

RELATIONAL

OBJECT ORIENTED

These databases have evolved over a period of time as shown in Figure 8.3. The databases currently in use are the relational and object-oriented databases.

Hierarchical Databases In hierarchical database organization, the data is represented by a simple tree structure. This tree structure has various levels, with the lower level records (called Child) being subordinate to the higher level records (called Parent). The parent record at the top of the tree (database) is usually known as the root record or simply the root. In general, the root may have any number of dependent record types (child), each of which may have any number of lower level record types, and so on to any number of levels. This means that in a hierarchical database, a parent record may have more than one child but a child can have only one parent. This represents a one-to-many kind of relationship among the data records, similar to a family tree. In order to locate any specific record, we have to traverse the tree starting from the top or the parent record and trace down the tree to the child. For example, employees working in an organization are associated with specific departments as shown in Figure 8.4.

There are various departments and each department has many employees. One employee can belong to only one department. This shows the one-to-many relationship between the departments and the employees and no relationship among the employees (or the children).

Figure: Evolution of databases

Figure: Hierarchical database - One-to-many relationship (Departments: Personnel, Accounts, Production, Sales; Employee names: Suresh, Rakesh, Kapil, Neeraj, Surender, Neelam, Kavita, Anil, Ram)

The hierarchical organization is the oldest type but is still used in certain systems, such as reservation systems, due to some of its strengths. Its major strength lies in its simplicity of implementation and faster updation of data, as the relationship between the parent and the child is pre-defined. However, it has many drawbacks as well. It is a rigid structure, as adding a new data field to the database requires that the entire database be redefined. Also, we cannot insert any data in the database unless there is a parent for it. For example, a supplier record cannot be added to the supplier database without the supplier supplying some parts.
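
The parent-child structure and the top-down traversal can be sketched with nested Python dictionaries. The department and employee names come from the hierarchical figure, but the assignment of each employee to a department is an assumption, as are the names `root` and `find_employee`.

```python
# Root -> departments (parents) -> employees (children).
# Each employee appears under exactly one department.
root = {
    "Departments": {
        "Personnel": ["Suresh", "Rakesh"],
        "Accounts": ["Kapil", "Neeraj"],
        "Production": ["Surender", "Neelam", "Kavita"],
        "Sales": ["Anil", "Ram"],
    }
}


def find_employee(tree, name):
    """Traverse from the root down to the child; return the path followed."""
    for root_name, departments in tree.items():
        for department, employees in departments.items():
            if name in employees:
                return [root_name, department, name]
    return None


print(find_employee(root, "Neelam"))
```

A child can only be reached through its parent, which is also why an employee cannot be stored in this structure without first belonging to some department.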

Network Databases In this type of database organization, records and links represent the data, as in the hierarchical database. However, this is a more generalized structure because a record may have any number of superiors or owners, as against one in the hierarchical model. In terms of family relationships, this means that each child record can have more than one parent record. The child record is called a member and may be reached through more than one parent, called an owner. This reflects a many-to-many relationship, which can easily be mapped using the network model. For example, in a university system, the relationship between students, courses and departments can be shown as in Figure 8.5. The student A has two owners: Financial management and Sales management. The owner Commerce department has two members: General management and Financial management.

Figure: Network database - many-to-many relationship (Departments: Management, Commerce; Courses: Sales Mgt., Fin. Mgt., Gen. Mgt.; Students: A, B, C, D, E, F)

This model offers more flexibility as new relationships may be established among data records at different levels. It can easily represent hierarchical (one-to-many) relationships as well. But the primary disadvantage of this database organization is the complexity of its structure. Also, the structure needs to be defined in advance. In addition, there is a restriction on the number of possible links that data records can have.
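
Owner and member links of the network model can be sketched with two Python dictionaries. Only two facts are taken from the text (student A is owned by Financial management and Sales management, and Commerce owns General management and Financial management); every other link, and the helper `owners_of`, is assumed for the illustration.

```python
# owner -> members; a member may appear under more than one owner.
department_courses = {
    "Management": ["Sales Mgt.", "Gen. Mgt."],   # assumed links
    "Commerce": ["Gen. Mgt.", "Fin. Mgt."],      # stated in the text
}
course_students = {
    "Sales Mgt.": ["A", "B"],
    "Fin. Mgt.": ["A", "C", "D"],
    "Gen. Mgt.": ["E", "F"],
}


def owners_of(member, links):
    """Return every owner through which the member can be reached."""
    return sorted(owner for owner, members in links.items() if member in members)


print(owners_of("A", course_students))             # student with two owners
print(owners_of("Gen. Mgt.", department_courses))  # course with two owners
```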

Relational Databases The relational database organization connects data in different files through the use of a common data field called a key field. In this arrangement, the data fields are stored in different tables comprising rows and columns. The tables are called relations; the rows of a table are referred to as tuples or records, and the columns are referred to as attributes. A relation is only a table of records and not the linkage between records.

Example - 1: The data on Suppliers, Parts and Shipments is shown in various tables.

Suppliers Table

Supplier No. | Supp. Name | City | Status

Parts Table

Part No. | Part Name | Unit of Measure | Location where stored

Shipment Table

Supplier No. | Part No. | Quantity Supplied

Each of these tables resembles a sequential file, with rows representing records and columns representing data fields. The records are identified by key fields such as Supplier No., Part No., or Supplier No. and Part No. together.
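
The three tables of Example - 1 can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the column names are adapted from the table headings, and the sample rows (supplier S1, part P1 and the quantity) are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE suppliers (supplier_no TEXT PRIMARY KEY, supp_name TEXT,
                        city TEXT, status INTEGER);
CREATE TABLE parts (part_no TEXT PRIMARY KEY, part_name TEXT,
                    unit TEXT, location TEXT);
CREATE TABLE shipments (
    supplier_no TEXT REFERENCES suppliers(supplier_no),
    part_no     TEXT REFERENCES parts(part_no),
    quantity    INTEGER,
    PRIMARY KEY (supplier_no, part_no)   -- key made of two fields together
);
INSERT INTO suppliers VALUES ('S1', 'Gupta Traders', 'Delhi', 20);
INSERT INTO parts VALUES ('P1', 'Bolt', 'kg', 'Delhi');
INSERT INTO shipments VALUES ('S1', 'P1', 300);
""")

# The tables are connected only through the common key fields.
rows = conn.execute("""
    SELECT s.supp_name, p.part_name, sh.quantity
    FROM shipments AS sh
    JOIN suppliers AS s ON s.supplier_no = sh.supplier_no
    JOIN parts AS p ON p.part_no = sh.part_no
""").fetchall()
print(rows)
```

No pointers or links are stored between the records; the shipment row is related to its supplier and part purely by matching key values.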

Example - 2: To store information about employees, different tables such as an employee table, a department table, a salary table, etc. can be created.

Tables: DEPARTMENT, EMPLOYEE, SALARY

Object Oriented Databases An object-oriented database approach makes use of objects as elements within database files. An object consists of data such as text, graphics, audio and video, together with the instructions or methods to perform actions on the data. This approach tries to model real world situations. Unlike hierarchical, network or relational databases, which store text and numeric data, the object-oriented database can also store audio, video and graphical data. This type is object-oriented and is closer to real world features, while the other approaches are record-oriented and closer to the computer system.

RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS)

The Relational Database Management System (RDBMS) is based on the relational model that was proposed by Dr. E. F. Codd in the year 1970. In this model, data is stored in two-dimensional tables called relations. Some examples of RDBMSs are Oracle 7.x, Sybase, Ingres, Microsoft Access and Unify. In 1985, Dr. Codd laid out 12 rules that must be followed by any DBMS for it to be considered relational. Oracle follows the maximum of the Codd rules, i.e. 11 rules. Other DBMSs like MS Access cover 7 and Sybase covers 9 rules approximately. As of now, there is no RDBMS that fully implements all the 12 rules of Dr. Codd.

The RDBMS is useful as it avoids loss of information during addition or deletion of records in the database. The user need not be aware of the underlying structure of the data, so the system can be used easily. It is highly flexible, as records and fields can easily be added, deleted or updated. The relational database model is the most popular model on microcomputers, with products such as Microsoft Access, Oracle, Paradox, FoxPro, etc.

ELEMENTS OF A DATABASE MANAGEMENT SYSTEM

A number of components make up a database management system. The prominent ones are discussed below:

Data Dictionary This is an important DBA tool. It is a database in its own right. It contains metadata, i.e. data about data. It stores the descriptions of other entities in the system rather than simply raw data. The dictionary file contains various schemas, mapping definitions, authorization checks, validation rules, etc.

Utilities The DBMS has programs that help in the administration and maintenance of the database. These may include programs for creating a database, for editing and deleting data in the database, and for entering and displaying data in a proper manner using friendly screens.

Sample tables for Example - 2:

DEPARTMENT

Dno | DNAME
10 | Sales
20 | Computer
30 | Accounts
40 | Production

SALARY

ENO | BASIC | COMM
100 | 6000 | 1000
101 | 300 (only one value survives in the source)

EMPLOYEE

ENO | ENAME | JOB | DNO
100 | Jack | Manager | 20
101 | Kevin | Operator | 10
102 | Sam | Salesman | 20

Query Language Databases can contain thousands of fields and millions of records. However, we are usually interested in only a part, or subset, of this data. Queries allow us to create a new table that contains only those fields and records in which we are interested. For example, if you have two tables, one with customer name and address information and another with charge account information, you can create a query to extract from them just the information you want. If you want to see late payers, you can create a new table that lists just the name, phone, and date and amount of the charge. This new table, called a dynaset, contains only some of the fields from each table and contains only those records where payment was past due. You can also use queries to update or delete groups of records. The dynaset can be used as the basis for a report. Most reports begin by first using a query to gather just the data you want the report to list. There are essentially two ways to create queries:

Query-By-Example (QBE)

Query Language

In query-by-example (QBE), users ask for information by using a sample record to specify the criteria for selecting records. There is a fill-in form, in which you first select the fields you want to include and then specify which records are to be listed. To narrow the list, we can use selection criteria that specify a field to look in and a value to look for. The value can be text, numbers, or dates. After the fields to be included and the criteria to be used have been specified, the program creates the new table, i.e. the dynaset. This new table contains only those records that match the criteria specified. For example, the criterion for selecting those employees who joined the organization on or after 1 January 1998 would be filled in by entering >= 01/01/1998 in the date of joining field. When the query is executed, only those records that match the specified criteria are listed in the dynaset.
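
What a QBE form does behind the scenes can be approximated in a few lines of Python. This is only a sketch of the idea, not how any particular QBE product works: the function `run_query`, the criteria format and the joining dates are all invented for the illustration.

```python
import operator
from datetime import date

employees = [
    {"name": "Jack", "date_of_joining": date(1996, 5, 14)},
    {"name": "Kevin", "date_of_joining": date(1998, 1, 1)},
    {"name": "Sam", "date_of_joining": date(2001, 9, 30)},
]

# Comparison symbols a user might type into a criteria cell.
OPERATORS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def run_query(records, fields, criteria):
    """Build a new table holding the chosen fields of the matching records."""
    result = []
    for record in records:
        if all(OPERATORS[op](record[field], value)
               for field, (op, value) in criteria.items()):
            result.append({field: record[field] for field in fields})
    return result


# Fields to show: name. Criterion: date of joining >= 01/01/1998.
dynaset = run_query(employees, ["name"],
                    {"date_of_joining": (">=", date(1998, 1, 1))})
print(dynaset)
```

The result holds only the selected field for the two employees who joined on or after 1 January 1998.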

Queries can also be written out as in a programming language. The most popular query language is Structured Query Language (SQL). It provides a detailed way to specify queries and is easy to use, allowing selection and retrieval of data based on simple English-like statements. Each language has its own vocabulary and procedures. SQL involves two languages: Data Definition Language (DDL) and Data Manipulation Language (DML). DDL describes the structure (content and format) of the data stored in the database. This definition is also called the schema definition and provides a link between the physical and logical views of the database. DML provides users with procedures to insert, delete or update data in the database using efficient access methods. SQL makes use of verbs such as SELECT, INSERT, UPDATE and DELETE.
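
The difference between DDL and DML can be seen in a short sketch using Python's built-in `sqlite3` module. The rows reuse the handout's sample EMPLOYEE data; the lower-case table and column names and the new job title 'Supervisor' are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the structure (schema) of the table.
conn.execute(
    "CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT, "
    "job TEXT, dno INTEGER)"
)

# DML: insert, update, delete and retrieve the data.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(100, "Jack", "Manager", 20),
     (101, "Kevin", "Operator", 10),
     (102, "Sam", "Salesman", 20)],
)
conn.execute("UPDATE employee SET job = 'Supervisor' WHERE eno = 101")
conn.execute("DELETE FROM employee WHERE eno = 102")

rows = conn.execute("SELECT eno, ename, job FROM employee ORDER BY eno").fetchall()
print(rows)
```

The CREATE TABLE statement changes what the database can hold, while the INSERT, UPDATE, DELETE and SELECT statements work only on the rows inside that structure.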

Forms - Database tables are not very interesting to view as they look much like a spreadsheet. To make a database more user-friendly, forms are created and displayed on the screen. People then type data into these forms and it is automatically entered into the database. Once the data has been entered, the form can also be used to view, edit, or delete it. Different forms can present different views of the same data: a payroll accountant looking at a payroll database might see the salaries of every employee while a tax accountant might see just the withholding information.

Reports - Databases contain vast amounts of information, more than most individuals need, especially in large organizations. Most people don't actually use the database itself. Generally, they look through reports containing just the information they are interested in. Many such reports can be created from the same database. Using the same database of employees, you might create a report that lists just names and addresses or a report showing salaries and benefits.

Exporting to a spreadsheet - Database are great for storing data but not as powerful asspreadsheets when it comes to analyzing it, so data from the database is frequently exported toa spreadsheet, in most cases just the set or records appropriate to the problem being looked at.For example, in a sales database, you might download to a spreadsheet Januarys sales inCalifornia to see how they compare to last years or another states sales.

Exporting to a word processor - Its common to use a query to isolate appropriate databaseinformation and then export that data or link it, usually in the form of a table, to a documentyoure creating on a word processor. Its even more common to use the database in conjunctionwith a word processing program to mail merge documents. The word processing program isused to create the form letter or other document containing merge codes that refer to fields inthe database from which the data are retrieved when the document is merge-printed.

Data Access Security - It is concerned with the integrity of the data in the database. The DBA assigns various data access privileges to the users depending upon the sensitivity of the data. The privileges may be to read only, to update, to delete, etc. User authentication and access restrictions are meant to secure the data from malicious users.

System Recovery - The DBA is responsible for recovering the contents of the database in case of any hardware or software failure. This may include keeping backups of the database and of the transactions that cause changes in the database.

NORMALIZATION

Normalization is a method of breaking down complex table structures into simple table structures by using certain rules to form well-defined relations. This reduces redundancy and discrepancies in the data and eliminates the problems of inconsistency and disk space usage.

Normalization results in the formation of simple tables that satisfy certain specified rules and represent certain normal forms. Normal forms are the categories of relations defined to prevent discrepancies. Normal forms are used to ensure that the database is protected from various anomalies and inconsistencies. A table structure is always in a certain normal form. Several normal forms have been identified, and the most popular are:

First Normal Form

Second Normal Form

Third Normal Form

Fourth Normal Form

ENTITY RELATIONSHIP MODEL

The Entity Relationship (ER) model gives a design of the information stored by a business. It illustrates the data and the relationships between the data. Database designers use ER diagrams as a tool to build the logical database design of a system. An ER diagram represents the following three elements:

Entity - An entity is any object, item, place, person, concept, or activity about which a business needs to store information. An entity is an object that can be easily identified with a distinct set of properties. Examples of entities are employee, department, sales order, product, customer, and student. Entities are the building blocks of a database. An entity corresponds to a record. For example, a student record in a student-administration database is a representation of an actual person. In the diagramming technique, a rectangular box represents an entity and contains the name of the entity.

Attribute - An attribute is a property of a given entity. It describes a part of the entity and provides some information about it. An entity can have one or more attributes. For example, for the EMPLOYEE entity, the attributes are the employee name, designation, salary, address, etc. Similarly, for the CUSTOMER entity, the attributes can be the Customer-ID, Name, Address, Phone-No, etc. An attribute usually corresponds to a field in a record. Attributes are depicted as ellipses, labeled with the name of the property.

Primary Key - A primary key is a group of one or more attributes that uniquely identifies a row in a table. In the following relation, either the attribute DEPT_ID or NAME can be used as a primary key, as both attributes carry distinct values.

DEPT_ID   NAME         LOCATION
1         Sales        New York
2         Computer     Houston
3         Accounts     Boston
4         Production   New York

If no single attribute in the table has unique values, a combination of any number of attributes can be used to identify the rows. A key with multiple attributes is known as a composite key. In the example shown below, the combination of PRODUCT_ID and SUPPLIER_ID results in all unique values and can be used as a composite primary key.

PRODUCT_ID   SUPPLIER_ID   PRICE
P01          S01           1000
P01          S02           3000
P01          S03           2300
P02          S01           4500
P02          S04           1000
P03          S05           680

[Figure: entities STUDENT, COURSE and GRADE (label: Primary Key); entity CUSTOMER with attributes Name, Address, Phone-No and Customer-ID]

Note that a primary key cannot contain a NULL value. The primary key must uniquely identify each row in an entity; thus, if a primary key value were null, it wouldn't be able to identify anything.
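As a rough sketch of how a database enforces these rules, the PRODUCT_ID/SUPPLIER_ID example can be loaded into SQLite through Python's sqlite3 module. The table name product_supplier, the helper try_insert, and the three extra test rows are assumptions made for illustration. The key columns are declared NOT NULL explicitly, because SQLite does not reject NULLs in such a key by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key on (product_id, supplier_id).
# NOT NULL is stated explicitly: SQLite would otherwise accept NULL key parts.
conn.execute(
    "CREATE TABLE product_supplier ("
    " product_id TEXT NOT NULL,"
    " supplier_id TEXT NOT NULL,"
    " price INTEGER,"
    " PRIMARY KEY (product_id, supplier_id))"
)

# Rows taken from the table in the text.
rows = [
    ("P01", "S01", 1000), ("P01", "S02", 3000), ("P01", "S03", 2300),
    ("P02", "S01", 4500), ("P02", "S04", 1000), ("P03", "S05", 680),
]
conn.executemany("INSERT INTO product_supplier VALUES (?, ?, ?)", rows)


def try_insert(row):
    """Hypothetical helper: True if the row is accepted, False if rejected."""
    try:
        conn.execute("INSERT INTO product_supplier VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(("P01", "S01", 1200))  # key pair already present
null_ok = try_insert((None, "S09", 500))         # NULL in part of the key
new_ok = try_insert(("P03", "S01", 700))         # new combination of values
print(duplicate_ok, null_ok, new_ok)
```

Neither P03 nor S01 is new on its own; the last row is accepted because the combination of the two values has not been used before.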

Relationship - A relationship is an association between entities. It is used to establish a connection between a pair of logically related entities. For example, consider a relationship between the entities EMPLOYEE and DEPARTMENT. Each employee belongs to a department, and a department contains many employees. Thus, there is a one-to-many relationship between the department and the employee.

Here:

# indicates a primary key
* indicates a mandatory attribute
o indicates an optional attribute

A relationship is represented using a diamond labeled with the name of the relationship. For example, if students study various courses, the entities will be STUDENT and COURSE and the relationship between them is Studies.
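A one-to-many relationship such as DEPARTMENT to EMPLOYEE is commonly implemented with a foreign key on the "many" side. The following is a minimal sketch in Python's sqlite3 module, not the handout's own design: the deptno column added to employee and the sample rows are assumptions, while the other column names follow the EMPLOYEE and DEPARTMENT attributes used in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE department ("
    " deptno INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " location TEXT)"
)
# Assumed design: each employee row carries the deptno of its department.
conn.execute(
    "CREATE TABLE employee ("
    " empno INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL,"
    " job TEXT,"
    " deptno INTEGER NOT NULL REFERENCES department(deptno))"
)

conn.execute("INSERT INTO department VALUES (1, 'Sales', 'New York')")
# Hypothetical employees: one department, many employees.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [(101, "Employee A", None, None, 1), (102, "Employee B", None, None, 1)],
)

staff_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE deptno = 1"
).fetchone()[0]

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (103, 'Employee C', NULL, NULL, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(staff_count, orphan_rejected)
```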

NORMALISATION

Normalisation is a series of steps that enables us to identify potential problems, called update anomalies, in the design of a relational database. This process also supplies methods for correcting these problems.

Definition: Normalisation is a process of successive reduction of a given set of relations to a better form.

[Figure: EMPLOYEE entity (# empno, * name, o salary, o job) and DEPARTMENT entity (# deptno, * name, o location); STUDENT and COURSE entities linked by the relationship Studies]

The normalization process involves converting tables into various normal forms. A table in a particular normal form possesses a certain collection of properties. There are several normal forms, the most common being 1NF, 2NF, 3NF and 4NF. They form a progression in which a table that is in 1NF is better than a table that is not in 1NF, a table that is in 2NF is better than a table that is only in 1NF, and so on.

The goal of the normalization process is to allow you to take a collection of tables and produce a new collection of tables that represents the same information but is free of problems. In this context, the concepts of functional dependence and keys are important to understand.

Decomposition

Decomposition refers to the breaking down of one table into multiple tables. Any database design process involves decomposition. Decomposition is similar to the Projection operation of relational algebra.

In a projection, we keep only the needed columns of a table and discard the rest, i.e. we select all rows but only the specified columns. Here, discarding does not mean that the columns are lost forever; they are simply placed in the table where they logically belong.

Recall that redundancy means unwanted or uncontrolled duplication of data. Apart from duplication, redundancy also leads to data inconsistency and loss of data integrity.

Let's take a table of students with the following data.

Roll No.   St. Name   Subject code   Subject Name   Marks
101        Suresh     S1             English        80
101        Suresh     S2             OR             56
101        Suresh     S3             BDP            78
102        Harish     S1             English        39
102        Harish     S2             OR             75
102        Harish     S3             BDP            75

We can see some redundancy here: student names and subject names are repeated. So it is not a good idea to keep the two together in one table. This is the problem that decomposition addresses. To minimize the redundancy we follow a simple rule:

Keep one fact in one place.

So we decompose the above table into two tables as shown:

[Figure: the Examination table (RollNo, St-name, Sub-code, Sub-name, Marks), the Student table (RollNo, St-name), and the Subject table]

We can create the original table from the above 3 tables.

There are referential integrity relationships in the above tables, as follows:

Between Student & Result tables based on RollNo

Between Subject and Result tables based on Subject code

If we join the three tables to perform recomposition, we would get the following columns (after eliminating duplicate columns):

Rollno, St-name, Sub-code, Sub-name, Marks

Thus we have been able to preserve the data and the relationship between data elements even after decomposition.
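The decomposition into Student, Subject and Result tables, and the recomposition by joining them, can be sketched with Python's sqlite3 module as follows. The data rows are the ones from the student table in the text; the lower-case table and column names are assumptions made for this sketch.

```python
import sqlite3

# Rows of the original table from the text:
# (Roll No., St. Name, Subject code, Subject Name, Marks)
examination = [
    (101, "Suresh", "S1", "English", 80),
    (101, "Suresh", "S2", "OR", 56),
    (101, "Suresh", "S3", "BDP", 78),
    (102, "Harish", "S1", "English", 39),
    (102, "Harish", "S2", "OR", 75),
    (102, "Harish", "S3", "BDP", 75),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, st_name TEXT)")
conn.execute("CREATE TABLE subject (sub_code TEXT PRIMARY KEY, sub_name TEXT)")
conn.execute(
    "CREATE TABLE result ("
    " rollno INTEGER REFERENCES student(rollno),"
    " sub_code TEXT REFERENCES subject(sub_code),"
    " marks INTEGER,"
    " PRIMARY KEY (rollno, sub_code))"
)

# Decomposition: store each fact in the table where it belongs.
# INSERT OR IGNORE skips the repeated (redundant) names.
for rollno, st_name, sub_code, sub_name, marks in examination:
    conn.execute("INSERT OR IGNORE INTO student VALUES (?, ?)", (rollno, st_name))
    conn.execute("INSERT OR IGNORE INTO subject VALUES (?, ?)", (sub_code, sub_name))
    conn.execute("INSERT INTO result VALUES (?, ?, ?)", (rollno, sub_code, marks))

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
subject_count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]

# Recomposition: join the three tables on Rollno and Sub-code.
recomposed = conn.execute(
    "SELECT s.rollno, s.st_name, j.sub_code, j.sub_name, r.marks"
    " FROM result r"
    " JOIN student s ON s.rollno = r.rollno"
    " JOIN subject j ON j.sub_code = r.sub_code"
    " ORDER BY s.rollno, j.sub_code"
).fetchall()

print(student_count, subject_count, recomposed == examination)
```

Each student name and each subject name is now stored once, yet the join returns exactly the six original rows.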

[Figure: the Student, Subject and Result tables, showing the columns Rollno, Sub-code and Sub-name]

Further, if we examine the functional dependencies (FDs):

In the Examination table there were three functional dependencies, namely:

Rollno -> St-name
Sub-code -> Sub-name
Rollno, Sub-code -> Marks

When we decompose this table into the two-table structure, we lose one of them.

So, to find whether a decomposition is lossy or lossless, we can examine the FDs: if we lose any of the original FDs, the decomposition is lossy. When we create the three-table structure, we regain the lost FD.

So, in a lossy decomposition we lose some of the FDs, while in a lossless decomposition all the FDs are preserved.

FDs after lossy decomposition:

Rollno -> St-name
Sub-code -> Sub-name

FDs after lossless decomposition:

Rollno -> St-name
Sub-code -> Sub-name
Rollno, Sub-code -> Marks
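Whether a functional dependency is consistent with a given set of rows can be checked mechanically. The sketch below uses a hypothetical helper, holds, written for this illustration; it is run against the student data from the text. Note that sample data can only refute a dependency: a dependency that holds in these six rows is still a statement about the design, not something the rows prove.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    The dependency holds when equal lhs values never map to
    different rhs values.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("rollno", "st_name", "sub_code", "sub_name", "marks")
data = [
    (101, "Suresh", "S1", "English", 80),
    (101, "Suresh", "S2", "OR", 56),
    (101, "Suresh", "S3", "BDP", 78),
    (102, "Harish", "S1", "English", 39),
    (102, "Harish", "S2", "OR", 75),
    (102, "Harish", "S3", "BDP", 75),
]
rows = [dict(zip(columns, r)) for r in data]

fd_student = holds(rows, ("rollno",), ("st_name",))         # Rollno -> St-name
fd_subject = holds(rows, ("sub_code",), ("sub_name",))      # Sub-code -> Sub-name
fd_marks = holds(rows, ("rollno", "sub_code"), ("marks",))  # Rollno, Sub-code -> Marks
fd_partial = holds(rows, ("rollno",), ("marks",))           # Rollno alone -> Marks
print(fd_student, fd_subject, fd_marks, fd_partial)
```

The last check fails because roll number 101 appears with three different marks, which is why Marks needs the full pair (Rollno, Sub-code) as its determinant.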

Note: The difference between decomposition and normalization is that decomposition does not abide by any formal rules, while normalization follows formal rules. When we apply a normalization rule, the database design takes the next logical form, called the normal form.

CASE STUDY

Assume an Order table with the following structure:

Order No
Order Date
Customer No
Item No
Item Name
Quantity
Rate
Bill Amount

This table contains information about the orders received from various customers. The table identifies an order by the Order No column, a customer by the Customer No column, and an item by the Item No column. So for a given order, several entries will repeat.

To convert the above into First Normal Form (1NF)

A table is in 1NF if it doesn't contain any repeating columns or repeating groups of columns.

So the Order table does not conform to the principle of 1NF. To bring it into 1NF we use the decomposition technique, and we need to ensure that the decomposition is lossless.

The simple principle is: move all the repeating columns to another table.

So we now have two tables:

a) The modified order table

b) A new table called order-item that contains the repeating columns.

ORDER          ORDER-ITEM
Order No       Item No
Order Date     Item Name
Customer No    Quantity
               Rate
               Bill Amount

If we test the above tables, we find that this is a lossy decomposition: if we join the two tables, we do not get the original table back. So we have to add another column to make it lossless, i.e. in the Order-item table we add the Order No column to link each item sold to its order.

ORDER          ORDER-ITEM
Order No       Order No   } Primary Key
Order Date     Item No    }
Customer No    Item Name
               Quantity
               Rate
               Bill Amount

So we insert a referential integrity relationship based on Order No between the two tables. Order No in the Order table will be the primary key, and Order No in the Order-item table will be a foreign key. Now this is a lossless decomposition, as we can preserve the original data relationship. The tables are now in 1NF.

Second Normal Form

A table is in Second Normal Form (2NF) if it is in the first normal form and if all non-key columns in the table depend on the entire primary key. So the prerequisites for a database to be in the second normal form are:

i) All the tables should be in 1NF.

ii) No non-key attribute is dependent on only a portion of the primary key.

Note: If the primary key of a table contains only a single column, the table is automatically in 2NF.

Let's examine this in our case study problem.

The first condition is already satisfied.

The two tables were:

ORDER          ORDER-ITEM
Order No       Order No   } Primary Key
Order Date     Item No    }
Customer No    Item Name
               Quantity
               Rate
               Bill Amount

The first table contains three columns. The primary key is Order No. From Order No we can derive the other, non-key columns, i.e. Order Date and Customer No. Also, the non-key attributes do not depend on each other. So this table is in 2NF.

The second table has Order No and Item No as a composite key.

Can we determine the values of the other non-key columns of the table by using the primary key? Not quite.

We can determine Item Name from Item No alone (i.e. from part of the composite key); we do not need Order No for this purpose. Likewise, Rate can be found from Item No alone. Thus some non-key columns depend not on the entire primary key but only on part of it. So this table fails to satisfy the second condition of 2NF.

So we follow a simple strategy:

Move columns that do not depend on the entire primary key to another table.

So we need to decompose the Order-item table to bring it into 2NF. But why do we need to do so? The answer is: because of update anomalies.

Update anomalies are the problems that we are likely to face if we maintain the existing table structure. These problems are classified according to the three SQL operations INSERT, DELETE and UPDATE.

INSERT PROBLEMS - Assume we have a new item for which no order has been placed yet. Can we add information about this item to the Order-item table? No: until an order is placed for that item, we cannot insert a row for it in the Order-item table.

UPDATE PROBLEMS - Suppose we have a product with Item No 100 and Item Name HD, and we now need to reassign this code to FDD. Then we need to change this description in many rows of the table, because this item could be part of many orders. So we must locate and amend every such row.

DELETE PROBLEMS - Suppose we have only one order for an item, say Item No 108. If we cancel this order, then all the data about the item is lost with it, as the row is deleted from the Order-item table. We lose all the item data because it is not stored anywhere else.

Therefore the tables would look like this:

ORDER          ORDER-ITEM                   ITEM
Order No       Order No   } Primary Key     Item No
Order Date     Item No    }                 Item Name
Customer No    Quantity                     Rate
               Amount

So there is a referential integrity relationship between the Order-item and Item tables based on Item No.

Now let's verify the update anomalies.

Insert - We can now insert a new item in the Item table without needing even a single order for this item.

Update - We can update the item name and rate (unit price) once, in one place.

Delete - Even if we delete an order, the information about the item is not lost.
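The three checks above can be tried out on the 2NF design with a minimal sqlite3 sketch. The table is named orders because ORDER is a reserved word in SQL; the dates, rates, quantities and the name given to item 108 are hypothetical sample values, while item 100 (HD, later FDD) follows the example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the referential integrity links
conn.executescript("""
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    order_date TEXT,
    customer_no INTEGER
);
CREATE TABLE item (
    item_no INTEGER PRIMARY KEY,
    item_name TEXT,
    rate REAL
);
CREATE TABLE order_item (
    order_no INTEGER NOT NULL REFERENCES orders(order_no),
    item_no INTEGER NOT NULL REFERENCES item(item_no),
    quantity INTEGER,
    bill_amount REAL,
    PRIMARY KEY (order_no, item_no)
);
""")

# Insert: items can be recorded before any order exists for them.
conn.execute("INSERT INTO item VALUES (100, 'HD', 50.0)")
conn.execute("INSERT INTO item VALUES (108, 'Item 108', 20.0)")
items_without_orders = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]

# Hypothetical orders: item 100 appears in two orders, item 108 in one.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "2019-01-01", 7), (2, "2019-01-02", 8)])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?)",
                 [(1, 100, 2, 100.0), (2, 100, 1, 50.0), (2, 108, 3, 60.0)])

# Update: renaming item 100 touches exactly one row, however many orders use it.
rows_updated = conn.execute(
    "UPDATE item SET item_name = 'FDD' WHERE item_no = 100"
).rowcount

# Delete: removing the only order line for item 108 keeps the item itself.
conn.execute("DELETE FROM order_item WHERE order_no = 2 AND item_no = 108")
item_108 = conn.execute(
    "SELECT item_name FROM item WHERE item_no = 108"
).fetchone()

print(items_without_orders, rows_updated, item_108)
```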

THIRD NORMAL FORM (3NF)

A table is in 3NF if it is in 2NF and if all non-key columns in the table depend non-transitively on the entire primary key. Simply speaking, a table should be in 2NF, and every non-key column in the table must be independent of all other non-key columns.

What is a transitive dependency?


It is an indirect relationship between two columns. Suppose that:

a) there is a functional dependency between two columns A and B such that B is functionally dependent on A, and

b) there is a functional dependency between two columns B and C such that C is functionally dependent on B.

Then we say that C transitively depends on A, and we represent this as

A -> B -> C
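One way to make transitive dependencies visible is to compute everything a column determines, directly or indirectly, by repeatedly applying the known dependencies. The sketch below is a generic illustration using the abstract columns A, B and C; the function name closure and the representation of dependencies as pairs of sets are choices made for this example.

```python
def closure(attrs, fds):
    """Return every column determined by attrs under the given dependencies.

    fds is a list of (lhs, rhs) pairs of frozensets of column names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine lhs, we also determine rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A -> B and B -> C, as in the definition above.
fds = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
]

closure_a = closure({"A"}, fds)  # A reaches C only through B
closure_b = closure({"B"}, fds)
print(sorted(closure_a), sorted(closure_b))
```

C appears in the result for A even though no dependency lists A -> C directly, which is exactly the indirect relationship that 3NF asks us to remove.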

So for 3NF we should identify all such transitive dependencies and remove them. In the Order table there is no transitive dependency, because Order Date and Customer No are functionally dependent on Order No.

In the Order-item table, Quantity is functionally dependent on the primary key (Order No, Item No). This is not true for Bill Amount, as it is calculated as Rate x Quantity.

Now, Rate is available in the Item table and Quantity is available in the Order-item table. So the Bill Amount column does not depend directly on the primary key of the Order-item table. We need to remove this transitive dependency, i.e. remove the Bill Amount column.

In the Item table there is no transitive dependency, as Rate and Item Name are functionally dependent on Item No.

So we need to get rid of the Bill Amount column to bring the table into 3NF.

ORDER          ORDER-ITEM                   ITEM
Order No       Order No   } Primary Key     Item No
Order Date     Item No    }                 Item Name
Customer No    Quantity                     Rate

Let's revisit the anomalies:

Insert - Since there is no Bill Amount column, we are not required to recalculate anything when we insert an item.

Update - Even if we make changes in any order row, we need not recalculate the total bill amount.

Delete - If we delete one order row for an order having multiple rows, we need not recalculate the total bill amount. So the delete anomaly is taken care of.
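With Bill Amount removed, the value can be derived whenever it is needed by joining the Order-item and Item tables and multiplying Rate by Quantity. The following is a rough sketch in Python's sqlite3 module; the helper bill_amounts and all sample rates and quantities are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    item_no INTEGER PRIMARY KEY,
    item_name TEXT,
    rate REAL
);
CREATE TABLE order_item (
    order_no INTEGER,
    item_no INTEGER REFERENCES item(item_no),
    quantity INTEGER,
    PRIMARY KEY (order_no, item_no)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(100, "HD", 50.0), (108, "FDD", 20.0)])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                 [(1, 100, 2), (1, 108, 3)])


def bill_amounts(order_no):
    """Hypothetical helper: derive Rate x Quantity per order line on demand."""
    return conn.execute(
        "SELECT oi.item_no, i.rate * oi.quantity"
        " FROM order_item oi"
        " JOIN item i ON i.item_no = oi.item_no"
        " WHERE oi.order_no = ?"
        " ORDER BY oi.item_no",
        (order_no,),
    ).fetchall()


before = bill_amounts(1)

# Changing a quantity needs no second update: the amount is never stored.
conn.execute("UPDATE order_item SET quantity = 5 WHERE order_no = 1 AND item_no = 108")
after = bill_amounts(1)

print(before, after)
```

Because the amount is computed from the current Rate and Quantity, it cannot drift out of step with them, which is the practical benefit of removing the derived column.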

Date post:	06-Apr-2018
Category:	Documents
Upload:	anoop-singh