Data Base Management System (DBMS)

Navneet Jingar

Contents

Data Hierarchy

Traditional File Processing

Database Approach to Data Management

DBMS Features and Capabilities

Database Schemas

Components of DBMS

Data Models

RDBMS

Normalization: What is it and why is it required?

Background to Normalization: Definitions

The Process of Normalization

Data Hierarchy

Data hierarchy refers to the systematic organization of data, often in a hierarchical form. A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases. A bit represents the smallest unit of data a computer can handle. A group of bits, called a byte, represents a single character, which can be a letter, a number, or another symbol.

Data organization involves fields, records, files and so on. A field holds a single fact. Consider a date such as "September 19, 2004": it can be treated as a single date field (e.g. birthdate) or as three fields, namely month, day of month and year.

A record is a collection of related fields. An Employee record may contain name field(s), address fields, a birthdate field and so on.

A file is a collection of related records. If there are 100 employees, then each employee would have a record (e.g. called the Employee Personal Details record) and the collection of 100 such records would constitute a file (in this case, called the Employee Personal Details file).

Files are integrated into a Database. This is done using a Database Management System. If there are other facets of employee data that we wish to capture, then other files such as Employee Training History file and Employee Work History file could be created as well.
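The hierarchy described above can be sketched in a few lines of Python. This is only an illustration: the class name `EmployeeRecord`, the sample employees and the dictionary standing in for a database are assumptions, not part of the original material.

```python
from dataclasses import dataclass
from datetime import date

# A record groups related fields; each attribute below is one field.
@dataclass
class EmployeeRecord:
    name: str
    address: str
    birthdate: date  # one date field; it could also be split into month/day/year

# A file is a collection of related records.
personal_details_file = [
    EmployeeRecord("A. Kumar", "12 Park Road", date(2004, 9, 19)),
    EmployeeRecord("B. Singh", "7 Lake View", date(1999, 1, 5)),
]

# A database integrates several related files (hypothetical file names).
database = {
    "employee_personal_details": personal_details_file,
    "employee_training_history": [],
    "employee_work_history": [],
}

# At the bottom of the hierarchy, one character is stored as a byte of 8 bits.
byte = "A".encode("ascii")
bits = format(byte[0], "08b")
```

Here a plain dictionary plays the role that a DBMS plays in practice; it only shows how the levels nest.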

Traditional File Processing

The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.

Each application requires its own files and its own computer program to operate. For example, the human resources functional area might have a personnel master file, a payroll file, a medical insurance file, a pension file, a mailing list file, and so forth, until tens, perhaps hundreds, of files and programs exist. In the company as a whole, this process leads to multiple master files created, maintained, and operated by separate divisions or departments. After 5 or 10 years of this, the organization is saddled with hundreds of programs and applications that are very difficult to maintain and manage. The resulting problems are data redundancy and inconsistency, program-data dependence, inflexibility, poor data security, and an inability to share data among applications.
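A minimal sketch of how redundancy turns into inconsistency, assuming two hypothetical applications (personnel and payroll) that each keep their own copy of an employee's address. The file contents and the helper `inconsistent_fields` are illustrative only.

```python
# Two applications each keep their own copy of the same employee data.
personnel_file = {101: {"name": "A. Kumar", "address": "12 Park Road"}}
payroll_file = {101: {"name": "A. Kumar", "address": "12 Park Road", "salary": 50000}}

# The employee moves, but only the personnel application is updated.
personnel_file[101]["address"] = "7 Lake View"

def inconsistent_fields(file_a, file_b, key):
    """Return the shared fields whose values differ between the two files."""
    a, b = file_a[key], file_b[key]
    return sorted(f for f in a.keys() & b.keys() if a[f] != b[f])

conflicts = inconsistent_fields(personnel_file, payroll_file, 101)
```

After the update, `conflicts` lists the `address` field: the two files now disagree about the same fact.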

Database Approach to Data Management

Database

A database is a logically coherent collection of data with some inherent meaning, representing some aspect of the real world, which is designed, built and populated with data for a specific purpose.

DBMS

A Data Base Management System (DBMS) is a set of software programs that enables users to define, create and maintain a database. The DBMS also enforces necessary access restrictions and security measures in order to protect the database.

Database technology cuts through many of the problems that traditional file organization creates. A database serves many applications efficiently by centralizing the data and controlling redundant data. Rather than storing data in separate files for each application, data are stored so as to appear to users as being stored in only one location.

For example, instead of a corporation storing employee data in separate information systems and separate files for personnel, payroll, and benefits, the corporation creates a single common human resources database.

DBMS Features and Capabilities

Query ability: Querying is the process of requesting attribute information from various perspectives and combinations of factors.

Backup and Replication: Copies of attributes are regularly created to cater for situations in which primary disks or other equipment fail. Data is consistently replicated among various database servers.

Rule Enforcement: Rules are applied to attributes so that the attributes are clean and reliable, with the ability to add and update rules without significant redesign of the data layout.

Security: Application of limits for who can see or change which attributes or groups of attributes.

Controlling of Redundancy

Computation: There are common computations requested on attributes, such as counting, summing, averaging, sorting, grouping, cross-referencing, etc.

Change and access logging: Often one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.

Automated Optimization: If there are frequently occurring usage patterns or requests, some DBMS can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.

Provides multiple user interfaces
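Three of the capabilities listed above (rule enforcement, computation, and change logging) can be tried out with Python's built-in `sqlite3` module. The sketch below is one possible illustration: the table names, the salary rule and the trigger are assumptions, and other DBMS products offer their own mechanisms for the same features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    dept   TEXT NOT NULL,
    salary REAL CHECK (salary >= 0)      -- rule enforcement
);
CREATE TABLE change_log (
    emp_id INTEGER, old_salary REAL, new_salary REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- change logging: record every salary update
CREATE TRIGGER log_salary_change AFTER UPDATE OF salary ON employee
BEGIN
    INSERT INTO change_log (emp_id, old_salary, new_salary)
    VALUES (OLD.emp_id, OLD.salary, NEW.salary);
END;
INSERT INTO employee VALUES (1, 'HR', 40000), (2, 'HR', 60000), (3, 'IT', 70000);
""")

# Rule enforcement: a negative salary violates the CHECK constraint.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'IT', -10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Computation: counting, averaging and grouping in one query.
summary = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()

# Change logging: the trigger writes one log row for this update.
conn.execute("UPDATE employee SET salary = 45000 WHERE emp_id = 1")
log = conn.execute("SELECT emp_id, old_salary, new_salary FROM change_log").fetchall()
```

This trigger records what changed and when, but not who made the change; SQLite has no user accounts, so that part of logging would need a multi-user DBMS.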

Database Schema

A database schema is the structure of a database described in a formal language supported by the DBMS. In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables.

The three levels of abstraction are:

1. Physical level: The lowest level of abstraction describes how data is stored (files, indices, etc.) on the random access disk system. It also typically describes the record layout of files and the type of files (hash, B-tree, flat).

2. Logical level: Hides the details of the physical level. In the relational model, this schema presents data as a set of tables. The DBMS maps data access between the logical and physical schemas automatically. The physical schema can be changed without changing the application; only the DBMS mapping from the conceptual (logical) schema to the physical schema must change. This is referred to as physical data independence.

3. View level (External Schema): It is tailored to the needs of a particular category of users. Some portions of the stored data should not be seen by some users, and the external schema hides them, which also simplifies the view for these users. E.g. students should not see faculty salaries.

Applications are written in terms of an external schema. The external view is computed when accessed; it is not stored. Translation from the external level to the logical level is done automatically by the DBMS at run time. The conceptual schema can be changed without changing the application; only the mapping from the external to the conceptual schema must be changed. This is referred to as conceptual data independence.
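An external schema can be approximated with an SQL view. The sketch below uses Python's `sqlite3` module; the `faculty` table, the `faculty_public` view and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faculty (fac_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO faculty VALUES (1, 'Dr. Rao', 'Physics', 90000);

-- External schema for students: the salary column is left out.
CREATE VIEW faculty_public AS SELECT fac_id, name, dept FROM faculty;
""")

# The view is computed from the base table each time it is queried.
cur = conn.execute("SELECT * FROM faculty_public")
visible_columns = [d[0] for d in cur.description]
rows = cur.fetchall()
```

SQLite itself has no per-user permissions, so this only shows the shape of the view. In a multi-user DBMS, students would typically be granted access to the view and not to the underlying table.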

Components of DBMS

A database management system has three components:

1. A data definition language (DDL) is the formal language programmers use to specify the structure of the content of the database. The DDL defines each data element as it appears in the database before that data element is translated into the forms required by application programs. With the DDL, a data schema can be defined and also changed later.

Typical DDL operations (with their respective keywords in SQL):

Creation of tables and definition of attributes (CREATE TABLE ...)

Change of tables by adding or deleting attributes (ALTER TABLE …)

Deletion of whole table including content (!) (DROP TABLE …)
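The three DDL operations listed above can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the `employee` table and its columns are made up, and the helpers `table_names` and `column_names` are illustrative wrappers around SQLite's own catalog, not part of any standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(c):
    """List user tables recorded in SQLite's catalog."""
    return [r[0] for r in c.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def column_names(c, table):
    """List the column names of one table."""
    return [r[1] for r in c.execute(f"PRAGMA table_info({table})")]

# CREATE TABLE: define a table and its attributes.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")

# ALTER TABLE: add an attribute to the existing table.
conn.execute("ALTER TABLE employee ADD COLUMN birthdate TEXT")
cols_after_alter = column_names(conn, "employee")

# DROP TABLE: remove the table together with its content.
conn.execute("DROP TABLE employee")
tables_after_drop = table_names(conn)
```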

2. A data manipulation language (DML) is a language for describing operations on data such as store, search, read and change, the so-called data manipulation. Typical DML operations (with their respective keywords in the structured query language SQL):

Add data (INSERT)

Change data (UPDATE)

Delete data (DELETE)

Query data (SELECT)

Often DDL and DML for the definition and manipulation of databases are combined in one comprehensive language. A good example is the structured query language SQL.
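The four DML operations can be sketched with Python's `sqlite3` module as follows. The `employee` table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# INSERT: add data
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "A. Kumar", "HR"), (2, "B. Singh", "IT"), (3, "C. Mehta", "IT")],
)

# UPDATE: change data
conn.execute("UPDATE employee SET dept = 'Finance' WHERE emp_id = 1")

# DELETE: delete data
conn.execute("DELETE FROM employee WHERE emp_id = 3")

# SELECT: query data
rows = conn.execute(
    "SELECT emp_id, name, dept FROM employee ORDER BY emp_id"
).fetchall()
```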

3. Data Dictionary: This is an automated or manual file that stores definitions of data elements and data characteristics, such as usage, physical representation, ownership (who in the organization is responsible for maintaining the data), authorization, and security.

Many data dictionaries can produce lists and reports of data use, groupings, program locations, and so on.
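Many relational systems keep part of their data dictionary in a queryable catalog. As a rough illustration, the sketch below reads SQLite's catalog through `PRAGMA table_info`; the `employee` table and the `describe` helper are assumptions, and a full data dictionary would also record ownership, usage and authorization, which SQLite does not track.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " birthdate TEXT)"
)

def describe(c, table):
    """Return one entry per column: name, declared type, nullability, key flag."""
    return [
        {"column": name, "type": ctype, "not_null": bool(notnull), "primary_key": bool(pk)}
        for _cid, name, ctype, notnull, _default, pk
        in c.execute(f"PRAGMA table_info({table})")
    ]

dictionary = describe(conn, "employee")
```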

Data Models

A data model is a theory or specification describing how a database is structured and used.

A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.
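To make the two relational operations concrete, here is a small sketch in plain Python in which a relation is a list of dictionaries. The function names `select` and `join` and the sample data are illustrative assumptions; a real DBMS implements these operations far more efficiently.

```python
def select(rows, predicate):
    """Relational select: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def join(left, right, key):
    """Join two relations on one shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

employees = [
    {"emp_id": 1, "name": "A. Kumar", "dept_id": 10},
    {"emp_id": 2, "name": "B. Singh", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "dept_name": "HR"},
    {"dept_id": 20, "dept_name": "IT"},
]

# Join employees to departments, then select the IT staff.
it_staff = select(join(employees, departments, "dept_id"),
                  lambda r: r["dept_name"] == "IT")
```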

Common Data Models:

Hierarchical Model

Network Model

Relational Model

Object Model (Object Oriented Database Management System)

The relational model is the most widely used model today.

Hierarchical Model

In a hierarchical model, the data is organized into a tree-like structure.

The structure allows repeating information using parent/child relationships: each parent can have many children, but each child has only one parent. This structure is simple but inflexible, because the relationship is confined to a one-to-many relationship.

These models were popular in the late 1960s and in the 1970s. The most widely used hierarchical database is IMS, developed by IBM.
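The one-parent, many-children rule can be sketched as a tree in Python. The `Node` class and the sample organization are assumptions for illustration and say nothing about how IMS stores data internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    name: str
    # Each node has at most one parent (excluded from repr to avoid recursion).
    parent: Optional["Node"] = field(default=None, repr=False)
    # Each node can have many children.
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up the single chain of parents to the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

company = Node("Company")
hr = company.add_child("HR")
payroll = hr.add_child("Payroll")
payroll_path = payroll.path()
```

Because every record is reached through exactly one parent, there is only one access path to it, which is the source of both the simplicity and the inflexibility noted above.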

Network Model

The network model is a variation on the hierarchical model – allowing each record to have multiple parent and child records.

Network models generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.

Some well known DBMS using the Network Model:

Honeywell IDS (Integrated Data Store)

IDMS (Integrated Database Management System)

Relational Model

The data is stored in two-dimensional tables (rows and columns). The data is manipulated based on the relational theory of mathematics.

Properties of Relational Tables:

Values Are Atomic

Each Row is Unique

Column Values Are of the Same Kind

The Sequence of Columns is Insignificant

The Sequence of Rows is Insignificant

Each Column Has a Unique Name

A relational database management system (RDBMS) is a DBMS that is based on the relational model.

Some well known RDBMS:

IBM DB2, Informix, Microsoft SQL Server, Microsoft Visual FoxPro, MySQL, Oracle, Sybase, Teradata, Microsoft Access

Object Model

Object model (ODBMS, object-oriented database management system): The data is stored in the form of objects. Objects are instances of structures called classes, and the classes define the fields that hold the data.

Unlike the other database structures, the object-oriented structure can handle graphics, pictures, voice and text without difficulty. This structure is popular for multimedia Web-based applications. It was designed to work with object-oriented programming languages such as Java.

RDBMS

An RDBMS stores information in a set of "tables", each of which has a unique identifier or "primary key" (PK). The tables are then related to one another using "foreign keys" (FK). A foreign key is a column (or set of columns) in one table whose values match the primary key of a different table.

In the example of a customer table and an order table, "Customer ID" is the PK in the customer table and the FK in the order table. There is a one-to-many relationship between the two tables: one customer can have one or more orders. A given order, however, can be initiated by one and only one customer.
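The customer/order relationship can be sketched with Python's `sqlite3` module. The table and column names (`customer`, `orders`, `customer_id`) and the sample rows are assumptions; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,          -- PK
    name        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,          -- PK
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id)   -- FK
);
INSERT INTO customer VALUES (1, 'Acme Ltd');
INSERT INTO orders VALUES (100, 1), (101, 1);
""")

# One customer, many orders.
orders_for_customer = conn.execute(
    "SELECT order_id FROM orders WHERE customer_id = 1 ORDER BY order_id"
).fetchall()

# An order must belong to an existing customer.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```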

Normalization

Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics that could lead to a loss of data integrity.

The objectives of normalization:

Free the database of modification anomalies

Minimize redesign when extending the database structure

Make the data model more informative to users

Avoid bias towards any particular pattern of querying

In general, relational databases should be normalized to the "third normal form".

Background to Normalization: Definitions

Functional Dependency: If A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each A value is associated with precisely one B value.

In other words, in every possible legal value of R (relation), whenever two tuples agree on their A values, they also agree on their B values.

Determinant of a functional dependency refers to attribute or group of attributes on left-hand side of the arrow.

e.g. in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold.
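The definition can be turned into a small check over sample rows. The function `holds` and the sample employees are assumptions for illustration. Keep in mind that a sample can only refute a dependency: a functional dependency is a statement about every legal value of the relation, not about one snapshot.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"emp_id": 1, "dob": "1990-01-01", "dept": "HR"},
    {"emp_id": 2, "dob": "1990-01-01", "dept": "IT"},
    {"emp_id": 3, "dob": "1985-06-15", "dept": "IT"},
]

id_determines_dob = holds(employees, ["emp_id"], ["dob"])
dob_determines_id = holds(employees, ["dob"], ["emp_id"])
```

In this sample, `emp_id` determines `dob`, but `dob` does not determine `emp_id`, because two employees share a date of birth.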

Full Functional Dependency: If A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.

A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
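A sketch of the "no proper subset" test, assuming a hypothetical enrolment relation with the composite determinant (student_id, course_id). The helper names and data are illustrative, and the check is again limited to the sample rows given.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full(rows, lhs, rhs):
    """lhs -> rhs is full if it holds and no non-empty proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, list(subset), rhs):
                return False
    return True

enrolment = [
    {"student_id": 1, "course_id": "DB", "student_name": "Asha", "grade": "A"},
    {"student_id": 1, "course_id": "OS", "student_name": "Asha", "grade": "B"},
    {"student_id": 2, "course_id": "DB", "student_name": "Ravi", "grade": "B"},
]

key = ["student_id", "course_id"]
grade_is_full = is_full(enrolment, key, ["grade"])
name_is_full = is_full(enrolment, key, ["student_name"])
```

Here `grade` needs the whole key, while `student_name` depends on `student_id` alone, so its dependency on the composite key is only partial.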

Transitive Dependency: A transitive dependency is an indirect functional dependency. Let A, B, and C designate three distinct attributes in the relation. Suppose all three of the following conditions hold:

A → B

It is not the case that B → A

B → C

Then the functional dependency A → C is a transitive dependency.

For example, in a relation with the attributes Book, Author and Author Nationality, the functional dependency {Book} → {Author Nationality} applies; that is, if we know the book, we know the author's nationality. Furthermore:

{Book} → {Author}

{Author} → {Author Nationality}

{Author} does not → {Book}

Therefore {Book} → {Author Nationality} is a transitive dependency.
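The three conditions can be checked mechanically on sample rows. In this sketch the book titles, author names and the helper `is_transitive` are placeholders chosen for illustration.

```python
def holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_transitive(rows, a, b, c):
    """a -> c is transitive via b if a -> b, not (b -> a), and b -> c."""
    return (holds(rows, [a], [b])
            and not holds(rows, [b], [a])
            and holds(rows, [b], [c]))

books = [
    {"book": "Book 1", "author": "Author X", "author_nationality": "Indian"},
    {"book": "Book 2", "author": "Author X", "author_nationality": "Indian"},
    {"book": "Book 3", "author": "Author Y", "author_nationality": "British"},
]

via_author = is_transitive(books, "book", "author", "author_nationality")
```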

An Index or Key is an attribute or collection of attributes that may be used to identify or retrieve one or more records.

SuperKey: A superkey is a set of columns within a table whose values can be used to uniquely identify a row.

For example, imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. <SSN> is a candidate key. <SSN, Name> is not, because it contains information (Name) that is not necessary to uniquely identify records. <Phone Extension, Name> is a candidate key only if neither attribute uniquely identifies a record by itself.

A candidate key is a minimal superkey: it uniquely identifies a record, and no attribute can be removed from it without losing that property. I.e., it may be used to retrieve one specific record.

The primary key of a relation is a candidate key that has been designated as the main key.

A foreign key is an attribute (or collection of attributes) in a relation that can be used as a key to another relation. Foreign keys link tables together to form an integrated database.
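The difference between a superkey and a candidate key can be sketched as a uniqueness test plus a minimality test. The sample rows and function names below are assumptions, and the result reflects only this sample; real keys are declared from the meaning of the data, not discovered from a few rows.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs is a superkey if no two rows share the same values for attrs."""
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(projected)) == len(projected)

def is_candidate_key(rows, attrs):
    """A candidate key is a superkey with no smaller superkey inside it."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, list(sub))
        for size in range(1, len(attrs))
        for sub in combinations(attrs, size)
    )

people = [
    {"name": "Asha", "age": 30, "ssn": "111", "ext": "501"},
    {"name": "Ravi", "age": 30, "ssn": "222", "ext": "501"},
    {"name": "Asha", "age": 41, "ssn": "333", "ext": "502"},
]

ssn_is_candidate = is_candidate_key(people, ["ssn"])
ssn_name_is_superkey = is_superkey(people, ["ssn", "name"])
ssn_name_is_candidate = is_candidate_key(people, ["ssn", "name"])
```

The pair (ssn, name) identifies every row but is not minimal, since ssn alone already does.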

The Process of Normalization

There are two main steps of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes.

It is often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

First Normal Form (1NF)

No Repeating Elements or Groups of Elements

A relation in which the intersection of each row and column contains one and only one value.

All key attributes are defined

No repeating groups in the table

All attributes are dependent on the primary key

UNF to 1NF:

Eliminate duplicative columns from the same table (in other words, remove subsets of data that apply to multiple rows of a table and place them in separate tables).

Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Create relationships between these new tables and their predecessors through the use of foreign keys.
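As a small worked example of the UNF to 1NF steps, assume an unnormalized employee relation in which the phone numbers form a repeating group. The data and table names are hypothetical.

```python
# Unnormalized: the "phones" cell holds a repeating group, not a single value.
unnormalized = [
    {"emp_id": 1, "name": "Asha", "phones": ["555-0101", "555-0102"]},
    {"emp_id": 2, "name": "Ravi", "phones": ["555-0199"]},
]

# 1NF: every row/column intersection holds exactly one value.
# employee: primary key emp_id
employee = [{"emp_id": r["emp_id"], "name": r["name"]} for r in unnormalized]

# employee_phone: primary key (emp_id, phone); emp_id is a foreign key to employee
employee_phone = [
    {"emp_id": r["emp_id"], "phone": p}
    for r in unnormalized
    for p in r["phones"]
]
```

The repeating group moves into its own table, and the foreign key `emp_id` keeps each phone number linked to its employee.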

Second Normal Form (2NF)

No Partial Dependencies on a Concatenated Key

A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key (no partial dependency).

1NF to 2NF:

Identify the primary key for the 1NF relation.

Identify the functional dependencies in the relation.

If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant (in other words, remove columns that are not fully dependent upon the primary key).

Create relationships between these new tables and their predecessors through the use of foreign keys.
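A sketch of the 1NF to 2NF step, assuming a hypothetical order-item relation whose primary key is (order_id, product_id) and in which product_name depends on product_id alone. The `project` helper is illustrative.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicates, keeping first-seen order."""
    seen = dict.fromkeys(tuple(r[a] for a in attrs) for r in rows)
    return [dict(zip(attrs, values)) for values in seen]

# 1NF relation with primary key (order_id, product_id).
# product_name depends only on product_id: a partial dependency.
order_items_1nf = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen", "quantity": 10},
    {"order_id": 1, "product_id": "P2", "product_name": "Ink", "quantity": 2},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen", "quantity": 5},
]

# The partially dependent attribute moves out with a copy of its determinant.
product = project(order_items_1nf, ["product_id", "product_name"])

# The remaining relation keeps the full key; product_id becomes a foreign key.
order_item = project(order_items_1nf, ["order_id", "product_id", "quantity"])
```

After the split, the name "Pen" is stored once instead of once per order line.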

Third Normal Form (3NF)

No Dependencies on Non-Key Attributes

A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.

2NF to 3NF:

Identify the primary key in the 2NF relation.

Identify the functional dependencies in the relation.

If transitive dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
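A sketch of the 2NF to 3NF step, assuming a relation with primary key `book` in which the author's nationality depends on the key only through `author`. The titles, names and the `project` helper are placeholders.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicates, keeping first-seen order."""
    seen = dict.fromkeys(tuple(r[a] for a in attrs) for r in rows)
    return [dict(zip(attrs, values)) for values in seen]

# 2NF relation with primary key "book".
# book -> author -> author_nationality is a transitive dependency.
books_2nf = [
    {"book": "Book 1", "author": "Author X", "author_nationality": "Indian"},
    {"book": "Book 2", "author": "Author X", "author_nationality": "Indian"},
    {"book": "Book 3", "author": "Author Y", "author_nationality": "British"},
]

# The transitively dependent attribute moves out with its determinant (author).
book = project(books_2nf, ["book", "author"])
author = project(books_2nf, ["author", "author_nationality"])

# Joining the two relations on author reproduces the original rows.
rejoined = [{**b, **a} for b in book for a in author if b["author"] == a["author"]]
```

Each nationality is now stored once per author, and the join shows that no information was lost by the split.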

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if every determinant is a candidate key.

The difference between 3NF and BCNF is that, for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
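The BCNF condition can be tested from a list of functional dependencies by computing attribute closures. The sketch below uses the common textbook formulation, in which the determinant of every non-trivial dependency must be a superkey. The relation (student, course, teacher) and its dependencies are a standard illustration chosen here as an assumption; they are not taken from the slides.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(relation)
    ]

relation = ("student", "course", "teacher")
fds = [
    (("student", "course"), ("teacher",)),  # determinant is a key
    (("teacher",), ("course",)),            # determinant is not a key
]

violations = bcnf_violations(relation, fds)
```

This relation satisfies 3NF, because `course` is part of a candidate key, but it violates BCNF because `teacher` is a determinant without being a candidate key.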

Thank You

Date post:	14-Jan-2015
Category:	Technology
Upload:	navneet-jingar