1. Database System Concepts and Architecture

Data Models

A collection of concepts that can be used to describe:

◦ the structure of a database (data types, relationships, and constraints)

◦ basic operations (retrievals and updates)

◦ the dynamic aspect or behavior of a database application (user-defined operations); for example, COMPUTE_GPA, which can be applied to a STUDENT object

Jan 29, 2002

Categories of Data Models

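High-level or conceptual data models: provide concepts close to the way many users (common users) perceive data.

Low-level or physical data models: describe the details of how data is stored in the computer.

In between, representational (or implementation) data models provide concepts that end users can understand but that are not too far removed from the way data is organized in the computer; in this sense they serve both categories above.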


Conceptual Data Model

Uses concepts such as:

◦ Entities: a real-world object or concept (e.g., DEPT, COURSE)

◦ Attributes: a property of interest that further describes an entity (e.g., dept no, name, telephone)

◦ Relationships: an interaction among entities (e.g., DEPT provides COURSE)
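To make the three concepts concrete, here is a minimal Python sketch that uses dataclasses as a stand-in for a conceptual-model notation such as an ER diagram (an assumption; the slides prescribe no notation). The class and field names are hypothetical and simply mirror the DEPT provides COURSE example.

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    # Entity COURSE; cid and cname are its attributes (hypothetical names).
    cid: str
    cname: str


@dataclass
class Dept:
    # Entity DEPT with attributes such as dept no, name, and telephone.
    dept_no: int
    name: str
    telephone: str
    # Relationship "DEPT provides COURSE", held as a list of related Course entities.
    provides: list[Course] = field(default_factory=list)


cs = Dept(dept_no=1, name="Computer Science", telephone="555-0100")
cs.provides.append(Course(cid="CS101", cname="Intro to Databases"))
print([c.cid for c in cs.provides])  # ['CS101']
```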


Physical Data Model

Describes how data is stored in the computer.

It represents information such as:

◦ record formats

◦ record orderings

◦ access paths: structures that make the search for particular records more efficient


Representational Data Model

Used in traditional commercial DBMSs. They include:

◦ Relational data model

◦ Network model

◦ Hierarchical model


Schemas

A schema is the description of the database (not the database itself).

◦ Specified during database design

◦ Not expected to change frequently

◦ A displayed schema is called a schema diagram (Fig 2.1)

Each object in the schema, such as STUDENT or COURSE, is a schema construct.

A schema diagram represents only some aspects of a schema (the names of record types and data items, and some types of constraints).


Instances and Database State

The data in the database at a particular moment in time is called a database state or snapshot: the current set of occurrences or instances in the database.

When we define a new database, only its schema is specified to the DBMS, so the database state is the empty state.

The initial state is the state when the database is first populated (loaded) with data.

From then on, at any point in time, the database has a current state.

Schema evolution: changing the schema when requirements change.


The Three-Schema Architecture

Important characteristics of the database approach:

◦ insulation of programs and data

◦ support of multiple user views

◦ use of a catalog to store the database description (schema)

The aim of the three-schema architecture is to separate the user applications from the physical database.

Schemas can be defined at three levels:

◦ The internal level has an internal schema, which describes the physical storage structure of the database.

◦ The internal schema uses a physical data model.


The Three-Schema Architecture

◦ The conceptual level has a conceptual schema describing the structure of the whole database for a community of users.

◦ It hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.

◦ A high-level data model or an implementation data model can be used at this level.

◦ The external or view level includes a number of external schemas or user views. Each describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.

◦ A high-level data model or an implementation data model can be used at this level.


Data Independence

Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.

Logical data independence: the capacity to change the conceptual schema without having to change external schemas or application programs.

Physical data independence: the capacity to change the internal schema without having to change the conceptual (or external) schemas.
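A minimal sketch of logical data independence, using Python's standard sqlite3 module purely for illustration. The application only queries a view (its external schema); when the conceptual schema is restructured, only the view definition changes and the application's query stays the same. The names StudentNames, StudentGrades, and StudentInfo are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original conceptual schema: a single Students relation.
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("53666", "Jones", 3.4), ("53688", "Smith", 3.2)])

# External schema: the view the application is written against.
conn.execute("CREATE VIEW StudentInfo AS SELECT sid, name, gpa FROM Students")


def app_query(c):
    # The application mentions only the view, never the base tables.
    return c.execute("SELECT name, gpa FROM StudentInfo ORDER BY sid").fetchall()


before = app_query(conn)

# Conceptual schema change: split Students into two relations, then redefine the view.
conn.executescript("""
    CREATE TABLE StudentNames (sid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE StudentGrades (sid TEXT PRIMARY KEY, gpa REAL);
    INSERT INTO StudentNames SELECT sid, name FROM Students;
    INSERT INTO StudentGrades SELECT sid, gpa FROM Students;
    DROP VIEW StudentInfo;
    DROP TABLE Students;
    CREATE VIEW StudentInfo AS
        SELECT n.sid, n.name, g.gpa
        FROM StudentNames n JOIN StudentGrades g ON n.sid = g.sid;
""")

after = app_query(conn)
print(before == after)  # True: the application did not have to change
```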


DBMS Languages

Data Definition Language (DDL): used to specify the conceptual and internal schemas for the database and any mappings between the two.

Storage Definition Language (SDL): used to specify the internal schema when there is a clear distinction between the conceptual and internal schemas.

View Definition Language (VDL): used to specify user views and their mappings to the conceptual schema.

Data Manipulation Language (DML): used for retrieval, insertion, deletion, and modification of the data.


DBMS Languages (continued)

SQL, the relational database language, represents a combination of DDL, VDL, and DML, as well as statements for constraint specification and schema evolution.

There are two main types of DMLs:

◦ A high-level or nonprocedural DML can specify complex database operations concisely. Example: SQL (set-at-a-time).

◦ A low-level or procedural DML retrieves individual records or objects from the database and processes each one separately (record-at-a-time).
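The contrast between the two styles can be sketched with Python's standard sqlite3 module. The set-at-a-time version is a single SQL statement; the record-at-a-time version only imitates a procedural DML by looping over records in host-language code (an assumption for illustration; real low-level DMLs, such as those of the network model, have their own commands). Both produce the same final state.

```python
import sqlite3

ROWS = [("53666", "Jones", 18), ("53688", "Smith", 18), ("53650", "Smith", 19)]


def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", ROWS)
    return conn


def set_at_a_time(conn):
    # Nonprocedural: one statement says WHAT to change; the DBMS finds the records.
    conn.execute("UPDATE Students SET age = age + 1 WHERE age = 18")


def record_at_a_time(conn):
    # Procedural style: the program visits each record and decides HOW to change it.
    for sid, age in conn.execute("SELECT sid, age FROM Students").fetchall():
        if age == 18:
            conn.execute("UPDATE Students SET age = ? WHERE sid = ?", (age + 1, sid))


def snapshot(conn):
    return conn.execute("SELECT sid, age FROM Students ORDER BY sid").fetchall()


a, b = fresh_db(), fresh_db()
set_at_a_time(a)
record_at_a_time(b)
print(snapshot(a) == snapshot(b))  # True
```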


DBMS Interfaces

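Menu-Based Interfaces for Browsing

◦ present the user with lists of options (menus) that lead the user through the formulation of a request

Forms-Based Interfaces

◦ display a form to each user; the user fills in entries to insert new data or to retrieve matching data

◦ designed for naïve users

Graphical User Interfaces (GUI)

◦ display a schema to the user in diagrammatic form

◦ typically utilize both menus and forms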


DBMS Interfaces

Natural Language Interfaces

◦ Accept requests written in a native language (e.g., English) and attempt to understand them.

◦ Refer to words in the schema, as well as a set of standard words, to interpret the request.

Interfaces for Parametric Users (e.g., tellers)

◦ The goal is to minimize the number of keystrokes required, e.g., through the use of function keys.

Interfaces for the DBA

◦ Used for creating accounts, granting system privileges, changing the schema, etc.


The Database System Environment

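DBMS Component Modules (Fig 2.3)

◦ The database and the DBMS catalog are stored on disk; access to the disk is controlled by the operating system (OS).

◦ The stored data manager controls access to DBMS information that is stored on disk.

◦ The stored data manager (SDM) transfers data between the disk and buffers in main memory.

◦ The DDL compiler processes schema definitions and stores their descriptions (meta-data) in the catalog.

◦ The runtime database processor handles database accesses at run time: it receives retrieval or update operations and carries them out on the database.

◦ The query compiler handles high-level queries: it parses and analyzes them and translates them into database access code.

◦ The precompiler extracts DML commands from an application program written in a host programming language.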


Database System Utilities

Loading: loads existing data files into the database.

Backup: creates a backup copy of the database.

File reorganization: reorganizes database files for better performance.

Performance monitoring: monitors database usage and provides statistics to the DBA.


Tools, Application Environments & Communications Facilities

CASE tools: used in the design phase.

Data (information) repository: stores catalog information, design decisions, usage information, application program descriptions, and user information.

Application development environments (e.g., PowerBuilder): help in developing the database design, GUI, queries, updates, etc.

Communications software: allows users to access the database remotely.


Classification of Database Management Systems

Data model:

◦ relational, object, object-relational, hierarchical, network, and others.

Number of users supported by the system:

◦ single-user systems and multiuser systems.

Number of sites over which the database is distributed:

◦ centralized DBMSs and distributed DBMSs (DDBMSs), including homogeneous DDBMSs and federated DBMSs (software developed to access several autonomous preexisting databases stored under heterogeneous DBMSs).


Classification of Database Management Systems (continued)

Cost of the DBMS: 10K-100K; single-user systems 100-3K.

General-purpose vs. special-purpose: a special-purpose DBMS may be built when performance is a primary consideration.

◦ Example: online transaction processing (OLTP) systems, which must support a large number of concurrent transactions without imposing excessive delays.


What Is a DBMS?

Need for information management.

A database is a very large, integrated collection of data.

It models a real-world enterprise:

◦ Entities (e.g., students, courses)

◦ Relationships (e.g., John is taking CS662)

A Database Management System (DBMS) is a software package designed to store and manage databases.

Why Use a DBMS?

Data independence and efficient access.

Data integrity and security.

Uniform data administration.

Concurrent access, recovery from crashes.

Replication control

Reduced application development time.

Why Study Databases?

Shift from computation to information:

◦ at the “low end”: access to the physical world

◦ at the “high end”: scientific applications

Datasets are increasing in diversity and volume:

◦ digital libraries, interactive video, Human Genome Project, e-commerce, sensor networks

◦ ... the need for DBMS/data services is exploding

DBMS encompasses several areas of CS:

◦ OS, languages, theory, AI, multimedia, logic

Data Models

A data model is a collection of concepts for describing data.

A schema is a description of a particular collection of data, using a given data model.

The relational model of data is the most widely used model today.

◦ Main concept: relation, basically a table with rows and columns.

◦ Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction

Many views, a single conceptual (logical) schema, and a single physical schema.

◦ Views describe how users see the data.

◦ The conceptual schema defines the logical structure.

◦ The physical schema describes the files and indexes used.

* Schemas are defined using DDL; data is modified/queried using DML.

(Figure: View 1, View 2, and View 3 on top of the Conceptual Schema, which sits on top of the Physical Schema.)

Example: University Database

Conceptual schema:

◦ Students(sid: string, name: string, login: string, age: integer, gpa: real)

◦ Courses(cid: string, cname: string, credits: integer)

◦ Enrolled(sid: string, cid: string, grade: string)

Physical schema:

◦ Relations stored as unordered files.

◦ Index on first column of Students.

External Schema (View):

◦ Course_info(cid: string, enrollment: integer)
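The three levels of this example can be declared in Python's standard sqlite3 module, as a sketch. One caveat: SQLite decides on its own how tables are stored (as B-trees, not unordered files), so only the index reflects the physical schema listed above. The index name students_sid_idx is hypothetical; the sample tuples are the Enrolled instance used elsewhere in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema decision: an index on the first column of Students
    CREATE INDEX students_sid_idx ON Students (sid);

    -- External schema (view): Course_info(cid, enrollment)
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "Carnatic101", "C"), ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"), ("53666", "History105", "B"),
])

info = conn.execute("SELECT * FROM Course_info ORDER BY cid").fetchall()
print(info)
# [('Carnatic101', 1), ('History105', 1), ('Reggae203', 1), ('Topology112', 1)]
```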

Data Independence

Applications are insulated from how data is structured and stored.

Logical data independence: protection from changes in the logical structure of data.

Physical data independence: protection from changes in the physical structure of data.

* One of the most important benefits of using a DBMS!

Concurrency Control

Concurrent execution of user programs is essential for good DBMS performance.

◦ Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.

Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.

The DBMS ensures that such problems don’t arise: users can pretend they are using a single-user system.

Transaction: An Execution Unit of a DB

The key concept is the transaction, which is an atomic sequence of database actions (reads/writes).

Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.

◦ Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.

◦ Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed). Why not?

◦ Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!

Scheduling Concurrent Transactions

DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’.

◦ Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)

◦ Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.

◦ What if Tj already has a lock on Y and Ti later requests a lock on Y? What is it called? What will happen?
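The waiting behavior of strict 2PL can be sketched with a toy lock table. This is a minimal Python sketch under simplifying assumptions: exclusive locks only, and a hypothetical LockManager class that is not any real DBMS's API. It records who waits for whom and reports when waiting transactions form a cycle, the situation the question above is pointing at.

```python
class LockManager:
    """Toy lock table for strict 2PL with exclusive locks only (hypothetical API)."""

    def __init__(self):
        self.owner = {}      # object -> transaction holding its lock
        self.waits_for = {}  # waiting transaction -> transaction it waits for

    def request(self, txn, obj):
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            return "granted"
        # Conflict: txn must wait, because holder keeps its locks until it ends.
        self.waits_for[txn] = holder
        return "deadlock" if self._in_cycle(txn) else "wait"

    def release_all(self, txn):
        # Strict 2PL: all of txn's locks are released together at commit/abort.
        self.owner = {o: t for o, t in self.owner.items() if t != txn}
        self.waits_for = {w: h for w, h in self.waits_for.items()
                          if w != txn and h != txn}

    def _in_cycle(self, start):
        # Each waiter waits for exactly one holder, so follow the chain of edges.
        seen = set()
        t = self.waits_for.get(start)
        while t is not None and t not in seen:
            if t == start:
                return True
            seen.add(t)
            t = self.waits_for.get(t)
        return False


lm = LockManager()
print(lm.request("T1", "X"))  # granted
print(lm.request("T2", "Y"))  # granted
print(lm.request("T1", "Y"))  # wait (T2 holds Y)
print(lm.request("T2", "X"))  # deadlock (T2 waits for T1, which waits for T2)
```

When request returns "deadlock", a real DBMS would abort one of the transactions (the victim), which here corresponds to calling release_all for it.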

Ensuring Atomicity

The DBMS ensures atomicity (the all-or-nothing property) even if the system crashes in the middle of a Xact.

Idea: keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts:

◦ Before a change is made to the database, the corresponding log entry is forced to a safe location (WAL protocol).

◦ After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if a log entry wasn’t saved before the crash, the corresponding change was not applied to the database!)
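The all-or-nothing behavior can be observed with Python's standard sqlite3 module. In this sketch a failure between two writes is simulated by raising an exception (an assumption; a real crash would instead be handled by SQLite's journal when the database is reopened). Using the connection as a context manager commits on success and rolls back if the block raises. The Accounts table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(amount, fail_midway):
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE id = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE id = 'B'", (amount,))


try:
    transfer(30, fail_midway=True)
except RuntimeError:
    pass

# The debit from A was undone: neither write of the failed transaction survives.
print(conn.execute("SELECT id, balance FROM Accounts ORDER BY id").fetchall())
# [('A', 100), ('B', 50)]
```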

The Log

The following actions are recorded in the log:

◦ Ti writes an object: the old value and the new value are recorded. The log record must go to disk before the changed page!

◦ Ti commits/aborts: a log record indicating this action.

Log records are chained together by Xact id, so it’s easy to undo a specific Xact (e.g., to resolve a deadlock).

The log is often duplexed and archived on “stable” storage.

All log-related activities (and in fact, all CC-related activities such as lock/unlock, dealing with deadlocks, etc.) are handled transparently by the DBMS.
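How the logged old values make undo possible can be shown with a toy recovery pass. This minimal Python sketch assumes a simplified log format (hypothetical tuples, not any real system's) and handles only the undo half: it restores the old values written by transactions that have no commit record. Real recovery algorithms also redo committed changes and track log sequence numbers.

```python
def recover(db, log):
    """Toy UNDO pass: restore old values written by transactions that never committed."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Scan backwards so that, for each object, the earliest old value is restored last.
    for kind, txn, *rest in reversed(log):
        if kind == "write" and txn not in committed:
            obj, old_value, _new_value = rest
            db[obj] = old_value
    return db


# Log records (forced to stable storage before the data pages, per WAL):
# ("write", txn, object, old value, new value) or ("commit", txn).
log = [
    ("write", "T1", "X", 10, 20),
    ("write", "T2", "Y", 5, 7),
    ("commit", "T1"),
    ("write", "T2", "X", 20, 99),  # the crash happens before T2 commits
]
# Database state on disk at the crash: T2's changes had already reached it.
print(recover({"X": 99, "Y": 7}, log))  # {'X': 20, 'Y': 5}
```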

Databases make these folks happy ...

End users and DBMS vendors

DB application programmers

◦ e.g. webmasters

Database administrator (DBA)

◦ Designs logical /physical schemas

◦ Handles security and authorization

◦ Data availability, crash recovery

◦ Database tuning as needs evolve

Must understand how a DBMS works!

Structure of a DBMS

A typical DBMS has a layered architecture.

The figure does not show the concurrency control and recovery components.

This is one of several possible architectures; each system has its own variations.

(Figure: layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Side note: these layers must consider concurrency control and recovery.)

Summary

A DBMS is used to maintain and query large datasets.

Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.

Levels of abstraction give data independence.

A DBMS typically has a layered architecture.

DBAs hold responsible jobs and are well-paid!

DBMS R&D is one of the broadest, most mature areas in CS.

Data Models

A database models some portion of the real world.

A data model is the link between the user’s view of the world and the bits stored in the computer.

Many models have been proposed.

We will concentrate on the relational model.

(Figure: a schema such as Student (sid: string, name: string, login: string, age: integer, gpa: real) sits between the user’s view and bits such as 10101 and 11101 stored in the computer.)

Describing Data: Data Models

A data model is a collection of concepts for describing data.

A database schema is a description of a particular collection of data, using a given data model.

The relational model of data is the most widely used model today.

◦ Main concept: relation, basically a table with rows and columns.

◦ Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction

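Views describe how users see the data.

The conceptual schema defines the logical structure.

The physical schema describes the files and indexes used.

(sometimes called the ANSI/SPARC model)

(Figure: users see the data through views, which are defined over the Conceptual Schema; below it are the Physical Schema and the stored DB.)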

Data Independence: The Big Breakthrough of the Relational Model

A simple idea: applications should be insulated from how data is structured and stored.

(Figure: the Conceptual Schema and Physical Schema levels above the stored DB.)

• Physical data independence: protection from changes in the physical structure of data.

• Logical data independence: protection from changes in the logical structure of data.

• Q: Why are these particularly important for a DBMS?

Why Study the Relational Model?

Most widely used model currently.

◦ DB2, MySQL, Oracle, PostgreSQL, SQLServer, …

◦ Note: some “legacy systems” use older models, e.g., IBM’s IMS

Object-oriented concepts have recently merged in:

◦ object-relational model

Informix, IBM DB2, Oracle 8i

Early work done in the POSTGRES research project at Berkeley

XML (semi-structured) models emerging?

Relational Database: Definitions

Relational database: a set of relations.

Relation: made up of 2 parts:

◦ Schema: specifies the name of the relation, plus the name and type of each column.

E.g., Students(sid: string, name: string, login: string, age: integer, gpa: real)

◦ Instance: a table, with rows and columns.

#rows = cardinality

#fields = degree / arity

Can think of a relation as a set of rows or tuples.

◦ i.e., all rows are distinct

Example: University Database

Conceptual schema:

◦ Students(sid: string, name: string, login: string, age: integer, gpa: real)

◦ Courses(cid: string, cname: string, credits: integer)

◦ Enrolled(sid: string, cid: string, grade: string)

External Schema (View):

◦ Course_info(cid: string, enrollment: integer)

One possible physical schema:

◦ Relations stored as unordered files.

◦ Index on first column of Students.

(Figure: the Conceptual Schema and Physical Schema levels above the stored DB.)

Ex: An Instance of Students Relation

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Cardinality = 3, Arity = 5

All rows must be unique (set semantics)

• Q: Do all values in each column of a relation instance have to be unique?

• Q: Is “Cardinality” a schema property?

• Q: Is “Arity” a schema property?

SQL - A Language for Relational DBs

SQL (a.k.a. “Sequel”):

◦ “Intergalactic Standard for Data”

◦ Stands for Structured Query Language

Two sub-languages:

Data Definition Language (DDL)

◦ create, modify, delete relations

◦ specify constraints

◦ administer users, security, etc.

Data Manipulation Language (DML)

◦ specify queries to find tuples that satisfy criteria

◦ add, modify, remove tuples

SQL Overview

CREATE TABLE <name> ( <field> <domain>, … )

INSERT INTO <name> (<field names>) VALUES (<field values>)

DELETE FROM <name> WHERE <condition>

UPDATE <name> SET <field name> = <value> WHERE <condition>

SELECT <fields> FROM <name> WHERE <condition>

Creating Relations in SQL

Creates the Students relation.

◦ Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.

CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)

Table Creation (continued)

Another example: the Enrolled table holds information about courses students take.

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))

Adding and Deleting Tuples

Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa) VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = Smith):

DELETE FROM Students S WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!
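These statements can be tried directly with Python's standard sqlite3 module, as a sketch. Note that SQLite applies “type affinity” and, for ordinary tables, does not enforce CHAR lengths, so this demonstrates the syntax rather than the strict domain enforcement described above. The DELETE is written without the alias S for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
             "age INTEGER, gpa FLOAT)")

# Insert single tuples; '?' placeholders avoid quoting problems.
conn.executemany(
    "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
    [("53666", "Jones", "jones@cs", 18, 3.4), ("53688", "Smith", "smith@ee", 18, 3.2)],
)

# Delete all tuples satisfying a condition.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")
print(conn.execute("SELECT sid, name FROM Students").fetchall())  # [('53666', 'Jones')]
```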

Keys

Keys are a way to associate tuples in different relations.

Keys are one form of integrity constraint (IC).

Enrolled:

sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

(Figure: sid is the PRIMARY Key of Students and a FOREIGN Key in Enrolled, linking each Enrolled tuple to a Students tuple.)

Primary Keys

A set of fields is a superkey if:

◦ no two distinct tuples can have the same values in all key fields.

A set of fields is a candidate key for a relation if:

◦ it is a superkey, and

◦ no proper subset of the fields is a superkey.

What if there is more than one key for a relation?

◦ One of the candidate keys is chosen (by the DBA) to be the primary key.

E.g.:

◦ sid is a key for Students.

◦ What about name?

◦ The set {sid, gpa} is a superkey.
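The two definitions can be checked against a single instance with a short Python sketch (the function names are hypothetical). Keep in mind that such a check can only refute a key; as discussed under integrity constraints, a key is a statement about all possible instances. The last line shows gpa happening to be unique in this instance even though it is clearly not a key.

```python
from itertools import combinations


def is_superkey_on(instance, fields):
    """True if no two tuples of this instance agree on all of `fields`."""
    projected = [tuple(row[f] for f in fields) for row in instance]
    return len(set(projected)) == len(projected)


def is_candidate_key_on(instance, fields):
    """A superkey on this instance, with no proper subset that is also one."""
    if not is_superkey_on(instance, fields):
        return False
    return not any(is_superkey_on(instance, subset)
                   for size in range(1, len(fields))
                   for subset in combinations(fields, size))


students = [
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
]
print(is_superkey_on(students, ["name"]))             # False: two Smiths
print(is_superkey_on(students, ["sid", "gpa"]))       # True
print(is_candidate_key_on(students, ["sid", "gpa"]))  # False: {sid} alone suffices
print(is_candidate_key_on(students, ["sid"]))         # True
print(is_superkey_on(students, ["gpa"]))              # True here, yet gpa is no key in general
```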

Primary and Candidate Keys in SQL

Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.

• Keys must be used carefully!

• “For a given student and course, there is a single grade.”

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid))

vs.

• “Students can take only one course, and no two students in a course receive the same grade.”

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))
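To see how differently the two Enrolled definitions behave, the following sketch (Python's standard sqlite3 module; the helper names are hypothetical) inserts the same three tuples into each design and records which ones the key constraints reject.

```python
import sqlite3


def make_db(ddl):
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    return conn


def try_insert(conn, row):
    # True if the DBMS accepts the tuple, False if a key constraint rejects it.
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


designs = {
    "A": make_db("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
                 "PRIMARY KEY (sid, cid))"),
    "B": make_db("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
                 "PRIMARY KEY (sid), UNIQUE (cid, grade))"),
}
rows = [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),    # same student, a second course
    ("53650", "Carnatic101", "C"),  # another student, same course and grade
]
outcomes = {name: [try_insert(conn, r) for r in rows] for name, conn in designs.items()}
print(outcomes)  # {'A': [True, True, True], 'B': [True, False, False]}
```

Design B silently encodes the stronger (and probably unintended) business rule, which is why keys must be chosen carefully.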

Foreign Keys, Referential Integrity

Foreign key: a set of fields in one relation that is used to ‘refer’ to a tuple in another relation.

◦ Must correspond to the primary key of the other relation.

◦ Like a ‘logical pointer’.

If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references).

Foreign Keys in SQL

E.g., only students listed in the Students relation should be allowed to enroll for courses.

◦ sid is a foreign key referring to Students:

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students)

Enrolled:

sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

(Figure: the Enrolled tuples above refer to tuples in Students. A tuple such as 11111 English102 A, whose sid does not appear in Students, would violate the foreign key constraint.)

Enforcing Referential Integrity

Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.

What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)

What should be done if a Students tuple is deleted?

◦ Also delete all Enrolled tuples that refer to it?

◦ Disallow deletion of a Students tuple that is referred to?

◦ Set sid in Enrolled tuples that refer to it to a default sid?

◦ (In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting `unknown’ or `inapplicable’.)

Similar issues arise if primary key of Students tuple is updated.
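Two of these policies can be observed in Python's standard sqlite3 module: rejecting an Enrolled tuple with a non-existent sid, and deleting the referring Enrolled tuples along with a student (ON DELETE CASCADE). This is a sketch, not the only choice; note that SQLite checks foreign keys only after PRAGMA foreign_keys is turned on, since it is off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (it is off by default).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20));
    CREATE TABLE Enrolled (
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        -- One of the options above: delete enrollments together with the student.
        FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE
    );
    INSERT INTO Students VALUES ('53666', 'Jones'), ('53650', 'Smith');
    INSERT INTO Enrolled VALUES ('53666', 'History105', 'B'), ('53650', 'Topology112', 'A');
""")

# Inserting an Enrolled tuple with a non-existent student id is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('11111', 'English102', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting a student cascades to the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT sid, cid FROM Enrolled").fetchall()
print(rejected, remaining)  # True [('53650', 'Topology112')]
```

Replacing ON DELETE CASCADE with ON DELETE SET NULL or ON DELETE SET DEFAULT selects the other policies listed above.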

Integrity Constraints (ICs)

IC: condition that must be true for any instance of the database; e.g., domain constraints.

◦ ICs are specified when schema is defined.

◦ ICs are checked when relations are modified.

A legal instance of a relation is one that satisfies all specified ICs.

◦ DBMS should not allow illegal instances.

If the DBMS checks ICs, stored data is more faithful to real-world meaning.

◦ Avoids data entry errors, too!

Where do ICs Come From?

ICs are based upon the semantics of the real-world that is being described in the database relations.

We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.

◦ An IC is a statement about all possible instances!

◦ From the example, we know name is not a key, but the assertion that sid is a key is given to us.

Key and foreign key ICs are the most common; more general ICs supported too.

Relational Query Languages

A major strength of the relational model: it supports simple, powerful querying of data.

Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.

◦ The key: precise semantics for relational queries.

◦ Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change.

The SQL Query Language

The most widely used relational query language.

◦ Current standard is SQL-2003; SQL-92 is a basic subset that we focus on in this class.

To find all 18-year-old students, we can write:

SELECT * FROM Students S WHERE S.age = 18

• To find just names and logins, replace the first line:

SELECT S.name, S.login

(Result table on the slide; recovered row: 53688 Smith smith@ee 18 3.2)

Querying Multiple Relations

What does the following query compute?

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid = E.sid AND E.grade = 'A'

Given the following instance of Enrolled:

sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

we get:

S.name  E.cid
Smith   Topology112

Semantics of a Query

A conceptual evaluation method for the previous query:

1. do FROM clause: compute cross-product of Students and Enrolled

2. do WHERE clause: Check conditions, discard tuples that fail

3. do SELECT clause: Delete unwanted fields

Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers.

Cross-product of Students and Enrolled Instances

S.sid  S.name  S.login     S.age  S.gpa  E.sid  E.cid        E.grade
53666  Jones   jones@cs    18     3.4    53831  Carnatic101  C
53666  Jones   jones@cs    18     3.4    53831  Reggae203    B
53666  Jones   jones@cs    18     3.4    53650  Topology112  A
53666  Jones   jones@cs    18     3.4    53666  History105   B
53688  Smith   smith@ee    18     3.2    53831  Carnatic101  C
53688  Smith   smith@ee    18     3.2    53831  Reggae203    B
53688  Smith   smith@ee    18     3.2    53650  Topology112  A
53688  Smith   smith@ee    18     3.2    53666  History105   B
53650  Smith   smith@math  19     3.8    53831  Carnatic101  C
53650  Smith   smith@math  19     3.8    53831  Reggae203    B
53650  Smith   smith@math  19     3.8    53650  Topology112  A
53650  Smith   smith@math  19     3.8    53666  History105   B
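The three conceptual steps can be carried out literally in Python on the instances above, and compared with what a real DBMS (here SQLite, via the standard sqlite3 module) returns for the same query. The DBMS is free to evaluate the query more cleverly, but the answer must match.

```python
import sqlite3
from itertools import product

students = [("53666", "Jones", "jones@cs", 18, 3.4),
            ("53688", "Smith", "smith@ee", 18, 3.2),
            ("53650", "Smith", "smith@math", 19, 3.8)]
enrolled = [("53831", "Carnatic101", "C"),
            ("53831", "Reggae203", "B"),
            ("53650", "Topology112", "A"),
            ("53666", "History105", "B")]

# 1. FROM: cross-product of the two instances (3 x 4 = 12 combined tuples).
cross = [s + e for s, e in product(students, enrolled)]
# 2. WHERE: keep tuples with S.sid = E.sid AND E.grade = 'A'.
#    Positions: S.sid=0, S.name=1, S.login=2, S.age=3, S.gpa=4, E.sid=5, E.cid=6, E.grade=7.
kept = [t for t in cross if t[0] == t[5] and t[7] == "A"]
# 3. SELECT: project onto S.name, E.cid.
conceptual = [(t[1], t[6]) for t in kept]

# The same query evaluated by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", students)
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", enrolled)
actual = conn.execute("SELECT S.name, E.cid FROM Students S, Enrolled E "
                      "WHERE S.sid = E.sid AND E.grade = 'A'").fetchall()

print(len(cross), conceptual, actual)
# 12 [('Smith', 'Topology112')] [('Smith', 'Topology112')]
```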

Queries, Query Plans, and Operators

System handles query plan generation & optimization; ensures correct execution.

SELECT eid, ename, title
FROM Emp E
WHERE E.sal > 50000

SELECT E.loc, AVG(E.sal)
FROM Emp E
GROUP BY E.loc
HAVING COUNT(*) > 5

SELECT COUNT(DISTINCT E.eid)
FROM Emp E, Proj P, Asgn A
WHERE E.eid = A.eid
AND P.pid = A.pid
AND E.loc <> P.loc

• Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …

(Figure: tables Employees, Projects, and Assignments; query plans: Select over Emp for the first query; Group(agg) followed by Having over Emp for the second; Joins of Emp, Asgn, and Proj feeding Count distinct for the third.)

Structure of a DBMS

A typical DBMS has a layered architecture.

The figure does not show the concurrency control and recovery components.

Each system has its own variations.

The book shows a somewhat more detailed version.

You will see the “real deal” in PostgreSQL.

◦ It’s a pretty full-featured example

Next class: we will start on this stack, bottom up.

(Figure: layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Side note: these layers must consider concurrency control and recovery.)

Relational Model: Summary

A tabular representation of data.

Simple and intuitive, currently the most widely used.

◦ Object-relational variant gaining ground

Integrity constraints can be specified by the DBA, based on application semantics. The DBMS checks for violations.

◦ Two important ICs: primary and foreign keys

◦ In addition, we always have domain constraints.

Powerful query languages exist.

◦ SQL is the standard commercial one:

DDL - Data Definition Language

DML - Data Manipulation Language

Chapter 5 69

Storage

There are two general types of storage media used with computers:

◦ Primary storage - All storage media that can be operated on directly by the CPU (RAM, L1 and L2 cache memory).

◦ Secondary storage - Hard drives, CDs, and tape.


Memory Hierarchies & Storage Devices

The memory hierarchy is based upon speed of access. However, this speed comes with a price tag that varies inversely with the access time of the memory: like cars, the faster the memory access, the more it costs.


Primary Storage Level of Memory

The primary storage level of memory is generally made up of 3 levels:

◦ L1 cache, which is located on the CPU

◦ L2 cache, which is located near the CPU

◦ Main memory, which is the RAM figure that is often referred to in computer advertisements


Secondary Storage Level of Memory

The secondary storage level of memory may be made up of 4 levels:

◦ Flash memory or EEPROM

◦ Hard drives

◦ CD-ROMs

◦ Tape


Terms Used in the Hardware Description of Hard Drives

Capacity - The number of bytes it can store.

Single-sided vs. Double-sided - States whether the disk/platter is written on one or both sides.

Disk Pack - A collection of disks/platters that are assembled together into a pack.

Track - A circle of small width on a disk. A disk surface will have many tracks.

Description of Hard Drives (continued)

Sector - A segment or arc of a track.

Block - A division of a track into equal-sized portions by the operating system.

Interblock Gaps - Fixed-size segments that separate the blocks.

Read/Write Head - The component that actually reads/writes the information on the disk.

Description of Hard Drives

Cylinder - The tracks with the same diameter on all the disk surfaces of a disk pack.


Terms Used in Measuring Disk Operations

Seek Time (s) - The time it takes to position the read/write head on the desired track. Its value will be given in every problem that needs it.

Rotational Delay (rd) - The average amount of time it takes for the desired block to rotate into position under the read/write head, i.e., half a revolution on average:

rd = (1/2) * (1/p) min, where p is the rotational speed of the disk in rpm

Terms Used in Measuring Disk Operations (continued)

Transfer Rate (tr) - The rate at which information can be transferred to or from the disk; one full track passes under the head per revolution:

tr = (track size) / (1/p min)

Block Transfer Time (btt) - The time it takes to transfer the data once the read/write head has been positioned:

btt = B / tr msec, where B is the block size in bytes (with tr expressed in bytes/msec)

Terms Used in Measuring Disk Operations (continued)

Bulk Transfer Rate (btr) - The rate at which multiple contiguous blocks can be read/written, where G is the interblock gap size in bytes:

btr = (B / (B + G)) * tr bytes/msec

Rewrite Time (Trw) - The time it takes, after a block is read, to write that same block back to the disk; this equals the time for one revolution.


Computing Times

Given :

◦ Seek Time (s) = 10 msec

◦ Rotational speed = 3600 rpm

◦ Track size = 50 KB

◦ Block size (B) = 512 bytes

◦ Interblock Gap = 128 bytes


Problems for Disk Operations

Compute the average time it takes to:

◦ transfer 1 block on this system.

◦ transfer 20 non-contiguous blocks that are located on the same track.

◦ transfer 20 contiguous blocks.
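The slides leave the answers to the reader. The following Python sketch works them out from the formulas above under two stated assumptions: 1 KB = 1024 bytes, and for 20 non-contiguous blocks on the same track a single seek is needed while each block still incurs its own average rotational delay. Under other conventions (e.g., 1 KB = 1000 bytes) the numbers differ slightly.

```python
# Given parameters (assumption: 1 KB = 1024 bytes).
s = 10.0                 # seek time, msec
rpm = 3600               # rotational speed p
track_size = 50 * 1024   # bytes
B = 512                  # block size, bytes
G = 128                  # interblock gap, bytes

rev_ms = 60_000 / rpm          # one revolution = 1/p min, here 16.67 msec
rd = rev_ms / 2                # average rotational delay: 8.33 msec
tr = track_size / rev_ms       # transfer rate: 3072 bytes/msec
btt = B / tr                   # block transfer time: 0.167 msec
btr = (B / (B + G)) * tr       # bulk transfer rate: 2457.6 bytes/msec

one_block = s + rd + btt
# Same track: one seek, but every block waits for its own rotational delay.
twenty_noncontiguous = s + 20 * (rd + btt)
# Contiguous: one seek and one rotational delay, then 20 blocks at the bulk rate.
twenty_contiguous = s + rd + (20 * B) / btr

print(f"{one_block:.2f} {twenty_noncontiguous:.2f} {twenty_contiguous:.2f}")
# 18.50 180.00 22.50
```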


Parallelizing Disk Access Using RAID

RAID stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks.

RAIDs are used to provide increased reliability, increased performance, or both.


RAID Levels

Level 0 - Has no redundancy and the best write performance, but its read performance is not as good as level 1.

Level 1 - Uses mirrored disks, which provide redundancy and improved read performance.

Level 2 - Provides redundancy using Hamming codes.


RAID Levels

Level 3 - Uses a single parity disk.

Level 4 and 5 - Use block-level data striping; level 4 keeps the parity on a single disk, while level 5 distributes the data and the parity information across all the disks.

Level 6 - Uses the P + Q redundancy scheme, making use of Reed-Solomon codes to protect against the failure of 2 disks.
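Levels 3 to 5 rely on parity computed with exclusive OR (XOR): the parity block is the XOR of the data blocks, so any single lost block equals the XOR of the parity with the surviving blocks. A minimal Python sketch with hypothetical 4-byte stripe units; it does not model the striping layout, Hamming codes (level 2), or the Reed-Solomon P + Q scheme (level 6).

```python
from functools import reduce


def parity(blocks):
    """XOR the corresponding bytes of equal-sized blocks (RAID 3-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover one lost block: XOR of the parity with all surviving data blocks."""
    return parity(surviving_blocks + [parity_block])


data = [b"DISK", b"ARRY", b"RAID"]   # hypothetical stripe units on three data disks
p = parity(data)                      # stored on the parity disk
lost = data[1]                        # suppose disk 1 fails
recovered = rebuild([data[0], data[2]], p)
print(recovered == lost)  # True
```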


Records

Record is the term used to refer to a number of related values or items. Each value or item is stored in a field of a specific data type.

Records may be of either fixed or variable length.


Variable Length Records in Files

There are several reasons why records of the same record type may be of variable length:

◦ Variable-length fields

◦ Repeating fields

For efficiency reasons, different record types may also be clustered in a file.


Spanned Vs Unspanned Records

When the records of a file are stored on disk, they are placed in blocks of a fixed size, which will rarely match the record size. So when the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: either store each record entirely within one block and leave some space unused (unspanned), or allow a record to be split across two different blocks (spanned).
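A short Python sketch of the trade-off, using textbook-style notation that the slide does not define: B (block size), R (fixed record size), r (number of records), and bfr (blocking factor). These symbols and the example sizes are assumptions. The unspanned case assumes R <= B, and the spanned case ignores the pointer a real system needs to link the pieces of a split record.

```python
import math


def unspanned(B, R, r):
    """Records never cross block boundaries (assumes R <= B)."""
    bfr = B // R                 # blocking factor: whole records per block
    unused = B - bfr * R         # bytes wasted in each block
    return bfr, unused, math.ceil(r / bfr)


def spanned(B, R, r):
    """Records may continue in the next block, so no space is wasted
    (ignoring the pointer overhead a real system would add)."""
    return math.ceil(r * R / B)


# Hypothetical example: 512-byte blocks, 100-byte records, 10,000 records.
print(unspanned(512, 100, 10_000))  # (5, 12, 2000)
print(spanned(512, 100, 10_000))    # 1954
```

The unspanned layout wastes 12 bytes per block but keeps every record readable with a single block access; the spanned layout saves 46 blocks at the cost of occasionally reading two blocks per record.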


File Operations

A file may be stored either in contiguous blocks or by linking the blocks together. There are advantages and disadvantages to both methods.

Operations on files can be grouped into two types: retrieval and update. Retrieval involves only a read, while an update involves reading, writing, and modification.


File Structure

Heap (Pile) Files

Hash (Direct) Files

Ordered (Sorted) Files

B - Trees


Once the data has been brought into memory, it can be accessed by an instruction in 0.00000004 seconds on a machine running at 25 MIPS (1/25,000,000 seconds per instruction). The disparity between the time for memory access and disk access is enormous: we can perform 625,000 instructions (0.025 s / 0.00000004 s) in the time it takes to read/write one disk page.

To put this in human terms: suppose you are typing a letter for your boss and find a word you cannot make out, so you leave him a voice mail message. Since you were told to do nothing but this, you patiently wait for his reply, doing nothing! Unfortunately, he has just gone on vacation and does not get your message for 3 weeks. This is similar to the computer waiting 0.025 seconds to get the needed data into memory from a disk read.
