Date posted: 07-Mar-2015 | Category: Documents | Uploaded by: debarshi-datta
Database System Concepts and Architecture
Data Models
A collection of concepts that can be used to describe the structure of a database (data types, relationships, and constraints)
Most data models also include a set of basic operations (retrievals and updates)
They may also include concepts to specify the dynamic aspect or behavior of a database application (user-defined operations)
◦ Example: COMPUTE_GPA, which can be applied to a STUDENT object
Jan 29, 2002
Categories of Data Models
High-level or conceptual data models (used by common users)
Low-level or physical data models (describe the details of how data is stored)
In between, representational (or implementation) data models can serve both categories above
Conceptual Data Model
Uses concepts such as:
◦ Entities: a real-world object or concept (e.g., DEPT, COURSE)
◦ Attributes: a property of interest that further describes an entity (e.g., dept no, name, telephone)
◦ Relationships: an interaction among entities (e.g., DEPT provides COURSE)
Physical Data Model
Describes how data is stored in the computer.
It represents information such as:
◦ record formats
◦ record orderings
◦ access paths: structures that make the search for particular records more efficient
Representational Data Model
Used in traditional commercial DBMSs. They include:
◦ Relational Data model
◦ Network model
◦ Hierarchical model
Schemas
A schema is the description of the database (not the database itself).
◦ Specified during database design
◦ Not expected to change frequently
◦ A displayed schema is called a schema diagram (Fig 2.1)
Each object in the schema, such as STUDENT or COURSE, is a schema construct.
A schema diagram represents only some aspects of a schema (the names of record types and data items, and some types of constraints).
Instances and Database State
The data in the database at a particular moment in time is called a database state or snapshot, or the current set of occurrences or instances in the database.
When a new database is defined, only its schema is specified to the DBMS; at this point the database state is the empty state.
The database enters its initial state when it is first populated (loaded) with data.
From then on, at any point in time, the database has a current state.
Schema evolution: changing the schema after it has been defined, when requirements change.
The Three-Schema Architecture
Important characteristics of the database approach:
◦ insulation of programs and data
◦ support of multiple user views
◦ use of a catalog to store the database description (schema)
The aim is to separate user applications from the physical database.
Schemas can be defined at three levels:
◦ The internal level has an internal schema, which describes the physical storage structure of the database and uses a physical data model.
The Three-Schema Architecture
◦ The conceptual level has a conceptual schema describing the structure of the whole database for a community of users.
◦ It hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.
◦ A high-level data model or an implementation data model can be used at this level.
◦ The external or view level includes a number of external schemas or user views, each describing the part of the database that a particular user group is interested in and hiding the rest of the database from that user group.
◦ A high-level data model or an implementation data model can be used at this level.
Data Independence
Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
Logical data independence: capacity to change the
conceptual schema without having to change external
schemas or application programs.
Physical data independence: capacity to change the
internal schema without having to change the conceptual
(or external) schemas
DBMS Languages
Data Definition Language (DDL): used to specify the conceptual and internal schemas for the database and any mappings between the two (in DBMSs that keep no strict separation between these levels).
Storage Definition Language (SDL): used to specify the internal schema when there is a clear distinction between the conceptual and internal schemas.
View Definition Language (VDL): used to specify user views and their mappings to the conceptual schema.
Data Manipulation Language (DML): used for retrieval, insertion, deletion, and modification of the data.
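As an illustration of how one relational system covers several of these languages, here is a minimal sketch using Python's built-in sqlite3 module. The Student table, the HonorRoll view, and the index name are hypothetical. SQLite has no separate SDL, so CREATE INDEX stands in for a storage-level definition here; that mapping is an assumption made for the example.

```python
import sqlite3

# In-memory database; sqlite3 ships with Python's standard library.
conn = sqlite3.connect(":memory:")

# DDL: define part of a conceptual schema (hypothetical Student table).
conn.execute("CREATE TABLE Student (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")

# Storage-level definition: an index is a physical access path, so it plays
# the role an SDL statement would (SQLite has no separate SDL).
conn.execute("CREATE INDEX student_name_idx ON Student (name)")

# VDL: a user view mapped onto the conceptual schema.
conn.execute("CREATE VIEW HonorRoll AS SELECT name FROM Student WHERE gpa >= 3.5")

# DML: insertion and retrieval.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("53666", "Jones", 3.4), ("53650", "Smith", 3.8)],
)
honor_roll = [row[0] for row in conn.execute("SELECT name FROM HonorRoll")]
print(honor_roll)  # ['Smith']
```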
DBMS Languages (continued)
SQL, the relational database language, represents a combination of DDL, VDL, and DML, as well as statements for constraint specification and schema evolution.
There are two main types of DMLs:
◦ A high-level or nonprocedural DML can specify complex database operations concisely; e.g., SQL (set-at-a-time).
◦ A low-level or procedural DML retrieves individual records or objects from the database and processes each separately (record-at-a-time).
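The difference between set-at-a-time and record-at-a-time processing can be sketched with sqlite3. The Account table and the 5% rule are hypothetical; the record-at-a-time loop only imitates a procedural DML from Python, it is not a real procedural DML.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Account (acct TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO Account VALUES (?, ?)",
                     [("A", 100.0), ("B", 250.0), ("C", 40.0)])
    return conn

# Set-at-a-time: one declarative statement says WHAT to change;
# the DBMS decides how to find and visit the qualifying records.
set_db = make_db()
set_db.execute("UPDATE Account SET balance = balance * 1.05 WHERE balance > 50")

# Record-at-a-time: the program fetches every record and decides,
# one record at a time, whether and how to update it.
rec_db = make_db()
for acct, balance in rec_db.execute("SELECT acct, balance FROM Account").fetchall():
    if balance > 50:
        rec_db.execute("UPDATE Account SET balance = ? WHERE acct = ?",
                       (balance * 1.05, acct))

def balances(conn):
    rows = conn.execute("SELECT acct, balance FROM Account ORDER BY acct")
    return [(acct, round(balance, 6)) for acct, balance in rows]

print(balances(set_db) == balances(rec_db))  # True
```

Both produce the same final state; the set-at-a-time version leaves the access strategy to the DBMS.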
DBMS Interfaces
Menu-Based Interfaces for Browsing
◦ menus lead the user through the formulation of a request
Forms-Based Interfaces
◦ display a form to each user for entering new data (insert) or retrieving matching data (select)
◦ designed for naïve users
Graphical User Interfaces (GUI)
◦ display a schema to the user in diagram form
◦ utilize both menus and forms
DBMS Interfaces
Natural Language Interfaces
◦ accept requests written in the user's native language (e.g., English) and attempt to understand them
◦ refer to the words in the schema, and to a set of standard words, to interpret the request
Interfaces for Parametric Users (e.g., bank tellers)
◦ goal is to minimize the number of keystrokes required (e.g., by using function keys)
Interfaces for the DBA
◦ creating accounts, granting system privileges, changing the schema, etc.
The Database System Environment
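DBMS Component Modules (Fig 2.3)
◦ The database and the DBMS are stored on disk, which is controlled by the operating system (OS).
◦ The stored data manager controls access to DBMS information that is stored on disk.
◦ The stored data manager moves data between disk and buffers in main memory.
◦ The DDL compiler processes schema definitions and stores the resulting descriptions (metadata) in the DBMS catalog.
◦ The run-time database processor handles database accesses at run time: it receives retrieval or update operations and carries them out on the database.
◦ The query compiler handles high-level queries: it parses and analyzes each query, then compiles or interprets it by generating database access code.
◦ The precompiler extracts DML commands from an application program written in a host programming language.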
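The precompiler step can be illustrated with a toy sketch. It assumes embedded statements are written as `EXEC SQL ... ;` (the marker commonly used for embedded SQL) and replaces each one with a call to a hypothetical `dbms_exec` stub. A real precompiler would generate calls into the DBMS's run-time library and would also handle host variables, string literals, and comments, which this regex does not.

```python
import re

# Hypothetical host program (C-like) with two embedded SQL statements.
HOST_PROGRAM = """
int main() {
    EXEC SQL SELECT name INTO :name FROM Student WHERE sid = :sid;
    printf("%s", name);
    EXEC SQL UPDATE Student SET gpa = :gpa WHERE sid = :sid;
}
"""

# Non-greedy match from "EXEC SQL" up to the first terminating semicolon.
EMBEDDED = re.compile(r"EXEC SQL\s+(.*?);", re.DOTALL)

def precompile(source):
    """Split a host program into (host code with call stubs, extracted DML)."""
    statements = []

    def replace(match):
        statements.append(match.group(1).strip())
        # Hypothetical stub standing in for a call to the DBMS run-time library.
        return f"dbms_exec({len(statements) - 1});"

    host_code = EMBEDDED.sub(replace, source)
    return host_code, statements

host_code, dml = precompile(HOST_PROGRAM)
for stmt in dml:
    print(stmt)
```

The extracted statements would go to the DML compiler, while the rewritten host code goes to the host-language compiler.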
Database System Utilities
Loading: loads existing data files into the database
Backup: creates a backup copy of the database
File reorganization: reorganizes database files for better performance
Performance monitoring: monitors database usage and provides statistics to the DBA
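As one concrete instance of a backup utility, Python's sqlite3 module exposes SQLite's online backup API as `Connection.backup` (Python 3.7+). The Student table below is hypothetical; the target is an in-memory database here, but a file path such as "backup.db" would also work.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Student (sid TEXT PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO Student VALUES (?, ?)",
                   [("53666", "Jones"), ("53688", "Smith")])
source.commit()  # finish the open transaction before copying

# Backup: copy the entire source database into the target database.
backup = sqlite3.connect(":memory:")
source.backup(backup)

rows = backup.execute("SELECT sid, name FROM Student ORDER BY sid").fetchall()
print(rows)  # [('53666', 'Jones'), ('53688', 'Smith')]
```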
Tools, Application Environments & Communications Facilities
CASE tools: used in the database design phase
Data (information) repository: stores catalog information, design decisions, usage information, application program descriptions, and user information
Application development environments: e.g., PowerBuilder. Help in developing database designs, GUIs, queries, updates, etc.
Communications software: allows users at remote locations to access the database
Classification of Database Management Systems
Data model:
◦ relational, object, object-relational, hierarchical, network, and others
Number of users supported by the system:
◦ single-user systems and multiuser systems
Number of sites over which the database is distributed:
◦ centralized DBMS, distributed DBMS (DDBMS), homogeneous DDBMSs, federated DBMS (software developed to access several autonomous preexisting databases stored under heterogeneous DBMSs)
Classification of Database Management Systems (continued)
Cost of the DBMS: most packages cost $10K-$100K; single-user systems cost $100-$3K
General-purpose vs. special-purpose: a special-purpose DBMS may be built for a specific application when performance is a primary consideration.
◦ Example: on-line transaction processing (OLTP) systems, which must support a large number of concurrent transactions without imposing excessive delays.
What is a DBMS?
Need for information management
A database is a very large, integrated collection of data.
It models a real-world enterprise:
◦ Entities (e.g., students, courses)
◦ Relationships (e.g., John is taking CS662)
A Database Management System (DBMS) is a software package designed to store and manage databases.
Why Use a DBMS?
Data independence and efficient access.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
Replication control
Reduced application development time.
Why Study Databases?
Shift from computation to information
◦ at the “low end”: access to physical world
◦ at the “high end”: scientific applications
Datasets increasing in diversity and volume.
◦ Digital libraries, interactive video, Human
Genome project, e-commerce, sensor networks
◦ ... need for DBMS/data services exploding
DBMS encompasses several areas of CS
◦ OS, languages, theory, AI, multimedia, logic
Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
◦ Main concept: relation, basically a table with rows and columns.
◦ Every relation has a schema, which describes the columns, or fields.
Levels of Abstraction
Many views, single conceptual
(logical) schema and physical
schema.
◦ Views describe how users see
the data.
◦ Conceptual schema defines
logical structure
◦ Physical schema describes the
files and indexes used.
* Schemas are defined using DDL; data is modified/queried using DML.
[Figure: View 1, View 2, and View 3 defined over the Conceptual Schema, which sits over the Physical Schema]
Example: University Database
Conceptual schema:
◦ Students(sid: string, name: string, login: string, age: integer, gpa: real)
◦ Courses(cid: string, cname: string, credits: integer)
◦ Enrolled(sid: string, cid: string, grade: string)
Physical schema:
◦ Relations stored as unordered files.
◦ Index on first column of Students.
External Schema (View):
◦ Course_info(cid:string, enrollment:integer)
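The three levels of this example can be written down in SQL; the sketch below runs them through sqlite3. Assumptions: SQLite type names stand in for the slide's domains, the index name is made up, and SQLite stores each table as a B-tree keyed by row id, so "unordered files" is only approximated. The sample Enrolled rows are the ones used in the Keys example of these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema.
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: an index on the first column of Students.
CREATE INDEX students_sid_idx ON Students (sid);

-- External schema (view): enrollment count per course.
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
])
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)
# [('Carnatic101', 1), ('History105', 1), ('Reggae203', 1), ('Topology112', 1)]
```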
Data Independence
Applications insulated from how data is
structured and stored.
Logical data independence: Protection from
changes in logical structure of data.
Physical data independence: Protection
from changes in physical structure of data.
* One of the most important benefits of using a DBMS!
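Both kinds of data independence can be observed in a small experiment: an "application" that reads only a view keeps returning the same answer after a physical change (adding an index) and after a logical change (adding a column). This is a sketch with sqlite3; the index name and the `semester` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
])
conn.execute("CREATE VIEW Course_info AS "
             "SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid")

def application():
    """A hypothetical application that only ever reads the external view."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()

before = application()

# Physical change: add an index (a new access path).
conn.execute("CREATE INDEX enrolled_cid_idx ON Enrolled (cid)")
after_physical = application()

# Logical change: add a new column to the conceptual schema.
conn.execute("ALTER TABLE Enrolled ADD COLUMN semester TEXT")
after_logical = application()

print(before == after_physical == after_logical)  # True
```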
Concurrency Control
Concurrent execution of user programs is essential for good DBMS performance.
◦ Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
Interleaving the actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
The DBMS ensures that such problems don't arise: users can pretend they are using a single-user system.
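A closely related interleaving problem, the lost update, is easy to reproduce in a toy model (this is a simulation, not DBMS code). Two hypothetical programs T1 and T2 each read a shared balance and later write back balance + deposit. Run serially, both deposits count; interleaved, one is lost.

```python
def run(schedule, deposits, initial=100):
    """Execute (program, step) pairs in order against one shared balance."""
    balance = initial
    local = {}  # the value each program read earlier
    for prog, step in schedule:
        if step == "read":
            local[prog] = balance
        else:  # "write": based on the program's earlier read
            balance = local[prog] + deposits[prog]
    return balance

deposits = {"T1": 50, "T2": 30}
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial, deposits))       # 180
print(run(interleaved, deposits))  # 130: T1's deposit is lost
```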
Transaction: An Execution Unit of a DB
Key concept is the transaction, which is an atomic sequence of database actions (reads/writes).
Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
◦ Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
◦ Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed). Why not?
◦ Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!
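The all-or-nothing behavior, combined with a simple integrity constraint that the DBMS enforces, can be seen in sqlite3. The Account table and its CHECK constraint are hypothetical. The connection's context manager commits on success and rolls back if an exception escapes, so a transfer that violates the constraint leaves no partial effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical integrity constraint: balances may never be negative.
conn.execute("CREATE TABLE Account (acct TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    # One transaction: commit if both updates succeed, roll back otherwise.
    with conn:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE acct = ?",
                     (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE acct = ?",
                     (amount, src))

try:
    transfer(conn, "B", "A", 50)  # B would go negative, so the CHECK fails
except sqlite3.IntegrityError:
    pass

final = conn.execute("SELECT acct, balance FROM Account ORDER BY acct").fetchall()
print(final)  # [('A', 100), ('B', 20)]: the credit to A was undone as well
```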
Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’.
◦ Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)
◦ Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.
◦ What if Tj already has a lock on Y and Ti later requests a lock on Y? What is it called? What will happen?
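The locking idea can be sketched as a tiny lock table. This is a simplified model: only exclusive locks, no wait queue, and no deadlock handling. `LockTable`, `request`, and `end` are hypothetical names. As in Strict 2PL, a transaction releases its locks only when it ends, which forces the conflicting transaction to run after it.

```python
class LockTable:
    """Exclusive locks only; all locks are released at transaction end."""

    def __init__(self):
        self.owner = {}  # object -> transaction holding its lock

    def request(self, xact, obj):
        """Grant the lock if free (or already held); otherwise the caller must wait."""
        holder = self.owner.get(obj)
        if holder is None or holder == xact:
            self.owner[obj] = xact
            return True
        return False

    def end(self, xact):
        """Commit/abort: release every lock the transaction holds at once."""
        for obj in [o for o, t in self.owner.items() if t == xact]:
            del self.owner[obj]

locks = LockTable()
print(locks.request("Ti", "X"))  # True: Ti writes X first
print(locks.request("Tj", "X"))  # False: Tj must wait for Ti
locks.end("Ti")                  # Ti completes and releases all its locks
print(locks.request("Tj", "X"))  # True: Tj now runs after Ti
```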
Ensuring Atomicity
DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of a Xact (transaction).
Idea: Keep a log (history) of all actions carried
out by the DBMS while executing a set of Xacts:
◦ Before a change is made to the database, the
corresponding log entry is forced to a safe location.
(WAL protocol.)
◦ After a crash, the effects of partially executed
transactions are undone using the log. (Thanks to
WAL, if log entry wasn’t saved before the crash,
corresponding change was not applied to database!)
The Log
The following actions are recorded in the log:
◦ Ti writes an object: the old value and the new value.
Log record must go to disk before the changed page!
◦ Ti commits/aborts: a log record indicating this action.
Log records chained together by Xact id, so it’s easy to
undo a specific Xact (e.g., to resolve a deadlock).
Log is often duplexed and archived on “stable” storage.
All log related activities (and in fact, all CC related
activities such as lock/unlock, dealing with deadlocks
etc.) are handled transparently by the DBMS.
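A toy version of the log described above shows the write-ahead rule and undo-based recovery together. Assumptions: a dict stands in for the database pages, a list stands in for the log on stable storage, and only undo is modeled (real recovery algorithms such as ARIES also redo committed work). Update records hold the old and new values; commit records mark finished transactions.

```python
def write(db, log, xact, obj, new):
    # WAL rule: append the log record (with the old value) BEFORE changing data.
    log.append(("update", xact, obj, db.get(obj), new))
    db[obj] = new

def commit(log, xact):
    log.append(("commit", xact))

def recover(db, log):
    """Undo, newest first, every update whose transaction never committed."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, obj, old, _ = rec
            if old is None:          # object did not exist before the update
                db.pop(obj, None)
            else:
                db[obj] = old

db, log = {"X": 1, "Y": 2}, []
write(db, log, "T1", "X", 10)
commit(log, "T1")
write(db, log, "T2", "Y", 20)  # T2 never commits: the "crash" happens here
recover(db, log)
print(db)  # {'X': 10, 'Y': 2}
```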
Databases make these folks happy ...
End users and DBMS vendors
DB application programmers
◦ e.g. webmasters
Database administrator (DBA)
◦ Designs logical/physical schemas
◦ Handles security and authorization
◦ Data availability, crash recovery
◦ Database tuning as needs evolve
Must understand how a DBMS works!
Structure of a DBMS
A typical DBMS has a
layered architecture.
The figure does not show
the concurrency control
and recovery
components.
This is one of several
possible architectures;
each system has its own
variations.
[Figure: layered DBMS architecture, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]
Summary
A DBMS is used to maintain and query large datasets.
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest and most mature areas in CS.
Data Models
A Database models some portion of the real world.
A data model is the link between the user's view of the world and the bits stored in the computer.
Many models have been proposed.
We will concentrate on the Relational Model.
[Figure: the schema Student (sid: string, name: string, login: string, age: integer, gpa: real) linking the user's view to bits (10101, 11101) stored in the computer]
Describing Data: Data Models
A data model is a collection of concepts for describing data.
A database schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
◦ Main concept: relation, basically a table with rows and columns.
◦ Every relation has a schema, which describes the columns, or fields.
Levels of Abstraction
Views describe how users see the data.
Conceptual schema defines logical structure.
Physical schema describes the files and indexes used.
(sometimes called the ANSI/SPARC model)
[Figure: Users see View 1, View 2, and View 3, defined over the Conceptual Schema, which sits over the Physical Schema and the DB]
Data Independence: The Big Breakthrough of the Relational Model
A simple idea: applications should be insulated from how data is structured and stored.
• Physical data independence: protection from changes in the physical structure of data.
• Logical data independence: protection from changes in the logical structure of data.
• Q: Why are these particularly important for DBMS?
[Figure: View 1, View 2, and View 3 defined over the Conceptual Schema, which sits over the Physical Schema and the DB]
Why Study the Relational Model?
Most widely used model currently.
◦ DB2, MySQL, Oracle, PostgreSQL, SQLServer, …
◦ Note: some “legacy systems” use older models, e.g., IBM's IMS
Object-oriented concepts have recently been merged in:
◦ the object-relational model
Informix, IBM DB2, Oracle 8i
Early work was done in the POSTGRES research project at Berkeley
XML (semi-structured) models emerging?
Relational Database: Definitions
Relational database: a set of relations.
Relation: made up of 2 parts:
◦ Schema : specifies name of relation, plus name and type of each column.
E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
◦ Instance : a table, with rows and columns.
#rows = cardinality
#fields = degree / arity
Can think of a relation as a set of rows or tuples.
◦ i.e., all rows are distinct
Example: University Database
Conceptual schema:
◦ Students(sid: string, name: string, login: string, age: integer, gpa: real)
◦ Courses(cid: string, cname: string, credits: integer)
◦ Enrolled(sid: string, cid: string, grade: string)
External Schema (View):
◦ Course_info(cid: string, enrollment: integer)
One possible physical schema:
◦ Relations stored as unordered files.
◦ Index on first column of Students.
[Figure: View 1, View 2, and View 3 defined over the Conceptual Schema, which sits over the Physical Schema and the DB]
Ex: An Instance of Students Relation
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
Cardinality = 3, Arity = 5
All rows must be unique (set semantics)
• Q: Do all values in each column of a relation instance have to be unique?
• Q: Is “Cardinality” a schema property?
• Q: Is “Arity” a schema property?
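Set semantics, cardinality, and arity can be made concrete by modeling the Students instance above as a Python set of tuples. This is only an illustration of the definitions, not how a DBMS stores a relation.

```python
# The Students instance above as a set of tuples (set semantics).
SCHEMA = ("sid", "name", "login", "age", "gpa")
students = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}

cardinality = len(students)  # number of rows in this instance
arity = len(SCHEMA)          # number of fields, fixed by the schema

# Adding an identical row leaves the set unchanged: all rows stay distinct.
students.add(("53666", "Jones", "jones@cs", 18, 3.4))
print(cardinality, arity, len(students))  # 3 5 3
```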
SQL - A language for Relational DBs
SQL (a.k.a. “Sequel”)
◦ “Intergalactic Standard for Data”
◦ Stands for Structured Query Language
Two sub-languages:
Data Definition Language (DDL)
◦ create, modify, delete relations
◦ specify constraints
◦ administer users, security, etc.
Data Manipulation Language (DML)
◦ specify queries to find tuples that satisfy criteria
◦ add, modify, remove tuples
SQL Overview
CREATE TABLE <name> ( <field> <domain>, … )
INSERT INTO <name> (<field names>) VALUES (<field values>)
DELETE FROM <name> WHERE <condition>
UPDATE <name> SET <field name> = <value> WHERE <condition>
SELECT <fields> FROM <name> WHERE <condition>
Creating Relations in SQL
Creates the Students relation.
◦ Note: the type (domain) of each field is
specified, and enforced by the DBMS
whenever tuples are added or modified.
CREATE TABLE Students (sid CHAR(20),
                       name CHAR(20),
                       login CHAR(10),
                       age INTEGER,
                       gpa FLOAT);
Table Creation (continued)
Another example: the Enrolled table holds
information about courses students take.
CREATE TABLE Enrolled (sid CHAR(20),
                       cid CHAR(20),
                       grade CHAR(2));
Adding and Deleting Tuples
Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2);
• Can delete all tuples satisfying some condition (e.g., name = Smith):
DELETE FROM Students S
WHERE S.name = 'Smith';
Powerful variants of these commands are available; more later!
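These statements can be tried in sqlite3. Two hedges: the DELETE is written without the table alias to keep it simple, and SQLite uses type affinity, so it does not enforce CHAR(n) lengths the way the slide describes for a DBMS in general. Values are passed as `?` parameters, which also avoids quoting problems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid CHAR(20), name CHAR(20), "
             "login CHAR(10), age INTEGER, gpa FLOAT)")

# Insert single tuples (parameters avoid hand-written quotes).
conn.executemany(
    "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
    [("53666", "Jones", "jones@cs", 18, 3.4),
     ("53688", "Smith", "smith@ee", 18, 3.2)],
)

# Delete all tuples satisfying a condition.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")

remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
print(remaining)  # [('53666', 'Jones')]
```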
Keys
Keys are a way to associate tuples in different
relations
Keys are one form of integrity constraint (IC)
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
Enrolled
sid cid grade
53666 Carnatic101 C
53666 Reggae203 B
53650 Topology112 A
53666 History105 B
[Figure: sid in Enrolled is a FOREIGN key referring to sid, the PRIMARY key of Students]
Primary Keys
A set of fields is a superkey if:
◦ no two distinct tuples can have the same values in all key fields
A set of fields is a candidate key for a relation if:
◦ it is a superkey, and
◦ no proper subset of the fields is a superkey
What if there is more than one key for a relation?
◦ One of the candidate keys is chosen (by the DBA) to be the primary key.
E.g.
◦ sid is a key for Students.
◦ What about name?
◦ The set {sid, gpa} is a superkey.
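The superkey and candidate-key definitions translate directly into checks, sketched below on the Students instance. Caveat: a check on one instance can only show that a field set is NOT a key; whether it is a key is a statement about all legal instances and must come from the schema designer. The helper names are hypothetical.

```python
from itertools import combinations

SCHEMA = ("sid", "name", "login", "age", "gpa")
STUDENTS = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]

def project(row, attrs):
    return tuple(row[SCHEMA.index(a)] for a in attrs)

def is_superkey(rows, attrs):
    """True if no two distinct rows agree on all of attrs (in THIS instance)."""
    return len({project(r, attrs) for r in rows}) == len(set(rows))

def is_candidate_key(rows, attrs):
    """A superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(is_superkey(rows, sub)
                   for k in range(1, len(attrs))
                   for sub in combinations(attrs, k))

print(is_superkey(STUDENTS, ("sid",)))             # True
print(is_superkey(STUDENTS, ("name",)))            # False: two Smiths
print(is_superkey(STUDENTS, ("sid", "gpa")))       # True
print(is_candidate_key(STUDENTS, ("sid", "gpa")))  # False: {sid} alone suffices
```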
Primary and Candidate Keys in SQL
Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
• Keys must be used carefully!
• “For a given student and course, there is a single grade.”
CREATE TABLE Enrolled (sid CHAR(20),
                       cid CHAR(20),
                       grade CHAR(2),
                       PRIMARY KEY (sid, cid));
vs.
• “Students can take only one course, and no two students in a course receive the same grade.”
CREATE TABLE Enrolled (sid CHAR(20),
                       cid CHAR(20),
                       grade CHAR(2),
                       PRIMARY KEY (sid),
                       UNIQUE (cid, grade));
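The practical difference between the two Enrolled definitions shows up as soon as one student enrolls in two courses. The sketch loads the same two rows (from the Enrolled instance in these slides) into each version using sqlite3; `make` and `try_insert` are hypothetical helpers.

```python
import sqlite3

def make(ddl):
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    return conn

first = make("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
             "PRIMARY KEY (sid, cid))")
second = make("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
              "PRIMARY KEY (sid), UNIQUE (cid, grade))")

rows = [("53666", "Carnatic101", "C"), ("53666", "Reggae203", "B")]

def try_insert(conn):
    try:
        with conn:  # all rows or none
            conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", rows)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

r1 = try_insert(first)
r2 = try_insert(second)
print(r1)  # accepted: a student may take several courses
print(r2)  # rejected: with sid as the key, each student gets one course
```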
Foreign Keys, Referential Integrity
Foreign key: set of fields in one relation that is used to “refer” to a tuple in another relation.
◦ Must correspond to the primary key of the other relation.
◦ Like a “logical pointer”.
If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references).
Foreign Keys in SQL
E.g., only students listed in the Students relation should be allowed to enroll for courses.
◦ sid is a foreign key referring to Students:
CREATE TABLE Enrolled (sid CHAR(20),
                       cid CHAR(20),
                       grade CHAR(2),
                       PRIMARY KEY (sid, cid),
                       FOREIGN KEY (sid) REFERENCES Students);
Enrolled
sid cid grade
53666 Carnatic101 C
53666 Reggae203 B
53650 Topology112 A
53666 History105 B
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
[Figure: an Enrolled tuple (11111, English102, A) would be a dangling reference, since no Students tuple has sid 11111]
Enforcing Referential Integrity
Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.
What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)
What should be done if a Students tuple is deleted?
◦ Also delete all Enrolled tuples that refer to it?
◦ Disallow deletion of a Students tuple that is referred to?
◦ Set sid in Enrolled tuples that refer to it to a default sid?
◦ (In SQL, also: set sid in Enrolled tuples that refer to it to a special value null, denoting “unknown” or “inapplicable”.)
Similar issues arise if the primary key of a Students tuple is updated.
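The first two policies can be observed in SQLite, which enforces foreign keys only after `PRAGMA foreign_keys = ON`. The sketch picks ON DELETE CASCADE ("also delete all Enrolled tuples that refer to it"); SQL also offers NO ACTION/RESTRICT, SET DEFAULT, and SET NULL for the other options. The trimmed Students columns are a simplification for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20));
CREATE TABLE Enrolled (
    sid CHAR(20), cid CHAR(20), grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE
);
INSERT INTO Students VALUES ('53666', 'Jones'), ('53650', 'Smith');
INSERT INTO Enrolled VALUES ('53666', 'History105', 'B'),
                            ('53650', 'Topology112', 'A');
""")

# Inserting an Enrolled tuple with a non-existent student id is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('11111', 'English102', 'A')")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"
print(outcome)  # rejected

# Deleting a student cascades to the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT sid, cid FROM Enrolled").fetchall()
print(remaining)  # [('53650', 'Topology112')]
```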
Integrity Constraints (ICs)
IC: condition that must be true for any instance of the database; e.g., domain constraints.
◦ ICs are specified when schema is defined.
◦ ICs are checked when relations are modified.
A legal instance of a relation is one that satisfies all specified ICs.
◦ DBMS should not allow illegal instances.
If the DBMS checks ICs, stored data is more faithful to real-world meaning.
◦ Avoids data entry errors, too!
Where do ICs Come From?
ICs are based upon the semantics of the real-world that is being described in the database relations.
We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.
◦ An IC is a statement about all possible instances!
◦ From the example, we know name is not a key, but the assertion that sid is a key is given to us.
Key and foreign key ICs are the most common; more general ICs supported too.
Relational Query Languages
A major strength of the relational model:
supports simple, powerful querying of data.
Queries can be written intuitively, and the
DBMS is responsible for efficient evaluation.
◦ The key: precise semantics for relational queries.
◦ Allows the optimizer to extensively re-order
operations, and still ensure that the answer does
not change.
The SQL Query Language
The most widely used relational query
language.
◦ The current standard is SQL-2003; SQL-92 is a basic subset that we focus on in this class.
To find all 18-year-old students, we can write:
SELECT *
FROM Students S
WHERE S.age = 18
• To find just names and logins, replace the first line:
SELECT S.name, S.login
Result of the first query:
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
Querying Multiple Relations
What does the following query compute?
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'
Given the following instance of Enrolled:
sid cid grade
53831 Carnatic101 C
53831 Reggae203 B
53650 Topology112 A
53666 History105 B
we get:
S.name E.cid
Smith Topology112
Semantics of a Query
A conceptual evaluation method for the previous query:
1. do FROM clause: compute cross-product of Students and Enrolled
2. do WHERE clause: Check conditions, discard tuples that fail
3. do SELECT clause: Delete unwanted fields
Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers.
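The three conceptual steps can be followed literally in Python on the Students and Enrolled instances used in this example. This mirrors the conceptual semantics only; as the slide says, a real DBMS evaluates the query far more efficiently.

```python
from itertools import product

# Students: (sid, name, login, age, gpa); Enrolled: (sid, cid, grade).
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]
enrolled = [
    ("53831", "Carnatic101", "C"),
    ("53831", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]

# 1. FROM: cross-product (positions 0-4 from Students, 5-7 from Enrolled).
cross = [s + e for s, e in product(students, enrolled)]
# 2. WHERE: keep tuples with S.sid = E.sid AND E.grade = 'A'.
kept = [t for t in cross if t[0] == t[5] and t[7] == "A"]
# 3. SELECT: keep only S.name and E.cid.
answer = [(t[1], t[6]) for t in kept]

print(len(cross), answer)  # 12 [('Smith', 'Topology112')]
```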
Cross-product of Students and Enrolled Instances
S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade
53666 Jones jones@cs 18 3.4 53831 Carnatic101 C
53666 Jones jones@cs 18 3.4 53831 Reggae203 B
53666 Jones jones@cs 18 3.4 53650 Topology112 A
53666 Jones jones@cs 18 3.4 53666 History105 B
53688 Smith smith@ee 18 3.2 53831 Carnatic101 C
53688 Smith smith@ee 18 3.2 53831 Reggae203 B
53688 Smith smith@ee 18 3.2 53650 Topology112 A
53688 Smith smith@ee 18 3.2 53666 History105 B
53650 Smith smith@math 19 3.8 53831 Carnatic101 C
53650 Smith smith@math 19 3.8 53831 Reggae203 B
53650 Smith smith@math 19 3.8 53650 Topology112 A
53650 Smith smith@math 19 3.8 53666 History105 B
Queries, Query Plans, and Operators
System handles query plan
generation & optimization; ensures
correct execution.
SELECT eid, ename, title
FROM Emp E
WHERE E.sal > 50000

SELECT E.loc, AVG(E.sal)
FROM Emp E
GROUP BY E.loc
HAVING COUNT(*) > 5

SELECT COUNT(DISTINCT E.eid)
FROM Emp E, Proj P, Asgn A
WHERE E.eid = A.eid
AND P.pid = A.pid
AND E.loc <> P.loc
• Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …
[Figure: tables Employees, Projects, and Assignments, with one query plan per query: Select over Emp; Group(agg) then Having over Emp; Joins of Emp, Asgn, and Proj followed by Count distinct]
Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
Each system has its own variations.
The book shows a somewhat more detailed version.
You will see the “real deal” in PostgreSQL.
◦ It’s a pretty full-featured example
Next class: we will start on this stack, bottom up.
[Figure: layered DBMS architecture, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]
Relational Model: Summary
A tabular representation of data.
Simple and intuitive, currently the most widely used
◦ Object-relational variant gaining ground
Integrity constraints can be specified by the DBA, based on application semantics. The DBMS checks for violations.
◦ Two important ICs: primary and foreign keys
◦ In addition, we always have domain constraints.
Powerful query languages exist.
◦ SQL is the standard commercial one: DDL (Data Definition Language) and DML (Data Manipulation Language)
Chapter 5 69
Storage
There are two general types of storage media used with computers:
◦ Primary storage - all storage media that can be operated on directly by the CPU (RAM, L1 and L2 cache memory)
◦ Secondary storage - hard drives, CDs, and tape
Memory Hierarchies & Storage Devices
The memory hierarchy is based upon speed of access. However, this speed comes with a price tag attached, which varies inversely with the access time of memory. As with cars, the faster the memory access, the more it costs.
Primary Storage Level of Memory
The Primary Storage Level of Memory is
generally made up of 3 Levels.
◦ L1 Cache which is located on the CPU
◦ L2 Cache which is located near the CPU
◦ Main Memory which is the RAM figure that is
often referred to in computer advertisements
Secondary Storage Level of Memory
The Secondary Storage Level of Memory
may be made up of 4 Levels.
◦ Flash Memory or EEPROM
◦ Hard Drives
◦ CD-ROMs
◦ Tape
Terms Used in the Hardware Description of Hard Drives
Capacity - the number of bytes it can store.
Single-sided vs. double-sided - states whether the disk/platter is written on one or both sides.
Disk pack - a collection of disks/platters that are assembled together into a pack.
Track - a circle of small width on a disk surface. A disk surface will have many tracks.
Terms Used in the Hardware Description of Hard Drives (continued)
Sector - a segment or arc of a track.
Block - an equal-sized portion of a track; the division of tracks into blocks is set by the operating system.
Interblock gaps - fixed-size segments that separate the blocks.
Read/write head - actually reads/writes the information on the disk.
Terms Used in the Hardware Description of Hard Drives (continued)
Cylinder - the tracks with the same diameter on all the disk surfaces of a disk pack.
Terms Used in Measuring Disk Operations
Seek time (s) - the time it takes to position the read/write head on the desired track. Its value will be given in every problem that needs it.
Rotational delay (rd) - the average amount of time it takes the desired block to rotate into position under the read/write head.
rd = (1/2) * (1/p) min, where p is the rotational speed of the disk in rpm
Terms Used in Measuring Disk Operations (continued)
Transfer rate (tr) - the rate at which information can be transferred to or from the disk: tr = (track size) / (1/p min), i.e., one track per revolution.
Block transfer time (btt) - the time it takes to transfer the data once the read/write head has been positioned: btt = B/tr msec, where B is the block size in bytes and tr is in bytes/msec.
Terms Used in Measuring Disk Operations (continued)
Bulk transfer rate (btr) - the rate at which multiple contiguous blocks can be read or written, where G is the interblock gap size in bytes:
btr = (B/(B+G)) * tr bytes/msec
Rewrite time (Trw) - the time it takes, after a block is read, to write that same block back to the disk; this equals the time for one revolution.
Computing Times
Given :
◦ Seek Time (s) = 10 msec
◦ Rotational speed = 3600 rpm
◦ Track size = 50 KB
◦ Block size (B) = 512 bytes
◦ Interblock Gap = 128 bytes
Problems for Disk Operations
Compute the average time it takes to
transfer 1 block on this system.
Compute the average time it takes to
transfer 20 non-contiguous blocks that
are located on the same track.
Compute the average time it takes to
transfer 20 contiguous blocks.
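A worked version of the three problems, as a sketch under stated assumptions: 1 KB = 1000 bytes for the track size; 20 non-contiguous blocks on the same track cost one seek plus a rotational delay and a block transfer per block; 20 contiguous blocks cost one seek and one rotational delay, then stream at the bulk transfer rate btr. Other conventions (e.g., 1 KB = 1024 bytes) shift the numbers slightly.

```python
# Parameters from the "Computing Times" slide. ASSUMPTION: 1 KB = 1000 bytes.
s = 10.0            # seek time, msec
rpm = 3600          # rotational speed p
track = 50 * 1000   # track size, bytes
B = 512             # block size, bytes
G = 128             # interblock gap, bytes

rev = 60_000 / rpm        # one revolution, msec (also the rewrite time Trw)
rd = rev / 2              # average rotational delay, msec
tr = track / rev          # transfer rate, bytes/msec
btt = B / tr              # block transfer time, msec
btr = B / (B + G) * tr    # bulk transfer rate, bytes/msec

one_block = s + rd + btt
# Non-contiguous blocks on one track: seek once, then wait for each block.
noncontiguous_20 = s + 20 * (rd + btt)
# Contiguous blocks: seek and rotate once, then stream at the bulk rate.
contiguous_20 = s + rd + 20 * B / btr

print(f"rd={rd:.3f} ms, tr={tr:.0f} B/ms, btt={btt:.4f} ms, btr={btr:.0f} B/ms")
print(f"1 block: {one_block:.3f} ms")
print(f"20 non-contiguous: {noncontiguous_20:.2f} ms")
print(f"20 contiguous: {contiguous_20:.2f} ms")
```

Under these assumptions the answers come to about 18.50 ms, 180.08 ms, and 22.60 ms. The contiguous case is far cheaper because the seek and the rotational delay are paid only once.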
Parallelizing Disk Access Using RAID
RAID stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks.
RAIDs are used to provide increased reliability, increased performance, or both.
RAID Levels
Level 0 - has no redundancy and the best
write performance but its read
performance is not as good as level 1.
Level 1 - uses mirrored disks which
provide redundancy and improved read
performance.
Level 2 - provides redundancy using
Hamming Codes
RAID Levels
Level 3 - uses a single parity disk.
Levels 4 and 5 - use block-level data striping; level 4 keeps parity on a single disk, while level 5 distributes data and parity information across all the disks.
Level 6 - uses the P + Q redundancy scheme, making use of Reed-Solomon codes to protect against the failure of up to 2 disks.
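The single-parity idea behind levels 3, 4, and 5 can be sketched with byte-wise XOR: the parity block is the XOR of the data blocks in a stripe, so any one lost block equals the XOR of all the survivors. The block contents are hypothetical, and level 6's Reed-Solomon coding is not shown.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks (hypothetical contents) plus their parity block.
data = [b"DATA-A01", b"DATA-B02", b"DATA-C03"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)  # b'DATA-B02'
```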
Records
A record is a collection of related values or items. Each value or item is stored in a field of a specific data type.
Records may be of either fixed or variable length.
Variable Length Records in Files
There are several reasons why records of the same record type may be of variable length:
◦ variable-length fields
◦ repeating fields
In addition, for efficiency reasons, records of different record types may be clustered in one file, so that the file's records vary in length.
Spanned Vs Unspanned Records
When the records of a file are stored on disk, they are placed in blocks of a fixed size, which will rarely match the record size. So when the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: store each record entirely in one block and leave the remaining space unused (unspanned), or let a record continue into a second block (spanned).
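The space trade-off can be quantified with the blocking factor bfr = floor(B/R), the number of whole records that fit in a block. The sketch compares blocks needed for a hypothetical file of 1000 records of 100 bytes in 512-byte blocks. For simplicity the spanned case ignores the pointer space a real implementation needs to link a record's pieces.

```python
import math

def unspanned(B, R, r):
    """Records never cross block boundaries; leftover space is wasted."""
    bfr = B // R                   # blocking factor: whole records per block
    wasted_per_block = B - bfr * R
    blocks = math.ceil(r / bfr)
    return bfr, wasted_per_block, blocks

def spanned(B, R, r):
    """Records may continue in the next block (pointer overhead ignored)."""
    return math.ceil(r * R / B)

# Hypothetical file: r = 1000 records of R = 100 bytes, B = 512-byte blocks.
print(unspanned(512, 100, 1000))  # (5, 12, 200)
print(spanned(512, 100, 1000))    # 196
```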
File Operations
A file may be stored either in contiguous blocks or by linking its blocks together. Each method has advantages and disadvantages.
Operations on files can be grouped into two types: retrieval and update. A retrieval involves only reading, while an update involves reading, modifying, and writing.
File Structure
Heap (Pile) Files
Hash (Direct) Files
Ordered (Sorted) Files
B - Trees
Once the data has been brought into memory, it can be accessed by an instruction in 0.00000004 seconds on a machine running at 25 MIPS. The disparity between the time for memory access and disk access is enormous: we can perform 625,000 instructions in the time it takes to read/write one disk page.
To put this in human terms: suppose you were typing a letter for your boss and found a word you could not make out, so you left him a voice mail message. Since you were told to do nothing else but this, you patiently wait for his reply, doing nothing! Unfortunately, he just went on vacation and does not get your message for 3 WEEKS.
This is similar to the computer waiting 0.025 seconds to get the needed data into memory from a disk read.
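The 625,000 figure follows from the two numbers in the paragraph: a 25 MIPS machine spends 1/(25 × 10^6) s = 0.00000004 s per instruction, and a disk page access is taken to be 0.025 s. A quick check:

```python
# Figures from the paragraph above: a 25 MIPS CPU and a 0.025 s disk access.
mips = 25
seconds_per_instruction = 1 / (mips * 1_000_000)  # 0.00000004 s
disk_access_seconds = 0.025

instructions_per_disk_access = round(disk_access_seconds / seconds_per_instruction)
print(instructions_per_disk_access)  # 625000
```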