
1.Database System Concepts and Architecture

Date post: 07-Mar-2015
Upload: debarshi-datta
Page 1: 1.Database System Concepts and Architecture

Database System Concepts and

Architecture

Page 2: 1.Database System Concepts and Architecture

Data Models

A collection of concepts that can be used to

describe the structure of a database (data

types, relationships, and constraints)

basic operations (retrieval and updates)

specify the dynamic aspect or behavior of a

database application (user-defined operations)

example: COMPUTE_GPA, which can be

applied to a STUDENT object

Jan 29, 2002

Page 3: 1.Database System Concepts and Architecture

Categories of Data Models

High-level or conceptual data models

(common users)

low-level or physical data models

(describe the details of how data is stored

)

in between, representational (or

implementation) data models can serve

both categories above

Page 4: 1.Database System Concepts and Architecture

Conceptual Data Model

Use concepts such as

◦ Entities:a real-world object or concept

(DEPT) (COURSE)

◦ Attributes:property of interest that further

describes an entity (dept no, name, telephone,

etc)

◦ Relationships:interaction among the entities

(DEPT) provides (COURSE)

Page 5: 1.Database System Concepts and Architecture

Physical Data Model

Describes how data is stored in the

computer.

It represents info such as

◦ record formats

◦ record orderings

◦ access path: make search more efficient

Page 6: 1.Database System Concepts and Architecture

Representational Data Model

Used in traditional commercial DBMSs

they include

◦ Relational Data model

◦ Network model

◦ Hierarchical model

Page 7: 1.Database System Concepts and Architecture

Schemas

Is the description of the database (not database

itself)

◦ Specified during database design

◦ Not expected to change frequently

◦ A displayed schema is called a schema diagram (Fig 2.1)

Each object in the schema-such as STUDENT or COURSE-is a schema construct.

Schema diagram represents only some aspects of a schema (name of record type, data element and some type of constraint)

Page 8: 1.Database System Concepts and Architecture

Page 9: 1.Database System Concepts and Architecture

Instances and Database State

The data in the database at a particular moment in time is

called a database state or snapshot or current set of

occurrences or instances in the database

When we define a new database, the database state is the

empty state (only the schema has been specified to the DBMS)

The initial state is the state when the database is first populated

Then, at any point in time, the database has a current state

schema evolution: when we need to change the schema

Page 10: 1.Database System Concepts and Architecture

The Three-Schema Architecture

Importance of using DB approach

◦ insulation of programs and data

◦ support of multiple user views

◦ use of a catalog to store the database description (schema).

The aim is to separate the user application and physical

DB

Schemas can be defined at three levels:

◦ The internal level has an internal schema

◦ describes the physical storage structure of the database.

◦ uses a physical data model

Page 11: 1.Database System Concepts and Architecture

Page 12: 1.Database System Concepts and Architecture

The Three-Schema Architecture

◦ The conceptual level has a conceptual schema describing the structure of the whole database for a community of users.

◦ It hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.

◦ A high-level data model or an implementation data model can be used at this level.

◦ The external or view level includes a number of external schemas or user views describing the part of the db that a particular user group is interested in and hides the rest of the db from that user group.

◦ A high-level data model or an implementation data model can be used at this level.

Page 13: 1.Database System Concepts and Architecture

Data Independence

Is the capacity to change the schema at one level of a

database system without having to change the schema at

the next higher level.

Logical data independence: capacity to change the

conceptual schema without having to change external

schemas or application programs.

Physical data independence: capacity to change the

internal schema without having to change the conceptual

(or external) schemas

Page 14: 1.Database System Concepts and Architecture

DBMS Languages

Data definition language (DDL): language to specify

conceptual and internal schemas for the database and any

mappings between the two.

Storage definition language (SDL): used to specify the internal schema when there is a clear

distinction between the conceptual and internal schemas.

View definition language (VDL): specifies user views and their

mappings to the conceptual schema.

Data manipulation language (DML): retrieval, insertion,

deletion, and modification of the data

Page 15: 1.Database System Concepts and Architecture

DBMS Languages …..

SQL relational database language: represents a

combination of DDL, VDL, and DML, as well as

statements for constraint specification and schema

evolution

There are two main types of DMLs:

◦ A high-level or nonprocedural DML: specifies complex DB

operations. Example: SQL (set-at-a-time)

◦ A low-level or procedural DML: retrieves individual records

or objects from the DB and processes each separately

(record-at-a-time).

Page 16: 1.Database System Concepts and Architecture

DBMS Interfaces

Menu-Based Interfaces for Browsing

◦ menus lead to the formulation of a request

Forms-Based Interfaces

◦ display a form for each user (insert, select)

◦ designed for naïve users.

Graphical User Interfaces (GUI)

◦ display schema as diagram.

◦ Utilize both menu and forms.

Page 17: 1.Database System Concepts and Architecture

DBMS Interfaces

Natural Language Interfaces

◦ Accept requests in the user's native language and attempt to understand them.

◦ Refer to words in the schema and to a set of standard words to interpret the request.

Interfaces for Parametric Users (e.g., tellers)

◦ The goal is to minimize the number of keystrokes required (use of function keys).

Interfaces for the DBA

◦ creating accounts, setting system privileges, changing the schema, etc.

Page 18: 1.Database System Concepts and Architecture

The Database System Environment

DBMS Component Modules (fig 2.3)

◦ The DB and the DBMS catalog are stored on disk; access to the disk is controlled by the OS.

◦ The stored data manager (SDM) controls access to DBMS information stored on disk.

◦ The SDM transfers data to and from buffers in main memory.

◦ The DDL compiler processes schema definitions and stores them in the catalog (metadata).

◦ The run-time database processor handles DB accesses at run time:

◦ it receives retrieval or update operations and carries them out on the DB.

◦ The query compiler handles high-level queries: it parses and analyzes a query, then compiles or interprets it into DB access code.

◦ The precompiler extracts DML commands from an application program.

Page 19: 1.Database System Concepts and Architecture

Page 20: 1.Database System Concepts and Architecture

Database System Utilities

Loading: load existing files into the DB

Backup: creates backup copy of the DB

File reorganization: reorganize files for

better performance

Performance monitoring: monitor DB

usage and provide statistics to DBA

Page 21: 1.Database System Concepts and Architecture

Tools, Application Environments &

Communications Facilities

CASE tools: used in the design phase

Data (information) repository: stores

catalog info, design decisions, usage, app

program descriptions, user information

Application development environments: e.g., PowerBuilder.

Help in the development of DB design, GUIs,

queries, updates, etc.

Communications software: allows remote users to

access the DB

Page 22: 1.Database System Concepts and Architecture

Classification of Database Management

Systems

Data model:

◦ relational, object, object-relational, hierarchical, network, and other.

Number of users supported by the system:

◦ single-user systems and multiuser systems

Number of sites over which the database is distributed:

◦ centralized DBMS, distributed DBMS (DDBMS), homogeneous DDBMS, federated DBMS (software developed to access several autonomous preexisting databases stored under heterogeneous DBMSs)

Page 23: 1.Database System Concepts and Architecture

Classification of Database Management

Systems (continued)

Cost of the DBMS: 10K-100K for most packages; 100-3K for single-user systems

General-purpose vs. special-purpose

(special-purpose when performance is a primary consideration)

◦ Example: online transaction processing

(OLTP) systems, which must support a large

number of concurrent transactions without

imposing excessive delays.

Page 24: 1.Database System Concepts and Architecture

Page 25: 1.Database System Concepts and Architecture

What is DBMS?

Need for information management

A very large, integrated collection of data.

Models real-world enterprise.

◦ Entities (e.g., students, courses)

◦ Relationships (e.g., John is taking CS662)

A Database Management System (DBMS) is a software package designed to store and manage databases.

Page 26: 1.Database System Concepts and Architecture

Why Use a DBMS?

Data independence and efficient access.

Data integrity and security.

Uniform data administration.

Concurrent access, recovery from crashes.

Replication control

Reduced application development time.

Page 27: 1.Database System Concepts and Architecture

Why Study Databases??

Shift from computation to information

◦ at the “low end”: access to physical world

◦ at the “high end”: scientific applications

Datasets increasing in diversity and volume.

◦ Digital libraries, interactive video, Human

Genome project, e-commerce, sensor networks

◦ ... need for DBMS/data services exploding

DBMS encompasses several areas of CS

◦ OS, languages, theory, AI, multimedia, logic


Page 28: 1.Database System Concepts and Architecture

Data Models

A data model is a collection of concepts for describing data.

A schema is a description of a particular collection of data, using a given data model.

The relational model of data is the most widely used model today.

◦ Main concept: relation, basically a table with rows and columns.

◦ Every relation has a schema, which describes the columns, or fields.

Page 29: 1.Database System Concepts and Architecture

Levels of Abstraction

Many views, single conceptual

(logical) schema and physical

schema.

◦ Views describe how users see

the data.

◦ Conceptual schema defines

logical structure

◦ Physical schema describes the

files and indexes used.

* Schemas are defined using DDL; data is modified/queried using DML.

[Figure: View 1, View 2, and View 3 on top of the Conceptual Schema, which sits on top of the Physical Schema]

Page 30: 1.Database System Concepts and Architecture

Example: University Database

Conceptual schema:

◦ Students(sid: string, name: string, login: string,

age: integer, gpa:real)

◦ Courses(cid: string, cname:string, credits:integer)

◦ Enrolled(sid:string, cid:string, grade:string)

Physical schema:

◦ Relations stored as unordered files.

◦ Index on first column of Students.

External Schema (View):

◦ Course_info(cid:string, enrollment:integer)
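
The three levels of this example can be tried out with Python's built-in sqlite3 module. This is only a sketch: the slide gives schemas but no data, so the sample rows, the index name students_sid_idx, and the definition of the Course_info view (enrollment counted per course) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual schema: the three relations from the slide.
cur.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")
cur.execute("CREATE TABLE Courses (cid TEXT, cname TEXT, credits INTEGER)")
cur.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")

# Physical schema: an index on the first column of Students (name is hypothetical).
cur.execute("CREATE INDEX students_sid_idx ON Students (sid)")

# External schema: a view over Enrolled (its definition is assumed, not from the slide).
cur.execute("""
    CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment
    FROM Enrolled
    GROUP BY cid
""")

# Hypothetical sample data.
cur.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("53666", "History105", "B"),
     ("53650", "History105", "A"),
     ("53650", "Topology112", "A")],
)

query = "SELECT cid, enrollment FROM Course_info ORDER BY cid"
course_info = cur.execute(query).fetchall()

# Changing the physical schema does not change what the view returns.
cur.execute("DROP INDEX students_sid_idx")
course_info_after_drop = cur.execute(query).fetchall()

print(course_info)
```

Dropping the index changes only the physical level; the view, and any application written against it, keeps returning the same rows.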

Page 31: 1.Database System Concepts and Architecture

Data Independence

Applications insulated from how data is

structured and stored.

Logical data independence: Protection from

changes in logical structure of data.

Physical data independence: Protection

from changes in physical structure of data.

* One of the most important benefits of using a DBMS!

Page 32: 1.Database System Concepts and Architecture

Concurrency Control

Concurrent execution of user programs is essential for good DBMS performance.

◦ Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.

Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed.

DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.

Page 33: 1.Database System Concepts and Architecture

Transaction: An Execution Unit of a DB

Key concept is transaction, which is an atomic sequence of database actions (reads/writes).

Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.

◦ Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.

◦ Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed). Why not?

◦ Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!

Page 34: 1.Database System Concepts and Architecture

Scheduling Concurrent Transactions

DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’.

◦ Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)

◦ Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.

◦ What if Tj already has a lock on Y and Ti later requests a lock on Y? What is it called? What will happen?

Page 35: 1.Database System Concepts and Architecture

Ensuring Atomicity

DBMS ensures atomicity (all-or-nothing property)

even if system crashes in the middle of a Xact.

Idea: Keep a log (history) of all actions carried

out by the DBMS while executing a set of Xacts:

◦ Before a change is made to the database, the

corresponding log entry is forced to a safe location.

(WAL protocol.)

◦ After a crash, the effects of partially executed

transactions are undone using the log. (Thanks to

WAL, if log entry wasn’t saved before the crash,

corresponding change was not applied to database!)
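
The logging idea above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the "database" is a dict, the log is a list that is assumed to survive the crash, and recovery only undoes uncommitted transactions (a real DBMS also has to redo committed work that never reached disk). The names write, commit, and recover are hypothetical.

```python
db = {"A": 100, "B": 50}   # stands in for the database pages
log = []                   # stands in for the log on stable storage

def write(xact, key, new_value):
    # WAL: the log record (old and new value) is appended BEFORE the change is applied.
    log.append(("update", xact, key, db[key], new_value))
    db[key] = new_value

def commit(xact):
    log.append(("commit", xact))

def recover():
    # Undo, newest first, every update made by a transaction that never committed.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old_value, _ = rec
            db[key] = old_value

write("T1", "A", 90)
write("T1", "B", 60)
commit("T1")
write("T2", "A", 0)        # the system "crashes" before T2 commits

recover()
print(db)
```

After recovery T1's changes are kept and T2's partial change is rolled back, which is the all-or-nothing behavior the slide describes.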

Page 36: 1.Database System Concepts and Architecture

The Log

The following actions are recorded in the log:

◦ Ti writes an object: the old value and the new value.

Log record must go to disk before the changed page!

◦ Ti commits/aborts: a log record indicating this action.

Log records chained together by Xact id, so it’s easy to

undo a specific Xact (e.g., to resolve a deadlock).

Log is often duplexed and archived on “stable” storage.

All log related activities (and in fact, all CC related

activities such as lock/unlock, dealing with deadlocks

etc.) are handled transparently by the DBMS.

Page 37: 1.Database System Concepts and Architecture

Databases make these folks happy ...

End users and DBMS vendors

DB application programmers

◦ e.g. webmasters

Database administrator (DBA)

◦ Designs logical /physical schemas

◦ Handles security and authorization

◦ Data availability, crash recovery

◦ Database tuning as needs evolve

Must understand how a DBMS works!

Page 38: 1.Database System Concepts and Architecture

Structure of a DBMS

A typical DBMS has a

layered architecture.

The figure does not show

the concurrency control

and recovery

components.

This is one of several

possible architectures;

each system has its own

variations.

[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Side note in the figure: these layers must consider concurrency control and recovery.]

Page 39: 1.Database System Concepts and Architecture

Summary

DBMS used to maintain, query large datasets.

Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.

Levels of abstraction give data independence.

A DBMS typically has a layered architecture.

DBAs hold responsible jobs and are well-paid!

DBMS R&D is one of the broadest, most mature areas in CS.

Page 40: 1.Database System Concepts and Architecture

Data Models

A Database models some portion of the real world.

Data Model is link between user’s view of the world and bits stored in computer.

Many models have been proposed.

We will concentrate on the Relational Model.

10101

11101

Student (sid: string, name: string, login:

string, age: integer, gpa:real)

Page 41: 1.Database System Concepts and Architecture

Describing Data: Data Models

A data model is a collection of concepts for describing data.

A database schema is a description of a particular collection of data, using a given data model.

The relational model of data is the most widely used model today.

◦ Main concept: relation, basically a table with rows and columns.

◦ Every relation has a schema, which describes the columns, or fields.

Page 42: 1.Database System Concepts and Architecture

Levels of Abstraction

Views describe how users see the data.

Conceptual schema defines logical structure

Physical schema describes the files and indexes used.

(sometimes called the ANSI/SPARC model)

[Figure: Users see View 1, View 2, and View 3, which are defined over the Conceptual Schema, which is mapped to the Physical Schema and the DB]

Page 43: 1.Database System Concepts and Architecture

Data Independence: The Big

Breakthrough of the Relational Model

A Simple Idea:

Applications should be insulated from how data is structured and stored.

[Figure: View 1, View 2, and View 3 on top of the Conceptual Schema, which sits on top of the Physical Schema and the DB]

• Q: Why are these particularly important for DBMS?

• Physical data independence: Protection from changes in physical structure of data.

• Logical data independence: Protection from changes in logical structure of data.

Page 44: 1.Database System Concepts and Architecture

Why Study the Relational Model?

Most widely used model currently.

◦ DB2, MySQL, Oracle, PostgreSQL, SQLServer, …

◦ Note: some “Legacy systems” use older models e.g., IBM’s IMS

Object-oriented concepts have recently merged in

◦ object-relational model

Informix, IBM DB2, Oracle 8i

Early work done in POSTGRES research project at Berkeley

XML (semi-structured) models emerging?

Page 45: 1.Database System Concepts and Architecture

Relational Database: Definitions

Relational database: a set of relations.

Relation: made up of 2 parts:

◦ Schema : specifies name of relation, plus name and type of each column.

E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)

◦ Instance : a table, with rows and columns.

#rows = cardinality

#fields = degree / arity

Can think of a relation as a set of rows or tuples.

◦ i.e., all rows are distinct

Page 46: 1.Database System Concepts and Architecture

Example: University Database

Conceptual schema:

◦ Students(sid: string, name: string, login:

string, age: integer, gpa:real)

◦ Courses(cid: string, cname:string, credits:integer)

◦ Enrolled(sid:string, cid:string, grade:string)

External Schema (View):

◦ Course_info(cid:string,enrollment:integer)

One possible Physical schema :

◦ Relations stored as unordered files.

◦ Index on first column of Students.

[Figure: View 1, View 2, and View 3 on top of the Conceptual Schema, which sits on top of the Physical Schema and the DB]

Page 47: 1.Database System Concepts and Architecture

Ex: An Instance of Students Relation

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

Cardinality = 3, Arity = 5

All rows must be unique (set semantics)

• Q: Do all values in each column of a relation instance

have to be Unique?

• Q: Is “Cardinality” a schema property?

• Q: Is “Arity” a schema property?
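
The instance above can be modeled in Python as a set of tuples, which makes the set semantics explicit. This is an illustrative sketch only; the variable names are hypothetical. Note that the arity is computed from the schema alone, while the cardinality needs the instance.

```python
# Schema: the column names of Students.
schema = ("sid", "name", "login", "age", "gpa")

# Instance: a set of rows, so duplicate rows cannot exist.
instance = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}

arity = len(schema)          # number of fields, fixed by the schema
cardinality = len(instance)  # number of rows, changes as rows come and go

# Values within a single column may repeat even though whole rows may not.
names = [row[schema.index("name")] for row in instance]
name_values_unique = len(names) == len(set(names))

print(cardinality, arity, name_values_unique)
```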

Page 48: 1.Database System Concepts and Architecture

SQL - A language for Relational DBs

SQL (a.k.a. “Sequel”)

◦ “Intergalactic Standard for Data”

◦ Stands for Structured Query Language

Two sub-languages:

Data Definition Language (DDL)

◦ create, modify, delete relations

◦ specify constraints

◦ administer users, security, etc.

Data Manipulation Language (DML)

◦ Specify queries to find tuples that satisfy criteria

◦ add, modify, remove tuples

Page 49: 1.Database System Concepts and Architecture

SQL Overview

CREATE TABLE <name> ( <field> <domain>, … )

INSERT INTO <name> (<field names>) VALUES (<field values>)

DELETE FROM <name> WHERE <condition>

UPDATE <name> SET <field name> = <value> WHERE <condition>

SELECT <fields> FROM <name> WHERE <condition>
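
The five statement forms above can be exercised end to end with Python's built-in sqlite3 module. This is a minimal sketch: the table follows the Students schema used elsewhere in these slides, while the second row and the updated gpa value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE <name> ( <field> <domain>, ... )
cur.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa FLOAT)"
)

# INSERT INTO <name> (<field names>) VALUES (<field values>)
cur.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
cur.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)"
)

# UPDATE <name> SET <field name> = <value> WHERE <condition>  (new gpa is hypothetical)
cur.execute("UPDATE Students SET gpa = 3.5 WHERE sid = '53666'")

# DELETE FROM <name> WHERE <condition>
cur.execute("DELETE FROM Students WHERE name = 'Smith'")

# SELECT <fields> FROM <name> WHERE <condition>
rows = cur.execute("SELECT sid, name, gpa FROM Students WHERE age = 18").fetchall()
print(rows)
```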

Page 50: 1.Database System Concepts and Architecture

Creating Relations in SQL

Creates the Students relation.

◦ Note: the type (domain) of each field is

specified, and enforced by the DBMS

whenever tuples are added or modified.

CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)

Page 51: 1.Database System Concepts and Architecture

Table Creation (continued)

Another example: the Enrolled table holds

information about courses students take.

CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2))

Page 52: 1.Database System Concepts and Architecture

Adding and Deleting Tuples

Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa) VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = Smith):

DELETE FROM Students S WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!

Page 53: 1.Database System Concepts and Architecture

Keys

Keys are a way to associate tuples in different

relations

Keys are one form of integrity constraint (IC)

Students (sid is the PRIMARY key)

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

Enrolled (sid is a FOREIGN key referring to Students)

sid cid grade

53666 Carnatic101 C

53666 Reggae203 B

53650 Topology112 A

53666 History105 B

Page 54: 1.Database System Concepts and Architecture

Primary Keys

A set of fields is a superkey if:

◦ No two distinct tuples can have same values in all key fields

A set of fields is a candidate key for a relation if :

◦ It is a superkey

◦ No subset of the fields is a superkey

what if >1 key for a relation?

◦ one of the candidate keys is chosen (by DBA) to be the primary key.

E.g.

◦ sid is a key for Students.

◦ What about name?

◦ The set {sid, gpa} is a superkey.
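
The superkey condition can be checked against a given instance with a short Python function. This is a sketch with a hypothetical helper name (violates_key), and it comes with an important caveat: an instance can show that a set of fields is not a key, but a clean result never proves that it is one, because a key is a statement about all possible instances.

```python
def violates_key(rows, fields):
    """Return True if two distinct rows agree on every field in `fields`."""
    seen = set()
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            return True
        seen.add(key)
    return False

# The Students instance used in these slides.
students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
]

sid_violated = violates_key(students, ["sid"])             # no duplicate sid here
name_violated = violates_key(students, ["name"])           # two rows share "Smith"
sid_gpa_violated = violates_key(students, ["sid", "gpa"])  # a superset of sid

print(sid_violated, name_violated, sid_gpa_violated)
```

The instance rules out name as a key. {sid, gpa} passes the check, but it is only a superkey and not a candidate key, since its subset {sid} is already a key.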

Page 55: 1.Database System Concepts and Architecture

Primary and Candidate Keys in SQL

Possibly many candidate keys (specified using

UNIQUE), one of which is chosen as the primary key.

• Keys must be used carefully!

• “For a given student and course, there is a single grade.”

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid))

vs.

• “Students can take only one course, and no two students in a course receive the same grade.”

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))
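
The difference between the two declarations shows up as soon as rows are inserted. The sketch below uses Python's built-in sqlite3 module; the second table is named Enrolled2 only so that both variants can coexist in one database, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Variant 1: one grade per (student, course) pair.
conn.execute(
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid))"
)
# Variant 2: one course per student, and no repeated grade within a course.
conn.execute(
    "CREATE TABLE Enrolled2 (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid), UNIQUE (cid, grade))"
)

def try_insert(table, row):
    """Insert a row; return False if the DBMS rejects it for violating a constraint."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("Enrolled", ("53666", "History105", "B")),   # accepted
    try_insert("Enrolled", ("53666", "Reggae203", "B")),    # accepted: a second course is fine
    try_insert("Enrolled", ("53666", "History105", "A")),   # rejected: same (sid, cid)
    try_insert("Enrolled2", ("53666", "History105", "B")),  # accepted
    try_insert("Enrolled2", ("53666", "Reggae203", "B")),   # rejected: sid already present
    try_insert("Enrolled2", ("53650", "History105", "B")),  # rejected: same (cid, grade)
]
print(results)
```

The second variant rejects perfectly reasonable data, which is why keys must be chosen carefully.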

Page 56: 1.Database System Concepts and Architecture

Foreign Keys, Referential Integrity

Foreign key : Set of fields in one relation that is used to

`refer’ to a tuple in another relation.

◦ Must correspond to the primary key of the other

relation.

◦ Like a `logical pointer’.

If all foreign key constraints are enforced, referential

integrity is achieved (i.e., no dangling references.)

Page 57: 1.Database System Concepts and Architecture

Foreign Keys in SQL

E.g. Only students listed in the Students relation should be allowed to enroll for courses.

◦ sid is a foreign key referring to Students:

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students)

Enrolled

sid cid grade

53666 Carnatic101 C

53666 Reggae203 B

53650 Topology112 A

53666 History105 B

Students

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

11111 English102 A (an Enrolled tuple whose sid, 11111, does not appear in Students)

Page 58: 1.Database System Concepts and Architecture

Enforcing Referential Integrity

Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.

What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)

What should be done if a Students tuple is deleted?

◦ Also delete all Enrolled tuples that refer to it?

◦ Disallow deletion of a Students tuple that is referred to?

◦ Set sid in Enrolled tuples that refer to it to a default sid?

◦ (In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting `unknown’ or `inapplicable’.)

Similar issues arise if primary key of Students tuple is updated.
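
Two of the options above, rejecting a dangling insert and cascading a delete, can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, the Students table is cut down to two columns, and ON DELETE CASCADE is just one of the policies listed on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute("""
    CREATE TABLE Enrolled (
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students (sid) ON DELETE CASCADE
    )
""")

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'History105', 'B')")

# An Enrolled tuple with a non-existent student id is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('11111', 'English102', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the student also deletes the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]

print(rejected, remaining)
```

Replacing ON DELETE CASCADE with ON DELETE SET NULL or ON DELETE SET DEFAULT selects the other policies; with no ON DELETE clause SQLite refuses to delete a Students row that is still referred to.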

Page 59: 1.Database System Concepts and Architecture

Integrity Constraints (ICs)

IC: condition that must be true for any instance of the database; e.g., domain constraints.

◦ ICs are specified when schema is defined.

◦ ICs are checked when relations are modified.

A legal instance of a relation is one that satisfies all specified ICs.

◦ DBMS should not allow illegal instances.

If the DBMS checks ICs, stored data is more faithful to real-world meaning.

◦ Avoids data entry errors, too!

Page 60: 1.Database System Concepts and Architecture

Where do ICs Come From?

ICs are based upon the semantics of the real-world that is being described in the database relations.

We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.

◦ An IC is a statement about all possible instances!

◦ From example, we know name is not a key, but the assertion that sid is a key is given to us.

Key and foreign key ICs are the most common; more general ICs supported too.

Page 61: 1.Database System Concepts and Architecture

Relational Query Languages

A major strength of the relational model:

supports simple, powerful querying of data.

Queries can be written intuitively, and the

DBMS is responsible for efficient evaluation.

◦ The key: precise semantics for relational queries.

◦ Allows the optimizer to extensively re-order

operations, and still ensure that the answer does

not change.

Page 62: 1.Database System Concepts and Architecture

The SQL Query Language

The most widely used relational query

language.

◦ Current std is SQL-2003; SQL92 is a basic

subset that we focus on in this class.

To find all 18 year old students, we can

write:

SELECT *

FROM Students S

WHERE S.age=18

• To find just names and logins, replace the first line:

SELECT S.name, S.login

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

Page 63: 1.Database System Concepts and Architecture

Querying Multiple Relations

What does the following query compute?

SELECT S.name, E.cid

FROM Students S, Enrolled E

WHERE S.sid=E.sid AND E.grade='A'

Given the following instance of Enrolled

sid cid grade

53831 Carnatic101 C

53831 Reggae203 B

53650 Topology112 A

53666 History105 B

we get:

S.name E.cid

Smith Topology112

Page 64: 1.Database System Concepts and Architecture

Semantics of a Query

A conceptual evaluation method for the previous query:

1. do FROM clause: compute cross-product of Students and Enrolled

2. do WHERE clause: Check conditions, discard tuples that fail

3. do SELECT clause: Delete unwanted fields

Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers.
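
The three conceptual steps can be written out literally in Python. This sketch uses the Students and Enrolled instances from the surrounding slides and deliberately follows the naive order (cross-product, then filter, then project); it is not how a real DBMS would evaluate the query.

```python
from itertools import product

# Students(sid, name, login, age, gpa)
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]
# Enrolled(sid, cid, grade)
enrolled = [
    ("53831", "Carnatic101", "C"),
    ("53831", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]

# 1. FROM clause: cross-product of Students and Enrolled.
cross = list(product(students, enrolled))

# 2. WHERE clause: keep pairs with S.sid = E.sid AND E.grade = 'A'.
kept = [(s, e) for s, e in cross if s[0] == e[0] and e[2] == "A"]

# 3. SELECT clause: keep only S.name and E.cid.
answer = [(s[1], e[1]) for s, e in kept]

print(len(cross), answer)
```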

Page 65: 1.Database System Concepts and Architecture

Cross-product of Students and Enrolled Instances

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade

53666 Jones jones@cs 18 3.4 53831 Carnatic101 C

53666 Jones jones@cs 18 3.4 53831 Reggae203 B

53666 Jones jones@cs 18 3.4 53650 Topology112 A

53666 Jones jones@cs 18 3.4 53666 History105 B

53688 Smith smith@ee 18 3.2 53831 Carnatic101 C

53688 Smith smith@ee 18 3.2 53831 Reggae203 B

53688 Smith smith@ee 18 3.2 53650 Topology112 A

53688 Smith smith@ee 18 3.2 53666 History105 B

53650 Smith smith@math 19 3.8 53831 Carnatic101 C

53650 Smith smith@math 19 3.8 53831 Reggae203 B

53650 Smith smith@math 19 3.8 53650 Topology112 A

53650 Smith smith@math 19 3.8 53666 History105 B

Page 66: 1.Database System Concepts and Architecture

Queries, Query Plans, and Operators

System handles query plan

generation & optimization; ensures

correct execution.

SELECT eid, ename, title

FROM Emp E

WHERE E.sal > $50K

SELECT E.loc, AVG(E.sal)

FROM Emp E

GROUP BY E.loc

HAVING Count(*) > 5

SELECT COUNT(DISTINCT E.eid)

FROM Emp E, Proj P, Asgn A

WHERE E.eid = A.eid

AND P.pid = A.pid

AND E.loc <> P.loc

• Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …

[Figure: query plan diagrams over the Employees (Emp), Projects (Proj), and Assignments (Asgn) tables: Select on Emp; Group (agg) and Having on Emp; Count distinct over Joins of Emp, Asgn, and Proj]

Page 67: 1.Database System Concepts and Architecture

Structure of a DBMS

A typical DBMS has a layered architecture.

The figure does not show the concurrency control and recovery components.

Each system has its own variations.

The book shows a somewhat more detailed version.

You will see the “real deal” in PostgreSQL.

◦ It’s a pretty full-featured example

Next class: we will start on this stack, bottom up.

[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Side note in the figure: these layers must consider concurrency control and recovery.]

Page 68: 1.Database System Concepts and Architecture

Relational Model: Summary

A tabular representation of data.

Simple and intuitive, currently the most widely used

◦ Object-relational variant gaining ground

Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.

◦ Two important ICs: primary and foreign keys

◦ In addition, we always have domain constraints.

Powerful query languages exist.

◦ SQL is the standard commercial one

DDL - Data Definition Language

DML - Data Manipulation Language

Page 69: 1.Database System Concepts and Architecture

Chapter 5 69

Storage

There are two general types of storage

media that are used with computers. They

are:

◦ Primary Storage - This includes all storage

media that can be operated on directly by the

CPU (RAM , L1 and L2 Cache Memory)

◦ Secondary Storage - This includes hard

drives, CDs and tape.

Page 70: 1.Database System Concepts and Architecture

Memory Hierarchies & Storage

Devices

The memory hierarchy is based upon speed of access. However, this speed comes with a price tag that varies inversely with the access time of the memory: as with cars, the faster the memory access, the more it costs.

Page 71: 1.Database System Concepts and Architecture

Primary Storage Level of Memory

The Primary Storage Level of Memory is

generally made up of 3 Levels.

◦ L1 Cache which is located on the CPU

◦ L2 Cache which is located near the CPU

◦ Main Memory which is the RAM figure that is

often referred to in computer advertisements

Page 72: 1.Database System Concepts and Architecture

Secondary Storage Level of Memory

The Secondary Storage Level of Memory

may be made up of 4 Levels.

◦ Flash Memory or EEPROM

◦ Hard Drives

◦ CD-ROMs

◦ Tape

Page 73: 1.Database System Concepts and Architecture

Terms Used in the Hardware

Description of Hard Drives

Capacity - The number of bytes it can

store.

Single-sided vs. Double-sided - States if

the disk/platter is written on one or both

sides.

Disk Pack - A collection of disks/platters

that are assembled together into a pack.

Track - A Circle of a small width on a disk.

A disk surface will have many tracks.

Page 74: 1.Database System Concepts and Architecture

Terms Used in the Hardware

Description of Hard Drives

Sector - A segment or arc of a track.

Block - The division of a track into equal-sized portions by the operating system.

Interblock Gaps - Fixed-size segments that separate the blocks.

Read/Write Head - Actually reads/writes the information to the disk.

Page 75: 1.Database System Concepts and Architecture

Terms Used in the Hardware Description of Hard Drives

Cylinder - The set of tracks with the same diameter on the different disk surfaces of a disk pack.
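To see how these hardware terms fit together, here is a minimal Python sketch that derives capacity from the disk geometry. The platter count, track count, blocks per track, and block size are hypothetical values chosen for illustration; they are not figures from these slides, and interblock gaps are ignored.

```python
# Hypothetical disk pack geometry (illustrative values only).
PLATTERS = 5               # disks in the pack
SIDES_PER_PLATTER = 2      # double-sided platters
TRACKS_PER_SURFACE = 400   # equals the number of cylinders
BLOCKS_PER_TRACK = 20
BLOCK_SIZE = 512           # bytes of data per block (gaps ignored)

surfaces = PLATTERS * SIDES_PER_PLATTER
track_capacity = BLOCKS_PER_TRACK * BLOCK_SIZE
# A cylinder holds one track from every surface in the pack.
cylinder_capacity = surfaces * track_capacity
pack_capacity = TRACKS_PER_SURFACE * cylinder_capacity

print(f"surfaces:          {surfaces}")
print(f"track capacity:    {track_capacity} bytes")
print(f"cylinder capacity: {cylinder_capacity} bytes")
print(f"pack capacity:     {pack_capacity} bytes")
```

Data on one cylinder can be read without moving the read/write heads, which is why related blocks are often placed on the same cylinder.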

Page 76: 1.Database System Concepts and Architecture

Terms Used in Measuring Disk Operations

Seek Time (s) - The time it takes to position the read/write head on the desired track. It is given in every problem that needs it.

Rotational Delay (rd) - The average amount of time it takes the desired block to rotate into position under the read/write head.

rd = (1/2)*(1/p) min, where p is the rotational speed of the disk in rpm

Page 77: 1.Database System Concepts and Architecture

Terms Used in Measuring Disk Operations

Transfer Rate (tr) - The rate at which information can be transferred to or from the disk. tr = (track size)/(1/p) bytes/min, where 1/p min is the time for one revolution.

Block Transfer Time (btt) - The time it takes to transfer the data once the read/write head has been positioned. btt = B/tr msec, where B is the block size in bytes and tr is expressed in bytes/msec.

Page 78: 1.Database System Concepts and Architecture

Terms Used in Measuring Disk Operations

Bulk Transfer Rate (btr) - The rate at which multiple contiguous blocks can be read or written, where G is the interblock gap size in bytes.

btr = (B/(B+G)) * tr bytes/msec

Rewrite Time (Trw) - The time it takes, after a block is read, to write that same block back to the disk, which is the time for one revolution.
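The measurement terms can be written as small Python functions, with every time converted to msec so the units stay consistent. This is a sketch: the function names and the sample drive parameters (7200 rpm, a 100,000-byte track, 4096-byte blocks, 512-byte gaps) are hypothetical and chosen only for illustration.

```python
def revolution_ms(rpm):
    """Time for one full revolution, in msec (1/p min converted to msec)."""
    return 60_000 / rpm

def rotational_delay_ms(rpm):
    """rd: on average the disk turns half a revolution before the block arrives."""
    return revolution_ms(rpm) / 2

def transfer_rate(track_bytes, rpm):
    """tr in bytes/msec: one whole track passes under the head per revolution."""
    return track_bytes / revolution_ms(rpm)

def block_transfer_time_ms(block_bytes, tr):
    """btt: time to move one block at transfer rate tr (bytes/msec)."""
    return block_bytes / tr

def bulk_transfer_rate(block_bytes, gap_bytes, tr):
    """btr in bytes/msec: useful data rate once interblock gaps are counted."""
    return block_bytes / (block_bytes + gap_bytes) * tr

# Hypothetical drive parameters, for illustration only.
rpm, track_bytes, block_bytes, gap_bytes = 7200, 100_000, 4096, 512

tr = transfer_rate(track_bytes, rpm)
print(f"rd  = {rotational_delay_ms(rpm):.3f} msec")
print(f"tr  = {tr:.1f} bytes/msec")
print(f"btt = {block_transfer_time_ms(block_bytes, tr):.3f} msec")
print(f"btr = {bulk_transfer_rate(block_bytes, gap_bytes, tr):.1f} bytes/msec")
print(f"Trw = {revolution_ms(rpm):.3f} msec")
```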

Page 79: 1.Database System Concepts and Architecture

Computing Times

Given :

◦ Seek Time (s) = 10 msec

◦ Rotational speed = 3600 rpm

◦ Track size = 50 KB

◦ Block size (B) = 512 bytes

◦ Interblock Gap = 128 bytes

Page 80: 1.Database System Concepts and Architecture

Problems for Disk Operations

Compute the average time it takes to

transfer 1 block on this system.

Compute the average time it takes to

transfer 20 non-contiguous blocks that

are located on the same track.

Compute the average time it takes to

transfer 20 contiguous blocks.
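One way to work these problems is to combine the terms defined earlier. A single block costs s + rd + btt. Each non-contiguous block on the same track pays its own rotational delay but shares one seek. Contiguous blocks pay one seek and one rotational delay, then stream at the bulk transfer rate. The slides do not spell out a solution, so the sketch below follows this common textbook approach. It also assumes that 1 KB = 1024 bytes.

```python
# Values from the "Computing Times" slide.
SEEK_MS = 10.0
RPM = 3600
TRACK_BYTES = 50 * 1024   # assumption: 1 KB = 1024 bytes
BLOCK_BYTES = 512
GAP_BYTES = 128
K = 20                    # number of blocks in problems 2 and 3

rev_ms = 60_000 / RPM                                # one revolution, in msec
rd = rev_ms / 2                                      # rotational delay
tr = TRACK_BYTES / rev_ms                            # bytes/msec
btt = BLOCK_BYTES / tr                               # msec per block
btr = BLOCK_BYTES / (BLOCK_BYTES + GAP_BYTES) * tr   # bytes/msec

one_block = SEEK_MS + rd + btt
noncontiguous = SEEK_MS + K * (rd + btt)
contiguous = SEEK_MS + rd + K * BLOCK_BYTES / btr

print(f"1 block:                   {one_block:.1f} msec")      # 18.5
print(f"20 non-contiguous blocks:  {noncontiguous:.1f} msec")  # 180.0
print(f"20 contiguous blocks:      {contiguous:.1f} msec")     # 22.5
```

Under these assumptions the contiguous read is about eight times faster than the scattered one, because it pays the rotational delay once instead of twenty times.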

Page 81: 1.Database System Concepts and Architecture

Parallelizing Disk Access Using RAID

RAID - Stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks.

RAIDs are used to provide increased reliability, increased performance, or both.

Page 82: 1.Database System Concepts and Architecture

RAID Levels

Level 0 - has no redundancy and the best

write performance but its read

performance is not as good as level 1.

Level 1 - uses mirrored disks which

provide redundancy and improved read

performance.

Level 2 - provides redundancy using

Hamming Codes

Page 83: 1.Database System Concepts and Architecture

RAID Levels

Level 3 - uses a single parity disk.

Levels 4 and 5 - use block-level data striping, with level 5 distributing both data and parity information across all the disks.

Level 6 - uses the P + Q redundancy scheme, making use of Reed-Solomon codes to protect against the failure of 2 disks.
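The parity used by levels 3 through 5 is the bytewise XOR of the data blocks, which is what allows any single lost block to be rebuilt from the survivors. The Python sketch below shows the idea with three tiny in-memory "disks". The block contents and the helper name xor_blocks are made up for illustration, and no real RAID controller works on 4-byte blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per (hypothetical) data disk.
data = [b"DATA", b"BASE", b"DISK"]

# The parity block is stored on a separate disk (or rotated, as in level 5).
parity = xor_blocks(data)

# Simulate losing disk 1, then rebuild its block from the others plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'BASE'
```

XOR parity tolerates only one failed disk per stripe, which is why level 6 adds a second, independent code.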

Page 84: 1.Database System Concepts and Architecture

Records

A record is a collection of related values or items. Each value or item is stored in a field of a specific data type.

Records may be of either fixed or variable length.
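As an illustration of a fixed-length record, the following Python sketch packs three typed fields into exactly 32 bytes using the standard struct module. The STUDENT layout (a 4-byte id, a 20-byte name, and an 8-byte GPA) is a hypothetical example, not a layout given in the slides.

```python
import struct

# Hypothetical fixed-length layout: int id, 20-byte name, double gpa.
# "<" selects little-endian with no padding, so every record is 4 + 20 + 8 bytes.
STUDENT = struct.Struct("<i20sd")

def pack_student(student_id, name, gpa):
    """Encode one record; short names are padded with NUL bytes by struct."""
    return STUDENT.pack(student_id, name.encode("ascii"), gpa)

def unpack_student(raw):
    """Decode one record and strip the NUL padding from the name field."""
    student_id, name, gpa = STUDENT.unpack(raw)
    return student_id, name.rstrip(b"\x00").decode("ascii"), gpa

raw = pack_student(7, "Ada", 3.5)
print(STUDENT.size, len(raw))   # 32 32
print(unpack_student(raw))      # (7, 'Ada', 3.5)
```

Because every record has the same size, record i starts at byte offset i * 32, so it can be located without scanning.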

Page 85: 1.Database System Concepts and Architecture

Variable Length Records in Files

There are several reasons why records of the same record type may vary in length:

◦ Variable length fields

◦ Repeating fields

For efficiency reasons, records of different record types may be clustered in a file.
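One common way to store variable-length and repeating fields is to prefix each field with its length. The Python sketch below assumes a 2-byte length prefix and UTF-8 text fields; both choices, and the sample field values, are illustrative assumptions rather than a format described in the slides.

```python
import struct

def encode_record(fields):
    """Store each field as a 2-byte length prefix followed by its UTF-8 bytes."""
    out = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        out += struct.pack("<H", len(data)) + data
    return bytes(out)

def decode_record(raw):
    """Walk the buffer, reading one length prefix and one field at a time."""
    fields, pos = [], 0
    while pos < len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        fields.append(raw[pos:pos + length].decode("utf-8"))
        pos += length
    return fields

# Same record type, different lengths: the second student repeats the major field.
one_major = encode_record(["Smith", "CS"])
two_majors = encode_record(["Ng", "CS", "Math"])

print(len(one_major), len(two_majors))   # 11 14
print(decode_record(two_majors))         # ['Ng', 'CS', 'Math']
```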

Page 86: 1.Database System Concepts and Architecture

Spanned Vs Unspanned Records

When the records in a file are stored on a disk, they are placed in blocks of a fixed size, which rarely matches the record size. When the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: either store each record entirely within one block and leave the remainder of the block unused (unspanned), or let a record continue into the next block (spanned).
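The trade-off can be made concrete with a little arithmetic. The sketch below compares the two organizations for a 512-byte block. The 100-byte record size and the 1000-record file are hypothetical, and the spanned figure ignores the small pointer a real system would store to link the two halves of a split record.

```python
import math

BLOCK_SIZE = 512     # bytes
RECORD_SIZE = 100    # bytes, hypothetical
NUM_RECORDS = 1000   # hypothetical file size

# Unspanned: only whole records fit, so the tail of every block is wasted.
records_per_block = BLOCK_SIZE // RECORD_SIZE
wasted_per_block = BLOCK_SIZE - records_per_block * RECORD_SIZE
unspanned_blocks = math.ceil(NUM_RECORDS / records_per_block)

# Spanned: records may straddle a block boundary, so blocks are filled completely.
spanned_blocks = math.ceil(NUM_RECORDS * RECORD_SIZE / BLOCK_SIZE)

print(f"records per block (unspanned): {records_per_block}")   # 5
print(f"wasted bytes per block:        {wasted_per_block}")    # 12
print(f"blocks needed, unspanned:      {unspanned_blocks}")    # 200
print(f"blocks needed, spanned:        {spanned_blocks}")      # 196
```

Spanning saves space here, but reading a record that straddles two blocks costs two block accesses instead of one.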

Page 87: 1.Database System Concepts and Architecture

File Operations

A file may be stored either in contiguous blocks or by linking its blocks together. There are advantages and disadvantages to both methods.

Operations on files can be grouped into two types: retrieval and update. Retrieval involves only a read, while an update involves a read, a modification, and a write.

Page 88: 1.Database System Concepts and Architecture

File Structure

Heap (Pile) Files

Hash (Direct) Files

Ordered (Sorted) Files

B-Trees
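The slide lists these organizations without describing them. As a rough illustration of why the choice matters, the Python sketch below contrasts the first and third: a heap file keeps records in arrival order and must be scanned linearly, while an ordered file can be binary searched. In-memory lists stand in for disk blocks and the key values are made up, so this is a simplification rather than a model of real file access.

```python
import bisect

heap_file = [42, 7, 19, 3, 88]      # records appended in arrival order
ordered_file = sorted(heap_file)    # the same records kept sorted by key

def heap_lookup(records, key):
    """Linear scan: examine records one by one until the key is found."""
    for position, value in enumerate(records):
        if value == key:
            return position
    return None

def ordered_lookup(records, key):
    """Binary search: halve the search range at every step."""
    position = bisect.bisect_left(records, key)
    if position < len(records) and records[position] == key:
        return position
    return None

print(heap_lookup(heap_file, 88))        # 4, found after scanning every record
print(ordered_lookup(ordered_file, 88))  # 4, found in about log2(n) probes
print(ordered_lookup(ordered_file, 50))  # None, key is absent
```

The price of the ordered file is paid on insertion, since new records must be placed in key order rather than simply appended.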

Page 89: 1.Database System Concepts and Architecture

Once the data has been brought into memory, it can be accessed by an instruction in 0.00000004 seconds on a machine running at 25 MIPS. The disparity between the time for a memory access and a disk access is enormous: we can perform 625,000 instructions in the time it takes to read or write one disk page.

To put this in human terms, suppose you are typing a letter for your boss and find a word you cannot make out, so you leave him a voice mail message. Since you were told to do nothing else, you patiently wait for his reply, doing nothing. Unfortunately, he has just gone on vacation and does not get your message for 3 weeks. This is similar to the computer waiting 0.025 seconds to get the needed data into memory from a disk read.
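The figures on this slide can be checked with a few lines of Python. The sketch uses only the slide's own numbers (25 MIPS, a 0.025-second disk read, a 3-week wait). The variable names, and the reading of the analogy as "one instruction is worth a few seconds of human work", are illustrative assumptions.

```python
MIPS = 25                 # million instructions per second
DISK_READ_MS = 25         # 0.025 seconds, expressed in msec

instructions_per_second = MIPS * 1_000_000
seconds_per_instruction = 1 / instructions_per_second

# Instructions the CPU could have executed while waiting for one disk read.
instructions_waited = instructions_per_second * DISK_READ_MS // 1000

# Scale the wait up to the 3-week vacation in the analogy.
three_weeks_s = 3 * 7 * 24 * 3600
human_seconds_per_instruction = three_weeks_s / instructions_waited

print(f"time per instruction: {seconds_per_instruction:.8f} s")    # 0.00000004
print(f"instructions per disk read: {instructions_waited:,}")      # 625,000
print(f"analogy: one instruction ~ {human_seconds_per_instruction:.1f} s of human time")  # 2.9
```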

