Date post: 19-Dec-2015

CAS CS 460: Introduction to Database Systems

Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture materials

1.2

About the course – Administrivia

Instructor: Ravi Kothuri, [email protected]

Office, Hours: MCS 147, Mon/Wed 5-6PM and 7:30-8PM

Teaching Fellow: Panagiotis Papapetrou, [email protected]

MCS 147, Tue/Thu 11 - 12:30 AM

Home Page: http://www.cs.bu.edu/rkothuri

Check frequently! Syllabus, schedule, assignments, announcements…

1.3

Grading

Homeworks: 20% (4-5 assignments)

Midterm: 20%

Final: 30%

Projects: 30% (5-6 parts)

1.4

My Background

Oracle Corporation

PhD from University of California, Santa Barbara

Research:

Multi-dimensional indexing

Mobile Databases

Spatial, GIS systems and CAD/CAM databases

Google Maps type of technologies for Enterprise

Geometric algorithms for terrain management, city modeling, …

Data Mining (spatial, financial, …)

RFID technologies

Semantic Web (RDF) technologies

Book: “Pro Oracle Spatial”, Nov 2004

Teaching on invitation from Prof. George Kollios

1.5

Who uses Databases?

Universities (records for students, faculty, courses, …)

Airlines (passengers, flights, luggage, …)

Banking (customers, loans, …)

Utilities (customers, usage history, bills); e.g. telecom, electric, …

Any company: human resources (employees, depts, facilities, …)

“Data is the primary and integral part of information industry. Proper management of the data using database technology is essential for any large-scale company, organization.”

1.6

What is a Database System?

Database:

A very large collection of related data

Models a real-world enterprise:

Entities (e.g., teams, games / students, courses)

Relationships (e.g., The Patriots are playing in the Superbowl)

Even active components (e.g. “business logic”)

DBMS: A software package/system that can be used to store, manage and retrieve data from databases

Database System: DBMS+data (+ applications)

1.7

Why Study Databases?

Shift from computation to information

Always true for corporate computing

More and more true in the scientific world

and of course, the Web

DBMS encompasses much of CS in a practical discipline:

OS, languages, theory, AI, logic

1.8

Managing data: A naïve approach

Why not store everything on flat files: use the file system of the OS, cheap/simple…

Name, Course, Grade

John Smith, CS112, B

Mike Stonebraker, CS234, A

Jim Gray, CS560, A

John Smith, CS560, B+

…………………

Yes, but not scalable… File size limitations; access and update performance is slow, …

1.9

Problem 1

Data redundancy and inconsistency

Multiple file formats, duplication of information in different files (say, in different departments)

John Smith, [email protected], CS112, B

John Smith, Arts560, [email protected], B+

Smith J, [email protected], Math212, A

Why is this a problem?

Wasted space

Potential inconsistencies (multiple formats, John Smith vs Smith J.)

1.10

Problem 2

Data retrieval:

Find the students who took CS560

Find the students with GPA > 3.5

For every query we need to write a program!

Need a Query/Retrieval engine that can support different ways to access data:

Easy to write

Execute efficiently
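To make the "one program per query" problem concrete, here is a minimal Python sketch of the flat-file approach. It reuses the sample rows from the naïve-approach slide; the function names are hypothetical and chosen only for illustration. Note that each new question needs its own hand-written scan.

```python
import csv
import io

# Sample flat file, reusing the rows shown on the naive-approach slide.
FLAT_FILE = """Name,Course,Grade
John Smith,CS112,B
Mike Stonebraker,CS234,A
Jim Gray,CS560,A
John Smith,CS560,B+
"""

def students_who_took(course, text=FLAT_FILE):
    """Query 1: scan every record and keep the names enrolled in `course`."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["Name"] for row in reader if row["Course"] == course]

def records_with_grade(grade, text=FLAT_FILE):
    """Query 2: a different question needs a second hand-written scan."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Name"], row["Course"]) for row in reader if row["Grade"] == grade]

print(students_who_took("CS560"))
print(records_with_grade("A"))
```

Both functions read the whole file on every call, which is the performance concern raised earlier; a query engine would let the user state the condition and leave the access strategy to the system.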

1.11

Problem 3

Data Integrity

No support for sharing:

Prevent simultaneous modifications

No coping mechanisms for system crashes

No means of Preventing Data Entry Errors (checks must be hard-coded in the programs)

Security problems

1.12

Database Systems

Database systems offer solutions to all of the problems mentioned above

Database systems:

Support Modeling of the data

Provide Levels of Abstraction of the data

Provide programs to allow you to Retrieve/modify the data

SQL

• For easy, standard specification of queries

Query Optimizer

• To process your queries efficiently

Ensure Integrity Maintenance

Transaction Manager/Recovery Manager

• to ensure atomicity/integrity in concurrent transactions

• to ensure integrity after system crashes

…………….

1.13

Database Systems

Data Modeling

Levels of Abstraction

Data Retrieval

Data Modification/Integrity Maintenance

1.14

Data Model

A framework for describing: data, data relationships, data semantics, data constraints

Entity-Relationship model (Ch. 6)

A set of entities to model real-world objects

Relationships among entities

Relational model

Data as a set (or sets) of “records” or “tuples”

Each tuple in the set has the same set of attributes

Other models:

object-oriented model: inheritance, abstraction, …

semi-structured data models, XML: tuples in a set can have different attributes

1.15

Entity-Relationship Model

Example of schema in the entity-relationship model

1.16

Entity Relationship Model (Cont.)

E-R model of real world

Entities (objects)

E.g. customers, accounts, bank branch

Relationships between entities

E.g. Account A-101 is held by customer Johnson

Relationship set depositor associates customers with accounts

Widely used for database design

Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing

1.17

Relational Model

Example of tabular data in the relational model

Attributes: customer-name, customer-id, customer-street, customer-city, account-number

customer-name   customer-id   customer-street   customer-city   account-number
Johnson         192-83-7465   Alma              Palo Alto       A-101
Smith           019-28-3746   North             Rye             A-215
Johnson         192-83-7465   Alma              Palo Alto       A-201
Jones           321-12-3123   Main              Harrison        A-217
Smith           019-28-3746   North             Rye             A-201
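As a rough illustration of "data as a set of tuples, each with the same attributes", the table above can be sketched in Python as a set of named tuples. The rows are read column by column from the slide values; the underscore attribute names and the `Customer` type are assumptions made so the identifiers are valid Python.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    # Every tuple in the relation carries exactly these five attributes.
    customer_name: str
    customer_id: str
    customer_street: str
    customer_city: str
    account_number: str

# The relation is a set: no ordering, no duplicate tuples.
customer = {
    Customer("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-101"),
    Customer("Smith", "019-28-3746", "North", "Rye", "A-215"),
    Customer("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-201"),
    Customer("Jones", "321-12-3123", "Main", "Harrison", "A-217"),
    Customer("Smith", "019-28-3746", "North", "Rye", "A-201"),
}

# Projection: keep one attribute (duplicates collapse because it is a set).
names = {t.customer_name for t in customer}

# Selection: keep the tuples that satisfy a condition.
rye_accounts = {t.account_number for t in customer if t.customer_city == "Rye"}

print(sorted(names))
print(sorted(rye_accounts))
```

The repeated name, id, street and city values for Johnson and Smith are the kind of redundancy that later design steps typically remove by splitting the table.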

1.18

Database Systems

Data Modeling

Levels of Abstraction

Data Retrieval

Data Modification/Integrity Maintenance

1.19

Levels of Abstraction

Data storage involves complex data structures

Hide complexity from users

Abstract views of the data (e.g., for storing a customer record)

Physical level: how a customer record is stored as bytes/words on disk

• Mostly hidden from database users/programmers

Logical level: describes “types” inside the database

type customer = record
    name : string;
    street : string;
    city : string;
    ssn : integer;
end;

View level: application programs hide details of data types. Views can also hide information (e.g., ssn) for security purposes.
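The logical and view levels can be sketched in a few lines of Python. This is only an analogy, not how a DBMS implements views: the `Customer` dataclass stands in for the logical-level record type, `city` is modeled as a string (an assumption), and `public_view` is a hypothetical helper that hides `ssn` the way a view would.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Customer:
    # Logical level: the record "type" that programmers work with.
    name: str
    street: str
    city: str
    ssn: int

def public_view(c: Customer) -> dict:
    """View level: expose every field except ssn."""
    record = asdict(c)
    del record["ssn"]
    return record

# Illustrative values only.
c = Customer(name="Johnson", street="Alma", city="Palo Alto", ssn=192837465)
print(public_view(c))
```

How the record is laid out as bytes on disk (the physical level) does not appear anywhere in this code, which is the point of the abstraction.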

1.20

View of Data

A logical architecture for a database system

1.21

Physical Level: Data Organization

Data Storage (Ch 11)

Where can data be stored?

Main memory

Secondary memory (hard disks)

Optical store

Tertiary store (tapes)

Move data? Determined by buffer manager

Mapping data to files? Determined by file manager

1.22

Database Architecture (physical level data organization)

[Figure: architecture diagram. Components shown: DBA, DDL Commands, DDL Interpreter; Storage Manager (Buffer Manager, File Manager); Secondary Storage (Data, Metadata, Schema).]

1.23

Database Systems

Data Modeling

Levels of Abstraction

Data Retrieval

Data Modification/Integrity Maintenance

1.24

Data retrieval

Queries (Ch 3, 4)

Query = Declarative data retrieval

describes what data, not how to retrieve it

Ex. Give me the students with GPA > 3.5 vs

Scan the student file and retrieve the records with gpa>3.5

Why?

1. Easier to write

2. Efficient to execute

1.25

Data retrieval

Query Optimizer: “compiler” for queries (aka “DML Compiler”)

Plan ~ Assembly Language Program

Optimizer Does Better With Declarative Queries:

1. Algorithmic Query (e.g., in C): 1 Plan to choose from

2. Declarative Query (e.g., in SQL): n Plans to choose from

[Figure: Query Processor. A Query enters the Query Optimizer, which produces a Plan; the Query Evaluator executes the Plan against the Data.]
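One way to see an optimizer pick among plans is to ask a real engine for its plan before and after an index exists. The sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` statement; the `student` table, its contents and the index name are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(i, f"student{i}", 2.0 + (i % 20) / 10) for i in range(1, 1001)],
)

# The same declarative query is used both times; only the available plans change.
QUERY = "SELECT name FROM student WHERE sid = 101"

def plan(sql):
    """Return the plan SQLite chose for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

plan_without_index = plan(QUERY)   # only a full table scan is possible

conn.execute("CREATE INDEX idx_student_sid ON student (sid)")
plan_with_index = plan(QUERY)      # the optimizer can now choose an index search

print(plan_without_index)
print(plan_with_index)
```

The query text never changes, which is the advantage of the declarative form: the choice of plan stays with the system rather than being fixed in application code.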

1.26

Specifying the Query using SQL

SQL: widely used (declarative) non-procedural language

E.g. find the name of the customer with customer-id 192-83-7465

    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'

E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465

    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465'
      and depositor.account-number = account.account-number

Procedural languages: C++, Java, relational algebra
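The two queries above can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: hyphens in the column names are replaced by underscores (a hyphen would be parsed as subtraction by most SQL engines), the three-table layout is one plausible reading of the slide, and the balances are invented because the slides do not give any.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
CREATE TABLE account   (account_number TEXT, balance INTEGER);

INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201'),
                             ('019-28-3746', 'A-215');
-- Balances are invented for this example.
INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
""")

# Query 1: name of the customer with the given id.
name_rows = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    ("192-83-7465",),
).fetchall()

# Query 2: balances of all accounts held by that customer (a two-table join).
balance_rows = conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number
       ORDER BY account.balance""",
    ("192-83-7465",),
).fetchall()

print(name_rows)
print(balance_rows)
```

Neither query says how to find the rows (scan, index, join order); that decision is left to the query processor.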

1.27

Data retrieval: Indexing (Ch 12)

How can we quickly answer the query “Find the student with SID = 101”?

One approach is to scan the student table, check every student, and return the one with id=101… very slow for large databases

Any better idea?

1st: Keep the student records sorted on SID and do a binary search… but what about updates?

2nd: Use a dynamic search tree! It allows insertions, deletions and updates while keeping the records sorted. In databases we use the B+-tree (a multiway search tree).

3rd: Use a hash table. Much faster for exact-match queries… but it cannot support range queries. (Also, special hashing schemes are needed for dynamic data.)
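The trade-offs among these approaches can be sketched with in-memory Python structures. A sorted list with `bisect` stands in for the sorted file and a `dict` stands in for the hash table; the student records are hypothetical, and a real DBMS works on disk pages rather than Python lists.

```python
import bisect

# Hypothetical student records (sid, name), kept sorted on sid.
records = sorted((sid, f"student{sid}") for sid in (3, 30, 100, 101, 110, 120, 150))
sids = [sid for sid, _ in records]

def scan_lookup(sid):
    """Linear scan: checks every record in the worst case."""
    for rec in records:
        if rec[0] == sid:
            return rec
    return None

def binary_lookup(sid):
    """Binary search on the sorted sids: O(log n) comparisons."""
    i = bisect.bisect_left(sids, sid)
    if i < len(sids) and sids[i] == sid:
        return records[i]
    return None

# Hash index: fast exact match, but the keys are not kept in order.
hash_index = {rec[0]: rec for rec in records}

def range_lookup(lo, hi):
    """Range query lo <= sid <= hi; easy on sorted data, not on a hash table."""
    i = bisect.bisect_left(sids, lo)
    j = bisect.bisect_right(sids, hi)
    return records[i:j]

print(scan_lookup(101), binary_lookup(101), hash_index.get(101))
print(range_lookup(100, 120))
```

Inserting into the sorted list costs O(n) because later entries must shift, which is the "updates" concern that motivates a dynamic structure such as the B+-tree.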

1.28

B+Tree Example (B=4)

Root: 100

Internal nodes: 30 | 120 150 180

Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200
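A lookup in such a tree can be sketched as follows. The node layout is an assumption about the lost figure: 100 at the root, internal nodes 30 and 120/150/180 below it, and six leaves holding the remaining keys. The `Leaf` and `Internal` classes are hypothetical, and the sketch covers search only, not insertion or node splitting.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Leaf:
    keys: list

@dataclass
class Internal:
    keys: list
    children: list  # always len(keys) + 1 children

# Assumed reading of the example figure.
left = Internal([30], [Leaf([3, 5, 11]), Leaf([30, 35])])
right = Internal(
    [120, 150, 180],
    [Leaf([100, 101, 110]), Leaf([120, 130]), Leaf([150, 156, 179]), Leaf([180, 200])],
)
root = Internal([100], [left, right])

def search(node, key):
    """Descend from the root to one leaf; return (found, keys of nodes visited)."""
    path = []
    while isinstance(node, Internal):
        path.append(node.keys)
        # Keys >= a separator live in the subtree to its right.
        node = node.children[bisect.bisect_right(node.keys, key)]
    path.append(node.keys)
    return key in node.keys, path

found, path = search(root, 101)
print(found, path)
```

Only one node per level is visited, so the cost of a lookup grows with the height of the tree rather than with the number of records.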

1.29

Database Architecture (data retrieval)

[Figure: architecture diagram. Components shown: User (Query), DB Programmer (Code w/ embedded queries), DBA (DDL Commands); DML Precompiler, DDL Interpreter; Query Processor (Query Optimizer, Query Evaluator); Storage Manager (Buffer Manager, File Manager); Secondary Storage (Data, Indices, Statistics, Metadata, Schema).]

1.30

Database Systems

Data Modeling

Levels of Abstraction

Data Retrieval

Data Modification/Integrity Maintenance

1.31

Data Integrity: Transaction processing (Ch 15, 16)

Why must concurrent access to data be managed?

John and Jane withdraw $50 and $100 from a common account…

Initial balance $300. Final balance=?

It depends…

John:
1. get balance
2. if balance > $50
3. balance = balance - $50
4. update balance

Jane:
1. get balance
2. if balance > $100
3. balance = balance - $100
4. update balance
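Why "it depends" can be shown with a small deterministic simulation. Instead of real threads, each withdrawal is written as a Python generator that pauses between steps, and a schedule says whose step runs next. The `withdraw_steps` and `run` helpers are hypothetical, and a real DBMS interleaves at a much finer granularity.

```python
def withdraw_steps(account, amount):
    """One withdrawal, pausing (yield) between the numbered steps."""
    balance = account["balance"]        # 1. get balance
    yield
    if balance > amount:                # 2. check funds
        balance = balance - amount      # 3. compute new balance (local copy)
        yield
        account["balance"] = balance    # 4. update balance

def run(schedule, initial=300):
    """Run John's and Jane's steps in the given order; return the final balance."""
    account = {"balance": initial}
    txns = {
        "John": withdraw_steps(account, 50),
        "Jane": withdraw_steps(account, 100),
    }
    for who in schedule:
        next(txns[who], None)           # advance that transaction by one step
    return account["balance"]

serial = ["John"] * 3 + ["Jane"] * 3              # John finishes before Jane starts
interleaved = ["John", "Jane"] * 3                # both read 300, Jane writes last
john_last = ["John", "Jane", "Jane", "Jane", "John", "John"]  # John writes last

print(run(serial), run(interleaved), run(john_last))
```

Only the serial schedule yields the expected $150. In the other two schedules one update overwrites the other (a lost update), leaving $200 or $250, which is the kind of anomaly a transaction manager is there to prevent.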

1.32

Data Integrity: Recovery (Ch 17)

Transfer $50 from account A ($100) to account B ($200)

1. get balance for A

2. If balanceA > $50

3. balanceA = balanceA – 50

4. Update balanceA in database

5. Get balance for B

6. balanceB = balanceB + 50

7. Update balanceB in database

System crashes….

Recovery management
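The all-or-nothing behavior expected after a crash can be approximated with Python's built-in `sqlite3` module. This is a sketch, not a model of real recovery: the "crash" is a hypothetical `SimulatedCrash` exception raised between steps 4 and 5, and an explicit rollback stands in for what a recovery manager would do from its log after a restart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

class SimulatedCrash(Exception):
    """Stands in for a system failure in the middle of the transfer."""

def balance(name):
    row = conn.execute("SELECT balance FROM account WHERE name = ?", (name,)).fetchone()
    return row[0]

def transfer(src, dst, amount, crash_after_debit=False):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        if balance(src) > amount:                                   # steps 1-2
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))                             # steps 3-4
            if crash_after_debit:
                raise SimulatedCrash("failure between steps 4 and 5")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))                             # steps 5-7
        conn.commit()
    except SimulatedCrash:
        conn.rollback()   # undo the partial debit

transfer("A", "B", 50, crash_after_debit=True)
after_crash = (balance("A"), balance("B"))

transfer("A", "B", 50)
after_success = (balance("A"), balance("B"))

print(after_crash, after_success)
```

Without the rollback, the crashed transfer would leave A debited and B never credited, so $50 would simply vanish; with it, the balances are unchanged until a transfer runs to completion.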

1.33

Database Architecture

[Figure: full architecture diagram. Components shown: User (Query), DB Programmer (Code w/ embedded queries), DBA (DDL Commands); DML Precompiler, DDL Interpreter; Query Processor (Query Optimizer, Query Evaluator); Transaction Manager, Recovery Manager; Storage Manager (Buffer Manager, File Manager); Secondary Storage (Data, Indices, Statistics, Metadata, Schema, Integrity Constraints).]

1.34

Outline

1st half of the course: application-oriented

How to develop database applications: User + DBA

2nd part of the course: system-oriented

Learn the internals of a relational DBMS (Oracle, …)

Last few lectures on Oracle-specific features such as XDB….

