Fall 2002

CSE330/CIS550: Introduction to Database Management Systems

Prof. Susan Davidson
Office: 278 Moore
Office hours: TTh 10-11

Administrative Stuff

• What you should know to take this class.
• Handouts: Syllabus and Homework 1.
• Resources: Text, TAs, Web site, bulletin board and office hours.
• Coursework: homeworks, exams, project.
• Computer accounts.

What the subject is about

• Modeling and organization of data
• Efficient (expressive?) retrieval of data
• Reliable and consistent storage of data
• Not surprisingly, all these topics are interrelated.

What is a DBMS?

• A database (DB) is a large, integrated collection of data.
• A DB models a real-world enterprise.
• A database management system (DBMS) is a software package designed to store and manage databases.

Why study databases?

• Everybody needs them, i.e. $$$.
• There are lots of interesting problems, both in database research and in implementation.
• Good design is always a challenge.

Connection to other areas of CS…

• Programming languages and software engineering (obviously)
• Algorithms (obviously)
• Logic, discrete math, and theory of computation
• “Systems” issues: concurrency, operating systems, file organization and networks.

But 80% of the world’s data is not in a DB!

Examples:
- scientific data (large images, complex programs that analyze the data)
- personal data
- WWW

Why don't we “program up” databases when we need them?

• For simple and small databases this is often the best solution. Flat files and grep get us a long way.
• We run into problems when
  – The structure is complicated (more than a simple table)
  – The database gets large
  – Many people want to use it simultaneously

Example: Personal Calendar

• We might start by building a file with the following structure:

What    Day    When  Who          Where
Lunch   10/24  1pm   Rick         Joe’s Diner
CS123   10/25  9am   Dr. Egghead  Morris234
Biking  10/26  9am   Jane         Jane’s house
Dinner  10/26  6PM   Jane         Café Le Boeuf

• This text file is easy to deal with. So there's no need for a DBMS!
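To make the flat-file approach concrete, here is a minimal sketch in Python. It assumes the calendar is stored as one tab-separated line per appointment; the storage layout, the field names, and the helper functions are illustrative assumptions, not something the slides specify.

```python
# Hypothetical flat-file calendar: one tab-separated line per appointment.
CALENDAR_TEXT = """\
Lunch\t10/24\t1pm\tRick\tJoe's Diner
CS123\t10/25\t9am\tDr. Egghead\tMorris234
Biking\t10/26\t9am\tJane\tJane's house
Dinner\t10/26\t6PM\tJane\tCafé Le Boeuf
"""

# Assumed field names, following the column headings of the example file.
FIELDS = ("what", "day", "when", "who", "where")


def parse_calendar(text):
    """Turn each non-empty line into a dict keyed by field name."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(dict(zip(FIELDS, line.split("\t"))))
    return rows


def appointments_with(rows, who):
    """Linear scan over every row, much like running grep over the file."""
    return [row for row in rows if row["who"] == who]


rows = parse_calendar(CALENDAR_TEXT)
jane = appointments_with(rows, "Jane")
for row in jane:
    print(row["what"], row["day"], row["when"], row["where"])
```

For a file this small, a scan over every line is perfectly adequate, which is the point of the slide.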

Problem 1: Data Organization

• Consider the all-important “who” field. Do we also want to keep e-mail addresses, telephone numbers etc.?

• Expand our file to look like:

What   When   Who-name   Who-email   Who-tel   ….   Where   …

• Now we are keeping our address book in our calendar and doing so redundantly.

“Link” Calendar with Address Book?

• Two conceptual “entities” -- contact information and calendar -- with a relationship between them, linking people in the calendar to their contact information.

• This link could be based on something as simple as the person's name.
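One way to picture the two entities and the link between them is as two tables joined on the person's name. The sketch below uses Python's built-in sqlite3 module; the table names, column names, and the contact details for Jane are made-up placeholders. The columns "when" and "where" are quoted because WHEN and WHERE are SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contact information is stored once per person.
    CREATE TABLE Contact (
        name  TEXT PRIMARY KEY,
        email TEXT,
        tel   TEXT
    );
    -- Calendar entries refer to a person by name only.
    CREATE TABLE Calendar (
        what    TEXT,
        day     TEXT,
        "when"  TEXT,
        who     TEXT REFERENCES Contact(name),
        "where" TEXT
    );
""")

# Placeholder contact details, invented for illustration.
conn.execute("INSERT INTO Contact VALUES ('Jane', 'jane@example.org', '555-0100')")
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    [
        ("Biking", "10/26", "9am", "Jane", "Jane's house"),
        ("Dinner", "10/26", "6PM", "Jane", "Café Le Boeuf"),
    ],
)

# Follow the link from each calendar entry to the contact information.
linked = conn.execute("""
    SELECT c.what, c.day, p.email
    FROM Calendar AS c
    JOIN Contact AS p ON p.name = c.who
    ORDER BY c.what
""").fetchall()
print(linked)
```

Jane's e-mail address now lives in one place even though she appears in two calendar entries.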

Problem 2: Efficiency

• Size of personal address book is probably less than one hundred entries, but there are things we'd like to do quickly and efficiently.
  – “Give me all appointments on 10/28”
  – “When am I next meeting Jim?”
• “Program” these as quickly as possible.
• Have these programs executed efficiently.
• What would happen if you were using a corporate calendar with hundreds of thousands of entries?
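As a rough illustration of how quickly the two questions above can be "programmed" in a query language, here is a sketch using Python's sqlite3 module. The rows are invented sample data, and the sketch assumes dates are stored as ISO strings in the year 2002 so that string order matches date order; the slides themselves write dates as 10/28.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Calendar (what TEXT, day TEXT, "when" TEXT, who TEXT, "where" TEXT)'
)
# Invented sample entries; days are ISO strings so they sort chronologically.
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    [
        ("Lunch", "2002-10-24", "13:00", "Rick", "Joe's Diner"),
        ("Review", "2002-10-28", "10:00", "Jim", "Office"),
        ("Seminar", "2002-10-28", "15:00", "Jane", "Moore"),
        ("Coffee", "2002-11-04", "09:00", "Jim", "Cafe"),
    ],
)


def appointments_on(conn, day):
    """'Give me all appointments on <day>', earliest first."""
    return conn.execute(
        'SELECT what, "when", who FROM Calendar WHERE day = ? ORDER BY "when"',
        (day,),
    ).fetchall()


def next_meeting_with(conn, who, today):
    """'When am I next meeting <who>?', counting from <today> onward."""
    return conn.execute(
        'SELECT day, "when", what FROM Calendar '
        'WHERE who = ? AND day >= ? ORDER BY day, "when" LIMIT 1',
        (who, today),
    ).fetchone()


print(appointments_on(conn, "2002-10-28"))
print(next_meeting_with(conn, "Jim", "2002-10-29"))
```

Each question is a single declarative statement; how efficiently it executes on hundreds of thousands of entries is left to the DBMS.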

Problem 3: Concurrency and Reliability

• Suppose other people are allowed to access and modify your calendar. How do we stop two people from changing the file at the same time and leaving it in a physical (or logical) mess?

• Suppose the system crashes while we are changing the calendar. How do we recover our work?

Transactions

• Key concept for concurrency is that of a transaction: an atomic sequence of database actions (read/write) on data items (e.g. calendar entry).

• Key concept for recoverability is that of a log: keeping track of all actions carried out by the db.

• Sounds like operating systems all over again!
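The atomicity of a transaction can be sketched with Python's sqlite3 module. The example moves a calendar entry with a delete followed by an insert, and simulates a failure between the two writes by raising an exception. This is a simulation under stated assumptions, not a real crash; the table layout and the reschedule helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calendar (what TEXT, day TEXT, who TEXT)")
conn.execute("INSERT INTO Calendar VALUES ('Dinner', '10/26', 'Jane')")
conn.commit()


def reschedule(conn, what, new_day, fail_midway=False):
    """Move an entry by delete + insert, treated as one atomic unit."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT who FROM Calendar WHERE what = ?", (what,)
        ).fetchone()
        conn.execute("DELETE FROM Calendar WHERE what = ?", (what,))
        if fail_midway:
            # Simulated failure after the first write but before the second.
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO Calendar VALUES (?, ?, ?)", (what, new_day, row[0])
        )


try:
    reschedule(conn, "Dinner", "10/27", fail_midway=True)
except RuntimeError:
    pass
# The delete was rolled back, so the original entry is still present.
after_failure = conn.execute("SELECT * FROM Calendar").fetchall()

reschedule(conn, "Dinner", "10/27")
after_success = conn.execute("SELECT * FROM Calendar").fetchall()
print(after_failure, after_success)
```

Either both writes take effect or neither does; the entry is never left deleted but not re-inserted.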

Database architecture -- the traditional view

It is common to describe databases in two ways:
– The logical structure. What users see. The program or query language interface.
– The physical structure. How files are organized. What indexing mechanisms are used.

Further it is traditional to split the logical level into two components: overall database design (conceptual) and the views that various users get to see.

Three-level architecture

View 1    View 2    …    View N

Conceptual Level (Schema)

Physical Level (file organization, indexing)
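The top two levels can be sketched in SQL, here run through Python's sqlite3 module. The Calendar table stands for the conceptual schema, and BusyTimes is a hypothetical view, invented for this illustration, that shows other users only when the owner is busy. The physical level (file layout, indexing) is handled by the DBMS and does not appear in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full schema of the calendar.
    CREATE TABLE Calendar (what TEXT, day TEXT, "when" TEXT, who TEXT, "where" TEXT);
    INSERT INTO Calendar VALUES ('Lunch', '10/24', '1pm', 'Rick', 'Joe''s Diner');
    INSERT INTO Calendar VALUES ('Biking', '10/26', '9am', 'Jane', 'Jane''s house');

    -- View level: a hypothetical view that hides what, who, and where.
    CREATE VIEW BusyTimes AS
        SELECT day, "when" FROM Calendar;
""")

# A user of the view queries it like any table.
busy = conn.execute('SELECT day, "when" FROM BusyTimes ORDER BY day').fetchall()

# The view exposes only two of the five columns.
view_columns = [col[0] for col in conn.execute("SELECT * FROM BusyTimes").description]
print(busy, view_columns)
```

Different users can be given different views over the same conceptual schema.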

Data independence

• A user of a relational database system should be able to use SQL to query the database without knowing precisely how data is stored, e.g.

    SELECT "When", "Where"
    FROM Calendar
    WHERE Who = 'Bill'

  (When and Where are quoted because they are SQL keywords; 'Bill' is a string literal.)

• After all, you don't worry much how numbers are stored when you program some arithmetic or use a computer-based calculator.

More on data independence

• Logical data independence protects the user from changes in the logical structure of the data -- could completely reorganize the calendar “schema” without changing how I query it.

• Physical data independence protects the user from changes in the physical structure of data: could add an index on Who without changing how the user would write the query, but the query would execute faster (query optimization).
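Physical data independence can be observed directly in SQLite. In the sketch below the query text stays the same before and after an index on who is created; only the plan reported by EXPLAIN QUERY PLAN changes. The index name idx_calendar_who and the sample rows are assumptions made for the illustration, and the exact wording of the plan output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Calendar (what TEXT, day TEXT, "when" TEXT, who TEXT, "where" TEXT)'
)
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    [
        ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
        ("Biking", "10/26", "9am", "Jane", "Jane's house"),
        ("Dinner", "10/26", "6PM", "Jane", "Café Le Boeuf"),
    ],
)

# The user's query: it never mentions how the data is stored.
QUERY = 'SELECT "when", "where" FROM Calendar WHERE who = ?'


def plan(conn):
    """Return the human-readable plan text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Jane",)).fetchall()
    return " ".join(row[-1] for row in rows)


before_rows = conn.execute(QUERY, ("Jane",)).fetchall()
before_plan = plan(conn)

# Change the physical structure only: add an index on who.
conn.execute("CREATE INDEX idx_calendar_who ON Calendar (who)")

after_rows = conn.execute(QUERY, ("Jane",)).fetchall()
after_plan = plan(conn)

print(before_plan)
print(after_plan)
```

The same query returns the same rows, but the optimizer can now search the index instead of scanning the whole table.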

That's the traditional view, but ...

• Three-level architecture is not always achievable for database programmers. When databases get big, queries must be carefully written to achieve efficiency.

• There are databases over which we have no control. The Web is a giant, disorganized database.

• There are also well-organized databases on the web (e.g., the Movie database) for which the terminology does not quite apply.

In this course...

• Study relational databases, their design, how to query, what forms of indices to use.
• Beyond relational algebra: a logical model of data (Datalog), recursion.
• Beyond “first-normal form”: object-oriented databases, how to query, using OO design techniques.

• XML and semi-structured data models

What we won’t cover in any depth...

• The “technology” of databases:
  – details of physical design
  – concurrency control
  – transaction management
  – query optimization

(although a few of these issues will be briefly discussed)

