2/9/10
Introduction to Database Systems
CISC437/637, Lecture #1
Ben Carterette
Copyright © Ben Carterette
Physical and logical organization of databases. Data retrieval languages, relational database languages, security and integrity, concurrency, distributed databases.
Database Systems
• The overview in 5 Ws (and one H):
  – What is a database? What is a database management system (DBMS)?
  – Why use databases? Why study them?
  – Who works with databases?
  – How does a DBMS work?
  – Where and when did databases originate?
What is a Database?
• A database is a collection of data
  – Usually large quantities of interrelated data
    • E.g. student records, faculty records, courses, classrooms, payrolls, …
• A database management system (DBMS) is a software system designed to store and manage data
Why Use a DBMS?
“So a bunch of text files on disk can be a database. I’ll just process them with Python. Why do I need to learn about DBMS software?”
• Data too large to fit in memory; files too big for random access on disk
• Arbitrarily complex queries that must be answered quickly
• Many users accessing data concurrently
• Some users need different access permissions
Why Use a DBMS?
• Data independence
• Efficient access
• Integrity and security
• Access administration
• Concurrent access
• Application development time
Why Not Use a DBMS?
• DBMSs are large, complex programs designed for very general data needs and workloads; not always optimal for specialized tasks
• Application may need to manipulate data in ways not supported by DBMS
• Security, concurrent access, crash recovery may not be critical
• Example: web search
Why Study Databases?
• Multibillion-dollar industry, second only to operating systems
• Databases form backbone of many information-centric applications
  – Using computation to create and understand information
• Implementing and understanding DBMS incorporates knowledge from every area of CS
  – Systems, theory, artificial intelligence
Applications of Databases
• Electronic commerce and banking
  – Amazon, eBay, PayPal
  – Integrating vast catalogs and accounts, high security
• Social networking
  – Facebook, Twitter
  – Analyzing flow of information through large, tightly-connected networks
Applications of Databases
• Sensor networks
  – GPS, RFID, …
  – Often supports mission-critical applications
  – Response to failures and trust are important
• Bioinformatics, health informatics
  – Gene Ontology, PubMed, …
  – Requires data integration, pattern matching, approximate matching, ranking, automatic inference
Who Works With Databases?
• DBMS programmers actually implement the DBMS software
• Database administrators design storage requirements, handle security, ensure graceful recovery, tune database performance
• Applications programmers write software that interacts with a database
• End users use the software written by applications programmers
How Does a DBMS Work?
• This is the focus of the course
• Today: a brief overview of the topics that will be covered
  1. Data Models
  2. Database Queries
  3. Transaction Management
  4. DBMS Structure
Data Models
• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data using a given model
• The relational data model is the most commonly used
  – Relations (tables of records) are the main concept
  – Every relation has a schema that describes the record fields/table columns
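As a minimal sketch of a relation and its schema, the following uses Python's built-in sqlite3 module. The Students relation, its fields, and the sample records are illustrative assumptions, not part of the lecture.

```python
import sqlite3

# In-memory database; the Students relation and its fields are hypothetical.
conn = sqlite3.connect(":memory:")

# The schema names the relation and gives each field a name and a type.
conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

# Every record (row) stored in the relation follows that schema.
conn.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Grace", 3.7)],
)

# Ask SQLite to describe the schema: table_info yields one row per column,
# where index 1 is the column name and index 2 is its declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]
rows = conn.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
print(columns)
print(rows)
```

The schema describes the fields once; the records are then just tuples that conform to it.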
Levels of Abstraction
• Physical schema describes the specific files used to store a relation on disk
• Conceptual schema defines the logical structure of relations
• Views or external schema describe how users see the data
[Figure: levels of abstraction, showing View 1, View 2, and View 3, the Conceptual Schema, and the Physical Schema]
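One way to picture an external schema is as a view defined over the conceptual schema. The sketch below uses sqlite3; the Students table, the HonorRoll view, and the 3.5 cutoff are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the logical structure of a hypothetical Students relation.
conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Grace", 3.2)],
)

# External schema: a view exposing only what one group of users should see.
conn.execute("CREATE VIEW HonorRoll AS SELECT name FROM Students WHERE gpa >= 3.5")

# Users of the view query it like a relation, without seeing sid or gpa.
honor_roll = conn.execute("SELECT name FROM HonorRoll").fetchall()
print(honor_roll)
```

The physical schema never appears in the code: how SQLite lays out the records in storage is hidden beneath both the table and the view.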
Data Independence
• Using an external schema does not require knowledge of conceptual schema
  – Logical data independence
• Using a conceptual schema does not require knowledge of physical schema
  – Physical data independence
• In other words, applications are insulated from how data is structured and stored
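Physical data independence can be illustrated with a small sqlite3 sketch: an index is added (a change to how the data is accessed in storage) and the application's query text stays exactly the same. The Courses table, the room codes, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid INTEGER PRIMARY KEY, title TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [(437, "Database Systems", "A101"), (220, "Data Structures", "B202")],
)

# The application only knows this query against the conceptual schema.
query = "SELECT title FROM Courses WHERE room = ?"
before = conn.execute(query, ("A101",)).fetchall()

# Change the physical organization: add an index on the room column.
conn.execute("CREATE INDEX idx_courses_room ON Courses (room)")

# The same query text still runs and returns the same answer.
after = conn.execute(query, ("A101",)).fetchall()
print(before == after)
```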
Database Queries
• Queries are questions asked of the data
• A query language specifies how queries are posed in a specific data model
  – The language consists of keywords and operators for manipulating relations: the data manipulation language (DML)
• Formulating a query does not require knowledge of physical schema
• Allows fast application development
  – Embed DML in high-level language like Java, C, Python
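A minimal sketch of embedding DML in a high-level language, here SQL inside Python via the standard sqlite3 module. The Courses table, the seat counts, and the helper courses_with_seats are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid INTEGER PRIMARY KEY, title TEXT, seats INTEGER)")

# DML statements embedded in Python: insert and update records.
conn.executemany(
    "INSERT INTO Courses (cid, title, seats) VALUES (?, ?, ?)",
    [(437, "Database Systems", 40), (220, "Data Structures", 60)],
)
conn.execute("UPDATE Courses SET seats = seats - 1 WHERE cid = ?", (437,))


def courses_with_seats(conn, minimum):
    """Return (title, seats) for courses with at least `minimum` seats, sorted by title."""
    cur = conn.execute(
        "SELECT title, seats FROM Courses WHERE seats >= ? ORDER BY title",
        (minimum,),
    )
    return cur.fetchall()


result = courses_with_seats(conn, 39)
print(result)
```

The query says what is wanted (courses with enough seats), not how to find it; the host language only passes parameters in and receives records back.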
Concurrency Control
• Many databases are used by multiple users concurrently
  – Each user is manipulating relations in different ways
  – Simultaneous uses can result in inconsistencies
    • E.g. one is looking up vacancies while another is making a reservation
• DBMS ensures that these problems don’t happen
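The reservation example can be made concrete with a deterministic toy simulation. No real DBMS is involved: a dict stands in for the database, and the function names and the single free seat are assumptions chosen to show the inconsistency.

```python
def read_free(db):
    """A user looks up how many seats are free."""
    return db["free"]


def reserve(db, seen_free):
    """Book a seat based on a value read earlier; return True if the booking goes through."""
    if seen_free > 0:
        db["free"] = seen_free - 1
        return True
    return False


# Interleaved schedule: both users read before either one writes.
seats = {"free": 1}
a_seen = read_free(seats)
b_seen = read_free(seats)
a_ok = reserve(seats, a_seen)
b_ok = reserve(seats, b_seen)
interleaved_bookings = int(a_ok) + int(b_ok)

# Serial schedule: each user's read and write happen together.
seats = {"free": 1}
a_ok = reserve(seats, read_free(seats))
b_ok = reserve(seats, read_free(seats))
serial_bookings = int(a_ok) + int(b_ok)

print(interleaved_bookings, serial_bookings)
```

In the interleaved schedule both users act on the same stale reading, so one seat is booked twice; in the serial schedule the second user sees that no seat is left.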
Transaction Management
• A transaction is an atomic sequence of database actions (reads and writes)
• The complete execution of each transaction must leave the database in a consistent state if the database is consistent when it begins
  – Consistency means no logical conflicts
• User/application formulates integrity constraints for the DBMS to enforce
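As a sketch of an integrity constraint that the application declares and the DBMS enforces, the following uses a CHECK constraint in sqlite3. The Accounts table and the rule that balances may not go negative are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The application states the constraint once, in the schema.
conn.execute(
    "CREATE TABLE Accounts ("
    " owner TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES ('alice', 100)")
conn.commit()

# An update that would violate the constraint is rejected by the DBMS.
try:
    conn.execute("UPDATE Accounts SET balance = balance - 150 WHERE owner = 'alice'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM Accounts WHERE owner = 'alice'"
).fetchone()[0]
print(rejected, balance)
```

The application code contains no balance check of its own; the database stays consistent because the DBMS refuses the offending write.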
Scheduling Transactions
• DBMS ensures that execution of {T1, …, Tn} is equivalent to serial execution T1’, …, Tn’
  – Locks: before reading or writing, a transaction requests a lock on an object, and does nothing until DBMS grants lock. Locks are released after execution.
  – Use locks to force ordering of unordered transactions.
  – Deadlock: Ti has lock on object A and needs lock on object B. Tj has lock on object B and needs lock on object A.
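The deadlock described above can be detected by looking for a cycle in a "wait-for" graph. The sketch below is one simple way to do that, not necessarily how any particular DBMS does it; the function find_deadlock and its two dictionaries are hypothetical, and each transaction is assumed to wait for at most one object.

```python
def find_deadlock(holds, wants):
    """Return the transactions forming a wait-for cycle, or None if there is none.

    holds maps object -> transaction currently holding its lock.
    wants maps transaction -> the one object it is waiting to lock.
    """
    # Wait-for graph: a transaction waits for whoever holds the object it wants.
    waits_for = {
        txn: holds[obj]
        for txn, obj in wants.items()
        if obj in holds and holds[obj] != txn
    }
    for start in waits_for:
        seen = []
        txn = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while txn in waits_for and txn not in seen:
            seen.append(txn)
            txn = waits_for[txn]
        if txn in seen:
            return seen[seen.index(txn):]
    return None


# The situation from the slide: Ti holds A and wants B; Tj holds B and wants A.
cycle = find_deadlock({"A": "Ti", "B": "Tj"}, {"Ti": "B", "Tj": "A"})

# No deadlock: Tj simply waits for Ti, and Ti is not waiting for anything.
no_cycle = find_deadlock({"A": "Ti"}, {"Tj": "A"})
print(cycle, no_cycle)
```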
Atomicity
• “All or nothing”: an atomic transaction is one that either completely finishes or does not happen at all
• DBMS needs to maintain atomicity even when it crashes in the middle of transactions
• Use a log to keep track of actions DBMS takes to execute transaction
  – Write-ahead log (WAL) enables this
• Transaction isn’t done until all of its actions are done
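A small sqlite3 sketch of "all or nothing": a transfer made of two writes is interrupted between them, and rolling back leaves no trace of the first write. The Accounts table, the transfer function, and the fail_midway flag (which merely simulates a failure) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one transaction: both writes or neither."""
    try:
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE owner = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE owner = ?", (amount, dst)
        )
        conn.commit()
    except RuntimeError:
        # Undo everything the unfinished transaction did.
        conn.rollback()


transfer(conn, "alice", "bob", 30, fail_midway=True)
balances = dict(conn.execute("SELECT owner, balance FROM Accounts"))
print(balances)
```

Calling transfer again without fail_midway commits both writes together.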
Write-Ahead Log
• The log consists of the following:
  – For write actions, the old data and the new data
  – A flag indicating whether the transaction was committed or aborted
• Transactions can be undone when commit not present
• Deadlocks can be resolved by aborting one transaction and allowing the other to continue
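The undo idea can be sketched with a toy in-memory log. This is a simplified illustration, not a real recovery algorithm: the record layout, logged_write, and recover are hypothetical, and a dict stands in for the database.

```python
data = {"x": 1, "y": 1}
log = []


def logged_write(txn, key, new):
    """Write ahead: record the old and new values in the log, then change the data."""
    log.append(("write", txn, key, data[key], new))
    data[key] = new


def recover(data, log):
    """Undo the writes of every transaction that has no commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Scan backwards so the most recent writes are undone first.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, key, old, _ = rec
            data[key] = old
    return data


logged_write("T1", "x", 5)
log.append(("commit", "T1"))
logged_write("T2", "y", 9)
logged_write("T2", "x", 7)
# Simulated crash here: T2 never reached its commit record.

recovered = recover(data, log)
print(recovered)
```

T1's write survives because its commit record is in the log; both of T2's writes are rolled back using the old values stored with each write record.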
DBMS Structure
• Layered architecture, each layer only aware of layer below it
[Figure: DBMS layers, listed in order: Query optimization & execution; Relational operators; Files and access methods; Buffer management; Disk space management; DB. Also shown: Recovery manager; Concurrency control (Transaction manager, Lock manager).]
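The layering idea, where each layer talks only to the layer directly below it, can be sketched with three toy classes. The class names, the comma-separated page format, and the in-memory "disk" are all assumptions for illustration; real DBMS layers are far more involved.

```python
class DiskSpaceManager:
    """Bottom layer: stores raw pages (a dict stands in for the disk)."""

    def __init__(self):
        self.pages = {}
        self.reads = 0

    def write_page(self, page_id, data):
        self.pages[page_id] = data

    def read_page(self, page_id):
        self.reads += 1
        return self.pages[page_id]


class BufferManager:
    """Middle layer: caches pages in memory; talks only to the disk layer."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def get_page(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.disk.read_page(page_id)
        return self.cache[page_id]


class RecordFile:
    """Upper layer: turns pages into records; talks only to the buffer layer."""

    def __init__(self, buffer):
        self.buffer = buffer

    def records(self, page_id):
        return self.buffer.get_page(page_id).split(",")


disk = DiskSpaceManager()
disk.write_page(0, "ada,grace")
heap = RecordFile(BufferManager(disk))
first = heap.records(0)
second = heap.records(0)  # served from the buffer, so no second disk read
print(first, disk.reads)
```

RecordFile never touches the disk object directly, so the disk layer could be replaced without changing the record-level code.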
When and Where
• Charles Bachman designed the Integrated Data Store at General Electric in the 1960s
• The network data model, a tree-based representation designed for exploration rather than querying
• Turing Award 1973
When and Where
• Edgar Codd proposed relational data model in 1970 at IBM
• Quickly became the basis of commercial systems; strong theoretical foundation developed
• Turing Award 1981
When and Where
• Jim Gray made fundamental contributions to transaction management in the 80s and 90s
• Allowed DBMSs to scale to huge applications with thousands or millions of users
• Turing Award 1998
Summary
• DBMS used to maintain and query large amounts of data
• They allow concurrent access, recovery from failure, fast application development, security
• Levels of abstraction mean that one can work on one subproblem without knowing about others
• Huge industry and huge research area in CS