1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.

1

CS 430Database Theory

Winter 2005

Lecture 16: Inside a DBMS

2

Topics

Phyical Storage Indexing Query Optimization Making ACID Work

Transactions Concurrency Journaling Rollback/Rollforward Recovery

Distributed Database

3

Physical Storage

Storage Hierarchy Main memory Secondary Storage (Disk) Maybe: Tape, CD, …

Generally: Databases are too big for main memory Databases require non-volatile storage

I.e. not main memory

Buffer Management: Moving data between levels in the storage hierarchy

4

Secondary Storage Organization Database is stored on

One or more disk files One or more disks

Database can be One file One or more files per table A collection of files

DBA specifies how data is spread among files

5

Database Files

Files are organized into blocks or pages Records are stored in pages

May require that records fit in single pages May allow records to span multiple pages Records can be fixed or variable length

Usually variable length Need to handle VARCHAR, BLOBS

See DBMS documentation for: Available strategies for DBMS How to compute record sizes and file sizes

Good rule of thumb: Database size is twice the raw data size

6

Record Organization

Records may be organized In insertion order Sorted by primary key Hashed by primary key

Records may be clustered Records of different types sharing some key are stored

together E.g. store Order and Order Line records together

Best Record Organization? For small problems: Use DBMS default For large problems: Based on characteristics of problem

7

Indexing

Most common indexing scheme: B Tree or B+ Tree B Tree: Balanced Tree, tree has constant depth Can be used for keys and non-unique indices Allows retrieval based on key in O(log n) time (n is

number of records in table) As many I/Os as there are levels in tree (+/- 1)

Allows retrieval of records based on partial value E.g. First field of two field key, but not second field

Allows retrieval of records in sorted order

8

Indexing

Most common second choice: Hashing Can be used for keys and non-unique indices Retrieve records in O(1) time

Typical retrievals in one or two I/Os Can’t use partial values, doesn’t help sorting Sometimes used for clustering of multiple records

Other choices Bit maps Linear orderings Two-level trees

9

What and How to Index

DBMS will impose some constraints Typical: Any unique key (includes primary key) must be

indexed Additional indices:

Make retrieval faster Make update slower

Strategy: Start with minimal set of indices and build up from there Adding and dropping indices has relatively small cost in an

RDBMS

10

Query Optimization Take a query (say an SQL Select)

Rewrite it into Relational Algebra Figure out best way to answer that query

Lowest cost?, Fastest? Frequently, what is best way to do joins

Have collection of available strategies File scans, Use indices, External file sorts, Parallel

processing, etc., etc. Most RDBMSs have mechanism to describe the

strategy for a specific query Use to analyze problematic queries

11

Heuristic Query Optimization

Use heuristics to choose best approach Sample heuristics:

Use an index if possible Do joins in order given in FROM clause

Problem: Bad choices can be really bad Advantage: Usually allow knowledgeable

user to get good behavior by specifying query the “right” way Problem: The “right” way may change over time

and/or as database grows

12

Cost Based Query Optimization Estimate cost of specific strategies Search space of possible strategies for “best” answer Issue: Need accurate cost estimation

Requires statistics about size, composition of database DBA responsible for periodically running statistics gathering

and updating programs Advantage: Strategy can change as database

changes Advantage: Bad choices are usually not too bad Problem: Harder for knowledgeable user to control

problematic queries

13

ACID Properties

Atomicity: Transactions either complete successfully or have no effect on database

Consistency: Database moves from consistent state to consistent state

Isolation: Transactions that overlap in time are non-interfering Ideal is Serializability: Overlapping transactions behave as

if they were executed in some serial order Durability: Data from completed transactions is

never lost

14

Transactions

Queries, and most importantly updates, by a single application are collected together into a transaction

DBMS provides transaction mechanism Application specifies transaction boundaries Example:

Adding an order is a transaction Insertion of Order and Order Line records are

collected together

15

Concurrency

DBMS must provide mechanisms to prevent multiple transactions from concurrently modifying same parts of database

DBMS can provide mechanisms for “repeatable reads” Does application get same results if SELECT is

repeated Due to performance issues, application usually

has to ask for repeatable reads

16

Some Concurrency Mechanisms Locks

Read and write locks on records and/or pages and/or tables

Lock escalation Read locks become write locks Record and/or page locks become file locks

Optimistic Assume everything is ok Check at transaction completion that everything worked

Versioned pages Updates create new versions of pages Old versions kept around as long as needed

17

Concurrency and Transactions Have mechanism for application to tell DBMS

to begin and end transactions Must have mechanism for DBMS to tell

application that “it didn’t work” “You can’t update that record because another

application has it locked” “Your transaction can’t complete because another

already completed one conflicts” Application is responsible for re-trying as

appropriate

18

Journaling DBMS keeps journal(s) of what has been changed

in the database Journal has

“Before” images: What the database looked like before changes were made

“After” images: What the database looked like after the changes were made

Before and After images may be stored separately or together

Depending on algorithms: Data and/or Journal must be written to non-volatile storage before transaction is complete Needed for durability

DBA is responsible for journal management

19

Rollback/Rollforward

Rollback undoes updates from incomplete transactions Uses “before” images Why?

Transaction could not finish Transaction failed before End Transaction Database failed before transaction completed

Rollforward redoes updates from completed transactions Uses “after” images Why?

Database failed before updates from completed transactions written to secondary storage

20

Recovery

On restart after failure: Perform rollback and/or rollforward as required to return database to consistent state

Consistent state: All updates from all completed transactions

appear in database No results from any incomplete transactions

appear in database Provide ability to restore database from a

saved copy and the journal

21

ACID Properties and Mechanisms

Atom

icity

Co

nsiste

ncy

Isolation

Du

rability

Transactions

Concurrency

Journaling

Rollback /

Rollforward

Recovery

DBMS provides mechanisms to support ACID properties

Applications must direct DBMS properly

Algorithms for ACID mechanisms are well known

Implementing ACID mechanisms is tricky

22

Distributed Database

Replication Replicate changes between multiple copies of the database Frequently uses deferred copying

Clustering Running database on a cluster Needs distributed concurrency mechanisms

Two phase commit: Reliable transaction commit in distributed environment

Would like to take advantage of parallel resources to speed query processing

Date post:	03-Jan-2016
Category:	Documents
Upload:	sylvia-blair
View:	214 times
Download:	0 times

1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.

Documents