Date post: | 03-Jan-2016 |
Upload: | sylvia-blair |
2
Topics
Physical Storage
Indexing
Query Optimization
Making ACID Work
  Transactions
  Concurrency
  Journaling
  Rollback/Rollforward
  Recovery
Distributed Database
3
Physical Storage
Storage Hierarchy
  Main memory
  Secondary Storage (Disk)
  Maybe: Tape, CD, …
Generally:
  Databases are too big for main memory
  Databases require non-volatile storage
    I.e. not main memory
Buffer Management: Moving data between levels in the storage hierarchy
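As a rough illustration of buffer management, the sketch below keeps a fixed number of pages in memory and evicts the least recently used page when a new one must be read from disk. The slides do not name a replacement policy; LRU, the `BufferPool` class, and the `read_page_from_disk` callback are assumptions made for this example.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: caches up to `capacity` pages, evicting by LRU (an assumed policy)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical disk-read callback
        self.pages = OrderedDict()  # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # hit: mark as most recently used
            return self.pages[page_id]
        # Miss: make room if the pool is full, then fetch from secondary storage.
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # drop the least recently used page
        self.disk_reads += 1
        page = self.read_page_from_disk(page_id)
        self.pages[page_id] = page
        return page

pool = BufferPool(capacity=2, read_page_from_disk=lambda pid: f"contents of page {pid}")
for pid in [1, 2, 1, 3, 2]:
    pool.get_page(pid)
```

The sketch only handles reads. A real buffer manager would also have to write modified pages back to disk before evicting them.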
4
Secondary Storage Organization
Database is stored on
  One or more disk files
  One or more disks
Database can be
  One file
  One or more files per table
  A collection of files
DBA specifies how data is spread among files
5
Database Files
Files are organized into blocks or pages
Records are stored in pages
  May require that records fit in single pages
  May allow records to span multiple pages
Records can be fixed or variable length
  Usually variable length
  Need to handle VARCHAR, BLOBS
See DBMS documentation for:
  Available strategies for DBMS
  How to compute record sizes and file sizes
Good rule of thumb: Database size is twice the raw data size
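The arithmetic behind a file-size estimate can be sketched as follows for the simplest case: fixed-length records that must fit in single pages. The page size, page header size, and record size used here are illustrative assumptions, not figures from any particular DBMS.

```python
import math

def estimate_file_pages(num_records, record_bytes, page_bytes=8192, page_header_bytes=96):
    """Pages needed when fixed-length records must fit in single pages (no spanning).

    page_bytes and page_header_bytes are assumed values for illustration.
    """
    records_per_page = (page_bytes - page_header_bytes) // record_bytes
    if records_per_page == 0:
        raise ValueError("record does not fit in one page")
    return math.ceil(num_records / records_per_page)

pages = estimate_file_pages(num_records=1_000_000, record_bytes=200)
file_bytes = pages * 8192          # space occupied on disk
raw_bytes = 1_000_000 * 200        # raw data size
```

With these numbers, 40 records fit in each page, so one million records need 25,000 pages. The sketch counts only per-page overhead for a single table and ignores indices, journals, and free space.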
6
Record Organization
Records may be organized
  In insertion order
  Sorted by primary key
  Hashed by primary key
Records may be clustered
  Records of different types sharing some key are stored together
  E.g. store Order and Order Line records together
Best Record Organization?
  For small problems: Use DBMS default
  For large problems: Based on characteristics of problem
7
Indexing
Most common indexing scheme: B Tree or B+ Tree
  B Tree: Balanced Tree, all leaves are at the same depth
  Can be used for keys and non-unique indices
  Allows retrieval based on key in O(log n) time (n is number of records in table)
    As many I/Os as there are levels in tree (+/- 1)
  Allows retrieval of records based on partial value
    E.g. First field of two field key, but not second field
  Allows retrieval of records in sorted order
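The partial-value and sorted-order properties come from the index being ordered by key. The sketch below uses a sorted Python list with binary search as a stand-in for the leaf level of a B+ tree; it is not a B tree implementation, and the sample names and record ids are made up.

```python
import bisect

# Index entries: ((last_name, first_name), record_id), kept sorted by key.
index = sorted([
    (("Smith", "Ann"), 101),
    (("Jones", "Bob"), 102),
    (("Smith", "Carl"), 103),
    (("Adams", "Dee"), 104),
])
keys = [key for key, _ in index]

def lookup_by_first_field(last_name):
    """Binary-search to the first matching key, then scan forward while it matches."""
    start = bisect.bisect_left(keys, (last_name,))
    result = []
    for key, record_id in index[start:]:
        if key[0] != last_name:
            break
        result.append(record_id)
    return result

smiths = lookup_by_first_field("Smith")
in_key_order = [record_id for _, record_id in index]
```

A lookup on the second field alone (first name) gets no help from this ordering and would have to examine every entry.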
8
Indexing
Most common second choice: Hashing
  Can be used for keys and non-unique indices
  Retrieve records in O(1) time
    Typical retrievals in one or two I/Os
  Can’t use partial values, doesn’t help sorting
  Sometimes used for clustering of multiple records
Other choices
  Bit maps
  Linear orderings
  Two-level trees
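A minimal in-memory sketch of a non-unique hash index, using a Python dictionary in place of on-disk hash buckets. The records and the `city_index` name are invented for illustration.

```python
from collections import defaultdict

records = {
    101: {"name": "Ann", "city": "Boston"},
    102: {"name": "Bob", "city": "Denver"},
    103: {"name": "Carl", "city": "Boston"},
}

# Non-unique hash index on `city`: key value -> list of record ids.
city_index = defaultdict(list)
for record_id, record in records.items():
    city_index[record["city"]].append(record_id)

boston_ids = city_index.get("Boston", [])  # exact-match lookup
prefix_ids = city_index.get("Bos", [])     # a partial value matches nothing
```

The exact-match lookup goes straight to the matching bucket, but the hash gives no way to find keys by prefix or to walk them in sorted order.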
9
What and How to Index
DBMS will impose some constraints
  Typical: Any unique key (includes primary key) must be indexed
Additional indices:
  Make retrieval faster
  Make update slower
Strategy:
  Start with minimal set of indices and build up from there
  Adding and dropping indices has relatively small cost in an RDBMS
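Adding and dropping an index is typically a single statement each. The sketch below uses SQLite through Python's `sqlite3` module as one concrete RDBMS; the `orders` table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, placed_on TEXT)")

def index_names(conn):
    """Names of explicitly created indices, read from SQLite's catalog table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL"
    )
    return [name for (name,) in rows]

# Add an index once a query on customer_id proves slow ...
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after_create = index_names(conn)

# ... and drop it again if it costs more on updates than it saves on reads.
conn.execute("DROP INDEX idx_orders_customer")
after_drop = index_names(conn)
```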
10
Query Optimization
Take a query (say an SQL Select)
  Rewrite it into Relational Algebra
  Figure out best way to answer that query
    Lowest cost?, Fastest?
    Frequently, what is best way to do joins
Have collection of available strategies
  File scans, Use indices, External file sorts, Parallel processing, etc., etc.
Most RDBMSs have mechanism to describe the strategy for a specific query
  Use to analyze problematic queries
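In SQLite, the mechanism for describing a query's strategy is `EXPLAIN QUERY PLAN`; other systems have their own equivalents. The sketch below compares the plan for the same query before and after an index exists. The table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so it is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

def plan(conn, sql):
    """Return the human-readable detail column of each plan row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM orders WHERE customer_id = 42"

plan_without_index = plan(conn, query)   # expected to describe a table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(conn, query)      # expected to mention the new index

print(plan_without_index)
print(plan_with_index)
```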
11
Heuristic Query Optimization
Use heuristics to choose best approach
Sample heuristics:
  Use an index if possible
  Do joins in order given in FROM clause
Problem: Bad choices can be really bad
Advantage: Usually allow knowledgeable user to get good behavior by specifying query the “right” way
Problem: The “right” way may change over time and/or as database grows
12
Cost Based Query Optimization
Estimate cost of specific strategies
Search space of possible strategies for “best” answer
Issue: Need accurate cost estimation
  Requires statistics about size, composition of database
  DBA responsible for periodically running statistics gathering and updating programs
Advantage: Strategy can change as database changes
Advantage: Bad choices are usually not too bad
Problem: Harder for knowledgeable user to control problematic queries
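A toy version of the idea: estimate the page I/Os for two candidate strategies and pick the cheaper one. The cost formulas are deliberately crude assumptions (one I/O per page scanned, one I/O per index level plus one per matching row), and `matching_rows` stands in for what the gathered statistics would supply.

```python
import math

def choose_plan(num_rows, rows_per_page, matching_rows, index_levels):
    """Pick the cheaper of a full file scan and an index lookup, counting page I/Os."""
    scan_cost = math.ceil(num_rows / rows_per_page)  # read every page once
    index_cost = index_levels + matching_rows        # descend tree, then one I/O per match
    if index_cost < scan_cost:
        return ("index", index_cost)
    return ("scan", scan_cost)

# Same table, two predicates with very different selectivity.
selective = choose_plan(num_rows=1_000_000, rows_per_page=40,
                        matching_rows=10, index_levels=3)
unselective = choose_plan(num_rows=1_000_000, rows_per_page=40,
                          matching_rows=500_000, index_levels=3)
```

The chosen plan flips from index lookup to file scan as the estimated number of matching rows grows, which is why stale statistics can lead to poor plans.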
13
ACID Properties
Atomicity: Transactions either complete successfully or have no effect on database
Consistency: Database moves from consistent state to consistent state
Isolation: Transactions that overlap in time are non-interfering
  Ideal is Serializability: Overlapping transactions behave as if they were executed in some serial order
Durability: Data from completed transactions is never lost
14
Transactions
Queries, and most importantly updates, by a single application are collected together into a transaction
DBMS provides transaction mechanism
Application specifies transaction boundaries
Example:
  Adding an order is a transaction
  Insertions of Order and Order Line records are collected together
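The Order and Order Line example might look like this with SQLite and Python's `sqlite3` module, where a `with conn:` block marks the transaction boundaries: it commits if the block finishes and rolls back if an exception escapes. The table layout is an assumption made for the sketch.

```python
import sqlite3

def add_order(conn, order_id, items):
    """Insert one Order row and its Order Line rows as a single transaction."""
    with conn:  # commit on success, roll back if any insert fails
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        for line_no, item in enumerate(items, start=1):
            conn.execute(
                "INSERT INTO order_line (order_id, line_no, item) VALUES (?, ?, ?)",
                (order_id, line_no, item),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_line (order_id INTEGER, line_no INTEGER, item TEXT NOT NULL,"
    " PRIMARY KEY (order_id, line_no))"
)
conn.commit()

add_order(conn, 1, ["widget", "gadget"])       # both tables updated together
try:
    add_order(conn, 2, ["sprocket", None])     # second line violates NOT NULL
except sqlite3.IntegrityError:
    pass                                       # order 2 leaves no trace at all

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
line_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```

The failed second order shows atomicity: its Order row and first Order Line row were inserted before the error, yet neither survives the rollback.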
15
Concurrency
DBMS must provide mechanisms to prevent multiple transactions from concurrently modifying same parts of database
DBMS can provide mechanisms for “repeatable reads”
  Does the application get the same results if a SELECT is repeated?
Due to performance issues, application usually has to ask for repeatable reads
16
Some Concurrency Mechanisms
Locks
  Read and write locks on records and/or pages and/or tables
  Lock escalation
    Read locks become write locks
    Record and/or page locks become file locks
Optimistic
  Assume everything is ok
  Check at transaction completion that everything worked
Versioned pages
  Updates create new versions of pages
  Old versions kept around as long as needed
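The optimistic approach can be sketched with a version number per record: a transaction remembers the version it read, and its commit succeeds only if that version is still current. The `VersionedStore` class and `ConflictError` are hypothetical names, and a real DBMS would check a whole read set rather than a single key.

```python
class ConflictError(Exception):
    """Raised when another transaction changed the record first."""

class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        """Write only if the record still has the version the caller read."""
        current_version, _ = self.data.get(key, (0, None))
        if current_version != read_version:
            raise ConflictError(f"{key!r} changed since version {read_version}")
        self.data[key] = (current_version + 1, new_value)

store = VersionedStore()
store.commit("stock", 0, 10)                 # initial value, becomes version 1

v_a, stock_a = store.read("stock")           # transaction A reads version 1
v_b, stock_b = store.read("stock")           # transaction B reads version 1 too
store.commit("stock", v_a, stock_a - 1)      # A commits first: version 2
try:
    store.commit("stock", v_b, stock_b - 3)  # B's completion check fails
    b_committed = True
except ConflictError:
    b_committed = False
```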
17
Concurrency and Transactions
Have mechanism for application to tell DBMS to begin and end transactions
Must have mechanism for DBMS to tell application that “it didn’t work”
  “You can’t update that record because another application has it locked”
  “Your transaction can’t complete because another, already completed transaction conflicts with it”
Application is responsible for re-trying as appropriate
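On the application side, re-trying can be as simple as a bounded loop around the transaction. In this sketch `TransactionConflict` is a hypothetical exception standing in for whatever error a given DBMS driver raises, and the retry limit and delay are arbitrary.

```python
import time

class TransactionConflict(Exception):
    """Hypothetical error standing in for the DBMS's 'it didn't work' signal."""

def run_with_retry(transaction, max_attempts=3, delay_seconds=0.0):
    """Call `transaction()` until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except TransactionConflict:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            time.sleep(delay_seconds)

attempts = []

def flaky_transaction():
    """Simulated transaction that is blocked twice, then goes through."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransactionConflict("record locked by another application")
    return "committed"

result = run_with_retry(flaky_transaction)
```

Whether a retry is appropriate depends on the application: the transaction must be safe to run again from the start.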
18
Journaling
DBMS keeps journal(s) of what has been changed in the database
Journal has
  “Before” images: What the database looked like before changes were made
  “After” images: What the database looked like after the changes were made
  Before and After images may be stored separately or together
Depending on algorithms: Data and/or Journal must be written to non-volatile storage before transaction is complete
  Needed for durability
DBA is responsible for journal management
19
Rollback/Rollforward
Rollback undoes updates from incomplete transactions
  Uses “before” images
  Why?
    Transaction could not finish
    Transaction failed before End Transaction
    Database failed before transaction completed
Rollforward redoes updates from completed transactions
  Uses “after” images
  Why?
    Database failed before updates from completed transactions written to secondary storage
20
Recovery
On restart after failure: Perform rollback and/or rollforward as required to return database to consistent state
Consistent state:
  All updates from all completed transactions appear in database
  No results from any incomplete transactions appear in database
Provide ability to restore database from a saved copy and the journal
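Restart recovery from a journal of before and after images can be sketched as below. The journal record layout is hypothetical, the database is a plain dictionary, and the sketch assumes that no two in-flight transactions updated the same key (as locking would ensure).

```python
def recover(database, journal):
    """Replay a journal after a crash: redo completed work, undo the rest."""
    committed = {entry["txn"] for entry in journal if entry["op"] == "commit"}

    # Rollforward: reapply "after" images of completed transactions, oldest first.
    for entry in journal:
        if entry["op"] == "update" and entry["txn"] in committed:
            database[entry["key"]] = entry["after"]

    # Rollback: restore "before" images of incomplete transactions, newest first.
    for entry in reversed(journal):
        if entry["op"] == "update" and entry["txn"] not in committed:
            database[entry["key"]] = entry["before"]
    return database

# State found on disk after the crash: T1's committed update to "a" never
# reached disk, while T2's uncommitted update to "b" did.
on_disk = {"a": 1, "b": 99}
journal = [
    {"txn": "T1", "op": "update", "key": "a", "before": 1, "after": 2},
    {"txn": "T2", "op": "update", "key": "b", "before": 20, "after": 99},
    {"txn": "T1", "op": "commit"},
]
recovered = recover(on_disk, journal)
```

After recovery the database holds T1's update and none of T2's, which matches the definition of a consistent state given above.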
21
ACID Properties and Mechanisms
Table: mechanisms (rows) against ACID properties (columns)
  Columns: Atomicity, Consistency, Isolation, Durability
  Rows: Transactions, Concurrency, Journaling, Rollback/Rollforward, Recovery
DBMS provides mechanisms to support ACID properties
Applications must direct DBMS properly
Algorithms for ACID mechanisms are well known
Implementing ACID mechanisms is tricky
22
Distributed Database
Replication
  Replicate changes between multiple copies of the database
  Frequently uses deferred copying
Clustering
  Running database on a cluster
  Needs distributed concurrency mechanisms
Two phase commit: Reliable transaction commit in distributed environment
Would like to take advantage of parallel resources to speed query processing
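The core of two phase commit can be sketched in a few lines: the coordinator first asks every participant to prepare, and commits only if all of them vote yes. The `Participant` class and its `can_commit` flag are hypothetical, and the sketch leaves out timeouts, journaling of votes, and coordinator failure, which make up most of a real protocol.

```python
class Participant:
    """One database node; `can_commit` simulates whether its local work succeeded."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # Phase 1: vote yes only if this node can guarantee the commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: tell everyone to commit
            p.commit()
        return True
    for p in participants:                       # phase 2: tell everyone to abort
        p.abort()
    return False

ok_nodes = [Participant("east"), Participant("west")]
mixed_nodes = [Participant("east"), Participant("west", can_commit=False)]
ok_result = two_phase_commit(ok_nodes)
mixed_result = two_phase_commit(mixed_nodes)
```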