Date post: 27-Dec-2015
Database System Architecture and Performance CSCI 6442 ©Copyright 2015, David C. Roberts, all rights reserved
Database System Architecture and Performance

CSCI 6442

©Copyright 2015, David C. Roberts, all rights reserved

Agenda

Database performance goals

DBMS use of disk

Searching

B-trees

DBMS Architecture

DBMS Architecture

Data is stored on disk

Disk is necessary for database to be reliably available

Disk is millions of times slower than anything that happens in RAM

Number of disk accesses is a good measure of DBMS cost for an operation

Disk

o Disk is composed of fixed-length records, rotating around

o To access information, we need to move the head and wait for the disk to rotate

o We wait the same time whether we use one byte or all the record

o We call this fixed length record a page

Efficient Use of Disk

For efficient use of disk, we want to use all the information contained in a single page

We will look at how we organize disk in order to reduce the number of disk accesses for a search

Disk vs. RAM

RAM is accessible in any order

Any sort of structures can be used

Data structure courses usually cover data structures for RAM

We’ll talk about how to make efficient use of disk

Disk as Pages

Disk is composed of fixed-length records, rotating around

To access information, we need to move the head and wait for the disk to rotate

We wait the same time whether we use one byte or all the record

We call this fixed length record a page

Search Methods

Linear search

Binary search

Binary tree-structured search

N-ary trees

B-trees

Hashing

Linear Search

Elements are stored in arrival order

Search starts at the beginning, continues until desired value is found

Average number of accesses for n elements is approximately n/2

Binary Search

Elements are stored in order by value to be searched

Search starts at midpoint

With each probe, half of candidates are removed

Average number of probes is approximately log2 n
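
The access counts for linear search and binary search can be checked with a short Python sketch. It is only an illustration: the function names and the 1,024-element test list are assumptions made for this example, and one "access" is counted as one comparison against a stored element.

```python
def linear_search(items, target):
    """Scan items in stored order; return (index, number of accesses)."""
    accesses = 0
    for i, value in enumerate(items):
        accesses += 1
        if value == target:
            return i, accesses
    return -1, accesses


def binary_search(sorted_items, target):
    """Probe the midpoint and discard half of the candidates each time."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


n = 1024  # assumed size, chosen so that log2(n) is a whole number
data = list(range(n))
avg_linear = sum(linear_search(data, t)[1] for t in data) / n
avg_binary = sum(binary_search(data, t)[1] for t in data) / n
worst_binary = max(binary_search(data, t)[1] for t in data)
print(avg_linear, avg_binary, worst_binary)
```

For this list the linear average is close to n/2, while the binary average stays near log2 n and never exceeds log2 n + 1 probes.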

Disadvantages of Binary Search

Elements must be kept in order

Inserting one element may require reorganization of entire list

If stored on disk, search jumps from bucket to bucket

Using Linked Structure for Binary Search

Using links we can separate physical organization from search sequence

Avoids possible need to reorganize the entire store because of a single update

Accelerates update, still allows fast search

Example Binary Search Tree

Problems with Binary Search Tree

Each node is likely to be on a different page, making inefficient use of disk accesses

What if, instead of just one key at each node, we could store a whole page full of keys?

Then we would use disk efficiently and have a very shallow tree
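
How shallow the tree becomes can be estimated with a small calculation. The sketch below is an assumption-laden illustration, not a measurement: the function name, the one-million-key table, and the figure of 100 keys per page are all hypothetical. It counts the levels a completely full tree needs, where each level costs one disk access in a search.

```python
def levels_needed(num_keys, keys_per_node):
    """Smallest number of levels such that a full tree holds num_keys keys.

    A node with k keys has k + 1 children, so a full tree with `levels`
    levels holds (k + 1) ** levels - 1 keys.
    """
    fanout = keys_per_node + 1
    levels = 0
    capacity = 0
    while capacity < num_keys:
        levels += 1
        capacity = fanout ** levels - 1
    return levels


binary_levels = levels_needed(1_000_000, 1)    # one key per node
paged_levels = levels_needed(1_000_000, 100)   # assumed 100 keys per page
print(binary_levels, paged_levels)
```

Under these assumptions a binary tree needs 20 levels for a million keys, while a tree with a page full of keys at each node needs 3.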

Balance

A tree is said to be balanced if the lengths of all the paths from the root to the leaves differ by no more than one.
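
This definition of balance can be tested directly. The following Python sketch assumes a hypothetical representation in which a node is a tuple `(key, left, right)` and a missing child is `None`; the function names are likewise invented for the example.

```python
def leaf_depths(node, depth=0):
    """Yield the path length from the root to every leaf below node."""
    key, left, right = node
    if left is None and right is None:
        yield depth
        return
    for child in (left, right):
        if child is not None:
            yield from leaf_depths(child, depth + 1)


def is_balanced(root):
    """True if root-to-leaf path lengths differ by no more than one."""
    depths = list(leaf_depths(root))
    return max(depths) - min(depths) <= 1


balanced_tree = (5, (3, (1, None, None), None), (8, None, None))
skewed_tree = (5, (3, (2, (1, None, None), None), None), (8, None, None))
print(is_balanced(balanced_tree), is_balanced(skewed_tree))
```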

B-tree

We allow nodes to be incompletely filled in order to maintain perfect balance

We grow the tree from the bottom; when a node is over-full we split it and put an added node one level up

Deletions are the reverse of additions

B-tree

Data Store

We understand that with each entry there is an address in storage. Having understood that, we omit these addresses from the rest of the diagrams.

B-tree

1

B-tree

1 4

B-tree

1 4 6

B-tree

1 4 6 8

B-tree

5

1 4 6 8

B-tree

        5

  1 4       6 8

When a node is full, to add a value we split the node and put the middle value in the level above.
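
The split shown in this example can be sketched in a few lines of Python. This is a simplified illustration that handles a single node only, not a full B-tree: the function name is hypothetical, and the capacity of four values per node is taken from the example above.

```python
import bisect

MAX_KEYS = 4  # assumption: a node holds at most four values, as in the example


def insert_into_node(keys, new_key):
    """Insert new_key into one sorted node.

    Returns (keys, None, None) if the value fits, or (left, middle, right)
    if the node overflows: the middle value moves to the level above and
    the remaining values are divided between two nodes.
    """
    keys = sorted(keys)
    bisect.insort(keys, new_key)
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


left, middle, right = insert_into_node([1, 4, 6, 8], 5)
print(left, middle, right)
```

Adding 5 to the full node 1 4 6 8 yields the left node 1 4, the right node 6 8, and the value 5 for the level above, matching the diagram.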

How It Really Looks

B-tree questions

How large should node size be? How many values should it contain?

Are the values indexed by a B-tree properly called keys?

How full are B-tree nodes, on average, after the system has been operating for a while?

B-plus tree

B+ trees have all indexed values represented in the leaves

Other nodes do not have pointers to rows, only pointers to other nodes

B+ trees provide very high density of indexes

B+ tree

Index Set

Sequence Set

B+ Tree Add Algorithm

The insert algorithm for B+ trees

Leaf page full: NO; index page full: NO
  Place the record in sorted position in the appropriate leaf page.

Leaf page full: YES; index page full: NO
  1. Split the leaf page.
  2. Place the middle key in the index page in sorted order.
  3. Left leaf page contains records with keys below the middle key.
  4. Right leaf page contains records with keys equal to or greater than the middle key.

Leaf page full: YES; index page full: YES
  1. Split the leaf page.
  2. Records with keys < middle key go to the left leaf page.
  3. Records with keys >= middle key go to the right leaf page.
  4. Split the index page.
  5. Keys < middle key go to the left index page.
  6. Keys > middle key go to the right index page.
  7. The middle key goes to the next (higher level) index.
  If the next level index page is full, continue splitting the index pages.
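
The leaf-split case (leaf page full, index page not full) might look like the following Python sketch. It is an illustration under stated assumptions: the names, the leaf capacity of four keys, and the starting index page `[20]` are hypothetical, keys are assumed distinct, and splitting a full index page is not covered. Because keys equal to the middle key go to the right leaf, the middle key stays in a leaf and only a copy is placed in the index page.

```python
import bisect

LEAF_CAPACITY = 4  # hypothetical maximum number of keys in a leaf page


def insert_into_leaf(leaf_keys, new_key):
    """Insert new_key into a sorted leaf page of distinct keys.

    Returns (keys, None, None) if the key fits, otherwise
    (left_leaf, middle_key, right_leaf) after splitting the leaf.
    """
    keys = list(leaf_keys)
    bisect.insort(keys, new_key)
    if len(keys) <= LEAF_CAPACITY:
        return keys, None, None
    mid = len(keys) // 2
    middle_key = keys[mid]
    left_leaf = keys[:mid]    # keys below the middle key
    right_leaf = keys[mid:]   # keys equal to or greater than the middle key
    return left_leaf, middle_key, right_leaf


index_keys = [20]  # hypothetical index page that still has room
left_leaf, middle_key, right_leaf = insert_into_leaf([1, 4, 6, 8], 5)
if middle_key is not None:
    bisect.insort(index_keys, middle_key)  # place middle key in sorted order
print(left_leaf, right_leaf, index_keys)
```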

B+ Tree Delete Algorithm

The delete algorithm for B+ trees

Leaf page below fill factor: NO; index page below fill factor: NO
  Delete the record from the leaf page. Arrange keys in ascending order to fill the void. If the key of the deleted record appears in the index page, use the next key to replace it.

Leaf page below fill factor: YES; index page below fill factor: NO
  Combine the leaf page and its sibling. Change the index page to reflect the change.

Leaf page below fill factor: YES; index page below fill factor: YES
  1. Combine the leaf page and its sibling.
  2. Adjust the index page to reflect the change.
  3. Combine the index page with its sibling.
  Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.

Hashing

Develop a function that maps data values into a range of storage addresses

For each search value, use a function to compute a hash value and store the associated data at that address

To search, just compute the hash value and look at that address

Hashing

Instead of storing the data at the hash address, store a pointer to the data

The table of pointers is called a hash table

Using hashing for a search locates a stored value in just one access (ignoring collisions)

Number of accesses to locate a value is independent of n
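
A hash table of pointers can be sketched as follows. Everything named here is an assumption for illustration: the table size of eight slots, the character-sum hash function, and the in-memory list standing in for the data store. The sketch deliberately refuses to handle collisions so that the one-access lookup stays visible.

```python
TABLE_SIZE = 8  # hypothetical number of slots in the hash table

rows = []                         # stands in for the data store
hash_table = [None] * TABLE_SIZE  # table of pointers (positions in rows)


def hash_address(key):
    """Map a key to a slot number using a simple sum of character codes."""
    return sum(ord(ch) for ch in str(key)) % TABLE_SIZE


def store(key, data):
    """Store the data and record a pointer to it at the key's hash address."""
    slot = hash_address(key)
    if hash_table[slot] is not None:
        raise ValueError("collision: this sketch does not resolve collisions")
    rows.append((key, data))
    hash_table[slot] = len(rows) - 1


def lookup(key):
    """Compute the hash value and follow the pointer found at that address."""
    pointer = hash_table[hash_address(key)]
    if pointer is None:
        return None
    stored_key, data = rows[pointer]
    return data if stored_key == key else None


store("ann", "row for ann")
store("bob", "row for bob")
print(lookup("ann"), lookup("cat"))
```

Note that `lookup` can only answer "is this exact key present"; it offers no way to walk keys in order.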

Hashing Question

Why are B-trees the most used index method for database systems and not hashing, given that hashing is faster?

Hint—think about the disadvantages of hashing

Database System Architecture

DBMS and Applications

[Diagram: four Application Programs, each with its own Buffer, and the Database Management System]

DBMS Software Architecture

[Diagram: four Application Programs, each with its own Buffer, the System Global Area, and the Database System]

Database System Architecture

[Diagram: SQL -> Lexical Analyzer -> Tokens -> Syntax Analyzer -> Quads -> Executor -> Results]

Executor Software Architecture

SQL Executor

Table Management

Row Management

Page Management

Node Management

Index Management

Data Store

DBMS Software Architecture

[Diagram: four Application Programs, each with its own Buffer, the System Global Area, and the Database System Code]

Inside the Database System

[Diagram: SQL -> Lexical Analyzer -> Tokens -> Syntax Analyzer, Code Generator -> Quads -> Executor -> Results]

Pages

Disk is divided into physical records called “pages”

A page can be an index page (i.e., part of a B-tree) or a data page

Index page contains one node of a b-tree

Data page contains rows of tables

Page Allocation

Pages are initially considered all unallocated

In response to requests, they are allocated and marked allocated

When freed, they are chained onto a list of free pages
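
The allocation scheme described here can be sketched in Python. The class and method names are hypothetical, and one design choice is an assumption of this sketch rather than something the slides state: freed pages are reused before never-allocated pages are handed out.

```python
class PageAllocator:
    """Tracks which pages are allocated and keeps freed pages on a free list."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.next_unallocated = 0  # every page ID from here up is still unallocated
        self.allocated = set()     # pages currently marked allocated
        self.free_list = []        # freed pages, chained for reuse

    def allocate(self):
        if self.free_list:
            page_id = self.free_list.pop()  # assumption: reuse freed pages first
        elif self.next_unallocated < self.total_pages:
            page_id = self.next_unallocated
            self.next_unallocated += 1
        else:
            raise RuntimeError("no pages available")
        self.allocated.add(page_id)
        return page_id

    def free(self, page_id):
        if page_id not in self.allocated:
            raise ValueError("page is not allocated")
        self.allocated.remove(page_id)
        self.free_list.append(page_id)


allocator = PageAllocator(total_pages=3)
first = allocator.allocate()
second = allocator.allocate()
allocator.free(first)
reused = allocator.allocate()
print(first, second, reused)
```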

Database Extents

Database needs to be able to extend over disk boundaries

  Size may require it

  Growth may require it

Typically it’s managed as “extents”, each of which is a file to the OS file system

Multiple files are mapped into a single sequence of page IDs
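
Mapping several extent files into one sequence of page IDs might be done as in the Python sketch below. The file names and page counts are invented for the example, and numbering pages from zero is an assumption.

```python
# Hypothetical extents: (file name known to the OS, number of pages in the file)
EXTENTS = [
    ("extent1.dat", 1000),
    ("extent2.dat", 500),
    ("extent3.dat", 2000),
]


def locate_page(page_id):
    """Translate a database-wide page ID into (file name, page offset in file)."""
    if page_id < 0:
        raise ValueError("page ID must not be negative")
    first_id = 0  # page ID of the first page in the current extent
    for file_name, page_count in EXTENTS:
        if page_id < first_id + page_count:
            return file_name, page_id - first_id
        first_id += page_count
    raise ValueError("page ID is beyond the last extent")


print(locate_page(999), locate_page(1000), locate_page(1500))
```

Growing the database under this scheme only requires appending another entry to the extent list; existing page IDs are unaffected.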

Extents

SQL Executor

Table Management

Row Management

Page Management

Node Management

Index Management

Data Store

Extent Management

The Database

Extent 3

Extent 2

Extent 1

Row

<<tid>,<rid>,<cid>,<cli>,<cv>, … ,<cid>,<cli>,<cv>, … >>

Startup

At startup, DBMS creates an empty system catalog

Catalog has images of some tables; once images are established, then SQL can be used to create other tables

System Catalog

System Catalog tells you how the database system works

When the system starts with a new database, it lays down part of the system catalog from an image

The rest of the system catalog is created by SQL statements

Many SQL statements reference or change the system catalog

Database Performance

Join Processing

For a non-join query, be sure there are indexes on columns used in predicates

Joins are the issue in database performance

We need to understand how they are performed so that we can make them efficient

“Optimization”

More properly called access path selection

“Optimizer” selects a strategy for processing

Approaches:

  Cost-based: estimate total cost to process by different approaches, choose lowest estimate

  Heuristic: use rules to decide how to process

Cost-based is typically used by database systems today

The Optimizer

Selects indexes to use

Chooses the order of using indexes

Chooses algorithms to use

Decides when to apply predicates

Classes of Predicates

Predicate: condition in the WHERE clause

Predicates are combined using AND, OR to make WHERE clauses

Classes of predicates:

  Sargable: search arguments that can be processed close to the data

  Residual: not sargable, such as complex use of nesting

Access Paths

Five possible access paths:

  Table scan

  Non-selective index scan

  Selective index scan

  Index-only access

  Fully qualified unique index

Each of these types of scans has different cost estimates for its use

Predicate Selectivity

Selectivity function f(p): fraction of rows retrieved on average by predicate p

Number of rows retrieved is strongly related to cost

n = number of rows in table

Form of p          f(p)
column = value     1/n
column != value    1 - 1/n (nearly 1)
column > value     (high value - search value) / (high value - low value)
p1 or p2           f(p1) + f(p2) (an upper bound; f(p1) + f(p2) - f(p1) * f(p2) if p1 and p2 are independent)
p1 and p2          f(p1) * f(p2)
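
The selectivity table translates into a few small functions. The Python sketch below uses hypothetical function names and made-up numbers (a 1,000-row table and a column ranging from 0 to 100); it follows the table's formulas as written, including the simple sum for OR.

```python
import math


def sel_equal(n):
    """column = value, for a table of n rows."""
    return 1 / n


def sel_not_equal(n):
    """column != value."""
    return 1 - 1 / n


def sel_greater(search_value, low_value, high_value):
    """column > value, assuming values are spread evenly between low and high."""
    return (high_value - search_value) / (high_value - low_value)


def sel_or(f1, f2):
    """p1 or p2, using the simple sum from the table."""
    return f1 + f2


def sel_and(f1, f2):
    """p1 and p2."""
    return f1 * f2


def estimated_rows(n, selectivity):
    """Expected number of rows retrieved from a table of n rows."""
    return n * selectivity


n = 1000                               # hypothetical table size
f_range = sel_greater(75, 0, 100)      # e.g. score > 75 on a 0..100 column
f_both = sel_and(f_range, sel_equal(n))
print(estimated_rows(n, f_range), estimated_rows(n, f_both))
```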

Join Processing

Cartesian Product: for each row of inner table, inspect join value for every row of outer table. n^2 operations

Nested loop: for each row of inner table, use index to retrieve matching rows of outer table. > 2n operations

Merge join: single pass through indexes on join columns for both tables. 2n operations
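
The three join methods can be compared with a small Python sketch. This is an in-memory illustration under several assumptions: rows are `(join key, value)` tuples, a dictionary stands in for the index on the outer table, join keys are unique, and "operations" are counted as rows or index entries touched rather than real disk accesses. All names are hypothetical.

```python
def cartesian_join(inner, outer):
    """Inspect every outer row for each inner row."""
    operations, result = 0, []
    for i_key, i_val in inner:
        for o_key, o_val in outer:
            operations += 1
            if i_key == o_key:
                result.append((i_key, i_val, o_val))
    return result, operations


def build_index(rows):
    """Dictionary standing in for an index on the join column."""
    index = {}
    for key, value in rows:
        index.setdefault(key, []).append(value)
    return index


def nested_loop_join(inner, outer_index):
    """For each inner row, use the index to retrieve matching outer rows."""
    operations, result = 0, []
    for i_key, i_val in inner:
        operations += 1                       # one index lookup
        for o_val in outer_index.get(i_key, []):
            operations += 1                   # one matching row retrieved
            result.append((i_key, i_val, o_val))
    return result, operations


def merge_join(left_sorted, right_sorted):
    """Single pass over both inputs, each sorted on a unique join key."""
    i = j = operations = 0
    result = []
    while i < len(left_sorted) and j < len(right_sorted):
        l_key, l_val = left_sorted[i]
        r_key, r_val = right_sorted[j]
        if l_key == r_key:
            result.append((l_key, l_val, r_val))
            i, j, operations = i + 1, j + 1, operations + 2
        elif l_key < r_key:
            i, operations = i + 1, operations + 1
        else:
            j, operations = j + 1, operations + 1
    return result, operations


n = 100
inner = [(k, f"inner-{k}") for k in range(n)]
outer = [(k, f"outer-{k}") for k in range(n)]
cart_rows, cart_ops = cartesian_join(inner, outer)
loop_rows, loop_ops = nested_loop_join(inner, build_index(outer))
merge_rows, merge_ops = merge_join(inner, outer)
print(cart_ops, loop_ops, merge_ops)
```

In a real system each index lookup in the nested loop costs more than one access, since it walks the index from the root, which is why that method is listed as more than 2n operations.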

Join Order

For JOIN queries, the “outer” table is accessed first, “inner” second

Order for joining tables must be selected

  Most selective first

  Least costly joins first

Query Transformations

Queries and subqueries may be transformed

We’ll ignore this for now, look at the bigger picture

Database Statistics

System catalog includes various database statistics

  Max, min values

  Cardinality of each table

  Data distribution

Statistics must be updated
