Group Project: B-Tree. Student: Yongsheng Ma

Post on 15-Jan-2016


Group Project

B-Tree

Student: Yongsheng Ma

CS632 – Algorithm

Professor: G. Gibson

B-Tree

Introduction

Operations

Complexities

Applications

Summary

B-Tree Properties

An m-way search tree

The root node may have as few as two children, or none if the tree is empty

The root may be a leaf

Internal nodes have at least ceiling(m/2) and at most m non-null subtrees

B-Tree Properties

All leaf nodes are at the same level; that is, the tree is perfectly balanced.

A leaf node has at least ceiling(m/2)-1 entries (keys) and at most m-1 entries (keys).

B-Tree Properties

The "branching factor" can be quite large.

Each node may have many children, from a handful to thousands.

The keys in each node are in non-decreasing order.
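The properties above can be read as a checklist. The following is a minimal sketch of such a check for a tree of order m, not a definitive implementation: the `Node` class and the `check_btree` name are hypothetical, nodes live in memory rather than on disk pages, and the check does not verify that each subtree's keys fall between the separating keys of its parent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory node: sorted keys plus children (empty for a leaf)."""
    keys: list
    children: list = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children


def check_btree(root, m):
    """Return True if the tree rooted at `root` satisfies the order-m properties."""
    min_children = math.ceil(m / 2)
    leaf_depths = set()

    def visit(node, depth, is_root):
        # Keys are in non-decreasing order and a node holds at most m - 1 keys.
        if node.keys != sorted(node.keys) or len(node.keys) > m - 1:
            return False
        if node.is_leaf:
            leaf_depths.add(depth)
            # A non-root leaf needs at least ceiling(m/2) - 1 keys.
            return is_root or len(node.keys) >= min_children - 1
        # An internal node with k keys has exactly k + 1 children.
        if len(node.children) != len(node.keys) + 1:
            return False
        # A non-leaf root may have as few as two children.
        low = 2 if is_root else min_children
        if not low <= len(node.children) <= m:
            return False
        return all(visit(child, depth + 1, False) for child in node.children)

    # All leaves must sit at the same level.
    return visit(root, 0, True) and len(leaf_depths) <= 1


good = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8])])
underfull = Node([3, 6], [Node([1, 2]), Node([4]), Node([7, 8])])
print(check_btree(good, 5), check_btree(underfull, 5))
```

With m = 5, the second tree fails because the leaf holding only the key 4 has fewer than ceiling(5/2)-1 = 2 keys.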

Operations

Searching a key

Inserting a key

Splitting a node

Deleting a key

Searching a key

Much like searching a binary tree.

Make a multi-way branching decision at each node.

The nodes encountered form a path downward from the root.

Searching a key

The number of disk pages accessed is $O(h) = O(\log_t n)$, where h is the height of the tree and n is the number of keys.

CPU time is $O(th) = O(t \log_t n)$.

Note: t is the minimum degree of the B-tree, so each node has at most 2t children and at most 2t-1 entries (keys).
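The search procedure can be sketched as follows. This is an illustrative sketch only: the `Node` class is a hypothetical in-memory stand-in for a disk page, `btree_search` is an assumed name, and the example tree (including how its leaf keys are grouped) is made up for the demonstration. The linear scan inside each node is what gives the O(t) work per node.

```python
class Node:
    """Hypothetical in-memory stand-in for a disk page."""

    def __init__(self, keys, children=None):
        self.keys = keys                # keys in non-decreasing order
        self.children = children or []  # empty list for a leaf


def btree_search(node, key):
    """Return (node, index) locating `key`, or None if the key is absent."""
    while True:
        # Multi-way branching decision: scan for the first key >= `key`.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if not node.children:           # reached a leaf without a match
            return None
        node = node.children[i]         # one page access per level


# Illustrative tree; the grouping of leaf keys is an assumption.
root = Node(["M"], [
    Node(["D", "H"], [Node(["B", "C"]), Node(["F", "G"]), Node(["J", "K", "L"])]),
    Node(["Q", "T", "X"], [Node(["N", "P"]), Node(["R", "S"]),
                           Node(["V", "W"]), Node(["Y", "Z"])]),
])

hit = btree_search(root, "R")
miss = btree_search(root, "A")
```

Searching for R visits three nodes (M, then Q T X, then R S), one per level, which is the path downward from the root.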

Searching a key

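[Figure: example B-tree with root key M; internal-node keys D, H and Q, T, X; leaf keys B, C, F, G, J, K, L, N, P, R, S, V, W, Y, Z.]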

Creating an empty tree

We can assume that no disk read is needed.

One disk page is allocated to serve as the new node, in O(1) time.

Splitting a node

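Splitting is a fundamental operation used during insertion.

A full node, holding 2t-1 keys, is split around its median key into two nodes of t-1 keys each.

The median key moves up into the parent node, which must be non-full.

If the node has no parent (it is the root), the tree grows in height by one.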
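A minimal sketch of the split step, assuming minimum degree t (so a full node holds 2t-1 keys). The `Node` class and the `split_child` name are hypothetical, and the outer leaves A B C and X Y Z in the example are placeholders added only so that the parent has a child on each side.

```python
class Node:
    """Hypothetical in-memory node; `children` is empty for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def split_child(parent, i, t):
    """Split the full child parent.children[i] around its median key."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "child must be full"
    assert len(parent.keys) < 2 * t - 1, "parent must be non-full"
    median = full.keys[t - 1]
    # The upper t - 1 keys (and t children, if any) move to a new right sibling.
    right = Node(full.keys[t:], full.children[t:])
    # The lower t - 1 keys stay in the original node.
    full.keys = full.keys[:t - 1]
    full.children = full.children[:t]
    # The median moves up into the parent, with the new sibling to its right.
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


# Splitting a full child with t = 4.
parent = Node(["N", "W"], [Node(list("ABC")), Node(list("PQRSTUV")), Node(list("XYZ"))])
split_child(parent, 1, 4)

# Splitting a full root: put it under a new empty root first, then split it.
old_root = Node(list("ADFHLNP"))
new_root = Node([], [old_root])
split_child(new_root, 0, 4)
```

After the first call the parent holds N S W and the split node has become P Q R and T U V. The second call shows the no-parent case: the new root holds only H, and the tree is one level taller.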

Splitting a node

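[Figure: splitting a full node with t = 4. Before: parent keys ... N W ..., with full child P Q R S T U V. After: the median key S moves up, giving parent keys ... N S W ..., with children P Q R and T U V.]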

Splitting a node

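[Figure: splitting a full root with t = 4. Before: root A D F H L N P. After: new root H with children A D F and L N P; the tree grows in height by one.]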

Inserting a key

Insertion requires O(h) disk accesses.

CPU time is $O(th) = O(t \log_t n)$.

Inserting a key

Splitting the root is the only way to increase the height of a B-tree.

Unlike a binary tree, a B-tree increases in height at the top instead of at the bottom.
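Insertion can be sketched as a single downward pass that splits every full node it meets, so that a split always has a non-full parent to push its median into. This is one common formulation and is offered as an assumption, not necessarily the exact procedure the slides had in mind. The `Node` and `BTree` classes and their method names are hypothetical, and nodes are held in memory rather than on disk pages.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        self.t = t            # minimum degree: at most 2t - 1 keys per node
        self.root = Node()    # an empty tree is a single empty leaf

    def _is_full(self, node):
        return len(node.keys) == 2 * self.t - 1

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]
        median = full.keys[t - 1]
        right = Node(leaf=full.leaf)
        right.keys, full.keys = full.keys[t:], full.keys[:t - 1]
        if not full.leaf:
            right.children, full.children = full.children[t:], full.children[:t]
        parent.keys.insert(i, median)          # median moves up
        parent.children.insert(i + 1, right)

    def insert(self, key):
        if self._is_full(self.root):
            # Splitting the root is the only way the tree grows in height.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if self._is_full(node.children[i]):
                self._split_child(node, i)
                if key > node.keys[i]:         # pick the correct half
                    i += 1
            node = node.children[i]
        insort(node.keys, key)                 # the leaf is guaranteed non-full

    def height(self):
        h, node = 0, self.root
        while not node.leaf:
            h, node = h + 1, node.children[0]
        return h

    def inorder(self, node=None):
        node = node or self.root
        if node.leaf:
            return list(node.keys)
        out = []
        for i, child in enumerate(node.children):
            out.extend(self.inorder(child))
            if i < len(node.keys):
                out.append(node.keys[i])
        return out


tree = BTree(t=2)
for k in range(1, 11):
    tree.insert(k)
```

Each insertion touches one node per level, which matches the O(h) disk accesses stated above. The height only changes inside the `if self._is_full(self.root)` branch.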

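Inserting a key

[Figure: inserting keys into a B-tree with t = 3, shown in five panels.]

(a) Initial tree: root G M P X; leaves A C D E, J K, N O, R S T U V, Y Z.

(b) B inserted: a simple insertion into a leaf, giving A B C D E.

(c) Q inserted: the full leaf R S T U V splits into R S and U V, T moves up into the root, and Q joins the left half, giving Q R S.

(d) L inserted: the full root G M P T X splits, P becomes the new root and the tree grows in height by one; L then goes into the leaf J K L.

(e) F inserted: the full leaf A B C D E splits into A B and D E, C moves up into its parent, and F joins the right half, giving D E F.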

Deleting a key

Deletion is analogous to insertion but a little more complicated.

There are various cases to handle when deleting a key from a B-tree.

Deleting a key

Which case applies depends on where the key is located and on how many keys the nodes involved currently hold.

In practice, deletion operations are most often used to delete keys from leaves.

Deleting a key

When deleting a key from an internal node, however, the procedure makes a downward pass through the tree but may have to return to the node from which the key was deleted to replace the key with its predecessor or successor.

Deleting a key

Although this procedure seems complicated, it involves only O(h) disk operations for a B-tree of height h.

The CPU time required is $O(th) = O(t \log_t n)$.
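The replacement step for a key in an internal node relies on finding that key's predecessor or successor, both of which always live in a leaf. Below is a minimal sketch of just that lookup, not a full deletion routine: it does not perform the merging or borrowing needed to keep nodes at their minimum size. The `Node` class and both function names are hypothetical, and the example node is made up for illustration.

```python
class Node:
    """Hypothetical in-memory node; `children` is empty for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def predecessor(node, i):
    """Largest key in the subtree just left of node.keys[i]."""
    cur = node.children[i]
    while cur.children:          # keep following the rightmost child
        cur = cur.children[-1]
    return cur.keys[-1]


def successor(node, i):
    """Smallest key in the subtree just right of node.keys[i]."""
    cur = node.children[i + 1]
    while cur.children:          # keep following the leftmost child
        cur = cur.children[0]
    return cur.keys[0]


# Illustrative internal node with four leaf children.
internal = Node(["C", "G", "M"], [
    Node(["A", "B"]), Node(["D", "E"]), Node(["J", "K", "L"]), Node(["N", "O"]),
])
pred_of_m = predecessor(internal, 2)
succ_of_m = successor(internal, 2)
```

Here the predecessor of M is L and its successor is N. Either could replace M in the internal node, after which the chosen key is deleted from its leaf. Both lookups walk a single downward path, so they stay within the O(h) bound.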

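Deleting a key

[Figure: deleting keys from a B-tree with t = 3, shown in seven panels.]

(a) Initial tree: root P; internal nodes C G M and T X; leaves A B, D E F, J K L, N O, Q R S, U V, Y Z.

(b) F deleted, case 1: the key is removed directly from the leaf D E F.

(c) M deleted, case 2a: M is in an internal node; its predecessor L moves up to replace it, giving C G L.

(d) G deleted, case 2c: G is pushed down and merged with its two children into D E G J K, then removed from that leaf, leaving the internal node C L.

(e) D deleted, case 3b: the nodes C L and T X are merged with P into C L P T X before the descent continues, and D is removed from the leaf D E J K.

(e') The old root is now empty and is removed, so the tree shrinks in height by one.

(f) B deleted, case 3a: the leaf A B is at minimum size, so C moves down into it and E moves up from the sibling to take C's place, giving root E L P T X.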

Complexities

A large branching factor reduces the number of disk accesses required to find a key.

When the root node resides in memory, a tree of height 2 requires at most 2 disk accesses to find any key in the tree; for a fixed height this is a constant, O(1), number of accesses.
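To make the effect of the branching factor concrete, the sketch below counts how many keys a completely full tree can hold. The function name `max_keys` is hypothetical, the branching factor of 1001 is an illustrative figure rather than one taken from the slides, and height is assumed to be counted in edges, with the root at level 0.

```python
def max_keys(branching_factor, height):
    """Keys in a completely full tree whose nodes each have `branching_factor` children.

    Level `level` holds branching_factor**level nodes, and every node
    stores branching_factor - 1 keys.
    """
    b = branching_factor
    return sum((b - 1) * b ** level for level in range(height + 1))


for b in (2, 101, 1001):
    print(b, max_keys(b, 2))
```

With only two levels below the root, a branching factor of 1001 already covers about a billion keys, while a binary tree of the same height holds 7.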

Complexities

Running time comprises the number of disk accesses and the CPU time.

During a disk read or write, an entire page of information is accessed.

The number of disk accesses is measured in pages read from or written to the disk.

Complexities

The number of disk pages accessed is $O(h) = O(\log_t n)$.

The CPU time to traverse the keys within each node is $O(t)$.

The total time is $O(th) = O(t \log_t n)$, which is $O(\log n)$ when t is treated as a constant.

These bounds are the same for every basic operation.
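The logarithmic height behind these bounds follows from counting the fewest keys a tree of height h can hold. A sketch of the standard argument, assuming minimum degree t ≥ 2, n ≥ 1 keys, and height counted in edges from the root: the root holds at least one key, and level i ≥ 1 holds at least 2t^(i-1) nodes with at least t-1 keys each.

```latex
n \;\ge\; 1 + (t-1)\sum_{i=1}^{h} 2\,t^{\,i-1}
  \;=\; 1 + 2(t-1)\cdot\frac{t^{h}-1}{t-1}
  \;=\; 2\,t^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_t \frac{n+1}{2}.
```

Multiplying this height bound by the O(t) work per node gives the O(t log_t n) total.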

Applications

Databases cannot typically be maintained entirely in memory.

Secondary storage is usually used.

A B-tree is often used to index the data and to provide fast access.

Applications

Searching an unindexed, unsorted database containing n key values has a worst-case running time of O(n).

Indexed with a B-tree, the same search runs in O(log n).

Applications – an example

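To search for a single key in a set of one million (1,000,000) keys, a linear search requires at most 1,000,000 comparisons.

If the same data is indexed with a B-tree of minimum degree t = 10, the height is at most $\log_t((n+1)/2) \approx 5.7$, so at most 5. A search then visits at most 6 nodes of at most 2t-1 = 19 keys each, so at most 114 comparisons are required in the worst case, even with a linear scan inside each node.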
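As a cross-check on the figures in this example, the sketch below computes the worst-case height from the standard bound 2t^h - 1 ≤ n, then the comparison count for a linear scan of up to 2t-1 keys per node. Reading "order 10" as minimum degree t = 10 is an assumption, as are the function names.

```python
def max_height(n, t):
    """Largest h with 2 * t**h - 1 <= n, i.e. h <= log_t((n + 1) / 2)."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


def worst_case_comparisons(n, t):
    """Linear scan of up to 2t - 1 keys in each of the h + 1 nodes on a search path."""
    return (max_height(n, t) + 1) * (2 * t - 1)


n, t = 1_000_000, 10
print(max_height(n, t), worst_case_comparisons(n, t))
```

Under these assumptions a tree of one million keys with t = 10 cannot be taller than height 5, and the comparison count stays in the low hundreds against up to 1,000,000 for the linear search.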

Summary

B-Tree is a balanced, multi-way file organization.

Search, Insert, and Delete operations retain desirable logarithmic costs.

B-trees guarantee roughly 50% storage utilization or better, because every node except the root is at least about half full.

Extra

B-tree variants: B+ tree and B* tree

Branching factors are improved

Extra

B+ tree

Combines features of ISAM and the B-tree

Contains index pages and data pages

Data pages always appear as leaf nodes

Root and intermediate nodes are index pages

Extra

B+ tree

Saves more space (but who cares)

Non-leaf and leaf nodes hold different numbers of entries

Deletion is more complicated

Faster lookup than B-trees, because the height of the tree is smaller (items are stored more compactly)