B+-Trees

Reading: C&B Ch 23 & 29

Dept. of Computing Science, University of Aberdeen 2

Recap of Data Storage in Files

• Data is stored in files using a primary organization
– Unordered (heap)
– Ordered (sequential)
– Hashed

• To speed up data retrieval, indexes are defined on the data files based on
– Ordering key field (unique key values for all the records): primary index, OR
– Ordering non-key field: clustering index, AND
– Non-ordering non-key fields: secondary indexes

• To search for a required record (whose key is given in the WHERE clause of the query) in the data file, the DBMS first searches the index
– Once the index entry is located, its pointer field leads the DBMS to the disk page where the required record is located
– Binary search can be performed on the ordered index

Primary Indexes (copied from lecture on file organization)

• The data file is sequentially ordered on the key field
• The index file stores all (dense) or some (sparse) values of the key field and the page number of the data file in which the corresponding record is stored

Index (key value, page number)
B002  1
B003  1
B004  2
B005  2
B007  3

Table: Branch
BranchNo  Street        City      Postcode
B002      56 Clover Dr  London    NW10 6EU
B003      163 Main St   Glasgow   G11 9QX
B004      32 Manse Rd   Bristol   BS99 1NZ
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU

Pages on Disk: pages numbered 1 to 4, holding the Branch B002, B003, B004, B005 and B007 records in key order
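The lookup through a dense primary index can be sketched as a binary search over the sorted index entries. This is only an illustration: the in-memory list `INDEX` and the helper `find_page` are hypothetical names, and the entries are the ones shown in the figure.

```python
from bisect import bisect_left

# Dense index from the figure: (key value, page number), sorted by key.
INDEX = [("B002", 1), ("B003", 1), ("B004", 2), ("B005", 2), ("B007", 3)]
KEYS = [key for key, _ in INDEX]

def find_page(branch_no):
    """Binary-search the ordered index; return the data page number or None."""
    i = bisect_left(KEYS, branch_no)
    if i < len(KEYS) and KEYS[i] == branch_no:
        return INDEX[i][1]
    return None

print(find_page("B004"))  # 2
print(find_page("B006"))  # None (no such key in the index)
```

A real DBMS would read index pages from disk rather than hold the whole index in a Python list; the search logic is the same.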

Multi-level Index

• If the index is large, it needs to be stored on the hard disk
• This means efficient techniques are required for searching indexes as well
– Faster than a binary search on the ordered index
• The key idea used to improve search efficiency is to add another level of index on top of the initial level of index
• This idea can be repeated several times to define several levels of index
– The top-level index is made to fit into a single disk page
– A search at the top level gives the pointer to the required lower-level index page or the pointer to the required data page
• This is the central idea behind multi-level indexes
• ISAM uses a multi-level index
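A rough calculation shows why adding levels beats a binary search over the first-level index pages. The figures used here (one million index entries, 100 entries per index page) are assumptions chosen for illustration, and `index_levels` is a hypothetical helper.

```python
import math

def index_levels(n_entries, fanout):
    """Number of index levels needed until the top level fits in one page.

    fanout is the number of index entries that fit in one disk page.
    """
    levels = 0
    remaining = n_entries
    while True:
        pages = math.ceil(remaining / fanout)  # pages used by this level
        levels += 1
        if pages <= 1:
            return levels
        remaining = pages  # the next level up has one entry per page

n, fanout = 1_000_000, 100
first_level_pages = math.ceil(n / fanout)
print(index_levels(n, fanout))                  # 3 page reads, one per level
print(math.ceil(math.log2(first_level_pages)))  # 14 page reads for binary search
```

Under these assumptions a three-level index needs 3 page reads, against about 14 for a binary search over the 10,000 first-level pages.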

Dynamic Multi-level Index

• Although multi-level indexes (as described earlier) can speed up search, they perform poorly with insertions and deletions
• A dynamic multi-level index addresses this problem by leaving some free space in each of its pages for new entries
• A dynamic multi-level index is implemented using data structures called B-Trees and B+-Trees
– B+-Trees are a variation on B-Trees
– B+-Trees are more commonly used for indexing than B-Trees

B-Tree

• A B-Tree is a balanced tree
– All the paths through a B-Tree from the root to the different leaf nodes are of the same length (balanced path lengths)
• All leaf nodes are at the same depth (level)
– This ensures that every search that descends to a leaf needs the same number of disk accesses
• The smaller the depth (number of levels) of an index tree, the faster the search

Figure: B-Tree of order 3 (* is the pointer to the data page)
Root: [5* 8*]
Leaves: [1* 3*] [6* 7*] [9* 12*]

B+-Tree

• A B-Tree stores data pointers in non-leaf nodes as well as leaf nodes (refer to the figure of the B-Tree of order 3)
• A B+-Tree stores data pointers in leaf nodes only
– This means leaf nodes and non-leaf nodes are structured differently in a B+-Tree
– The space saved in the non-leaf (internal) nodes is used to store more keys and more tree pointers, which reduces the depth of the B+-Tree and gives a faster search

B+-Tree (2)

• A B+-Tree is a balanced tree with the following properties
• The structure of a B+-Tree is defined by a parameter called 'order', denoted by p
– The order of a B+-Tree depends upon the page size and the sizes of the different fields in the tree nodes
• The internal and leaf nodes in a B+-Tree are structured differently
• Therefore the order of a leaf node is different from the order of an internal node, and we use
– p: order of an internal node
– pleaf: order of a leaf node
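How the order follows from the page size can be sketched with a short calculation. The node layout is an assumption: an internal node holds p tree pointers and p-1 key values, and a leaf node holds pleaf (key, data pointer) pairs plus one next-leaf pointer. The byte sizes below are illustrative values, not figures from these slides.

```python
def internal_order(page_size, key_size, tree_ptr_size):
    """Largest p with p*tree_ptr_size + (p-1)*key_size <= page_size."""
    return (page_size + key_size) // (tree_ptr_size + key_size)

def leaf_order(page_size, key_size, data_ptr_size, tree_ptr_size):
    """Largest pleaf with pleaf*(key_size + data_ptr_size) + tree_ptr_size <= page_size."""
    return (page_size - tree_ptr_size) // (key_size + data_ptr_size)

# Assumed sizes in bytes: 512-byte page, 9-byte key, 6-byte tree pointer,
# 7-byte data pointer.
print(internal_order(512, 9, 6))  # 34
print(leaf_order(512, 9, 7, 6))   # 31
```

With these sizes the internal nodes have a higher order than the leaf nodes, which is why two separate parameters are used.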

Internal Node

• For a B+-Tree of order p, internal nodes are structured as follows
– Each internal node is of the form <P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q> where q <= p, each P_i is a tree pointer and each K_i is an index value
– Within each internal node, K_1 < K_2 < … < K_{q-1} (index values are sorted)
– For all search field values X in the subtree pointed at by P_i: K_{i-1} < X <= K_i for 1 < i < q; X <= K_i for i = 1; and K_{i-1} < X for i = q
– Each internal node has at most p tree pointers
– Each internal node, except the root, has at least ceiling(p/2) tree pointers
– The root node has at least two tree pointers if it is an internal node
– An internal node with q pointers, q <= p, has q-1 index values
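The internal-node rules above can be sketched as two small checks. The function names `child_index` and `is_valid_internal` are hypothetical, and a node is represented only by its sorted list of index values and its pointer count.

```python
import math
from bisect import bisect_left

def child_index(keys, x):
    """0-based position of the tree pointer to follow for search value x.

    bisect_left returns the first position i with x <= keys[i], which matches
    the rule K_{i-1} < X <= K_i; values above every key use the last pointer.
    """
    return bisect_left(keys, x)

def is_valid_internal(keys, n_pointers, p, is_root=False):
    """Check the occupancy and ordering rules for one internal node."""
    min_pointers = 2 if is_root else math.ceil(p / 2)
    keys_sorted = all(a < b for a, b in zip(keys, keys[1:]))
    return (keys_sorted
            and len(keys) == n_pointers - 1
            and min_pointers <= n_pointers <= p)

print(child_index([5, 8], 5))               # 0: follow P_1, since 5 <= K_1
print(child_index([5, 8], 6))               # 1: follow P_2, since 5 < 6 <= 8
print(child_index([5, 8], 9))               # 2: follow P_3, since 8 < 9
print(is_valid_internal([3, 5], 3, p=3))    # True
print(is_valid_internal([3, 5, 8], 4, p=3)) # False: more than p pointers
```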

Leaf Node

• Leaf nodes are structured as follows
– Each leaf node is of the form <<K_1, Pr_1>, <K_2, Pr_2>, …, <K_{q-1}, Pr_{q-1}>, P_next> where the number of entries q-1 is at most pleaf, each Pr_i is a data pointer, and P_next points to the next leaf node in the B+-Tree
– Within each leaf node, K_1 <= K_2 <= … <= K_{q-1}
– Each leaf node has at least ceiling(pleaf/2) values
– All leaf nodes are at the same level (balanced)
• In a B+-Tree all the leaf nodes are linked together
– The first level of the index forms a linked list (it could be doubly linked as well)
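One use of the linked leaf level is a range scan that follows the next-leaf pointers instead of going back through the internal nodes. The sketch below is a simplification: the `Leaf` class and `range_scan` are hypothetical names, data pointers are left out, and the scan starts at the first leaf rather than descending from the root to the leaf that holds the lower bound.

```python
class Leaf:
    """A leaf node reduced to its sorted keys and its next-leaf pointer."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf

def range_scan(first_leaf, lo, hi):
    """Collect all keys k with lo <= k <= hi by walking the leaf chain."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result  # keys are globally sorted, so we can stop
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result

# Leaf chain [1 3] -> [5] -> [7 8] -> [12], built from the last leaf backwards.
l4 = Leaf([12])
l3 = Leaf([7, 8], l4)
l2 = Leaf([5], l3)
l1 = Leaf([1, 3], l2)

print(range_scan(l1, 3, 8))  # [3, 5, 7, 8]
```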

Insertion

• We illustrate index insertion with an example
• We want to insert the following index values into an empty B+-Tree with p=3 and pleaf=2
– 8, 5, 1, 7, 3, 12
• Initially we start with the root node, which is a leaf node (no children yet)

After inserting 8 and 5 (* is the pointer to the data page):
Root (leaf): [5* 8*]

Insert 1: overflow (new level)

Before:
Root (leaf): [5* 8*]

After:
Root: [5]
Leaves: [1* 5*] [8*]

Overflow in leaf node: split the leaf node
– The first j = ceiling((pleaf+1)/2) entries are kept in the original node and the remaining entries are moved to the new leaf node
– A new internal node is created and the jth index value is replicated in this parent internal node
– A pointer to the newly formed leaf node is added to the parent

Insert 7

Before:
Root: [5]
Leaves: [1* 5*] [8*]

After:
Root: [5]
Leaves: [1* 5*] [7* 8*]

Space is available in the nodes to store new entries without creating new nodes

Insert 3: overflow (split)

Before:
Root: [5]
Leaves: [1* 5*] [7* 8*]

After:
Root: [3 5]
Leaves: [1* 3*] [5*] [7* 8*]

Overflow in leaf node: split the leaf node
– The first j = ceiling((pleaf+1)/2) entries are kept in the original node and the remaining entries are moved to the new leaf node
– The jth index value is replicated in the parent internal node
– A pointer to the newly formed leaf node is added to the parent

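Insert 12: overflow (split, propagates, new level)

Before:
Root: [3 5]
Leaves: [1* 3*] [5*] [7* 8*]

After:
Root: [5]
Internal nodes: [3] [8]
Leaves: [1* 3*] [5*] [7* 8*] [12*]

The leaf [7* 8*] overflows first and is split; replicating 8 in the parent makes the parent [3 5 8] overflow in turn.

Overflow in internal node: split the internal node
– The entries up to P_j, where j = floor((p+1)/2), are kept in the original node and the remaining entries are moved to the new internal node
– A new internal node is created and the jth index value is moved to the parent internal node (without replication)
– Pointers to the newly formed nodes are added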
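The insertion steps of this example can be sketched in code. This is a minimal illustration rather than a complete B+-Tree: the names `Node`, `BPlusTree` and `levels` are hypothetical, data pointers are omitted, duplicate keys are not handled, and `p_leaf` stands for pleaf. The split rules follow the ones stated in the example.

```python
import math
from bisect import bisect_left, insort

class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []      # sorted index values
        self.children = []  # tree pointers (internal nodes only)
        self.next = None    # next-leaf pointer (leaf nodes only)

class BPlusTree:
    def __init__(self, p=3, p_leaf=2):
        self.p, self.p_leaf = p, p_leaf
        self.root = Node(leaf=True)  # the root starts out as a leaf

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:  # the root itself was split: grow a new level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert key below node; return (separator, new_node) on a split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.p_leaf:
                return None
            j = math.ceil((self.p_leaf + 1) / 2)  # first j entries stay
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[j:], node.keys[:j]
            right.next, node.next = node.next, right
            return node.keys[-1], right  # jth value is replicated upward
        i = bisect_left(node.keys, key)  # pointer with K_{i-1} < key <= K_i
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.p:
            return None
        j = (self.p + 1) // 2            # keep entries up to P_j
        up = node.keys[j - 1]            # jth value moves up (not replicated)
        new = Node(leaf=False)
        new.keys, new.children = node.keys[j:], node.children[j:]
        node.keys, node.children = node.keys[:j - 1], node.children[:j]
        return up, new

def levels(tree):
    """Keys of every node, level by level from the root down."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out

tree = BPlusTree(p=3, p_leaf=2)
for k in [8, 5, 1, 7, 3, 12]:
    tree.insert(k)
print(levels(tree))
# [[[5]], [[3], [8]], [[1, 3], [5], [7, 8], [12]]]
```

The printed levels match the final tree of the example: root [5], internal nodes [3] and [8], and leaves [1 3], [5], [7 8], [12].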

Insertion (2)

• You can see that not all insertions required the creation of new nodes
• Because B+-Tree nodes are usually not completely full, there is often space left in nodes for new entries
• B+-Trees also make sure that all nodes, except the root, are at least half full

Search

• Given an index value K to be searched
– Start at the root node
– At each internal node, find the pointer to follow to the lower level of the tree, until a leaf node is reached
– Search for the key in the leaf node

Example tree:
Root: [5]
Internal nodes: [3] [8]
Leaves: [1* 3*] [5*] [7* 8*] [12*]
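The search procedure can be sketched on the example tree. The nested-dictionary representation and the helper names `node` and `search` are assumptions made for this illustration; data pointers are left out.

```python
from bisect import bisect_left

def node(keys, children=None):
    """A tree node as a dict; children is None for a leaf node."""
    return {"keys": keys, "children": children}

# The example tree: root [5], internal nodes [3] and [8], four leaves.
tree = node([5], [
    node([3], [node([1, 3]), node([5])]),
    node([8], [node([7, 8]), node([12])]),
])

def search(root, key):
    """Return (found, path), where path lists the keys of each node visited."""
    current, path = root, []
    while current["children"] is not None:
        path.append(current["keys"])
        # Follow the pointer whose subtree satisfies K_{i-1} < key <= K_i.
        current = current["children"][bisect_left(current["keys"], key)]
    path.append(current["keys"])
    return key in current["keys"], path

print(search(tree, 7))  # (True, [[5], [8], [7, 8]])
print(search(tree, 4))  # (False, [[5], [3], [5]])
```

Both searches visit exactly three nodes, one per level, whether or not the key is present.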

Deletion

• We illustrate index deletion with an example
• We want to delete the following index values from a B+-Tree with p=3 and pleaf=2
– 5, 12, 9

Delete 5

Before:
Root: [7]
Internal nodes: [1 6] [9]
Leaves: [1*] [5* 6*] [7*] [8* 9*] [12*]

After:
Root: [7]
Internal nodes: [1 6] [9]
Leaves: [1*] [6*] [7*] [8* 9*] [12*]

Delete 12: underflow (redistribute)

Before:
Root: [7]
Internal nodes: [1 6] [9]
Leaves: [1*] [6*] [7*] [8* 9*] [12*]

After:
Root: [7]
Internal nodes: [1 6] [8]
Leaves: [1*] [6*] [7*] [8*] [9*]

Underflow in leaf node
– If a sibling node (right or left) can spare an entry, redistribute entries among the node and its sibling so that both are at least half full
– Else merge the node with its sibling to reduce the number of leaf nodes
– Modify the parent internal node to reflect the redistribution or merge

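Delete 9: underflow (merge with left; redistribute)

Before:
Root: [7]
Internal nodes: [1 6] [8]
Leaves: [1*] [6*] [7*] [8*] [9*]

After:
Root: [6]
Internal nodes: [1] [7]
Leaves: [1*] [6*] [7*] [8*]

The leaf that held 9 underflows and its left sibling [8*] cannot spare an entry, so the two leaves are merged. The parent internal node is then left with a single pointer, so it redistributes with its left sibling [1 6], which changes the root value to 6.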
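The leaf-level part of the two underflow cases can be sketched for a single internal node and its leaf children. This is a simplified illustration with hypothetical names (`delete_from_leaf`, `parent_keys`, `leaves`); it does not handle the internal-node underflow that follows the merge in the Delete 9 step, and it sets each separator to the largest key of the left-hand leaf, which is one valid choice among several.

```python
import math

def delete_from_leaf(parent_keys, leaves, i, key, p_leaf=2):
    """Delete key from leaves[i] and repair a leaf underflow in place.

    parent_keys[j] separates leaves[j] and leaves[j + 1].
    Returns a short description of what happened.
    """
    leaves[i].remove(key)
    min_entries = math.ceil(p_leaf / 2)
    if len(leaves[i]) >= min_entries or len(leaves) == 1:
        return "no underflow"
    if i > 0 and len(leaves[i - 1]) > min_entries:
        # Left sibling can spare its largest entry.
        leaves[i].insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaves[i - 1][-1]
        return "redistributed from left"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_entries:
        # Right sibling can spare its smallest entry.
        leaves[i].append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i][-1]
        return "redistributed from right"
    # No sibling can spare an entry: merge with the left sibling if there
    # is one, otherwise with the right sibling.
    j = i - 1 if i > 0 else i
    leaves[j] = leaves[j] + leaves[j + 1]
    del leaves[j + 1]
    del parent_keys[j]  # one pointer fewer, so one separator fewer
    return "merged"

# Right-hand internal node of the example: [9] over leaves [8 9] and [12].
parent, leaves = [9], [[8, 9], [12]]
print(delete_from_leaf(parent, leaves, 1, 12), parent, leaves)
# redistributed from left [8] [[8], [9]]
print(delete_from_leaf(parent, leaves, 1, 9), parent, leaves)
# merged [] [[8]]
```

After the merge the internal node has only one pointer left, which is the internal-node underflow that the example resolves by redistributing with the left sibling [1 6].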

Summary

• B+-Trees provide efficient operations for
– Search, insert and delete
• Real databases have nodes of size equal to one disk page (say 1 KB)
– Thus each node stores many more index values than the examples shown here
– This gives short search trees (small depth values), leading to faster search
• B+-Trees offer a dynamic multilevel index
– Dynamic
  • Allows simple insertion and deletion operations in the majority of cases
– Multilevel
  • The first-level index takes the form of the linked list of all the leaf nodes
  • Each subsequent internal level in a B+-Tree offers another level of index
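The effect of page-sized nodes on depth can be sketched with a capacity estimate. The orders used for the 1 KB case (p = 68, pleaf = 63) are assumptions that depend on key and pointer sizes, and `max_entries` and `levels_needed` are hypothetical helpers that consider completely full nodes only.

```python
def max_entries(p, p_leaf, levels):
    """Most index values a B+-Tree can hold with the given number of levels.

    levels counts the leaf level too; every node is assumed completely full.
    """
    return p ** (levels - 1) * p_leaf

def levels_needed(n_entries, p, p_leaf):
    """Fewest levels whose full capacity reaches n_entries."""
    levels = 1
    while max_entries(p, p_leaf, levels) < n_entries:
        levels += 1
    return levels

print(max_entries(3, 2, 3))              # 18 with the small example orders
print(max_entries(68, 63, 3))            # 291312 with the assumed 1 KB orders
print(levels_needed(1_000_000, 68, 63))  # 4
```

Under these assumptions a million index values fit in a tree of four levels, so a search reads about four pages.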

