Determining the size of a B+-tree

Hussam A.Halim

An-Najah National University

2013

example

• Consider a relation of 2,000,000 tuples (2 million tuples) stored in a heap (unsorted) file structure. How large (in number of disk pages) would a B+-tree index on this relation be? We will assume that we are building a B+-tree that is filled up as much as possible.

• To determine this, we need to know the following:

– The size of one disk page. While pages are usually 4 KBytes or 8 KBytes, to make our calculations easier we will use a page size of 1,000 bytes.

– The size of the search key value. Say that we index the relation by using an attribute that is a 40-byte string and which is a candidate key of the relation.

– The size of physical page addresses. This is needed in our calculations since every node in our B+-tree will store search key values as well as references to disk pages, i.e. the address of the disk page being referenced. Again, for ease of calculations, we use a 10-byte page address.

• With these values fixed, how do we determine the size of our B+-tree index?

solution

• We will determine this by working from the bottom to the top of the tree:

• Since a page is 1,000 bytes, it can only store a limited number of index entries. Also, we are building a B+-tree that is filled up as much as possible, so we will store as many index entries as can fit in a disk page. An index entry is a pair of values: [search key value, reference to tuple], and in our case, size(search key value) = 40 and size(reference to tuple) = 10.

• So, to determine the number of index entries that can fit in a leaf page we can use this calculation:

– Number of index entries per leaf page = Floor( Page_size / (size(search key value) + size(reference to tuple)) )

– Number of index entries per leaf page = Floor( 1000 / (40 + 10) ) = 20

• The leaf pages of a B+-tree are linked together; these links are stored as the addresses of the adjacent disk pages. Again, for simplicity, the disk space used to store these links is not taken into account in the calculations shown here.

• Now we must determine the number of leaf pages that are necessary. Since the table we are indexing is stored as a heap, the tuples are not ordered in the data file. So, our B+-tree index must contain one index entry per tuple in the table; this is known as a dense index. So, there will be 2 million index entries collectively stored in the leaf pages. The number of leaf pages necessary is given by:

– Number of leaf pages = Ceiling ( (Number of index entries to be stored) / (Number of index entries per leaf page) )

– Number of leaf pages = Ceiling ( 2,000,000 / 20 ) = 100,000
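The two leaf-level steps above can be sketched in Python. This is only a minimal sketch: the function name `leaf_level` and its parameter names are illustrative assumptions, not part of the original example.

```python
def leaf_level(num_tuples, page_size, key_size, ref_size):
    """Return (entries per leaf page, number of leaf pages) for a dense index."""
    # Each index entry is a [search key value, reference to tuple] pair.
    entries_per_leaf = page_size // (key_size + ref_size)
    # A dense index holds one entry per tuple; integer ceiling division
    # accounts for a possibly partly filled last page.
    leaf_pages = (num_tuples + entries_per_leaf - 1) // entries_per_leaf
    return entries_per_leaf, leaf_pages


entries_per_leaf, leaf_pages = leaf_level(2_000_000, 1_000, 40, 10)
print(entries_per_leaf, leaf_pages)  # 20 100000
```

Integer arithmetic is used instead of floating-point `ceil` so that the result stays exact for large tuple counts.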

• Next, we must determine the branching factor, m, of our B+-tree. This can be done by repeating step 1 above with some slight modifications. In a B+-tree of order m, each internal node must be able to store up to m branches (i.e. pointers to nodes in the level below) and thus, up to m-1 search key values (usually called separator values when they appear in the internal nodes).

– Page_size >= (m-1) * size(search key value) + m * size(pointer)

– In our case, we have: 1,000 >= (m-1) * 40 + m * 10

– Solving for m, which must be a whole number, we get: m = Floor( ( 1,000 + 40 ) / 50 ) = Floor( 20.8 ) = 20
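As a sanity check on the inequality, the following sketch computes m in closed form and then confirms that m fits in a page while m + 1 does not. The helper names `branching_factor` and `fits` are assumptions made for illustration.

```python
def branching_factor(page_size, key_size, ptr_size):
    """Largest m with (m - 1) * key_size + m * ptr_size <= page_size."""
    # Rearranged: m * (key_size + ptr_size) <= page_size + key_size.
    return (page_size + key_size) // (key_size + ptr_size)


def fits(m, page_size, key_size, ptr_size):
    """True if m pointers and m - 1 separator values fit in one page."""
    return (m - 1) * key_size + m * ptr_size <= page_size


m = branching_factor(1_000, 40, 10)
print(m)                            # 20
print(fits(m, 1_000, 40, 10))       # True: 19 * 40 + 20 * 10 = 960 bytes
print(fits(m + 1, 1_000, 40, 10))   # False: 20 * 40 + 21 * 10 = 1010 bytes
```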

– So, each internal node will store 20 pointers to nodes in the level below and a corresponding 19 search key values. Note the slight but important difference between internal nodes and leaf nodes.

• Now that we have m, we can determine the number of pages at each level in the B+-tree. Since every leaf in our tree needs to be referenced from the level above, collectively we will need 100,000 pointers (i.e. branches), one per leaf page. Therefore the number of internal pages in the level above the leaves is:

– Number of pages in level above leaves = Ceiling( Number of leaf pages / branching factor )

– Number of pages in level above leaves = Ceiling( 100,000 / 20 ) = 5,000

– We need to continue doing this computation until we reach the point where we have just one internal node that will be the root of our tree:

– Number of pages in the next level above = Ceiling( 5,000 / 20 ) = 250

– Number of pages in the next level above = Ceiling( 250 / 20 ) = 13

– Number of pages in the next level above = Ceiling( 13 / 20 ) = 1

– Notice that we have repeated this process 4 times, once for each level of internal nodes above the leaves, and that in fact: Height of B+-tree (number of levels above the leaves) = Ceiling( log_m(Number of leaf pages) ). In our case, Height of B+-tree = Ceiling( log_20(100,000) ) = Ceiling( 3.84 ) = 4. Counting the leaf level as well, the tree has 5 levels.

• We can finally answer the question as to the size of our B+-tree:

– Total number of pages = 100,000 + 5,000 + 250 + 13 + 1 = 105,264 pages

– We have a 5-level B+-tree indexing 2 million tuples, so by accessing 5 disk pages of the B+-tree followed by 1 page of the file, we can find any one of these tuples.
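The level-by-level computation can be written as a short loop. The sketch below assumes a fully packed tree, as in the example; the name `level_sizes` is an illustrative choice.

```python
def level_sizes(leaf_pages, m):
    """Pages per level, from the leaf level up to the root."""
    levels = [leaf_pages]
    while levels[-1] > 1:
        # Each page one level up holds at most m pointers,
        # so divide by m and round up.
        levels.append((levels[-1] + m - 1) // m)
    return levels


levels = level_sizes(100_000, 20)
print(levels)       # [100000, 5000, 250, 13, 1]
print(len(levels))  # 5 levels, including the leaf level
print(sum(levels))  # 105264 pages in total
```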

• Note that database systems use caching and it is common for the top two levels of a B+-tree to be in the cache, thus saving two disk accesses in this example.

• It is finally worth noting that this example used a small page size of 1,000 bytes for easier calculations. Common page sizes are 4 KBytes, 8 KBytes, or even 16 KBytes. Larger page sizes lead to larger branching factors and thus shallower trees. There is of course a trade-off between the page size and page utilization. Occupying only half of a 4 KByte page is not as significant as occupying only half of a 16 KByte page. B*-trees can address this issue.
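To illustrate how page size affects depth, the sketch below recomputes the number of levels for the same 2 million tuples, 40-byte keys and 10-byte addresses. It assumes that 4, 8 and 16 KBytes mean 4,096, 8,192 and 16,384 bytes and that every page is fully packed; `tree_levels` is a hypothetical helper name.

```python
def tree_levels(num_tuples, page_size, key_size=40, ptr_size=10):
    """Number of levels (leaf level included) of a fully packed B+-tree."""
    entries_per_leaf = page_size // (key_size + ptr_size)
    m = (page_size + key_size) // (key_size + ptr_size)  # branching factor
    pages = (num_tuples + entries_per_leaf - 1) // entries_per_leaf  # leaf pages
    levels = 1
    while pages > 1:
        pages = (pages + m - 1) // m  # pages needed one level up
        levels += 1
    return levels


for page_size in (1_000, 4_096, 8_192, 16_384):
    print(page_size, tree_levels(2_000_000, page_size))
# 1000 5
# 4096 4
# 8192 3
# 16384 3
```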

Problem

• Consider an ordered data file with the following parameters:

– r (number of records) = 16348

– R (record size) = 32 bytes

– B (block size) = 1024 bytes

• The index is stored as key + pointer pairs:

– key value = 10 bytes

– block pointer = 6 bytes

• Find the number of first-level and second-level blocks required for a multilevel index on this file.

Solution

• Number of first-level blocks

– First, find the number of blocks in the data file.

– The number of records that fit in one block, i.e. the blocking factor, is bfr = B / R = 1024 / 32 = 2^5, so a block holds 32 records.

– Number of blocks required for the data file = Ceiling( r / bfr ) = Ceiling( 16348 / 32 ) = Ceiling( 510.875 ) = 511

– Because the data file is ordered, the first-level index needs only one entry per data block, so it holds 511 entries.

[Figure: primary level of multilevel index and data file]

• Next, find how many blocks are needed to store these 511 entries, i.e. how many blocks the first level of the multilevel index requires, where each entry is 16 bytes (key + pointer size).

– R' = 10 + 6 = 16 bytes, B = 1024 bytes, bfr' = 1024 / 16 = 2^6 = 64

• The blocking factor (fan-out) is the same for the first level and all subsequent levels, because every index entry has the same size.

– So the number of blocks required for 511 entries = Ceiling( r' / bfr' ) = Ceiling( 511 / 64 ) = Ceiling( 7.98 ) = 8

• Number of second-level blocks

– It is clear that a single second-level block is enough to store 8 entries, but let us calculate it.

– Number of entries in second level = Number of blocks in the first level = 8

– Number of blocks in second level = Ceiling( (number of first-level blocks) / bfr' ) = Ceiling( r'' / bfr' )

– The blocking factor bfr' is the same here as at the first level, because here too we store key + pointer pairs. The number of entries is now r'' = 8.

– So, number of blocks for second level = Ceiling( 8 / 64 ) = 1
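The whole calculation for the ordered file can be traced step by step. This is a sketch under the problem's stated parameters; the variable names other than r, R, B and bfr are assumptions introduced here.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return (a + b - 1) // b


r, R, B = 16348, 32, 1024        # records, record size, block size
key_size, ptr_size = 10, 6       # index entry = key + block pointer

bfr = B // R                                  # records per data block
data_blocks = ceil_div(r, bfr)                # blocks in the data file
index_bfr = B // (key_size + ptr_size)        # index entries per block (bfr')
first_level = ceil_div(data_blocks, index_bfr)    # one entry per data block
second_level = ceil_div(first_level, index_bfr)   # one entry per first-level block

print(bfr, data_blocks, index_bfr)   # 32 511 64
print(first_level, second_level)     # 8 1
```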

• Now repeat the calculation for a secondary index on an unordered key field of the data file, with the same parameters as the previous example.

Solution

• In the case of a secondary index, one index entry is required for each data record in the data file.

• Number of first-level blocks

– The first-level index stores index entries for all the records (16348) in the data file.

– Number of blocks needed for first-level index = Ceiling( r / bfr ) = Ceiling( 16348 / 64 ) = Ceiling( 255.44 ) = 256, where bfr = 1024 / (10 + 6) = 64

• Number of second-level blocks

– Number of entries in second level = Number of blocks in first level = 256

– bfr = 64 is the same and r = 256, so number of second-level blocks = 256 / 64 = 4
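The difference between the dense secondary index and the sparse index on the ordered file shows up clearly when both are run through the same loop. The sketch below uses a hypothetical helper `index_levels`; continuing past the second level until a single block remains goes beyond what the example asks and is included only to show where the index would end.

```python
def index_levels(num_entries, fan_out):
    """Blocks per index level, first level first, until one block suffices."""
    levels = []
    while True:
        blocks = (num_entries + fan_out - 1) // fan_out
        levels.append(blocks)
        if blocks <= 1:
            return levels
        # The next level up needs one entry per block of this level.
        num_entries = blocks


fan_out = 1024 // (10 + 6)            # 64 entries per index block
print(index_levels(16348, fan_out))   # secondary (dense): [256, 4, 1]
print(index_levels(511, fan_out))     # ordered file (one entry per data block): [8, 1]
```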

Example

To calculate the order p of a B+ tree, suppose the search field is V = 8 bytes long, the block size is B = 1024 bytes, and a block pointer is P = 2 bytes. An internal node of the B+ tree can have up to p tree pointers and p - 1 search field values, and these must fit into a single block. Hence, we have:

solution:

• (p * 2) + ((p - 1) * 8) <= 1024

• or (10 * p) <= 1032

• p = Floor( 1032 / 10 ) = 103

• We have 103 pointers and 102 search field values
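Instead of solving the inequality algebraically, the order can also be found by simply growing p until the node no longer fits. This brute-force sketch is an alternative check, not the method used in the example; `max_order` is a hypothetical name, and it assumes at least one pointer fits in a block.

```python
def max_order(block_size, value_size, pointer_size):
    """Largest p such that p pointers and p - 1 search values fit in one block."""
    p = 1
    # A node of order p + 1 holds p + 1 pointers and p search values;
    # keep growing while that larger node would still fit.
    while (p + 1) * pointer_size + p * value_size <= block_size:
        p += 1
    return p


p = max_order(1024, 8, 2)
print(p)                     # 103
print(p * 2 + (p - 1) * 8)   # 1022 bytes used out of 1024
```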

determine the size of B+-tree

• B+-trees have a different structure for leaf and non-leaf nodes. In a B+-tree, a leaf node contains keys, the record pointer associated with each key, and a block pointer to the next leaf node. Non-leaf nodes contain only keys and child pointers; there is no need to store record pointers in non-leaf nodes, because all keys are ultimately present in the leaf nodes. For a leaf node, the order is the maximum number of (key, record pointer) pairs the node can hold, whereas the order of a non-leaf node is the maximum number of child pointers it can have.

• For a leaf node, the order is the largest n satisfying: n * k (key size) + n * r (record pointer size) + b (block pointer size) <= block size

• For a non-leaf node, the order is the largest n satisfying: (n-1) * k (key size) + n * b (block pointer size) <= block size
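Both formulas can be solved for n in closed form. The sketch below uses the symbols k, r and b from the formulas as parameter names; the function names and the 4,096-byte block with 16-byte keys and 8-byte pointers are hypothetical values chosen only for illustration.

```python
def leaf_order(block_size, k, r, b):
    """Largest n with n*k + n*r + b <= block_size."""
    return (block_size - b) // (k + r)


def non_leaf_order(block_size, k, b):
    """Largest n with (n-1)*k + n*b <= block_size."""
    # Rearranged: n * (k + b) <= block_size + k.
    return (block_size + k) // (k + b)


# Hypothetical sizes: 4096-byte block, 16-byte key, 8-byte pointers.
print(leaf_order(4096, k=16, r=8, b=8))   # 170
print(non_leaf_order(4096, k=16, b=8))    # 171
```

With equal pointer sizes a non-leaf node holds one more pointer than a leaf holds pairs, because it stores n - 1 keys rather than n and has no sibling pointer to reserve.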

Example

• A B+-tree index is to be built on the Name attribute of the relation STUDENT. Assume that all student names are of length 8 bytes, disk blocks are of size 512 bytes, and index pointers are of size 4 bytes. Given this scenario, what would be the best choice of the degree (i.e. the number of pointers per node) of the B+-tree?

• Solution

The degree of the B+-tree is the largest number of pointers n for which n pointers and n - 1 keys fit in one internal node. By the formula for an internal node of a B+-tree:

(n-1) * k + n * b <= block size

(n-1) * 8 + n * 4 <= 512

12n <= 520

n = Floor( 520 / 12 ) = Floor( 43.33 ) = 43

example

• The order of a leaf node in a B+-tree is the maximum number of (value, data record pointer) pairs it can hold. Given that the block size is 1K bytes, the data record pointer is 7 bytes long, the value field is 9 bytes long and a block pointer is 6 bytes long, what is the order of the leaf node?

• solution

The order of a leaf node of a B+-tree can be determined by the formula:

n * k + n * r + b <= block size

n * 9 + n * 7 + 6 <= 1024

16n <= 1018

n = Floor( 1018 / 16 ) = Floor( 63.6 ) = 63
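A quick way to check answers of this kind is to count the bytes a node would use at the computed order and at one more. The sketch below does this for the leaf node here and for the internal node of the STUDENT example; the helper names are assumptions made for illustration.

```python
def leaf_bytes(n, k, r, b):
    """Bytes used by a leaf with n (value, record pointer) pairs plus one block pointer."""
    return n * (k + r) + b


def non_leaf_bytes(n, k, b):
    """Bytes used by an internal node with n pointers and n - 1 keys."""
    return (n - 1) * k + n * b


# Leaf node: 9-byte value, 7-byte record pointer, 6-byte block pointer, 1024-byte block.
print(leaf_bytes(63, 9, 7, 6), leaf_bytes(64, 9, 7, 6))    # 1014 1030
# STUDENT example: 8-byte key, 4-byte pointer, 512-byte block.
print(non_leaf_bytes(43, 8, 4), non_leaf_bytes(44, 8, 4))  # 508 520
```

In both cases the computed order fits within the block and the next larger value overflows it, which confirms that rounding down is the right choice.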

