+Ch 4d: B+ trees
Mark McKenney
+Lots of trees, but what happens when memory fills up? Performance tanks!
All the trees we have seen so far assume that they fit in memory. When memory fills, disk paging comes into play.
To traverse a tree, we need to access nodes that are stored non-sequentially in memory.
How big is a node? (A couple of ints and pointers come to around 4+4+8+8 = 24 bytes.)
What is the minimum amount of data that can be read from memory? (Usually a word.)
What is the minimum amount of data that can be read from disk? (Usually a page: 4 KB.)
So, if each node is stored on its own page, we waste 4096-24 = 4072 bytes per read.
257 reads require about 1 MB of data transfer for about 6 KB of actual data.
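A quick check of the arithmetic above, as a minimal sketch. The 24-byte node and 4096-byte page come from the slide; the names PAGE_BYTES, NODE_BYTES and transfer_stats are made up for illustration.

```python
PAGE_BYTES = 4096            # one disk page, as on the slide
NODE_BYTES = 4 + 4 + 8 + 8   # two ints and two pointers

def transfer_stats(reads, page_bytes=PAGE_BYTES, node_bytes=NODE_BYTES):
    """Return (bytes moved from disk, bytes actually used) when each
    read fetches one page that holds a single node."""
    return reads * page_bytes, reads * node_bytes

wasted_per_read = PAGE_BYTES - NODE_BYTES   # 4072
moved, used = transfer_stats(257)           # 1052672 and 6168
print(wasted_per_read, moved, used)
```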
+So, let's generalize a binary tree to disk: a B tree. Actually, a B+ tree.
B trees came out first, and are harder and more complicated.
Approach:
  Make a node the size of a disk page (fixed!)
  Make sure that no node is too empty
  Make sure that the tree is balanced
What if the actual data is too big to fit in a disk page?
  Use a key to index the actual data, and store the data on disk in a separate file
Advantages:
  Maximum disk performance
  Persistence!
  Buffering in terms of disk pages
This is all very database oriented.
+B+ tree example, n=3
Root:      [100]
Non-leaf:  [30]   [120 150 180]
Leaves:    [3 5 11] [30 35]   [100 101 110] [120 130] [150 156 179] [180 200]
Keys are in the tree (stored in its own file); data is in a separate file.
+Sample non-leaf
Keys:      57 | 81 | 95
Pointers:  to keys k < 57 | to keys 57 ≤ k < 81 | to keys 81 ≤ k < 95 | to keys k ≥ 95
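One way to pick the pointer to follow in a non-leaf node, sketched with Python's standard bisect module. The helper name child_index is an assumption for illustration, not something from the slides.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Index of the pointer to follow for search key k.
    Pointer 0 covers k < keys[0]; pointer i covers keys[i-1] <= k < keys[i];
    the last pointer covers k >= keys[-1]."""
    return bisect_right(keys, k)

keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_index(keys, k))
```

bisect_right sends a search key equal to a stored key to the right of it, which matches the "57 ≤ k < 81" convention of the sample node.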
+Sample leaf node
Reached from a non-leaf node.
Keys:      57 | 81 | 95
Pointers:  to record with key 57 | to record with key 81 | to record with key 95 | to next leaf in sequence
+Size of nodes (fixed):
  n+1 pointers
  n keys
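How large can n be if a node must fit in one disk page? A rough sketch, assuming 4-byte keys, 8-byte pointers and a 4096-byte page with no header overhead (illustrative sizes only; max_keys_per_page is a hypothetical helper):

```python
def max_keys_per_page(page_bytes=4096, key_bytes=4, ptr_bytes=8):
    """Largest n such that n keys plus n+1 pointers fit in one page:
    n*key_bytes + (n+1)*ptr_bytes <= page_bytes."""
    return (page_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

n = max_keys_per_page()
print(n, "keys and", n + 1, "pointers use", n * 4 + (n + 1) * 8, "bytes")
```

Under these assumptions n comes out to 340, so one page read yields hundreds of keys instead of one.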
+Don't want nodes to be too empty
Use at least:
  Non-leaf: ⌈(n+1)/2⌉ pointers
  Leaf: ⌊(n+1)/2⌋ pointers to data
+Full node vs. minimum node, n=3
            Full node        Min. node
Non-leaf    [120 150 180]    [30]
Leaf        [3 5 11]         [30 35]
In a leaf, the sequence pointer counts even if it is null.
+B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the "sequence pointer"
+(3) Number of pointers/keys for a B+tree
                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
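The occupancy limits can be tabulated for a concrete n. This sketch assumes the usual rounding (ceiling for non-leaf nodes, floor for leaves); the function name occupancy and the kind strings are hypothetical.

```python
def occupancy(n, kind):
    """Return (max_ptrs, max_keys, min_ptrs, min_keys) for a node of
    the given kind in a B+ tree of order n."""
    if kind == "non-leaf":
        min_ptrs = -(-(n + 1) // 2)      # ceiling of (n+1)/2
        return n + 1, n, min_ptrs, min_ptrs - 1
    if kind == "leaf":
        min_ptrs = (n + 1) // 2          # floor of (n+1)/2, pointers to data
        return n + 1, n, min_ptrs, min_ptrs
    if kind == "root":
        return n + 1, n, 1, 1
    raise ValueError(f"unknown node kind: {kind}")

for n in (3, 4):
    for kind in ("non-leaf", "leaf", "root"):
        print(n, kind, occupancy(n, kind))
```

For n=3 this gives a minimum of one key in a non-leaf and two keys in a leaf, matching the minimum nodes [30] and [30 35] in the n=3 figure.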
+Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
+(a) Insert key = 32, n=3
Before:  non-leaf [30 100]; leaves [3 5 11] [30 31]
After:   non-leaf [30 100]; leaves [3 5 11] [30 31 32]
+(b) Insert key = 7, n=3 (leaf overflow)
Before:  non-leaf [30 100]; leaves [3 5 11] [30 31]
Split:   leaf [3 5 11] plus 7 overflows and splits into [3 5] and [7 11]
After:   non-leaf [7 30 100]; leaves [3 5] [7 11] [30 31]
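The leaf-level insert, with and without overflow, can be sketched as follows. This is a minimal illustration on plain sorted lists, not a full B+ tree; insert_into_leaf and its return convention are assumptions.

```python
from bisect import insort

def insert_into_leaf(keys, k, n):
    """Insert k into a sorted leaf holding at most n keys.
    Returns (left, right, separator). If the leaf does not overflow,
    right and separator are None. On overflow the n+1 keys are split,
    the left leaf keeps the larger half, and the first key of the
    right leaf is copied up to the parent as the separator."""
    keys = list(keys)
    insort(keys, k)
    if len(keys) <= n:
        return keys, None, None
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

print(insert_into_leaf([30, 31], 32, n=3))    # no overflow
print(insert_into_leaf([3, 5, 11], 7, n=3))   # overflow: split, 7 goes up
```

Note that in a leaf split the separator key is copied up, so it still appears in the right leaf.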
+(c) Insert key = 160, n=3 (non-leaf overflow)
Before:  root [100]; non-leaf [120 150 180]; leaves ... [150 156 179] [180 200]
Split:   leaf [150 156 179] plus 160 splits into [150 156] and [160 179]
         non-leaf would need keys [120 150 160 180], so it splits into [120 150] and [180]
After:   root [100 160]; non-leaves [120 150] [180]; leaves ... [150 156] [160 179] [180 200]
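A non-leaf split differs from a leaf split in that the middle key moves up to the parent instead of being copied. A minimal sketch, with nodes as (keys, children) pairs and a hypothetical helper name split_nonleaf:

```python
def split_nonleaf(keys, children):
    """Split an overflowing non-leaf node (n+1 keys, n+2 children).
    Returns (left, up, right): left and right are (keys, children)
    pairs and up is the key that moves to the parent."""
    mid = len(keys) // 2
    up = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up, right

# Non-leaf from the insert-160 example after its leaf has split.
keys = [120, 150, 160, 180]
children = [[100, 101, 110], [120, 130], [150, 156], [160, 179], [180, 200]]
left, up, right = split_nonleaf(keys, children)
print(left[0], up, right[0])   # [120, 150] 160 [180]
```

Applied to keys [10, 20, 30, 40], the same split sends 30 up, which is the key that forms a new root when the node being split is the old root.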
+(d) New root, insert 45, n=3
Before:  root [10 20 30]; leaves [1 2 3] [10 12] [20 25] [30 32 40]
Split:   leaf [30 32 40] plus 45 splits into [30 32] and [40 45]
         root would need keys [10 20 30 40], so it splits into [10 20] and [40]
After:   new root [30]; non-leaves [10 20] [40]; leaves [1 2 3] [10 12] [20 25] [30 32] [40 45]
+Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
+(b) Coalesce with sibling: delete 50, n=4
Before:  non-leaf [10 40 100]; leaves [10 20 30] [40 50]
After:   non-leaf [10 100]; leaves [10 20 30 40]
+(c) Redistribute keys: delete 50, n=4
Before:  non-leaf [10 40 100]; leaves [10 20 30 35] [40 50]
After:   non-leaf [10 35 100]; leaves [10 20 30] [35 40]
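The choice between coalescing and redistributing at the leaf level can be sketched like this. It assumes the underflowing leaf is the right one of two adjacent siblings and borrows only from the left; fix_underflow is a hypothetical helper, not a full deletion routine.

```python
def fix_underflow(left, right, n):
    """left and right are adjacent sorted leaves; right may have
    underflowed after a delete. Returns (action, new_left, new_right)."""
    min_keys = (n + 1) // 2
    if len(right) >= min_keys:
        return "ok", left, right
    if len(left) + len(right) <= n:
        # Everything fits in one leaf: merge; the parent loses a key.
        return "coalesce", left + right, None
    # Otherwise borrow the largest keys from the left sibling; the
    # parent's separator becomes the first key of the new right leaf.
    need = min_keys - len(right)
    return "redistribute", left[:-need], left[-need:] + right

print(fix_underflow([10, 20, 30], [40], n=4))      # coalesce
print(fix_underflow([10, 20, 30, 35], [40], n=4))  # redistribute
```

In the redistribute case the new right leaf starts with 35, which is why the parent key 40 is replaced by 35.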
+(d) Non-leaf coalesce: delete 37, n=4
Before:  root [25]; non-leaves [10 20] [30 40]; leaves [1 3] [10 14] [20 22] [25 26] [30 37] [40 45]
Step 1:  leaf [30 37] minus 37 underflows and coalesces with [25 26] into [25 26 30]
Step 2:  non-leaf [30 40] drops to [40], which is too empty, so it coalesces with [10 20], pulling 25 down from the root
After:   new root [10 20 25 40]; leaves [1 3] [10 14] [20 22] [25 26 30] [40 45]
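Coalescing two non-leaf siblings pulls their separator key down from the parent. A minimal sketch with nodes as (keys, children) pairs; coalesce_nonleaf is a hypothetical name, and the right node is assumed to hold the single key 40 once its leaves have coalesced.

```python
def coalesce_nonleaf(left, separator, right):
    """Merge two sibling non-leaf nodes. The separator key comes down
    from the parent and sits between the two key lists."""
    left_keys, left_children = left
    right_keys, right_children = right
    return left_keys + [separator] + right_keys, left_children + right_children

left = ([10, 20], [[1, 3], [10, 14], [20, 22]])
right = ([40], [[25, 26, 30], [40, 45]])
keys, children = coalesce_nonleaf(left, 25, right)
print(keys)   # [10, 20, 25, 40]
```

If the parent was a root holding only that separator, it is now empty and the merged node becomes the new root, so the tree loses one level.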
+B+tree deletions in practice
Often, coalescing is not implemented: too hard and not worth it!
+Characteristics
B+ trees are typically short and bushy
Want searches to touch few nodes, since nodes are on disk
For 100 elements in a node:
  A tree of height 1 can index 100 items
  A tree of height 2 can index 100 * 100 = 10,000 items
  A tree of height 3 can index 100 * 100 * 100 = 1,000,000 items
So, we can find an item in that tree by looking at 3 nodes, despite the huge number of items
  Equates to 3 disk reads. Very IO efficient
Databases make heavy use of B trees (usually B+ trees)
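The growth in capacity with height is just repeated multiplication by the fan-out. A tiny sketch using the slide's figure of 100 elements per node (capacity is a hypothetical helper):

```python
def capacity(fanout, height):
    """Number of items a tree of the given height can index when
    every node holds `fanout` elements."""
    return fanout ** height

for height in (1, 2, 3):
    print(height, capacity(100, height))
```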
+A final note
How to locate an element in a node?
They are sorted… use a binary search!
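A sketch of a full lookup that uses binary search inside every node and counts the nodes touched. The tree is the n=3 example from earlier in the deck; the representation (tuples of (keys, children) for non-leaf nodes, plain lists for leaves) and the name search are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Non-leaf nodes are (keys, children) tuples; leaves are sorted lists.
tree = ([100], [
    ([30], [[3, 5, 11], [30, 35]]),
    ([120, 150, 180], [[100, 101, 110], [120, 130],
                       [150, 156, 179], [180, 200]]),
])

def search(node, k):
    """Return (found, nodes_visited) for key k."""
    visited = 1
    while isinstance(node, tuple):               # descend through non-leaf nodes
        keys, children = node
        node = children[bisect_right(keys, k)]   # binary search for the child
        visited += 1
    i = bisect_left(node, k)                     # binary search inside the leaf
    return i < len(node) and node[i] == k, visited

print(search(tree, 156))   # (True, 3)
print(search(tree, 31))    # (False, 3)
```

Every lookup touches one node per level, so with nodes on disk the visit count is the number of page reads.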
+So... complexity?
We now have a new type of complexity: IO complexity
IOs are disk (secondary storage) IOs, the slowest IOs in a computer system
So we need an IO complexity as well as a computational complexity, but IO complexity reigns
So, for a B+ tree holding n items, with at least a and at most b children per node and a block size (disk page size) of B:
  Number of leaf blocks is O(n/B)
  IO complexity for all operations is O(log_B n)
  Height of tree is Ω(log_b n) and O(log_a n)
  Time complexity to find is between Ω( f(a) log_a n ) and O( f(b) log_b n ), where f(x) is the time to find an element in a node with x entries
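To see how fan-out bounds the height: a tree whose nodes are all minimally full (fan-out a) is the tallest case, and one whose nodes are all full (fan-out b) is the shortest. A small integer-only sketch; levels and height_bounds are hypothetical names, and the fan-outs 50 and 100 are example values.

```python
def levels(n_items, fanout):
    """Smallest height h such that fanout**h >= n_items."""
    height, reach = 1, fanout
    while reach < n_items:
        reach *= fanout
        height += 1
    return height

def height_bounds(n_items, a, b):
    """Return (shortest, tallest) height for fan-out between a and b."""
    return levels(n_items, b), levels(n_items, a)

print(height_bounds(1_000_000, a=50, b=100))   # (3, 4)
```

Each level costs one disk read, so these two numbers also bracket the IO cost of a lookup.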
+Always remember your bandwidth
http://hothardware.com/News/Homing-Pigeon-Faster-Than-Internet-in-Data-Transfer/
Time to transfer 4 GB at 2.04 Mbits per second is 4 hours, 39 minutes, and 37 seconds
Time to transfer 2.57 PB == 2,570,000 GB at 2.04 Mbits per second is 130,821 days, 12 hours, 32 minutes, 13.54 seconds == 358 years!
Size of a hard drive: 0.01 cubic feet
Cargo capacity of a Toyota Yaris: 25.7 cubic feet
Number of hard drives I can transport: 2570
If these are 1 TB hard drives, that's 2.57 PB == roughly 20.56 petabits
Time to drive to Chicago: 5 hrs == 18,000 seconds
Which gives a bandwidth of 1.14 Tbits/second == 142 GB/second
And so the saying is: "Never underestimate the bandwidth of a station wagon loaded with hard drives hurtling down the highway at 70 mph"
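The car-versus-link numbers can be reproduced in a few lines. The constants are the slide's; the variable names are made up, and the link calculation assumes 4 GiB sent at 2.048 Mbit/s, which is the combination that reproduces the 4 h 39 m 37 s figure.

```python
DRIVE_CUBIC_FEET = 0.01
CARGO_CUBIC_FEET = 25.7
DRIVE_BYTES = 10**12        # 1 TB per drive
TRIP_SECONDS = 5 * 3600     # five-hour drive

drives = round(CARGO_CUBIC_FEET / DRIVE_CUBIC_FEET)
payload_bits = drives * DRIVE_BYTES * 8
tbits_per_second = payload_bits / TRIP_SECONDS / 1e12
gbytes_per_second = payload_bits / 8 / TRIP_SECONDS / 1e9

# Assumption: 4 GiB over a 2.048 Mbit/s link.
link_seconds = int(4 * 2**30 * 8 / 2.048e6)
hms = (link_seconds // 3600, link_seconds % 3600 // 60, link_seconds % 60)

print(drives, round(tbits_per_second, 2), int(gbytes_per_second), hms)
```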