CSE 326: Data Structures – Trees
Today: Splay Trees
• Fast both in amortized worst-case analysis and in practice
• Used in the Windows NT kernel to keep track of process information!
• Invented by Sleator and Tarjan (1985)
• Details:
  – Weiss 4.5 (basic splay trees)
  – Weiss 11.5 (amortized analysis)
  – Weiss 12.1 (better “top-down” implementation)
Basic Idea
“Blind” rebalancing – no height information is kept!
• Worst-case time per operation is O(n)
• Worst-case amortized time is O(log n)
• Insert/find always rotates the accessed node to the root!
• Good locality:
  – The most commonly accessed keys move high in the tree and become easier and easier to find
Idea
You’re forced to make a really deep access:
Since you’re down there anyway, fix up a lot of deep nodes!
[Diagram: a long access path down a tree with keys 17, 10, 92, 5, 3]
Move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary.
Zig-Zag*
[Diagram: before, g is on top, with p as its child and n as its grandchild; the subtrees W, X, Y, Z hang off g, p, and n. After the zig-zag, n is on top, with p and g as its two children and the subtrees W, X, Y, Z reattached below them.]
*This is just a double rotation.
Depth changes: n is helped (up 2); n’s subtrees are helped (up 1); p and its outer subtree go up 1 and down 1, ending unchanged; g and its outer subtree are hurt (down 1).
Zig-Zig
[Diagram: before, n is the left child of p, which is the left child of g, with subtrees W, X under n, Y under p, and Z under g. Rotate p over g first, then n over p: afterwards n is on top, p is n’s child, g is p’s child, and the subtrees W, X, Y, Z are reattached in order.]
Why Splaying Helps
• Node n and its children are always helped (raised)
• Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree!
• Result:
  – shallow nodes may increase depth by one or two
  – helped nodes decrease depth by a large amount
• If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay
  – Exceptions are the root, the child of the root, and the node splayed
Splaying Example
Find(6)
[Diagram: start with the chain 1 → 2 → 3 → 4 → 5 → 6, each key the right child of the previous. The first zig-zig on 6 (parent 5, grandparent 4) lifts 6 two levels: the chain is now 1 → 2 → 3 → 6, with 6’s left child 5 and 5’s left child 4.]
Still Splaying 6
[Diagram: the second zig-zig on 6 (parent 3, grandparent 2) lifts 6 two more levels: root 1 now has right child 6; 6’s left child is 3, whose children are 2 and 5; 5’s left child is 4.]
Almost There, Stay on Target
[Diagram: a final zig (single rotation) on 6, whose parent 1 is the root, makes 6 the root with left child 1; 1’s right child is 3, whose children are 2 and 5; 5’s left child is 4.]
Splay Again
Find(4)
[Diagram: a zig-zag on 4 (parent 5, grandparent 3) lifts 4 two levels: 1’s right child becomes 4, with left child 3 (whose left child is 2) and right child 5.]
Example Splayed Out
[Diagram: a final zig-zag on 4 (parent 1, grandparent 6) makes 4 the root, with left child 1 and right child 6; 1’s right child is 3 (whose left child is 2), and 6’s left child is 5.]
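A splay step like the ones pictured above can be sketched with a simplified recursive variant (not the Weiss top-down code; the `Node` layout and helper names are illustrative). The intermediate tree shapes differ from the bottom-up pictures, but the accessed key always finishes at the root:

```python
# Minimal recursive splay sketch. Zig-zig rotates the grandparent edge first,
# then the parent edge; zig-zag is a double rotation; a final zig handles an
# odd-depth target.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rot_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rot_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Move the node holding `key` (assumed present) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rot_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right:
                root.left = rot_left(root.left)
        return rot_right(root) if root.left else root
    else:
        if root.right is None:
            return root
        if key > root.right.key:                     # zig-zig
            root.right.right = splay(root.right.right, key)
            root = rot_left(root)
        elif key < root.right.key:                   # zig-zag
            root.right.left = splay(root.right.left, key)
            if root.right.left:
                root.right = rot_right(root.right)
        return rot_left(root) if root.right else root

# Build the chain 1 -> 2 -> ... -> 6, then Find(6) and Find(4) as in the slides:
root = Node(1)
for k in range(2, 7):
    n = root
    while n.right:
        n = n.right
    n.right = Node(k)

root = splay(root, 6)
print(root.key)  # 6
root = splay(root, 4)
print(root.key)  # 4
```

As in the slides, the deep chain is fixed up as a side effect: after the two finds, the tree is roughly half as deep as the original chain.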
Locality
• “Locality” – if an item is accessed, it is likely to be accessed again soon
  – Why?
• Assume m ≥ n accesses in a tree of size n
  – Total worst-case time is O(m log n)
  – O(log n) amortized time per access
• Suppose only k distinct items are accessed in the m accesses
  – Time is O(n log n + m log k)
    • n log n: getting those k items near the root
    • m log k: those k items are all at the top of the tree
  – Compare with O(m log n) for an AVL tree
Splay Operations: Insert
• To insert, we could do an ordinary BST insert
  – but that would not fix up the tree
  – a BST insert followed by a find (splay)?
• Better idea: do the splay before the insert!
• How?
Split
Split(T, x) creates two BSTs, L and R:
  – All elements of T are in either L or R
  – All elements in L are ≤ x
  – All elements in R are ≥ x
  – L and R share no elements
Then how do we do the insert?
Split
Split(T, x) creates two BSTs, L and R:
  – All elements of T are in either L or R
  – All elements in L are ≤ x
  – All elements in R are > x
  – L and R share no elements
Then how do we do the insert? Insert x as the root, with children L and R.
Splitting in Splay Trees
• How can we split?
  – We have the splay operation
  – We can find x, or the parent of where x would be if we were to insert it as an ordinary BST
  – We can splay x or that parent to the root
  – Then break one of the links from the root to a child
Split
[Diagram: split(x) splays on x – or on what would have been the parent of x – bringing it to the root of T. If the root is ≤ x, cut off its right subtree: the root with its left subtree is L (keys ≤ x) and the detached subtree is R (keys > x). If the root is > x, cut off its left subtree instead: it becomes L, and the root with its right subtree becomes R.]
Back to Insert
Insert(x):
• Split on x, producing L (keys ≤ x) and R (keys > x)
• Join the subtrees using x as the root
[Diagram: split(x) produces L and R; x becomes the new root, with left child L and right child R.]
Insert Example
Insert(5)
[Diagram: start with root 6, left subtree 4 (with 2 below it, then 1) and right subtree 7 (with 9 below it). split(5) splays 4 – the would-be parent of 5 – to the root and detaches its right subtree: L is rooted at 4 (keys 1, 2, 4) and R is rooted at 6 (keys 6, 7, 9). Finally 5 becomes the new root, with left child 4 and right child 6.]
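The insert-by-split idea can be sketched as follows. To keep the sketch short it performs Split with a plain recursive walk; in a real splay tree, Split would first splay x (or its would-be parent) to the root and then just cut one link, as described above. The `Node` layout is illustrative:

```python
# Sketch of Insert(x) = Split on x, then make x the root of L and R.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def split(t, x):
    """Return (L, R): L holds keys <= x, R holds keys > x."""
    if t is None:
        return None, None
    if t.key <= x:
        l, r = split(t.right, x)   # everything left of here is <= x
        t.right = l
        return t, r
    else:
        l, r = split(t.left, x)    # everything right of here is > x
        t.left = r
        return l, t

def insert(t, x):
    l, r = split(t, x)
    return Node(x, l, r)           # x becomes the new root

def inorder(t):
    return inorder(t.left) + [t.key] + inorder(t.right) if t else []

# Insert(5) into the example tree {1, 2, 4, 6, 7, 9}:
t = Node(6, Node(4, Node(2, Node(1)), None), Node(7, None, Node(9)))
t = insert(t, 5)
print(t.key, inorder(t))  # 5 [1, 2, 4, 5, 6, 7, 9]
```

Note that, exactly as in the slide, the newly inserted key ends up at the root.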
Splay Operations: Delete
[Diagram: find(x) splays x to the root, leaving left subtree L (keys < x) and right subtree R (keys > x); then delete x. Now what?]
Join
• Join(L, R): given two trees such that every key in L is less than every key in R, merge them into one tree
• Splay on the maximum element in L, then attach R as the root’s right subtree
[Diagram: after splaying the maximum of L to its root, that root has no right child, so R attaches there.]
Delete Completed
[Diagram: starting from T, find(x) splays x to the root, with subtrees L (keys < x) and R (keys > x); delete x, then Join(L, R) produces T - x.]
Delete Example
Delete(4)
[Diagram: start with root 6, left subtree 4 (with 2 below it, then 1) and right subtree 7 (with 9 below it). find(4) splays 4 to the root; deleting it leaves L = {1, 2} and R rooted at 6 (keys 6, 7, 9). Splaying the maximum of L brings 2 to the root with left child 1; attaching R as 2’s right subtree completes the delete.]
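The delete-by-join idea can be sketched the same way. As with the insert sketch, the splay steps are replaced by plain BST walks for brevity; in a real splay tree, find(x) splays x to the root, and Join splays max(L) to the root of L before attaching R. The `Node` layout is illustrative:

```python
# Sketch of Delete(x) = find x, remove it, then Join(L, R).

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def join(l, r):
    """Merge two trees, assuming every key in l < every key in r."""
    if l is None:
        return r
    node = l
    while node.right is not None:   # walk to max(l); a splay tree would
        node = node.right           # splay it to the root instead
    node.right = r                  # max(l) has no right child, so r fits here
    return l

def delete(t, x):
    if t is None:
        return None
    if x < t.key:
        t.left = delete(t.left, x)
    elif x > t.key:
        t.right = delete(t.right, x)
    else:
        return join(t.left, t.right)  # replace x's node by Join(L, R)
    return t

def inorder(t):
    return inorder(t.left) + [t.key] + inorder(t.right) if t else []

# Delete(4) from the example tree {1, 2, 4, 6, 7, 9}:
t = Node(6, Node(4, Node(2, Node(1)), None), Node(7, None, Node(9)))
t = delete(t, 4)
print(inorder(t))  # [1, 2, 6, 7, 9]
```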
Splay Trees, Summary
• Splay trees are arguably the most practical kind of self-balancing tree
• If the number of finds is much larger than n, then locality is crucial!
  – Example: word counting
• Splay trees also support efficient Split and Join operations, useful for other tasks
  – E.g., range queries
Dictionary & Search ADTs
• Dictionary ADT (aka Map ADT): stores values associated with user-specified keys
  – keys may be any (homogeneous) comparable type
  – values may be any (homogeneous) type
• Search ADT (aka Set ADT): stores keys only
Dictionary & Search ADTs
Operations:
  create : → dictionary
  insert : dictionary × key × values → dictionary
  find   : dictionary × key → values
  delete : dictionary × key → dictionary
Example dictionary:
  kim chi: spicy cabbage
  kreplach: tasty stuffed dough
  kiwi: Australian fruit
insert(kohlrabi, upscale tuber) adds a new entry; find(kreplach) returns “tasty stuffed dough”.
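The same operations, sketched with Python's built-in dict (keys and values taken from the example above):

```python
# Dictionary ADT operations on the example entries.

d = {
    "kim chi": "spicy cabbage",
    "kreplach": "tasty stuffed dough",
    "kiwi": "Australian fruit",
}

d["kohlrabi"] = "upscale tuber"    # insert(kohlrabi, upscale tuber)
print(d["kreplach"])               # find(kreplach) -> tasty stuffed dough
del d["kiwi"]                      # delete(kiwi)
```

A Search (Set) ADT would store only the keys, e.g. Python's built-in set.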
Dictionary Implementations
• Arrays:
  – Unsorted
  – Sorted
• Linked lists
• BSTs:
  – Random
  – AVL
  – Splay
Dictionary Implementations

            Arrays                   Lists        Binary Search Trees
            unsorted     sorted                   AVL          splay
insert      O(1)         O(n)        O(1)         O(log n)     O(log n) amortized
find        O(n)         O(log n)    O(n)         O(log n)     O(log n) amortized
delete      find + O(1)  O(n)        find + O(1)  O(log n)     O(log n) amortized
The Last Dictionary We Discuss: B-Trees
• Suppose we want to store the data on disk
• A disk access is a lot more expensive than one CPU operation
• Example:
  – 1,000,000 entries in the dictionary
  – An AVL tree requires log(1,000,000) ≈ 20 disk accesses – this is expensive
• Idea in B-trees:
  – Increase the fan-out, decrease the height
  – Make 1 node = 1 disk block
B-Trees Basics
• All keys are stored at the leaves
• Non-leaf nodes have guidance keys, to help the search
• Parameter d = the degree (the book uses the order M = 2d+1)
• Rules for keys:
  – The root is either a leaf, or has between 1 and 2d keys
  – All other nodes have between d and 2d keys
• Rule for the number of children:
  – Each node (except the leaves) has one more child than it has keys
• Balance rule:
  – The tree is perfectly balanced!
B-Trees Basics
• A non-leaf node:
[Diagram: guidance keys 30, 120, 240 with four child pointers, covering keys k < 30, 30 ≤ k < 120, 120 ≤ k < 240, and 240 ≤ k.]
• A leaf node:
[Diagram: keys 40, 50, 60 with pointers to the records with keys 40, 50, and 60, plus a pointer to the next leaf. When the leaves are linked like this, the structure is called a B+ tree.]
B+ Tree Example
d = 2 (M = 5). Find the key 40.
[Diagram: the root holds the guidance key 80; its left child holds 20 and 60, its right child holds 100, 120, 140. The leaves hold 10 15 18 | 20 30 40 50 | 60 65 | 80 85 90. To find 40: at the root, 40 < 80, so go left; at the next node, 20 ≤ 40 < 60, so take the middle child; scan that leaf to find 40.]
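The search walk above can be sketched over the example tree. The node layout (dicts with `keys` and `children`) is an illustrative assumption, and the right half of the example tree is abbreviated:

```python
# Sketch of B+ tree search: descend by guidance keys, then scan the leaf.

def find(node, key):
    while not node["leaf"]:
        i = 0
        # child i covers keys[i-1] <= k < keys[i]
        while i < len(node["keys"]) and key >= node["keys"][i]:
            i += 1
        node = node["children"][i]
    return key in node["keys"]

leaves = [
    {"leaf": True, "keys": [10, 15, 18]},
    {"leaf": True, "keys": [20, 30, 40, 50]},
    {"leaf": True, "keys": [60, 65]},
    {"leaf": True, "keys": [80, 85, 90]},
]
inner = {"leaf": False, "keys": [20, 60], "children": leaves[:3]}
root = {"leaf": False, "keys": [80], "children": [inner, leaves[3]]}

print(find(root, 40))  # True: 40 < 80, then 20 <= 40 < 60, then scan the leaf
print(find(root, 41))  # False
```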
B+ Tree Design
• How large should d be?
• Example:
  – Key size = 4 bytes
  – Pointer size = 8 bytes
  – Block size = 4096 bytes
• A node must fit in one block: 2d × 4 + (2d+1) × 8 ≤ 4096
• Solving gives d = 170
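The node-size inequality above can be checked directly, finding the largest d for which 2d keys plus 2d+1 pointers still fit in one block:

```python
# Find the largest degree d with 2d*KEY + (2d+1)*PTR <= BLOCK.

KEY, PTR, BLOCK = 4, 8, 4096

d = 0
while 2 * (d + 1) * KEY + (2 * (d + 1) + 1) * PTR <= BLOCK:
    d += 1
print(d)  # 170
```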
B+ Tree Depth
• Assume d = 170
• How deep is the B+ tree?
• Count the minimum number of keys at each depth (each new level adds at least 170·171^depth keys):
  – Depth 0 (just the root): at least 170 keys
  – Depth 1: at least 170 + 170·171 ≈ 3·10^4 keys
  – Depth 2: at least 170 + 170·171 + 170·171² ≈ 5·10^6 keys
  – Depth 3: ≈ 860·10^6 keys
  – Depth 4: ≈ 147·10^9 keys – nobody has more keys!
With a B+ tree we can find any data item with at most 5 disk accesses!
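The capacity count above can be reproduced in a few lines. Following the slide's rough count, every node (including the root) is treated as holding at least d keys:

```python
# Minimum cumulative key capacity of a B+ tree with d = 170, level by level.

d = 170
total = 0
for depth in range(5):
    total += d * (d + 1) ** depth   # at least d keys per node, (d+1)^depth nodes
    print(depth, total)

# Depth 4 already holds over 1.4 * 10**11 keys, so 5 levels
# (at most 5 disk accesses) suffice for any realistic dictionary.
```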
Insertion in a B+ Tree
Insert(K, P):
• Find the leaf where K belongs; insert it there
• If there is no overflow (2d keys or fewer), halt
• If there is overflow (2d+1 keys), split the node and insert the middle key in the parent:
[Diagram: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node (K1 K2 with P0 P1 P2) and a right node (K4 K5 with P3 P4 P5); the middle key K3 moves up into the parent.]
• If the split node is a leaf, keep K3 in the right node too
• When the root splits, the new root has 1 key only
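The leaf-split step above can be sketched as follows (d = 2, so a leaf overflows at 2d+1 = 5 keys; the helper name is illustrative). Following the slide, the middle key is pushed to the parent and, because this is a leaf, also kept in the right half:

```python
# Sketch of splitting an overflowing B+ tree leaf.

def split_leaf(keys):
    d = len(keys) // 2                              # keys has 2d+1 entries
    left, mid, right = keys[:d], keys[d], keys[d:]  # mid stays in right half
    return left, mid, right

left, mid, right = split_leaf([10, 20, 30, 40, 50])
print(left, mid, right)  # [10, 20] 30 [30, 40, 50]
```

For a non-leaf split, the middle key would move up to the parent without being kept in either half.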