CSE 326: Data Structures Trees. Today: Splay Trees

Post on 06-Jan-2018


1

CSE 326: Data Structures Trees

2

Today: Splay Trees

• Fast both in worst-case amortized analysis and in practice

• Are used in the kernel of NT for keeping track of process information!

• Invented by Sleator and Tarjan (1985)

• Details:
– Weiss 4.5 (basic splay trees)
– Weiss 11.5 (amortized analysis)
– Weiss 12.1 (better “top down” implementation)

3

Basic Idea

• “Blind” rebalancing – no height info kept!
• Worst-case time per operation is O(n)
• Worst-case amortized time is O(log n)
• Insert/find always rotates node to the root!
• Good locality:
– Most commonly accessed keys move high in tree – become easier and easier to find

4

Idea

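[Figure: a deep binary search tree (node labels as extracted: 17, 10, 92, 5, 3); the accessed node n lies far below the root]

You’re forced to make a really deep access: since you’re down there anyway, fix up a lot of deep nodes!

Move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary.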

5

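Zig-Zag*

[Figure: before, g is at the top with child p, and n is the inner child of p; the subtrees are W, X, Y, Z. After, n is at the top with children p and g. Nodes are marked as Helped, Unchanged, or Hurt, with depth changes labeled up 2, down 1, up 1, and down 1.]

*This is just a double rotation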

6

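Zig-Zig

[Figure: before, g is at the top, p is its child, and n is the child of p on the same side; the subtrees are W, X, Y, Z. After, n is at the top, p is its child, and g is the child of p.]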

7

Why Splaying Helps
• Node n and its children are always helped (raised)
• Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree!
• Result:
– shallow nodes may increase depth by one or two
– helped nodes decrease depth by a large amount
• If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay
– Exceptions are the root, the child of the root, and the node splayed

8

Splaying Example

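Find(6)

[Figure: the starting tree is the right-leaning chain 1, 2, 3, 4, 5, 6 with 1 at the root. The first zig-zig lifts 6 above 5 and 4, leaving the chain 1, 2, 3, 6 with 5 as the left child of 6 and 4 as the left child of 5.]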

9

Still Splaying 6

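zig-zig

[Figure: the second zig-zig lifts 6 above 3 and 2. Now 1 is the root with right child 6; 6 has left child 3; 3 has children 2 and 5; 5 has left child 4.]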

10

Almost There, Stay on Target

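zig

[Figure: a final zig rotates 6 above 1. Now 6 is the root with left child 1; 1 has right child 3; 3 has children 2 and 5; 5 has left child 4.]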

11

Splay Again

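Find(4)

zig-zag

[Figure: the first zig-zag lifts 4 above 5 and 3. Now 6 is the root with left child 1; 1 has right child 4; 4 has children 3 and 5; 3 has left child 2.]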

12

Example Splayed Out

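zig-zag

[Figure: the second zig-zag lifts 4 above 1 and 6. Now 4 is the root with children 1 and 6; 1 has right child 3; 3 has left child 2; 6 has left child 5.]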
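The Find(6) and Find(4) walk-through can be replayed with a minimal sketch of bottom-up splaying. This is an illustration under assumptions, not the course's implementation: the names `Node`, `rotate_up`, `splay`, and `shape` are hypothetical, and the starting tree is assumed to be the right-leaning chain 1 through 6. The function records the name of each step it performs.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_up(parent, child):
    """Single rotation: lift child above parent and return child."""
    if parent.left is child:
        parent.left, child.right = child.right, parent
    else:
        parent.right, child.left = child.left, parent
    return child


def splay(root, key, steps=None):
    """Splay key (or the last node reached while searching for it) to the root.

    Step names are appended to `steps` when a list is passed in.
    """
    steps = [] if steps is None else steps
    path, node = [], root
    while node is not None:                  # record the search path
        path.append(node)
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    if not path:
        return None
    n = path.pop()
    while path:
        p = path.pop()
        if not path:                         # p is the root: single rotation
            steps.append("zig")
            return rotate_up(p, n)
        g = path.pop()
        top = path[-1] if path else None     # great-grandparent, if any
        g_was_left = top is not None and top.left is g
        if (g.left is p) == (p.left is n):   # n and p are same-side children
            steps.append("zig-zig")
            rotate_up(g, p)                  # rotate the grandparent first
            rotate_up(p, n)
        else:                                # n is an inner grandchild
            steps.append("zig-zag")
            if g.left is p:
                g.left = rotate_up(p, n)
            else:
                g.right = rotate_up(p, n)
            rotate_up(g, n)
        if top is not None:                  # reattach n below the rest of the path
            if g_was_left:
                top.left = n
            else:
                top.right = n
    return n


def shape(node):
    """Nested-tuple view (key, left, right) of a tree, for inspection."""
    return None if node is None else (node.key, shape(node.left), shape(node.right))


# Build the chain 1 -> 2 -> ... -> 6 (each node is the right child of the previous one).
root = None
for k in range(6, 0, -1):
    root = Node(k, None, root)

trace = []
root = splay(root, 6, trace)
print(trace)   # ['zig-zig', 'zig-zig', 'zig']
trace = []
root = splay(root, 4, trace)
print(trace)   # ['zig-zag', 'zig-zag']
```

The search path is kept on an explicit stack rather than in parent pointers; that is a design choice of this sketch, made so the node class stays as small as possible.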

13

Locality
• “Locality” – if an item is accessed, it is likely to be accessed again soon
– Why?
• Assume m ≥ n accesses in a tree of size n
– Total worst case time is O(m log n)
– O(log n) per access amortized time
• Suppose only k distinct items are accessed in the m accesses.
– Time is O(n log n + m log k)
– The n log n term: getting those k items near the root
– The m log k term: those k items are all at the top of the tree
– Compare with O(m log n) for AVL tree

14

Splay Operations: Insert
• To insert, could do an ordinary BST insert
– but would not fix up tree
– A BST insert followed by a find (splay)?
• Better idea: do the splay before the insert!
• How?

15

Split
Split(T, x) creates two BSTs L and R:
– All elements of T are in either L or R
– All elements in L are ≤ x
– All elements in R are > x
– L and R share no elements

Then how do we do the insert?
Insert x as the root, with children L and R

17

Splitting in Splay Trees

• How can we split?
– We have the splay operation
– We can find x or the parent of where x would be if we were to insert it as an ordinary BST
– We can splay x or the parent to the root
– Then break one of the links from the root to a child

18

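Split
split(x): T becomes L and R

[Figure: T is splayed; the new root could be x, or what would have been the parent of x. Then one link from the root is broken:
– if root is ≤ x: L is the root with its left subtree (keys ≤ x), R is the right subtree (keys > x)
– if root is > x: L is the left subtree (keys < x), R is the root with its right subtree (keys > x)]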

19

Back to Insert

Insert(x):
• Split on x
• Join subtrees using x as root

[Figure: split(x) yields L (keys ≤ x) and R (keys > x); x becomes the new root with children L and R.]

20

Insert Example

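Insert(5)

[Figure: the tree holds keys 1, 2, 4, 6, 7, 9. split(5) splays the tree and cuts it into L = {1, 2, 4} and R = {6, 7, 9}; 5 then becomes the new root with children L and R.]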

21

Splay Operations: Delete

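[Figure: find(x) splays x to the root, with left subtree L (keys < x) and right subtree R (keys > x). Deleting x leaves the two trees L and R.]

Now what?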

22

Join

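• Join(L, R): given two trees such that L < R, merge them
• Splay on the maximum element in L, then attach R

[Figure: splaying L brings its maximum to the root, which then has no right child; R is attached as the right subtree.]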

23

Delete Completed

[Figure: starting from T, find(x) splays x to the root with subtrees L (keys < x) and R (keys > x); delete x; Join(L, R) gives T - x.]

24

Delete Example

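Delete(4)

[Figure: the tree holds keys 1, 2, 4, 6, 7, 9. find(4) splays 4 to the root; deleting it leaves L = {1, 2} and R = {6, 7, 9}. Find max splays 2 to the root of L, with 1 as its left child, and R is attached as the right subtree of 2.]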
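The split, insert, join, and delete procedures from the preceding slides can be sketched as follows. This is a minimal illustration, not the course's code: all names (`Node`, `splay`, `split`, `insert`, `join`, `delete`, `shape`, `example_tree`) are hypothetical, the block carries its own compact bottom-up splay helper so that it runs standalone, and the shape of the example tree (1 at the root, right child 9, then 6 with children 4 and 7, and 2 below 4) is an assumption reconstructed from the figure. Handling of a key that is already present is also an assumption; the slides do not cover it.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def _rotate_up(p, n):
    """Lift child n above its parent p and return n."""
    if p.left is n:
        p.left, n.right = n.right, p
    else:
        p.right, n.left = n.left, p
    return n


def splay(root, key):
    """Bottom-up splay of key, or of the last node on its search path."""
    path, node = [], root
    while node is not None:
        path.append(node)
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    if not path:
        return None
    n = path.pop()
    while path:
        p = path.pop()
        if not path:                              # zig
            return _rotate_up(p, n)
        g = path.pop()
        top = path[-1] if path else None
        g_was_left = top is not None and top.left is g
        if (g.left is p) == (p.left is n):        # zig-zig
            _rotate_up(g, p)
            _rotate_up(p, n)
        else:                                     # zig-zag
            if g.left is p:
                g.left = _rotate_up(p, n)
            else:
                g.right = _rotate_up(p, n)
            _rotate_up(g, n)
        if top is not None:
            if g_was_left:
                top.left = n
            else:
                top.right = n
    return n


def split(root, x):
    """Return (L, R): every key in L is <= x, every key in R is > x."""
    root = splay(root, x)
    if root is None:
        return None, None
    if root.key <= x:                             # break the right link
        right, root.right = root.right, None
        return root, right
    left, root.left = root.left, None             # break the left link
    return left, root


def insert(root, x):
    """Split on x, then make x the root with children L and R."""
    left, right = split(root, x)
    if left is not None and left.key == x:        # assumption: ignore duplicates
        left.right = right
        return left
    return Node(x, left, right)


def join(left, right):
    """Merge two trees where every key in left is below every key in right."""
    if left is None:
        return right
    left = splay(left, float("inf"))              # brings the maximum of left to its root
    left.right = right                            # the maximum has no right child
    return left


def delete(root, x):
    """Splay x to the root, remove it, and join its two subtrees."""
    root = splay(root, x)
    if root is None or root.key != x:
        return root
    return join(root.left, root.right)


def shape(node):
    return None if node is None else (node.key, shape(node.left), shape(node.right))


def example_tree():
    # Assumed shape of the example tree from the Insert and Delete slides.
    return Node(1, None, Node(9, Node(6, Node(4, Node(2)), Node(7))))


print(shape(insert(example_tree(), 5)))
print(shape(delete(example_tree(), 4)))
```

Splaying on positive infinity in `join` is just a convenient way to walk to the maximum of L; it assumes the keys are numbers that compare with `float("inf")`.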

25

Splay Trees, Summary

• Splay trees are arguably the most practical kind of self-balancing trees
• If the number of finds is much larger than n, then locality is crucial!
– Example: word-counting
• Also support efficient Split and Join operations – useful for other tasks
– E.g., range queries

26

Dictionary & Search ADTs

• Dictionary ADT (aka map ADT): stores values associated with user-specified keys
– keys may be any (homogeneous) comparable type
– values may be any (homogeneous) type
• Search ADT (aka Set ADT): stores keys only

27

Dictionary & Search ADTs

Operations:
• create : → dictionary
• insert : dictionary × key × values → dictionary
• find : dictionary × key → values
• delete : dictionary × key → dictionary

Example contents:
• kim chi: spicy cabbage
• Kreplach: tasty stuffed dough
• Kiwi: Australian fruit

Example calls:
• insert(kohlrabi, upscale tuber)
• find(kreplach) returns kreplach: tasty stuffed dough

28

Dictionary Implementations

• Arrays:
– Unsorted
– Sorted
• Linked lists
• BST
– Random
– AVL
– Splay

29

Dictionary Implementations

          Arrays                      Lists         Binary Search Trees
          unsorted      sorted                      AVL         splay
insert    O(1)          O(n)          O(1)          O(log n)    O(log n) amortized
find      O(n)          O(log n)      O(n)          O(log n)    O(log n) amortized
delete    find + O(1)   O(n)          find + O(1)   O(log n)    O(log n) amortized
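To make the sorted-array column concrete, here is a minimal sketch of a dictionary kept in a sorted Python list. The class name `SortedArrayDict` and its method names are hypothetical. `find` uses binary search, which gives the O(log n) bound, while `insert` and `delete` shift elements, which gives O(n).

```python
import bisect


class SortedArrayDict:
    """Dictionary ADT on two parallel lists kept sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = []

    def _index(self, key):
        # Binary search: position of key, or where it would be inserted.
        return bisect.bisect_left(self._keys, key)

    def find(self, key):
        i = self._index(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def insert(self, key, value):
        i = self._index(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # shifts later elements: O(n)
            self._values.insert(i, value)

    def delete(self, key):
        i = self._index(key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]                # shifts later elements: O(n)
            del self._values[i]


d = SortedArrayDict()
d.insert("kohlrabi", "upscale tuber")
d.insert("kreplach", "tasty stuffed dough")
print(d.find("kreplach"))   # tasty stuffed dough
```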

30

The last dictionary we discuss: B-Trees
• Suppose we want to store the data on disk
• A disk access is a lot more expensive than one CPU operation
• Example
– 1,000,000 entries in the dictionary
– An AVL tree requires log2(1,000,000) ≈ 20 disk accesses – this is expensive
• Idea in B Trees:
– Increase the fan-out, decrease the height
– Make 1 node = 1 block

31

B-Trees Basics
• All keys are stored at leaves
• Nonleaf nodes have guidance keys, to help the search
• Parameter d = the degree (the book uses the order M = 2d+1)

Rules for keys:
• The root is either a leaf, or has between 1 and 2d keys
• All other nodes (except the root) have between d and 2d keys

Rule for number of children:
• Each node (except leaves) has one more child than keys

Balance rule:
• The tree is perfectly balanced!

32

B-Trees Basics
• A non-leaf node: [Figure: keys 30, 120, 240, with four child pointers leading to keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k]
• A leaf node: [Figure: keys 40, 50, 60, each pointing to the record with that key, plus a pointer to the next leaf]
• Then called a B+ tree

33

B+Tree Example

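d = 2 (M = 5)

[Figure: the root holds key 80. Its left child holds keys 20 and 60; its right child holds keys 100, 120, and 140. The leaves shown are (10 15 18), (20 30 40 50), (60 65), and (80 85 90), and each leaf key points to its record.]

Find the key 40:
• At the root: 40 ≤ 80
• At the next level: 20 < 40 ≤ 60
• In the leaf: 30 < 40 ≤ 40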
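The lookup of key 40 can be traced with a small sketch. The classes `Internal` and `Leaf` and the function `find` are hypothetical names. Child selection follows the non-leaf rule from the B-Trees Basics slide (keys equal to a guidance key go to the right), implemented here with `bisect_right`. The leaves under the right-hand node beyond (80 85 90) are not shown in the transcript, so they are left empty here rather than invented.

```python
import bisect


class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children   # len(children) == len(keys) + 1


class Leaf:
    def __init__(self, keys):
        self.keys = keys


def find(node, key):
    """Return (found, path), where path lists the keys of every node visited."""
    path = []
    while isinstance(node, Internal):
        path.append(node.keys)
        # Child i holds keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect.bisect_right(node.keys, key)]
    path.append(node.keys)
    return key in node.keys, path


left = Internal([20, 60], [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65])])
# Placeholder leaves: their contents are not given in the transcript.
right = Internal([100, 120, 140], [Leaf([80, 85, 90]), Leaf([]), Leaf([]), Leaf([])])
root = Internal([80], [left, right])

print(find(root, 40))   # (True, [[80], [20, 60], [20, 30, 40, 50]])
```

The path has three entries, one per level, which is the number of node (block) reads the search needs in this tree.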

34

B+Tree Design

• How large d ?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 bytes
• 2d × 4 + (2d+1) × 8 <= 4096
• d = 170

B+ Trees Depth
• Assume d = 170
• How deep is the B-tree ?
• Depth = 0 (just the root): at least 170 keys
• Depth = 1: at least 170 + 170×171 ≈ 30×10^3 keys
• Depth = 2: 170 + 170×171 + 170×171^2 ≈ 5×10^6 keys
• Depth = 3: 170 + ... + 170×171^3 ≈ 860×10^6 keys
• Depth = 4: 170 + ... + 170×171^4 ≈ 147×10^9 keys

Nobody has more keys!

With a B tree we can find any data item with at most 5 disk accesses!
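The arithmetic behind d = 170 and the depth estimates can be checked with a short sketch. The function names `max_degree` and `keys_at_depth` are hypothetical; the second one simply evaluates the sum used on the slide, d + d(d+1) + ... + d(d+1)^depth, without claiming anything beyond it.

```python
def max_degree(key_size, pointer_size, block_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: 2d*(key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


def keys_at_depth(d, depth):
    """Sum from the slide: d + d(d+1) + ... + d(d+1)^depth."""
    return sum(d * (d + 1) ** i for i in range(depth + 1))


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print("d =", d)
for depth in range(5):
    print("depth", depth, "->", keys_at_depth(d, depth), "keys")
```

With 4-byte keys and 8-byte pointers the inequality becomes 24d + 8 <= 4096, so d = 170. The sum is geometric and collapses to (d+1)^(depth+1) - 1, which is why each extra level multiplies the count by about 171.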

36

Insertion in a B+ Tree
Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:

[Figure: the overflowing node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node (keys K1 K2; pointers P0 P1 P2) and a right node (keys K4 K5; pointers P3 P4 P5); K3 moves up into the parent.]

• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
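The split step can be sketched on plain Python lists. The function names `split_internal` and `split_leaf` are hypothetical, and the sketch covers only the split of one overflowing node, not the full insertion with its walk back up to the parent.

```python
def split_internal(keys, children, d):
    """Split a non-leaf node holding 2d+1 keys and 2d+2 child pointers.

    Returns (left, middle_key, right); the middle key moves up to the parent.
    """
    assert len(keys) == 2 * d + 1 and len(children) == 2 * d + 2
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, keys[d], right


def split_leaf(keys, d):
    """Split a leaf holding 2d+1 keys.

    The middle key is copied up to the parent and also kept in the right leaf.
    """
    assert len(keys) == 2 * d + 1
    return keys[:d], keys[d], keys[d:]


keys = ["K1", "K2", "K3", "K4", "K5"]
pointers = ["P0", "P1", "P2", "P3", "P4", "P5"]
print(split_internal(keys, pointers, d=2))
print(split_leaf(keys, d=2))
```

Keeping the middle key in the right leaf is what keeps every key stored at a leaf, as the B-Trees Basics slide requires.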