Splay Trees and the Interleave Bound


Brendan Lucier, March 15, 2005

Summary of “Dynamic Optimality -- Almost” by Demaine et al., 2004.

Outline

Introduction and Definitions

Results of “Dynamic Optimality -- Almost”

Application to Splay Trees




Dynamic Optimality

Consider a binary search tree T on n nodes, servicing a request sequence X = (x_1, x_2, …, x_m) of m values.

Cost model: the algorithm is charged one operation for each node traversed at each time step (i.e. each access).

The set of all traversed nodes (for a particular access) forms a connected subtree. This tree can be rearranged at no cost.

The offline optimal dynamic BST for X is an algorithm (A_OPT) which services X with the lowest cost. Call this cost C_OPT. Note that A_OPT is offline.

We say that an online BST algorithm A is O(f(n))-competitive if C(A) = O(f(n) C_OPT). The algorithm is dynamically optimal if f(n) = 1.

There are sequences for which C_OPT = Θ(m) (e.g. (1, …, n)^k) and others for which C_OPT = Θ(m lg n) (e.g. (n/2, n/4, 3n/4, n/8, …, n)^k).

Interleave Bound

The interleave bound (IB) is a function that assigns a positive integer to a given sequence X.

It has been shown that IB(X)/2 - O(n) is a lower bound on C_OPT(X). In fact, IB(X) is a simplification of a lower bound developed by Wilber in 1989.

Demaine et al. use IB(X) to construct an O(lg lg n)-competitive BST algorithm.

Definition of IB(X)

Consider a fixed, perfectly balanced binary tree P on n nodes (assume n = 2^k - 1). P is not the BST maintained by the algorithm; it is only used to define IB(X).

Each node in P has a preferred child, either left or right. The preferred child of y is the one whose subtree contains the most recently accessed descendant of y. If the most recently accessed element is y itself, the preferred child is left.

The interleave cost of x (IC(x)) is the number of child preferences that would change if x were the next value accessed. If no preferences change, we incur a cost of 1.

The interleave bound IB(X) is the sum of all the IC(x_i), where the state of P is updated after the access of each x_i.

Example: X = (13,5,10,10)

Reference tree P on keys 1..15, listed level by level from the root:

8
4 12
2 6 10 14
1 3 5 7 9 11 13 15

Access Element 13: No preferences change, so IC(13) = 1, since we always incur a cost of at least 1.

Access Element 5: Two preferences change, so IC(5) = 2.

Access Element 10: Note the preference of 10 changes to left. IC(10) = 3.

Access Element 10: No changes. IC(10) = 1.
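The definition of IC and IB can be sketched in a few lines of Python. This is only an illustration: the names `interleave_bound` and `initial_pref` are made up here, and the starting preferences passed in the usage line (8, 12 and 10 prefer right, everything else prefers left) are an assumption chosen to agree with the costs stated in the example, since the slides showed the starting state only in a figure.

```python
def interleave_bound(k, xs, initial_pref=None):
    """Return (IB(X), [IC(x) for x in X]) for the perfect tree P on keys 1..2**k - 1.

    initial_pref maps a key to 'L' or 'R'; keys that are missing prefer 'L'
    (an assumption of this sketch, not something the slides specify).
    """
    n = 2 ** k - 1
    pref = dict(initial_pref or {})
    costs = []
    for x in xs:
        lo, hi = 1, n
        changes = 0
        while True:
            y = (lo + hi) // 2                # root of the current subtree of P
            side = 'L' if x <= y else 'R'     # accessing y itself sets "left"
            if pref.get(y, 'L') != side:
                changes += 1
                pref[y] = side
            if x == y:
                break
            if x < y:
                hi = y - 1
            else:
                lo = y + 1
        costs.append(max(1, changes))         # every access costs at least 1
    return sum(costs), costs


total, per_access = interleave_bound(4, [13, 5, 10, 10], {8: 'R', 12: 'R', 10: 'R'})
print(total, per_access)
```

Under that assumed starting state the per-access costs come out as 1, 2, 3, 1, matching the example, so IB(X) = 7.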




Interleaving as a lower bound

Theorem: IB(X)/2 - O(n) is a lower bound on C_OPT(X).

This had already been proven by Wilber in 1989, but the proof of Demaine et al. is simpler and (I think) quite enlightening.

Proof Sketch:

Idea 1: Consider a node y in any binary search tree T. Then the keys of y and all of y’s descendants form a contiguous range of values. That is, they are precisely the set of values [L, R] for some L and R.

Idea 2: Given any two nodes x and y in any binary search tree T, the lowest common ancestor of x and y lies in the range [x, y]. We conclude that the lowest common ancestor of any range of values [L, R] must, in fact, be an element of [L, R].

Idea 3: Given any node y in our balanced binary tree P, let IB_y(X) be the number of times the preferred child of y changes as we process X. Then IB(X) = Σ_{y in P} IB_y(X).

Proof Sketch (cont’d)

Choose some y in P. Let the subtree rooted at y in P correspond to the key range [L, R]. Then the left side of y corresponds to [L, y], and the right side to [y+1, R].

Let T be the binary search tree held by A_OPT at some point in time.

Let r1 and r2 be the lowest common ancestors of [L, y] and [y+1, R] in T. Then one of r1 and r2 must be the lowest common ancestor of [L, R]; say it’s r1.

We call r2 the transition point for y, and it turns out that this relationship forms a bijection between nodes of P and nodes of T.

[Figure: P is the perfect tree on keys 1..7 with root y = 4; Left is [1,4], Right is [5,7]. T is a BST on the same keys, with r1 and r2 marked.]
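The transition point can be computed directly from Ideas 1 and 2: walking down from the root of T, the first node whose key falls inside a range is the lowest common ancestor of that range. The sketch below assumes this reading; the helper names (`range_lca`, `transition_point`) and the particular tree T are hypothetical and are not taken from the slides' figure.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain (unbalanced) BST insertion, used only to build an example T."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def range_lca(root, lo, hi):
    """Lowest common ancestor in T of all keys in [lo, hi] (None if there are none)."""
    node = root
    while node is not None:
        if node.key < lo:
            node = node.right
        elif node.key > hi:
            node = node.left
        else:
            return node          # first key inside the range on the way down
    return None


def transition_point(root, L, y, R):
    """The one of r1, r2 that is NOT the lowest common ancestor of [L, R].

    Assumes both [L, y] and [y+1, R] contain at least one key of T.
    """
    r1 = range_lca(root, L, y)
    r2 = range_lca(root, y + 1, R)
    top = range_lca(root, L, R)
    return r2 if r1 is top else r1


# A made-up BST T on keys 1..7 (root 2), with y = 4, L = 1, R = 7.
T = None
for key in [2, 1, 4, 3, 6, 5, 7]:
    T = insert(T, key)
print(range_lca(T, 1, 4).key, range_lca(T, 5, 7).key, transition_point(T, 1, 4, 7).key)
```

In this made-up T, r1 = 2 is the root and also the lowest common ancestor of [1,7], so the transition point for y = 4 is r2 = 6.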

Proof Sketch (cont’d)

Now any access into y’s right subtree in P requires that r2 be traversed in T. But if y’s preferred child changes twice, y’s right subtree must be accessed!

Hence r2 must be traversed when y’s preferred child changes twice, so the BST algorithm must incur a cost of 1 to touch it.

Note that the transition point might change after it’s traversed, but there will always be one.

This means that node y contributes at least IB_y(X)/2 - O(1) to the total cost of the BST.

Summing over all y, we get that the BST algorithm must incur a cost of at least IB(X)/2 – O(n), as required.

[Figure: P on keys 1..7 with y = 4 (Left is [1,4], Right is [5,7]) next to the BST T. Any access to [5,7] must touch 5.]

Non-Tightness of IB(X)

We know C_OPT(X) = Ω(IB(X)), but there are sequences for which C_OPT(X) = Θ(IB(X) lg lg n).

Suppose X consists only of values along the “always left” path in P. There are lg n + 1 such values, and every access (except possibly the first) has an interleave cost of 1, so IB(X) = m + O(lg n).

We can access any k values in such a way that C_OPT(X) = Θ(m lg k). In particular, we can access our “left-path” values so that C_OPT(X) = Θ(m lg lg n) = Θ(IB(X) lg lg n).

[Figure: the reference tree P on keys 1..15; its “always left” path is 8, 4, 2, 1.]

The Tango BST

Demaine et al. developed a BST algorithm, Tango, which runs in O(IB(X) lg lg n) time. Since C_OPT = Ω(IB(X)), Tango is O(lg lg n)-competitive.

Idea: take the preferred path from the root of P, and place its values into a balanced (AVL) tree T. Take all the remaining subtrees of P, recursively construct Tango trees from them, then hang those Tango trees from the leaves of T.

Illustration of Tango

[Figure, built up over three slides: the reference tree P on keys 1..15, with its root preferred path 8, 12, 10, 9 stored as one balanced tree. The remaining pieces of P hang below it: first the paths 4, 2, 1; 11; and 14, 13; then 3; 6, 5; 7; and 15.]
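The decomposition of P into preferred paths can be sketched as follows. This only produces the groups of keys that would each be stored in one balanced tree; it does not build or rebalance the Tango tree itself. The function name `preferred_paths` and the convention that unlisted nodes prefer left are assumptions of this sketch.

```python
def preferred_paths(lo, hi, pref):
    """Split the perfect tree on keys lo..hi into preferred paths, top path first.

    pref maps a key to 'L' or 'R'; keys that are missing prefer 'L'
    (an assumption of this sketch).
    """
    if lo > hi:
        return []
    path, hanging = [], []
    a, b = lo, hi
    while a <= b:
        y = (a + b) // 2                  # root of the current subtree of P
        path.append(y)
        if pref.get(y, 'L') == 'L':
            hanging.append((y + 1, b))    # right subtree is off the path
            b = y - 1
        else:
            hanging.append((a, y - 1))    # left subtree is off the path
            a = y + 1
    paths = [path]
    for sub_lo, sub_hi in hanging:        # recurse into the subtrees left over
        paths.extend(preferred_paths(sub_lo, sub_hi, pref))
    return paths


paths = preferred_paths(1, 15, {8: 'R'})
print(paths)
```

With only node 8 preferring right, the top path is 8, 12, 10, 9 and the remaining groups are the ones listed in the illustration. No path has more than lg(n + 1) = 4 keys, which is what keeps each balanced tree shallow.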

The Tango Algorithm

The difficult part of Tango is rearranging the Tango tree when a value is accessed, so it still corresponds to the modified interleave tree P.

This requires an extra O(1) bits per node, plus the ability to cut and merge trees with n nodes in O(lg n) time. This can be done with AVL trees.

For details, see the paper by Demaine et al.

Search Cost in Tango

Each preferred path has O(lg n) nodes, so each balanced tree has depth O(lg lg n).

The number of trees one must pass through to reach a node y is simply the number of preferred paths touched on the path to y in P.

This is simply the number of times a non-preferred child is chosen (off by 1), so the number of trees traversed is O(IC(y)).

The depth of y in the Tango tree is therefore O(IC(y) lg lg n). Total access time for a sequence X is therefore O(IB(X) lg lg n).
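The count of trees traversed can be checked with a short sketch: walk from the root of P to y and add one each time the walk leaves a preferred path. The function name `preferred_paths_touched` and the default of "left" for unlisted nodes are assumptions made for illustration.

```python
def preferred_paths_touched(k, pref, x):
    """Number of preferred paths met on the root-to-x path in P (keys 1..2**k - 1).

    pref maps a key to 'L' or 'R'; keys that are missing prefer 'L'
    (an assumption of this sketch).
    """
    lo, hi = 1, 2 ** k - 1
    touched = 1                           # the path containing the root of P
    while True:
        y = (lo + hi) // 2
        if x == y:
            return touched
        side = 'L' if x < y else 'R'
        if pref.get(y, 'L') != side:      # stepping to a non-preferred child
            touched += 1
        if x < y:
            hi = y - 1
        else:
            lo = y + 1


for key in (9, 5, 7):
    print(key, preferred_paths_touched(4, {8: 'R'}, key))
```

With node 8 preferring right and every other node preferring left, key 9 lies on the top path (1 tree), key 5 needs 3 trees and key 7 needs 4. Each tree contributes O(lg lg n) to the depth.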




Splay Trees

Splaying is an online BST algorithm that rotates an accessed node to the root of the tree.

The rotations are chosen so that the ancestors of the accessed node x form a not-too-unbalanced subtree after all rotations are performed.

Recall: Access Lemma for splay trees. Assign a positive weight w(x) to each node x in the tree. Define s(x) to be the sum of the weights of all nodes in the subtree rooted at x (x and its descendants), and r(x) = lg s(x).

Then the amortized cost to access node x in a splay tree with root t is at most 3(r(t) - r(x)) + 1.
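To make the quantities in the Access Lemma concrete, the sketch below evaluates s(x), r(x) and the bound 3(r(t) - r(x)) + 1 on a small fixed tree. It does not splay anything; it only computes the bound. The names `Node`, `subtree_weight` and `access_bound`, and the example tree, are hypothetical.

```python
import math


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def subtree_weight(node, w):
    """s(x): total weight of x and every node below it."""
    if node is None:
        return 0.0
    return w(node.key) + subtree_weight(node.left, w) + subtree_weight(node.right, w)


def access_bound(root, key, w):
    """Access Lemma bound 3(r(t) - r(x)) + 1, where r = lg s and t is the root."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    if node is None:
        raise KeyError(key)
    r_t = math.log2(subtree_weight(root, w))
    r_x = math.log2(subtree_weight(node, w))
    return 3 * (r_t - r_x) + 1


# Balanced tree on keys 1..7 with uniform weight 1 on every node.
tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(access_bound(tree, 1, lambda key: 1.0))   # leaf: 3 lg 7 + 1
print(access_bound(tree, 4, lambda key: 1.0))   # root: 1
```

With uniform weights the bound for any node is at most 3 lg n + 1, which is the usual O(lg n) amortized access cost.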

The Open Problem

In 1985, Sleator and Tarjan conjectured that splay trees are O(1)-competitive. Unfortunately, it has been shown only that splay trees are O(lg n)-competitive (and this was proven by S & T in 1985!).

I believe that splay trees perform in time O(IB(X) lg lg n), and are therefore O(lg lg n)-competitive.

Consider the weight function w(x) = (lg n)^(-IC(x)). Then for each node, r(x) is between -IC(x) lg lg n and lg e (not obvious).

In particular, 3(r(t) - r(x)) + 1 = O(IC(x) lg lg n), as required (?).

The problem is that our weight function is not fixed: it changes as the access sequence is processed. The access lemma requires a fixed weight function.

A Possible Approach

In fact, there is a generalization of the Access Lemma that does not require the weight function to be fixed. It is only necessary that the weight of a node does not increase, unless that node is being accessed.

The approach: for any period of time between two accesses to a node x, come up with a fixed value IC’(x) that depends on the values of IC(x) over that time period. Note that the assignment of weights can be offline!

Set the weight function to be w(x) = (lg n)^(-IC’(x)), or some variant thereof (i.e. apply a multiplier), so that when x is accessed r(x) = -IC’(x) lg lg n, which is O(IC(x) lg lg n) in magnitude, and r(t) = O(lg lg n). The Access Lemma would then apply to give a total splaying cost of O(IB(X) lg lg n).

Thank You
