

Amortized Analysis

In the analysis of algorithms, especially algorithms that operate on a data structure, one needs to bound the cost of a single operation. E.g., inserting an element into a length-𝑛 sorted list costs 𝑂(𝑛), but the same operation on a heap costs 𝑂(log 𝑛).

Often this cost of a single operation differs widely from operation to operation. Consider a

queue that is backed with an array 𝐴 of size 𝑛, as well as a counter 𝑐 to indicate the end of the

queue. While the queue is not full, adding an element to the end of the queue only costs 𝑂(1)

time. But when the queue is full, adding an element will have to first copy the whole content of

𝐴 to a larger array. Thus, adding an element costs anywhere from 𝑂(1) to 𝑂(𝑛) time.

If worst-case analysis is used, we would have to report, sadly, that the data structure costs 𝑂(𝑛) per operation. You may risk losing your job.

You could instead do average-case analysis, but this requires an arbitrary assumption about the distribution of the input, which is not as appealing. Amortized analysis provides a better way: it analyzes the average cost of an operation over any sequence of operations. It is an average, but its guarantee does not rely on any probabilistic distribution of the input. It is the worst case of the average cost.

Consider the array-backed queue example again. Suppose that every time we need to resize the array, the size is doubled. Then for any sequence of 𝑛 insertions, the total number of element copies is bounded by

1 + 2^1 + 2^2 + \cdots + 2^{\lfloor \log_2 n \rfloor} = O(n)

Averaged over the insertions, this cost is only 𝑂(1) each. So we say that the amortized cost of insertion in the array-backed queue is 𝑂(1).
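To make this concrete, here is a minimal Python sketch of such a doubling array (a toy illustration of the analysis above; the class and names are ours, not part of the original notes):

    class ArrayQueue:
        def __init__(self):
            self.array = [None]          # backing array A, initial size 1
            self.count = 0               # counter c: the end of the queue

        def add(self, x):
            if self.count == len(self.array):
                # Full: copy the whole content into an array of twice the size.
                self.array = self.array + [None] * len(self.array)
            self.array[self.count] = x   # the cheap O(1) case
            self.count += 1

For any sequence of 𝑛 adds, the copying work is 1 + 2 + 4 + ⋯ ≤ 2𝑛, matching the bound above.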

This simple strategy is used in Java’s ArrayList and HashMap data structures: their capacity is increased automatically when the structure is full or almost full, and both have amortized constant time for adding an element. Amortized analysis is not suitable when the system needs a real-time response guarantee for each individual operation, but it makes no difference for offline computations.

Three Common Methods for Amortized Analysis

There are three common methods for amortized analysis:

1. Aggregate Method

2. Accounting Method

3. Potential Method

Rather than defining these methods abstractly right away, we examine examples directly and explain the methods through them.


Multipop Stack

A multipop stack 𝑆 adds one more operation to a regular stack: 𝑚𝑢𝑙𝑡𝑖𝑝𝑜𝑝(𝑆, 𝑘). This operation

removes 𝑘 elements from the top of the stack, or if the stack has fewer than 𝑘 elements, the

stack is emptied.

Suppose we simply use a regular stack to back this new data structure, and use the following pseudocode for the multipop operation:

Multipop(S, k):
    While k > 0 and S is not empty
        pop(S)
        k ← k − 1

In the pseudocode, a single multipop operation costs 𝑂(𝑘) in the worst case. But if we add up, i.e., aggregate, the costs of a sequence of 𝑛 operations, things are different.

Since each pushed element is popped at most once in the pseudocode, the aggregate cost of all pops is bounded by 𝑂(𝑛). The amortized cost of one operation (regardless of operation type) is therefore 𝑂(1).

So the aggregate method simply adds up the costs of a series of operations.
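As a quick illustration, a Python version of the multipop stack backed by a plain list (a sketch; names are ours):

    def multipop(S, k):
        # Pop k elements off the top of S, or empty S if it has fewer than k.
        while k > 0 and S:
            S.pop()
            k -= 1

    S = []
    for i in range(10):
        S.append(i)        # n pushes
    multipop(S, 4)         # each pushed element is popped at most once,
                           # so any n operations do O(n) total work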

Incrementing a Binary Counter

A binary number 𝑥 is represented by an array 𝐴[0. . 𝑘 − 1] of bits. Here 𝑘 is the length of the

array. The lowest-order bit is in 𝐴[0].

Increment(A)
1. 𝑖 ← 0
2. While 𝑖 < 𝑘 and 𝐴[𝑖] = 1
   2.1 𝐴[𝑖] ← 0
   2.2 𝑖 ← 𝑖 + 1
3. If 𝑖 < 𝑘 then 𝐴[𝑖] ← 1

The time complexity is clearly bounded by 𝑂(𝑘) bit flips per increment. So, if the counter is incremented 𝑛 times, the total time complexity is 𝑂(𝑛𝑘). Notice that to maintain a counter with value at most 𝑛, 𝑘 = ⌈log₂(𝑛 + 1)⌉ bits are sufficient.

But we can show that the amortized complexity of each increment is only 𝑂(1). This is much better than 𝑂(𝑘) = 𝑂(log₂ 𝑛). We will provide three different analyses of this.

Aggregate method

For the increments that take the counter from 0 to 𝑛, the lowest bit is flipped every time, the second lowest bit every other time, and in general the 𝑖-th bit once every 2^𝑖 increments. Thus, the total number of flips is bounded by

\sum_{i=0}^{k-1} \left\lfloor \frac{n}{2^i} \right\rfloor < n \sum_{i=0}^{\infty} \frac{1}{2^i} = 2n.
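This bound is easy to confirm empirically; the following Python sketch (ours) runs the Increment pseudocode and counts bit flips:

    def increment(A):
        # Increment the counter A (lowest-order bit in A[0]); return # of flips.
        i = 0
        while i < len(A) and A[i] == 1:
            A[i] = 0                     # flip a 1 to 0
            i += 1
        if i < len(A):
            A[i] = 1                     # flip a single 0 to 1
            return i + 1
        return i                         # counter overflowed

    n, k = 1000, 11                      # k = ceil(log2(n+1)) bits suffice
    A = [0] * k
    total = sum(increment(A) for _ in range(n))
    assert total < 2 * n                 # aggregate bound: fewer than 2n flips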


Accounting method

Imagine the algorithm is some kind of a service company. We have to pay it $1 to flip a bit. We

just need to estimate how much we need to pay the algorithm to carry out all the computation.

Instead of paying the exact amount for each bit flip, we set up an “account”. We voluntarily pay some extra dollars for some operations, and the extra dollars are saved in the account so that we can pay less for some future operations.

Let’s examine one paying scheme: We pay $2 whenever we ask the algorithm to flip a bit to 1,

and $0 whenever we ask the algorithm to flip a bit to 0.

Since each bit flip costs $1, and a bit must be flipped to 1 before it can be flipped to 0, our account balance is always nonnegative. Thus, the total money we pay is an upper bound on the actual charges, i.e., on the number of bit flips.

How much would we pay? Examine the pseudocode: each increment sets at most one bit to 1, so we pay at most $2 per increment, i.e., at most $2𝑛 in total. Thus, 𝑛 increments perform at most 2𝑛 bit flips.

A more formal description of the proof simply replaces the “dollar” analogy with the “amortized cost” of an operation. It does three things:

1. Propose the amortized cost for each operation.

2. Prove that the total amortized cost is always no less than the total actual cost.

3. Upper bound the total amortized cost.
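The paying scheme itself is also easy to simulate. This sketch (ours) tracks the account balance during the increments and checks that it never goes negative:

    def increment_with_account(A, balance):
        # Pay $2 for the flip to 1, $0 for flips to 0; each flip costs $1.
        i = 0
        while i < len(A) and A[i] == 1:
            A[i] = 0
            balance -= 1    # flip to 0: paid $0, spend $1 saved on this bit
            i += 1
        if i < len(A):
            A[i] = 1
            balance += 1    # flip to 1: paid $2, flip costs $1, save $1
        return balance

    A, balance = [0] * 11, 0
    for _ in range(1000):
        balance = increment_with_account(A, balance)
        assert balance >= 0              # the account never goes negative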

Potential Method

The potential method is essentially a refined accounting method, where the “account balance” is a function of the data structure’s state. This function is called the “potential” of the data structure.

Let 𝐷𝑖 indicate the data structure state after 𝑖 increments of the binary counter. Define the

potential function, Φ(𝐷𝑖), as the number of bits that are set to 1.

Let 𝑐𝑖 be the number of bits flipped by the 𝑖-th increment. Then 1 bit is flipped to 1 and 𝑐𝑖 − 1 bits are flipped to 0. Therefore the potential changes by

Φ(𝐷𝑖) − Φ(𝐷𝑖−1) = 1 − (𝑐𝑖 − 1) = 2 − 𝑐𝑖.

Adding up this equation over all 𝑖 gives

\Phi(D_n) - \Phi(D_0) = 2n - \sum_{i=1}^{n} c_i.

Since Φ(𝐷0) = 0 and Φ(𝐷𝑛) ≥ 0, we have \sum_{i=1}^{n} c_i \le 2n. So the amortized time complexity is 𝑂(1).


A more standard pattern for such a proof is to define an “amortized cost” by adding the change in potential to the actual cost:

\tilde{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}).

Then we show that

1. Φ(𝐷𝑖) ≥ Φ(𝐷0) for every 𝑖. (The potential never drops below its initial value.)

2. \tilde{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = 2. (The amortized cost is bounded.)

Any provable bound on \tilde{c}_i is a bound on the amortized time complexity.

To summarize, a potential method does the following:

1. Defines a potential (energy) function Φ of the data structure, such that Φ(𝐷𝑖) ≥ Φ(𝐷0) for all 𝑖.

2. For each operation, the “amortized cost” is the actual cost plus the change of potential

(energy). Bound the amortized cost.
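For the binary counter this bookkeeping can be checked mechanically. Reusing the increment function from the earlier sketch (ours), the amortized cost 𝑐̃ᵢ is exactly 2 for every increment that does not overflow:

    def phi(A):
        return sum(A)                    # potential: number of bits set to 1

    A = [0] * 11
    for _ in range(1000):
        before = phi(A)
        flips = increment(A)             # actual cost c_i
        assert flips + (phi(A) - before) == 2   # amortized cost c_i + ΔΦ = 2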

Remark: The three methods – aggregate, accounting and potential – are just three different

ways to count (bound) the same amount. The difference is only technical.

Self-Adjusting Linked List

By using very simple data structures, we’ve already illustrated the three methods to estimate

the amortized cost. Next let us examine a less trivial data structure.

A linked list is a common data structure.

A query compares the target with each object in the list sequentially. If all objects are queried with uniform frequency, each query in a length-𝑛 linked list takes 𝑛/2 comparisons on average. But very often the objects stored in the linked list are queried with different frequencies; for example, some English words have a much higher frequency than others. Can we adjust the order of the list so that queries are faster?

Assume we know that element 𝑖 would be queried with probability 𝑝𝑖 > 0. The optimal ordering

for a static linked list will be such that 𝑝1 ≥ 𝑝2 ≥ ⋯ ≥ 𝑝𝑛. The expected number of comparisons

for each query is


S_{opt} = \sum_{i=1}^{n} i \cdot p_i

The difficulty is that we typically do not know the 𝑝𝑖 before the queries finish; this knowledge is only available in hindsight. After all 𝑚 queries are completed,

p_i = \frac{\text{number of times } i \text{ is queried}}{m}.

But once the queries have finished, one cannot go back and adjust the order any more.

In such a situation, the idea is to adjust the linked list on the fly: each time a query is performed, the order of the list is adjusted in the hope that future queries will be faster. Can we achieve anything close to 𝑺𝒐𝒑𝒕?

Note that the adjustment can be based on all past queries, but has no access to future queries. This situation is very much like that of an online algorithm. However, the question above asks a dynamic online data structure to compete with a static offline data structure, so it is not competitive analysis in the strict online-algorithm sense.

Possible Strategies:

- MTF (Move to Front): The found element is moved to the front of the list.

- Transpose: The found element is swapped with the previous element.

Note that transpose is easy to implement for an array-backed list. But unfortunately it is disastrous in terms of competitive ratio: querying the last two elements alternately, over and over, makes the competitive ratio as bad as 𝑛.
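A minimal Python sketch of MTF on a plain list (ours); the returned cost is the number of comparisons, i.e., the position of the found element:

    def mtf_access(L, x):
        i = L.index(x)           # sequential search: i + 1 comparisons
        L.insert(0, L.pop(i))    # move the found element to the front
        return i + 1

    L = ["the", "of", "and", "to", "in"]
    cost = mtf_access(L, "and")  # cost == 3
    # L is now ["and", "the", "of", "to", "in"]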

Theorem: For any access sequence, the cost of MTF satisfies 𝑐𝑜𝑠𝑡𝑀𝑇𝐹 ≤ 2 ⋅ 𝑆𝑜𝑝𝑡 + 𝑛(𝑛 − 1)/2, where 𝑆𝑜𝑝𝑡 here denotes the total cost of the optimal static list on the sequence. Thus, for sufficiently long access sequences, MTF is a 2-competitive algorithm against the optimal static list.

Proof: For any access sequence, suppose the optimal static list is 𝐿𝑜𝑝𝑡 .

Suppose at a moment, MTF gives a list 𝐿𝑀𝑇𝐹 . 𝐿𝑀𝑇𝐹 is a permutation of 𝐿𝑜𝑝𝑡 . Define the potential

function Φ(𝐿𝑀𝑇𝐹) as the number of reversals of the permutation:

Φ = |{(𝑎, 𝑏) |𝑎 and 𝑏 are ordered differently in 𝐿𝑜𝑝𝑡 and 𝐿𝑀𝑇𝐹}|

Let 𝑜 be the next accessed object, at position 𝑖 in 𝐿𝑀𝑇𝐹 and position 𝑗 in 𝐿𝑜𝑝𝑡. We examine the potential change caused by moving 𝑜 to the front of 𝐿𝑀𝑇𝐹.


For every object 𝑥 after 𝑜 in 𝐿𝑀𝑇𝐹, the move does not change whether (𝑥, 𝑜) is a reversal, so it does not change Φ.

For every object 𝑥 before 𝑜 in 𝐿𝑀𝑇𝐹, whether (𝑥, 𝑜) is a reversal flips. Among these 𝑖 − 1 elements, suppose 𝑘 of them were not reversals before the move (so 𝑖 − 𝑘 − 1 were); after the move, exactly those 𝑘 become reversals. Each of the 𝑘 non-reversed elements precedes 𝑜 in 𝐿𝑜𝑝𝑡 as well; since 𝑜 is at position 𝑗 in 𝐿𝑜𝑝𝑡, only 𝑗 − 1 elements precede it there, so 𝑘 ≤ 𝑗 − 1. Thus,

\Delta\Phi = k - (i - k - 1) = 2k - i + 1 \le 2j - i - 1.

Let 𝑐 denote the cost of accessing 𝑜, i.e., 𝑐 = 𝑖. Define the amortized cost 𝑐̃ = 𝑐 + ΔΦ. Then

\tilde{c} = c + \Delta\Phi \le i + (2j - i - 1) = 2j - 1 < 2j.

Adding up the amortized costs of all accesses gives

cost_{MTF} + \Phi_{final} - \Phi_{init} < 2 \cdot S_{opt}.

Since Φ_final ≥ 0 and Φ_init ≤ 𝑛(𝑛 − 1)/2, the theorem is proved.

QED.

So we have shown that MTF can compete with an offline optimal static list with a competitive

ratio 2. How about the competitiveness against an offline optimal dynamic list?

If the cost function is just the number of comparisons, then we have a problem, because the offline algorithm can always move the next accessed object to the front. So we have to charge for element movement too. If the cost is defined as the accessed element’s distance from the front plus the total distance of element movements, then MTF is 4-competitive against an offline adversary. Note that this is the situation for an array-backed list. We omit the proof here.

Union-Find

A practical question: In the following network, are the two big dots connected?


Suppose there are 𝑛 vertices and 𝑚 edges in the graph. Running a graph search for each query isn’t wise; it can cost up to 𝑂(𝑛²) time per query. Notice that the connectivity relation is transitive, so the connected vertices form equivalence classes. Such queries can be answered with the following index structure:

- Partition the 𝑛 vertices into 𝑘 disjoint subsets (equivalence classes under the connectivity relation).

- Find: Given any vertex, return the id (or a representative) of the subset that holds the vertex.

Then one can determine whether two vertices are connected by comparing their subset ids. The index can be implemented with an array 𝐴 where 𝐴[𝑖] keeps the ID of the subset that contains element 𝑖. For example, taking each set’s smallest element as its ID, the array 𝐴 = [0, 1, 2, 2, 2, 5, 5, 7, 8, 2] indicates six sets: {0}, {1}, {2,3,4,9}, {5,6}, {7}, {8}.

Real applications of this include:

- Computers in a network
- Web pages on the Internet
- Transistors in a computer chip
- Variable name aliases
- Pixels in a digital photo
- Minimum spanning tree


Note that if the graph does not change, the connected components can be computed by a depth-first search; building the subsets takes 𝑂(𝑛 + 𝑚) time. But if the graph changes dynamically, rebuilding the index from scratch isn’t wise. So we’d like a dynamic data structure that supports adding new edges to the graph. We do not deal with deletion.

UnionFind Data Structure: Maintains a collection of disjoint sets subject to the following two

operations:

1. Union(𝐴, 𝐵): Replace two sets 𝐴 and 𝐵 with one new set 𝐴 ∪ 𝐵.

2. Find(𝑒): Return the subset that contains 𝑒.

Thus, adding an edge can be achieved by first finding the two sets containing the edge’s two vertices, and then taking the union of the two sets.

Recall the MST algorithm:

Minimum Spanning Tree
1. 𝑇 ← ∅
2. Sort the edges by weight: 𝑒1, … , 𝑒𝑚
3. For 𝑖 from 1 to 𝑚
   3.1 If 𝑒𝑖 connects two different connected components of 𝑇
   3.2     𝑇 ← 𝑇 ∪ {𝑒𝑖}

Clearly the connected components of 𝑇 can be maintained with a UnionFind data structure: adding an edge unions the two sets that contain the new edge’s two vertices. The amortized efficiency of the two operations therefore affects the MST algorithm’s complexity: it takes 𝑂(𝑚 log 𝑚) time for sorting, plus the cost of 2𝑚 Finds and 𝑛 Unions.
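A sketch of this MST algorithm in Python (ours), with Find and Union backed by the simple disjoint-set forest described below as Data Structure III:

    def mst(n, edges):
        # edges: list of (weight, u, v) tuples; returns the tree edges.
        parent = list(range(n))          # each vertex starts as its own root

        def find(u):                     # follow parents up to the root
            while parent[u] != u:
                u = parent[u]
            return u

        T = []
        for w, u, v in sorted(edges):    # O(m log m) sorting step
            ru, rv = find(u), find(v)    # two Finds per edge
            if ru != rv:                 # e connects two components of T
                parent[rv] = ru          # one Union
                T.append((u, v))
        return T

    print(mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
    # -> [(0, 1), (1, 2), (2, 3)]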

Data Structure I:

If we use a plain array as our data structure, then Find takes 𝑂(1) time. Union(A, B) requires rewriting the set ID of every element in one of 𝐴 and 𝐵, which takes 𝑂(|𝐴|) time if 𝐴’s IDs are rewritten. The worst case is to keep adding a single-element set 𝐵 to a growing set 𝐴, which costs 𝑂(𝑛²) over 𝑛 union operations.
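A sketch of Data Structure I in Python (ours), often called quick-find:

    class QuickFind:
        def __init__(self, n):
            self.id = list(range(n))     # id[i] is the subset ID of element i

        def find(self, i):               # O(1)
            return self.id[i]

        def union(self, a, b):           # O(n): rewrite one set's IDs
            ida, idb = self.find(a), self.find(b)
            if ida != idb:
                self.id = [ida if x == idb else x for x in self.id]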

Data Structure II:

In addition to Data Structure I, maintain the size of each subset. During a union operation, rewrite the IDs of the smaller of 𝐴 and 𝐵, as sketched below.
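A sketch of Data Structure II (ours), extending the QuickFind class above; only the smaller set's IDs are rewritten:

    class WeightedQuickFind(QuickFind):
        def __init__(self, n):
            super().__init__(n)
            self.size = {i: 1 for i in range(n)}   # size of each subset, by ID

        def union(self, a, b):
            ida, idb = self.find(a), self.find(b)
            if ida == idb:
                return
            if self.size[ida] < self.size[idb]:
                ida, idb = idb, ida      # make idb the smaller set
            self.id = [ida if x == idb else x for x in self.id]
            self.size[ida] += self.size.pop(idb)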


Examine the 𝑁 − 1 union operations responsible for building a set of size 𝑁. Whenever an element’s set ID is rewritten, the size of the set containing it at least doubles, so an element’s set ID is rewritten at most log₂ 𝑁 times. The total time spent on all 𝑁 elements is then 𝑁 log₂ 𝑁. Averaging over the union operations, and repeating the argument for each set, the amortized time complexity of a union operation is 𝑂(log 𝑛) when there are 𝑛 elements in total.

Note that Data Structure II is good enough for the MST algorithm: the sorting of the edges starts to dominate the time complexity. However, when the graph is unweighted, or when the weights can be sorted with a linear-time sorting algorithm, it still makes sense to improve the data structure.

Data Structure III: (Disjoint set forest)

Use a tree to represent a set.

Find: Starting from an element, keep following parent pointers until the root is reached; return the root.

Union(A, B): Make 𝐵’s root a child of 𝐴’s root.

Example: Union(9, 6) will merge the two trees into one.

Time complexity: Union takes 𝑂(1) time (given the two roots). But Find may take 𝑂(𝑛) time, because the tree can become very deep.

Data Structure IV: (Disjoint forest with union by rank heuristics)

In addition to Data Structure III, maintain a height value at the root of each tree. When a union merges two trees, make the lower tree’s root a child of the taller tree’s root.


Instead of “height”, let us call this value the “rank”. The rank of a single-element tree is initialized to 0, and ranks are updated during union operations as follows: when two trees of equal rank 𝑟 are merged, the new root’s rank becomes 𝑟 + 1; otherwise both ranks stay unchanged. Clearly, in this data structure the rank of a vertex equals its height. But in the data structure we will see later, rank and height may differ.

Lemma. A rank-𝑟 vertex has at least 2^𝑟 descendants. (Here a vertex counts as one of its own descendants.)

Proof: The lemma is true for 𝑟 = 0. For 𝑟 > 0, proceed by induction: the first tree of rank 𝑟 must arise from the union of two trees of rank 𝑟 − 1. By induction, each rank-(𝑟 − 1) tree has at least 2^(𝑟−1) descendants, so a rank-𝑟 tree has at least 2^𝑟 descendants.

QED.

By the lemma, the rank of a tree is at most log₂ 𝑛. Thus, Find takes 𝑂(log 𝑛) time in this new data structure.

Compare Data Structures II and IV: II is fast at Find and slow at Union, while IV is fast at Union and slow at Find.

Data Structure V: (Disjoint-set forest with union by rank and path compression heuristics)

This is a further improvement over IV.

During a Find operation, we link every node encountered on the path to the root directly to the root. This heuristic, called path compression, takes little additional time but makes future Find operations faster.
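A Python sketch of Data Structure V (ours): find compresses the path it traverses, and union links by rank:

    class DisjointSet:
        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n          # rank of each root, initially 0

        def find(self, x):
            root = x
            while self.parent[root] != root:
                root = self.parent[root]
            while self.parent[x] != root:                # path compression:
                self.parent[x], x = root, self.parent[x] # link each visited node to the root
            return root

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            if self.rank[ra] < self.rank[rb]:
                ra, rb = rb, ra          # link the lower-rank tree under the taller
            self.parent[rb] = ra
            if self.rank[ra] == self.rank[rb]:
                self.rank[ra] += 1       # equal ranks: new root's rank grows by 1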

Note that this data structure is very easy to implement, but the analysis of its performance is exceedingly difficult. The data structure was first described in (Galler, Bernard A.; Fischer, Michael J. An improved equivalence algorithm. Communications of the ACM 7 (5): 301–303, 1964). The first precise analysis appeared in (Tarjan, Robert Endre. Efficiency of a Good But Not Linear Set Union Algorithm. Journal of the ACM 22 (2): 215–225, 1975). Here we give the main theorem without proof.

The Ackermann function is defined for 𝑚 ≥ 0 and 𝑛 ≥ 0:

A(m, n) = \begin{cases} n + 1 & \text{if } m = 0 \\ A(m - 1, 1) & \text{if } m > 0 \text{ and } n = 0 \\ A(m - 1, A(m, n - 1)) & \text{if } m > 0 \text{ and } n > 0 \end{cases}

Theorem: Using the disjoint-set forest with the union-by-rank and path-compression heuristics, 𝑚 operations on 𝑛 elements take 𝑂(𝑚 ⋅ 𝛼(𝑚, 𝑛)) time. Here

\alpha(m, n) = \min\left\{ k : A\left(k, \left\lfloor \frac{m}{n} \right\rfloor\right) \ge \log_2 n \right\}

is the inverse Ackermann function.

Essentially, 𝛼(𝑚, 𝑛) ≤ 4 in all practical cases (when 𝑛 ≤ 2^65533). So 𝛼(𝑚, 𝑛) is almost a constant.

Here we prove a weaker result, but the proof is simpler.

Notation: Denote by 𝑡(𝑘) the tower-of-2 function,

t(k) = \underbrace{2^{2^{\cdots^{2}}}}_{k \text{ times}},

and let

\log^* n = \min\{ k : \underbrace{\log \log \cdots \log}_{k \text{ times}} n \le 1 \}

be the number of times we need to apply the logarithm to reduce 𝑛 to at most 1.

Theorem: Using the disjoint-set forest with the union-by-rank and path-compression heuristics, 𝑚 operations on 𝑛 elements take 𝑂(𝑚 ⋅ log∗ 𝑛) time.

n        | 2 | 3..4 | 5..16 | 17..65536 | 65537..2^65536
log*(n)  | 1 | 2    | 3     | 4         | 5

So, the result is not much weaker.
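Both functions are tiny to compute in these ranges; a Python sketch (ours) that reproduces the table above:

    import math

    def t(k):
        # Tower of 2: t(1) = 2, t(k) = 2 ** t(k - 1).
        return 1 if k == 0 else 2 ** t(k - 1)

    def log_star(n):
        # Number of times log2 must be applied to bring n down to <= 1.
        k = 0
        while n > 1:
            n = math.log2(n)
            k += 1
        return k

    print([log_star(x) for x in (2, 4, 16, 65536)])   # -> [1, 2, 3, 4]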

Proof:

First a simple fact:

Claim 1: If 𝑢 is 𝑣’s parent, then 𝑟𝑎𝑛𝑘(𝑣) < 𝑟𝑎𝑛𝑘(𝑢).

This is clear from the rank assignment during union operations.

Second, without loss of generality we may assume that there are 𝑛 − 1 union operations that put everything into a single tree; otherwise we can prove the theorem on each tree separately and add things up. A consequence of this assumption is 𝑚 ≥ 𝑛, where 𝑚 is the number of union and find operations.

We will use a potential analysis. Let us define the Φ function. For each vertex 𝑣, let 𝑝(𝑣) be its parent. 𝑣 contributes 1 to the potential function iff:

1. 𝑝(𝑣) exists and is not the root of a tree; and

2. log∗(𝑟𝑎𝑛𝑘(𝑣)) = log∗(𝑟𝑎𝑛𝑘(𝑝(𝑣))).

The potential function Φ is the sum of the contributions of all vertices. Clearly Φ starts at 0 and is always nonnegative.

The idea behind this definition is simple. In a Find operation, we travel along the path to the root and do path compression. Each time we reconnect a pointer, Φ is usually reduced by 1, and this energy reduction pays for the cost of the reconnection. The only reconnections that are not paid for are those at vertices 𝑣 with log∗(𝑟𝑎𝑛𝑘(𝑣)) < log∗(𝑟𝑎𝑛𝑘(𝑝(𝑣))), and since ranks strictly increase along the path, there are at most log∗ 𝑛 of them per Find. As there are at most 𝑚 Find operations, the total cost is bounded by 𝑂(𝑚 ⋅ log∗ 𝑛) plus the total energy reduction during the process.

How much total energy reduction can there be? Since Φ starts at 0 and stays nonnegative, the total reduction is bounded by the total increase, which happens only during union operations. If we can bound the total increase by 𝑂(𝑛 log∗ 𝑛), we will have proved that the total cost is bounded by 𝑂(𝑚 log∗ 𝑛 + 𝑛 log∗ 𝑛) = 𝑂(𝑚 log∗ 𝑛).

Now consider a union operation that makes 𝐵’s root 𝑤 a child of 𝐴’s root. The potential Φ increases only at the direct children of 𝑤. Before the union, 𝑤’s children contribute 0 to Φ (their parent is a root). After the union, Φ increases by 1 for each child 𝑢 of 𝑤 satisfying log∗(𝑟𝑎𝑛𝑘(𝑢)) = log∗(𝑟𝑎𝑛𝑘(𝑤)).


Each time a non-root node 𝑢’s contribution is increased this way, its parent’s rank is strictly larger than at the previous such increase. Let 𝑢 be a non-root node with log∗(𝑟𝑎𝑛𝑘(𝑢)) = 𝑘; while log∗(𝑟𝑎𝑛𝑘(𝑝(𝑢))) also equals 𝑘, the parent’s rank lies in (𝑡(𝑘 − 1), 𝑡(𝑘)]. So 𝑢 can make at most 𝑡(𝑘) − 𝑡(𝑘 − 1) ≤ 𝑡(𝑘) such contributions before its parent’s rank exceeds 𝑡(𝑘).

Thus, the total increase of Φ is upper bounded by \sum_u t(\log^*(rank(u))). To prove the theorem, we therefore only need the following technical lemma.

Lemma:

\sum_{u} t(\log^*(rank(u))) \le n \log^* n

Now the proof becomes purely technical. We group all the vertices into log∗ 𝑛 groups. Group 𝑘

contains all the vertices such that log∗(𝑟𝑎𝑛𝑘(𝑢)) = 𝑘. Then 𝑡(log∗(𝑟𝑎𝑛𝑘(𝑢))) = 𝑡(𝑘) for all

vertices in group 𝑘. Notice that

\sum_{u} t(\log^*(rank(u))) = \sum_{k=1}^{\log^* n} \sum_{u \in \text{group } k} t(k) = \sum_{k=1}^{\log^* n} t(k) \cdot (\text{number of } u \text{ in group } k),

so the only remaining thing is to prove that the size of group 𝑘 is at most 𝑛/𝑡(𝑘).

Claim 2. The number of vertices with rank 𝑟 is no more than 𝑛/2^𝑟.

Proof of Claim 2: A vertex of rank 𝑟 has at least 2^𝑟 descendants, and distinct vertices of rank 𝑟 have disjoint sets of descendants.

Thus,

\text{size of group } k \le \sum_{r=t(k-1)+1}^{t(k)} \frac{n}{2^r} \le \frac{n}{2^{t(k-1)+1}} \sum_{r=0}^{\infty} \frac{1}{2^r} = \frac{n}{2^{t(k-1)}} = \frac{n}{t(k)}.

Thus the lemma is proved, and with it the theorem.

QED

