Equivalence Classes
Frequently, we declare an equivalence relation on a set S. The relation partitions the elements of S into equivalence classes: any two equivalence classes are either the same or disjoint.
Example: S={1,2,3,4,5,6}
Equivalence classes: {1}, {2,4}, {3,5,6}.
The set S is partitioned into equivalence classes.
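A minimal Python sketch of the example above, checking that the listed classes really form a partition of S. The helper names `is_partition` and `equivalent` are illustrative and not part of the slides.

```python
from itertools import combinations

S = {1, 2, 3, 4, 5, 6}
classes = [{1}, {2, 4}, {3, 5, 6}]

def is_partition(universe, blocks):
    """True if the blocks are non-empty, pairwise disjoint, and cover the universe."""
    if any(len(b) == 0 for b in blocks):
        return False
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    covers = set().union(*blocks) == universe
    return pairwise_disjoint and covers

def equivalent(x, y, blocks):
    """x ~ y exactly when some block contains both elements."""
    return any(x in b and y in b for b in blocks)
```

Here `equivalent(2, 4, classes)` holds while `equivalent(1, 2, classes)` does not, matching the classes listed in the example.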
Examples
Is there a path from A to B?
Connected components form a partition of the set of nodes.
Network connectivity
Basic abstractions
• set of objects
• union command: connect two objects
• find query: is there a path connecting one object to another?
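The three abstractions can be sketched naively by storing a component label with every object. This is only an illustrative baseline (the class name `Connectivity` and its methods are assumptions, not the slides' design): queries are cheap, but each union may relabel many objects.

```python
class Connectivity:
    """Naive sketch: every object stores the label of its connected component."""

    def __init__(self, objects):
        self.label = {obj: obj for obj in objects}

    def union(self, a, b):
        """Connect a and b by relabeling a's whole component: O(n) per call."""
        old, new = self.label[a], self.label[b]
        if old == new:
            return
        for obj, lab in self.label.items():
            if lab == old:
                self.label[obj] = new   # only values change, so iteration is safe

    def connected(self, a, b):
        """Is there a path connecting a and b?"""
        return self.label[a] == self.label[b]


net = Connectivity("ABCDE")
net.union("A", "B")
net.union("C", "D")
net.union("B", "C")
```

After these calls A, B, C and D share one component while E remains on its own.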
Kruskal’s Minimum Spanning Tree Algorithm
The vertices are partitioned into a forest of trees.
Need: Efficient way to dynamically change the equivalence relation.
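To make the Kruskal use case concrete, here is a hedged sketch in which the forest of trees is tracked with plain parent pointers. The function name `kruskal`, the `(weight, u, v)` edge format, and the sample graph are assumptions chosen for illustration.

```python
def kruskal(vertices, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    edges is a list of (weight, u, v) tuples (an assumed input format).
    """
    parent = {v: v for v in vertices}      # every vertex starts as its own tree

    def find(x):
        while parent[x] != x:              # walk up to the representative
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):          # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # u and v lie in different trees
            parent[rv] = ru                # union: merge the two trees
            total += w
            chosen.append((u, v, w))
    return total, chosen


total, chosen = kruskal(
    "ABCD",
    [(1, "A", "B"), (4, "B", "C"), (3, "A", "C"), (2, "C", "D"), (5, "B", "D")],
)
```

Each accepted edge changes the equivalence relation on the vertices, which is why the algorithm needs fast find and union.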
Disjoint Sets
How can we represent the elements of disjoint sets?
Declare a representative element for each set.
Implementation: Inverted trees. Each element points to parent. Root of the tree is the representative element.
Example
Intro Aggregate Charging Potential Table resizing Disjoint sets Inverted Trees Union by Depth Threaded Trees Path compression
Disjoint Sets
Represent disjoint sets as “inverted trees”
Each element has a parent pointer π
To compute the union of set A with B, simply make B’s root the
parent of A’s root.
Figure 5.5 A directed-tree representation of two sets {B,E} and {A,C,D,F,G,H}. [Tree diagram omitted.]
Disjoint Set Operations
Makeset(x), a procedure to form the set {x}
Find(x), a procedure to find the representative element of the set containing x.
Union(A,B), form the union of the sets A and B.
Disjoint Set Operations (Simple)
makeset(x)
    π(x) = x
find(x)
    while( x != π(x) ) do
        x = π(x) // find rep.
    end
    return x
union(x, y)
    a = find(x)
    b = find(y)
    π(b) = a
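The simple scheme translates almost line for line into Python. In this sketch the dictionary `parent` plays the role of π; the class name `SimpleDisjointSets` is an assumption made for illustration.

```python
class SimpleDisjointSets:
    """Inverted trees with no balancing, mirroring the simple pseudocode."""

    def __init__(self):
        self.parent = {}                 # parent[x] plays the role of π(x)

    def makeset(self, x):
        self.parent[x] = x               # x is the root of its own tree

    def find(self, x):
        while x != self.parent[x]:       # follow parent pointers to the root
            x = self.parent[x]
        return x

    def union(self, x, y):
        a = self.find(x)
        b = self.find(y)
        self.parent[b] = a               # y's root now hangs below x's root


sets = SimpleDisjointSets()
for name in "ABCD":
    sets.makeset(name)
sets.union("A", "B")
sets.union("C", "D")
sets.union("A", "C")
```

After the three unions every element reports A as its representative.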
Complexity of Simple Scheme
makeset(x): O(1) time
find(x): O(n) for sets of cardinality n in the worst case.
union(x, y): O(1) if x and y are already roots, O(n) in the worst case because of the two find calls.
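The O(n) worst case is easy to provoke. The sketch below (function name `build_chain` is hypothetical) applies the simple union rule so that the old root always ends up below a fresh singleton, producing a chain of length n - 1.

```python
def build_chain(n):
    """Build a worst-case chain with the simple scheme; return (root, steps) for find(0)."""
    parent = list(range(n))              # makeset(0), ..., makeset(n-1)

    def find(x):
        steps = 0
        while x != parent[x]:
            x = parent[x]
            steps += 1                   # count parent pointers followed
        return x, steps

    for i in range(1, n):
        a, _ = find(i)                   # i is still a root
        b, _ = find(0)                   # walks the whole chain built so far
        parent[b] = a                    # simple union: old root b goes below a
    return find(0)
```

For example, `build_chain(8)` ends with element 0 seven pointers away from its root.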
Disjoint Set Operations (Better)
makeset(x)
    π(x) = x
    rank(x) = 0
find(x)
    while( x != π(x) ) do
        x = π(x)
    end
    return x
union(x, y)
    a = find(x); b = find(y)
    if a = b: return
    if rank(a) > rank(b): π(b) = a
    else // rank(a) <= rank(b)
        π(a) = b // make b the root
        if rank(a) = rank(b): rank(b)++
The rank of a node is the height of the subtree rooted at that node.
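A Python sketch of the rank-based scheme follows, using the same tie-breaking as the pseudocode (on equal ranks, the second root becomes the parent). The class name `RankedDisjointSets` is an assumption; the operation sequence at the end is the one used in Figure 5.6.

```python
class RankedDisjointSets:
    """Inverted trees with union by rank, following the pseudocode above."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def makeset(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        while x != self.parent[x]:
            x = self.parent[x]
        return x

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return                           # already in the same set
        if self.rank[a] > self.rank[b]:
            self.parent[b] = a               # shorter tree goes below taller one
        else:
            self.parent[a] = b               # make b the root
            if self.rank[a] == self.rank[b]:
                self.rank[b] += 1            # equal heights: new root grows by one


ds = RankedDisjointSets()
for x in "ABCDEFG":
    ds.makeset(x)
for x, y in [("A", "D"), ("B", "E"), ("C", "F"), ("C", "G"), ("E", "A"), ("B", "G")]:
    ds.union(x, y)
```

After the six unions all seven elements share the root D, whose rank is 2.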
Disjoint Sets (Better)
Disjoint Sets with Union by Depth (2)
Figure 5.6 A sequence of disjoint-set operations. Superscripts denote rank.
After makeset(A), makeset(B), ..., makeset(G):
A^0 B^0 C^0 D^0 E^0 F^0 G^0 (seven single-node trees)
After union(A,D), union(B,E), union(C,F):
Roots D^1, E^1, F^1, G^0; A^0 is the child of D, B^0 the child of E, and C^0 the child of F.
Disjoint Sets (Better)
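Disjoint Sets with Union by Depth (3)
After union(C,G), union(E,A):
Root D^2 with children A^0 and E^1; E^1 has child B^0. Root F^1 with children C^0 and G^0.
After union(B,G):
Single root D^2 with children A^0, E^1, and F^1; E^1 has child B^0; F^1 has children C^0 and G^0.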
Complexity of Better Scheme
makeset(x): constant time
union(x,y): constant time if x and y are roots
find(x): A root of rank k has at least $2^k$ nodes in its tree, so the number of nodes of rank k never exceeds $n/2^k$ and no rank exceeds log n. So find needs at most O(log n) time.
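The bound can be checked on a small instance. This sketch assumes n is a power of two and always merges trees of equal rank, which is the pattern that makes ranks grow fastest; the helper names `pairwise_merge` and `depth` are illustrative.

```python
from collections import Counter

def pairwise_merge(n):
    """Union by rank on n = 2**m singletons, always merging trees of equal rank."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while x != parent[x]:
            x = parent[x]
        return x

    def union(x, y):
        a, b = find(x), find(y)
        if a == b:
            return
        if rank[a] > rank[b]:
            parent[b] = a
        else:
            parent[a] = b
            if rank[a] == rank[b]:
                rank[b] += 1

    step = 1
    while step < n:                      # merge blocks of size step pairwise
        for i in range(0, n, 2 * step):
            union(i, i + step)
        step *= 2
    return parent, rank

def depth(parent, x):
    """Number of parent pointers between x and its root."""
    d = 0
    while x != parent[x]:
        x = parent[x]
        d += 1
    return d


parent, rank = pairwise_merge(64)
rank_counts = Counter(rank)
```

With n = 64 the largest rank and the deepest node both come out as 6 = log 64, and the number of nodes of each rank k stays within n/2^k.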
Improving Find
Improving performance
Idea: Why not force depth to be 1? Then find will have O(1) complexity!
Approach: Threaded Trees
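[Figure: two depth-1 trees, one rooted at a with children b, c, d and one rooted at p with children q, r. Their union is a single depth-1 tree rooted at a with children p, q, r, b, c, d.]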
Problem: Worst-case complexity of union becomes O(n)
Solution:
• Merge smaller set with larger set
• Amortize cost of union over other operations
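A sketch of the depth-1 idea with the smaller-into-larger rule. The slides do not show the data layout, so storing a Python list of members per representative is an assumption here, as are the names `ThreadedSets`, `rep`, `members`, and `updates`.

```python
class ThreadedSets:
    """Depth-1 trees: every element points directly at its representative."""

    def __init__(self):
        self.rep = {}        # element -> representative
        self.members = {}    # representative -> list of elements (assumed layout)
        self.updates = 0     # total number of parent-pointer updates so far

    def makeset(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find(self, x):
        return self.rep[x]               # O(1): a single lookup

    def union(self, x, y):
        a, b = self.rep[x], self.rep[y]
        if a == b:
            return
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                  # make a the larger set
        for e in self.members[b]:        # repoint every element of the smaller set
            self.rep[e] = a
            self.updates += 1
        self.members[a].extend(self.members.pop(b))


ts = ThreadedSets()
for x in "abcdpqr":
    ts.makeset(x)
for x, y in [("a", "b"), ("a", "c"), ("a", "d"), ("p", "q"), ("p", "r"), ("a", "p")]:
    ts.union(x, y)
```

In the last union the three-element set {p, q, r} is the smaller one, so only its three pointers are rewritten.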
Disjoint Sets w/ Threaded Trees
Sets w/ threaded trees: Amortized analysis
Other than the cost of updating parent pointers, union costs O(1)
Idea: Charge the cost of updating a parent pointer to an element.
Key observation: Each time an element’s parent pointer changes, it is in a set that is at least twice as large as before
So, with n operations, you can have at most O(log n) parent pointer updates per element
Thus, the cost of n operations, consisting of some mix of makeset, find and union, is at most O(n log n), i.e., O(log n) amortized per operation
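The charging argument can be observed directly by counting pointer updates per element. This sketch assumes n is a power of two and merges equal-sized sets, which maximizes the number of updates; `run_pairwise` and `changes` are hypothetical names.

```python
import math

def run_pairwise(n):
    """Merge n = 2**m singletons pairwise; return pointer updates charged per element."""
    rep = list(range(n))
    members = {i: [i] for i in range(n)}
    changes = [0] * n                    # updates charged to each element

    def union(x, y):
        a, b = rep[x], rep[y]
        if a == b:
            return
        if len(members[a]) < len(members[b]):
            a, b = b, a                  # a is the larger (or equal) set
        for e in members[b]:
            rep[e] = a
            changes[e] += 1              # e now lives in a set at least twice as large
        members[a].extend(members.pop(b))

    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            union(i, i + step)
        step *= 2
    return changes


changes = run_pairwise(16)
```

For n = 16 no element is charged more than log 16 = 4 updates, and the total stays within n log n.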
Quo Vadis?
Threaded trees are better for find, but not so great for union.
The previous scheme was better for union, but not so great for find.
Can we formulate an eager approach for find and a lazy approach for union, getting the best of both worlds?
Path Compression
Further improvement
Can we combine the best elements of the two approaches?
• Threaded trees employ an eager approach to union, while the original approach used a lazy approach
• Eager approach is better for find, while being lazy is better for union.
So, why not use a lazy approach for union and an eager approach for find?
Path compression: Retains lazy union, but when find(x) is called, eagerly promotes x to the level below the root
• Actually, we promote x, π(x), π(π(x)), π(π(π(x))) and so on.
• As a result, subsequent calls to find x or its parents become cheap.
From here on, we let rank be defined by the union algorithm
• For a root node, rank is the same as depth
• But once a node becomes a non-root, its rank stays fixed, even when path compression decreases its depth.
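A sketch of find with path compression on top of union by rank. The two-pass formulation (locate the root, then repoint the path) is one common way to write it and is an assumption about implementation detail; the class name `DisjointSets` and the hand-built chain in the demo are illustrative only.

```python
class DisjointSets:
    """Union by rank (lazy union) plus path compression (eager find)."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def makeset(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        root = x
        while root != self.parent[root]:     # first pass: locate the root
            root = self.parent[root]
        while x != root:                     # second pass: repoint the whole path
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return
        if self.rank[a] > self.rank[b]:
            self.parent[b] = a
        else:
            self.parent[a] = b
            if self.rank[a] == self.rank[b]:
                self.rank[b] += 1            # ranks change only here, never in find


ds = DisjointSets()
for x in "ABCD":
    ds.makeset(x)
# Hand-built chain D -> C -> B -> A, used only to make the compression visible.
ds.parent.update({"B": "A", "C": "B", "D": "C"})
root = ds.find("D")
```

After `find("D")`, the nodes B, C and D all point directly at A, while the stored ranks are left untouched.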
Example
Disjoint sets w/ Path compression: Illustration
find(I) followed by find(K)
[Figure: the tree rooted at A, drawn initially, after find(I), and after find(K). Superscripts denote rank: A^3, E^2, C^1, F^1, G^1, B^0, D^0, H^0, I^0, J^0, K^0.]
Log*
Sets w/ Path compression: Amortized analysis
Amortized cost per operation of n set operations is $O(\log^* n)$, where
$\log^* x$ = smallest k such that $\underbrace{\log(\log(\cdots\log}_{k \text{ times}}(x)\cdots)) \le 1$
Note: $\log^*(n) \le 5$ for virtually any n of practical relevance.
Specifically,
$\log^*(2^{65536}) = \log^*(2^{2^{2^{2^{2}}}}) = 5$
Note that $2^{65536}$ is approximately a 20,000 digit decimal number.
We will never be able to store input of that size, at least not in our universe. (The universe contains maybe $10^{100}$ elementary particles.)
So, we might as well treat $\log^*(n)$ as O(1).
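A small sketch of the iterated logarithm for positive integers, reading the definition as "the number of times log base 2 must be applied before the value drops to 1 or below". To stay exact it compares x against the thresholds 1, 2, 4, 16, 65536, 2^65536 instead of taking floating-point logarithms; the function name `log_star` and the overflow guard are assumptions of this sketch.

```python
import math

def log_star(x):
    """Iterated log base 2 of a positive integer x, computed with exact integers."""
    k, threshold = 0, 1
    while x > threshold:
        if threshold > 65536:
            # The next threshold, 2**(2**65536), cannot be stored.
            raise OverflowError("x exceeds 2**65536")
        threshold = 2 ** threshold       # 1, 2, 4, 16, 65536, 2**65536
        k += 1
    return k


# Number of decimal digits of 2**65536, via floor(65536 * log10(2)) + 1.
digits = math.floor(65536 * math.log10(2)) + 1
```

Under this reading `log_star(2 ** 65536)` is 5, and `digits` confirms that the number has just under 20,000 decimal digits.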
Path Compression
Path compression: Amortized analysis (2)
For n operations, rank of any node falls in the range [0, log n]
Divide this range into following groups:
$[1], [2], [3, 4], [5, 16], [17, 2^{16}], [2^{16}+1, 2^{65536}], \ldots$
Each range is of the form $[k, 2^{k-1}]$
Let G(v) be the group rank(v) belongs to: $G(v) = \log^*(\mathrm{rank}(v))$
Note: when a node becomes a non-root, its rank never changes
Key Idea
Give an “allowance” to a node when it becomes a non-root. This
allowance will be used to pay costs of path compression operations
involving this node.
For a node whose rank is in the range $[k, 2^{k-1}]$, the allowance is $2^{k-1}$.
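The grouping and the allowance rule can be written as a short helper. This is a sketch for ranks of at least 1; the function name `rank_group` and its return convention are assumptions.

```python
def rank_group(rank):
    """Return (k, hi) with k <= rank <= hi = 2**(k-1); hi is also the allowance."""
    if rank < 1:
        raise ValueError("groups are defined for rank >= 1")
    k, hi = 1, 1                         # first group is [1, 1]
    while rank > hi:
        k = hi + 1                       # next group starts right after the previous one
        hi = 2 ** (k - 1)                # and ends at 2**(k-1)
    return k, hi


groups = [rank_group(r) for r in (1, 2, 3, 4, 5, 16, 17, 65536)]
```

For instance, a node of rank 10 falls in the group starting at 5 and ending at 16, so its allowance is 16.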
Amortized Cost
Total allowance handed out
Recall that the number of nodes of rank r is at most $n/2^r$
Recall that a node whose rank is in the range $[k, 2^{k-1}]$ is given an allowance of $2^{k-1}$.
Total allowance handed out to nodes with ranks in the range $[k, 2^{k-1}]$ is therefore given by
$$2^{k-1}\left(\frac{n}{2^k} + \frac{n}{2^{k+1}} + \cdots + \frac{n}{2^{2^{k-1}}}\right) \le 2^{k-1}\cdot\frac{n}{2^{k-1}} = n$$
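The inequality in this bound is a geometric series estimate. A worked version, under the stated assumption that at most $n/2^r$ nodes have rank r:

```latex
\[
\sum_{r=k}^{2^{k-1}} \frac{n}{2^r}
  \;<\; \sum_{r=k}^{\infty} \frac{n}{2^r}
  \;=\; \frac{n}{2^{k}} \cdot \frac{1}{1-\tfrac12}
  \;=\; \frac{n}{2^{k-1}},
\qquad\text{so}\qquad
2^{k-1} \sum_{r=k}^{2^{k-1}} \frac{n}{2^r} \;<\; n .
\]
```

Each group therefore receives a total allowance of at most n, regardless of k.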
Since the total number of ranges is $\log^* n$, total allowance granted to all nodes is $n \log^* n$
We will spread this cost across all n operations, thus contributing $O(\log^* n)$ to each operation.
Amortized Cost 2
Paying for all find's
Cost of a find equals # of parent pointers followed
Each pointer followed is updated to point to root of current set.
Key idea: Charge the cost of updating π(p) to:
Case 1: If $G(\pi(p)) \ne G(p)$, then charge it to the current find operation
• Can apply only $\log^* n$ times: a leaf’s G-value is at least 1, and the root’s G-value is at most $\log^* n$.
• Adds only $\log^* n$ to cost of find
Case 2: Otherwise, charge it to p’s allowance.
• Need to show that we have enough allowance to pay each time this case occurs.
Amortized Cost 3
Paying for all find's (2)
If π(p) is updated, then the rank of p’s parent increases.
Let p be involved in a series of find's, with $q_i$ being its parent after the i-th find. Note
$\mathrm{rank}(p) < \mathrm{rank}(q_0) < \mathrm{rank}(q_1) < \mathrm{rank}(q_2) < \cdots$
Let m be the number of such operations before p’s parent has a higher G-value than p, i.e., $G(p) = G(q_m) < G(q_{m+1})$.
Recall that
• If G(p) = r then r corresponds to a range $[k, 2^{k-1}]$ where $k \le \mathrm{rank}(p) \le 2^{k-1}$. Since $G(p) = G(q_m)$, $\mathrm{rank}(q_m) \le 2^{k-1}$
• The allowance given to p is also $2^{k-1}$
So, there is enough allowance for all promotions up to m: the ranks $\mathrm{rank}(q_0), \ldots, \mathrm{rank}(q_m)$ are distinct values in $[k, 2^{k-1}]$, so $m < 2^{k-1}$.
After the (m+1)th find, the find operation will pay for pointer updates, as $G(\pi(p)) > G(p)$ from here on.