Union-Find Disjoint Set
• An ADT (Abstract Data Type) used to manage disjoint sets of elements
• Operations supported:
  • Union: given 2 elements, if they belong to different sets, merge the 2 sets into 1
  • Find: given an element, find the set it belongs to
9/7/15 2
[Figure: two 7-element examples. Union(1, 2) merges the set containing 1 with the set containing 2; Find(5) returns the representative of the set containing 5.]
Implementing Union-Find Disjoint Set
• Data structure for Disjoint Set
• Algorithms for Union and Find operations
• Run time analysis & reflection
Version 1 (Quick Find)
• Data Structure
  • Each set is represented as a list, and the head of the list is used as the representative (identifier) of the set
  • To facilitate Find, each node contains a pointer to its representative
  • The representative records the number of elements in the set
Version 1 (Quick Find)
• Union(x, y)
  • Merge all elements of the smaller set into the larger set
  • Update the number of elements in the combined set
  • Redirect the representative pointer in all the elements of the smaller set
• Find(x)
  • If the representative pointer of x is null, return x itself
  • Else, return its representative pointer
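The two operations above can be sketched in Python (a minimal sketch, assuming elements 0..n-1 and using a dict of member lists in place of the slides' linked lists; all names are illustrative):

```python
class QuickFind:
    def __init__(self, n):
        # Every element starts in its own singleton set.
        self.rep = list(range(n))                   # representative pointer per element
        self.members = {i: [i] for i in range(n)}   # member list per representative

    def find(self, x):
        # O(1): just follow the representative pointer.
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return                                  # already in the same set
        # Ensure rx is the representative of the larger set.
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx
        # Redirect every pointer in the smaller set, then splice its members in.
        for e in self.members[ry]:
            self.rep[e] = rx
        self.members[rx].extend(self.members.pop(ry))
```

Find is constant time; Union costs time proportional to the size of the smaller set, which motivates the amortized analysis below.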
Example of Version 1
Initial state: each of 1–9 is a singleton set.
Union(2,6): {2, 6}; all other elements remain singletons.
Union(3,8): {2, 6}, {3, 8}.
Union(3,9): {2, 6}, {3, 8, 9}.
Union(2,3): the smaller set {2, 6} is merged into the larger set {3, 8, 9}, giving {2, 3, 6, 8, 9}.
Analysis of Version 1
• Find runs in O(1): it just follows one representative pointer
• A single Union takes time proportional to the size of the smaller set, which is O(n) in the worst case; the amortized run time is much better, as the next slides show
Analysis of Version 1
• Amortized run time of Union (cont'd)
  • If an element is still in a singleton, we say it is not yet touched by the Union operations; otherwise, it is touched
  • k Union operations touch at most 2k elements:
    • If a Union merges 2 singletons, it adds 2 touched elements
    • If it merges a singleton into a non-singleton, it adds 1 touched element
    • If it merges 2 non-singletons, it adds 0 touched elements
Analysis of Version 1
• Amortized run time of Union (cont'd)
  • A touched element is redirected only when it lies in the smaller set, so the size of its set at least doubles with each redirection; hence each element is redirected at most log n times
  • The total run time of k Union operations is therefore at most 2 · k · log n
  • The amortized run time = 2 · k · log n / k = O(log n)
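The counting argument can be condensed into a single bound (a sketch; c denotes the constant cost of one pointer update):

```latex
% Each touched element's set at least doubles per redirection,
% so it is redirected at most \log_2 n times; k Unions touch
% at most 2k elements, hence:
T(k \text{ Unions}) \;\le\; c \cdot 2k \log_2 n
\quad\Longrightarrow\quad
\frac{T}{k} \;=\; O(\log n)
```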
Reflection on Version 1
• Union is slow because we have to update the representative pointers of all elements in the smaller set
• What if we only update the representative pointer of the representative of the smaller set?
Version 2 (Quick Union)
• Data Structure
  • Each element has a pointer to the representative of the set it was merged into (if any)
  • Each element records the size of the set of which it was the representative
  • The next pointer is no longer needed, because we no longer update the representative pointers of all elements in the smaller set
  • The data representation actually becomes a tree structure
Version 2 (Quick Union)
• Union(x, y)
  • Find the representatives of x and y using Find
  • Redirect the representative pointer of the smaller set's representative to the representative of the larger set
• Find(x)
  • Follow the representative pointer chain to the root and return the root
Example of Version 2/Union
Initial state: each of 1–9 is its own root.
Union(2,6): one of the two becomes the root (say 2), and 6's representative pointer is redirected to it; a tree of size 2 rooted at 2.
Union(3,8): likewise, a tree of size 2 rooted at 3.
Union(3,9): 9's pointer is redirected to 3; a tree of size 3 rooted at 3.
Union(2,3): the root of the smaller tree (2, size 2) is redirected to the root of the larger tree (3, size 3); only a single pointer changes.
Analysis of Version 2
• Key observation: when the smaller tree T1 is merged into the larger tree T2 to form T, the depth of T1's elements increases by 1, while the size of the tree containing them at least doubles
Analysis of Version 2
• If the size of the set is n, the height of its tree representation is at most log n + 1
  • Proof by contradiction using the previous observation
• The worst-case run time of both Union and Find is O(log n)
• The amortized run time is better
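One way to make the height bound precise is an induction sketch (the slides argue by contradiction instead, but the underlying fact is the same):

```latex
% Claim: with union-by-size, a node at depth d lies in a tree of size >= 2^d.
% Base: at depth 0 the tree has size >= 1 = 2^0.
% Step: a node's depth grows by 1 only when its tree (the smaller one,
% of size s) is merged into a tree of size >= s, giving size >= 2s.
% Hence:
2^{d} \le n \;\Longrightarrow\; d \le \log_2 n,
\quad\text{so the height is at most } \log_2 n + 1.
```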
Reflection on Version 2
• If we do m Finds on element e, the total time is m · log n; if the first Find redirects the representative pointers on e's path straight to the root, later Finds become O(1) and the total time shrinks
Version 3 (Path Compression)
• Data Structure: the same as Version 2
• Union: the same as Version 2
• Find(x)
  • For every node on the path from x to the root, redirect its representative pointer to the root; return the root
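The compressing Find can be sketched on top of the Version 2 parent array (a two-pass variant; the slides do not prescribe a particular implementation):

```python
def find(parent, x):
    # First pass: locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: redirect every node on the path directly to the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root
```

After one call, every node on the traversed path hangs directly off the root, so repeated Finds on the same branch cost O(1).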
Example
[Figure: a 15-node tree rooted at 13. Find(4) walks from 4 up to the root, then redirects the representative pointer of every node on that path directly to the root, flattening that branch.]
Example
[Figure: the tree after Find(4), then Find(12), which again redirects every node on the path from 12 to the root, flattening the tree further.]
Run time analysis of Version 3
• With path compression (together with union by size), the amortized run time per operation is nearly O(1); formally it is O(α(n)), where α is the extremely slowly growing inverse Ackermann function
Review of non-recursive algorithm analysis
S = 0;
for (j = 1; j <= n; j++) {
    for (k = j; k <= n; k++) {
        S++;
    }
}

For j = 1 the inner loop runs n times, for j = 2 it runs n − 1 times, …, for j = n it runs once:
n + (n − 1) + … + 1 = n(n + 1)/2 = Θ(n²)
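The closed form can be checked directly in Python (a throwaway sketch mirroring the loop above):

```python
def loop_count(n):
    # Count how many times the innermost statement (S++) executes.
    S = 0
    for j in range(1, n + 1):
        for k in range(j, n + 1):   # inner loop runs n - j + 1 times
            S += 1
    return S

# Closed form: n + (n-1) + ... + 1 = n(n+1)/2, i.e. Theta(n^2).
```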
11/15/2015 22
Analyzing recursive algorithms
• F1(A, k1, k2):
      m = (k2 - k1 + 1)/2
      if (m <= 0): return
      B = new vector of size m
      for (j = 1; j < m; j++):
          B[j] = A[k1 + 2*j] - A[k1 + 2*j - 1]
      F1(A, k1, k1 + m)
      F1(A, k1 + m + 1, k2)
      F1(B, 1, m)
Recurrence: the loop does Θ(n) work, and there are three recursive calls, each on an input of about half the size, so
T(n) = 3T(n/2) + Θ(n)
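Assuming the recurrence T(n) = 3T(n/2) + cn derived for F1, the recursion tree sums as follows:

```latex
% Level i has 3^i subproblems of size n/2^i, so the level cost is c n (3/2)^i.
T(n) \;=\; \sum_{i=0}^{\log_2 n} c\,n\left(\tfrac{3}{2}\right)^{i}
\;=\; \Theta\!\left(n\left(\tfrac{3}{2}\right)^{\log_2 n}\right)
\;=\; \Theta\!\left(3^{\log_2 n}\right)
\;=\; \Theta\!\left(n^{\log_2 3}\right)
\;\approx\; \Theta\!\left(n^{1.585}\right)
```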
Recursion tree (1)
[Figure: the recursion tree, expanded level by level.]
Important properties of logarithm
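Two identities that recursion-tree arguments rely on (labeling the second as "Property 2" matches its use on the following slides, but the slides' own numbering is an assumption):

```latex
\log_b n = \frac{\log_a n}{\log_a b}
\qquad\text{(change of base)}
\\[4pt]
a^{\log_b n} = n^{\log_b a}
\qquad\text{(Property 2: exchange base and argument)}
```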
Recursion tree (2)
[Figure: the recursion tree, simplified using Property 2.]
Recursion tree (3)
[Figure: further simplification using Property 2.]
Recursion tree (4)
Example
Recursion tree
[Figure: recursion tree for the example.]
Many recurrence relations arising from divide-and-conquer algorithms have the form
T(n) = aT(n/b) + f(n)
where a ≥ 1, b > 1 are constants and f is asymptotically positive
• The recursion creates a subproblem instances, each of size n/b
• Setting up the subproblems to recurse on and combining the returned sub-solutions takes f(n) work
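For reference, with g(n) = n^(log_b a) the Master Theorem distinguishes three cases:

```latex
T(n) =
\begin{cases}
\Theta\!\left(n^{\log_b a}\right) & \text{if } f(n) = O\!\left(n^{\log_b a - \varepsilon}\right) \text{ for some } \varepsilon > 0,\\[2pt]
\Theta\!\left(n^{\log_b a}\log n\right) & \text{if } f(n) = \Theta\!\left(n^{\log_b a}\right),\\[2pt]
\Theta\!\left(f(n)\right) & \text{if } f(n) = \Omega\!\left(n^{\log_b a + \varepsilon}\right) \text{ and } a f(n/b) \le c f(n) \text{ for some } c < 1.
\end{cases}
```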
Solving T(n) = aT(n/b) + f(n)
f(n) grows polynomially slower than g(n) = n^(log_b a)
● Does a polynomial factor n^ε separate f(n) and g(n)?
f(n) grows polynomially faster than g(n) = n^(log_b a)
● A polynomial factor n^ε separates them, and the regularity condition a·f(n/b) ≤ c·f(n) for some c < 1 must also hold
T(n) = T(n/2) + n log n = Θ(n log n)
a = 1, b = 2, n^(log_b a) = n^0 = 1, f(n) = n log n
regularity condition: 1 · (n/2) log(n/2) ≤ (1/2) · n log n, so c = 1/2 < 1

T(n) = 4T(n/2) + n³ = Θ(n³)
a = 4, b = 2, n^(log_b a) = n², f(n) = n³
regularity condition: 4 · (n/2)³ = n³/2, so c = 1/2 < 1
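Both examples above fall into case 3; for contrast, here is one standard textbook example for each of the other cases (not from the slides):

```latex
% Case 1: T(n) = 4T(n/2) + n, where n^{\log_2 4} = n^2 and f(n) = n = O(n^{2-\varepsilon}):
T(n) = \Theta(n^2)
\\[4pt]
% Case 2: T(n) = 2T(n/2) + n (merge sort), where n^{\log_2 2} = n = \Theta(f(n)):
T(n) = \Theta(n \log n)
```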
Outside the Master Theorem
T(n) = T(√n) + c = O(log log n)
T(n) = T(n/4) + T(n/2) + n² = O(n²)
T(n) = 2T(n−1) + 1 = O(2ⁿ)
f(n) = f(n−1) + f(n−2)
T(n) = 4T(n/2) + n²/log n
● Has the right form, but …
● comparing n² and n²/log n: f(n) is smaller by a factor of log n, not by a polynomial factor
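Although the Master Theorem does not cover the last recurrence, a recursion tree still solves it (a sketch):

```latex
% Level i contributes 4^i \cdot \frac{(n/2^i)^2}{\log_2(n/2^i)} = \frac{n^2}{\log_2 n - i}.
T(n) \;=\; \sum_{i=0}^{\log_2 n - 1} \frac{n^2}{\log_2 n - i}
\;=\; n^2 \sum_{j=1}^{\log_2 n} \frac{1}{j}
\;=\; \Theta\!\left(n^2 \log\log n\right)
```

The harmonic sum over the levels is what produces the extra log log n factor that no single Master Theorem case can capture.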