Date post: | 17-Dec-2015 |
Upload: | dinah-henry |
Analysis of Algorithms
CS 477/677
Sorting – Part B
Instructor: George Bebis
(Chapter 7)
Sorting
• Insertion sort
– Design approach: incremental
– Sorts in place: Yes
– Best case: Θ(n)
– Worst case: Θ(n²)
• Bubble sort
– Design approach: incremental
– Sorts in place: Yes
– Running time: Θ(n²)
Sorting
• Selection sort
– Design approach: incremental
– Sorts in place: Yes
– Running time: Θ(n²)
• Merge sort
– Design approach: divide and conquer
– Sorts in place: No
– Running time: Let’s see!
Divide-and-Conquer
• Divide the problem into a number of sub-problems
– Similar sub-problems of smaller size
• Conquer the sub-problems
– Solve the sub-problems recursively
– If a sub-problem is small enough, solve it in a straightforward manner
• Combine the solutions of the sub-problems
– Obtain the solution for the original problem
Merge Sort Approach
• To sort an array A[p . . r]:
• Divide
– Divide the n-element sequence to be sorted into two subsequences of n/2 elements each
• Conquer
– Sort the subsequences recursively using merge sort
– When the size of the sequences is 1 there is nothing
more to do
• Combine
– Merge the two sorted subsequences
Merge Sort
Alg.: MERGE-SORT(A, p, r)
    if p < r                            ▷ Check for base case
        then q ← ⌊(p + r)/2⌋            ▷ Divide
            MERGE-SORT(A, p, q)         ▷ Conquer
            MERGE-SORT(A, q + 1, r)     ▷ Conquer
            MERGE(A, p, q, r)           ▷ Combine
• Initial call: MERGE-SORT(A, 1, n)
Example array: A[1 . . 8] = 5 2 4 7 1 3 2 6, with p = 1, q = 4, r = 8
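The MERGE-SORT pseudocode can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: indices are 0-based (the slides use 1-based indices), and the helper merge() here is a simple stand-in without sentinels, not the MERGE procedure defined later in the slides.

```python
def merge_sort(a, p, r):
    """Sort a[p..r] (inclusive, 0-based) in place, mirroring MERGE-SORT(A, p, r)."""
    if p < r:                      # more than one element, so not the base case
        q = (p + r) // 2           # divide: floor of the midpoint
        merge_sort(a, p, q)        # conquer the left half
        merge_sort(a, q + 1, r)    # conquer the right half
        merge(a, p, q, r)          # combine the two sorted halves


def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] back into a[p..r] (stand-in helper)."""
    left, right = a[p:q + 1], a[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left if right is exhausted, or if left's head is not larger.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


data = [5, 2, 4, 7, 1, 3, 2, 6]          # the example array from the slide
merge_sort(data, 0, len(data) - 1)       # initial call covers the whole array
print(data)
```

The initial call passes 0 and len(data) - 1, which corresponds to MERGE-SORT(A, 1, n) in the 1-based notation of the slides.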
Example – n Power of 2
Divide:
5 2 4 7 1 3 2 6                    (indices 1 . . 8, q = 4)
5 2 4 7 | 1 3 2 6
5 2 | 4 7 | 1 3 | 2 6
5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
Example – n Power of 2
Conquer and Merge:
5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
2 5 | 4 7 | 1 3 | 2 6
2 4 5 7 | 1 2 3 6
1 2 2 3 4 5 6 7
Example – n Not a Power of 2
Divide:
4 7 2 6 1 4 7 3 5 2 6                          (indices 1 . . 11, q = 6)
4 7 2 6 1 4 | 7 3 5 2 6                        (q = 3 on the left, q = 9 on the right)
4 7 2 | 6 1 4 | 7 3 5 | 2 6
4 7 | 2 | 6 1 | 4 | 7 3 | 5 | 2 | 6
4 | 7 | 2 | 6 | 1 | 4 | 7 | 3 | 5 | 2 | 6
Example – n Not a Power of 2
Conquer and Merge:
4 | 7 | 2 | 6 | 1 | 4 | 7 | 3 | 5 | 2 | 6
4 7 | 2 | 1 6 | 4 | 3 7 | 5 | 2 | 6
2 4 7 | 1 4 6 | 3 5 7 | 2 6
1 2 4 4 6 7 | 2 3 5 6 7
1 2 2 3 4 4 5 6 6 7 7
Merging
• Input: Array A and indices p, q, r such that p ≤ q < r
– Subarrays A[p . . q] and A[q + 1 . . r] are sorted
• Output: One single sorted subarray A[p . . r]
Example: A[1 . . 8] = 2 4 5 7 1 2 3 6, with p = 1, q = 4, r = 8
Merging
• Idea for merging:
– Two piles of sorted cards
• Choose the smaller of the two top cards
• Remove it and place it in the output pile
– Repeat the process until one pile is empty
– Take the remaining input pile and place it face-down
onto the output pile
Example: A[1 . . 8] = 2 4 5 7 1 2 3 6, with p = 1, q = 4, r = 8
A1 ← A[p . . q] and A2 ← A[q + 1 . . r] are merged into A[p . . r]
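The card-pile idea can be sketched directly in Python. This is a minimal illustration, assuming the "top card" of each pile is the leftmost item of a sorted sequence; the name merge_piles is made up for this sketch and does not come from the slides.

```python
from collections import deque


def merge_piles(pile1, pile2):
    """Merge two sorted sequences the way the two-piles-of-cards idea describes."""
    a1, a2 = deque(pile1), deque(pile2)   # leftmost item plays the "top card"
    output = []
    while a1 and a2:                      # repeat until one pile is empty
        smaller = a1 if a1[0] <= a2[0] else a2
        output.append(smaller.popleft())  # move the smaller top card to the output
    output.extend(a1 or a2)               # place the remaining pile onto the output
    return output


print(merge_piles([2, 4, 5, 7], [1, 2, 3, 6]))
```

Unlike the in-place MERGE used by merge sort, this version returns a new list, which keeps the correspondence with the card analogy easy to see.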
Example: MERGE(A, 9, 12, 16)
[Figures: step-by-step trace of the merge, with p = 9, q = 12, r = 16]
Merge - Pseudocode
Alg.: MERGE(A, p, q, r)
    Compute n1 and n2
    Copy the first n1 elements into L[1 . . n1 + 1] and the next n2 elements into R[1 . . n2 + 1]
    L[n1 + 1] ← ∞; R[n2 + 1] ← ∞
    i ← 1; j ← 1
    for k ← p to r
        do if L[ i ] ≤ R[ j ]
            then A[k] ← L[ i ]
                i ← i + 1
            else A[k] ← R[ j ]
                j ← j + 1
Example: A[1 . . 8] = 2 4 5 7 1 2 3 6 with p = 1, q = 4, r = 8 (n1 = n2 = 4), so L = 2 4 5 7 and R = 1 2 3 6, each followed by the sentinel ∞
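A Python sketch of the sentinel-based MERGE follows. It assumes 0-based inclusive indices and numeric keys, so that math.inf can play the role of the ∞ sentinel; for other key types a different sentinel or explicit bounds checks would be needed.

```python
import math


def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] in place, using infinity sentinels."""
    n1 = q - p + 1                              # size of the left run
    n2 = r - q                                  # size of the right run
    left = a[p:p + n1] + [math.inf]             # L[1 . . n1 + 1] in the slides
    right = a[q + 1:q + 1 + n2] + [math.inf]    # R[1 . . n2 + 1] in the slides
    i = j = 0
    for k in range(p, r + 1):                   # exactly n1 + n2 iterations
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


a = [2, 4, 5, 7, 1, 2, 3, 6]
merge(a, 0, 3, 7)      # corresponds to p = 1, q = 4, r = 8 in 1-based notation
print(a)
```

The sentinels remove the need to test whether either temporary array is exhausted: the loop runs n1 + n2 times, so a sentinel is never copied into the output.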
Running Time of Merge (assume last for loop)
• Initialization (copying into temporary arrays):
Θ(n1 + n2) = Θ(n)
• Adding the elements to the final array:
– n iterations, each taking constant time ⇒ Θ(n)
• Total time for Merge: Θ(n)
Analyzing Divide-and-Conquer Algorithms
• The recurrence is based on the three steps of the paradigm:
– T(n) – running time on a problem of size n
– Divide the problem into a subproblems, each of size n/b: takes D(n)
– Conquer (solve) the subproblems: aT(n/b)
– Combine the solutions: C(n)
T(n) = Θ(1)                        if n ≤ c
T(n) = aT(n/b) + D(n) + C(n)       otherwise
MERGE-SORT Running Time
• Divide:
– compute q as the average of p and r: D(n) = Θ(1)
• Conquer:
– recursively solve 2 subproblems, each of size n/2: 2T(n/2)
• Combine:
– MERGE on an n-element subarray takes Θ(n) time: C(n) = Θ(n)
T(n) = Θ(1)                if n = 1
T(n) = 2T(n/2) + Θ(n)      if n > 1
Solve the Recurrence
T(n) = c                   if n = 1
T(n) = 2T(n/2) + cn        if n > 1
Use the Master Theorem:
– Here a = 2 and b = 2, so compare n^(log_2 2) = n with f(n) = cn
– Case 2: T(n) = Θ(n lg n)
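As a sanity check on the Θ(n lg n) result, the recurrence can be evaluated numerically. The sketch below assumes n is a power of 2 and uses c = 1 by default; for such n, unrolling the recurrence gives the closed form cn lg n + cn, which the loop prints next to the recursive value.

```python
def t(n, c=1):
    """Evaluate T(n) = 2T(n/2) + cn with T(1) = c, for n a power of 2."""
    if n == 1:
        return c
    return 2 * t(n // 2, c) + c * n


for k in range(1, 11):
    n = 2 ** k
    # Columns: n, value from the recurrence, closed form n*lg(n) + n (with c = 1).
    print(n, t(n), n * k + n)
```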
Merge Sort - Discussion
• Running time is insensitive to the input
• Advantages:
– Guaranteed to run in Θ(n lg n)
• Disadvantage:
– Requires about N extra space
Sorting Challenge 1
Problem: Sort a file of huge records with tiny keys
Example application: Reorganize your MP3 files
Which method to use?
A. Merge sort, guaranteed to run in time N lg N
B. Selection sort
C. Bubble sort
D. A custom algorithm for huge records/tiny keys
E. Insertion sort
Sorting Files with Huge Records and Small Keys
• Insertion sort or bubble sort?
– NO, too many exchanges
• Selection sort?
– YES, it takes linear time for exchanges
• Merge sort or custom method?
– Probably not: selection sort is simpler and does fewer swaps
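The claim about exchanges can be illustrated with a selection sort that counts them. This is a sketch: the function name, the key parameter, and the exchange counter are choices made for this example. In Python a swap only moves references, so the counter stands in for the expensive record moves a file-based sort would perform.

```python
def selection_sort(records, key):
    """Sort records in place by key(record); return the number of exchanges."""
    exchanges = 0
    n = len(records)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):            # comparisons touch only the tiny keys
            if key(records[j]) < key(records[smallest]):
                smallest = j
        if smallest != i:                    # at most one exchange per position
            records[i], records[smallest] = records[smallest], records[i]
            exchanges += 1
    return exchanges


# Hypothetical "huge records with tiny keys": (key, large payload) pairs.
recs = [(3, "c" * 1000), (1, "a" * 1000), (2, "b" * 1000)]
count = selection_sort(recs, key=lambda rec: rec[0])
print([rec[0] for rec in recs], count)
```

The number of exchanges never exceeds n – 1, even though the number of key comparisons is still quadratic.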
Sorting Challenge 2
Problem: Sort a huge randomly-ordered file of small records
Application: Process transaction records for a phone company
Which sorting method to use?
A. Bubble sort
B. Selection sort
C. Mergesort, guaranteed to run in time N lg N
D. Insertion sort
Sorting Huge, Randomly-Ordered Files
• Selection sort?
– NO, always takes quadratic time
• Bubble sort?
– NO, quadratic time for randomly-ordered keys
• Insertion sort?
– NO, quadratic time for randomly-ordered keys
• Mergesort?
– YES, it is designed for this problem
Sorting Challenge 3
Problem: Sort a file that is already almost in order
Applications:
– Re-sort a huge database after a few changes
– Double-check that someone else sorted a file
Which sorting method to use?
A. Mergesort, guaranteed to run in time N lg N
B. Selection sort
C. Bubble sort
D. A custom algorithm for almost in-order files
E. Insertion sort
Sorting Files That Are Almost in Order
• Selection sort?
– NO, always takes quadratic time
• Bubble sort?
– NO, bad for some definitions of “almost in order”
– Ex: B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
• Insertion sort?
– YES, takes linear time for most definitions of “almost in order”
• Mergesort or custom method?
– Probably not: insertion sort is simpler and faster
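To make the "almost in order" case concrete, here is a sketch of insertion sort that counts element shifts, run on the B C D … Z A example from the bullet list. The shift counter is an addition for illustration only.

```python
def insertion_sort(a):
    """Sort a in place; return how many element shifts were performed."""
    shifts = 0
    for i in range(1, len(a)):
        item = a[i]
        j = i - 1
        while j >= 0 and a[j] > item:   # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = item                 # drop the item into its place
    return shifts


letters = list("BCDEFGHIJKLMNOPQRSTUVWXYZA")
shift_count = insertion_sort(letters)
print("".join(letters), shift_count)
```

Only the final A is out of place, so the inner loop does work for just one element and the total number of shifts is linear in the input size.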
Quicksort
• Sort an array A[p…r]
• Divide
– Partition the array A into 2 subarrays A[p..q] and A[q+1..r], such that each element of A[p..q] is smaller than or equal to each element in A[q+1..r]
– Need to find index q to partition the array
A[p…q] ≤ A[q+1…r]
Quicksort
• Conquer
– Recursively sort A[p..q] and A[q+1..r] using
Quicksort
• Combine
– Trivial: the arrays are sorted in place
– No additional work is required to combine them
– The entire array is now sorted
A[p…q] ≤ A[q+1…r]
QUICKSORT
Alg.: QUICKSORT(A, p, r)
    if p < r
        then q ← PARTITION(A, p, r)
            QUICKSORT(A, p, q)
            QUICKSORT(A, q+1, r)
Initially: p = 1, r = n
Recurrence: T(n) = T(q) + T(n – q) + f(n)      (f(n) depends on PARTITION())
Partitioning the Array
• Choosing PARTITION()
– There are different ways to do this
– Each has its own advantages/disadvantages
• Hoare partition (see prob. 7-1, page 159)
– Select a pivot element x around which to partition
– Grows two regions: A[p…i] ≤ x and x ≤ A[j…r]
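Example (pivot x = 5)
A[p…r] = 5 3 2 6 4 1 3 7      (i and j start just outside the two ends)
After the first exchange: 3 3 2 6 4 1 5 7
After the second exchange: 3 3 2 1 4 6 5 7
i and j have crossed: A[p…q] = 3 3 2 1 4 and A[q+1…r] = 6 5 7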
Partitioning the Array
Alg.: PARTITION(A, p, r)
    x ← A[p]
    i ← p – 1
    j ← r + 1
    while TRUE
        do repeat j ← j – 1
               until A[j] ≤ x
           repeat i ← i + 1
               until A[i] ≥ x
           if i < j
               then exchange A[i] ↔ A[j]
               else return j
Running time: Θ(n), where n = r – p + 1 (each element is visited once!)
Example: A = 5 3 2 6 4 1 3 7; when i and j cross, j = q and A[p…q] ≤ A[q+1…r]
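The PARTITION pseudocode translates to Python roughly as below. This is a sketch assuming 0-based inclusive indices; the name hoare_partition is chosen here for clarity and each repeat … until loop is written as a decrement or increment followed by a while loop.

```python
def hoare_partition(a, p, r):
    """Partition a[p..r] around x = a[p]; return q with a[p..q] <= a[q+1..r]."""
    x = a[p]                  # pivot
    i = p - 1
    j = r + 1
    while True:
        j -= 1                # repeat j <- j - 1 until a[j] <= x
        while a[j] > x:
            j -= 1
        i += 1                # repeat i <- i + 1 until a[i] >= x
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]   # exchange the out-of-place pair
        else:
            return j                  # indices crossed: j is the split point q


a = [5, 3, 2, 6, 4, 1, 3, 7]          # example array from the slides, pivot x = 5
q = hoare_partition(a, 0, len(a) - 1)
print(q, a[:q + 1], a[q + 1:])
```

Because i only moves right and j only moves left until they cross, the two scans together touch each element a constant number of times, which matches the Θ(n) running time stated above.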
Recurrence
Alg.: QUICKSORT(A, p, r)
    if p < r
        then q ← PARTITION(A, p, r)
            QUICKSORT(A, p, q)
            QUICKSORT(A, q+1, r)
Initially: p = 1, r = n
Recurrence: T(n) = T(q) + T(n – q) + n
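Putting QUICKSORT together with a Hoare-style PARTITION gives the following Python sketch. It assumes 0-based inclusive indices and uses the first element as the pivot, as in the pseudocode; the partition helper is included only so that the block runs on its own.

```python
def partition(a, p, r):
    """Hoare-style partition of a[p..r] around a[p]; returns the split index q."""
    x, i, j = a[p], p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:
            j -= 1
        i += 1
        while a[i] < x:
            i += 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def quicksort(a, p, r):
    """Sort a[p..r] in place, mirroring QUICKSORT(A, p, r)."""
    if p < r:
        q = partition(a, p, r)     # divide
        quicksort(a, p, q)         # conquer the left part (q itself is included)
        quicksort(a, q + 1, r)     # conquer the right part; no combine step needed


data = [5, 3, 2, 6, 4, 1, 3, 7]
quicksort(data, 0, len(data) - 1)
print(data)
```

Note that the left recursive call covers a[p..q], not a[p..q-1]: with this partition scheme the pivot is not guaranteed to end up at index q.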
Worst Case Partitioning
• Worst-case partitioning
– One region has one element and the other has n – 1 elements
– Maximally unbalanced
• Recurrence: q = 1
T(n) = T(1) + T(n – 1) + n, T(1) = Θ(1)
T(n) = T(n – 1) + n = n + (Σ_{k=1}^{n} k) – 1 = Θ(n) + Θ(n²) = Θ(n²)
• Recursion tree: the levels cost n, n, n – 1, n – 2, …, 3, 2, for a total of Θ(n²)
When does the worst case happen?
Best Case Partitioning
• Best-case partitioning
– Partitioning produces two regions of size n/2
• Recurrence: q = n/2
T(n) = 2T(n/2) + Θ(n)
T(n) = Θ(n lg n) (Master theorem)
Case Between Worst and Best
• 9-to-1 proportional split
Q(n) = Q(9n/10) + Q(n/10) + n
How does partition affect performance?
42
How does partition affect performance?
43
Performance of Quicksort
• Average case– All permutations of the input numbers are equally likely– On a random input array, we will have a mix of well balanced
and unbalanced splits– Good and bad splits are randomly distributed across throughout
the tree
Alternate of a goodand a bad split
Nearly wellbalanced split
nn - 11
(n – 1)/2(n – 1)/2
n
(n – 1)/2(n – 1)/2 + 1
• Running time of Quicksort when levels alternate between good and bad splits is O(nlgn)
combined partitioning cost:2n-1 = (n)
partitioning cost:n = (n)