Albert Chan (http://www.scs.carleton.ca/~achan)
School of Computer Science, Carleton University
COMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s10
Sorting and Selection
• Introduction
• Divide and Conquer
• Merge-Sort
• Quick-Sort
• Radix-Sort
• Bucket-Sort
Introduction
• Assume we have a sequence S storing a list of key-element entries.
• The key of the element stored at rank i is key(i).
• Sorting is a process that rearranges the elements in S so that:
– if i ≤ j then key(i) ≤ key(j), according to the total order relation associated with the keys.
Divide and Conquer
• Divide and Conquer is a design pattern that allows us to solve large problems by decomposing them into smaller, manageable sub-problems.
• Steps for divide and conquer:
– if the problem size is small enough, solve it using a straightforward algorithm
– divide the problem into two or more smaller sub-problems
– recursively solve the sub-problems
– combine the results of the sub-problems to obtain the result of the original problem
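These steps can be sketched in Python on a toy problem (an illustration, not from the slides): finding the maximum of a list by divide and conquer.

```python
def dc_max(s):
    """Find the maximum of a non-empty list by divide and conquer."""
    if len(s) == 1:            # small enough: solve directly
        return s[0]
    mid = len(s) // 2
    left = dc_max(s[:mid])     # divide into two halves and recur
    right = dc_max(s[mid:])
    return left if left >= right else right   # combine the two results

print(dc_max([3, 1, 4, 1, 5, 9, 2, 6]))  # 9
```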
Sorting Algorithms Review
Sorting Algorithm | Average Performance | Worst Case Performance | Remarks
------------------+---------------------+------------------------+----------------------
Bubble Sort       | O(n²)               | O(n²)                  | Simple but slow
Insertion Sort    | O(n²)               | O(n²)                  | Simple but slow
Selection Sort    | O(n²)               | O(n²)                  | Simple but slow
Heap Sort         | O(n log n)          | O(n log n)             | Fast but complicated
Merge Sort
• An efficient sorting algorithm based on divide and conquer.
• Algorithm:
– Divide: If S has at least two elements (nothing needs to be done if S has zero or one element), remove all the elements from S and put them into two sequences, S1 and S2, each containing about half of the elements of S. (e.g., S1 contains the first ⌈n/2⌉ elements and S2 contains the remaining ⌊n/2⌋ elements.)
– Recur: Recursively sort sequences S1 and S2.
– Conquer: Put the elements back into S by merging the sorted sequences S1 and S2 into a single sorted sequence.
Merging Two Sequences
• But how can we merge two sorted sequences efficiently?• We can use the pseudo code shown in the next slides.
Merging Two Sequences
• Algorithm merge(S1, S2, S):
• Input: Sequences S1 and S2 (on whose elements a total order relation is defined), each sorted in non-decreasing order, and an empty sequence S.
• Output: Sequence S containing the union of the elements from S1 and S2, sorted in non-decreasing order; sequences S1 and S2 are empty at the end of the execution.
Merging Two Sequences

while S1 is not empty and S2 is not empty do
    if S1.first().element() ≤ S2.first().element() then
        {move the first element of S1 to the end of S}
        S.insertLast(S1.remove(S1.first()))
    else
        {move the first element of S2 to the end of S}
        S.insertLast(S2.remove(S2.first()))
{move the remaining elements of S1 to S}
while S1 is not empty do
    S.insertLast(S1.remove(S1.first()))
{move the remaining elements of S2 to S}
while S2 is not empty do
    S.insertLast(S2.remove(S2.first()))
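The pseudocode above translates almost line-for-line into Python. In this sketch (my rendering, not the course's code), `collections.deque` stands in for the sequence ADT so that removing the first element takes O(1) time:

```python
from collections import deque

def merge(s1, s2, s):
    """Merge sorted deques s1 and s2 into s; s1 and s2 end up empty."""
    while s1 and s2:
        if s1[0] <= s2[0]:
            s.append(s1.popleft())   # move first element of s1 to end of s
        else:
            s.append(s2.popleft())   # move first element of s2 to end of s
    while s1:                        # move the remaining elements of s1
        s.append(s1.popleft())
    while s2:                        # move the remaining elements of s2
        s.append(s2.popleft())

def merge_sort(s):
    """Sort deque s in place using merge-sort."""
    n = len(s)
    if n < 2:
        return                       # zero or one element: already sorted
    s1 = deque(s.popleft() for _ in range(n // 2))  # first half
    s2 = deque(s)                    # remaining elements
    s.clear()
    merge_sort(s1)
    merge_sort(s2)
    merge(s1, s2, s)                 # conquer: merge the sorted halves

d = deque([85, 24, 63, 45, 17, 31, 96, 50])
merge_sort(d)
print(list(d))  # [17, 24, 31, 45, 50, 63, 85, 96]
```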
Analysis
• Proposition 1: The merge-sort tree (see the textbook for details about the merge-sort tree) associated with the execution of merge-sort on a sequence of n elements has height ⌈log n⌉.
• Proposition 2: Merge-sort sorts a sequence of size n in O(n log n) time.
• The only assumption we have made is that the input sequence S and each of the sub-sequences created by the recursive calls of the algorithm can access, insert at, and delete from the first and last nodes in O(1) time.
Analysis
• We call the time spent at node v of the merge-sort tree T the running time of the recursive call associated with v, excluding the recursive calls sent to v's children.
• If we let i denote the depth of node v in the merge-sort tree, the time spent at v is O(n/2^i), since the size of the sequence associated with v is n/2^i.
• Observe that T has exactly 2^i nodes at depth i. The total time spent at depth i in the tree is then O(2^i · n/2^i), which is O(n). We know the tree has height ⌈log n⌉.
• Therefore, the time complexity is O(n log n).
Quick-Sort
• A simple sorting algorithm also based on divide and conquer.
• Steps for divide and conquer:
– Divide: If the sequence S has 2 or more elements, select an element x from S to be the pivot. Any arbitrary element, such as the last, will do. Remove all the elements of S and divide them into 3 sequences:
• L, holding S's elements less than x
• E, holding S's elements equal to x
• G, holding S's elements greater than x
– Recur: Recursively sort L and G.
– Conquer: Finally, to put the elements back into S in order, first insert the elements of L, then those of E, and finally those of G.
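A minimal Python sketch of this L/E/G scheme (an illustration, not the textbook's implementation), using the last element as pivot:

```python
def quick_sort(s):
    """Return a sorted copy of s using the L/E/G partition scheme."""
    if len(s) < 2:
        return list(s)               # 0 or 1 elements: nothing to do
    x = s[-1]                        # pivot: the last element
    L = [e for e in s if e < x]      # elements less than the pivot
    E = [e for e in s if e == x]     # elements equal to the pivot
    G = [e for e in s if e > x]      # elements greater than the pivot
    return quick_sort(L) + E + quick_sort(G)   # recur, then concatenate

print(quick_sort([85, 24, 63, 45, 17, 31, 96, 50]))  # [17, 24, 31, 45, 50, 63, 85, 96]
```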
Example
• Select - pick an element
Example
• Divide - rearrange the elements so that x goes to its final position E
Example
• Recurse and Conquer - recursively sort
In-Place Quick-Sort
• Divide step: l scans the sequence from the left, and r from the right.
In-Place Quick-Sort
• A swap is performed when l is at an element larger thanthe pivot and r is at one smaller than the pivot.
In-Place Quick-Sort
• A final swap with the pivot completes the divide step
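The scans, swaps, and final pivot swap described above can be sketched as follows (my rendering of the in-place scheme; the exact index bounds are my own choices):

```python
def in_place_quick_sort(s, a, b):
    """Sort s[a..b] (indices inclusive) in place."""
    if a >= b:
        return                        # 0 or 1 elements: nothing to do
    pivot = s[b]                      # pivot: the rightmost element
    l, r = a, b - 1
    while l <= r:
        while l <= r and s[l] <= pivot:   # l scans right for an element > pivot
            l += 1
        while r >= l and s[r] >= pivot:   # r scans left for an element < pivot
            r -= 1
        if l < r:
            s[l], s[r] = s[r], s[l]       # swap the out-of-place pair
    s[l], s[b] = s[b], s[l]               # final swap puts the pivot in place
    in_place_quick_sort(s, a, l - 1)      # recur on the left part
    in_place_quick_sort(s, l + 1, b)      # recur on the right part

data = [85, 24, 63, 45, 17, 31, 96, 50]
in_place_quick_sort(data, 0, len(data) - 1)
print(data)  # [17, 24, 31, 45, 50, 63, 85, 96]
```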
Analysis
• Consider a quick-sort tree T:
– Let si(n) denote the sum of the input sizes of the nodes at depth i in T.
• We know that s0(n) = n, since the root of T is associated with the entire input set.
• Also, s1(n) = n - 1, since the pivot is not propagated.
• Thus: either s2(n) = n - 3, or n - 2 (if one of the nodes has a zero input size).
• The worst-case running time of quick-sort is then:
O(∑_{i=0}^{n-1} si(n))
• Which reduces to:
O(∑_{i=0}^{n-1} (n - i)) = O(∑_{i=1}^{n} i) = O(n²)
• Thus quick-sort runs in O(n²) time in the worst case.
Analysis
• Now consider the best-case running time:
• Quick-sort behaves optimally if, whenever a sequence S is divided into subsequences L and G, they are of equal size.
• More precisely:
– s0(n) = n
– s1(n) = n - 1
– s2(n) = n - (1 + 2) = n - 3
– s3(n) = n - (1 + 2 + 2²) = n - 7
– …
– si(n) = n - (1 + 2 + 2² + ... + 2^(i-1)) = n - 2^i + 1
– …
• This implies that T has height O(log n)
• Best-case time complexity: O(n log n)
Randomized Quick-Sort
• Select the pivot as a random element of the sequence.
• The expected running time of randomized quick-sort on a sequence of size n is O(n log n).
• The time spent at each level of the quick-sort tree is O(n).
• We show that the expected height of the quick-sort tree is O(log n).
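A one-line change to the L/E/G quick-sort gives the randomized variant; this sketch (not the course's code) uses `random.choice` to pick the pivot:

```python
import random

def randomized_quick_sort(s):
    """Quick-sort with a pivot chosen uniformly at random."""
    if len(s) < 2:
        return list(s)
    x = random.choice(s)             # random pivot: expected O(n log n)
    L = [e for e in s if e < x]      # elements less than the pivot
    E = [e for e in s if e == x]     # elements equal to the pivot
    G = [e for e in s if e > x]      # elements greater than the pivot
    return randomized_quick_sort(L) + E + randomized_quick_sort(G)

print(randomized_quick_sort([9, 3, 7, 1, 8, 2]))  # [1, 2, 3, 7, 8, 9]
```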
Randomized Quick-Sort
• good vs. bad pivots
– good: 1/4 ≤ nL/n ≤ 3/4– bad: nL/n < 1/4 or nL/n > 3/4
• the probability of a good pivot is 1/2, thus we expect k/2 good pivotsout of k pivots
• after a good pivot the size of each child sequence is at most 3/4 the sizeof the parent sequence
• After h pivots, we expect (3/4)h/2 n elements• the expected height h of the quick-sort tree is at most: 2 log4/3n
Decision Tree For Comparison-Based Sorting
How Fast Can We Sort?
• Proposition: The worst-case running time of any comparison-based algorithm for sorting an n-element sequence S is Ω(n log n).
• Justification:
– The running time of a comparison-based sorting algorithm must be at least the depth of the decision tree T associated with the algorithm.
– Each internal node of T is associated with a comparison that establishes the ordering of two elements of S.
– Each external node of T represents a distinct permutation of the elements of S.
– Hence T must have at least n! external nodes, which implies T has a height of at least log(n!).
– Since n! has at least n/2 terms that are greater than or equal to n/2, we have log(n!) ≥ (n/2) log(n/2). So the running time is Ω(n log n).
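The inequality log(n!) ≥ (n/2) log(n/2) can be spot-checked numerically (an illustration, not part of the proof):

```python
import math

# Check log2(n!) >= (n/2) * log2(n/2) for a range of n.
for n in [2, 4, 8, 16, 64, 256, 1024]:
    lhs = math.log2(math.factorial(n))
    rhs = (n / 2) * math.log2(n / 2)
    assert lhs >= rhs
    print(n, round(lhs, 1), round(rhs, 1))
```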
Can We Sort Faster Than O(n log n)?
• As we saw on the previous slides, O(n log n) is the best we can do with comparison-based sorting.
• How about non-comparison-based sorting?
• Can we sort faster than O(n log n) using non-comparison-based sorting?
• The answer to this question is yes.
Radix-Sort
• Unlike the other sorting methods, radix sort considers the structure of the keys.
• Assume keys are represented in a base-M number system (M is the radix); e.g., if M = 2, the keys are represented in binary.
• Sorting is done by comparing bits (digits) in the same position.
• The method extends to keys that are alphanumeric strings.
Radix Exchange Sort
• We examine bits from left to right
• First sort the array with respect to the leftmost bit:
Radix Exchange Sort
• Then we partition the array into 2 arrays:
Radix Exchange Sort
• Finally, we
– recursively sort the top sub-array, ignoring the leftmost bit(s)
– recursively sort the bottom sub-array, ignoring the leftmost bit(s)
• Time to sort n b-bit numbers: O(bn)
Radix Exchange Sort
• How do we perform the sort from the previous slide?
• Same idea as the partition step in quick-sort:
– repeat
• scan top-down to find a key starting with 1
• scan bottom-up to find a key starting with 0
• exchange the keys
– until the scan indices cross
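Putting the partition and the recursion together, radix exchange sort might be sketched as follows (my rendering; the bit-index convention and argument order are assumptions):

```python
def radix_exchange_sort(a, lo, hi, bit):
    """Sort a[lo..hi] (inclusive) by bits bit, bit-1, ..., 0."""
    if lo >= hi or bit < 0:
        return
    i, j = lo, hi
    mask = 1 << bit
    while i <= j:
        while i <= j and not a[i] & mask:   # scan top-down for a leading 1
            i += 1
        while i <= j and a[j] & mask:       # scan bottom-up for a leading 0
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]         # exchange the keys
    radix_exchange_sort(a, lo, i - 1, bit - 1)   # keys whose bit is 0
    radix_exchange_sort(a, i, hi, bit - 1)       # keys whose bit is 1

nums = [5, 3, 7, 0, 6, 2, 4, 1]
radix_exchange_sort(nums, 0, len(nums) - 1, 2)   # 3-bit keys: start at bit 2
print(nums)  # [0, 1, 2, 3, 4, 5, 6, 7]
```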
Radix Exchange Sort vs. Quick Sort
• Similarities
– both partition the array
– both recursively sort the sub-arrays
• Differences
– Method of partitioning:
• radix exchange divides the array based on whether each key is less than 2^(b-1), i.e., on its leading bit
• quick-sort partitions based on comparison with some element of the array
– Time complexity:
• radix exchange: O(bn)
• quick-sort average case: O(n log n)
Straight Radix Sort
• Examines bits from right to left:

for k ← 0 to b - 1 do
    sort the array in a stable way, looking only at bit k
Stable Sorting
• In a stable sort, the initial relative order of equal keys is unchanged.
• For example, observe the first step of the sort from the previous slide:
• Note that the relative order of the keys ending with 0 is unchanged, and the same is true for the keys ending in 1.
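Stability can be observed directly with Python's built-in sort, which is guaranteed stable: sorting pairs by the last bit of the key leaves keys with equal bits in their original relative order (the pair values below are my own example data):

```python
# (key, label) pairs; sort by the last bit of the key only.
items = [(6, 'a'), (3, 'b'), (7, 'c'), (2, 'd'), (5, 'e')]
by_bit0 = sorted(items, key=lambda kv: kv[0] & 1)  # sorted() is stable
print(by_bit0)  # [(6, 'a'), (2, 'd'), (3, 'b'), (7, 'c'), (5, 'e')]
```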
Stable Sorting
• We show that any two keys are in the correct relative order at the end of the algorithm.
• Given two keys, let k be the leftmost bit position where they differ.
• At step k, the two keys are put in the correct relative order.
• Because of stability, the subsequent steps do not change the relative order of the two keys.
Example
• Consider sorting an array containing these two keys.
• It makes no difference what order they are in when the sort begins.
• When the sort visits bit k, the keys are put in the correct relative order.
• Because the sort is stable, the order of the two keys will not be changed when bits > k are compared.
Straight Radix Sort on Decimal Numbers
for k ← 0 to b - 1 do
    sort the array in a stable way, looking only at digit k

• Suppose we can perform the stable sort above in O(n) time. The total time complexity would then be O(bn).
• As you might have guessed, we can perform a stable sort based on the keys' kth digit in O(n) time.
• The method? Bucket sort.
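Assuming a stable O(n) pass per digit (a bucket pass, as the slide suggests), straight radix sort on decimal numbers might look like this sketch (my rendering, not the course's code):

```python
def straight_radix_sort(a, digits):
    """LSD radix sort of non-negative decimal integers with `digits` digits."""
    for k in range(digits):              # digit 0 (least significant) first
        buckets = [[] for _ in range(10)]
        for x in a:
            d = (x // 10 ** k) % 10      # k-th decimal digit of x
            buckets[d].append(x)         # appending keeps each pass stable
        a = [x for b in buckets for x in b]
    return a

print(straight_radix_sort([170, 45, 75, 90, 802, 24, 2, 66], 3))
# [2, 24, 45, 66, 75, 90, 170, 802]
```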
Bucket Sort
• n numbers
• Each number ∈ {1, 2, 3, ..., m}
• Stable
• Time: O(n + m)
• For example, m = 3 and our array is:
• Note that there are two “2”s and two “1”s
• First, we create m “buckets”
Example
• Now, pull the elements from the buckets into the array
• At last, the sorted array (sorted in a stable way):
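The whole bucket-sort procedure can be sketched in Python (an illustration; it sorts (key, value) pairs rather than bare keys to make the stability visible):

```python
def bucket_sort(pairs, m):
    """Stable O(n + m) sort of (key, value) pairs with keys in {1, ..., m}."""
    buckets = [[] for _ in range(m + 1)]   # one bucket per possible key
    for kv in pairs:
        buckets[kv[0]].append(kv)          # appending preserves input order
    return [kv for b in buckets[1:] for kv in b]   # pull elements back out

print(bucket_sort([(3, 'x'), (1, 'a'), (2, 'p'), (1, 'b'), (2, 'q')], 3))
# [(1, 'a'), (1, 'b'), (2, 'p'), (2, 'q'), (3, 'x')]
```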
Sorting Algorithms Summary
Sorting Algorithm | Average Performance | Worst Case Performance | Remarks
------------------+---------------------+------------------------+------------------------------------------
Bubble Sort       | O(n²)               | O(n²)                  | Simple but slow
Insertion Sort    | O(n²)               | O(n²)                  | Simple but slow
Selection Sort    | O(n²)               | O(n²)                  | Simple but slow
Heap Sort         | O(n log n)          | O(n log n)             | Fast but complicated
Merge Sort        | O(n log n)          | O(n log n)             | Fast but still relatively complicated
Quick Sort        | O(n log n)          | O(n²)                  | Fast and simple, but poor worst-case performance
Integer Sort      | O(n)                | O(n)                   | Fast and simple, but only applicable to integer keys
Selection
• Finding the minimum or maximum element of an unsorted sequence takes O(n) time.
• This problem can be generalized to finding the kth smallest element of an unsorted sequence.
• We could first sort the sequence and then return the element stored at rank k - 1. This takes O(n log n) time due to the sorting.
• But we can do better…
Prune and Search
• Also called decrease-and-conquer.
• A design pattern that is also used in binary search.
• We find the solution by pruning away a fraction of the objects in the original problem and solving the rest recursively.
• The prune-and-search algorithm that we will discuss is called randomized quick selection.
• Not surprisingly, randomized quick selection is very similar to randomized quick-sort.
Randomized Quick Selection

Algorithm quickSelect(S, k):
Input: An unsorted sequence S containing n comparable elements, and an integer k ∈ {1, ..., n}
Output: The kth smallest element of S

if n = 1 then
    return the (first) element of S
pick a random element x of S
remove all elements from S and put them into 3 sequences:
    - L, storing the elements of S less than x
    - E, storing the elements of S equal to x
    - G, storing the elements of S greater than x
if k ≤ |L| then
    return quickSelect(L, k)
else if k ≤ |L| + |E| then
    return x                                {every element in E is equal to x}
else
    return quickSelect(G, k - |L| - |E|)    {note the new selection parameter}

• Performance: worst case O(n²); expected O(n)
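A direct Python rendering of the pseudocode (a sketch, not the course's reference code):

```python
import random

def quick_select(s, k):
    """Return the k-th smallest element of list s (k counts from 1)."""
    if len(s) == 1:
        return s[0]
    x = random.choice(s)                 # pick a random pivot
    L = [e for e in s if e < x]          # elements less than x
    E = [e for e in s if e == x]         # elements equal to x
    G = [e for e in s if e > x]          # elements greater than x
    if k <= len(L):
        return quick_select(L, k)        # answer lies in L
    elif k <= len(L) + len(E):
        return x                         # every element in E is equal to x
    else:
        return quick_select(G, k - len(L) - len(E))  # new selection parameter

print(quick_select([7, 4, 9, 1, 5], 2))  # 4
```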