Overview
code directory (link works only from HTML version of slides).
Sorting using random access data structures like arrays or
vectors.
Simple but O(n²) algorithms like insertion sort, bubble sort
and selection sort.
More complex algorithms with better performance like
shellsort, mergesort, quicksort and heapsort.
Basic analysis of algorithms.
Divide-and-conquer algorithms.
Sorting
Sorting Terminology
Internal sorting: refers to sorting data which is in memory.
External sorting: refers to data which is not in memory; for
example, in file storage.
Sorting of records r1, r2, . . . , rn having key values k1, k2, . . . , kn.
Rearrange records in order r_s1, r_s2, . . . , r_sn such that
k_s1 ≤ k_s2 ≤ . . . ≤ k_sn for some ordering relation ≤.
If the application allows duplicate keys, then a sorting algorithm is
said to be stable if it does not change the relative ordering of
two records with equal key values.
The amount of storage used by an in-place sorting algorithm
is independent of the number of records being sorted.
Sorting Parameters
Basic parameter is n: number of records being sorted.
Another parameter is the ordering relation, often specified
using a comparison function compare(r1, r2) which returns a
positive, zero, or negative value depending on whether k1 is
greater-than, equal-to, or less-than k2 respectively.
Algorithms can be compared based on number of comparisons
between keys.
When record sizes are large, it may be useful to consider the
number of record swaps.
Insertion Sort
Given a collection of sorted records, insert additional record at
end of sorted collection.
If record is out-of-order, then swap with previous record.
Continue swapping with previous record, until record is in
order.
Insertion Sort Code
void insertionSort(int a[], int n)
  for (int i = 1; i < n; ++i)
    assert(isSorted(a, i) && "prefix not sorted");
    // insert i'th record into sorted portion
    for (int j = i; (j > 0) && (a[j] < a[j - 1]); --j)
      swap(a, j, j - 1);
Insertion Sort Trace
Insertion Sort Analysis
Outer loop executed n − 1 times. Number of executions of
inner loop depends on how many keys in [0, i) have values
less than key at position i.
Worst-case occurs when keys are sorted in reverse of the
desired order, with the number of comparisons in the inner loop
proportional to the position i of the element being inserted.
Hence total number of comparisons is:
∑_{i=2}^{n} i ≈ n²/2 = Θ(n²)
Best case occurs when records are already in desired order. In
that case, when a new record is inserted, it needs a single
comparison to ensure it is in place. Hence we have a total of
Θ(n) comparisons and no swaps.
Good for data which is close to already sorted.
Insertion Sort Analysis Continued
In general case, number of executions of inner loop will depend
on how many elements in sorted portion are greater than value
being inserted. Each such occurrence is called an inversion.
In average case, we will expect roughly i/2 inversions when
inserting element at index i . Leads to Θ(n2) average case
performance.
Similar results for swaps: best case 0 swaps; average and worst
case Θ(n2) swaps.
Bubble Sort
Look for minimum value in unsorted portion of array and
bubble it to the start of the unsorted portion of the array.
Initial portion of array accumulates minimums and will be
sorted.
Bubble Sort Code
void bubbleSort(int a[], int n)
  for (int i = 0; i < n - 1; ++i)
    assert(isSorted(a, i) && "prefix not sorted");
    for (int j = n - 1; j > i; --j)
      if (a[j] < a[j - 1]) swap(a, j, j - 1);
Bubble Sort Trace
Analysis of Bubble Sort
The inner loop on iteration i always makes n − 1 − i
comparisons regardless of the data. Hence total number of
comparisons is:
∑_{i=1}^{n-1} i ≈ n²/2 = Θ(n²)
Hence best-case, worst-case and average-case performance is
always Θ(n2).
Number of swaps identical to that in insertion sort. So 0 in
best case, Θ(n2) in average and worst case.
Not much to recommend this algorithm except for catchy
name.
Selection Sort
Look for minimum value in unsorted portion of array and move
it to the start of the unsorted portion of the array.
Like bubble sort, but instead of bubbling value by doing
repeated swaps, we find its final position and then do a single
swap.
Selection Sort Code
void selectionSort(int a[], int n)
  for (int i = 0; i < n - 1; ++i)
    assert(isSorted(a, i) && "prefix not sorted");
    int minIndex = i;
    for (int j = n - 1; j > i; --j)
      // find min value in rest of array
      if (a[j] < a[minIndex]) minIndex = j;
    swap(a, i, minIndex);
Selection Sort Trace
Selection Sort Analysis
Number of comparisons similar to bubble sort: always Θ(n²).
Number of swaps is always Θ(n), much less than bubble sort.
Advantageous if record sizes are large and swaps are expensive.
Minimizing Swap Cost By Swapping Pointers
When record sizes are large, use collection of pointers to records to
minimize swap cost.
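A sketch of the idea, using a hypothetical Record type (the struct and field names are illustrative, not from the slides): insertion sort runs over an array of pointers, so comparisons dereference the pointers but swaps move only the pointers, never the large records.

```cpp
#include <cassert>

// Hypothetical large record: only the key matters for ordering.
struct Record {
    int key;
    char payload[256];  // large payload makes direct swaps expensive
};

// Insertion sort on an array of pointers: comparisons go through the
// pointers, but each swap moves a pointer rather than a whole Record.
void insertionSortPtrs(Record* p[], int n) {
    for (int i = 1; i < n; ++i) {
        for (int j = i; j > 0 && p[j]->key < p[j - 1]->key; --j) {
            Record* tmp = p[j];
            p[j] = p[j - 1];
            p[j - 1] = tmp;
        }
    }
}
```

After sorting, the pointer array is in key order while the records themselves never move.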
Comparison of Simple Θ(n²) Algorithms
Mergesort
Split array into two halves.
Recursively sort each half.
Merge sorted halves together.
Terminate recursion when array size ≤ 1.
Mergesort Pseudo-Code
Seq mergesort(Seq inSeq)
if (inSeq.size <= 1) return inSeq;
Seq seq1 = half of items from inSeq;
Seq seq2 = other half of items from inSeq;
return merge(mergesort(seq1), mergesort(seq2));
Mergesort Illustrated
Mergesort with Linked Lists
Does not require random access, hence suitable for sorting
linked lists.
Breaking linked list into half is difficult.
If we know the length of list, we need to traverse half the list
in order to reach the second half.
If we do not know the length of the list, build halves by
putting successive elements in alternate halves: first element
goes into first half, second element goes into second half, third
element into first half, fourth element into second half, and so
on. Requires a complete pass over the list.
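The alternating split might look like this (the Node type and function name are illustrative assumptions, not the slides' code):

```cpp
#include <cassert>
#include <cstddef>

// Minimal singly-linked node type (assumed for illustration).
struct Node {
    int val;
    Node* next;
};

// Split a list by placing successive nodes into alternate halves;
// needs one complete pass, but no knowledge of the list length.
void splitAlternating(Node* head, Node** outA, Node** outB) {
    Node* a = nullptr; Node** tailA = &a;
    Node* b = nullptr; Node** tailB = &b;
    bool toA = true;
    for (Node* p = head; p != nullptr; p = p->next) {
        if (toA) { *tailA = p; tailA = &p->next; }
        else     { *tailB = p; tailB = &p->next; }
        toA = !toA;
    }
    *tailA = nullptr;  // terminate both halves
    *tailB = nullptr;
    *outA = a;
    *outB = b;
}
```

Splitting 1→2→3→4→5 this way yields the halves 1→3→5 and 2→4.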
Mergesort with Arrays
Splitting array into halves is easy if we know the size of the
array (we merely track the bounds of the array).
Merging sorted arrays is also easy if we use an auxiliary array.
Very difficult without using an auxiliary array.
Hence mergesort with arrays requires twice the amount of
space.
Could have mergesort alternate between original array and
auxiliary array.
Simpler to copy sorted sub-arrays into auxiliary array and then
merge from auxiliary array to original array.
Mergesort Code
// sort a[lo, hi) using temp[] as temporary storage
static void msort(int a[], int temp[], int lo, int hi)
  if (hi - lo < 2) return;  // empty or single element
  int mid = (lo + hi)/2;    // select midpoint
  msort(a, temp, lo, mid);  // mergesort lo half
  msort(a, temp, mid, hi);  // mergesort hi half
  for (int i = lo; i < hi; ++i)
    temp[i] = a[i];         // copy subarray to temp
Mergesort Code: Merge
  // merge temp[] subarrays back to a[]
  int i1 = lo;
  int i2 = mid;
  for (int dest = lo; dest < hi; ++dest)
    if (i1 == mid)       // left subarray exhausted
      a[dest] = temp[i2++];
    else if (i2 == hi)   // right subarray exhausted
      a[dest] = temp[i1++];
Mergesort Code: Merge Continued
    else if (temp[i1] <= temp[i2])  // smaller value at i1
      a[dest] = temp[i1++];
    else                            // get smaller value from i2
      a[dest] = temp[i2++];
Mergesort Code: Wrapper
void mergeSort(int a[], int n)
  int* temp = new int[n];
  msort(a, temp, 0, n);
  delete[] temp;
Optimizing Mergesort
Instead of recursing down to 1-element arrays, quit recursion
when sub-array size is smaller than some threshold and then
use something like insertion sort for the sub-array.
When copying the sorted sub-arrays into the auxiliary array,
reverse the order of the second sub-array. This simplifies the
merge operation.
Optimizing Mergesort: Pseudo-Code
// sort a[lo, hi): note exclusive hi
void msort(int a[], int tmp[], int lo, int hi)
  if ((hi - lo) < THRESHOLD)
    insertionSort(&a[lo], hi - lo);
    return;
  int mid = (lo + hi)/2;
  msort(a, tmp, lo, mid);
  msort(a, tmp, mid, hi);
Optimizing Mergesort: Pseudo-Code Continued
  for (int i = lo; i < mid; ++i) tmp[i] = a[i];
  for (int j = 0; j < hi - mid; ++j)  // reversed copy
    tmp[hi - j - 1] = a[mid + j];
  for (int i1 = lo, i2 = hi - 1, dest = lo; dest < hi; ++dest)
    // optimized merge: each subarray acts as a sentinel for the other
    if (tmp[i1] < tmp[i2])
      a[dest] = tmp[i1++];
    else
      a[dest] = tmp[i2--];
Analysis of Mergesort
Depth of recursion is ⌈lg n⌉. Merging subarrays of total length i is Θ(i).
Assuming n is power-of-2, at bottom level, n arrays of size 1
are merged requiring Θ(n) steps, at the next level n/2 arrays
of size 2 are merged, again requiring Θ(n) steps, n/4 arrays of
size 4 with Θ(n) steps, and so on.
Total cost will be Θ(n lg n).
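The same level-by-level count can be summarized as a recurrence (a standard formulation, not from the slides):

```latex
T(1) = \Theta(1), \qquad T(n) = 2\,T(n/2) + \Theta(n)
```

Unrolling gives lg n levels of Θ(n) work each, so T(n) = Θ(n lg n), matching the argument above.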
Quicksort
Uses a different divide-and-conquer strategy from mergesort.
Pick some "arbitrary" element from array to use as a pivot.
Partition array in-place into two halves: elements less than
pivot and elements greater than or equal to pivot.
Recursively sort both halves (not including pivot element).
Array is sorted!!
Can have O(n²) performance when one half of partition is
empty at each step! Can happen when array is already sorted
and we choose first element as pivot.
Quicksort Code
/** sort a[lo, hi] */
static void qsort(int a[], int lo, int hi)
  if (hi - lo < 1) return;
  int pivotIndex = findPivot(a, lo, hi);  // pick a pivot
  swap(a, pivotIndex, hi);                // stick pivot at end
  // k will be the first position in the right subarray
  int k = partition(a, lo, hi - 1, a[hi]);
  swap(a, k, hi);      // put pivot in place
  qsort(a, lo, k - 1); // sort left partition
  qsort(a, k + 1, hi); // sort right partition
Quicksort Illustrated
Partition Code
/** partition a[lo, hi] into < pivot left sub-array and >= pivot
 *  right sub-array, returning index of first position in right
 *  sub-array
 */
static int partition(int a[], int lo, int hi, int pivot)
  while (lo <= hi)  // while interval is non-empty
    while (a[lo] < pivot)
      ++lo;  // loop terminates because pivot is at a[hi + 1]
    while ((hi >= lo) && (a[hi] >= pivot)) --hi;
    if (hi > lo) swap(a, lo, hi);  // swap out-of-place values
  return lo;  // return first position in right partition
Partitioning Illustrated
Choosing Pivot
Ideally, pivot should be median value in array.
Choice of first element in array is a bad choice if array is
already sorted.
Could choose pivot as median of first, middle and last
elements.
Simply choose middle element as pivot:
static int findPivot(int a[], int i, int j)
  return (i + j)/2;
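A median-of-three findPivot could be sketched as follows (this variant is my own illustration, not the slides' code); it guards against already-sorted input without scanning the array:

```cpp
#include <cassert>

// Return the index of the median of a[lo], a[mid], a[hi],
// where mid is the middle position of [lo, hi].
static int medianOfThree(int a[], int lo, int hi) {
    int mid = (lo + hi) / 2;
    if (a[lo] < a[mid]) {
        if (a[mid] < a[hi]) return mid;          // lo < mid < hi
        return (a[lo] < a[hi]) ? hi : lo;        // median of lo, hi
    } else {
        if (a[lo] < a[hi]) return lo;            // mid <= lo < hi
        return (a[mid] < a[hi]) ? hi : mid;      // median of mid, hi
    }
}
```

Only three comparisons are needed, so pivot selection remains constant time.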
Analysis of Quicksort
Finding pivot is a constant time operation.
Partitioning an array of size n is O(n) as each inner loop
moves bounds by at least 1.
Worst case occurs when the choice of the pivot at each step
leaves one of the sub-arrays empty. That means that at each
step, we will have a recursive call quicksort(n - 1) when
doing a quicksort(n). This leads to:
∑_{i=1}^{n} i ≈ n²/2 = Θ(n²)
No better than bubble sort!!
Analysis of Quicksort Continued
Best case occurs when at each step partition splits array into
two equal halves. Hence there will be lg n levels of recursive
calls with the partition step at each level being O(n), for a
total cost of O(n lg n).
For average case analysis, assume that at each step any
possible position for the partition boundary is equally likely; i.e.,
when partitioning array of size n, partition position is equally
likely to be 0, 1, 2, . . . , n − 1.
T(0) = T(1) = c
T(n) = cn + (1/n) ∑_{i=0}^{n−1} [T(i) + T(n − 1 − i)]
where T (i) and T (n − 1− i) represent cost of recursive calls.
Analysis of Quicksort Continued
T(n) is defined in terms of T(i) for i < n. This is referred to
as a recurrence relation.
In this case, the recurrence relation has a closed form solution
of the form n lg(n).
So average case of quicksort is O(n lg(n)).
This means that the fraction of inputs exhibiting worst-case
behavior must go to zero as n grows.
Optimizing Quicksort
Like mergesort, we could replace quicksort by an insertion sort
when n is sufficiently small.
Given the way the partitioning process works, if we simply
leave the small sub-arrays unsorted, the elements will be pretty
close to their final positions. So number of out-of-place
elements will be small.
Then make a final insertion sort over the entire array. Since
the number of out-of-place elements is small, this final
insertion sort should be close to O(n).
Empirical results show that sub-arrays should be left unsorted
when n ≤ 9.
Other optimizations involve inlining partition() and
findPivot().
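Putting the two ideas together, a sketch (the helper names are mine; partition follows the earlier slides, and THRESHOLD = 9 comes from the empirical note above):

```cpp
#include <cassert>

static const int THRESHOLD = 9;  // empirical cutoff from the slides

static void swapInts(int a[], int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

// partition as on the earlier slides, repeated for self-containment
static int partition(int a[], int lo, int hi, int pivot) {
    while (lo <= hi) {
        while (a[lo] < pivot) ++lo;          // pivot at a[hi + 1] is sentinel
        while (hi >= lo && a[hi] >= pivot) --hi;
        if (hi > lo) swapInts(a, lo, hi);
    }
    return lo;
}

// quicksort that simply leaves sub-arrays smaller than THRESHOLD unsorted
static void qsortRough(int a[], int lo, int hi) {
    if (hi - lo < THRESHOLD) return;         // leave small sub-array unsorted
    int pivotIndex = (lo + hi) / 2;          // middle-element pivot
    swapInts(a, pivotIndex, hi);
    int k = partition(a, lo, hi - 1, a[hi]);
    swapInts(a, k, hi);
    qsortRough(a, lo, k - 1);
    qsortRough(a, k + 1, hi);
}

static void insertionSort(int a[], int n) {
    for (int i = 1; i < n; ++i)
        for (int j = i; j > 0 && a[j] < a[j - 1]; --j)
            swapInts(a, j, j - 1);
}

void optimizedQuicksort(int a[], int n) {
    qsortRough(a, 0, n - 1);
    insertionSort(a, n);  // final pass fixes the small remaining disorder
}
```

The final insertion sort is always correct; it is merely fast here because every element is already within a THRESHOLD-sized block of its final position.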
Heapsort
Recall that a max heap is a binary tree where a parent node is
always greater than or equal to its children.
Build a max heap from initial array.
Repeatedly delete max element from heap and insert into
position at end of array.
void heapsort(int a[], int n)
  Heap heap = buildMaxHeap(a, n);
  for (int i = 0; i < n; ++i)
    int max = heap.removeFirst();
    a[n - 1 - i] = max;
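The pseudo-code above can be made concrete by storing the heap in the array itself, so no separate Heap object is needed (siftDown and the loop bounds below are a standard sketch, not the slides' Heap class):

```cpp
#include <cassert>

// Sift a[i] down within the heap a[0, n) to restore the max-heap property.
static void siftDown(int a[], int n, int i) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) ++child;  // larger child
        if (a[i] >= a[child]) return;
        int t = a[i]; a[i] = a[child]; a[child] = t;
        i = child;
    }
}

// In-place heapsort: the heap shrinks from the front while the
// sorted output grows from the back of the same array.
void heapsort(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; --i)  // bottom-up batch build: Theta(n)
        siftDown(a, n, i);
    for (int end = n - 1; end > 0; --end) {
        int t = a[0]; a[0] = a[end]; a[end] = t;  // move max to sorted suffix
        siftDown(a, end, 0);                      // restore heap on prefix
    }
}
```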
Analysis of Heapsort
Recall that building a heap of size n is Θ(n), when the build is
done using a batch build where the heap is built from the
bottom up and at level i we need to sift an element down at
most i levels.
Each deletion of the max element could take lg(n) steps to
rearrange the heap to preserve the heap property.
Hence deletion of n elements will be Θ(n lg(n)).
So cost of heap sort is Θ(n + n lg(n)) which is Θ(n lg(n)).
Use of Heapsort
Heapsort may be good choice for finding the largest or
smallest k elements from an array where 1 < k ≪ n, which will
have cost Θ(n + k lg(n)).
This is often useful.
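A sketch of finding the k largest values this way (function names are illustrative; note that it rearranges the input array): build the heap in Θ(n), then pop only k times.

```cpp
#include <cassert>

// Restore the max-heap property for a[i] within a[0, n).
static void siftDown(int a[], int n, int i) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) ++child;
        if (a[i] >= a[child]) return;
        int t = a[i]; a[i] = a[child]; a[child] = t;
        i = child;
    }
}

// Place the k largest values of a[0, n), in decreasing order, in out[0, k).
// Cost: Theta(n) batch build plus k removals of lg(n) each.
void largestK(int a[], int n, int k, int out[]) {
    for (int i = n / 2 - 1; i >= 0; --i) siftDown(a, n, i);  // Theta(n)
    for (int i = 0; i < k; ++i) {
        out[i] = a[0];                // current maximum
        a[0] = a[n - 1 - i];          // move last heap element to root
        siftDown(a, n - 1 - i, 0);    // lg(n) restore, done k times
    }
}
```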
Shell Sort
Named for inventor Donald Shell who published algorithm in
1959.
Better performance than other Θ(n²) sorts when carefully
implemented, at the cost of substantially greater
implementation complexity.
Exploit best case performance of insertion sort by successively
using insertion sort on virtual sub-arrays of increasing size. At
each step, insertion sort is sorting a virtual sub-array which is
close to being sorted; hence it should perform close to its best
case.
Each step of shell sort is characterized by an increment I. The
step will do an insertion sort on a virtual sub-array where the
elements of the virtual sub-array are I positions apart.
Shell Sort Continued
Can use increments which are powers-of-2.
Start off by using I which is the largest power-of-2 less than n;
for example, when n is 12, initial I will be 8.
When n is 12:
1. Sort virtual sub-arrays of length 2 where the elements of the 2-element sub-arrays are 8 positions apart.
2. Sort virtual sub-arrays of length 4 where the elements of the 4-element sub-arrays are 4 positions apart.
3. Sort virtual sub-arrays of length 8 (actually 6) where the elements of the sub-arrays are 2 positions apart. There will be two sub-arrays: the elements at even indexes and the elements at odd indexes.
4. Sort elements which are 1 position apart: i.e. an insertion sort over the entire list.
Choice of Increment
Series of increments must be decreasing.
Series of increments must end with 1. This guarantees that
the last pass does an insertion sort over the entire list.
Powers-of-2 increments are not the best choice as the
sub-arrays for successive passes overlap.
One choice is the sequence I = 3I + 1: . . . , 364, 121, 40, 13, 4, 1.
Another choice would be using relatively prime increments:
. . . , 11, 7, 3, 1.
Shell Sort Code: Modified insertionSort()
void insertionSort(int a[], int n)  // original
  for (int i = 1; i < n; ++i)
    for (int j = i; (j > 0) && (a[j] < a[j - 1]); --j)
      swap(a, j, j - 1);

void insertSort2(int a[], int n, int incr=1)  // modified
  for (int i = incr; i < n; i += incr)
    for (int j = i; (j >= incr) && (a[j] < a[j - incr]); j -= incr)
      swap(a, j, j - incr);
Shell Sort Code: Basic Algorithm using Powers-of-2 Increments
void shellSort(int a[], int n)
  for (int incr = n/2; incr > 2; incr /= 2)
    for (int j = 0; j < incr; ++j)
      insertSort2(&a[j], n - j, incr);
  insertSort2(a, n, 1);  // final pass
Shell Sort Performance
Analysis is difficult, but average case performance has been
shown to be Θ(n√n) = Θ(n^1.5).
Not much slower than n lg(n) algorithms for medium size n.
Binsort and Radix Sort
Basic idea is to put each record into a bin based on the value
of its key, sort each bin using some other sorting algorithm and
then merge the bins.
When sorting integers, we can put integers into bins based on
successive digits.
This leads to radix sort which is a bin sort where bins are
chosen based on the interpretation of parts of a key as
numbers in some base.
Not covered further.
Comparative Performance of Sorting Algorithms
How Fast Can We Sort?
Since any sorting algorithm must look at each of the n values,
any sorting algorithm must be in Ω(n).
We know that we have sorting algorithms like quicksort and
mergesort which are Θ(n lg(n)).
Hence we know that the performance of sorting algorithms
must be bounded by Ω(n) and O(n lg(n)).
Can we do better than Θ(n lg(n))?
Turns out we cannot do better if we count key comparisons:
i.e. any sorting algorithm based on key comparisons is in
Ω(n lg(n)).
Sorting Finds an Ordered Permutation of Input
Specification for sorting is that sorting produces an ordered
permutation of its input sequence.
Input sequence of size n has n! permutations.
Each step in a sorting algorithm rearranges sequence to a
"more ordered" permutation.
Decision Tree for Insertion Sort of 3-element Array
Decision Tree for n-element Array
Decision tree must contain n! leaves, one per permutation.
We know that a tree containing n nodes must have a depth of
at least ⌈lg(n + 1)⌉. So decision tree for n! permutations must
have depth Ω(lg(n!)).
A sort completes only at the leaves of the decision tree, i.e. we
need at least Ω(lg(n!)) comparisons.
There is an approximation for n! called Stirling's
Approximation:
n! ≈ √(2πn) (n/e)^n
where e = 2.718 . . . is the base of natural logarithms.
Taking the log of both sides, we have lg(n!) is Ω(n lg(n)).
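Spelling out that step (a standard expansion, not from the slides):

```latex
\lg(n!) \;\approx\; \lg\sqrt{2\pi n} + n\lg\frac{n}{e}
        \;=\; n\lg n \;-\; n\lg e \;+\; \Theta(\lg n)
        \;=\; \Omega(n \lg n)
```

The dominant term n lg n survives subtracting the linear term, giving the lower bound.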
So any sorting algorithm is Ω(n lg(n)).