Sorting HKOI Training Team (Advanced) 2006-01-21.

Post on 12-Jan-2016



What is sorting?

Given: A list of n elements: A1,A2,…,An

Re-arrange the elements to make them follow a particular order, e.g.

Ascending Order: A1 ≤ A2 ≤ … ≤ An

Descending Order: A1 ≥ A2 ≥ … ≥ An

We will talk about sorting in ascending order only

Why is sorting needed?

Some algorithms work only when the data is sorted, e.g. binary search

Better presentation of data

Often required by problem setters, to reduce workload in judging

Why learn Sorting Algorithms?

C++ STL already provides a sort() function

Unfortunately, there is no such implementation for Pascal

This is a minor point, though
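As a quick illustration of the STL function mentioned above, here is a minimal sketch. The helper names sortedAscending and sortedDescending are made up for this example; only std::sort and std::greater come from the standard library.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Return a copy of the input sorted in ascending order.
std::vector<int> sortedAscending(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    return a;
}

// Return a copy sorted in descending order by passing a comparator.
std::vector<int> sortedDescending(std::vector<int> a) {
    std::sort(a.begin(), a.end(), std::greater<int>());
    return a;
}
```

The third argument of std::sort is where a custom ordering goes, which is often all that is needed when a problem asks for a non-default order.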

Why learn Sorting Algorithms?

Most importantly, OI problems do not directly ask for sorting, but their solutions may be closely linked with sorting algorithms

In most cases, C++ STL sort() is useless. You still need to write your own “sort”

So… it is important to understand the idea behind each algorithm, and also their strengths and weaknesses

Some Sorting Algorithms…

Bubble Sort, Insertion Sort, Selection Sort, Shell Sort, Heap Sort, Merge Sort, Quick Sort, Counting Sort, Radix Sort

How many of them do you know?

Bubble, Insertion, Selection…

Simple in terms of both idea and implementation

Unfortunately, they are inefficient: O(n^2), which is not good if n is large

Algorithms being taught today are far more efficient than these

Shell Sort

Named after its inventor, Donald Shell

Observation: Insertion Sort is very efficient when

n is small

the list is almost sorted

Shell Sort

Divide the list into k non-contiguous segments

Elements in each segment are k elements apart

In the beginning, choose a large k so that all segments contain a few elements (e.g. k=n/2)

Sort each segment with Insertion Sort

2 1 4 7 4 8 3 6 4 7

Shell Sort

Definition: A list is said to be “k-sorted” when A[i] ≤ A[i+k] for 1 ≤ i ≤ n-k

Now the list is 5-sorted

2 1 4 4 4 8 3 6 7 7

Shell Sort

After each pass, reduce k (e.g. by half)

Although the number of elements in each segment increases, the segments are usually mostly sorted

Sort each segment with Insertion Sort again

Example with k = 2 on the list 2 1 4 7 4 8 3 6 4 7, inserting elements from left to right:

Insert 4 (position 3): ≥2

Insert 7 (position 4): ≥1

Insert 4 (position 5): ≥4

Insert 8 (position 6): ≥7

Insert 3 (position 7): <4, <4, ≥2

Insert 6 (position 8): <8, <7, ≥1

Insert 4 (position 9): ≥4

Insert 7 (position 10): <8, ≥7

Result (2-sorted): 2 1 3 6 4 7 4 7 4 8

Shell Sort

Finally, k is reduced to 1

The list looks mostly sorted

Perform 1-sort, i.e. the ordinary Insertion Sort

Result: 1 2 3 4 4 4 6 7 7 8
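The passes described above can be sketched as follows. This is a minimal version assuming 0-based indexing and the halving increments from the example; the function name shellSort is chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Shell Sort with the halving increments {n/2, n/4, ..., 1}.
void shellSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t k = n / 2; k >= 1; k /= 2) {
        // One "k-sort" pass: insertion sort over elements that are k apart.
        for (std::size_t i = k; i < n; ++i) {
            const int value = a[i];
            std::size_t j = i;
            // Shift larger elements of the same segment k positions to the right.
            while (j >= k && a[j - k] > value) {
                a[j] = a[j - k];
                j -= k;
            }
            a[j] = value;
        }
    }
}
```

The inner loops handle all k segments in an interleaved fashion rather than one segment at a time; the comparisons performed within each segment are the same.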

Shell Sort – Worse than Ins. Sort?

In Shell Sort, we still have to perform an Insertion Sort at last

A lot of operations are done before the final Insertion Sort

Isn’t it worse than Insertion Sort?

Shell Sort – Worse than Ins. Sort?

The final Insertion Sort is more efficient than before

All sorting operations before the final one are done efficiently

k-sorts compare far-apart elements

Elements “move” faster, reducing the amount of movement and comparison

Shell Sort – Increment Sequence

In our example, k starts at n/2 and halves in each pass until it reaches 1, i.e. {n/2, n/4, n/8, …, 1}

This is called the “Shell sequence”

In a good Increment Sequence, all numbers should be relatively prime to each other

Hibbard’s Sequence: {2^m − 1, 2^(m−1) − 1, …, 7, 3, 1}
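A possible way to plug a different increment sequence into Shell Sort is sketched below. The helper hibbardIncrements and the choice to use only increments smaller than n are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Increments of the form 2^m - 1 that are smaller than n, largest first.
std::vector<std::size_t> hibbardIncrements(std::size_t n) {
    std::vector<std::size_t> ascending;
    for (std::size_t k = 1; k < n; k = 2 * k + 1) ascending.push_back(k);
    return std::vector<std::size_t>(ascending.rbegin(), ascending.rend());
}

// Shell Sort driven by the increments above instead of n/2, n/4, ...
void shellSortHibbard(std::vector<int>& a) {
    for (std::size_t k : hibbardIncrements(a.size())) {
        for (std::size_t i = k; i < a.size(); ++i) {
            const int value = a[i];
            std::size_t j = i;
            while (j >= k && a[j - k] > value) {
                a[j] = a[j - k];
                j -= k;
            }
            a[j] = value;
        }
    }
}
```

For n = 10 the increments are 7, 3, 1. The sequence always ends with 1, so the last pass is still an ordinary Insertion Sort.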

Shell Sort – Analysis

Average Complexity: O(n^1.5)

Worst case of Shell Sort with Shell Sequence: O(n^2)

When will it happen?

Heap Sort

In Selection Sort, we scan the entire list to search for the maximum, which takes O(n) time

Is there a better way to get the maximum?

With the help of a heap, we may reduce the searching time to O(lg n)

Heap Sort – Build Heap

1. Create a Heap with the list

List: 2 8 5 7 1 4

The list drawn as a binary tree, level by level:

2

8 5

7 1 4

Heap Sort

2. Pick the maximum, restore the heap property

3. Repeat step 2 until the heap is empty

[Animation: the maximum is removed from the heap built from 2 8 5 7 1 4, one element at a time, until the heap is empty]
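The three steps can be sketched in code as below. This is one common in-place variant, which is an assumption on my part: instead of moving each maximum to a separate output, it swaps the maximum to the end of the array and shrinks the heap. Indices are 0-based, and the names siftDown and heapSort are illustrative.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at `root`,
// considering only the first `size` elements of `a`.
void siftDown(std::vector<int>& a, std::size_t root, std::size_t size) {
    while (true) {
        std::size_t largest = root;
        const std::size_t left = 2 * root + 1;
        const std::size_t right = 2 * root + 2;
        if (left < size && a[left] > a[largest]) largest = left;
        if (right < size && a[right] > a[largest]) largest = right;
        if (largest == root) return;
        std::swap(a[root], a[largest]);
        root = largest;
    }
}

void heapSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    // Step 1: build a max-heap from the list.
    for (std::size_t i = n / 2; i-- > 0;) siftDown(a, i, n);
    // Steps 2 and 3: move the maximum to the end, shrink the heap, restore it.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(a[0], a[end - 1]);
        siftDown(a, 0, end - 1);
    }
}
```

Each removal costs O(lg n) because siftDown follows a single path from the root to a leaf.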

Heap Sort – Analysis

Complexity: O(n lg n)

Not a stable sort

Difficult to implement

Merging

Given two sorted lists, merge them to form a new sorted list

A naïve approach: Append the second list to the first list, then sort them

Slow, takes O(n lg n) time

Is there a better way?

Merging

We make use of a property of sorted lists: The first element is always the minimum

What does that imply?

An additional array is needed to store the temporary merged list

Pick the smallest number from the un-inserted numbers and append it to the merged list

Merging

List A: 1 3 7 9

List B: 2 3 6

Temp: (empty)
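A minimal sketch of the merging idea: since the smallest un-inserted number is always at the front of one of the two lists, two indices are enough. The function name mergeSorted is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Merge two sorted lists into a new sorted list in O(|a| + |b|) time.
std::vector<int> mergeSorted(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> temp;
    temp.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // On ties, take from `a` first so that equal elements keep their order.
        if (a[i] <= b[j]) temp.push_back(a[i++]);
        else temp.push_back(b[j++]);
    }
    // One list is exhausted; copy what remains of the other.
    while (i < a.size()) temp.push_back(a[i++]);
    while (j < b.size()) temp.push_back(b[j++]);
    return temp;
}
```

With the lists 1 3 7 9 and 2 3 6, the merged list is 1 2 3 3 6 7 9.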

Merge Sort

Merge sort follows the divide-and-conquer approach

Divide: Divide the n-element sequence into two (n/2)-element subsequences

Conquer: Sort the two subsequences recursively

Combine: Merge the two sorted subsequences to produce the answer

Merge Sort

1. Divide the list into two

2. Call Merge Sort recursively to sort the two subsequences

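Merge Sort

2 8 5 7 1 4

The left half 2 8 5 is sorted recursively into 2 5 8

Merge Sort

The right half 7 1 4 is sorted recursively into 1 4 7

Merge Sort

3. Merge the list (to temporary array)

2 5 8 and 1 4 7 are merged into 1 2 4 5 7 8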

4. Move the elements back to the list
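Steps 1 to 4 can be sketched as follows, assuming half-open ranges [lo, hi) and a single temporary array shared by all recursive calls. The names mergeSortRange and mergeSort are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Sort a[lo, hi) using `temp` (same size as `a`) as scratch space.
void mergeSortRange(std::vector<int>& a, std::vector<int>& temp,
                    std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;                    // 0 or 1 element: already sorted
    const std::size_t mid = lo + (hi - lo) / 2; // 1. divide the list into two
    mergeSortRange(a, temp, lo, mid);           // 2. sort both halves recursively
    mergeSortRange(a, temp, mid, hi);
    std::size_t i = lo, j = mid, k = lo;        // 3. merge into the temporary array
    while (i < mid && j < hi) temp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) temp[k++] = a[i++];
    while (j < hi) temp[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = temp[k];   // 4. move the elements back
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> temp(a.size());
    mergeSortRange(a, temp, 0, a.size());
}
```

Taking from the left half on ties (the `<=` comparison) is what keeps this version stable.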

Merge Sort – Analysis

Complexity: O(n lg n)

Stable Sort

What is a stable sort?

Not an “In-place” sort

i.e. Additional memory required

Easy to implement, no knowledge of other data structures needed

Stable Sort

What is a stable sort?

The name of a sorting algorithm?

A sorting algorithm that has stable performance over all distributions of elements, i.e. Best ≈ Average ≈ Worst?

A sorting algorithm that preserves the original order of duplicated keys? (This is the correct definition.)

Stable Sort

Original List: 1 3a 5 3b 4 2

Stable Sort: 1 2 3a 3b 4 5

Un-stable Sort: 1 2 3b 3a 4 5

Stable Sort

Which sorting algorithms are stable?

Stable            Un-stable
Bubble Sort       Selection Sort
Merge Sort        Shell Sort
Insertion Sort    Heap Sort

Stable Sort

In our previous example, what is the difference between 3a and 3b?

When will stable sort be more useful?

Sorting records

Multiple keys
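To make the difference between 3a and 3b concrete, here is a small sketch that sorts records by key with std::stable_sort. The Record struct and the function stableSortByKey are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Record {
    int key;
    std::string label;  // distinguishes records that share the same key
};

// Sort by key only; records with equal keys keep their original relative order.
void stableSortByKey(std::vector<Record>& records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& x, const Record& y) { return x.key < y.key; });
}
```

Replacing std::stable_sort with std::sort would still sort the keys correctly, but the standard no longer guarantees that the record labelled a stays ahead of the one labelled b.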

Quick Sort

Quick Sort also uses the Divide-and-Conquer approach

Divide: Divide the list into two by partitioning

Conquer: Sort the two lists by calling Quick Sort recursively

Combine: Combine the two sorted lists

Quick Sort – Partitioning

Given: A list and a “pivot” (usually an element in the list)

Re-arrange the elements so that

Elements on the left-hand side of the “pivot” are less than the pivot, and

Elements on the right-hand side of the “pivot” are greater than or equal to the pivot

[ < Pivot | Pivot | ≥ Pivot ]

Quick Sort – Partitioning

e.g. Take the first element as pivot

Swap all pairs of elements that meet the following criteria:

The left one is greater than or equal to the pivot

The right one is smaller than the pivot

Swap pivot with A[hi]

4 6 7 0 9 3 9 4

The pivot is the first element (4); lo scans from the left asking “≥ pivot?”, hi scans from the right asking “< pivot?”

Quick Sort

After partitioning:

0 3 | 4 (Pivot) | 7 9 6 9 4

Apply Quick Sort on both lists

Quick Sort on 0 3 gives 0 3; Quick Sort on 7 9 6 9 4 gives 4 6 7 9 9
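The partitioning scheme described above (first element as pivot, a left scan for elements ≥ pivot, a right scan for elements < pivot, then a final swap of the pivot) might look like the sketch below. Inclusive int bounds and the names partitionFirst and quickSort are choices made for this example.

```cpp
#include <utility>
#include <vector>

// Partition a[lo..hi] (inclusive) around the first element.
// Returns the final index of the pivot.
int partitionFirst(std::vector<int>& a, int lo, int hi) {
    const int pivot = a[lo];
    int i = lo + 1, j = hi;
    while (true) {
        while (i <= j && a[i] < pivot) ++i;   // stop at a left element >= pivot
        while (i <= j && a[j] >= pivot) --j;  // stop at a right element < pivot
        if (i > j) break;                     // the scans have crossed
        std::swap(a[i], a[j]);
        ++i;
        --j;
    }
    std::swap(a[lo], a[j]);  // a[j] is the last element < pivot (or the pivot itself)
    return j;
}

void quickSort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    const int p = partitionFirst(a, lo, hi);
    quickSort(a, lo, p - 1);
    quickSort(a, p + 1, hi);
}

void quickSort(std::vector<int>& a) {
    quickSort(a, 0, static_cast<int>(a.size()) - 1);
}
```

On the list 4 6 7 0 9 3 9 4 the partition step swaps 6 with 3 and 7 with 0, then moves the pivot into index 2, giving 0 3 4 7 9 6 9 4. No separate combine step is needed because the two parts are sorted in place.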

Quick Sort – Analysis

Complexity

Best: O(n lg n)

Worst: O(n^2)

Average: O(n lg n)

When will the worst case happen?

How to avoid the worst case?

In-Place Sort

Not a stable sort

Counting Sort

Consider the following list of numbers

5, 4, 2, 1, 4, 3, 4, 2, 5, 1, 4, 5, 3, 2, 3, 5, 5

Range of numbers = [1,5]

We may count the occurrence of each number

Number: 1 2 3 4 5

Count:  2 3 3 4 5

Counting Sort (1)

With the frequency table, we can reconstruct the list in ascending order

Number: 1 2 3 4 5

Count:  2 3 3 4 5

1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5
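The frequency-table method can be sketched as below, assuming the caller knows the range [minValue, maxValue] in advance. The function name countingSortSimple is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counting sort for integers known to lie in [minValue, maxValue].
std::vector<int> countingSortSimple(const std::vector<int>& a, int minValue, int maxValue) {
    // count[v - minValue] = number of occurrences of v
    std::vector<std::size_t> count(static_cast<std::size_t>(maxValue - minValue + 1), 0);
    for (int x : a) {
        assert(minValue <= x && x <= maxValue);  // values outside the range are not supported
        ++count[x - minValue];
    }
    // Reconstruct the list in ascending order from the frequency table.
    std::vector<int> result;
    result.reserve(a.size());
    for (int v = minValue; v <= maxValue; ++v) {
        result.insert(result.end(), count[v - minValue], v);
    }
    return result;
}
```

Note that the output is rebuilt from the counts alone, so any data attached to the original elements is lost.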

Counting Sort (1)

Can we sort records with this counting sort?

Is this sort stable?

Counting Sort (2)

An alternative way: use cumulative frequency table and a temporary array

Given the following “records”

3 2 1 2 2 3

Key: 1 2 3

Frequency Table: 1 3 2

Cumulative: 1 4 6

Counting Sort (2)

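Key: 1 2 3

Cumulative: 1 4 6

Temporary array after placing every record: 1 2 2 2 3 3

[Animation: records are placed one at a time, and the cumulative count of a key is decreased each time a record with that key is placed]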
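A sketch of the cumulative-frequency method on records follows. The Item struct, the assumption that keys lie in [1, maxKey], and the right-to-left scan are choices made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Item {
    int key;            // assumed to lie in [1, maxKey]
    std::string label;  // data carried along with the key
};

std::vector<Item> countingSortStable(const std::vector<Item>& a, int maxKey) {
    // After the second loop, cumulative[k] = number of items with key <= k.
    std::vector<std::size_t> cumulative(static_cast<std::size_t>(maxKey) + 1, 0);
    for (const Item& item : a) ++cumulative[item.key];
    for (int k = 1; k <= maxKey; ++k) cumulative[k] += cumulative[k - 1];

    std::vector<Item> temp(a.size());
    // Scan from right to left so that equal keys keep their original order.
    for (std::size_t i = a.size(); i-- > 0;) {
        temp[--cumulative[a[i].key]] = a[i];
    }
    return temp;
}
```

For the keys 3 2 1 2 2 3, the cumulative table starts as 1 4 6 and the records land at 0-based positions 5, 3, 2, 0, 1, 4 in the order they are scanned.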

Counting Sort – Analysis

Complexity: O(n+k), where k is the range of numbers

Not an In-place sort

Stable Sort (Method 2)

Cannot be applied on data with wide ranges

Radix Sort

Counting Sort requires a “frequency table”

The size of frequency table depends on the range of elements

If the range is large (e.g. 32-bit), it may be infeasible, if not impossible, to create such a table

Radix Sort

We may consider an integer as a “record of digits”, where each digit is a key

Significance of keys decreases from left to right

e.g. the number 123 consists of 3 digits

Leftmost digit: 1 (Most significant)

Middle digit: 2

Rightmost digit: 3 (Least significant)

Radix Sort

Now, the problem becomes a multi-key record sorting problem

Sort the records on the least significant key with a stable sort

Repeat with the 2nd least significant key, 3rd least significant key, and so on

Radix Sort

For all keys in these “records”, the range is [0,9]

Narrow range

We apply Counting Sort to do the sorting here

Radix Sort

Original List: 101 97 141 110 997 733

Radix Sort

Sort on the least significant digit

101 097 141 110 997 733

Digit:            0 1 2 3 4 5 6 7 8 9

Cumulative count: 1 3 3 4 4 4 4 6 6 6

Radix Sort

Sort on the 2nd least significant digit

List after the first pass: 110 101 141 733 097 997

Digit:            0 1 2 3 4 5 6 7 8 9

Cumulative count: 1 2 2 3 4 4 4 4 4 6

Radix Sort

Lastly, the most significant digit

List after the second pass: 101 110 733 141 097 997

Digit:            0 1 2 3 4 5 6 7 8 9

Cumulative count: 1 4 4 4 4 4 4 5 5 6

Sorted list: 097 101 110 141 733 997
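The three passes can be sketched as a loop over digits, each pass being a stable Counting Sort on one decimal digit. This version assumes non-negative integers with at most `digits` decimal digits; the function name radixSort is illustrative.

```cpp
#include <cstddef>
#include <vector>

// LSD radix sort: sort on the least significant digit first, then the next, and so on.
void radixSort(std::vector<unsigned>& a, int digits) {
    std::vector<unsigned> temp(a.size());
    unsigned divisor = 1;  // 1, 10, 100, ... selects the current digit
    for (int d = 0; d < digits; ++d) {
        // Cumulative frequency table for the current digit (range [0,9]).
        std::size_t cumulative[10] = {0};
        for (unsigned x : a) ++cumulative[(x / divisor) % 10];
        for (int k = 1; k < 10; ++k) cumulative[k] += cumulative[k - 1];
        // Right-to-left placement keeps the pass stable.
        for (std::size_t i = a.size(); i-- > 0;) {
            temp[--cumulative[(a[i] / divisor) % 10]] = a[i];
        }
        a.swap(temp);
        divisor *= 10;
    }
}
```

Stability of each pass is essential: it is what preserves the order established by the less significant digits.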

Radix Sort – Analysis

Complexity: O(dn), where d is the number of digits

Not an In-place Sort

Stable Sort

Can we run Radix Sort on

Real numbers?

Strings?

Choosing Sorting Algorithms

List Size

Data distribution

Data Type

Availability of Additional Memory

Cost of Swapping/Assignment

Choosing Sorting Algorithms

List Size

If N is small, any sorting algorithm will do

If N is large (e.g. ≥5000), O(n^2) algorithms may not finish their job within the time limit

Data Distribution

If the list is mostly sorted, running QuickSort with “first pivot” is extremely painful

Insertion Sort, on the other hand, is very efficient in this situation

Choosing Sorting Algorithms

Data Type

It is difficult to apply Counting Sort and Radix Sort on real numbers or any other data types that cannot be converted to integers

Availability of Additional Memory

Merge Sort, Counting Sort, Radix Sort require additional memory

Choosing Sorting Algorithms

Cost of Swapping/Assignment

Moving large records may be very time-consuming

Selection Sort takes at most (n-1) swap operations

Swap pointers of records (i.e. swap the records logically rather than physically)
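One way to swap records "logically" is to sort an array of indices instead of the records themselves, as in the sketch below. The BigRecord struct and the function sortedOrder are made up for this example; indices play the role of pointers here.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct BigRecord {
    int key;
    std::string payload;  // stands in for a large amount of data
};

// Return the indices of `records` in ascending key order.
// The records themselves are never moved or copied.
std::vector<std::size_t> sortedOrder(const std::vector<BigRecord>& records) {
    std::vector<std::size_t> order(records.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    std::sort(order.begin(), order.end(),
              [&records](std::size_t x, std::size_t y) {
                  return records[x].key < records[y].key;
              });
    return order;
}
```

Each swap during the sort then moves only a small index, regardless of how large a record is.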