CS 2412 Data Structures
Chapter 10
Sorting and Searching
Some concepts
• Sorting is one of the most common data-processing applications.
• Sorting algorithms are classified as either internal or external.
• Sorting order can be either ascending or descending.
• Sort stability is an attribute of a sort, indicating that data with
equal keys maintain their relative input order in the output.
• Sort efficiency is usually measured by the number of comparisons
and moves required for the sorting. The best possible
comparison-based sorting algorithms are O(n log n).
• During the sorting process, each traversal of the data is
referred to as a sort pass.
Data Structure 2016 R. Wei 2
Selection sorts
• Heap sort: already discussed. First build a heap; then repeatedly
remove the root, move the last element to the root, and reheap down.
• Straight selection sort: in each pass, the smallest element is
selected from the unsorted sublist and exchanged with the element
at the beginning of the unsorted sublist.
Algorithm selectionSort (list, last)
   set current to 0
   loop (until last element sorted)
      set smallest to current
      set walker to current + 1
      loop (walker <= last)
         if (walker key < smallest key)
            set smallest to walker
         end if
         increment walker
      end loop
      exchange (current, smallest)
      increment current
   end loop
The efficiency of selection sort
• Straight selection sort: O(n2). The algorithm has two levels of
loops, and each loop executes about n times.
• Heap sort: O(n log n). Building the heap takes about n log n
loops, and sorting from the heap takes another n log n loops. In
big-O notation, the complexity is O(n log n).
Insertion sorts
• Straight insertion sort: the list is divided into sorted and
unsorted sublists. In each pass, the first element of the unsorted
sublist is inserted into the sorted sublist at the correct position.
• Shell sort: the list is divided into K segments and each
segment is sorted (the segments are dispersed through the
list). After each pass, the number of segments is reduced
according to an increment. When the number of segments is
reduced to 1, the list is sorted.
Algorithm insertionSort (list, last)
   set current to 1
   loop (until last element sorted)
      move current element to hold
      set walker to current - 1
      loop (walker >= 0 AND hold key < walker key)
         move walker element right one element
         decrement walker
      end loop
      move hold to walker + 1 element
      increment current
   end loop
The main idea of the Shell sort is to divide the list into segments
and use insertion sort to sort each segment.
The positions of the elements of a segment are at a distance of the
increment. In the following example, the list is of size 10. The 5
segments for increment K = 5 are as follows:
Segment 1. A[0], A[5]
Segment 2. A[1], A[6]
Segment 3. A[2], A[7]
Segment 4. A[3], A[8]
Segment 5. A[4], A[9]
Then for increment K = 2
Segment 1. A[0], A[2], A[4], A[6], A[8]
Segment 2. A[1], A[3], A[5], A[7], A[9]
Algorithm shellSort (list, last)
   set incre to last / 2
   loop (incre not 0)
      set current to incre
      loop (until last element sorted)
         move current element to hold
         set walker to current - incre
         loop (walker >= 0 AND hold key < walker key)
            move walker element one increment right
            set walker to walker - incre
         end loop
         move hold to walker + incre element
         increment current
      end loop
      set incre to incre / 2
   end loop
void shellSort (int list [], int last)
{
int hold;
int incre;
int walker;
incre = last / 2;
while (incre != 0)
{
for (int curr = incre; curr <= last; curr++)
{
hold = list [curr];
walker = curr - incre;
while (walker >= 0 && hold < list [walker])
{
list [walker + incre] = list [walker];
walker = ( walker - incre );
} // while
list [walker + incre] = hold;
} // for walk
incre = incre / 2;
} // while
return;
} // shellSort
Note
In the above algorithm, the increment starts at n/2, and each pass
halves it. This is not the most efficient scheme, but it is simple.
Ideally, the increments should be chosen so that no two elements
appear in the same segment more than once, but this is not easy to
arrange in general.
Insertion sort efficiency:
• Straight insertion sort: O(n2). The algorithm has two
nested loops. The number of executions is about n(n + 1)/2.
• Shell sort: the complexity is difficult to analyze. Empirical
studies show that the average sort complexity is about O(n^1.25).
Exchange sorts
• Bubble sort: the list is divided into two sublists: sorted and
unsorted. In each pass, the smallest element is bubbled from the
unsorted sublist to the end of the sorted sublist.
• Quick sort: in each pass a pivot is selected. The elements
less than the pivot and the elements greater than or equal to the
pivot are separated into two sublists. The pivot is placed at its
ultimately correct location in the list.
Example (∥ marks the boundary between the sorted and unsorted sublists):
23 78 45 8 56 32
8 ∥23 78 45 32 56
8 23 ∥32 78 45 56
8 23 32 ∥45 78 56
8 23 32 45 ∥56 78
Algorithm bubbleSort (list, last)
   set current to 0
   set sorted to false
   loop (current <= last AND sorted false)
      set walker to last
      set sorted to true
      loop (walker > current)
         if (walker data < walker - 1 data)
            set sorted to false
            exchange (list, walker, walker - 1)
         end if
         decrement walker
      end loop
      increment current
   end loop
Note for quick sort
• There are different methods for selecting the pivot.
– Select the first element.
– Select the middle element.
– Select the median value of three elements: left, right and
the element in the middle of the list. This text uses this
method.
• When the partition becomes small, a straight insertion sort can
be used, which may be more efficient.
Example for one pass of a quick sort:
Algorithm medianLeft (sortData, left, right)
   set mid to (left + right) / 2
   if (left key > mid key)
      exchange (sortData, left, mid)
   end if
   if (left key > right key)
      exchange (sortData, left, right)
   end if
   if (mid key > right key)
      exchange (sortData, mid, right)
   end if
   exchange (sortData, left, mid)   // put pivot in left
The list in Figure 12-15 is sorted as follows:
The exchange sort efficiency:
• Bubble sort: O(n2). There are two loops in the algorithm; the
number of comparisons is about n(n + 1)/2.
• Quick sort: O(n log n). The algorithm has several loops, but in
each pass the partitions are generally half the size of those in
the previous pass. Roughly speaking, there are about log2 n passes
in total.
void bubbleSort (int list [], int last)
{
   int temp;
   int sorted = 0;   // declared once, so both loops share it
   for (int current = 0;
        current <= last && !sorted;
        current++)
   {
      sorted = 1;
      for (int walker = last;
           walker > current;
           walker--)
         if (list[ walker ] < list[ walker - 1 ])
         {
            sorted = 0;
            temp = list[walker];
            list[walker] = list[walker - 1];
            list[walker - 1] = temp;
         } // if
   } // for current
   return;
} // bubbleSort
External sorts
In external sorting, portions of the data may be stored in secondary
memory during the sorting process.
One important method for external sorting is to merge the (sorted)
files into one sorted file.
Merge sorts
A simple merge combines two sorted files into one file. For example,
suppose we have two sorted lists:
• 1, 3, 5, 7, 9
• 2, 4, 6, 8, 10
After merging these two lists, we obtain the following list:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
The following algorithm merges two sorted files, file1 and file2.
The combined data are written into file3.
Algorithm mergeFiles
open files
read (file1 into record1)
read (file2 into record2)
loop (not end file1 or not end file2)
if (record1.key <= record2.key)
write (record1 to file3)
read (file1 into record1)
if (end of file1)
set record1.key to infinity
end if
else
write (record2 to file3)
read (file2 into record2)
if (end of file2)
set record2.key to infinity
end if
end if
end loop
close files
end mergeFiles
Merge unsorted files:
• Form merge runs from the files. Each run is ordered.
• The end of each run is identified by a stepdown.
• Merge the corresponding runs of the two files.
• When one run hits a stepdown, the other run is rolled out (copied
to the merged file).
The sorting process:
• Sort phase: divide the file into merge runs according to the size
of memory. For example, if we have 2300 records but the memory can
only handle 500 records, we first read in 500 records and sort them
as the first merge run, then read and sort records 501-1000 as the
first run of merge file 2, and so on.
• Merge phase: merge the sorted runs.
There are different merge concepts. We discuss three of them as
examples:
• Natural merge: after a merge, all data are written to one file,
so a distribution phase is needed to redistribute the data to two
files.
• Balanced merge: uses a constant number of input merge files and
the same number of output merge files.
• Polyphase merge: a constant number of input merge files are
merged into one output merge file; an input merge file is
immediately reused once its input has been completely merged.
Searching
• Binary search: for sorted lists.
• Sequential search:
– Straight sequential search: each step checks whether the key
equals the target AND whether it is the last key.
– Sentinel sequential search: add the target at the end of the
list so that each step only checks whether the key equals the
target.
– Probability search: when a target is found, move the
element containing the target up one location. In this way, the
most frequently sought targets become easier to find.
Hashed list searches
• Hashing is a method that uses key-to-address mapping to find
data quickly.
• The basic idea is to use a hash function to map a key (from a
large range) to an index (in a small range) of the data.
• Some keys may be mapped to the same index (synonyms). We then
need some method to resolve the collision.
• The main part of hashing is finding good hashing methods.
Hashing methods:
• Direct method: the range of the keys and the range of the
indexes are the same.
• Subtraction method: subtract a fixed number from the key. Also
requires that both ranges be the same.
• Modulo-division method: index = key modulo listSize.
• Digit-extraction method: select digits at certain positions as
the index.
• Midsquare method: the key is squared and the middle digits are
used as the index.
• Folding method: fold shift (the key is divided into parts whose
size matches the size of the index; the left and right parts are
shifted and added to the middle part); fold boundary (the left and
right numbers are folded on a fixed boundary between them and the
center number; the two outside values are reversed).
• Rotation method: rotate the last character to the front of the
key. Usually used in combination with other methods.
• Pseudorandom method: the key is used as the seed of a
pseudorandom number generator, and the resulting random number is
then scaled into the possible index range.
Some concepts used in collision resolution methods:
• Load factor: the number of filled elements in the list (k)
divided by the number of elements allocated for the list (n),
expressed as a percentage (ideally less than 75):
α = (k / n) × 100
• Clustering: as data are added to a list and collisions are
resolved, some hashing algorithms tend to cause data to group
within the list.
Open addressing to resolve collisions (disadvantage: each collision
resolution increases the probability of future collisions).
• Linear probe: when data cannot be stored at the home address,
we resolve the collision by adding 1 to the current address.
• Quadratic probe: the increment is the collision probe number
squared.
• Pseudorandom collision resolution (double hashing): use a
pseudorandom number to resolve the collision, using the collision
address as the seed of the pseudorandom generator.
• Key offset (double hashing): calculate the new address as a
function of the old address and the key.
For example:
offSet = key / listSize
address = (offSet + old address) modulo listSize
Linked list collision resolution: use a separate area to store
collisions and chain all synonyms together in a linked list
(usually in LIFO sequence). Two storage areas are used: the prime
area and the overflow area.
Bucket hashing: keys are hashed to buckets, nodes that accommodate
multiple data occurrences. (Disadvantages: buckets use more empty
space, and when a bucket is full a collision still occurs.)
Combination approaches may be used: for example, bucket hashing
first, then a linear probe if the bucket is full.