Sorting Algorithms CS 524 – High-Performance Computing.

Posted on 20-Dec-2015

CS 524 (Au 2004/05)- Asim Karim @ LUMS 2

Sorting

Sorting is the task of arranging an unordered collection (sequence) of elements into monotonically increasing (or decreasing) order.

Sorting transforms an unordered sequence $S = \langle a_1, a_2, a_3, \ldots, a_n \rangle$ into a sequence $S' = \langle a'_1, a'_2, a'_3, \ldots, a'_n \rangle$ such that $a'_i \le a'_j$ for $1 \le i \le j \le n$ and $S'$ is a permutation of $S$.
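The definition above can be checked mechanically. The following Python sketch is illustrative only and not part of the original slides; the function name `is_sorted_permutation` is hypothetical. Checking adjacent pairs is enough for the ordering condition because $\le$ is transitive, and comparing element counts (with multiplicity) checks that $S'$ is a permutation of $S$.

```python
from collections import Counter

def is_sorted_permutation(s, s_prime):
    """Return True if s_prime is a nondecreasing permutation of s."""
    # Ordering: a'_k <= a'_{k+1} for every adjacent pair, which implies
    # a'_i <= a'_j for all i <= j by transitivity.
    ordered = all(s_prime[k] <= s_prime[k + 1] for k in range(len(s_prime) - 1))
    # Permutation: S' holds exactly the same elements as S, with multiplicity.
    same_elements = Counter(s) == Counter(s_prime)
    return ordered and same_elements

print(is_sorted_permutation([3, 1, 2], [1, 2, 3]))  # True
print(is_sorted_permutation([3, 1, 2], [1, 2, 2]))  # False: not a permutation
```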

Sorting algorithms can be categorized as internal ($S$ fits in main memory) or external ($S$ does not fit in main memory). We study internal algorithms only.

Sorting algorithms can also be categorized as comparison-based or non-comparison-based.

Data Storage on Parallel Computers

Storage of input and output sequences:

Where? On one processor, or distributed among the processors?

How? What is the order of the data distribution with respect to the order of the processors?

Compare-Exchange on Parallel Computers

One element per processor: $a_i$ on $P_i$ and $a_j$ on $P_j$

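Compare-exchange between two processors $P_i$ and $P_j$ requires one communication step and one comparison: each processor sends its element to the other, then (for $i < j$) $P_i$ keeps $\min(a_i, a_j)$ and $P_j$ keeps $\max(a_i, a_j)$.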

A parallel system with as many processors as there are elements would deliver poor performance. Why?
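The compare-exchange step can be simulated in Python as below. This is a sketch under stated assumptions: there is no real message passing, the two processors are modeled by one function, and the name `compare_exchange` is hypothetical. It shows what each side computes after the exchange, for $i < j$.

```python
def compare_exchange(a_i, a_j):
    """Simulated compare-exchange between P_i and P_j, assuming i < j.

    Communication: P_i receives a_j and P_j receives a_i.
    Comparison: P_i keeps the smaller value, P_j keeps the larger one.
    """
    new_a_i = min(a_i, a_j)  # computed on P_i after receiving a_j
    new_a_j = max(a_i, a_j)  # computed on P_j after receiving a_i
    return new_a_i, new_a_j

print(compare_exchange(7, 3))  # (3, 7)
```

Note that each call does a single comparison per processor but, on real hardware, a full message exchange.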

Compare-Split on Parallel Computers (1)

Compare-Split on Parallel Computers (2)

Each processor has $n/p$ elements of the sequence. Initially, processor $P_i$ holds block $A_i$.

After sorting, the blocks are ordered such that $A'_i \le A'_j$ for $i < j$ (every element of $A'_i$ is at most every element of $A'_j$), and $\bigcup_i A_i = \bigcup_i A'_i$.

Compare-split between $P_i$ and $P_j$ ($i < j$), with each block already sorted locally:

1. Each processor sends its block to the other.

2. Each processor merges the two blocks.

3. Each processor splits the merged elements and retains the appropriate half: $P_i$ keeps the smaller half and $P_j$ the larger half.
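The three compare-split steps can be sketched in Python as follows. This is an illustrative simulation, not code from the slides: `compare_split` is a hypothetical name, both blocks are assumed to be sorted, and the lower-indexed processor is assumed to keep as many elements as it started with. The standard-library `heapq.merge` does the linear-time merge of two sorted sequences.

```python
import heapq

def compare_split(block_i, block_j):
    """Simulated compare-split between P_i and P_j, assuming i < j.

    Both blocks are sorted. After exchanging blocks, each processor merges
    them; P_i keeps the len(block_i) smallest elements, P_j keeps the rest.
    """
    k = len(block_i)
    merged = list(heapq.merge(block_i, block_j))  # linear-time merge
    return merged[:k], merged[k:]

print(compare_split([1, 6, 8, 11], [2, 9, 12, 13]))
# ([1, 2, 6, 8], [9, 11, 12, 13])
```

In a real message-passing program both processors would perform the same merge on the exchanged data, and each would discard the half it does not keep.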

Sorting Network (1)

A sorting network is a specialized interconnection network that can perform many comparisons simultaneously, thus improving sorting performance significantly.

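The key component of a sorting network is the comparator, which takes two inputs $x$ and $y$. An increasing comparator outputs $\min(x, y)$ then $\max(x, y)$; a decreasing comparator outputs $\max(x, y)$ then $\min(x, y)$.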
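The two comparator types, and a tiny network built from them, can be sketched in Python. The 4-input network below (comparators on wires (0,1) and (2,3), then (0,2) and (1,3), then (1,2)) is a standard small sorting network chosen for illustration; it is not taken from the slides, and the names `NETWORK_4` and `run_network` are hypothetical. Comparators within one stage touch disjoint wires, which is what lets a hardware network fire them simultaneously.

```python
def increasing_comparator(x, y):
    """Outputs (min, max)."""
    return min(x, y), max(x, y)

def decreasing_comparator(x, y):
    """Outputs (max, min)."""
    return max(x, y), min(x, y)

# Each stage lists comparators on disjoint wires, so all comparators in a
# stage could operate at the same time; the network has depth 3.
NETWORK_4 = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(1, 2)],
]

def run_network(values, network=NETWORK_4):
    """Pass values through the network using increasing comparators."""
    wires = list(values)
    for stage in network:
        for lo, hi in stage:
            wires[lo], wires[hi] = increasing_comparator(wires[lo], wires[hi])
    return wires

print(run_network([4, 3, 2, 1]))  # [1, 2, 3, 4]
```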

Sorting Network (2)

Bubble Sort

Complexity: $O(n^2)$. Bubble sort is difficult to parallelize. Why?
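A sequential bubble sort is sketched below in Python for reference; the function name `bubble_sort` is illustrative. Note how each compare-exchange in the inner loop reads an element that the previous compare-exchange may have just written, so the steps of a pass form a chain.

```python
def bubble_sort(a):
    """Sequential bubble sort: O(n^2) compare-exchanges in the worst case."""
    a = list(a)
    n = len(a)
    for i in range(n - 1, 0, -1):
        # One pass: compare-exchange (0,1), (1,2), ..., (i-1,i) in order.
        # Each step may depend on the element moved by the step before it.
        for j in range(i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```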

Odd-Even Transposition Sort (1)

Odd-Even Transposition Sort (2)

Parallel Implementation: p = n

Data partitioning: each processor $P_i$ holds one element $a_i$.

Computation and communication: during each phase, the odd- or even-numbered processors perform a compare-exchange with their right neighbors. The sort runs for $n$ phases, alternating odd and even.

Performance: on a linear array; on a crossbar; on a bus

Not cost-optimal.
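A sequential simulation of the $p = n$ case is sketched below in Python; the function name is hypothetical, and list position `i` stands in for processor $P_i$ (0-based here, whereas the slides number processors from 1). Within one phase all compare-exchanges act on disjoint pairs, so on a parallel machine they would all happen at once.

```python
def odd_even_transposition_sort(a):
    """Simulate odd-even transposition sort with one element per processor."""
    a = list(a)
    n = len(a)
    for phase in range(1, n + 1):  # n phases suffice
        # Odd phases pair (0,1), (2,3), ...; even phases pair (1,2), (3,4), ...
        # (0-based indices). Pairs in a phase are disjoint.
        start = 0 if phase % 2 == 1 else 1
        for i in range(start, n - 1, 2):
            # Compare-exchange between neighbors i and i + 1.
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([4, 3, 2, 1]))  # [1, 2, 3, 4]
```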

Parallel Implementation: p < n

Data partitioning: each processor $P_i$ holds $n/p$ elements in block $A_i$.

Computation and communication: sort $A_i$ locally (using merge sort or quicksort). Then execute $p$ phases ($p/2$ odd and $p/2$ even), each performing compare-split operations with the neighboring processor to the right.

Performance: on a linear array; on a crossbar; on a bus

Cost-optimal on a linear array and a crossbar when $p = O(\log n)$; not cost-optimal on a bus.
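The block version ($p < n$) can be simulated the same way, replacing compare-exchange with compare-split. This Python sketch assumes $p$ divides $n$ and uses Python's built-in `sorted` as a stand-in for the local merge sort or quicksort; the names `compare_split` and `block_odd_even_sort` are hypothetical.

```python
import heapq

def compare_split(block_lo, block_hi):
    """Merge two sorted blocks; the lower processor keeps the smaller half."""
    k = len(block_lo)
    merged = list(heapq.merge(block_lo, block_hi))
    return merged[:k], merged[k:]

def block_odd_even_sort(a, p):
    """Simulate odd-even transposition sort on p processors holding blocks."""
    n = len(a)
    assert p > 0 and n % p == 0, "this sketch assumes p divides n"
    size = n // p
    # Data partitioning: P_i gets block A_i and sorts it locally.
    blocks = [sorted(a[i * size:(i + 1) * size]) for i in range(p)]
    # p phases, alternating odd and even, of compare-splits between neighbors.
    for phase in range(1, p + 1):
        start = 0 if phase % 2 == 1 else 1
        for i in range(start, p - 1, 2):
            blocks[i], blocks[i + 1] = compare_split(blocks[i], blocks[i + 1])
    return [x for block in blocks for x in block]

print(block_odd_even_sort([9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 11, 10], 4))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```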

Shellsort (1)

Odd-even transposition sort moves elements only one position per phase. If a sequence has only a few out-of-order elements but they are far from their correct positions, odd-even transposition sort still takes a long time to sort it.

Shellsort can move elements over longer distances. It has two phases: in the first phase, blocks that are far apart are compare-split; in the second phase, odd-even transposition sort is performed, continuing only as long as blocks keep changing positions.

Shellsort (2)

Shellsort (3)

Initially, each processor sorts its block of elements locally.

First phase:

1. Compare-split $P_i$ with $P_{p-i-1}$ for each $i < p/2$ (reverse-order compare-split).

2. Partition the processors into two groups: the first $p/2$ processors and the next $p/2$ processors. Perform the reverse-order compare-split within each group.

3. Go to step 1, applied recursively within each group, halving the groups each time, for a total of $\log p$ rounds.

Second phase: perform odd-even transposition sort on the blocks until no blocks change.
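The two phases can be simulated in Python as sketched below. This is an illustration under stated assumptions, not the course's code: $p$ is a power of two that divides $n$, the built-in `sorted` stands in for the local sort, and `compare_split` and `parallel_shellsort` are hypothetical names. In each first-phase round, processor `start + k` of a group is paired with `start + group - 1 - k`, which is the reverse-order pairing of step 1 applied to that group.

```python
import heapq

def compare_split(block_lo, block_hi):
    """Merge two sorted blocks; the lower processor keeps the smaller half."""
    k = len(block_lo)
    merged = list(heapq.merge(block_lo, block_hi))
    return merged[:k], merged[k:]

def parallel_shellsort(a, p):
    """Simulate the two-phase parallel shellsort on p processors."""
    n = len(a)
    assert p > 0 and p & (p - 1) == 0 and n % p == 0, "assumes p = 2^k dividing n"
    size = n // p
    blocks = [sorted(a[i * size:(i + 1) * size]) for i in range(p)]
    # First phase: log p rounds of reverse-order compare-splits within
    # groups of size p, p/2, ..., 2.
    group = p
    while group > 1:
        for start in range(0, p, group):
            for k in range(group // 2):
                lo, hi = start + k, start + group - 1 - k
                blocks[lo], blocks[hi] = compare_split(blocks[lo], blocks[hi])
        group //= 2
    # Second phase: odd-even transposition on blocks until an odd phase
    # followed by an even phase changes nothing.
    changed = True
    while changed:
        changed = False
        for start in (0, 1):
            for i in range(start, p - 1, 2):
                new_lo, new_hi = compare_split(blocks[i], blocks[i + 1])
                if new_lo != blocks[i]:
                    changed = True
                blocks[i], blocks[i + 1] = new_lo, new_hi
    return [x for block in blocks for x in block]

print(parallel_shellsort([9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 11, 10], 4))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Stopping when a full odd-plus-even pass changes nothing is one way to read "until no changes occur": at that point every pair of neighboring blocks is already in order.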

Shellsort (4)

Performance: on a linear array; on a crossbar; on a bus

Quicksort (1)

Quicksort (2)

A recursive divide-and-conquer algorithm with an average complexity of $O(n \log n)$.
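A minimal, not-in-place quicksort in Python illustrates the divide-and-conquer structure. The middle-element pivot is an illustrative choice, not one prescribed by the slides. Grouping elements equal to the pivot separately keeps the recursion finite when the input has duplicates.

```python
def quicksort(a):
    """Recursive divide-and-conquer quicksort (not in place, for clarity)."""
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]  # illustrative pivot choice: middle element
    # Divide: one O(n) pass per partition list.
    smaller = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    larger = [x for x in a if x > pivot]
    # Conquer recursively, then combine by concatenation.
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 6, 1, 8, 1, 9, 2]))  # [1, 1, 2, 3, 6, 8, 9]
```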

Quicksort (3)

Partitioning a sequence of length $n$ has a complexity of $O(n)$.

The choice of pivot significantly affects the overall complexity of quicksort. In the worst case, where a sequence of length $n$ is partitioned into subsequences of lengths $1$ and $n-1$, the overall complexity becomes $O(n^2)$.

On average, the complexity is $O(n \log n)$.
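The worst case can be observed by counting comparisons. The Python sketch below uses the Lomuto partition scheme with the last element as pivot; both are illustrative choices, and the function names are hypothetical. On already-sorted input every partition of a length-$m$ range makes $m-1$ comparisons and leaves subranges of lengths $m-1$ and $0$ (the pivot itself is excluded). The total is therefore $\sum_{m=2}^{n}(m-1) = n(n-1)/2$. An explicit stack replaces recursion so that the deep worst-case recursion does not hit Python's recursion limit.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around pivot a[hi].

    Returns the pivot's final index and the number of comparisons made.
    """
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i, hi - lo

def quicksort_comparisons(a):
    """Sort a in place and return the total number of element comparisons."""
    total = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        q, comparisons = partition(a, lo, hi)
        total += comparisons
        stack.append((lo, q - 1))
        stack.append((q + 1, hi))
    return total

already_sorted = list(range(100))
print(quicksort_comparisons(already_sorted))  # 4950 = 100 * 99 / 2
```

On a shuffled input of the same size the count is much lower, in line with the $O(n \log n)$ average.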

Parallelizing Quicksort

A naïve formulation: start with one process that does the initial partitioning. Then assign one of the two subproblems (recursive calls) to another process. Repeat for each subsequence until no further partitioning is possible.

Not cost-optimal (Why?)

Analysis
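One standard way to see why the naïve formulation is not cost-optimal is sketched below, under stated assumptions: $p = n$ processes, ideal pivots that always split a subsequence in half, and partitioning a subsequence of length $m$ costing $cm$ time (the constant $c$ is introduced here for the derivation). Even in this best case, the first partition runs on a single process, and each later level partitions subsequences of length $n/2^k$ in parallel.

```latex
% Parallel time: the partitions along one root-to-leaf path are sequential.
T_P(n) \le c\left(n + \frac{n}{2} + \frac{n}{4} + \cdots + 1\right) < 2cn,
\qquad T_P(n) \ge cn
\;\Longrightarrow\; T_P(n) = \Theta(n)

% Cost versus the best sequential running time:
p \, T_P(n) = n \cdot \Theta(n) = \Theta(n^2)
\quad\text{vs.}\quad T_S(n) = \Theta(n \log n)
```

Because $\Theta(n^2)$ grows faster than $\Theta(n \log n)$, the process-time product exceeds the sequential work, so the formulation is not cost-optimal.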

Message-Passing Parallel Formulation

Data partitioning: each processor $P_i$ holds a block $A_i$ of $n/p$ elements.

Computation and communication:

1. Select a pivot and broadcast it to all processors.

2. Each processor locally rearranges its block $A_i$ into sub-blocks $S_i$ (elements smaller than or equal to the pivot) and $L_i$ (elements larger than the pivot).

3. Combine the $S_i$ and $L_i$ from all processors into $S$ and $L$.

4. Assign $S$ to one group of processors and $L$ to the other.

5. Recursively repeat these operations until a sub-block is assigned to a single processor. Then each processor sorts its sub-block locally.
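The message-passing formulation can be simulated sequentially in Python as sketched below. Several details are assumptions made for this illustration and are not specified by the slides: the pivot is a random element of the first non-empty block, processors are divided between $S$ and $L$ roughly in proportion to their sizes, data is spread evenly within each group, and ties are handled specially so the recursion always makes progress. The names `split_evenly` and `parallel_quicksort` are hypothetical, and list concatenation stands in for the actual communication.

```python
import random

def split_evenly(items, k):
    """Distribute a list over k processors as evenly as possible."""
    q, r = divmod(len(items), k)
    out, pos = [], 0
    for i in range(k):
        size = q + (1 if i < r else 0)
        out.append(items[pos:pos + size])
        pos += size
    return out

def parallel_quicksort(blocks, rng):
    """Simulate message-passing quicksort on one group of processors.

    blocks[i] is the block held by the i-th processor of the group.
    Returns the group's blocks after sorting, in processor order.
    """
    p = len(blocks)
    if p == 1:
        return [sorted(blocks[0])]  # single processor: sort locally
    data = [x for b in blocks for x in b]
    if not data:
        return [[] for _ in range(p)]
    # Assumed pivot rule: the first processor holding data picks a random
    # element of its block; the pivot is then "broadcast" to the group.
    pivot = rng.choice(next(b for b in blocks if b))
    # Each processor splits A_i into S_i (<= pivot) and L_i (> pivot);
    # concatenating over processors stands in for combining into S and L.
    S = [x for b in blocks for x in b if x <= pivot]
    L = [x for b in blocks for x in b if x > pivot]
    if not L:  # pivot was the maximum: move its copies into L instead
        S = [x for x in data if x < pivot]
        L = [x for x in data if x == pivot]
    if not S:  # every element equals the pivot: already in order
        return split_evenly(data, p)
    # Give S and L processor groups roughly proportional to their sizes,
    # keeping at least one processor in each group.
    k = min(max(round(p * len(S) / len(data)), 1), p - 1)
    return (parallel_quicksort(split_evenly(S, k), rng)
            + parallel_quicksort(split_evenly(L, p - k), rng))

blocks = [[7, 2, 9], [4, 4, 1], [8, 3, 6], [5, 0, 2]]
result = parallel_quicksort(blocks, random.Random(0))
print([x for b in result for x in b])  # [0, 1, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9]
```

The final blocks may have unequal sizes, which reflects the load imbalance a poorly chosen pivot can cause in this formulation.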
