
Advanced Algorithms Piyush Kumar (Lecture 12: Parallel Algorithms) Welcome to COT5405 Courtesy Baker...

Date post: 16-Dec-2015
Page 1: Advanced Algorithms Piyush Kumar (Lecture 12: Parallel Algorithms) Welcome to COT5405 Courtesy Baker 05.

Advanced Algorithms

Piyush Kumar
(Lecture 12: Parallel Algorithms)

Welcome to COT5405 Courtesy Baker 05.

Parallel Models
• An abstract description of a real-world parallel machine.
• Attempts to capture essential features (and suppress details?)
• What other models have we seen so far?

RAM? External Memory Model?

RAM
• Random Access Machine Model
  – Memory is a sequence of bits/words.
  – Each memory access takes O(1) time.
  – Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/not…
  – Instructions cannot be modified.
  – No consideration of memory hierarchies.
  – Has been very successful in modelling real-world machines.

Parallel RAM aka PRAM
• Generalization of RAM
• P processors with their own programs (and unique id)
• MIMD processors: At each point in time the processors might be executing different instructions on different data.
• Shared Memory
• Instructions are synchronized among the processors


PRAM

Shared Memory

EREW/ERCW/CREW/CRCW

EREW (exclusive read, exclusive write): no two processors are allowed to access the same memory location at the same time.

Variants of CRCW
• Common CRCW: CW iff processors write same value.
• Arbitrary CRCW
• Priority CRCW
• Combining CRCW
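The four variants differ only in how simultaneous writes to one cell are resolved. The sketch below is an illustration, not part of the lecture: the function name and the policy strings are hypothetical, and the rules for Arbitrary, Priority and Combining follow the usual textbook conventions (the slide defines only Common). The "smallest id wins" rule for Priority is one common convention and is an assumption here.

```python
import random

def resolve_concurrent_write(writes, policy, combine=sum):
    """Resolve simultaneous writes to ONE shared memory cell.

    writes: list of (processor_id, value) pairs issued in the same step.
    Returns the value the cell holds after the step.
    """
    if not writes:
        raise ValueError("no processor wrote to this cell")
    values = [v for _, v in writes]
    if policy == "common":
        # Legal only if every writer supplies the same value.
        if len(set(values)) != 1:
            raise ValueError("Common CRCW: writers disagree")
        return values[0]
    if policy == "arbitrary":
        # Some writer succeeds; an algorithm must not depend on which one.
        return random.choice(values)
    if policy == "priority":
        # Assumed convention: the smallest processor id wins.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "combining":
        # The cell receives a combination (sum by default) of all values.
        return combine(values)
    raise ValueError(f"unknown policy: {policy}")
```

For example, with writes `[(3, 7), (1, 5), (2, 9)]` the priority rule stores 5 and the combining rule with `sum` stores 21.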

Why PRAM?
• Lot of literature available on algorithms for PRAM.
• One of the most “clean” models.
• Focuses on what communication is needed (and ignores the cost/means to do it)

PRAM Algorithm design.

• Problem 1: Produce the sum of an array of n numbers.
• RAM = ?
• PRAM = ?
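One standard answer is O(n) on a RAM (a single pass) and O(lg n) on a PRAM with a balanced-tree reduction. The sketch below simulates the parallel rounds sequentially; `pram_sum` is a hypothetical name, and the inner loop stands in for processors that would all act in the same time step.

```python
def pram_sum(values):
    """Simulate a tree reduction; returns (total, number_of_parallel_rounds)."""
    a = list(values)
    n = len(a)
    rounds = 0
    stride = 1
    while stride < n:
        # One parallel round: each active processor i adds a[i + stride] into a[i].
        # Different i touch disjoint cells, so no two processors share a location.
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
        rounds += 1
    return (a[0] if a else 0), rounds
```

With 8 inputs the simulation finishes in 3 rounds, matching lg 8.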

Problem 2: Prefix Computation

Let X = {s_0, s_1, …, s_{n-1}} be in a set S.

Let ⊗ be a binary, associative, closed operator with respect to S (usually Θ(1) time: MIN, MAX, AND, +, ...).

The result of s_0 ⊗ s_1 ⊗ … ⊗ s_k is called the k-th prefix.

Computing all such n prefixes is the parallel prefix computation:

s_0                              0th prefix
s_0 ⊗ s_1                        1st prefix
s_0 ⊗ s_1 ⊗ s_2                  2nd prefix
...
s_0 ⊗ s_1 ⊗ ... ⊗ s_{n-1}        (n-1)th prefix

Prefix computation
• Suffix computation is a similar problem.
• Assumes Binary op takes O(1)
• In RAM = ?
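On a RAM a single left-to-right pass suffices, using n-1 applications of the operator, so the time is O(n). A minimal sketch with the standard library follows; the function names are illustrative. The suffix version keeps the operand order, which matters when the operator is associative but not commutative.

```python
from itertools import accumulate
import operator

def prefixes(seq, op=operator.add):
    """All prefixes s_0, s_0 op s_1, ... computed in one O(n) pass."""
    return list(accumulate(seq, op))

def suffixes(seq, op=operator.add):
    """All suffixes s_k op ... op s_{n-1}, preserving left-to-right operand order."""
    rev = accumulate(reversed(seq), lambda acc, x: op(x, acc))
    return list(rev)[::-1]
```

For instance, `prefixes([3, 1, 4, 1, 5])` gives `[3, 4, 8, 9, 14]`, and passing `min` as the operator gives the running minimum.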


Prefix Computation (Akl)

EREW PRAM Prefix computation

• Assume PRAM has n processors and n is a power of 2.
• Input: s_i for i = 0, 1, ..., n-1.
• Algorithm Steps:

for j = 0 to (lg n) - 1, do
    for i = 2^j to n-1 do in parallel
        h = i - 2^j
        s_i = s_h ⊗ s_i
    endfor
endfor

Total time in EREW PRAM?
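The outer loop runs lg n times and each inner step is a constant-time parallel step, so the total is O(lg n) with n processors. The simulation below is a sketch under one assumption: a snapshot of the array models the lock-step behaviour (all processors read before any processor writes). A real EREW machine would stage the two reads of each cell in separate substeps. The name `pram_prefix` is hypothetical.

```python
import operator

def pram_prefix(s, op=operator.add):
    """Simulate the doubling prefix algorithm; returns (prefixes, rounds)."""
    s = list(s)
    n = len(s)
    rounds = 0
    dist = 1                       # dist plays the role of 2^j
    while dist < n:
        old = s[:]                 # snapshot: every read sees the previous round
        for i in range(dist, n):   # conceptually, all i run in parallel
            s[i] = op(old[i - dist], old[i])
        dist *= 2
        rounds += 1
    return s, rounds
```

Running it on eight ones returns the prefixes 1 through 8 after 3 rounds.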

Problem 3: Array packing

• Assume that we have
  – an array of n elements, X = {x_1, x_2, ..., x_n}
  – Some array elements are marked (or distinguished).
• The requirements of this problem are to
  – pack the marked elements in the front part of the array.
  – place the remaining elements in the back of the array.
• While not a requirement, it is also desirable to
  – maintain the original order between the marked elements
  – maintain the original order between the unmarked elements

In RAM?
• How would you do this?
• Inplace?
• Running time?
• Any ideas on how to do this in PRAM?

EREW PRAM Algorithm
1. Set s_i in P_i to 1 if x_i is marked and set s_i = 0 otherwise.
2. Perform a prefix sum on S = (s_1, s_2, ..., s_n) to obtain destination d_i = s_i for each marked x_i.
3. All PEs set m = s_n, the total number of marked elements.
4. P_i sets s_i to 0 if x_i is marked and otherwise sets s_i = 1.
5. Perform a prefix sum on S and set d_i = s_i + m for each unmarked x_i.
6. Each P_i copies array element x_i into address d_i in X.
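The six steps can be traced with a short sequential sketch. It is an illustration only: `pack` is a hypothetical name, the prefix sums use `itertools.accumulate` rather than a parallel routine, and the result goes to a second buffer because a sequential loop cannot mimic the simultaneous read-then-write of step 6 in place.

```python
from itertools import accumulate

def pack(x, marked):
    """Stable array packing: marked elements first, then the unmarked ones."""
    n = len(x)
    s = [1 if mk else 0 for mk in marked]       # step 1
    pre = list(accumulate(s))                   # step 2: prefix sums
    m = pre[-1] if n else 0                     # step 3: number of marked elements
    t = [0 if mk else 1 for mk in marked]       # step 4
    pre_un = list(accumulate(t))                # step 5
    out = [None] * n
    for i in range(n):                          # step 6: conceptually parallel
        d = pre[i] if marked[i] else pre_un[i] + m   # 1-based destination d_i
        out[d - 1] = x[i]
    return out
```

For `x = ['a', 'b', 'c', 'd', 'e', 'f']` with b, d and e marked, the destinations are 4, 1, 5, 2, 3, 6 and the packed array is `['b', 'd', 'e', 'a', 'c', 'f']`.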

Array Packing
• Assume n processors are used above.
• Optimal prefix sums requires O(lg n) time.
• The EREW broadcast of s_n needed in Step 3 takes O(lg n) time using a binary tree in memory.
• All other steps require constant time.
• Runs in O(lg n) time. With n processors the cost is O(n lg n); the algorithm is cost optimal when run on n/lg n processors.
• Maintains original order in unmarked group as well.

Notes:
• Algorithm illustrates usefulness of Prefix Sums.
• There are many applications for the Array Packing algorithm.
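The O(lg n) broadcast bound comes from doubling the number of copies in every round. The sketch below uses the doubling formulation instead of the binary tree mentioned on the slide; this is a substitution chosen for brevity, with the same round count. `erew_broadcast` is a hypothetical name.

```python
def erew_broadcast(value, n):
    """Copy `value` into n cells by doubling; returns (cells, rounds)."""
    cells = [None] * n
    if n == 0:
        return cells, 0
    cells[0] = value
    filled = 1
    rounds = 0
    while filled < n:
        # One round: processor i (i < filled) reads cells[i] and writes
        # cells[i + filled]; no cell is read or written by two processors.
        for i in range(min(filled, n - filled)):
            cells[i + filled] = cells[i]
        filled = min(2 * filled, n)
        rounds += 1
    return cells, rounds
```

Broadcasting to 8 cells takes 3 rounds, since the filled region grows 1, 2, 4, 8.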

Problem 4: PRAM MergeSort

• RAM Merge Sort Recursion?
• PRAM Merge Sort recursion?
• Can we speed up the merging?
  – Merging n elements with n processors can be done in O(log n) time.
  – Assume all elements are distinct
  – Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4
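When A is sorted, one processor can compute a rank in O(log n) time by binary search. A minimal sketch using the standard `bisect` module follows; the function name mirrors the slide's notation.

```python
from bisect import bisect_left

def rank(a, A):
    """Number of elements of the sorted list A that are smaller than a."""
    return bisect_left(A, a)
```

Here `rank(8, [1, 3, 5, 7, 9])` evaluates to 4, as in the slide's example.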

PRAM Merging

A = 2, 3, 10, 15, 16        B = 1, 8, 12, 14, 19

Each element's output position is its rank in the other list plus its position in its own list.

Element of A        2    3   10   15   16
Rank in B           1    1    2    4    4
Own position       +1   +2   +3   +4   +5
Output position     2    3    5    8    9

Element of B        1    8   12   14   19
Rank in A           0    2    3    3    5
Own position       +1   +2   +3   +4   +5
Output position     1    4    6    7   10

Merged: 1 2 3 8 10 12 14 15 16 19
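The example can be reproduced with a short sketch in which each loop iteration stands for one processor: it finds its element's rank in the other list by binary search and writes to position (own index + rank). The name `merge_by_ranking` is hypothetical, indices are 0-based, and all elements are assumed distinct, as on the previous slide.

```python
from bisect import bisect_left

def merge_by_ranking(A, B):
    """Merge two sorted lists of distinct elements by cross-ranking."""
    out = [None] * (len(A) + len(B))
    for i, a in enumerate(A):          # conceptually one processor per element
        out[i + bisect_left(B, a)] = a
    for j, b in enumerate(B):
        out[j + bisect_left(A, b)] = b
    return out
```

Every element writes to a different cell, so the n writes can happen in one parallel step after the O(log n) searches.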

PRAM Merge Sort
• T(n) = T(n/2) + O(log n)
• Using the idea of pipelined d&c, PRAM Mergesort can be done in O(log n).
• D&C is one of the most powerful techniques to solve problems in parallel.
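Without pipelining, the recurrence T(n) = T(n/2) + O(log n) can be unrolled directly. The derivation below assumes n is a power of 2 and writes the O(log n) merge cost as c log n for some constant c:

```latex
\begin{aligned}
T(n) &= T(n/2) + c\log_2 n, \qquad T(1) = O(1) \\
     &= c\log_2 n + c\log_2\frac{n}{2} + c\log_2\frac{n}{4} + \dots + c\log_2 2 + T(1) \\
     &= c\sum_{k=1}^{\log_2 n} k + T(1)
      = c\,\frac{\log_2 n\,(\log_2 n + 1)}{2} + O(1)
      = O(\log^2 n).
\end{aligned}
```

So the plain divide-and-conquer version runs in O(log² n) time, and the pipelined version removes one logarithmic factor.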

Problem 5: Closest Pair
• RAM Version ?

[Figure: points split by a vertical line L; the closest-pair distances in the two halves are 12 and 21; δ = min(12, 21)]

Closest Pair: RAM Version

Closest-Pair(p_1, …, p_n) {
   Compute separation line L such that half the points are on one side and half on the other side.   [O(n log n)]

   δ_1 = Closest-Pair(left half)   [2T(n / 2)]
   δ_2 = Closest-Pair(right half)
   δ = min(δ_1, δ_2)

   Delete all points further than δ from separation line L.   [O(n)]

   Sort remaining points by y-coordinate.   [O(n log n)]

   Scan points in y-order and compare distance between each point and next 11 neighbors. If any of these distances is less than δ, update δ.   [O(n)]

   return δ.
}
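A runnable sketch of the RAM version is given below. It follows the steps of the pseudocode under a few assumptions: points are (x, y) tuples, the input is sorted by x once up front so the separation line is the x-coordinate of the median point, and inputs of at most 3 points are solved by brute force. The function names are illustrative.

```python
import math
from itertools import combinations

def closest_pair(points):
    """Return the smallest distance between any two of the given (x, y) points."""
    return _closest(sorted(points))          # sort once by x (ties broken by y)

def _closest(pts):
    n = len(pts)
    if n <= 3:                               # brute force on tiny inputs
        return min((math.dist(p, q) for p, q in combinations(pts, 2)),
                   default=math.inf)
    mid = n // 2
    x_line = pts[mid][0]                     # separation line L: x = x_line
    delta = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # Keep only points closer than delta to L, then sort them by y.
    strip = sorted((p for p in pts if abs(p[0] - x_line) < delta),
                   key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 12]:        # next 11 neighbors in y-order
            delta = min(delta, math.dist(p, q))
    return delta
```

Because the strip is re-sorted at every level, this sketch matches the slide's O(n log n) work per level rather than the refined variant that carries a y-sorted list through the recursion.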

Closest Pair: PRAM Version?

Closest-Pair(p_1, …, p_n) {
   Compute separation line L such that half the points are on one side and half on the other side.   [O(1)]

   δ_1 = Closest-Pair(left half)   [T(n / 2)]
   δ_2 = Closest-Pair(right half)
   δ = min(δ_1, δ_2)

   Delete all points further than δ from separation line L.   [O(log n)]

   Sort remaining points by y-coordinate.   [O(1)]

   Scan points in y-order and compare distance between each point and next 11 neighbors. Find min of all these distances, update δ.   [O(log n)]

   return δ.
}

Notes: the two recursive calls run in parallel; use sorted lists; use presorting and prefix computation; again use prefix computation.

Recurrence: T(n) = T(n/2) + O(log n)

Problem 6: Planar Convex hulls

MergeHull(P)
• HL = MergeHull(Left of median)
• HR = MergeHull(Right of median)
• Return JoinHulls(HL, HR)

Time complexity in RAM?
Time complexity in PRAM?


Join_Hulls

Towards a better Planar Convex hull

• Let Q = {q_1, q_2, ..., q_n} be a set of points in the Euclidean plane (i.e., E^2-space).
• The convex hull of Q is denoted by CH(Q) and is the smallest convex polygon containing Q.
  – It is specified by listing its corner points (which are from Q) in order (e.g., clockwise order).
• Usual Computational Geometry Assumptions:
  – No three points lie on the same straight line.
  – No two points have the same x or y coordinate.
  – There are at least 4 points, as CH(Q) = Q for n ≤ 3.

PRAM CONVEX HULL(n, Q, CH(Q))

1. Sort the points of Q by x-coordinate.
2. Partition Q into k = √n subsets Q_1, Q_2, ..., Q_k of k points each such that a vertical line can separate Q_i from Q_j.
  – Also, if i < j, then Q_i is left of Q_j.
3. For i = 1 to k, compute the convex hulls of Q_i in parallel, as follows:
  – if |Q_i| ≤ 3, then CH(Q_i) = Q_i
  – else (using k = √n PEs) call PRAM CONVEX HULL(k, Q_i, CH(Q_i))
4. Merge the convex hulls in {CH(Q_1), CH(Q_2), ..., CH(Q_k)} together.


Basic Idea

Last Step
• The upper hull is found first. Then, the lower hull is found using the same method.
  – Only finding the upper hull is described here.
  – Upper & lower convex hull points are merged into an ordered set.
• Each CH(Q_i) has √n PEs assigned to it.
• The PEs assigned to CH(Q_i) (in parallel) compute the upper tangent from CH(Q_i) to another CH(Q_j).
  – A total of √n - 1 tangents are computed for each CH(Q_i).
  – Details for computing the upper tangents will be discussed separately.


Last Step

• Among the tangent lines from CH(Q_i) to polygons to the left of CH(Q_i), let L_i be the one with the smallest slope.
• Among the tangent lines from CH(Q_i) to polygons to the right, let R_i be the one with the largest slope.
• If the angle between L_i and R_i is less than 180 degrees, no point of CH(Q_i) is in CH(Q).
  – See Figure 5.13 on next slide (from Akl’s Online text).
  – Otherwise, all points in CH(Q_i) between where L_i touches CH(Q_i) and where R_i touches CH(Q_i) are in CH(Q).
• Array Packing is used to combine all convex hull points of CH(Q) after they are identified.


Complexity
• Step 1: The sort takes O(lg n) time.
• Step 2: Partition of Q into subsets takes O(1) time.
• Step 3: The recursive calculations of CH(Q_i) for 1 ≤ i ≤ √n in parallel takes t(√n) time (using √n PEs for each Q_i).
• Step 4: The big steps here require O(lg n) and are
  – Finding the upper tangent from CH(Q_i) to CH(Q_j) for each i, j pair.
  – Array packing used to form the ordered sequence of upper convex hull points for Q.
• Above steps find the upper convex hull. The lower convex hull is found similarly.
  – Upper & lower hulls merged in O(1) time to ordered set

Complexity
• Cost for Step 3: Solving the recurrence relation
      t(n) = t(√n) + lg n
  yields
      t(n) = O(lg n)
• Running time for PRAM Convex Hull is O(lg n) since this is maximum cost for each step.
• Then the cost for PRAM Convex Hull is C(n) = O(n lg n).
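The O(lg n) bound can be checked by unrolling the recurrence. The derivation below reads it as t(n) = t(√n) + c lg n, since each recursive call handles √n points; c is an assumed constant for the O(lg n) merge step, and the recursion stops at constant-size subproblems. The cost line assumes n processors in total.

```latex
\begin{aligned}
t(n) &= t(n^{1/2}) + c\lg n \\
     &= c\lg n + c\lg n^{1/2} + c\lg n^{1/4} + \dots + O(1) \\
     &= c\lg n\left(1 + \tfrac12 + \tfrac14 + \dots\right) + O(1)
      \le 2c\lg n + O(1) = O(\lg n), \\
C(n) &= p(n)\cdot t(n) = n \cdot O(\lg n) = O(n\lg n).
\end{aligned}
```

The geometric series is what keeps the total at O(lg n): each level of recursion costs half as much as the level above it.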

