Page 1: •  Embarrassingly Parallel Computations •  Partitioning and Divide-and-Conquer Strategies

• Embarrassingly Parallel Computations

• Partitioning and Divide-and-Conquer Strategies

• Pipelined Computations

• Synchronous Computations

• Asynchronous Computations

• Strategies that achieve load balancing

Parallel Techniques

Cluster Computing, UNC-Charlotte, B. Wilkinson, 2007.

Page 2:

Embarrassingly Parallel Computations

Page 3:

Embarrassingly Parallel Computations

A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).

No communication, or very little communication, between processes. Each process can do its tasks without any interaction with other processes.

Page 4:

Practical embarrassingly parallel computation with static process creation and master-slave approach

Page 5:

Embarrassingly Parallel Computation Examples

• Low-level image processing

• Mandelbrot set

• Monte Carlo calculations

Page 6:

Low-level image processing

Many low-level image processing operations involve only local data, with very limited communication, if any, between areas of interest.

Page 7:

Image coordinate system

[Figure: image coordinate system, showing the origin (0, 0), the x and y axes, and a point (x, y).]

Page 8:

Some geometrical operations: Shifting

Object shifted by Δx in the x-dimension and Δy in the y-dimension:

x′ = x + Δx
y′ = y + Δy

where x and y are the original coordinates and x′ and y′ are the new coordinates.

Page 9:

Some geometrical operations: Scaling

Object scaled by factor Sx in the x-direction and Sy in the y-direction:

x′ = x × Sx
y′ = y × Sy

Page 10:

Rotation

Object rotated through an angle θ about the origin of the coordinate system:

x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ

Page 11:

Parallelizing Image Operations

Partitioning into regions for individual processes.

Example

Square region for each process (can also use strips).

Page 12:

Mandelbrot Set

Set of points in a complex plane that are quasi-stable (will increase and decrease, but not exceed some limit) when computed by iterating the function:

z_{k+1} = z_k^2 + c

where z_{k+1} is the (k + 1)th iteration of the complex number z = a + bi, and c is a complex number giving the position of the point in the complex plane. The initial value for z is zero.

Page 13:

Mandelbrot Set continued

Iterations continue until:

• the magnitude of z is greater than 2, or

• the number of iterations reaches an arbitrary limit.

Magnitude of z is the length of the vector given by:

|z| = sqrt(a^2 + b^2)

Page 14:

Sequential routine computing the value of one point, returning the number of iterations

structure complex {
    float real;
    float imag;
};

int cal_pixel(complex c)
{
    int count, max;
    complex z;
    float temp, lengthsq;
    max = 256;
    z.real = 0; z.imag = 0;
    count = 0;                 /* number of iterations */
    do {
        temp = z.real * z.real - z.imag * z.imag + c.real;
        z.imag = 2 * z.real * z.imag + c.imag;
        z.real = temp;
        lengthsq = z.real * z.real + z.imag * z.imag;
        count++;
    } while ((lengthsq < 4.0) && (count < max));
    return count;
}

Page 15:

Mandelbrot Set

The display height is disp_height, the display width is disp_width, and the point in the display area is (x, y). If this window is to display the complex plane with minimum values of (real_min, imag_min) and maximum values of (real_max, imag_max), each (x, y) needs to be scaled by the factors:

scale_real = (real_max - real_min) / disp_width;
scale_imag = (imag_max - imag_min) / disp_height;

for (x = 0; x < disp_width; x++)
    for (y = 0; y < disp_height; y++) {
        c.real = real_min + ((float) x * scale_real);
        c.imag = imag_min + ((float) y * scale_imag);
        color = cal_pixel(c);
        display(x, y, color);
    }

Page 16:

Mandelbrot set

http://www.cs.siu.edu/~mengxia/Courses%20PPT/520/mandelbrot.c

cc -I/usr/X11R6/include -o mandelbrot mandelbrot.c -L/usr/X11R6/lib64 -lX11 -lm

Page 17:

Parallelizing Mandelbrot Set Computation

Static Task Assignment

Simply divide the region into a fixed number of parts, each computed by a separate processor. Each processor needs to execute the procedure cal_pixel() after being given the coordinates of the pixels to compute. Suppose the display area is 480 × 640 and each process computes 60 rows (a grouping of 60 × 640 pixels), with 8 processors in our Athena cluster. Not very successful, because different regions require different numbers of iterations, and hence different amounts of time.

Page 18:

Parallel pseudocode

Master

for (i = 0, row = 0; i < 8; i++, row = row + 60)  /* for each process */
    send(&row, Pi);                               /* send row number */
for (i = 0; i < (480 * 640); i++) {               /* from processes, any order */
    recv(&c, &color, PANY);
    display(c, color);
}

Slave (process i)

recv(&row, Pmaster);                 /* receive row no. */
for (x = 0; x < disp_width; x++)     /* screen coordinates x and y */
    for (y = row; y < (row + 60); y++) {
        c.real = real_min + ((float) x * scale_real);
        c.imag = imag_min + ((float) y * scale_imag);
        color = cal_pixel(c);
        send(&c, &color, Pmaster);   /* send coords, color to master */
    }

Page 19:

Analysis

• Serial time: ts <= max_iter x n //for n pixels

• tcomm1 = (p-1)(tstartup + tdata) // for the first row number

• tcomp <= max_iter x n / (p-1) //divide the image into groups of n/(p-1) pixels, each requiring at most max_iter steps

• tcomm2 = n (tstartup + 3 x tdata) // each pixel has three integers, two for coordinates and one for color

• Speedup factor approaches p if max_iter is large.

Page 20:

Dynamic Task Assignment

Have processors request new regions after computing previous regions.

Page 21:

• Each slave is first given one row to compute and, from then on, gets another row when it returns a result, until there are no more rows to compute.

• The master sends a terminator message when all the rows have been taken. To differentiate between different messages, tags are used with the message data_tag for rows being sent to the slaves, terminator_tag for the terminator message, and result_tag for the computed results from the slaves.

Page 22:

Master

count = 0;                           /* counter for termination */
row = 0;                             /* row being sent */
for (k = 0; k < num_proc; k++) {
    send(&row, Pk, data_tag);        /* send initial row to process */
    count++;                         /* count rows sent */
    row++;                           /* next row */
}
do {
    recv(&slave, &r, color, PANY, result_tag);
    count--;                         /* reduce count as rows received */
    if (row < disp_height) {
        send(&row, Pslave, data_tag);         /* send next row */
        row++;                                /* next row */
        count++;
    } else
        send(&row, Pslave, terminator_tag);   /* terminate */
    display(r, color);               /* display row */
} while (count > 0);

Page 23:

Slave

recv(&y, Pmaster, ANYTAG, source_tag);       /* receive 1st row to compute */
while (source_tag == data_tag) {
    c.imag = imag_min + ((float) y * scale_imag);
    for (x = 0; x < disp_width; x++) {       /* compute row colors */
        c.real = real_min + ((float) x * scale_real);
        color[x] = cal_pixel(c);
    }
    send(&y, color, Pmaster, result_tag);
    recv(&y, Pmaster, source_tag);
}

Page 24:

Monte Carlo Methods

Another embarrassingly parallel computation.

Monte Carlo methods make use of random selections.

Page 25:

Circle formed within a 2 × 2 square. Ratio of area of circle to area of square given by:

area of circle / area of square = π(1)^2 / (2 × 2) = π/4

Points within square chosen randomly. Score kept of how many points happen to lie within circle.

Fraction of points within circle will be π/4, given a sufficient number of randomly selected samples.


Page 27:

PI Calculation

• The value of PI can be calculated in a number of ways. Consider the following method of approximating PI:
  – Inscribe a circle in a square
  – Randomly generate points in the square
  – Determine the number of points in the square that are also in the circle
  – Let r be the number of points in the circle divided by the number of points in the square
  – Note that the more points generated, the better the approximation

• Serial pseudocode for this procedure:

npoints = 10000
circle_count = 0
do j = 1, npoints
    generate 2 random numbers between 0 and 1
    xcoordinate = random1; ycoordinate = random2
    if (xcoordinate, ycoordinate) inside circle then
        circle_count = circle_count + 1
end do
PI = 4.0 * circle_count / npoints

• Note that most of the time in running this program would be spent executing the loop.

Page 28:

• Parallel strategy for the task of approximating PI: each task executes its portion of the loop a number of times.

• Each task can do its work without requiring any information from the other tasks (there are no data dependencies).

• Uses the SPMD model. One task acts as master and collects the results.

• Pseudocode solution:

npoints = 10000; circle_count = 0
p = number of tasks; num = npoints / p
find out if I am MASTER or WORKER
do j = 1 to num
    generate 2 random numbers between 0 and 1
    xcoordinate = random1; ycoordinate = random2
    if (xcoordinate, ycoordinate) inside circle then
        circle_count = circle_count + 1
end do
if I am MASTER
    receive from WORKERS their circle_counts
    compute PI (use MASTER and WORKER calculations)
else if I am WORKER
    send to MASTER circle_count
endif

Page 29:

Computing an Integral

One quadrant of the circle can be described by the integral:

∫₀¹ √(1 − x²) dx = π/4

Random pairs of numbers (xr, yr) generated, each between 0 and 1.

Page 30:

Alternative (Better) Method

Use random values of x to compute f(x) and sum values of f(x):

area = lim (N → ∞) (1/N) Σ f(xr) × (x2 − x1)

where xr are randomly generated values of x between x1 and x2.

Monte Carlo method very useful if the function cannot be integrated numerically (maybe having a large number of variables).

Page 31:

Example: Computing the Integral

∫ (x² − 3x) dx from x1 to x2

Sequential Code

sum = 0;
for (i = 0; i < N; i++) {              /* N random samples */
    xr = rand_v(x1, x2);               /* generate next random value */
    sum = sum + xr * xr - 3 * xr;      /* compute f(xr) */
}
area = (sum / N) * (x2 - x1);

Routine rand_v(x1, x2) returns a pseudorandom number between x1 and x2.

Page 32:

For parallelizing Monte Carlo code, we must address the best way to generate random numbers in parallel. Each computation should use a different random number, and there should be no correlation between the numbers.

Page 33:

Pseudorandom-number generator

• For successful Monte Carlo simulations, the random numbers must be independent of each other. The most popular way of creating a pseudorandom-number sequence is by evaluating x_{i+1} from a carefully chosen function of x_i. The key is to find a function that will create a large sequence with the correct statistical properties. The linear congruential generator is of the form:

x_{i+1} = (a × x_i + c) mod m

where a, c, and m are constants.

Among the many choices of constants, a "good" generator uses a = 16807, m = 2^31 − 1 (a prime number), and c = 0. This gives a repeating sequence of 2^31 − 2 different numbers. But it is slow.

The numbers are repeatable and deterministic. Being repeatable is good for experimental testing.

Page 34:

Pseudorandom-number generator

• Approaches:

– Rely on a centralized linear congruential pseudorandom-number generator to send out numbers to slave processes when they request them. Slow, and a bottleneck at the master node.

– Alternatively, each slave can use a separate generator with a different formula. If the same formula is used, correlation might occur.

Problems:

– Even if a random-number generator appears to create a series of random numbers according to statistical tests, we cannot assume that different sub-sequences or samplings of numbers from the sequence are uncorrelated. Generators repeat their sequences at some point.

Page 35:

Pseudorandom-number Generator

• A solid parallel pseudorandom-number generator called SPRNG (Scalable Parallel Random Number Generators) is a library specifically for parallel Monte Carlo computations; it generates random-number streams for parallel processes.

• It has several different generators and features to minimize interprocess correlation. An interface for MPI is also provided.

Link to SPRNG packages:
• http://www.nersc.gov/nusers/resources/software/libs/math/random/www/index.html

• Please use SPRNG to run your Monte Carlo method to compute pi as homework.

