Page 1: Parallel Programming in C with MPI and OpenMP

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Parallel Programming in C with MPI and OpenMP

Michael J. Quinn

Page 2: Parallel Programming in C with MPI and OpenMP

Chapter 12

Solving Linear Systems

Page 3: Parallel Programming in C with MPI and OpenMP

Outline

- Terminology
- Back substitution
- Gaussian elimination
- Jacobi method
- Conjugate gradient method

Page 4: Parallel Programming in C with MPI and OpenMP

Terminology

- System of linear equations: solve Ax = b for x
- Special matrices
  - Symmetrically banded
  - Upper triangular
  - Lower triangular
  - Diagonally dominant
  - Symmetric

Page 5: Parallel Programming in C with MPI and OpenMP


Symmetrically Banded

4 2 -1 0 0 0

3 -4 5 6 0 0

1 6 3 2 4 0

0 2 -2 0 9 2

0 0 7 3 8 7

0 0 0 4 0 2

Semibandwidth 2

Page 6: Parallel Programming in C with MPI and OpenMP


Upper Triangular

4 2 -1 5 9 2

0 -4 5 6 0 -4

0 0 3 2 4 6

0 0 0 0 9 2

0 0 0 0 8 7

0 0 0 0 0 2

Page 7: Parallel Programming in C with MPI and OpenMP


Lower Triangular

4 0 0 0 0 0

0 0 0 0 0 0

5 4 3 0 0 0

2 6 2 3 0 0

8 -2 0 1 8 0

-3 5 7 9 5 2

Page 8: Parallel Programming in C with MPI and OpenMP


Diagonally Dominant

19 0 2 2 0 6

0 -15 2 0 -3 0

5 4 22 -1 0 4

2 3 2 13 0 -5

5 -2 0 1 16 0

-3 5 5 3 5 -32

Page 9: Parallel Programming in C with MPI and OpenMP


Symmetric

3 0 5 2 0 6

0 7 4 3 -3 5

5 4 0 -1 0 4

2 3 -1 9 0 -5

0 -3 0 0 5 5

6 5 4 -5 5 -3

Page 10: Parallel Programming in C with MPI and OpenMP

Back Substitution

- Used to solve an upper triangular system Tx = b for x
- Methodology: one element of x can be computed immediately
- Use this value to simplify the system, revealing another element that can be computed immediately
- Repeat

Page 11: Parallel Programming in C with MPI and OpenMP

Back Substitution

1x0 + 1x1 – 1x2 + 4x3 = 8
     – 2x1 – 3x2 + 1x3 = 5
             2x2 – 3x3 = 0
                    2x3 = 4

Page 12: Parallel Programming in C with MPI and OpenMP

Back Substitution

1x0 + 1x1 – 1x2 + 4x3 = 8
     – 2x1 – 3x2 + 1x3 = 5
             2x2 – 3x3 = 0
                    2x3 = 4   ⇒   x3 = 2

Page 13: Parallel Programming in C with MPI and OpenMP

Back Substitution

1x0 + 1x1 – 1x2 = 0
     – 2x1 – 3x2 = 3
             2x2 = 6
             2x3 = 4

Page 14: Parallel Programming in C with MPI and OpenMP

Back Substitution

1x0 + 1x1 – 1x2 = 0
     – 2x1 – 3x2 = 3
             2x2 = 6   ⇒   x2 = 3
             2x3 = 4

Page 15: Parallel Programming in C with MPI and OpenMP

Back Substitution

1x0 + 1x1 = 3
    – 2x1 = 12
      2x2 = 6
      2x3 = 4

Page 16: Parallel Programming in C with MPI and OpenMP

Back Substitution

1x0 + 1x1 = 3
    – 2x1 = 12   ⇒   x1 = –6
      2x2 = 6
      2x3 = 4

Page 17: Parallel Programming in C with MPI and OpenMP

Back Substitution

  1x0 = 9
– 2x1 = 12
  2x2 = 6
  2x3 = 4

Page 18: Parallel Programming in C with MPI and OpenMP

Back Substitution

  1x0 = 9   ⇒   x0 = 9
– 2x1 = 12
  2x2 = 6
  2x3 = 4

Page 19: Parallel Programming in C with MPI and OpenMP

Pseudocode

for i ← n – 1 down to 0 do
   x[i] ← b[i] / a[i, i]
   for j ← 0 to i – 1 do
      b[j] ← b[j] – x[i] × a[j, i]
   endfor
endfor

Time complexity: Θ(n²)

Page 20: Parallel Programming in C with MPI and OpenMP


Data Dependence Diagram

We cannot execute the outer loop in parallel. We can execute the inner loop in parallel.

Page 21: Parallel Programming in C with MPI and OpenMP

Row-oriented Algorithm

- Associate a primitive task with each row of A and the corresponding elements of x and b
- During iteration i, the task associated with row j computes the new value of bj
- Task i must compute xi and broadcast its value
- Agglomerate using a rowwise interleaved striped decomposition

Page 22: Parallel Programming in C with MPI and OpenMP

Interleaved Decompositions

Rowwise interleaved striped decomposition

Columnwise interleaved striped decomposition

Page 23: Parallel Programming in C with MPI and OpenMP

Complexity Analysis

- Each process performs about n / (2p) iterations of loop j in all
- A total of n – 1 iterations in all
- Computational complexity: Θ(n²/p)
- One broadcast per iteration
- Communication complexity: Θ(n log p)

Page 24: Parallel Programming in C with MPI and OpenMP

Column-oriented Algorithm

- Associate one primitive task per column of A and the associated element of x
- The last task starts with vector b
- During iteration i, task i computes xi, updates b, and sends b to task i – 1
- In other words, there is no computational concurrency
- Agglomerate tasks in an interleaved fashion

Page 25: Parallel Programming in C with MPI and OpenMP

Complexity Analysis

- Since b is always updated by a single process, computational complexity is the same as the sequential algorithm: Θ(n²)
- Since elements of b are passed from one process to another each iteration, communication complexity is Θ(n²)

Page 26: Parallel Programming in C with MPI and OpenMP

Comparison

[Figure: the (n, p) plane divided by a crossover curve. The column-oriented algorithm is superior where message-passing time dominates (smaller n, larger p); the row-oriented algorithm is superior where computation time dominates (larger n, smaller p).]

Page 27: Parallel Programming in C with MPI and OpenMP

Gaussian Elimination

- Used to solve Ax = b when A is dense
- Reduces Ax = b to an upper triangular system Tx = c
- Back substitution can then solve Tx = c for x

Page 28: Parallel Programming in C with MPI and OpenMP


Gaussian Elimination

4x0 +6x1 +2x2 – 2x3 = 8

2x0 +5x2 – 2x3 = 4

–4x0 – 3x1 – 5x2 +4x3 = 1

8x0 +18x1 – 2x2 +3x3 = 40

Page 29: Parallel Programming in C with MPI and OpenMP

Gaussian Elimination

4x0 + 6x1 + 2x2 – 2x3 = 8
    – 3x1 + 4x2 – 1x3 = 0
      3x1 – 3x2 + 2x3 = 9
      6x1 – 6x2 + 7x3 = 24

Page 30: Parallel Programming in C with MPI and OpenMP

Gaussian Elimination

4x0 + 6x1 + 2x2 – 2x3 = 8
    – 3x1 + 4x2 – 1x3 = 0
            1x2 + 1x3 = 9
            2x2 + 5x3 = 24

Page 31: Parallel Programming in C with MPI and OpenMP

Gaussian Elimination

4x0 + 6x1 + 2x2 – 2x3 = 8
    – 3x1 + 4x2 – 1x3 = 0
            1x2 + 1x3 = 9
                  3x3 = 6

Page 32: Parallel Programming in C with MPI and OpenMP

Iteration of Gaussian Elimination

[Figure: the state of the matrix during iteration i. Row i is the pivot row; elements below the diagonal in earlier columns have already been driven to 0; elements in rows above the pivot row will not be changed; elements below and to the right of the pivot will be changed.]

Page 33: Parallel Programming in C with MPI and OpenMP

Numerical Stability Issues

- If the pivot element is close to zero, significant roundoff errors can result
- Gaussian elimination with partial pivoting eliminates this problem
- In step i we search rows i through n – 1 for the row whose column i element has the largest absolute value
- Swap (pivot) this row with row i

Page 34: Parallel Programming in C with MPI and OpenMP


Implementing Partial Pivoting

Without partial pivoting With partial pivoting

Page 35: Parallel Programming in C with MPI and OpenMP

Row-oriented Parallel Algorithm

- Associate a primitive task with each row of A and the corresponding elements of x and b
- A kind of reduction is needed to find the identity of the pivot row
- Tournament: we want to determine the identity of the row with the largest value, rather than the largest value itself
- This could be done with two all-reductions
- MPI provides a simpler, faster mechanism

Page 36: Parallel Programming in C with MPI and OpenMP

MPI_MAXLOC, MPI_MINLOC

- MPI provides the reduction operators MPI_MAXLOC and MPI_MINLOC
- They operate on datatypes representing a (value, index) pair

Page 37: Parallel Programming in C with MPI and OpenMP

MPI (value, index) Datatypes

MPI_Datatype           Meaning
MPI_2INT               Two ints
MPI_DOUBLE_INT         A double followed by an int
MPI_FLOAT_INT          A float followed by an int
MPI_LONG_INT           A long followed by an int
MPI_LONG_DOUBLE_INT    A long double followed by an int
MPI_SHORT_INT          A short followed by an int

Page 38: Parallel Programming in C with MPI and OpenMP

Example Use of MPI_MAXLOC

struct {
    double value;
    int    index;
} local, global;
...
local.value = fabs(a[j][i]);
local.index = j;
...
MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT,
              MPI_MAXLOC, MPI_COMM_WORLD);

Page 39: Parallel Programming in C with MPI and OpenMP

Second Communication per Iteration

[Figure: once the pivot row "picked" is identified, each task needs the pivot-row elements a[picked][i] and a[picked][k] to update its own elements a[j][i] and a[j][k], so the pivot row must be communicated.]

Page 40: Parallel Programming in C with MPI and OpenMP

Communication Complexity

- Complexity of the tournament: Θ(log p)
- Complexity of broadcasting the pivot row: Θ(n log p)
- A total of n – 1 iterations
- Overall communication complexity: Θ(n² log p)

Page 41: Parallel Programming in C with MPI and OpenMP

Isoefficiency Analysis

- Communication overhead: Θ(n² p log p)
- The sequential algorithm has time complexity Θ(n³)
- Isoefficiency relation: n³ ≥ C n² p log p, so n ≥ C p log p
- Scalability function, with M(n) = n²:
  M(C p log p) / p = C² p² log² p / p = C² p log² p
- This system has poor scalability

Page 42: Parallel Programming in C with MPI and OpenMP

Column-oriented Algorithm

- Associate a primitive task with each column of A and another primitive task for b
- During iteration i, the task controlling column i determines the pivot row and broadcasts its identity
- During iteration i, the task controlling column i must also broadcast column i to the other tasks
- Agglomerate tasks in an interleaved fashion to balance workloads
- Isoefficiency is the same as for the row-oriented algorithm

Page 43: Parallel Programming in C with MPI and OpenMP

Comparison of Two Algorithms

- Both algorithms evenly divide the workload
- Both algorithms do a broadcast each iteration
- Difference: identification of the pivot row
  - The row-oriented algorithm does the search in parallel but requires an all-reduce step
  - The column-oriented algorithm does the search sequentially but requires no communication
- Row-oriented is superior when n is relatively large and p relatively small

Page 44: Parallel Programming in C with MPI and OpenMP

Problems with These Algorithms

- They break parallel execution into computation and communication phases
- Processes perform no computations during the broadcast steps
- Time spent doing broadcasts is large enough to ensure poor scalability

Page 45: Parallel Programming in C with MPI and OpenMP

Pipelined, Row-Oriented Algorithm

- We want to overlap communication time with computation time
- We could do this if we knew in advance the row used to reduce all the other rows
- Let's pivot columns instead of rows!
- In iteration i we can use row i to reduce the other rows

Page 46: Parallel Programming in C with MPI and OpenMP

Communication Pattern

[Figure sequence (pages 46–54): four processes, 0 through 3, connected in a ring. Row 0 is passed around the ring, and each process reduces its rows using row 0 as soon as the row arrives, so communication of row 0 overlaps with computation. When the process holding row 1 finishes reducing with row 0, it starts row 1 around the ring, and the pattern repeats.]

Page 55: Parallel Programming in C with MPI and OpenMP

Analysis (1/2)

- Total computation time: Θ(n³/p)
- Total message transmission time: Θ(n²)
- When n is large enough, message transmission time is completely overlapped by computation time
- Message start-up time is not overlapped: Θ(n)
- Parallel overhead: Θ(np)

Page 56: Parallel Programming in C with MPI and OpenMP

Analysis (2/2)

- Isoefficiency relation: n³ ≥ C n p, so n ≥ √(Cp)
- Scalability function, with M(n) = n²:
  M(√(Cp)) / p = Cp / p = C
- The parallel system is perfectly scalable

Page 57: Parallel Programming in C with MPI and OpenMP

Sparse Systems

- Gaussian elimination is not well suited to sparse systems
- The coefficient matrix gradually fills with nonzero elements
- Result
  - Increases storage requirements
  - Increases total operation count

Page 58: Parallel Programming in C with MPI and OpenMP


Example of “Fill”

Page 59: Parallel Programming in C with MPI and OpenMP

Iterative Methods

- Iterative method: an algorithm that generates a series of approximations to the solution's value
- Iterative methods require less storage than direct methods
- Since they avoid computations on zero elements, they can save a lot of computation

Page 60: Parallel Programming in C with MPI and OpenMP

Jacobi Method

x_i^(k+1) = (1 / a_ii) ( b_i – Σ_{j≠i} a_ij x_j^(k) )

Values of the elements of vector x at iteration k + 1 depend on the values of vector x at iteration k.

Gauss-Seidel method: use the latest available value of each x_j.

Page 61: Parallel Programming in C with MPI and OpenMP

Jacobi Method Iterations

[Figure: the successive iterates x⁰, x¹, x², x³, x⁴ plotted in the plane, converging toward the solution.]

Page 62: Parallel Programming in C with MPI and OpenMP

Rate of Convergence

- Even when the Jacobi and Gauss-Seidel methods converge on a solution, the rate of convergence is often too slow to make them practical
- We will move on to an iterative method with much faster convergence

Page 63: Parallel Programming in C with MPI and OpenMP

Conjugate Gradient Method

- A is positive definite if for every nonzero vector x and its transpose xᵀ, the product xᵀAx > 0
- If A is symmetric and positive definite, then the function

  q(x) = ½ xᵀA x – bᵀx + c

  has a unique minimizer that is the solution to Ax = b
- Conjugate gradient is an iterative method that solves Ax = b by minimizing q(x)

Page 64: Parallel Programming in C with MPI and OpenMP

Conjugate Gradient Convergence

[Figure: the iterates x⁰, x¹, x² plotted in the plane, converging to the solution.]

Finds the value of the n-dimensional solution in at most n iterations.

Page 65: Parallel Programming in C with MPI and OpenMP

Conjugate Gradient Computations

- Matrix-vector multiplication
- Inner product (dot product)
- Matrix-vector multiplication has the higher time complexity
- Must modify the previously developed algorithm to account for sparse matrices

Page 66: Parallel Programming in C with MPI and OpenMP


Rowwise Block Striped Decomposition of a Symmetrically Banded Matrix

Matrix

Decomposition

Page 67: Parallel Programming in C with MPI and OpenMP

Representation of Vectors

- Replicate vectors
  - Need an all-gather step after each matrix-vector multiply
  - Inner product has time complexity Θ(n)
- Block decomposition of vectors
  - Need an all-gather step before each matrix-vector multiply
  - Inner product has time complexity Θ(n/p + log p)

Page 68: Parallel Programming in C with MPI and OpenMP

Comparison of Vector Decompositions

[Figure: the (n, p) plane divided into a region where replicated vectors are superior and a region where block decomposition is superior.]

Page 69: Parallel Programming in C with MPI and OpenMP

Summary (1/2)

- Solving systems of linear equations
  - Direct methods
  - Iterative methods
- Parallel designs for
  - Back substitution
  - Gaussian elimination
  - Conjugate gradient method

Page 70: Parallel Programming in C with MPI and OpenMP

Summary (2/2)

- The superiority of one algorithm over another depends on the size of the problem, the number of processors, and the characteristics of the parallel computer
- Overlapping communication with computation can be key to scalability

