CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries
Page 1: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

CS 591x – Cluster Computing and Programming Parallel Computers

Parallel Libraries

Page 2: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Parallel Libraries

Recall that so far we have been:
- breaking up (decomposing) our "large" problems into smaller pieces
- distributing the pieces of the problem to multiple processors
- explicitly moving data among processes through message passing
(a minimal sketch of this pattern follows below)
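A minimal sketch (not from the slides) of the "by hand" style the course has used so far: decompose a sum, let each process work on its own piece, and move the data explicitly with an MPI call. The problem and size are made up for illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int p, my_rank;
    double local_sum = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* each process works on its own piece of the decomposed problem ... */
    for (int i = my_rank; i < 1000; i += p)
        local_sum += (double)i;

    /* ... and the data movement is explicit message passing */
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0) printf("total = %f\n", total);
    MPI_Finalize();
    return 0;
}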

Page 3: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Parallel Libraries

Note that:
- large scientific and engineering problems often represent data in matrices and vectors
- large scientific and engineering problems make heavy use of linear algebra: linear systems and non-linear systems

Page 4: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Parallel Libraries

MPI is designed to support the development of libraries.
Consequently, there are a number of libraries, based on MPI, used to develop parallel software.
Some libraries take care of much, or all, of the parallelization.
That means…

Page 5: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Parallel Libraries

… you don't have to…
… but you still can…
… if you want… sometimes…

Page 6: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Parallel Libraries

ScaLAPACK – Scalable Linear Algebra PACKage

PETSc – Portable, Extensible Toolkit for Scientific Computation

Page 7: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK

Built on LAPACK – the Linear Algebra PACKage
- powerful
- widely used in scientific and engineering computing
- not scalable to distributed-memory parallel computers

LAPACK is in turn built on BLAS – the Basic Linear Algebra Subprograms library
(a serial LAPACK call is sketched below, for contrast with the parallel routines later)
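A minimal serial sketch (not from the slides) of what LAPACK itself looks like from C: sgesv solves Ax = b on a single process, and ScaLAPACK's psgesv, used later in these slides, is the distributed analogue. It assumes the usual Fortran name mangling (trailing underscore) and column-major storage.

#include <stdio.h>

/* Fortran LAPACK routine: note the trailing underscore and pass-by-address */
extern void sgesv_(int *n, int *nrhs, float *a, int *lda,
                   int *ipiv, float *b, int *ldb, int *info);

int main(void) {
    /* 2x2 system in column-major order: A = [3 1; 1 2], b = (9, 8) */
    float A[4] = {3.0f, 1.0f, 1.0f, 2.0f};
    float b[2] = {9.0f, 8.0f};
    int n = 2, nrhs = 1, lda = 2, ldb = 2, ipiv[2], info;

    sgesv_(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);
    if (info == 0) printf("x = (%f, %f)\n", b[0], b[1]);   /* expect (2, 3) */
    return 0;
}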

Page 8: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK

uses PBLAS – the Parallel BLAS
- performs matrix and vector operations in parallel; each process applies BLAS to its local pieces

uses BLACS – the Basic Linear Algebra Communication Subprograms library
- handles the interprocess communication for ScaLAPACK
- runs over MPI (other implementations also exist)

Page 9: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK

Maps matrices and vectors to a process grid called a BLACS grid
- similar to an MPI Cartesian topology
- matrices and vectors are decomposed into rectangular blocks and block-cyclically distributed over the BLACS grid
(the sketch below shows which process row owns a given global row under this distribution)
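A small sketch (not from the slides) of the block-cyclic idea: with 0-based indexing and row block size mb, global row i lands on process row (i / mb) mod nproc_rows, and the column mapping is analogous. The function and variable names are illustrative.

#include <stdio.h>

/* which process-grid row owns global row i? (0-based indices assumed) */
int owner_row(int i, int mb, int nproc_rows) {
    return (i / mb) % nproc_rows;
}

int main(void) {
    int mb = 2, nproc_rows = 3;
    for (int i = 0; i < 12; i++)
        printf("global row %2d -> process row %d\n", i, owner_row(i, mb, nproc_rows));
    return 0;
}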

Page 10: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK – sample, based on Pacheco pp. 345-350

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

/* read the matrix order and the process-grid / block-size parameters */
Get_input(p, my_rank, &n, &nproc_rows, &nproc_cols,
          &row_block_size, &col_block_size);
m = n;

Cblacs_get(0, 0, &blacs_grid);   /* get a BLACS context for the grid */
/* "R": the process grid will use row-major order */
Cblacs_gridinit(&blacs_grid, "R", nproc_rows, nproc_cols);
Cblacs_pcoord(blacs_grid, my_rank, &my_proc_row, &my_proc_col);

Page 11: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK – sample, cont.

/* local dimensions of this process's piece of the block-cyclic distribution */
local_mat_rows = get_dim(m, row_block_size, my_proc_row, nproc_rows);
local_mat_cols = get_dim(n, col_block_size, my_proc_col, nproc_cols);
Allocate(my_rank, "A", &A_local, local_mat_rows*local_mat_cols, 1);

b_local_size = get_dim(m, row_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "b", &b_local, b_local_size, 1);

exact_local_size = get_dim(m, col_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "Exact", &exact_local, exact_local_size, 1);

Page 12: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK – sample, cont.

/* build a ScaLAPACK array descriptor for each distributed object */
Build_descript(my_rank, "A", A_descript, m, n, row_block_size, col_block_size,
               blacs_grid, local_mat_rows);
Build_descript(my_rank, "b", b_descript, m, 1, row_block_size, 1,
               blacs_grid, b_local_size);
Build_descript(my_rank, "Exact", exact_descript, n, 1, col_block_size, 1,
               blacs_grid, exact_local_size);
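A hedged sketch of what a helper like Build_descript presumably wraps: ScaLAPACK's descinit routine fills the 9-integer descriptor that every p?-routine needs. The wrapper name and its argument list here are illustrative, not Pacheco's code.

extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb,
                      int *irsrc, int *icsrc, int *ictxt, int *lld, int *info);

void build_descriptor(int desc[9], int m, int n, int mb, int nb,
                      int blacs_grid, int local_leading_dim) {
    int irsrc = 0, icsrc = 0, info;   /* distribution starts at process (0,0) */
    descinit_(desc, &m, &n, &mb, &nb, &irsrc, &icsrc,
              &blacs_grid, &local_leading_dim, &info);
    /* info != 0 would signal an illegal argument */
}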

Page 13: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK – sample, cont.

/* fill A and the known "Exact" solution, then form b = A * Exact */
Initialize(p, my_rank, A_local, local_mat_rows, local_mat_cols,
           exact_local, exact_local_size);
Mat_vect_mult(m, n, A_local, A_descript, exact_local, exact_descript,
              b_local, b_descript);

Allocate(my_rank, "pivot_list", &pivot_list, local_mat_rows + row_block_size, 0);
MPI_Barrier(MPI_COMM_WORLD);

Page 14: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK – sample, cont.

/* psgesv solves Ax = b and returns the solution in b */
Solve(my_rank, n, A_local, A_descript, pivot_list, b_local, b_descript);

Cblacs_exit(1);    /* nonzero: leave MPI running so MPI_Finalize can be called */
MPI_Finalize();
}
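A hedged sketch of what the Solve wrapper above presumably does: call ScaLAPACK's psgesv, which factors the distributed matrix and overwrites the distributed right-hand side with the solution (as the slide's comment says). The wrapper itself and its argument names are illustrative; the psgesv_ prototype is the standard ScaLAPACK one.

extern void psgesv_(int *n, int *nrhs, float *a, int *ia, int *ja, int *desca,
                    int *ipiv, float *b, int *ib, int *jb, int *descb, int *info);

void solve_sketch(int n, float *A_local, int *A_descript, int *pivot_list,
                  float *b_local, int *b_descript) {
    int nrhs = 1;             /* one right-hand side */
    int ia = 1, ja = 1;       /* 1-based global indices of the submatrix */
    int ib = 1, jb = 1;
    int info;
    psgesv_(&n, &nrhs, A_local, &ia, &ja, A_descript,
            pivot_list, b_local, &ib, &jb, b_descript, &info);
    /* info == 0 means success; b_local now holds this process's piece of x */
}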

Page 15: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

ScaLAPACK – sample, cont.

void Mat_vect_mult(int m, int n, float* A_local, int* A_descript,
                   float* x_local, int* x_descript,
                   float* y_local, int* y_descript) {
    /* y = alpha*A*x + beta*y via the PBLAS routine psgemv; the scalars and the
       1-based starting indices/increments below are assumed values */
    char transpose = 'N';
    float alpha = 1.0, beta = 0.0;
    int first_row_A = 1, first_col_A = 1, first_row_x = 1, first_col_x = 1;
    int first_row_y = 1, first_col_y = 1, x_increment = 1, y_increment = 1;

    psgemv(&transpose, &m, &n, &alpha, A_local, &first_row_A, &first_col_A, A_descript,
           x_local, &first_row_x, &first_col_x, x_descript, &x_increment, &beta,
           y_local, &first_row_y, &first_col_y, y_descript, &y_increment);
}

Page 16: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Crossing Languages – Some Issues

Calling routines from another language
- e.g. calling a Fortran subroutine from C
Using n-dimensional arrays
- remember row-major (C) vs. column-major (Fortran)
Passing arguments in routine/function calls
- Fortran passes by address, C passes by value
(a small example follows below)
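A hedged sketch of these issues using the BLAS routine ddot as an example: the Fortran symbol usually gets a trailing underscore, every argument must be passed by address, and any 2-D array would have to be laid out column-major. Name-mangling details vary by compiler, so the ddot_ spelling is an assumption about the build environment.

#include <stdio.h>

/* Fortran: DOUBLE PRECISION FUNCTION DDOT(N, DX, INCX, DY, INCY) */
extern double ddot_(int *n, double *dx, int *incx, double *dy, int *incy);

int main(void) {
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    int n = 3, incx = 1, incy = 1;

    /* C passes by value, Fortran expects addresses -- so pass pointers */
    double result = ddot_(&n, x, &incx, y, &incy);
    printf("dot product = %f\n", result);   /* 32.0 */
    return 0;
}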

Page 17: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc

Portable, Extensible Toolkit for Scientific Computation
Large and powerful
Solves:
- partial differential equations
- linear systems
- non-linear systems
Supports both dense and sparse matrices

Page 18: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc

PETSc routines return error codes
PETSc provides error-checking macros to help troubleshoot problems, e.g. CHKERRQ(errorcode)
(a sketch of the pattern follows below)
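A minimal sketch of the error-checking pattern, written against a recent PETSc API (the exact signatures differ slightly from the 2.0.x version the slides reference): almost every call returns an error code, and CHKERRQ checks it and propagates the failure.

#include <petscvec.h>

int main(int argc, char **argv) {
    PetscErrorCode ierr;
    Vec x;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); CHKERRQ(ierr);
    ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr);   /* check every call */
    ierr = VecDestroy(&x); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
}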

Page 19: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc

Built on top of MPI
Developed primarily for C/C++ (unlike ScaLAPACK); also has a Fortran interface
Dense and sparse matrices share the same interface

Page 20: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc

Includes many non-blocking operations
- e.g. any process can update any matrix element as a non-blocking operation;
  other work can go on while the update is carried out

Many options are available from the command line
- PETSc includes many solvers
- solvers can be selected from the command line, so you can change solvers without recompiling

PETSC_DECIDE lets PETSc choose distribution details for you
(a sketch of these run-time conveniences follows below)
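A hedged sketch of two of these conveniences, using the current PETSc API: PETSC_DECIDE lets PETSc pick each process's share of a global size, and options such as -ksp_type gmres -pc_type jacobi are picked up at run time (a solver example appears after the later KSP slide). The global size 100 is made up for illustration.

#include <petscvec.h>

int main(int argc, char **argv) {
    Vec x;
    PetscInt local_size;

    PetscInitialize(&argc, &argv, NULL, NULL);

    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, 100);   /* global size 100; PETSc picks each local size */
    VecSetFromOptions(x);                /* e.g. -vec_type mpi from the command line */

    VecGetLocalSize(x, &local_size);
    PetscPrintf(PETSC_COMM_SELF, "my local size: %d\n", (int)local_size);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}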

Page 21: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc

[figure from http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2]

Page 22: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc

[figure from http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2]

Page 23: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc – sample routines

PetscOptionsGetInt(PETSC_NULL, "-n", &n, &flg);   /* read -n from the command line */

VecSetType(Vec x, VecType vec_type);

VecCreate(MPI_Comm comm, Vec *x);

VecSetSizes(Vec x, int m, int M);                 /* local size m, global size M */

VecDuplicate(Vec old, Vec *new);

MatCreate(MPI_Comm comm, int m, int n, int M, int N, Mat* A);

MatSetValues(Mat A, int m, int* im, int n, int* in,
             PetscScalar *values, INSERT_VALUES);
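A hedged sketch of how the calls on this slide fit together, using the current PETSc spellings (which differ slightly from the older forms listed above): read -n from the command line, build an n x n matrix, and insert a 2 x 2 block of values with MatSetValues. The values and indices are made up for illustration.

#include <petscmat.h>

int main(int argc, char **argv) {
    PetscInt n = 10;
    PetscBool flg;
    Mat A;

    PetscInitialize(&argc, &argv, NULL, NULL);
    PetscOptionsGetInt(NULL, NULL, "-n", &n, &flg);   /* -n <size> overrides the default */

    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);

    /* MatSetValues inserts a logically dense m x n block:
       here a 2 x 2 block with rows {0,1} and columns {0,1} */
    PetscInt rows[2] = {0, 1}, cols[2] = {0, 1};
    PetscScalar vals[4] = { 2.0, -1.0,
                           -1.0,  2.0};
    MatSetValues(A, 2, rows, 2, cols, vals, INSERT_VALUES);

    /* the assembly calls on the next slide must run before A can be used */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatDestroy(&A);
    PetscFinalize();
    return 0;
}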

Page 24: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc – sample routines, cont.

MatAssemblyBegin(Mat A, MAT_FINAL_ASSEMBLY);   /* finish distributing inserted values */

MatAssemblyEnd(Mat A, MAT_FINAL_ASSEMBLY);

KSPCreate(MPI_Comm comm, KSP *ksp);            /* create a Krylov solver context */

KSPSolve(KSP ksp, Vec b, Vec x);               /* solve Ax = b */

PetscInitialize(&argc, &argv, NULL, NULL);

PetscFinalize();
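A hedged sketch of the solver sequence these routines suggest: once A, b, and x exist and are assembled, a KSP object solves Ax = b, and KSPSetFromOptions lets the method be switched at run time. KSPSetOperators is not on the slide; it is the call that attaches the matrix in current PETSc.

#include <petscksp.h>

/* assumes A, b, x have already been created and assembled as on the previous slides */
PetscErrorCode solve_system(Mat A, Vec b, Vec x) {
    KSP ksp;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);   /* A also used to build the preconditioner */
    ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);       /* honor -ksp_type, -pc_type, ... */
    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);          /* solution returned in x */
    ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
    return 0;
}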

Page 25: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

BLAS (Basic Linear Algebra Subprograms)
  http://www.netlib.org/blas/

LAPACK (Linear Algebra PACKage)
  http://www.netlib.org/lapack/
  http://www.netlib.org/lapack/lug/index.html

ScaLAPACK
  http://www.netlib.org/scalapack/scalapack_home.html

Page 26: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

PETSc
  http://www-unix.mcs.anl.gov/petsc/petsc-as/
  http://acts.nersc.gov/petsc/
  http://www.chuug.org/talks/petsc.pdf
  http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/manual.html#Node0

Page 27: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries.

Recommended