Page 1:

AMS 529: Finite Element Methods: Fundamentals, Applications, and New Trends
Lecture 24: Distributed-Memory Programming for FEM

Xiangmin Jiao

SUNY Stony Brook


Page 2:

Outline

1 Introduction to MPI

2 Distributed-Memory Parallelization of FEM


Page 3:

Review of Shared-Memory Programming Paradigm

There is a single (shared) address space
All parallel threads can access all data (if the address is known)

Advantages:
- Easier to get started

Disadvantages:
- Prone to performance bottlenecks due to synchronization (locks, mutexes)
- Difficult to scale due to hardware limitations (number of cores, memory bandwidth, etc.)
Bottom line: Difficult to get good performance


Page 4:

Distributed-Memory Programming Paradigm

Multiple machines with their own address spaces
No direct access to remote data; data must be transported explicitly

Advantages:
- Programmer has more control of communication and synchronization
- Easier to scale to a large number of cores (more than 1M cores)

Disadvantages:
- Much more difficult programming model
- You are forced to think algorithmically and make hard decisions
- Practical difficulties in debugging, profiling, etc.

Good distributed-memory algorithms can often be converted into efficient shared-memory codes, so it is advantageous to think in terms of distributed-memory algorithms


Page 5:

Distributed-Memory Programming Languages/Libraries

Message Passing Interface (MPI)
Remote memory access (RMA) – one-sided communication; supported by MPI-2 and MPI-3
Partitioned global address space (PGAS)
- Unified Parallel C (UPC)
- Coarray Fortran (part of Fortran 2008)
- Chapel, X10, Titanium, etc.
Remote procedure calls (RPC)
Most of these belong to SPMD (Single Program, Multiple Data) or MIMD (Multiple Instruction, Multiple Data) parallelism


Page 6:

Message Passing

A parallel program consists of processes, each with its own address space
Data need to be
- distributed explicitly by the algorithm/programmer
- sent/received explicitly by the algorithm/programmer
Communication patterns
- Point-to-point (one-to-one) communication (one side sends; the other side receives)
- Collective (global) communication (broadcast and reduction) and synchronization (barriers) on an arbitrary subset of processes


Page 7:

Message Passing Interface (MPI)

MPI is a library, with C and Fortran bindings
- Also available for other languages such as Python, MATLAB/Octave, etc.
- A C++ binding was added in MPI-2 but deleted in MPI-3
Communicators in MPI
- Processes form groups (subsets of processes)
- Each process has its rank (ID) within a group
MPI is composed of
- data types (MPI_Comm, etc.)
- functions
  - System: MPI_Init/MPI_Init_thread, MPI_Finalize
  - Communicator: MPI_Comm_size, MPI_Comm_rank
  - Point-to-point: MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv, MPI_Test, MPI_Wait, MPI_Probe/MPI_Iprobe, etc.
  - Collective: MPI_Barrier, MPI_Bcast, MPI_Scatter, MPI_Gather, MPI_Reduce, MPI_Allreduce, ...
  - Others: Parallel I/O, profiling, communicator/datatype/operator creation, etc.
- constants (MPI_COMM_WORLD, MPI_FLOAT, etc.)


Page 8:

Example MPI Program in C

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
  int rank, size, i, provided;
  float A[10];
  MPI_Init_thread(&argc, &argv, MPI_THREAD_SINGLE, &provided);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  for (i = 0; i < 10; i++) A[i] = i;
  printf("My rank %d of %d\n", rank, size);
  printf("Here are my values for A\n");
  for (i = 0; i < 10; i++) printf("%f ", A[i]);
  printf("\n");
  MPI_Finalize();
}

Compile using mpicc instead of cc
Run using “mpirun -np nprocs program” (mpirun is optional when nprocs = 1)
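
For example, assuming the program above is saved as myprog.c (the file name is illustrative), a typical build-and-run sequence on four processes is:

mpicc -o myprog myprog.c
mpirun -np 4 ./myprog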


Page 9:

Extended Example with Communication

#include "mpi.h"int main(int argc, char *argv[]) {

...../* for (i=0; i<10; i++) A[i] = i; */if (rank == 0) {

for (i=0; i<10; i++) A[i] = i;for (i=1, i<size; i++)

MPI_Send(A, /*count*/ 10, /*datatype*/ MPI_FLOAT,/*to*/ i, /*tag*/ 0, MPI_COMM_WORLD);

} else {MPI_Recv(A, /*count*/ 10, /*datatype*/ MPI_FLOAT,/*from*/ 0, /*tag*/ 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

}.....

}


Page 10:

Point-to-Point Communication in MPI

MPI_Send/MPI_Recv are blocking
- Safe to change/use the data after the function returns
MPI_Isend and MPI_Irecv are nonblocking
- Cannot safely change/use the data right after the function returns (the operation may still be in progress)
- Use MPI_Test/MPI_Wait[some|any|all] to check for completion
Algorithmic considerations:
1. Avoid deadlock in blocking communication
2. Overlap computation with communication using nonblocking communication (see the sketch below)
3. Reduce the number of messages by aggregating small messages
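
The following is a minimal sketch of point 2 above: post MPI_Irecv/MPI_Isend, do computation that does not touch the message buffers, and complete both requests with MPI_Waitall. The buffer length NBUF and the ring-neighbor pattern are illustrative choices, not part of the lecture.

/* Exchange a buffer with ring neighbors while (potentially) computing. */
#include "mpi.h"
#include <stdio.h>

#define NBUF 100   /* illustrative message length */

int main(int argc, char *argv[]) {
  int rank, size;
  float sendbuf[NBUF], recvbuf[NBUF];
  MPI_Request reqs[2];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  for (int i = 0; i < NBUF; i++) sendbuf[i] = rank + i;

  int right = (rank + 1) % size;          /* neighbor to send to */
  int left  = (rank + size - 1) % size;   /* neighbor to receive from */

  /* Post the nonblocking receive and send first ... */
  MPI_Irecv(recvbuf, NBUF, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(sendbuf, NBUF, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &reqs[1]);

  /* ... computation that does not touch sendbuf/recvbuf would go here ... */

  /* Complete both requests before using recvbuf or reusing sendbuf */
  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

  printf("Rank %d received %f from rank %d\n", rank, recvbuf[0], left);
  MPI_Finalize();
  return 0;
}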


Page 11:

Collective Communication in MPI

Barrier: block until all reach here (MPI_Barrier)
Broadcast: from root process to the others in the group (MPI_Bcast)
Scatter: scatter a buffer in parts to all in the group (MPI_Scatter[v])
Gather: gather a buffer in parts from all in the group (MPI_Gather[v])
Reduce: reduce values on all processes to a single value (MPI_Reduce/MPI_Allreduce; see the sketch below)
All processes in the group must participate to avoid deadlock!
Algorithmic considerations:
1. Collective communications are synchronous and expensive, so use them sparingly
2. Reduce the number of messages by aggregating small messages
3. Load balancing is critical when there are collective communications
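
As an illustration of a reduction, here is a minimal sketch in which every process contributes a local partial sum and all processes receive the global sum via MPI_Allreduce; the quantity local_sq is a placeholder for something computed locally (e.g., a squared residual norm).

/* Combine local partial sums into a global sum available on every process. */
#include "mpi.h"
#include <math.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int rank;
  double local_sq, global_sq;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  local_sq = (rank + 1) * 1.0;   /* placeholder for a locally computed value */

  /* Every process in the communicator must make the same call */
  MPI_Allreduce(&local_sq, &global_sq, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

  printf("Rank %d: global value = %g (sqrt = %g)\n", rank, global_sq, sqrt(global_sq));
  MPI_Finalize();
  return 0;
}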


Page 12:

Advanced MPI Topics

One-sided communication (MPI_Get, MPI_Put)
- One side gets or puts data without the involvement of the other process (see the sketch below)
An MPI process can be multithreaded (hybrid programming models)
- The process must be initialized using MPI_Init_thread instead of MPI_Init
- All threads share a common MPI rank
- Typically, only one thread should be in charge of MPI communication (MPI_THREAD_FUNNELED)
Debugging using totalview or gdb (or ddd)
- “mpirun -np 4 xterm -e gdb ./myprog” or “mpirun -np 4 ddd ./myprog”
Timing using MPI_Wtime (typically, use barriers before the call)
Profiling using TAU and mpiP
Additional resources
- W. Gropp, CS 598 Designing and Building Applications for Extreme Scale Systems, University of Illinois
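
Below is a minimal sketch of one-sided communication with fence synchronization: every process exposes a single integer through an RMA window, and rank 0 writes a value into each remote window with MPI_Put. The window contents and the value 42 are illustrative.

/* One-sided puts into a window exposed by every process (fence epochs). */
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
  int rank, size, value = -1, data = 42;
  MPI_Win win;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Each process exposes one int for remote access */
  MPI_Win_create(&value, sizeof(int), sizeof(int), MPI_INFO_NULL,
                 MPI_COMM_WORLD, &win);

  MPI_Win_fence(0, win);                 /* open an access epoch */
  if (rank == 0) {
    /* the origin buffer (data) must stay unchanged until the closing fence */
    for (int i = 1; i < size; i++)
      MPI_Put(&data, 1, MPI_INT, /*target*/ i, /*disp*/ 0, 1, MPI_INT, win);
  }
  MPI_Win_fence(0, win);                 /* close the epoch; puts complete */

  printf("Rank %d sees value %d\n", rank, value);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}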


Page 13:

Outline

1 Introduction to MPI

2 Distributed-Memory Parallelization of FEM


Page 14:

Partitioning and Distribution of Mesh

Mesh is typically partitioned and distributed onto processes

Algorithmic considerations
- Load balancing: computation on each process should be about equal
- Reduce communication: “minimize” partition boundaries (surface-to-volume ratio)
- Cost of the partitioner: the partitioner itself should be fast
- Parallel partitioning/repartitioning: the partitioner needs to run in parallel


Page 15:

Graph/Mesh Partitioning Techniques

A graph partitioner partitions the nodes of a graph to
- balance work loads, while
- “minimizing” edge cuts
When partitioning meshes, apply it to the nodal graph or the dual graph (see the sketch below)
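
As an illustration, here is a minimal sketch of calling a graph partitioner on a CSR graph, assuming the METIS 5.x routine METIS_PartGraphKway; for a mesh, the xadj/adjncy arrays would describe its nodal or dual graph. The tiny 4-vertex path graph is purely illustrative.

/* Partition a CSR graph into nparts parts with METIS (assumed 5.x API). */
#include <metis.h>
#include <stdio.h>

int main(void) {
  idx_t nvtxs = 4, ncon = 1, nparts = 2, objval;
  /* Path graph 0-1-2-3 in CSR form */
  idx_t xadj[]   = {0, 1, 3, 5, 6};
  idx_t adjncy[] = {1, 0, 2, 1, 3, 2};
  idx_t part[4];

  int status = METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                                   NULL, NULL, NULL,      /* no weights */
                                   &nparts, NULL, NULL, NULL,
                                   &objval, part);
  if (status != METIS_OK) return 1;

  for (idx_t i = 0; i < nvtxs; i++)
    printf("vertex %d -> part %d\n", (int)i, (int)part[i]);
  printf("edge cut = %d\n", (int)objval);
  return 0;
}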


Page 16:

Geometric Partitioning Techniques

Coordinate nested dissection (CND): split along shortest axes
- Fast but low quality
Recursive inertial bisection (RIB): extension of CND with PCA
- Fast and better quality than CND
Space-filling curve (SFC): locality-preserving curves
- Fast and good quality; popular for hierarchical meshes (see the sketch below)

[Figures: CND vs. RIB; Peano-Hilbert SFC]
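
A Morton (Z-order) key is a simpler cousin of the Peano-Hilbert curve and is enough to sketch the SFC idea: quantize each element's centroid to integer coordinates, compute its key, sort the elements by key, and cut the sorted list into equally sized chunks, one per process. The 16-bit quantization below is an illustrative choice.

/* Morton (Z-order) key for 2D points with 16-bit quantized coordinates. */
#include <stdint.h>

/* Spread the lower 16 bits of v so that a zero bit separates each pair */
static uint32_t part1by1(uint32_t v) {
  v &= 0x0000FFFF;
  v = (v | (v << 8)) & 0x00FF00FF;
  v = (v | (v << 4)) & 0x0F0F0F0F;
  v = (v | (v << 2)) & 0x33333333;
  v = (v | (v << 1)) & 0x55555555;
  return v;
}

/* Interleave the bits of x and y into a single 32-bit key */
uint32_t morton2d(uint32_t x, uint32_t y) {
  return (part1by1(y) << 1) | part1by1(x);
}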

Page 17:

Combinatorial Partitioning Techniques

Levelized nested dissection (LND): breadth-first search from v0 until half of the vertices are reached (see the sketch below)
[Figures: LND; before and after KL refinement]
KL/FM partition refinement: starting from an initial partitioning, move vertices to reduce the edge cut; need to escape local extrema
Others
- Spectral: based on the graph Laplacian
- Multilevel schemes: recursively coarsen/partition/refine
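
Here is a minimal sketch of LND as a bisection, under the assumption that the graph is given in CSR form (xadj/adjncy, as in the METIS sketch above): a breadth-first search from a seed vertex v0 absorbs vertices into part 0 until half of them are visited, and the remaining vertices form part 1.

/* Levelized nested dissection (bisection) sketch on a CSR graph. */
#include <stdlib.h>

void lnd_bisect(int n, const int *xadj, const int *adjncy, int v0, int *part) {
  int *queue = malloc(n * sizeof(int));
  int head = 0, tail = 0, visited = 1;

  for (int i = 0; i < n; i++) part[i] = 1;   /* default: everything in part 1 */
  part[v0] = 0;                              /* seed vertex starts part 0 */
  queue[tail++] = v0;

  while (head < tail && visited < n / 2) {
    int u = queue[head++];
    for (int j = xadj[u]; j < xadj[u + 1]; j++) {
      int w = adjncy[j];
      if (part[w] == 1 && visited < n / 2) {
        part[w] = 0;                         /* absorb neighbor into part 0 */
        queue[tail++] = w;
        visited++;
      }
    }
  }
  free(queue);
}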


Page 18:

Comparison of Partitioning Methods


Page 19:

Repartitioning Methods

Needed for adaptive meshes
Key issue: tradeoff between redistribution cost and edge cut


Page 20:

Graph Partitioning Software

[Table: feature comparison of Chaco, METIS, ParMETIS, and Scotch — geometric schemes (RIB, SFC), spectral methods, combinatorial schemes (KL/FM), multilevel schemes, dynamic repartitioning, and parallel partitioning]

Another package similar to ParMETIS was Jostle, but it is no longer available as open source
Additional reading
- K. Schloegel, G. Karypis, and V. Kumar, Graph Partitioning for High-Performance Scientific Simulations, in Sourcebook for Parallel Computing, Chapter 18, Morgan Kaufmann, 2003
- Slides by R. van Engelen


Page 21:

Distribution of Mesh and Field Variables

In FEM, all global data must be distributed
- mesh (coordinates and elements)
- degrees of freedom
- matrix
- right-hand side and solution vectors
- post-processed data (parallel I/O)
It is natural to partition DoFs and matrices based on nodes
- Each node (and the corresponding DoFs and matrix rows) is owned by one process

What about elements?


Page 22:

Typical Control Flow of Parallel FEM

Each element is owned by one process (partition by dual graph)
Each process owns part of the nodes (DoFs) associated with its elements
A process may own ghost nodes
Parallel assembly
- Each process computes element matrices on its locally owned elements
- Partial nodal values are merged along partition boundaries of elements
Required parallel communication patterns:
- Updating values on ghost nodes
- Merging partial nodal values during assembly
These communication patterns are in general abstracted into high-level functions, typically implemented using nonblocking point-to-point communication; application codes do not call MPI directly
Linear solves are typically done by calling PETSc or Trilinos (see the sketch below)
Example code: parallel assembly in deal.II
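
The following is a minimal sketch of this control flow using PETSc (assuming its standard C API): each process loops over the elements it owns for a 1D Laplacian, adds 2x2 element matrices with ADD_VALUES, and MatAssemblyBegin/End communicate contributions that belong to rows owned by other processes. The problem size N and the block distribution of elements are illustrative.

/* Parallel assembly sketch: 1D linear elements, distributed by element. */
#include <petscmat.h>

int main(int argc, char **argv) {
  const PetscInt N = 10;            /* global number of nodes */
  const PetscInt ne = N - 1;        /* global number of elements */
  PetscMPIInt rank, size;
  Mat A;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Comm_size(PETSC_COMM_WORLD, &size);

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
  MatSetFromOptions(A);
  MatSetUp(A);

  /* Simple block distribution of elements among processes */
  PetscInt estart = rank * ne / size, eend = (rank + 1) * ne / size;

  for (PetscInt e = estart; e < eend; e++) {
    PetscInt idx[2] = {e, e + 1};                 /* element connectivity */
    PetscScalar Ke[4] = {1.0, -1.0, -1.0, 1.0};   /* 2x2 element matrix */
    /* ADD_VALUES merges partial nodal contributions, including rows
       owned by another process */
    MatSetValues(A, 2, idx, 2, idx, Ke, ADD_VALUES);
  }

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);   /* off-process values are sent here */
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatView(A, PETSC_VIEWER_STDOUT_WORLD);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}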


Page 23:

Alternative Control Flow of Parallel FEM

Each node (and its associated DoFs) is owned by one process (partition by nodal graph)
Each process owns the part of the elements incident on its nodes
A process may own ghost nodes and ghost elements
Parallel assembly: Each process computes element matrices on locally owned and ghost elements incident on its locally owned nodes
Required parallel communication pattern:
- Update values on ghost nodes and ghost elements
Disadvantage: Computation of element matrices for elements incident on partition boundaries is duplicated
Advantage: Can be adapted to a shared-memory implementation without race conditions


