Parallel Solution of Navier-Stokes Equations
Xing Cai
Dept. of Informatics, University of Oslo
Outline of the Talk
• Two parallelization strategies
  – based on domain decomposition
  – at the linear algebra level
• Parallelization of Navier-Stokes
• Numerical experiments
Diffpack
• O-O software environment for scientific computation
• Rich collection of PDE solution components -- portable, flexible, extensible
• www.diffpack.com
• H. P. Langtangen: Computational Partial Differential Equations, Springer 1999
The Question
Starting point: sequential PDE solvers. How to do the parallelization?
The resulting parallel solvers should have
• good parallel efficiency
• good overall numerical performance
We need
• a good parallelization strategy
• a good and simple implementation of the strategy
Domain Decomposition
• Solution of the original large problem through iteratively solving many smaller subproblems
• Can be used as a solution method or as a preconditioner
• Flexibility -- localized treatment of irregular geometries, singularities, etc.
• Very efficient numerical methods -- even on sequential computers
• Suitable for coarse-grained parallelization
Additive Schwarz Method
• Subproblems can be solved in parallel
• Subproblems are of the same form as the original large problem, with possibly different boundary conditions on the artificial boundaries
Convergence of the Solution
Single-phase groundwater flow
Observations
• DD is a good parallelization strategy
• The approach is not PDE-specific
• A program for the original global problem can be reused (modulo B.C.) for each subdomain
• Must communicate overlapping point values
• No need for global data
• Explicit temporal schemes are a special case where no iteration is needed ("exact DD")
Goals for the Implementation
• Reuse the sequential solver as the subdomain solver
• Add DD management and communication as separate modules
• Collect common operations in generic library modules
• Flexibility and portability
• Simplified parallelization process for the end-user
Generic Programming Framework

Generic Subdomain Simulators
• SubdomainSimulator
  – abstract interface to all subdomain simulators, as seen by the Administrator
• SubdomainFEMSolver
  – special case of SubdomainSimulator for finite element-based simulators
• These are generic classes, not restricted to specific application areas
Making the Simulator Parallel

class SimulatorP : public SubdomainFEMSolver,
                   public Simulator
{
  // ... just a small amount of code
  virtual void createLocalMatrix ()
    { Simulator::makeSystem (); }
};
[Class diagram: SimulatorP inherits from both SubdomainFEMSolver (a SubdomainSimulator, driven by the Administrator) and the sequential Simulator]
Summary So Far
• A generic approach
• Works if the DD algorithm works
• Makes use of class hierarchies
• The new parallel-specific code, SimulatorP, is very small and simple to write
Application
• Single-phase groundwater flow
• DD as the global solution method
• Subdomain solvers use CG+FFT
• Fixed number of subdomains M=32 (independent of P)
• Straightforward parallelization of an existing simulator

P    Sim. Time   Speedup   Efficiency
1    53.08       N/A       N/A
2    27.23       1.95      0.97
4    14.12       3.76      0.94
8    7.01        7.57      0.95
16   3.26        16.28     1.02
32   1.63        32.56     1.02

P: number of processors
Linear-algebra-level Approach
• Parallelize matrix/vector operations
  – inner product of two vectors
  – matrix-vector product
  – preconditioning -- block contribution from subgrids
• Easy to use
  – access to all Diffpack v3.0 iterative methods, preconditioners and convergence monitors
  – "hidden" parallelization
  – need only to add a few lines of new code
  – arbitrary choice of number of procs at run-time
  – less flexibility than DD
Straightforward Parallelization
• Develop a sequential simulator, without paying attention to parallelism
• Follow the Diffpack coding standards
• Need the Diffpack add-on libraries for parallel computing
• Add a few new statements for transformation to a parallel simulator
Library Tool
• class GridPartAdm
  – generate overlapping or non-overlapping subgrids
  – prepare communication patterns
  – update global values
  – matvec, innerProd, norm
A Simple Coding Example

GridPartAdm* adm;   // access to parallelization functionality
LinEqAdm* lineq;    // administrator for linear system & solver
// ...
#ifdef PARALLEL_CODE
adm->scan (menu);
adm->prepareSubgrids ();
adm->prepareCommunication ();
lineq->attachCommAdm (*adm);
#endif
// ...
lineq->solve ();

The corresponding menu input:
set subdomain list = DEFAULT
set global grid = grid1.file
set partition-algorithm = METIS
set number of overlaps = 0
Single-phase Groundwater Flow

−∇·(K(x)∇u) = f(x)

• Highly unstructured grid
• Discontinuity in the coefficient K (0.1 & 1)
MeasurementsMeasurements
P # iter Time Speedup
1 480 420.09 N/A
3 660 200.17 2.10
4 691 156.36 2.69
6 522 83.87 5.01
8 541 60.30 6.97
12 586 38.23 10.99
16 564 28.32 14.83
• 130,561 degrees of freedom
• Overlapping subgrids
• Global BiCGStab using (block) ILU prec.
Test Case: Vortex-Shedding
Simulation Snapshots
Pressure
Animated Pressure Field
Some CPU-Measurements
P CPU Speedup Efficiency
1 1418.67 N/A N/A
2 709.79 2.00 1.00
3 503.50 2.82 0.94
4 373.54 3.80 0.95
6 268.38 5.29 0.88
8 216.73 6.55 0.82
The pressure equation is solved by the CG method
Combined Approach
• Use a CG-like method as the basic solver (i.e. use a parallelized Diffpack linear solver)
• Use DD as the preconditioner (i.e. SimulatorP is invoked in the preconditioning step)
• Combine with coarse grid correction
• A CG-like method + DD prec. is normally faster than DD as a basic solver
Two-phase Porous Media Flow

PEQ:  −∇·(λ(s)∇p) = q,  v = −λ(s)∇p    in Ω, 0 < t ≤ T
SEQ:  s_t + v·∇f(s) = 0                in Ω, 0 < t ≤ T

BiCGStab + DD prec. for the global pressure eq.
Multigrid V-cycle in subdomain solves

P    Total CPU   Subgrid     CPU PEQ   I      CPU SEQ
1    4053.33     241 x 241   3586.98   3.10   440.58
2    2497.43     129 x 241   2241.78   3.48   241.08
4    1244.29     129 x 129   1101.58   2.97   134.28
8    804.47      129 x 69    725.58    3.93   72.76
16   490.47      69 x 69     447.27    4.13   39.64
Two-phase Porous Media Flow
History of saturation for water and oil
Nonlinear Water Waves
• Fully nonlinear 3D water waves
• Primary unknowns: the velocity potential φ and the free-surface elevation η
• Parallelization based on an existing sequential Diffpack simulator

∇²φ = 0                                  in the water volume
∂φ/∂n = 0                                on solid walls
η_t + φ_x η_x + φ_y η_y − φ_z = 0        on the water surface
φ_t + (φ_x² + φ_y² + φ_z²)/2 + gη = 0    on the water surface
Nonlinear Water Waves
• CG + DD prec. for the global solver
• Multigrid V-cycle as subdomain solver
• Fixed number of subdomains M=16 (independent of P)
• Subgrids from partition of a global 41x41x41 grid
P Execution time Speedup Efficiency
1 1404.44 N/A N/A
2 715.32 1.96 0.98
4 372.79 3.77 0.94
8 183.99 7.63 0.95
16 90.89 15.45 0.97
Nonlinear Water Waves
3D Poisson equation in water wave simulation
Summary
• Goal: provide software and programming rules for easy parallelization of sequential simulators
• Two parallelization strategies:
  – domain decomposition: very flexible, compact visible code/algorithm
  – parallelization at the linear algebra level: "automatic" hidden parallelization
• Performance: satisfactory speed-up