Numerical Simulation of 3D Fully Nonlinear Water Waves
on Parallel Computers
Xing Cai
University of Oslo
PARA'98
Outline of the Talk
Mathematical model
Numerical scheme (sequential)
Parallelization strategy (domain decomposition)
Object-oriented implementation
Numerical experiment
Mathematical Model
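Fully nonlinear 3D water waves
Primary unknowns: velocity potential $\varphi$, free surface elevation $\eta$
$\nabla^2\varphi = 0$ in the water volume
$\eta_t + \varphi_x\eta_x + \varphi_y\eta_y - \varphi_z = 0$ on the water surface
$\varphi_t + \tfrac{1}{2}\left(\varphi_x^2 + \varphi_y^2 + \varphi_z^2\right) + g\eta = 0$ on the water surface
$\partial\varphi/\partial n = 0$ on solid walls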
Numerical Scheme
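Physical domain:
$\Omega(t) = \{(x,y,z) \mid (x,y)\in\Omega_{xy},\ -H \le z \le \eta(x,y,t)\}$
Transformation of $\Omega(t)$ to a fixed domain:
$\tilde z = H\left(\dfrac{z+H}{\eta+H} - 1\right)$
$\tilde\Omega = \{(x,y,\tilde z) \mid (x,y)\in\Omega_{xy},\ -H \le \tilde z \le 0\}$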
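The vertical mapping can be checked numerically. The following is a minimal sketch, assuming a constant depth H and the mapping $\tilde z = H\left((z+H)/(\eta+H) - 1\right)$; the function names and the values of H and eta are illustrative and do not come from the talk.

```python
import numpy as np

def to_fixed_domain(z, eta, H):
    """Map physical z in [-H, eta] to z_tilde in [-H, 0] (assumed form of the mapping)."""
    return H * ((z + H) / (eta + H) - 1.0)

def to_physical_domain(z_tilde, eta, H):
    """Inverse map: z_tilde in [-H, 0] back to z in [-H, eta]."""
    return (z_tilde + H) * (eta + H) / H - H

H = 1.0      # still-water depth (illustrative value)
eta = 0.1    # local surface elevation (illustrative value)
z = np.linspace(-H, eta, 5)          # a vertical column in the physical domain
z_tilde = to_fixed_domain(z, eta, H)  # the same column in the fixed domain
```

The bottom z = -H stays at -H and the moving surface z = eta is sent to 0, so the computational grid does not have to follow the free surface.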
Numerical Scheme
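• Operator splitting
• At each time level:
  • FDM for updating the free surface conditions
  • FEM solution of an elliptic boundary value problem in the fixed (transformed) domain $\tilde\Omega$
$\tilde\nabla\cdot\left(K\tilde\nabla\varphi\right) = 0$
$K(x,y,\tilde z,t) = \dfrac{1}{H}\begin{pmatrix}
\eta+H & 0 & -(\tilde z+H)\eta_x \\
0 & \eta+H & -(\tilde z+H)\eta_y \\
-(\tilde z+H)\eta_x & -(\tilde z+H)\eta_y & \dfrac{H^2 + (\tilde z+H)^2(\eta_x^2+\eta_y^2)}{\eta+H}
\end{pmatrix}$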
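The coefficient matrix K can be cross-checked against the coordinate transformation. The sketch below assumes a constant depth H and the mapping $\tilde z = H\left((z+H)/(\eta+H) - 1\right)$ with x and y unchanged, for which the Laplace equation becomes $\tilde\nabla\cdot(K\tilde\nabla\varphi)=0$ with $K = JJ^T/\det J$ and J the Jacobian of the mapping. The function names and sample values are illustrative only.

```python
import numpy as np

def K_closed_form(z_t, eta, eta_x, eta_y, H):
    """Closed-form K, written in terms of s = z_tilde + H and d = eta + H."""
    s, d = z_t + H, eta + H
    return (1.0 / H) * np.array([
        [d, 0.0, -s * eta_x],
        [0.0, d, -s * eta_y],
        [-s * eta_x, -s * eta_y, (H**2 + s**2 * (eta_x**2 + eta_y**2)) / d],
    ])

def K_from_jacobian(z_t, eta, eta_x, eta_y, H):
    """K = J J^T / det(J), with J the Jacobian of (x, y, z) -> (x, y, z_tilde)."""
    s, d = z_t + H, eta + H
    J = np.array([
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [-s * eta_x / d, -s * eta_y / d, H / d],
    ])
    return J @ J.T / np.linalg.det(J)

# Arbitrary sample point and surface slopes (illustrative values).
args = dict(z_t=-0.3, eta=0.1, eta_x=0.2, eta_y=-0.05, H=1.0)
K1 = K_closed_form(**args)
K2 = K_from_jacobian(**args)
```

Both routes give the same symmetric positive definite matrix, which is why the transformed problem remains elliptic and suitable for CG-type solvers.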
Preconditioning
• Elliptic boundary value problem: the most CPU-intensive part
• Resulting system of linear equations: $Ax = b$
• Preconditioning: $Ax = b \;\Rightarrow\; M^{-1}Ax = M^{-1}b$
Computational cost ($N$ = number of unknowns):
Method         Cost
Gauss-Seidel   $O(N^2)$
CG+MILU        $O(N^{7/6})$
CG+MG/DD       $O(N)$
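To make the role of $M^{-1}$ concrete, here is a minimal sketch of preconditioned conjugate gradients. It is not the Diffpack implementation: the test matrix is an arbitrary symmetric positive definite example, and a simple Jacobi (diagonal) preconditioner stands in for the MILU, multigrid and DD preconditioners named above.

```python
import numpy as np

def pcg(A, b, apply_Minv, tol=1e-10, maxiter=1000):
    """Preconditioned CG for SPD A; apply_Minv(r) returns M^{-1} r."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(maxiter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

n = 50
# Illustrative SPD matrix: 1D Laplacian stencil plus a varying positive diagonal.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
     + np.diag(np.linspace(1.0, 100.0, n)))
b = np.ones(n)
diag_A = np.diag(A).copy()
x, iters = pcg(A, b, lambda r: r / diag_A)   # Jacobi preconditioner
```

Swapping the lambda for a multigrid cycle or a domain decomposition sweep changes only the preconditioner; the CG loop itself is untouched.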
The Question
How to do the parallelization?
Different approaches on different levels:
• Automatic parallelization
• Parallelization on the low level of matrix-vector operations
• Parallelization on the level of simulators
Starting point: an object-oriented water wave simulator (built in Diffpack, a C++ environment for scientific computing)
Parallelization Strategy
Domain Decomposition
• Divide and conquer
• Solution of the original large problem by iteratively solving many smaller subproblems -- used as a solution method or as a preconditioner
• Flexible -- localized treatment of irregular geometries, singularities etc
• Very efficient numerical methods -- even on sequential computers
• Suitable for coarse grained parallelization
Overlapping Domain Decomposition
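Alternating Schwarz method for two subdomains
Example: solving an elliptic boundary value problem
$Au = f$ in $\Omega = \Omega_1 \cup \Omega_2$
$u = g$ on $\partial\Omega$
A sequence of approximations $u^0, u^1, \ldots, u^n$, where ($\Gamma_i$ denotes the artificial interior boundary of $\Omega_i$)
$Au_1^n = f_1$ in $\Omega_1$
$u_1^n = g$ on $\partial\Omega_1 \setminus \Gamma_1$
$u_1^n = u_2^{n-1}|_{\Gamma_1}$ on $\Gamma_1$
and
$Au_2^n = f_2$ in $\Omega_2$
$u_2^n = g$ on $\partial\Omega_2 \setminus \Gamma_2$
$u_2^n = u_1^n|_{\Gamma_2}$ on $\Gamma_2$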
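The alternating iteration can be tried on a one-dimensional model problem. The sketch below assumes $-u'' = 1$ on $(0,1)$ with $u(0)=u(1)=0$, discretized by central differences and split into two overlapping intervals. The grid size, the overlap and the helper name are illustrative choices, not taken from the talk.

```python
import numpy as np

def solve_subdomain(u, f, h, lo, hi):
    """Solve -u'' = f at grid points lo+1..hi-1, using u[lo] and u[hi] as Dirichlet data."""
    m = hi - lo - 1
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = f[lo + 1:hi].copy()
    rhs[0] += u[lo] / h**2
    rhs[-1] += u[hi] / h**2
    u[lo + 1:hi] = np.linalg.solve(A, rhs)

N = 40
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
f = np.ones(N + 1)
u = np.zeros(N + 1)          # initial guess u^0, also holds the boundary values
a, b = 15, 25                # Omega_1 = [x_0, x_b], Omega_2 = [x_a, x_N]

for n in range(30):
    solve_subdomain(u, f, h, 0, b)   # boundary value u[b] comes from the previous Omega_2 solve
    solve_subdomain(u, f, h, a, N)   # boundary value u[a] comes from the fresh Omega_1 solve

u_exact = x * (1.0 - x) / 2.0
error = np.max(np.abs(u - u_exact))
```

Each sweep only solves the two smaller problems and exchanges the values at the artificial boundaries; a wider overlap gives faster convergence at the price of larger subproblems.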
Numerical Foundation
Additive Schwarz Method
Subproblems are of the same form as the original large problem, with possibly different boundary conditions on artificial boundaries.
Subproblems can be solved in parallel.
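A minimal sketch of the additive Schwarz idea used as a preconditioner, assuming the standard one-level form $M^{-1} = \sum_i R_i^T A_i^{-1} R_i$ with $A_i = R_i A R_i^T$ and a 1D Laplacian as the model matrix. The helper names, the number of subdomains and the overlap are illustrative. Each term of the sum involves only one subdomain, which is what allows the subdomain solves to run in parallel.

```python
import numpy as np

def laplacian_1d(n):
    """1D Laplacian stencil matrix (scaled by h^2)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def overlapping_index_sets(n, num_sub, overlap):
    """Contiguous index blocks, each extended by `overlap` points on both sides."""
    size = n // num_sub
    sets = []
    for i in range(num_sub):
        start = max(0, i * size - overlap)
        stop = n if i == num_sub - 1 else min(n, (i + 1) * size + overlap)
        sets.append(np.arange(start, stop))
    return sets

def additive_schwarz_inverse(A, index_sets):
    """Assemble M^{-1} as the sum of independent local inverses."""
    Minv = np.zeros_like(A)
    for idx in index_sets:
        Minv[np.ix_(idx, idx)] += np.linalg.inv(A[np.ix_(idx, idx)])
    return Minv

def spectrum(A, Minv):
    """Eigenvalues of M^{-1} A, computed from the similar symmetric matrix L^T M^{-1} L."""
    L = np.linalg.cholesky(A)
    S = L.T @ Minv @ L
    return np.linalg.eigvalsh(0.5 * (S + S.T))

A = laplacian_1d(63)
Minv = additive_schwarz_inverse(A, overlapping_index_sets(63, num_sub=4, overlap=4))
lam = spectrum(A, Minv)
cond_A = np.linalg.cond(A)
cond_prec = lam[-1] / lam[0]
```

In a real code the local inverses are never formed explicitly; each subdomain simulator solves its own subproblem and the results are summed.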
Convergence of the Solution
Example: solving the Poisson problem on the unit square
Numerical Foundation
Coarse Grid Correction
Important for good DD convergence
Runs on each processor, shared by the subdomain simulators on that processor
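The effect of a coarse grid correction can be illustrated on a 1D model problem. The sketch below assumes a two-level additive Schwarz preconditioner, $M^{-1} = \sum_i R_i^T A_i^{-1} R_i + P A_0^{-1} P^T$, where P is linear interpolation from a coarse grid and $A_0 = P^T A P$. All sizes and names are illustrative, and the talk does not specify this particular form.

```python
import numpy as np

n, num_sub, overlap, nc = 63, 8, 2, 7
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian stencil

# One-level part: sum of local inverses on overlapping index blocks.
size = n // num_sub
Minv_one = np.zeros((n, n))
for i in range(num_sub):
    start = max(0, i * size - overlap)
    stop = n if i == num_sub - 1 else (i + 1) * size + overlap
    idx = np.arange(start, stop)
    Minv_one[np.ix_(idx, idx)] += np.linalg.inv(A[np.ix_(idx, idx)])

# Coarse part: hat-function interpolation from nc coarse nodes to the fine grid.
x = np.arange(1, n + 1) / (n + 1)
X = np.arange(1, nc + 1) / (nc + 1)
Hc = 1.0 / (nc + 1)
P = np.maximum(0.0, 1.0 - np.abs(x[:, None] - X[None, :]) / Hc)
A0 = P.T @ A @ P
Minv_two = Minv_one + P @ np.linalg.inv(A0) @ P.T

def spectrum(A, Minv):
    """Eigenvalues of M^{-1} A via the similar symmetric matrix L^T M^{-1} L."""
    L = np.linalg.cholesky(A)
    S = L.T @ Minv @ L
    return np.linalg.eigvalsh(0.5 * (S + S.T))

lam_one = spectrum(A, Minv_one)
lam_two = spectrum(A, Minv_two)
cond_one = lam_one[-1] / lam_one[0]
cond_two = lam_two[-1] / lam_two[0]
```

Adding the coarse term can only raise the smallest eigenvalue of the preconditioned operator; comparing cond_one with cond_two shows how much this helps for a given number of subdomains.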
Some Observations
Parallel Computing
• efficiency depends on the parallelization strategy
Domain Decomposition
• is well suited to parallel computing
• a good parallelization strategy
Object-Oriented Programming Technique
• flexible and efficient sequential simulators
• can be used in subdomain solves -- main ingredient of DD
New Programming Model
A simulator-parallel model
Each processor hosts an arbitrary number of subdomains
• balance between numerical efficiency and load balancing
Each subdomain is assigned a sequential simulator
Flexibility -- different types of grids, linear system solvers, preconditioners, convergence monitors etc. are allowed for different subproblems
Domain decomposition on the level of subdomain simulators!
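As an illustration of hosting several subdomains per processor, the sketch below distributes M subdomain ids over P processors in contiguous blocks. The block distribution and the function name are assumptions made for this example; the slides do not say how subdomains are mapped to processors.

```python
def assign_subdomains(num_subdomains, num_procs):
    """Block distribution: processor p gets a contiguous range of subdomain ids."""
    base, extra = divmod(num_subdomains, num_procs)
    owner = {}
    start = 0
    for p in range(num_procs):
        count = base + (1 if p < extra else 0)   # the first `extra` processors get one more
        owner[p] = list(range(start, start + count))
        start += count
    return owner

# Example: 16 subdomains on 4 processors, four subdomain simulators per processor.
layout = assign_subdomains(16, 4)
```

When M is not a multiple of P the subdomain counts differ by at most one, which is one simple way to keep the load roughly balanced when all subdomains are of similar size.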
Simulator-Parallel
Reuse of existing sequential simulators
Data distribution is implied
No need for global data
Needs additional functionality for exchanging nodal values inside the overlapping regions
Needs some global administration
A Generic Programming Framework
An add-on library (SPMD model)
• Use of object-oriented programming technique
• Flexibility and portability
• Simplified parallelization process for end-user
The Administrator
Parameter Interface: solution method or preconditioner, max iterations, stopping criterion etc.
DD algorithm Interface: access to predefined numerical algorithms, e.g. CG
Operation Interface (standard codes & UDC): access to subdomain simulators, matrix-vector product, inner product etc.
The Subdomain Simulator
Subdomain Simulator -- a generic representation
• C++ class hierarchy
• Interface of generic member functions
Adaptation of Sequential Simulator
Class SubdomainSimulator - generic representation of a sequential simulator.
Class SubdomainFEMSolver - generic representation of a sequential simulator using FEM.
A new sequential wave simulator that fits in the framework is readily extended from the existing sequential simulator, also being a subclass of SubdomainFEMSolver.
[Figure: class hierarchy. SubdomainFEMSolver derives from SubdomainSimulator; NewWSimulator derives from both SubdomainFEMSolver and WaveSimulator.]
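The shape of this adaptation can be sketched with multiple inheritance. The example is written in Python for brevity, whereas the actual framework is a C++ class hierarchy in Diffpack. Only the four class names come from the slide; every method name and body below is hypothetical.

```python
from abc import ABC, abstractmethod

class SubdomainSimulator(ABC):
    """Generic interface that an administrator would call (method names are hypothetical)."""

    @abstractmethod
    def solve_local(self):
        """Solve the subproblem on this subdomain."""

    @abstractmethod
    def update_interior_boundary(self, values):
        """Receive nodal values on the artificial boundary from neighbours."""

class SubdomainFEMSolver(SubdomainSimulator):
    """Generic FEM-based subdomain simulator (hypothetical default behaviour)."""

    def update_interior_boundary(self, values):
        self.boundary_values = dict(values)

class WaveSimulator:
    """Stand-in for the existing sequential wave simulator."""

    def solve_laplace(self):
        return "sequential Laplace solve"

class NewWSimulator(SubdomainFEMSolver, WaveSimulator):
    """Reuses the sequential solver and exposes it through the generic interface."""

    def solve_local(self):
        return self.solve_laplace()

sim = NewWSimulator()
sim.update_interior_boundary({17: 0.25})   # node id -> value, illustrative data
result = sim.solve_local()
```

The point of the design is that the numerics stay in the existing simulator class, while the new subclass only adds the thin glue required by the generic interface.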
Performance
Algorithmic efficiency
• efficiency of original sequential simulator(s)
• efficiency of domain decomposition method
Parallel efficiency
• communication overhead (low)
• coarse grid correction overhead (normally low)
• synchronization overhead
• load balancing: subproblem size, work on subdomain solves
Parallel Simulation of Waves
Parallel Efficiency
Fixed number of subdomains M=16.
Subdomain grids from partition of a global 41x41x41 grid.
Simulation over 32 time steps.
DD as preconditioner of CG for the Laplace eq.
Multigrid V-cycle as subdomain solver.
P     Execution time   Speedup   Efficiency
1     1404.44          N/A       N/A
2     715.32           1.96      0.98
4     372.79           3.77      0.94
8     183.99           7.63      0.95
16    90.89            15.45     0.97
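The speedup and efficiency columns follow from the execution times as $S(P) = T(1)/T(P)$ and $E(P) = S(P)/P$, where the baseline $T(1)$ is the run with all 16 subdomains on one processor. A short sketch that recomputes them from the timings in the table (variable names are illustrative):

```python
# Execution times from the table, indexed by the number of processors P.
times = {1: 1404.44, 2: 715.32, 4: 372.79, 8: 183.99, 16: 90.89}

t1 = times[1]
rows = []
for p, t in times.items():
    if p == 1:
        continue                      # no speedup is reported for the baseline run
    speedup = t1 / t
    efficiency = speedup / p
    rows.append((p, round(speedup, 2), round(efficiency, 2)))
```

The recomputed values agree with the tabulated ones to two decimals.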
Overall Efficiency
Number of subdomains equal to number of processors
P=M   Execution time   Subgrid size   Iterations
1     642.14           68921          7.69
2     597.47           38663          9.00*
4     265.62           21689          13.59
8     172.23           12259          17.25
16    90.89            6929           16.56
*For P=2 parallel BiCGStab is used.
Summary
Efficient solution of elliptic boundary value problems
Parallelization based on DD
Introduction of a simulator-parallel model
A generic framework for implementation
http://www.nobjects.com