HPC Middleware (HPC-MW): Infrastructure for Scientific Applications on HPC Environments
Overview and Recent Progress
Kengo Nakajima, RIST.
3rd ACES WG Meeting, June 5th & 6th, 2003. Brisbane, QLD, Australia.
2
3rd ACES WG mtg. June 2003.
My Talk
HPC-MW (35 min.) Hashimoto/Matsu’ura Code on ES (5 min.)
3
3rd ACES WG mtg. June 2003.
Table of Contents
- Background
- Basic Strategy
- Development
- On-Going Works/Collaborations
- XML-based System for User-Defined Data Structure
- Future Works
- Examples
4
3rd ACES WG mtg. June 2003.
Frontier Simulation Software for Industrial Science (FSIS) - http://www.fsis.iis.u-tokyo.ac.jp
Part of "IT Project" in MEXT (Minitstry of Education, Culture, Sports, Science & Technology) HQ at Institute of Industrial Science, Univ. Tokyo 5 years, 12M USD/yr. (decreasing ...) 7 internal projects, > 100 people involved.
Focused on Industry/Public Use Commercialization
5
3rd ACES WG mtg. June 2003.
Frontier Simulation Software for Industrial Science (FSIS) (cont.) - http://www.fsis.iis.u-tokyo.ac.jp
- Quantum Chemistry
- Quantum Molecular Interaction System
- Nano-Scale Device Simulation
- Fluid Dynamics Simulation
- Structural Analysis
- Problem Solving Environment (PSE)
- High-Performance Computing Middleware (HPC-MW)
6
3rd ACES WG mtg. June 2003.
Background: Various Types of HPC Platforms
- Parallel computers: PC clusters, MPPs with distributed memory, SMP clusters (8-way, 16-way, 256-way: ASCI machines, the Earth Simulator)
- Processors: Power, HP-RISC, Alpha/Itanium, Pentium, vector PEs
- GRID environment -> various resources
Parallel/single-PE optimization is important !! Portability under a GRID environment requires machine-dependent optimization/tuning. Everyone knows that ... but it is a big task, especially for application experts and scientists.
8
3rd ACES WG mtg. June 2003.
Reordering for SMP Cluster with Vector PEs: ILU Factorization
[Figure: matrix storage schemes - PDJDS/CM-RCM; PDCRS/CM-RCM (short innermost loop); CRS (no re-ordering).]
9
3rd ACES WG mtg. June 2003.
3D Elastic Simulation: problem size vs. GFLOPS, Earth Simulator / SMP node (8 PEs)
[Plot: GFLOPS vs. DOF (10^4-10^7); ●: PDJDS/CM-RCM, ■: PDCRS/CM-RCM, ▲: natural ordering.]
10
3rd ACES WG mtg. June 2003.
3D Elastic Simulation: problem size vs. GFLOPS, Intel Xeon 2.8 GHz, 8 PEs
[Plot: GFLOPS vs. DOF (10^4-10^7); ●: PDJDS/CM-RCM, ■: PDCRS/CM-RCM, ▲: natural ordering.]
11
3rd ACES WG mtg. June 2003.
Parallel Volume Rendering: MHD Simulation of the Outer Core
12
3rd ACES WG mtg. June 2003.
Volume Rendering Module using Voxels
- On PC clusters: hierarchical background voxels, linked lists
- On the Earth Simulator: globally fine voxels, static arrays
13
3rd ACES WG mtg. June 2003.
Background (cont.) Simulation methods such as FEM, FDM etc. have several typical processes for computation.
14
3rd ACES WG mtg. June 2003.
"Parallel" FEM Procedure
Initial Grid DataInitial Grid Data
PartitioningPartitioning
Post ProcPost Proc..Data Input/OutputData Input/Output
Domain Specific Domain Specific Algorithms/ModelsAlgorithms/Models
Matrix AssembleMatrix Assemble
Linear SolversLinear Solvers
VisualizationVisualization
Pre-ProcessingPre-Processing MainMain Post-ProcessingPost-Processing
15
3rd ACES WG mtg. June 2003.
Background (cont.) Simulation methods such as FEM, FDM etc. have several typical processes for computation. How about "hiding" these processes from users with middleware between applications and compilers?
Development becomes efficient, reliable, portable and easy to maintain, and this accelerates advancement of the applications (= physics). HPC-MW = middleware close to the "application layer".
[Diagram: without middleware there is a big gap between applications and compilers/MPI; HPC-MW fills this gap as a middleware layer.]
16
3rd ACES WG mtg. June 2003.
Example of HPC Middleware Simulation Methods include Some Typical Processes
I/O
Matrix Assemble
Linear Solver
Visualization
FEM
17
3rd ACES WG mtg. June 2003.
Example of HPC Middleware Individual Process can be optimized for Various Types of MPP Architectures
I/O
Matrix Assemble
Linear Solver
Visualization
FEM MPP-A
MPP-B
MPP-C
18
3rd ACES WG mtg. June 2003.
Example of HPC Middleware: Library-Type HPC-MW for Existing HW
FEM code developed on PC
19
3rd ACES WG mtg. June 2003.
Example of HPC Middleware: Library-Type HPC-MW for Existing HW
[Diagram: the FEM code developed on a PC sits on top of hardware-specific HPC-MW libraries (I/O, Matrix Assemble, Linear Solver, Vis.) for the Xeon cluster, the Earth Simulator, and the Hitachi SR8000.]
20
3rd ACES WG mtg. June 2003.
Example of HPC Middleware: Library-Type HPC-MW for Existing HW
[Diagram: the FEM code developed on a PC calls common interfaces (I/F for I/O, Mat.Ass., Solvers, Vis.), which are bound to the HPC-MW library for the Xeon cluster, the Earth Simulator, or the Hitachi SR8000.]
21
3rd ACES WG mtg. June 2003.
Example of HPC Middleware: Parallel FEM Code Optimized for the ES
[Diagram: as above, linked against the HPC-MW library for the Earth Simulator.]
22
3rd ACES WG mtg. June 2003.
Example of HPC Middleware: Parallel FEM Code Optimized for Intel Xeon
[Diagram: as above, linked against the HPC-MW library for the Xeon cluster.]
23
3rd ACES WG mtg. June 2003.
Example of HPC Middleware: Parallel FEM Code Optimized for the SR8000
[Diagram: as above, linked against the HPC-MW library for the Hitachi SR8000.]
24
3rd ACES WG mtg. June 2003.
Background Basic Strategy Development On-Going Works/Collaborations XML-based System for User-Defined Data Structure Future Works Examples
25
3rd ACES WG mtg. June 2003.
What is HPC Middleware (HPC-MW)? It is based on the "plug-in" idea in GeoFEM.
26
3rd ACES WG mtg. June 2003.
System Config. of GeoFEM
[Diagram: on the GeoFEM platform, utilities (a partitioner turning a one-domain mesh into a partitioned mesh) and pluggable analysis modules (structural analysis: static linear, dynamic linear, contact; fluid; wave) run on the PEs, connected through the Solver I/F, Comm. I/F and Vis. I/F to the equation solvers, parallel I/O and visualizer (visualization data viewed with GPPView).]
http://geofem.tokyo.rist.or.jp/
27
3rd ACES WG mtg. June 2003.
Local Data Structure: Node-based Partitioning (internal nodes, elements, external nodes)
[Figure: a 5x5-node example mesh partitioned into four sub-domains PE#0-PE#3; each PE stores its internal nodes and elements plus the external (halo) nodes borrowed from neighbouring sub-domains, with its own local node numbering.]
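In code, node-based partitioning boils down to a small set of per-PE arrays. The following is a minimal sketch of such a local data set; the field names follow the hpcmw_local_mesh type that appears later in this talk, everything else is illustrative:

module local_mesh_sketch
  implicit none
  ! a node-based partitioned local data set: internal nodes owned by this PE,
  ! external (halo) nodes copied from neighbours, and import/export tables
  type local_mesh
    integer :: n_internal                  ! nodes owned by this PE
    integer :: n_node                      ! internal + external nodes
    integer :: n_elem                      ! local elements
    integer :: n_neighbor_pe               ! number of neighbouring PEs
    integer, pointer :: neighbor_pe(:)     ! ranks of neighbouring PEs
    integer, pointer :: import_index(:)    ! per-neighbour offsets into import_node
    integer, pointer :: import_node(:)     ! external nodes received from neighbours
    integer, pointer :: export_index(:)    ! per-neighbour offsets into export_node
    integer, pointer :: export_node(:)     ! internal nodes sent to neighbours
  end type local_mesh
end module local_mesh_sketch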
28
3rd ACES WG mtg. June 2003.
What can we do with HPC-MW? We can easily develop optimized/parallel code on HPC-MW from a user's code developed on a PC.
- Library-type: the most fundamental approach; an optimized library for each individual architecture
- Compiler-type: next-generation architectures, irregular data
- Network-type: GRID environment (heterogeneous), large-scale computing ("virtual petaflops"), coupling
29
3rd ACES WG mtg. June 2003.
HPC-MW Procedure
[Diagram: the initial entire mesh data plus control data for mesh generation and partitioning are processed by the utility software for parallel mesh generation & partitioning into distributed local mesh data. The user's FEM source code is either linked against the library-type HPC-MW, or fed (together with H/W data or network H/W data) to the compiler-type / network-type HPC-MW generators, which generate source code (e.g. linear solver source code) from the HPC-MW prototype. The result is a parallel FEM executable (on MPICH / MPICH-G) that reads the FEM control data and the distributed local mesh data and produces distributed local results plus patch and image files for visualization.]
30
3rd ACES WG mtg. June 2003.
Library-Type HPC-MW: Parallel FEM Code Optimized for the ES
[Diagram: the FEM code developed on a PC calls the I/F for I/O, Mat.Ass., Solvers and Vis., and is linked against the HPC-MW library for the Earth Simulator; libraries for the Xeon cluster and the Hitachi SR8000 are used in exactly the same way.]
31
3rd ACES WG mtg. June 2003.
What can we do with HPC-MW? We can easily develop optimized/parallel code on HPC-MW from a user's code developed on a PC.
- Library-type: the most fundamental approach; an optimized library for each individual architecture
- Compiler-type: next-generation architectures, irregular data
- Network-type: GRID environment, large-scale computing ("virtual petaflops"), coupling
32
3rd ACES WG mtg. June 2003.
Compiler-Type HPC-MW
[Diagram: the typical FEM processes (I/O, Matrix Assemble, Linear Solver, Visualization) are regenerated for MPP-A, MPP-B and MPP-C by a special compiler that uses data for the analysis model and parameters of the H/W.]
Optimized code is generated by a special language/compiler based on analysis data (cache blocking etc.) and H/W information.
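The generated code itself is not shown in this talk. As a hand-written illustration only, the kind of transformation meant by "cache blocking" looks like the sketch below, where the block size nb would come from the hardware description rather than from the user:

subroutine matvec_blocked(n, nb, a, x, b)
  implicit none
  integer, intent(in) :: n, nb
  real(kind=8), intent(in)    :: a(n,n), b(n)
  real(kind=8), intent(inout) :: x(n)
  integer :: ib, jb, i, j
  do jb= 1, n, nb                          ! blocked loops keep an nb*nb tile
    do ib= 1, n, nb                        ! of "a" resident in cache
      do j= jb, min(jb+nb-1, n)
        do i= ib, min(ib+nb-1, n)
          x(i)= x(i) + a(i,j)*b(j)         ! x = x + A*b, computed tile by tile
        enddo
      enddo
    enddo
  enddo
end subroutine matvec_blocked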
33
3rd ACES WG mtg. June 2003.
What can we do with HPC-MW? We can easily develop optimized/parallel code on HPC-MW from a user's code developed on a PC.
- Library-type: the most fundamental approach; an optimized library for each individual architecture
- Compiler-type: next-generation architectures, irregular data
- Network-type: GRID environment (heterogeneous), large-scale computing ("virtual petaflops"), coupling
34
3rd ACES WG mtg. June 2003.
Network-Type HPC-MW: Heterogeneous Environment, a "Virtual" Supercomputer
[Diagram: one FEM analysis model space spanning multiple machines.]
35
3rd ACES WG mtg. June 2003.
What is new, what is nice?
- Application-oriented (limited to FEM at this stage): various types of capabilities for parallel FEM are supported - NOT just a library.
- Optimized for individual hardware: single-PE performance and parallel performance.
- Similar projects: Cactus, GridLab (on Cactus).
36
3rd ACES WG mtg. June 2003.
Background Basic Strategy Development On-Going Works/Collaborations XML-based System for User-Defined Data Structure Future Works Examples
37
3rd ACES WG mtg. June 2003.
Schedule
[Timeline, FY.2002-FY.2006: basic design and prototype; library-type HPC-MW (scalar, then vector); compiler-type and network-type HPC-MW; FEM codes on HPC-MW; public releases along the way.]
38
3rd ACES WG mtg. June 2003.
System for Development
HPC-MW
Public Users
HW Vendors
FEM Code on HPC-MW
Infrastructure Feed-backAppl. Info.
HW Info.
Library-Type Compiler-Type Network-Type
I/O Vis. Solvers Coupler
AMR DLB Mat.Ass.
Solid Fluid Thermal
Public Release
Feed-back, Comments
39
3rd ACES WG mtg. June 2003.
FY.2003
- Library-type HPC-MW: FORTRAN90 / C, PC-cluster version, public release
  - Sept. 2003: prototype released
  - March 2004: full version for PC clusters
- Demonstration of the network-type HPC-MW at SC2003, Phoenix, AZ, Nov. 2003
- Evaluation by FEM codes
40
3rd ACES WG mtg. June 2003.
Library-Type HPC-MW (mesh generation is not considered)
- Parallel I/O: I/F for commercial codes (NASTRAN etc.)
- Adaptive mesh refinement (AMR)
- Dynamic load balancing using pMETIS (DLB)
- Parallel visualization
- Linear solvers (GeoFEM + AMG, SAI)
- FEM operations (connectivity, matrix assembling)
- Coupling I/F
- Utility for mesh partitioning
- On-line tutorial
41
3rd ACES WG mtg. June 2003.
AMR+DLB
42
3rd ACES WG mtg. June 2003.
Parallel Visualization
- Scalar field: surface rendering, interval volume fitting, volume rendering
- Vector field: streamlines, particle tracking, topological map, LIC, volume rendering
- Tensor field: hyperstreamlines
(extension of functions, of dimensions, and of data types)
43
3rd ACES WG mtg. June 2003.
PMR (Parallel Mesh Relocator)
- Data size is potentially very large in parallel computing; handling the entire mesh on one PE is impossible.
- Parallel mesh generation and visualization are difficult because they require global information.
- Approach: adaptive mesh refinement (AMR) and a grid hierarchy.
44
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Initial Mesh
Prepare Initial Mesh with size as large as single PE can handle.
45
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Partition LocalData
LocalData
LocalData
LocalData
Initial Mesh
Partition the Initial Mesh into Local Data. potentially very coarse
46
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Partition LocalData
LocalData
LocalData
LocalData
Initial Mesh
PMR
Parallel Mesh Relocation (PMR) by Local Refinement on Each PEs.
47
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Partition LocalData
LocalData
LocalData
LocalData
Initial Mesh
Refine
Refine
Refine
Refine
PMR
Parallel Mesh Relocation (PMR) by Local Refinement on Each PEs.
48
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Partition LocalData
LocalData
LocalData
LocalData
Initial Mesh
Refine
Refine
Refine
Refine
Refine
Refine
Refine
Refine
PMR
Parallel Mesh Relocation (PMR) by Local Refinement on Each PEs.
49
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Partition LocalData
LocalData
LocalData
LocalData
Initial Mesh
Refine
Refine
Refine
Refine
Refine
Refine
Refine
Refine
PMR
Hierarchical Refinement History can be utilized for visualization & multigrid
50
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Mapping results reversely from the local fine mesh back to the initial coarse mesh. (Initial Mesh)
Hierarchical Refinement History can be utilized for visualization & multigrid
51
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation using AMR
Mapping results reversely from the local fine mesh back to the initial coarse mesh. (Initial Mesh)
Visualization is then possible on the initial coarse mesh using a single PE; various existing software packages can be used.
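A self-contained toy sketch (1-D, purely illustrative) of keeping the refinement history and mapping fine-mesh results back onto the initial coarse mesh:

program refine_history_sketch
  implicit none
  integer, parameter :: ncoarse = 4, nref = 2          ! 2 children per element
  integer :: parent(ncoarse*nref), ie, ic
  real(kind=8) :: fine(ncoarse*nref), coarse(ncoarse)
  ! refinement: record which coarse element each fine element came from
  do ic= 1, ncoarse
    do ie= 1, nref
      parent((ic-1)*nref+ie)= ic
    enddo
  enddo
  ! some fine-mesh result (here simply the fine element index)
  do ie= 1, ncoarse*nref
    fine(ie)= dble(ie)
  enddo
  ! reverse mapping: average the children onto their coarse parent element
  coarse= 0.d0
  do ie= 1, ncoarse*nref
    coarse(parent(ie))= coarse(parent(ie)) + fine(ie)/dble(nref)
  enddo
  write(*,'(4f8.2)') coarse
end program refine_history_sketch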
52
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation and Visualization using AMR: Summary
Initial Coarse Mesh(Single)
53
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation and Visualization using AMR: Summary
PartitionPMR
Initial Coarse Mesh(Single)
Local Fine Mesh(Distributed)
54
3rd ACES WG mtg. June 2003.
Parallel Mesh Generation and Visualization using AMR: Summary
PartitionPMR
Initial Coarse Mesh(Single)
Local Fine Mesh(Distributed)
Initial Coarse Mesh
Reverse Mapping
Second-to-the-Coarsest Mesh
55
3rd ACES WG mtg. June 2003.
Example (1/3): Initial Entire Mesh
nodes: 1,308, elements: 795
56
3rd ACES WG mtg. June 2003.
Example (2/3): Initial Partitioning

PE | nodes | elements
 0 |  224  |  100
 1 |  184  |   99
 2 |  188  |  100
 3 |  207  |  100
 4 |  203  |   99
 5 |  222  |   99
 6 |  202  |   99
 7 |  194  |   99
57
3rd ACES WG mtg. June 2003.
Example (3/3): Refinement
58
3rd ACES WG mtg. June 2003.
Parallel Linear Solvers
[Plots: GFLOPS rate and parallel work ratio (%) vs. number of nodes (up to 192); ●: Flat MPI, ○: Hybrid.]
On the Earth Simulator: 176 nodes, 3.8 TFLOPS.
59
3rd ACES WG mtg. June 2003.
Coupler
[Diagram: two coupling configurations for Fluid + Structure]
- Coupler = MAIN (MpCCI style): the coupler drives both the Fluid and the Structure codes.
- Fluid = MAIN: Structure is called from Fluid as a subroutine through the coupler.
60
3rd ACES WG mtg. June 2003.
Coupler: Fluid = MAIN; Structure is called from Fluid as a subroutine through the coupler.
module hpcmw_mesh
  type hpcmw_local_mesh
    ...
  end type hpcmw_local_mesh
end module hpcmw_mesh

FLUID:
program fluid
  use hpcmw_mesh
  type (hpcmw_local_mesh) :: local_mesh_b
  ...
  call hpcmw_couple_PtoS_put
  call structure_main
  call hpcmw_couple_StoP_get
  ...
end program fluid

STRUCTURE:
subroutine structure_main
  use hpcmw_mesh
  type (hpcmw_local_mesh) :: local_mesh_b
  call hpcmw_couple_PtoS_get
  ...
  call hpcmw_couple_StoP_put
end subroutine structure_main
61
3rd ACES WG mtg. June 2003.
Note: Fluid and Structure read different mesh files.
62
3rd ACES WG mtg. June 2003.
Communication between Fluid and Structure takes place inside the hpcmw_couple put/get calls.
63
3rd ACES WG mtg. June 2003.
FEM Codes on HPC-MW (primary target: evaluation of HPC-MW itself !)
- Solid mechanics: elastic, inelastic; static, dynamic; various types of elements and boundary conditions
- Eigenvalue analysis
- Compressible/incompressible CFD
- Heat transfer with radiation & phase change
64
3rd ACES WG mtg. June 2003.
Release in Late September 2003 Library-Type HPC-MW
Parallel I/O Original Data Structure, GeoFEM, ABAQUS
Parallel Visualization PVR , PSR
Parallel Linear Solvers Preconditioned Iterative Solvers (ILU, SAI)
Utility for Mesh Partitioning Serial Partitioner , Viewer
On-line Tutorial
FEM Code for Linear-Elastic Simulation (prototype)
65
3rd ACES WG mtg. June 2003.
Technical Issues
- Common data structure: flexibility vs. efficiency. Our data structure is efficient ... but how do we keep the user's original data structure?
- Interface to other toolkits: PETSc (ANL), Aztec/Trilinos (Sandia), ACTS Toolkit (LBNL/DOE), DRAMA (NEC Europe), Zoltan (Sandia)
66
3rd ACES WG mtg. June 2003.
Background Basic Strategy Development On-Going Works/Collaborations XML-based System for User-Defined Data Structure Future Works Examples
67
3rd ACES WG mtg. June 2003.
Public Use/Commercialization
Very important issues in this project.
Industry Education Research
Commercial Codes
68
3rd ACES WG mtg. June 2003.
Strategy for Public Use
- General-purpose parallel FEM code
- Environment for development (1): for legacy codes - "parallelization", F77 -> F90, COMMON -> module, parallel data structure, linear solvers, visualization
- Environment for development (2): from scratch
- Education
- Various types of collaboration
69
3rd ACES WG mtg. June 2003.
On-going Collaboration: Parallelization of Legacy Codes
- CFD group in the FSIS project
- Mitsubishi Material: groundwater flow
- JNC (Japan Nuclear Cycle Development Inst.): HLW
- others: research, education
Part of HPC-MW
- Coupling interface for pump simulation
- Parallel visualization: Takashi Furumura (ERI/U.Tokyo)
70
3rd ACES WG mtg. June 2003.
On-going Collaboration Environment for Development
ACcESS ( Australian Computational Earth Systems Simulator ) Group
Research Collaboration ITBL/JAERI DOE ACTS Toolkit ( Lawrence Berkeley National Laboratory ) NEC Europe ( Dynamic Load Balancing ) ACES/iSERVO GRID
71
3rd ACES WG mtg. June 2003.
(Fluid+Vibration) Simulation for a Boiler Pump (Hitachi): Suppression of Noise
- Collaboration in the FSIS project: Fluid, Structure, PSE; HPC-MW provides the coupling interface
- Experiment and measurement: accelerometers on the surface
72
3rd ACES WG mtg. June 2003.
(Fluid+Vibration) Simulation for a Boiler Pump (Hitachi): Suppression of Noise
- Collaboration in the FSIS project: Fluid, Structure, PSE; HPC-MW: coupling interface
- Experiment and measurement: accelerometers on the surface
Fluid
Structure
Coupler
73
3rd ACES WG mtg. June 2003.
Parallelization of Legacy Codes: many cases !! Under investigation through real collaborations.
- Optimum procedure: documents, I/F, work assignment; FEM is well suited to this type of procedure.
- Works required: introduce the new matrix storage manner for the parallel iterative solvers in HPC-MW; add subroutine calls for using the parallel visualization functions in HPC-MW.
- Changing the data structure is a big issue !! (flexibility vs. efficiency; problem-specific vs. general)
74
3rd ACES WG mtg. June 2003.
Element Connectivity

In HPC-MW:

do icel= 1, ICELTOT
  iS= elem_index(icel-1)
  in1= elem_ptr(iS+1)
  in2= elem_ptr(iS+2)
  in3= elem_ptr(iS+3)
enddo

Sometimes (in user codes) ...

do icel= 1, ICELTOT
  in1= elem_node(1,icel)
  in2= elem_node(2,icel)
  in3= elem_node(3,icel)
enddo
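For illustration only (assuming fixed 4-node elements; this helper is not part of HPC-MW itself), building the compressed elem_index/elem_ptr arrays from a fixed-width elem_node(:,:) table looks like this:

subroutine build_compressed_connectivity(ICELTOT, nn, elem_node, elem_index, elem_ptr)
  implicit none
  integer, intent(in)  :: ICELTOT, nn
  integer, intent(in)  :: elem_node(nn,ICELTOT)
  integer, intent(out) :: elem_index(0:ICELTOT), elem_ptr(nn*ICELTOT)
  integer :: icel, k, iS
  elem_index(0)= 0
  do icel= 1, ICELTOT
    elem_index(icel)= elem_index(icel-1) + nn   ! offsets (every element has nn nodes here)
    iS= elem_index(icel-1)
    do k= 1, nn
      elem_ptr(iS+k)= elem_node(k,icel)         ! node lists packed contiguously
    enddo
  enddo
end subroutine build_compressed_connectivity

The compressed form also accommodates meshes that mix element types, since each element can contribute a different number of entries to elem_ptr.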
75
3rd ACES WG mtg. June 2003.
Parallelization of Legacy Codes
[Diagram: the original serial FEM code (Input, Mat. Conn., Mat. Assem., Linear Solver, Visualization, Output) becomes a parallel code in which the communication, the linear solver and the visualization come from the optimized HPC-MW, while the remaining components stay close to the original: not optimum, but easy to do.]
76
3rd ACES WG mtg. June 2003.
Works with the CFD group in FSIS
- Original CFD code: 3D finite-volume, serial, Fortran90
- Strategy: use the Poisson solver in HPC-MW; keep the ORIGINAL data structure (we, HPC-MW, developed a new partitioner); the CFD people do matrix assembling using HPC-MW's format, since the matrix assembling part is intrinsically parallel
- Schedule: April 3, 2003: 1st meeting, overview; April 24, 2003: 2nd meeting, decision of strategy; May 2, 2003: show I/F for the Poisson solver; May 12, 2003: new partitioner completed
77
3rd ACES WG mtg. June 2003.
CFD Code: 2 types of comm.(1) Inter-Domain Communication
78
3rd ACES WG mtg. June 2003.
CFD Code: 2 types of comm.(1) Inter-Domain Communication
79
3rd ACES WG mtg. June 2003.
CFD Code: 2 types of comm.(2) Wall-Law Communication
80
3rd ACES WG mtg. June 2003.
CFD Code: 2 types of comm.(2) Wall-Law Communication
81
3rd ACES WG mtg. June 2003.
(Common) Data Structure
Users like to keep their original data structure, but they want to parallelize the code. Compromise at this stage:
- keep the original data structure;
- we (I, more precisely) develop partitioners for individual users (a one-day job once I understand the data structure).
82
3rd ACES WG mtg. June 2003.
Domain Partitioning Utility for the User's Original Data Structure - very important for the parallelization of legacy codes
- Functions: input: initial entire mesh in the ORIGINAL format; output: distributed local meshes in the ORIGINAL format + communication information (separate file)
- Merit: original I/O routines can be utilized; operations for communication are hidden
- Technical issues: basically individual support (development) ... BIG WORK; various data structures for individual user codes; problem-specific information
83
3rd ACES WG mtg. June 2003.
Mesh Partitioning for the Original Data Structure
[Diagram: the partitioner reads the initial entire data (cntl. + mesh) and writes, for each PE, local distributed data (common cntl. + distributed mesh) plus a separate comm. info. file.]
• Original I/O is used for the local distributed data.
• The I/F for the comm. info. is provided by HPC-MW.
84
3rd ACES WG mtg. June 2003.
Distributed Mesh + Comm. Info.
[Diagram: the local distributed data (common cntl. + distributed mesh) are read by the user's original input subroutines; the comm. info. file is read by HPCMW_COMM_INIT.]
This part is hidden except "CALL HPCMW_COMM_INIT".
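From the user's side the calling pattern is then roughly as in the sketch below; the argument list of HPCMW_COMM_INIT is not shown in this talk, and the contained routines are empty placeholders standing in for the real ones:

program legacy_fem_parallelized
  implicit none
  call read_my_local_mesh    ! the user's ORIGINAL input routine, unchanged,
                             ! now reads the distributed local file of this PE
  call HPCMW_COMM_INIT       ! the one added call: HPC-MW reads the comm. info.
contains
  subroutine read_my_local_mesh          ! placeholder for the user's own reader
  end subroutine read_my_local_mesh
  subroutine HPCMW_COMM_INIT             ! placeholder; the real routine builds
  end subroutine HPCMW_COMM_INIT         ! the import/export tables internally
end program legacy_fem_parallelized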
85
3rd ACES WG mtg. June 2003.
Background Basic Strategy Development On-Going Works/Collaborations XML-based System for User-Defined Data Structure Future Works Examples
86
3rd ACES WG mtg. June 2003.
XML-based I/O Func. Generator K.Sakane (RIST) 8th-JSCES Conf., 2003.
User's original data structure can be described by certain XML-based definition information.
Generate I/O subroutines (C, F90) for partitioning utilities according to XML-based definition information.
Substitute existing I/O subroutines for partitioning utilities in HPC-MW with generated codes.
87
3rd ACES WG mtg. June 2003.
XML-based I/O Func. Generator K.Sakane (RIST) 8th-JSCES Conf., 2003.
The partitioning utility reads the initial entire mesh in HPC-MW format and writes distributed local mesh files in HPC-MW format together with communication information.
Utility forPartitioning
I/O for Mesh DataHPCMW
initial mesh
HPCMWlocal mesh comm
HPCMWlocal mesh comm
HPCMWlocal mesh comm
HPCMWlocal mesh comm
This is ideal for us ... but not all users are necessarily happy with that ...
88
3rd ACES WG mtg. June 2003.
XML-based I/O Func. Generator K.Sakane (RIST) 8th-JSCES Conf., 2003.
Generate I/O subroutines (C, F90) for partitioning utilities according to XML-based definition information.
Utility forPartitioning
I/O for Mesh Data
XML-based SystemI/O Func. Generator
for Orig. Data Structure
Definition Datain XML
I/O for Mesh DataUser-Def. Data
Structure
89
3rd ACES WG mtg. June 2003.
XML-based I/O Func. Generator K.Sakane (RIST) 8th-JSCES Conf., 2003.
Substitute existing I/O subroutines for partitioning utilities in HPC-MW with generated codes.
Utility forPartitioning
I/O for Mesh Data
XML-based SystemI/O Func. Generator
for Orig. Data Structure
Definition Datain XML
I/O for Mesh DataUser-Def. Data
Structure
90
3rd ACES WG mtg. June 2003.
XML-based I/O Func. Generator K.Sakane (RIST) 8th-JSCES Conf., 2003.
Substitute existing I/O subroutines for partitioning utilities in HPC-MW with generated codes.
Utility forPartitioning
I/O for Mesh DataUser-Def. Data
Structure
User Definedinitial mesh
comm
comm
comm
comm
User Definedlocal mesh
HPCMWlocal mesh
HPCMWlocal mesh
HPCMWlocal mesh
91
3rd ACES WG mtg. June 2003.
TAGS in the Definition Info. File

usermesh   starting point
parameter  parameter definition
define     sub-structure definition
token      smallest definition unit (number, label etc.)
mesh       entire structure definition
ref        reference of a sub-structure
92
3rd ACES WG mtg. June 2003.
Example

NODE 1 0. 0. 0.
NODE 2 1. 0. 0.
NODE 3 0. 1. 0.
NODE 4 0. 0. 1.
ELEMENT 1 TETRA 1 2 3 4

(a tetrahedron and its connectivity)
93
3rd ACES WG mtg. June 2003.
Example (ABAQUS, NASTRAN)

** ABAQUS format
*NODE
1, 0., 0., 0.
2, 1., 0., 0.
3, 0., 1., 0.
4, 0., 0., 1.
*ELEMENT, TYPE=C3D4
1 1 2 3 4

$ NASTRAN format
GRID 1 0 0. 0. 0. 0
GRID 2 0 1. 0. 0. 0
GRID 3 0 0. 1. 0. 0
GRID 4 0 0. 0. 1. 0
CTETRA 1 1 1 2 3 4
94
3rd ACES WG mtg. June 2003.
Ex.: Definition Info. File (1/2)

<?xml version="1.0" encoding="EUC-JP"?>
<usermesh>

Parameters:
<parameter name="TETRA">4</parameter>
<parameter name="PENTA">5</parameter>
<parameter name="HEXA">8</parameter>

Sub-structure (the user can define these):
<define name="mynode">
  <token means="node.start">NODE</token>
  <token means="node.id"/>
  <token means="node.x"/>
  <token means="node.y"/>
  <token means="node.z"/>
  <token means="node.end"/>
</define>

# User format:
NODE 1 0. 0. 0.
(tokens: node.start, node.id, node.x, node.y, node.z)

!! HPC-MW format:
!NODE
1, 0.0, 0.0, 0.0
95
3rd ACES WG mtg. June 2003.
Ex.: Definition Info. File (2/2)

Sub-structure:
<define name="myelement">
  <token means="element.start">ELEMENT</token>
  <token means="element.id"/>
  <token means="element.type"/>
  <token means="element.node" times="$element.type"/>
  <token means="element.end"/>
</define>

Entire structure:
<mesh>
  <ref name="mynode"/>
  <ref name="myelement"/>
</mesh>

</usermesh>

# User format:
ELEMENT 1 TETRA 1 2 3 4
(tokens: element.start, element.id, element.type, element.node x 4; the number of node ids corresponds to the parameter defined for the value of "element.type")

!! HPC-MW format:
!ELEMENT, TYPE=311
1, 1, 2, 3, 4
96
3rd ACES WG mtg. June 2003.
Background Basic Strategy Development On-Going Works/Collaborations XML-based System for User-Defined Data Structure Future Works Examples
97
3rd ACES WG mtg. June 2003.
Further Study/Works
- Develop HPC-MW
- Collaboration
- Simplified I/F for non-experts
- Interaction is important !!: procedure for collaboration
98
3rd ACES WG mtg. June 2003.
System for Development
HPC-MW
Public Users
HW Vendors
FEM Code on HPC-MW
Infrastructure Feed-backAppl. Info.
HW Info.
Library-Type Compiler-Type Network-Type
I/O Vis. Solvers Coupler
AMR DLB Mat.Ass.
Solid Fluid Thermal
Public Release
Feed-back, Comments
99
3rd ACES WG mtg. June 2003.
Further Study/Works
- Develop HPC-MW: collaboration; simplified I/F for non-experts; interaction is important !! (procedure for collaboration); extension to DEM etc.
- Promotion of public use: parallel FEM applications; parallelization of legacy codes; environment for development
100
3rd ACES WG mtg. June 2003.
Current Remarks: if you want to parallelize your legacy code on a PC cluster ...
- Keep your own data structure; a customized partitioner will be provided (or automatically generated by the XML system).
- Rewrite your matrix assembling part.
- Introduce the linear solvers and visualization in HPC-MW.
Input
Mat. Conn.
Mat. Assem.
Linear Solver
Visualization
Output
Comm.
User’s Original HPC-MW
101
3rd ACES WG mtg. June 2003.
Current Remarks (cont.): if you want to optimize your legacy code on the Earth Simulator ...
- Use HPC-MW's data structure; anyway, you have to rewrite your code for the ES.
- Utilize all components of HPC-MW.
Input
Mat. Conn.
Mat. Assem.
Linear Solver
Visualization
Output
Comm.
User’s Original HPC-MW
102
3rd ACES WG mtg. June 2003.
Some Examples
103
3rd ACES WG mtg. June 2003.
Simple Interface: Communication (GeoFEM's original I/F)

call SOLVER_SEND_RECV_3                                                 &
 &   ( NP, NEIBPETOT, NEIBPE, STACK_IMPORT, NOD_IMPORT,                 &
 &     STACK_EXPORT, NOD_EXPORT, WS, WR, WW(1,ZP), SOLVER_COMM,         &
 &     my_rank)

module solver_SR_3
contains
  subroutine SOLVER_SEND_RECV_3                                         &
 &   ( N, NEIBPETOT, NEIBPE, STACK_IMPORT, NOD_IMPORT,                  &
 &     STACK_EXPORT, NOD_EXPORT,                                        &
 &     WS, WR, X, SOLVER_COMM, my_rank)
  implicit REAL*8 (A-H,O-Z)
  include 'mpif.h'
  include 'precision.inc'

  integer(kind=kint ), intent(in) :: N
  integer(kind=kint ), intent(in) :: NEIBPETOT
  integer(kind=kint ), pointer :: NEIBPE      (:)
  integer(kind=kint ), pointer :: STACK_IMPORT(:)
  integer(kind=kint ), pointer :: NOD_IMPORT  (:)
  integer(kind=kint ), pointer :: STACK_EXPORT(:)
  integer(kind=kint ), pointer :: NOD_EXPORT  (:)
  real   (kind=kreal), dimension(3*N), intent(inout) :: WS
  real   (kind=kreal), dimension(3*N), intent(inout) :: WR
  real   (kind=kreal), dimension(3*N), intent(inout) :: X
  integer, intent(in) :: SOLVER_COMM
  integer, intent(in) :: my_rank
  ...
104
3rd ACES WG mtg. June 2003.
105
3rd ACES WG mtg. June 2003.
Simple Interface: Communication
All of the arguments of the original interface can be obtained from the hpcmw_local_mesh type:

use hpcmw_util
use dynamic_grid
use dynamic_cntl
type (hpcmw_local_mesh) :: local_mesh

N      = local_mesh%n_internal
NP     = local_mesh%n_node
ICELTOT= local_mesh%n_elem

NEIBPETOT= local_mesh%n_neighbor_pe
NEIBPE   => local_mesh%neighbor_pe

STACK_IMPORT=> local_mesh%import_index
NOD_IMPORT  => local_mesh%import_node
STACK_EXPORT=> local_mesh%export_index
NOD_EXPORT  => local_mesh%export_node
106
3rd ACES WG mtg. June 2003.
Simple Interface: Communication (HPC-MW)

use hpcmw_util
type (hpcmw_local_mesh) :: local_mesh
...
call SOLVER_SEND_RECV_3 ( local_mesh, WW(1,ZP))
...

module solver_SR_3
contains
  subroutine SOLVER_SEND_RECV_3 (local_mesh, X)
  use hpcmw_util
  type (hpcmw_local_mesh) :: local_mesh
  real(kind=kreal), dimension(:), allocatable, save :: WS, WR
  ...
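GeoFEM's implementation is not reproduced in these slides. Generically, a SEND_RECV routine of this kind packs the export nodes, exchanges them with each neighbouring PE, and unpacks the received values into the external (import) nodes; the following is a rough MPI-based sketch of that pattern, not the actual library code:

subroutine sketch_send_recv(NP, NEIBPETOT, NEIBPE,                      &
 &                          STACK_IMPORT, NOD_IMPORT,                   &
 &                          STACK_EXPORT, NOD_EXPORT, X, COMM)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: NP, NEIBPETOT, COMM
  integer, intent(in) :: NEIBPE(NEIBPETOT)
  integer, intent(in) :: STACK_IMPORT(0:NEIBPETOT), NOD_IMPORT(*)
  integer, intent(in) :: STACK_EXPORT(0:NEIBPETOT), NOD_EXPORT(*)
  real(kind=8), intent(inout) :: X(NP)
  real(kind=8), allocatable :: WS(:), WR(:)
  integer, allocatable :: req1(:), req2(:), sta1(:,:), sta2(:,:)
  integer :: neib, iS, iE, k, ierr

  allocate (WS(STACK_EXPORT(NEIBPETOT)), WR(STACK_IMPORT(NEIBPETOT)))
  allocate (req1(NEIBPETOT), req2(NEIBPETOT))
  allocate (sta1(MPI_STATUS_SIZE,NEIBPETOT), sta2(MPI_STATUS_SIZE,NEIBPETOT))

  do neib= 1, NEIBPETOT                      ! pack values at the export nodes
    iS= STACK_EXPORT(neib-1)
    iE= STACK_EXPORT(neib  )
    do k= iS+1, iE
      WS(k)= X(NOD_EXPORT(k))
    enddo
    call MPI_ISEND (WS(iS+1), iE-iS, MPI_DOUBLE_PRECISION,              &
 &                  NEIBPE(neib), 0, COMM, req1(neib), ierr)
  enddo

  do neib= 1, NEIBPETOT                      ! receive into the import buffer
    iS= STACK_IMPORT(neib-1)
    iE= STACK_IMPORT(neib  )
    call MPI_IRECV (WR(iS+1), iE-iS, MPI_DOUBLE_PRECISION,              &
 &                  NEIBPE(neib), 0, COMM, req2(neib), ierr)
  enddo

  call MPI_WAITALL (NEIBPETOT, req2, sta2, ierr)
  do neib= 1, NEIBPETOT                      ! unpack into the external nodes
    iS= STACK_IMPORT(neib-1)
    iE= STACK_IMPORT(neib  )
    do k= iS+1, iE
      X(NOD_IMPORT(k))= WR(k)
    enddo
  enddo
  call MPI_WAITALL (NEIBPETOT, req1, sta1, ierr)
  deallocate (WS, WR, req1, req2, sta1, sta2)
end subroutine sketch_send_recv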
107
3rd ACES WG mtg. June 2003.
Preliminary Study in FY.2002
- FEM procedure for a 3D elastic problem: parallel I/O, iterative linear solvers (ICCG, ILU-BiCGSTAB), FEM procedures (matrix connectivity/assembling)
- Linear solvers: SMP cluster / distributed memory; vector/scalar processors; CM-RCM / MC reordering
108
3rd ACES WG mtg. June 2003.
Method of Matrix Storage
- Scalar / distributed memory: CRS with natural ordering
- Scalar / SMP cluster: PDCRS/CM-RCM, PDCRS/MC
- Vector / distributed memory & SMP cluster: PDJDS/CM-RCM, PDJDS/MC
[Figure: PDJDS/CM-RCM; PDCRS/CM-RCM (short innermost loop); CRS (no re-ordering).]
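For reference, a plain CRS matrix-vector product (scalar case, natural ordering) is sketched below; on a vector PE its innermost loop runs only over the few nonzeros of one row, which is exactly why the (P)DJDS orderings above rearrange the data so that the innermost loop runs over long "jagged diagonals" instead:

subroutine crs_matvec(n, index, item, D, A, x, y)
  implicit none
  integer, intent(in) :: n
  integer, intent(in) :: index(0:n), item(*)
  real(kind=8), intent(in)  :: D(n), A(*), x(n)
  real(kind=8), intent(out) :: y(n)
  integer :: i, k
  do i= 1, n
    y(i)= D(i)*x(i)                        ! diagonal stored separately
    do k= index(i-1)+1, index(i)           ! short innermost loop (one row)
      y(i)= y(i) + A(k)*x(item(k))
    enddo
  enddo
end subroutine crs_matvec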
109
3rd ACES WG mtg. June 2003.
Main Program

FEM program developed by users:

program SOLVER33_TEST
  use solver33
  use hpcmw_all
  implicit REAL*8(A-H,O-Z)

  call HPCMW_INIT
  call INPUT_CNTL
  call INPUT_GRID

  call MAT_CON0
  call MAT_CON1

  call MAT_ASS_MAIN (valA,valB,valX)
  call MAT_ASS_BC

  call SOLVE33 (hpcmwIarray, hpcmwRarray)
  call HPCMW_FINALIZE
end program SOLVER33_TEST

• call the same subroutines
• the interfaces are the same
• NO MPI !!
The procedure inside each subroutine differs in the individual (hardware-specific) library.
110
3rd ACES WG mtg. June 2003.
What is HPCMW_INIT ?

for MPI:

subroutine HPCMW_INIT
  use hpcmw_all
  implicit REAL*8(A-H,O-Z)

  call MPI_INIT      (ierr)
  call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT, ierr)
  call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)
end subroutine HPCMW_INIT

for NO-MPI (SMP):

subroutine HPCMW_INIT
  use hpcmw_all
  implicit REAL*8(A-H,O-Z)
  ierr= 0
end subroutine HPCMW_INIT
111
3rd ACES WG mtg. June 2003.
Solve33 for SMP Cluster/Scalar

module SOLVER33
contains
  subroutine SOLVE33 (hpcmwIarray, hpcmwRarray)

  use hpcmw_solver_matrix
  use hpcmw_solver_cntl
  use hpcmw_fem_mesh

  use solver_CG_3_SMP_novec
  use solver_BiCGSTAB_3_SMP_novec

  implicit REAL*8 (A-H,O-Z)

  real(kind=kreal), dimension(3,3) :: ALU
  real(kind=kreal), dimension(3)   :: PW

  integer :: ERROR, ICFLAG
  character(len=char_length) :: BUF
  data ICFLAG/0/

  integer(kind=kint ), dimension(:) :: hpcmwIarray
  real   (kind=kreal), dimension(:) :: hpcmwRarray

!C
!C +------------+
!C | PARAMETERs |
!C +------------+
!C===
  ITER      = hpcmwIarray(1)
  METHOD    = hpcmwIarray(2)
  PRECOND   = hpcmwIarray(3)
  NSET      = hpcmwIarray(4)
  iterPREmax= hpcmwIarray(5)

  RESID     = hpcmwRarray(1)
  SIGMA_DIAG= hpcmwRarray(2)

  if (iterPREmax.lt.1) iterPREmax= 1
  if (iterPREmax.gt.4) iterPREmax= 4
!C===

!C
!C +-----------+
!C | BLOCK LUs |
!C +-----------+
!C===
  (skipped)
!C===

!C
!C +------------------+
!C | ITERATIVE solver |
!C +------------------+
!C===
  if (METHOD.eq.1) then
    call CG_3_SMP_novec                                                 &
 &     ( N, NP, NPL, NPU, PEsmpTOT, NHYP, IVECT, STACKmc,               &
 &       NEWtoOLD, OLDtoNEW,                                            &
 &       D, AL, indexL, itemL, AU, indexU, itemU,                       &
 &       B, X, ALUG, RESID, ITER, ERROR,                                &
 &       my_rank, NEIBPETOT, NEIBPE,                                    &
 &       NOD_STACK_IMPORT, NOD_IMPORT,                                  &
 &       NOD_STACK_EXPORT, NOD_EXPORT,                                  &
 &       SOLVER_COMM , PRECOND, iterPREmax)
  endif

  if (METHOD.eq.2) then
    call BiCGSTAB_3_SMP_novec                                           &
 &     ( N, NP, NPL, NPU, PEsmpTOT, NHYP, IVECT, STACKmc,               &
 &       NEWtoOLD, OLDtoNEW,                                            &
 &       D, AL, indexL, itemL, AU, indexU, itemU,                       &
 &       B, X, ALUG, RESID, ITER, ERROR,                                &
 &       my_rank, NEIBPETOT, NEIBPE,                                    &
 &       NOD_STACK_IMPORT, NOD_IMPORT,                                  &
 &       NOD_STACK_EXPORT, NOD_EXPORT,                                  &
 &       SOLVER_COMM , PRECOND, iterPREmax)
  endif

  ITERactual= ITER
!C===

  end subroutine SOLVE33
end module SOLVER33
112
3rd ACES WG mtg. June 2003.
Solve33 for SMP Cluster/Vector

module SOLVER33
contains
  subroutine SOLVE33 (hpcmwIarray, hpcmwRarray)

  use hpcmw_solver_matrix
  use hpcmw_solver_cntl
  use hpcmw_fem_mesh

  use solver_VCG33_DJDS_SMP
  use solver_VBiCGSTAB33_DJDS_SMP

  implicit REAL*8 (A-H,O-Z)

  real(kind=kreal), dimension(3,3) :: ALU
  real(kind=kreal), dimension(3)   :: PW

  integer :: ERROR, ICFLAG
  character(len=char_length) :: BUF
  data ICFLAG/0/

  integer(kind=kint ), dimension(:) :: hpcmwIarray
  real   (kind=kreal), dimension(:) :: hpcmwRarray

!C
!C +------------+
!C | PARAMETERs |
!C +------------+
!C===
  ITER      = hpcmwIarray(1)
  METHOD    = hpcmwIarray(2)
  PRECOND   = hpcmwIarray(3)
  NSET      = hpcmwIarray(4)
  iterPREmax= hpcmwIarray(5)

  RESID     = hpcmwRarray(1)
  SIGMA_DIAG= hpcmwRarray(2)

  if (iterPREmax.lt.1) iterPREmax= 1
  if (iterPREmax.gt.4) iterPREmax= 4
!C===

!C
!C +-----------+
!C | BLOCK LUs |
!C +-----------+
!C===
  (skipped)
!C===

!C
!C +------------------+
!C | ITERATIVE solver |
!C +------------------+
!C===
  if (METHOD.eq.1) then
    call VCG33_DJDS_SMP                                                 &
 &     ( N, NP, NLmax, NUmax, NPL, NPU, NHYP, PEsmpTOT,                 &
 &       STACKmcG, STACKmc, NLmaxHYP, NUmaxHYP, IVECT,                  &
 &       NEWtoOLD, OLDtoNEW_L, OLDtoNEW_U, NEWtoOLD_U, LtoU,            &
 &       D, PAL, indexL, itemL, PAU, indexU, itemU, B, X,               &
 &       ALUG_L, ALUG_U, RESID, ITER, ERROR, my_rank,                   &
 &       NEIBPETOT, NEIBPE, NOD_STACK_IMPORT, NOD_IMPORT,               &
 &       NOD_STACK_EXPORT, NOD_EXPORT,                                  &
 &       SOLVER_COMM, PRECOND, iterPREmax)
  endif

  if (METHOD.eq.2) then
    call VBiCGSTAB33_DJDS_SMP                                           &
 &     ( N, NP, NLmax, NUmax, NPL, NPU, NHYP, PEsmpTOT,                 &
 &       STACKmcG, STACKmc, NLmaxHYP, NUmaxHYP, IVECT,                  &
 &       NEWtoOLD, OLDtoNEW_L, OLDtoNEW_U, NEWtoOLD_U, LtoU,            &
 &       D, PAL, indexL, itemL, PAU, indexU, itemU, B, X,               &
 &       ALUG_L, ALUG_U, RESID, ITER, ERROR, my_rank,                   &
 &       NEIBPETOT, NEIBPE, NOD_STACK_IMPORT, NOD_IMPORT,               &
 &       NOD_STACK_EXPORT, NOD_EXPORT,                                  &
 &       SOLVER_COMM, PRECOND, iterPREmax)
  endif

  ITERactual= ITER
!C===

  end subroutine SOLVE33
end module SOLVER33
113
3rd ACES WG mtg. June 2003.
Mat.Ass. for SMP Cluster/Scalar

do ie= 1, 8
  ip = nodLOCAL(ie)
  if (ip.le.N) then
  do je= 1, 8
    jp = nodLOCAL(je)

    kk= 0
    if (jp.gt.ip) then
      iiS= indexU(ip-1) + 1
      iiE= indexU(ip  )
      do k= iiS, iiE
        if ( itemU(k).eq.jp ) then
          kk= k
          exit
        endif
      enddo
    endif

    if (jp.lt.ip) then
      iiS= indexL(ip-1) + 1
      iiE= indexL(ip  )
      do k= iiS, iiE
        if ( itemL(k).eq.jp) then
          kk= k
          exit
        endif
      enddo
    endif

    PNXi= 0.d0
    PNYi= 0.d0
    PNZi= 0.d0
    PNXj= 0.d0
    PNYj= 0.d0
    PNZj= 0.d0
    VOL= 0.d0

    do kpn= 1, 2
    do jpn= 1, 2
    do ipn= 1, 2
      coef= dabs(DETJ(ipn,jpn,kpn))*WEI(ipn)*WEI(jpn)*WEI(kpn)
      VOL= VOL + coef

      PNXi= PNX(ipn,jpn,kpn,ie)
      PNYi= PNY(ipn,jpn,kpn,ie)
      PNZi= PNZ(ipn,jpn,kpn,ie)
      PNXj= PNX(ipn,jpn,kpn,je)
      PNYj= PNY(ipn,jpn,kpn,je)
      PNZj= PNZ(ipn,jpn,kpn,je)

      a11= valX*(PNXi*PNXj+valB*(PNYi*PNYj+PNZi*PNZj))*coef
      a22= valX*(PNYi*PNYj+valB*(PNZi*PNZj+PNXi*PNXj))*coef
      a33= valX*(PNZi*PNZj+valB*(PNXi*PNXj+PNYi*PNYj))*coef
      a12= (valA*PNXi*PNYj + valB*PNXj*PNYi)*coef
      a13= (valA*PNXi*PNZj + valB*PNXj*PNZi)*coef
      ...

      if (jp.gt.ip) then
        PAU(9*kk-8)= PAU(9*kk-8) + a11
        ...
        PAU(9*kk  )= PAU(9*kk  ) + a33
      endif
      if (jp.lt.ip) then
        PAL(9*kk-8)= PAL(9*kk-8) + a11
        ...
        PAL(9*kk  )= PAL(9*kk  ) + a33
      endif
      if (jp.eq.ip) then
        D(9*ip-8)= D(9*ip-8) + a11
        ...
        D(9*ip  )= D(9*ip  ) + a33
      endif
    enddo
    enddo
    enddo
  enddo
  endif
enddo
enddo
114
3rd ACES WG mtg. June 2003.
Mat.Ass. for SMP Cluster/Vector

do ie= 1, 8
  ip = nodLOCAL(ie)
  if (ip.le.N) then
  do je= 1, 8
    jp = nodLOCAL(je)

    kk= 0
    if (jp.gt.ip) then
      ipU= OLDtoNEW_U(ip)
      jpU= OLDtoNEW_U(jp)
      kp= PEon(ipU)
      iv= COLORon(ipU)
      nn= ipU - STACKmc((iv-1)*PEsmpTOT+kp-1)
      do k= 1, NUmaxHYP(iv)
        iS= indexU(npUX1*(iv-1)+PEsmpTOT*(k-1)+kp-1) + nn
        if ( itemU(iS).eq.jpU) then
          kk= iS
          exit
        endif
      enddo
    endif

    if (jp.lt.ip) then
      ipL= OLDtoNEW_L(ip)
      jpL= OLDtoNEW_L(jp)
      kp= PEon(ipL)
      iv= COLORon(ipL)
      nn= ipL - STACKmc((iv-1)*PEsmpTOT+kp-1)
      do k= 1, NLmaxHYP(iv)
        iS= indexL(npLX1*(iv-1)+PEsmpTOT*(k-1)+kp-1) + nn
        if ( itemL(iS).eq.jpL) then
          kk= iS
          exit
        endif
      enddo
    endif

    PNXi= 0.d0
    PNYi= 0.d0
    PNZi= 0.d0
    PNXj= 0.d0
    PNYj= 0.d0
    PNZj= 0.d0
    VOL= 0.d0

    do kpn= 1, 2
    do jpn= 1, 2
    do ipn= 1, 2
      coef= dabs(DETJ(ipn,jpn,kpn))*WEI(ipn)*WEI(jpn)*WEI(kpn)
      VOL= VOL + coef

      PNXi= PNX(ipn,jpn,kpn,ie)
      PNYi= PNY(ipn,jpn,kpn,ie)
      PNZi= PNZ(ipn,jpn,kpn,ie)
      PNXj= PNX(ipn,jpn,kpn,je)
      PNYj= PNY(ipn,jpn,kpn,je)
      PNZj= PNZ(ipn,jpn,kpn,je)

      a11= valX*(PNXi*PNXj+valB*(PNYi*PNYj+PNZi*PNZj))*coef
      a22= valX*(PNYi*PNYj+valB*(PNZi*PNZj+PNXi*PNXj))*coef
      a33= valX*(PNZi*PNZj+valB*(PNXi*PNXj+PNYi*PNYj))*coef
      a12= (valA*PNXi*PNYj + valB*PNXj*PNYi)*coef
      a13= (valA*PNXi*PNZj + valB*PNXj*PNZi)*coef
      ...

      if (jp.gt.ip) then
        PAU(9*kk-8)= PAU(9*kk-8) + a11
        ...
        PAU(9*kk  )= PAU(9*kk  ) + a33
      endif
      if (jp.lt.ip) then
        PAL(9*kk-8)= PAL(9*kk-8) + a11
        ...
        PAL(9*kk  )= PAL(9*kk  ) + a33
      endif
      if (jp.eq.ip) then
        D(9*ip-8)= D(9*ip-8) + a11
        ...
        D(9*ip  )= D(9*ip  ) + a33
      endif
    enddo
    enddo
    enddo
  enddo
  endif
enddo
enddo
115
3rd ACES WG mtg. June 2003.
Hardware
- Earth Simulator: SMP cluster, 8 PEs/node, vector processors
- Hitachi SR8000/128: SMP cluster, 8 PEs/node, pseudo-vector
- Xeon 2.8 GHz cluster: 2 PEs/node, Myrinet, flat MPI only
- Hitachi SR2201: pseudo-vector, flat MPI
116
3rd ACES WG mtg. June 2003.
Simple 3D Cubic Model
[Figure: a cube discretized into (Nx-1) x (Ny-1) x (Nz-1) elements; boundary conditions Ux=0 @ x=Xmin, Uy=0 @ y=Ymin, Uz=0 @ z=Zmin; uniform distributed force in the z-direction @ z=Zmin.]
117
3rd ACES WG mtg. June 2003.
Earth Simulator, DJDS. 64x64x64/SMP node, up to 125,829,120 DOF
[Plots: GFLOPS rate and parallel work ratio (%) vs. number of nodes (up to 192); ●: Flat MPI, ○: Hybrid.]
118
3rd ACES WG mtg. June 2003.
Earth Simulator, DJDS. 100x100x100/SMP node, up to 480,000,000 DOF
[Plots: GFLOPS rate and parallel work ratio (%) vs. number of nodes (up to 192); ●: Flat MPI, ○: Hybrid.]
119
3rd ACES WG mtg. June 2003.
Earth Simulator, DJDS. 256x128x128/SMP node, up to 2,214,592,512 DOF
[Plots: GFLOPS rate and parallel work ratio (%) vs. number of nodes (up to 192); ●: Flat MPI, ○: Hybrid.]
3.8 TFLOPS for 2.2G DOF on 176 nodes (33.8% of peak).
120
3rd ACES WG mtg. June 2003.
Hitachi SR8000/128, 8 PEs / 1 SMP node
[Plots: GFLOPS vs. DOF (10^4-10^7), SMP and flat-MPI runs; ●: PDJDS, ○: PDCRS, ▲: CRS with natural ordering.]
121
3rd ACES WG mtg. June 2003.
Hitachi SR8000/128, 8 PEs / 1 SMP node, PDJDS
[Plot: GFLOPS vs. DOF (10^4-10^7); ●: SMP, ○: flat MPI.]
122
3rd ACES WG mtg. June 2003.
Xeon & SR22018 PEs
Xeon SR2201
● : PDJDS, ○ : PDCRS, ▲ : CRS-Natural
0.00
0.20
0.40
0.60
0.80
1.00
1.E+04 1.E+05 1.E+06 1.E+07
DOF
GF
LO
PS
1.50
2.00
2.50
3.00
1.E+04 1.E+05 1.E+06 1.E+07
DOF
GF
LO
PS
PDJDS/CM-RCM PDCRS/CM-RCMshort innermost loop
CRS no re-orderingPDJDS/CM-RCM PDCRS/CM-RCMshort innermost loop
CRS no re-orderingPDJDS PDJDS CRS
123
3rd ACES WG mtg. June 2003.
Xeon: Speed-Up, 1-24 PEs
[Plots: GFLOPS vs. number of PEs for 16^3 and 32^3 nodes/PE; ●: PDJDS, ○: PDCRS, ▲: CRS with natural ordering.]
124
3rd ACES WG mtg. June 2003.
Xeon: Speed-Up, 1-24 PEs
[Plots: speed-up vs. number of PEs for 16^3 and 32^3 nodes/PE; ●: PDJDS, ○: PDCRS, ▲: CRS with natural ordering.]
125
3rd ACES WG mtg. June 2003.
Hitachi SR2201: Speed-Up, 1-64 PEs
[Plots: GFLOPS vs. number of PEs (up to 64) for 16^3 and 32^3 nodes/PE; ●: PDJDS, ○: PDCRS, ▲: CRS with natural ordering.]