Developing a highly scalable molecular dynamics simulation program
Kwang Jin Oh
KISTI Supercomputing Center
What is molecular dynamics simulation?

Equation of motion:
$$\vec{f} = m\vec{a}$$
Solve it numerically -> trajectory $\vec{r}(t)$, $\vec{v}(t)$ -> postprocessing -> thermodynamic, structural, mechanical, rheological, optical, and electrostatic properties
Molecular dynamics simulation in NVE ensemble
$$\frac{d\vec{r}_i}{dt} = \vec{v}_i, \qquad \frac{d\vec{v}_i}{dt} = \frac{\vec{f}_i}{m_i}, \qquad H = K + V$$

velocity Verlet algorithm:
$$\vec{v}\!\left(\frac{\Delta t}{2}\right) = \vec{v}(0) + \frac{\Delta t}{2}\,\frac{\vec{F}(0)}{m}$$
$$\vec{r}(\Delta t) = \vec{r}(0) + \Delta t\,\vec{v}\!\left(\frac{\Delta t}{2}\right)$$
$$\vec{v}(\Delta t) = \vec{v}\!\left(\frac{\Delta t}{2}\right) + \frac{\Delta t}{2}\,\frac{\vec{F}(\Delta t)}{m}$$
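A minimal sketch of one velocity Verlet step implementing the three updates above, assuming flattened coordinate/velocity/force arrays and a caller-supplied compute_forces routine (these names are illustrative, not the talk's actual code):

```cpp
#include <vector>

// One velocity Verlet step for N particles in 3D.
// x, v, f are flattened arrays of length 3N; m has length N.
// compute_forces is assumed to fill f from the current positions.
void velocity_verlet_step(std::vector<double>& x,
                          std::vector<double>& v,
                          std::vector<double>& f,
                          const std::vector<double>& m,
                          double dt,
                          void (*compute_forces)(const std::vector<double>&,
                                                 std::vector<double>&)) {
    const std::size_t n = m.size();
    // Half-step velocity update: v(dt/2) = v(0) + (dt/2) F(0)/m
    for (std::size_t i = 0; i < n; ++i)
        for (int d = 0; d < 3; ++d)
            v[3*i+d] += 0.5 * dt * f[3*i+d] / m[i];
    // Full-step position update: r(dt) = r(0) + dt * v(dt/2)
    for (std::size_t i = 0; i < 3*n; ++i)
        x[i] += dt * v[i];
    // New forces at r(dt), then second half-step velocity update
    compute_forces(x, f);
    for (std::size_t i = 0; i < n; ++i)
        for (int d = 0; d < 3; ++d)
            v[3*i+d] += 0.5 * dt * f[3*i+d] / m[i];
}
```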
Trajectory analysis
• Temperature: $\langle K \rangle = \frac{3}{2} k_B T$
• Pressure: $PV = N k_B T + \langle W \rangle$
• Heat capacity: $\langle \delta K^2 \rangle = k_B T^2 C_V$
• Diffusion constant: $6Dt = \left\langle \left|\vec{r}(t) - \vec{r}(0)\right|^2 \right\rangle$
• Radial distribution function: $g(r) = \dfrac{V}{N^2} \left\langle \sum_i \sum_{j \neq i} \delta(\vec{r} - \vec{r}_{ij}) \right\rangle$
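A minimal sketch of how the radial distribution function above might be accumulated from a single trajectory frame, assuming a cubic box and the minimum-image convention (all names are mine):

```cpp
#include <cmath>
#include <vector>

// Histogram estimate of g(r) from one frame of N particles in a cubic
// box of side L with minimum-image periodic boundaries.
std::vector<double> radial_distribution(const std::vector<double>& x, // 3N coords
                                        double L, int nbins, double rmax) {
    const double PI = 3.14159265358979323846;
    const std::size_t n = x.size() / 3;
    const double dr = rmax / nbins;
    std::vector<double> hist(nbins, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double r2 = 0.0;
            for (int d = 0; d < 3; ++d) {
                double dx = x[3*i+d] - x[3*j+d];
                dx -= L * std::round(dx / L);   // minimum image
                r2 += dx * dx;
            }
            double r = std::sqrt(r2);
            if (r < rmax) hist[static_cast<int>(r / dr)] += 2.0; // i-j and j-i
        }
    }
    // Normalize by the ideal-gas shell count: rho * 4*pi*r^2*dr per particle
    const double rho = n / (L * L * L);
    for (int b = 0; b < nbins; ++b) {
        double r = (b + 0.5) * dr;
        hist[b] /= n * rho * 4.0 * PI * r * r * dr;
    }
    return hist;
}
```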
Trajectory analysis
• RMSD
• Ramachandran plot
• SASA
• Density profile
• Pressure profile

[Figure: trajectory near a liquid-solid phase transition]
Computing power

First MD: MD of hard spheres by Alder and Wainwright, run on the IBM 704 at Lawrence Livermore (5 kFLOPS).
What is the problem?

Time scale issue -> need a large number of iterations
• Local motions (0.01 to 5 Å, 10^-15 to 10^-1 s)
– Atomic fluctuations – Sidechain motions – Loop motions
• Rigid body motions (1 to 10 Å, 10^-9 to 1 s)
– Helix motions – Domain motions (hinge bending) – Subunit motions
• Large-scale motions (> 5 Å, 10^-7 to 10^4 s)
– Helix-coil transitions – Dissociation/association – Folding and unfolding

Length scale issue -> need a large number of particles
• About 10 billion atoms in a µm³ -> prohibitive

Computationally very intensive!!!
Simple example

[Figure: performance (seconds/step, 0 to 4000) of NAMD, LAMMPS, and Gromacs on four benchmark systems]

System sizes (atoms): 5dhfr: 23,558; apoa1: 92,224; f1atpase: 327,506; stmv: 1,066,628

1 ns simulation of stmv with a 1 fs integration time step
-> 1M steps -> ~10^9 seconds -> ~10,000 days
What are we trying to do?
• Faster force calculation algorithms (neighbor list, SPME, FMM, coarse-grained models, …)
• More efficient numerical integration algorithms (RESPA, …)
• More scalable parallel algorithms (DD, load balancing, …)
Challenges we face
• Utilizing O(10^5) cores with good scalability
– N/P is small
– the computation-to-communication ratio is low
• Utilizing emerging architectures
– need a new programming model
– need an efficient parallelization scheme
• Optimizing performance to reduce the gap

We need a better parallel algorithm!!!
Amdahl's law
$$S = \frac{1}{(1-p) + \dfrac{p}{N}}$$
N: # of processors, p: parallel portion
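As a quick worked example (my numbers, not the talk's): even a code that is 99% parallel saturates far below the processor count:

$$p = 0.99:\qquad S(1024) = \frac{1}{0.01 + 0.99/1024} \approx 91, \qquad \lim_{N \to \infty} S = \frac{1}{1-p} = 100$$

This is why utilizing O(10^5) cores requires parallelizing essentially everything, including communication and bookkeeping.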
Parallelization schemes

Atom decomposition
• All-to-all communication
• Poor scalability
• Better load balancing

Domain decomposition
• Communication only with neighbor processors
• Better scalability
• Poor load balancing
• # of cores is limited by the domain size

Midpoint method: a pair within rcut is computed by the processor whose domain contains the pair's midpoint, so each domain only needs to import particles within 0.5*rcut of its boundary.
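A minimal sketch of the midpoint rule for assigning a pair to a domain, assuming a uniform P x P x P grid of domains over a cubic periodic box (all names hypothetical):

```cpp
#include <cmath>

// Rank of the domain that owns a position, for a P x P x P grid of
// domains over a cubic box of side L with periodic boundaries.
int owner_of(const double r[3], double L, int P) {
    int c[3];
    for (int d = 0; d < 3; ++d) {
        double s = r[d] / L - std::floor(r[d] / L); // wrap into [0,1)
        c[d] = static_cast<int>(s * P) % P;
    }
    return (c[0] * P + c[1]) * P + c[2];
}

// Midpoint rule: the pair (ri, rj) is computed by the domain that
// contains the pair's midpoint, so each domain only needs to import
// particles within 0.5*rcut of its boundary.
int owner_of_pair(const double ri[3], const double rj[3], double L, int P) {
    double mid[3];
    for (int d = 0; d < 3; ++d) {
        double dx = rj[d] - ri[d];
        dx -= L * std::round(dx / L);   // minimum image
        mid[d] = ri[d] + 0.5 * dx;
    }
    return owner_of(mid, L, P);
}
```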
SHAKE

Bond constraints
• To constrain bonds or angles like C-H, H-O-H, and so on
• Integrating high-frequency vibrational motion with a large time step -> the trajectory blows up eventually

The constraint force that moves the unconstrained positions 1', 2' back onto the constraint is obtained iteratively from

$$\vec{G}_{ij} \approx \frac{\mu_{ij}\left(d_{ij}^{2} - d_{ij}'^{\,2}\right)}{2\,\Delta t^{2}\;\vec{d}_{ij}' \cdot \vec{d}_{ij}}\,\vec{d}_{ij}$$

[Figure: particles 1 and 2, unconstrained positions 1' and 2', constraint forces $\vec{G}_{12}$ and $\vec{G}_{21}$]
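A minimal sketch of the SHAKE position-correction loop implied by the formula above, with a simplified per-pair update (the data layout and names are my own, not mm_par's):

```cpp
#include <cmath>
#include <vector>

struct Constraint { int i, j; double d0; };  // pair and target bond length

// Iteratively correct positions x (3N, after the unconstrained update)
// so that each constrained pair sits at its target distance d0.
// x0 holds positions before the update (defines the bond directions d_ij).
void shake(std::vector<double>& x, const std::vector<double>& x0,
           const std::vector<double>& m, const std::vector<Constraint>& cons,
           double tol = 1e-8, int max_iter = 100) {
    for (int it = 0; it < max_iter; ++it) {
        bool done = true;
        for (const Constraint& c : cons) {
            double d[3], dp[3], dot = 0.0, dp2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                d[k]  = x0[3*c.i+k] - x0[3*c.j+k];  // old bond vector d_ij
                dp[k] = x[3*c.i+k]  - x[3*c.j+k];   // current bond vector d_ij'
                dot += d[k] * dp[k];
                dp2 += dp[k] * dp[k];
            }
            double diff = c.d0 * c.d0 - dp2;        // constraint violation
            if (std::fabs(diff) < tol) continue;
            done = false;
            // Lagrange-multiplier correction along the old bond direction
            double mu = 1.0 / (1.0 / m[c.i] + 1.0 / m[c.j]);
            double g = mu * diff / (2.0 * dot);
            for (int k = 0; k < 3; ++k) {
                x[3*c.i+k] += g * d[k] / m[c.i];
                x[3*c.j+k] -= g * d[k] / m[c.j];
            }
        }
        if (done) break;  // all constraints satisfied within tol
    }
}
```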
Smooth particle mesh Ewald

Ewald: O(N^{3/2}) -> SPME: O(N log N)

$$V = \frac{1}{4\pi\varepsilon_0}\sum_{i}\sum_{j>i} q_i q_j\,\frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}} + \frac{1}{2V\varepsilon_0}\sum_{\vec{k}\neq 0}\frac{\exp(-k^{2}/4\alpha^{2})}{k^{2}}\,\bigl|S(\vec{k})\bigr|^{2} - \frac{1}{4\pi\varepsilon_0}\,\frac{\alpha}{\sqrt{\pi}}\sum_{i} q_i^{2}$$

$$S(\vec{k}) = \sum_{i} q_i \exp\!\bigl(i\,\vec{k}\cdot\vec{r}_i\bigr)$$

Ewald evaluates $S(\vec{k})$ directly; SPME approximates it on a mesh using B-spline interpolation and a 3D FFT:

$$S(\vec{k}) \approx b_1(k_1)\,b_2(k_2)\,b_3(k_3)\,F(Q)(k_1, k_2, k_3)$$
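A minimal sketch of the real-space term of the Ewald sum above, assuming a cubic periodic box, a spherical cutoff, and units with $1/(4\pi\varepsilon_0) = 1$ (names are mine):

```cpp
#include <cmath>
#include <vector>

// Real-space Ewald energy: sum over i<j of q_i q_j erfc(alpha r)/r,
// truncated at rcut with minimum-image periodic boundaries.
double ewald_real_space(const std::vector<double>& x, // 3N coords
                        const std::vector<double>& q, // N charges
                        double L, double alpha, double rcut) {
    const std::size_t n = q.size();
    double V = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double r2 = 0.0;
            for (int d = 0; d < 3; ++d) {
                double dx = x[3*i+d] - x[3*j+d];
                dx -= L * std::round(dx / L);   // minimum image
                r2 += dx * dx;
            }
            if (r2 < rcut * rcut) {
                double r = std::sqrt(r2);
                V += q[i] * q[j] * std::erfc(alpha * r) / r;
            }
        }
    }
    return V;
}
```

The complementary error function screens each pair interaction, which is what makes the real-space sum short-ranged and cutoff-safe; the long-range remainder goes into the reciprocal-space sum handled by the FFT.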
3D FFT parallelization: slab decomposition vs volumetric decomposition
• Slab decomposition (volume -> slab): # of processors is limited by the mesh dimension!!!
• Volumetric decomposition (slab -> volume): the mesh is split along all three axes, so the processor count is not limited by a single dimension
[Figure: 3D FFT wall clock time vs # of processors (2 to 128), compared against MKL; mesh size: 1024×1024×1024]
Hierarchical parallelization

Parallel performance: 5DHFR (CHARMM22) + TIP3P, NVE MD, 1000 steps on tachyon2; box: ~64 Å, rcut: 12 Å

[Figure: wall clock time (sec) vs # of MPI tasks (1 to 64) for 1, 2, 4, and 8 OpenMP threads per task]
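A minimal sketch of the hybrid MPI + OpenMP pattern that such hierarchical parallelization implies: MPI ranks own domains and exchange or reduce data between them, while OpenMP threads share the force loop within a rank (illustrative only, not mm_par's actual structure):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    std::vector<double> f(3 * 1000, 0.0);  // local forces (placeholder size)

    // Threads share the local force loop; each iteration is independent.
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(f.size()); ++i) {
        f[i] = 0.0;  // ... accumulate pair forces for local atoms here ...
    }

    // Ranks then exchange boundary data / reduce global quantities.
    double local_energy = 0.0, total_energy = 0.0;
    MPI_Allreduce(&local_energy, &total_energy, 1, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

The benefit of the hierarchy is that N/P per MPI rank stays larger: threads within a node share memory instead of exchanging messages, which helps exactly when the computation-to-communication ratio is low.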
Load balancing: Voronoi diagram
GPU computing

Timing (seconds/100 steps):
• CPU: 565.1
• GPU (single precision): 21.75
• GPU (double precision): 79.88
mm_par: A general purpose parallel MD code

Object-oriented programming: abstraction, inheritance, polymorphism, …

• Object-oriented design using C++
• Hierarchical parallelization using MPI and OpenMP
• Domain decomposition based on atomic groups
• CHARMM, AMBER, and easily extended to handle other force fields
• NVE MD, NVT MD (global, molecule type, molecule, rigid group), NPT MD (flexible, isotropic, x-y-z, xy-z, z), LD, DPD
• Multiple time steps using RESPA
• Replica exchange molecular dynamics
• SHAKE/RATTLE for constraint dynamics
• Electrostatic force calculation using SPME
• Neighbor lists (Verlet neighbor list, cell-linked list, combined list)
• 3D FFT using volumetric decomposition
• Implicit solvent models (SASA, GB)
• Trajectory analysis
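A minimal sketch of the kind of class hierarchy an object-oriented MD design suggests, where integrators and force fields vary independently behind abstract interfaces; these class names are hypothetical, not mm_par's actual API:

```cpp
#include <memory>
#include <vector>

struct System { std::vector<double> x, v, f, m; };  // particle state

// Abstract force field: CHARMM, AMBER, etc. override evaluate().
class ForceField {
public:
    virtual ~ForceField() = default;
    virtual double evaluate(System& s) = 0;  // fills s.f, returns energy
};

// Abstract integrator: velocity Verlet, RESPA, etc. override step().
class Integrator {
public:
    explicit Integrator(std::shared_ptr<ForceField> ff) : ff_(std::move(ff)) {}
    virtual ~Integrator() = default;
    virtual void step(System& s, double dt) = 0;
protected:
    std::shared_ptr<ForceField> ff_;
};

// A concrete integrator works with any force field polymorphically.
// Assumes s.f already holds the forces at the current positions.
class VelocityVerlet : public Integrator {
public:
    using Integrator::Integrator;
    void step(System& s, double dt) override {
        for (std::size_t i = 0; i < s.m.size(); ++i)
            for (int d = 0; d < 3; ++d)
                s.v[3*i+d] += 0.5 * dt * s.f[3*i+d] / s.m[i];
        for (std::size_t i = 0; i < s.x.size(); ++i)
            s.x[i] += dt * s.v[i];
        ff_->evaluate(s);
        for (std::size_t i = 0; i < s.m.size(); ++i)
            for (int d = 0; d < 3; ++d)
                s.v[3*i+d] += 0.5 * dt * s.f[3*i+d] / s.m[i];
    }
};
```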
Future issues: strong scalability

5DHFR (CHARMM22) + 7023 water (TIP3P), time (sec):
• Bonded force: 0.03
• Non-bonded force (real space): 2.57
• Non-bonded force (reciprocal space): 0.3
[Figure: normalized wall clock time vs number of processors (1 to 1024) for the non-bonded, SPME, and bonded force contributions]
Thank you !!!
koh@kisti.re.kr