
*In collaboration with Yunsic Shim Supported by NSF DMR-0219328

Date post: 06-Jan-2016
Transcript
Page 1:

Simulating extended time and length scales using parallel kinetic Monte Carlo and accelerated dynamics

Jacques G. Amar, University of Toledo

• Kinetic Monte Carlo (KMC) is an extremely efficient method to carry out dynamical simulations when relevant thermally-activated atomic-scale processes are known.

Used to model a variety of dynamical processes from catalysis to thin-film growth

• Temperature-accelerated dynamics (TAD - Sorensen & Voter, 2000) may be used to carry out realistic simulations even when relevant atomic-scale processes are extremely complicated and are not known.

GOAL: to extend both of these techniques in order to carry out realistic simulations over larger system-sizes, longer time scales

*In collaboration with Yunsic Shim. Supported by NSF DMR-0219328

Page 2:

Parallel Kinetic Monte Carlo

• While standard KMC is extremely efficient, it is inherently a serial algorithm! No matter how large the system, only one event can be simulated at each step!

• In contrast, Nature is inherently parallel!

• We would like to use KMC to carry out simulations of thin-film growth over longer time and length scales

How to “parallelize” the KMC algorithm in order to simulate larger system sizes and longer time scales?
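To make the serial bottleneck concrete, here is a minimal sketch of one rejection-free KMC step. The event labels and rate values are hypothetical, chosen only for illustration; the selection rule (pick an event with probability proportional to its rate, then advance the clock by -ln(r)/R) is the standard one, and a real simulation would also update the rate table after each event.

```python
import math
import random

def kmc_step(rates, rng):
    """Pick one event with probability proportional to its rate.

    rates: dict mapping a (hypothetical) event label to its rate.
    Returns (chosen_event, time_increment).
    """
    total = sum(rates.values())
    # Choose the event whose cumulative-rate interval contains target.
    target = rng.random() * total
    cumulative = 0.0
    chosen = None
    for event, rate in rates.items():
        cumulative += rate
        chosen = event
        if target < cumulative:
            break
    # Exponentially distributed waiting time with mean 1/total.
    r = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return chosen, -math.log(r) / total

rng = random.Random(0)
rates = {"deposit": 1.0, "hop_a": 100.0, "hop_b": 100.0}  # illustrative values
t = 0.0
for _ in range(5):
    event, dt = kmc_step(rates, rng)
    t += dt  # exactly one event is executed per step
```

However many sites the system has, the loop body executes a single event per pass, which is the limitation the parallel algorithms aim to remove.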

Page 3:

Temperature Accelerated Dynamics (TAD)

• KMC simulations are limited by the requirement that a complete catalog of all relevant processes and their rate constants must be specified. However, often not all relevant transition mechanisms are known.

• TAD allows realistic simulations of low-temperature processes over timescales of seconds and even hours

• Computational work for TAD scales as $N^3$, where $N$ is the number of atoms, so it can only be applied to extremely small systems (a few hundred atoms)

How to “parallelize” the TAD algorithm in order to simulate larger system sizes?

Page 4:

Parallel KMC - Domain Decomposition

• Domain decomposition is a natural approach, since intuitively one expects that widely separated regions may evolve independently “in parallel”

Problems

• In parallel KMC, time evolves at different rates in different regions!

• How to deal with time synchronization?

• How to deal with conflicts between neighboring processors?

[Figure: system divided into processor domains labeled 1 to 4]

Page 5:

Parallel Discrete Event Simulation (PDES): Conservative Algorithm

Only update processors whose next event times correspond to local minima in the time horizon (Chang, 1979; Lubachevsky, 1985)

[Figure: time horizon showing next-event times t1 to t6 for processors P1 to P6, starting from t = 0]

Advantages: works for Metropolis Monte Carlo, since the acceptance probability depends on the local configuration but the event times do not.

Disadvantages: does not work for kinetic Monte Carlo, since event times depend on the local configuration. Fast events can “propagate” from processor to processor and lead to rollbacks.
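The conservative update rule (only processors at local minima of the time horizon may advance) can be sketched as follows. This assumes a ring of processors with periodic boundaries and one next-event time per processor; the function name and the sample times are illustrative and not taken from the talk.

```python
def updatable_processors(next_times):
    """Return indices of processors whose next-event time is a local
    minimum of the time horizon (not later than either neighbor's)."""
    n = len(next_times)
    result = []
    for i, t in enumerate(next_times):
        left = next_times[(i - 1) % n]   # periodic wrap is an assumption
        right = next_times[(i + 1) % n]
        if t <= left and t <= right:
            result.append(i)
    return result

# Hypothetical time horizon for six processors P1..P6.
horizon = [0.8, 0.3, 0.9, 0.5, 0.7, 0.2]
ready = updatable_processors(horizon)  # indices 1, 3 and 5
```

All other processors must wait, which is one reason the conservative approach loses efficiency when event times vary strongly between regions.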

Page 6:

Three approaches to parallel KMC

Rigorous Algorithms

• Conservative asynchronous algorithm: Lubachevsky (1988), Korniss et al. (1999), Shim & Amar (2004)

• Synchronous relaxation algorithm: Lubachevsky & Weiss (2001), Shim & Amar (2004)

Semi-rigorous Algorithm

• Synchronous sublattice algorithm: Shim & Amar (2004)

Page 7:

Thin-film growth models studied

“Fractal model”

Deposition rate F per site per unit time

Monomer hopping rate D

Irreversible sticking/attachment (i = 1)

“Edge-diffusion model”

Same as above, with edge diffusion (relaxation) of singly-bonded cluster atoms

“Reversible attachment model”

Detachment of singly and multiply bonded atoms (bond-counting model)

[Figure labels: $D/F = 10^7$; energy barriers $E_e$, $E_c$, $E_1$]

Page 8:

Methods of domain decomposition (2D)

Square decomposition (8 neighbors)

Strip decomposition (2 neighbors)
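The neighbor counts for the two decompositions can be checked with a small sketch. It assumes a periodic lattice whose dimensions divide evenly among the processors; all function names are hypothetical.

```python
def strip_owner(x, lx, n_procs):
    """Processor index owning column x when an lx-wide lattice is cut
    into n_procs vertical strips of equal width."""
    return x // (lx // n_procs)

def strip_neighbors(p, n_procs):
    """The 2 neighboring strips (left, right) with periodic wrap."""
    return [(p - 1) % n_procs, (p + 1) % n_procs]

def square_owner(x, y, lx, ly, px, py):
    """Processor (column, row) owning site (x, y) on a px-by-py grid of
    equal rectangular blocks."""
    return (x // (lx // px), y // (ly // py))

def square_neighbors(col, row, px, py):
    """The 8 neighboring blocks (edges and corners) with periodic wrap."""
    return [((col + dx) % px, (row + dy) % py)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

owner = strip_owner(100, 256, 4)            # column 100 belongs to strip 1
block = square_owner(100, 200, 256, 256, 2, 2)
around = square_neighbors(0, 0, 4, 4)       # 8 distinct blocks on a 4 x 4 grid
```

With fewer than three blocks along a direction, some of the eight neighbors coincide because of the periodic wrap.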

Page 9:

Synchronous relaxation (SR) algorithm (Lubachevsky & Weiss, 2001)

• All processors are in sync at the beginning and end of each cycle

• Iterative relaxation: at each iteration, processors use boundary information from the previous iteration

• Relaxation is complete when the current iteration is identical to the previous iteration for all processors

[Figure: one cycle from t = 0 to t = T for 2 processors (P1, P2), with event times t11, t12 on P1 and t21, t22, t23 on P2, and a boundary event marked]

Disadvantages:

• Complex: requires keeping a list of all events and random numbers used in each iteration

• Algorithm does not scale: faster than the conservative asynchronous (CA) algorithm but still slow due to global synchronization and the requirement of multiple iterations per cycle

Page 10:

Parallel efficiency (PE) of SR algorithm

Average calculation time per cycle of length $T$ for the parallel simulation may be written:

$t_{av}(N_p) = N_{iter} \left[ \langle n_{max} \rangle (t_{1p}/n_{av}) + t_{com} \right]$

where $\langle n_{max} \rangle / n_{av} \sim T^{-1/2} \log(N_p)^{2/3}$, $N_{iter} \sim T \log(N_p)$, and $t_{com} \sim (a + bT) \log(N_p)$

$PE = t_{1p}/t_{av} = \dfrac{1}{N_{iter} \left( t_{com}/t_{1p} + \langle n_{max} \rangle / n_{av} \right)}$

In the limit of zero communication time, fluctuations still play a role:

Maximum PE: $PE_{max} = (1/N_{iter}) (n_{av}/\langle n_{max} \rangle) \sim 1/\log(N_p)$

Optimize PE by varying cycle length $T$ (feedback)
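As a numerical illustration of the SR efficiency expression, the sketch below evaluates PE = 1 / [N_iter (t_com/t_1p + <n_max>/n_av)] for made-up inputs. The function name and the sample values (3 iterations per cycle, a busiest processor with 20% more events than average, a communication-to-compute ratio of 0.3) are assumptions for illustration only, not results from the talk.

```python
def pe_sr(n_iter, nmax_over_nav, tcom_over_t1p=0.0):
    """Parallel efficiency of the SR algorithm:
    PE = 1 / (N_iter * (t_com/t_1p + <n_max>/n_av))."""
    return 1.0 / (n_iter * (tcom_over_t1p + nmax_over_nav))

# Hypothetical inputs, not measured values.
pe_no_comm = pe_sr(3, 1.2)         # zero-communication limit (PE_max)
pe_with_comm = pe_sr(3, 1.2, 0.3)  # with communication overhead
```

Even with free communication the efficiency stays below 1/N_iter, since every cycle is recomputed N_iter times and all processors wait for the busiest one.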

Page 11:

Parallel Efficiency of SR algorithm

[Figure: two panels, "Fractal model" and "Edge-diffusion model", each at $D/F = 10^5$, plotting parallel efficiency (0 to 1) against $N_p$ (0 to 70). Curves: T = 1/D (256 x 256); T = 1/D (256 x 1k); opt (256 x 1k); Ideal: $n_{opt} = 2$ (256 x 1k).]

Dashed line: $PE_{ideal} = 1/[1 + 0.6\,(\ln N_p)^{1.1}]$
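To get a feel for how slowly the ideal SR efficiency decays, the quoted fit can be evaluated directly. This sketch reads the garbled exponent as the logarithm raised to the power 1.1, which is an interpretation of the slide text; the function name is hypothetical.

```python
import math

def pe_ideal_sr(n_procs):
    """Fit quoted on the slide, read as 1 / (1 + 0.6 * (ln N_p)**1.1)."""
    return 1.0 / (1.0 + 0.6 * math.log(n_procs) ** 1.1)

for n_procs in (1, 4, 16, 64):
    print(n_procs, round(pe_ideal_sr(n_procs), 3))
```

Under this reading the fit returns 1 for a single processor and decreases monotonically as processors are added.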

Page 12:

Synchronous sublattice (SL) algorithm (Shim & Amar, 2004)

• At the beginning of each synchronous cycle, one subregion (A, B, C, or D) is randomly selected. All processors update sites in the selected sublattice only => eliminates conflicts between processors.

• The sublattice event in each processor is selected as in usual KMC. At the end of the synchronous cycle, processors communicate changes to neighboring processors.

2D (square) decomposition (2 send/receives per cycle)

1D (strip) decomposition (1 send/receive per cycle)

Advantages:

• No global communication required

• Many events per cycle => reduced communication overhead due to latency

Disadvantages:

• Not rigorous; PE still somewhat reduced due to fluctuations

Page 13:

Synchronous sublattice algorithm (Shim & Amar, 2004)

[Figure: 4-processor system, each processor domain divided into sublattices A, B, C, and D]

• Each processor sets its time t = 0 at the beginning of the cycle, then carries out KMC sublattice events (time increment $t_i = -\ln(r)/R_i$) until the time of the next event exceeds the time interval T. Processors then communicate changes as necessary to neighboring processors.

[Figure: time line from 0 to T with event times t1, t2, t3; 2 events occur in the cycle]

• Maximum time interval T is determined by the maximum possible single-event rate in the KMC simulation. For a simple model of deposition on a square lattice with deposition rate F per site and monomer hopping rate D, T = 1/D

• Many possible events per cycle!
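The per-processor loop of one synchronous sublattice cycle can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes the total sublattice rate stays fixed during the cycle (in a real simulation it changes after every event), it takes r as a uniform random number in (0, 1], and it omits the communication step. The function name and the rate values are hypothetical.

```python
import math
import random

def sublattice_cycle(total_rate, cycle_length, rng):
    """Return the event times within one cycle for one processor's
    selected sublattice, starting from t = 0."""
    times = []
    t = 0.0
    while True:
        r = 1.0 - rng.random()          # uniform in (0, 1]
        t += -math.log(r) / total_rate  # KMC time increment -ln(r)/R
        if t > cycle_length:
            break                       # next event falls outside the cycle
        times.append(t)
    return times

rng = random.Random(42)
D = 1.0e5                  # hypothetical monomer hopping rate
T = 1.0 / D                # cycle length set by the fastest single-event rate
events = sublattice_cycle(50 * D, T, rng)  # assumed total rate of 50 D
```

With a total rate of 50 D the processor carries out about 50 events per cycle on average, which is what amortizes the latency of the single communication step at the end.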

Page 14:

Comparison with serial results (Fractal model, $D/F = 10^5$, L = 256)

1D strip decomposition

System size 256 x 256

Processor size $N_x$ x 256

$N_p$ = 4 ($N_x$ = 64)

$N_p$ = 8 ($N_x$ = 32)

$N_p$ = 16 ($N_x$ = 16)

Page 15:

Reversible growth model: T = 300 K, $D/F = 10^5$, $E_1$ = 0.1 eV, and $E_b$ = 0.07 eV

$N_x$ = 64, $N_y$ = 1024, $N_p$ = 16

[Figure: 512 by 512 portion of 1k by 1k system]

Page 16:

Parallel efficiency (PE) of SL algorithm

Average time per cycle for the parallel simulation may be written:

$t_{av} = t_{1p} + t_{com} + \langle \Delta(\tau) \rangle (t_{1p}/n_{av})$

where $\langle \Delta(\tau) \rangle$ is the (average) delay per cycle due to fluctuations in the number of events in neighboring processors.

Parallel efficiency ($PE = t_{1p}/t_{av}$) may be written:

$PE = [1 + (t_{com}/t_{1p}) + \langle \Delta(\tau) \rangle / n_{av}]^{-1}$

In the limit of no communication time, fluctuations still play an important role:

Ideal PE: $PE_{ideal} = [1 + \langle \Delta(\tau) \rangle / n_{av}]^{-1}$

where $\langle \Delta(\tau) \rangle / n_{av} \sim 1/n_{av}^{1/2}$

[Figure: fluctuations in the numbers of events $n_1$ and $n_2$ on processors P1 and P2]
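The step from the average cycle time to the SL efficiency is a single division by $t_{1p}$. Writing the average delay per cycle due to fluctuations as $\langle \Delta(\tau) \rangle$ (the symbol is an assumed notation for that quantity), the intermediate step is:

```latex
\begin{aligned}
PE = \frac{t_{1p}}{t_{av}}
  &= \frac{t_{1p}}{t_{1p} + t_{com} + \langle \Delta(\tau) \rangle \, t_{1p}/n_{av}} \\
  &= \left[ 1 + \frac{t_{com}}{t_{1p}} + \frac{\langle \Delta(\tau) \rangle}{n_{av}} \right]^{-1}
\end{aligned}
```

Setting $t_{com} = 0$ leaves only the fluctuation term, which gives the ideal efficiency.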

Page 17:

Results for $\langle \Delta(\tau) \rangle / n_{av}$ (fractal model)

[Figure: two panels. Left: D/F dependence for $N_p$ = 4, plotting $\langle \Delta(\tau) \rangle / n_{av}$ against D/F from $10^3$ to $10^7$, with slope = 1/3. Right: $N_p$ dependence for $D/F = 10^5$. $N_x$ = 256, $N_y$ = 1024.]

$\langle \Delta(\tau) \rangle / n_{av} \sim (D/F)^{1/3}$; $\langle \Delta(\tau) \rangle / n_{av}$ saturates for large $N_p$

Page 18:

Parallel efficiency as function of D/F ($N_p$ = 4)

[Figure: two panels, "Edge-diffusion Model" and "Fractal Model", each plotting parallel efficiency (0 to 1) against D/F from $10^3$ to $10^7$. Curves: Square sub. (N = 512); Max. p. e. (square); Strip sub. ($N_x$ = 256, $N_y$ = 1k); Max. p. e. (strip).]

$PE_{max} = 1/[1 + 0.2\,(D/F)^{1/3}/(N_x N_y)^{1/2}]$
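The maximum-efficiency expression for the SL algorithm is easy to evaluate numerically. The sketch below assumes $N_x$ and $N_y$ are the dimensions of a single processor's domain, as the strip-decomposition slides suggest; the function name is hypothetical.

```python
def pe_max_sl(d_over_f, nx, ny):
    """PE_max = 1 / (1 + 0.2 * (D/F)**(1/3) / sqrt(Nx * Ny))."""
    return 1.0 / (1.0 + 0.2 * d_over_f ** (1.0 / 3.0) / (nx * ny) ** 0.5)

for d_over_f in (1e3, 1e5, 1e7):
    print(d_over_f, round(pe_max_sl(d_over_f, 256, 1024), 3))
```

The expression shows the trade-off directly: efficiency falls as D/F grows but recovers as the domain per processor gets larger.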

Page 19:

Parallel efficiency as function of $N_p$ ($D/F = 10^5$)

[Figure: parallel efficiency (0 to 1) against $N_p$ from $10^0$ to $10^3$, with curve groups labeled $N_y$ = 1k and $N_y$ = 256. Curves: fractal (256 x 1k): OSC; Edge (256 x 256); fractal (256 x 1k); fractal (256 x 256): OSC; Ideal p.e. (fractal); Ideal p.e. (edge).]

Page 20:

Comparison of SR and SL algorithms (Fractal model, $D/F = 10^5$, $N_x$ = 256, $N_y$ = 1024)

[Figure: parallel efficiency (0 to 1) against $N_p$ (0 to 80). Curves: SL; Syncr. Relax.; Ideal p. e. (two curves).]

Page 21:

Summary

• We have studied 3 different algorithms for parallel KMC: conservative asynchronous (CA), synchronous relaxation (SR), and synchronous sublattice (SL)

• CA algorithm is not efficient due to rejection of boundary events

• SL algorithm is significantly more efficient than SR algorithm

SR algorithm (global synchronization): $PE \sim 1/\log(N_p)^{\beta}$, where $\beta \geq 1$

SL algorithm (local synchronization): PE independent of $N_p$!

• For all algorithms, communication time, latency, and fluctuations play a significant role

• For more complex models, we expect that the parallel efficiency of the SR and SL algorithms will be significantly increased

Page 22:

Future work

• Extend SL algorithm to simulations with realistic geometry in order to carry out pKMC simulations of Cu epitaxial growth

=> properly include fast processes such as edge-diffusion

• Apply SR and SL algorithms to parallel TAD simulations of Cu/Cu(100) growth at low T (collaboration with Art Voter)

=> Vacancy formation and mound regularization in low temperature metal epitaxial growth

• Develop hybrid algorithm combining SR + SL algorithms

• Develop local SR algorithm

• Implement SL and SR algorithms on shared memory machines

