GPU-enabled Studies of Molecular Systems on Keeneland
- On pursuing high resource utilization and coordinated simulations' progression
Michela Taufer, Sandeep Patel, Samuel Schlachter, and Stephen Herbein
University of Delaware
Jeremy Logan, ORNL
Taxonomy of simulations
• Simulations applying fully atomistically resolved molecular models and force fields; GPUs enable longer time and space scales
• Variable job speeds (ns/day):
  As a trajectory evolves
  Across trajectories with different, e.g., concentrations
• Fully or partially coordinated simulation progression:
  Fully coordinated needed for, e.g., replica-exchange molecular dynamics (REMD)
  Partially coordinated for, e.g., the SDS and nanotube systems
Constraints on high-end computer systems
• Resource constraints on high-end clusters:
  Limited wall-time per job (e.g., 24 hours)
  Mandatory use of resource managers
  No direct submission and monitoring of GPU jobs
• A logical GPU job does not map to a physical GPU job; workflow managers are still in their infancy
• System and application failures on GPUs go undetected; resource managers have no notion of job terminations on GPUs
Moving beyond virtualization
• Some clusters do include virtualization, e.g., Shadowfax
• There we can schedule isolated CPU/GPU pairs, which allows us to associate failures with a specific GPU
• But virtualization imposes overheads:
  Power
  Performance
  Noise or jitter
  Portability and maintainability
… and may not be available
Our goal: pursuing BOTH high accelerator utilization and (fully or partially) coordinated simulation progression on GPUs, in effective and cross-platform ways
Our approach
• Two software modules that plug into existing resource managers and workflow managers; no virtualization, so as to embrace diverse clusters and programming languages
• A companion module:
  Runs on the head node of the cluster
  Accepts jobs from the workflow manager
  Instantiates "children" wrapper modules
  Dynamically splits jobs and distributes job segments to the wrapper modules
• A wrapper module:
  Launches on a compute node as a resource manager job
  Receives and runs job segments from the companion module
  Reports the status of job segments to the companion module
Modules in action
[Diagram: the user node runs the workflow manager; the front-end node runs the resource manager and the companion module; the back-end nodes run the wrapper-module (WM) jobs; jobs wait in the job queue.]
1. Workflow Manager: generates a set of 24-hour jobs.
2. Workflow Manager: sends the set of 24-hour jobs to the companion module. Companion Module: receives the 24-hour jobs and generates a Wrapper Module (WM) instance per back-end node.
3. Companion Module: submits each WM instance as a job to the resource manager.
4. Resource Manager: launches the WM instance as a job on a back-end node.
5. Wrapper Module: asks the companion module for job segments, as many as there are available GPUs.
6. Companion Module: fragments the jobs into 6-hour subjobs and sends a bundle of 3 subjobs to the WM job.
7. Wrapper Module: instantiates the subjobs on the GPUs and monitors system and application failures as well as time constraints.
8. Wrapper Module: if a subjob terminates prematurely because of, e.g., a system or application failure, it requests a new subjob.
9. Companion Module: adjusts the length of the new subjob based on heuristics, e.g., so as to still complete the initial 6-hour period, and sends the subjob to the wrapper module for execution.
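The message flow above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the class and method names (CompanionModule, WrapperModule, request_subjobs, report_failure) and the toy workload of three 24-hour jobs are assumptions.

```python
from collections import deque

class CompanionModule:
    """Head-node side: holds the 24-hour jobs, fragments them into 6-hour
    subjobs, and serves bundles to wrapper modules on request."""

    def __init__(self, n_jobs=3, job_hours=24, subjob_hours=6):
        self.queue = deque(
            (job_id, seg, subjob_hours)
            for job_id in range(n_jobs)
            for seg in range(job_hours // subjob_hours)
        )

    def request_subjobs(self, n_gpus):
        """Hand out at most one subjob per available GPU (steps 5-6)."""
        bundle = []
        while self.queue and len(bundle) < n_gpus:
            bundle.append(self.queue.popleft())
        return bundle

    def report_failure(self, subjob, remaining_hours):
        """A subjob died early: re-enqueue a shortened replacement so the
        original 6-hour window can still complete (steps 8-9)."""
        job_id, seg, _ = subjob
        self.queue.appendleft((job_id, seg, remaining_hours))


class WrapperModule:
    """Compute-node side: launched by the resource manager; pulls as many
    subjobs as the node has free GPUs."""

    def __init__(self, companion, n_gpus=3):
        self.companion = companion
        self.n_gpus = n_gpus

    def fetch(self):
        return self.companion.request_subjobs(self.n_gpus)


companion = CompanionModule()
wrapper = WrapperModule(companion, n_gpus=3)
bundle = wrapper.fetch()  # a bundle of 3 six-hour subjobs
companion.report_failure(bundle[0], remaining_hours=2)
```

In the real system these calls cross the network between head node and compute node; here they are plain method calls to keep the control flow visible.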
MD Simulations
• MD simulations:
  Case study 1: study of sodium dodecyl sulfate (SDS) molecules in aqueous solutions and electrolyte solutions
  Case study 2: study of nanotubes in aqueous solutions and electrolyte solutions
• GPU code FEN ZI (Yun Dong de FEN ZI = Moving MOLECULES):
  MD simulations in NVT and NVE ensembles, and energy minimization in explicit solvent
  Constraints on interatomic distances, e.g., SHAKE and RATTLE, atomic restraints, and freezing fast degrees of motion
  Electrostatic interactions, i.e., Ewald summation, performed on the GPU
• Metric of interest:
  Utilization of the GPUs, i.e., the fraction of time accountable for the simulation's progression
The Keeneland system
• GPUs: 3 NVIDIA M2090 GPUs per node
• Software: TORQUE resource manager; Globus allows for the use of the Pegasus workflow manager; shared Lustre file system
• Constraints:
  24-hour wall-time limit
  1 job per node (cannot have multiple jobs on one node)
  GPUs can be set to shared/exclusive mode, but there is no complete isolation (e.g., the user who gets access first can claim all the GPUs)
  Vendor-specific, with a specific version of the NVIDIA driver (> 260)
Modeling max utilization
• With our approach, using segments in the 24-hour period:

  utilization = \frac{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}}\left[t_{max} - \sum_{i=1}^{n-1}\left(\left(t_{arrival}(i) - t_{lastchk}(i)\right) + t_{restart}\right) - \left(t_{max} - t_{arrival}(n)\right)\right]}{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}} t_{max}}

• Without our approach:

  utilization = \frac{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}}\left[t_{max} - \left(t_{arrival}(1) - t_{lastchk}(1)\right) - \left(t_{max} - t_{arrival}(1)\right)\right]}{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}} t_{max}}

where n is the number of segments in the 24-hour period,

  t_{arrival}(i) = \begin{cases} t_{lastchk}(i) & \text{when } t_{arrival}(i) > t_{max} \\ t_{arrival}(i) & \text{otherwise} \end{cases}

and t_{lastchk}(n) = f(\text{molecular system}).
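As a sanity check, the segmented ("with our approach") model above can be coded directly. The function below is a sketch with hypothetical input conventions: one list of (t_arrival, t_lastchk, t_restart) tuples per GPU-day, values in hours, already clipped per the piecewise definition of t_arrival.

```python
def utilization(gpu_days, t_max=24.0):
    """Utilization across GPU-days under the segmented model.

    Each element of gpu_days is one GPU's 24-hour period: a list of
    (t_arrival, t_lastchk, t_restart) tuples, one per segment i = 1..n.
    """
    useful = total = 0.0
    for segments in gpu_days:
        lost = 0.0
        # Segments 1..n-1: the work since the last checkpoint is lost,
        # plus the restart cost of the follow-up segment.
        for t_arr, t_chk, t_rst in segments[:-1]:
            lost += (t_arr - t_chk) + t_rst
        # Segment n: the tail of the period after its arrival is idle.
        lost += t_max - segments[-1][0]
        useful += t_max - lost
        total += t_max
    return useful / total

# The without-approach formula has a single segment per period and
# additionally loses the work between t_lastchk(1) and t_arrival(1).
```

For example, a period with two segments arriving at hours 10 and 22, checkpointed at hours 9 and 21, with a half-hour restart, loses (10-9)+0.5+(24-22) = 3.5 hours, giving 20.5/24 ≈ 85% utilization.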
Case study 1: Sodium Dodecyl Sulfate (SDS)
Initial structures: surfactant molecules randomly distributed, at molar concentrations of 0.10, 0.25, 0.50, and 1.00
[Figure: one snapshot per concentration]
Case study 1: variable simulation times
[Plot: performance (ns/day), from 0 to 1.8, vs. simulation time (ns), from 0 to 12, for the four concentrations 0.1, 0.25, 0.5, and 1.0]
Case study 1: testbeds
• Taxonomy of our simulations: 4 concentrations and 3 200-ns trajectories per concentration, at 298 K
• Test 1: jobs with the same concentration assigned to the same node
• Test 2: jobs with different concentrations assigned to the same node
Case study 1: modeling max utilization
• With our approach, using segments in the 24-hour period:

  utilization = \frac{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}}\left[t_{max} - \sum_{i=1}^{n-1} t_{restart} - \left(t_{max} - t_{arrival}(n)\right)\right]}{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}} t_{max}}

• Without our approach:

  utilization = \frac{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}}\left[t_{max} - \left(t_{max} - t_{arrival}(1)\right)\right]}{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}} t_{max}}

where t_{max} = 24 hours, n is the number of segments, and

  t_{arrival}(i) = \begin{cases} t_{lastchk}(i) & \text{when } t_{arrival}(i) > t_{max} \\ t_{arrival}(i) & \text{otherwise} \end{cases}
Case study 1: modeling the arrival time t_arrival(i)
We model it in two ways:
• Scientists: run a short simulation, compute ns/day, and set the job's speed to a constant rate that fits into the 24-hour period
• Our approach: split the 24-hour job into segments and adjust each segment's length with a heuristic that accounts for the change in ns/day
Case study 1: our heuristic
[Plot: observed performance vs. projected performance; our heuristic tracks the observed ns/day rather than the initial projection.]
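The slides do not spell the heuristic out, so the following is only a plausible minimal form: the companion module sizes each new segment from the most recently observed rate instead of the initial projection. The function name and the 6-hour target window are assumptions for illustration.

```python
def next_segment_ns(observed_ns_per_day, target_hours=6.0):
    """Size the next subjob (in ns) so that, at the rate just observed,
    it finishes within the target wall-clock window."""
    return (observed_ns_per_day / 24.0) * target_hours

# As the SDS trajectory evolves and ns/day changes, each new segment is
# re-sized from the latest observation instead of the initial projection.
rates = [1.6, 1.2, 0.9]                     # observed ns/day over time
plan = [next_segment_ns(r) for r in rates]  # shrinking segment lengths
```

A constant-rate plan (the "scientists" approach above) would instead size every segment from the first observation and overrun once the rate drops.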
Case study 1: results
• Ran 12 10-day trajectories with 4 concentrations and 3 different seeds on Keeneland, three trajectories per node

GPU utilization:

  t_chkpnt   With our approach     W/o our approach
  (hours)    Test 1    Test 2      Test 2    Test 1
  0.5        99.54%    98.82%      98.78%    97.08%
  1          99.18%    98.44%      98.26%    96.53%
  3          97.83%    96.98%      96.21%    94.49%
  6          95.83%    94.85%      93.50%    91.72%
Case study 1: snapshots of ongoing simulations
Initial structures (time 0 ns for all four concentrations): surfactant molecules randomly distributed
Ongoing simulations: 0.10 M at 22 ns, 0.25 M at 20 ns, 0.50 M at 20 ns, and 1.00 M at 15 ns
Case study 2: Carbon Nanotubes
• Study nanotubes in aqueous solutions and electrolyte solutions:
  Different temperatures
  Different separations
• Scientific metrics:
  Potential of mean force
  Effect of electrolytes, i.e., sodium chloride and iodide
  Ion spatial distributions
[Figure: nanotube system with labeled distances of 24 Å and 13.6 Å]
Case study 2: testbeds
• Taxonomy of the simulations:
  10 temperatures ranging from 280 K to 360 K, along with 20 tube separations
  200 ns per trajectory, at 5.8 ns/day ± 3% on 64 nodes
• Test 1: hardware errors, i.e., ECC errors and system failures
• Test 2: hardware and application errors
Modeling max utilization
• With our approach:

  utilization = \frac{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}}\left[t_{max} - \sum_{i=1}^{n-1}\left(\left(t_{arrival}(i) - t_{lastchk}(i)\right) + t_{restart}\right) - \left(t_{max} - t_{arrival}(n)\right)\right]}{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}} t_{max}}

• Without our approach:

  utilization = \frac{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}}\left[t_{max} - \left(t_{arrival}(1) - t_{lastchk}(1)\right) - \left(t_{max} - t_{arrival}(1)\right)\right]}{\sum_{\mathrm{GPUs}}\sum_{\mathrm{days}} t_{max}}

where

  t_{arrival}(i)\big|_{i<n} = \mathrm{Weibull}(scale, shape), \qquad t_{arrival}(n) = 0.03 \times t_{max}, \qquad t_{max} = 24~\text{hours}
Case study 2: modeling system failures
• Weibull distribution fitted to the observed system failures: scale = 203.8, shape = 0.525
[Plot: failure occurrences and the fitted probability density function (pdf) over hours]
• P(system failure) = 0.057
• P(two more jobs fail because of the system, given that one already failed) = 0.333
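The fitted failure model can be sampled directly; the sketch below uses NumPy (the function names are illustrative). Note that NumPy's Weibull generator is the one-parameter form, so the scale is applied by multiplication.

```python
import math

import numpy as np

# Parameters fitted to the observed system failures (hours), per the slide.
SCALE, SHAPE = 203.8, 0.525

def sample_failure_times(n, seed=0):
    """Draw n failure inter-arrival times (hours) from the fitted Weibull."""
    rng = np.random.default_rng(seed)
    return SCALE * rng.weibull(SHAPE, size=n)

def p_failure_within(t_hours, scale=SCALE, shape=SHAPE):
    """Weibull CDF: probability that a failure occurs within t_hours."""
    return 1.0 - math.exp(-((t_hours / scale) ** shape))
```

Feeding such samples in as the t_arrival(i) values of the utilization model reproduces the "with failures" scenarios of the two tests.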
Case study 2: modeling application failures
• Weibull distribution fitted to the observed application failures: scale = 56.56, shape = 0.3361
[Plot: failure occurrences and the fitted probability density function (pdf) over hours]
• P(application failure) = 0.038
Case study 2: results
• Ran 200 ns for each nanotube system, equivalent to ~35 days on 64 nodes of Keeneland, each with 3 GPUs

GPU utilization:

  t_chkpnt   With our approach           W/o our approach
  (hours)    sysfail   sysfail+appfail   sysfail+appfail   sysfail
  0.5        99.69%    99.54%            94.07%            90.32%
  1          99.64%    99.47%            94.02%            90.24%
  3          99.47%    99.23%            93.79%            89.98%
  6          99.28%    98.98%            93.61%            89.73%
Case study 2: scientific results
[Figures: three slides of results for the nanotube systems]
Conclusions
• GPUs are still second-class citizens on high-end clusters: virtualization is too costly, and lightweight, user-level OSs are a work in progress
• Rather than rewriting existing workflow and resource managers, we propose to complement them with:
  A Companion Module complementing the workflow manager
  A Wrapper Module supporting the resource manager
• We model the maximum utilization for:
  SDS systems with dynamically variable runtimes
  Carbon nanotube systems with hardware and application failures
• Utilization increases significantly in both cases
• The science is a work in progress; stay tuned for our next publications
Acknowledgments
Related work: Taufer et al., CiSE 2012; Ganesan et al., JCC 2011; Bauer et al., JCC 2011; Davis et al., BICoB 2009
Patel's group and Taufer's group
Sponsors:
Contact: [email protected], [email protected]