Real Science at the Petascale
Radhika S. Saksena¹, Bruce Boghosian²,
Luis Fazendeiro¹, Owain A. Kenway, Steven Manos¹,
Marco Mazzeo¹, S. Kashif Sadiq¹, James L. Suter¹,
David Wright¹ and Peter V. Coveney¹
1. Centre for Computational Science, UCL, UK
2. Tufts University, Boston, USA
Contents
• New era of petascale resources
• Scientific applications at the petascale:
  – Unstable periodic orbits in turbulence
  – Liquid crystalline rheology
  – Clay-polymer nanocomposites
  – HIV drug resistance
  – Patient-specific haemodynamics
• Conclusions
New era of petascale machines
Ranger (TACC) – NSF-funded Sun cluster
• 0.58 petaflops theoretical peak: ~10 times HECToR (59 Tflops); "bigger" than all other TeraGrid resources combined
• Linpack speed 0.31 petaflops; 123 TB memory
• Architecture: 82 racks; 1 rack = 4 chassis; 1 chassis = 12 nodes
• 1 node = Sun Blade x6420 (four quad-core AMD Opteron processors, i.e. 16 cores per node)
• 3,936 nodes = 62,976 cores
Intrepid (ALCF) – DOE-funded Blue Gene/P
• 0.56 petaflops (theoretical) peak
• 163,840 cores; 80 TB memory
• Linpack speed 0.45 petaflops
• "Fastest" machine available for open science and third overall¹
1. http://www.top500.org/lists/2008/06
New era of petascale machines
US firmly committed to path to petascale (and beyond)
NSF: Ranger (5 years, $59 million award)
University of Tennessee, to build a system with just under 1 PF
peak performance ($65 million, 5-year project)¹
"Blue Waters" will come online in 2011 at NCSA ($208 million grant), using
IBM technology, to deliver 10 Pflops peak performance
(~200K cores, 10 PB of disk)
1. http://www.nsf.gov/news/news_summ.jsp?cntn_id=109850
New era of petascale machines
• We wish to do new science at this scale – not just incremental advances
• Applications that scale linearly up to tens of thousands of cores
(large system sizes, many time steps) – capability computing at
petascale
• High throughput for “intermediate scale” applications (in the 128 – 512 core range)
Intercontinental HPC grid environment
[Diagram: intercontinental grid linking the UK NGS sites (Leeds, Manchester, Oxford, RAL), HPCx, HECToR and DEISA with the US TeraGrid (PSC, SDSC, NCSA, TACC/Ranger, ANL/Intrepid) via dedicated lightpaths, with the AHE as the access layer; the links support massive data transfers, advanced reservation/co-scheduling, and emergency/pre-emptive access.]
Lightpaths – dedicated 1 Gbit/s UK/US network
JANET Lightpath is a centrally managed service which supports large research projects on the JANET network by providing end-to-end connectivity, from hundreds of Mbit/s up to whole fibre wavelengths (10 Gbit/s).
Typical usage:
– Dedicated 1 Gbit/s network to connect to national and international HPC infrastructure
– Shifting TB datasets between the UK and US
– Real-time visualisation
– Interactive computational steering
– Cross-site MPI runs (e.g. between NGS2 Manchester and NGS2 Oxford)
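A rough, idealised estimate puts the need for these provisioned links in context: at a sustained 1 Gbit/s, moving a 1 TB dataset takes about $10^{12} \times 8 / 10^{9} = 8000\,\mathrm{s} \approx 2.2$ hours, so shifting multi-terabyte simulation output between the UK and US is only practical over dedicated lightpaths rather than the shared production network.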
Advanced reservations
• Plan in advance to have access to the resources
• Process of reserving multiple resources for use by a single application (see the sketch after this list):
  – HARC¹ – Highly Available Resource Co-Allocator
  – GUR² – Grid Universal Remote
• Can reserve the resources:
  – For the same time:
    • Distributed MPIg/MPICH-G2 jobs
    • Distributed visualization
    • Booking equipment (e.g. visualization facilities)
  – Or some coordinated set of times
  – Computational workflows
• Urgent computing and pre-emptive access (SPRUCE)
1. http://www.realitygrid.org/middleware.shtml#HARC
2. http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/TGIA64LinuxCluster/Doc/coschedule.html
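To make the co-allocation idea concrete, the sketch below shows the all-or-nothing pattern such a co-allocator follows. It is purely illustrative Python with hypothetical reserve()/confirm()/cancel() calls, not the actual HARC or GUR interface.

```python
# Illustrative only: the all-or-nothing (two-phase) co-allocation pattern.
# The resource objects and their reserve/confirm/cancel methods are
# hypothetical placeholders, not the real HARC or GUR API.
class ReservationFailed(Exception):
    pass

def co_allocate(resources, start, duration):
    held = []
    try:
        for res in resources:
            # Phase 1: ask each resource for a tentative hold on the slot.
            held.append((res, res.reserve(start, duration)))
        for res, ticket in held:
            # Phase 2: commit every hold only once all of them succeeded.
            res.confirm(ticket)
        return [ticket for _, ticket in held]
    except ReservationFailed:
        # Any failure releases every tentative hold, so nothing is left booked.
        for res, ticket in held:
            res.cancel(ticket)
        return None
```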
Advanced reservations
• Also available via the HARC API – can easily be built into Java applications
• Deployed on a number of systems:
  – LONI (ducky, bluedawg, zeke, neptune IBM p5 clusters)
  – TeraGrid (NCSA, SDSC IA64 clusters, Lonestar, Ranger(?))
  – HPCx
  – North West Grid (UK)
  – UK National Grid Service (NGS) – Manchester, Oxford, Leeds
Application Hosting Environment
• Middleware which simplifies access to distributed resources and manages workflows
• Wrestling with middleware can't be the limiting step for scientists – the AHE hides the complexities of the 'grid' from the end user
• Applications are stateful Web services
• An application can consist of a coupled model, parameter sweep, steerable application, or a single executable
HYPO4D¹ (Hydrodynamic periodic orbits in 4D)
• Scientific goal: to identify and characterize periodic orbits in turbulent fluid flow, from which time averages can be computed exactly
• Uses the lattice-Boltzmann method: highly scalable (linear scaling up to at least 33K cores on Intrepid and close to linear up to 65K)
[Scaling plots: (a) Ranger, (b) Intrepid + Surveyor (Blue Gene/P)]
1. L. Fazendeiro et al. “A novel computational approach to turbulence”, AHM08
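The reason closed orbits are so valuable here: for an observable $A$ evaluated along a periodic orbit of period $T$, the time average

$$\langle A \rangle = \frac{1}{T} \int_0^T A\big(\mathbf{u}(t)\big)\,\mathrm{d}t$$

is exact, because the trajectory closes on itself, rather than an estimate truncated at some finite simulation time.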
HYPO4D¹ (Hydrodynamic periodic orbits in 4D)
• Novel approach to turbulence studies: efficiently parallelizes both time and space
• Algorithm is extremely memory-intensive: full spacetime trajectories are numerically relaxed to a nearby minimum (an unstable periodic orbit)
• Ranger is the ideal resource for this work (123 TB of RAM)
• During the early-user period, millions of time steps for different systems were simulated and then compared for similarities (~9 TB of data)
1. L. Fazendeiro et al. “A novel computational approach to turbulence”, AHM08
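Schematically (our notation, not the code's exact discretisation), the whole discrete spacetime trajectory $\{\mathbf{u}_0, \dots, \mathbf{u}_{N-1}\}$ is stored with periodic closure $\mathbf{u}_N = \mathbf{u}_0$ and relaxed by minimising the residual of the dynamics around the loop,

$$F[\{\mathbf{u}_n\}] = \sum_{n=0}^{N-1} \big\| \mathbf{u}_{n+1} - \Phi_{\Delta t}(\mathbf{u}_n) \big\|^2,$$

where $\Phi_{\Delta t}$ denotes one lattice-Boltzmann time step. $F = 0$ holds only on a true periodic orbit, and keeping every time slice in memory at once is what makes the method so memory-hungry.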
LB3D¹
• LB3D – a three-dimensional lattice-Boltzmann solver for multi-component fluid dynamics, in particular amphiphilic systems
• Mature code – 9 years in development; extensively used on the US TeraGrid, UK NGS, HECToR and HPCx machines
• Largest model simulated to date is 2048³ (needs Ranger)
1. R. S. Saksena et al. “Petascale lattice-Boltzmann simulations of amphiphilic liquid crystals”, AHM08
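A back-of-the-envelope estimate shows why the largest runs need Ranger: assuming a D3Q19 lattice with double-precision distribution functions, a single fluid component on a $2048^3$ lattice already occupies roughly $2048^3 \times 19 \times 8\ \text{bytes} \approx 1.3$ TB, so a multi-component amphiphilic system with its work arrays runs to several terabytes of RAM before any output is written.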
Cubic Phase Rheology Results¹
• 256³ lattice-site gyroidal system with multiple domains
• Recent results include the tracking of long time-scale defect dynamics on 1024³ lattice-site systems; only possible on Ranger, due to the sustained core count and disk storage requirements
• Regions of high stress magnitude are localized in the vicinity of defects
1. R. S. Saksena et al. “Petascale lattice-Boltzmann simulations of amphiphilic liquid crystals”, AHM08
LAMMPS¹
Fully-atomistic simulations of clay-polymer nanocomposites on Ranger
More than 85 million atoms simulated
Clay mineral studies, with ~ 3 million atoms, 2-3 orders of magnitude greater than any previous study
Prospects: to include the edges of the clay (not periodic boundary) and do realistic-sized models – at least 100 million atoms (~2 weeks wall clock, using 4096 cores)
1. J Suter et al. Grid-Enabled Large-Scale Molecular Dynamics of Clay Nano-materials, AHM08
HIV-1 drug resistance¹
• Goal: to study the effect of anti-retroviral inhibitors (targeting proteins in the HIV lifecycle, such as the viral protease and reverse-transcriptase enzymes)
• High-end computational power to confer clinical decision support
• On Ranger, up to 100 replicas (configurations) simulated for the first time, in some cases going to 100 ns
• 6 microseconds of simulation in four weeks
• AHE-orchestrated workflows
• 3.5 TB of trajectory and free energy analysis
[Figure: binding free energy differences compared with experimental results for wild-type and MDR proteases with inhibitors LPV and RTV, using 10 ns trajectories]
1. K. Sadiq et al., "Rapid, Accurate and Automated Binding Free Energy Calculations of Ligand-Bound HIV Enzymes for Clinical Decision Support using HPC and Grid Resources", AHM08
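For context, the binding free energies in studies of this kind are typically obtained from an end-point estimate (e.g. MM/PBSA-style) averaged over the replica trajectories, schematically

$$\Delta G_{\mathrm{bind}} \approx \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{protease}} \rangle - \langle G_{\mathrm{inhibitor}} \rangle,$$

with resistance quantified by the shift $\Delta\Delta G = \Delta G_{\mathrm{bind}}^{\mathrm{MDR}} - \Delta G_{\mathrm{bind}}^{\mathrm{wild\text{-}type}}$ relative to the wild-type enzyme, which is the quantity compared against experiment on this slide.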
GENIUS project¹
Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)
• Scientific goal: to perform real-time, patient-specific medical simulation
• Combines blood flow simulation with clinical data
• Fitting the computational time scale to the clinical time scale:
  • Capture the clinical workflow
  • Get results which will influence clinical decisions: 1 day? 1 week?
  • GENIUS – 15 to 30 minutes
1. S. Manos et al., “Surgical Treatment for Neurovascular Pathologies Using Patient-specific Whole Cerebral Blood Flow Simulation”, AHM08
GENIUS project¹
• Blood flow is simulated using the lattice-Boltzmann method (HemeLB)
• Parallel ray tracer performs real-time in situ visualization
• Sub-frames are rendered on each MPI rank and composited before being sent over the network to a (lightweight) viewing client (see the sketch after this slide)
• The addition of volume rendering cuts down the scalability of the fluid solver due to the required global communications
• Even so, datasets are rendered at more than 30 frames per second (1024² pixel resolution)
1. S. Manos et al., “Surgical Treatment for Neurovascular Pathologies Using Patient-specific Whole Cerebral Blood Flow Simulation”, AHM08
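The compositing step referenced above can be pictured with the following minimal mpi4py sketch. It is not HemeLB's actual rendering code, just an illustration of gathering per-rank sub-frames into one image on the rank that talks to the viewing client.

```python
# Minimal illustration (not HemeLB code): every MPI rank renders a horizontal
# strip of the frame; rank 0 gathers the strips and assembles the full image
# that would then be streamed to the lightweight viewing client.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

WIDTH, HEIGHT = 1024, 1024          # 1024^2 pixels, as quoted on the slide
rows = HEIGHT // nprocs             # assume the height divides evenly

def render_strip(r):
    """Placeholder for the per-rank ray tracing of its sub-domain."""
    return np.full((rows, WIDTH, 3), r % 256, dtype=np.uint8)

strip = render_strip(rank)
strips = comm.gather(strip, root=0)  # compositing: collect all sub-frames
if rank == 0:
    frame = np.vstack(strips)        # full frame, ready to send to the client
```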
Conclusions
• A wide range of scientific research activities was presented, making effective use of the new petascale resources available in the USA
• These demonstrate the emergence of new science not possible without access to resources at this scale
• Some existing techniques still hold, however: as several of these applications have shown, MPI codes can scale linearly up to tens of thousands of cores
• Future prospects: we are well placed to move onto the next machines coming online in the US and Japan
Acknowledgements
JANET / David Salmon
NGS staff
TeraGrid staff
Simon Clifford (CCS)
Jay Boisseau (TACC)
Lucas Wilson (TACC)
Pete Beckman (ANL)
Ramesh Balakrishnan (ANL)
Brian Toonen (ANL)
Prof. Nicholas Karonis (ANL)