Real Science at the Petascale
Radhika S. Saksena¹, Bruce Boghosian²,
Luis Fazendeiro¹, Owain A. Kenway, Steven Manos¹,
Marco Mazzeo¹, S. Kashif Sadiq¹, James L. Suter¹,
David Wright¹ and Peter V. Coveney¹
1. Centre for Computational Science, UCL, UK
2. Tufts University, Boston, USA
Contents
• New era of petascale resources
• Scientific applications at the petascale:
  – Unstable periodic orbits in turbulence
  – Liquid crystalline rheology
  – Clay-polymer nanocomposites
  – HIV drug resistance
  – Patient-specific haemodynamics
• Conclusions
New era of petascale machines
Ranger (TACC) – NSF-funded Sun cluster
• 0.58 petaflops theoretical peak: ~10 times HECToR (59 Tflops); "bigger" than all other TeraGrid resources combined
• Linpack speed 0.31 petaflops; 123 TB memory
• Architecture: 82 racks; 1 rack = 4 chassis; 1 chassis = 12 nodes
• 1 node = Sun Blade x6420 (four quad-core AMD Opteron processors, i.e. 16 cores per node)
• 3,936 nodes = 62,976 cores
Intrepid (ALCF) – DOE-funded Blue Gene/P
• 0.56 petaflops (theoretical) peak
• 163,840 cores; 80 TB memory
• Linpack speed 0.45 petaflops
• "Fastest" machine available for open science and third overall¹
1. http://www.top500.org/lists/2008/06
New era of petascale machines
US firmly committed to path to petascale (and beyond)
NSF: Ranger (5 years, $59 million award)
University of Tennessee, to build a system with just under 1 PF
peak performance ($65 million, 5-year project)¹
"Blue Waters" will come online in 2011 at NCSA ($208 million grant), using
IBM technology, to deliver 10 Pflops peak performance
(~200K cores, 10 PB of disk)
1. http://www.nsf.gov/news/news_summ.jsp?cntn_id=109850
New era of petascale machines
• We wish to do new science at this scale – not just incremental advances
• Applications that scale linearly up to tens of thousands of cores
(large system sizes, many time steps) – capability computing at
petascale
• High throughput for “intermediate scale” applications (in the 128 – 512 core range)
Intercontinental HPC grid environment
[Diagram: intercontinental grid linking the UK NGS sites (Leeds, Manchester, Oxford, RAL), HPCx, HECToR and DEISA with the US TeraGrid (PSC, SDSC, NCSA, TACC/Ranger, ANL/Intrepid) via dedicated lightpaths, with the AHE as the access layer; the links support massive data transfers, advanced reservation/co-scheduling, and emergency/pre-emptive access.]
Lightpaths – dedicated 1 Gbit/s UK/US network
JANET Lightpath is a centrally managed service which supports large research projects on the JANET network by providing end-to-end connectivity, from hundreds of Mbit/s up to whole fibre wavelengths (10 Gbit/s).
Typical usage:
– Dedicated 1 Gbit/s network to connect to national and international HPC infrastructure
– Shifting TB datasets between the UK and US
– Real-time visualisation
– Interactive computational steering
– Cross-site MPI runs (e.g. between NGS2 Manchester and NGS2 Oxford)
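A rough, idealised estimate puts the need for these provisioned links in context: at a sustained 1 Gbit/s, moving a 1 TB dataset takes about $10^{12} \times 8 / 10^{9} = 8000\,\mathrm{s} \approx 2.2$ hours, so shifting multi-terabyte simulation output between the UK and US is only practical over dedicated lightpaths rather than the shared production network.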
Advanced reservations
• Plan in advance to have access to the resources
• Process of reserving multiple resources for use by a single application (see the sketch after this list):
  – HARC¹ – Highly Available Resource Co-Allocator
  – GUR² – Grid Universal Remote
• Can reserve the resources:
  – For the same time:
    • Distributed MPIg/MPICH-G2 jobs
    • Distributed visualization
    • Booking equipment (e.g. visualization facilities)
  – Or some coordinated set of times
  – Computational workflows
• Urgent computing and pre-emptive access (SPRUCE)
1. http://www.realitygrid.org/middleware.shtml#HARC
2. http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/TGIA64LinuxCluster/Doc/coschedule.html
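To make the co-allocation idea concrete, the sketch below shows the all-or-nothing pattern such a co-allocator follows. It is purely illustrative Python with hypothetical reserve()/confirm()/cancel() calls, not the actual HARC or GUR interface.

```python
# Illustrative only: the all-or-nothing (two-phase) co-allocation pattern.
# The resource objects and their reserve/confirm/cancel methods are
# hypothetical placeholders, not the real HARC or GUR API.
class ReservationFailed(Exception):
    pass

def co_allocate(resources, start, duration):
    held = []
    try:
        for res in resources:
            # Phase 1: ask each resource for a tentative hold on the slot.
            held.append((res, res.reserve(start, duration)))
        for res, ticket in held:
            # Phase 2: commit every hold only once all of them succeeded.
            res.confirm(ticket)
        return [ticket for _, ticket in held]
    except ReservationFailed:
        # Any failure releases every tentative hold, so nothing is left booked.
        for res, ticket in held:
            res.cancel(ticket)
        return None
```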
Advanced reservations
• Also available via the HARC API – can easily be built into Java applications
• Deployed on a number of systems:
  – LONI (ducky, bluedawg, zeke, neptune IBM p5 clusters)
  – TeraGrid (NCSA, SDSC IA64 clusters, Lonestar, Ranger(?))
  – HPCx
  – North West Grid (UK)
  – UK National Grid Service (NGS) – Manchester, Oxford, Leeds
Application Hosting Environment
• Middleware which simplifies access to distributed resources and manages workflows
• Wrestling with middleware can't be the limiting step for scientists – the AHE hides the complexities of the 'grid' from the end user
• Applications are stateful Web services
• An application can consist of a coupled model, parameter sweep, steerable application, or a single executable
HYPO4D¹ (Hydrodynamic periodic orbits in 4D)
• Scientific goal: to identify and characterize periodic orbits in turbulent fluid flow, from which time averages can be computed exactly
• Uses the lattice-Boltzmann method: highly scalable (linear scaling up to at least 33K cores on Intrepid and close to linear up to 65K)
[Scaling plots: (a) Ranger, (b) Intrepid + Surveyor (Blue Gene/P)]
1. L. Fazendeiro et al. “A novel computational approach to turbulence”, AHM08
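The reason closed orbits are so valuable here: for an observable $A$ evaluated along a periodic orbit of period $T$, the time average

$$\langle A \rangle = \frac{1}{T} \int_0^T A\big(\mathbf{u}(t)\big)\,\mathrm{d}t$$

is exact, because the trajectory closes on itself, rather than an estimate truncated at some finite simulation time.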
HYPO4D¹ (Hydrodynamic periodic orbits in 4D)
• Novel approach to turbulence studies: efficiently parallelizes both time and space
• Algorithm is extremely memory-intensive: full spacetime trajectories are numerically relaxed to a nearby minimum (an unstable periodic orbit)
• Ranger is the ideal resource for this work (123 TB of RAM)
• During the early-user period, millions of time steps for different systems were simulated and then compared for similarities (~9 TB of data)
1. L. Fazendeiro et al. “A novel computational approach to turbulence”, AHM08
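Schematically (our notation, not the code's exact discretisation), the whole discrete spacetime trajectory $\{\mathbf{u}_0, \dots, \mathbf{u}_{N-1}\}$ is stored with periodic closure $\mathbf{u}_N = \mathbf{u}_0$ and relaxed by minimising the residual of the dynamics around the loop,

$$F[\{\mathbf{u}_n\}] = \sum_{n=0}^{N-1} \big\| \mathbf{u}_{n+1} - \Phi_{\Delta t}(\mathbf{u}_n) \big\|^2,$$

where $\Phi_{\Delta t}$ denotes one lattice-Boltzmann time step. $F = 0$ holds only on a true periodic orbit, and keeping every time slice in memory at once is what makes the method so memory-hungry.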
LB3D¹
• LB3D – a three-dimensional lattice-Boltzmann solver for multi-component fluid dynamics, in particular amphiphilic systems
• Mature code – 9 years in development; extensively used on the US TeraGrid, UK NGS, HECToR and HPCx machines
• Largest model simulated to date is 2048³ (needs Ranger)
1. R. S. Saksena et al. “Petascale lattice-Boltzmann simulations of amphiphilic liquid crystals”, AHM08
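A back-of-the-envelope estimate shows why the largest runs need Ranger: assuming a D3Q19 lattice with double-precision distribution functions, a single fluid component on a $2048^3$ lattice already occupies roughly $2048^3 \times 19 \times 8\ \text{bytes} \approx 1.3$ TB, so a multi-component amphiphilic system with its work arrays runs to several terabytes of RAM before any output is written.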
Cubic Phase Rheology Results¹
• 256³ lattice-site gyroidal system with multiple domains
• Recent results include the tracking of long time-scale defect dynamics on 1024³ lattice-site systems; only possible on Ranger, due to the sustained core count and disk storage requirements
• Regions of high stress magnitude are localized in the vicinity of defects
1. R. S. Saksena et al. “Petascale lattice-Boltzmann simulations of amphiphilic liquid crystals”, AHM08
LAMMPS¹
Fully-atomistic simulations of clay-polymer nanocomposites on Ranger
More than 85 million atoms simulated
Clay mineral studies, with ~ 3 million atoms, 2-3 orders of magnitude greater than any previous study
Prospects: to include the edges of the clay (not periodic boundary) and do realistic-sized models – at least 100 million atoms (~2 weeks wall clock, using 4096 cores)
1. J Suter et al. Grid-Enabled Large-Scale Molecular Dynamics of Clay Nano-materials, AHM08
HIV-1 drug resistance¹
• Goal: to study the effect of anti-retroviral inhibitors (targeting proteins in the HIV lifecycle, such as the viral protease and reverse-transcriptase enzymes)
• High-end computational power to confer clinical decision support
• On Ranger, up to 100 replicas (configurations) simulated for the first time, in some cases going to 100 ns
• 6 microseconds of simulation in four weeks
• AHE-orchestrated workflows
• 3.5 TB of trajectory and free energy analysis
[Figure: binding free energy differences compared with experimental results for wild-type and MDR proteases with inhibitors LPV and RTV, using 10 ns trajectories]
1. K. Sadiq et al., "Rapid, Accurate and Automated Binding Free Energy Calculations of Ligand-Bound HIV Enzymes for Clinical Decision Support using HPC and Grid Resources", AHM08
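For context, the binding free energies in studies of this kind are typically obtained from an end-point estimate (e.g. MM/PBSA-style) averaged over the replica trajectories, schematically

$$\Delta G_{\mathrm{bind}} \approx \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{protease}} \rangle - \langle G_{\mathrm{inhibitor}} \rangle,$$

with resistance quantified by the shift $\Delta\Delta G = \Delta G_{\mathrm{bind}}^{\mathrm{MDR}} - \Delta G_{\mathrm{bind}}^{\mathrm{wild\text{-}type}}$ relative to the wild-type enzyme, which is the quantity compared against experiment on this slide.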
GENIUS project¹
Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)
• Scientific goal: to perform real-time, patient-specific medical simulation
• Combines blood flow simulation with clinical data
• Fitting the computational time scale to the clinical time scale:
  • Capture the clinical workflow
  • Get results which will influence clinical decisions: 1 day? 1 week?
  • GENIUS – 15 to 30 minutes
1. S. Manos et al., “Surgical Treatment for Neurovascular Pathologies Using Patient-specific Whole Cerebral Blood Flow Simulation”, AHM08
GENIUS project¹
• Blood flow is simulated using the lattice-Boltzmann method (HemeLB)
• Parallel ray tracer performs real-time in situ visualization
• Sub-frames are rendered on each MPI rank and composited before being sent over the network to a (lightweight) viewing client (see the sketch after this slide)
• The addition of volume rendering cuts down the scalability of the fluid solver due to the required global communications
• Even so, datasets are rendered at more than 30 frames per second (1024² pixel resolution)
1. S. Manos et al., “Surgical Treatment for Neurovascular Pathologies Using Patient-specific Whole Cerebral Blood Flow Simulation”, AHM08
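The compositing step referenced above can be pictured with the following minimal mpi4py sketch. It is not HemeLB's actual rendering code, just an illustration of gathering per-rank sub-frames into one image on the rank that talks to the viewing client.

```python
# Minimal illustration (not HemeLB code): every MPI rank renders a horizontal
# strip of the frame; rank 0 gathers the strips and assembles the full image
# that would then be streamed to the lightweight viewing client.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

WIDTH, HEIGHT = 1024, 1024          # 1024^2 pixels, as quoted on the slide
rows = HEIGHT // nprocs             # assume the height divides evenly

def render_strip(r):
    """Placeholder for the per-rank ray tracing of its sub-domain."""
    return np.full((rows, WIDTH, 3), r % 256, dtype=np.uint8)

strip = render_strip(rank)
strips = comm.gather(strip, root=0)  # compositing: collect all sub-frames
if rank == 0:
    frame = np.vstack(strips)        # full frame, ready to send to the client
```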
Conclusions
• A wide range of scientific research activities was presented, making effective use of the new petascale resources available in the USA
• These demonstrate the emergence of new science not possible without access to resources at this scale
• Some existing techniques still hold, however: as several of these applications have shown, MPI codes can scale linearly up to tens of thousands of cores
• Future prospects: we are well placed to move onto the next machines coming online in the US and Japan
Acknowledgements
JANET / David Salmon
NGS staff
TeraGrid staff
Simon Clifford (CCS)
Jay Boisseau (TACC)
Lucas Wilson (TACC)
Pete Beckman (ANL)
Ramesh Balakrishnan (ANL)
Brian Toonen (ANL)
Prof. Nicholas Karonis (ANL)