Oak Ridge Leadership Computing Facility
Don Maxwell, HPC Technical Coordinator
October 8, 2010
Presented to: HPC User Forum, Stuttgart
www.olcf.ornl.gov
2
Oak Ridge Leadership Computing Facility
• Mission: Deploy and operate the computational resources required to tackle global challenges
– Provide world-class computational resources and specialized services for the most computationally intensive problems
– Provide a stable hardware/software path of increasing scale to maximize productive application development
– Deliver transforming discoveries in materials, biology, climate, energy technologies, etc.
– Provide the ability to investigate otherwise inaccessible systems, from supernovae to nuclear reactors to energy grid dynamics
3
Our vision for sustained leadership and scientific impact
• Provide the world's most powerful open resource for capability computing
• Follow a well-defined path for maintaining world leadership in this critical area
• Attract the brightest talent and partnerships from all over the world
• Deliver cutting-edge science relevant to the missions of DOE and key federal and state agencies
• Unique opportunity for multi-agency collaboration for science based on synergy of requirements and technology
4
With UT, we are NSF’s National Institute for Computational Sciences for academia
• 1 PF system to the UT-ORNL Joint Institute for Computational Sciences
– Largest grant in UT history
– Other partners: Texas Advanced Computing Center, National Center for Atmospheric Research, ORAU, and core universities
– 1 of up to 4 leading-edge computing systems planned to increase the availability of computing resources to U.S. researchers
• A new phase in our relationship with UT
– Computational Science Initiative
– Governor's Chair and joint faculty
– Engagement with the scientific community
– Research, education, and training mission
5
Oak Ridge National Laboratory Leadership Computing Systems

Jaguar (world's most powerful computer)
• Peak performance: 2.33 PF/s
• Memory: 300 TB
• Disk bandwidth: > 240 GB/s
• Square feet: 5,000
• Power: 7 MW

Kraken (NSF's most powerful computer)
• Peak performance: 1.03 PF/s
• Memory: 132 TB
• Disk bandwidth: > 50 GB/s
• Square feet: 2,300
• Power: 3 MW

NOAA CMRS (NOAA's most powerful computer)
• Peak performance: 1.1 PF/s
• Memory: 248 TB
• Disk bandwidth: 104 GB/s
• Square feet: 1,600
• Power: 2.2 MW
6
Jaguar History
• Jan 2005: XT3 development cabinet
• Mar 2005: 10 cabinets, single core
• Apr 2005: +30 XT3 cabinets
• Jun 2005: +16 cabinets for a total of 56 XT3 cabinets, 25 TF
• Jul 2006: XT3 dual core, 2.6 GHz, 50 TF
• Nov 2006: XT4 dual core, 2.6 GHz, 32 then 36 cabinets
• Mar 2007: XT3 and XT4 combined for a total of 124 cabinets, 100 TF
• May 2008: XT4, 68 cabinets, quad core, 250 TF
• Dec 2008: 200-cabinet quad-core XT5, 1 PF
• Nov 2009: 200-cabinet six-core XT5, 2 PF
7
What is Jaguar Today?
Jaguar combines a 263 TF Cray XT4 system at ORNL's OLCF with a 2,332 TF Cray XT5 to create a 2.5 PF system.

System attribute             XT5                  XT4
AMD Opteron processors       37,376 hex-core      7,832 quad-core
Memory DIMMs                 75,772               31,776
Node architecture            Dual-socket SMP      Single socket
Memory per core/node (GB)    1.3 / 16             2 / 8
Total system memory (TB)     300                  62
Disk capacity (TB)           10,000               750
Disk bandwidth (GB/s)        240                  44
Interconnect                 SeaStar2+ 3D torus   SeaStar2+ 3D torus
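The per-core and total-memory figures in this table follow from the processor counts. A minimal arithmetic sketch in Python (an illustration added here, not from the original slides), assuming dual-socket XT5 nodes and single-socket XT4 nodes as listed above:

```python
# Rough consistency check of the Jaguar memory figures quoted above.
# Assumes dual-socket (hex-core) XT5 nodes and single-socket (quad-core)
# XT4 nodes, as stated in the table.

def memory_summary(sockets, sockets_per_node, gb_per_node):
    nodes = sockets // sockets_per_node
    total_tb = nodes * gb_per_node / 1000        # decimal TB, as in the table
    return nodes, total_tb

xt5_nodes, xt5_tb = memory_summary(37_376, 2, 16)    # 18,688 nodes
xt4_nodes, xt4_tb = memory_summary(7_832, 1, 8)      # 7,832 nodes

print(f"XT5: {xt5_nodes:,} nodes, ~{xt5_tb:.0f} TB, {16 / 12:.1f} GB/core")  # ~299 TB, 1.3 GB/core
print(f"XT4: {xt4_nodes:,} nodes, ~{xt4_tb:.0f} TB, {8 / 4:.0f} GB/core")    # ~63 TB, 2 GB/core
```

The results round to the 300 TB and 62 TB totals and the 1.3 GB and 2 GB per-core figures quoted in the table.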
8
“Spider”: Center-wide High Speed Parallel File System
• “Spider” provides a shared, parallel file system for all systems
– Based on the Lustre file system
• Demonstrated bandwidth of over 240 GB/s
• Over 10 PB of RAID-6 capacity
– DDN 9900 storage controllers with 8+2 disks per RAID group
– 13,440 1-TB SATA drives
• 192 Dell PowerEdge storage servers
– 3 TB of memory
• Available from all systems via our high-performance, scalable I/O network
– Over 3,000 InfiniBand ports
– Over 3 miles of cables
– Scales as storage grows
• Spider is the parallel file system for Jaguar
• Spider uses approximately 400 kW of power
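The capacity and bandwidth bullets above are consistent with one another. A quick arithmetic sketch in Python (an illustration added here, using only the figures quoted on this slide):

```python
# Back-of-the-envelope check of the Spider figures quoted above.

drives          = 13_440     # 1-TB SATA drives
raid_group_size = 10         # RAID-6 groups of 8 data + 2 parity disks
data_per_group  = 8          # usable drives per RAID group

raid_groups = drives // raid_group_size
usable_pb   = raid_groups * data_per_group / 1000     # 1-TB drives -> PB
print(f"Usable capacity: ~{usable_pb:.1f} PB")         # ~10.8 PB ("over 10 PB")

servers      = 192           # Dell PowerEdge storage servers
aggregate_gb = 240           # demonstrated aggregate bandwidth (GB/s)
print(f"Per-server share: ~{aggregate_gb / servers:.2f} GB/s")   # ~1.25 GB/s
```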
9
Jaguar combines a 2.33 PF Cray XT5 with a 263 TF Cray XT4
System components are linked by 4×-DDR InfiniBand (IB) using three Cisco 7024D switches
• XT5 has 192 IB links
• XT4 has 48 IB links
• Spider has 192 IB links
[Diagram: Spider, the Cray XT4, the Cray XT5, and external login nodes connected by the scalable I/O network]
10
Building an Exabyte Archive
• Supercomputers addressing Grand Challenges need to quickly store massive amounts of data
• The High-Performance Storage System meets the big-storage demands of big science
• 25 PB of tape storage
• Planning for 750 PB by 2012

High-Performance Storage System adds capacity and speed

“Fifteen years ago, [national] labs realized they needed something of this size. They recognized Grand Challenge problems were coming up that would require petaflops of computing power. And they realized those jobs had to have a place to put the data.”
(Stanley White, National Center for Computational Sciences)
11
Scheduling to Maximize Capability Computing
Factor               Unit of weight   Actual weight (minutes)   Value
Quality of Service   # of days        1440                      Highest (90), High (12), Medium (2)
Account Priority     # of days        1440                      Allocated project (1), No allocation (staff) (0), No hours (-365)
Job Size (cores)     # of days        1440                      0 (90); >120,000 (15); 80,000-120,000 (10); 40,000-80,000 (5); <40,000 (0)
Fairshare            # of minutes     1440                      <> 5% user: +/- 30 minutes; <> 10% account: +/- 1 hour
Queue Time           1 minute         1                         Provided by Moab

Capability jobs get maximum priority and walltime.
Jobs are prioritized using several factors to meet DOE goals and to provide flexibility.
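To illustrate how these factors combine into a single job priority, here is a minimal sketch in Python. The function names, the omission of the ambiguous top job-size tier, and the way fairshare is folded in are assumptions for illustration only, not the site's actual Moab configuration:

```python
# Illustrative Moab-style priority calculation based on the weight table above.
# Each factor contributes (weight in minutes) * (factor value); queue time
# contributes one point per minute waited.  Approximate illustration only.

MINUTES_PER_DAY = 1440

QOS_VALUES     = {"highest": 90, "high": 12, "medium": 2}
ACCOUNT_VALUES = {"allocated": 1, "staff": 0, "no_hours": -365}

def job_size_value(cores):
    """Priority value for the job-size bins listed in the table."""
    if cores > 120_000:
        return 15
    if cores > 80_000:
        return 10
    if cores > 40_000:
        return 5
    return 0

def job_priority(qos, account, cores, minutes_queued, fairshare_adjust=0):
    priority  = MINUTES_PER_DAY * QOS_VALUES[qos]
    priority += MINUTES_PER_DAY * ACCOUNT_VALUES[account]
    priority += MINUTES_PER_DAY * job_size_value(cores)
    priority += fairshare_adjust      # +/- minutes from fairshare targets
    priority += minutes_queued        # queue-time aging, 1 point per minute
    return priority

# A capability-sized job outranks a small job queued for the same time.
print(job_priority("high", "allocated", 150_000, minutes_queued=600))
print(job_priority("high", "allocated",  20_000, minutes_queued=600))
```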
12
Job Failure Trends
[Chart: Failures due to hardware by job size, for jobs from 2,000 to 225,000 cores; failure rates range from roughly 0% to 6%]
• MPI Forum
• OpenMPI
• HWPOISON
13
ORNL’s Current and Planned Data Centers
Computational Sciences Building (40,000 ft2)
• Maximum building power to 25 MW
• 6,600-ton chiller plant
• 1.5 MW UPS and 2.25 MW generator
• LEED Certified

Multiprogram Research Facility (30,000 ft2)
• Capability computing for national defense
• 25 MW of power and 8,000-ton chillers
• LEED Gold certification

Multiprogram Computing & Data Center (140,000 ft2)
• Up to 100 MW of power
• Lights-out facility
• Planned for LEED Gold certification
14
National Center for Computational Sciences: J. Hack, Director
A. Bland, OLCF Project Director; L. Gregg, Division Secretary

Operations Council: W. McCrosky, Finance Officer; H. George, HR Rep.; K. Carter, Recruiting; M. Richardson*, Facility Mgmt.; M. Disney, ES&H Officer; R. Adamson and M. Disney, Cyber Security

Project Office: K. Boudwin, Deputy Project Director; B. Hammontree, Site Preparation; J. Rogers, Hardware Acquisition; R. Kendall, Test & Acceptance Development; A. Baker, Commissioning; D. Hudson, Project Management; K. Stelljes, Cray Project Director

Advisory Committee: J. Dongarra, T. Dunning, K. Droegemeier, S. Karin, D. Reed, J. Tomkins

Leadership: A. Geist, Chief Technology Officer; J. Rogers, Director of Operations; S. Poole, OLCF System Architect; B. Messer, Director of Science (Acting); J. White, INCITE Program; S. Tichenor, Industrial Partnerships

High-Performance Computing Operations: A. Baker (S. Allen)
R. Adamson, M. Bast, J. Becklehimer4, J. Breazeale6, J. Brown6, M. Disney, A. Enger4, C. England, J. Evanko4, A. Funk4, D. Garman4, D. Giles, M. Hermanson2, J. Hill, S. Koch, H. Kuehn, C. Leach6, D. Leverman, D. Londo4, J. Lothian, D. Maxwell@, M. McNamara4, J. Miller6, D. Pelfrey, G. Phipps, Jr.6, R. Ray, S. Shpanskiy, C. St. Pierre, B. Tennessen4, K. Thach, T. Watts4, S. White, C. Willis4, T. Wilson6

Scientific Computing: R. Kendall (A. Fields)
S. Ahern#, E. Apra5, R. H. Baker, D. Banks3, M. Brown, J. Daniel, M. Eisenbach, M. Fahey, J. Gergel5, S. Hampton7, W. Joubert#, S. Klasky#, Q. Liu7, A. Lopez-Bezanilla7, M. Matheson, R. Mills5, B. Mintz7, H. Nam, G. Ostrouchov5, N. Podhorszki, D. Pugmire, R. Sankaran, R. Sisneros7, R. Tchoua, A. Tharrington#, R. Toedte

Technology Integration: G. Shipman (S. Mowery)
T. Barron, D. Dillow, D. Fuller, R. Gunasekaran, S. Hicks5, Y. Kim, K. Matney, R. Miller, S. Oral, B. Settlemyer5, J. Simmons, D. Steinert, V. Tipparaju5, S. Vazhkudai5, F. Wang, V. White, Z. Zhang

Application Performance Tools5: R. Graham (T. Darland)
R. Barrett, W. Bland, L. Broto7, O. Hernandez, S. Hodson, T. Jones, R. Keller, G. Koenig, J. Kuehn

User Assistance and Outreach: A. Barker (A. Fields)
J. Buchanan, J. Eady5, D. Frederick, C. Fuson, B. Gajus5, E. Gedenk1, M. Griffith, S. Hempfling, J. Hines#, S. Jones, C. Kerns1, D. Levy5, M. Miller, L. Rael, B. Renaud, C. Rockett1, D. Rose5, J. Smith, W. Wade1, B. Whitten, L. Williams5

Cray Supercomputing Center of Excellence: J. Levesque, N. Wichmann, J. Larkin, D. Kiefer, L. DeRose

Key: 1 Student, 2 Post Graduate, 3 JICS, 4 Cray, Inc., 5 Matrixed, 6 Subcontract, 7 Post Doc, * Acting, # Task Lead, @ Technical Coordinator

78 FTEs. ORNL is managed and operated by UT-Battelle, LLC under contract with the DOE.
15
Scientific Computing
Scientific Computing facilitates the delivery of leadership science by partnering with users to effectively utilize computational science, visualization and workflow technologies on OLCF resources through:
• Science team liaisons
• Developing, tuning, and scaling current and future applications
• Providing visualizations to present scientific results and augment discovery processes
16
We allocate time on the DOE systems through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program
Provides awards to academic, government, and industry organizations worldwide needing large allocations of computer time, supporting resources, and data storage to pursue transformational advances in science and industrial competitiveness.
17
User Demographics
[Chart: Active users by sponsor]
System time is allocated to each project. We do not charge for time except for proprietary work by commercial companies.
18
Some INCITE research topics:
• Glimpse into dark matter
• Supernovae ignition
• Protein structure
• Creation of biofuels
• Replicating enzyme functions
• Protein folding
• Chemical catalyst design
• Efficient coal gasifiers
• Combustion
• Algorithm development
• Global cloudiness
• Regional earthquakes
• Carbon sequestration
• Airfoil optimization
• Turbulent flow
• Propulsor systems
• Nano-devices
• Batteries
• Solar cells
• Reactor design

Next INCITE Call for Proposals: April 2011
Awards for 1, 2, or 3 years
Average award > 20 million processor hours per year
Contact us about discretionary time for INCITE preparation

Contact information: Julia C. White, INCITE Manager
19
Three of six Gordon Bell finalists ran on Jaguar
Gordon Bell Prize Awarded to ORNL Team
• A team led by ORNL’s Thomas Schulthess received the prestigious 2008 Association for Computing Machinery (ACM) Gordon Bell Prize at SC08
• For attaining the fastest performance ever achieved by a scientific supercomputing application
• Simulation of superconductors achieved 1.352 petaflops on ORNL’s Cray XT Jaguar supercomputer
• By modifying the algorithms and software design of the DCA++ code, the team was able to boost its performance tenfold
Gordon Bell finalists:
• DCA++ (ORNL)
• LS3DF (LBNL)
• SPECFEM3D (SDSC)
• RHEA (TACC)
• SPaSM (LANL)
• VPIC (LANL)
UPDATE: with upgraded Jaguar, DCA++ has exceeded 1.9 PF
20
OLCF is working with users to produce scalable, high-performance apps for the petascale
Science Area    Code       Contact         Cores     Total Performance          Notes
Materials       DCA++      Schulthess      213,120   1.9 PF*                    2008 Gordon Bell winner
Materials       WL-LSMS    Eisenbach       223,232   1.8 PF                     2009 Gordon Bell winner
Chemistry       NWChem     Apra            224,196   1.4 PF                     2009 Gordon Bell finalist
Nano Materials  OMEN       Klimeck         222,720   >1 PF                      2010 Gordon Bell submission
Seismology      SPECFEM3D  Carrington      149,784   165 TF                     2008 Gordon Bell finalist
Weather         WRF        Michalakes      150,000   50 TF
Combustion      S3D        Chen            144,000   83 TF
Fusion          GTC        PPPL            102,000   20 billion particles/sec
Materials       LS3DF      Lin-Wang Wang   147,456   442 TF                     2008 Gordon Bell winner
Chemistry       MADNESS    Harrison        140,000   550+ TF
21
Scientific Progress at the Petascale
Nuclear Energy: High-fidelity predictive simulation tools for the design of next-generation nuclear reactors to safely increase operating margins.
Fusion Energy: Substantial progress in the understanding of anomalous electron energy loss in the National Spherical Torus Experiment (NSTX).
Nano Science: Understanding the atomic and electronic properties of nanostructures in next-generation photovoltaic solar cell materials.
Turbulence: Understanding the statistical geometry of turbulent dispersion of pollutants in the environment.
Energy Storage: Understanding the storage and flow of energy in next-generation nanostructured carbon tube supercapacitors.
Biofuels: A comprehensive simulation model of lignocellulosic biomass to understand the bottleneck to sustainable and economical ethanol production.
22
Nanoscience / nanotechnology: Petascale simulations of nano-electronic devices

Research Team: M. Luisier and G. Klimeck, Purdue University
3-year INCITE award, with 20 million hours in 2010
OMEN: 3D, 2D, and 1D atomistic devices

Science Objectives and Impact
• Identify next-generation nano-transistor architectures, reduce power consumption, and increase manufacturability
• Model, understand, and design carrier flow in nano-scale semiconductor transistors

Science Results
• Coherent transport simulations in band-to-band tunneling devices with simulation times of less than an hour => rapidly explore design space
• Incoherent transport simulations coupling all energies through phonon interactions; production runs on 70,000 cores in 12 hours => first atomistic incoherent transport simulations
23
Computational Fluid Dynamics: Smart-Truck Optimization

Research Team: Mike Henderson, BMI Corp.
Participant in the Industrial Partnerships Program

Science Objectives and Impact
• Apply advanced computational techniques from the aerospace industry to substantially improve fuel efficiency and reduce emissions of trucks by reducing drag and increasing aerodynamic efficiency
• If all 1.3 million long-haul trucks operated with the drag of a passenger car, the US would annually save 6.8 billion gallons of diesel, reduce CO2 by 75 million tons, and save $19 billion in fuel costs (a rough consistency check of these figures follows below)

Science Results
• Unprecedented detail and accuracy of a Class 8 tractor-trailer aerodynamic simulation
– Minimizes drag associated with the trailer underside
– Compresses and accelerates incoming air flow and injects high-energy air into the trailer wake
• => UT-6 Trailer Under Tray System reduces tractor/trailer drag by 12%

Aerodynamic performance testing methods: Jaguar CFD analysis of truck and mirrors
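The diesel, CO2, and cost figures in the objectives above are mutually consistent. A rough check in Python, assuming about 22.4 lb of CO2 per gallon of diesel and a diesel price of roughly $2.80 per gallon (both are outside assumptions added for illustration, not values from the slide):

```python
# Rough consistency check of the projected national savings quoted above.
# The CO2 factor and fuel price are outside assumptions for illustration.

gallons_saved  = 6.8e9      # gallons of diesel per year (from the slide)
co2_lb_per_gal = 22.4       # assumed lb of CO2 per gallon of diesel
price_per_gal  = 2.80       # assumed $/gallon of diesel

co2_tons = gallons_saved * co2_lb_per_gal / 2000    # short tons
cost     = gallons_saved * price_per_gal

print(f"CO2 avoided:     ~{co2_tons / 1e6:.0f} million tons")  # ~76 million tons
print(f"Fuel cost saved: ~${cost / 1e9:.0f} billion")          # ~$19 billion
```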
24
Examples of OLCF Industrial Projects
• Developing new add-on parts to reduce drag and increase fuel efficiency of Class 8 (18-wheeler) long-haul trucks. This will reduce fuel consumption by up to 3,700 gallons per truck per year and reduce CO2 by up to 41 tons (82,000 lb) per truck per year. BMI is using NASA FUN3D, and the NASA team is assisting BMI with code refinement. (OLCF Director's Discretionary Award)
• Analyzing unsteady versus steady flows in low-pressure turbomachinery and their potential effects on more energy-efficient designs. (OLCF Director's Discretionary Award)
• Studying, at the nanoscale, catalysts that can selectively produce hydrogen from biomass (hydrogen to be used as energy for fuel cells). (OLCF Director's Discretionary Award)
• Developing a unique CO2 compression technology for significantly lower-cost carbon sequestration. (ALCC award)
• INCITE awards
25
10-Year Strategy: Moving to the Exascale
• The U.S. Department of Energy requires exaflops computing by 2018 to meet the needs of the science communities that depend on leadership computing
• Our vision: Provide a series of increasingly powerful computer systems and work with the user community to scale applications to each of the new computer systems
– OLCF-3 Project: New 10-20 petaflops computer based on early hybrid multi-core technology
[OLCF roadmap from the 10-year plan (2008-2019): today's systems (a 2 PF six-core system and a 1 PF system) in the ORNL Computational Sciences Building, the 10-20 PF OLCF-3 system, and future systems of 100 PF, 300 PF, and eventually 1 EF, hosted in the ORNL Multiprogram Research Facility and the planned ORNL Extreme Scale Computing Facility (140,000 ft2)]
26
OLCF-3 “Titan” System Description
• Similar number of cabinets, cabinet design, and cooling as Jaguar
• Operating system upgrade of today's Cray Linux Environment
• New Gemini interconnect
– 3-D torus
– Globally addressable memory
– Advanced synchronization features
• New accelerated node design using GPUs
• 20 PF peak performance
– 9x the performance of today's XT5
• 3x larger memory
• 3x larger and 4x faster file system
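Scaling today's XT5 figures from earlier slides by the multipliers listed above gives a rough sense of the OLCF-3 targets. A minimal sketch in Python (an illustration using only numbers quoted in this presentation, not announced system specifications):

```python
# Project rough OLCF-3 "Titan" targets by scaling today's XT5 figures
# (2.33 PF peak, 300 TB memory, 10 PB / 240 GB/s file system) by the
# multipliers on this slide.  Illustrative only.

xt5 = {"peak_pf": 2.33, "memory_tb": 300, "disk_pb": 10, "disk_gbs": 240}

olcf3 = {
    "peak_pf":   xt5["peak_pf"]   * 9,    # ~21 PF, consistent with "20 PF peak"
    "memory_tb": xt5["memory_tb"] * 3,    # ~900 TB
    "disk_pb":   xt5["disk_pb"]   * 3,    # ~30 PB
    "disk_gbs":  xt5["disk_gbs"]  * 4,    # ~960 GB/s
}

for key, value in olcf3.items():
    print(f"{key}: {value:g}")
```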