
Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Date post: 24-Mar-2020
Page 1: Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

PRESENTED BY

Andrew J. Younge

PIs: James H. Laros III, Kevin Pedretti, Si Hammond

Sandia National Laboratories

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

Page 2

Outline

• Overview of Vanguard Prototype HPC Architectures
• Astra – Petascale ARM platform
• ATSE – Advanced Tri-lab Software Environment
• R&D Opportunities
• Conclusion

Page 3

Vanguard Overview

Page 4

Vanguard Program: Advanced Architecture Prototype Systems

• Prove viability of advanced technologies for NNSA integrated codes, at scale
• Expand the HPC-ecosystem by developing emerging unproven technologies
  – Is it viable for future ATS/CTS platforms?
  – Increase technology AND integrator choices
• Buy down risk and increase technology and vendor choices for future platforms
  – Ability to accept higher risk allows for more/faster technology advancement
  – Lowers/eliminates mission risk and significantly reduces investment
• Jointly address hardware and software challenges
• First prototype platform targeting ARM

Page 5

Where Vanguard Fits

[Figure: spectrum running from Test Beds through Vanguard to ATS/CTS Platforms, with arrows labeled "Higher Risk, Greater Architectural Choices" and "Greater Stability, Larger Scale"]

Test Beds
• Small testbeds (~10-100 nodes)
• Breadth of architectures
• Brave users

Vanguard
• Larger-scale experimental systems
• Focused efforts to mature new technologies
• Broader user-base
• Demonstrate viability for production use
• NNSA Tri-lab resource

ATS/CTS Platforms
• Leadership-class systems (Petascale, Exascale, ...)
• Advanced technologies, sometimes first-of-kind
• Broad user-base
• Production use

Page 6

Vanguard Phase 1: Astra

Page 7

Sandia’s NNSA/ASC ARM Platforms

[Figure: timeline from Sept 2011 through "TODAY" to Future ASC Platforms; legend entries: 2015, 2017, 2018, Retired]

• Hammer: APM/HPE, Applied Micro X-Gene-1, 47 nodes
• Sullivan: Cavium/Penguin, Cavium ThunderX1, 32 nodes
• Mayer: Cavium, HPE/Comanche, pre-GA Cavium ThunderX2, 47 nodes
• Astra (Vanguard): Petascale ARM Platform, delivery Aug/Sep 2018; HPE Apollo 70, Cavium ThunderX2, Mellanox ConnectX-5, Switch-IB2, 2592 nodes

Page 8

Astra: “Per aspera ad astra” (through difficulties to the stars)

Demonstrate viability of ARM for U.S. DOE Supercomputing

• 2.3 PFLOPs peak
• 885 TB/s memory bandwidth peak
• 332 TB memory
• 1.2 MW
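The headline figures above can be cross-checked against the node specification given elsewhere in the deck (2592 nodes, dual-socket, 28 cores @ 2.0 GHz, eight DDR4-2666 channels per socket with one 8 GB DIMM each). The sketch below is a back-of-the-envelope check, not the authors' calculation. The value of 8 double-precision FLOPs per cycle per core is an assumption that does not appear in the slides, and the constant and function names are illustrative.

```python
# Back-of-the-envelope check of the headline Astra figures.
NODES = 2592
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 28
CLOCK_HZ = 2.0e9
FLOPS_PER_CYCLE = 8          # ASSUMPTION: 8 DP FLOPs/cycle/core, not stated in the slides
CHANNELS_PER_SOCKET = 8
DIMM_GB = 8                  # one 8 GB DIMM per channel
TRANSFERS_PER_S = 2666e6     # DDR4-2666
BYTES_PER_TRANSFER = 8       # 64-bit wide DDR4 channel


def peak_pflops():
    """Theoretical peak in PFLOPs (10^15 FLOP/s)."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    return cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e15


def memory_tb():
    """Total memory capacity in TB (decimal, 1 TB = 1000 GB)."""
    dimms = NODES * SOCKETS_PER_NODE * CHANNELS_PER_SOCKET
    return dimms * DIMM_GB / 1000


def memory_bandwidth_tbs():
    """Theoretical peak memory bandwidth in TB/s."""
    channels = NODES * SOCKETS_PER_NODE * CHANNELS_PER_SOCKET
    return channels * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e12


if __name__ == "__main__":
    print(f"peak:      {peak_pflops():.2f} PFLOPs")
    print(f"memory:    {memory_tb():.0f} TB")
    print(f"bandwidth: {memory_bandwidth_tbs():.0f} TB/s")
```

Under these assumptions the three results land on the quoted 2.3 PFLOPs, 332 TB, and 885 TB/s after rounding.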

Page 9

Vanguard-Astra Compute Node Building Block

HPE Apollo 70 Cavium TX2 Node
• Dual socket Cavium Thunder-X2 CN99xx, 28 cores @ 2.0 GHz
• 8 DDR4 controllers per socket
• One 8 GB DDR4-2666 dual-rank DIMM per controller
• Mellanox EDR InfiniBand ConnectX-5 VPI OCP
• Tri-Lab Operating System Stack based on RedHat 7.5+

Page 10

Vanguard-Astra Compute Node

[Figure: compute node block diagram]

• 2 × Cavium Thunder-X2, ARM v8.1, 28 cores @ 2.0 GHz each
• 8 DDR4 channels/socket, 1 DIMM/channel; each DIMM is 8 GB DDR4-2666 DR
• Each socket has its own PCIe Gen3 x8 link to the NIC
• Mellanox ConnectX-5 OCP Network Interface: 1 EDR link, 100 Gbps
• Management Ethernet: 1 Gbps
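One way to read the dual PCIe x8 attachment is as a bandwidth budget: a single Gen3 x8 link cannot carry a full 100 Gbps EDR stream, but the two links together can. The sketch below works this out from the standard PCIe Gen3 parameters (8 GT/s per lane, 128b/130b encoding). It ignores packet-level protocol overhead, so the numbers are upper bounds, and the function name is illustrative.

```python
def pcie_gen3_gbytes_per_s(lanes):
    """Upper bound on PCIe Gen3 bandwidth per direction, in GB/s.

    Accounts for 128b/130b line encoding only; TLP/DLLP protocol
    overhead is ignored, so real throughput is somewhat lower.
    """
    gigatransfers_per_s = 8.0      # Gen3 signalling rate per lane
    encoding = 128 / 130           # payload bits per bit on the wire
    return gigatransfers_per_s * encoding * lanes / 8  # bits -> bytes


# 100 Gbps EDR link from the slide, expressed in GB/s.
EDR_GBYTES_PER_S = 100 / 8

one_socket = pcie_gen3_gbytes_per_s(8)   # one x8 link
both_sockets = 2 * one_socket            # two x8 links, one per socket

if __name__ == "__main__":
    print(f"one x8 link:   {one_socket:.2f} GB/s")
    print(f"two x8 links:  {both_sockets:.2f} GB/s")
    print(f"EDR line rate: {EDR_GBYTES_PER_S:.2f} GB/s")
```

With these figures a single x8 link tops out below the EDR line rate, while the pair exceeds it, which is consistent with giving each socket its own path to the NIC.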

Page 11

Vanguard-Astra System Packaging

HPE Apollo 70 Chassis
• 4 nodes

HPE Apollo 70 Rack
• 18 chassis/rack
• 72 nodes/rack
• 3 IB switches/rack (one 36-port switch per 6 chassis)

Astra
• 36 compute racks (9 scalable units, each 4 racks)
• 2592 compute nodes (5184 TX2 processors)
• 3 IB spine switches (each 540-port)
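The packaging numbers on this slide follow from a handful of per-chassis and per-rack constants. The short sketch below recomputes them as a consistency check. All inputs come from the slides (the dual-socket count is from the compute node slide); the names are illustrative.

```python
# Packaging constants taken from the slides.
NODES_PER_CHASSIS = 4
CHASSIS_PER_RACK = 18
COMPUTE_RACKS = 36
RACKS_PER_SCALABLE_UNIT = 4
CHASSIS_PER_RACK_SWITCH = 6   # one 36-port switch per 6 chassis
SOCKETS_PER_NODE = 2          # dual-socket ThunderX2 node


def packaging_summary():
    """Derive the per-rack and system-wide counts from the constants."""
    nodes_per_rack = NODES_PER_CHASSIS * CHASSIS_PER_RACK
    total_nodes = nodes_per_rack * COMPUTE_RACKS
    return {
        "nodes_per_rack": nodes_per_rack,
        "switches_per_rack": CHASSIS_PER_RACK // CHASSIS_PER_RACK_SWITCH,
        "nodes_per_rack_switch": NODES_PER_CHASSIS * CHASSIS_PER_RACK_SWITCH,
        "scalable_units": COMPUTE_RACKS // RACKS_PER_SCALABLE_UNIT,
        "nodes_per_scalable_unit": nodes_per_rack * RACKS_PER_SCALABLE_UNIT,
        "total_nodes": total_nodes,
        "total_processors": total_nodes * SOCKETS_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in packaging_summary().items():
        print(f"{name}: {value}")
```

The derived 288 nodes per scalable unit matches the "one boot server per scalable unit (288 nodes)" figure on the infrastructure slide.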

Page 12

Vanguard-Astra Infrastructure

Login & Service Nodes
• 4 login/compilation nodes
• 3 Lustre routers to connect to external Sandia filesystem(s)
• 2 general service nodes

Interconnect
• EDR InfiniBand in fat tree topology
• 2:1 oversubscribed for compute nodes
• 1:1 full bandwidth for in-platform Lustre storage

System Management
• Dual HA management nodes running HPE Performance Software – Cluster Manager (HPCM)
• Ethernet management network, connects to all nodes
• One boot server per scalable unit (288 nodes)

In-platform Storage
• All-flash Lustre storage system
• 403 TB usable capacity
• 244 GB/s throughput

Page 13

Network Topology

[Figure: three-level fat tree. L1 switches 1.1 through 1.108, each serving 24 nodes, connect upward to 540-Port Switch #1, #2, and #3. The figure also labels L2 switches 2.1 through 2.90 and L3 switches 3.1 through 3.54 in three groups: 2.1–2.30 with 3.1–3.18, 2.31–2.60 with 3.19–3.36, and 2.61–2.90 with 3.37–3.54.]

• 108 L1 switches * 24 nodes/switch = 2592 compute nodes
• Mellanox Switch-IB2 EDR, Radix 36 switches, 3 level fat tree, 2:1 taper at L1
• Each L1 switch has 4 links to each 540-port switch
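The 2:1 taper and the port counts can be checked with a little arithmetic. The sketch below does so under one assumption that the slide does not state: each 540-port switch is taken to be built from 30 L2 and 18 L3 radix-36 switches, which is how the switch labels in the figure appear to be grouped. The names are illustrative.

```python
RADIX = 36               # ports per Switch-IB2 switch
L1_SWITCHES = 108
NODES_PER_L1 = 24
SPINE_SWITCHES = 3       # the three 540-port switches
LINKS_PER_SPINE = 4      # links from each L1 switch to each 540-port switch
L2_PER_SPINE = 30        # ASSUMPTION: read off the figure's switch labels
L3_PER_SPINE = 18        # ASSUMPTION: read off the figure's switch labels


def l1_port_budget():
    """Split of an L1 switch's ports between nodes and uplinks."""
    uplinks = SPINE_SWITCHES * LINKS_PER_SPINE
    return {
        "down": NODES_PER_L1,
        "up": uplinks,
        "used": NODES_PER_L1 + uplinks,
        "taper": NODES_PER_L1 / uplinks,
    }


def spine_ports():
    """External ports offered by one 540-port switch vs. ports the L1 layer needs."""
    # Half of each L2 switch's ports face outward, half face the L3 layer.
    external = L2_PER_SPINE * (RADIX // 2)
    needed = L1_SWITCHES * LINKS_PER_SPINE
    return {"external": external, "needed_by_l1": needed}


if __name__ == "__main__":
    print(l1_port_budget())
    print(spine_ports())
```

Under that assumption every L1 switch uses all 36 ports (24 down, 12 up), and the L1 layer needs fewer ports on each 540-port switch than the switch offers. What the remaining ports carry is not recoverable from the figure residue, so the sketch makes no claim about them.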

Page 14

Vanguard-Astra Advanced Power & Cooling

[Figure: cooling diagram; labels: 18.5C WB, 20.0C, 20.0C, 1.5C approach]

Projected power of the system by component

Per constituent rack type (W):

Rack type    Racks   Wall    Peak    Nominal (linpack)   Idle
Node racks   36      39888   35993   33805               6761
MCS300       12      10500   7400    7400                170
Network      3       12624   10023   9021                9021
Storage      2       11520   10000   10000               1000
Utility      1       8640    5625    4500                450

Total (kW):

Rack type    Wall     Peak     Nominal (linpack)   Idle
Node racks   1436.0   1295.8   1217.0              243.4
MCS300       126.0    88.8     88.8                2.0
Network      37.9     30.1     27.1                27.1
Storage      23.0     20.0     20.0                2.0
Utility      8.6      5.6      4.5                 0.5
Total        1631.5   1440.3   1357.3              274.9

Extreme Efficiency:
• Total 1.2 MW in the 36 compute racks are cooled by only 12 fan coils
• These coils are cooled without compressors year round. No evaporative water at all almost 6000 hours a year
• 99% of the compute racks heat never leaves the cabinet, yet the system doesn’t require the internal plumbing of liquid disconnects and cold plates running across all CPUs and DIMMs
• Builds on work by NREL and Sandia: https://www.nrel.gov/esif/partnerships-jc.html
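The "total (kW)" columns of the power table are the per-rack wattages multiplied by the rack counts. The sketch below recomputes the four column totals from the per-rack figures as a consistency check. Because the per-rack values in the slide are already rounded, small differences in the last digit are expected. The data structure and names are illustrative.

```python
COLUMNS = ("wall", "peak", "nominal", "idle")

# rack type -> (number of racks, per-rack watts in COLUMNS order), from the slide
RACK_POWER_W = {
    "Node racks": (36, (39888, 35993, 33805, 6761)),
    "MCS300":     (12, (10500, 7400, 7400, 170)),
    "Network":    (3,  (12624, 10023, 9021, 9021)),
    "Storage":    (2,  (11520, 10000, 10000, 1000)),
    "Utility":    (1,  (8640, 5625, 4500, 450)),
}


def totals_kw():
    """Sum racks * per-rack watts over all rack types, in kW per column."""
    totals = dict.fromkeys(COLUMNS, 0.0)
    for racks, watts in RACK_POWER_W.values():
        for column, per_rack_w in zip(COLUMNS, watts):
            totals[column] += racks * per_rack_w / 1000
    return totals


if __name__ == "__main__":
    for column, kw in totals_kw().items():
        print(f"{column:8s} {kw:8.1f} kW")
```

The recomputed totals agree with the slide's bottom row (1631.5, 1440.3, 1357.3, 274.9 kW) to within 0.1 kW.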

Page 15

ATSE – Advanced Tri-lab Software Environment

Page 16

Advanced Tri-lab Software Environment Goals

• Build an open, modular, extensible, community-inspired, and vendor-adaptable ecosystem
• Prototype new technologies that may improve the DOE ASC computing environment (e.g., ML frameworks, containers, VMs, etc.)
• Leverage existing efforts
  – Tri-lab OS (TOSS)
  – OpenHPC & other programming environments
  – Exascale Computing Project (ECP) software technologies

ATSE stack timeline
• Aug’17: Tri-lab Arm software team formed
• Dec’17: ATSE Design Doc
• Jul’18: Initial Release Target
• Sep’18: First Use on Vanguard-Astra

Page 17

Tri-Lab Software Effort for ARM

• Accelerate ARM ecosystem for ASC computing
  – Prove viability for ASC integrated codes running at scale
  – Harden compilers, math libraries, tools, communication libraries
  – Heavily templated C++, Fortran 2003/2008, Gigabyte+ binaries, long compiles
  – Optimize performance, verify expected results
• Build integrated software stack
  – Programming environment (compilers, math libs, tools, MPI, OMP, SHMEM, I/O, ...)
  – Low-level OS (optimized Linux, network, filesystems, containers/VMs, ...)
  – Job scheduling and management (WLM, app launcher, user tools, ...)
  – System management (boot, system monitoring, image management, ...)
• Improve 0 to 60 time... ARM system arrival to useful work done

Page 18

ARM Tri-lab Software Environment (ATSE)

[Figure: ATSE stack diagram, listed from hardware up to applications]

• Vanguard Hardware
• Base OS Layer: Vendor OS, TOSS, Open OS (e.g. OpenSUSE)
• Cluster Middleware (e.g. Lustre, SLURM)
• ATSE Programming Environment “Product” for Vanguard: platform-optimized builds, common-look-and-feel across platforms
• ATSE Packaging: Native Installs, Containers, Virtual Machines
• User-facing Programming Env
• NNSA/ASC Application Portfolio

Key: Open Source, Closed Source, Integrator Provided, Limited Distribution, ATSE Activity

Page 19

Integrate Components from Many Sources

[Figure: components from TOSS, RHEL, EPEL, OpenHPC, and vendor software, together with an Open Build Server and a Koji Build Server, are combined by the ATSE Packager into ATSE Packages]

ATSE Diagram (from SNL Feb 12 TOSS meeting)

[Figure: the ATSE stack diagram from the previous slide, repeated. Key: ATSE Activity, Closed Source, Integrator Provided, Limited Distribution, Open Source]

Page 20

Draft ATSE Timeline for 2018

March
• Continue software stack explorations and gap analysis on testbeds
• Set up OpenBuild server and replicate OHPC package builds for aarch64

April – May
• Develop ATSE Packager framework, ability to pull packages from TOSS, RHEL, OpenHPC OBS, vendor, and other sources
• Identify initial component list

July
• Initial ATSE release 2018.0 on Mayer
  – Lab-distribution version: TOSS BaseOS + (ATSE-GCC | ATSE-ARM | ATSE-*)
  – Open-distribution version: SUSE and/or CentOS BaseOS + ATSE-GCC

Q3 2018
• Linux kernel optimization and HPC patches
• Basic VM & container support

Q4 2018
• ATSE 2018.1 release
• Initial upstream to OpenHPC push

Page 21

Astra Status and Research

Page 22

Acceptance Plan – Maturing the Stack

Page 23

Cavium Arm64 Providing Best-of-Class Memory Bandwidth

[Figure: STREAM TRIAD bandwidth chart; series: TX2 DDR4-2400, SkyLake 8160, Trinity Haswell, Trinity KNL DDR]

Page 24

Network Bandwidth on ThunderX2 + Mellanox MLX5 EDR with Socket Direct

[Figure: Node 1 and Node 2, each with Socket 1 and Socket 2 attached to one MLX5 EDR NIC (devices MLX5_0 and MLX5_3), joined by 1 network link; Pair 1 and Pair 2 mark the communicating socket pairs]

• Socket Direct – Each socket has dedicated path to the NIC

[Figure: OSU MPI Multi-Network Bandwidth]

• Arm64 + EDR providing > 12 GB/s between nodes
• > 75M messages/sec

Page 25

Mini-App Performance on Cavium ThunderX2

• ThunderX2 providing high memory bandwidth
  – 6 channels (Skylake) vs. 8 in ThunderX2
  – See this in MiniFE SpMV and STREAM Triad
• Slower compute reflects less optimization in software stack
  – Examples – Non-SpMV kernels in MiniFE and LULESH
  – GCC and ARM versus Intel compiler

[Figure: bar chart of speedup over Haswell E5-2680v3 (y-axis 0 to 2.5) for MiniFE Solve GF/s, MiniFE SpMV GF/s, STREAM Triad, and LULESH Solve FOM; series: ThunderX2, Skylake 8160, Haswell E5-2680]
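The channel-count argument can be made concrete with the usual peak-bandwidth formula: channels × transfer rate × 8 bytes per transfer. The sketch below compares one 8-channel socket with one 6-channel socket. It deliberately assumes the same DIMM speed on both sides, which the slide does not state, so the ratio isolates the effect of channel count alone. The function name is illustrative.

```python
def peak_socket_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth of one socket in GB/s.

    bytes_per_transfer defaults to 8 for a 64-bit wide DDR4 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# ASSUMPTION: both sockets populated with DDR4-2666, so only channel count differs.
thunderx2 = peak_socket_bandwidth_gbs(channels=8, mega_transfers_per_s=2666)
skylake = peak_socket_bandwidth_gbs(channels=6, mega_transfers_per_s=2666)
ratio = thunderx2 / skylake

if __name__ == "__main__":
    print(f"8 channels: {thunderx2:.1f} GB/s per socket")
    print(f"6 channels: {skylake:.1f} GB/s per socket")
    print(f"ratio:      {ratio:.2f}x")
```

At equal DIMM speed the 8-channel socket has one third more theoretical bandwidth. Measured STREAM Triad results depend on the actual DIMM speed and on achieved efficiency, so this ratio is only a rough expectation for the bandwidth-bound bars in the chart.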

Page 26

R&D Areas

• Leverage containers and virtual machines
  – Support for machine learning frameworks
  – ARMv8.1 includes new virtualization extensions, SR-IOV
  – Working with Singularity on full container solution
• Evaluating parallel filesystems + I/O systems @ scale
  – GlusterFS, Ceph, BeeGFS, Sandia Data Warehouse, …
• Resilience studies over Astra lifetime
• Improved MPI thread support, matching acceleration
• OS optimizations for HPC @ scale
  – Exploring HPC-tuned Linux kernels to non-Linux lightweight kernels and multi-kernels
  – Arm-specific optimizations

Page 27

Conclusion

• Vanguard allows the DOE to take necessary risks to ensure a healthy HPC ecosystem for future production mission platforms
  – Increase technology choices
  – Prove ability to run multi-physics production applications at scale
• Tri-lab software stack effort to mature ARM for ASC computing
  – Harden compilers, math libs, and tools
  – Optimize performance, verify expected results
  – Increase modularity and openness of software stack
  – Support traditional HPC and emerging AI + ML workloads

Sandia now a member of Linaro HPC SIG!

Page 28

Questions?

[email protected]

Page 29

Virt - From x86 to ARM64

▪ What opportunities and challenges do we face when moving from an x86 world to an ARM world?
▪ Virtual Machines
  ▪ Near-native HPC performance with VMs possible in x86
    – Type1 & Type2 hypervisors
    – Hobbes / Palacios VM
  ▪ Avoid legacy issues with x86?
  ▪ Anything new we can do with ARM?
▪ Containers
  ▪ How do I build containers on my x86 laptop that run on Astra?
  ▪ Focus on ABI compatibility
▪ Leverage industry/enterprise without losing HPC focus
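The container question above comes down to an architecture mismatch: an image built natively on an x86 laptop holds x86-64 binaries, while Astra's nodes are aarch64. The sketch below is only an illustration of that check and not part of ATSE. It maps the machine name reported by the operating system to the architecture names commonly used in container image metadata (`amd64`, `arm64`) and reports whether an image could run natively. The helper names are hypothetical.

```python
import platform

# Machine names as reported by the OS, mapped to the architecture
# names commonly used in container image metadata.
IMAGE_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def image_arch_for(machine):
    """Return the image architecture name for an OS machine string.

    Raises KeyError for machine types this sketch does not know about.
    """
    return IMAGE_ARCH[machine.lower()]


def runs_natively(image_arch, host_machine=None):
    """True if an image built for image_arch matches the host CPU architecture.

    host_machine defaults to the machine this script runs on.
    """
    host = host_machine if host_machine is not None else platform.machine()
    return image_arch_for(host) == image_arch


if __name__ == "__main__":
    # An arm64 image matches an aarch64 compute node but not an x86_64 laptop.
    print(runs_natively("arm64", host_machine="aarch64"))
    print(runs_natively("arm64", host_machine="x86_64"))
```

When the check fails, the image has to be cross-built for arm64 or built under emulation. Matching the CPU architecture is only the first condition: the ABI compatibility concern raised on the slide (for example, compatibility between libraries inside the image and those on the host) still applies once the architectures match.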

