Date post: 16-Jun-2020
Srinath Vadlamani, Field Application Engineer SEA, April 8, 2019 Arm HPC Ecosystem Hardware, Software and tools
Transcript
Page 1: sVadlamani Arm HPC - SEA€¦ · Arm HPC Ecosystem –Overview Silicon Suppliers: Marvell, Fujitsu, Mellanox Linux OS Distro of choice: RHEL, SUSE, CENTOS,… Arm Server Ready Platform:

Srinath Vadlamani, Field Application Engineer
SEA, April 8, 2019

Arm HPC Ecosystem

Hardware, Software and tools


2 © 2019 Arm Limited

Arm Technology Already Connects the World

Arm is ubiquitous

We design IP, not manufacture chips

Partners build products for their target markets

One size is not always the best fit for all

HPC is a great fit for co-design and collaboration

Partnership is key. Choice is good.

21 billion chips sold by partners in 2017 alone

Mobile/Embedded/IoT/Automotive/Server/GPUs


Armv8-A Architecture Evolution

RISC architecture
• Only 32 bits are available for encoding each instruction
• Supports the development of efficient implementations

64-bit capable since 2012
• Known as AArch64 (or AArch32 when run in a 32-bit mode)
• 128-bit vector unit (aka NEON Advanced SIMD)

Features across architecture revisions:
• AArch64 execution state
• A64 instruction set
• Atomic memory ops
• Type2 hypervisor support
• Half-precision float
• RAS support
• Statistical profiling
• Pointer authentication
• Nested virtualization
• Complex float
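
The fixed 32-bit instruction width noted on this slide can be illustrated by pulling the fields out of a single A64 instruction word. This is a minimal sketch, not part of the original deck: the helper names are hypothetical, and it only handles the add/subtract (shifted register) encoding group, using the word for `ADD X0, X1, X2` as the example.

```python
# Decode the fields of one fixed-width 32-bit A64 instruction word.
# Helper names are illustrative only; they are not from the slides.

def bits(word: int, hi: int, lo: int) -> int:
    """Return the inclusive bit field word[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_add_sub_shifted_register(word: int) -> dict:
    """Split an add/subtract (shifted register) instruction into its fields."""
    if not 0 <= word < 2**32:
        raise ValueError("A64 instructions are exactly 32 bits wide")
    # Bits 28:24 == 0b01011 with bit 21 == 0 select this encoding group.
    if bits(word, 28, 24) != 0b01011 or bits(word, 21, 21) != 0:
        raise ValueError("not an add/subtract (shifted register) encoding")
    return {
        "sf": bits(word, 31, 31),     # 1 = 64-bit registers, 0 = 32-bit
        "op": bits(word, 30, 30),     # 0 = ADD, 1 = SUB
        "S": bits(word, 29, 29),      # 1 = also set condition flags
        "shift": bits(word, 23, 22),  # shift type applied to Rm
        "rm": bits(word, 20, 16),     # second source register
        "imm6": bits(word, 15, 10),   # shift amount
        "rn": bits(word, 9, 5),       # first source register
        "rd": bits(word, 4, 0),       # destination register
    }


# 0x8B020020 is the encoding of ADD X0, X1, X2.
fields = decode_add_sub_shifted_register(0x8B020020)
print(fields)
```

Every operand, the opcode and the operand size have to share the same 32 bits, which is why register numbers are limited to 5-bit fields.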


Arm business model

Arm develops technology that is licensed to semiconductor companies. Arm receives an upfront license fee and a royalty on every chip that contains its technology.

Diagram flow:
• Business development: Arm licenses technology to a semiconductor company (the Partner) and receives a license fee
• The Partner develops chips and pays Arm a per-chip royalty
• The OEM customer sells consumer products


CPU Engagement Models With Arm

Architecture License (proprietary CPU)

Partner designs complete CPU microarchitecture from scratch
• Clean room – no reference to Arm core designs

Freedom to develop any design
• Must conform to the rules & programmers model of a given architecture variant
• Must pass Arm architecture validation to preserve software compatibility

Long term strategic investment

Core License (standard CPU)

Partner licenses complete microarchitecture design
• Wide choices available
• Many different A, R & M products

CPU differentiation through:
• Flexible configuration options
• Wide implementation envelope with different process technologies

Range of licensing & engagement models possible


HPC on Arm – What’s new in 2018/19

Powerful hardware for now and future
• Marvell ThunderX2 now GA
• Fujitsu announced details of A64FX (with SVE) for Post-K
• Arm announces Neoverse brand for infrastructure and core IP roadmap (Ares, Zeus, Poseidon), with each generation delivering a 30% performance boost. N1 platform details announced.

Mature toolchains and ISV Software
• Three mature toolchains available – Arm Commercial, GNU and Cray CCE
• ISVs start porting to Arm – Altair RADIOSS, ANSYS Fluent and LS-DYNA

Deployments
• New deployments across the EU and USA
• USA – Sandia Astra (Top 500), Comanche Clusters
• EU – Catalyst and Isambard in UK, GENCI and Dibona (MontBlanc 3) in France


Arm Hardware for Infrastructure

(including HPC)


AWS Graviton by Amazon



Huawei unveils KunPeng 920 CPU and TaiShan Servers
Industry’s Highest Performance 2.6GHz 64-cores 7nm based ARM v8 Server SoC & Servers

• TaiShan 5280/5290: Storage Server
• TaiShan 2280: Balanced Server
• TaiShan X6000: High-Density Server

Big Data, Distributed Storage and Arm-Native applications

“Use ARM-based CPU in areas like cloud and servers where they are better.” – William XU, Chief Strategy Marketing Officer, Huawei


The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent Devices


Broad SoC system design options within Arm Ecosystem

• Arm Architectural design: Custom Arm High performance CPU, Custom Fabric & IP
• Arm IP: High performance CPUs, Data plane CPUs, CMN Fabric, Other IP
• Accelerators: ML, on-die FPGA, Networking, security, encryption, Video, Custom
• Memory: DDR, HBM, Flash, Storage Class memory
• IO: PCIe, CCIX, 100G+ ethernet
• Foundry: TSMC 7FF, Samsung 7LPP, UMC

Common Software Platform and Ecosystem
Arm Architecture v8.x-A


Arm IP : Commitment to Infrastructure segment

• Today: Cosmos Platform, 16nm (A72, A75)
• 2019: Ares (N1) Platform, 7nm
• 2020: Zeus Platform, 7nm+
• 2021: Poseidon Platform, 5nm

~30% per Gen Faster Performance & New Features
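
The "~30% per generation" figure compounds across the roadmap. The sketch below is an illustration, not from the deck: it treats the approximate 30% as an exact, uniform gain (an assumption) and shows the cumulative factor relative to the Cosmos platform.

```python
# Compound the per-generation uplift quoted on the roadmap slide.
# Assumption: the "~30%" figure is applied as an exact 1.30x factor per step.

GENERATIONS = ["Cosmos", "Ares (N1)", "Zeus", "Poseidon"]
GAIN_PER_GENERATION = 0.30


def cumulative_speedup(steps: int, gain: float = GAIN_PER_GENERATION) -> float:
    """Speedup after `steps` generations of uniform relative gain."""
    return (1.0 + gain) ** steps


for steps, name in enumerate(GENERATIONS):
    factor = cumulative_speedup(steps)
    print(f"{name}: {factor:.2f}x relative to {GENERATIONS[0]}")
```

Under that assumption, three generational steps give roughly a 2.2x cumulative gain rather than the 1.9x a simple sum of percentages would suggest.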


Revolutionary compute performance

Platform features specific to infrastructure

Extreme range of scale and diversity of compute

Accelerating the transformation to a scalable cloud to edge infrastructure
Neoverse N1 platform


Neoverse N1 platform: Revolutionary compute performance

Improved cloud to edge TCO through revolutionary workload performance
• 2.5x MemcacheD
• 2.5x NGINX
• 1.7x Java*

Data shown for Neoverse N1 has been collected/projected from an array of platforms, and is relative to Cortex-A72 “Cosmos”.
*Based on an industry standard Java-based benchmark


Arm Hardware for HPC


Arm Architecture Partner SoC for HPC
Available or Announced in 2018-19


HPC Software Ecosystem


Arm HPC Ecosystem – Overview

• Silicon Suppliers: Marvell, Fujitsu, Mellanox
• OEM/ODM’s: Cray, HPE, ATOS-Bull, Fujitsu, Gigabyte, Inventec, Foxconn
• Arm Server Ready Platform: Standard OS compatible FW and RAS features
• Linux OS Distro of choice: RHEL, SUSE, CentOS, …
• Cluster Management Tools: Bright, HPE CMU, xCat, Warewulf
• Job schedulers and Resource Management: SLURM, IBM LSF, Altair PBS Pro, etc.
• Filesystems: BeeGFS, LUSTRE, ZFS, HDFS, GPFS
• User-space utilities, scripting, container, and other packages: Singularity, OpenStack, OpenHPC, Python, NumPy, SciPy, etc.
• HPC Programming Languages: Fortran, C, C++ via GNU, LLVM, Arm & OEMs
• App/ISA specific optimizations, optimized libs and intrinsics: Arm PL, BLAS, FFTW, etc.
• Debug and performance analysis tools: Arm Forge, Rogue Wave, TAU, etc.
• Parallelism standards: OpenMP (omp / gomp), MPI, SHMEM (see below)
• Communication Stacks and run-times: Mellanox IB/OFED/HPC-X, OpenMPI, MPICH, MVAPICH2, OpenSHMEM, OpenUCX, HPE MPI
• HPC Applications: Open-source, Owned, and Commercial ISV codes


Common HPC applications now available

Build recipes online at https://gitlab.com/arm-hpc/packages/wikis/home

Applications: LAMMPS, CESM2, MrBayes, Bowtie, AMBER, Paraview, SIESTA, UM, NAMD, VASP, MILC, WRF, GEANT4, Quantum ESPRESSO, DL-Poly, NEMO, GAMESS, OpenFOAM, VisIT, QMCPACK, Abinit, BLAST, NWCHEM, BWA, GROMACS

Categories: Chem/Phys, Weather, CFD, Visualization, Genomics


ISV codes on Arm

Status legend: Porting underway, Available


OpenHPC: Typical HPC packages available for Arm

OpenHPC is a community effort to provide a common, verified set of open source packages for HPC deployments

Arm and partners actively involved:
• Arm is a silver member of OpenHPC
• Linaro is on Technical Steering Committee
• Arm-based machines in the OpenHPC build infrastructure

Status: 1.3.6 release out now
• Packages built on Armv8-A for CentOS and SUSE

Functional Areas: Components include
• Base OS: CentOS 7.5, SLES 12 SP3
• Administrative Tools: Conman, Ganglia, Lmod, LosF, Nagios, pdsh, pdsh-mod-slurm, prun, EasyBuild, ClusterShell, mrsh, Genders, Shine, test-suite
• Provisioning: Warewulf
• Resource Mgmt.: SLURM, Munge
• I/O Services: Lustre client (community version)
• Numerical/Scientific Libraries: Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, SuperLU_Dist, Mumps, OpenBLAS, Scalapack, SLEPc, PLASMA, ptScotch
• I/O Libraries: HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios
• Compiler Families: GNU (gcc, g++, gfortran), LLVM
• MPI Families: OpenMPI, MPICH
• Development Tools: Autotools (autoconf, automake, libtool), CMake, Valgrind, R, SciPy/NumPy, hwloc
• Performance Tools: PAPI, IMB, pdtoolkit, TAU, Scalasca, Score-P, SIONLib


Arm HPC Ecosystem website: www.arm.com/hpc

Starting point for developers and end-users of Arm for HPC

Latest events, news, blogs, and collateral including whitepapers, webinars, and presentations

Links to HPC open-source & commercial SW packages

Guides for porting HPC applications

Quick-start guides to Arm tools

Links to community collaboration sites

Curated and moderated by Arm


Arm HPC Community: community.arm.com/tools/hpc/
HPC Community-driven Content

Blogs by Arm and our HPC community

Calendar of upcoming events such as workshops and webinars

HPC Forum with questions & posts curated and moderated by Arm HPC technical specialists

Ask, answer, share progress and expertise


Arm HPC Packages wiki
www.gitlab.com/arm-hpc/packages/wikis

• Dynamic list of common HPC packages
• Status and porting recipes
• Community driven
• Anyone can join and contribute
• Provides focus for porting progress
• Allows developers to share and learn


Open source libraries for helping increase performance

Arm Optimized Routines
https://github.com/ARM-software/optimized-routines
These routines provide high performing versions of many math.h functions
• Algorithmically better performance than standard library calls
• No loss of accuracy

SLEEF library
https://github.com/shibatch/sleef/
Vectorized math.h functions
• Provided as an option for use in Arm Compiler

Perf-libs-tools
https://github.com/ARM-software/perf-libs-tools
Understanding an application’s needs for BLAS, LAPACK and FFT calls
• Used in conjunction with Arm Performance Libraries, can generate logging info to help profile applications for specific case breakdowns

Example visualization: DGEMM cases called


Arm HPC deployments


Deployments


Arm Supercomputer Makes Top500 List!

“Astra, the world’s fastest Arm-based supercomputer according to the TOP500 list, has achieved a speed of 1.529 petaflops, placing it 203rd on a ranking of top computers …”


Vanguard Astra at Sandia
MOST POWERFUL ARM SUPERCOMPUTER, IN TOP 500 (#203 in HPL and #36 in HPCG)

• 2,592 HPE Apollo 70 compute nodes
  • 5,184 CPUs, 145,152 cores, 2.3 PFLOPs (peak)
• Marvell ThunderX2 Arm SoC, 28 core, 2.0 GHz
• Memory per node: 128 GB (16 x 8 GB DR DIMMs)
  • Aggregate capacity: 332 TB, 885 TB/s (peak)
• Mellanox IB EDR, ConnectX-5
  • 112 36-port edges, 3 648-port spine switches
• Red Hat RHEL for Arm
• HPE Apollo 4520 All-flash Lustre storage
  • Storage Capacity: 403 TB (usable)
  • Storage Bandwidth: 244 GB/s
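
The aggregate Astra figures follow from the per-node numbers on this slide. The sketch below is a back-of-envelope check, not an official derivation. Three inputs are assumptions that do not appear in the deck: 8 double-precision FLOPs per cycle per core, 8 memory channels per socket, and DDR4-2666 memory. With those assumptions the totals line up with the quoted 2.3 PFLOPs, 332 TB and 885 TB/s.

```python
# Back-of-envelope check of the Astra totals quoted on the slide.

# Figures taken from the slide.
NODES = 2592
SOCKETS_PER_NODE = 2          # 5,184 CPUs / 2,592 nodes
CORES_PER_SOCKET = 28
CLOCK_HZ = 2.0e9
MEM_PER_NODE_GB = 128

# Assumptions (NOT stated in the deck).
DP_FLOPS_PER_CYCLE = 8        # two 128-bit FMA pipes x 2 doubles x 2 ops
DDR_CHANNELS_PER_SOCKET = 8
DDR_TRANSFERS_PER_S = 2666e6  # DDR4-2666
BYTES_PER_TRANSFER = 8        # 64-bit channel

sockets = NODES * SOCKETS_PER_NODE
cores = sockets * CORES_PER_SOCKET
peak_pflops = cores * CLOCK_HZ * DP_FLOPS_PER_CYCLE / 1e15
memory_tb = NODES * MEM_PER_NODE_GB / 1000
bandwidth_tbs = (sockets * DDR_CHANNELS_PER_SOCKET
                 * DDR_TRANSFERS_PER_S * BYTES_PER_TRANSFER) / 1e12

print(f"sockets:          {sockets}")
print(f"cores:            {cores}")
print(f"peak:             {peak_pflops:.2f} PFLOPs")
print(f"memory capacity:  {memory_tb:.0f} TB")
print(f"memory bandwidth: {bandwidth_tbs:.0f} TB/s")
```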


Isambard in Production at Bristol/GW4
Largest EU Arm HPC cluster to date

• Cray XC50 system w/ 168 nodes with Marvell ThunderX2 (32C)
• 10,752 total cores
• High-speed ARIES interconnect
• Cray HPC SW Stack including CCE, CrayPAT, Cray MPI, libs, ...
• Production deployment reached @ SC18


Deployments: Catalyst UK

• HPE, in conjunction with Arm and SUSE, announced in April the “Catalyst UK” program: deployments to accelerate the growth of the Arm HPC ecosystem into three universities

• Each machine will have:
  • 64 HPE Apollo 70 systems, each with two 32-core Cavium ThunderX2 processors (i.e. 4,096 cores per machine), 128GB of memory and Mellanox InfiniBand interconnects
  • SUSE Linux Enterprise Server for HPC

Site focus areas:
• Bristol: VASP, CASTEP, Gromacs, CP2K, Unified Model, Hydra, NAMD, Oasis, NEMO, OpenIFS, CASINO, LAMMPS
• EPCC: WRF, OpenFOAM, Rolls Royce Hydra opt, 2 PhD candidates
• Leicester: Data-intensive apps, genomics, MOAB Torque, DiRAC collab


Deployment: Mont Blanc


Deployments: HPE’s Comanche Collaboration
Early access to Cavium ThunderX2 systems that became Apollo 70

Engagements in HPE Comanche program have accelerated adoption
• We have been able to assess the state of fundamental software stacks, such as MPI and NUMA capabilities
• Collaborative work here especially great with all partners focusing on interoperability issues
• Examples include fixing bugs with kernels, MPI drivers and OpenMP thread placement
• Optimization of packages, environment and execution configurations

Over 1,000 processors delivered | LLNL TOSS stack ported and demoed | InfiniBand optimized


Performance results on Arm


Single node results from GENCI - France


Isambard, UK – Single node results


Isambard, UK – Multi-node results
Gromacs (42M atoms) on Horizon (Intel Skylake, 20C) vs Isambard (Marvell ThunderX2, 32C)


Commercial Tools for HPC by Arm


Our Solution for any Architecture, at any Scale
Commercial tools for AArch64, x86_64, ppc64 and accelerators

All-inclusive development toolkit for Arm hardware
• Arm Performance Libraries: BLAS, LAPACK and FFT
• Arm HPC Compiler: Linux user space compiler for HPC applications
• Arm Performance Reports: Interoperable application performance insight
• Arm Forge Professional: Multi-node interoperable profiler and debugger

Arm Cross-Platform Tools: Debug, optimise and analyse any platform
• Arm MAP: Speed-up applications with a lightweight scalable profiler
• Arm DDT: Slash your time to debug on any hardware, at any scale.
• Arm Performance Reports: Find the most efficient settings for your workloads.
• Arm Forge Professional: Arm DDT and MAP in One Single Package


Arm Allinea Studio
Built for developers to achieve best performance on Arm with minimal effort

• Comprehensive and integrated tool suite for Scientific computing, HPC and Enterprise developers
• Seamless end-to-end workflow from getting started to advanced optimization of your workloads
• Commercially supported by Arm engineers
• Frequent releases with continuous performance improvements
• Ready for current and future generations of server-class Arm-based platforms
• Available for a wide variety of Arm-based server-class platforms


Meets the requirements of HPC developers on Arm

Workflow stages: Develop and build, Debug, Profile, Optimize

• Arm Linux Compiler: For C, C++ and Fortran codes
• Arm Performance Libraries: BLAS, LAPACK, FFT; scalar and vector math functions
• Arm DDT: Cross-platform parallel debugger
• Arm MAP: Cross-platform lightweight profiler
• Arm Performance Reports: Maximize System Efficiency


Arm Allinea Studio
A quick glance at what is in Arm Allinea Studio

C/C++ Compiler
• C++ 14 support
• OpenMP 4.5 without offloading
• SVE ready

Fortran Compiler
• Fortran 2003 support
• Partial Fortran 2008 support
• OpenMP 3.1
• SVE ready

Performance Libraries
• Optimized math libraries
• BLAS, LAPACK and FFT
• Threaded parallelism with OpenMP
• Scalar math routines

Forge (DDT and MAP)
• Profile, Tune and Debug
• Scalable debugging with DDT
• Parallel Profiling with MAP

Performance Reports
• Analyze your application
• Memory, MPI, Threads, I/O, CPU metrics

Tuned by Arm for a wide range of server-class Arm-based platforms


Progress in the last year

A fully integrated tools suite for deployment on Arm systems

Arm C/C++ Compiler
• Porting and tuning guides for common applications
• Optimizations and bug fixes

Arm Fortran Compiler
• New Fortran Directives
• Improved Fortran 2008 support
• Support for vectorization of loops with math calls

Arm Perf Libraries
• BLAS, FFT and LAPACK Improvements
• Sparse routine SPMV support
• Scalar math routines

Forge and Perf Reports
• General cross-platform improvements
• Python profiling
• Better interop with Arm Compiler and Libraries

GNU8 toolchain
• GCC and Gfortran
• 2nd toolchain in the studio
• Better suited for certain applications
• Beta support for HPC users

Support and tuning for Arm server-class platforms


Commercial C/C++/Fortran compiler with best-in-class performance

Tuned for Scientific Computing, HPC and Enterprise workloads
• Processor-specific optimizations for various server-class Arm-based platforms
• Optimal shared-memory parallelism using latest Arm-optimized OpenMP runtime

Linux user-space compiler with latest features
• C++ 14 and Fortran 2003 language support with OpenMP 4.5
• Support for Armv8-A and SVE architecture extension
• Based on LLVM and Flang, leading open-source compiler projects

Commercially supported by Arm
• Available for a wide range of Arm-based platforms running leading Linux distributions – RedHat, SUSE and Ubuntu

Compilers tuned for Scientific Computing and HPC

Latest features and performance optimizations

Commercially supported by Arm


Arm Compiler – Building on LLVM, Clang and Flang projects

Arm C/C++/Fortran Compiler pipeline:
• Language specific frontend
  • C/C++ Files (.c/.cpp) -> C/C++ Frontend (Clang based) -> LLVM IR
  • Fortran Files (.f/.f90) -> Fortran Frontend (Flang based) -> LLVM IR
• Language agnostic optimization
  • Optimizer (LLVM based): IR Optimizations, Auto-vectorization
  • Enhanced optimization for Armv8-A and SVE
• Architecture specific backend
  • Armv8-A code-gen (LLVM based) -> Armv8-A binary
  • SVE code-gen (LLVM based) -> SVE binary


Arm Linux Compiler – What’s new in 2018/19?

Overall - Better code generation
• For Arm platforms for current generation (Marvell ThunderX2) and future (SVE based)
• Base compiler technology upgrade (Clang/LLVM 7, GNU8, Latest Flang)
• Vectorization of loops with math function calls

Fortran – Increase in maturity
• Enable key Fortran applications (open source, in house and commercial)
• Improved auto vectorization
• Fortran vectorization directives like IVDEP


Optimized BLAS, LAPACK and FFT

Commercial 64-bit Armv8-A math libraries
• Commonly used low-level math routines - BLAS, LAPACK and FFT
• Provides FFTW compatible interface for FFT routines
• Batched BLAS support

Best-in-class serial and parallel performance
• Generic Armv8-A optimizations by Arm
• Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon vendors

Validated and supported by Arm
• Available for a wide range of server-class Arm-based platforms
• Validated with NAG’s test suite, a de-facto standard

Best in class performance

Validated with NAG test suite

Commercially supported by Arm


Arm Performance Libraries progress
Progress and additions since SC17

Key improvements since 18.0
• Massive improvements in FFT performance
• All basic, advanced and guru interface FFTW calls now supported
• Many functions have had extra serial and parallel performance improvements targeting ThunderX2
• Addition of libamath
  • High performing implementations of certain key math.h functions

New features in 19.0
• Sparse linear algebra for higher performing SpMV calls
• FFTW MPI interface for FFT calls added
• Parallelisation of many FFTW plans
• Parallel scaling improvements, especially for ThunderX2
  • Particular focus on GEMMs and POTRF, GETRF and GETQR


Compiler and Libraries - Future roadmap
Focus on current and next generation hardware

More features in compilers and libraries
• Libraries: Vector Math routines and more scalar math routines
• Fortran Compiler: Directives & new Fortran 2008/OpenMP features
• All compilers: Vectorization and optimization report improvements

More optimizations for current hardware
• Application specific tuning and optimization
• For Marvell ThunderX2 and other server-class Arm-based platforms

Getting ready for SVE-based future hardware
• SVE enabled Performance Libraries
• Application specific tuning and optimization in Compilers and Libraries for SVE


Toolchain performance results


Arm Compiler and Libraries – 19.1 release
Progress and additions since SC18 (19.0 release)

Arm C/C++/Fortran Compilers
• Fortran: TRAILZ intrinsic, a Fortran 2008 feature, now supported
• Fortran: Runtime I/O performance improvement when handling formatted text data
• Fortran: New UNROLL directive to provide unrolling hints to the compiler
• Bug fixes

Arm Perf Libraries
• BLAS - Improved GEMV and GEMM (SCZ variants)
• FFTW Fortran MPI interface now supported
• FFT MPI parallel scaling has been improved.
• SpMV - Support for CSC and COO formats; Improved single-precision performance; Fortran Interface now supported.
• Math routines (in libamath) – Vector routines support with optimized logf and expf; Arm Compiler uses libamath by default; A GNU compatible version provided
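
The CSC and COO sparse formats named in the SpMV bullet store the same matrix in different layouts. The sketch below is a plain-Python illustration of sparse matrix-vector multiply over both layouts. It is not the Arm Performance Libraries interface: the function names and the 3x3 example matrix are hypothetical.

```python
# Sparse matrix-vector product y = A @ x for two storage formats.
# Pure-Python illustration; function names are hypothetical.

def spmv_coo(n_rows, rows, cols, vals, x):
    """COO: one (row, col, value) triple per stored non-zero."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def spmv_csc(n_rows, col_ptr, row_idx, vals, x):
    """CSC: non-zeros grouped by column; col_ptr[j]..col_ptr[j+1] spans column j."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += vals[k] * x[j]
    return y


# Example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
coo_rows = [0, 0, 1, 2, 2]
coo_cols = [0, 2, 1, 0, 2]
coo_vals = [4.0, 1.0, 3.0, 2.0, 5.0]

csc_col_ptr = [0, 2, 3, 5]
csc_row_idx = [0, 2, 1, 0, 2]
csc_vals = [4.0, 2.0, 3.0, 1.0, 5.0]

x = [1.0, 2.0, 3.0]
y_coo = spmv_coo(3, coo_rows, coo_cols, coo_vals, x)
y_csc = spmv_csc(3, csc_col_ptr, csc_row_idx, csc_vals, x)
print(y_coo, y_csc)
```

COO is the simplest format to assemble, while CSC replaces the per-entry column index with one pointer per column, which compacts storage when columns hold several non-zeros.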


BLAS improvements to many GEMM routines in 19.1
Shown below: CGEMM on Marvell ThunderX2 run using 56 threads

[Figure: CGEMM on 56 ThunderX2 threads. Performance (GFLOPs, 0 to 1600) versus matrix size M=N=K (0 to 5000). Series: 19.0, 19.1.]
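
GFLOPs figures such as those on the CGEMM plot are normally derived from an operation count divided by the measured time. The sketch below uses the common convention of 2·M·N·K operations for a real GEMM and 8·M·N·K for a complex one, since each complex multiply-add costs 8 real operations. The elapsed time is a made-up placeholder, not a measurement from the deck.

```python
# Convert a GEMM timing into a GFLOP/s rate.
# Convention: 2*M*N*K flops for real GEMM, 8*M*N*K for complex GEMM.

def gemm_flops(m: int, n: int, k: int, complex_valued: bool = False) -> int:
    """Nominal floating-point operation count of C = A @ B."""
    flops_per_multiply_add = 8 if complex_valued else 2
    return flops_per_multiply_add * m * n * k


def gflops_rate(flops: int, seconds: float) -> float:
    """Achieved rate in GFLOP/s."""
    return flops / seconds / 1e9


size = 4000                # M = N = K
elapsed_seconds = 0.4      # HYPOTHETICAL timing, for illustration only
work = gemm_flops(size, size, size, complex_valued=True)
rate = gflops_rate(work, elapsed_seconds)
print(f"CGEMM {size}^3: {work:.3e} flops, {rate:.0f} GFLOP/s")
```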


BLAS improvements to GEMV routines in 19.1
All cases improved for both serial and parallel. Comparison shown on ThunderX2 for serial SGEMV and DGEMV against OpenBLAS

[Figure: SGEMV on ThunderX2. Performance (MFLOPs, 0 to 16000) versus matrix size M=N (0 to 10000). Series: OpenBLAS - N, OpenBLAS - T, Arm PL 19.1 - N, Arm PL 19.1 - T.]

[Figure: DGEMV on ThunderX2. Performance (MFLOPs, 0 to 7000) versus matrix size M=N (0 to 10000). Series: OpenBLAS - N, OpenBLAS - T, Arm PL 19.1 - N, Arm PL 19.1 - T.]


FFT MPI performance in 19.1

Scaling using FFTW MPI interface improved; now similar scaling to FFTW

[Figure: FFT MPI performance on ThunderX2, 3-d case 1024x1024x1024. Solution time (s, log axis 1 to 1000) against number of MPI processes (1 to 64); series: ArmPL 19.1, FFTW 3.3.8, perfect scaling.]

[Figure: FFT MPI performance on ThunderX2, 3-d case 512x512x512. Solution time (s, log axis 0.1 to 100) against number of MPI processes (1 to 64); series: ArmPL 19.1, FFTW 3.3.8, perfect scaling.]
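The "perfect scaling" reference line in these charts corresponds to the single-process time divided by the process count. A small sketch of that reference, plus the usual parallel-efficiency ratio, follows; the helper names and the 64-second baseline are illustrative assumptions, not values read from the charts.

```python
def ideal_times(t1, procs):
    """Perfect strong scaling: time on p processes is t1 / p."""
    return [t1 / p for p in procs]


def parallel_efficiency(t1, tp, p):
    """Fraction of perfect scaling achieved on p processes (1.0 is ideal)."""
    return t1 / (p * tp)


procs = [1, 2, 4, 8, 16, 32, 64]
# Hypothetical baseline of 64 s on one process.
ideal = ideal_times(64.0, procs)
```

On a log-log plot the ideal times fall on a straight line, so a measured curve that stays parallel to it is scaling well even if it sits above it.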


Libamath – increased performance for math.h functions

ELEFUNT run on ThunderX2: gfortran with libm (no libamath) compared with Arm Compiler using libamath 19.0 and 19.1

[Figure: Math performance measured by ELEFUNT. Performance as a percentage of 19.0 (axis 0 to 350) for ALOG, EXP, PWR, SIN, NINT, DLOG, DEXP, DPWR, DSIN, DNINT and DRECI; series: gfortran/libm, libamath 19.0, libamath 19.1.]


Cross-Platform tools

Arm Forge and Arm Performance Reports


By Choosing Arm, You Choose a State-of-the-art Solution

Interoperable
Available on the vast majority of HPC platforms, including AMD, IBM, Intel, Nvidia… and of course Arm!

Performant
Fast, lightweight and transparent tools that help focus on the real issues that count

Comprehensive
Packed with the best features to slash the development overhead spent on debugging and optimising issues


Arm Forge Professional

A cross-platform toolkit for debugging and profiling

The de-facto standard for HPC development
• Available on the vast majority of the Top500 machines in the world
• Fully supported by Arm on x86, IBM Power, Nvidia GPUs, etc.

State-of-the-art debugging and profiling capabilities
• Powerful and in-depth error detection mechanisms (including memory debugging)
• Sampling-based profiler to identify and understand bottlenecks
• Available at any scale (from serial to petascale applications)

Easy to use by everyone
• Unique capabilities to simplify remote interactive sessions
• Innovative approach to present quintessential information to users

Very user-friendly

Fully scalable

Commercially supported by Arm


Arm Performance Reports

Characterize and understand the performance of HPC application runs

Gathers a rich set of data
• Analyses metrics around CPU, memory, IO, hardware counters, etc.
• Possibility for users to add their own metrics

Build a culture of application performance & efficiency awareness
• Analyses data and reports the information that matters to users
• Provides simple guidance to help improve workloads’ efficiency

Adds value to typical users’ workflows
• Define application behaviour and performance expectations
• Integrate outputs to various systems for validation (e.g. continuous integration)
• Can be automated completely (no user intervention)

Relevant advice to avoid pitfalls

Accurate and astute insight

Commercially supported by Arm


Key highlights in Forge & Performance Reports

Latest 19.0 version released in Dec 2018

Products covered: Forge (DDT and MAP) and Performance Reports

Packaging
• Creation of Allinea Studio: a new solution for aarch64 platforms that includes the Arm Compiler, Arm Performance Libraries, and the former Allinea tools!

Platforms (DDT, MAP and Performance Reports)
• Full support for IBM systems
• Arm v8 support
• CUDA 9 support

Improvements
• DDT: Usability improvements; memory debugging optimizations
• MAP: Optimizations for many-core systems
• Performance Reports: Optimizations for many-core systems

New Features
• DDT: Combined C/C++/Fortran and Python debugging
• MAP: Python profiling; Backfill Custom Metrics; on-kernel GPU profiling; ability to profile selected ranks
• Performance Reports: Python performance analysis; ability to profile selected ranks


Forge and Performance Reports – Future roadmap

Why do our tools matter and what will we focus on this year?

Reduce migration costs and increase portability
Finding and using the right hardware is hard, even more so because of porting and migration costs.
We will keep providing cross-platform tools to enable choice and innovation in HPC.

Slash code validation costs and time
For every run in production, codes are run 3 to 5 times to validate they meet standards.
We will help the community reduce its testing costs by promoting best practices and tightening the link between tools and agile continuous delivery.

Provide capabilities on demand
Too often, users are stopped in their work by licence size limitations.
We will work on providing capabilities to users on demand at any time.


Forge/Performance Reports Roadmap 2018-2019

Key highlights for Forge/PR 19.1 and 19.2

Continuous work
• Support for latest software environments (MPI, compilers, etc.)
• Support for popular HPC systems (Intel, Arm, Power, GPUs…)
• Developing exclusive features in collaboration with vendors (e.g. HPE, etc.)

19.1 and 19.2/20.0
• Arithmetic evaluations of CPU metrics
• Assembly views to Forge
• Integration with DynamoRIO for low-level instrumentation of operations
• Addition of a “burst mode” in the tools
• Simplify the integration of tools within scripts
• Add the json, xml, csv outputs of the “offline” tools features


SVE - Introduction, tools and workflow


Scalable Vector Extension (SVE)

A vector extension to the Armv8-A architecture with some major new features

Gather-load and scatter-store
Loads a single register from several non-contiguous memory locations.

Per-lane predication
Operations work on individual lanes under control of a predicate register.
Diagram: 1 2 3 4 + 5 5 5 5 under predicate 1 0 1 0 = 6 2 8 4.

Predicate-driven loop control and management
Eliminate scalar loop heads and tails by processing partial vectors.
Diagram: for (i = 0; i < n; ++i), where INDEX i holds n-2, n-1, n, n+1 and CMPLT n yields the predicate 1 1 0 0.

Vector partitioning and software-managed speculation
First Faulting Load instructions allow memory accesses to cross into invalid pages.
Diagram: values 1 2 0 0 with predicate 1 1 0 0 give the partition 1 2.

Extended floating-point horizontal reductions
In-order and tree-based reductions trade off performance and repeatability.
Diagram: 1 + 2 + 3 + 4 evaluated in order, or as the tree (1 + 2) + (3 + 4) = 3 + 7.
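The per-lane predication and predicate-driven loop control described on this slide can be modelled in scalar Python. The sketch below is a behavioural illustration only: `whilelt`, `predicated_add` and `vla_add` are hypothetical helper names, the lane count stands in for the hardware vector length, and inactive lanes are assumed to keep the first operand, as in the slide's 6 2 8 4 diagram.

```python
def whilelt(start, n, lanes):
    """Predicate with lane j active while start + j < n (loop-control compare)."""
    return [start + j < n for j in range(lanes)]


def predicated_add(a, b, pred):
    """Per-lane add: active lanes get a + b, inactive lanes keep a."""
    return [x + y if p else x for x, y, p in zip(a, b, pred)]


def vla_add(dst, src, lanes):
    """Vector-length-agnostic loop computing dst[i] += src[i].

    The final partial vector is handled by the predicate, so there is
    no separate scalar tail loop, whatever value `lanes` takes.
    """
    n = len(dst)
    for start in range(0, n, lanes):
        pred = whilelt(start, n, lanes)
        for lane, active in enumerate(pred):
            if active:
                dst[start + lane] += src[start + lane]
    return dst
```

Calling `vla_add` on the same data with 2, 4 or 8 lanes gives identical results, which is the property that lets one binary run on implementations with different vector lengths.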


SVE is Arm’s next generation SIMD ISA

• Gather-load and scatter-store
• Per-lane predication
• Predicate-driven loop control and management
• Vector partitioning and software-managed speculation
• Extended floating-point horizontal reductions
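The repeatability trade-off between in-order and tree-based horizontal reductions comes from floating-point addition not being associative. The following sketch shows the two summation orders in plain Python; the function names and the sample values are illustrative assumptions, and the code models the ordering only, not SVE instructions.

```python
def in_order_sum(values):
    """Strictly left-to-right reduction: same result as a scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Pairwise (tree) reduction: adjacent pairs are summed level by level."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            paired.append(vals[-1])  # odd element carries to the next level
        vals = paired
    return vals[0]


# Values chosen so that rounding depends on the order of additions.
samples = [1e16, 1.0, -1e16, 1.0]
ordered = in_order_sum(samples)
pairwise = tree_sum(samples)
```

With these samples the two orders round differently and return different sums, while for the slide's 1 + 2 + 3 + 4 both agree. A tree reduction exposes more parallelism; an in-order reduction reproduces the scalar result bit for bit.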


SVE: HPGMG & Lulesh


SVE: Optimizing Stencil

• What are the effects of Vector Length Agnosticism?
• How well suited is the ISA to express the semantics of stencil codes?

[Figure: three stencil diagrams on i, j, k axes. Baseline: vectorise on k; Unroll j×2; Unroll j×3, i×2.]

Version | Elements per iteration | Loads per iteration | Loads per element
Baseline | 1234 | 7(6) | 7(6)
Unroll j2 | 2×1234 | 12(10) | 6(5)
Unroll i2j3 | 2×3×1234 | 28(22) | 5.6(4.6)


Open source support

• Arm actively posting SVE open source patches upstream
  • Beginning with first public announcement of SVE at HotChips 2016.

• Available upstream
  • GNU Binutils 2.28: released Feb 2017, includes SVE assembler & disassembler.
  • GCC 8: Full assembly, disassembly and basic auto-vectorization
  • GDB 8.2: SVE support
  • LLVM 7: Full assembly, disassembly
  • Linux kernel: since Mar 2017
  • QEMU 3.1: SVE support (user-space and system mode)

• Under upstream review
  • LLVM: since Nov 2016, as presented at LLVM conference.


Compiler support

Feature | Upstream GCC | Upstream LLVM | Arm Compiler 6 (for bare metal) | Arm Linux Compiler (for Linux user-space)
SVE asm and disasm | Yes | Yes | Yes | Yes
SVE code generation | Yes | No (planned for 2019-20) | Yes | Yes
SVE ACLE | No (planned for GCC 10, 2020) | No (planned for 2019-20) | Yes | Yes
Auto-vectorization | Basic (more improvements planned for GCC 9) | None (planned for 2019-20) | Advanced | Advanced


Getting ready for SVE

Port to Arm
• Port to current Arm hardware – single node and multi-node
• Tune it for current Arm hardware

Get ready for SVE
• Port to SVE using QEMU and/or ArmIE on current Arm hardware

Tune for SVE
• On real SVE hardware

Co-work with Arm tools and professional services team


Arm Instruction Emulator for SVE

Develop tomorrow’s software on today’s hardware

• Simple “black box” tool aimed at userspace software developers

$ armclang -march=armv8-a+sve hello.c
$ ./a.out
Illegal instruction
$ armie -msve-vector-bits=256 -- ./a.out
Hello

• Runs userspace application binaries at close to native speed
  • Runs multithreaded applications
  • Transparent to system calls

• Intercepts and emulates use of Arm instructions newer than the host hardware supports


Arm Instruction Emulator

Develop your user-space applications for future hardware today

Start porting and tuning for future architectures early
• Reduce time to market; save development and debug time with Arm support

Run 64-bit user-space Linux code that uses new hardware features on current Arm hardware
• SVE support available now
• Tested with Arm Architecture Verification Suite (AVS)

Near native speed with commercial support
• Emulates only unsupported instructions
• Maintained and supported by Arm for a wide range of Arm-based SoCs

Commercially supported by Arm

Runs at close to native speed

Develop software for tomorrow’s hardware today


DynamoRIO

Dynamic Binary Instrumentation
Fast code translation in userspace
Originally developed at MIT
Now managed by Google
Used for
• profiling
• valgrind-like checking
• architecture emulation


Key points of contact

Visit www.arm.com/hpc-tools for further information

Product team
• David Lecomber, Sr Director, Infrastructure tools
• Ashok Bhat, Sr Product manager – Compiler and Libraries
• Patrick Wohlschlegel, Sr Product manager – Forge and Perf Reports

Sales team
• Rob Rick and Andrew Westergren – Americas
• Marcin Krzysztofik – EMEA, India and China
• Toshinori Kujiraoka – Japan

