
The High Performance Computing Roadmap · 2020. 6. 4.

Date post: 05-Jun-2021


The High Performance Computing Roadmap

FUT-1438

Jay Kruemcke

Senior Product Manager – SUSE High Performance Computing

[email protected]

@mr_sles


Agenda

1. Why HPC?

2. Customer challenges

3. What SUSE brings to HPC

4. Where are we going?


Why HPC?

Worldwide HPC revenue expected to reach over $19.95 billion by 2023 [1]

Big data combined with HPC is creating new solutions, adding many new users/buyers to the HPC space (AI/ML/DL and HPDA are hot new areas)

SUSE runs on 21 of the top 50 supercomputers (7 RH, 9 CentOS) [2]

SUSE dominates top 100, CentOS gains share in “smaller” supercomputers [2]

Commercial OS Share in Top 500 (represents 100 supercomputers in the list): SUSE 53%, RH 24%, bullx 17%, Ubuntu 6% [2]

[1] Hyperion Research, November 2019

[2] Top500 Supercomputer Report, November 2019


Cloud Computing For HPC Will Grow Faster

• Total HPC spending is projected to reach $44B in 2022

• Over 70% of HPC sites run some jobs in public clouds

• Over 10% of all HPC jobs are now running in clouds (primarily hybrid)

• Public clouds are cost-effective for some jobs but up to 10x more expensive for others, depending on where data resides

• Private and hybrid cloud use is growing faster

Source: Hyperion Research, November 2019


Customer Pain Points And Challenges

Time to Solution

“I need to maximize application performance, scale workloads, and minimize overhead.”

• Parallel software is lacking, with many applications needing a major re-design

• HPC is segmented into commercial and scientific, and there is not enough collaboration between the two

Maintenance

“My IT staff doesn’t have time to update and test all the different software components.”

• Better management software needed; update deployment approach to leverage HPC and cloud infrastructure

• Stack components provided by multiple vendors, making it more challenging to maintain

Complexity

“Composing a working HPC environment is difficult and time-consuming, and requires experts.”

• Clusters are hard to use and manage as they become more complex in heterogeneous environments

• Storage access time and data management are becoming new bottlenecks


SUSE Linux Enterprise High Performance Computing

HPC bundle with supported HPC packages – beyond an OS

Supports AArch64 (Arm) and x86-64

Many IHV/ISV/CSP partnerships

Multiple service life options

Competitive cluster node pricing model


SuperMUC Petascale system runs SUSE on Lenovo ThinkSystem

Geophysicists use earthquake simulation software to investigate seismic waves beneath Earth’s surface

Calculations involved in this kind of simulation are so complex that they push even supercomputers to their limits


Selected SUSE HPC Projects

SUSE Linux for HPC & the HPC module

SUSE Enterprise Storage

SUSE Package Hub

HPC Containers

Arm: the emerging platform

HPC in the Cloud

Accelerator enablement


Why SUSE Linux For HPC?

Enterprise Linux with Enterprise support

• Security incidents require quick response to address system vulnerabilities

More than just an OS - HPC software included and supported

• SLE HPC includes popular HPC software such as Slurm and Open MPI

• Deployment templates for Head Nodes, Compute Nodes, Dev Nodes

Aggressively priced subscriptions

• SUSE Linux for HPC priced for large and small HPC configurations

Proven track record in HPC

• 50% of the Top 100 HPC systems are running SUSE Linux or SLE-based OS


SUSE Linux HPC Module

Simplify access to supported HPC packages

All packages supported by SUSE via SUSE Linux Enterprise HPC

Available for x86 and Arm-based platforms

SLE HPC 12 and SLE HPC 15

MUNGE

ScaLAPACK

genders


Installing The HPC Module


SUSE HPC Reference Architecture


Cloud-Ready HPC

Cloud is being optimized for HPC workloads through performance, scalability and cost efficiency, enabling you to extend your HPC environment to the cloud on-demand.

Dynamically burst to the cloud to complement your on-premises capabilities, or even fully migrate entire HPC environments and workflows.


HPC In The Cloud

HPC “all-in” the cloud

• Includes the head, compute and storage nodes, with no hardware infrastructure to maintain

• Optimized cost and performance for scale-out applications

HPC bursting to hybrid/public clouds

• Address changing capacity needs

• Extend HPC jobs to the Cloud for on-demand scale and flexibility

[Diagrams: Local Network and Cloud, one per scenario]


Goal: Propel the Arm HPC ecosystem and exascale computing in the UK

• More than 12,000 Arm-based cores running across three universities

• 64 Apollo 70 systems per site

• Two 32-core Cavium ThunderX2 processors per system

• Running SUSE Linux Enterprise for High Performance Computing

Catalyst UK program: HPE, Arm, SUSE, and three leading UK universities establish one of the largest Arm-based supercomputer deployments in the world
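
The "more than 12,000 cores" figure follows from the per-site numbers on this slide. A minimal sketch of the arithmetic, assuming every system at each of the three sites is fully populated; the variable and function names are illustrative and not taken from the deck:

```python
# Figures taken from the slide; the names below are illustrative only.
SITES = 3                  # three UK universities
SYSTEMS_PER_SITE = 64      # Apollo 70 systems per site
SOCKETS_PER_SYSTEM = 2     # ThunderX2 processors per system
CORES_PER_SOCKET = 32      # cores per ThunderX2 processor


def total_cores(sites, systems_per_site, sockets_per_system, cores_per_socket):
    """Return the aggregate core count across the given number of sites."""
    return sites * systems_per_site * sockets_per_system * cores_per_socket


cores_per_site = total_cores(1, SYSTEMS_PER_SITE, SOCKETS_PER_SYSTEM, CORES_PER_SOCKET)
cores_total = total_cores(SITES, SYSTEMS_PER_SITE, SOCKETS_PER_SYSTEM, CORES_PER_SOCKET)

print(f"Cores per site: {cores_per_site}")   # 4096
print(f"Cores in total: {cores_total}")      # 12288
```

That gives 4,096 cores per site and 12,288 across the program, consistent with the slide's rounded claim.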


The Spectrum Of AI Solutions

Artificial Intelligence: Examples are Google Maps and game play.

Machine Learning: Examples are cyber security, autonomous vehicles and F1 racing.

Neural Networks: Examples are facial and voice recognition.

Deep Learning: Examples are disease identification and energy demand optimization.

Convolutional Neural Networks: Examples are image/video recognition and medical image analysis.

Transfer Learning: For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.


SUSE Participation In OpenACC

OpenACC is a directive-based programming model designed to provide performance and portability for CPUs, GPUs, and other accelerators

SUSE joined OpenACC to simplify access to accelerator technology for SUSE HPC customers


SUSE PackageHub

• High-quality, up-to-date packages delivered by openSUSE Factory

• Easy to install via zypper or yast

• Built and maintained by the community of users

• Approved and curated by SUSE

• No charge

About 1000 packages available for x86-64

More than 500 packages available for Arm

[Diagram: Enterprise User, SUSE Package Hub, Upstream packages]

Package        Category
TensorFlow     ML Framework
Caffe2         Framework
Theano         Deep learning library
Numpy*         Math library
Pytorch*       ML library
ArmNN          ML Framework
clustershell   Administrative
robinhood      Administrative
singularity    Runtime

* planned


Key HPC Partnerships


SUSE Enterprise Storage

Ceph-based, software-defined storage

Backup/archival HPC storage

IO500 benchmark-ranked

Easy to manage with openATTIC

Certified with HPE DMF


“Thanks to the stability and ease of management of the SUSE solution, we have significantly reduced the time we spend managing live and archived data. This keeps our internal team free to focus on driving new value for the university and its life-changing research projects.”

“SUSE Enterprise Storage has already brought clear improvements to our deep learning projects, one of which requires two million files in a single directory. Putting these files into SUSE Enterprise Storage has increased performance more than ten times compared with the previous storage solution.”

Steve Cousins

Supercomputer Engineer

University of Maine System


SUSE Enterprise Storage Solution For HPC - Ceph

Tier 2 Storage Use Case

[Diagram: HPC Compute Cluster, Low Latency Storage (Lustre, XFS, NFS etc), SUSE Enterprise Storage]

• Use Cases:

• Primary Storage (Certain Use Cases)

• Nearline or Archival Storage

• Home Directories

• Certified with HPE Data Management Framework (DMF) and iRODS*


HPC Storage Use Case: Large European Energy Company

Active Tier: Hot Data

Dormant Tier: Cold Data

HPC/AI Compute Cluster

High-Performance Storage

Scale-out NAS

Parallel File Systems

All-Flash File System

HPE Data Management Framework: Tiered data management

Tape

DMF zero watt storage

Object Storage & Cloud

Tier 0 Storage needs

- Clustered file system

- Lustre

- 10 PiB, 240 Gb/sec

Tier 1 Storage needs (SUSE / Ceph)

- Object Storage, resilient

- Widely used, affordable

- Automatic access

- 5 PiB


SLES HPC Lifecycle Roadmap*

[Timeline chart, 2017 to 2025, showing the lifecycle phases of each release]

SLES 12 HPC SP3: FCS Sept 2017, "Normal" SP overlap, ESPOS, LTSS

SLES 12 HPC SP4: FCS 4Q 2018, "Normal" SP overlap, ESPOS, LTSS

SLES 12 HPC SP5: ESPOS, LTSS

SLE HPC 15: FCS Q2 2018, "Normal" SP overlap, ESPOS, LTSS

SLE HPC 15 SP1: FCS Q2 2019, "Normal" SP overlap, ESPOS, LTSS

SLE HPC 15 SP2: ESPOS, LTSS

*NOTE: All future dates are estimates for illustration purposes and are not intended as committed dates.


Strategic Directions

Enable and exploit new HPC hardware

Shift HPC Module focus to utilities

Blend in AI/ML support

Simplify HPC in the Cloud experience

Improve Day 1 and Day 2 experience
