
Supercomputing Trends in Earth System Modelling

ECMWF Workshop on HPC in Meteorology

26 October 2016

Dr. Phil Brown

Earth Sciences Segment Leader

Overview

● Cray Update

● Early KNL results with Earth System Models

● Thoughts on future trends

● System Reliability@Exascale

● Deep Learning in Weather/Climate

● Converged Architectures


Cray Solutions for the Earth Sciences


Why Cray?

• Cray’s solutions enable a broader and more detailed range of meteorological services and products

• Advanced modeling capabilities

• Shortened research to operations

• Experience delivering and operating world’s largest and most complex systems

• Emphasis on total cost of ownership – power, upgradability and efficiency

• Commitment to long-term partnerships delivering significant ongoing value to our customers.

Market Presence

• Broad presence across NWP and climate communities:

• From Terascale to Petascale

• Research and operational environments

• Model development platforms for extreme scale architectures

Cray Growth in Weather, Climate and Oceanography


[Timeline graphic: milestones in Jan, Jul, Aug, Sep, Nov and Dec 2015, and Mar 2016.]


Since last workshop:

• >89,000 sockets shipped

• >53 PetaFlops nominal peak

• 89PB of Cray Sonexion

• >3TB/s nominal IO BW

Two Large XC Systems that will Impact Future Technologies and Applications Throughout the Community

● Los Alamos / Sandia – “Trinity”

  ● >40 Pflop system, mix of Haswell + KNL

  ● 3TB/s / 3PB SSD DataWarp Capability

● NERSC8 – “Cori”

  ● >40 Pflop system, mix of Haswell + KNL

  ● 3TB/s / 3PB SSD DataWarp Capability

  ● Transitioning user base to “many-core” processing

  ● NERSC Exascale Science Applications Program (NESAP)


HOMME – KNL vs Broadwell

● Spectral element dynamical core within CAM component of CESM

● Performance dominated by advection (2nd order Runge-Kutta), memory bandwidth limited

● Key optimizations targeted at KNL implemented by NCAR ASAP group

  ● See John Dennis’ talk at the 6th UCAR MultiCore workshop

● “perfTESTWACCM” benchmark: Baroclinic wave in N. hemisphere

  ● Size NE=8, 70 vertical levels, 135 tracers

● Runtimes from “prim_main_loop”

● Node: Intel Xeon Phi 7250 68-core 1.4GHz, 96GB DDR4-2400, 16GB MCDRAM

  ● MCDRAM bandwidth (quad-flat): 475-490GB/s

● MCDRAM bandwidth (cache): ~350GB/s

● DDR4 bandwidth (quad-flat): ~90GB/s

● Node: 2 x Intel Xeon Broadwell E5-2699 22-core 2.2GHz, 128GB DDR4-2400

  ● DDR4 bandwidth = ~130GB/s

● Cray XC, with Cray compiler & programming environment

● Strong scaling study performed by Marcus Wagner, supported by NERSC CoE
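Because the advection phase is memory-bandwidth limited, per-node bandwidth is the number to watch in the single-node comparison that follows. As a rough, hedged illustration (not the measurement method used for the figures above), a STREAM-style triad gives a quick sanity check of the bandwidth a node actually delivers; a single NumPy thread with temporaries will not saturate the node, so treat the result as a lower bound:

```python
# Rough STREAM-style triad to sanity-check achievable memory bandwidth.
# Illustrative only: single-threaded NumPy with temporaries underestimates
# what an optimized, threaded code sees on the same node.
import time
import numpy as np

n = 100_000_000            # ~0.8 GB per double-precision array
a = np.zeros(n)
b = np.random.rand(n)
c = np.random.rand(n)
scalar = 3.0

t0 = time.time()
a[:] = b + scalar * c      # triad: 2 reads + 1 write per element (plus temporaries)
elapsed = time.time() - t0

bytes_moved = 3 * n * 8    # count only the three 8-byte accesses per element
print(f"Effective bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```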


Single Node Results


[Chart data, normalised to Broadwell = 1.00:]

                     Broadwell   KNL-DRAM-flat   KNL-MCDRAM-cache   KNL-MCDRAM-flat

Memory bandwidth       1.00          0.69             2.69               3.65

HOMME performance      1.00          0.65             1.90               1.83

Multi Node Results - runtime


[Chart: “prim_main_loop” runtime in seconds vs. number of nodes (1, 2, 3, 6, 8, 12). Broadwell runtimes: 63.2, 32, 21.5, 12.5, 7.7 s; Knights Landing runtimes: 34.1, 20.7, 13.3, 9.3, 9.1 s.]

Multi Node Results – parallel efficiency


[Chart: parallel efficiency (0-100%) vs. number of nodes (1, 2, 3, 6, 8, 12) for Broadwell and Knights Landing.]
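For reference, the parallel efficiency plotted here is the usual strong-scaling ratio E(N) = T(1) / (N × T(N)). A minimal sketch of the calculation, using hypothetical runtimes rather than the measured HOMME values:

```python
# Strong-scaling parallel efficiency: E(N) = T_1 / (N * T_N).
# The runtimes below are hypothetical, not the measured HOMME values.
runtimes = {1: 64.0, 2: 33.0, 4: 18.0, 8: 10.0}   # seconds per node count

t1 = runtimes[1]
for nodes, t in sorted(runtimes.items()):
    efficiency = t1 / (nodes * t)
    print(f"{nodes:2d} nodes: {efficiency:5.1%} parallel efficiency")
```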

Unified Model – KNL vs Broadwell

● Unified Model v10.3 / AMIP test case

  ● N96 (135km) Global Atmosphere

● Memory footprint approximately 26GB

● Test platforms:

  ● Node: Intel Xeon Phi 7230 64-core 1.3GHz, 96GB DDR4-2400, 16GB MCDRAM

● MCDRAM bandwidth (cache): ~350GB/s

● DDR4 bandwidth (cache): ~70GB/s

● Node: Intel Xeon Phi 7250 68-core 1.4GHz, 96GB DDR4-2400, 16GB MCDRAM

● MCDRAM bandwidth (cache): ~350GB/s

● DDR4 bandwidth (cache): ~70GB/s

● Node: 2 x Intel Xeon Broadwell E5-2695 18-core 2.1GHz, 128GB DDR4-2400

● DDR4 bandwidth = ~130GB/s

● Cray XC, with Cray compiler & programming environment

● Investigation conducted by Eckhard Tschirschnitz


Unified Model – KNL vs Broadwell

● Three cases tested:

1. Unmodified code, compiled to target KNL/BDW, double precision solver

2. Unmodified code, compiled to target KNL/BDW removing all “vector0” flags, double precision solver

● Vector0 disables automatic vectorization

● Used to avoid some numerical issues observed previously in UM with higher vectorization

● No issues apparent in this particular test case

3. Unmodified code, compiled to target KNL/BDW, mixed precision solver

● Inspired by early KNC work, higher cache efficiency
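The UM’s actual mixed-precision solver changes are not reproduced here; as a hedged illustration of the general idea (run the bandwidth-heavy sweeps in single precision, halving memory traffic and improving cache reuse, while checking convergence in double), here is a minimal Jacobi sketch:

```python
# Illustrative mixed-precision Jacobi solve of A x = b: float32 sweeps for
# lower memory traffic and better cache reuse, float64 convergence check.
# Not the UM solver; assumes a diagonally dominant A so Jacobi converges.
import numpy as np

def mixed_precision_jacobi(A, b, iters=200, tol=1e-5):
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    D = np.diag(A32)                 # diagonal entries
    R = A32 - np.diagflat(D)         # off-diagonal part
    x = np.zeros_like(b32)
    for _ in range(iters):
        x = (b32 - R @ x) / D                                     # float32 sweep
        residual = np.linalg.norm(b - A @ x.astype(np.float64))   # float64 check
        if residual < tol:
            break
    return x.astype(np.float64)
```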


UM - Throughput & Energy Efficiency


[Chart: relative throughput and energy for BDW-36 (4x8, 32 cores), BDW-36 (6x6, 36 cores), KNL-68 (16x4, 64 cores) and KNL-64 (16x4, 64 cores), across three cases: unmodified code with double precision solver, vector0 removed with double precision solver, and unmodified code with mixed precision solver.]

Thoughts on early KNL results

1. Potential for greater throughput/node

  ● Significant optimization may be required to maximize potential

● Improving vectorization, threading & cache re-use

2. Significant potential for higher energy efficiency

3. Parallel efficiency reduces faster with scale

  ● Not unexpected due to lower per-core performance

● May prove problematic where jobs have strict runtime targets which challenge Xeon today


Reliability in the coming years

● Considering a contemporary 3000-node system

● Very high whole system MTTI/availabilities achievable

● Expect job failures to occur every 5-10 days

● Caused by uncorrectable soft and hard errors in memory and/or CPUs

● Scales with number of devices/sockets in system

● Expect whole-system reliability to remain high, but rate of job failures to increase
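As a back-of-envelope illustration of that scaling (the per-socket uncorrectable-error rate below is an assumed placeholder, not a vendor figure):

```python
# Back-of-envelope job-interrupt estimate: uncorrectable errors are roughly
# independent per socket, so the system-wide rate scales with socket count.
# The per-socket rate is an assumed placeholder, not a Cray figure.
per_socket_uncorrectable_per_year = 0.01       # assumed: 1 per 100 socket-years
sockets = 3000 * 2                             # e.g. 3000 dual-socket nodes

system_errors_per_year = per_socket_uncorrectable_per_year * sockets
mean_days_between_job_failures = 365.0 / system_errors_per_year
print(f"~{mean_days_between_job_failures:.1f} days between job-killing errors")
```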


Impact on Weather/Climate models

● Not ideal for deterministic models

  ● Requires re-run from start or saved state

● Losing an ensemble member not so impactful

  ● Close coupling of ensembles into single MPI launch not desirable until MPI-resiliency features exist

● Overall impact modest?

● Providing workflows are engineered to expect & react to failures
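A minimal sketch of what “expect & react to failures” can look like at the workflow level: resubmit from the latest saved state rather than from the start. Here submit_model_run() and latest_checkpoint() are hypothetical hooks, not a particular scheduler’s API:

```python
# Minimal restart-aware driver: on failure, resubmit from the most recent
# saved state instead of rerunning from the start. The two callables are
# hypothetical workflow hooks supplied by the surrounding system.
MAX_ATTEMPTS = 5

def run_with_restarts(submit_model_run, latest_checkpoint):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        restart_file = latest_checkpoint()        # None on the first attempt
        ok = submit_model_run(restart=restart_file)
        if ok:
            return True
        print(f"Attempt {attempt} failed; restarting from {restart_file}")
    return False
```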


Machine/Deep Learning in Weather/Climate

● Deep Learning used to describe a family of algorithms related to multi-level neural networks:

  ● Deep Neural Networks

● Convolutional Neural Networks

● Recurrent Neural Networks

● Lots more!

● Key enabler has been access to compute resources

  ● DL is predominantly FLOP bound

● Large scale problems rapidly becoming “HPC”-class

● Delivering “state of the art” results in computer vision, speech recognition, natural language processing etc.


Machine/Deep Learning in Weather/Climate

● Almost the opposite of a physics/dynamics based model

  ● Arduous to train, but comparatively quick to run

● Use-cases will be complementary?

● Some ideas:

  ● Rapid classifiers for radar/observations

● Pattern recognition in model outputs

● Infilling/smoothing model outputs
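As a concrete, hedged example of the first idea, a small convolutional classifier over gridded radar fields could look like the sketch below; the input shape, class count and Keras usage are illustrative assumptions, not an existing implementation:

```python
# Minimal convolutional classifier sketch for gridded radar fields.
# Input shape (128x128, single channel) and the 4 output classes are
# assumed purely for illustration.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(128, 128, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. precipitation regimes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training, once labelled fields are available:
# model.fit(radar_fields, labels, epochs=10)
```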


Future Converged Architecture


[Diagram: converged architecture with a bulk storage tier, a cache/metadata and data movement tier, and automated data movement between them.]

Thank you for your attention

