CHIUW June 22, 2019 ANSHU DUBEY PROGRAMMING ABSTRACTIONS FOR ORCHESTRATION OF HPC SCIENTIFIC COMPUTING
Transcript
Page 1: PROGRAMMING ABSTRACTIONS FOR ORCHESTRATION OF HPC SCIENTIFIC COMPUTING … · 2019. 9. 4. · Since programming models and other tools are not so obliging, let me reduce the complexity

CHIUW June 22, 2019

ANSHU DUBEY

PROGRAMMING ABSTRACTIONS FOR ORCHESTRATION OF HPC SCIENTIFIC COMPUTING

Page 2

SETTING THE STAGE
What should my ideal computational tool do?

Everything, really.

- Scan my brain
- Figure out what I want
- Scan the literature
- Figure out the equations
- Auto-generate the code
- Run it
- Analyze the data

I am happy to present the results.

Page 3

SETTING THE STAGE

Since programming models and other tools are not so obliging, let me reduce the complexity by several orders of magnitude.

If we were starting a new multiphysics exascale software project today, one expected to see long-term use for scientific discovery, how should we design the software?

Chapel designers seem to think the way I do. I like the abstractions and the design; let me explain why.

Page 4

SCIENCE CODE DEVELOPMENT

[Diagram: the development cycle. Stages: Physical World, Model, Equations, Discretize, Numerical solvers, Mesh/particles etc., Implementation, Verify (accuracy, stability), Validation, Model fidelity. Roles attached to the stages: Domain expert, Applied Mathematician, Software Engineer.]

Page 5

THERE IS MORE

[Diagram: the same development cycle as on the previous slide, with a Performance stage added and the Software Engineer role widened to "Software Engineer, optimization experts".]

Page 6

ARCHITECTING SCIENTIFIC CODES

Desirable Characteristics

- Maintainability and Verifiability: for credible and reproducible results
- Performance: all machines need to be used well
- Extensibility: most use cases need additions and/or customizations
- Portability: even platforms of the same generation are different

Page 7

ARCHITECTING SCIENTIFIC CODES

Desirable Characteristics

- Maintainability and Verifiability: clean code, documentation, comprehensive testing
- Performance: spatial and temporal locality of data, minimizing data movement, maximizing scalability
- Extensibility: well-defined structure and modules, encapsulation of functionalities
- Portability: general solutions that work across platforms without significant manual intervention

Page 8

ARCHITECTING SCIENTIFIC CODES

Why it is challenging

- Extensibility: same data layout not good for all solvers; many corner cases; necessary lateral interactions
- Portability: tremendous platform heterogeneity; a version for each class of device => combinatorial explosion
- Maintainability and Verifiability: wrong incentives; designing good tests is hard
- Performance: solvers with low arithmetic intensity but hard sequential dependencies; proximity and work distribution at cross purposes

Page 9

DESIGN APPROACH
Taming the Complexity: Separation of Concerns

- Treat differently
  - Subject of research: Model, Numerics
  - More stable: Discretization, I/O, Parameters
- Hide from one another
  - Client code: mathematically complex
  - Infrastructure: data structures and movement
- Encode into framework: logically separable functional units of computation
- Differentiate between private and public
- Define interfaces
- Applies to both kinds

Page 10

SEPARATION OF CONCERNS

[Diagram labels: Requirements, Software Architecture, API Design, Implement, Test, Maintain, Augment; Model, API, Design, Develop, Validate, Integrate; Infrastructure Capabilities.]

Page 11

DESIGN PHILOSOPHY

- Infrastructure design
  - Take time to discuss, iterate over requirements and specification
  - Keep end users involved
    - Not doing so leaves possible options on the table
- Simple is better
- Flexibility vs. transparency to the user
  - Flexibility wins
- Hierarchical access to features

Page 12

INTERACTION BETWEEN INFRASTRUCTURE AND PHYSICS

[Diagram labels: infrastructure, physics, Interfaces, Wrapper layer.]

Page 13

Example Software: FLASH

Cosmological cluster formation

Supersonic MHD turbulence

Type Ia SN

Rayleigh-Taylor instability

Core collapse supernovae

Ram pressure stripping

laser slab

Rigid body structure

Accretion torus

Vulcan laser experiments: B-field generation/amplification

Page 14

SCIENCE USING FLASH

- Many components under research
- Software continuously evolving
- Compute on expensive, rare resources
- All use cases are different and unique

[Diagram labels: More Scientific Understanding, Higher Fidelity Model, More Diverse Solvers, More Hardware Resources.]

Page 15

FLASH CODE BASICS

- An application code, composed of encapsulated functional units
- Units are combined and composed to form applications
- Not one monolithic binary; each problem has its own distinct binary
- Setup tool (Python) parses Config files, picks specific implementations of units, and composes the full application
- Units can have alternative implementations
  - Three implementations of mesh are supported
  - Composability implies any of the implementations can be picked
- Mostly Fortran, some C, about 1.5 million lines of code
- Portable, and until recently performance portable
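The composition idea on this slide can be sketched in a few lines of Python. This is only an illustration and not FLASH's setup tool: the `Unit` class, the `compose` function, and the implementation names `uniform`, `amr`, and `unsplit` are all hypothetical stand-ins.

```python
# Hypothetical sketch: these names are illustrative, not FLASH's setup tool.
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A functional unit with one or more alternative implementations."""
    name: str
    implementations: dict = field(default_factory=dict)  # impl name -> callable


def compose(units, choices):
    """Pick one implementation per unit and return the composed application.

    `choices` maps a unit name to the implementation selected for a problem.
    """
    app = {}
    for unit in units:
        impl_name = choices[unit.name]
        if impl_name not in unit.implementations:
            raise KeyError(f"{unit.name} has no implementation {impl_name!r}")
        app[unit.name] = unit.implementations[impl_name]
    return app


# Two units; the mesh unit has alternative implementations (names made up).
grid = Unit("Grid", {
    "uniform": lambda: "uniform mesh initialized",
    "amr": lambda: "adaptive mesh initialized",
})
hydro = Unit("Hydro", {"unsplit": lambda: "unsplit hydro ready"})

# Each problem makes its own choices, hence its own distinct application.
app = compose([grid, hydro], {"Grid": "amr", "Hydro": "unsplit"})
messages = [app[name]() for name in ("Grid", "Hydro")]
print(messages)
```

Because every implementation sits behind the same unit name, swapping `"amr"` for `"uniform"` changes the composed application without touching the Hydro unit.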

Page 16

DESIGN CONSIDERATIONS

- Encapsulation and interfaces
- Separation of concerns
- Extensibility
- Locality
- Composability
- Orchestration
- Cost accounting

Page 17

ENCAPSULATION

[Diagram: "Real view: a whole domain with many operators" is split two ways. Functional decomposition gives a virtual view as a collection of components (memory access and compute optimization). Spatial decomposition gives a virtual view of domain sections as stand-alone computation units (parallelization and scaling optimization).]

- Virtual view of functionalities
- Decomposition into units and definition of interfaces

Page 18

EXTENSIBILITY: ADD A UNIT

[Diagram: before and after adding a Hydro unit. Before: FLASH Driver, Grid API, GridMain Config, AMR Config, Implementation, Other units; the Driver shows "Call Grid_initDomain …. (call other units)". After: the existing pieces are unmodified, and Hydro API, HydroMain Config, Unsplit Config, and Implementation are added; the Driver shows "Call Grid_initDomain, Call Hydro …. (call other units)". Legend: namespace, organizational, Implementation.]

Page 19

EXTENSIBILITY AND LOCALITY: ADD A SUBUNIT

[Diagram: before and after adding a subunit to Grid. Before: FLASH Driver, Other units, Grid API, GridMain Config, AMR Config, Implementation. After: the same pieces plus GridSolvers Config. Parts of the diagram are marked "unmodified" and "modified". Legend: namespace, organizational, Implementation.]

Page 20

COMPOSABILITY

[Diagram: "Real view: a whole domain with many operators" goes through spatial decomposition (blocks/tiles) to "Virtual view: domain sections as stand-alone computation units", targeting parallelization and scaling optimization. Other labels: Dynamic Scheduling, Load Distribution.]

Framework

- AMR infrastructure: refinement, load balancing, work redistribution
- Meta-information about domain sections
- Asynchronization at block and operator level
- No kernel optimization in this part
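The "domain sections as stand-alone computation units" view can be illustrated with a small Python sketch. It assumes a 1D domain and a pointwise operator, both simplifications chosen for illustration; `split_into_blocks` and `apply_per_block` are hypothetical names, and a stencil operator would also need neighbor data, which is omitted here.

```python
def split_into_blocks(n_cells, block_size):
    """Return (start, stop) index ranges that tile a 1D domain of n_cells."""
    return [(lo, min(lo + block_size, n_cells))
            for lo in range(0, n_cells, block_size)]


def apply_per_block(data, blocks, operator):
    """Apply a pointwise operator one block at a time.

    Each block is processed without reference to any other block, which is
    what would allow blocks to be scheduled independently.
    """
    out = list(data)
    for lo, hi in blocks:
        out[lo:hi] = [operator(x) for x in data[lo:hi]]
    return out


domain = [float(i) for i in range(10)]
blocks = split_into_blocks(len(domain), block_size=4)
doubled = apply_per_block(domain, blocks, lambda x: 2.0 * x)
print(blocks)
print(doubled)
```

The operator never sees the whole domain, only one block, so scheduling and load distribution can be decided separately from the numerics.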

Page 21

COMPOSITION

[Diagram: "Real view: a whole domain with many operators" goes through functional decomposition to "Virtual view: collection of components", targeting memory access and compute optimization. Other labels: Abstraction at solver level, code transformation, Fusing/inlining functions.]

Framework

- Abstractions for performance portability
- Ability to express operations at a higher level
- Do away with optimization blockers
  - Leave it to tools and compilers to optimize
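Function fusing, one of the transformations named on this slide, can be shown with a toy Python example. It assumes pointwise operators only; `fuse`, `scale`, and `shift` are hypothetical names, and a real tool would do this on generated code rather than with closures.

```python
from functools import reduce


def fuse(*operators):
    """Fuse pointwise operators into a single function applied in order."""
    def fused(value):
        return reduce(lambda acc, op: op(acc), operators, value)
    return fused


def scale(x):
    return 2.0 * x


def shift(x):
    return x + 1.0


cells = [0.0, 1.0, 2.0]

# Unfused: two sweeps over the data, with an intermediate list in between.
unfused = [shift(v) for v in [scale(c) for c in cells]]

# Fused: one sweep over the data, no intermediate list.
fused_op = fuse(scale, shift)
fused = [fused_op(c) for c in cells]
print(fused)
```

Both versions compute the same values; the fused one touches each cell once, which is the kind of data-movement saving that expressing operations at a higher level is meant to enable.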

Page 22

CODE TRANSFORMATION

- Two different scopes
- The usual one
  - Write code once, generate "optimized" code for the target
  - Down at the level of loop nests or kernels
  - Best done for limited scope computations
  - We intend to use a transpiler being developed by collaborators
  - Turns IR into constrained Python; optimized code is generated from there
- The not so usual one
  - High level orchestration of operators
  - Determined during application configuration
  - Communicated to the runtime in part

Page 23

ORCHESTRATION SYSTEM

- Task composer: used for configuration
- Extension of the original FLASH "Config" files
- A configuration DSL
- Encode meta-information for application construction in FLASH-specific syntax as needed

A primer on how the FLASH framework configures an application.

Page 24

CONFIG FILES

- Can exist anywhere in the directory structure
- Encode all meta-information for that level
  - Unit dependencies
  - State variables needed
  - State variables that need reconciliation at fine-coarse boundaries
  - Runtime environment

Example:

    REQUIRES Driver
    REQUIRES physics/Hydro
    REQUIRES physics/Eos/EosMain/Helmholtz
    REQUIRES physics/sourceTerms/Burn/BurnMain/nuclearBurn
    REQUIRES Simulation/SimulationComposition
    PARAMETER xhe4 REAL 0.0 [0.0 to 1.0]
    PARAMETER xc12 REAL 1.0 [0.0 to 1.0]
    PARAMETER xo16 REAL 0.0 [0.0 to 1.0]
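To show how meta-information of this kind can be consumed, here is a minimal Python parser for the two keywords that appear in the excerpt above. It is a sketch under stated assumptions and not the FLASH setup tool: `parse_config` is a hypothetical function, it handles only `REQUIRES` lines and `PARAMETER ... REAL` lines with a bracketed range, and the full Config syntax is wider than that.

```python
import re

# A few lines in the style of the Config excerpt on this slide.
CONFIG_TEXT = """\
REQUIRES Driver
REQUIRES physics/Hydro
PARAMETER xhe4 REAL 0.0 [0.0 to 1.0]
PARAMETER xc12 REAL 1.0 [0.0 to 1.0]
"""

# Assumed shape: PARAMETER <name> REAL <default> [<low> to <high>]
PARAM_RE = re.compile(
    r"PARAMETER\s+(\w+)\s+REAL\s+(\S+)\s+\[(\S+)\s+to\s+(\S+)\]"
)


def parse_config(text):
    """Collect unit dependencies and REAL parameters from Config-style text."""
    requires, params = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("REQUIRES "):
            requires.append(line.split(None, 1)[1])
        elif line.startswith("PARAMETER "):
            match = PARAM_RE.fullmatch(line)
            if match is None:
                raise ValueError(f"unsupported PARAMETER line: {line}")
            name, default, low, high = match.groups()
            params[name] = {
                "default": float(default),
                "range": (float(low), float(high)),
            }
    return requires, params


requires, params = parse_config(CONFIG_TEXT)
print(requires)
print(params)
```

The `requires` list is the input a setup step would need to resolve unit dependencies, and `params` carries the defaults and valid ranges for runtime parameters.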

Page 25

CONFIGURATION

[Diagram: units available for composing an application, each with its solver characteristics. Evolution (time stepping); Hydro/MHD: explicit, stencils; Self Gravity: semi-implicit, stencils, FFT etc.; EOS: pointwise, table lookup; Burn: pointwise, ODE; Diffusion: implicit; Particles: Lagrangian; Radiation: implicit; Laser Drive; Library. Application shown: Shock Tube.]

Dubey et al., Parallel Computing 2009

Page 26

CONFIGURATION

[Diagram: the same set of units as on the preceding CONFIGURATION slide (Evolution, Hydro/MHD, Self Gravity, EOS, Burn, Diffusion, Particles, Radiation, Laser Drive). Application shown: Cellular.]

Dubey et al., Parallel Computing 2009

Page 27

CONFIGURATION

[Diagram: the same set of units as on the preceding CONFIGURATION slides (Evolution, Hydro/MHD, Self Gravity, EOS, Burn, Diffusion, Particles, Radiation, Laser Drive). Application shown: GCD.]

Dubey et al., Parallel Computing 2009

Page 28

CONFIGURATION

[Diagram: the same set of units as on the preceding CONFIGURATION slides (Evolution, Hydro/MHD, Self Gravity, EOS, Burn, Diffusion, Particles, Radiation, Laser Drive, Library). Application shown: HEDP.]

Dubey et al., Parallel Computing 2009

Page 29

COMPOSER FILES

- Same philosophy
- Keep them separate from Config files
  - More complex
  - Functionally different
  - Operate at individual unit level
- Build a separate tool
  - Could be a DSL compiler
    - We prefer to keep it simple
    - Time will tell if we can
  - Parse the meta-information and produce executable code

Page 30

OUR VISION

[Diagram labels: Solver Information, kernels, Platform Information, Memory Requirements, Task Composer, Emitted code, Task, Operation.]

Emitted code:

    allocateMemoryHost()
    allocateMemoryAccel()
    moveData_1()
    kernel_1()
    …
    kernel_M()
    moveData_2()
    kernel_M+1()
    …
    kernel_N()
    moveData_P()
    deallocateMemoryHost()
    deallocateMemoryAccel()
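A toy version of the emission step might look like the Python below. It is a sketch of the idea only and not the task composer itself: `emit_task` is a hypothetical function, the rule "emit a data movement whenever consecutive kernels run on different devices" is an assumption made for illustration, and so are the choices that data starts on the host and must end there.

```python
def emit_task(kernels):
    """Emit an ordered list of call names for one task.

    `kernels` is a list of (kernel_name, device) pairs, where device is
    "host" or "accel". A data movement is emitted whenever consecutive
    kernels run on different devices (an assumption of this sketch).
    """
    calls = ["allocateMemoryHost()", "allocateMemoryAccel()"]
    location = "host"  # assumption: data starts in host memory
    moves = 0
    for name, device in kernels:
        if device != location:
            moves += 1
            calls.append(f"moveData_{moves}()")
            location = device
        calls.append(f"{name}()")
    if location != "host":  # assumption: results are needed back on the host
        moves += 1
        calls.append(f"moveData_{moves}()")
    calls += ["deallocateMemoryHost()", "deallocateMemoryAccel()"]
    return calls


emitted = emit_task([
    ("kernel_1", "accel"),
    ("kernel_2", "accel"),
    ("kernel_3", "host"),
])
print("\n".join(emitted))
```

The solver information (which kernels, in which order) and the platform information (where each kernel runs) are inputs; the sequence of allocations, movements, and kernel calls is derived from them rather than written by hand.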

Page 31

RUNTIME ORCHESTRATION

[Diagram: data flow between one MPI rank on the CPU and a single GPU. Labels: Packing Thread; Enqueue using Block Iterator; Blocks Ready Queue; Execution Threads; Move Data To GPU; Control Kernels; Run tasks On CPU and/or Transfer data back to CPU; Work done on list of Blocks already in GPU Memory; Blocks Done Queue; Unpacking Threads.]

Examples of CPU tasks: (1) computeDt, (2) refinementError

Task Composition:

    scheduleComputations(gpu={gcFill, computeFluxes, updateSoln, Eos},
                         cpu={computeDt},
                         moveDataBack=True)
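The queue-and-thread structure on this slide can be mimicked on a CPU with Python's standard `queue` and `threading` modules. This is a simplified sketch, not the FLASH runtime: there is no GPU, only one execution thread is used where the slide shows several, and the function names are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the block stream


def packing_thread(blocks, ready):
    """Enqueue blocks (stand-in for the block iterator and data packing)."""
    for block in blocks:
        ready.put(block)
    ready.put(SENTINEL)


def execution_thread(ready, done, task):
    """Run the task on each ready block and pass the result on."""
    while True:
        block = ready.get()
        if block is SENTINEL:
            done.put(SENTINEL)
            break
        done.put((block, task(block)))


def unpacking_thread(done, results):
    """Collect finished blocks (stand-in for moving data back)."""
    while True:
        item = done.get()
        if item is SENTINEL:
            break
        block, value = item
        results[block] = value


def run(blocks, task):
    """Wire the three stages together through a ready and a done queue."""
    ready, done, results = queue.Queue(), queue.Queue(), {}
    threads = [
        threading.Thread(target=packing_thread, args=(blocks, ready)),
        threading.Thread(target=execution_thread, args=(ready, done, task)),
        threading.Thread(target=unpacking_thread, args=(done, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run(range(4), lambda block_id: block_id * block_id)
print(results)
```

The stages only communicate through the two queues, so packing, execution, and unpacking can overlap in time; that overlap is the point of the design shown on the slide.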

Page 32

BUILDING THE CODE

- Configuration in three stages
  - Stage 1: the usual running of the setup script
  - Stage 2: run the task composer
  - Stage 3: run the transpiler
- Run make as usual
- The orchestrator generated in the process
  - Launches various threads that control run time
  - May or may not interact with AMReX asynchronization

A lot of open questions remain, but we believe that this is the right approach.

Page 33

WHY THIS WAY - PARALLELISM

- MPI is not difficult, decomposition is
- In parallelization neither all nor none is good
  - All: leave everything to the compiler
    - Domain-specific knowledge lost, a wasted opportunity
    - Compilers get an impossible job, cannot optimize
  - None: orchestrate everything explicitly
    - Not feasible for even a moderately complex application
    - Impossible from a productivity perspective
- Whichever model is used, understanding the parallelizable structure of the application is critical
  - Constructs to encode that understanding are needed

Page 34

WHY THIS WAY - KERNELS

- C++ => pushing a needlessly complex language that lacks basic structures
  - If there is a mesh there are 3D arrays
    - Meta-data built and carried around
  - Explicit order of access and order of operations
    - No graceful way to encode lack of dependence
- Maintainable code in clean constructs, perhaps in Python eventually
- We can also exploit alternative implementations at arbitrary granularity

Page 35

ADVANTAGES

- All code can be compiled with standard compilers
- Constructs for expressing parallelism at different granularities
- Limit intelligence needed in any one tool
- Domain knowledge encoded in the composer file helps with optimizations

This is why I think Chapel designers think the way I do.

Page 36

Questions?