ARM and Mellanox Hackathon - GRChombo

Kacper Kornet

September 25, 2019

DAMTP, University of Cambridge


GRChombo

GRChombo is an adaptive mesh refinement (AMR) general relativity (GR) code developed by a team of researchers from:

• Department of Applied Mathematics and Theoretical Physics (DAMTP), University of Cambridge

• Argonne Leadership Computing Facility, Argonne National Laboratory

• Department of Physics, King’s College London

• School of Mathematical Sciences, Queen Mary University of London

• Department of Physics, University of Oxford

• Institute of Mathematics and Physics, University of Louvain

Core developers: Josu C. Aurrekoetxea (KCL), Katy Clough (Oxford), Amelia Drew (Cambridge), Pau Figueras (QMUL), Hal Finkel (ANL), Tiago França (QMUL), Chenxia Gu (QMUL), Thomas Helfer (KCL), Cristian Joana (UCLouvain), Kacper Kornet (Cambridge), Markus Kunesch (Cambridge), Eugene Lim (KCL), Miren Radia (Cambridge), James Widdicombe (KCL)

GRChombo

GRChombo: Parallelization levels

• Set of boxes distributed among processes with MPI

• Inside boxes, outer loops parallelized with OpenMP

• Innermost loops vectorized with intrinsics
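The three levels listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not GRChombo code: `Box`, `owner_rank`, `scale_box` and `process_local_boxes` are hypothetical names, the round-robin assignment stands in for a real MPI load balancer, and the fixed-width inner chunk stands in for a single intrinsic vector operation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one AMR box: nx * ny * nz values, x fastest.
struct Box
{
    int nx, ny, nz;
    std::vector<double> data;
};

// Level 1 (MPI in a real code): decide which rank owns which box.
// Round-robin is an assumption made for this sketch.
inline int owner_rank(int box_index, int num_ranks)
{
    return box_index % num_ranks;
}

// Levels 2 and 3: OpenMP over the outer loops, fixed-width chunks along x.
inline void scale_box(Box &box, double factor)
{
    constexpr int simd_len = 8; // assumed vector width in doubles
#pragma omp parallel for collapse(2)
    for (int iz = 0; iz < box.nz; ++iz)
    {
        for (int iy = 0; iy < box.ny; ++iy)
        {
            double *row =
                box.data.data() +
                (static_cast<std::size_t>(iz) * box.ny + iy) * box.nx;
            int ix = 0;
            // Full chunks: with intrinsics each chunk would be one vector op.
            for (; ix + simd_len <= box.nx; ix += simd_len)
                for (int lane = 0; lane < simd_len; ++lane)
                    row[ix + lane] *= factor;
            // Scalar remainder for row lengths not divisible by simd_len.
            for (; ix < box.nx; ++ix)
                row[ix] *= factor;
        }
    }
}

// Each rank only touches the boxes it owns.
inline void process_local_boxes(std::vector<Box> &boxes, int my_rank,
                                int num_ranks, double factor)
{
    for (std::size_t i = 0; i < boxes.size(); ++i)
        if (owner_rank(static_cast<int>(i), num_ranks) == my_rank)
            scale_box(boxes[i], factor);
}
```

Built without OpenMP support the pragma is simply ignored and the loops run serially, so the sketch behaves the same either way.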


GRChombo: C++ template library

void BinaryBHLevel::specificEvalRHS(GRLevelData &a_soln, GRLevelData &a_rhs,
                                    const double a_time)
{
    // Enforce positive chi and alpha and trace free A
    BoxLoops::loop(make_compute_pack(TraceARemoval(),
                                     PositiveChiAndAlpha()),
                   a_soln, a_soln, INCLUDE_GHOST_CELLS);

    // Calculate CCZ4 right hand side and set constraints
    // to zero to avoid undefined values
    BoxLoops::loop(
        make_compute_pack(CCZ4(m_p.ccz4_params, m_dx, m_p.sigma),
                          SetValue(0, Interval(c_Ham, NUM_VARS - 1))),
        a_soln, a_rhs, EXCLUDE_GHOST_CELLS);
}
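One way a "compute pack" of this kind could work is to bundle several small compute classes and apply all of them to each cell inside a single loop. The sketch below is a guess at the general pattern, not GRChombo's implementation: `Fields`, `PositiveChi`, `ZeroHam`, `ComputePack`, `make_pack` and `loop` are hypothetical names, and it assumes a C++17 compiler for `std::apply` and fold expressions.

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical field storage: one value of each variable per cell.
struct Fields
{
    std::vector<double> chi;
    std::vector<double> ham;
};

// Hypothetical compute class: keep chi above a floor.
struct PositiveChi
{
    double min_chi;
    void compute(const Fields &in, Fields &out, std::size_t i) const
    {
        out.chi[i] = in.chi[i] < min_chi ? min_chi : in.chi[i];
    }
};

// Hypothetical compute class: reset the constraint variable.
struct ZeroHam
{
    void compute(const Fields &, Fields &out, std::size_t i) const
    {
        out.ham[i] = 0.0;
    }
};

// Bundles compute classes; compute_all calls each one in order.
template <class... compute_ts> struct ComputePack
{
    std::tuple<compute_ts...> computes;

    void compute_all(const Fields &in, Fields &out, std::size_t i) const
    {
        std::apply(
            [&](const auto &... c) { (c.compute(in, out, i), ...); },
            computes);
    }
};

template <class... compute_ts>
ComputePack<compute_ts...> make_pack(compute_ts... cs)
{
    return {std::make_tuple(cs...)};
}

// Single pass over the cells; every compute class runs per cell.
template <class pack_t>
void loop(const pack_t &pack, const Fields &in, Fields &out,
          std::size_t num_cells)
{
    for (std::size_t i = 0; i < num_cells; ++i)
        pack.compute_all(in, out, i);
}
```

Called as `loop(make_pack(PositiveChi{1e-4}, ZeroHam{}), f, f, n)`, the data is traversed once for both operations. Because the pack's types are known at compile time, the compiler is free to inline every `compute` call into the loop body.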


GRChombo: C++ template library

// Compute the value of phi at the current point
template <class data_t>
data_t ScalarBubble::compute_phi(Coordinates<data_t> coords) const
{
    data_t rr = coords.get_radius();
    data_t rr2 = rr * rr;
    data_t out_phi = m_params.amplitudeSF * rr2 *
                     exp(-sqr(rr - m_params.r_zero
                              / m_params.widthSF));
    return out_phi;
}
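The point of templating on `data_t` is that the same physics expression can be compiled for a plain `double` or for a vector type. The following is a minimal sketch of that idea under stated assumptions: `Vec4` is a hypothetical 4-lane type implemented with ordinary loops rather than intrinsics, and `bubble_profile` is an illustrative Gaussian-shell profile, not a transcription of the slide's expression.

```cpp
#include <cmath>

// Hypothetical 4-lane vector type standing in for a real SIMD wrapper.
struct Vec4
{
    double v[4];
    Vec4() : v{0.0, 0.0, 0.0, 0.0} {}
    explicit Vec4(double x) : v{x, x, x, x} {} // broadcast
};

inline Vec4 operator-(Vec4 a, Vec4 b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] - b.v[i];
    return r;
}

inline Vec4 operator*(Vec4 a, Vec4 b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] * b.v[i];
    return r;
}

inline Vec4 operator-(Vec4 a)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = -a.v[i];
    return r;
}

inline Vec4 exp(Vec4 a)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = std::exp(a.v[i]);
    return r;
}

// Written once, instantiated for double and for Vec4.
template <class data_t>
data_t bubble_profile(data_t rr, double amplitude, double r_zero,
                      double width)
{
    using std::exp; // scalar case; the Vec4 overload is found by ADL
    data_t arg = (rr - data_t(r_zero)) * data_t(1.0 / width);
    return data_t(amplitude) * rr * rr * exp(-(arg * arg));
}
```

Calling `bubble_profile(1.5, ...)` evaluates one point, while passing a `Vec4` evaluates four points with the same source line. A production wrapper would replace the lane loops with intrinsics.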


GRChombo: intrinsics classes

template <> struct simd_traits<float>
{
    typedef __m512 data_t;
    typedef __mmask16 mask_t;
    static const int simd_len = 16;
};

template <> struct simd<double> : public simd_base<double>
{
    typedef typename simd_traits<double>::data_t data_t;
    typedef typename simd_traits<double>::mask_t mask_t;

    ALWAYS_INLINE
    simd() : simd_base<double>(_mm512_setzero_pd()) {}

    ALWAYS_INLINE
    simd(double x) : simd_base<double>(_mm512_set1_pd(x)) {}

    // ...
};

Porting GRChombo to ARM

• Finding the best compiler options (-fno-math-errno)

• Replacing x86-specific bits with generic ones

• NEON port

• Rudimentary SVE port (not vector length agnostic yet)
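Replacing architecture-specific code with a generic layer might look roughly like the sketch below, which selects AVX-512, AArch64 NEON, or a scalar fallback at compile time behind one small interface. It is an assumption-laden illustration, not the GRChombo port: `vec_t`, `vec_len`, `vec_set1`, `vec_load`, `vec_store`, `vec_add` and `add_constant` are hypothetical names, and SVE is deliberately left out.

```cpp
#include <cstddef>

#if defined(__AVX512F__)
#include <immintrin.h>
typedef __m512d vec_t;
constexpr int vec_len = 8; // doubles per 512-bit register
inline vec_t vec_set1(double x) { return _mm512_set1_pd(x); }
inline vec_t vec_load(const double *p) { return _mm512_loadu_pd(p); }
inline void vec_store(double *p, vec_t v) { _mm512_storeu_pd(p, v); }
inline vec_t vec_add(vec_t a, vec_t b) { return _mm512_add_pd(a, b); }
#elif defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
typedef float64x2_t vec_t;
constexpr int vec_len = 2; // doubles per 128-bit register
inline vec_t vec_set1(double x) { return vdupq_n_f64(x); }
inline vec_t vec_load(const double *p) { return vld1q_f64(p); }
inline void vec_store(double *p, vec_t v) { vst1q_f64(p, v); }
inline vec_t vec_add(vec_t a, vec_t b) { return vaddq_f64(a, b); }
#else
// Scalar fallback so the same source builds on any target.
typedef double vec_t;
constexpr int vec_len = 1;
inline vec_t vec_set1(double x) { return x; }
inline vec_t vec_load(const double *p) { return *p; }
inline void vec_store(double *p, vec_t v) { *p = v; }
inline vec_t vec_add(vec_t a, vec_t b) { return a + b; }
#endif

// Architecture-independent kernel written against the generic layer.
inline void add_constant(double *data, std::size_t n, double c)
{
    const vec_t vc = vec_set1(c);
    std::size_t i = 0;
    for (; i + vec_len <= n; i += vec_len)
        vec_store(data + i, vec_add(vec_load(data + i), vc));
    for (; i < n; ++i) // scalar tail
        data[i] += c;
}
```

Which branch is compiled depends on the architecture flags passed to the compiler, so the same kernel source can be built for x86 and ARM targets without edits.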


GRChombo benchmarks on ARM cluster


GRChombo on BlueField

• Runs without source modifications (although one needs to be careful about architecture options)

• Using the same number of cores, ∼3× slower than ThunderX2
