
libCEED Finite Element Library: Development Update and Examples

Jeremy L Thompson, Valeria Barra, Jed Brown

University of Colorado Boulder

[email protected]

Sept 25, 2019

Jeremy L Thompson (CU Boulder) libCEED Finite Element Library Sept 25, 2019 1


libCEED Team

Developers: Jed Brown¹, Jeremy Thompson¹

Thilina Rathnayake², Jean-Sylvain Camier³, Tzanio Kolev³, Veselin Dobrev³, Valeria Barra¹, Yohann Dudouit³, David Medina⁴, Tim Warburton⁵, & Oana Marin⁶

Grant: Exascale Computing Project (17-SC-20-SC)

1: University of Colorado, Boulder
2: University of Illinois, Urbana-Champaign
3: Lawrence Livermore National Laboratory
4: OCCA
5: Virginia Polytechnic Institute and State University
6: Argonne National Laboratory


Overview

libCEED is an extensible library that provides a portable algebraic interface and optimized implementations of high-order operators

We have optimized implementations for CPU and GPU

We have new performance optimizations, development in our example suite, and research in preconditioning strategies


Overview

1 Introduction

2 libCEED

3 Example Suite

4 Current Efforts

5 Future Work

6 Questions


Introduction

Center for Efficient Exascale Discretizations

DOE exascale co-design center

Design discretization algorithms for exascale hardware that deliver significant performance gains over low-order methods

Collaborate with hardware vendors and software projects on the exascale hardware and software stack

Provide an efficient and user-friendly unstructured PDE discretization component for the exascale software ecosystem


Introduction

Tensor Product Elements

Using an assembled matrix forgoes the performance optimizations available for tensor-product (hexahedral) elements
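A rough operation count shows what is lost. The sketch below assumes a 3D hexahedral element with n = p + 1 nodes and n quadrature points per direction, and counts only the multiply-adds for applying the interpolation part of the element operator: a dense (assembled) element matrix costs n^6, while sum factorization applies three 1D contractions at n^4 each. The function names are illustrative.

```c
#include <stdint.h>

/* Multiply-adds to apply a dense (assembled) 3D element interpolation
 * matrix, with n = p + 1 nodes and n quadrature points per direction. */
int64_t dense_madds(int64_t n) {
  int64_t pts = n * n * n;
  return pts * pts;          /* (n^3 x n^3) matrix times a vector */
}

/* Multiply-adds for the same operator via sum factorization:
 * three passes, each producing n^3 values from n-term 1D contractions. */
int64_t sumfact_madds(int64_t n) {
  return 3 * n * n * n * n;
}
```

For p = 7 (n = 8) this gives 262144 versus 12288 multiply-adds, roughly a 21x difference, and the gap widens as the order increases.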


libCEED

libCEED Design

libCEED design approach:

Avoid global matrix assembly

Optimize basis operations for all architectures

Single-source user quadrature point functions

Easy to parallelize across heterogeneous nodes


libCEED

libCEED Backends

Figure: applications (MFEM, Nek5000, PETSc, ...) call libCEED, which dispatches to CPU and GPU backends

CPU backends: Pure C, AVX, LIBXSMM, OCCA

GPU backends: MAGMA, Pure CUDA, OCCA

libCEED provides multiple backend implementations
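Applications select a backend through the resource string passed to libCEED's CeedInit (for example /cpu/self/xsmm/serial, used on the CPU performance slide). The following is a hypothetical sketch of prefix-based dispatch over a small registry. The BackendEntry type, the registry contents, and select_backend are illustrative assumptions, not libCEED's internal implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend registry entry (not a libCEED type). */
typedef struct {
  const char *prefix; /* resource prefix, e.g. "/cpu/self/xsmm" */
  const char *name;   /* human-readable backend name */
} BackendEntry;

static const BackendEntry registry[] = {
  {"/cpu/self/ref", "Pure C"},
  {"/cpu/self/avx", "AVX"},
  {"/cpu/self/xsmm", "LIBXSMM"},
  {"/cpu/occa", "OCCA"},
  {"/gpu/cuda", "Pure CUDA"},
  {"/gpu/magma", "MAGMA"},
  {"/gpu/occa", "OCCA"},
};

/* Return the entry whose prefix is the longest prefix of `resource`,
 * or NULL when no registered backend matches. */
const BackendEntry *select_backend(const char *resource) {
  const BackendEntry *best = NULL;
  size_t best_len = 0;
  for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
    size_t len = strlen(registry[i].prefix);
    if (len > best_len && strncmp(resource, registry[i].prefix, len) == 0) {
      best = &registry[i];
      best_len = len;
    }
  }
  return best;
}
```

Because only the resource string changes, the same application code can run on any registered backend.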


libCEED

libCEED Operator Decomposition

$A_L = G^T B^T D B G$

$G$: CeedElemRestriction, local gather/scatter

$B$: CeedBasis, provides basis operations such as interp and grad

$D$: CeedQFunction, representation of the PDE at quadrature points

$A_L$: CeedOperator, aggregation of Ceed objects for the local action of the operator
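To make the decomposition concrete, here is a plain-C sketch (no libCEED calls) of $v_L = G^T B^T D B G u_L$ for a 1D mass operator with linear elements and 2-point Gauss quadrature. The gather plays the role of $G$, basis evaluation the role of $B$, pointwise scaling by $w_q \det J$ the role of $D$, and the scatter-add the role of $G^T$. The function name and mesh setup are hypothetical.

```c
#include <string.h>

#define NQ 2 /* quadrature points per element */
#define NP 2 /* nodes per linear element */

/* v = G^T B^T D B G u for the 1D mass operator on a uniform mesh of [0, 1]
 * with `nelem` linear elements; u and v have nelem + 1 entries. */
void mass_apply(int nelem, const double *u, double *v) {
  const double xq[NQ] = {-0.57735026918962576, 0.57735026918962576};
  const double wq[NQ] = {1.0, 1.0};   /* Gauss weights on [-1, 1] */
  const double detJ = 0.5 / nelem;    /* dx/dxi for element width 1/nelem */
  double B[NQ][NP];                   /* basis values at quadrature points */
  for (int q = 0; q < NQ; q++) {
    B[q][0] = 0.5 * (1.0 - xq[q]);
    B[q][1] = 0.5 * (1.0 + xq[q]);
  }
  memset(v, 0, (size_t)(nelem + 1) * sizeof(double));
  for (int e = 0; e < nelem; e++) {
    double ue[NP], uq[NQ], ve[NP] = {0.0, 0.0};
    for (int j = 0; j < NP; j++) ue[j] = u[e + j];       /* G: gather */
    for (int q = 0; q < NQ; q++) {                       /* B: interpolate */
      uq[q] = 0.0;
      for (int j = 0; j < NP; j++) uq[q] += B[q][j] * ue[j];
    }
    for (int q = 0; q < NQ; q++) uq[q] *= wq[q] * detJ;  /* D: pointwise */
    for (int j = 0; j < NP; j++)                         /* B^T */
      for (int q = 0; q < NQ; q++) ve[j] += B[q][j] * uq[q];
    for (int j = 0; j < NP; j++) v[e + j] += ve[j];      /* G^T: scatter-add */
  }
}
```

No global matrix is formed; only the small basis table and per-point data are stored.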


libCEED

Laplacian Example

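Solving the 2D Poisson problem: $-\Delta u = f$

Weak form: $\int \nabla v \cdot \nabla u = \int v f$

General libCEED operator: $A_L = G^T B^T D B G$

Laplacian operator: $A_L = G^T B_{\mathrm{Grad2D}}^T D B_{\mathrm{Grad2D}} G$

where $D$ is block diagonal by quadrature point:

$D_i = (w_i \det J_{geo}) J_{geo}^{-1} J_{geo}^{-T}$ and $J_{geo} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial s} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial s} \end{bmatrix}$

$x, y$: physical coordinates; $r, s$: reference coordinates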
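The entries of $D_i$ can be computed pointwise from $J_{geo}$ with the 2x2 adjugate: since $J^{-1} = \operatorname{adj}(J)/\det J$, we have $(w \det J) J^{-1} J^{-T} = (w/\det J)\,\operatorname{adj}(J)\operatorname{adj}(J)^T$. A minimal sketch; the function name is hypothetical, and the output order (D00, D11, D01) is assumed to match how the Poisson2D QFunction indexes geo.

```c
/* D = (w det J) J^{-1} J^{-T} at one quadrature point, for
 * J = [[dx/dr, dx/ds], [dy/dr, dy/ds]].
 * Output (assumed layout): D[0] = D00, D[1] = D11, D[2] = D01. */
void laplace_geo_2d(double w, const double J[2][2], double D[3]) {
  const double detJ = J[0][0] * J[1][1] - J[0][1] * J[1][0];
  /* adjugate of J, so that J^{-1} = adj / detJ */
  const double a00 = J[1][1], a01 = -J[0][1];
  const double a10 = -J[1][0], a11 = J[0][0];
  const double scale = w / detJ;  /* (w detJ) / detJ^2 */
  D[0] = scale * (a00 * a00 + a01 * a01);
  D[1] = scale * (a10 * a10 + a11 * a11);
  D[2] = scale * (a00 * a10 + a01 * a11);
}
```

Only three values per point are stored because $D_i$ is symmetric.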


libCEED

Basis Optimization


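Computationally efficient form:

$A_L = G^T \begin{bmatrix} B_G^T \otimes B_I^T & B_I^T \otimes B_G^T \end{bmatrix} D \begin{bmatrix} B_G \otimes B_I \\ B_I \otimes B_G \end{bmatrix} G$

$B_I$: 1D interpolation; $B_G$: 1D gradient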
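A sketch of applying the two Kronecker blocks by sum factorization. With row-major element storage u[i*P + j] (an assumption of this sketch), $(A \otimes B)u$ equals $A U B^T$, so each block costs two small 1D contractions instead of a dense $Q^2 \times P^2$ product. Function names are hypothetical; which index corresponds to $r$ or $s$ depends on the storage convention.

```c
/* out = kron(A, B) u = A U B^T, where U is P x P (row-major u[i*P + j]),
 * A and B are Q x P (row-major), tmp holds P*Q values, out is Q x Q. */
void kron2_apply(int Q, int P, const double *A, const double *B,
                 const double *u, double *tmp, double *out) {
  /* tmp = U B^T: contract the fast index j */
  for (int i = 0; i < P; i++)
    for (int b = 0; b < Q; b++) {
      double s = 0.0;
      for (int j = 0; j < P; j++) s += u[i * P + j] * B[b * P + j];
      tmp[i * Q + b] = s;
    }
  /* out = A tmp: contract the slow index i */
  for (int a = 0; a < Q; a++)
    for (int b = 0; b < Q; b++) {
      double s = 0.0;
      for (int i = 0; i < P; i++) s += A[a * P + i] * tmp[i * Q + b];
      out[a * Q + b] = s;
    }
}

/* Reference gradient at quadrature points:
 * du0 = kron(BG, BI) u and du1 = kron(BI, BG) u. */
void grad2d(int Q, int P, const double *BI, const double *BG,
            const double *u, double *tmp, double *du0, double *du1) {
  kron2_apply(Q, P, BG, BI, u, tmp, du0);
  kron2_apply(Q, P, BI, BG, u, tmp, du1);
}
```

Only the 1D matrices $B_I$ and $B_G$ are stored; the 2D operators are never formed.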


libCEED

Basis Optimization


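Computationally efficient form:

$A_L = G^T (B_I^T \otimes B_I^T) \begin{bmatrix} \hat{B}_G^T \otimes I & I \otimes \hat{B}_G^T \end{bmatrix} D \begin{bmatrix} \hat{B}_G \otimes I \\ I \otimes \hat{B}_G \end{bmatrix} (B_I \otimes B_I) G$

where $B_G = \hat{B}_G B_I$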
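As a check, the mixed-product property $(A \otimes B)(C \otimes E) = AC \otimes BE$ recovers the previous form, taking $I$ as the identity on the 1D quadrature points. When $B_I$ is square and invertible (as many quadrature points as nodes), $\hat{B}_G = B_G B_I^{-1}$.

$$
(\hat{B}_G \otimes I)(B_I \otimes B_I) = (\hat{B}_G B_I) \otimes B_I = B_G \otimes B_I,
\qquad
(I \otimes \hat{B}_G)(B_I \otimes B_I) = B_I \otimes (\hat{B}_G B_I) = B_I \otimes B_G
$$

The interpolation $B_I \otimes B_I$ is applied once and shared by both gradient components, which then need only the $Q \times Q$ matrix $\hat{B}_G$.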


libCEED

Operator Definition

General libCEED operator: $v_L = A_L u_L$

$A_L = G^T B^T D B G$

Laplacian Operator Code:

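// Build the operator A_L around the QFunction qf_apply (D)
CeedOperatorCreate(ceed, qf_apply, NULL, NULL, &op_apply);
// Active input: reference gradient of u (G, then B)
CeedOperatorSetField(op_apply, "du", erestrictu, CEED_TRANSPOSE,
                     basisu, CEED_VECTOR_ACTIVE);
// Passive input: geometric factors already stored at quadrature points
CeedOperatorSetField(op_apply, "geo", erestrictqdi, CEED_NOTRANSPOSE,
                     CEED_BASIS_COLLOCATED, geo);
// Active output: dv, mapped back through B^T and G^T
CeedOperatorSetField(op_apply, "dv", erestrictu, CEED_TRANSPOSE,
                     basisu, CEED_VECTOR_ACTIVE);
...
// Apply v_L = A_L u_L
CeedOperatorApply(op_apply, uloc, vloc, CEED_REQUEST_IMMEDIATE);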


libCEED

QFunction Definition

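General libCEED QFunction: $v_q = D u_q$

2D Laplacian QFunction: $\begin{bmatrix} dv_0 \\ dv_1 \end{bmatrix} = \begin{bmatrix} D_{00} & D_{01} \\ D_{01} & D_{11} \end{bmatrix} \begin{bmatrix} du_0 \\ du_1 \end{bmatrix}$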

2D Laplacian QFunction Code:

CeedQFunctionCreateInterior(ceed, 1, Poisson2D, Poisson2D_loc, &qf_apply);
CeedQFunctionAddInput(qf_apply, "du", 2, CEED_EVAL_GRAD);   // reference gradient of u
CeedQFunctionAddInput(qf_apply, "geo", 3, CEED_EVAL_NONE);  // D00, D11, D01
CeedQFunctionAddOutput(qf_apply, "dv", 2, CEED_EVAL_GRAD);  // tested against basis gradients


libCEED

QFunction Definition

Single-source QFunctions for all backends:

C/C++ code, compiled with the main program for CPU backends and JiT-compiled for GPU backends

int Poisson2D(void *ctx, const CeedInt Q,
              const CeedScalar *const *in, CeedScalar *const *out) {
  // Inputs: reference gradient du (2 components) and geometric factors geo
  const CeedScalar *du = in[0], *geo = in[1];
  // Output: dv (2 components)
  CeedScalar *dv = out[0];

  // Quadrature point loop
  CeedPragmaSIMD // For CPU vectorization
  for (CeedInt i=0; i<Q; i++) {
    // geo stores the symmetric D as (D00, D11, D01)
    dv[i+Q*0] = geo[i+Q*0]*du[i+Q*0] + geo[i+Q*2]*du[i+Q*1];
    dv[i+Q*1] = geo[i+Q*2]*du[i+Q*0] + geo[i+Q*1]*du[i+Q*1];
  } // End of quadrature point loop

  return 0;
}


libCEED

libCEED Performance

Benchmark performance across multiple implementations

Benchmark Problems 1 and 2: $Mu = f$ ($L^2$ projection problem)

Benchmark Problems 3 and 4: $Ku = f$ (Poisson problem)

3D scalar problem (BP1, BP3) or 3D vector problem (BP2, BP4)

Unpreconditioned CG, maximum of 20 iterations
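CG needs only the action of the operator, which is exactly what a CeedOperator provides. A minimal sketch of unpreconditioned CG with an iteration cap (20 in these benchmarks), written against a hypothetical function-pointer interface; ApplyFn, cg_solve, and the diagonal example operator diag_apply are illustrative, not libCEED or PETSc API.

```c
#include <stddef.h>

/* Hypothetical matrix-free operator: y = A x for vectors of length n. */
typedef void (*ApplyFn)(void *ctx, size_t n, const double *x, double *y);

static double dot(size_t n, const double *a, const double *b) {
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += a[i] * b[i];
  return s;
}

/* Unpreconditioned CG for SPD A, starting from x = 0 and stopping after
 * max_it iterations or when ||r|| <= rtol ||b||. Work arrays r, p, Ap have
 * length n. Returns the number of iterations performed. */
int cg_solve(ApplyFn apply, void *ctx, size_t n, const double *b, double *x,
             double *r, double *p, double *Ap, int max_it, double rtol) {
  for (size_t i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; p[i] = b[i]; }
  double rr = dot(n, r, r);
  const double stop = rtol * rtol * rr;
  int it = 0;
  while (it < max_it && rr > stop) {
    apply(ctx, n, p, Ap);  /* the only use of the operator */
    const double alpha = rr / dot(n, p, Ap);
    for (size_t i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    const double rr_new = dot(n, r, r);
    for (size_t i = 0; i < n; i++) p[i] = r[i] + (rr_new / rr) * p[i];
    rr = rr_new;
    it++;
  }
  return it;
}

/* Example operator: y = diag(d) x, with d passed through ctx. */
void diag_apply(void *ctx, size_t n, const double *x, double *y) {
  const double *d = (const double *)ctx;
  for (size_t i = 0; i < n; i++) y[i] = d[i] * x[i];
}
```

In a PETSc-based driver, the apply callback could wrap the global-to-local scatter, CeedOperatorApply, and the local-to-global reduction.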


libCEED

GPU Performance

Single-source QFunctions with JiT compilation give a substantial performance increase

Performance within ±10% of the tuned kernels in libParanumal


libCEED

CPU Performance

Figure: PETSc BP3 on 4 nodes × 24 ranks for polynomial orders p = 1 to 12, one panel with the /cpu/self/xsmm/serial backend and one with /cpu/self/xsmm/blocked. Axes: points per compute node (10^1 to 10^6) vs. [DOFs × CG iterations] / [compute nodes × seconds] (scale 1e8).

RMACC Summit, 4 x Intel Xeon E5-2680 v3

External vectorization (across elements, as in the blocked backend) is important at lower order

The order at which performance 'switches' between the two backends is problem dependent
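At low order each element's 1D contractions are too short to fill SIMD lanes. One way to vectorize externally is to interleave a block of elements so the innermost loop runs across elements. A sketch assuming a block size of 8 and an interleaved layout u[j*BLK + e]; the layout and interp_blocked are assumptions, not a description of the blocked backend's internals.

```c
#define BLK 8 /* assumed number of interleaved elements per block */

/* Apply a Q x P 1D interpolation matrix B (row-major) to BLK elements
 * stored interleaved: u[j*BLK + e] is node j of element e. The innermost
 * loop runs over elements with unit stride and a fixed trip count,
 * independent of P and Q, which compilers can vectorize. */
void interp_blocked(int Q, int P, const double *B, const double *u, double *v) {
  for (int q = 0; q < Q; q++) {
    for (int e = 0; e < BLK; e++) v[q * BLK + e] = 0.0;
    for (int j = 0; j < P; j++) {
      const double b = B[q * P + j];
      for (int e = 0; e < BLK; e++) v[q * BLK + e] += b * u[j * BLK + e];
    }
  }
}
```

At high order the per-element loops are long enough to vectorize on their own, which is consistent with the crossover being problem dependent.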


Example Suite

Navier-Stokes Example

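State variables:

$\rho$: mass density
$\boldsymbol{U}$: momentum density
$E$: total energy density

3D compressible Navier-Stokes:

$\frac{\partial \rho}{\partial t} + \nabla \cdot \boldsymbol{U} = 0$

$\frac{\partial \boldsymbol{U}}{\partial t} + \nabla \cdot \left( \rho (\boldsymbol{u} \otimes \boldsymbol{u}) + P \boldsymbol{I}_3 \right) + \rho g \hat{\boldsymbol{k}} = \nabla \cdot \boldsymbol{F}_u$

$\frac{\partial E}{\partial t} + \nabla \cdot \left( (E + P) \boldsymbol{u} \right) = \nabla \cdot \boldsymbol{F}_e$

Viscous and thermal stresses:

$\boldsymbol{F}_u = \mu \left( \nabla \boldsymbol{u} + (\nabla \boldsymbol{u})^T + \lambda (\nabla \cdot \boldsymbol{u}) \boldsymbol{I}_3 \right)$

$\boldsymbol{F}_e = \boldsymbol{u} \cdot \boldsymbol{F}_u + k \nabla T$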
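The QFunction Assembly slide reads $\boldsymbol{F}_u$ through the index map Fuviscidx = {{0,1,2},{1,3,4},{2,4,5}}, so only the six unique entries of the symmetric tensor are stored. A sketch of filling that storage from a velocity gradient; viscous_stress and the du[i][j] = du_i/dx_j layout are assumptions.

```c
/* F_u = mu (grad u + grad u^T + lambda div(u) I_3), with
 * du[i][j] = d u_i / d x_j. Stored as 6 symmetric entries in the order
 * implied by Fuviscidx: Fu = {F00, F01, F02, F11, F12, F22}. */
void viscous_stress(double mu, double lambda, const double du[3][3],
                    double Fu[6]) {
  static const int idx[3][3] = {{0, 1, 2}, {1, 3, 4}, {2, 4, 5}};
  const double div_u = du[0][0] + du[1][1] + du[2][2];
  for (int i = 0; i < 3; i++)
    for (int j = i; j < 3; j++)  /* upper triangle covers all 6 entries */
      Fu[idx[i][j]] = mu * (du[i][j] + du[j][i] + (i == j ? lambda * div_u : 0.0));
}
```

Symmetric storage halves the per-point data and lets the QFunction loop over (j, k) with fixed index arithmetic.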


Example Suite

QFunction Assembly

User QFunction:

// ---- Fuvisc
// Fuviscidx maps (row, col) of the symmetric F_u to its 6 stored entries
const CeedInt Fuviscidx[3][3] = {{0, 1, 2}, {1, 3, 4}, {2, 4, 5}};
for (CeedInt j=0; j<3; j++)
  for (CeedInt k=0; k<3; k++)
    dv[k][j+1][i] -= wJ*(Fu[Fuviscidx[j][0]]*dXdxdXdxT[k][0] +
                         Fu[Fuviscidx[j][1]]*dXdxdXdxT[k][1] +
                         Fu[Fuviscidx[j][2]]*dXdxdXdxT[k][2]);

Assembly (packed AVX/FMA instructions on ymm registers):

dv[k][j+1][i] -= wJ*(Fu[Fuviscidx[j][0]]*dXdxdXdxT[k][0] +
    b08d: c5 7d 28 d0             vmovapd %ymm0,%ymm10
Fu[Fuviscidx[j][1]]*dXdxdXdxT[k][1] +
    b091: c4 42 c5 b8 d3          vfmadd231pd %ymm11,%ymm7,%ymm10
    b096: c5 fd 28 84 24 c8 04    vmovapd 0x4c8(%rsp),%ymm0
    b09d: 00 00
dv[k][j+1][i] -= wJ*(Fu[Fuviscidx[j][0]]*dXdxdXdxT[k][0] +
    b09f: c4 62 f5 ac 14 07       vfnmadd213pd (%rdi,%rax,1),%ymm1,%ymm10
    b0a5: c5 7d 11 14 07          vmovupd %ymm10,(%rdi,%rax,1)
Fu[Fuviscidx[j][1]]*dXdxdXdxT[k][1] +
    b0aa: c5 7d 59 94 24 68 04    vmulpd 0x468(%rsp),%ymm0,%ymm10
    b0b1: 00 00
...


Current Efforts

Example Suite

Ongoing development in example suite

PHASTA investigating porting to libCEED

SUPG stabilization

Primitive variable formulation

Implicit time integrator

Initial development of shallow water equations example


Current Efforts

Preconditioning

Iterative solvers require preconditioning

Especially with high-order finite element operators

Operator diagonal: diagonally dominant operators

p-multigrid: elliptic operators

BDDC with FDM: in development
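For the operator-diagonal (Jacobi) approach, the diagonal can be assembled from the same decomposition without forming $A_L$: when $D$ holds one scalar per quadrature point, $(B^T D B)_{jj} = \sum_q B_{qj}^2 D_q$, and $G^T$ sums the element contributions. A sketch for the 1D linear mass operator; mass_diagonal and jacobi_apply are hypothetical names.

```c
#include <string.h>

/* Diagonal of A = G^T B^T D B G for 1D linear elements on a uniform mesh
 * of [0, 1], with D_q = w_q detJ (mass operator) and 2-point Gauss
 * quadrature. `diag` has nelem + 1 entries. */
void mass_diagonal(int nelem, double *diag) {
  const double xq[2] = {-0.57735026918962576, 0.57735026918962576};
  const double wq[2] = {1.0, 1.0};
  const double detJ = 0.5 / nelem;
  memset(diag, 0, (size_t)(nelem + 1) * sizeof(double));
  for (int e = 0; e < nelem; e++)
    for (int j = 0; j < 2; j++)
      for (int q = 0; q < 2; q++) {
        /* B[q][j]: linear basis function j evaluated at point q */
        const double b = (j == 0) ? 0.5 * (1.0 - xq[q]) : 0.5 * (1.0 + xq[q]);
        diag[e + j] += b * b * wq[q] * detJ;  /* G^T: sum into global node */
      }
}

/* One Jacobi preconditioner application: z = r ./ diag. */
void jacobi_apply(int n, const double *diag, const double *r, double *z) {
  for (int i = 0; i < n; i++) z[i] = r[i] / diag[i];
}
```

The cost is one pass over the quadrature data, comparable to a single operator application.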


Future Work

Future Work

Further performance enhancements (GPU and CPU)

Improved mixed mesh and operator composition support

Expanded non-linear and multi-physics examples

Preconditioning based on libCEED operator decomposition

Algorithmic differentiation of user quadrature functions

We invite contributors and friendly users


Questions

Questions?

Advisors: Jed Brown¹ & Daniel Appelo¹

Collaborators: Valeria Barra¹, Oana Marin², Tzanio Kolev³, Jean-Sylvain Camier³, Veselin Dobrev³, Yohann Dudouit³, Tim Warburton⁴, David Medina⁵, & Thilina Rathnayake⁶

Grant: Exascale Computing Project (17-SC-20-SC)

1: University of Colorado, Boulder
2: Argonne National Laboratory
3: Lawrence Livermore National Laboratory
4: Virginia Polytechnic Institute and State University
5: OCCA
6: University of Illinois, Urbana-Champaign
