libCEED Finite Element Library: Development Update and Examples
Jeremy L Thompson, Valeria Barra, Jed Brown
University of Colorado Boulder
Sept 25, 2019
Jeremy L Thompson (CU Boulder) libCEED Finite Element Library Sept 25, 2019 1
libCEED Team
Developers: Jed Brown¹, Jeremy Thompson¹,
Thilina Rathnayake², Jean-Sylvain Camier³, Tzanio Kolev³, Veselin Dobrev³, Valeria Barra¹, Yohann Dudouit³, David Medina⁴, Tim Warburton⁵, & Oana Marin⁶
Grant: Exascale Computing Project (17-SC-20-SC)
¹ University of Colorado, Boulder
² University of Illinois, Urbana-Champaign
³ Lawrence Livermore National Laboratory
⁴ OCCA
⁵ Virginia Polytechnic Institute and State University
⁶ Argonne National Laboratory
Overview
libCEED is an extensible library that provides a portable algebraic interface and optimized implementations of high-order operators
Optimized implementations are available for both CPUs and GPUs
Recent work includes new performance optimizations, development of the example suite, and research into preconditioning strategies
Overview
1 Introduction
2 libCEED
3 Example Suite
4 Current Efforts
5 Future Work
6 Questions
Introduction
Center for Efficient Exascale Discretizations (CEED)
DOE exascale co-design center
Design discretization algorithms for exascale hardware that deliver significant performance gains over low-order methods
Collaborate with hardware vendors and software projects on the exascale hardware and software stack
Provide an efficient and user-friendly unstructured PDE discretization component for the exascale software ecosystem
Introduction
Tensor Product Elements
For high-order hexahedral (tensor-product) elements, using an assembled matrix forgoes performance optimizations such as sum factorization
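A rough operation count shows why. This is a sketch under simplifying assumptions (one scalar interpolation per element, no quadrature-point work, no extra gradient components), not a measurement: with P nodes and Q quadrature points per direction in 3D, a dense element matrix has (P³)² entries and costs about 2P⁶ flops to apply, while sum factorization applies the Q × P 1D matrix one direction at a time. The function names are hypothetical.

```c
#include <assert.h>

/* Flops to apply a dense element matrix for a 3D tensor-product element with
   P nodes per direction: the matrix has (P^3)^2 entries, one multiply-add each. */
long dense_element_flops(long P) {
  assert(P > 0);
  const long n = P * P * P;  /* nodes per element */
  return 2 * n * n;
}

/* Flops for one sum-factorized application of B1 (x) B1 (x) B1, where B1 is a
   Q x P 1D basis matrix, contracting one direction at a time. */
long sumfact_flops(long P, long Q) {
  assert(P > 0 && Q > 0);
  return 2 * (Q * P * P * P    /* P x P x P -> Q x P x P */
            + Q * Q * P * P    /* Q x P x P -> Q x Q x P */
            + Q * Q * Q * P);  /* Q x Q x P -> Q x Q x Q */
}
```

For P = Q = 5 (degree 4), dense_element_flops(5) is 31250 while sumfact_flops(5, 5) is 3750. With Q = P the ratio is P²/3, so the gap widens as the order increases.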
libCEED
libCEED Design
libCEED design approach:
Avoid global matrix assembly
Optimize basis operations for all architectures
Single-source user quadrature point functions
Easy to parallelize across heterogeneous nodes
libCEED
libCEED Backends
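[Diagram: applications (MFEM, Nek5000, PETSc, ...) use libCEED, which runs on CPUs and GPUs through multiple backends]
CPU backends: Pure C, AVX, LIBXSMM, OCCA
GPU backends: Pure CUDA, MAGMA, OCCA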
libCEED provides multiple backend implementations
libCEED
libCEED Operator Decomposition
$A_L = G^T B^T D B G$
G - CeedElemRestriction, local gather/scatter
B - CeedBasis, provides basis operations such as interpolation and gradient
D - CeedQFunction, representation of the PDE at quadrature points
$A_L$ - CeedOperator, aggregation of Ceed objects for the local action of the operator
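To make the decomposition concrete, here is a minimal sketch in plain C (not the libCEED API) that applies $A_L = G^T B^T D B G$ for the 1D mass operator on [0, 1] with linear elements and 2-point Gauss quadrature. The mesh size, macro names, and function name are hypothetical. The gather/scatter loops play the role of CeedElemRestriction, the loops over B play the role of CeedBasis, and the pointwise scaling plays the role of the CeedQFunction.

```c
#include <assert.h>

#define NELEM 4            /* elements on [0, 1] (hypothetical mesh size) */
#define NP 2               /* nodes per element (linear basis) */
#define NQ 2               /* Gauss points per element */
#define NNODES (NELEM + 1) /* global nodes (L-vector length) */

/* v_L = G^T B^T D B G u_L for the 1D mass operator on a uniform mesh. */
void mass_apply(const double u_L[NNODES], double v_L[NNODES]) {
  const double a = 0.5773502691896257;   /* Gauss points are -a and +a, a = 1/sqrt(3) */
  /* B[q][p] = phi_p(xi_q) with phi_0 = (1 - xi)/2, phi_1 = (1 + xi)/2 */
  const double B[NQ][NP] = {{(1 + a) / 2, (1 - a) / 2},
                            {(1 - a) / 2, (1 + a) / 2}};
  const double w[NQ] = {1.0, 1.0};       /* Gauss weights */
  const double detJ = (1.0 / NELEM) / 2; /* dx/dxi for elements of width 1/NELEM */
  assert(u_L != v_L);                    /* output must not alias input */
  for (int i = 0; i < NNODES; i++) v_L[i] = 0.0;
  for (int e = 0; e < NELEM; e++) {
    double u_E[NP], u_Q[NQ], v_Q[NQ], v_E[NP];
    /* G: gather; element e owns global nodes e and e + 1 */
    for (int p = 0; p < NP; p++) u_E[p] = u_L[e + p];
    /* B: interpolate to quadrature points */
    for (int q = 0; q < NQ; q++) {
      u_Q[q] = 0.0;
      for (int p = 0; p < NP; p++) u_Q[q] += B[q][p] * u_E[p];
    }
    /* D: pointwise scaling by quadrature weight times Jacobian determinant */
    for (int q = 0; q < NQ; q++) v_Q[q] = w[q] * detJ * u_Q[q];
    /* B^T: back to element nodes */
    for (int p = 0; p < NP; p++) {
      v_E[p] = 0.0;
      for (int q = 0; q < NQ; q++) v_E[p] += B[q][p] * v_Q[q];
    }
    /* G^T: scatter-add; shared nodes sum contributions from both elements */
    for (int p = 0; p < NP; p++) v_L[e + p] += v_E[p];
  }
}
```

Applying it to a vector of ones returns the row sums of the mass matrix, (0.125, 0.25, 0.25, 0.25, 0.125). These sum to the length of the domain, and at no point is a global matrix formed.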
libCEED
Laplacian Example
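Solving the 2D Poisson problem $-\Delta u = f$
Weak form: $\int \nabla v \cdot \nabla u = \int v f$
General libCEED operator: $A_L = G^T B^T D B G$
Laplacian operator: $A_L = G^T B_{\text{Grad2D}}^T D B_{\text{Grad2D}} G$
where $D$ is block diagonal by quadrature point:
$$D_i = w_i \det(J_{\text{geo}}) \, J_{\text{geo}}^{-1} J_{\text{geo}}^{-T}, \qquad J_{\text{geo}} = \begin{bmatrix} \partial x / \partial r & \partial x / \partial s \\ \partial y / \partial r & \partial y / \partial s \end{bmatrix}$$
$x, y$: physical coordinates; $r, s$: reference coordinates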
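In 2D, $D_i$ has a closed form, which gives a sense of what a setup step might compute at each quadrature point. The sketch below illustrates this; it is not libCEED's own setup QFunction. Since $J^{-1} = \operatorname{adj}(J)/\det J$, we get $D_i = (w_i / \det J)\,\operatorname{adj}(J)\operatorname{adj}(J)^T$. Because $D_i$ is symmetric, three values suffice. They are stored in the order (D00, D11, D01) that the Poisson2D QFunction reads from its geo input. The function name and the flat Jacobian layout are assumptions.

```c
#include <assert.h>

/* Geometric factors for the 2D Laplacian at one quadrature point:
   D = w det(J) J^{-1} J^{-T} = (w / det(J)) adj(J) adj(J)^T.
   J is flat, row-major: J[0] = dx/dr, J[1] = dx/ds, J[2] = dy/dr, J[3] = dy/ds.
   Output geo = {D00, D11, D01}; D is symmetric. */
void poisson2d_geo(double w, const double J[4], double geo[3]) {
  const double detJ = J[0] * J[3] - J[1] * J[2];
  assert(detJ > 0.0);         /* assumes a valid, positively oriented element */
  const double s = w / detJ;  /* w det(J) / det(J)^2 */
  geo[0] = s * (J[1] * J[1] + J[3] * J[3]);   /* D00 */
  geo[1] = s * (J[0] * J[0] + J[2] * J[2]);   /* D11 */
  geo[2] = -s * (J[0] * J[1] + J[2] * J[3]);  /* D01 */
}
```

For a pure shear J = [[1, 1], [0, 1]] with w = 1, det J = 1 and the function returns (2, 1, -1). That is D = [[2, -1], [-1, 1]], which equals $J^{-1}J^{-T}$ computed directly.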
libCEED
Basis Optimization
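Solving the 2D Poisson problem $-\Delta u = f$
Weak form: $\int \nabla v \cdot \nabla u = \int v f$
General libCEED operator: $A_L = G^T B^T D B G$
Laplacian operator: $A_L = G^T B_{\text{Grad2D}}^T D B_{\text{Grad2D}} G$
Computationally efficient form:
$$A_L = G^T \begin{bmatrix} B_G^T \otimes B_I^T & B_I^T \otimes B_G^T \end{bmatrix} D \begin{bmatrix} B_G \otimes B_I \\ B_I \otimes B_G \end{bmatrix} G$$
$B_I$: 1D interpolation, $B_G$: 1D gradient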
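The advantage of the Kronecker form is that $A \otimes B$ never has to be formed. Applying it as two 1D contractions costs $2(QP^2 + Q^2P)$ flops instead of $2Q^2P^2$ in 2D. The sketch below shows this under two assumptions: the 1D matrices are row-major, and element data is stored with the second index fastest, so the first Kronecker factor acts on the slow index. The sizes and function name are hypothetical.

```c
#include <assert.h>

#define NP 3  /* 1D nodes per direction (hypothetical: degree 2) */
#define NQ 4  /* 1D quadrature points per direction (hypothetical) */

/* v = (A (x) B) u without forming the Kronecker product.
   A, B: NQ x NP row-major 1D matrices. u: NP x NP element values, second
   index fastest, so A acts on the first (slow) index and B on the second.
   v: NQ x NQ values at quadrature points. */
void apply_tensor2d(const double A[NQ * NP], const double B[NQ * NP],
                    const double u[NP * NP], double v[NQ * NQ]) {
  double t[NP * NQ];  /* after the first contraction: t[p1][q2] */
  assert((const void *)u != (const void *)v);  /* no in-place use */
  /* Contract the fast index with B: t[p1][q2] = sum_p2 B[q2][p2] u[p1][p2] */
  for (int p1 = 0; p1 < NP; p1++)
    for (int q2 = 0; q2 < NQ; q2++) {
      double s = 0.0;
      for (int p2 = 0; p2 < NP; p2++) s += B[q2 * NP + p2] * u[p1 * NP + p2];
      t[p1 * NQ + q2] = s;
    }
  /* Contract the slow index with A: v[q1][q2] = sum_p1 A[q1][p1] t[p1][q2] */
  for (int q1 = 0; q1 < NQ; q1++)
    for (int q2 = 0; q2 < NQ; q2++) {
      double s = 0.0;
      for (int p1 = 0; p1 < NP; p1++) s += A[q1 * NP + p1] * t[p1 * NQ + q2];
      v[q1 * NQ + q2] = s;
    }
}
```

Under this layout, the two gradient components at quadrature points come from two calls: apply_tensor2d(BG, BI, u, du0) and apply_tensor2d(BI, BG, u, du1). These correspond to the two block rows $B_G \otimes B_I$ and $B_I \otimes B_G$.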
libCEED
Basis Optimization
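Solving the 2D Poisson problem $-\Delta u = f$
Weak form: $\int \nabla v \cdot \nabla u = \int v f$
General libCEED operator: $A_L = G^T B^T D B G$
Laplacian operator: $A_L = G^T B_{\text{Grad2D}}^T D B_{\text{Grad2D}} G$
Computationally efficient form:
$$A_L = G^T (B_I^T \otimes B_I^T) \begin{bmatrix} \hat{B}_G^T \otimes I & I \otimes \hat{B}_G^T \end{bmatrix} D \begin{bmatrix} \hat{B}_G \otimes I \\ I \otimes \hat{B}_G \end{bmatrix} (B_I \otimes B_I) G$$
where $B_G = \hat{B}_G B_I$ and $I$ is the identity on the 1D quadrature points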
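The two efficient forms agree because of the mixed-product property of the Kronecker product, $(A \otimes B)(C \otimes E) = AC \otimes BE$. Below, $\hat{B}_G$ is taken to be the $Q \times Q$ derivative matrix of the Lagrange interpolant through the 1D quadrature points (an assumption about the notation). In that case $B_G = \hat{B}_G B_I$, because interpolating a degree $P-1$ polynomial through $Q \ge P$ points reproduces it exactly:

```latex
(\hat{B}_G \otimes I)(B_I \otimes B_I) = (\hat{B}_G B_I) \otimes (I B_I) = B_G \otimes B_I,
\qquad
(I \otimes \hat{B}_G)(B_I \otimes B_I) = (I B_I) \otimes (\hat{B}_G B_I) = B_I \otimes B_G
```

With this factorization, the interpolation $B_I \otimes B_I$ is applied once and shared by both gradient components. Each component then needs only a $Q \times Q$ contraction along one direction.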
libCEED
Operator Definition
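General libCEED operator: $v_L = A_L u_L$
$A_L = G^T B^T D B G$
Laplacian Operator Code:
// Create the operator that applies qf_apply at quadrature points
CeedOperatorCreate(ceed, qf_apply, NULL, NULL, &op_apply);
// Active input: erestrictu gathers u to elements, basisu evaluates its gradient
CeedOperatorSetField(op_apply, "du", erestrictu, CEED_TRANSPOSE,
                     basisu, CEED_VECTOR_ACTIVE);
// Passive input: precomputed geometric factors, already at quadrature points
CeedOperatorSetField(op_apply, "geo", erestrictqdi, CEED_NOTRANSPOSE,
                     CEED_BASIS_COLLOCATED, geo);
// Active output: transposed gradient, then scatter back through erestrictu
CeedOperatorSetField(op_apply, "dv", erestrictu, CEED_TRANSPOSE,
                     basisu, CEED_VECTOR_ACTIVE);
...
// Compute the local action vloc = A_L uloc
CeedOperatorApply(op_apply, uloc, vloc, CEED_REQUEST_IMMEDIATE);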
libCEED
QFunction Definition
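General libCEED QFunction: $v_q = D u_q$
2D Laplacian QFunction:
$$\begin{bmatrix} dv_0 \\ dv_1 \end{bmatrix} = \begin{bmatrix} D_{00} & D_{01} \\ D_{01} & D_{11} \end{bmatrix} \begin{bmatrix} du_0 \\ du_1 \end{bmatrix}$$
2D Laplacian QFunction Code:
// Vector length 1; Poisson2D_loc gives the QFunction source location for JiT backends
CeedQFunctionCreateInterior(ceed, 1, Poisson2D,
                            Poisson2D_loc, &qf_apply);
// 2 gradient components in 2D
CeedQFunctionAddInput(qf_apply, "du", 2, CEED_EVAL_GRAD);
// 3 independent entries of the symmetric 2x2 D_i, no basis evaluation
CeedQFunctionAddInput(qf_apply, "geo", 3, CEED_EVAL_NONE);
CeedQFunctionAddOutput(qf_apply, "dv", 2, CEED_EVAL_GRAD);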
libCEED
QFunction Definition
Single-source QFunctions for all backends:
C/C++ source is compiled with the application for CPU backends and just-in-time (JiT) compiled for GPU backends
int Poisson2D(void *ctx, const CeedInt Q,
              const CeedScalar *const *in, CeedScalar *const *out) {
  // Inputs and outputs, in the order they were added to the QFunction
  const CeedScalar *du = in[0];   // gradient of u: 2 components
  const CeedScalar *geo = in[1];  // D_i stored as (D00, D11, D01)
  CeedScalar *dv = out[0];        // D_i du: 2 components
  // Quadrature point loop
  CeedPragmaSIMD // For CPU vectorization
  for (CeedInt i=0; i<Q; i++) {
    dv[i+Q*0] = geo[i+Q*0]*du[i+Q*0] + geo[i+Q*2]*du[i+Q*1];
    dv[i+Q*1] = geo[i+Q*2]*du[i+Q*0] + geo[i+Q*1]*du[i+Q*1];
  } // End of quadrature point loop
  return 0;
}
libCEED
libCEED Performance
Benchmark performance across multiple implementations
Benchmark Problems 1/2: $M u = f$ ($M$: mass matrix), $L^2$ projection problem
Benchmark Problems 3/4: $K u = f$ ($K$: stiffness matrix), Poisson problem
3D scalar problem (BP1/BP3) or 3D vector problem (BP2/BP4)
Unpreconditioned CG, maximum of 20 iterations
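CG needs only the action of the operator, so a matrix-free $A_L$ can be used directly. The sketch below is unpreconditioned CG with a fixed iteration cap, matching the benchmark setup. The callback type, function names, and the finite-difference test operator are hypothetical stand-ins; this is not the PETSc or libCEED solver.

```c
#include <assert.h>

/* Matrix-free operator callback: y = A x for vectors of length n. */
typedef void (*ApplyFn)(const double *x, double *y, int n, void *ctx);

static double dot(const double *a, const double *b, int n) {
  double s = 0.0;
  for (int i = 0; i < n; i++) s += a[i] * b[i];
  return s;
}

/* Unpreconditioned CG for SPD A. Stops after max_it iterations or when
   ||r|| <= rtol ||b||. Work arrays r, p, Ap hold n doubles each.
   Returns the number of iterations taken. */
int cg_solve(ApplyFn apply, void *ctx, const double *b, double *x, int n,
             int max_it, double rtol, double *r, double *p, double *Ap) {
  apply(x, Ap, n, ctx);
  for (int i = 0; i < n; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
  double rr = dot(r, r, n);
  const double stop = rtol * rtol * dot(b, b, n);
  int it = 0;
  while (it < max_it && rr > stop) {
    apply(p, Ap, n, ctx);                   /* only the operator action is needed */
    const double alpha = rr / dot(p, Ap, n);
    assert(alpha > 0.0);                    /* p^T A p > 0 for SPD A */
    for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    const double rr_new = dot(r, r, n);
    const double beta = rr_new / rr;
    for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
    rr = rr_new;
    it++;
  }
  return it;
}

/* Example operator: 1D finite-difference Laplacian tridiag(-1, 2, -1),
   applied without storing a matrix (ctx unused). */
void laplace1d(const double *x, double *y, int n, void *ctx) {
  (void)ctx;
  for (int i = 0; i < n; i++)
    y[i] = 2.0 * x[i] - (i > 0 ? x[i - 1] : 0.0) - (i < n - 1 ? x[i + 1] : 0.0);
}
```

For example, take b as ten ones and x as zeros and call cg_solve(laplace1d, NULL, b, x, 10, 20, 1e-10, r, p, Ap). It should converge well before the 20-iteration cap to x[k] = (k + 1)(10 - k)/2.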
libCEED
GPU Performance
Single-source QFunctions with JiT compilation give a substantial performance increase
Within ±10% of the performance of tuned kernels in libParanumal
libCEED
CPU Performance
[Figure: two panels, each plotting [DOFs × CG iterations] / [compute nodes × seconds] (y-axis scale 1e8) against points per compute node (10^1 to 10^6) for PETSc BP3 at polynomial orders p = 1 to 12 on 4 nodes × 24 ranks; left panel: /cpu/self/xsmm/serial, right panel: /cpu/self/xsmm/blocked]
RMACC Summit, 4 × Intel Xeon E5-2680 v3
External vectorization (across elements, as in the blocked backend) is important at lower orders
The order at which performance 'switches' is problem dependent
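The y-axis metric normalizes work by hardware and time, which makes runs of different sizes comparable. Below is a one-line sketch of the metric with hypothetical argument names.

```c
#include <assert.h>

/* Throughput metric from the CPU plots:
   [DOFs x CG iterations] / [compute nodes x seconds]. */
double bp_throughput(double dofs, int cg_iterations, int nodes, double seconds) {
  assert(nodes > 0 && seconds > 0.0);
  return dofs * cg_iterations / (nodes * seconds);
}
```

As a purely illustrative calculation with made-up inputs (not a measurement), 10^6 DOFs, 20 CG iterations, 4 nodes, and 0.5 s give bp_throughput(1e6, 20, 4, 0.5) = 1e7. That is 0.1 on the plots' 1e8 scale.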
Example Suite
Navier-Stokes Example
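State variables:
$\rho$: mass density
$\mathbf{U}$: momentum density
$E$: total energy density
$\mathbf{u} = \mathbf{U}/\rho$: velocity
3D compressible Navier-Stokes:
$$\frac{\partial \rho}{\partial t} + \operatorname{div}(\mathbf{U}) = 0$$
$$\frac{\partial \mathbf{U}}{\partial t} + \operatorname{div}\left(\rho (\mathbf{u} \otimes \mathbf{u}) + P I_3\right) + \rho g \hat{\mathbf{k}} = \operatorname{div}(\mathbf{F}_u)$$
$$\frac{\partial E}{\partial t} + \operatorname{div}\left((E + P)\mathbf{u}\right) = \operatorname{div}(\mathbf{F}_e)$$
Viscous and thermal stresses:
$$\mathbf{F}_u = \mu \left(\nabla \mathbf{u} + (\nabla \mathbf{u})^T + \lambda \operatorname{div}(\mathbf{u}) I_3\right), \qquad \mathbf{F}_e = \mathbf{u} \cdot \mathbf{F}_u + k \nabla T$$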
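Below is a sketch of evaluating $\mathbf{F}_u$ at one point. It assumes the six independent entries of the symmetric tensor are ordered so that F[Fuviscidx[j][k]] is entry (j, k), as the index table in the QFunction excerpt suggests. The function name and the flat gradient layout are hypothetical, and this is not the example's actual source.

```c
#include <assert.h>

/* Viscous stress F_u = mu (grad u + grad u^T + lambda div(u) I_3), stored as
   the 6 independent entries {F00, F01, F02, F11, F12, F22} of the symmetric
   tensor, so F[idx[j][k]] is entry (j, k) with idx = Fuviscidx.
   grad_u[3*i + j] holds du_i/dx_j. */
void viscous_stress(double mu, double lambda, const double grad_u[9], double F[6]) {
  static const int idx[3][3] = {{0, 1, 2}, {1, 3, 4}, {2, 4, 5}};
  assert(mu >= 0.0);  /* dynamic viscosity is nonnegative */
  const double div_u = grad_u[0] + grad_u[4] + grad_u[8];
  for (int j = 0; j < 3; j++)
    for (int k = j; k < 3; k++)  /* upper triangle covers all 6 entries */
      F[idx[j][k]] = mu * (grad_u[3 * j + k] + grad_u[3 * k + j] +
                           (j == k ? lambda * div_u : 0.0));
}
```

Consider a simple shear with du_0/dx_1 = 1 and mu = 2. It gives F01 = 2 and no normal stress. A uniform dilatation (grad u = I) with mu = 2 and lambda = 0.5 gives F00 = F11 = F22 = 7.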
Example Suite
QFunction Assembly
User QFunction:
// ---- Fuvisc
const CeedInt Fuviscidx[3][3] = {{0, 1, 2}, {1, 3, 4}, {2, 4, 5}};
for (CeedInt j=0; j<3; j++)
  for (CeedInt k=0; k<3; k++)
    dv[k][j+1][i] -= wJ*(Fu[Fuviscidx[j][0]]*dXdxdXdxT[k][0] +
                         Fu[Fuviscidx[j][1]]*dXdxdXdxT[k][1] +
                         Fu[Fuviscidx[j][2]]*dXdxdXdxT[k][2]);
Assembly (compiler output interleaved with source; packed double-precision FMA instructions on 256-bit ymm registers):
    dv[k][j+1][i] -= wJ*(Fu[Fuviscidx[j][0]]*dXdxdXdxT[k][0] +
    b08d:  c5 7d 28 d0              vmovapd %ymm0,%ymm10
                         Fu[Fuviscidx[j][1]]*dXdxdXdxT[k][1] +
    b091:  c4 42 c5 b8 d3           vfmadd231pd %ymm11,%ymm7,%ymm10
    b096:  c5 fd 28 84 24 c8 04     vmovapd 0x4c8(%rsp),%ymm0
    b09d:  00 00
    dv[k][j+1][i] -= wJ*(Fu[Fuviscidx[j][0]]*dXdxdXdxT[k][0] +
    b09f:  c4 62 f5 ac 14 07        vfnmadd213pd (%rdi,%rax,1),%ymm1,%ymm10
    b0a5:  c5 7d 11 14 07           vmovupd %ymm10,(%rdi,%rax,1)
                         Fu[Fuviscidx[j][1]]*dXdxdXdxT[k][1] +
    b0aa:  c5 7d 59 94 24 68 04     vmulpd 0x468(%rsp),%ymm0,%ymm10
    b0b1:  00 00
...
Current Efforts
Example Suite
Ongoing development in the example suite
Porting PHASTA to libCEED is under investigation
SUPG stabilization
Primitive variable formulation
Implicit time integrator
Initial development of a shallow water equations example
Current Efforts
Preconditioning
Iterative solvers require preconditioning
Especially with high-order finite element operators
Operator diagonal (Jacobi): diagonally dominant operators
p-multigrid: elliptic operators
BDDC with fast diagonalization method (FDM): in development
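Because D acts pointwise, the operator diagonal can be computed element by element from the same decomposition, without assembling $A_L$. Node p of an element receives $\sum_q B_{qp} D_q B_{qp}$, and $G^T$ sums the contributions at shared nodes. When $D_i$ is a block (as in the 2D Laplacian), the sum also runs over pairs of gradient components. Below is a minimal sketch for the 1D Laplacian with linear elements, together with the Jacobi preconditioner built from the diagonal. The mesh, names, and sizes are hypothetical, and this is not libCEED's implementation.

```c
#include <assert.h>

#define NELEM 4            /* elements on [0, 1] (hypothetical mesh size) */
#define NP 2               /* nodes per element (linear basis) */
#define NQ 2               /* Gauss points per element */
#define NNODES (NELEM + 1) /* global nodes */

/* diag(A_L) for A_L = G^T B_G^T D B_G G, the 1D Laplacian with linear
   elements. D_q = w_q det(J) J^{-2} = w_q / detJ, matching the 2D formula. */
void laplace1d_diagonal(double diag[NNODES]) {
  const double BG[NQ][NP] = {{-0.5, 0.5}, {-0.5, 0.5}};  /* dphi_p/dxi at each q */
  const double w[NQ] = {1.0, 1.0};                       /* Gauss weights */
  const double detJ = (1.0 / NELEM) / 2.0;               /* dx/dxi */
  for (int i = 0; i < NNODES; i++) diag[i] = 0.0;
  for (int e = 0; e < NELEM; e++)
    for (int p = 0; p < NP; p++) {
      double d = 0.0;  /* element diagonal entry: sum_q B_qp D_q B_qp */
      for (int q = 0; q < NQ; q++) d += BG[q][p] * (w[q] / detJ) * BG[q][p];
      diag[e + p] += d;  /* G^T: sum contributions at shared nodes */
    }
}

/* Jacobi preconditioner: z = diag^{-1} r. */
void jacobi_apply(const double *diag, const double *r, double *z, int n) {
  for (int i = 0; i < n; i++) {
    assert(diag[i] != 0.0);
    z[i] = r[i] / diag[i];
  }
}
```

With elements of width 0.25, the diagonal is (4, 8, 8, 8, 4). This matches the assembled stiffness matrix, whose interior diagonal entries are 2/h and end entries are 1/h.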
Future Work
Future Work
Further performance enhancements (GPU and CPU)
Improved mixed mesh and operator composition support
Expanded non-linear and multi-physics examples
Preconditioning based on libCEED operator decomposition
Algorithmic differentiation of user quadrature functions
We invite contributors and friendly users
Questions
Questions?
Advisors: Jed Brown¹ & Daniel Appelö¹
Collaborators: Valeria Barra¹, Oana Marin², Tzanio Kolev³, Jean-Sylvain Camier³, Veselin Dobrev³, Yohann Dudouit³, Tim Warburton⁴, David Medina⁵, & Thilina Rathnayake⁶
Grant: Exascale Computing Project (17-SC-20-SC)
¹ University of Colorado, Boulder
² Argonne National Laboratory
³ Lawrence Livermore National Laboratory
⁴ Virginia Polytechnic Institute and State University
⁵ OCCA
⁶ University of Illinois, Urbana-Champaign