OpenFOAM + GPGPU - İTÜtraining.uhem.itu.edu.tr/files/belgeler/belgeler-dosya... · Outline •...

OpenFOAM® + GPGPU

İbrahim Özküçük

Outline

• GPGPU vs CPU

• GPGPU plugins for OpenFOAM®

• Overview of Discretization

• Linear System Solvers in OpenFOAM®

• CUDA® for FOAM Link (cufflink)

• Cusp & Thrust Libraries

• How Cufflink Works

• Performance data of Cufflink solvers

• CUDA® Solvers in foam-extend-3.0

• Considerations about future

OpenFOAM® and CUDA®-based solvers, 23.01.2014

GPGPU vs CPU

[Two figure slides, taken from reference (1).]

OpenFOAM GPGPU Solvers

• SpeedIT Plugin to OpenFOAM®: Conjugate Gradient & BiConjugate Gradient solvers. Further information: http://speedit.vratis.com/index.php/products

• ofgpu, GPU Linear Solvers for OpenFOAM®. Further information: http://www.symscape.com/gpu-openfoam

• Culises - GPU power for OpenFOAM®. Further information: http://www.fluidyna.com/content/culises

Overview of Discretization

The term discretization means approximating a problem by discrete quantities. The finite volume (FV) method and others, such as the finite element and finite difference methods, all discretize the problem as follows:

• Spatial discretization: defining the solution domain by a set of points that fill and bound a region of space when connected;

• Temporal discretization (for transient problems): dividing the time domain into a finite number of time intervals, or steps;

• Equation discretization: generating a system of algebraic equations in terms of discrete quantities defined at specific locations in the domain, from the PDEs that characterize the problem.
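
As a minimal sketch of spatial and equation discretization for a steady problem (so no temporal step), the Python fragment below applies a finite volume balance to 1D heat conduction with fixed end temperatures. The function names, mesh size and boundary values are illustrative assumptions and are not taken from OpenFOAM.

```python
def assemble_1d_conduction(n_cells, length, t_left, t_right):
    """Assemble A x = b for steady 1D conduction with fixed end temperatures.

    Spatial discretization: n_cells equal cells of width dx.
    Equation discretization: the flux balance over each cell gives one
    algebraic equation per cell, with the unknowns at the cell centres.
    """
    dx = length / n_cells
    a = [[0.0] * n_cells for _ in range(n_cells)]
    b = [0.0] * n_cells
    for i in range(n_cells):
        # Interior faces couple neighbouring cell centres (distance dx).
        if i > 0:
            a[i][i - 1] -= 1.0 / dx
            a[i][i] += 1.0 / dx
        if i < n_cells - 1:
            a[i][i + 1] -= 1.0 / dx
            a[i][i] += 1.0 / dx
    # Boundary faces lie half a cell width from the adjacent cell centre.
    a[0][0] += 2.0 / dx
    b[0] += 2.0 * t_left / dx
    a[-1][-1] += 2.0 / dx
    b[-1] += 2.0 * t_right / dx
    return a, b


def solve_tridiagonal(a, b):
    """Solve a tridiagonal system with the Thomas algorithm."""
    n = len(b)
    diag = [a[i][i] for i in range(n)]
    rhs = list(b)
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        factor = a[i][i - 1] / diag[i - 1]
        diag[i] -= factor * a[i - 1][i]
        rhs[i] -= factor * rhs[i - 1]
    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - a[i][i + 1] * x[i + 1]) / diag[i]
    return x


A, b = assemble_1d_conduction(n_cells=4, length=1.0, t_left=0.0, t_right=100.0)
T = solve_tridiagonal(A, b)
print(T)
```

The resulting matrix is symmetric and sparse, which is the kind of system the linear solvers discussed in the following slides are meant for.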

Linear System Solvers in OpenFOAM®

• PBiCG - preconditioned bi-conjugate gradient solver for asymmetric matrices;

• PCG - preconditioned conjugate gradient solver for symmetric matrices;

• GAMG - generalized geometric-algebraic multi-grid solver

• smoothSolver - solver using a smoother for both symmetric and asymmetric matrices

• diagonalSolver - diagonal solver for both symmetric and asymmetric matrices

Linear System Solvers in OpenFOAM®

Preconditioners

• Diagonal incomplete-Cholesky (DIC)

• Diagonal incomplete LU (DILU)

• GAMG preconditioner

Smoothers

• Diagonal incomplete-Cholesky (DIC)

• Diagonal incomplete LU (DILU)

• Gauss-Seidel

• Variants of DIC and DILU exist with additional Gauss-Seidel smoothing

Interface for Linear System Solvers

[Diagram: the OpenFOAM lduMatrix class holds the linear system Ax = b. The matrix A and the right-hand side b go to a GPGPU linear system solver, which returns x_solution. The interface between the two is marked with a question mark.]

CUDA® for FOAM Link (cufflink)

• Cuda For FOAM Link (cufflink) is an open-source library for linking numerical methods based on Nvidia's Compute Unified Device Architecture (CUDA™) C/C++ programming language and OpenFOAM®. Currently, the library utilizes the sparse linear solvers of Cusp and methods from Thrust to solve the linear Ax = b system derived from OpenFOAM's lduMatrix class and return the solution vector. Cufflink is designed to utilize the coarse-grained parallelism of OpenFOAM® (via domain decomposition) to allow multi-GPU parallelism at the level of the linear system solver. It currently supports only the OpenFOAM-extend fork of the OpenFOAM code.

• https://code.google.com/p/cufflink-library/

CUSP: A C++ Templated Sparse Matrix Library

Integrating CUSP into OpenFOAM

“Cusp is a library for sparse linear algebra and graph computations on CUDA. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.” [2]

Provided template solvers:

• (Bi-) Conjugate Gradient (-Stabilized)

• GMRES

Matrix storage:

• CSR, COO, HYB, DIA

Provided preconditioners:

• Jacobi (diagonal) preconditioners

• Sparse approximate inverse preconditioner

• Smoothed-aggregation algebraic multigrid preconditioner

cusp-Library: http://code.google.com/p/cusp-library/
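
To make two of the storage formats listed above concrete, here is a small sketch in plain Python that converts coordinate (COO) triplets into compressed sparse row (CSR) arrays and multiplies the result by a vector. It only illustrates the data layouts; the function names are assumptions and this is not the Cusp API, which performs such conversions in C++ on the device.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate (COO) triplets into CSR arrays."""
    # Count the entries in each row, then prefix-sum to get the row pointers.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for r in range(n_rows):
        row_ptr[r + 1] += row_ptr[r]

    # Scatter each entry into the next free slot of its row.
    next_slot = list(row_ptr[:-1])
    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        slot = next_slot[r]
        col_idx[slot] = c
        data[slot] = v
        next_slot[r] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Compute y = A x for a matrix stored in CSR format."""
    return [
        sum(data[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


# 3x3 example matrix: [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]
rows = [0, 0, 1, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
row_ptr, col_idx, data = coo_to_csr(3, rows, cols, vals)
y = csr_matvec(row_ptr, col_idx, data, [1.0, 1.0, 1.0])
```

COO stores one (row, column, value) triplet per nonzero, while CSR replaces the row array with one offset per row, which is why CSR needs less memory for the same matrix.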

Thrust

“Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.” [3]

http://code.google.com/p/thrust/

How Cufflink Works

[Diagram: the system Ax = b in the OpenFOAM lduMatrix class is connected to the same system in a Cusp solver on the GPU through Thrust methods and Cusp methods.]

• The thrust::copy method converts lduMatrix data into COO format.

• Data in COO format is transferred to GPU memory by using CUDA code.

• Data in COO format is changed into different formats on the GPU and passed into the CUSP-based solver along with a convergence criterion.

• Residuals are calculated based on OpenFOAM’s normalized residual method.

• The x solution vector is passed back to OpenFOAM by using Thrust methods, along with GPU solver performance data.
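
The first step, turning lduMatrix data into COO triplets, can be sketched on the host in plain Python. The sketch assumes the usual lduMatrix layout: a diagonal array, lower and upper coefficient arrays with one entry per internal face, and lower/upper addressing arrays giving the two cells each face connects. The function and variable names are illustrative; this is not the Cufflink source, which performs the copy with Thrust.

```python
def ldu_to_coo(diag, lower, upper, lower_addr, upper_addr):
    """Expand LDU storage into COO (row, col, value) triplets.

    For face f connecting cells l = lower_addr[f] and u = upper_addr[f]
    (with l < u), upper[f] is the coefficient A[l][u] and lower[f] is A[u][l].
    """
    rows, cols, vals = [], [], []
    # Diagonal coefficients: one per cell.
    for cell, coeff in enumerate(diag):
        rows.append(cell)
        cols.append(cell)
        vals.append(coeff)
    # Off-diagonal coefficients: two per internal face.
    for face, (l, u) in enumerate(zip(lower_addr, upper_addr)):
        rows.append(l)
        cols.append(u)
        vals.append(upper[face])
        rows.append(u)
        cols.append(l)
        vals.append(lower[face])
    return rows, cols, vals


def coo_matvec(n_rows, rows, cols, vals, x):
    """Compute y = A x directly from COO triplets."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


# Three cells in a row, two internal faces, asymmetric coefficients:
# A = [[4, -1, 0], [-3, 4, -2], [0, -4, 4]]
rows, cols, vals = ldu_to_coo(
    diag=[4.0, 4.0, 4.0],
    lower=[-3.0, -4.0],
    upper=[-1.0, -2.0],
    lower_addr=[0, 1],
    upper_addr=[1, 2],
)
y = coo_matvec(3, rows, cols, vals, [1.0, 1.0, 1.0])
```

For a symmetric matrix the lower and upper arrays hold the same values, so each face coefficient is simply written twice.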

Current Cufflink Solvers

• cufflink_AinvPBiCGStab

• cufflink_AinvPCG

• cufflink_CG

• cufflink_DiagPBiCGStab

• cufflink_DiagPCG

• cufflink_SmAPCG

• These solvers also have parallel versions, which work in multi-GPU setups by using OpenFOAM’s domain decomposition methods.
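
To illustrate the kind of algorithm behind a name such as cufflink_DiagPCG, the following is a small CPU-only sketch of the conjugate gradient method with a diagonal (Jacobi) preconditioner. It is a textbook formulation on a dense matrix, written for readability; it is not the Cufflink or Cusp implementation, and its stopping test uses a plain relative 2-norm of the residual rather than OpenFOAM's normalized residual.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, x):
    return [dot(row, x) for row in a]


def diag_pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                # residual for the zero initial guess
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x, iterations = diag_pcg(A, b)
```

Every operation in the loop is a matrix-vector product, a dot product or a vector update, all of which parallelize well on a GPU.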

Performance data taken from the Optimization, HPC, and Pre- and Post-Processing I session, 6th OpenFOAM® Workshop, Penn State University, June 15th 2011.

Preliminary Results: A Test Problem

2D heat equation:

$\nabla^2 T = 0$

Vary N from 10 to 2000, where $N^2$ = nCells.
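
To give a feel for the size of these systems, the sketch below assembles the standard 5-point stencil for the Laplacian on an N x N grid of cells as COO triplets and counts cells and nonzeros. The boundary treatment (a diagonal of 4 in every row, unit grid spacing) is a simplifying assumption, since the slides do not state the boundary conditions of the original test case.

```python
def laplace_2d_coo(n):
    """COO triplets of the 5-point Laplacian on an n x n grid of cells."""
    rows, cols, vals = [], [], []
    for j in range(n):
        for i in range(n):
            cell = j * n + i
            rows.append(cell)
            cols.append(cell)
            vals.append(4.0)
            # Couple to the neighbours that exist inside the grid.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:
                    rows.append(cell)
                    cols.append(nj * n + ni)
                    vals.append(-1.0)
    return rows, cols, vals


def nonzeros(n):
    """Closed-form nonzero count: n^2 diagonal entries plus two per internal face."""
    return n * n + 4 * n * (n - 1)


rows, cols, vals = laplace_2d_coo(10)
n_cells_small = 10 * 10
n_cells_large = 2000 * 2000
print(n_cells_small, len(vals))
print(n_cells_large, nonzeros(2000))
```

At N = 2000 the system has 4,000,000 cells, which matches the upper end of the nCells axis in the plots that follow.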

Preliminary Results: Solver Settings

All CG solvers:

Tolerance = 1e-10;
MaxIter 1000;

GAMG:

solver GAMG;
tolerance 1e-10;
smoother GaussSeidel;
nPreSweeps 0;
nPostSweeps 2;
cacheAgglomeration true;
nCellsInCoarsestLevel sqrt(nCells);
agglomerator faceAreaPair;
mergeLevels 1;
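
The tolerance above is compared against a residual, and the Cufflink walkthrough states that residuals follow OpenFOAM's normalized residual method. The sketch below shows one common description of that normalization on a dense matrix: the L1 norm of b - Ax divided by a factor built from the average of x. Treat the exact formula and the value of `small` as assumptions to be checked against the source of the OpenFOAM version in use.

```python
def normalized_residual(a, x, b, small=1e-20):
    """Scaled L1 residual of A x = b (assumed form of OpenFOAM's normalization)."""
    n = len(b)
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    x_ref = sum(x) / n
    # A applied to the uniform field x_ref equals x_ref times the row sums of A.
    a_xref = [x_ref * sum(a[i]) for i in range(n)]
    norm_factor = sum(
        abs(ax[i] - a_xref[i]) + abs(b[i] - a_xref[i]) for i in range(n)
    ) + small
    residual = sum(abs(b[i] - ax[i]) for i in range(n))
    return residual / norm_factor


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
initial = normalized_residual(A, [0.0, 0.0, 0.0], b)
converged = normalized_residual(A, [1.0, 1.0, 1.0], b)
```

With a zero initial guess the normalized residual starts at 1, so a tolerance of 1e-10 asks for a reduction of ten orders of magnitude.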

Preliminary Results: Setup

Software:

• CUDA version 4.0
• CUSP version 0.2
• Thrust version 1.4
• Ubuntu 10.04

CPU: Dual Intel Xeon Quad Core E5430, 2.66 GHz
Motherboard: Tyan S5396
RAM: 24 GB

GPU: Tesla C2050, 3 GB GDDR5

• 515 Gflops peak double precision
• 1.03 Tflops peak single precision
• 14 MP * 32 cores/MP = 448 cores
• Host-device memory bandwidth = 1566 MB/sec (motherboard specific)

Preliminary Results: Solve Time

[Figure: “Solve() Time Comparison”. Time [seconds] (axis 0 to 1400) against nCells (axis 0 to 4,500,000). Series: cusplink_SmAPCG, GAMG, cusplink_DPCG, cusplink_CG, DPCG-parallel4, DPCG-parallel6-s231, DPCG, CG.]

Preliminary Results: Solution Speedup

[Figure: “Speedup Comparison”. Speedup (axis 0 to 18) against nCells (axis 0 to 4,500,000). Series: DPCG, DPCG-parallel4, DPCG-parallel6-s231, DPCG-parallel6-s161, cusplink_DPCG, cusplink_CG.]

$\text{Speedup} = T_s / T_p = T_{\mathrm{OFCG}} / T_{\mathrm{other}}$
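
The speedup definition uses the solve time of OpenFOAM's CG solver as the reference. A minimal sketch of the calculation follows; the timings are hypothetical placeholders chosen for illustration and are not values read from the plots.

```python
def speedup(t_reference, t_other):
    """Speedup of a solver relative to the reference solve time."""
    if t_other <= 0.0:
        raise ValueError("solve time must be positive")
    return t_reference / t_other


# Hypothetical solve times in seconds (not measured data from the slides).
timings = {"CG": 120.0, "DPCG": 80.0, "cusplink_CG": 10.0}
speedups = {name: speedup(timings["CG"], t) for name, t in timings.items()}
print(speedups)
```

By construction the reference solver has a speedup of 1, and any value above 1 means the other solver finished faster.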

[Figure: “Speedup Comparison”. Speedup (axis 0 to 140) against nCells (axis 0 to 4,500,000). Series: DPCG, DPCG-parallel4, DPCG-parallel6-s231, DPCG-parallel6-s161, cusplink_CG, cusplink_DPCG, GAMG, GAMG6, cusplink_SmAPCG.]

[Figure: “Speedup Comparison”. Speedup (axis 0 to 60) against nCells (axis 0 to 1,200,000). Series: DPCG, DPCG-parallel4, DPCG-parallel6-s231, DPCG-parallel6-s161, cusplink_CG, cusplink_DPCG, GAMG6, GAMG, cusplink_SmAPCG.]

CUDA® Solvers in foam-extend-3.0

• The Cufflink library has been built in since foam-extend-3.0.

• It includes the following solvers: cudaBiCGStab, cudaCG.

• Right now, compiling the CUDA® solvers in foam-extend-3.0 is very hard because knowledge and tutorials about it are scarce.

• In the near future, improvements to the GPGPU solvers in the foam-extend fork of OpenFOAM® are expected by the foam-extend community.

Considerations about Future

• Cusp-based solvers could be improved to reduce the effect of the memory bottleneck between GPU memory and main memory.

• A different open-source sparse linear solver library could replace the Cusp-based solvers for better performance. However, this is not a trivial task!

• Right now, multi-GPU on one node is supported, but multi-node GPU support would be better for very large-scale simulations where one node is not enough.

• The problem size must be big enough to compensate for the overhead of the GPU memory bottleneck.
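
A rough way to reason about the transfer overhead is to estimate how long it takes to copy one linear system to the GPU. The sketch below uses the host-device bandwidth of 1566 MB/sec quoted in the test setup; the storage assumptions (COO format, five nonzeros per cell, 4-byte indices, 8-byte values, 1 MB = 10^6 bytes) are illustrative, and the model ignores transfer latency.

```python
def transfer_seconds(
    n_cells,
    bandwidth_mb_per_s=1566.0,  # host-device bandwidth from the test setup
    nnz_per_cell=5,             # assumed 5-point stencil
    index_bytes=4,              # assumed 32-bit row and column indices
    value_bytes=8,              # double precision coefficients
):
    """Estimate the time to copy a COO matrix plus x and b to the GPU."""
    nnz = nnz_per_cell * n_cells
    matrix_bytes = nnz * (2 * index_bytes + value_bytes)
    vector_bytes = 2 * n_cells * value_bytes
    return (matrix_bytes + vector_bytes) / (bandwidth_mb_per_s * 1e6)


for n_cells in (100, 10_000, 4_000_000):
    print(n_cells, transfer_seconds(n_cells))
```

The estimate grows linearly with nCells and says nothing by itself about where the break-even point lies; that depends on how the measured solve times on CPU and GPU grow with problem size.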

GPGPU vs CPU

CPU – GPU HW Differences

● CPU

    ● Most die area used for memory cache

    ● Relatively few transistors for ALUs

● GPU

    ● Most die area used for ALUs

    ● Relatively small caches

[Four figure slides titled “GPGPU vs CPU”, taken from reference (1).]

Q & A

References

1. Karl Rupp. CPU, GPU and MIC Hardware Characteristics over Time. Retrieved from http://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/ on 21.01.2014.

2. Daniel P. Combest, Dr. P.A. Ramachandran, Dr. M.P. Dudukovic. Implementing Fast Parallel Linear System Solvers In OpenFOAM based on CUDA. 6th OpenFOAM Workshop, Penn State University, June 15th 2011.

3. The OpenFOAM® Extend Project tutorials.