GTC 2012:

NEW ADVANCES IN GPU LINEAR ALGEBRA

Kyle Spagnoli EM Photonics

5/16/2012

Copyright © 2012 EM Photonics

QUICK “ABOUT US”

» HPC/GPU Consulting Firm

» Specializations in:
  » Electromagnetics
  » Image Processing
  » Fluid Dynamics
  » Linear Algebra

INTRODUCTION TO OUR LIBRARIES

» CULA Dense
  » Linear algebra routines

» CULA Sparse
  » Iterative sparse system solvers and preconditioners

» pCULA
  » Scalable solvers for multiple GPUs

» Ongoing work

INTRODUCTION - COMMON POINTS

» Easy to use
  » No GPU programming experience necessary
  » dgetrf(…) → culaDgetrf(…)

» Exhaustively tested and benchmarked
  » Accuracy & stability first!

» Cross platform
  » Linux, Windows, Mac OS X

» Multiple languages
  » C/C++, Fortran, Python, Matlab

CULA DENSE

CULA DENSE – INTRODUCTION

» First released in 2009

» LAPACK and BLAS implementations
  » Host or device memory

» Almost 300 routines

» Upcoming release (R15)
  » Tuned for Kepler architecture

» Now free for personal academic use

CULA DENSE - FUNCTIONALITY

LAPACK
  » LU factorization
  » Cholesky factorization
  » QR decomposition
  » Orthogonal factorization
  » Least squares
  » System solve
  » Eigenvalue routines
  » Matrix inversion
  » Singular value decomposition
  » Auxiliary routines

BLAS
  » Matrix-matrix multiply
  » Matrix-vector multiply
  » Rank updates
  » Conjugate
  » Transpose

CULA DENSE - PERFORMANCE

[Figure: CULA Dense - Cholesky Factorization (SPOTRF). GFLOPs (0 to 800) vs. matrix size (640 to 17280). Series: CPU (MKL), GPU (GTX680).]

Performance numbers include transfer time across PCI-Express (Gen2) bus
CPU: Intel Core i7 2600K
GPU: NVIDIA GTX 680 (1.5 GB)

CULA DENSE – LINK INTERFACE

» GPU acceleration with no code changes!

» Intercepts calls to BLAS & LAPACK libraries

» Analyze routine, parameters, and hardware

» Forward to GPU if appropriate

» Pass-through to CPU otherwise
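
The decision flow in the bullets above (intercept, analyze, forward or pass through) can be sketched roughly as follows. This is only an illustration of the idea, not the CULA link interface itself: the `BlasCall` record, the `GPU_ROUTINES` set, the `MIN_GPU_DIM` threshold, and the `choose_backend` function are all hypothetical names and values invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class BlasCall:
    """Hypothetical record of an intercepted BLAS/LAPACK call."""
    routine: str  # e.g. "dgemm"
    m: int        # row count of the main operand
    n: int        # column count of the main operand


# Hypothetical: routines assumed to have a GPU implementation.
GPU_ROUTINES = {"dgemm", "dgetrf", "dpotrf"}

# Hypothetical size threshold below which the PCI-Express transfer
# is assumed to cost more than the GPU saves.
MIN_GPU_DIM = 512


def choose_backend(call: BlasCall, gpu_available: bool) -> str:
    """Return "gpu" or "cpu" from the routine, its parameters, and the hardware."""
    if not gpu_available:
        return "cpu"                      # no usable device: pass through
    if call.routine not in GPU_ROUTINES:
        return "cpu"                      # routine not accelerated: pass through
    if min(call.m, call.n) < MIN_GPU_DIM:
        return "cpu"                      # problem too small to benefit
    return "gpu"                          # forward to the GPU
```

A real interceptor would presumably use measured crossover points per routine and per device rather than a single fixed threshold.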

CULA SPARSE (ITERATIVE)

CULA SPARSE – INTRODUCTION

» First released in 2011

» Iterative solvers and preconditioners

» Multiple matrix storage formats supported

» Upcoming release (S3)
  » Tuned for Kepler

» Free for personal academic use

CULA SPARSE - FUNCTIONALITY

Solvers
  » CG
  » BiCG
  » BiCG-Stab / (L)
  » GMRES
  » MINRES

Preconditioners
  » Jacobi
  » Block Jacobi
  » ILU0
  » Reordered ILU0

Data
  » Double / Complex
  » CSR / CSC / COO
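
To make one combination from this slide concrete, here is a minimal CPU-only sketch of the conjugate gradient (CG) method with a Jacobi (diagonal) preconditioner on a matrix stored in CSR format. It is a textbook formulation in plain Python and does not use or mirror the CULA Sparse API. The function names, the tolerance, and the small test system are assumptions chosen for illustration.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the main diagonal of a CSR matrix."""
    n = len(row_ptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (CSR) with Jacobi-preconditioned CG.

    Returns (x, iterations_used).
    """
    n = len(b)
    # Jacobi preconditioner: M^-1 is the inverse of the diagonal of A.
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Example system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], exact solution [1, 2, 3].
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [2.0, 4.0, 10.0]
solution, iterations = jacobi_pcg(values, col_idx, row_ptr, b)
```

The sparse matrix-vector product and the dot products dominate the cost of each iteration, which is why these solvers are natural candidates for GPU acceleration.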

CULA SPARSE - PERFORMANCE

[Figure: Iterative Solver Performance. Speedup (0x to 16x) for CG, GMRES, BiCG, MINRES, BiCGSTAB, BiCGSTABL.]

System Size = 1.5M
GPU = NVIDIA C2070
CPU = Xeon X5560 (MKL)

CULA SPARSE – PERFORMANCE FEATURES

» Hybrid performance

» CPU begins working during initial transfer

» Preconditioner generation

» Initial iterations

» Matrix reordering

» Can increase parallelism

PCULA – MULTI-GPU + CPU PERFORMANCE

PCULA – INTRODUCTION

» Scale to multiple GPUs and CPUs in a single node

» Currently in alpha release

» Greatly increased performance, scalability, and functionality coming soon!

PCULA – TILED ALGORITHMS

[Figure: an m × n original matrix shown next to the same matrix partitioned into a 4 × 4 grid of tiles, indexed (0,0) through (3,3).]
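
The tiled layout pictured on this slide can be sketched as a simple partitioning step. The code below is an illustration only: the `tile_matrix` function, the list-of-lists storage, and the tile size are assumptions, and the slides do not say how pCULA actually stores its tiles.

```python
def tile_matrix(matrix, tile_size):
    """Split a matrix (list of rows) into tiles keyed by (tile_row, tile_col).

    Tiles on the bottom and right edges may be smaller than tile_size when
    the matrix dimensions are not exact multiples of it.
    """
    m = len(matrix)
    n = len(matrix[0]) if m else 0
    tiles = {}
    for ti, r0 in enumerate(range(0, m, tile_size)):
        for tj, c0 in enumerate(range(0, n, tile_size)):
            tiles[(ti, tj)] = [row[c0:c0 + tile_size]
                               for row in matrix[r0:r0 + tile_size]]
    return tiles


# A 4 x 4 matrix holding 0..15 in row-major order, split into 2 x 2 tiles.
example = [[4 * i + j for j in range(4)] for i in range(4)]
example_tiles = tile_matrix(example, 2)
```

Each tile then becomes the unit of work and of data movement, which is what lets a scheduler hand independent tile operations to different devices.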

PCULA – TASK SCHEDULING

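[Figure: task scheduling diagram with three groups labeled Pending Valid Tasks, Hardware, and Completed Tasks. Task labels shown: POTRF, TRSM, SYRK, GEMM. Hardware entries are marked Busy, Free, Free. Annotations: (3 tasks), (1 task), (0 tasks).]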
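
The kernel names on this slide (POTRF, TRSM, SYRK, GEMM) match those of a standard right-looking tiled Cholesky factorization. Assuming that is the example being scheduled, the sketch below generates the tile tasks, derives dependencies from which tile each task reads and writes, and groups tasks into waves that could run concurrently. All function names are hypothetical, and this is not pCULA's scheduler.

```python
def cholesky_tasks(num_tiles):
    """List tile tasks as (kernel, output_tile, input_tiles) in program order."""
    tasks = []
    for k in range(num_tiles):
        tasks.append(("POTRF", (k, k), []))                  # factor diagonal tile
        for i in range(k + 1, num_tiles):
            tasks.append(("TRSM", (i, k), [(k, k)]))         # solve panel tile
        for i in range(k + 1, num_tiles):
            tasks.append(("SYRK", (i, i), [(i, k)]))         # update diagonal tile
            for j in range(k + 1, i):
                tasks.append(("GEMM", (i, j), [(i, k), (j, k)]))  # update off-diagonal
    return tasks


def build_dependencies(tasks):
    """For each task, collect the indices of earlier tasks it must wait for.

    A task waits for the last writer of every tile it reads or overwrites.
    Write-after-read hazards are not tracked; in this factorization a tile
    is never rewritten after other tasks have started reading it.
    """
    last_writer = {}
    deps = []
    for idx, (_, out, reads) in enumerate(tasks):
        deps.append({last_writer[t] for t in reads + [out] if t in last_writer})
        last_writer[out] = idx
    return deps


def ready_waves(tasks, deps):
    """Group kernels into waves; every task in a wave has all dependencies done."""
    done, waves = set(), []
    remaining = set(range(len(tasks)))
    while remaining:
        ready = sorted(i for i in remaining if deps[i] <= done)
        waves.append([tasks[i][0] for i in ready])
        done.update(ready)
        remaining.difference_update(ready)
    return waves


demo_tasks = cholesky_tasks(3)
demo_waves = ready_waves(demo_tasks, build_dependencies(demo_tasks))
```

A real scheduler would dispatch each task as soon as its own dependencies finish rather than waiting for a whole wave, but the waves show how much parallelism the dependency graph exposes at each stage.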

PCULA – HETEROGENEOUS TASK SCHEDULING

» Data locality is critical

» Hardware performance
  » Persistent “live tuning” performance database

» Task queue depth
  » Too long: idle hardware if the schedule is not perfect
  » Too short: worker starvation

PCULA – OUT OF (GPU) CORE

» Solve problems larger than GPU memory

» Natural extension of tiled data partitioning

» MESI memory coherence protocol

» Least recently used replacement strategy

[Figure: two 4 × 4 tile grids, each indexed (0,0) through (3,3), with tiles marked by coherence state. Legend: Invalid, Modified, Exclusive, Shared.]
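
The least-recently-used replacement strategy named above can be sketched as a small tile cache that stands in for limited GPU memory. This is a simplification, not a full MESI implementation and not pCULA's code: the `TileCache` class is hypothetical, it tracks only whether a resident tile is a clean copy (labeled "Shared" here) or has been written ("Modified"), and it records loads and write-backs instead of moving data.

```python
from collections import OrderedDict


class TileCache:
    """Hypothetical LRU cache of tiles resident in (limited) GPU memory."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of resident tiles
        self.resident = OrderedDict()     # tile -> state, least recent first
        self.loads = []                   # tiles copied host -> GPU
        self.writebacks = []              # modified tiles copied GPU -> host

    def access(self, tile, write=False):
        if tile in self.resident:
            self.resident.move_to_end(tile)       # mark as most recently used
        else:
            if len(self.resident) >= self.capacity:
                victim, state = self.resident.popitem(last=False)  # evict LRU tile
                if state == "Modified":
                    self.writebacks.append(victim)  # host copy is stale: write back
            self.loads.append(tile)
            self.resident[tile] = "Shared"        # clean copy, host copy still valid
        if write:
            self.resident[tile] = "Modified"      # GPU now holds the only valid copy


cache = TileCache(capacity=2)
cache.access((0, 0), write=True)
cache.access((0, 1))
cache.access((0, 0))      # hit: (0, 0) becomes most recently used
cache.access((1, 1))      # evicts (0, 1), which is clean
cache.access((1, 0))      # evicts (0, 0), which must be written back
```

Only modified tiles cost a transfer on eviction, so a scheduler that keeps writes to a tile close together in time reduces PCI-Express traffic.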

PCULA – FUNCTION LIST

» Currently supports

» BLAS Routines (GEMM, TRSM, GEMV)

» LU Factorization & Solve (GETRF + GESV)

» Cholesky Factorization & Solve (POTRF + POSV)

» QR Factorization & Solve (GEQRF + GEQRS)

» Eigenvalue and SVD routines in future release

PCULA - PERFORMANCE

[Figure: pCULA - DGEMM Performance. GFLOPs (0 to 1200) vs. matrix size (4608 to 12672). Series: CPU, GPU, CPU + GPU, CPU + 2xGPU.]

Performance numbers include transfer time across PCI-Express (Gen2) bus
CPU: Intel Xeon 5560
GPU: 2x NVIDIA C2050

ONGOING WORK

ONGOING WORK - CULA

» CULA Dense
  » More routines/tuning

» CULA Sparse
  » Direct solvers
  » Algebraic Multi-Grid (AMG)

» pCULA
  » Multi-node cluster support
  » NUMA optimizations

ONGOING WORK – C++ AMP

» Microsoft’s C++ AMP library
  » “ampblas” development project

» Bringing linear algebra to the C++ AMP ecosystem

» Multiple talks today and tomorrow

» C++ AMP Lounge

CULA PARTNERS & INTEGRATORS

» Here at GTC 2012....

THANKS!

» Convention hall @ booth #20

» More information @ www.culatools.com

Thanks! Questions?
