
CS 380 - GPU and GPGPU Programming
Lecture 24: Additional Stuff, Part 1

Markus Hadwiger, KAUST

Reading Assignment #14 (until May 11)

Read (required):

• Programming Massively Parallel Processors book, Chapter 10 (Sparse Matrix-Vector Multiplication)

• Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors, Nathan Bell and Michael Garland
  http://www.nvidia.com/docs/IO/77944/sc09-spmv-throughput.pdf

Read (optional):

• CUSPARSE library description in the CUDA SDK

• CUSP library: http://cusplibrary.github.io/

• Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS, Maxim Naumov
  http://developer.nvidia.com/sites/default/files/akamai/cuda/files/psts_white_paper_final.pdf
  (This is also included in the CUDA SDK!)
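The required readings build on sparse matrix-vector multiplication (SpMV) with the matrix stored in compressed sparse row (CSR) format. As a minimal sketch, assuming plain sequential C++ on the CPU rather than any GPU library, the loop below computes y = A·x for a CSR matrix. The struct `CsrMatrix` and the function `spmv_csr` are hypothetical names chosen for illustration; they are not part of cuSPARSE or CUSP. Because each row's dot product is independent of the others, the simplest GPU mapping (the "scalar" CSR kernel discussed by Bell and Garland) assigns one thread to each iteration of the outer loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration (not a library type).
// row_ptr has num_rows + 1 entries; the nonzeros of row i occupy positions
// row_ptr[i] .. row_ptr[i + 1] - 1 of vals, with matching column indices
// stored at the same positions of col_idx.
struct CsrMatrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_ptr;
    std::vector<std::size_t> col_idx;
    std::vector<double> vals;
};

// Sequential reference SpMV: y = A * x.
// Every outer iteration writes only y[row], so the rows could be processed
// in parallel (e.g., one GPU thread per row in a scalar CSR kernel).
std::vector<double> spmv_csr(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.num_rows, 0.0);
    for (std::size_t row = 0; row < A.num_rows; ++row) {
        double dot = 0.0;
        // Walk only the stored nonzeros of this row.
        for (std::size_t k = A.row_ptr[row]; k < A.row_ptr[row + 1]; ++k) {
            dot += A.vals[k] * x[A.col_idx[k]];
        }
        y[row] = dot;
    }
    return y;
}
```

For the 3×3 matrix with rows (1, 0, 2), (0, 3, 0), (4, 0, 5), the CSR arrays are `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}` and `vals = {1, 2, 3, 4, 5}`; multiplying by x = (1, 1, 1) gives y = (3, 3, 9).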

Reading Assignment ++

“Modern GPU” library and links to other libraries at:
• https://nvlabs.github.io/moderngpu/intro.html

About occupancy and latency:
• https://nvlabs.github.io/moderngpu/performance.html

Latency data obtained via microbenchmarking:
• http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf

Warp-aggregated atomics:
• http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/

Fast Histograms Using Shared Atomics on Maxwell:
• http://devblogs.nvidia.com/parallelforall/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/

CUDA 7 and C++11:
• http://devblogs.nvidia.com/parallelforall/power-cpp11-cuda-7/

PTX virtual assembly and SASS machine assembly:
• PTX in CUDA SDK: ptx_isa_4.2.pdf
• SASS in CUDA SDK: CUDA_Binary_Utilities.pdf

