Rached Abdelkhalak, NVIDIA
CUDA DEVELOPER TOOLS: OVERVIEW & NEW FEATURES
DEVELOPER WORKFLOW
Application Development (IDE integration)
Debug Gfx API, Debug CUDA
System & CUDA Trace
Graphics Profiling, CUDA Kernel Profiling
Gfx GPU crash dump, CUDA GPU crash dump
DEVELOPER TOOLS PORTFOLIO
Application Development (IDE integration): Nsight Eclipse Edition, Nsight Visual Studio Edition
Debug Gfx API and Debug CUDA: Nsight Eclipse Edition, cuda-gdb, Nsight Visual Studio Edition, Nsight Compute, cuda-memcheck & compute-sanitizer, Nsight Graphics
System & CUDA Trace: Nsight Systems
Graphics Profiling: Nsight Graphics
CUDA Kernel Profiling: Nsight Compute
Gfx GPU crash dump: Nsight Aftermath
CUDA GPU crash dump: cuda-gdb/Nsight Eclipse Edition, Nsight Visual Studio Edition
OVERALL UPDATES
Chip and platform support across developer tools
CUDA 11.0 support
OS support updates:
• macOS: host platform only
• Removal of Windows 7 support
Chip updates:
• A100 GPU support
• Arm SBSA support
NVIDIA® NSIGHT™ ECLIPSE EDITION
Plug-ins enabling CUDA development
Plug-in to Eclipse, supporting Eclipse 4.8 to 4.11
Edit, build, and debug CUDA C applications
CUDA-aware source code editor: syntax highlighting, code completion and inline help
Debugger: seamless, simultaneous debugging of CPU and GPU code
NVCC build integration to cross-compile for various target platforms
Docker support
Documentation: https://docs.nvidia.com/cuda/nsight-eclipse-plugins-guide
NSIGHT VISUAL STUDIO EDITION
VS extension enabling CUDA development
Visual Studio 2015, 2017 and 2019 support
CPU + GPU CUDA debugging
Source-correlated assembly debugging (SASS / PTX / SASS+PTX)
Data breakpoints for CUDA C/C++ code
Expressions in Locals, Watch and Conditionals
CUDA Info view
Crash dump
Warp Watch view in NextGen debugger
Improved OptiX debugging (-lineinfo)
More info & documentation: https://developer.nvidia.com/nsight-visual-studio-edition
NSIGHT TOOLS INTEGRATION INTO VISUAL STUDIO
Launches the more powerful standalone tools from within Visual Studio
Improved workflow for Nsight Systems, Nsight Graphics & Nsight Compute:
• Settings passed to the standalone tool
• Quick launch with key bindings
• Complements Nsight Visual Studio Edition's debugger
Works with Visual C++, C#, Visual Basic .NET, F#, and Python projects
On Visual Studio Marketplace: installation and automatic update notifications
Supported in Visual Studio 2015, 2017 & 2019
Tools appear under the Nsight menu, and Visual Studio project settings are transferred to the Nsight tool upon launch
More info: https://developer.nvidia.com/nsight-tools-visual-studio-integration
CUDA-GDB
Overview of the command-line debugger
Command-line source and assembly (SASS) level debugger
Nsight Eclipse Edition debugging backend
Simultaneous CPU and GPU debugging
Inspect and modify memory, register, and variable state
Control program execution
Runtime GPU error detection
Support for multiple GPUs, multiple contexts, multiple kernels, and thread focus
Core dump support
Documentation: http://docs.nvidia.com/cuda/cuda-gdb
(cuda-gdb) info cuda threads breakpoint all
BlockIdx ThreadIdx Virtual PC Dev SM Wp Ln Filename Line
Kernel 0
(1,0,0) (0,0,0) 0x0000000000948e58 0 11 0 0 infoCommands.cu 12
(1,0,0) (1,0,0) 0x0000000000948e58 0 11 0 1 infoCommands.cu 12
(1,0,0) (2,0,0) 0x0000000000948e58 0 11 0 2 infoCommands.cu 12
(1,0,0) (3,0,0) 0x0000000000948e58 0 11 0 3 infoCommands.cu 12
(1,0,0) (4,0,0) 0x0000000000948e58 0 11 0 4 infoCommands.cu 12
(1,0,0) (5,0,0) 0x0000000000948e58 0 11 0 5 infoCommands.cu 12
(cuda-gdb) info cuda threads breakpoint 2 lane 1
BlockIdx ThreadIdx Virtual PC Dev SM Wp Ln Filename Line
Kernel 0
(1,0,0) (1,0,0) 0x0000000000948e58 0 11 0 1 infoCommands.cu 12
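The Ln column in the `info cuda threads` output above is each thread's lane within its warp. The C++ sketch below shows how a thread's lane (and its block-relative warp index) follows from `threadIdx` and `blockDim`, assuming 32-thread warps and the usual x-fastest linearization. `Dim3`, `linearThreadId`, `warpInBlock` and `laneId` are hypothetical helper names, not part of CUDA or cuda-gdb. Also note that cuda-gdb's Wp column reports the hardware warp slot on the SM, which need not equal the block-relative warp index computed here.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so the sketch compiles as plain C++.
struct Dim3 {
    unsigned x, y, z;
};

// Warp size on current NVIDIA GPUs (exposed as warpSize in device code).
constexpr unsigned kWarpSize = 32;

// Linearize threadIdx within its block: x varies fastest, then y, then z.
unsigned linearThreadId(Dim3 threadIdx, Dim3 blockDim) {
    return threadIdx.x
         + threadIdx.y * blockDim.x
         + threadIdx.z * blockDim.x * blockDim.y;
}

// Index of the thread's warp within its block (not the hardware warp slot).
unsigned warpInBlock(Dim3 threadIdx, Dim3 blockDim) {
    return linearThreadId(threadIdx, blockDim) / kWarpSize;
}

// Lane of the thread within its warp, which is what cuda-gdb shows as Ln.
unsigned laneId(Dim3 threadIdx, Dim3 blockDim) {
    return linearThreadId(threadIdx, blockDim) % kWarpSize;
}
```

For any one-dimensional block, thread (1,0,0) lands in lane 1, which is consistent with the `breakpoint 2 lane 1` filter above returning that thread.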
CUDA-GDB
New Features
Upgrade to GDB 8.2
macOS support restored as a host platform
Performance improvements
● Module load time (~30% faster)
Quality improvements
● Improved handling of -lineinfo debug information (OptiX)
● Improved debugging with parallel cuda-gdb sessions
● Enabled CPU-side hardware watchpoints
CUDA-MEMCHECK
Functional correctness checking tool suite
Multiple tools:
memcheck: reports out-of-bounds and misaligned memory access errors
racecheck: identifies races on __shared__ memory
initcheck: detects use of uninitialized global memory
synccheck: identifies invalid usage of __syncthreads() and __syncwarp()
Documentation: http://docs.nvidia.com/cuda/cuda-memcheck/index.html
COMPUTE-SANITIZER
New tool in the functional correctness checking tool suite
Next-gen replacement for cuda-memcheck
New command-line interface (CLI) tool based on the Sanitizer API
Performance gain for applications using libraries such as cuSOLVER, cuFFT, or DL frameworks
cuda-memcheck has performance issues on Windows that are inherent in its design; compute-sanitizer fixes them and brings Windows performance on par with Linux
Available in CUDA 11.0
OS: Linux (x86_64, Power, Arm SBSA), Windows
GPUs: Maxwell+
cuda-memcheck is still supported in CUDA 11.0 (but does not support Arm SBSA)
Documentation: https://docs.nvidia.com/cuda/compute-sanitizer
COMPUTE-SANITIZER
API for the functional correctness checking tool suite
Provides finer control than cuda-memcheck through APIs to analyze memory patterns
APIs are grouped into two categories:
Callback API – notifies the tool of CUDA events such as memory allocations and kernel launches
Patching API – inserts patches for specific memory instructions
New in CUDA 11.0:
• sanitizerSetCallbackData API has been updated to take a function as input rather than a stream
• Added support for Cooperative Groups
• Memory access callbacks for shared/local memory now report the address offset within the shared/local window
Documentation: https://docs.nvidia.com/cuda/compute-sanitizer
Samples: https://github.com/NVIDIA/compute-sanitizer-samples
NSIGHT SUITE OF PROFILERS
NSIGHT SYSTEMS
MPI & OpenACC Trace
NSIGHT SYSTEMS
OpenMP 5 API Trace
NSIGHT COMPUTE
Support for CUDA Asynchronous Copy
Sparse Data compression
A100 GPU support
NSIGHT COMPUTE
New Roofline analysis
An efficient way to evaluate kernel characteristics and quickly identify existing performance limiters and potential directions for further improvement
Inputs: arithmetic intensity (FLOP/byte) and performance (FLOP/s)
Ceilings: peak memory bandwidth and peak FP32/FP64 performance
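The inputs and ceilings listed above combine into a single bound: attainable performance = min(peak compute, arithmetic intensity × peak memory bandwidth). The C++ sketch below implements that classic single-level roofline model as an illustration, assuming one memory ceiling and one compute ceiling. Nsight Compute builds its roofline charts from measured counters and may plot several ceilings (for example FP32 and FP64), so the function names here are hypothetical and do not mirror its internals.

```cpp
#include <algorithm>
#include <cassert>

// Arithmetic intensity: floating-point operations per byte of memory traffic.
double arithmeticIntensity(double flops, double bytes) {
    return flops / bytes;
}

// Classic roofline bound (FLOP/s): the kernel is capped either by the compute
// ceiling or by how fast memory can feed it at the given intensity.
double attainablePerformance(double intensity, double peakFlopsPerSec,
                             double peakBytesPerSec) {
    return std::min(peakFlopsPerSec, intensity * peakBytesPerSec);
}

// Ridge point: the intensity (FLOP/byte) at which the two ceilings meet.
double ridgePoint(double peakFlopsPerSec, double peakBytesPerSec) {
    return peakFlopsPerSec / peakBytesPerSec;
}

// Below the ridge point the memory-bandwidth ceiling is the limiter.
bool isMemoryBound(double intensity, double peakFlopsPerSec,
                   double peakBytesPerSec) {
    return intensity < ridgePoint(peakFlopsPerSec, peakBytesPerSec);
}
```

For example, an FP32 SAXPY (y = a*x + y) performs 2 FLOPs per element while moving 12 bytes (read x, read y, write y). Ignoring caching, that gives an arithmetic intensity of 1/6 FLOP/byte, well below the ridge point of current GPUs, so the kernel sits under the bandwidth ceiling.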
USEFUL LINKS
Web: https://developer.nvidia.com/tools-overview
How to contact us?
Forums: https://forums.developer.nvidia.com/c/development-tools
Email: [email protected]
Other digital GTC talks of interest:
S21351: Scaling the Transformer Model Implementation in PyTorch Across Multiple Nodes (https://developer.nvidia.com/gtc/2020/video/s21351)
S21547: Rebalancing the Load: Profile-Guided Optimization of the NAMD Molecular Dynamics Program for Modern GPUs using Nsight Systems (https://developer.nvidia.com/gtc/2020/video/s21547)
S21771: Optimizing CUDA Kernels in HPC Simulation and Visualization Codes using Nsight Compute
S21565: Roofline Performance Model for HPC and Deep-Learning Applications (https://developer.nvidia.com/gtc/2020/video/s21565)