NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Deep Learning at NVIDIA
Peter Messmer ([email protected])
April 29, 2016
The Big Bang in Machine Learning
“Google’s AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs… And it depends on these chips more than the larger tech universe realizes.”
[Figure: the convergence of DNNs, GPUs, and big data]
DEEP LEARNING EVERYWHERE
- INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation
- MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real-Time Translation
- AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Traffic Sign Recognition
- SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery
- MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery
Tesla Accelerated Computing Platform
Focused on Co-Design from Top to Bottom
Co-design spans the full stack: application, middleware, system software, large systems, and processor. It combines a productive programming model & tools, expert co-design, and accessibility.
A fast GPU engineered for high throughput, paired with a strong CPU.
[Figure: peak TFLOPS of NVIDIA GPUs (M1060, M2090, K20, K40, K80) vs. x86 CPUs, 2008-2014]
INTRODUCING TESLA P100
New GPU Architecture to Enable the World's Fastest Compute Node
- Pascal Architecture: highest compute performance
- NVLink: GPU interconnect for maximum scalability
- CoWoS HBM2: unifying compute & memory in a single package
- Page Migration Engine: simple parallel programming with virtually unlimited memory
[Figure: node topology with two CPUs, PCIe switches, NVLink-connected Tesla P100 GPUs, and Unified Memory]
Giant Leaps in Everything
- Pascal Architecture: 21 teraflops of FP16 for deep learning
- NVLink: 5x GPU-GPU bandwidth
- CoWoS HBM2 stacked memory: 3x higher bandwidth for massive data workloads
- Page Migration Engine: virtually unlimited memory space
[Figure: four bar charts comparing K40, M40, and P100: teraflops (FP32/FP16), bidirectional NVLink bandwidth (GB/s), memory bandwidth (GB/s), and addressable memory (GB)]
NVIDIA DGX-1: The World's First Deep Learning Supercomputer
- 170 TFLOPS FP16
- 8x Tesla P100 16 GB
- NVLink Hybrid Cube Mesh
- Accelerates major AI frameworks
- Dual Xeon
- 7 TB SSD deep learning cache
- Dual 10 GbE, quad InfiniBand 100 Gb
- 3U, 3200 W
NVIDIA Deep Learning SDK
High-Performance GPU Acceleration for Deep Learning
- APPLICATIONS: computer vision (image classification, object detection), speech and audio (voice recognition, translation), behavior (recommendation engines, sentiment analysis)
- FRAMEWORKS: e.g., Mocha.jl
- DEEP LEARNING SDK: deep learning (cuDNN), math libraries (cuBLAS, cuSPARSE, cuFFT), multi-GPU (NCCL)
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Configure a DNN, process data, monitor training progress, visualize layers, and test images.
developer.nvidia.com/digits
Good Time for the Confluence of HPC and DL
- CSCS working on containers for WLCG data
- CSCS to upgrade Piz Daint to Pascal GPUs
- Simple process for basic applications
nvGRAPH: Accelerated Graph Analytics
- High-performance graph analytics, delivering results up to 3x faster than CPU-only
- Solves graphs with up to 2.5 billion edges on a single M40
- Accelerates a wide range of graph analytics apps:
  - PageRank: search, recommendation engines, social ad placement
  - Single-Source Shortest Path: robotic path planning, power network planning, logistics & supply chain planning
  - Single-Source Widest Path: IP routing, chip design / EDA, traffic-sensitive routing
developer.nvidia.com/nvgraph
[Figure: nvGRAPH 3x speedup in iterations/s, PageRank on a 1.5B-edge Twitter dataset, nvGRAPH on M40 vs. 48-core Xeon E5. CPU system: 4U server with 4x 12-core Xeon E5-2697 (30 MB cache, 2.70 GHz), 512 GB RAM]
OPENACC: More Science, Less Programming
- SIMPLE: minimal effort, small code modifications
- POWERFUL: up to 10x faster application performance
- PORTABLE: optimize once, run on GPUs and CPUs
- FREE FOR ACADEMIA

main()
{
    <serial code>
    #pragma acc kernels  // automatically runs on GPU
    {
        <parallel code>
    }
}