NVIDIA PLATFORM FOR AI

João Paulo Navarro, Solutions Architect

LinkedIn: https://www.linkedin.com/in/jppnavarro/

I AM AI

https://www.youtube.com/watch?v=GiZ7kyrwZGQ

NVIDIA

GPU Computing

Gaming VR AI & HPC Self-Driving Cars

GPU COMPUTING AT THE HEART OF AI

Big Bang of Modern AI

[Figure: 40 Years of CPU Trend Data. Log-scale performance (10^3 to 10^7) versus year (1980 to 2020). Single-threaded perf: 1.5X per year, later 1.1X per year. GPU-Computing perf: 1.5X per year, 1000X by 2025.]

Performance Beyond Moore’s Law

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
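
The per-year rates quoted on this chart compound, which is why a gap between 1.1X and 1.5X per year grows so quickly. A minimal sketch of that arithmetic follows; the function names and the 10-year horizon are illustrative assumptions, not part of the slide.

```python
import math


def compound_growth(rate_per_year: float, years: float) -> float:
    """Total speed-up after `years` of growth at `rate_per_year` (e.g. 1.5)."""
    return rate_per_year ** years


def years_to_reach(target: float, rate_per_year: float) -> float:
    """Years of compounding needed to reach a total speed-up of `target`."""
    return math.log(target) / math.log(rate_per_year)


if __name__ == "__main__":
    # Hypothetical 10-year horizon, chosen only for illustration.
    for rate in (1.1, 1.5):
        print(f"{rate}X per year over 10 years: {compound_growth(rate, 10):.1f}X")
    print(f"Years to reach 1000X at 1.5X per year: {years_to_reach(1000, 1.5):.1f}")
```

Under these assumptions, ten years at 1.1X per year gives roughly 2.6X, ten years at 1.5X per year gives roughly 57.7X, and reaching 1000X at 1.5X per year takes about 17 years.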

AlexNet

CAMBRIAN EXPLOSION

Convolutional Networks Recurrent Networks

Generative Adversarial Networks

Reinforcement Learning

There is a Cambrian explosion of neural networks. Since AlexNet, thousands of new models have emerged. With hundreds of layers and billions of parameters, their complexity has soared by 500X in just 5 years. The hyperscale datacenters that host them serve billions of people, cost billions to operate, and are among the most complex computers the world has ever made. Maintaining great quality of service while minimizing cost is incredibly difficult. Jensen helps us remember with PLASTER.

PROGRAMMABILITY

LATENCY

ACCURACY

SIZE

THROUGHPUT

RATE OF LEARNING

ENERGY EFFICIENCY

Convolutional Networks

Recurrent Networks

Generative Adversarial Networks

Reinforcement Learning

New Species

REVOLUTIONARY AI PERFORMANCE

Performance up to 100 CPUs

21 billion transistors – 5120 CUDA cores

New Tensor Core architecture inspired by the demands of deep learning

Volta is the Most Advanced Data Center GPU Ever Built

MAXIMIZING PERFORMANCE ON VOLTA

GPU Generational Training Scaling

[Figure: bar chart comparing K80 and V100 Tensor Core, vertical axis from 0 to 12.]

ResNet-152 Training, 8x K80 (16 GPUs total) compared with 8x V100 NVLink GPUs using NVIDIA 17.10 containers

Greater Than 10x Performance K80 vs. V100

DEEP LEARNING

AI AND DEEP LEARNING

NVIDIA AI PLATFORM

NVIDIA GPU Cloud

NVIDIA AI Inference

TITAN V

Every Cloud

Every Computer Maker

Tesla V100

DGX-1 and DGX Station

Announcing NEW 32GB

2X

DEEP LEARNING SOFTWARE

https://developer.nvidia.com/deep-learning

WHAT IS THE BEST DEEP LEARNING FRAMEWORK?

DL FRAMEWORKS

How to choose?

Jeff Dean and François Chollet of Google have shared statistics on DL framework adoption.

https://developer.nvidia.com/deep-learning-frameworks


INFERENCE

AI INFERENCING AT THE SPEED OF LIGHT

https://www.youtube.com/watch?v=-4ug6QFHpUM

THE BRAIN OF AI CARS

NVIDIA DRIVE™: scalable AI platform for the entire range of autonomous driving

320+ companies have adopted DRIVE, for data centers and in vehicles

Includes automakers and suppliers, mapping and sensor companies, startups, and research orgs

NVIDIA DRIVE AUTOMOTIVE PERCEPTION

https://www.youtube.com/watch?v=D1jds-KxXJA

NVIDIA TENSORRT PROGRAMMABLE INFERENCE ACCELERATOR

TESLA V100

DRIVE PX 2

TESLA P4

JETSON TX2

NVIDIA DLA

TensorRT

Frameworks Platforms

TENSORRT

https://www.youtube.com/watch?v=hTwOjXC_mQI

NVIDIA TENSORRT

10X BETTER DATA CENTER TCO

160 CPU Servers

45,000 Images / Second

65 kW

1 NVIDIA HGX with 8 Tesla V100 GPUs

45,000 Images / Second

3 kW

1/6 the Cost | 1/20 the Power

4 Racks in a Box
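
The power comparison on these two slides can be checked with a few lines of arithmetic. The sketch below uses only the throughput and power figures quoted above; the `Deployment` class and its field names are illustrative assumptions, and cost is left out because the slides give no absolute prices.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """One inference deployment, described by the figures quoted on the slides."""
    name: str
    images_per_second: float
    power_kw: float

    def images_per_second_per_kw(self) -> float:
        """Throughput delivered per kilowatt of power."""
        return self.images_per_second / self.power_kw


# Figures taken from the slides: same throughput, very different power draw.
cpu = Deployment("160 CPU servers", 45_000, 65)
gpu = Deployment("1 NVIDIA HGX with 8 Tesla V100 GPUs", 45_000, 3)

# How many times more power the CPU deployment draws for the same throughput.
power_ratio = cpu.power_kw / gpu.power_kw

if __name__ == "__main__":
    for d in (cpu, gpu):
        print(f"{d.name}: {d.images_per_second_per_kw():.0f} images/s per kW")
    print(f"CPU-to-GPU power ratio: {power_ratio:.1f}x")
```

The ratio works out to about 21.7x, which the slide rounds to "1/20 the Power".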

TENSORRT - NVIDIA AI INFERENCE

30M HYPERSCALE SERVERS

Releases (timeline: Sept '16, Apr '17, Sept '17, Apr '18):
TensorRT: CNNs
TensorRT 2: INT8
TensorRT 3: Tensor Core
TensorRT 4: TensorFlow Integration, Kaldi Optimization

ONNX

WinML

Workloads:
IMAGE / VIDEO: CNN
NLP: RNN
RECOMMENDER: MLP-NCF
ASR: RNN++
SPEECH SYNTH: DGN, S2S

Speed-ups:
190X IMAGE / VIDEO: ResNet-50 with TensorFlow Integration
50X NLP: GNMT
45X RECOMMENDER: Neural Collaborative Filtering
36X SPEECH SYNTH: WaveNet
60X ASR: DeepSpeech 2 DNN

All speed-ups are chip-to-chip CPU to GV100.

BIG DATA & ANALYTICS

DATA DELUGE TO DATA HUNGRY

INCREASING DATA VARIETY

Search Marketing

Behavioral Targeting

Dynamic Funnels

User Generated Content

Mobile Web

SMS/MMS

Sentiment

HD Video

Speech To Text

Product/Service Logs

Social Network

Business Data Feeds

User Click Stream

Sensors Infotainment Systems

Wearable Devices

CyberSecurity Logs

Connected Vehicles

Machine Data

IoT Data

Dynamic Pricing

Payment Record

Purchase Detail

Purchase Record

Support Contacts

Segmentation

Offer Details

Web Logs

Offer History

A/B Testing

BUSINESS PROCESS

PETABYTES

TERABYTES

GIGABYTES

EXABYTES

ZETTABYTES

Streaming Video

Natural Language Processing

WEB

DIGITAL

AI

WORKAROUNDS ARE NOT THE ANSWERS

EXPLORE THE OUTLIERS AND LONG-TAIL EVENTS

Pre-aggregation struggles at scale

RELY ON ACCURATE DATA

Scale out on CPU infrastructure has tremendous hidden costs

SCALE WITH A ROI

Sampling misses the whole picture
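
The claim that sampling misses outliers and long-tail events can be made concrete with a short probability calculation. The sketch below assumes uniform sampling without replacement; the function name and the record counts are hypothetical, chosen only to illustrate the effect.

```python
def prob_sample_misses_all(population: int, outliers: int, sample_size: int) -> float:
    """Probability that a uniform sample without replacement contains no outlier.

    Equals C(population - outliers, sample_size) / C(population, sample_size),
    computed as a running product to avoid very large intermediate integers.
    """
    if not 0 <= outliers <= population or not 0 <= sample_size <= population:
        raise ValueError("outliers and sample_size must lie between 0 and population")
    prob = 1.0
    for i in range(outliers):
        # Chance that the i-th outlier also falls outside the sample.
        prob *= max(population - sample_size - i, 0) / (population - i)
    return prob


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 records and a 1% sample.
    for rare_events in (1, 10, 100):
        p = prob_sample_misses_all(1_000_000, rare_events, 10_000)
        print(f"{rare_events} rare events: sample misses all of them with p = {p:.3f}")
```

With a single rare event in a million records, a 1% sample misses it 99% of the time; only a full scan of the data is guaranteed to see it.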

NVIDIA ACCELERATED ANALYTICS

GPUs in the Data Center

AI-ACCELERATE

VISUALIZE

ANALYZE

GPU FOR ANALYTICS SOLUTIONS + ARCHITECTURES

Spark Scheduler

CORE TECHNOLOGIES

GPU-ACCELERATED DATA CENTER

ACCELERATED VISUALIZATION

ACCELERATED DATABASES

DEEP LEARNING

Cloud

NVIDIA DGX Products

CORE TECHNOLOGIES

TRADITIONAL DATA CENTER

VISUALIZATION

DATABASES

NVIDIA Tesla GPUs

Mesos

GPU-ACCELERATION HAS NO LIMITS

MapD

BlazeGraph

Kinetica

SQream

[Figure: Aggregate of queries, time (s), less is better. Leading In-Memory DB: > 50x slower. NoSQL DBs: > 100x slower.]

[Figure: Speed-up over baseline Spark CPU configuration (higher is faster). Configurations: 1.98B Edges Spark CPU Baseline (1); 700M Edges Single Node, Xeon 2650 vs 2 K80; 1.98B Edges 16 EC2 r3.xlarge vs 16 K40s; 1.98B Edges 16 EC2 r3.4xlarge vs 16 K40s2. Values shown: 1403, 1843, 700.]

GPUs 700X-800X faster than graphs in all cases

MAPD: GPU Accelerated Database

ML ACROSS INDUSTRIES

Finance Healthcare Telco

GPU ACCELERATED ML AND BIG DATA

gpuopenanalytics.com



H2O4GPU PERFORMANCE

GLM XGBoost K-Means

40x 10x 5x

NVIDIA VOLTA IN EVERY CLOUD, EVERY DATACENTER

NVIDIA GPU CLOUD

Optimized Stacks for Every Cloud

20,000+ Registered Organizations | 30 Containers

NOW on AWS, GCP, AliCloud, Oracle Cloud, DGX

https://ngc.nvidia.com/

HOW TO START?

Develop on GeForce, Deploy on Tesla

GeForce: Start development using GeForce

Cloud: Scale out on cloud

Data Center: Deploy on data center

https://developer.nvidia.com/






INCEPTION PROGRAM

https://www.nvidia.com/en-us/deep-learning-ai/startups/



Date post:	24-Jun-2020