
NVIDIA PLATFORM FOR AI

Date post: 24-Jun-2020
Page 1

João Paulo Navarro, Solutions Architect

NVIDIA PLATFORM FOR AI - LinkedIn

Page 2

HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ

I AM AI

Page 3

NVIDIA GPU Computing: Gaming, VR, AI & HPC, Self-Driving Cars

Page 4

GPU COMPUTING AT THE HEART OF AI

Big Bang of Modern AI

[Figure: Performance Beyond Moore's Law. 40 years of CPU trend data on a log scale (10^3 to 10^7), 1980 to 2020. Single-threaded CPU performance grew 1.5X per year, then slowed to 1.1X per year; GPU-computing performance continues at 1.5X per year, projected 1000X by 2025.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
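The two growth rates on the chart compound, so the gap between the curves widens geometrically. The sketch below assumes, purely for illustration, that both curves start at the same level and grow at constant rates of 1.5X and 1.1X per year; the real curves do neither, so this only shows how quickly a compounding gap builds and is not how the 1000X projection was derived. The function names are hypothetical.

```python
import math

# Growth rates read off the chart; constant compounding is an assumption.
GPU_RATE = 1.5  # GPU-computing performance, per year
CPU_RATE = 1.1  # single-threaded CPU performance, per year (recent trend)

def gap_after(years: float) -> float:
    """GPU/CPU performance ratio after `years`, starting from parity."""
    return (GPU_RATE / CPU_RATE) ** years

def years_to_gap(target: float) -> float:
    """Years of steady compounding needed for the ratio to reach `target`."""
    return math.log(target) / math.log(GPU_RATE / CPU_RATE)

print(f"Gap after 10 years: {gap_after(10):.1f}x")
print(f"Years to a 1000x gap: {years_to_gap(1000):.1f}")
```

Under these assumptions the ratio grows about 1.36X per year and passes 1000X after roughly 22 years.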

Page 5

AlexNet

Page 6

CAMBRIAN EXPLOSION

Convolutional Networks, Recurrent Networks, Generative Adversarial Networks, Reinforcement Learning

Page 7

There is a Cambrian explosion of neural networks. Since AlexNet, thousands of new models have emerged. With hundreds of layers and billions of parameters, their complexity has grown 500X in just five years. The hyperscale data centers that host them serve billions of people, cost billions of dollars to operate, and are among the most complex computers ever built. Maintaining high quality of service while minimizing cost is extremely difficult. NVIDIA CEO Jensen Huang summarizes the challenge with the acronym PLASTER:

PROGRAMMABILITY

LATENCY

ACCURACY

SIZE

THROUGHPUT

ENERGY EFFICIENCY

RATE OF LEARNING

[Figure: New Species: Convolutional Networks, Recurrent Networks, Generative Adversarial Networks, Reinforcement Learning]

Page 8

REVOLUTIONARY AI PERFORMANCE

Performance of up to 100 CPUs

21 billion transistors, 5,120 CUDA cores

New Tensor Core architecture inspired by the demands of deep learning

Volta is the Most Advanced Data Center GPU Ever Built
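Each Volta Tensor Core performs a 4x4 matrix multiply-accumulate, D = A x B + C, with FP16 inputs and FP32 accumulation. The pure-Python sketch below models that arithmetic on a single tile to show where precision is lost. It is a numerical model only: Tensor Cores are actually used through CUDA libraries such as cuBLAS and cuDNN or the WMMA API, real hardware may group and round partial sums differently, and the helper names here are hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_4x4(a, b, c):
    """Model one 4x4 tile of D = A x B + C: FP16 inputs, FP32 accumulation."""
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # The product of two FP16 values fits exactly in FP32;
                # only the running sum is rounded.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[0.1 * (i + j + 1) for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]

d = mma_4x4(identity, b, zeros)
print(d[0][0])  # 0.0999755859375: FP16 cannot represent 0.1 exactly
```

Multiplying by the identity makes the loss visible: the output is the FP16-rounded input, because the rounding happens when A and B are loaded, not during accumulation.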

Page 9

MAXIMIZING PERFORMANCE ON VOLTA

GPU Generational Training Scaling

[Figure: relative training performance (0 to 12) for K80 and V100 Tensor Core.]

Greater than 10x performance, K80 vs. V100

ResNet-152 training, 8x K80 (16 GPUs total) compared with 8x V100 NVLink GPUs using NVIDIA 17.10 containers

Page 10

DEEP LEARNING

Page 11

AI AND DEEP LEARNING

Page 12

NVIDIA AI PLATFORM

NVIDIA GPU Cloud | NVIDIA AI Inference | TITAN V | Every Cloud | Every Computer Maker | Tesla V100 | DGX-1 and DGX Station

Announcing NEW 32GB (2X)

Page 13

DEEP LEARNING SOFTWARE

developer.nvidia.com/deep-learning

Page 14

WHAT IS THE BEST DEEP LEARNING FRAMEWORK?

Page 15

DL FRAMEWORKS: How to choose?

Jeff Dean and François Chollet of Google have shared statistics on deep learning framework adoption.

Page 16

DL FRAMEWORKS: How to choose?

https://developer.nvidia.com/deep-learning-frameworks

Page 17

DL FRAMEWORKS: How to choose?

https://developer.nvidia.com/deep-learning-frameworks

Page 18

INFERENCE

Page 19

AI INFERENCING AT THE SPEED OF LIGHT

HTTPS://WWW.YOUTUBE.COM/WATCH?V=-4UG6QFHPUM

Page 20

THE BRAIN OF AI CARS

NVIDIA DRIVE™: a scalable AI platform for the entire range of autonomous driving

320+ companies have adopted DRIVE, in data centers and in vehicles, including automakers and suppliers, mapping and sensor companies, startups, and research organizations

Page 21

NVIDIA DRIVE AUTOMOTIVE PERCEPTION

HTTPS://WWW.YOUTUBE.COM/WATCH?V=D1JDS-KXXJA

Page 22

NVIDIA TENSORRT: PROGRAMMABLE INFERENCE ACCELERATOR

[Figure: models from Frameworks are optimized by TensorRT and deployed to Platforms: Tesla V100, DRIVE PX 2, Tesla P4, Jetson TX2, NVIDIA DLA.]

Page 23

TENSORRT

HTTPS://WWW.YOUTUBE.COM/WATCH?V=HTWOJXC_MQI

Page 24

NVIDIA TENSORRT: 10X BETTER DATA CENTER TCO

160 CPU Servers

45,000 Images / Second

65 kW

Page 25

NVIDIA TENSORRT: 10X BETTER DATA CENTER TCO

1 NVIDIA HGX with 8 Tesla V100 GPUs

45,000 Images / Second

3 kW

1/6 the Cost | 1/20 the Power

4 Racks in a Box
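The power side of the TCO comparison can be checked with the figures on the two slides alone: 45,000 images/second from either 160 CPU servers drawing 65 kW or one HGX with 8 Tesla V100 GPUs drawing 3 kW. The cost side (1/6 the cost) cannot be reproduced because no prices are given. The constant and function names below are illustrative.

```python
# Figures from the two TCO slides: same throughput, different hardware.
THROUGHPUT = 45_000   # images / second for both configurations
CPU_SERVERS = 160
CPU_POWER_KW = 65.0
GPU_POWER_KW = 3.0    # one HGX with 8 Tesla V100 GPUs

def images_per_kw(throughput: float, power_kw: float) -> float:
    """Throughput normalized by power draw (images/second per kW)."""
    return throughput / power_kw

cpu_eff = images_per_kw(THROUGHPUT, CPU_POWER_KW)
gpu_eff = images_per_kw(THROUGHPUT, GPU_POWER_KW)
power_ratio = CPU_POWER_KW / GPU_POWER_KW
per_server = THROUGHPUT / CPU_SERVERS

print(f"CPU: {cpu_eff:.0f} img/s per kW, GPU: {gpu_eff:.0f} img/s per kW")
print(f"Power ratio: {power_ratio:.1f}x (the slide rounds this to 1/20 the power)")
print(f"Each CPU server handles about {per_server:.0f} img/s")
```

The exact power ratio is about 21.7x, so "1/20 the power" is a slight understatement rather than an exaggeration.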

Page 26

TENSORRT - NVIDIA AI INFERENCE

[Figure: TensorRT timeline, Sept '16, Apr '17, Sept '17, Apr '18: TensorRT (CNNs), TensorRT 2 (INT8), TensorRT 3 (Tensor Core), TensorRT 4 (TensorFlow Integration, Kaldi Optimization, ONNX, WinML). Workloads: Image / Video (CNN), NLP (RNN), Recommender (MLP-NCF), ASR (RNN++), Speech Synth (DGN, S2S). 30M hyperscale servers.]

All speed-ups are chip-to-chip, CPU to GV100:

Image / Video, ResNet-50 with TensorFlow Integration: 190X

NLP, GNMT: 50X

Recommender, Neural Collaborative Filtering: 45X

Speech Synth, WaveNet: 36X

ASR, DeepSpeech 2 DNN: 60X
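TensorRT 2 added INT8 inference. The sketch below shows the basic idea behind symmetric INT8 quantization: a single per-tensor scale maps FP32 values onto the integer range [-127, 127]. Assumptions: the scale is chosen here by simple max-abs calibration, whereas TensorRT's own calibrators pick it differently (for example, by entropy calibration over a representative dataset); the function names are hypothetical and are not TensorRT API.

```python
def int8_scale(values):
    """Symmetric per-tensor scale that maps max |x| onto 127 (max-abs calibration)."""
    amax = max(abs(v) for v in values)
    return amax / 127.0 if amax > 0 else 1.0

def quantize(values, scale):
    """FP32 -> INT8: divide by the scale, round to nearest, clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q, scale):
    """INT8 -> FP32 approximation of the original values."""
    return [x * scale for x in q]

activations = [0.5, -1.27, 0.0, 1.27, 0.031]
scale = int8_scale(activations)
q = quantize(activations, scale)
restored = dequantize(q, scale)
print(q)
print(max(abs(a - r) for a, r in zip(activations, restored)))
```

With max-abs calibration the round-trip error stays within half a quantization step, but a single outlier stretches the scale and wastes resolution on every other value, which is why calibrators that clip outliers are often preferred.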

Page 27

BIG DATA & ANALYTICS

Page 28

DATA DELUGE TO DATA HUNGRY

[Figure: Increasing data variety, with data volumes growing from gigabytes through terabytes, petabytes and exabytes to zettabytes across the Business Process, Web, Digital and AI eras. Data sources shown: Purchase Detail, Purchase Record, Payment Record, Offer Details, Offer History, Support Contacts, Segmentation, Web Logs, A/B Testing, Dynamic Pricing, Dynamic Funnels, Search Marketing, Behavioral Targeting, User Generated Content, User Click Stream, Mobile Web, SMS/MMS, Sentiment, HD Video, Streaming Video, Speech To Text, Natural Language Processing, Product/Service Logs, Social Network, Business Data Feeds, Sensors, Infotainment Systems, Wearable Devices, CyberSecurity Logs, Connected Vehicles, Machine Data, IoT Data.]

Page 29

WORKAROUNDS ARE NOT THE ANSWERS

EXPLORE THE OUTLIERS AND LONG-TAIL EVENTS | RELY ON ACCURATE DATA | SCALE WITH A ROI ($)

Sampling misses the whole picture. Pre-aggregation struggles at scale. Scale out on CPU infrastructure has tremendous hidden costs.

Page 30

NVIDIA ACCELERATED ANALYTICS: GPUs in the Data Center

AI-ACCELERATE | VISUALIZE | ANALYZE

Page 31

GPU FOR ANALYTICS SOLUTIONS + ARCHITECTURES

[Figure: a traditional data center (core technologies, visualization, databases) compared with a GPU-accelerated data center (core technologies, accelerated visualization, accelerated databases, deep learning) running on NVIDIA Tesla GPUs, NVIDIA DGX products and the cloud, with Spark Scheduler and Mesos.]

Page 32

GPU-ACCELERATION HAS NO LIMITS

[Figure panels: MapD, BlazeGraph, Kinetica, SQream.]

Leading in-memory DB: > 50x slower. NoSQL DBs: > 100x slower.

SQream, aggregate of queries, time (s), less is better: 1403, 1843, 700.

Speed-up over baseline Spark CPU configuration (higher is faster): GPUs 700X-800X faster than graphs in all cases. Configurations: 1.98B edges, Spark CPU baseline (1); 700M edges, single node, Xeon 2650 vs. 2 K80; 1.98B edges, 16 EC2 r3.xlarge vs. 16 K40s; 1.98B edges, 16 EC2 r3.4xlarge vs. 16 K40s.

Page 33

GPU-ACCELERATION HAS NO LIMITS: MapD

Page 34

MAPD: GPU-Accelerated Database

Page 35

ML ACROSS INDUSTRIES

Finance, Healthcare, Telco

Page 36

GPU ACCELERATED ML AND BIG DATA

gpuopenanalytics.com

Page 37

H2O4GPU PERFORMANCE

GLM: 40x | XGBoost: 10x | K-Means: 5x

Page 38

NVIDIA VOLTA IN EVERY CLOUD, EVERY DATACENTER

Page 39

NVIDIA GPU CLOUD: Optimized Stacks for Every Cloud

20,000+ Registered Organizations | 30 Containers

NOW on AWS, GCP, AliCloud, Oracle Cloud, DGX

Page 40

HOW TO START? Develop on GeForce, Deploy on Tesla

GeForce: Start development using GeForce

Cloud: Scale out on cloud

Data Center: Deploy on data center

Page 42

INCEPTION PROGRAM

https://www.nvidia.com/en-us/deep-learning-ai/startups/

Page 43

João Paulo Navarro, Solutions Architect

NVIDIA PLATFORM FOR AI - LinkedIn

