Date post: 25-May-2020
Building the Adaptable, Intelligent World Ivo Bolsens Senior Vice President & Chief Technology Officer © Copyright 2018 Xilinx
Transcript
Page 1: Building the Adaptable, Intelligent World...Title Building the Adaptable, Intelligent World Author Ivo Bolsens Created Date 12/19/2018 5:43:36 PM

Building the Adaptable, Intelligent World

Ivo Bolsens, Senior Vice President & Chief Technology Officer

© Copyright 2018 Xilinx

Page 2

This is the Era of Heterogeneous Compute

One Architecture Can’t Do It Alone

Mountains of Unstructured Data

Page 3

Today’s Developer Needs

Performance for a Diverse Range of Applications

Software Programmability

Adaptability to Keep Pace with Rapid Innovation

Page 4

Enter the ACAP: A New Class of Devices for Today’s Challenges

[Figure: Software Programmability (vertical axis) versus Device Category (horizontal axis); devices shown: FPGA, SoC, MPSoC, RFSoC, ACAP]

Page 5

ACAP: Adaptive Compute Acceleration Platform

Page 6

Compute Acceleration

Adaptable Engines

Scalar Engines

AI Engines

Page 7

The Industry’s First ACAP

Heterogeneous

Scalable

Parallel

SW Programmable

HW Adaptable

Page 8

Network-on-Chip (NoC)

Ease of Use
Inherently Software Programmable
Available at Boot, No Place-and-Route Required

High Bandwidth and Low Latency
Multi-Terabit/Sec Throughput
Guaranteed QoS

Power Efficiency
8x Power Efficiency vs. Soft Implementations
Arbitration Across Heterogeneous Engines

Page 9

AI Engines

[Figure: four AI Engine tiles, each pairing a VECTOR CORE with a MEMORY block]

High Throughput, Low Latency, and Power Efficient

Ideal for AI Inference and Advanced Signal Processing

Page 10

Platform for Any Developer

VERSAL

Adaptable for Any Application

Software Programmable

Heterogeneous Platform

User Application: C, C++, Python

Application-Specific Frameworks: Machine Learning | Video | Genomics | Search | Financial Modeling | Database

New Unified Software Development Environment

C, Xilinx Libraries | Xilinx & Ecosystem HW Libraries | Custom HW | OS & Embedded Run-Time

Intelligent Engines | Adaptable Engines | Scalar Engines

Page 11

Versal Multi-Market Platform

CLOUD | EDGE | NETWORK

Wired | Wireless | Endpoints | Data Center

Page 12

VERSAL AI Core Series

For 5G Beamforming & CloudRAN

AI Engines Provide >5X Compute Density for Advanced Wireless Compute

Page 13

Versal Multi-Market Platform

AI ADOPTION ACROSS MARKETS

CLOUD | EDGE | NETWORK

Wired | Wireless | Endpoints | Data Center

Page 14

Projected Growth in AI Inference

[Figure: TAM ($B) by year, 2016 to 2023, for Training, Data Center Inference, and Edge Inference; vertical axis ticks at 10, 20, 30]

[Figure: Inference: Unlabeled Data, Model, Estimate (“Dog”)]

Source: Barclays Research, Company Reports May 2018

Page 15

Challenges

The Rate of AI Innovation

Performance at Low Latency

Low Power Consumption

Whole App Acceleration

[Figure: Inference: Unlabeled Data, Model, Estimate (“Dog”)]

Page 16

The Rate of AI Model Innovation

DIVERSE MODELS OVER A BROAD RANGE OF APPLICATIONS

CNN | RNN, LSTM | MLP

APPLICATIONS: Classification | Object Detection | Segmentation | Speech Recognition | Recommendation Engine | Anomaly Detection

Page 17

The Rate of AI Model Innovation: Classification

[Figure: Top-1 Accuracy (1%) for Classification, 2012 to 2018; vertical axis ticks at 50, 60, 70, 80]

Source:
https://arxiv.org/pdf/1605.07678.pdf
https://arxiv.org/pdf/1608.06993.pdf
https://arxiv.org/pdf/1709.01507.pdf
https://arxiv.org/pdf/1611.05431.pdf

Page 18

Rate of Innovation Outpaces Silicon Cycles

[Figure: AlexNet, GoogLeNet, and DenseNet shown against the Silicon Design Cycle (time); markers: Start Design, Production Design]

Page 19

Low Latency is Critical for Inference

CPU/GPU: High Throughput OR Low Latency

FPGA: High Throughput AND Low Latency

[Figure: Inputs 1 to 4 producing Results 1 to 4 on CPU/GPU and on FPGA; latency labels: 3ms Latency Response, 50ms Latency Response]

Page 20

Inference Moving to Lower Precision

[Figure: inference precision by year, 2012 to 2020; precisions shown: FP32, FP/INT16, INT8, INT6, INT4, INT2, INT1, INT32]

RELATIVE ENERGY COST

Operation: Energy (pJ)
8b Add: 0.03
16b Add: 0.05
32b Add: 0.1
16b FP Add: 0.4
32b FP Add: 0.9

Source: Bill Dally (Stanford), Cadence Embedded Neural Network Summit, February 1, 2017
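To make the energy table easier to read, the short Python sketch below expresses each operation's energy as a multiple of an 8-bit add. The picojoule values are copied from the slide; the names `ENERGY_PJ` and `relative_cost` are illustrative and do not come from the deck.

```python
# Energy per operation in picojoules, copied from the slide's table.
ENERGY_PJ = {
    "8b Add": 0.03,
    "16b Add": 0.05,
    "32b Add": 0.1,
    "16b FP Add": 0.4,
    "32b FP Add": 0.9,
}


def relative_cost(op: str, baseline: str = "8b Add") -> float:
    """Return how many times more energy `op` uses than `baseline`."""
    return ENERGY_PJ[op] / ENERGY_PJ[baseline]


if __name__ == "__main__":
    # Print every operation as a multiple of the 8-bit integer add.
    for op in ENERGY_PJ:
        print(f"{op:>10}: {relative_cost(op):5.1f}x the energy of an 8b add")
```

On these figures a 32-bit floating-point add costs roughly 30 times the energy of an 8-bit integer add, which is the motivation the slide gives for moving inference to lower precision.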

Page 21

Reduced Precision Arithmetic

[Figure: ResNet-50L ImageNet Top5 Error (%) vs. Hardware Cost (LUT + 100*DSP), log-scale cost axis from 1.00 to 1000.00; series: Float, Direct Quantization, Retrained; labeled points: 1b/2b, 2b/8b, 3b/5b, 4b/6b, 8b/8b; annotations: Retraining, Floating Point Baseline]

Notation: 3b/5b: 3-bit weights / 5-bit activations
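The chart's notation and cost metric can be written down as two small Python helpers. This is a minimal sketch: `parse_precision` and `hardware_cost` are hypothetical names, and the LUT and DSP counts in the usage lines are invented for illustration, not taken from the slide.

```python
from typing import Tuple


def parse_precision(label: str) -> Tuple[int, int]:
    """Split a label such as '3b/5b' into (weight_bits, activation_bits)."""
    weights, activations = label.split("/")
    return int(weights.rstrip("b")), int(activations.rstrip("b"))


def hardware_cost(luts: int, dsps: int) -> int:
    """Cost metric from the chart's x-axis: LUT + 100*DSP."""
    return luts + 100 * dsps


if __name__ == "__main__":
    # Labels read from the chart.
    for label in ["1b/2b", "2b/8b", "3b/5b", "4b/6b", "8b/8b"]:
        w_bits, a_bits = parse_precision(label)
        print(f"{label}: {w_bits}-bit weights, {a_bits}-bit activations")

    # Hypothetical resource counts, used only to exercise the formula.
    print(hardware_cost(luts=20_000, dsps=50))
```

Weighting each DSP slice as 100 LUTs gives a single number for comparing designs that trade DSP usage against LUT usage.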

Page 22

Custom Data Flow

Custom Memory Hierarchy

Custom Precision

Need for Adaptable Hardware

Domain Specific Architectures (DSAs) on Adaptable Platforms

Page 23

AI Inference Acceleration

GoogLeNet v1

Low Latency: Xilinx’s Unique Advantage

[Figure: CNN Performance(4) for High-End CPU(1), High-End GPU(2), and Xilinx(3) Leveraging AI Engines; bar labels: 43X vs. CPU, vs. CPU, 2X vs. GPU; latency classes: Sub-7ms Latency, Sub-2ms Latency, Latency Tolerant Inference]

Majority of Adaptable & Scalar Engines Available for Whole App Acceleration

(1) Measured on EC2 Xeon Platinum 8124 Skylake, c5.18xlarge AWS instance, Intel Caffe: https://github.com/intel/caffe
(2) V100 results taken from Oct 9th updates on www.Nvidia.com
(3) Versal Core Series
(4) GoogLeNet V1 throughput (Img/sec)

Page 24

AI Inference Acceleration

GoogLeNet v1

Low Latency: Xilinx’s Unique Advantage

[Figure: CNN Performance(4) for High-End CPU(1), High-End GPU(2), and Xilinx(3) Leveraging AI Engines at Sub-7ms Latency; bar labels: 72X vs. CPU, 2X vs. GPU]

Majority of Adaptable & Scalar Engines Available for Whole App Acceleration

Page 25

AI Inference Acceleration

GoogLeNet v1

Low Latency: Xilinx’s Unique Advantage

[Figure: CNN Performance(4) for High-End GPU(2) and Xilinx(3) Leveraging AI Engines at Sub-2ms Latency; bar label: 4X vs. GPU]

Majority of Adaptable & Scalar Engines Available for Whole App Acceleration

Page 26

Whole Application Acceleration: Intelligent Video Analytics

CPU-GPU (connected over PCIe)
Pipeline: H.264 Decode, Motion Analysis (OpenCV), CNN
Stage latencies: 50ms, 16ms, 16ms
Power: 75W
Throughput: 4x12 fps
Latency: 82 ms

CPU-Xilinx FPGA (connected over PCIe)
Pipeline: H.264 Decode, Motion Analysis, CNN
Stage latencies: 9.2ms, 0.9ms, 16ms
Power: 50W
Throughput: 4x38 fps
Latency: 26.1 ms
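The end-to-end latencies on this slide are the sums of the per-stage latencies, which the Python sketch below checks. Two things are assumptions rather than facts from the slide: the assignment of each latency to a named stage (decode, then motion analysis, then CNN), and the reading of "4x12 fps" as four streams at 12 frames per second each. All identifiers are illustrative.

```python
# Per-stage latencies in milliseconds, as listed on the slide.
# The stage names attached to each number are an assumed ordering.
GPU_STAGES_MS = {"H.264 Decode": 50.0, "Motion Analysis": 16.0, "CNN": 16.0}
FPGA_STAGES_MS = {"H.264 Decode": 9.2, "Motion Analysis": 0.9, "CNN": 16.0}


def end_to_end_latency_ms(stages: dict) -> float:
    """Sum the stage latencies of a sequential pipeline, rounded to 0.1 ms."""
    return round(sum(stages.values()), 1)


def aggregate_fps(streams: int, fps_per_stream: int) -> int:
    """Total frames per second, assuming 'NxM fps' means N streams at M fps."""
    return streams * fps_per_stream


if __name__ == "__main__":
    gpu_ms = end_to_end_latency_ms(GPU_STAGES_MS)
    fpga_ms = end_to_end_latency_ms(FPGA_STAGES_MS)
    print(f"CPU-GPU latency:  {gpu_ms} ms")
    print(f"CPU-FPGA latency: {fpga_ms} ms")
    print(f"Latency ratio:    {gpu_ms / fpga_ms:.2f}x")

    fps_ratio = aggregate_fps(4, 38) / aggregate_fps(4, 12)
    print(f"Throughput ratio: {fps_ratio:.2f}x")
```

The sums match the slide's stated totals of 82 ms and 26.1 ms. The 16 ms CNN stage is the same in both pipelines, so the latency reduction comes from the decode and motion-analysis stages.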

Page 27

Enabling the Development Community

Accelerated Libraries

Pruning / Compression

Compiler & Quantization Tools

Runtime

FPGAs & ACAPs

FPGA-as-a-Service | Alveo | Custom Board

Customer Models | Model Zoo

CLOUD | EDGE

Processor Overlays (DNN, LSTM, RNN, MLP)

Page 28

Platforms for Every Developer

Hardware Developers | Vivado Design Suite: RTL Full Design

Hardware-Aware Software Developers | HLS: C++ IP Functions

System Integrators | IP Integrator

Embedded Developers | MPSoC Software Environment

Data Scientists | Frameworks: Python, APIs

SaaS Developers | FaaS Platform

Application Developers | SDX: C++, OpenCL, Libraries, XRT open source runtime

Page 29

Data Center First

Page 30

Accessible
Deploy in the Cloud or On-Premises
Applications Available Now

Fast
Faster than CPUs & GPUs
Latency Advantage Over GPUs

Adaptable
Optimized for Any Workload
Adapt to Changing Algorithms

Page 31

Page 32

TOOLS | USER

Frameworks | Data Scientists and AI Developers

Libraries, Compilers, Middleware | Application Developers

Firmware and Runtime | Software Developers

Integrated Development Environment | Hardware and Software Developers

Xilinx Acceleration Platform

Page 33

Accessible: Cloud & On-Premise

Application Ecosystem

MACHINE LEARNING | HPC & LIFE SCIENCES | FINANCIAL COMPUTING | IMAGE PROCESSING | DATABASE SEARCH AND ANALYTICS | VIDEO STREAMING

On-Premises Deployment | Cloud Deployment

Page 34

Building the Adaptable, Intelligent World
