Eitan Medina, Chief Business Officer Habana Labs · Habana Labs | 3 ResNet-50 Inference Habana...

Eitan Medina, Chief Business Officer, Habana Labs

15,000 images/second on ResNet-50

Habana Labs | 3

ResNet-50 Inference

Habana HL-1000: latency 1.3 ms, 100 W

V100: latency 6 ms

Dual-Socket Platinum 8180

Scaling AI Throughput in the Data Center

45,000 images/sec, ResNet-50 (inference):
• 169 CPU servers
• 8 GPUs
• 3 AI processors, with real-time latency support
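The arithmetic behind the comparison can be sketched as follows, assuming the slide shows three ways of reaching the same 45,000 images/sec and that each HL-1000 delivers the 15,000 images/sec quoted earlier. The function name `devices_needed` is hypothetical, and the per-GPU and per-server figures are only what the slide's counts imply, not measured numbers.

```python
import math


def devices_needed(target_images_per_sec: float, per_device_images_per_sec: float) -> int:
    """Smallest whole number of devices whose combined throughput meets the target."""
    return math.ceil(target_images_per_sec / per_device_images_per_sec)


TARGET = 45_000              # images/sec, from the slide
HL1000_THROUGHPUT = 15_000   # images/sec per HL-1000, from the first slide

hl1000_count = devices_needed(TARGET, HL1000_THROUGHPUT)

# Per-unit throughput implied by the slide's counts (an inference, not a measurement).
implied_per_gpu = TARGET / 8
implied_per_cpu_server = TARGET / 169

print(f"HL-1000 cards needed: {hl1000_count}")
print(f"Implied images/sec per GPU: {implied_per_gpu:.0f}")
print(f"Implied images/sec per CPU server: {implied_per_cpu_server:.0f}")
```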


AI Performance = Throughput + Latency @ Low Batch Size

• Data center business model questions:
  • Real-time AND non-real-time?
  • Batching vs. customer SLA
  • How would you rent out hardware? Dedicated per customer or shared units?
• Throughput @ low latency → higher revenues per card for data centers
• A single AIP can concurrently service multiple topologies/clients with a real-time SLA for all
  • TDM scheme latency is hidden below the 7 msec limit
• Flexibility = $$ in the bottom line
• Lowering OPEX → lower price → expand the market for AI
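One way to picture the time-sharing claim is a simple round-robin model. This is only a sketch: the slides do not describe how the TDM scheme actually works. It assumes each client gets one slot per round, that a slot lasts as long as one batch takes (using the latencies quoted in this deck), and that a request may wait a full round in the worst case. The function `max_clients_within_limit` and the whole model are hypothetical.

```python
import math

LATENCY_LIMIT_MS = 7.0  # real-time limit quoted on the slide


def max_clients_within_limit(slot_ms: float, limit_ms: float = LATENCY_LIMIT_MS) -> int:
    """Clients that fit in one round if each occupies one slot of slot_ms.

    Assumed model: worst-case latency = clients * slot_ms, which must stay
    at or below limit_ms.
    """
    return math.floor(limit_ms / slot_ms)


# Slot lengths taken from the ResNet-50 latencies in this deck (msec).
for batch, slot_ms in ((1, 0.27), (5, 0.67), (10, 1.3)):
    clients = max_clients_within_limit(slot_ms)
    worst_case = clients * slot_ms
    print(f"batch={batch}: up to {clients} clients, worst case {worst_case:.2f} msec")
```

Under these assumptions the slot length, and therefore the batch size, sets how many clients can share one card while staying inside the limit.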

ResNet-50 Performance: Throughput & Latency (per batch size used)

                          Batch=1   Batch=5   Batch=10
Throughput (images/sec)   8,000     14,000    15,000
Latency (msec)            0.27      0.67      1.3

[Chart: throughput (images/second) and latency (msec) plotted per batch size]
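The trade-off in these figures is easier to see relative to batch size 1. The sketch below uses only the three data points from the slide; the helper `relative_to_baseline` is a hypothetical name.

```python
# (throughput in images/sec, latency in msec) per batch size, from the slide.
MEASUREMENTS = {
    1: (8_000, 0.27),
    5: (14_000, 0.67),
    10: (15_000, 1.3),
}


def relative_to_baseline(measurements: dict, baseline_batch: int = 1) -> dict:
    """Return {batch: (throughput gain, latency growth)} against the baseline batch."""
    base_throughput, base_latency = measurements[baseline_batch]
    return {
        batch: (throughput / base_throughput, latency / base_latency)
        for batch, (throughput, latency) in measurements.items()
    }


for batch, (gain, growth) in relative_to_baseline(MEASUREMENTS).items():
    print(f"batch={batch}: throughput x{gain:.2f}, latency x{growth:.2f}")
```

Going from batch 1 to batch 10 raises throughput by less than 2x while latency grows by almost 5x, which is why the deck stresses throughput at low batch size.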

Quantization Accuracy

• Mixed-precision architecture
• Accuracy-loss tolerance:
  • Controlled by the user through our software API at compile time
• ResNet-50 example:
  • INT8: negligible accuracy loss (0.4%)
  • INT16: no accuracy loss at all (but would reduce throughput)
  • Model was quantized without fine-tuning or retraining

ResNet-50 Accuracy* vs. Data Type

Data type (device)       Top-1 accuracy*   Diff vs. FP32
FP32 (GPU reference)     75.7%             n/a
INT8 (HL-1000)           75.3%             -0.4%
INT16 (HL-1000)          75.7%             0.0%

*Top-1 accuracy, higher is better
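To illustrate why INT16 loses less precision than INT8, here is a minimal sketch of symmetric per-tensor post-training quantization. It is a generic textbook scheme chosen for illustration, not Habana's quantizer or the SynapseAI API; the function names and sample weights are made up.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8, 32767 for INT16
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer representation."""
    return [q * scale for q in quantized]


def max_round_trip_error(values, num_bits):
    """Largest absolute difference after quantizing and dequantizing."""
    quantized, scale = quantize_symmetric(values, num_bits)
    restored = dequantize(quantized, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


weights = [0.82, -1.27, 0.003, 0.5, -0.41]  # illustrative values only
for bits in (8, 16):
    print(f"INT{bits}: max round-trip error = {max_round_trip_error(weights, bits):.6f}")
```

No retraining is involved: the scale is derived from the existing values, which matches the "quantized without fine-tuning or retraining" setting in spirit only.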

More Horsepower = Better Accuracy

• Instead of just one network, run an ensemble of different networks and combine the results
• Use a deeper neural network (at the same frame rate)

Network        Top-1 Error / Improvement   Top-5 Error / Improvement
ResNet-50      24.7                        7.13
ResNet-101     23.48 / 1.22%               6.44 / 0.69%

• Run inference on multiple crops and average the results
• Typical pre-processing:
  • Input image is scaled to 256 (smaller dimension)
  • Center 224x224 crop is taken
• Instead of using the center crop only, use multiple crops and average the results
  • 5 crops (upper-left, upper-right, lower-left, lower-right, center)

Network               Top-1 Error / Improvement   Top-5 Error / Improvement
ResNet-50             24.7                        7.13
ResNet-50 (5 crops)   23.51 / 1.19%               6.31 / 0.82%
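The five-crop procedure can be sketched as below, assuming the image has already been scaled so that its smaller dimension is 256. The helpers `five_crop_boxes` and `average_scores` are hypothetical names; running the network on each crop is left out, and the scores shown are placeholders rather than real model outputs.

```python
def five_crop_boxes(width, height, crop=224):
    """Return (left, top, right, bottom) boxes for the four corners and the center."""
    if width < crop or height < crop:
        raise ValueError("image is smaller than the crop size")
    right_edge = width - crop    # left coordinate of the right-hand crops
    bottom_edge = height - crop  # top coordinate of the lower crops
    center_left, center_top = right_edge // 2, bottom_edge // 2
    return {
        "upper-left": (0, 0, crop, crop),
        "upper-right": (right_edge, 0, width, crop),
        "lower-left": (0, bottom_edge, crop, height),
        "lower-right": (right_edge, bottom_edge, width, height),
        "center": (center_left, center_top, center_left + crop, center_top + crop),
    }


def average_scores(per_crop_scores):
    """Average class scores element-wise across crops."""
    count = len(per_crop_scores)
    return [sum(column) / count for column in zip(*per_crop_scores)]


boxes = five_crop_boxes(341, 256)  # e.g. a 4:3 image scaled to a height of 256
for name, box in boxes.items():
    print(name, box)

# Placeholder scores for two classes from two crops.
print(average_scores([[0.6, 0.4], [0.8, 0.2]]))
```

Each crop costs one extra inference pass, so five crops need roughly five times the throughput to hold the same frame rate, which is the "more horsepower" trade the slide describes.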

Pure AI Processor

Neural machine translation

Sentiment analysis

Image recognition

Recommendation system


Habana Labs Overview

• Founded in 2016
• Employees:
  • 120 full-time employees and contractors
• Products & Technology: AI processors for inference and training
  • 10 patent applications in the AI domain
• Locations: Tel Aviv, Israel and San Jose, CA
• Investors: Avigdor Willenz (Chairman), Bessemer, Walden (Lip-Bu Tan)
  • History of building successful companies
  • Successful integration post M&A (Galileo, Annapurna, Leaba, Nusemi …)


Management Team

David Dahan, CEO (Co-Founder)
- COO at DSP Group
- COO at Prime Sense (acquired by Apple)
- VP of Operations at CEVA

Shlomo Raikin, CTO
- Author of 45 patents
- Chief SoC Architect, Mellanox
- Project Architect, Intel

Ran Halutz, VP R&D (Co-Founder)
- Group Manager at Apple
- Director, Group Manager at Prime Sense
- VLSI Manager at CEVA

Eitan Medina, CBO
- VP, GM Fingerprint Business Unit, TDK-InvenSense
- VP, Marketing and Product Management, InvenSense Inc.
- VP of Engineering, Audience Inc.
- VP of Engineering, Consumer Products, Cavium
- VP of Cellular Engineering, Marvell
- CTO, Galileo Technology

Habana Labs Software Structure

Deep Learning Framework Deep Learning Models Exchange Format

SynapseAI API

SynapseAI

Habana Labs Library

User’s Library

Habana Labs Graph Compiler

KMD API

Kernel Mode Driver (PCIe)

SynapseAI API

SynapseAI (Run Time)

Recipe

Application / Service

Goya supports models trained on any processor (CPU, GPU, TPU, Gaudi etc.)


Goya Processor Architecture

• Tensor Processor Core (TPC™)
• VLIW SIMD vector core
• C-programmable
• GEMM operations engine
• Special functions hardware
• Tensor addressing
• Mixed-precision data types:
  • FP32, INT32, INT16, INT8, UINT32, UINT16, UINT8


Software Infrastructure and Tools

Performance Profiling Tool
• Performance analysis
• Graphical views
• Real time

Graph Compiler
• MxNet, ONNX, TensorFlow
• Python front end
• Compilation flows
• Topologies

Run-time
• C API, Python API
• Maintenance features

Kernel Mode Driver
• PCI driver
• Multi-device support
• Maintenance features

TPC Tools
• Compiler
• Assembler
• IDE: Debugger / Simulator

Rich Performance Library
• Deep learning operators

On-board Processor Software
• Debugger (Lauterbach)

[Diagram labels: SynapseAI, Host Side, Device Side]


Training and Inference = Different Requirements


Delivering Two Product Lines

Inference

Sampling Q2 2019

Training


2 Tbps scale-out

Performance scales linearly to thousands of devices

HL-1000

Thank You!

www.habana.ai


Date post:	19-Aug-2020