Accelerate your Inferencing with Intel® Deep Learning Boost
Shailen Sobhee
AI Software Technical Consultant
Audience prerequisites
▪Familiar with the deep learning stages
−Training and inferencing
▪Have a basic knowledge of the hardware
−Know what vector registers are (e.g., AVX-512)
Outline
▪What is Intel® Deep Learning Boost (Intel® DL Boost)
▪Why is Intel® DL Boost useful?
▪What are Vector Neural Network Instructions (VNNI)
▪Sample results
What is Intel® Deep Learning Boost?
Intel® DL Boost:
▪extends the AVX-512 instruction set
▪is designed to deliver significant and more efficient deep learning (inference) acceleration
▪targets deep learning workloads optimized to use the Vector Neural Network Instructions (VNNI)
▪is available on Intel® Xeon® Scalable processors
−starting from the 2nd generation (codename “Cascade Lake”)
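A quick way to check whether a CPU offers Intel® DL Boost is to test the AVX512_VNNI feature bit (CPUID leaf 7, sub-leaf 0, ECX bit 11). A minimal sketch for GCC/Clang, assuming the <cpuid.h> helper is available; not part of the original slides:

// check_vnni.cpp -- detect AVX512_VNNI via CPUID (leaf 7, sub-leaf 0, ECX bit 11)
#include <cpuid.h>
#include <cstdio>

int main() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 if the requested leaf is not supported
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 11))) {
        std::printf("AVX512_VNNI (Intel DL Boost) supported\n");
    } else {
        std::printf("AVX512_VNNI not supported\n");
    }
    return 0;
}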
Deep Learning Foundations
• Heavy compute (matrix multiplications) is the foundation of many DL applications
• Multiply row-by-column values and accumulate the products into a single value
• Traditional HPC and many AI training workloads use floating point
• Massive dynamic range of values (FP32 goes up to ~2^128)
[Diagram: Matrix multiply A [int8] × B [int8] = C [int32]]
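The core operation inside these matrix multiplications is a multiply-accumulate: element pairs are multiplied and their products summed into a wider accumulator. A minimal scalar sketch of an INT8 dot product accumulated into INT32 (the data values are hypothetical, for illustration only):

#include <cstdint>
#include <cstdio>

// Scalar INT8 dot product: multiply element pairs and accumulate into a 32-bit sum.
// This is the operation that VNNI accelerates in hardware.
int32_t dot_int8(const int8_t* a, const uint8_t* b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}

int main() {
    int8_t  a[4] = {1, -2, 3, -4};   // e.g., weights
    uint8_t b[4] = {10, 20, 30, 40}; // e.g., activations
    std::printf("dot = %d\n", dot_int8(a, b, 4));  // 1*10 - 2*20 + 3*30 - 4*40 = -100
    return 0;
}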
Convolution operation for inference
Inputs × Weights = Outputs
[Diagram: input values are multiplied by the corresponding weights (e.g., 16 × 4) and the products are accumulated into the output]
Why do we need Intel® Deep Learning Boost?
The key term:
Quantization
Here’s why Quantization matters
Floating point: 96.1924 → 32 bits (four bytes)
Integer: 96 → 8 bits (one byte)
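A common way to map FP32 values into INT8 is symmetric linear quantization: choose a scale from the observed value range, then round each value to the nearest integer step. A minimal sketch; the scale value here is an assumption for illustration, not the exact scheme of any particular tool:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Symmetric linear quantization: q = round(x / scale), clamped to the INT8 range.
int8_t quantize(float x, float scale) {
    float q = std::round(x / scale);
    q = std::max(-128.0f, std::min(127.0f, q));
    return static_cast<int8_t>(q);
}

int main() {
    // Hypothetical calibration result: values range up to ~127, so scale = 1.0
    float scale = 1.0f;
    float x = 96.1924f;
    int8_t q = quantize(x, scale);
    std::printf("FP32 %.4f -> INT8 %d (dequantized: %.1f)\n", x, q, q * scale);
    return 0;
}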
Here’s why Quantization matters
▪Lower power
▪Lower memory bandwidth
▪Lower storage
▪Higher performance
Important: the accuracy loss must remain acceptable
Image credits: see backup
VNNI instruction set
Intel® Deep Learning Boost: Optimizing AI inference
1st gen Intel® Xeon® Scalable processor (without Intel® DL Boost), FP32 path:
▪vpmadd231ps: INPUT FP32 × INPUT FP32 → OUTPUT FP32
1st gen Intel® Xeon® Scalable processor (without Intel® DL Boost), INT8 path (three instructions):
▪vpmaddubsw: INPUT INT8 × INPUT INT8 → OUTPUT INT16
▪vpmaddwd: OUTPUT INT16 × CONSTANT INT16 → OUTPUT INT32
▪vpaddd: OUTPUT INT32 + Accumulate INT32 → OUTPUT INT32
2nd gen Intel® Xeon® Scalable processor with Intel® DL Boost, INT8 path (NEW, one instruction):
▪vpdpbusd: INPUT INT8 × INPUT INT8 + Accumulate INT32 → OUTPUT INT32
Microarchitecture view: in a given clock cycle, the FMA units on Port 0 and Port 5 execute either FP32 FMA or INT8 VNNI operations.
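In compiler-intrinsics form, the two INT8 paths look roughly like the sketch below. This is a minimal illustration (compile with -mavx512bw -mavx512vnni); the function and variable names are placeholders, not code from the slides:

#include <immintrin.h>

// INT8 multiply-accumulate on 512-bit vectors: a holds unsigned 8-bit values,
// b holds signed 8-bit values, acc holds 32-bit partial sums.

// Pre-VNNI path (1st gen Xeon Scalable): three instructions.
__m512i madd_int8_legacy(__m512i acc, __m512i a, __m512i b) {
    const __m512i ones = _mm512_set1_epi16(1);
    __m512i t16 = _mm512_maddubs_epi16(a, b);   // vpmaddubsw: u8 x s8 -> s16 pairs
    __m512i t32 = _mm512_madd_epi16(t16, ones); // vpmaddwd:   s16 x 1  -> s32
    return _mm512_add_epi32(acc, t32);          // vpaddd:     accumulate into s32
}

// VNNI path (2nd gen Xeon Scalable with Intel DL Boost): one instruction.
__m512i madd_int8_vnni(__m512i acc, __m512i a, __m512i b) {
    return _mm512_dpbusd_epi32(acc, a, b);      // vpdpbusd: fused multiply-accumulate
}

Note that vpmaddubsw saturates its intermediate 16-bit results, while vpdpbusd accumulates directly into 32 bits, so the single VNNI instruction is both faster and free of intermediate saturation.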
Here’s one tool in your arsenal to do it ☺
Intel® Distribution of OpenVINO™ in a nutshell
1. Start from a trained DL model
2. The Model Optimizer converts it into an optimized Intermediate Representation (IR: .xml + .bin)
3. The Inference Engine loads the IR as a CNNNetwork (FP32/FP16)
4. Your program runs inference through a device plugin:
▪MKLDNN plugin → CPU
▪clDNN plugin → GPU
▪MyriadX plugin → Movidius
▪FPGA plugin → Arria
▪GNA plugin → GNA
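With the (pre-2022) Inference Engine C++ API, loading an IR and running it on the CPU plugin looks roughly like the sketch below; the model path and input handling are placeholders, so treat this as an outline rather than a complete sample:

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;

    // 1. Read the IR produced by the Model Optimizer (.xml + .bin).
    auto network = core.ReadNetwork("model.xml", "model.bin");

    // 2. Load the network onto the CPU plugin (MKL-DNN); on a Cascade Lake
    //    CPU, an INT8-calibrated IR can exercise VNNI under the hood.
    auto exec_network = core.LoadNetwork(network, "CPU");

    // 3. Create an inference request and run it (input blob setup omitted).
    auto request = exec_network.CreateInferRequest();
    request.Infer();

    return 0;
}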
Intel® Distribution of OpenVINO™ in a nutshell: INT8 calibration
Offline stage:
▪Start from the FP32 IR (.xml + .bin) produced by the Model Optimizer (an FP16 IR doesn’t work here)
▪Run the calibration_tool with validation data, e.g., ImageNet for classification or Pascal VOC for object detection
▪The output is an INT8-ready IR plus statistics data
Online stage:
▪The Inference Engine loads the INT8-ready IR as a CNNNetwork, and your program runs INT8 inference
More details on INT8 inference with the Inference Engine: https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html
Sample results
▪Demo 1
▪Demo 2
Both demos run on an Intel® Cascade Lake CPU
QUICK DEMO
Sample results
Key takeaway
Try the Intel® Distribution of OpenVINO™:
https://software.intel.com/en-us/openvino-toolkit
Benefit from faster inference with INT8 by leveraging the VNNI instructions on
Intel® Cascade Lake CPUs.
Summary
▪What is Intel® Deep Learning Boost (Intel® DL Boost)
▪What are Vector Neural Network Instructions (VNNI)
▪Why is Intel® DL Boost useful?
▪Intel® Distribution of OpenVINO™
Thank you!