"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Presentation from CEVA

Copyright © 2016 CEVA

Yair Siegel

May 3, 2016



CEVA: The leading licensor of ultra-low-power signal processing IPs for embedded devices

Imaging & Vision; Audio, Voice, Sensing; Connectivity; Communication

More than 7 billion CEVA-powered devices shipped worldwide


• CEVA Deep Neural Network (CDNN) Software Framework

• Accelerates machine learning deployment for embedded systems

• Utilizes CEVA-XM4 imaging & vision DSP

• Targeted at object recognition and vision analytics

• Automatic conversion from offline neural networks to real-time networks

Scope

• 30x lower power*
• 15x lower memory bandwidth**
• 30% faster processing*

* Vs. GPU-based systems
** Vs. typical implementation


Presentation Outline

1. Backgrounder
2. CEVA Deep Neural Networks Introduction
3. Neural Networks Development Flow
4. AlexNet Example
5. Summary


CEVA in the Vision Space: Enabling Intelligent Vision Processing

3D vision
• Image registration
• Depth map generation
• Point cloud processing
• 3D scanning
• 3D content creation

Computational photography
• Refocus image
• Video stabilization
• Low-light image enhancement
• Zoom
• Super-resolution
• Background removal
• HDR

Visual perception
• Deep learning (CNN, DNN)
• Object detection, recognition & tracking
• Augmented reality (AR)
• Natural user interface (NUI)
• Context-aware algorithms
• Biometric authentication

Also shown: Image signal processor (ISP)* and Encode*
* These are most appropriately implemented by external HW accelerators


CEVA-XM4™ Imaging & Vision DSP

• 4th-generation imaging and vision processor IP
• Brings embedded systems closer to human vision and visual perception
• Vector-type processor; combines fixed- and floating-point math; up to 4096-bit processing per cycle
• Includes vision processor, libraries, tools and applications (CEVA, SW partners, service experts)
• Mature: 10+ design wins; silicon available in Q2 2016
• CNN-based algorithms combined with traditional algorithms


Neural Networks Basics

• The human brain is built on neural networks, used for all cognitive processing: visual, audio, and other senses
• Networks develop over time as data is collected and analyzed
• "Training" phase: learning new types from examples
• "The hunt": mimicking human perception in computers
• Limiters: horsepower, an efficient engine, and algorithmic quality
• Big progress here recently

[Diagram: input layer, hidden layers, and output layer of neurons, linked by weighted connections]

"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."*

*"Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989

High time for neural networks in embedded systems
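The vocabulary on this slide (input, hidden and output layers; neurons; weighted connections) maps directly onto a few lines of code. The following is a minimal sketch in Python of a fully connected network's forward pass; the layer sizes, weight values, and the choice of ReLU as the activation function are illustrative assumptions, not details from the presentation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of incoming weights per neuron
Layer = Tuple[Matrix, Vector]  # (weights, biases)


def relu(x: float) -> float:
    """Rectified linear activation: passes positive values, zeroes negatives."""
    return x if x > 0.0 else 0.0


def dense(inputs: Vector, weights: Matrix, biases: Vector, activate: bool = True) -> Vector:
    """One fully connected layer: each neuron sums its weighted inputs plus a bias."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(relu(total) if activate else total)
    return outputs


def forward(inputs: Vector, layers: List[Layer]) -> Vector:
    """Propagate inputs through hidden layers (ReLU) to a linear output layer."""
    x = inputs
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        x = dense(x, weights, biases, activate=not is_output)
    return x


# Illustrative 2-3-1 network: 2 inputs, one hidden layer of 3 neurons, 1 output.
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, 0.0])
output = ([[1.0, -1.0, 0.5]], [0.2])
print(forward([1.0, 2.0], [hidden, output]))
```

Each neuron computes a weighted sum of the previous layer's outputs plus a bias; training (not shown) is the process of adjusting those weights from examples.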


Deep Learning Neural Networks

• Deep learning
  • Family of neural network methods with a high number of layers (hence "deep")
• Convolutional Neural Networks (CNN)
  • Most popular deep learning neural network method
  • Benefits:
    1. Best recognition quality (vs. alternatives)
    2. Re-trainable without code changes (implement once, use many times)
• Caffe: deep learning framework (caffe.berkeleyvision.org)
  • Popular open-source software framework used to build, train, and run neural networks
  • Targets expression, speed, and modularity

Applications: Object Recognition; Driver Assistance (ADAS); Vision Analytics; Artificial Intelligence (AI); Augmented Reality / Virtual Reality
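The operation that gives CNNs their name is a small kernel of weights slid across the input. Below is a minimal single-channel sketch with stride 1 and no padding; as in most deep learning frameworks, including Caffe, the kernel is not flipped, so strictly speaking this is a cross-correlation. Multiple channels, biases, strides, and padding are left out for brevity.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image (stride 1, no padding), summing
    elementwise products at each position (cross-correlation)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            out[y][x] = acc
    return out


# A 3x3 vertical-edge kernel applied to a 4x4 image with a step from 0 to 1.
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edge = [[-1.0, 0.0, 1.0] for _ in range(3)]
print(conv2d_valid(img, edge))  # -> [[3.0, 3.0], [3.0, 3.0]]
```

Because the same kernel weights are reused at every position, a convolutional layer has far fewer weights than a fully connected layer with the same number of outputs.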


Neural Network Embedded Challenges

• Computation intensive
  • Typically 1 Meg-ops per layer
  • Networks are trained in floating point, and floating-point performance is limited on embedded devices
• High memory bandwidth
  • Between layers, and when fetching each layer's weights
  • Example: AlexNet has 12 MB in layers and 243 MB of weights in floating point
• Multi-ROI processing using the same network
• Evolving networks and time-to-market (TTM)
  • Ability to modify the network and change its characteristics quickly

All of the above in a cost- and energy-efficient form factor: a must-have for mass-market adoption
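The 243 MB AlexNet weight figure can be reproduced from the network's layer shapes. The sketch below assumes the layer dimensions of the reference bvlc_alexnet model distributed with Caffe (including the two-group convolutions in conv2, conv4, and conv5) and 4-byte single-precision floats. The 16- and 8-bit sizes are simple proportional estimates, not CDNN measurements.

```python
# (name, out_channels, in_channels_per_group, kernel_h, kernel_w)
# Fully connected layers are written as 1x1 "kernels" over their flattened input.
ALEXNET_LAYERS = [
    ("conv1", 96, 3, 11, 11),
    ("conv2", 256, 48, 5, 5),    # group=2: 96 inputs split into 2 x 48
    ("conv3", 384, 256, 3, 3),
    ("conv4", 384, 192, 3, 3),   # group=2
    ("conv5", 256, 192, 3, 3),   # group=2
    ("fc6", 4096, 9216, 1, 1),   # 9216 = 256 * 6 * 6
    ("fc7", 4096, 4096, 1, 1),
    ("fc8", 1000, 4096, 1, 1),
]


def parameter_count(layers) -> int:
    """Weights (out * in * kh * kw) plus one bias per output channel."""
    return sum(out * cin * kh * kw + out for _, out, cin, kh, kw in layers)


def weight_bytes(layers, bits_per_weight: int) -> int:
    """Storage needed if every parameter uses the given bit width."""
    return parameter_count(layers) * bits_per_weight // 8


params = parameter_count(ALEXNET_LAYERS)
print(f"parameters: {params:,}")
for bits in (32, 16, 8):
    print(f"{bits:>2}-bit storage: {weight_bytes(ALEXNET_LAYERS, bits) / 1e6:.1f} MB")
```

This gives 60,965,224 parameters, or about 243.9 MB at 32 bits per value. The fully connected layers fc6 to fc8 hold about 96% of those parameters, which makes them the obvious place to look when reducing weight bandwidth.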


CEVA Deep Neural Network Flow with Caffe

[Diagram: the Caffe network model and weights are converted offline by the CEVA Network Generator into a real-time network for CEVA-XM4]

CEVA Deep Neural Network (CDNN) Features

CDNN deliverables include real-time example models for image classification, localization, and object detection.

CEVA Network Generator (offline)
• Automatically converts networks for power efficiency
• Floating- to fixed-point conversion
• Adapts to embedded constraints
• Keeps high accuracy (1% deviation)

Real-time Neural Network Libraries
• Real-time algorithm development and deployment
• Optimized for the CEVA-XM4 vision DSP
• Any network portion or layer
• Fixed or variable input sizes
• On-the-fly bandwidth optimizations
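A floating- to fixed-point conversion step like the one in the CEVA Network Generator can be illustrated with a generic power-of-two (Q-format) quantizer. The presentation does not describe CEVA's actual algorithm, so the sketch below is an assumption: it picks one scale for a whole weight tensor, the largest number of fractional bits that keeps every value inside a signed 16-bit word, and reports the worst-case rounding error.

```python
from typing import List


def choose_frac_bits(values: List[float], total_bits: int = 16) -> int:
    """Largest number of fractional bits for which every value still fits
    in a signed total_bits-wide integer after rounding."""
    qmax = 2 ** (total_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    for frac_bits in range(total_bits - 1, -1, -1):
        if round(max_abs * 2 ** frac_bits) <= qmax:
            return frac_bits
    raise ValueError("values too large for this word size")


def quantize(values: List[float], frac_bits: int, total_bits: int = 16) -> List[int]:
    """Scale by 2**frac_bits, round to nearest, and clamp to the signed range."""
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    scale = 2 ** frac_bits
    return [min(qmax, max(qmin, round(v * scale))) for v in values]


def dequantize(q: List[int], frac_bits: int) -> List[float]:
    """Map fixed-point integers back to real values."""
    return [x / 2 ** frac_bits for x in q]


weights = [0.731, -0.052, 0.0031, -1.25, 0.498]  # illustrative values
f = choose_frac_bits(weights)
q = quantize(weights, f)
restored = dequantize(q, f)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{f} fractional bits -> {q}")
print(f"max abs error {max_err:.2e} (half-LSB bound {0.5 / 2 ** f:.2e})")
```

With these sample weights the largest magnitude (1.25) allows 14 fractional bits, and every reconstructed value is within half a least-significant bit (about 3.1e-5) of the original. Real converters may instead use per-layer or per-channel scales, or 8-bit words, trading accuracy against memory bandwidth.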


Real-Time CDNN Application Flow

Simplified Developer Flow via CDNN

• Example application steps to run on a device using CDNN:
  a. Create a CDNN CEVA handle: CDNNCreate()
  b. Create the network model (based on the CDNN conversion tool outputs): CDNNCreateNetwork()
  c. Initialize the CDNN library (creates a network and a memory database): CDNNInitialize()
  d. Execute the network (no need to re-initialize): CDNNNetworkClassify()
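The ordering in steps a to d can be modeled as a small state machine. The sketch below is a hypothetical Python mock: its method names only mirror the CDNN calls named on the slide, and all arguments, return values, and behavior are invented for illustration, since the presentation does not document the actual CDNN signatures. It shows the point of step d: after a single initialization, the network can be executed for many frames.

```python
from enum import Enum, auto
from typing import Callable, List, Optional

Model = Callable[[List[float]], List[float]]


class Stage(Enum):
    NONE = auto()
    HANDLE = auto()   # after step a
    NETWORK = auto()  # after step b
    READY = auto()    # after step c


class MockCdnnSession:
    """Hypothetical stand-in that enforces the a-d call order. Not the CDNN API."""

    def __init__(self) -> None:
        self.stage = Stage.NONE
        self.init_count = 0
        self.model: Optional[Model] = None

    def _require(self, expected: Stage, step: str) -> None:
        if self.stage is not expected:
            raise RuntimeError(f"{step}: expected stage {expected.name}, got {self.stage.name}")

    def create(self) -> None:
        """Step a (compare CDNNCreate): obtain a handle."""
        self._require(Stage.NONE, "create")
        self.stage = Stage.HANDLE

    def create_network(self, model: Model) -> None:
        """Step b (compare CDNNCreateNetwork): attach the converted model."""
        self._require(Stage.HANDLE, "create_network")
        self.model = model
        self.stage = Stage.NETWORK

    def initialize(self) -> None:
        """Step c (compare CDNNInitialize): one-time setup."""
        self._require(Stage.NETWORK, "initialize")
        self.init_count += 1
        self.stage = Stage.READY

    def classify(self, frame: List[float]) -> List[float]:
        """Step d (compare CDNNNetworkClassify): may be called repeatedly."""
        self._require(Stage.READY, "classify")
        assert self.model is not None
        return self.model(frame)


session = MockCdnnSession()
session.create()
session.create_network(lambda frame: [sum(frame)])  # toy placeholder "model"
session.initialize()
results = [session.classify([float(i), 1.0]) for i in range(3)]  # many frames, one init
print(results, session.init_count)
```

Calling the steps out of order (for example, classify before initialize) raises an error in the mock, which is a convenient way to express the required sequence.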


Neural Networks on CEVA-XM4

Reducing Bandwidth
• Compress via prior knowledge
• Reduce network redundancies
• E.g., AlexNet fully connected layers reduced to 6 MB
• Data reused at entry point

Programmability & Time-to-Market
• Flexible solution supporting any network
• Quick turnaround time via port automation

Performance Optimization
• Maximize MAC utilization
• Combine small maps
• Use fixed-point for higher performance
• Utilize dedicated instructions
• Parallel scatter-gather for activation layer
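Using fixed point for performance usually means integer multiply-accumulate (MAC) loops. The sketch below shows the generic arithmetic pattern in Python: 16-bit operands, a saturating 32-bit accumulator, then a rounding right shift and saturation back to 16 bits. It models only the number format. It is not XM4 code, and the operand formats (activations with 8 fractional bits, weights with 14) are assumptions chosen for the example.

```python
from typing import List

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate(x: int, lo: int, hi: int) -> int:
    """Clamp x into [lo, hi] instead of letting it wrap around."""
    return max(lo, min(hi, x))


def fixed_dot(activations: List[int], weights: List[int], shift: int) -> int:
    """Dot product of int16 vectors with a saturating int32 accumulator,
    rescaled by a rounding right shift and saturated to int16."""
    acc = 0
    for a, w in zip(activations, weights):
        acc = saturate(acc + a * w, INT32_MIN, INT32_MAX)
    if shift > 0:
        # Add half an LSB, then shift: round to nearest, ties toward +infinity
        # (Python's >> floors, also for negative numbers).
        acc = (acc + (1 << (shift - 1))) >> shift
    return saturate(acc, INT16_MIN, INT16_MAX)


acts = [384, -64, 512]         # 1.5, -0.25, 2.0 with 8 fractional bits
wts = [8192, 16384, -12288]    # 0.5, 1.0, -0.75 with 14 fractional bits
print(fixed_dot(acts, wts, 14) / (1 << 8))  # -> -1.0
```

Shifting by 14 removes the weights' fractional bits, so the result keeps the activations' 8-bit fraction; here it decodes to 0.75 - 0.25 - 1.5 = -1.0. On a vector DSP many such MACs run in parallel each cycle, which is why keeping the MAC units busy matters.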


Example CNN: AlexNet

• Example based on the Caffe open-source implementation of CNNs

Classification probabilities

Object             | AlexNet on PC (floating point) | AlexNet on XM4 (fixed point)
Labrador retriever | 90.44%                         | 91.01%
Golden retriever   | 4.45%                          | 3.98%
Beagle             | 0.21%                          | 0.18%
Kuvasz             | 0.12%                          | 0.10%

Difference between PC and XM4 probabilities: <1%
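The "<1%" claim can be checked against the four rows shown. The sketch below treats "<1%" as an absolute difference in percentage points between the floating-point (PC) and fixed-point (XM4) probabilities, which is an interpretation rather than something the slide states, and also checks that both runs pick the same top-1 class.

```python
# Probabilities (%) from the table: (label, PC floating point, XM4 fixed point).
RESULTS = [
    ("Labrador retriever", 90.44, 91.01),
    ("Golden retriever", 4.45, 3.98),
    ("Beagle", 0.21, 0.18),
    ("Kuvasz", 0.12, 0.10),
]

# Absolute difference in percentage points (pp) for each class.
deviations = {label: abs(pc - xm4) for label, pc, xm4 in RESULTS}
max_label = max(deviations, key=deviations.get)

# Top-1 class according to each implementation.
top1_pc = max(RESULTS, key=lambda r: r[1])[0]
top1_xm4 = max(RESULTS, key=lambda r: r[2])[0]

for label, dev in deviations.items():
    print(f"{label:<20} {dev:.2f} pp")
print(f"largest deviation: {max_label} ({deviations[max_label]:.2f} pp)")
print(f"top-1 match: {top1_pc == top1_xm4}")
```

The largest difference is 0.57 percentage points (Labrador retriever), and both implementations rank Labrador retriever first.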


CEVA-XM4 CDNN Development Platform

[Diagram: XM4 FPGA connected over PCIe to an i.MX6 host running Linux applications]

CEVA-XM4 CDNN Demo

• Live AlexNet object recognition: come visit our booth!
• Enables milliwatt products vs. watts on a GPU

[Diagram components: FHD webcam; i.MX6 host; FHD to 224x224 conversion of input images; shared DDR memory with DMA; PCIe; CEVA Link and CEVA Host Link; XM4 FPGA with the CDNN engine, data TCM, code TCM, and code cache; HDMI output; PC debugger; USB; JBOX; Daisy]
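The demo's "FHD to 224x224" conversion resizes each 1920x1080 frame to the network's input size. The slide does not say how; the sketch below assumes plain nearest-neighbor sampling at pixel centers, ignoring the aspect-ratio change, and operates on any 2-D list of pixel values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def nearest_indices(src_len: int, dst_len: int) -> List[int]:
    """Source index for each destination index, sampling at pixel centers."""
    return [min(src_len - 1, int((i + 0.5) * src_len / dst_len)) for i in range(dst_len)]


def resize_nearest(image: Sequence[Sequence[T]], out_h: int, out_w: int) -> List[List[T]]:
    """Nearest-neighbor resize of a row-major 2-D image to out_h x out_w."""
    rows = nearest_indices(len(image), out_h)
    cols = nearest_indices(len(image[0]), out_w)
    return [[image[r][c] for c in cols] for r in rows]


# Index maps for a 1920x1080 (FHD) frame scaled to 224x224.
row_map = nearest_indices(1080, 224)
col_map = nearest_indices(1920, 224)
print(row_map[:3], row_map[-1], col_map[:3], col_map[-1])
```

Precomputing the two index maps once means each frame costs only 224 x 224 lookups. Nearest-neighbor is the cheapest option; production pipelines often prefer area averaging or bilinear filtering, or crop to a square first to preserve the aspect ratio.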


CEVA Deep Neural Network (CDNN) Summary

• SW framework for real-time, efficient object recognition & vision analytics
• Accelerates deep learning application deployment
• Harnesses the CEVA-XM4 imaging & vision DSP
• Lowest power & memory bandwidth solution
• Enables real-time classification with pre-trained networks
  1. Receives the network model & weights as input (via Caffe)
  2. Automatically converts them to a real-time network via the CEVA Network Generator
  3. Uses the real-time network models in CNN applications on CEVA-XM4


Backup Material


CEVA: The leading licensor of ultra-low-power signal processing IPs for embedded devices

• More than 300 licensees to date
• More than 7 billion CEVA-powered devices shipped worldwide to date
• 100 licensees of Wi-Fi & Bluetooth IP, and more than 1 billion chips shipped
• 3X the market share in DSP over any other DSP IP vendor
• 1 in 3 handsets worldwide are powered by a CEVA DSP
• 5 billion DSP cores in audio/voice devices shipped to date
• More than 20 licensees for imaging and vision, shipping for the first time in 2016


CEVA-XM4 Imaging & Vision IP Platform

• Face Detection & Recognition
• Universal Object Recognition
• Pedestrian Detection
• ADAS Algorithms (FCW, LDW)
• 3D Depth Map Creation
• Digital Video Stabilizer (DVS)
• Super-Resolution (SR)

[Diagram components: hardware layer; software layer; CEVA-XM4 DSP core; Hardware Development Kit; App Development Kit (ADK); Host CV / OpenVX API; SW toolset; CPU-DSP Link communication layer; RTOS; auto system handle; CEVA software products: CEVA-CV libraries, CEVA CNN Framework (CDNN), Android Framework (AMF); partner software products. Annotations: provides OEM differentiation; CPU offload; source code provided]

Date posted: 13-Jan-2017
Uploaded by: embedded-vision-alliance