The OpenVX Computer Vision Library Standard for Portable, Efficient Code
Frank Brill, Khronos OpenVX Working Group Chair
1 May 2017
Copyright © 2017 Cadence Design Systems
Outline
• Cadence and Tensilica DSP IP
• What is Khronos?
• What is OpenVX?
• Why do we need OpenVX?
• New OpenVX 1.2 Features
  • New “classical” vision functions
  • Neural network extension
  • Safety critical spec
Cadence Tensilica Processor and DSP IP Business
Khronos Connects Software to Silicon
• Low-level silicon APIs needed on almost every platform: graphics, parallel compute, rich media, vision, sensor and camera processing
• Conformance Tests and Adopters Programs for specification integrity and cross-vendor portability
• Industry Consortium creating OPEN STANDARD APIs for hardware acceleration
  • Any company is welcome – one company one vote
• ROYALTY-FREE specifications
  • State-of-the-art IP framework protects members AND the standards
• International, non-profit organization
  • Membership and Adopters fees cover operating and engineering expenses
• Strong industry momentum
  • 100s of man years invested by industry experts
• Well over a BILLION people use Khronos APIs Every Day…
[Figure: Khronos APIs sit between Software and Silicon]
© Copyright Khronos Group 2016
What is OpenVX?
Why do we need OpenVX?
• Portable, embedded computer vision applications require:
  • High performance
  • Low power
• Achieving this traditionally requires code optimized for the hardware:
  • Custom kernels: intrinsics, assembly
  • Custom data movement: high bandwidth, DMA, memory hierarchy
  • Sacrifices portability
• OpenVX promises performance portability
  • Write application once, run anywhere efficiently
OpenVX – Low Power Vision Acceleration
• High level abstraction API
  – Targeted at real-time mobile and embedded platforms
• Performance portability across diverse architectures
  – Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, dedicated hardware…
• Extends portable vision acceleration to very low power domains
  – Doesn’t require high-power CPU/GPU complex
  – Lower precision requirements than OpenCL
  – Low-power host can set up and manage frame-rate graph
[Figure: software stack of Application, Middleware, Vision Engine, and Accelerators]
[Figure: Power Efficiency versus Computation Flexibility for Multi-core CPU, GPU Compute, Vision DSPs, and Dedicated Hardware; Vision Processing Efficiency scale marked X1, X10, X100]
OpenVX Graphs
• OpenVX developers express a graph of image operations (‘Nodes’)
  – Nodes can run on any hardware or processor and be coded in any language
  – E.g., on a GPU, nodes may be implemented in OpenCL
• Minimizes host interaction during frame-rate graph execution
  – Host processor sets up the graph, which can then execute almost autonomously
[Figure: Feature Extraction Example Graph. OpenVX nodes: Color Conversion, Channel Extract, Image Pyramid, Harris Track, Optical Flow. Data objects: Camera Input, RGB Frame, YUV Frame, Gray Frame, Pyr t, Ftr t-1, Array of Keypoints, Array of Features, Rendering Output]
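The set-up-once, run-every-frame idea on this slide can be sketched in plain C. This is a toy illustration only, not the OpenVX API: the names toy_graph, toy_graph_add, toy_graph_process and the two example nodes are hypothetical, and the graph is a simple linear chain working in place, whereas a real OpenVX graph is a general dataflow graph with typed data objects.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 8

/* A node is any function that transforms a frame in place (hypothetical type). */
typedef void (*toy_node_fn)(int *frame, size_t n);

typedef struct {
    toy_node_fn nodes[TOY_MAX_NODES];
    size_t count;
} toy_graph;

/* Setup phase: append a node. Returns 0 on success, -1 if the graph is full. */
int toy_graph_add(toy_graph *g, toy_node_fn fn)
{
    assert(g != NULL && fn != NULL);
    if (g->count >= TOY_MAX_NODES) return -1;
    g->nodes[g->count++] = fn;
    return 0;
}

/* Frame-rate phase: run every node in insertion order, with no further setup. */
void toy_graph_process(const toy_graph *g, int *frame, size_t n)
{
    assert(g != NULL && frame != NULL);
    for (size_t i = 0; i < g->count; i++) {
        g->nodes[i](frame, n);
    }
}

/* Two example nodes standing in for image operations. */
void toy_add_one(int *frame, size_t n)
{
    for (size_t i = 0; i < n; i++) frame[i] += 1;
}

void toy_times_two(int *frame, size_t n)
{
    for (size_t i = 0; i < n; i++) frame[i] *= 2;
}
```

The host calls toy_graph_add during setup and then only toy_graph_process per frame, which is the division of labour the slide describes.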
How can OpenVX help you?
• Faster development of vision applications
• More optimized results via graph
• Portability to different platforms
  • E.g., GPU to DSP, or CPU-only to heterogeneous platforms
  • Next-generation vision devices (e.g., Vision P5 to Vision P6)
  • Alternative vendors
• Performance portability
  • No platform-specific performance optimizations
    • For your internal developers
    • For application developers (ISVs)
How is OpenVX different from existing alternatives?
Execution model
• OpenVX: Graph
• OpenCV: Immediate
• OpenCL: Compile/enqueue (can implement graph)
• Comments: An “immediate” function call programming model prevents important optimizations, especially on embedded DSPs
Vision functionality
• OpenVX: Small but growing set of popular functions
• OpenCV: Vast, generally on PC/CPU only
• OpenCL: None, user must create their own
• Comments: OpenCL is a language for parallel programming on heterogeneous platforms; requires compiler and floating-point. OpenVX is an API focused on low-power, fixed-point operation, written in C or any language
Standards
• OpenVX: Industry standard with conformance tests
• OpenCV: None, proprietary subsets ported
• OpenCL: Industry standard with conformance tests
• Comments: Optimized OpenCV code generally not easily portable between different (esp. embedded) platforms; conformance enforces portability
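Simple edge detector in OpenVX
/* Slide shorthand: the full API calls also take a vx_context or vx_graph and an image format, and THRESH stands for a vx_threshold object. */
vx_image input = vxCreateImage(1920, 1080);
vx_image output = vxCreateImage(0, 0);
vx_image horiz = vxCreateVirtualImage();
vx_image vert = vxCreateVirtualImage();
vx_image mag = vxCreateVirtualImage();

vx_graph g = vxCreateGraph();
vxSobel3x3Node(g, input, horiz, vert);
vxMagnitudeNode(g, horiz, vert, mag);
vxThresholdNode(g, mag, THRESH, output);

vx_status status = vxVerifyGraph(g);  /* “Compiles” the graph */
status = vxProcessGraph(g);           /* “Executes” the graph */
[Figure: graph i → S → (h, v) → M → m → T → o, where S = Sobel, M = Magnitude, T = Threshold]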
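For readers who want to see what the three nodes compute, here is a minimal plain-C sketch of the same pipeline (Sobel 3x3, then magnitude, then binary threshold) fused into one loop. It does not use OpenVX. The function name edge_detect_u8, the zeroed border, and the floor-rounded integer square root are assumptions made for this illustration; a conformant OpenVX implementation defines its own border modes, intermediate formats, and rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of the integer square root. A simple loop keeps the sketch free of libm. */
static uint32_t isqrt_u32(uint32_t v)
{
    uint32_t r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= v) r++;
    return r;
}

/* Hypothetical reference: Sobel 3x3 -> gradient magnitude -> binary threshold.
 * src and dst are w*h row-major 8-bit images. Border pixels are written as 0
 * (an assumption; OpenVX border handling is configurable). */
void edge_detect_u8(const uint8_t *src, uint8_t *dst, int w, int h, uint32_t thresh)
{
    assert(src != 0 && dst != 0 && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                dst[y * w + x] = 0;
                continue;
            }
            const uint8_t *p = src + y * w + x;
            /* Horizontal gradient: right column minus left column, weights 1-2-1. */
            int gx = (p[-w + 1] + 2 * p[1] + p[w + 1])
                   - (p[-w - 1] + 2 * p[-1] + p[w - 1]);
            /* Vertical gradient: bottom row minus top row, weights 1-2-1. */
            int gy = (p[w - 1] + 2 * p[w] + p[w + 1])
                   - (p[-w - 1] + 2 * p[-w] + p[-w + 1]);
            uint32_t mag = isqrt_u32((uint32_t)(gx * gx + gy * gy));
            dst[y * w + x] = (mag > thresh) ? 255 : 0;
        }
    }
}
```

Fusing the three stages so that the gradient and magnitude images never reach memory is the kind of transformation that virtual images leave the framework free to make.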
OpenVX Benefits
• Graphs enable automatic optimizations
  • Especially tiling and chaining
• Automatic selection of kernels
  • Optimized by data type at graph verification time
• User doesn’t need to handle:
  • DMA
  • Tile overlap
  • Local memory management
  • Special hardware features (like scatter-gather)
• The OpenVX framework does all of this for you
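To make "tile overlap" concrete, the sketch below shows the bookkeeping a framework has to do when it splits an image row into tiles for a neighbourhood filter: each tile must fetch a slightly wider input span than the output span it produces. The names span_t, tile_count, tile_output_span, and tile_input_span are hypothetical and the sketch covers one dimension only; it illustrates the arithmetic, not any vendor's implementation.

```c
#include <assert.h>

/* Half-open pixel range [x0, x1) along one image dimension (hypothetical type). */
typedef struct {
    int x0;
    int x1;
} span_t;

/* Number of tiles needed to cover `width` pixels with tiles of `tile_w` pixels. */
int tile_count(int width, int tile_w)
{
    assert(width > 0 && tile_w > 0);
    return (width + tile_w - 1) / tile_w;
}

/* Output pixels produced by tile `index`; the last tile may be narrower. */
span_t tile_output_span(int width, int tile_w, int index)
{
    assert(width > 0 && tile_w > 0 && index >= 0);
    span_t s;
    s.x0 = index * tile_w;
    s.x1 = s.x0 + tile_w;
    if (s.x0 > width) s.x0 = width;
    if (s.x1 > width) s.x1 = width;
    return s;
}

/* Input pixels that must be fetched (for example by DMA) to compute `out`
 * with a filter of the given radius: the output span plus an overlap on each
 * side, clamped to the image. A 3x3 filter has radius 1. */
span_t tile_input_span(span_t out, int width, int radius)
{
    assert(radius >= 0);
    span_t s;
    s.x0 = out.x0 - radius;
    s.x1 = out.x1 + radius;
    if (s.x0 < 0) s.x0 = 0;
    if (s.x1 > width) s.x1 = width;
    return s;
}
```

For a 1920-pixel row, 512-pixel tiles, and a 3x3 filter, tile 1 produces pixels [512, 1024) but needs input pixels [511, 1025).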
Standard OpenVX 1.1 Functions
• Absolute Difference
• Accumulate
• Accumulate Squared
• Accumulate Weighted
• Arithmetic Addition
• Arithmetic Subtraction
• Bitwise And
• Bitwise Exclusive Or
• Bitwise Inclusive Or
• Bitwise Not
• Box Filter
• Canny Edge Detector
• Channel Combine
• Channel Extract
• Color Convert
• Convert Bit depth
• Custom Convolution
• Dilate Image
• Equalize Histogram
• Erode Image
• Fast Corners
• Gaussian Filter
• Harris Corners
• Histogram
• Image Pyramid
• Integral Image
• Magnitude
• Mean and Standard Deviation
• Median Filter
• Min, Max Location
• Optical Flow Pyramid (LK)
• Phase
• Pixel-wise Multiplication
• Remap
• Scale Image
• Sobel 3x3
• TableLookup
• Thresholding
• Warp Affine
• Warp Perspective
New features in OpenVX 1.2
New OpenVX 1.2 functions (1 of 2)
• Feature detection: find features useful for object detection and recognition
  • Histogram of oriented gradients (HOG)
  • Template matching
  • Local binary patterns (LBP)
  • Line finding
• Classification: detect and recognize objects in an image based on a set of features
  • Import a classifier model trained offline
  • Classify objects based on a set of input features
• Image Processing: transform an image
  • Generalized nonlinear filter
    • Dilate, erode, median with arbitrary kernel shapes
  • Non-maximum suppression
    • Find local maximum values in an image
  • Edge-preserving noise reduction
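The generalized nonlinear filter listed above (dilate, erode, median with arbitrary kernel shapes) can be illustrated with a small plain-C sketch. It is not the OpenVX function: the names nlf_mode and nonlinear_filter_3x3 are hypothetical, the kernel is fixed at 3x3, border pixels are copied unchanged, and for an even number of selected neighbours the upper median is returned. All of those are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { NLF_MIN, NLF_MAX, NLF_MEDIAN } nlf_mode;

/* Apply min (erode), max (dilate), or median over the 3x3 neighbours selected
 * by `mask` (row-major, nonzero = include). Border pixels are copied unchanged. */
void nonlinear_filter_3x3(const uint8_t *src, uint8_t *dst, int w, int h,
                          const uint8_t mask[9], nlf_mode mode)
{
    assert(src != 0 && dst != 0 && mask != 0 && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint8_t vals[9];
            int n = 0;

            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                dst[y * w + x] = src[y * w + x];
                continue;
            }
            /* Gather only the neighbours the kernel shape selects. */
            for (int ky = -1; ky <= 1; ky++) {
                for (int kx = -1; kx <= 1; kx++) {
                    if (mask[(ky + 1) * 3 + (kx + 1)]) {
                        vals[n++] = src[(y + ky) * w + (x + kx)];
                    }
                }
            }
            if (n == 0) {
                dst[y * w + x] = src[y * w + x];
                continue;
            }
            /* Insertion sort: at most 9 values, so simplicity wins. */
            for (int i = 1; i < n; i++) {
                uint8_t v = vals[i];
                int j = i - 1;
                while (j >= 0 && vals[j] > v) {
                    vals[j + 1] = vals[j];
                    j--;
                }
                vals[j + 1] = v;
            }
            if (mode == NLF_MIN) dst[y * w + x] = vals[0];
            else if (mode == NLF_MAX) dst[y * w + x] = vals[n - 1];
            else dst[y * w + x] = vals[n / 2];
        }
    }
}
```

With a cross-shaped mask {0,1,0, 1,1,1, 0,1,0} the corner neighbours are ignored, so the result can differ from a full 3x3 box erode or dilate on the same image.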
New OpenVX 1.2 functions (2 of 2)
• Many, many minor improvements
• Conditional execution & node predication
  • Selectively execute portions of a graph based on a true/false predicate
• New Extensions
  • Import/export: compile a graph; save and run later
  • 16-bit support: signed 16-bit image data
  • Neural networks: “Layer” nodes
    • E.g. convolution, deconvolution, activation, normalization, pooling, softmax
• OpenVX SC: Safety critical version of the (1.1) spec
  • Leverages the import/export extension to define a run-time-only “deployment feature set”
  • Less “dynamic” subset of the 1.1 functionality
[Figure: select node with condition A and inputs B and C producing S. If A then S ← B else S ← C]
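The rule "If A then S ← B else S ← C" can be sketched in plain C as follows. This is a toy model of predication, not the OpenVX 1.2 API: subgraph_t, run_predicated, and the two example sub-graphs are hypothetical names. The point it illustrates is that only the selected branch executes, which the execution counters make visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A stand-in for a portion of a graph (hypothetical type). */
typedef struct {
    int (*run)(int input);  /* body of the sub-graph */
    int executions;         /* how many times this sub-graph has run */
} subgraph_t;

/* If a then S <- B(input) else S <- C(input). The branch that is not
 * selected is never executed. */
int run_predicated(bool a, subgraph_t *b, subgraph_t *c, int input)
{
    assert(b != NULL && c != NULL);
    subgraph_t *chosen = a ? b : c;
    chosen->executions++;
    return chosen->run(input);
}

/* Two example sub-graph bodies. */
int double_it(int v) { return 2 * v; }
int negate_it(int v) { return -v; }
```

In a real graph the predicate would itself be produced by an upstream node, so the choice can change from frame to frame without host involvement.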
OpenVX neural network extension: Technical overview
• Two main parts: (1) a tensor object and (2) a set of CNN layer nodes
• A vx_tensor is a multi-dimensional array that supports at least 4 dimensions
  • Tensor creation and deletion functions
  • Simple math for tensors
    • Element-wise Add, Subtract, Multiply, TableLookup, and Bit-depth conversion
    • Transposition of dimensions and generalized matrix multiplication
  • vxCopyTensorPatch, vxQueryTensor (#dims, dims, element type, Q)
[Figure: example tensors]
• 1-D tensor: [6], i.e., 6-element vector (4, 1, 7, 9, 3, 8)
• 2-D tensor: [6, 4], i.e., 6 by 4 matrix
• 3-D tensor: [6, 4, 5]
• 4-D tensor: [6, 4, 5, 3]
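The tensor shapes above translate into element counts and linear offsets as in this plain-C sketch. It does not use vx_tensor; tensor_num_elements and tensor_offset are hypothetical helpers, and the layout in which dimension 0 varies fastest is an assumption of this example rather than something the slide states.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of elements in a tensor with the given dimensions. */
size_t tensor_num_elements(const size_t *dims, size_t num_dims)
{
    assert(dims != NULL && num_dims > 0);
    size_t total = 1;
    for (size_t d = 0; d < num_dims; d++) {
        total *= dims[d];
    }
    return total;
}

/* Linear element offset of `index`, assuming dimension 0 varies fastest
 * (an assumed layout for this sketch). */
size_t tensor_offset(const size_t *dims, size_t num_dims, const size_t *index)
{
    assert(dims != NULL && index != NULL && num_dims > 0);
    size_t offset = 0;
    size_t stride = 1;
    for (size_t d = 0; d < num_dims; d++) {
        assert(index[d] < dims[d]);
        offset += index[d] * stride;
        stride *= dims[d];
    }
    return offset;
}
```

Under that layout, the 4-D tensor [6, 4, 5, 3] holds 360 elements with element strides of 1, 6, 24, and 120.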
OpenVX neural network extension: Technical overview cont’d
• Tensor types of INT16, INT7.8 (Q7.8 fixed point), INT8, and U8 are supported
  • Other types may be supported by a vendor
• Eight neural network “layer” nodes:
  • vxActivationLayer
  • vxConvolutionLayer
  • vxDeconvolutionLayer
  • vxFullyConnectedLayer
  • vxNormalizationLayer
  • vxPoolingLayer
  • vxROIPoolingLayer
  • vxSoftmaxLayer
• Conformance tests will be up to some “tolerance” in precision
  • To allow for optimizations, e.g., weight compression
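As an illustration of the INT7.8 type and of comparing results "up to some tolerance", the sketch below reads INT7.8 as Q7.8 fixed point, that is, a signed 16-bit value with 8 fractional bits. The helper names float_to_q78, q78_to_float, and within_tolerance are hypothetical, as are the round-to-nearest and saturation choices and the tolerance expressed in least significant bits; the actual conformance criteria are defined by the Khronos tests, not by this example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a float to Q7.8: scale by 2^8, round to nearest, saturate to int16. */
int16_t float_to_q78(float x)
{
    float scaled = x * 256.0f;
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert Q7.8 back to float; exact, since every Q7.8 value fits in a float. */
float q78_to_float(int16_t q)
{
    return (float)q / 256.0f;
}

/* True if two Q7.8 results differ by at most tol_lsb least significant bits
 * (one LSB is 1/256). */
int within_tolerance(int16_t a, int16_t b, int tol_lsb)
{
    int diff = (int)a - (int)b;
    if (diff < 0) diff = -diff;
    return diff <= tol_lsb;
}
```

The representable range is -128.0 to just under 128.0 in steps of 1/256, so 1.5 is stored as 384 and anything at or above 128.0 saturates.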
Neural Network Exchange Format (NNEF)
• Separate working group from OpenVX, but unified IP Zone
Safety critical version (OpenVX SC)
• Based on OpenVX 1.1 main specification
• MISRA C clean per Klocwork v10
• Adds requirement to support import/export extension
• Divides functionality into “development” and “deployment” feature sets
[Figure: OpenVX SC Development Feature Set (Create Graph; entire graph creation API) → verify, export → Binary format (implementation-dependent format) → import → OpenVX SC Deployment Feature Set (Execute Graph; no graph creation API)]
Resources
• Khronos main web page
  • https://www.khronos.org
• OpenVX Overview
  • https://www.khronos.org/openvx
• OpenVX Specifications: current, previous, and extensions
  • https://www.khronos.org/registry/OpenVX
• OpenVX Resources: implementations, tutorials, reference guides, etc.
  • https://www.khronos.org/openvx/resources
• Cadence Vision DSP landing page
  • https://ip.cadence.com/vision
© 2017 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and the other Cadence marks found at www.cadence.com/go/trademarks are trademarks or registered trademarks of Cadence Design Systems, Inc. All other trademarks are the property of their respective holders.