Date post: | 18-Aug-2015 |
Category: | Technology |
Upload: | embedded-vision-alliance |
© Copyright, 2014 1
NVIDIA TEGRA K1 Mar 2, 2014
NVIDIA Confidential December 3, 2014
Standard for Vision Acceleration
Elif Albuz
Mobile Vision Software
NVIDIA
Khronos Standards
Visual Computing - 3D Graphics - Heterogeneous Parallel Computing
3D Asset Handling - 3D authoring asset interchange
- 3D asset transmission format with compression
Acceleration in HTML5 - 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript
Sensor Processing - Vision Acceleration - Camera Control - Sensor Fusion
Mobile/Embedded Vision Acceleration
Enables new experiences
Augmented Reality
Face, Body and Gesture Tracking
Computational Photography and Videography
3D Scene/Object Reconstruction
Challenges for Mobile & Embedded Vision
Control, coordinate and synchronize a diverse array of mobile sensors
Maintainable code for a heterogeneous mix of CPUs, GPUs, DSPs and dedicated hardware
Performance & power efficiency
Creating fluid 60Hz experiences on battery-powered mobile devices
Code that is deployable across multiple devices, platforms and OSes
OpenVX – Power Efficient Vision Acceleration
Defines a C API for a subset of computer vision primitives and their data containers
Defines a framework to assemble and execute primitives, with the goal of enabling various optimization opportunities for vision pipelines
Extensible
[Figure: diagram showing four Applications and four Vision Accelerators]
OpenCV & OpenVX
Governance
  OpenCV: Community driven open source with no formal specification
  OpenVX: Formal specification defined and implemented by hardware vendors
Conformance
  OpenCV: No conformance tests for consistency and every vendor implements different subset
  OpenVX: Full conformance test suite / process creates a reliable acceleration platform
Portability
  OpenCV: APIs can vary depending on processor
  OpenVX: Hardware abstracted for portability
Scope
  OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
  OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Efficiency
  OpenCV: Memory-based architecture; each operation reads and writes memory
  OpenVX: Graph-based execution; optimizable computation, data transfer
Use Case
  OpenCV: Rapid experimentation
  OpenVX: Production development & deployment
OpenVX History
Started early 2012
Version 1.0 released in Oct 2014
Conformance Test Suite, OpenVX Trademark
Contributors [logos not reproduced]
OpenVX
VXU Library for synchronous access to single nodes
Directed graphs for power and performance efficiency
[Figure: Example OpenVX Graph, showing four OpenVX Nodes and Downstream Application Processing]
OpenVX 1.0 VXU Function Overview
Core data structures
  Images and Image Pyramids
  Processing Graphs, Kernels, Parameters
Image Processing
  Arithmetic, Logical, and statistical operations
  Multichannel Color and BitDepth Extraction and Conversion
  2D Filtering and Morphological operations, resize & warp
Core Computer Vision
  Pyramid & Integral Image computation
Feature Extraction and Tracking
  Histogram Computation and Equalization
  Canny Edge Detection
  Harris and FAST Corner detection
  Sparse Optical Flow
OpenVX 1.0 defines a framework for creating, managing and executing graphs
Focused set of widely used functions that are readily accelerated
Implementers can add functions as extensions
Widely used extensions adopted into future versions of the core
OpenVX Specification Is Extensible: Khronos maintains extension registry
Some optimizations VX Graphs can Enable
Memory management
  Reuse memory for different intermediate data
  Less allocation overhead, more memory for other applications
Kernel Merge
  Replace a sub-graph by a single faster node
  Better memory locality, less kernel launch overhead
Graph Scheduling
  Split the graph execution across the whole system: CPU / GPU / DSP / dedicated HW
  Faster execution or lower power consumption
Data Tiling
  Execute a sub-graph at tile granularity instead of image granularity
  Better use of data cache and local memory
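To make the kernel merge idea concrete, here is a minimal sketch in plain C++ that does not use the OpenVX API. The two point-wise operations (`addConstant`, `threshold`) and the function names are hypothetical stand-ins chosen for illustration; a real implementation would decide on its own which nodes it can fuse.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit image type used only for this illustration.
using Image = std::vector<std::uint8_t>;

// Clamp an int to the 0..255 range of an 8-bit pixel.
inline std::uint8_t saturate(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Node 1: add a constant to every pixel, with saturation.
Image addConstant(const Image& in, int c) {
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = saturate(in[i] + c);
    return out;
}

// Node 2: binary threshold.
Image threshold(const Image& in, std::uint8_t t) {
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] > t ? 255 : 0;
    return out;
}

// Unmerged sub-graph: two passes over memory and one intermediate image.
Image runTwoNodes(const Image& in, int c, std::uint8_t t) {
    return threshold(addConstant(in, c), t);
}

// Merged node: one pass, the intermediate value never leaves a register.
Image runMergedNode(const Image& in, int c, std::uint8_t t) {
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = saturate(in[i] + c) > t ? 255 : 0;
    return out;
}
```

Both functions produce the same output image; the merged version simply avoids writing and re-reading the intermediate buffer, which is where the locality and launch-overhead benefit listed above would come from.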
Example: Feature tracking Graph
[Figure: OpenVX Graph for feature tracking. Nodes: Color Conversion, Channel Extract, Image Pyramid, Optical Flow, Harris Corners. Data: Camera/image/video input data, frameRGB, frameYUV, frameGray, pyr-1, pyr0, pts, arrays of keypoints, Rendering/Output]
NVIDIA VisionWorks™ – Integrating OpenVX
VisionWorks library contains diverse vision and imaging primitives
Will leverage OpenVX for optimized primitive execution
Can extend VisionWorks nodes through GPU-accelerated primitives
Provided with sample library of fully accelerated pipelines
[Figure: VisionWorks software stack. Labels: Application Code; Sample Pipelines; VisionWorks APIs; VisionWorks Framework; CUDA; Tegra/Kepler dGPU; Classifier, Corner Detection, Feature Tracking, Hough Detection, …; Feature Tracker, Hough Circle & Line, Object Tracker, Optical Flow, Denoising]
Summary
Khronos is building interoperating APIs for portable / power-efficient
vision and sensor processing
OpenVX 1.0 specification is now finalized and released
Full conformance tests and Adopters program immediately available
Khronos open source sample implementation by end of 2014
First commercial implementations already close to shipping
Companies are encouraged to join Khronos to influence the direction
of mobile and embedded vision processing!
Primitive assembly and execution: 2 flavors
Immediate execution
  Direct ‘synchronous’ call to the primitive
  Classical blocking API call
  OpenCV-like
  Useful for fast prototyping and as an intermediate step when migrating from OpenCV
Graph-based
  Relevant for video stream processing
  More optimization opportunities
  Discussed in more detail later in the presentation
Some more features
Possibility to create user-defined primitives
  Currently aimed mostly at the host CPU
Delay object to keep track of past data when needed
OpenVX can report the valid region of an image (deduced from the processing)
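The delay object can be pictured as a small ring of slots addressed by a non-positive index, where 0 is the current slot and -1 the previous one. The sketch below is a toy model in plain C++, not the OpenVX implementation; the class name `Delay` and its methods `at` and `age` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a delay: `count` slots, index 0 = current, -1 = previous, ...
template <typename T>
class Delay {
public:
    explicit Delay(std::size_t count) : slots_(count), head_(0) {}

    // Access a slot by its age: 0 is the current slot, -1 one step in the past.
    T& at(int index) {
        assert(index <= 0 && static_cast<std::size_t>(-index) < slots_.size());
        return slots_[(head_ + static_cast<std::size_t>(-index)) % slots_.size()];
    }

    // Shift every slot one step into the past. The oldest slot is recycled
    // as the new current slot and is expected to be overwritten next.
    void age() {
        head_ = (head_ + slots_.size() - 1) % slots_.size();
    }

private:
    std::vector<T> slots_;
    std::size_t head_;
};
```

A per-frame loop would write into `at(0)`, process, then call `age()` so that the data just produced becomes `at(-1)` for the next frame. No bytes are copied when aging; only the starting index moves.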
Memory model: Opaque containers (1)
Ownership of the ‘bytes’ is well defined
  Object read/write
    Object bytes remain owned by OpenVX
  Object access/commit (equivalent to map/unmap)
    Access: the host gets access to the bytes
    Commit: the host releases its access to the bytes
    A primitive needs all its parameters committed before execution
Useful for complex memory hierarchies:
  OpenVX controls where the data bytes are physically stored
Memory model: Opaque containers (2)
Physical layout under control of OpenVX
  Access/commit
    OpenVX returns a pointer plus a memory layout (addressing structure)
    The application must use this layout
  Import of existing application images
    The host application provides its memory layout
    After commit, OpenVX can create a shadow copy if the original layout is not convenient for best acceleration
Useful for acceleration and performance portability
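The access/commit contract can be sketched with a toy opaque container in plain C++. This is not the OpenVX API: `OpaqueImage`, `Layout`, `access`, `commit` and the 64-byte row alignment are all assumptions picked for illustration. The point is that the container, not the host, chooses the row stride, so host code has to address pixels through the layout it is given.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout descriptor handed to the host together with the pointer.
struct Layout {
    std::size_t width;
    std::size_t height;
    std::size_t strideBytes;  // distance between rows; may exceed width
};

// Toy opaque 8-bit image: the container decides the padding.
class OpaqueImage {
public:
    OpaqueImage(std::size_t w, std::size_t h)
        : layout_{w, h, roundUp(w, 64)},
          bytes_(layout_.strideBytes * h, 0),
          hostHasAccess_(false) {}

    // Access: the host gets a pointer plus the layout it must honor.
    std::uint8_t* access(Layout& out) {
        assert(!hostHasAccess_);
        hostHasAccess_ = true;
        out = layout_;
        return bytes_.data();
    }

    // Commit: the host gives the bytes back to the container.
    void commit() { hostHasAccess_ = false; }

    // A primitive may only run on committed parameters.
    bool readyForProcessing() const { return !hostHasAccess_; }

private:
    static std::size_t roundUp(std::size_t v, std::size_t a) {
        return (v + a - 1) / a * a;
    }
    Layout layout_;
    std::vector<std::uint8_t> bytes_;
    bool hostHasAccess_;
};

// Host-side addressing of pixel (x, y) through the returned layout.
inline std::uint8_t& pixel(std::uint8_t* base, const Layout& l,
                           std::size_t x, std::size_t y) {
    return base[y * l.strideBytes + x];
}
```

A host that assumed `stride == width` would write to the wrong bytes as soon as the container pads its rows, which is why the slide insists that the application use the returned layout.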
Data Objects life cycle
1. Create the object
  The application receives a reference to the object
2. Use the object reference for processing
  For access/commit
  To create graph nodes with the data object as a parameter
3. Release the object when the application no longer needs it
  Release != destruction: the object stays alive until it is no longer referenced by other objects (e.g. a graph)
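The "release is not destruction" rule behaves like reference counting. The sketch below models it with `std::shared_ptr` in plain C++; this is a modeling choice for illustration and says nothing about how an OpenVX implementation tracks references internally. The `Image`, `Graph` and `release` names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical data object.
struct Image {
    int width;
    int height;
};

// Toy graph that keeps a reference to every data object used as a parameter.
struct Graph {
    std::vector<std::shared_ptr<Image>> params;
    void addNodeUsing(const std::shared_ptr<Image>& img) { params.push_back(img); }
};

// Mimics a release call: the caller's reference is dropped and nulled,
// but the object survives while anything else still refers to it.
inline void release(std::shared_ptr<Image>& ref) { ref.reset(); }
```

In this model, an application can create an image, pass it to a node, and release its own reference right away; the image is only destroyed once the graph that uses it goes away.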
Graph
Dataflow graph defined by interconnected nodes
  Node = instance of a primitive with well-defined parameters
  Dataflow edge granularity: a data object (e.g. an image)
No control flow in the graph
  Except for the possibility to abort/restart the graph (node callbacks)
Graph connectivity
  Fully defined by node inputs/outputs: edges are implicitly determined from nodes, not explicitly created by the application
  A data object has a single writer per graph
  Semantics are independent of node creation order
  Particular case: bidirectional parameter (for accumulation primitives only)
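The connectivity rules above can be illustrated with a small stand-alone model in plain C++ (not the OpenVX API). Nodes only list the data objects they read and write; edges are derived by matching names, a second writer for the same object is rejected, and the result does not depend on the order in which nodes were added. The `Node` struct and `deriveEdges` function are hypothetical, and bidirectional parameters are not modeled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node description: a name plus the data objects it reads/writes.
struct Node {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Set of (producer node, consumer node) pairs.
using EdgeSet = std::set<std::pair<std::string, std::string>>;

// Derives edges from shared data objects.
// Returns false if any data object has more than one writer.
bool deriveEdges(const std::vector<Node>& nodes, EdgeSet& edges) {
    std::map<std::string, std::string> writer;  // data object -> producing node
    for (const Node& n : nodes)
        for (const std::string& out : n.outputs)
            if (!writer.emplace(out, n.name).second)
                return false;  // single-writer rule violated

    edges.clear();
    for (const Node& n : nodes)
        for (const std::string& in : n.inputs) {
            auto it = writer.find(in);
            if (it != writer.end())  // graph inputs have no producer node
                edges.emplace(it->second, n.name);
        }
    return true;
}
```

Feeding the color convert, channel extract and pyramid nodes of the feature tracking example in any order yields the same two edges, because the edge set is computed from the data objects rather than from creation order.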
Graph life cycle
Ahead-of-time (set-up time)
  Graph creation
  Graph ‘verification’
    Verify for correctness
    Optimizations can happen here
Runtime
  Graph execution
    Can be called multiple times
      – Without re-verification if neither the graph connectivity nor the ‘immutable’ node parameters have been modified
      – Otherwise, re-verification is needed
    More optimizations can happen here (they need to be ‘cheap’)
Graph execution: 2 modes
Synchronous
  ‘Blocking’ graph execution
Asynchronous
  Note: still a limited feature in 1.0
Virtual objects: specific to graphs
‘Virtual’ objects describe temporary data objects
2 usages
  More generic graphs:
    The user leaves some object properties unspecified (for example, image dimensions); the graph manager deduces them from the node that generates the object
    Less work when image dimensions change
  More memory optimizations:
    ‘Virtual’ is a contract between the application and OpenVX that says: “the application will never access bytes of the virtual object”
    The graph manager can reuse the same physical buffer across multiple virtual objects
    A virtual object never needs to be visible from the host
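Because the host never touches virtual objects, a graph manager is free to map several of them onto one physical buffer as long as their lifetimes do not overlap. The following plain C++ sketch shows one simple greedy way this could be done; it is an illustration under stated assumptions, not the strategy of any particular OpenVX implementation. `Lifetime` and `assignBuffers` are hypothetical names, node indices are assumed to follow execution order, and the input is assumed to be sorted by producer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lifetime of a virtual object in node execution order:
// written by node `producer`, last read by node `lastConsumer`.
struct Lifetime {
    int producer;
    int lastConsumer;
};

// Greedy assignment: for each virtual object (sorted by producer), reuse the
// first physical buffer whose previous tenant was last read strictly before
// this object's producer runs; otherwise allocate a new buffer.
// Returns the physical buffer index chosen for each virtual object.
std::vector<std::size_t> assignBuffers(const std::vector<Lifetime>& objs) {
    std::vector<int> busyUntil;  // per buffer: last node that still reads it
    std::vector<std::size_t> assignment;
    for (const Lifetime& o : objs) {
        std::size_t b = 0;
        while (b < busyUntil.size() && busyUntil[b] >= o.producer) ++b;
        if (b == busyUntil.size())
            busyUntil.push_back(o.lastConsumer);  // no free buffer: allocate
        else
            busyUntil[b] = o.lastConsumer;        // recycle a free buffer
        assignment.push_back(b);
    }
    return assignment;
}
```

For a linear chain of five nodes with four virtual intermediates, this scheme alternates between two physical buffers instead of allocating four.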
OpenVX Graph
[Figure: graph with nodes color convert, channel extract and pyramid; data objects frameRGB, frameYUV, frameGray, pyr-1 and pyr0]
Feature tracking: Graph Creation (1)
void createTrackerGraph(vx_image frameRGB, tracker_t &trk) {
    trk.graph = vxCreateGraph(trk.context);

    // Create color convert node (virtual image: size deduced by the graph)
    vx_image frameYUV = vxCreateVirtualImage(trk.graph, 0, 0, VX_DF_IMAGE_IYUV);
    trk.cvt_color_node = vxColorConvertNode(trk.graph, frameRGB, frameYUV);

    // Create channel extract node
    vx_image frameGray = vxCreateVirtualImage(trk.graph, 0, 0, VX_DF_IMAGE_U8);
    trk.ch_extract_node = vxChannelExtractNode(trk.graph, frameYUV,
                                               VX_CHANNEL_Y, frameGray);

    // Create pyramid node; the delay keeps the current and previous pyramids
    vx_pyramid pyr_sample =
        vxCreatePyramid(trk.context, trk.pyr_levels, VX_SCALE_PYRAMID_HALF,
                        trk.width, trk.height, VX_DF_IMAGE_U8);
    trk.pyr_delay = vxCreateDelay(trk.context, (vx_reference)pyr_sample, 2);
    trk.pyr_node =
        vxGaussianPyramidNode(trk.graph, frameGray,
                              (vx_pyramid)vxGetReferenceFromDelay(trk.pyr_delay, 0));
    // (function continues on the next slide)
[Figure: graph with nodes color convert, channel extract and pyramid; data objects frameRGB, frameYUV, frameGray, pyr-1 and pyr0]
Feature tracking: Graph Creation (2)
    // Delay holding the keypoint arrays of the current and previous frames
    vx_array pts_sample = vxCreateArray(trk.context, VX_TYPE_KEYPOINT, 1000);
    trk.pts_delay = vxCreateDelay(trk.context, (vx_reference)pts_sample, 2);
    trk.curr_features = vxCreateArray(trk.context, VX_TYPE_KEYPOINT, 1000);

    // Lucas-Kanade parameters wrapped in scalars
    vx_uint32 lk_epsilon = UINT_MAX;
    vx_scalar s_lk_epsilon = vxCreateScalar(trk.context, VX_TYPE_UINT32, &lk_epsilon);
    vx_scalar s_lk_num_iters = vxCreateScalar(trk.context, VX_TYPE_UINT32, &trk.lk_num_iters);
    vx_bool lk_use_init_est = vx_false_e;
    vx_scalar s_lk_use_init_est = vxCreateScalar(trk.context, VX_TYPE_BOOL, &lk_use_init_est);

    // Create optical flow node: previous/current pyramids and previous points in,
    // tracked points out in trk.curr_features
    trk.opt_flow_node =
        vxOpticalFlowPyrLKNode(trk.graph,
            (vx_pyramid)vxGetReferenceFromDelay(trk.pyr_delay, -1),
            (vx_pyramid)vxGetReferenceFromDelay(trk.pyr_delay, 0),
            (vx_array)vxGetReferenceFromDelay(trk.pts_delay, -1),
            (vx_array)vxGetReferenceFromDelay(trk.pts_delay, -1),
            trk.curr_features, VX_TERM_CRITERIA_ITERATIONS, s_lk_epsilon,
            s_lk_num_iters, s_lk_use_init_est, trk.lk_win_size);
[Figure: the graph extended with the optical flow pyrLK node; data objects frameRGB, frameYUV, frameGray, pyr-1, pyr0, pts-1, pts0 and curr_features]
Feature tracking: Graph Creation (3)
    // Create HarrisTrack node
    trk.feature_track_node =
        nvxHarrisTrackNode(trk.graph, frameGray,
                           (vx_array)vxGetReferenceFromDelay(trk.pts_delay, 0),
                           0, trk.curr_features, trk.harris_k, trk.harris_thresh);

    // Verify the graph is legal, and optimize it
    vxVerifyGraph(trk.graph);
}
[Figure: the complete graph with nodes color convert, channel extract, pyramid, optical flow pyrLK and Harris track; data objects frameRGB, frameYUV, frameGray, pyr-1, pyr0, pts-1, pts0 and curr_features]