Date post: | 13-Jan-2015 |
Category: |
Technology |
Upload: | the-khronos-group-inc |
© Copyright Khronos Group, 2013 - Page 1
Technology Update
Presented by:
Erik Noreke, Khronos Group Vice President of Business Development
November 2013
Khronos Connects Software to Silicon
ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration
Low level silicon to software interfaces needed on every platform
Graphics, video, audio, compute, vision, sensor and camera processing
Defines the forward looking roadmap for the silicon community
Shipping on billions of devices across multiple operating systems
Rigorous conformance tests for cross-vendor consistency
Khronos is OPEN for any company to join and participate
Acceleration APIs BY the Industry FOR the Industry
Power is the New Limit to Performance
• GPUs are much more power efficient than CPUs for data parallelism
- When exploiting data parallelism, GPUs can be 10x as efficient – but can go further…
• Lots of space for transistors on SOC – but can’t turn them all on at same time!
- Would exceed Thermal Design Point
• Dark Silicon - specialized hardware – only turned on when needed
- Dedicated units can increase locality and parallelism of computation
[Figure: power efficiency versus computation flexibility. Multi-core CPU: x1; GPU Compute: x10; Dedicated Hardware: x100]
Enabling new mobile use cases requires pushing computation onto GPUs and dedicated hardware
How do we provide access to this diversity of processors and hardware without horrible platform fragmentation?
Standards!
OpenCL Built-in Kernels
• Used to control non-OpenCL C-capable resources on an SOC – ‘Custom Devices’
- E.g. Video encode/decode, Camera ISP …
• Represent functions of Custom Devices as an OpenCL kernel
- Can enqueue Built-in Kernels to Custom Devices alongside standard OpenCL kernels
• OpenCL run-time a powerful coordinating framework for ALL SOC resources
- Programmable and custom devices controlled by one run-time
Built-in kernels enable control of specialized processors and hardware from OpenCL run-time
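The idea of one run-time coordinating both programmable and custom devices can be illustrated with a toy model. This is a minimal sketch in Python, not the OpenCL API: the `Device` and `Runtime` classes, the `downscale` kernel name and the pixel functions are all hypothetical stand-ins. In real OpenCL 1.2, built-in kernels are obtained through `clCreateProgramWithBuiltInKernels` on devices of type `CL_DEVICE_TYPE_CUSTOM`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Device:
    """Toy stand-in for an OpenCL device (hypothetical, not the real API)."""
    name: str
    programmable: bool
    # Fixed functions that a custom device exposes as named built-in kernels.
    built_in_kernels: Dict[str, Callable[[list], list]] = field(default_factory=dict)


@dataclass
class Runtime:
    """One run-time that coordinates programmable and custom devices."""
    log: List[str] = field(default_factory=list)

    def enqueue_kernel(self, device: Device, kernel: Callable[[list], list], data: list) -> list:
        # Standard kernels need a device that can run compiled kernel code.
        if not device.programmable:
            raise ValueError(f"{device.name} cannot run compiled kernels")
        self.log.append(f"{device.name}:{kernel.__name__}")
        return kernel(data)

    def enqueue_built_in(self, device: Device, kernel_name: str, data: list) -> list:
        # Built-in kernels are looked up by name, not compiled from source.
        if kernel_name not in device.built_in_kernels:
            raise KeyError(f"{device.name} has no built-in kernel {kernel_name!r}")
        self.log.append(f"{device.name}:{kernel_name}")
        return device.built_in_kernels[kernel_name](data)


def brighten(pixels: list) -> list:
    """Stand-in for a kernel written in OpenCL C."""
    return [min(255, p + 16) for p in pixels]


def halve_resolution(pixels: list) -> list:
    """Stand-in for a fixed-function camera ISP operation."""
    return pixels[::2]


gpu = Device("gpu", programmable=True)
isp = Device("camera_isp", programmable=False,
             built_in_kernels={"downscale": halve_resolution})

runtime = Runtime()
frame = [10, 20, 250, 40]
small = runtime.enqueue_built_in(isp, "downscale", frame)   # custom device
result = runtime.enqueue_kernel(gpu, brighten, small)       # programmable device
```

Both enqueues go through the same `Runtime` object, which is the point of the slide: the application sees one scheduling model even though only one of the two devices can run compiled kernels.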
OpenCL Roadmap
OpenCL 2.0
OpenCL 2.0 Provisional released!
Significant enhancements to memory and execution models to expose emerging hardware capabilities and provide increased flexibility, functionality and performance to developers
OpenCL-SPIR (Standard Portable Intermediate Representation)
OpenCL SPIR 1.2 Provisional released!
Exploring LLVM-based, low-level Intermediate Representation for IP Protection and as target back-end for alternative high-level languages
OpenCL-HLM (High Level Model)
High-level programming model, unifying host and device execution environments through language syntax for increased usability and broader optimization opportunities
Mobile OpenCL Shipping
• Android ICD extension released in latest extension specification
- OpenCL implementations can be discovered and loaded as a shared object
• Multiple implementations shipping in Android NDK
- ARM, Imagination, Vivante, Qualcomm, Samsung …
OpenGL 3D API Family Tree
[Figure: timeline from 2002 to 2013 of the OpenGL 3D API family]
Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, GL-Next
Mobile 3D: OpenGL ES 1.0, 1.1, 2.0, 3.0, ES-Next
WebGL 1.0, WebGL-Next
- OpenGL ES 1.1 Content: Fixed function 3D Pipeline
- OpenGL ES 2.0 Content: Programmable vertex and fragment shaders
- OpenGL ES 3.0 Content: ES3 is backward compatible so new features can be added incrementally
- OpenGL 4.4 is a superset of DX11
OpenGL 4.3 Compute Shaders
• Execute algorithmically general-purpose GLSL shaders
- Can operate on uniforms, images and textures
• Process graphics data in the context of the graphics pipeline
- Easier than interoperating with a compute API IF processing ‘close to the pixel’
• Standard part of all OpenGL 4.3 implementations
- Matches DX11 DirectCompute functionality
Physics AI Simulation Ray Tracing Imaging Global Illumination
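How a compute dispatch maps work groups onto data can be sketched outside GLSL. The Python model below is an illustration only: it runs invocations sequentially where a GPU would run them in parallel, and the `dispatch` and `invert` names are hypothetical. What it borrows from GLSL is the indexing rule `gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID`.

```python
from itertools import product


def dispatch(num_groups, local_size, shader):
    """Call `shader` once per invocation, mimicking a compute dispatch.

    num_groups and local_size are tuples of equal length (one entry per
    dimension). Real hardware runs invocations in parallel; this toy
    version simply loops over them.
    """
    for group_id in product(*(range(n) for n in num_groups)):
        for local_id in product(*(range(n) for n in local_size)):
            # Same rule GLSL uses to derive gl_GlobalInvocationID.
            global_id = tuple(
                g * s + l for g, s, l in zip(group_id, local_size, local_id)
            )
            shader(global_id)


# A 4x4 single-channel "image" processed by 2x2 groups of 2x2 invocations.
image = [[x + 4 * y for x in range(4)] for y in range(4)]
output = [[0] * 4 for _ in range(4)]
visited = []


def invert(global_id):
    """Per-invocation body: each invocation handles exactly one pixel."""
    x, y = global_id
    visited.append(global_id)
    output[y][x] = 255 - image[y][x]


dispatch(num_groups=(2, 2), local_size=(2, 2), shader=invert)
```

Each pixel is touched by exactly one invocation, which is what makes this style of shader a natural fit for work that stays ‘close to the pixel’.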
OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback …
• Make life better for the programmer
- Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
- OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
- #1 developer request!
Visual Sensor Revolution
• Single sensor RGB cameras are just the start of the mobile visual revolution
- IR sensors – LEAP Motion, eye-trackers
• Multi-sensors: Stereo pairs -> Plenoptic array -> Depth cameras
- Stereo pair can enable object scaling and enhanced depth extraction
- Plenoptic Field processing needs FFTs and ray-casting
• Hybrid visual sensing solutions
- Different sensors mixed for different distances and lighting conditions
• GPUs today – more dedicated ISPs tomorrow?
[Images: Dual Camera (LG Electronics); Plenoptic Array (Pelican Imaging); Capri Structured Light 3D Camera (PrimeSense)]
OpenVX
• Vision Hardware Acceleration Layer
- Enables hardware vendors to implement accelerated imaging and vision algorithms
- For use by high-level libraries or apps
• Focus on enabling real-time vision
- On mobile and embedded systems
• Diversity of efficient implementations
- From programmable processors, through GPUs to dedicated hardware pipelines
[Figure: software stack. Application; OpenCV open source library and other higher-level CV libraries; OpenVX, as an open source sample implementation and hardware vendor implementations]
Dedicated hardware can help make vision processing performant and low-power enough for pervasive ‘always-on’ use
OpenVX - Power Efficient Vision Acceleration
• Create vision processing graph for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
• EGLStreams can provide data and event interop with other APIs
- BUT use of other Khronos APIs is not mandated
• VXU Utility Library provides efficient access to single nodes
- Open source implementation – easy way to start using OpenVX
[Figure: graph of four OpenVX Nodes running on heterogeneous processing, with native camera control]
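Why fusing nodes saves memory traffic can be shown with a small counting model. This is a sketch in Python under strong assumptions, not OpenVX code: every node is a pure per-pixel function (real vision nodes such as filters read neighborhoods and would need tiling), and `MemoryCounter`, `run_node_by_node` and `run_fused` are hypothetical names.

```python
from typing import Callable, List

Node = Callable[[int], int]  # a per-pixel operation


class MemoryCounter:
    """Counts pixel-sized reads and writes to main memory."""

    def __init__(self) -> None:
        self.reads = 0
        self.writes = 0


def run_node_by_node(nodes: List[Node], image: List[int], mem: MemoryCounter) -> List[int]:
    """Memory-based execution: every node reads and writes a full image."""
    current = image
    for node in nodes:
        mem.reads += len(current)
        current = [node(p) for p in current]
        mem.writes += len(current)
    return current


def run_fused(nodes: List[Node], image: List[int], mem: MemoryCounter) -> List[int]:
    """Fused execution: intermediates never leave the processing unit."""
    mem.reads += len(image)
    out = []
    for p in image:
        for node in nodes:
            p = node(p)  # intermediate value stays local
        out.append(p)
    mem.writes += len(out)
    return out


# A three-node chain: gain, offset, threshold.
graph = [
    lambda p: p * 2,
    lambda p: p + 1,
    lambda p: 255 if p > 100 else 0,
]
image = [10, 60, 49, 50]

unfused_mem, fused_mem = MemoryCounter(), MemoryCounter()
unfused = run_node_by_node(graph, image, unfused_mem)
fused = run_fused(graph, image, fused_mem)
```

Both paths produce the same pixels, but the fused path reads and writes the image once instead of once per node. Because the whole graph is declared before execution, an implementation has the information it needs to make this kind of decision.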
OpenVX and OpenCV are Complementary
Governance
- OpenCV: Open source, community driven; no formal specification
- OpenVX: Formal specification and full conformance tests; implemented by hardware vendors
Scope
- OpenCV: Very wide; 1000s of functions of imaging and vision; multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Conformance
- OpenCV: No conformance testing; every vendor implements different subset
- OpenVX: Full conformance test suite / process; reliable acceleration platform
Use Case
- OpenCV: Rapid prototyping
- OpenVX: Production deployment
Efficiency
- OpenCV: Memory-based architecture; each operation reads and writes memory; sub-optimal power / performance
- OpenVX: Graph-based execution; optimized nodes and data transfer; highly efficient
Typical Imaging Pipeline
• Pre- and Post-processing can be done on CPU, GPU, DSP…
• ISP controls camera via 3A algorithms
- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
• ISP may be a separate chip or within Application Processor
[Figure: Lens, Color Filter Array and CMOS sensor output Bayer data to Pre-processing, then to the Image Signal Processor (ISP), which outputs RGB/YUV to Post-processing and the App; 3A drives lens, sensor and aperture control]
Need for advanced camera control API!
FCAM with Extensions
• Sample time-stamping for synch between cameras and MEMS sensors
• ISP model (including 3A)
• Regions of Interest
• Multiple cameras
• Multiple ISPs
• Re-entrant ISPs
• Multiple output streams
• Efficient memory allocation
• Streaming rows (not just frames)
• Image types - aligned with MIPI CSI specifications
• Metadata & Statistics
• Vendor extensions – specialized formats and capabilities
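The first item, time-stamping for synchronization between cameras and MEMS sensors, comes down to pairing each frame with the sensor sample closest in time. The Python sketch below shows one simple way to do that with a binary search. It assumes both streams share a clock and that sample timestamps are sorted; the function names and the microsecond values are hypothetical and not taken from FCAM.

```python
from bisect import bisect_left
from typing import List, Tuple


def nearest_sample(timestamps: List[int], t: int) -> int:
    """Index of the entry in `timestamps` closest to `t`.

    `timestamps` must be non-empty and sorted ascending. On a tie the
    earlier sample wins.
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if (after - t) < (t - before) else i - 1


def pair_frames_with_samples(frame_times: List[int],
                             sample_times: List[int]) -> List[Tuple[int, int]]:
    """Pair every frame timestamp with its nearest sensor timestamp."""
    return [(ft, sample_times[nearest_sample(sample_times, ft)])
            for ft in frame_times]


# Hypothetical timestamps in microseconds on a shared clock.
gyro_us = [0, 5_000, 10_000, 15_000, 20_000]
frame_us = [1_000, 8_000, 33_000]
pairs = pair_frames_with_samples(frame_us, gyro_us)
```

Nearest-sample matching is only as good as the timestamps themselves, which is why the list asks for time-stamping at the sample level rather than leaving it to the application.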
Low Power Environment Scanning
• Many sensor use cases would consume too much power to be running 24/7
- Environment aware use cases have to be very low power
• ‘Scanners’ - very low power, always on, detect things in the environment
- Trigger the next level of processing capability
[Figure: escalating processing stages]
- ARM 7: 1 MIP and accelerometers can detect someone in the vicinity
- DSP: Low power activation of camera to detect someone in field of view
- GPU: GPU acceleration for precision gesture processing
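The escalation from an always-on scanner to progressively more capable processors can be modeled as a cascade that stops at the first stage that detects nothing. The Python sketch below is illustrative only: the `Stage` class, the observation keys and the power figures in milliwatts are made-up assumptions, not measurements from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Stage:
    name: str
    power_mw: int  # hypothetical cost of waking this stage once
    detect: Callable[[Dict[str, bool]], bool]


def scan(stages: List[Stage], observation: Dict[str, bool]) -> Tuple[List[str], int]:
    """Run stages cheapest-first and stop at the first one that finds nothing.

    Returns the names of the stages that were woken and the total power
    spent, so the cost of a quiet environment can be compared with an
    active one.
    """
    woken: List[str] = []
    power = 0
    for stage in stages:
        woken.append(stage.name)
        power += stage.power_mw
        if not stage.detect(observation):
            break  # nothing found: do not trigger the next level
    return woken, power


stages = [
    Stage("accelerometer_scanner", 1, lambda obs: obs.get("motion", False)),
    Stage("dsp_camera", 50, lambda obs: obs.get("person_in_view", False)),
    Stage("gpu_gesture", 500, lambda obs: obs.get("gesture", False)),
]

quiet = scan(stages, {"motion": False})
passer_by = scan(stages, {"motion": True, "person_in_view": False})
user = scan(stages, {"motion": True, "person_in_view": True, "gesture": True})
```

With these made-up numbers a quiet room costs 1 unit per scan while a full gesture session costs 551, which is the trade the slide describes: the expensive stages only run when a cheaper one has something to report.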
Sensor Industry Fragmentation …
StreamInput Sensor Fusion Stack
[Figure: layered stack. Applications; Middleware (E.g. Augmented Reality engines, gaming engines); OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion); StreamInput; Sensor Hubs and Sensors]
- Low-level native API defines access to fused sensor data stream and context-awareness
- StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection – enabling sensor subsystem vendors to increase ADDED VALUE
- Platforms can provide increased access to improved sensor data stream – driving faster, deeper sensor usage by applications
- Middleware engines need platform-portable access to native, low-level sensor data stream
- Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
- Hardware transport interfaces are defined by each system, e.g. IIO or HID sensor
Khronos APIs for Augmented Reality
[Figure: AR data flow. MEMS Sensors feed Sensor Fusion; the Camera Control API provides advanced camera control and stream generation for Vision Processing; the Application runs on CPUs, GPUs and DSPs; output is 3D Rendering and Video Composition on GPU, plus Audio Rendering]
- EGLStream - stream data between APIs
- Precision timestamps on all sensor samples
AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together
Leveraging Proven Native APIs into HTML5
• Khronos and W3C liaison
- Leverage proven native API investments into the Web
- Fast API development and deployment
- Designed by the hardware community
- Familiar foundation reduces developer learning curve
[Figure: native APIs and their JavaScript and HTML counterparts. Legend: Native APIs shipping or Khronos working group; JavaScript API shipping, acceleration being developed or work underway; Possible future JavaScript APIs or acceleration]
- WebVX? Vision Processing
- WebCAM(!) Camera control and video processing
- WebStream? Sensor Fusion
- Other labels: Native, JavaScript, HTML, Canvas, Path Rendering, Camera Control
Microsoft PhotoSynth2
• Demonstrated at Build 2013
http://channel9.msdn.com/Events/Build/2013/4-072 1:50
Cross-OS Portability
[Figure: development environments per platform. Preferred SDK environments: Dalvik (Java), Objective C, C# with DirectX. Shared across platforms: C/C++ and HTML/CSS]
- Preferred development environments not designed for portability
- Native code is portable, but apps must cope with different available APIs and libraries
- HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% mobile systems
WebCL – Parallel Computing for the Web
• JavaScript bindings to OpenCL APIs
- Enables initiation of Kernels written in OpenCL C within the browser
http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc
3D Needs a Transmission Format!
• Compression and streaming of 3D assets becoming essential
- Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
- 3D is more complex – diverse asset types and use cases
• Needs to be royalty-free
- Avoid an ‘internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
- High-performance and low power – but pragmatic adoption strategy is key
Media types and their codecs: Audio (MP3), Video (H.264), Images (JPEG), 3D (?!)
An effective and widely adopted codec ignites previously unimagined opportunities for a media type
COLLADA and glTF Open Source Ecosystem
[Figure: asset pipeline. COLLADA and other authoring formats provide Tool Interop; the COLLADA2GLTF Translator feeds pervasive WebGL deployment and Web-based Tools]
- OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests on GitHub
- Three.js glTF Importer. Rest3D initiative
https://github.com/KhronosGroup/glTF
https://github.com/KhronosGroup/OpenCOLLADA
https://github.com/KhronosGroup/COLLADA-CTS
Conclusion
• Hardware acceleration is a complex application domain and needs multiple standards across diverse domains
• Advances in SOC silicon processing and associated APIs to access them are about to enable mobile devices to truly meet user expectations
• Now is a good time to get involved with the standards initiatives that affect your business
• These slides and more details at www.khronos.org