
OpenVX Camera Sensors AR SIGGRAPH Asia

Date posted: 10-May-2015
Following our successful participation at SIGGRAPH Asia 2012 in Singapore, the Khronos Group is excited to demonstrate and educate about Khronos APIs at SIGGRAPH Asia 2013 in Hong Kong. This presentation covers the new OpenVX Camera Sensors for Augmented Reality (AR), by Neil Trevett.
1. Enabling Augmented Reality
Camera Processing, Vision Acceleration and Sensor Fusion
Neil Trevett, Vice President NVIDIA, President Khronos
Copyright Khronos Group 2013 - Page 1

2. Khronos Standards
Over 100 companies defining royalty-free APIs to connect software to silicon
3D Asset Handling
- Advanced Authoring pipelines
- 3D Asset Transmission Format with streaming and compression
Visual Computing
- Object and Terrain Visualization
- Advanced scene construction
Acceleration in the Browser
- WebGL for 3D in browsers
- WebCL Heterogeneous Computing for the web
Sensor Processing
- Mobile Vision Acceleration
- On-device Sensor Fusion
Camera Control API
OpenVX 1.0 Provisional Released!

3. Mobile Compute Driving Imaging Use Cases
Requires significant computing over large data sets
[Figure: use cases plotted against a Time axis]
- Computational Photography
- Face, Body and Gesture Tracking
- 3D Scene/Object Reconstruction
- Augmented Reality

4. Accelerating AR to Meet User Expectations
Mobile is an enabling platform for Augmented Reality
- Mobile SOC and sensor capabilities are expanding quickly
But we need mobile AR to be 60Hz buttery smooth AND low power
- Power is now the main challenge to increasing quality of the AR user experience
What are the silicon acceleration APIs on today's mobile SOCs and OS
- And how they can be used to optimize AR performance AND power
SOC = System On Chip: complete compute system minus memory and some peripherals

5. Why are AR Standards Needed?
State-of-the-art Augmented Reality on mobile today, before acceleration
Courtesy Metaio: http://www.youtube.com/watch?v=xw3M-TNOo44&feature=related

6. Where AR Standards Can Take Us
Ray-tracing and light-field calculations running today on CUDA laptop PC: 50+ Watts
Ongoing research to use depth cameras to reconstruct global illumination model in real-time
Need on mobile devices at 100x less power = 0.5W
Source: "High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence", P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria

7. Mobile SOC Performance Increases
100x perf increase in four years
[Chart: CPU/GPU aggregate performance (1, 10, 100) against device shipping dates (2011 to 2015)]
- Tegra 2: Dual A9
- Tegra 3: Quad A9, power saver 5th core
- Tegra 4: Quad A15
- Logan: Full Kepler GPU, CUDA 5.0, OpenGL 4.3
- Parker: Denver 64-bit CPU, Maxwell GPU
Devices shown: Google Nexus 7, HTC One X+

8. Power is the New Design Limit
The Process Fairy keeps bringing more transistors..
..but the End of Voltage Scaling means power is much more of an issue than in the past
In the Good Old Days
- Leakage was not important, and voltage scaled with feature size
- $L' = L/2$, $D' = 1/L'^2 = 4D$, $f' = 2f$, $V' = V/2$, $E' = C'V'^2 = E/8$, $P' = P$
- Halve L and get 4x the transistors and 8x the capability for the same power
The New Reality
- Leakage has limited threshold voltage, largely ending voltage scaling
- $L' = L/2$, $D' = 1/L'^2 = 4D$, $f' \approx 2f$, $V' \approx V$, $E' = C'V'^2 = E/2$, $P' = 4P$
- Halve L and get 4x the transistors and 8x the capability for 4x the power!!

9. Mobile Thermal Design Point
4-5" screen takes 250-500mW
7" screen takes 1W
10" screen takes 1-2W
Resolution makes a difference: the iPad3 screen takes up to 8W!
Typical max system power levels before thermal failure, per device class in the figure: 2-4W, 4-7W, 6-10W, 30-90W
Even as battery technology improves, these thermal limits remain

10. How to Save Power?
Much more expensive to MOVE data than COMPUTE data
Process improvements WIDEN the gap
- 10nm process will increase ratio another 4X
Energy efficiency must be key metric during silicon AND app design
- Awareness of where data lives, where computation happens, how is it scheduled
Energy per operation, for 40nm, 1V process:
- Write 32-bits to LP-DDR2: 600pJ
- Send 32-bits off-chip: 50pJ
- Send 32-bits 2mm: 24pJ
- 32-bit float operation: 7pJ
- 32-bit integer add: 1pJ
- 32-bit register write: 0.5pJ

11. Hardware Save Power e.g. Camera Sensor ISP
CPU
- Single processor or Neon SIMD, running fast
- Makes heavy use of general memory
- Non-optimal performance and power
GPU
- Programmable and flexible
- Many way parallelism, run at lower frequency
- Efficient image caching close to processors
- BUT cycles frames in and out of memory
Camera ISP (Image Signal Processor)
- Little or no programmability
- Data flows thru compact hardware pipe
- Scan-line-based, no global memory
- Best perf/watt
Figure annotation: ~760 math ops, ~42K vals = 670Kb, 300MHz, ~250Gops

12. Power is the New Performance Limit
Lots of space for transistors on SOC but can't turn them all on at same time!
- Would exceed Thermal Design Point of mobile devices
GPUs are much more power efficient than CPUs
- When exploiting data parallelism can be x10 as efficient, but can go further
Dedicated units can increase locality and parallelism of computation
- Dark Silicon: specialized hardware only turned on when needed
Enabling new mobile AR experiences requires pushing computation onto GPUs and dedicated hardware
[Chart: Power Efficiency (X1, X10, X100) against Computation Flexibility for Multi-core CPU, GPU Compute, and Dedicated Hardware]

13. OpenVX Power Efficient Vision Processing
Acceleration API for real-time vision
- Focus on mobile and embedded systems
Diversity of efficient implementations
- From programmable processors, through GPUs to dedicated hardware pipelines
Tightly specified API with conformance
- Portable, production-grade vision functions
Complementary to OpenCV
- Which is great for prototyping
[Diagram: Application; Other higher-level CV libraries; OpenCV open source library; Open source sample implementation; Hardware vendor implementations. Caption: Acceleration for power-efficient vision processing]

14. OpenVX Graphs
Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Tiling extension enables user nodes (extensions) to also run in local memory
VXU Utility Library for access to single nodes
- Easy way to start using OpenVX
EGLStreams can provide data and event interop with other APIs
- BUT use of other Khronos APIs are not mandated
[Diagram: Example Graph and Flow. Native Camera Control feeds a graph of four OpenVX Nodes, followed by Heterogeneous Processing]

15. OpenVX 1.0 Function Overview
Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
Core Computer Vision
- Pyramid computation
- Integral Image computation
Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow

16. OpenVX Participants and Timeline
Aiming for specification finalization by mid-2014
Itseez is working group chair
Qualcomm and TI are specification editors

17. OpenVX and OpenCV are Complementary
Governance
- OpenCV: Open Source, Community Driven. No formal specification
- OpenVX: Formal specification and conformance tests. Implemented by hardware vendors
Scope
- OpenCV: Very wide. 1000s of functions of imaging and vision. Multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision. Use external camera API
Conformance
- OpenCV: No Conformance testing. Every vendor implements different subset
- OpenVX: Full conformance test suite / process. Reliable acceleration platform
Use Case
- OpenCV: Rapid prototyping
- OpenVX: Production deployment
Efficiency
- OpenCV: Memory-based architecture. Each operation reads and writes memory
- OpenVX: Graph-based execution. Optimizable computation, data transfer
Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability

18. OpenVX and OpenCL are Complementary
Use Case
- OpenCL: General Heterogeneous programming
- OpenVX: Domain targeted - vision processing
Architecture
- OpenCL: Language-based, needs online compilation
- OpenVX: Library-based, no online compiler required
Target Hardware
- OpenCL: Exposed architected memory model, can impact performance portability
- OpenVX: Abstracted node and memory model, diverse implementations can be optimized for power and performance
Precision
- OpenCL: Full IEEE floating point mandated
- OpenVX: Minimal floating point requirements, optimized for vision operators
Ease of Use
- OpenCL: Focus on general-purpose math libraries with no built-in vision functions
- OpenVX: Fully implemented vision operators and framework out of the box

19. Stereo Machine Vision
[Diagram: OpenVX Graph]
- Camera 1: Stereo Rectify with Remap
- Camera 2: Stereo Rectify with Remap
- Compute Depth Map (User Node)
- Image Pyramid
- Frame Delay
- Compute Optical Flow
- Detect and track objects (User Node)
- Output: Object coordinates

20. Typical Imaging Pipeline
Pre- and Post-processing can be done on CPU, GPU, DSP
ISP controls camera via 3A algorithms
- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
ISP may be a separate chip or within Application Processor
[Diagram: Lens; CMOS sensor with Color Filter Array; Bayer; Pre-processing; Image Signal Processor (ISP); RGB/YUV; Post-processing; App. 3A feeds back into lens, sensor, aperture control]
Need for advanced camera control API:
- to drive more flexible app camera control
- over more types of camera sensors
- with tighter integration with the rest of the system

21. Advanced Camera Control Use Cases
High-dynamic range (HDR) and computational flash photography
- High-speed burst with individual frame control over exposure and flash
Rolling shutter elimination -
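
The scaling rules on slide 8 ("Power is the New Design Limit") can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `scale_after_halving` is hypothetical, and it assumes that switched capacitance halves along with feature size, which is what the slide's E/8 and E/2 figures imply.

```python
def scale_after_halving(voltage_scales: bool) -> dict:
    """Relative change in chip properties when feature size L is halved.

    All values are ratios (new / old) for a die of the same area.
    """
    density = 4.0        # D' = 4D: transistor count per area goes as 1/L^2
    frequency = 2.0      # f' = 2f (only approximately so in the new reality)
    capacitance = 0.5    # assumption: switched capacitance shrinks with L
    voltage = 0.5 if voltage_scales else 1.0   # V' = V/2 versus V' ~ V

    energy = capacitance * voltage ** 2        # E = C * V^2 per operation
    capability = density * frequency           # operations per second
    power = capability * energy                # P = (ops per second) * (energy per op)
    return {
        "transistors": density,
        "capability": capability,
        "energy": energy,
        "power": power,
    }


good_old_days = scale_after_halving(voltage_scales=True)
new_reality = scale_after_halving(voltage_scales=False)

print("Good old days:", good_old_days)   # energy 0.125 (E/8), power 1.0 (same power)
print("New reality:  ", new_reality)     # energy 0.5 (E/2), power 4.0 (4x the power)
```

In both cases the chip gains 4x the transistors and 8x the capability. The only difference is whether voltage drops, and that alone decides whether power stays flat or quadruples.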

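
The energy figures on slide 10 also suggest why slide 14 stresses fusing graph nodes to eliminate memory transfers. The following is a rough back-of-the-envelope model, not an OpenVX implementation: the function `pipeline_energy_pj` and its parameters are hypothetical, the per-operation costs are the slide's 40nm, 1V figures, and memory read costs are ignored because the slide does not give them.

```python
# Energy per 32-bit operation in picojoules, from slide 10 (40nm, 1V process).
PJ_DRAM_WRITE = 600.0   # write 32 bits to LP-DDR2
PJ_FLOAT_OP = 7.0       # one 32-bit float operation
PJ_REG_WRITE = 0.5      # one 32-bit register write


def pipeline_energy_pj(values: int, stages: int, ops_per_value: int, fused: bool) -> float:
    """Estimate the energy of a multi-stage image pipeline in picojoules.

    Unfused: every stage writes its full output to external memory.
    Fused: intermediate results stay in registers and only the final
    output is written to external memory.
    """
    compute = values * stages * ops_per_value * PJ_FLOAT_OP
    if fused:
        movement = values * ((stages - 1) * PJ_REG_WRITE + PJ_DRAM_WRITE)
    else:
        movement = values * stages * PJ_DRAM_WRITE
    return compute + movement


# Hypothetical workload: 1M 32-bit values, 3 stages, 10 float ops per value per stage.
unfused = pipeline_energy_pj(1_000_000, stages=3, ops_per_value=10, fused=False)
fused = pipeline_energy_pj(1_000_000, stages=3, ops_per_value=10, fused=True)

print(f"unfused: {unfused / 1e9:.3f} mJ")   # 2.010 mJ
print(f"fused:   {fused / 1e9:.3f} mJ")     # 0.811 mJ
```

Under these assumptions the arithmetic costs the same in both cases (0.21 mJ), so the whole saving comes from the two intermediate writes to external memory that fusion avoids.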