Date post: | 22-Nov-2014 |
Upload: | mikael-bourges-sevenier |
| © 2013 Aptina Imaging Corporation | Aptina Confidential 1
© 2013 Aptina Imaging Corporation. All rights reserved. Products are warranted only to meet Aptina’s production data sheet specifications. Information, products, and/or specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Dates are estimates only. Drawings not to scale. Aptina and the Aptina logo are trademarks of Aptina Imaging Corporation. All other trademarks are the property of their respective owners.
Imaging using ARM GPU
Investigating flexible imaging pipelines using embedded GPU
Mikaël Bourges-Sévenier (msevenier at aptina dot com)
Director, High-Performance Imaging
December 2, 2013
HPC & GPU Supercomputing Group of Silicon Valley
Agenda
• Toward more flexible imaging pipelines
• Replacing the image processor with software & hardware
• Video HDR using Aptina Interlaced HDR sensor
• Q&A
Cameras are everywhere
Interactive systems that respond to user actions (PC, Gaming, Mobile)
• Motion/Gesture tracking and recognition
• Body tracking
Environmental imaging systems that are situationally aware (Camera, Mobile, PC)
• Face Detection/Track
• Gesture tracking
• Object tracking
Ubiquitous “Cameras Everywhere” distributed systems (Mobile, Camera, DIY-SOHO)
• Point and shoot
• HDR
• Surveillance
Computational imaging evolution
[Figure: evolution from Web Cam to Smart Camera. Capabilities: Brightness, Colorimetry, Presence, Face Track, Face Detect, AR, Gesture, Spatial (Volumetric). Applications: True Color, Brightness Compensation, Exposure control; User Identity Access Control; Augmented Information; 3D Imaging; Interactive Services]
How imaging pipelines work
How Imaging Sensors work
http://www.photoaxe.com
Bayer GRBG pattern
• 50% green
• 25% red, 25% blue
Bayer CFA is one type of pattern
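As a rough illustration of the GRBG layout, the sketch below maps a pixel coordinate to its filter color and counts the color fractions over a small area. The names `BAYER_TILES`, `bayer_color` and `color_fractions` are hypothetical, and the tile tables assume each 2x2 tile is read left to right, top to bottom.

```python
from collections import Counter

# Assumed convention: each name lists the 2x2 tile left to right, top to bottom.
BAYER_TILES = {
    "GRBG": ("G", "R", "B", "G"),
    "RGGB": ("R", "G", "G", "B"),
    "GBRG": ("G", "B", "R", "G"),
    "BGGR": ("B", "G", "G", "R"),
}


def bayer_color(row, col, pattern="GRBG"):
    """Return the filter color ('R', 'G' or 'B') covering pixel (row, col)."""
    tile = BAYER_TILES[pattern]
    return tile[(row % 2) * 2 + (col % 2)]


def color_fractions(height, width, pattern="GRBG"):
    """Fraction of pixels under each filter color (height and width even)."""
    counts = Counter(
        bayer_color(r, c, pattern) for r in range(height) for c in range(width)
    )
    total = height * width
    return {color: counts[color] / total for color in "RGB"}


if __name__ == "__main__":
    print(color_fractions(4, 4))  # {'R': 0.25, 'G': 0.5, 'B': 0.25}
```

Every pattern order gives the same 50/25/25 split; only the position of each color inside the tile changes.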
Bayer Demosaicing
• More G samples than R or B, since the eye is more sensitive to luminance than to chrominance
• Convert pixel colors from Bayer space to full RGB color
• Complex interpolation to avoid artifacts (e.g. on edges)
0 1
2 3
0: GRBG, 1: RGGB, 2: GBRG, 3: BGGR
[Figure: demosaiced output, with a full RGB value at every pixel]
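To make the conversion concrete, here is a minimal sketch of the simplest form of demosaicing: each missing channel is the average of the same-color samples in the 3x3 neighborhood, which amounts to bilinear interpolation away from the borders. This is not the edge-aware interpolation the slide refers to, and `demosaic_bilinear` and `GRBG` are hypothetical names. The input is assumed to be a GRBG mosaic of at least 2x2 pixels.

```python
GRBG = ("G", "R", "B", "G")  # 2x2 tile, left to right, top to bottom (assumed)


def demosaic_bilinear(raw, tile=GRBG):
    """Naive demosaic of a Bayer mosaic given as a list of rows of numbers.

    Returns rows of (r, g, b) tuples. Each missing channel is the mean of the
    same-color samples in the 3x3 window around the pixel.
    """
    height, width = len(raw), len(raw[0])

    def color_at(r, c):
        return tile[(r % 2) * 2 + (c % 2)]

    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            sums = {"R": 0, "G": 0, "B": 0}
            counts = {"R": 0, "G": 0, "B": 0}
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        color = color_at(rr, cc)
                        sums[color] += raw[rr][cc]
                        counts[color] += 1
            pixel = {k: sums[k] / counts[k] for k in "RGB"}
            # Keep the measured sample for the pixel's own filter color.
            pixel[color_at(r, c)] = raw[r][c]
            out_row.append((pixel["R"], pixel["G"], pixel["B"]))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Flat field: every R sample is 100, every G is 50, every B is 20.
    values = {"R": 100, "G": 50, "B": 20}
    raw = [[values[GRBG[(r % 2) * 2 + (c % 2)]] for c in range(4)] for r in range(4)]
    print(demosaic_bilinear(raw)[1][1])
```

Plain averaging like this blurs across edges and produces color fringes, which is why production pipelines use more complex interpolation.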
From RAW to RGB/YUV: the ISP
• ISP = Image Signal Processor
‣ Transform sensor RAW images to YUV
‣ Very complex pipelines, dedicated, optimized for imaging
‣ Low power (200-400mW)
‣ Embedded in Application Processor or as a separate co-processor
Can I upgrade algorithms?
[Figure: Lens -> CMOS sensor with Color Filter Array -> RAW Bayer -> Image Signal Processor (ISP) -> RGB/YUV, with lens, sensor, and aperture control]
Image pipeline block diagram (typical)
Sensor Bayer data, then in order:
• Black level adjust
• Lens shading correction
• White balancing
• Defect correction
• Noise reduction (Bayer)
• Green balance
• Demosaic
• Color correction
• Sharpening
• Tone mapping and gamma
Output: full color RGB data (to YUV for JPEG)
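A pipeline like this can be modeled in software as an ordered list of stages, which is what makes it easy to reorder or replace. The sketch below is a simplified illustration with only three stand-in stages working on a flat list of 10-bit samples. The stage names, the offset of 64, and the gain value are hypothetical and do not come from the slides.

```python
from functools import reduce


def black_level(offset):
    """Stage factory: subtract the sensor black level, clamping at zero."""
    def stage(frame):
        return [max(v - offset, 0) for v in frame]
    return stage


def gain(factor, white=1023):
    """Stage factory: apply a gain (e.g. for white balance), clamping at white."""
    def stage(frame):
        return [min(v * factor, white) for v in frame]
    return stage


def gamma(exponent=1 / 2.2, white=1023):
    """Stage factory: apply a gamma curve on values normalized to [0, white]."""
    def stage(frame):
        return [white * (v / white) ** exponent for v in frame]
    return stage


def run_pipeline(frame, stages):
    """Feed the frame through each stage in order."""
    return reduce(lambda data, stage: stage(data), stages, frame)


if __name__ == "__main__":
    stages = [black_level(64), gain(2.0), gamma()]
    print(run_pipeline([64, 564, 1023], stages))
```

Swapping an algorithm then means swapping one entry in `stages`, which is the flexibility a fixed-function ISP does not offer.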
Problem Statement
• Given a non-typical imaging pipeline, how do we
‣ Take advantage of resources in an embedded platform?
‣ Keep frame rate at 30fps?
‣ Preserve good image quality?
‣ Minimize power usage?
‣ Provide flexible pipelines?
Alternative Approaches to ISP-only
Hybrid | Full Software
ISP + GPU + CPU + DSP | GPU + CPU + DSP
Less power | More power
Bayer pattern | Any pattern
Reuse existing ISP (may not be re-entrant) | Require fast processors
Require recent devices | Require high-end devices
ISP may only output 8b precision | 8b-32b precision
[Figure: Lens -> CMOS sensor with Color Filter Array -> Bayer -> Pre-processing -> Image Signal Processor (ISP) -> Post-processing -> RGB/YUV -> App, with 3A driving lens, sensor, and aperture control]
MobileHDR on ARM Mali T604
Arndale Samsung Exynos 5 Dual board
• Arndale Samsung Exynos 5 board
‣ CPU: ARM Cortex-A15 (2-core) 1.7 GHz 32nm
• 32KB L1 cache, 1MB L2 cache
‣ GPU: ARM Mali T604
• 64 concurrent threads
• Vector ALUs
• 128b registers
• OpenCL 1.1 Full Profile
‣ RAM: 2GB LP-DDR3 800 MHz (12.8 GB/s)
‣ Truly unified cached memory
• CPU and GPU memory is shared – NO COPY!
• 128b wide L1 and L2 access
‣ 2 independent job queues in the T628 (in Samsung Exynos 5 Octa)
ARM Mali T604 GPUs In Samsung Exynos 5 Dual
Property | Value
Type | Vector GPU
Process | 32nm
OpenCL | 1.1 Full Profile
Unified memory | Yes
Rendering | Tile
Work-items | 256
Clock | 533 MHz
L2 cache | 1MB
Register width | 128b
Global memory | 2GB LP-DDR3 800 MHz (12.8 GB/s)
Pipelines | 8 pipes (2 per core)
Throughput | 100 GFLOPS
Local memory | 32KB/core (global)
Constant memory | 64KB
Texture cache | Yes
Compute devices (shader cores) | 4
Cacheline | 64 bytes
16/32/64b floats | No/yes/yes
Avoid buffer copy
• Mali has unified memory
‣ Use CL_MEM_ALLOC_HOST_PTR to avoid copies between CPU and GPU
[Figure, three panels showing CPU (host), GPU (compute device), and global memory: malloc(), clCreateBuffer(CL_MEM_USE_HOST_PTR), clCreateBuffer(CL_MEM_ALLOC_HOST_PTR)]
• malloc(): buffers created by the user (malloc) are not mapped into the GPU memory space
• clCreateBuffer(CL_MEM_USE_HOST_PTR): creates a new buffer and copies the data over (but the copy operations are expensive)
• clCreateBuffer(CL_MEM_ALLOC_HOST_PTR): creates a buffer visible by both GPU and CPU
• Where possible don’t use CL_MEM_USE_HOST_PTR
‣ Create buffers at the start of your application
‣ Use CL_MEM_ALLOC_HOST_PTR instead of malloc()
‣ Then you can use the buffer on both CPU host and GPU
Stream-based vs. Frame-based
• Stream-based
‣ For low memory devices (e.g. ISP, DSP)
‣ Group of lines processed by kernels
‣ Delay: # of lines a kernel needs
• Frame-based
‣ For fast data-parallel devices (e.g. GPU)
‣ Full image processed
‣ Delay: whole frame between devices
[Figure: stream-based: continuous stream of pixels -> Kernel -> queue (Q) -> Kernel -> final image accumulates lines; frame-based: a chain of kernels, each reading a whole frame and writing a whole frame]
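The difference between the two models can be sketched with the same kernel written both ways. The example below uses a hypothetical 3-line vertical mean filter: the stream-based version buffers only the lines the kernel needs and emits its first output after 3 input lines, while the frame-based version needs the whole frame up front. The function names and the filter are illustrative assumptions, not the kernels used in the demo.

```python
from collections import deque


def vertical_mean_stream(lines, taps=3):
    """Stream-based: consume lines one at a time, buffering only `taps` lines.

    The first output line appears once `taps` input lines have arrived,
    which is the delay of this kernel.
    """
    window = deque(maxlen=taps)
    for line in lines:
        window.append(line)
        if len(window) == taps:
            yield [sum(column) / taps for column in zip(*window)]


def vertical_mean_frame(frame, taps=3):
    """Frame-based: the whole frame is in memory before processing starts."""
    return [
        [sum(column) / taps for column in zip(*frame[r:r + taps])]
        for r in range(len(frame) - taps + 1)
    ]


if __name__ == "__main__":
    frame = [[0, 0], [3, 6], [6, 12], [9, 18]]
    print(list(vertical_mean_stream(iter(frame))))  # [[3.0, 6.0], [6.0, 12.0]]
    print(vertical_mean_frame(frame))               # [[3.0, 6.0], [6.0, 12.0]]
```

Both produce the same result; the trade-off is memory footprint and latency against the data parallelism a GPU gets from seeing the full frame.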
Aptina Sensor with MobileHDR™ Turned off
Aptina Sensor with MobileHDR™ Turned on
AR0833 8MP Camera sensor
• Frame is inscribed in a circle
‣ 4:3 for images e.g. 8MP 3264 x 2448
‣ 16:9 for video e.g. 6MP 3264 x 1836
• 10 bits per pixel (stored in 16-bit words)
• At 30fps, 16:9 video produces about 180 MPix/s, i.e. about 343 MiB/s at 2 bytes per pixel
• Interface with ISP
‣ Data over MIPI CSI2 (serial)
‣ Control over I2C
[Figure: 4:3 (3264 x 2448) and 16:9 (3264 x 1836) frames inscribed in the 1/3.2" image circle]
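The data-rate figures for the 16:9 video mode can be checked with a few lines of arithmetic. The helper name `sensor_bandwidth` is hypothetical; the inputs are the frame size, frame rate, and 16-bit storage stated on the slide.

```python
def sensor_bandwidth(width, height, fps, bytes_per_pixel=2):
    """Return (pixels per second, bytes per second) for an uncompressed stream."""
    pixels_per_second = width * height * fps
    return pixels_per_second, pixels_per_second * bytes_per_pixel


if __name__ == "__main__":
    pixels, data = sensor_bandwidth(3264, 1836, 30)
    print(f"{pixels / 1e6:.1f} MPix/s")   # 179.8 MPix/s
    print(f"{data / 1e6:.1f} MB/s")       # 359.6 MB/s (10^6 bytes)
    print(f"{data / 2**20:.1f} MiB/s")    # 342.9 MiB/s (2^20 bytes)
```

The 343 figure on the slide therefore corresponds to binary megabytes (2^20 bytes); in decimal megabytes the same stream is about 360 MB/s.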
Feature: Interlaced HDR
• 1 frame contains 2 exposures interlaced
• Exposure ratio between odd and even row pairs
‣ 1x, 2x, 4x, 8x
From the AR0833 data sheet (AR0833: 1/3.2-Inch 8Mp CMOS Digital Image Sensor, AR0833_DS, Rev. F, Pub. 4/13), Features, “Interlaced HDR Readout”:
The sensor enables HDR by outputting frames where even and odd row pairs within a single frame are captured at different integration times. This output is then matched with an algorithm designed to reconstruct this output into an HDR still image or video.
The sensor HDR is controlled by two shutter pointers (Shutter pointer 1, Shutter pointer 2) that control the integration of the odd (Shutter pointer 1) and even (Shutter pointer 2) row pairs.
[Figure 16: HDR Integration Time. Shutter pointer 1 and Shutter pointer 2 run ahead of the sample pointer by Tint 1 and Tint 2; the exposures of I-FRAME 1 and I-FRAME 2 are combined in the output frame from the sensor]
Exposure 1
Exposure 2
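The interlaced layout can be illustrated with a deliberately naive reconstruction. It is not Aptina's algorithm, which the slides do not describe. The sketch assumes rows 0-1 carry the long exposure, rows 2-3 the short one, and so on; which pair is long is an assumption. It also assumes a frame of at least 4 rows and 10-bit samples that saturate at 1023. Short-exposure rows are scaled by the exposure ratio, and saturated long-exposure samples are replaced by the scaled sample two rows away, which sits under the same Bayer color. The names `is_long_row` and `reconstruct_ihdr` are hypothetical.

```python
def is_long_row(row):
    """Assumed layout: rows 0-1 long exposure, rows 2-3 short, rows 4-5 long, ..."""
    return (row // 2) % 2 == 0


def reconstruct_ihdr(raw, ratio, white=1023):
    """Naive HDR merge of an interlaced frame (list of rows, height >= 4).

    Output values are expressed in long-exposure units, so they can reach
    roughly white * ratio.
    """
    height = len(raw)
    out = []
    for r, line in enumerate(raw):
        if not is_long_row(r):
            # Short exposure: bring it to the long-exposure scale.
            out.append([v * ratio for v in line])
            continue
        # Row r +/- 2 is a short-exposure row with the same Bayer row phase.
        donor = r + 2 if r + 2 < height else r - 2
        out.append([
            v if v < white else raw[donor][c] * ratio
            for c, v in enumerate(line)
        ])
    return out


if __name__ == "__main__":
    raw = [[100, 1023], [200, 300], [20, 200], [30, 40]]
    for row in reconstruct_ihdr(raw, ratio=8):
        print(row)
```

Row pairs, rather than single rows, keep both exposures aligned with the two-row period of the Bayer pattern. A real reconstruction would also have to interpolate the missing rows of each exposure and handle noise and motion between the two integration times.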
mobileHDR demo
• Zero-copy between sensor/OpenCL and OpenCL/OpenGL
• On Arndale board (Samsung Exynos 5 Dual with Mali T604 GPU)
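[Figure: OpenCL pipeline from 10b iHDR 3264x1836 input (CL Image) to RGB888 1080p output, with stages Noise Reduction, iHDR Reconstruction (14b), Bayer scaler, Tone Mapping, and Color Correction; the output is shared as an EGLImage and displayed as a GL Texture by OpenGL ES]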
Summary
• Using GPU for imaging
‣ Provide flexible solutions where traditional ISP is not usable
‣ Fast time to market
• Today’s application processors provide enough processing power for video HDR
• Embedded GPUs tend to double their ALU count every year
‣ Early 2013: 4MP30 (4 MP at 30fps); end 2013: 8MP30
‣ Early 2014: 13MP30
Questions & Answers
Thank you!