Imaging using ARM T6xx GPU

Date post: 22-Nov-2014
Upload: mikael-bourges-sevenier
Description:
Discusses the challenges of implementing imaging pipelines on mobile chipsets with the ARM Mali T604 GPU, as found in the Samsung Exynos 5. Presented at the HPC & GPU Supercomputing Group of Silicon Valley (http://www.meetup.com/HPC-GPU-Supercomputing-Group-of-Silicon-Valley) on Dec. 12, 2013.
Transcript
Page 1: Imaging using ARM T6xx GPU

| © 2013 Aptina Imaging Corporation | Aptina Confidential 1

© 2013 Aptina Imaging Corporation. All rights reserved. Products are warranted only to meet Aptina’s production data sheet specifications. Information, products, and/or specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Dates are estimates only. Drawings not to scale. Aptina and the Aptina logo are trademarks of Aptina Imaging Corporation. All other trademarks are the property of their respective owners.

Imaging using ARM GPU

Investigating flexible imaging pipelines using embedded GPU

Mikaël Bourges-Sévenier (msevenier at aptina dot com)

Director, High-Performance Imaging

December 2, 2013

HPC & GPU Supercomputing Group of Silicon Valley

Page 2: Imaging using ARM T6xx GPU

Agenda

•  Toward more flexible imaging pipelines

•  Replacing the image signal processor with software & hardware

•  Video HDR using the Aptina Interlaced HDR sensor

•  Q&A

Page 3: Imaging using ARM T6xx GPU

Cameras are everywhere

Interactive systems that respond to user actions (PC, Gaming, Mobile)

•  Motion/Gesture tracking and recognition

•  Body tracking

Environmental imaging systems that are situationally aware (Camera, Mobile, PC)

•  Face Detection/Track

•  Gesture tracking

•  Object tracking

Ubiquitous “Cameras Everywhere”: distributed systems (Mobile, Camera, DIY-SOHO)

•  Point and shoot

•  HDR

•  Surveillance

Page 4: Imaging using ARM T6xx GPU

Computational imaging evolution

[Figure: computational imaging evolution from Web Cam to Smart Camera. Stages: Brightness, Colorimetry, Presence, Face Detect, Face Track, AR, Gesture, Spatial (Volumetric). Associated capabilities: true color, brightness compensation, exposure control; user identity access control; augmented information; 3D imaging; interactive services.]

Page 5: Imaging using ARM T6xx GPU

How imaging pipelines work

Page 6: Imaging using ARM T6xx GPU

How Imaging Sensors work

http://www.photoaxe.com

Bayer GRBG pattern

•  50% green

•  25% red and 25% blue

The Bayer pattern is one type of color filter array (CFA)
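The GRBG tiling can be checked with a short sketch. Assuming the pattern string lists the 2x2 tile in row-major order (a convention chosen here, with hypothetical helper names `bayer_color` and `color_fractions`), counting the samples over any even-sized frame reproduces the 50% green, 25% red, 25% blue split.

```python
def bayer_color(x, y, pattern="GRBG"):
    """Return the color ('R', 'G' or 'B') sampled at pixel (x, y).

    The 4-letter pattern lists the 2x2 tile in row-major order:
    pattern[0] at (0, 0), pattern[1] at (1, 0),
    pattern[2] at (0, 1), pattern[3] at (1, 1).
    """
    return pattern[(y % 2) * 2 + (x % 2)]


def color_fractions(width, height, pattern="GRBG"):
    """Fraction of the frame's pixels that sample each color."""
    counts = {"R": 0, "G": 0, "B": 0}
    for y in range(height):
        for x in range(width):
            counts[bayer_color(x, y, pattern)] += 1
    total = width * height
    return {color: n / total for color, n in counts.items()}


print(color_fractions(8, 6))  # {'R': 0.25, 'G': 0.5, 'B': 0.25}
```

The same count holds for the other three orders (RGGB, GBRG, BGGR), since they are just shifts of the same tile.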

Page 7: Imaging using ARM T6xx GPU

Bayer Demosaicing

•  More G than R or B samples, since the eye is more sensitive to luminance than to chrominance

•  Converts pixel colors from Bayer space to full RGB color

•  Requires complex interpolation to avoid artifacts (e.g. on edges)

[Figure: the four Bayer pattern orders, indexed 0 GRBG, 1 RGGB, 2 GBRG, 3 BGGR, and the resulting grid of full RGB pixels after demosaicing]
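A minimal bilinear demosaic illustrates the interpolation step. This is the simplest textbook method, not the edge-aware interpolation the slide calls for: each pixel keeps its own sample and fills each missing color with the mean of the same-color samples in its 3x3 neighborhood. The pattern indices follow the slide (0 GRBG, 1 RGGB, 2 GBRG, 3 BGGR); the function names are hypothetical and the input is assumed to be at least 2x2, so every clipped neighborhood contains all three colors.

```python
PATTERNS = {0: "GRBG", 1: "RGGB", 2: "GBRG", 3: "BGGR"}  # indices as on the slide


def bayer_color(x, y, pattern):
    """Color sampled at (x, y); pattern lists the 2x2 tile in row-major order."""
    return pattern[(y % 2) * 2 + (x % 2)]


def demosaic_bilinear(raw, pattern_index=0):
    """Bilinear demosaic of a Bayer mosaic (list of rows, at least 2x2).

    Each pixel keeps its own sample for the color it measured and fills each
    missing color with the mean of that color's samples in its 3x3
    neighborhood (clipped at the image border).
    """
    pattern = PATTERNS[pattern_index]
    height, width = len(raw), len(raw[0])
    out = []
    for y in range(height):
        out_row = []
        for x in range(width):
            own = bayer_color(x, y, pattern)
            rgb = []
            for color in "RGB":
                if color == own:
                    rgb.append(float(raw[y][x]))  # measured directly
                    continue
                samples = [raw[j][i]
                           for j in range(max(0, y - 1), min(height, y + 2))
                           for i in range(max(0, x - 1), min(width, x + 2))
                           if bayer_color(i, j, pattern) == color]
                rgb.append(sum(samples) / len(samples))  # interpolated
            out_row.append(tuple(rgb))
        out.append(out_row)
    return out


mosaic = [[4, 8], [12, 16]]          # one GRBG tile: G R / B G
print(demosaic_bilinear(mosaic)[0])  # [(8.0, 4.0, 12.0), (8.0, 10.0, 12.0)]
```

Bilinear demosaicing is known to produce zipper and false-color artifacts along edges, which is why production pipelines use more complex interpolation.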

Page 8: Imaging using ARM T6xx GPU

From RAW to RGB/YUV: the ISP

•  ISP = Image Signal Processor

‣  Transforms sensor RAW images to YUV

‣  Very complex, dedicated pipelines optimized for imaging

‣  Low power (200-400 mW)

‣  Embedded in the application processor or as a separate co-processor

Can I upgrade algorithms?

[Figure: Lens → CMOS sensor with color filter array → RAW Bayer → Image Signal Processor (ISP) → RGB/YUV; the ISP also handles lens, sensor, and aperture control]

Page 9: Imaging using ARM T6xx GPU

Image pipeline block diagram (typical)

Sensor Bayer data → Black level adjust → Lens shading correction → White balancing → Defect correction → Noise reduction (Bayer) → Green balance → Demosaic → Color correction → Sharpening → Tone mapping and gamma → Full color RGB data (to YUV for JPEG)
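Because the first Bayer-domain stages operate on each pixel independently, they map naturally onto data-parallel hardware. The sketch below (hypothetical function name; the black level of 63, the 10-bit white level of 1023, and the gains are illustrative values, not AR0833 calibration data) subtracts the black level, normalizes to [0, 1], and applies a white-balance gain chosen from the pixel's CFA position.

```python
def black_level_and_white_balance(raw, black_level, gains,
                                  pattern="GRBG", white_level=1023):
    """Black-level subtraction and white balance on a Bayer mosaic.

    raw: list of rows of sensor samples (e.g. 10-bit values).
    gains: per-color white-balance multipliers (illustrative values).
    Returns rows of floats normalized to [0, 1].
    """
    out = []
    for y, row in enumerate(raw):
        out_row = []
        for x, value in enumerate(row):
            color = pattern[(y % 2) * 2 + (x % 2)]  # CFA color at this pixel
            # Remove the pedestal, then rescale so white_level maps to 1.0
            normalized = max(value - black_level, 0) / (white_level - black_level)
            # Apply the channel gain and clip highlights pushed past white
            out_row.append(min(normalized * gains[color], 1.0))
        out.append(out_row)
    return out


gains = {"R": 2.0, "G": 1.0, "B": 1.5}  # hypothetical white-balance gains
print(black_level_and_white_balance([[63, 1023], [543, 303]], 63, gains))
# [[0.0, 1.0], [0.75, 0.25]]
```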

Page 10: Imaging using ARM T6xx GPU

Problem Statement

•  Given a non-typical imaging pipeline, how do we

‣  Take advantage of resources in an embedded platform?

‣  Keep the frame rate at 30 fps?

‣  Preserve good image quality?

‣  Minimize power usage?

‣  Provide flexible pipelines?

Page 11: Imaging using ARM T6xx GPU

Alternative Approaches to ISP-only

Hybrid: ISP + GPU + CPU + DSP

‣  Less power

‣  Bayer pattern only

‣  Reuses the existing ISP (which may not be re-entrant)

‣  Requires recent devices

‣  ISP may only output 8b precision

Full Software: GPU + CPU + DSP

‣  More power

‣  Any pattern

‣  Requires fast processors

‣  Requires high-end devices

‣  8b-32b precision

[Figure: Lens → CMOS sensor with color filter array → Bayer → pre-processing → Image Signal Processor (ISP) → RGB/YUV → post-processing → App, with 3A handling lens, sensor, and aperture control]

Page 12: Imaging using ARM T6xx GPU

MobileHDR on ARM Mali T604

Page 13: Imaging using ARM T6xx GPU

Arndale Samsung Exynos 5 Dual board

•  Arndale Samsung Exynos 5 board

‣  CPU: ARM Cortex-A15 (dual-core), 1.7 GHz, 32nm

•  32KB L1 cache, 1MB L2 cache

‣  GPU: ARM Mali-T604

•  64 concurrent threads

•  Vector ALUs

•  128b registers

•  OpenCL 1.1 Full Profile

‣  RAM: 2GB LP-DDR3 800 MHz (12.8 GB/s)

‣  Truly unified cached memory

•  CPU and GPU memory is shared – NO COPY!

•  128b wide L1 and L2 access

‣  2 independent job queues in the Mali-T628 (in Samsung Exynos 5 Octa)

Page 14: Imaging using ARM T6xx GPU

ARM Mali-T604 GPU in Samsung Exynos 5 Dual

Type: Vector GPU

Process: 32nm

OpenCL: 1.1 Full Profile

Unified memory: Yes

Rendering: Tile-based

Work-items: 256

Clock: 533 MHz

L2 cache: 1MB

Register width: 128b

Global memory: 2GB LP-DDR3 800 MHz (12.8 GB/s)

Pipelines: 8 pipes (2 per core)

Throughput: 100 GFLOPS

Local memory: 32KB/core (global)

Constant memory: 64KB

Texture cache: Yes

Compute devices (shader cores): 4

Cacheline: 64 bytes

16/32/64b floats: No/Yes/Yes
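As a rough illustration of sizing work for the 128b registers, the arithmetic below assumes that one work-item processes one 128-bit vector of 16-bit pixels (eight lanes, e.g. an OpenCL `ushort8`) and computes how many work-items a 3264 x 1836 frame then needs. The one-vector-per-work-item mapping and the helper name `vector_work_items` are assumptions for illustration, not a rule from the slide.

```python
REGISTER_BITS = 128  # Mali-T604 register width (from the table)
SAMPLE_BITS = 16     # one 10-bit pixel framed in 16 bits


def vector_work_items(width, height, register_bits=REGISTER_BITS,
                      sample_bits=SAMPLE_BITS):
    """Work-items needed if each one handles one full register of pixels."""
    lanes = register_bits // sample_bits   # pixels per vector, e.g. 8 for ushort8
    per_row = -(-width // lanes)           # ceiling division: vectors per row
    return lanes, per_row, per_row * height


print(vector_work_items(3264, 1836))  # (8, 408, 749088)
```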

Page 15: Imaging using ARM T6xx GPU

Avoid buffer copy

•  Mali has unified memory

‣  Use CL_MEM_ALLOC_HOST_PTR to avoid copies between CPU and GPU

[Figure: three ways to share a buffer in global memory between the CPU (host) and the GPU (compute device)]

‣  malloc(): buffers created by the user are not mapped into the GPU memory space

‣  clCreateBuffer(CL_MEM_USE_HOST_PTR): creates a new buffer and copies the data over (the copy operations are expensive)

‣  clCreateBuffer(CL_MEM_ALLOC_HOST_PTR): creates a buffer visible to both GPU and CPU

•  Where possible, don't use CL_MEM_USE_HOST_PTR

‣  Create buffers at the start of your application

‣  Use CL_MEM_ALLOC_HOST_PTR instead of malloc()

‣  Then you can use the buffer on both the CPU host and the GPU

Page 16: Imaging using ARM T6xx GPU

Stream-based vs. Frame-based

•  Stream-based

‣  For low-memory devices (e.g. ISP, DSP)

‣  Groups of lines processed by kernels

‣  Delay: number of lines a kernel needs

•  Frame-based

‣  For fast data-parallel devices (e.g. GPU)

‣  Full image processed

‣  Delay: whole frame between devices

[Figure: stream-based: kernels connected by queues (Q) process a continuous stream of pixels and the final image accumulates lines; frame-based: kernels pass whole frames to each other]
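The difference between the two models can be sketched with a 3-tap vertical average. The frame-based version reads the whole image; the stream-based version keeps only a three-line buffer and cannot emit its first row until three lines have arrived, which is the line delay mentioned on the slide. Both functions are hypothetical illustrations and skip the top and bottom border rows for brevity.

```python
from collections import deque


def vertical_blur_frame(frame):
    """Frame-based: the whole image is in memory; average each interior row
    with the rows above and below it."""
    return [[(a + b + c) / 3 for a, b, c in zip(frame[y - 1], frame[y], frame[y + 1])]
            for y in range(1, len(frame) - 1)]


def vertical_blur_stream(rows):
    """Stream-based: rows arrive one at a time and only a 3-line buffer is kept.

    The kernel needs 3 lines, so nothing is emitted until 3 rows have arrived.
    """
    window = deque(maxlen=3)  # the oldest line is dropped automatically
    for row in rows:
        window.append(row)
        if len(window) == 3:
            a, b, c = window
            yield [(p + q + r) / 3 for p, q, r in zip(a, b, c)]


frame = [[10 * y + x for x in range(4)] for y in range(6)]
# Same result, but the streaming version never holds more than 3 lines.
assert list(vertical_blur_stream(iter(frame))) == vertical_blur_frame(frame)
```

The trade-off matches the slide: the stream version needs little memory but fixes the order of work, while the frame version lets a data-parallel device process all rows at once at the cost of buffering whole frames.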

Page 17: Imaging using ARM T6xx GPU

Aptina Sensor with MobileHDR™ Turned off

Page 18: Imaging using ARM T6xx GPU

Aptina Sensor with MobileHDR™ Turned on

Page 19: Imaging using ARM T6xx GPU

AR0833 8MP Camera sensor

•  The frame is inscribed in the image circle

‣  4:3 for still images, e.g. 8MP (3264 x 2448)

‣  16:9 for video, e.g. 6MP (3264 x 1836)

•  10 bits per pixel (framed in 16 bits)

•  At 30 fps in 16:9 mode, we need 180 MPix/s, i.e. about 343 MiB/s (360 MB/s)

•  Interface with ISP

‣  Data over MIPI CSI-2 (serial)

‣  Control over I2C

[Figure: 4:3 (3264 x 2448) and 16:9 (3264 x 1836) frames inscribed in the 1/3.2" image circle]
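The bandwidth figure can be reproduced from the 16:9 video mode. The sketch assumes 2 bytes per pixel (10-bit samples framed in 16 bits), which gives about 343 MiB/s (about 360 MB/s in decimal units), and compares it with the 12.8 GB/s peak of the Arndale board's LP-DDR3. That last ratio is only a theoretical upper bound: it ignores DRAM efficiency and all other memory traffic.

```python
WIDTH, HEIGHT = 3264, 1836  # 16:9 video mode
FPS = 30
BYTES_PER_PIXEL = 2         # 10-bit samples framed in 16 bits
DRAM_PEAK = 12.8e9          # bytes/s, Arndale LP-DDR3 peak (board slide)

pixel_rate = WIDTH * HEIGHT * FPS          # pixels per second
byte_rate = pixel_rate * BYTES_PER_PIXEL   # bytes per second off the sensor

print(f"{pixel_rate / 1e6:.1f} MPix/s")    # 179.8 MPix/s
print(f"{byte_rate / 2**20:.1f} MiB/s")    # 342.9 MiB/s
print(f"{byte_rate / 1e6:.1f} MB/s")       # 359.6 MB/s
# Upper bound on frame-sized reads or writes per frame time at peak bandwidth
print(f"{DRAM_PEAK / byte_rate:.1f}")      # 35.6
```

In a frame-based GPU pipeline every kernel reads and writes at least one frame, so this budget, shared with the CPU and display, bounds how many full-resolution passes can run at 30 fps.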

Page 20: Imaging using ARM T6xx GPU

Feature: Interlaced HDR

•  One frame contains two interlaced exposures

•  Exposure ratio between odd and even row pairs

‣  1x, 2x, 4x, 8x

Aptina reserves the right to change products or specifications without notice. AR0833_DS - Rev. F Pub. 4/13 EN 30 ©2011 Aptina Imaging Corporation. All rights reserved.

AR0833: 1/3.2-Inch 8Mp CMOS Digital Image Sensor | Features

Aptina Confidential and Proprietary Preliminary

Features

Interlaced HDR Readout

The sensor enables HDR by outputting frames where even and odd row pairs within a single frame are captured at different integration times. This output is then matched with an algorithm designed to reconstruct this output into an HDR still image or video.

The sensor HDR is controlled by two shutter pointers (Shutter pointer 1, Shutter pointer 2) that control the integration of the odd (Shutter pointer 1) and even (Shutter pointer 2) row pairs.

Figure 16: HDR Integration Time

[Figure: Tint 1 and Tint 2 are the integration times set by Shutter pointer 1 and Shutter pointer 2 relative to the Sample pointer; the exposures of I-FRAME 1 and I-FRAME 2 are output together as one frame from the sensor]
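A naive reconstruction helps show how the two interlaced exposures combine. This sketch is not Aptina's reconstruction algorithm. It assumes that even row pairs carry the long exposure (the datasheet excerpt only says the two pointers control odd and even pairs) and uses a hypothetical 10-bit clipping level of 1023. The frame is split into two half-height fields; each pixel keeps the long-exposure value unless it is clipped, in which case the short exposure is scaled by the exposure ratio. It also treats the fields as co-sited, whereas a real reconstruction has to interpolate across the two-row offset between them.

```python
def split_row_pairs(frame):
    """Split an interlaced-HDR frame into two half-height fields.

    Rows are grouped in pairs (0-1, 2-3, 4-5, ...), so each field keeps whole
    2-row Bayer periods. ASSUMPTION: even pairs (rows 0-1, 4-5, ...) carry the
    long exposure and odd pairs (rows 2-3, 6-7, ...) the short one.
    """
    long_field, short_field = [], []
    for y, row in enumerate(frame):
        target = long_field if (y // 2) % 2 == 0 else short_field
        target.append(row)
    return long_field, short_field


def merge_exposures(long_field, short_field, ratio, saturation=1023):
    """Naive per-pixel merge: keep the long exposure unless it is clipped,
    otherwise use the short exposure scaled by the exposure ratio.

    ASSUMPTION: the fields are treated as co-sited (no interpolation).
    """
    return [[long_v if long_v < saturation else short_v * ratio
             for long_v, short_v in zip(long_row, short_row)]
            for long_row, short_row in zip(long_field, short_field)]


frame = [[100, 1023], [100, 1023], [13, 200], [13, 200]]  # toy 10-bit data
long_f, short_f = split_row_pairs(frame)
print(merge_exposures(long_f, short_f, ratio=8))  # [[100, 1600], [100, 1600]]
```

With an 8x ratio the merged values exceed the 10-bit range, which is why the reconstructed data needs a wider format before tone mapping.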

Page 21: Imaging using ARM T6xx GPU

mobileHDR demo

•  Zero-copy between sensor/OpenCL and OpenCL/OpenGL

•  On Arndale board (Samsung Exynos 5 Dual with Mali T604 GPU)

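[Figure: OpenCL pipeline with Noise Reduction, iHDR Reconstruction, Bayer scaler, Tone Mapping, and Color Correction stages, from 10b iHDR 3264x1836 input (14b intermediate data) to RGB888 1080p output; the CL Image is shared with OpenGL ES as a GL Texture through an EGLImage]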
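For the Tone Mapping block, a global curve is enough to show the idea of compressing the HDR range into RGB888. The sketch below swaps in a simple Reinhard-style curve, x / (1 + x), followed by a 2.2 display gamma; it is not the operator used in the demo. A 14-bit input range (matching the 14b label in the pipeline) is assumed, and the exposure scale of 16 and the gamma are illustrative tuning values.

```python
def tone_map_to_8bit(value, max_value=16383, exposure=16.0, gamma=2.2):
    """Map a linear HDR sample in [0, max_value] to an 8-bit display code.

    ASSUMPTIONS: 14-bit input (max_value=16383); exposure and gamma are
    illustrative tuning values, not the demo's parameters.
    """
    x = exposure * value / max_value               # scaled linear value
    x_max = exposure                               # x at value == max_value
    y = (x / (1.0 + x)) / (x_max / (1.0 + x_max))  # Reinhard curve, max -> 1.0
    return round(255 * y ** (1.0 / gamma))         # display gamma, quantize


for v in (0, 1024, 4096, 16383):
    print(v, tone_map_to_8bit(v))
```

Because the curve is applied per pixel with no neighborhood access, it fits the same data-parallel model as the other stages; local tone mappers trade that simplicity for better contrast.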

Page 22: Imaging using ARM T6xx GPU

Summary

•  Using GPU for imaging

‣  Provides flexible solutions where a traditional ISP is not usable

‣  Fast time to market

•  Today’s application processors provide enough processing power for video HDR

•  Embedded GPUs tend to double their ALU count every year

‣  Early 2013: 4MP at 30 fps; end of 2013: 8MP at 30 fps

‣  Early 2014: 13MP at 30 fps

Page 23: Imaging using ARM T6xx GPU

Questions & Answers

Thank you!

