
Date post: 22-Jun-2020
AMETHYST: Image Registration Engine for Multiframe Processing Department of Electrical Engineering Princeton University, Princeton, NJ 08544 USA Mohammed Shoaib, Rich Stoakley, Matt Uyttendayle, and Jie Liu Sensing and Energy Research Group Microsoft Research
Transcript
Page 1

AMETHYST: Image Registration Engine for Multiframe Processing

Department of Electrical Engineering

Princeton University, Princeton, NJ 08544 USA

Mohammed Shoaib, Rich Stoakley, Matt Uyttendayle, and Jie Liu

Sensing and Energy Research Group

Microsoft Research

Page 2

MFP: Why should you care?

Multiframe processing (MFP) enables advanced algorithms for image analysis, e.g., high-dynamic range (HDR) imaging, de-noising, image stabilization, de-blurring, super-resolution imaging, de-hazing, panoramic stitching, etc.

[Figure: Multiframe image processing. Image 1, Image 2, ..., Image N are combined into an enhanced image.]

[Figure: Example panels. Short exposure and long exposure: HDR image. Noisy images 1 and 2: de-noised image. Blurry images 1 and 2: stabilized image. Blurry images 1, 2, and 3: de-blurred image.]

IMG SOURCES: (1) L. Zhang, CVPR ‘10 (2) M. Tico, Nokia ERDC ‘06, (3) A. Tomaszewska WSCG ‘07

Page 3

Why is it hard?

E.g., HDR Photography

Typically, serial processing → frame delays cause issues:

1. Moving objects create artifacts
2. Moving camera also creates artifacts

Frame misalignments lead to artifacts in fused image

[Figure: Serial HDR capture. Sensor, ISP, and memory capture a low-exposure shot and a high-exposure shot; a merge algorithm on the CPU creates the HDR image. ~2 seconds/frame.]

IMG SOURCES: (1) L. Zhang, CVPR ‘10 (2) NVIDIA whitepaper, MWC ‘13

Page 4

What are some existing solutions?

Solution 1: HDR Capture, e.g., Toshiba T4K05
[Figure: S-HDR image sensor; HDR image; artifacts absent.]

Solution 2: Algorithmic
[Figure: Image alignment; compositing algorithm; HDR image; artifacts absent.]

Algorithmic solution is more interesting → needs no hardware change and scales to other applications

IMG SOURCES: (1) product brief T4K05 ‘13 (2) L. Zhang, CVPR ‘10

Page 5

What are others doing about it?

E.g., NVIDIA: Tegra 4 (2014)

1st real-time HDR, 1st HDR panorama, 1st object tracking

[Fig: Camera architecture in current high-end mobile devices. Labels: Sensor, ISP, Memory, quad-core CPU; low exposure shot and high exposure shot; merge algorithm to create HDR image. ~2 seconds/frame.]

[Fig: Chimera: The NVIDIA computational photography arch. Labels: Sensor, ISP, Tegra 4 GPU cores, Tegra 4 quad-core CPU, Memory; NVIDIA Computational Photography Engine over CPU, GPU, and ISP. ~0.2 seconds/frame.]

Proprietary ISP-embedded algorithms use the GPU for acceleration → ~10x speedup, at the cost of power

SOURCE: NVIDIA whitepaper, MWC ‘13

Page 6

What are their limitations?

1. Current solutions: Modest speedups. Not generally applicable.
2. Current algorithmic solutions: slow on CPUs

[Figure: Input frames (each labeled IMG1 in the extracted text) → Image Registration (Image Alignment) → Multiframe Compositing (Image Fusing) → Enhanced Image. Timing labels, in the order extracted: > 1.8 sec./frame, ~1 sec. total.]

Image registration is a computational bottleneck → needs acceleration

Our target: ~100x speedup compared to software

Page 7

What have we done about it?

We propose an architecture for MFP that has a dedicated accelerator for image registration

[Fig: Proposed architecture for multi-frame image processing (MFP). Blocks shown: Sensor; CSI (Streaming Input); ISP (BF, LS, WB, DM, CCM, EE, γ, YUV); AMETHYST (IPD, DFE, HE, IWP), annotated "Specialized IP core for image registration"; CPU and GPU (G cores), annotated "Compositing"; M0 ... Mn; VI. Buses: Frame Bus (F0); Aligned Frames Bus (F1 ... Fn); Composite Frames (C0, C1, ..., Cm).]

Page 8

What are our findings?

AMETHYST+ Performance Summary

Technology:  45 nm SOI
Area:        0.15 mm² (30k gates)
Memory:      < 2 MB
Frequency:   1.1 GHz
Power:       62.7 mW
Exec. Time:  30 ms/frame
Speed-up$:   37x over CPU

AMETHYST shows a speed-up of 8x over GPU and 5x over FPGA, at a power lower by 14x than GPU and 3x than FPGA

[Plot: Time/Frame (s) vs. Power Consumption (mW) for GPU, FPGA, and ASIC (AMETHYST). Data labels, in the order extracted: 750, 890, 920; 209.3245818, 214.0819587, 178.1161896; 20, 39.4, 62.69.]

[Plot: GFLOPS vs. Watts/GFLOPS for GPU, FPGA, and ASIC (AMETHYST).]

Legend:
GPU: A4, PowerVR SGX535*; Tegra3, 12C GeForce*; Snapdragon, Adreno200*
FPGA: Xilinx V6, 240k LC; Artix 7, 85k LC; Smartfusion2, 56k LUT
ASIC: AMETHYST, 0.3 GHz; AMETHYST, 0.8 GHz; AMETHYST, 1.1 GHz

* performance and power are estimated values
+ synthesis results for (IPD+DFE) blocks only
$ assuming 60% cost due to (IPD + DFE)
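
As a rough cross-check of the summary figures, the energy spent per frame follows directly from the quoted power and execution time. The sketch below only multiplies the two numbers from the table (62.7 mW at 30 ms/frame, IPD + DFE blocks only); the function names are illustrative and not part of the original slides.

```python
# Figures quoted in the performance summary (IPD + DFE blocks only).
POWER_MW = 62.7       # power at 1.1 GHz, in milliwatts
EXEC_TIME_MS = 30.0   # execution time per frame, in milliseconds


def energy_per_frame_mj(power_mw: float, time_ms: float) -> float:
    """Energy per frame in millijoules: mW * ms gives microjoules, so divide by 1000."""
    return power_mw * time_ms / 1000.0


def frames_per_second(time_ms: float) -> float:
    """Frame rate implied by a per-frame execution time given in milliseconds."""
    return 1000.0 / time_ms


if __name__ == "__main__":
    print(f"energy/frame: {energy_per_frame_mj(POWER_MW, EXEC_TIME_MS):.3f} mJ")
    print(f"frame rate:   {frames_per_second(EXEC_TIME_MS):.1f} frames/s")
```

Under these figures the registration front end would spend about 1.9 mJ per frame and sustain roughly 33 frames per second, assuming the 30 ms/frame figure holds for every frame.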

Page 9

What are our findings? Contd…

Highlights
• State-of-the-art algorithm$ (from Photosynth)
• 1st MFP engine for re-targetable applications
• Extensively configurable parallelism
• Multi-level data pipelining and interleaving
• Systolic ops w/ 2-level vector reduction

[Figure: Processing-element array. Labels: 2d-PE, 1d-PE, FIFO, Local FIFO, adders (+), ctrl regs, Kernel Matrix, Input Matrix, Out, Idle.]
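
The slides do not spell out how the processing elements map onto the kernel and input matrices, so the following is only a software illustration of what a two-level reduction could look like for a 2-D kernel operation: each kernel row first reduces its own products to a partial sum, and the per-row partial sums are then reduced to one output value. The function name and the row-wise split are assumptions made for this sketch, not the AMETHYST dataflow.

```python
from typing import List

Matrix = List[List[float]]


def correlate2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D correlation computed with a two-level reduction per output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Level 1: every kernel row reduces its products to one partial sum.
            row_sums = [
                sum(kernel[i][j] * image[r + i][c + j] for j in range(kw))
                for i in range(kh)
            ]
            # Level 2: the per-row partial sums are reduced to the output value.
            out[r][c] = sum(row_sums)
    return out
```

In hardware the two levels could run in different units and overlap in time; the sketch shows only the arithmetic, not the pipelining or interleaving.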

AMETHYST: Architecture Block Diagram

[Figure: AMETHYST block diagram, interest-point detection and feature extraction. Buses: Frame Bus (F0); Aligned Frames Bus (F1 ... Fn).
Harris IPD: Ix, Iy, Covariance, NMS.
Daisy (Sunflower) Feature Extraction: G Block Smoothing, G(x,σ); T-block Processing (labels: iterative, non-iterative) with gradient histogram (e.g., SIFT, GLOH), sine-weighted gradient quantization, quadrature steerable filters, isotropic diff. of Gaussians (DoGs); Spatial Pooling; Normalization; Patch Buffer.]
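
The Harris IPD labels (Ix, Iy, Covariance, NMS) match the usual stages of a Harris corner detector. A minimal pure-Python sketch of those stages is given below. The central-difference gradients, the 3x3 box window, the constant k = 0.04, and the threshold are assumptions chosen for illustration; the slides do not state the parameters used in AMETHYST.

```python
from typing import List, Tuple

Image = List[List[float]]


def harris_corners(img: Image, k: float = 0.04, thresh: float = 1e-6) -> List[Tuple[int, int]]:
    """Return (x, y) corner locations: gradients, covariance, response, then NMS."""
    h, w = len(img), len(img[0])
    # Ix, Iy: central-difference gradients (left at zero on the image border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    # Covariance: gradient products summed over a 3x3 window, then the Harris response.
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    # NMS: keep strict 3x3 local maxima whose response exceeds the threshold.
    corners = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r > thresh and all(
                r > resp[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ):
                corners.append((x, y))
    return corners
```

A practical detector would typically use a Gaussian window instead of the box window; the box window keeps the sketch short.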

[Figure: AMETHYST block diagram, homography estimation and image warping.
Homography Estimation: RANSAC, 9x9 Jacobi SVD, Feature Buffer.
Image Warping: Interpolate, Kernel 0, Kernel 1, ..., Kernel n, Re-sample.
Also labeled: a 3x3 array of P blocks.]
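
The "9x9 Jacobi SVD" label suggests that the homography is obtained from a 9x9 decomposition. One common way to arrive at a 9x9 problem is the direct linear transform (DLT): stack two equations per point correspondence into a matrix A, form the 9x9 matrix AᵀA, and take the eigenvector of its smallest eigenvalue. The sketch below does this with cyclic Jacobi rotations. It is an assumption about the formulation, not the AMETHYST implementation, and the RANSAC loop that would select inlier correspondences is omitted.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def jacobi_eigen(m: Matrix, sweeps: int = 50, tol: float = 1e-20) -> Tuple[List[float], Matrix]:
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column i of V is the eigenvector of eigenvalue i.
    """
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle in [-pi/4, pi/4] that zeroes a[p][q].
                d = a[q][q] - a[p][p]
                num = 2.0 * a[p][q] if d >= 0 else -2.0 * a[p][q]
                theta = 0.5 * math.atan2(num, abs(d))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """DLT estimate of H such that dst ~ H * src, from four or more correspondences."""
    rows: Matrix = []
    for (x, y), (u, w) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, w * x, w * y, w])
    # 9x9 symmetric matrix A^T A; its smallest eigenvector is the DLT solution.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(9)] for i in range(9)]
    vals, vecs = jacobi_eigen(ata)
    k = min(range(9), key=lambda i: vals[i])
    h = [vecs[i][k] for i in range(9)]
    h = [e / h[8] for e in h]  # assumes the bottom-right entry is non-zero
    return [h[0:3], h[3:6], h[6:9]]
```

In a RANSAC setting, `estimate_homography` would be called on random four-point subsets and the candidate with the most inliers kept; coordinate normalization before the DLT is also usually advisable and is left out here for brevity.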

$M. Brown et al., PAMI ‘10
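
The image-warping labels (Interpolate, Re-sample) correspond to the last registration step: resampling a frame through the estimated homography. A minimal sketch using inverse mapping and bilinear interpolation follows. The choice of bilinear interpolation, the fill value, and the function name are assumptions; the slides do not specify the interpolation kernels.

```python
from typing import List

Matrix = List[List[float]]


def warp_bilinear(img: Matrix, h_inv: Matrix, out_h: int, out_w: int, fill: float = 0.0) -> Matrix:
    """Resample img: each output pixel (u, v) is mapped back through h_inv and interpolated."""
    in_h, in_w = len(img), len(img[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            w = h_inv[2][0] * u + h_inv[2][1] * v + h_inv[2][2]
            if w == 0.0:
                continue  # point maps to infinity; keep the fill value
            x = (h_inv[0][0] * u + h_inv[0][1] * v + h_inv[0][2]) / w
            y = (h_inv[1][0] * u + h_inv[1][1] * v + h_inv[1][2]) / w
            if not (0.0 <= x <= in_w - 1 and 0.0 <= y <= in_h - 1):
                continue  # source location falls outside the input frame
            x0, y0 = int(x), int(y)
            x1, y1 = min(x0 + 1, in_w - 1), min(y0 + 1, in_h - 1)
            fx, fy = x - x0, y - y0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1.0 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1.0 - fx) + img[y1][x1] * fx
            out[v][u] = top * (1.0 - fy) + bottom * fy
    return out
```

Here `h_inv` is the inverse of the homography that maps the input frame onto the reference frame, so that every output pixel receives exactly one value.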

Page 10

Next

Technical steps:

– Finish implementing RTL for HE and IWP modules
– Verify full-design on FPGA-based programmable SoCs (e.g., Zynq)
– Develop HW-SW co-design with ARM core towards custom SoC
– Perform physical design and post-layout validation of SoC
– Integrate silicon-proven design IP with ISP core

Stage 1: FPGA validation
Stage 2: Silicon validation
Stage 3: ISP integration

[Figure: SoC block diagram. Labels: DDR 3/4 interface, Data I/O, ARM Cortex Ax CPU (FPU and NEON, MMU, I-Cache, D-Cache), AMETHYST MFP accelerator, DMA x Ch., IRQ, DMA sync., SWDT, TTC, Clock, Debug, UART, Progr. Interface.]

[Figure: ISP Core with AMETHYST MFP accelerator.]

