FPGA-based Real-Time Super-Resolution System for Ultra High Definition Videos Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, and Guojie Luo Peking University FCCM 2018
Page 1

FPGA-based Real-Time Super-Resolution System for Ultra High Definition Videos

Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, and Guojie Luo

Peking University

FCCM 2018

Page 2

Ultra High Definition (UHD) Technology

UHD devices: Television, Projector, Phone, Camera

Content?
• Limited creators
• High network bandwidth cost
• Huge storage cost

Page 3

High-Resolution <---> Low-Resolution

[Figure: Desired HR Image 𝑿 -> Blur -> Down-Sampling -> (+ Noise 𝑛) -> Observed LR Image 𝒀. Super-Resolution runs in the opposite direction, from 𝒀 back to 𝑿.]
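The degradation chain on this slide can be sketched in a few lines of NumPy. This is only an illustration: the box blur, the 2x decimation, the Gaussian noise and the function name `degrade` are assumptions chosen for the example, not the blur kernel or noise model used by the authors.

```python
import numpy as np

def degrade(hr, scale=2, blur_size=3, noise_sigma=2.0, seed=0):
    """Simulate HR image -> blur -> down-sampling -> additive noise -> LR image.

    Assumptions (not from the slides): box blur with an odd window size,
    plain decimation by `scale`, and zero-mean Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    pad = blur_size // 2
    padded = np.pad(hr.astype(np.float64), pad, mode="edge")
    blurred = np.zeros(hr.shape, dtype=np.float64)
    # Box blur: average every blur_size x blur_size neighbourhood.
    for dy in range(blur_size):
        for dx in range(blur_size):
            blurred += padded[dy:dy + hr.shape[0], dx:dx + hr.shape[1]]
    blurred /= blur_size ** 2
    # Down-sampling: keep every `scale`-th pixel in both directions.
    lr = blurred[::scale, ::scale]
    # Additive noise n.
    return lr + rng.normal(0.0, noise_sigma, size=lr.shape)

hr_image = np.full((8, 8), 100.0)
lr_image = degrade(hr_image, noise_sigma=0.0)
```

Super-resolution has to undo all three steps at once, which is why the later slides compare cheap interpolation against learned upscaling.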

Page 4

Spectrum of Super Resolution Methods

Interpolation
• Fast
• Easy to implement
• Blurry results

Model-based
• Interpretable
• High complexity
• Assumed known blur kernel/noise

Example-based
• State-of-the-art quality
• High complexity
• Training data needed

[Axis label: Complicated <---> Simple]

Page 5

Model-based Method is also Compute-Intensive

[Figure: the degradation diagram again (Desired HR Image 𝑿 -> Blur -> Down-Sampling -> Noise 𝑛 -> Observed LR Image 𝒀), with Super-Resolution drawn as Iteration 1 -> Iteration 2 -> 𝑿.]

Model-based methods may not be needed

• The computation also has a layered structure

• We can use a neural network to approximate it

Page 6

Total Variation Distribution

Fact: Blocks contain DIFFERENT amounts of information (NOT equally important)

Insight: Use DIFFERENT upscaling methods for different blocks

Page 7

A Hybrid Algorithm

INPUT: LR Image 𝒀

1. Crop 𝒀 into sub-images {𝒚}
2.1. IF 𝑴(𝒚) > 𝑻: 𝒙 <- Upscale(𝒚)
2.2. ELSE: 𝒙 <- CheapUpscale(𝒚)
3. Mosaic 𝑿 with {𝒙}

OUTPUT: HR Image 𝑿

𝑴: Total Variation (TV)
Upscale: FSRCNN-s
CheapUpscale: Interpolation
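The three steps of the hybrid algorithm can be sketched as follows. Several things here are assumptions made for illustration: the measure is evaluated on the LR sub-image (the HR block does not exist before upscaling), nearest-neighbour repetition stands in for the interpolator, `expensive` is a hypothetical callable standing in for FSRCNN-s, and no padding or overlap between blocks is modelled.

```python
import numpy as np

def total_variation(block):
    """Sum of absolute horizontal and vertical neighbour differences."""
    b = block.astype(np.float64)
    return float(np.abs(np.diff(b, axis=1)).sum() + np.abs(np.diff(b, axis=0)).sum())

def nearest_upscale(block, scale):
    """CheapUpscale stand-in: nearest-neighbour interpolation."""
    return np.repeat(np.repeat(block, scale, axis=0), scale, axis=1)

def hybrid_upscale(lr, block, scale, threshold, expensive, cheap=nearest_upscale):
    """Crop, dispatch each block by its TV value, and mosaic the results."""
    h, w = lr.shape
    hr = np.zeros((h * scale, w * scale), dtype=np.float64)
    used_expensive = 0
    for top in range(0, h, block):
        for left in range(0, w, block):
            y = lr[top:top + block, left:left + block]      # step 1: crop
            if total_variation(y) > threshold:              # step 2.1
                x = expensive(y, scale)
                used_expensive += 1
            else:                                           # step 2.2
                x = cheap(y, scale)
            # step 3: mosaic the upscaled block into the HR image
            hr[top * scale:(top + y.shape[0]) * scale,
               left * scale:(left + y.shape[1]) * scale] = x
    return hr, used_expensive

# One textured block (checkerboard) and three flat blocks.
lr = np.zeros((8, 8))
lr[:4, :4] = (np.indices((4, 4)).sum(axis=0) % 2) * 255.0
marker_upscale = lambda y, s: nearest_upscale(y, s) + 1000.0  # hypothetical "expensive" path
hr, n_expensive = hybrid_upscale(lr, block=4, scale=2, threshold=10.0, expensive=marker_upscale)
```

Only the textured block takes the expensive path here; the flat blocks have zero TV and are interpolated.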

Page 8

Overall System

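[Figure: Low-Res Image -> Dispatcher -> Pipelined Neural Network or Interpolator -> High-Res Image. Diagram labels: Accelerator, Dispatcher, Interpolator, Pipelined Neural Network.]

Pipelined Neural Network stages:
• Feature Extraction: Conv(1, 5, 32)
• Shrinking: Conv(32, 1, 5)
• Mapping: Conv(5, 3, 5)
• Expanding: Conv(5, 1, 32)
• Deconvolution: Deconv(32, 9, 1)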

Page 9

Stencil Access of TV Computation

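[Figure: array x laid out in memory (labels: 0, height, width 𝑁) with the three stencil accesses x[offset] (f3), x[right] (f2) and x[down] (f1).]

$$(\nabla x)_{\text{offset}} = \left| x[\text{right}] - x[\text{offset}] \right| + \left| x[\text{down}] - x[\text{offset}] \right|$$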

Page 10

Micro-architecture for Stencil Computation

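[Figure: Buffering System for array x feeding the Computation Kernel (𝛻𝑥)𝑖,𝑗.]

Reuse chain: s1 -> buffer1(𝑁-1) -> s2 -> buffer2(1) -> s3, with taps f1, f2, f3
Buffered elements: x[i][j] … x[i-1][j+2], x[i-1][j+1]
• f1 reads x[i][j] (x[down])
• f2 reads x[i-1][j+1] (x[right])
• f3 reads x[i-1][j] (x[offset])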
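A software model of this buffering scheme may help: pixels arrive in raster order and only the last 𝑁+1 samples are kept, so that x[offset], x[right] and x[down] are all available in the same cycle. The sketch below uses a `deque` as the reuse buffer. Skipping the last column (where the right neighbour would wrap to the next row) is an assumption; the slides do not say how borders are handled, and `tv_stream` is a name invented for this example.

```python
from collections import deque

def tv_stream(pixels, width):
    """Accumulate (grad x) over a raster-order pixel stream of row length `width`."""
    window = deque(maxlen=width + 1)   # models buffer1 (N-1), buffer2 (1) and the taps
    total = 0
    for idx, value in enumerate(pixels):
        window.append(int(value))
        if len(window) < width + 1:
            continue                   # buffer still filling
        offset_idx = idx - width       # flat index of the oldest sample, x[offset]
        if offset_idx % width == width - 1:
            continue                   # assumption: skip the last column of each row
        x_offset = window[0]           # tap f3
        x_right = window[1]            # tap f2
        x_down = window[-1]            # tap f1, the newest sample
        total += abs(x_right - x_offset) + abs(x_down - x_offset)
    return total

# 3x3 example, row by row.
tv_value = tv_stream([0, 1, 3, 6, 10, 15, 21, 28, 36], width=3)
```

Each pixel is read from the input stream once, which is the point of the reuse chain: the stencil needs three values per output but only one new value per cycle.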

Page 11

Convolutional Neural Network

Pipelined Neural Network
• Feature Extraction: Conv(1, 5, 32)
• Shrinking: Conv(32, 1, 5)
• Mapping: Conv(5, 3, 5)
• Expanding: Conv(5, 1, 32)
• Deconvolution: Deconv(32, 9, 1)

Page 12

Convolution

[Figure: Conv(ci, fi, ni) drawn as Input -> Compute -> Output with sliding window(s). Labels: 𝑁i, 𝑁i+1, ni, ci, fi, 1.]
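As a software reference for the sliding-window view, the sketch below reads Conv(ci, fi, ni) as ci input channels, an fi x fi filter and ni output channels, with no padding and stride 1. That reading is an assumption, although it is consistent with the sizes and multiplication counts in the Pipeline Balancing table. It is a plain NumPy loop, not the hardware design from the slides.

```python
import numpy as np

def conv_layer(x, w):
    """Valid (unpadded) convolution with stride 1.

    x: input feature maps, shape (c, N, N)
    w: filters, shape (n, c, f, f)
    returns: output feature maps, shape (n, N - f + 1, N - f + 1)
    """
    c, n_in, _ = x.shape
    n, c_w, f, _ = w.shape
    assert c == c_w, "channel counts of input and filters must match"
    n_out = n_in - f + 1
    out = np.zeros((n, n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            window = x[:, i:i + f, j:j + f]                 # sliding window, shape (c, f, f)
            out[:, i, j] = np.tensordot(w, window, axes=3)  # c*f*f multiplications per filter
    return out

features = conv_layer(np.ones((2, 6, 6)), np.ones((3, 2, 3, 3)))
```

Every output position costs c * f * f * n multiplications, which is the quantity the pipeline has to balance across layers.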

Page 13

Deconvolution

[Figure: Deconv(ci, fi, ni) drawn as Input -> Compute -> Output with sliding window(s). Labels: 𝑁i, 𝑁i+1, s, ci, fi, ni.]
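For comparison with the convolution case, here is a minimal sketch of a generic transposed convolution, where each input pixel scatters an f x f patch into the output at stride s. The parameter reading (c input channels, filter size f, n output channels, stride s) and the output size (N - 1) * s + f are assumptions of this textbook formulation; the slides do not state how the hardware crops or schedules the result.

```python
import numpy as np

def deconv_layer(x, w, stride):
    """Transposed convolution by scatter-accumulate.

    x: input feature maps, shape (c, N, N)
    w: filters, shape (c, n, f, f)
    returns: output, shape (n, (N - 1) * stride + f, (N - 1) * stride + f)
    """
    c, n_in, _ = x.shape
    c_w, n, f, _ = w.shape
    assert c == c_w, "channel counts of input and filters must match"
    n_out = (n_in - 1) * stride + f
    out = np.zeros((n, n_out, n_out))
    for i in range(n_in):
        for j in range(n_in):
            # Weighted sum over input channels: c*f*f*n multiplications per input pixel.
            patch = np.tensordot(x[:, i, j], w, axes=1)     # shape (n, f, f)
            out[:, i * stride:i * stride + f, j * stride:j * stride + f] += patch
    return out

upscaled = deconv_layer(np.ones((2, 3, 3)), np.ones((2, 1, 3, 3)), stride=2)
```

In this formulation the work is proportional to the number of input pixels rather than output pixels.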

Page 14

Pipeline Balancing

Layer              c_i  f_i  n_i  N_i   #Mult.  Ideal #DSP  Ideal II  Alloc. #DSP  Alloc. II
Extraction           1    5   32   36   819200         201      4076          200       4096
Shrinking           32    1    5   32   163840          40      4096           32       4096
Mapping              5    3    5   32   202500          50      4050           45       4500
Expanding            5    1   32   30   144000          35      4115           32       4500
Deconvolution       32    9    1   30  2332800         573      4072          519       4500
Overall              -    -    -    -  3662340         899      4115          828       4500
Available (ZC706)    -    -    -    -        -         900         -          900          -
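The "#Mult.", "Ideal #DSP" and "Ideal II" columns can be reproduced with a short calculation. The formulas below are inferred from the numbers in the table rather than stated on the slide: multiplications are taken as c * f^2 * n times the number of output positions for a convolution (or input positions for the deconvolution), DSPs are shared out in proportion to each layer's multiplications, and II is the ceiling of multiplications per DSP, assuming one multiplication per DSP per cycle.

```python
LAYERS = [
    # (name, kind, c, f, n, N) taken from the table
    ("Extraction",    "conv",   1, 5, 32, 36),
    ("Shrinking",     "conv",  32, 1,  5, 32),
    ("Mapping",       "conv",   5, 3,  5, 32),
    ("Expanding",     "conv",   5, 1, 32, 30),
    ("Deconvolution", "deconv", 32, 9,  1, 30),
]
DSP_BUDGET = 900  # "Available (ZC706)" row

def mult_count(kind, c, f, n, size):
    """Multiplications per sub-image for one layer (inferred formula)."""
    positions = (size - f + 1) ** 2 if kind == "conv" else size ** 2
    return c * f * f * n * positions

def balance(layers, budget):
    """Proportional DSP split and the resulting per-layer initiation interval."""
    mults = [mult_count(kind, c, f, n, size) for _, kind, c, f, n, size in layers]
    total = sum(mults)
    dsps = [round(m * budget / total) for m in mults]
    intervals = [-(-m // d) for m, d in zip(mults, dsps)]   # ceiling division
    return mults, dsps, intervals

mults, ideal_dsp, ideal_ii = balance(LAYERS, DSP_BUDGET)
```

The slowest stage sets the throughput of the whole pipeline, so the goal of balancing is to keep the per-layer II values close to each other. The "Alloc." columns are the authors' actual allocation and are not derived here.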

Page 15

Sub-image Size

• Padding
  • $N_i \equiv k + \sum_{j \ge i} (f_j - 1)$, summed over the convolution layers $\mathrm{Conv}_j$ from layer $i$ onward
• If sub-image size too small
  • Large border-to-block ratio
  • Limited by memory bandwidth
• If sub-image size too large
  • Large feature maps
  • Limited by on-chip BRAM capacity
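The padding rule and the border-to-block trade-off can be checked numerically. The sketch assumes that k is the side length of the useful block after the last convolution and that each unpadded convolution with filter size f trims f - 1 pixels. With k = 30 this reproduces the 𝑁𝑖 column of the Pipeline Balancing table, but the interpretation of k and the function names are assumptions.

```python
def input_sizes(k, conv_filter_sizes):
    """Input size N_i of each convolution layer for a final block of side k."""
    sizes = []
    for i in range(len(conv_filter_sizes)):
        # Every remaining unpadded convolution trims (f - 1) pixels.
        sizes.append(k + sum(f - 1 for f in conv_filter_sizes[i:]))
    return sizes

def border_ratio(k, conv_filter_sizes):
    """Extra (border) pixels fetched per useful pixel of the block."""
    n1 = input_sizes(k, conv_filter_sizes)[0]
    return (n1 * n1 - k * k) / (k * k)

CONV_FILTERS = [5, 1, 3, 1]   # Extraction, Shrinking, Mapping, Expanding
sizes_k30 = input_sizes(30, CONV_FILTERS)
ratio_k30 = border_ratio(30, CONV_FILTERS)
ratio_k10 = border_ratio(10, CONV_FILTERS)
```

Under these assumptions a 30-pixel block fetches about 44% extra pixels, while a 10-pixel block fetches 156% extra, which illustrates why small sub-images become limited by memory bandwidth.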

Page 16

Sub-image Size vs. Performance vs. #mult.

[Figure, left: PSNR (dB, axis 36.0 to 39.0) and SSIM (axis 0.915 to 0.940) vs. Block Size (10 to 50).]

[Figure, right: Multiplications (axis 8.0E+09 to 9.5E+09) vs. Block Size (10 to 50).]

Page 17

Overall Comparisons

• Compared six configurations

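No.  Preprocessing  Upscaling       #Mult.    PSNR (dB)  SSIM
1    None           Interpolation   6.6*10^7  35.51      0.9138
2    None           Neural Network  8.2*10^9  38.55      0.9421
3    Blocking       Interpolation   6.6*10^7  35.51      0.9138
4    Blocking       Neural Network  8.4*10^9  38.55      0.9420
5    Blocking       Mixed-Random    2.2*10^9  36.10      0.9211
6    Blocking       Mixed-TV        2.2*10^9  37.36      0.9287

Annotations:
• Neural network vs. interpolation (No. 2 vs. No. 1): +3.04 dB, at >100x the multiplications
• Blocking vs. no blocking (No. 3 vs. No. 1, No. 4 vs. No. 2): no performance loss
• Mixed-TV vs. Mixed-Random (No. 6 vs. No. 5): +1.26 dB
• Mixed-TV vs. neural network (No. 6 vs. No. 4): -1.19 dB, -75% multiplications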

Page 18

Example Outputs

[Figure: example output images for the six configurations.]
• Configuration 1: None/Interpolation
• Configuration 2: None/Neural Network
• Configuration 3: Blocking/Interpolation
• Configuration 4: Blocking/Neural Network
• Configuration 5: Blocking/Mixed-Random
• Configuration 6: Blocking/Mixed-TV

Page 19

Summary Flow

• Crop each frame into blocks
  • Suitable for low-level (pixel-level) tasks
  • GOOD: on-chip buffer friendly
  • BAD: computation overheads
• Dispatch blocks according to TV value
  • Micro-architecture for buffering system
• Fully-pipelined CNN for upscaling
  • Sliding window for convolution/deconvolution
  • Pipeline balancing
• Performance
  • Full-HD (1920x1080) -> Ultra-HD (3840x2160): 31.7 fps

Page 20

Thank you!

Page 21

TV Threshold vs. Performance vs. #mult.

[Figure, left: PSNR (dB, axis 35.0 to 38.5) and SSIM (axis 0.910 to 0.945) vs. TV Threshold (30 to 70).]

[Figure, right: Multiplications (axis 0.0E+00 to 2.5E+10) vs. TV Threshold (30 to 70).]

Page 22

Resource Utilizations

Component        BRAM  DSP      FF     LUT
Dispatcher          1    2     618    1138
Neural Network    178  844   63149   98439
Interpolator        0   10    1414    3076
Total             327  858   66261  103714
Available        1090  900  437200  218600
Utilization (%)    30   95      15      47

