Digital system (SoC) design for low-complexity multimedia processing
Hyun Kim
SoC Design for Multimedia Systems
◈ Goal : Reducing computational complexity & power consumption of state-of-the-art technologies based on multimedia and big-data
◈ Method : Speed-up and Low-power through HW acceleration & optimization
Two ways for acceleration : GPU porting / Digital circuit design
SoC design for multimedia systems
GPU (Tesla K40), Caffe + CuDNN : Power 235 W
FPGA (Virtex 7 485T), Microsoft Catapult project 2015 : Power ~25 W
Digital circuit design is the best option for acceleration and optimization!
GPU porting
• Highly optimized for parallel data processing and matrix operations
• Various development frameworks for DNN
• Size & Cost & Power consumption problem!
Digital Circuit (SoC) Design
• Implementation is more difficult than GPU porting
• But fast, small, cheap, and low power
• Comparison between GPU porting & FPGA design: FPGA design consumes about one tenth of the power of GPU porting
• High flexibility to apply various optimization techniques
• More than 100x speed-up can be achieved by various schemes
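The "about one tenth" claim can be checked against the two power figures quoted in the comparison. A minimal sketch, assuming the FPGA's "~25W" is taken as exactly 25 W (the slide only gives it approximately):

```python
# Power figures quoted on the slide; the FPGA value is approximate ("~25W").
gpu_power_w = 235.0   # GPU (Tesla K40) running Caffe + CuDNN
fpga_power_w = 25.0   # FPGA (Virtex 7 485T), assumed to be exactly 25 W here

ratio = fpga_power_w / gpu_power_w
print(f"FPGA / GPU power ratio: {ratio:.3f}")
print(f"GPU draws {gpu_power_w / fpga_power_w:.1f}x the FPGA power")
```

Under that assumption the ratio comes out near 0.106, which is consistent with "about one tenth".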
◈ Multimedia Processing System on Chip (SoC) Platform
All research was performed on this platform and relates to low-power HW acceleration & optimization
◈ Contribution 1: HW-based low-power video recording system with multiple video coding modules
◈ Contribution 2: Optimal combination of power scaling algorithms in HW-based video coding
◈ Contribution 3: Optimized selection of SRAM size for power reduction in HW-based video coding
◈ Contribution 4: Optimized HW implementation of DWT+SPIHT and its video quality optimization
◈ Contribution 5: Low-power HW-based video surveillance with early BG subtraction and adaptive FMC
◈ Contribution 6: HW design of real-time naturalness image enhancement based on Retinex
Figure (platform overview) labels: Platform, Pre-processing, Low-Power Video Recording System, Embedded compression, Video Coding Standard, Battery-operated & Video Codec
◈ Goal : Implement and optimize a HW-based low-power video recording system considering the trade-off between the performance and power consumption
Low-Power Video Recording System
Published in IEEE Transactions on Multimedia
Figure (system block diagram) labels: Camera input, Temp Mode, DRAM, LWC Encoder, H.264, Perm Mode, LWC Decoder, NAND FLASH
HW implementation Optimal operation scheme
Trade-off between performance & power consumption
Front-end verification on FPGA board
◈ Contribution
1) HW implementation of a low-power video recording system (VRS) that achieves power saving of up to 72.5%
2) Optimized power solution based on the trade-off between power & performance
Figure annotations: "Always operating" / "Only for meaningful video data that will be stored for a long time"
Video Coding Standard
◈ Goal : Find the optimal combination of low-power algorithms that achieves the best performance in the HW-based video encoder
Power-Scaling for Video Compression
Published in IEEE Transactions on VLSI
Flowchart
Formulation for power estimation
◈ Contribution
1) Optimizing video coding standards based on the trade-off between power & performance
2) HW implementation of a low-power video codec
More than 2dB enhancement at 40% power saving
Experimental results
Plot: Optimal Combination, Candidate Combination, and Selected Combination, with axis Power Saving (%)
Trade-off between power consumption & performance
Power-level table for real-time application
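The idea of choosing among candidate combinations and then building a power-level table can be illustrated with a generic Pareto selection. This is a sketch only: the extracted slide does not include the actual formulation, so the `Combo` type, the helper names, and every number below are hypothetical and are not the author's data or method.

```python
from typing import NamedTuple, Optional

class Combo(NamedTuple):
    name: str                 # hypothetical label for a set of low-power options
    power_saving_pct: float   # made-up power saving, in percent
    psnr_drop_db: float       # made-up quality loss, in dB

def pareto_front(combos: list[Combo]) -> list[Combo]:
    """Keep combos that no other combo beats on both power saving and quality."""
    front = []
    for c in combos:
        dominated = any(
            o.power_saving_pct >= c.power_saving_pct
            and o.psnr_drop_db <= c.psnr_drop_db
            and (o.power_saving_pct > c.power_saving_pct
                 or o.psnr_drop_db < c.psnr_drop_db)
            for o in combos
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.power_saving_pct)

def pick_for_target(front: list[Combo], target_pct: float) -> Optional[Combo]:
    """Lowest-loss combo on the front that reaches the target power saving."""
    ok = [c for c in front if c.power_saving_pct >= target_pct]
    return min(ok, key=lambda c: c.psnr_drop_db) if ok else None

# Illustrative candidates only.
candidates = [
    Combo("A", 10.0, 0.2),
    Combo("B", 10.0, 0.5),   # dominated by A
    Combo("C", 25.0, 0.9),
    Combo("D", 20.0, 1.4),   # dominated by C
    Combo("E", 40.0, 2.0),
]
front = pareto_front(candidates)
power_level_table = {lvl: pick_for_target(front, lvl) for lvl in (10, 20, 30, 40)}
for lvl, combo in power_level_table.items():
    print(lvl, combo.name if combo else None)
```

A table of this kind lets a real-time controller look up a precomputed combination for a requested power level instead of searching at run time.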
Machine Learning Methodology
Optimized SRAM Size for Video Codec
◈ Goal : Determine the optimal SRAM sizes for achieving the best performance in a HW-based video codec
Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems
◈ Contribution
1) Optimized SRAM size solution for a HW-based video codec that minimizes the performance degradation, considering the trade-off between HW resource & performance
Various SRAMs in video codec and their different sensitivities to error
Figure panels: Formulation, Optimal size, Optimization, Experimental results, Flowchart
Embedded compression
Low-cost Hardware Design of 1D SPIHT
◈ Goal : Implement a low-cost HW-based embedded compression module (1-D DWT+SPIHT)
Partitioned SPIHT / Bit-allocation
Published in IEEE Transactions on Consumer Electronics
Block diagram
Implementation schemes
◈ Contribution
1) A low-cost HW design of a 1D DWT+SPIHT with partitioned SPIHT and bit allocation
2) Optimization considering the trade-off between HW resource and performance
- Reduces HW gate count and memory by 59% and 75%, respectively, with only a slight PSNR degradation
Experimental results
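To make the 1-D DWT stage concrete, here is a one-level integer Haar transform written with lifting steps. This is an illustrative sketch: the slide does not state which wavelet filter the design uses, so Haar is an assumption, the function names are hypothetical, and the SPIHT coding stage is not shown.

```python
def haar_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the integer Haar (S-transform) on an even-length row."""
    if len(x) % 2:
        raise ValueError("even length expected")
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a           # detail (high-pass) coefficient
        s = a + (d >> 1)    # approximation (low-pass): floor of the pair average
        low.append(s)
        high.append(d)
    return low, high

def haar_inverse(low: list[int], high: list[int]) -> list[int]:
    """Undo haar_forward exactly (integer lifting is reversible)."""
    x = []
    for s, d in zip(low, high):
        a = s - (d >> 1)
        b = a + d
        x.extend((a, b))
    return x

row = [10, 12, 200, 190]          # made-up pixel values
low, high = haar_forward(row)
print(low, high)
print(haar_inverse(low, high) == row)
```

Lifting uses only additions, subtractions, and shifts, which is one reason transforms of this family are commonly chosen for low-cost hardware.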
Optimized DWT and SPIHT
◈ Goal : Optimize the R-D performance of HW-based DWT+SPIHT modules
Accepted in IEEE Transactions on Multimedia
Structure of 1-D DWT and SPIHT
Formulation and Optimization
Experimental Results
Correlation analysis between DWT coeff. & loss
◈ Contribution
1) Optimizing the performance of DWT and SPIHT
2) Applying the proposed scheme to HW DWT+SPIHT
Examples before/after applying the scheme
Optimal solution of compression ratio for each coding block
Machine Learning Methodology
Pre-processing
Low-Power Video Surveillance System
◈ Goal : Implement low-power HW-based video surveillance with the highest power saving
Published in IEEE Transactions on Consumer Electronics
(MKC, MKP)   Classification    MBCURR coding option   IME option for MBNCO-LO
(FG, FG)     Strong FG         Regular (FME & IP)     Regular SR & BM
(FG, BG)     Object Boundary   Only IP                Regular SR & BM
(BG, FG)     Uncovered BG      Only IP                Regular SR & BM
(BG, BG)     Strong BG         SKIP mode              Small SR & BM
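The classification table above maps a pair of FG/BG labels to a coding decision, which amounts to a small lookup. A minimal sketch of that lookup follows; the names `MBPolicy`, `POLICY`, and `classify_mb` are hypothetical, and only the table entries themselves come from the slide.

```python
from typing import NamedTuple

class MBPolicy(NamedTuple):
    classification: str   # "Classification" column
    coding_option: str    # "MBCURR coding option" column
    ime_option: str       # "IME option for MBNCO-LO" column

# Keys are the (MKC, MKP) label pairs from the table.
POLICY = {
    ("FG", "FG"): MBPolicy("Strong FG", "Regular (FME & IP)", "Regular SR & BM"),
    ("FG", "BG"): MBPolicy("Object Boundary", "Only IP", "Regular SR & BM"),
    ("BG", "FG"): MBPolicy("Uncovered BG", "Only IP", "Regular SR & BM"),
    ("BG", "BG"): MBPolicy("Strong BG", "SKIP mode", "Small SR & BM"),
}

def classify_mb(mk_c: str, mk_p: str) -> MBPolicy:
    """Return the table row for one (MKC, MKP) pair; raises KeyError otherwise."""
    return POLICY[(mk_c, mk_p)]

for key in POLICY:
    row = classify_mb(*key)
    print(key, "->", row.classification, "|", row.coding_option, "|", row.ime_option)
```

Only the (BG, BG) case selects SKIP mode and the small search option, so the power saving would be expected to scale with how much of each frame falls into that case.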
◈ Contribution
1) Performing BG subtraction without additional resources by considering the HW structure of the video codec
2) Achieving the best power savings with negligible PSNR degradation in video surveillance
Inserting BG sub. into HW-based pipeline structure
Experimental results
Flowchart of HW-based BG sub. using only the information generated during video compression
Operation of coding standard in low-power video surveillance
BG sub. results
Real-time HW Design for Retinex
◈ Goal : Implement Low-power/Real-time HW IP of Retinex algorithm
Exhibited at 2018 CES
Submitted in IEEE Transactions on Consumer Electronics
- Pre-processing for improving the brightness of the image in the dark part of the input image
- Emphasize reflection components by separating illumination component and reflection component
- Require a large amount of computations
Retinex algorithm
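The separation of illumination and reflection components described above can be sketched in a few lines. This is a minimal single-scale illustration on one image row, not the implemented design: the slide does not say which Retinex variant is used, a box blur stands in for the illumination estimate, and the names `box_blur_row` and `retinex_row` are hypothetical.

```python
import math

def box_blur_row(row: list[float], radius: int) -> list[float]:
    """Illumination estimate: mean over a sliding window, clipped at the edges."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out

def retinex_row(row: list[float], radius: int = 2, eps: float = 1e-6) -> list[float]:
    """Reflection component = log(pixel) - log(illumination estimate)."""
    illum = box_blur_row(row, radius)
    return [math.log(p + eps) - math.log(l + eps) for p, l in zip(row, illum)]

dark = [2, 2, 8, 2, 2]              # made-up dark row with one brighter detail
bright = [20 * p for p in dark]     # same scene under 20x stronger illumination
r_dark = retinex_row(dark)
r_bright = retinex_row(bright)
print([round(v, 3) for v in r_dark])
print([round(v, 3) for v in r_bright])
```

The two outputs agree to within the small `eps` offset, showing that the result depends on the scene detail rather than on the overall brightness. The logarithms and the window sum computed for every pixel also hint at why the algorithm is computation-heavy.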
- Implemented on ZC706 FPGA board
- Achieving real-time operation (FHD 60fps)
- 20% LUT reduction and 53% FF reduction compared to previous design
- No frame memory: Low-power & Fast
FPGA implementation results
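For a sense of what "real-time operation (FHD 60fps)" demands, the required pixel rate can be worked out directly. This is a rough sketch that takes FHD as 1920 x 1080 active pixels and ignores blanking intervals; the one-pixel-per-clock figure is a hypothetical reference point, not a stated property of the design.

```python
# FHD taken as 1920 x 1080 active pixels; blanking intervals are ignored.
width, height, fps = 1920, 1080, 60

pixels_per_second = width * height * fps
# Hypothetical reference: clock needed if exactly one pixel is processed per cycle.
min_clock_mhz = pixels_per_second / 1e6

print(pixels_per_second)
print(f"{min_clock_mhz:.3f} MHz at one pixel per clock")
```

Under those assumptions the datapath has to sustain roughly 124 million pixels per second.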
Block diagram
Examples before/after applying Retinex
◈ Contribution
1) Real-time HW implementation of Retinex and resource optimization for HW design
CES2018 demo
Dark part : image compression efficiency and recognition accuracy are very low, so it is very important to improve the brightness of the image