UCLA ONR Presentation
August 4, 2005
John Villasenor (villa@icsl.ucla.edu)
David Choi (dschoi@icsl.ucla.edu), Hyungjin Kim (hjkimnov@ee.ucla.edu), Dong-U Lee (dongu@icsl.ucla.edu)
Overview
• Focus of work to date
– Robust imagery
– Automatic adaptation to network parameters
– Region of interest (ROI) coding, including integration of target tracking information
– Improved imagery using system-wide optimizations including channel coding, image coding, and receiver signal detection
– Several generations of embedded hardware platform; integration and deployment on helicopter platforms
Recent Efforts
• Multi-layered video streams with a base and enhancement layer
• Region of interest coding using a base/enhancement layer system
• Reduced complexity image representation for environments with severe power constraints
• Inherently secure encoding
• System-level optimizations (current focus: timing recovery) to improve end-to-end imaging capabilities
Enhancement Layer Video
• Concept: in networks with communication links of varying bandwidths and capacities,
– Send base layer video to all clients using a legacy standards-based video codec implementation
– Send enhancement layer video selectively to clients that can support the additional bandwidth
[Diagram: camera → video encoder; the base layer stream is sent to all clients (both high- and low-bandwidth); the enhancement layer stream is sent only to the high-bandwidth clients]
*in collaboration with Innovative Concepts
Enhancement Layer Encoding/Decoding
• Encoder: Leverage standards-based encoding platform
[Diagram — Encoder: the original frame passes through a video encoder to produce the compressed base layer; the difference between the original frame and the reconstructed base layer frame passes through a second video encoder to produce the compressed enhancement layer. Decoder: the compressed base layer and compressed enhancement layer are each run through a video decoder, and the decoded base layer frame and enhancement layer frame are summed to give the recovered frame.]
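The encode/decode structure above can be sketched end to end; here coarse and fine scalar quantizers stand in for the standards-based video codecs (a hypothetical simplification, not the deployed implementation):

```python
import numpy as np

def encode_layers(frame, base_step=16.0, enh_step=4.0):
    """Produce a coarsely quantized base layer and an enhancement
    layer that codes the base-layer residual (difference frame)."""
    base = np.round(frame / base_step) * base_step        # stand-in for base codec
    residual = frame - base                               # difference frame
    enhancement = np.round(residual / enh_step) * enh_step
    return base, enhancement

def decode_layers(base, enhancement=None):
    """Low-bandwidth clients decode the base layer only; clients with
    spare bandwidth add the enhancement layer back in."""
    return base if enhancement is None else base + enhancement

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (240, 320)).astype(np.float64)
base, enh = encode_layers(frame)
err_base = np.abs(decode_layers(base) - frame).max()       # <= base_step / 2
err_both = np.abs(decode_layers(base, enh) - frame).max()  # <= enh_step / 2
assert err_both <= err_base
```

In a real deployment each quantizer would be replaced by an H.263/H.264-class encoder, as in the block diagram.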
Enhancement Coding Example
• Enhancement layer coding can provide baseline imagery to low bandwidth clients and higher quality imagery to high bandwidth clients
[Images: base layer frame, enhancement layer frame, and combined base + enhancement frame]
Improvement in Video Quality through Enhancement Layer Video
[Figure: PSNR vs. base + enhancement bitrate, simulated with 320x240 video. Points along a curve show the improvement from the enhancement layer; different curves represent different starting base layer bitrates.]
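PSNR, the quality metric on these axes, is the standard peak signal-to-noise ratio; for 8-bit imagery:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((240, 320))
noisy = ref + 10.0                 # uniform error of 10 gray levels
print(round(psnr(ref, noisy), 2))  # → 28.13
```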
PSNR Difference Plot
[Figure: PSNR difference plots. y-axis: PSNR(enhancement + base) − PSNR(base encoded at the enhancement bitrate), in dB (−2 to 8); x-axis: enhancement layer bitrate (0 to 2500 kbps). One curve per base layer bitrate: 200k, 400k, 600k, 800k, 1000k.]
The plots show the PSNR improvement of enhancement layer video over video re-encoded at the same rate, as a function of the enhancement layer bitrate.
Enhancement Layer Region of Interest
• Application of the enhancement layer coding concept to region of interest coding
• High-quality information for the region of interest may be sent when bandwidth becomes available on certain network connections
Image representation using reduced energy processing
• Use edge information to convey scene and location context
• Provides significant scene information while dramatically reducing energy consumption and memory utilization
• Simple, efficient compression algorithms can be applied
[Images: original image (left) and edge-detected image (right)]
Generalized Gaussian Source
Generalized Gaussian source pdf:

f_X(x) = C_1 \exp\left( -C_2 |x|^v \right)

where C_1, C_2, and \eta are given by

C_1 = \frac{v \, \eta(v, \sigma)}{2 \, \Gamma(1/v)}, \qquad
C_2 = \left[ \eta(v, \sigma) \right]^v, \qquad
\eta(v, \sigma) = \frac{1}{\sigma} \left[ \frac{\Gamma(3/v)}{\Gamma(1/v)} \right]^{1/2}

v is the shape parameter:
v = 2: Gaussian
v = 1: Laplacian
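As a numerical sanity check, the pdf can be evaluated directly (standard generalized Gaussian form; v = 2 recovers the ordinary Gaussian and v = 1 the Laplacian):

```python
import math

def gg_pdf(x, v, sigma=1.0):
    """Generalized Gaussian pdf f(x) = C1 * exp(-C2 * |x|**v)."""
    eta = (1.0 / sigma) * math.sqrt(math.gamma(3.0 / v) / math.gamma(1.0 / v))
    c1 = v * eta / (2.0 * math.gamma(1.0 / v))
    c2 = eta ** v
    return c1 * math.exp(-c2 * abs(x) ** v)

# v = 2 reduces to the ordinary Gaussian N(0, sigma^2)
gauss_at_1 = math.exp(-0.5) / math.sqrt(2 * math.pi)   # N(0,1) pdf at x = 1
assert abs(gg_pdf(1.0, v=2.0) - gauss_at_1) < 1e-12
```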
Generalized Gaussian Source
• Choosing the "shape parameter" v allows representation of a wide range of pdfs
[Figure: pdf of the generalized Gaussian on x ∈ [−5, 5] for v = 0.5, 1, and 2; smaller v gives a sharper peak and heavier tails]
Golomb Rice (GR) and exponential Golomb (EG) Codes
• GR and EG codes are classes of Huffman codes that have highly regular structure
• There is no need for an explicit codebook; the codebook is implicit in the choice of the code
• EG codes are particularly well suited to coding of image data that has been processed and then quantized or thresholded.
UCLA ONR 14
Structure of Golomb Rice Codes
• Code trees (indices 0-9):

Index  k=1       k=2      k=3
0      1 0       1 00     1 000
1      1 1       1 01     1 001
2      01 0      1 10     1 010
3      01 1      1 11     1 011
4      001 0     01 00    1 100
5      001 1     01 01    1 101
6      0001 0    01 10    1 110
7      0001 1    01 11    1 111
8      00001 0   001 00   01 000
9      00001 1   001 01   01 001

Prefix: the number of zeros describes the "depth" in the tree. Suffix: describes which branch at a particular depth.
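These codewords can be generated directly, with no explicit codebook; a minimal sketch:

```python
def golomb_rice_encode(n, k):
    """Golomb-Rice codeword for index n >= 0 with parameter k:
    unary prefix (q zeros then a 1) giving the quotient q = n >> k,
    followed by the k low-order bits of n as the suffix."""
    q = n >> k                      # tree depth (number of prefix zeros)
    r = n & ((1 << k) - 1)          # branch at that depth
    return "0" * q + "1" + format(r, f"0{k}b")

# Matches the k = 1 column of the code tree:
assert golomb_rice_encode(0, 1) == "10"
assert golomb_rice_encode(4, 1) == "0010"
assert golomb_rice_encode(9, 1) == "000011"
```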
UCLA ONR 15
Structure of Exponential Golomb Codes
Code trees (indices 0-9):

Index  k=0       k=1       k=2
0      1         01 0      001 00
1      01 0      01 1      001 01
2      01 1      001 00    001 10
3      001 00    001 01    001 11
4      001 01    001 10    0001 000
5      001 10    001 11    0001 001
6      001 11    0001 000  0001 010
7      0001 000  0001 001  0001 011
8      0001 001  0001 010  0001 100
9      0001 010  0001 011  0001 101
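For the k = 0 tree, the codeword construction is especially simple (standard order-0 exp-Golomb):

```python
def exp_golomb_encode(n):
    """Order-0 exponential-Golomb codeword for index n >= 0:
    write n + 1 in binary, then prepend one fewer zeros than
    the binary form has bits."""
    binary = format(n + 1, "b")
    return "0" * (len(binary) - 1) + binary

assert exp_golomb_encode(0) == "1"
assert exp_golomb_encode(1) == "010"
assert exp_golomb_encode(9) == "0001010"
```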
UCLA ONR 16
Coding Efficiency for Generalized Gaussian Source
Efficiency of Golomb-Rice and exp-Golomb codes for positive discrete sources derived from a generalized Gaussian, showing the effect of code choice and code parameter k. Figure a shows the case v = 0.7; curves for v = 0.3 are given in Figure b. The ratio δ/σ conveys the effect when samples from a generalized Gaussian source with standard deviation σ are quantized using step size δ.
Figure a, v = 0.7; Figure b, v = 0.3
UCLA ONR 17
Histogram of Data from Edge Detected Image
Coding efficiency = 92%
UCLA ONR 18
Complexity of Edge + EG coding
• Complexity of edge detection combined with EG coding is significantly less than that of video coding using the DCT and motion compensation
• Edge detection algorithm
– 4 shifts, 11 adds, 2 abs() per pixel
• Exp-Golomb code
– Codewords can be easily generated using a state machine
– The number of operations depends on the run-length statistics: each additional bit in the prefix requires an additional 2 shifts and 2 additions
– In the typical images we have observed, coding takes under 0.2 adds and 0.2 shifts per pixel
• Total
– Dominated by edge processing; e.g. 4 shifts, 11 adds, 2 abs() per pixel
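A gradient-based edge detector in this complexity class can be sketched as follows (a hypothetical stand-in using only adds, subtracts, and abs(); the deployed algorithm's exact operation mix differs):

```python
import numpy as np

def edge_detect(img, threshold=32):
    """Binary edge map from first-difference horizontal and vertical
    gradients; per pixel: two subtracts, two abs(), one add, one compare."""
    img = img.astype(np.int32)                      # avoid uint8 wraparound
    gx = np.abs(img[:, 1:] - img[:, :-1])[:-1, :]   # horizontal gradient
    gy = np.abs(img[1:, :] - img[:-1, :])[:, :-1]   # vertical gradient
    return (gx + gy) > threshold                    # 1-bit edge map

img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 255                                    # vertical step edge
edges = edge_detect(img)
assert edges[:, 3].all() and not edges[:, 0].any()
```

The 1-bit output map is exactly the kind of sparse, run-length-friendly data that EG coding handles efficiently.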
UCLA ONR 19
Complexity of Video coding
• The two most computationally expensive steps of video coding are the DCT and motion compensation calculations
• DCT
– For each 8x8 block, a well-optimized implementation uses 536 additions and 192 multiplies
– For 640x480, there are 80x60 = 4,800 blocks
– Total of 2,572,800 additions and 921,600 multiplies per frame
– Average of 8.4 additions and 3 multiplies per pixel, plus associated memory fetches/stores
UCLA ONR 20
Complexity Analysis of Video Coding
• Motion search
– For each MxM block with a search offset of ±L:
• (2L+1)^2 candidate offsets
• 2xMxM addition operations per offset
• MxM memory fetches per offset
– E.g. a 16x16 block with an offset of ±16:
• 1,089 x 2 x 16 x 16 = 557,568 additions
• 1,089 x 16 x 16 = 278,784 memory fetches
– For 640x480, there are 1,200 blocks
– Total of 669,081,600 additions per frame
– Average of 2,178 additions and 1,089 memory fetches per pixel
– Reducing the search range to an offset of ±8 gives 578 additions and 289 memory fetches per pixel
• Entropy coder
– The complexity of H.263's entropy coder depends on the specific image, etc.
– Huffman coding of DCT coefficients requires additional memory overhead for lookup tables
– H.264 Main Profile (CABAC) would involve significant extra cost due to arithmetic coding
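The motion-search operation counts above follow from a few lines of arithmetic (full-search block matching; frame and block sizes per the example):

```python
def motion_search_ops(width=640, height=480, M=16, L=16):
    """Per-pixel addition and memory-fetch counts for full-search
    block matching with MxM blocks and a search offset of +/- L."""
    offsets = (2 * L + 1) ** 2             # candidate displacements per block
    adds_per_block = offsets * 2 * M * M   # subtract + accumulate per pixel
    fetches_per_block = offsets * M * M
    blocks = (width // M) * (height // M)
    pixels = width * height
    return (blocks * adds_per_block / pixels,
            blocks * fetches_per_block / pixels)

adds, fetches = motion_search_ops()
print(adds, fetches)          # → 2178.0 1089.0
adds8, fetches8 = motion_search_ops(L=8)
print(adds8, fetches8)        # → 578.0 289.0
```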
UCLA ONR 21
Comparison
• The energy cost for multiplies relative to adds varies with processor, implementation, etc., but an order of magnitude is a reasonable approximation
• Edge+EG required 11 adds/pixel (disregard shifts and abs() as these are nearly free)
• Video encoding required between approximately 500 and 2000 adds/pixel, dominated by motion compensation
• Difference of approximately 2-3 orders of magnitude in energy cost
• Can reduce video coding burden using sub-optimal fast motion searches, etc., but reductions will still leave edge+EG at an energy advantage of several orders of magnitude
UCLA ONR 22
Hybrid system representing object edges, texture in ROI
• Energy-reducing benefits of edge-based representation
• Image quality benefits of traditional video coding in a region of interest
• Example: Total frame size 640 by 480
• ROI: 200 by 160
• Reduces overall energy consumption by approximately an order of magnitude
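A back-of-the-envelope check of the order-of-magnitude claim, using the rough per-pixel costs from the preceding slides (roughly 2000 add-equivalents/pixel for full-frame video coding, 11 for edge + EG; both values are approximations):

```python
def hybrid_energy_ratio(frame_px, roi_px, video_cost=2000.0, edge_cost=11.0):
    """Energy of video-coding only the ROI plus edge-coding the full
    frame, relative to video-coding the whole frame. Costs are in
    add-equivalents per pixel (approximate slide estimates)."""
    full_video = frame_px * video_cost
    hybrid = roi_px * video_cost + frame_px * edge_cost
    return hybrid / full_video

# 640x480 frame with a 200x160 region of interest
ratio = hybrid_energy_ratio(640 * 480, 200 * 160)
assert ratio < 0.15   # roughly an order-of-magnitude reduction
```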
UCLA ONR 23
Secure arithmetic coding
• Used in many coding standards, including JPEG2000 (still images) and H.264 (video)
• Associates a sequence of symbols with a position in the range [0,1)
• Enables high coding efficiency
• Not secure as traditionally implemented
• Recursive partitioning is used prior to encoding of each new symbol
• Every position in [0,1) is associated with a unique symbol string (binary example shown below)
[Diagram: after one symbol the interval [0,1) splits into A | B; after two symbols into AA | AB | BA | BB; after three symbols into AAA through BBB]
UCLA ONR 24
Interval splitting with one symbol
• Key k0 identifies where the interval is to be split
• The portion of the A interval to the right of the key is moved to the right of the B interval
• B is unchanged; a representation in the B interval will have the same codeword length as in traditional AC
• A has two subintervals: at most one more bit is needed relative to traditional AC, though in some cases there is no increase in length
The result of interval splitting with 1 symbol
UCLA ONR 25
Interval ordering diversity
[Diagrams: interval orderings resulting from keys [0.45, 0.23], [0.85, 0.40], and [0.25, 0.78]]
UCLA ONR 26
End-to-end quality optimization: High level block diagram
Transmitter: Image input → Image coder → Channel coder → Transmission
Receiver: Timing recovery → Channel decoder → Image decoder
UCLA ONR 27
Timing Recovery
• Symbol timing errors occur at receivers due to clock differences, Doppler effects, etc.
• Traditional method: use simple PLL-based circuits
• Our approach: data-aided iterative timing recovery
– use information from the LDPC decoder at each iteration to assist synchronization
UCLA ONR 28
Block Diagram
[Diagram: timing recovery loop]
UCLA ONR 29
Bit Error Rate
[Figure: BER vs. Eb/No (0.5 to 2.5 dB), 15 iterations; curves without and with LDPC feedback, with the feedback curve reaching lower BER]
UCLA ONR 30
Timing Recovery: Demonstration
• AWGN channel at Eb/No = 2 dB with random phase offsets
Without LDPC feedback: BER = 10^-0.7
With LDPC feedback: BER = 10^-1.5
UCLA ONR 31
Conclusions
• Traditional assumption that “efficient” representation of imagery means maximizing compression needs to be re-examined. Maximizing energy efficiency can be more important than maximizing compression efficiency
• Need methods to convey scene content that reflect power, bandwidth, memory and transmission reliability characteristics, limitations, and statistics of a given platform and environment
• Alternative scene/object representation methods, specifically aimed at low energy with no attempt at esthetic quality, hold promise. Contrast with (mostly failed) previous attempts at “object-based” coding
UCLA ONR 32
Conclusions (continued)
• Low power imaging sensors and networks of such sensors likely to be critical
• Local processing critical, realistic collaboration also has potential
• Low power imaging event detection strategies needed; can’t simply be doing high energy image processing continuously while waiting for events which may occur rarely
• Additional challenge in appropriately determining which information to convey, when, to whom, and how to convey it
• Need proper balance of autonomous and human management of imaging networks, proper balance of video vs. still imagery, resolution vs. rate, etc.
• Approx $20K forecast to remain as of Oct 05