  • 7/31/2019 Multimedia Signal Processing Group Swiss Federal Institute of Technology Efficient Video Coding in H.264:AVC


    Multimedia Signal Processing Group

    Swiss Federal Institute of Technology

    Efficient Video Coding in H.264/AVC

    by using Audio-Visual Information

    Jong-Seok Lee & Touradj Ebrahimi

    EPFL, Switzerland

    MMSP09

    5 October 2009


    Introduction

    Objective of video coding

    Better quality with smaller number of bits

    How to achieve better video coding efficiency?

    Using statistics of signal

    Using human visual systems characteristics: Focus of attention

    Only small region around fixation point is captured at high spatial resolution.

    Attended region: less compression

    Unattended region: more compression


    Introduction

    Which region draws attention?

    Conspicuity-based (Itti, 2004)

    Moving object-based (Cavallaro, 2005)

    Face-based (Boccignone, 2008)

    No consideration of cross-modal (audio-visual) interaction!


    Audio-Visual Focus of Attention

    Abrupt sound draws visual attention to sound source location. (Spence, 1997)

    Attending to auditory stimuli at a given location enhances processing of visual stimuli at the same location. (Spence, 1996)

    We define sound-emitting region as attended region.


    Overall Procedure

    Original frame → Source localization → Priority map → Slice grouping → H.264/AVC coding with flexible macroblock ordering (FMO)
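The step from a localized source to slice groups can be sketched in Python. This is a minimal illustration, not the authors' implementation: the slides do not state how the priority map is built, so the sketch assumes priority falls off with the Chebyshev distance (in macroblocks) from a single source macroblock. The names `slice_group_map`, `source_mb`, `num_groups`, and `ring_width` are hypothetical. The result is an explicit macroblock-to-slice-group map, which is the kind of input FMO Type 6 takes.

```python
def slice_group_map(mb_cols, mb_rows, source_mb, num_groups, ring_width=2):
    """Assign each macroblock a slice group ID (0 = highest priority).

    Assumption: priority decreases with Chebyshev distance (in macroblocks)
    from the localized sound source; the slides do not specify the rule.
    """
    src_col, src_row = source_mb
    rows = []
    for r in range(mb_rows):
        row = []
        for c in range(mb_cols):
            dist = max(abs(c - src_col), abs(r - src_row))
            # Macroblocks farther from the source fall into later groups.
            row.append(min(dist // ring_width, num_groups - 1))
        rows.append(row)
    return rows


# Example only: a 352x288 frame holds 22x18 macroblocks of 16x16 pixels.
sg_map = slice_group_map(mb_cols=22, mb_rows=18, source_mb=(5, 9), num_groups=4)
```

Group 0 covers the macroblocks nearest the source, and every macroblock beyond the last ring is collected into the final group.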


    Audio-Visual Source Localization

    To identify spatial location of sound source in scene

    Approach

    Canonical correlation analysis

    Finds projection vectors for two data sets that maximize the correlation between the projected data

    Sparsity principle

    Spatio-temporal consistency

    [Figure: "vs." comparisons of localization over frames t, t+1, t+2]
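For reference, the canonical correlation analysis named above is usually written as the following maximization. This is the textbook form, not a formula taken from the slides; the symbols are assumptions introduced here: \(\mathbf{w}_v\) and \(\mathbf{w}_a\) are the projection vectors for the visual and audio features, \(C_{vv}\) and \(C_{aa}\) are their covariance matrices, and \(C_{va}\) is the cross-covariance.

```latex
\[
\rho \;=\; \max_{\mathbf{w}_v,\,\mathbf{w}_a}
\frac{\mathbf{w}_v^{\top} C_{va}\,\mathbf{w}_a}
     {\sqrt{\mathbf{w}_v^{\top} C_{vv}\,\mathbf{w}_v}\;
      \sqrt{\mathbf{w}_a^{\top} C_{aa}\,\mathbf{w}_a}}
\]
```

In this reading, large entries of \(\mathbf{w}_v\) mark the pixels whose changes correlate most strongly with the audio, which is why a sparse \(\mathbf{w}_v\) is desirable for localization.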


    Audio-Visual Source Localization

    Constrained optimization, solved by linear programming

    Advantages

    Applicability to normal video with mono audio channel

    No assumption on sound source

    No training required

    Example

    J.-S. Lee, F. De Simone, T. Ebrahimi, Video coding based on audio-visual attention, ICME09
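The link between a sparsity-driven constrained problem and linear programming can be illustrated with the standard reformulation of \(\ell_1\) minimization. This is a generic sketch, not the authors' exact formulation: the constraint \(A\mathbf{w} = \mathbf{b}\) is a placeholder for whatever constraints the method imposes, and \(\mathbf{p}\), \(\mathbf{q}\) are auxiliary variables introduced here.

```latex
\[
\min_{\mathbf{w}} \;\|\mathbf{w}\|_1
\quad \text{s.t.} \quad A\mathbf{w} = \mathbf{b}
\qquad\Longleftrightarrow\qquad
\min_{\mathbf{p},\,\mathbf{q}\,\ge\,\mathbf{0}} \;\mathbf{1}^{\top}(\mathbf{p} + \mathbf{q})
\quad \text{s.t.} \quad A(\mathbf{p} - \mathbf{q}) = \mathbf{b},
\qquad \mathbf{w} = \mathbf{p} - \mathbf{q}
\]
```

Splitting \(\mathbf{w}\) into its positive and negative parts makes both the objective and the constraints linear, so an off-the-shelf LP solver applies.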


    Video Coding

    Localization result → Priority map → Slice grouping → H.264/AVC coding with FMO (Type 6)

    Slice groups are coded with QP0, QP1, QP2, QP3, where

    QP1 = QP0 + ΔQP

    QP2 = QP1 + ΔQP

    QP3 = QP2 + ΔQP

    * QP = quantization parameter
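The recursion above gives slice group k the base value QP0 plus k increments. A minimal Python sketch, assuming the result is capped at 51 (the upper end of the H.264/AVC QP range for 8-bit video); the function name `slice_group_qps` and the capping behaviour are assumptions, not taken from the slides.

```python
def slice_group_qps(qp0, delta_qp, num_groups, qp_max=51):
    """Return [QP0, QP1, ...] where each group adds delta_qp to the previous one.

    Assumption: values are capped at qp_max rather than rejected.
    """
    return [min(qp0 + k * delta_qp, qp_max) for k in range(num_groups)]


qps = slice_group_qps(qp0=26, delta_qp=4, num_groups=4)
print(qps)  # [26, 30, 34, 38]
```

A larger increment coarsens the quantization of low-priority groups faster, trading quality outside the attended region for bitrate.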


    Experiments

    2 test sequences including multiple moving objects in scene

    Audio-visual source localization

    Visual features: differential grayscale pixel value

    Audio features: differential frame energy

    H.264/AVC coding: JM reference software

    Constant QP mode

    Rate control (adaptive QP) mode

    Proposed method (FMO enabled)
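The two feature types listed above can be sketched as follows. The exact definitions are not given in the slides, so this assumes frame energy is the sum of squared audio samples per video frame, that both features are signed differences between consecutive frames, and that no normalization is applied. The function names are hypothetical.

```python
def differential_frame_energy(samples, samples_per_frame):
    """Energy of each audio frame, then the difference between consecutive frames."""
    n_frames = len(samples) // samples_per_frame
    energy = [
        sum(x * x for x in samples[i * samples_per_frame:(i + 1) * samples_per_frame])
        for i in range(n_frames)
    ]
    return [energy[i] - energy[i - 1] for i in range(1, n_frames)]


def differential_pixels(prev_frame, cur_frame):
    """Per-pixel grayscale difference between two frames (lists of rows)."""
    return [
        [cur - prev for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(prev_frame, cur_frame)
    ]


audio = [0, 1, 0, -1, 2, 2, -2, -2]   # toy signal, 4 samples per video frame
audio_feature = differential_frame_energy(audio, 4)
visual_feature = differential_pixels([[10, 10]], [[12, 7]])
print(audio_feature)    # [14]
print(visual_feature)   # [[2, -3]]
```

Both features respond to change rather than to absolute level, so a static background and a steady hum contribute nothing to the correlation.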


    Experiments

    Subjective test

    Is quality degradation acceptable?

    ITU-R BT.500-11

    Double stimulus continuous quality scale (DSCQS)
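In a DSCQS session each observer rates both the reference and the processed sequence, and the analysis works on the difference of the two ratings. A minimal sketch of the resulting differential mean opinion score, assuming a reference-minus-test sign convention and omitting the observer screening described in BT.500; the function name `dmos` and the sample ratings are made up for illustration.

```python
def dmos(reference_scores, test_scores):
    """Mean over observers of (reference rating - test rating)."""
    if len(reference_scores) != len(test_scores) or not reference_scores:
        raise ValueError("need one reference and one test rating per observer")
    diffs = [ref - test for ref, test in zip(reference_scores, test_scores)]
    return sum(diffs) / len(diffs)


ref = [80, 75, 90, 70]    # hypothetical ratings on a 0-100 scale
test = [70, 72, 80, 66]
score = dmos(ref, test)
print(score)  # 6.75
```

Under this convention a value near zero means observers rated the processed sequence about as high as the reference.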


    Result

    Coding gain by proposed method over constant QP mode

    [Figure: two panels, QP0=22 and QP0=30; horizontal axis: #slice]


    Result

    Rate-distortion curves

    Proposed method (#slice=2) vs. rate control

    [Figure: two panels, ΔQP=1 and ΔQP=4 (QP increment between slice groups); horizontal axis: Bitrate (kbit/s); vertical axis: PSNR-Y (dB); curves: Rate control, Proposed]
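The vertical axis of the rate-distortion curves is the PSNR of the luma (Y) plane. A minimal sketch of the standard definition, assuming 8-bit samples (peak value 255) and frames given as lists of rows; the function name `psnr_y` is hypothetical.

```python
import math


def psnr_y(ref_luma, test_luma, peak=255):
    """PSNR in dB between two luma planes given as lists of rows."""
    sq_err = 0
    count = 0
    for ref_row, test_row in zip(ref_luma, test_luma):
        for r, t in zip(ref_row, test_row):
            sq_err += (r - t) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(peak ** 2 / mse)


value = psnr_y([[100, 100], [100, 100]], [[101, 99], [102, 100]])
```

Because PSNR weights every pixel equally, it cannot reward a method that moves distortion away from the attended region, which is one reason the slides also report a subjective test.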


    Result

    Subjective quality comparison

    [Figure: bar chart of DMOS (differential mean opinion score) for JM (constant QP=26) and for the proposed method (QP0=26, #slice=2) with ΔQP=1, 2, and 4; annotations: 17% gain, 29% gain]


    Conclusion & Discussion

    Audio-visual focus of attention (AV FoA) influences perceived quality.

    It can be used for efficient video coding with H.264/AVC.

    Discarding information outside the focus of attention does not significantly degrade perceived quality.

    AV FoA does not explain everything; it should be combined with other attention mechanisms.


    Questions/comments are welcome!

    Contact

    [email protected]

    http://mmspg.epfl.ch


    References

    L. Itti, Automatic foveation for video compression using a neurobiological model of visual attention, IEEE Trans. Image Process., 2004

    A. Cavallaro, O. Steiger, T. Ebrahimi, Semantic video analysis for adaptive content delivery and automatic description, IEEE Trans. Circuits Syst. Video Technol., 2005

    G. Boccignone, A. Marcelli, P. Napoletano, G. D. Fiore, G. Iacovoni, S. Morsa, Bayesian integration of face and low-level cues for foveated video coding, IEEE Trans. Circuits Syst. Video Technol., 2008

    B. Stein, M. Meredith, The Merging of the Senses, MIT Press, 1993

    R. Sharma, V. I. Pavlovic, T. S. Huang, Toward multimodal human-computer interface, Proc. IEEE, 1998

    H. McGurk, J. MacDonald, Hearing lips and seeing voices, Nature, 1976

    J.-S. Lee, C. H. Park, Robust audio-visual speech recognition based on late integration, IEEE Trans. Multimedia, 2008

    M. Sargin, Y. Yemez, E. Erzin, A. Tekalp, Audiovisual synchronization and fusion using canonical correlation analysis, IEEE Trans. Multimedia, 2007

    P. Perez, J. Vermaak, A. Blake, Data fusion for visual tracking with particles, Proc. IEEE, 2004

    B. Rivet, L. Girin, C. Jutten, Mixing audiovisual speech processing and blind source separation for the extraction of speech signal from convolutive mixtures, IEEE Trans. Multimedia, 2007

    C. Spence, J. Driver, Audiovisual links in exogenous covert spatial orienting, Perception & Psychophysics, 1997

    C. Spence, J. Driver, Audiovisual links in endogenous covert spatial attention, J. Experimental Psychology: Human Perception & Performance, 1996

    E. Kidron, Y. Schechner, M. Elad, Cross-modal localization via sparsity, IEEE Trans. Signal Process., 2007

