7/31/2019 Multimedia Signal Processing Group Swiss Federal Institute of Technology Efficient Video Coding in H.264:AVC
Multimedia Signal Processing Group
Swiss Federal Institute of Technology
Efficient Video Coding in H.264/AVC
by using Audio-Visual Information
Jong-Seok Lee & Touradj Ebrahimi
EPFL, Switzerland
MMSP09
5 October 2009
Introduction
Objective of video coding
Better quality with smaller number of bits
How to achieve better video coding efficiency?
Using statistics of signal
Using characteristics of the human visual system: focus of attention
Only small region around fixation point is captured at high spatial resolution.
Attended region: less compression
Unattended region: more compression
Introduction
Which region draws attention?
Conspicuity-based (Itti, 2004)
Moving object-based (Cavallaro, 2005)
Face-based (Boccignone, 2008)
No consideration of cross-modal (audio-visual) interaction!
Audio-Visual Focus of Attention
Abrupt sound draws visual attention to sound source location. (Spence, 1997)
Attending to auditory stimuli at given location enhances processing of visual
stimuli at same location. (Spence, 1996)
We define sound-emitting region as attended region.
Overall Procedure
Original frame → Source localization → Priority map → Slice grouping → H.264/AVC coding with flexible macroblock ordering (FMO)
Audio-Visual Source Localization
To identify spatial location of sound source in scene
Approach
Canonical correlation analysis
To find projection vectors for two data sets that maximize the correlation between their projections
Sparsity principle
Spatio-temporal consistency
[Figure: two cases compared over frames t, t+1, t+2]
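As a rough illustration of the canonical correlation analysis step, the sketch below computes the first pair of projection vectors for two feature matrices with NumPy. The function name `cca_first_pair`, the regularization term `reg`, and the synthetic data are assumptions made for illustration. The sparsity and spatio-temporal consistency constraints listed on this slide are not included.

```python
import numpy as np


def cca_first_pair(X, Y, reg=1e-6):
    """Return (wx, wy, rho): the first canonical pair for X (T x p) and Y (T x q).

    rho is the correlation between the projections X @ wx and Y @ wy.
    reg is a small ridge term (an assumption) that keeps the covariances invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    # Whiten both sides with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)

    # M = Lx^{-1} Cxy Ly^{-T}; its top singular vectors solve the whitened problem.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the whitened directions back to the original feature spaces.
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0, :])
    return wx, wy, s[0]


# Synthetic example: 4 "visual" features per frame, 1 "audio" feature per frame.
# Only visual feature 2 is related to the audio feature.
rng = np.random.default_rng(0)
visual = rng.standard_normal((200, 4))
audio = visual[:, [2]] + 0.1 * rng.standard_normal((200, 1))
wx, wy, rho = cca_first_pair(visual, audio)
print("visual weights:", np.round(wx, 3), "correlation:", round(float(rho), 3))
```

In this toy setting the largest visual weight falls on the feature that drives the audio signal, which is the intuition behind using the projection vector to localize the sound source.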
Audio-Visual Source Localization
Constrained optimization, solved by linear programming
Advantages
Applicability to normal video with mono audio channel
No assumption on sound source
No training required
Example
J.-S. Lee, F. De Simone, T. Ebrahimi, Video coding based on audio-visual attention, ICME09
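The slide does not give the exact optimization problem, so the following is only a generic sketch of how a sparsity objective can be turned into a linear program: it minimizes the L1 norm of a weight vector subject to linear equality constraints, using `scipy.optimize.linprog`. The function name `l1_min`, the constraint matrix `A`, and the vector `b` are hypothetical placeholders and not the constraints of the presented method.

```python
import numpy as np
from scipy.optimize import linprog


def l1_min(A, b):
    """Minimize ||w||_1 subject to A @ w = b, written as a linear program.

    Auxiliary variables t bound |w| elementwise: -t <= w <= t, and sum(t) is minimized.
    """
    m, n = A.shape
    # Decision vector z = [w (n entries), t (n entries)].
    cost = np.concatenate([np.zeros(n), np.ones(n)])
    eye = np.eye(n)
    # w - t <= 0 and -w - t <= 0 together encode |w| <= t.
    A_ub = np.block([[eye, -eye], [-eye, -eye]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])
    # w is unrestricted in sign; t is non-negative.
    bounds = [(None, None)] * n + [(0, None)] * n
    result = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                     bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:n]


# Toy constraint: w0 + 2*w1 + w2 = 2. The sparsest (smallest L1) solution
# puts all weight on the coefficient with the largest gain.
A = np.array([[1.0, 2.0, 1.0]])
b = np.array([2.0])
print(np.round(l1_min(A, b), 6))
```

The L1 objective tends to concentrate the weights on a few entries, which matches the idea that only a small part of the image is associated with the sound.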
Video Coding
Localization result → Priority map → Slice grouping → H.264/AVC coding with FMO (Type 6)
Slice groups are coded with QP0, QP1, QP2, QP3:
QP1 = QP0 + ΔQP
QP2 = QP1 + ΔQP
QP3 = QP2 + ΔQP
* QP = quantization parameter
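The slice grouping and QP rule on this slide, where each successive slice group adds a fixed increment (written `delta_qp` here) to the previous group's QP, can be sketched as follows. The slide does not say how priorities are mapped to slice groups, so the per-macroblock averaging and the uniform thresholds are assumptions, as is the choice that group 0 holds the highest-priority macroblocks. All function names are hypothetical.

```python
MB = 16      # macroblock size in luma samples (H.264/AVC)
QP_MAX = 51  # upper end of the H.264/AVC QP range for 8-bit video


def macroblock_priorities(priority_map):
    """Average a per-pixel priority map (values in [0, 1]) over 16x16 macroblocks."""
    rows = len(priority_map) // MB
    cols = len(priority_map[0]) // MB
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = sum(priority_map[y][x]
                        for y in range(r * MB, (r + 1) * MB)
                        for x in range(c * MB, (c + 1) * MB))
            row.append(total / (MB * MB))
        result.append(row)
    return result


def slice_group_map(mb_priorities, num_groups):
    """Explicit macroblock-to-slice-group map, as FMO type 6 allows.

    Assumption: uniform thresholds on [0, 1]; group 0 gets the highest priorities.
    """
    return [[min(num_groups - 1, int((1.0 - p) * num_groups)) for p in row]
            for row in mb_priorities]


def group_qps(qp0, delta_qp, num_groups):
    """QP_k = QP_(k-1) + delta_qp, clipped to the valid QP range."""
    return [min(QP_MAX, qp0 + k * delta_qp) for k in range(num_groups)]


# Example: a 16x32 priority map whose left macroblock is fully attended.
pmap = [[1.0] * 16 + [0.0] * 16 for _ in range(16)]
groups = slice_group_map(macroblock_priorities(pmap), num_groups=2)
qps = group_qps(qp0=26, delta_qp=2, num_groups=2)
print(groups, qps)
```

With an explicit map of this kind, the encoder can code each slice group with its own QP, so that macroblocks far from the sound source are quantized more coarsely.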
Experiments
Two test sequences with multiple moving objects in the scene
Audio-visual source localization
Visual features: differential grayscale pixel value
Audio features: differential frame energy
H.264/AVC coding: JM reference software
Constant QP mode
Rate control (adaptive QP) mode
Proposed method (FMO enabled)
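The two localization features named on this slide can be sketched in a few lines. The slide gives no framing details, so the non-overlapping audio frames of `frame_len` samples (one per video frame) and the function names are assumptions made for illustration.

```python
def frame_energy(samples):
    """Energy of one audio frame: sum of squared samples."""
    return sum(s * s for s in samples)


def differential_frame_energy(audio, frame_len):
    """Difference in energy between consecutive non-overlapping audio frames."""
    energies = [frame_energy(audio[i:i + frame_len])
                for i in range(0, len(audio) - frame_len + 1, frame_len)]
    return [later - earlier for earlier, later in zip(energies, energies[1:])]


def differential_pixels(frames):
    """Per-pixel grayscale difference between consecutive video frames.

    frames is a list of 2-D lists of grayscale values, all of the same size.
    """
    return [[[b - a for a, b in zip(row0, row1)]
             for row0, row1 in zip(f0, f1)]
            for f0, f1 in zip(frames, frames[1:])]


# Three audio frames of two samples each, and two tiny 1x2 video frames.
print(differential_frame_energy([1, 1, 2, 2, 0, 0], frame_len=2))
print(differential_pixels([[[1, 2]], [[3, 5]]]))
```

Both features respond to change over time, so a pixel whose intensity changes in step with the audio energy is a candidate for the sound source.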
Experiments
Subjective test
Is quality degradation acceptable?
ITU-R BT.500-11
Double stimulus continuous quality scale (DSCQS)
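In a DSCQS test each viewer rates both the reference and the processed sequence on a continuous scale, and the results are usually reported as the mean of the per-viewer differences. A minimal sketch of that calculation is shown below; the function name `dmos`, the example scores, and the normal-approximation 95% interval are illustrative assumptions, not data or procedure details from this study.

```python
import math
import statistics


def dmos(reference_scores, test_scores):
    """Differential mean opinion score and the half-width of a 95% confidence interval.

    Each list holds one score per viewer; differences are reference minus test,
    so a smaller value means the test sequence was rated closer to the reference.
    """
    diffs = [r - t for r, t in zip(reference_scores, test_scores)]
    mean = statistics.mean(diffs)
    if len(diffs) > 1:
        half_width = 1.96 * statistics.stdev(diffs) / math.sqrt(len(diffs))
    else:
        half_width = 0.0
    return mean, half_width


# Hypothetical scores from two viewers on a 0-100 scale.
mean, half_width = dmos([80, 70], [60, 65])
print(mean, round(half_width, 2))
```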
Result
Coding gain by proposed method over constant QP mode
[Figure: two panels, QP0=22 and QP0=30; x-axis: #slice]
Result
Rate-distortion curves
Proposed method (#slice=2) vs. rate control
[Figure: two panels, ΔQP=1 and ΔQP=4; x-axis: Bitrate (kbit/s); y-axis: PSNR Y (dB); curves: Rate control, Proposed]
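The vertical axis of these rate-distortion curves is the PSNR of the luma (Y) component. For reference, a minimal sketch of that measure for 8-bit frames follows; the function name `psnr_y` and the tiny example frames are illustrative and are not taken from the experiments.

```python
import math


def psnr_y(reference, decoded, peak=255):
    """PSNR in dB between two luma frames given as 2-D lists of equal size."""
    squared_error = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for ref_px, dec_px in zip(ref_row, dec_row):
            squared_error += (ref_px - dec_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


# Every pixel is off by one gray level, so the mean squared error is 1.
print(round(psnr_y([[10, 20]], [[11, 19]]), 2))
```

Note that PSNR weights all pixels equally, whereas the proposed method deliberately spends fewer bits outside the attended region; this is why the subjective test is needed alongside the rate-distortion curves.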
Result
Subjective quality comparison
[Figure: DMOS (differential mean opinion score) for JM (constant QP=26) and the proposed method (QP0=26, #slice=2) with ΔQP=1, ΔQP=2, ΔQP=4; annotations: 17% gain, 29% gain]
Conclusion & Discussion
Audio-visual focus of attention (AV FoA) influences perceived quality.
It can be used for efficient video coding with H.264/AVC.
Discarding information outside focus of attention does not degrade
perceived quality significantly.
AV FoA does not explain everything. It should be combined with
other attention mechanisms.
Questions/comments are welcome!
Contact
http://mmspg.epfl.ch
References
L. Itti, Automatic foveation for video compression using a neurobiological model of visual attention, IEEE Trans. Image Process., 2004
A. Cavallaro, O. Steiger, T. Ebrahimi, Semantic video analysis for adaptive content delivery and automatic description, IEEE Trans. Circuits Syst. Video Technol., 2005
G. Boccignone, A. Marcelli, P. Napoletano, G. D. Fiore, G. Iacovoni, S. Morsa, Bayesian integration of face and low-level cues for foveated video coding, IEEE Trans. Circuits Syst. Video Technol., 2008
B. Stein, M. Meredith, The Merging of the Senses, MIT Press, 1993
R. Sharma, V. I. Pavlovic, T. S. Huang, Toward multimodal human-computer interface, Proc. IEEE, 1998
H. McGurk, J. MacDonald, Hearing lips and seeing voices, Nature, 1976
J.-S. Lee, C. H. Park, Robust audio-visual speech recognition based on late integration, IEEE Trans. Multimedia, 2008
M. Sargin, Y. Yemez, E. Erzin, A. Tekalp, Audiovisual synchronization and fusion using canonical correlation analysis, IEEE Trans. Multimedia, 2007
P. Perez, J. Vermaak, A. Blake, Data fusion for visual tracking with particles, Proc. IEEE, 2004
B. Rivet, L. Girin, C. Jutten, Mixing audiovisual speech processing and blind source separation for the extraction of speech signal from convolutive mixtures, IEEE Trans. Multimedia, 2007
C. Spence, J. Driver, Audiovisual links in exogenous covert spatial orienting, Perception & Psychophysics, 1997
C. Spence, J. Driver, Audiovisual links in endogenous covert spatial attention, J. Experimental Psychology: Human Perception & Performance, 1996
E. Kidron, Y. Schechner, M. Elad, Cross-modal localization via sparsity, IEEE Trans. Signal Process., 2007