1
Computational Complexity Analysis of MPEG-4 Decoder
Student : Chung-Yen Tsai
Advisor : Prof. David W. Lin
Date : 2005/06/08
2
Outline
Corrections of the Computational Complexity in MPEG-4 Encoder (MoMuSys)
Analysis of Computational Complexity in MPEG-4 Decoder (MoMuSys)
Summary
Future Work
Reference
4
Profile of MoMuSys Encoder in Previous Group Meeting
                       Execution Time (cycles)   Contribution
Motion Estimation      3,902,482                 41.05%
DCT                    297,113                   3.13%
IDCT                   249,057                   2.63%
Quantization           42,494                    0.45%
Inverse Quantization   16,441                    0.17%
VLC                    16,130                    0.15%
The contribution of ME is not accurate.
The information will be gathered in another unit.
The sum of the contributions is not 100%.
5
The Cause of The Fault
Columns: Execution Time (samples), Percentage, Computational Complexity
SAD_Macroblock               18642   42.66%   366,785*(257-comp, 67-data_=, 256-abs, 560-data_+)
Subsample_alpha_with_modes   14092   36.26%   643,500*(5-mem_shift, 3-mem_*, 4-mem_+, 2-data_=, 1-data_+, 1614-comp)
Redundant part for the frame-based coding scheme.
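The 256-abs term in the SAD_Macroblock count is what a comparison over a full 16x16 macroblock produces: one absolute difference per pixel. A minimal Python sketch of such a sum of absolute differences follows. The function name sad_macroblock and the list-of-rows block layout are illustrative assumptions, not the MoMuSys C implementation, which may differ (for example by terminating early).

```python
def sad_macroblock(cur, ref, size=16):
    """Sum of absolute differences between two size x size blocks.

    Returns (sad, abs_ops) so the number of abs operations can be inspected.
    Hypothetical helper for illustration; not taken from MoMuSys.
    """
    sad = 0
    abs_ops = 0
    for y in range(size):
        for x in range(size):
            sad += abs(cur[y][x] - ref[y][x])
            abs_ops += 1
    return sad, abs_ops


# Made-up pixel data: a flat block compared against one that is brighter by 2.
cur = [[10] * 16 for _ in range(16)]
ref = [[12] * 16 for _ in range(16)]
sad, abs_ops = sad_macroblock(cur, ref)
print(sad, abs_ops)  # 512 256
```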
6
Correction of The Profile
                         Execution Time (samples)   Contribution
Motion Estimation        32739                      74.92%
DCT                      186                        0.43%
IDCT                     271                        0.62%
Quantization             232                        0.53%
Inverse Quantization     78                         0.18%
VLC                      14                         0.03%
File I/O, Mem, others    10051                      ~23%
※ A sample is the unit used by the VTune Performance Analyzer (msec).
7
Correction of The Computational Complexity
Computational Complexity
Motion Estimation   Frame : 366,785*(257-comp, 67-data_=, 256-abs, 560-data_+)
                    Alpha : 643,500*(5-mem_shift, 3-mem_*, 4-mem_+, 2-data_=, 1-data_+, 1614-comp)
DCT                 594*(408-data_*, 520-data_+, 520-data_=, 128-if, 64-floor, 64-mem_+, 64-mem_=, 64-mem_*)
IDCT                594*(256-data_*, 544-data_+, 576-data_=, 64-floor, 64-mem_+, 64-mem_*)
Quan                130-comp, 128-data_=, 128-shift, 128-data_*
DeQuan              256-data_=, 320-comp, 128-shift, 192-data_+, 64-data_*
11
Distribution of Execution Time (Cont’d)
DecodeVopCombinedMotionShapeTextureInterErrRes 54.9%
DecodeVideoPacketCombinedInterErrRes 34.3%
VopMotionCompensate 33.6%
VopTextureUpdate 30.8%
DecodeCombinedPacketInfoInterErrRes 98.9%
fprintf 91.1%
GetPred_Advanced 3.5%
AllocImage 2.4%
PrintOutMBData 93.9%
12
Distribution of Execution Time (Cont’d)
DecodeVopCombinedMotionShapeTextureInterErrRes 61.1%
DecodeVideoPacketCombinedInterErrRes 68.5%
VopMotionCompensate 15.6%
VopTextureUpdate 9.0%
DecodeCombinedPacketInfoInterErrRes 98.4%
InterpolateImage 20.1%
GetPred_Advanced 29.9%
AllocImage 28.6%
fprintf 15.5%
13
Distribution of Execution Time (Cont’d) – For Texture Decoding
DecodeCombinedPacketInfoInterErrRes 98.9%
GetMBblockdataNoDataPartErrRes 88.0%
GetMBheaderNoDataPartInterErrRes 5.4%
GetMBvectorsNoDataPartErrRes 3.6%
PrintOutMBData 75.1%
VlcGetBlock 16.1%
BlockIDCT 3.6%
BlockDequantH263 0.4%
14
Distribution of Execution Time (Cont’d) – For Texture Decoding
DecodeCombinedPacketInfoInterErrRes 98.4%
GetMBblockdataNoDataPartErrRes 76.1%
GetMBheaderNoDataPartInterErrRes 8.3%
GetMBvectorsNoDataPartErrRes 7.1%
PrintOutMBData 0%
VlcGetBlock 69.3%
BlockIDCT 15.3%
BlockDequantH263 5.9%
15
File_I/O and Non_File_I/O
                   Original   Modification of Tsai
_calloc            3.81%      15.3%
fwrite             1.68%      7.17%
_output            26.86%     7.12%
VopTextureUpdate   1.85%      6.7%
Memset             1.4%       5.43%
BlockIDCT          0.97%      3.53%
16
Modification of MoMuSys Decoder
After removing PrintOutMBData, the contributions of IDCT and VLC became larger.
In “Debug” mode some files were also written to trace information; these writes are removable.
VopTextureUpdate adds the decoded texture to the motion-compensated (M.C.) image. Thus, fprintf is not removable here.
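The growth of the IDCT and VLC contributions is a renormalization effect: when a dominant routine is removed, each remaining routine keeps its absolute cost but is measured against a smaller total. A minimal sketch, assuming nothing else changes when the routine is removed. The function name shares_after_removal and the numbers are hypothetical, not measurements from these slides.

```python
def shares_after_removal(shares, removed):
    """Rescale percentage shares after dropping one entry.

    shares  : dict mapping routine name -> percentage of total run time
    removed : name of the routine that is eliminated
    Assumes the remaining routines keep their absolute cost.
    """
    remaining = 100.0 - shares[removed]
    return {name: 100.0 * pct / remaining
            for name, pct in shares.items() if name != removed}


# Hypothetical profile: a trace routine takes 75% of the run time.
before = {"trace_output": 75.0, "idct": 5.0, "vlc": 20.0}
after = shares_after_removal(before, "trace_output")
print(after)  # {'idct': 20.0, 'vlc': 80.0}
```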
17
Profile in MoMuSys Decoder
                       Execution Time (samples)   Contribution (%)
Motion Compensation    2072                       21.69
DCT                    0                          0
IDCT                   358                        3.75
Quantization           82                         0.86
Inverse Quantization   42                         0.44
VLC Decoding           448                        4.69
File I/O               2003                       20.97
Mem                    2616                       27.39
others                 1171                       12.26
ErrRes                 759                        7.95
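Each contribution in this table is the routine's sample count divided by the total number of samples. The sketch below recomputes the column from the counts listed above; the dictionary and variable names are illustrative.

```python
# Sample counts copied from the decoder profile table.
samples = {
    "Motion Compensation": 2072,
    "DCT": 0,
    "IDCT": 358,
    "Quantization": 82,
    "Inverse Quantization": 42,
    "VLC Decoding": 448,
    "File I/O": 2003,
    "Mem": 2616,
    "others": 1171,
    "ErrRes": 759,
}

total = sum(samples.values())

# Contribution of each routine as a percentage of all samples.
contribution = {name: round(100.0 * n / total, 2) for name, n in samples.items()}

for name, pct in contribution.items():
    print(f"{name:22s} {pct:6.2f}")
```

The recomputed values agree with the Contribution column to two decimal places.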
19
The Theoretical Computational Complexity
We use the 8x8 block-based DCT and IDCT.
DCT : 64*(7_mult, 4_div, 2_cos), 2_mult, 1_div
IDCT : 64*(9_mult, 4_div, 2_cos), 1_mult, 1_div
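For reference, a direct-form 8x8 DCT and IDCT can be sketched as below. This is the textbook definition with orthonormal scaling, used here as an assumption about what the theoretical counts refer to; it is not the MoMuSys code. Each output value is a sum over 64 input values, but the exact number of multiplications and divisions per term depends on how the cosine arguments are formed, so the sketch need not match the counts above term for term.

```python
import math

N = 8  # block size


def _c(k):
    """Normalization factor: 1/sqrt(2) for the DC index, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def _cos_term(n, k):
    """Cosine basis value for sample index n and frequency index k."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_8x8(block):
    """Direct-form 2-D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += block[x][y] * _cos_term(x, u) * _cos_term(y, v)
            out[u][v] = (2.0 / N) * _c(u) * _c(v) * acc
    return out


def idct_8x8(coef):
    """Direct-form 2-D inverse DCT of an 8x8 coefficient block."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            acc = 0.0
            for u in range(N):
                for v in range(N):
                    acc += (_c(u) * _c(v) * coef[u][v]
                            * _cos_term(x, u) * _cos_term(y, v))
            out[x][y] = (2.0 / N) * acc
    return out


# Made-up test data: a flat block has only a DC coefficient.
flat = [[100.0] * N for _ in range(N)]
print(round(dct_8x8(flat)[0][0], 6))  # 800.0
```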
20
Computational Complexity of Each Frame
Computational Complexity (Practical and Theoretical)
Motion Compensation
DCT
    Theoretical : 594*(64*(7_mult, 4_div, 2_cos), 2_mult, 1_div)
IDCT
    Practical   : 594*(256-data_*, 544-data_+, 576-data_=, 64-floor, 64-mem_+, 64-mem_*)
    Theoretical : 594*(64*(9_mult, 4_div, 2_cos), 1_mult, 1_div)
Quan
    Practical   : 594*(130-comp, 128-data_=, 128-shift, 128-data_*)
DeQuan
    Practical   : 594*(256-data_=, 320-comp, 128-shift, 192-data_+, 64-data_*)
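The factor 594 is consistent with one QCIF frame (176x144) in 4:2:0 format: 99 macroblocks of six 8x8 blocks each. The slides do not state the frame format, so treat that as an assumption. The sketch below derives the block count and scales the per-block practical IDCT counts to one frame; the function and variable names are illustrative.

```python
def blocks_per_frame(width, height, mb_size=16, blocks_per_mb=6):
    """Number of 8x8 blocks in a frame.

    Assumes 16x16 macroblocks with 4 luma + 2 chroma blocks each (4:2:0).
    """
    macroblocks = (width // mb_size) * (height // mb_size)
    return macroblocks * blocks_per_mb


# Per-block practical IDCT counts, copied from the slide.
idct_per_block = {
    "data_*": 256,
    "data_+": 544,
    "data_=": 576,
    "floor": 64,
    "mem_+": 64,
    "mem_*": 64,
}

n_blocks = blocks_per_frame(176, 144)  # QCIF is an assumption
idct_per_frame = {op: n_blocks * count for op, count in idct_per_block.items()}

print(n_blocks)                  # 594
print(idct_per_frame["data_*"])  # 152064
```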
21
Summary
In the MoMuSys encoder, Motion Estimation is the main contributor. The SAD calculation accounts for the largest share of the execution time.
In the decoder, however, the execution time of Motion Compensation and Texture Decoding is less than that of File I/O and memory operations.
Since the VTune Performance Analyzer is run in Debug mode, there are some redundant executions, which increase file I/O.
22
Future Work
Complete the analysis of the object-based MoMuSys codec.
Run some simple simulations on the PAC simulator.