Image Processing Architecture, © 2001-2004 Oleh Tretiak, Lecture 12
ECEC 453 Image Processing Architecture
Lecture 12, 3/3/2004
MPEG and Friends
Oleh Tretiak
Drexel University
Lecture Outline
- Review of teleconferencing
- Advanced Video Coding
- Computational cost of video
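Picture of Layers
[Figure: the MPEG layer hierarchy. Sequence layer (GOP-1, GOP-2, ..., GOP-N); GOP layer (pictures I B B P B B ... P); picture layer (Slice-1, Slice-2, ..., Slice-N); slice layer (mb-1, mb-2, ..., mb-n); macroblock layer (Y blocks 0 to 3 plus Cr and Cb blocks); block layer]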
Video Compression: Picture Types
Group of Pictures: three picture types
- I: intraframe coding only
- P: predictive coding
- B: bidirectional coding
[Figure: pictures 1 to 8 of a group of pictures, labeled with their I, P and B types]
Teleconferencing Standards
Digital video areas:
- Broadcast television
- Recorded programs
- Two-way communications
Review: Action in the Video Arena
- The sponsors: ITU-T SG 15 and ISO/IEC MPEG
- The players: H.x standards and MPEG-x standards
- Standards, ITU-T (the telecom guys): H.261 (1990); H.263 (draft March 1995); new standards in the works
- Standards, ISO/IEC (entertainment video): the MPEG family
Review: Video Telephone System
[Diagram: the H.320 system, with components H.200/AV.250-series, H.221 and H.261]
Review: H.261 Features
Common Intermediate Format (CIF)
- Interoperability between 25 fps and 30 fps countries
- 352 pixels/line, 288 lines, 30 fps, noninterlaced
- Terminal equipment converts frame and line numbers
- Y, Cb, Cr components; color subsampled by a factor of 2 in both directions
Coding
- 8x8 DCT; 4 Y and 2 chrominance blocks per macroblock
- I and P frames only; P blocks can be skipped
- Motion compensation optional, integer-pixel displacements only
- Optional forward error correction coding
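H.261 allows only integer-pixel motion vectors, and finding them is the encoder's job. The following is a minimal full-search block-matching sketch in C. It assumes 8-bit luma frames stored row-major with a known width, 16x16 macroblocks, and a search range of ±range pixels. The names `sad16` and `full_search16` are hypothetical. The standard does not prescribe a search method, so this is just one common choice.

```c
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

/* Sum of absolute differences between the 16x16 block of the current
 * frame at (bx, by) and the reference block displaced by (dx, dy). */
static long sad16(const uint8_t *cur, const uint8_t *ref, int width,
                  int bx, int by, int dx, int dy) {
    long sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int a = cur[(by + y) * width + (bx + x)];
            int b = ref[(by + y + dy) * width + (bx + x + dx)];
            sad += abs(a - b);
        }
    return sad;
}

/* Exhaustive search over integer displacements in [-range, range],
 * skipping candidates that fall outside the reference frame.
 * Stores the best vector in (*mvx, *mvy) and returns its SAD. */
static long full_search16(const uint8_t *cur, const uint8_t *ref,
                          int width, int height, int bx, int by, int range,
                          int *mvx, int *mvy) {
    long best = LONG_MAX;
    *mvx = 0;
    *mvy = 0;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + 16 > width || by + dy + 16 > height)
                continue;
            long s = sad16(cur, ref, width, bx, by, dx, dy);
            if (s < best) {
                best = s;
                *mvx = dx;
                *mvy = dy;
            }
        }
    return best;
}
```

A full search evaluates up to (2*range+1)^2 candidates per macroblock, and each candidate costs 256 absolute differences. That cost is why motion estimation dominates an encoder's processing budget.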
H.324/H.263
H.324: like H.320, built from component standards
[Diagram: H.261/H.263 video; G.723.1 audio; H.245 signaling; H.253, H.234 encryption; H.223 multiplexing]
Parts of H.324
- H.263: video coding for low-rate communications
- G.723.1: audio and speech coding for multimedia, 5.3 and 6.3 kbps
- H.223: multiplexing protocol
- H.245: control protocol; can be used to specify standard, LAN, and ATM networks
Features of H.263
- Intended for lower rates than H.261, including 28.8 kbit/s modems
- Includes QCIF (176 x 144) and sub-QCIF (128 x 96 in the Y channel) formats
- Optional error correction for mobile channels
- Half-pixel accuracy motion compensation
- Differential encoding of motion vectors
- Improved coding of DCT coefficients
- Optional advanced coding options
- Better SNR at the same rate, lower rate at the same SNR
- About 50% more complex than basic H.261
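Half-pixel motion compensation needs reference samples between the integer positions. This C sketch assumes the bilinear rule with halves rounded up, as in baseline H.263: b = (A + B + 1) / 2 between two samples and d = (A + B + C + D + 2) / 4 at the center of four. The function name `half_pel_sample` is hypothetical. Coordinates are assumed to be non-negative and kept inside the frame by the caller.

```c
#include <stdint.h>

/* Return the reference sample at position (hx/2, hy/2), where hx and hy are
 * non-negative coordinates in half-pixel units. A is the integer-position
 * sample, B its right neighbour, C the one below, D the one diagonally
 * below-right. Only the neighbours actually needed are read. */
static int half_pel_sample(const uint8_t *ref, int width, int hx, int hy) {
    int x = hx >> 1, y = hy >> 1;   /* integer part of the position */
    int fx = hx & 1, fy = hy & 1;   /* 1 if at a half-pixel offset  */
    int A = ref[y * width + x];
    if (!fx && !fy)
        return A;                                   /* integer position */
    int B = fx ? ref[y * width + x + 1] : 0;
    int C = fy ? ref[(y + 1) * width + x] : 0;
    if (fx && !fy)
        return (A + B + 1) >> 1;                    /* horizontal half */
    if (!fx && fy)
        return (A + C + 1) >> 1;                    /* vertical half   */
    int D = ref[(y + 1) * width + x + 1];
    return (A + B + C + D + 2) >> 2;                /* center of four  */
}
```

With half-pel vectors, a motion vector component of 3 means 1.5 pixels. The prediction for a whole block is obtained by calling this for every sample position.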
Picture Formats for H.263 (image size in pixels)
Format      Y              Cb, Cr (each)
sub-QCIF    128 x 96       64 x 48
QCIF        176 x 144      88 x 72
CIF         352 x 288      176 x 144
4CIF        704 x 576      352 x 288
16CIF       1408 x 1152    704 x 576
[Figure: the same picture subsampled to 551x369, 389x261, 327x219 and 231x155 pixels, all JPEG-coded to about 12 Kbytes]
Experimental Procedure
- Original image subsampled (using Photoshop®) to various resolutions (pixel count from the maximum down to max/8)
- Each subsampled image JPEG-coded at various quality levels with Matlab®
- A group of images with about 12 Kbytes per image is compared
- Result: at a given total number of bits, subsampling followed by JPEG coding is better than JPEG coding alone
Future of Low-Rate Video
- A solution looking for a user?
- 'Picturephone': not popular. Liked by its inventors; surveys of the public less than enthusiastic
- Videoconferencing: some success, but limited acceptance
- What is needed to make it successful?
Advanced Video Coding
- H.263 and MPEG-4 are based on ~1995 technology
- After 1995, MPEG and VCEG (the ITU-T Video Coding Experts Group) started working on a new low-rate standard (H.26L)
- Rec. H.264 released in September 2002
- Information at http://www.vcodex.com/ (some is on our web site); the site is maintained by Ian Richardson, who has written books about video coding
AVC Encoder
AVC Decoder
New Features
- Prediction in I pictures
- Different block transform
- Different block sizes
- Changes in motion compensation
- VLC and arithmetic coding
I Picture Prediction
The system operates with 4x4 blocks and 16x16 macroblocks.
9 Prediction Modes for 4x4 Blocks
4 Modes for 16x16 Macroblocks
- Mode 0: vertical, extrapolate from the upper samples
- Mode 1: horizontal, extrapolate from the left-hand samples
- Mode 2: DC, mean of the upper and left-hand samples
- Mode 3: plane, linear fit to the left and upper samples
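Modes 0 to 2 are easy to sketch. The C code below fills a 16x16 prediction from the reconstructed row above (`top`) and the column to the left (`left`). It assumes both neighbours are available and uses the rounded mean (sum + 16) >> 5 for DC. The plane mode is omitted for brevity. The function and enum names are hypothetical, not taken from the standard's text.

```c
#include <stdint.h>

/* Hypothetical mode labels matching the numbering above. */
enum { MODE_VERTICAL = 0, MODE_HORIZONTAL = 1, MODE_DC = 2 };

/* Fill pred[16][16] from the 16 samples above (top) and the 16 samples to
 * the left (left) of the macroblock. Plane mode (3) is not implemented. */
static void intra16_predict(int mode, const uint8_t top[16],
                            const uint8_t left[16], uint8_t pred[16][16]) {
    int dc = 0;
    if (mode == MODE_DC) {
        int sum = 0;
        for (int i = 0; i < 16; i++)
            sum += top[i] + left[i];
        dc = (sum + 16) >> 5;          /* rounded mean of 32 samples */
    }
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            if (mode == MODE_VERTICAL)
                pred[y][x] = top[x];   /* copy each column downwards  */
            else if (mode == MODE_HORIZONTAL)
                pred[y][x] = left[y];  /* copy each row to the right  */
            else
                pred[y][x] = (uint8_t)dc;
        }
}
```

An encoder would typically try each mode, form the residual, and keep the mode with the lowest cost. The decoder repeats only the chosen one.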
Different Block Transform
- Basically a 4x4 DCT
- Scanning sequence for a 16x16 macroblock (figure not reproduced)
- The DC coefficients of the 4x4 blocks are collected into 4x4 (luma) and 2x2 (chroma) arrays and transformed again
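4x4 DCT Tricks
$$Y = A X A^T, \qquad A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}$$
with $a = 1/2$, $b = \sqrt{1/2}\cos(\pi/8)$, $c = \sqrt{1/2}\cos(3\pi/8)$.
Trick: split $A$ into an integer matrix and a scaling. Approximate $c \approx b/2$; H.264 then adjusts $b$ to $\sqrt{2/5}$ so the rows stay orthonormal. This gives
$$Y = (C X C^T) \otimes E$$
where $\otimes$ is element-by-element multiplication (Matlab .*) and
$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad E = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$
$C X C^T$ needs only additions, subtractions and shifts. The scaling by $E$ can be folded into quantization.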
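The integer core $W = C X C^T$ can be computed with butterflies, using only adds, subtracts and a shift by one. This C sketch transforms the rows and then the columns. The name `core_transform_4x4` is hypothetical. The $E$ scaling is deliberately left out, on the assumption that the quantizer absorbs it.

```c
/* W = C X C^T for the 4x4 integer core transform
 * C = [1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1], adds/subtracts/doubling only. */
static void core_transform_4x4(int x[4][4], int w[4][4]) {
    int tmp[4][4];
    /* Row pass: tmp = X C^T (transform each row). */
    for (int i = 0; i < 4; i++) {
        int p0 = x[i][0] + x[i][3];
        int p1 = x[i][1] + x[i][2];
        int p2 = x[i][1] - x[i][2];
        int p3 = x[i][0] - x[i][3];
        tmp[i][0] = p0 + p1;          /* row 1 of C:  1  1  1  1 */
        tmp[i][1] = 2 * p3 + p2;      /* row 2 of C:  2  1 -1 -2 */
        tmp[i][2] = p0 - p1;          /* row 3 of C:  1 -1 -1  1 */
        tmp[i][3] = p3 - 2 * p2;      /* row 4 of C:  1 -2  2 -1 */
    }
    /* Column pass: W = C tmp (transform each column). */
    for (int j = 0; j < 4; j++) {
        int p0 = tmp[0][j] + tmp[3][j];
        int p1 = tmp[1][j] + tmp[2][j];
        int p2 = tmp[1][j] - tmp[2][j];
        int p3 = tmp[0][j] - tmp[3][j];
        w[0][j] = p0 + p1;
        w[1][j] = 2 * p3 + p2;
        w[2][j] = p0 - p1;
        w[3][j] = p3 - 2 * p2;
    }
}
```

Each 1-D pass takes 8 additions/subtractions and 2 doublings per row or column, versus 16 multiply-adds for a direct matrix product.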
Motion Compensation Ideas
Adaptive motion compensation block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Coding Ideas
- Constant quantizer value
- Zig-zag scan with a novel run-length code
- Arithmetic coding as an option
- Motion vectors to 1/4-pixel accuracy
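A zig-zag scan orders the quantized coefficients from low to high frequency, so the zeros cluster at the end and run-length coding pays off. This C sketch uses the 4x4 zig-zag order and emits generic (run, level) pairs. H.264's actual entropy coders (CAVLC and CABAC) code these symbols in their own, more elaborate way. The names `zigzag4x4`, `RunLevel` and `run_level_4x4` are hypothetical.

```c
/* Zig-zag order for a 4x4 block; entries are row*4 + col. */
static const int zigzag4x4[16] = {0, 1, 4, 8, 5, 2, 3, 6,
                                  9, 12, 13, 10, 7, 11, 14, 15};

typedef struct {
    int run;    /* zeros skipped before this coefficient */
    int level;  /* nonzero quantized value               */
} RunLevel;

/* Scan coef[16] (row-major) in zig-zag order and write one (run, level)
 * pair per nonzero coefficient. Trailing zeros produce no pair.
 * Returns the number of pairs written. */
static int run_level_4x4(int coef[16], RunLevel out[16]) {
    int n = 0, run = 0;
    for (int i = 0; i < 16; i++) {
        int v = coef[zigzag4x4[i]];
        if (v == 0) {
            run++;
            continue;
        }
        out[n].run = run;
        out[n].level = v;
        n++;
        run = 0;
    }
    return n;
}
```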
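Loop Filter
- Concept: overcome block artifacts
- Smooth (average) across inter-block boundaries when the difference across the boundary is small enough to be a coding artifact; a larger step is treated as a true edge and left alone
- The difference thresholds depend on the coding mode (intra or inter) and the quantization step size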
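The threshold logic can be sketched for one line of samples p1 p0 | q0 q1 crossing a block edge. This is a simplified C sketch, roughly following the H.264 normal-strength filter. It filters only when |p0 - q0| < alpha, |p1 - p0| < beta and |q1 - q0| < beta, then applies a clipped correction to p0 and q0. In the standard, alpha, beta and the clip limit tc come from tables indexed by the quantizer and boundary strength. Here they are plain parameters. The stronger intra-edge filter and the optional p1/q1 updates are omitted. The function names are hypothetical.

```c
#include <stdlib.h>

static int clip3(int lo, int hi, int v) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Filter one line of samples across an edge: p1 p0 | q0 q1.
 * alpha, beta: activity thresholds (larger for coarser quantization).
 * tc: limit on the correction. Only p0 and q0 are changed.
 * Note: >> on a negative sum is implementation-defined in C; the
 * standard assumes an arithmetic shift, as most compilers provide. */
static void edge_filter(int *p1, int *p0, int *q0, int *q1,
                        int alpha, int beta, int tc) {
    if (abs(*p0 - *q0) >= alpha || abs(*p1 - *p0) >= beta ||
        abs(*q1 - *q0) >= beta)
        return;  /* step too large: treat as a real edge, leave it alone */
    int delta = clip3(-tc, tc, ((*q0 - *p0) * 4 + (*p1 - *q1) + 4) >> 3);
    *p0 = clip3(0, 255, *p0 + delta);
    *q0 = clip3(0, 255, *q0 - delta);
}
```

Because the filtered frame is the one used for later motion-compensated prediction, encoder and decoder must run this identically. That is why it is a loop filter rather than a post-filter.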
Example of Loop Filter
Summary: AVC
- Block sizes from 16 down to 4, a hierarchy reminiscent of wavelets
- Flexible scheme of motion compensation
- New software and hardware for videoconferencing is using this standard
- Will broadband bring on the age of the Picturephone?
Measuring Complexity
Count RISC-like operations:
- r1 = a + b, with a and b in external memory and r1 a register: 3 operations (2 loads and one add)
Example: 2-D 8-point DCT on QCIF YCbCr frames, 4:2:0 sampling, 15 frames/second
- DCT: 8 coefficient loads, 8 data loads, 8 multiply-adds and one store give 25 ops per 1-D output; 2 x 25 (for 2-D) gives 50 ops per sample, so 64 x 50 = 3200 ops per 8x8 block
- Y is 176 x 144 and Cr, Cb are 88 x 72 each, giving 396 + 99 + 99 = 594 blocks
- Processing rate = 3200 x 594 x 15 = 28.5 MOPS
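The operation count above can be reproduced with a small C helper that applies the same cost model. The model charges 25 ops per 1-D output coefficient, doubles it for 2-D, and counts 8x8 blocks with 4:2:0 chroma planes at half the luma size in each direction. The function name is hypothetical, and frame dimensions are assumed to be multiples of 16.

```c
/* Ops per second for separable 8x8 DCT coding of 4:2:0 frames,
 * using the per-coefficient cost model of this slide. */
static long dct_ops_per_second(int y_width, int y_height, int fps) {
    const long ops_per_1d_coef = 8 + 8 + 8 + 1;     /* loads, loads, MACs, store = 25 */
    const long ops_per_sample = 2 * ops_per_1d_coef; /* rows then columns = 50 */
    const long ops_per_block = 64 * ops_per_sample;  /* 3200 */
    long y_blocks = (long)(y_width / 8) * (y_height / 8);
    /* Each chroma plane is (w/2) x (h/2), so it holds (w/16) x (h/16) 8x8 blocks. */
    long c_blocks = (long)(y_width / 16) * (y_height / 16);
    long blocks = y_blocks + 2 * c_blocks;           /* 594 for QCIF */
    return ops_per_block * blocks * fps;
}
```

Calling `dct_ops_per_second(176, 144, 15)` gives 28,512,000 ops/s, the 28.5 MOPS on the slide.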
Processing Requirements
- Range of standards: H.263 to HDTV
- Processing options: Pentium computers and RISC computers (either can be in multiprocessor configurations), RISC cores, DSP systems
- (High-end) commodity computers decode MPEG-1, MPEG-2 and H.261, and encode H.261 at ~10 frames per second
Design of Video Coders
Feet first:
- Design options. Software: what platform? Pentium, RISC, choices of clock speed, bus architecture, memory. Hardware: DSP, ASIC, PLA, choices of architecture
- Complete the designs
- Evaluate performance and cost
- Expensive and time consuming
Forecast:
- Preliminary designs, preliminary evaluation
- Complexity measures (MIPS)
- Choose among the outcomes of the preliminary designs; examine the best designs in detail
- More alternatives can be examined
Example: DCT
Hardware choices: RISC or DSP
- RISC: more versatile instruction set
- DSP: faster execution
Algorithm choices
- Separable matrix implementation: regular dataflow, parallelizable DCT
- Fast algorithm: fewer operations, less regular dataflow; for a RISC or conventional computer
DCT: DSP, Matrix, Separable
Basic operation: 8 data loads, 8 coefficient loads, 8 multiply-accumulate operations and one data store = 25 operations per output coefficient
$$y_i = \sum_{j=1}^{8} c_{ij} x_j, \qquad i = 1, 2, \ldots, 8$$
- Basic operation: s = s + c*x
- Repeat for 8 output values: 8 x 25 = 200 ops
- Do on 8 rows: 1600 ops (separable)
- Do on 8 columns: 1600 ops
- Assume the coefficients are kept in a register file (fast)
- Total 3200 ops (1024 loads, 128 stores)
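The separable matrix method maps directly onto two loops of the basic operation s = s + c*x. In the following C sketch, the coefficient matrix is passed in; for the DCT, c[i][j] would hold the cosine basis values. The code also counts operations with the slide's cost model so the 3200 total can be checked. The function name is hypothetical, and floating point is used only for clarity.

```c
/* y = C x C^T for an 8x8 block, computed separably:
 * rows first (tmp = x C^T), then columns (y = C tmp).
 * Returns the op count under the model: per output coefficient,
 * 8 data loads + 8 coefficient loads + 8 MACs + 1 store = 25. */
static long separable_8x8(double c[8][8], double x[8][8], double y[8][8]) {
    double tmp[8][8];
    long ops = 0;
    /* Row pass: tmp[r][i] = sum_j c[i][j] * x[r][j] */
    for (int r = 0; r < 8; r++)
        for (int i = 0; i < 8; i++) {
            double s = 0.0;
            for (int j = 0; j < 8; j++) {
                s = s + c[i][j] * x[r][j];  /* basic operation s = s + c*x */
                ops += 3;                   /* data load, coefficient load, MAC */
            }
            tmp[r][i] = s;
            ops += 1;                       /* store */
        }
    /* Column pass: y[i][col] = sum_j c[i][j] * tmp[j][col] */
    for (int col = 0; col < 8; col++)
        for (int i = 0; i < 8; i++) {
            double s = 0.0;
            for (int j = 0; j < 8; j++) {
                s = s + c[i][j] * tmp[j][col];
                ops += 3;
            }
            y[i][col] = s;
            ops += 1;
        }
    return ops;                             /* 2 x 64 x 25 = 3200 */
}
```

The fixed, branch-free inner loop is what makes this form attractive on a DSP with a single-cycle multiply-accumulate, even though it performs more operations than a fast algorithm.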
DCT: RISC Implementation
- From jfdctfst.c, ftp://ftp.uu.net/graphics/jpeg/
- Arai, Agui, and Nakajima's algorithm for a scaled DCT: fast 8-point DCT, repeated over rows and columns; integer implementation
- Compiled with gcc, level 2 optimization, for a SPARC (Sun) processor
- Features: 8 data loads and 8 stores per 1-D DCT, all other operations on registers; no multiplications (shifts and adds)
- 90 instructions per 8-point DCT; total of 16 x 90 = 1440 instructions (128 loads, 128 stores)
Sample of DCT code
    tmp0 = dataptr[0] + dataptr[7];
    tmp7 = dataptr[0] - dataptr[7];
    tmp1 = dataptr[1] + dataptr[6];
    tmp6 = dataptr[1] - dataptr[6];
Compiled SPARC code (excerpt):
    ld [%o1],%i0
    ld [%i4],%g2
    ld [%i4-24],%i1
    ld [%i4-4],%g3
    ld [%i4-20],%i2
    add %i0,%g2,%i5
    sub %i0,%g2,%o2
    add %i1,%g3,%g4
    ld [%i4-8],%i0
    sub %i1,%g3,%o7
    ld [%i4-16],%g3
    addcc %o3,-1,%o3
    ld [%i4-12],%g2
    add %i2,%i0,%i1
    sub %i2,%i0,%i2
    add %g3,%g2,%i0
DCT Code: more
    #define FIX_0_382683433  ((INT32) 98)   /* FIX(0.382683433) */
98 is binary 1100010.
    z5 = MULTIPLY(tmp10 - tmp12, FIX_0_382683433);
compiles to:
    add %o7,%o2,%i0
    sub %i3,%i0,%g3
    sll %g3,1,%g2
    add %g2,%g3,%g2
    sll %g2,4,%g2
    add %g2,%g3,%g2
    sll %g2,1,%g2
    sra %g2,8,%g3
which computes g3 = (((g3 + 2*g3)*16 + g3)*2)/256 = 98*g3/256
dumb
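The shift-and-add chain in the listing can be written out in C to confirm that it computes (98*z) >> 8, an 8-bit fixed-point multiply by about 0.3828. The function name is hypothetical. The sketch is restricted to non-negative z, because in standard C left-shifting a negative value is undefined and right-shifting one is implementation-defined.

```c
#include <stdint.h>

/* z * FIX(0.382683433) with 8 fractional bits, i.e. (98 * z) >> 8,
 * mirroring the SPARC sequence sll/add/sll/add/sll/sra.
 * Valid for 0 <= z small enough that 98*z fits in int32_t. */
static int32_t mul_0_382683433(int32_t z) {
    int32_t t = z << 1;   /* sll 1: 2z        */
    t = t + z;            /* add:   3z        */
    t = t << 4;           /* sll 4: 48z       */
    t = t + z;            /* add:   49z       */
    t = t << 1;           /* sll 1: 98z       */
    return t >> 8;        /* sra 8: 98z / 256 */
}
```

Six single-cycle instructions replace one multiply, a good trade on a processor without a fast integer multiplier.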
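Design study: DCT, DSP vs RISC
- DSP: 3200 ops (including 1152 memory references)
- RISC: 1440 ops (including 256 memory references)
Timing:
- DSP instruction = 3 ns, DSP memory reference = 20 ns
- RISC instruction = 5 ns, RISC memory reference = 20 ns
Each memory reference costs the memory time in place of an instruction time:
T_DSP = 3200*3 + 1152*(20-3) = 29184 ns
T_RISC = 1440*5 + 256*(20-5) = 11040 ns
So the RISC fast-algorithm version is about 2.6 times faster.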
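The timing model is simple enough to capture in one C function. That makes it easy to rerun the comparison with other instruction or memory times. The function name is hypothetical, and times are in whole nanoseconds.

```c
/* Execution time in ns: every operation costs one instruction time,
 * except memory references, which cost the memory time instead. */
static long exec_time_ns(long ops, long mem_refs, long t_instr_ns, long t_mem_ns) {
    return ops * t_instr_ns + mem_refs * (t_mem_ns - t_instr_ns);
}
```

For example, `exec_time_ns(3200, 1152, 3, 20)` gives the DSP figure of 29184 ns, and `exec_time_ns(1440, 256, 5, 20)` gives the RISC figure of 11040 ns. Memory references account for about two thirds of the DSP time, so in this design study the fast algorithm's fewer memory references matter more than the DSP's shorter instruction time.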
Requirements for codec
Processing requirements for H.261 compression and decompression, CIF @ 30 fps

Decompression
Function                                            MOPS
Entropy decoding                                      17
Inverse quantization                                   9
2-D DCT                                               60
Motion estimation                                      0
Loop filtering                                        55
Pixel prediction                                      30
YCbCr to RGB                                          27
Total                                                198

Compression
Function                                            MOPS
RGB to YCbCr                                          27
Motion estimation (25 searches in a 16x16 region)    608
Inter/intraframe coding                               40
Loop filtering                                        55
Pixel prediction                                      18
2-D DCT                                               60
Quantization, zig-zag                                 44
Entropy coding                                        17
Frame reconstruction                                  99
Total                                                968

Reference: Table 8.1
Processor Speed Trends
[Figure: processing capability in MOPS (log scale, 10 to 100,000) versus year, 1988 to 2002, for general purpose microprocessors, programmable DSPs and general video processors. Source: Figure 8.1, Bhaskaran]