Video Fundamentals

Texas Instruments
Ron Birkett, Digital Customer Applications Team (DCAT)
Scott Specker, Technical Training Organization (TTO)

Video Basics – Outline
• Digital Video 101
  – Resolution
  – Rate
  – Color
  – Video 101 - Summary
• Generic Video Block Diagram
• Video Interfaces
• Intro to Video Compression

Which Image Looks Better?

358 x 238 pixels, 23 KB

149 x 99 pixels, 6 KB

A pixel (short for picture element) represents each point of information in a picture. In printing, the term "dot" is usually substituted for "pixel".

Resolution describes the number of pixels horizontally and vertically

How about between these?

How about now?

? x ? pixels, ? KB

Which works best for your needs?

12 times the size … Is it twelve times the quality?


1491 x 991 pixels, 303 KB

What resolution should you use?
• Printers or displays with fixed dot (or pixel) densities might lead you to a specific resolution
• Conforming to a specific standard (NTSC, PAL, etc.) might dictate your choice
• Data rate, processing rate, and storage capacity may direct your decision

Resolution
Perceived resolution is a function of:
• Actual resolution
• Distance
• Size of display (i.e. size of pixels)

Various Standard Resolutions

Format       Resolution                Application(s)
NTSC (D1)    720 x 480                 Full analog television resolution
PAL (D1)     720 x 576                 Full analog television resolution
SIF          352 x 240, 352 x 288      Resolution a VHS VCR is capable of
QCIF         176 x 144                 Often used in video conferencing or for small screen
CIF          352 x 288                 applications (specified for various codecs, e.g. H.261)
4CIF         704 x 576
ATSC SDTV    720 x 480                 Digital television: standard definition
ATSC HDTV    1280 x 720, 1920 x 1080   Digital television: high definition

ATSC defines 18 different resolutions/rates (the three most common are shown).

Which Video Clip Looks Better?

5 frames/sec vs. 15 frames/sec

Which Video Clip Looks Better?

15 frames/sec progressive vs. 30 fields/sec interlaced

Same data rate between these two

Progressive and Interlaced

• Interlaced formats decrease the data rate by sending only every other line in each pass
• Each set of odd or even lines is called a field:
  1. Field 1: Odd lines are "painted" in one 1/60th of a second
  2. Field 2: Even lines are painted in the next 1/60th of a second
• In the end, the two fields "paint" the entire screen 30 times/second
• Conversely, progressive formats scan (i.e. paint) the entire screen in one pass
• Each full screen scan is called a frame
• Frames are commonly updated at 25, 29.97, 30, or 60 frames per second (fps)

Interlacing Side Effects

While interlacing can reduce the data rate without reducing the image resolution, it does produce some side effects.

• The tomato is moving right in this example
• See how the image is distorted due to movement between the two fields …
• This is called "combing" (or "feathering")
• Similarly, the skier seems to flicker (called "line twitter")
• Worse yet, very thin image elements (such as the ski pole) that are only one line wide seem to disappear and reappear (exacerbated by vertical movement)

As long as nothing in the picture changes (such as when showing a still image), the alternating fields will actually complement each other and form a complete picture. The display will flicker and scan lines will be visible (both inherent in an interlaced system), but that's life in an interlaced world. We don't spend the evening looking at a still picture on the TV though. The moment something moves, we get interlacing artifacts! 1
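The combing effect can be reproduced with a few lines of code. The sketch below is only an illustration under simple assumptions: a frame is modeled as a list of text scan lines with an even line count, and the helper names `split_fields` and `weave` are made up for this example.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields."""
    field1 = frame[0::2]  # lines 1, 3, 5, ... (odd lines, counting from 1)
    field2 = frame[1::2]  # lines 2, 4, 6, ... (even lines)
    return field1, field2


def weave(field1, field2):
    """Interleave two fields back into one full frame."""
    frame = []
    for odd_line, even_line in zip(field1, field2):
        frame.append(odd_line)
        frame.append(even_line)
    return frame


# Two snapshots of a 6-line picture taken one field period apart.
# The bright object ("#") moves two pixels to the right in between.
snapshot_a = ["..##......"] * 6
snapshot_b = ["....##...."] * 6

field1, _ = split_fields(snapshot_a)  # odd lines come from the first snapshot
_, field2 = split_fields(snapshot_b)  # even lines come from the later snapshot
combed = weave(field1, field2)

for line in combed:
    print(line)  # alternate lines show the object in different places
```

A still picture survives the round trip unchanged (`weave(*split_fields(frame)) == frame`), while the moving object ends up with the jagged edges described above.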

NTSC, PAL, and CIF Formats

Standard  Format  Resolution  Frame Rate         Data Rate
NTSC      SIF     352 x 240   59.94 fields/sec   5M pix/sec
NTSC      D1      720 x 480   59.94 fields/sec   10M pix/sec
PAL       SIF     352 x 288   50 fields/sec      5M pix/sec
PAL       D1      720 x 576   50 fields/sec      10M pix/sec
CIF       QCIF    176 x 144   30 frames/sec      760K pix/sec
CIF       CIF     352 x 288   30 frames/sec      3M pix/sec
CIF       4CIF    704 x 576   30 frames/sec      12M pix/sec

Scan method: the NTSC and PAL formats are listed as I or I/P; QCIF, CIF, and 4CIF are P.
I = Interlaced, P = Progressive

Notice how the data rate is the same between NTSC and PAL, even though the resolution and frame rates are different
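The pixel rates in the table follow from width x height x frames per second. The short sketch below checks the NTSC/PAL observation; it assumes nominal frame rates (59.94 fields/sec rounded to 30 frames/sec, 50 fields/sec taken as 25 frames/sec), and the function name `pixel_rate` is chosen only for this example.

```python
def pixel_rate(width, height, frames_per_second):
    """Pixels delivered per second for a given resolution and frame rate."""
    return width * height * frames_per_second


ntsc_d1 = pixel_rate(720, 480, 30)  # two 59.94 Hz fields, rounded to 30 frames/sec
pal_d1 = pixel_rate(720, 576, 25)   # two 50 Hz fields = 25 frames/sec
qcif = pixel_rate(176, 144, 30)
cif = pixel_rate(352, 288, 30)

print(ntsc_d1, pal_d1, qcif, cif)
```

PAL trades a lower frame rate for more lines per frame, so both D1 formats land on the same figure of roughly 10M pixels/sec.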

ATSC Formats

Class  Name   Resolution    Scan  Frame Rate (per second)  Data Rate (pixels/sec)
SDTV   480i   720 x 480     I     60 fields                10 M
SDTV   480p   720 x 480     P     60 frames                20 M
HDTV   720p   1280 x 720    P     60 frames                55 M
HDTV   1080i  1920 x 1080   I     60 fields                62 M
HDTV   1080p  1920 x 1080   P     60 frames                124 M

Key
SDTV = Standard Definition Television
HDTV = High Definition Television
I = Interlaced
P = Progressive

• The standard supports both NTSC rates and integer rates, i.e. 60.00, 59.94, 30.00, 29.97, 24.00, and 23.98
• Notice the similarity between 720p and 1080i data rates
• The debate between competing HD formats was solved by including both
• In the USA, of the four major broadcast networks, two chose 720p while the other two chose 1080i
• ATSC compatible HDTVs provide one native format (720p or 1080i), but they must convert the other so they can display both




Color Depth

1-bit/pixel = black/white

8-bits/pixel = 256 greys

4-bits/pixel = 16 colors

8-bits/pixel = 256 colors

8-bits Red + 8-bits Green + 8-bits Blue

24-bits/pixel = 16 million colors

RGB Color

8-bits Red

8-bits Green

8-bits Blue

All colors can be composed by adding specific amounts of R, G, & B

8 bits (2^8 = 256 levels, 0 to 255) specify the amount of each color

This is the scheme used by most electronic displays to generate color; e.g. we often call our computer monitors "RGB displays"
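The color counts quoted for each bit depth are simply powers of two. The sketch below also shows one possible way of packing three 8-bit components into a 24-bit value; the R-G-B byte order used here is an assumption for illustration, since real systems use several different orderings.

```python
def color_count(bits_per_pixel):
    """Number of distinct values a pixel with this many bits can take."""
    return 2 ** bits_per_pixel


def pack_rgb(r, g, b):
    """Pack three 8-bit components into one 24-bit integer (assumed R-G-B order)."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel):
    """Recover the three 8-bit components from a packed 24-bit value."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


for bits in (1, 4, 8, 24):
    print(bits, "bits/pixel ->", color_count(bits), "values")

orange = pack_rgb(255, 128, 0)
print(hex(orange), unpack_rgb(orange))
```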

Color Reduction

• The human eye is not as sensitive to color as it is to luminance (dark vs. light)
• To this end, to save costs the various standards decided to:
  – Maintain luminance information in our images, but
  – Reduce color information
• Using RGB, though, how do we easily reduce color information without removing luminance?
• For this, and other technical reasons, a separate color space was chosen by most video standards …

Color Spaces

(Figure: a color "picker" showing the same color expressed in four color spaces, including Y, Cb, Cr)

• Color Spaces: Along with RGB, there are many different ways to represent color
• This "picker" example demonstrates 4 different common color spaces
• YCbCr is the color space most often used in video, where:
  – Y = Luminance (Black/White/Gray)
  – Cb = Blue difference value (Blue – Y)
  – Cr = Red difference value (Red – Y)
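To make the luminance and color-difference idea concrete, here is a minimal conversion sketch. It assumes the ITU-R BT.601 luma weights with full-range 8-bit values (the variant used by JFIF/JPEG), where Cb and Cr are scaled (B − Y) and (R − Y) differences offset to 128; broadcast video commonly uses a reduced range instead, so treat the exact scaling as an assumption.

```python
# BT.601 luma weights (assumed here; other standards use different weights).
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range 8-bit YCbCr."""
    y = KR * r + KG * g + KB * b
    cb = 128 + 0.5 * (b - y) / (1 - KB)  # scaled blue difference
    cr = 128 + 0.5 * (r - y) / (1 - KR)  # scaled red difference
    # Round and clamp each component to the 0..255 range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


for name, rgb in [("black", (0, 0, 0)), ("gray", (128, 128, 128)),
                  ("white", (255, 255, 255)), ("blue", (0, 0, 255))]:
    print(name, rgb, "->", rgb_to_ycbcr(*rgb))
```

Any shade of gray maps to Cb = Cr = 128, which is why dropping or thinning the two color-difference signals leaves the black-and-white picture intact.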

What is YCbCr?

(Diagram: Video Source → YCbCr (Y, Cb, Cr) → R, G, B)

• Even though most displays actually use RGB to create the image, YCbCr is used most often in consumer electronics for transmission of the image
• Historically, B/W television broadcasts carried only luminance (Y)
• The color signals were added later

YCbCr → YPbPr

(Diagram: Y, Cb, Cr values in memory → Video DAC (e.g. Encoder) → analog Y, Pb, Pr)

• When YCbCr values are converted to analog signals, they are called YPbPr
• Digital displays sample the signals to regain the discrete values

Packed Pixel (Interleaved)

(Diagram: memory holding Y Cb Cr, Y Cb Cr, Y Cb Cr, …)

• In memory, pixel values can be:
  – Packed (i.e. interleaved)
  – Sorted (shown on next slide)

Packed Pixel (Sorted)

(Diagram: memory holding Y Y Y Y, then Cb Cb Cb Cb, then Cr Cr Cr Cr → Video DAC (e.g. Encoder) → Y, Pb, Pr)

• In memory, pixel values can be:
  – Packed (i.e. interleaved)
  – Sorted: pixels are grouped in memory by type

We still haven't reduced the color info yet ...
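The two memory layouts carry the same samples in a different order. The sketch below converts between them for 4:4:4 data; the function names and the Y-Cb-Cr ordering inside each packed pixel are assumptions made for this example.

```python
def to_sorted(packed):
    """[Y0, Cb0, Cr0, Y1, Cb1, Cr1, ...] -> (all Y, all Cb, all Cr)."""
    return packed[0::3], packed[1::3], packed[2::3]


def to_packed(ys, cbs, crs):
    """(all Y, all Cb, all Cr) -> [Y0, Cb0, Cr0, Y1, Cb1, Cr1, ...]."""
    packed = []
    for y, cb, cr in zip(ys, cbs, crs):
        packed.extend((y, cb, cr))
    return packed


packed = [16, 128, 128,   81, 90, 240,   145, 54, 34,   41, 240, 110]
ys, cbs, crs = to_sorted(packed)
print("Y :", ys)
print("Cb:", cbs)
print("Cr:", crs)
```

The sorted layout makes it easy to treat luminance and color separately, which is what the subsampling step relies on.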

Color (Chroma) Subsampling

(Diagram: Y is kept for every pixel; Cb and Cr are kept for every other pixel)

• One way to reduce data would be to sample the color signals every other time
• Variations of this scheme are used in many video systems
• This scheme is often referred to as 4:2:2 and is often represented as: Y Y Y Y  Cb Cb  Cr Cr

Digital Video Formats (4:4:4, 4:2:2, 4:2:0, 4:1:1)

• 4:4:4 would be the original stream, with no chroma sub-sampling

(Diagrams: sample layouts for each format)
• 4:2:2: Y Y Y Y  Cb Cb  Cr Cr (chroma kept for every other pixel along a line)
• 4:2:0: Y Y Y Y  Cb  Cr (one Cb and one Cr shared by a 2 x 2 group of pixels)
• 4:1:1: Y Y Y Y  Cb  Cr (one Cb and one Cr shared by four pixels along a line)
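The savings from each subsampling scheme can be worked out by counting samples. The sketch below is a simplified illustration: it subsamples by plain decimation (real systems usually filter or average first), and the helper names are made up for this example.

```python
def subsample_422(chroma_plane):
    """Keep every other chroma sample along each line."""
    return [row[0::2] for row in chroma_plane]


def subsample_420(chroma_plane):
    """Keep one chroma sample per 2 x 2 group (every other line and column)."""
    return [row[0::2] for row in chroma_plane[0::2]]


def bits_per_pixel(scheme, bits_per_sample=8):
    """Average bits per pixel: 4 luma samples plus the chroma samples they share."""
    chroma_samples_per_4_luma = {"4:4:4": 8, "4:2:2": 4, "4:2:0": 2, "4:1:1": 2}
    return bits_per_sample * (4 + chroma_samples_per_4_luma[scheme]) / 4


cb = [[1, 2, 3, 4],
      [5, 6, 7, 8]]
print(subsample_422(cb))
print(subsample_420(cb))
for scheme in ("4:4:4", "4:2:2", "4:2:0", "4:1:1"):
    print(scheme, bits_per_pixel(scheme), "bits/pixel")
```

With 8-bit samples, 4:2:2 needs two thirds of the 4:4:4 data and 4:2:0 or 4:1:1 need half, while the luminance is left untouched.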

Summary: Pixels in 4D

Dimensions:
1. Height
2. Width
3. Frame rate determines how long the pixel exists, i.e. how it moves
4. Color depth of the pixel: how many bits are used to represent the color of each pixel?
   – 1 bit
   – 8 bits
   – Red: 8 bits, Green: 8 bits, Blue: 8 bits
   – or YCbCr 4:2:2

Video Basics – Outline
• Digital Video 101
• Generic Video Block Diagram
• Video Interfaces
• Intro to Video Compression

Typical Video System Chain

(Diagram: Video Input → NTSC/PAL Decoder → Capture → Process (Formatting, Compression, Encryption) → Store / Transmit → Process (Decryption, Decompression) → Display → NTSC/PAL Encoder → Video Output)

Simplified chain: Input → Decoder → Process → Store / Transmit → Process → Encoder → Output

Video Basics – Outline
• Digital Video 101
• Generic Video Block Diagram
• Video Interfaces
  – Display
  – Analogue
  – Digital
• Intro to Video Compression

Video H/W Interfaces

(Diagram: Video Input → NTSC/PAL Decoder → Processing → Encoder (Video DAC) → Display / Monitor, with a digital hardware I/F and digital data format into the encoder, and an analog hardware I/F and analog data format out to the display)


Video Connections - RF

(Diagram: the Encoder (Video DAC) can drive the Display / Monitor over Component, S-Video, CVBS, or RF; here an antenna (RF) feeds an analog TV)

Radio Frequency (RF)
• Coaxial cable (RG58, RG59, RG6) or off-the-air (OTA) antenna
• Audio/Video signals transmitted on RF carrier signal
• Signals follow NTSC format in North America; PAL in Europe
• Signals usually transmitted from antenna to TV via 75Ω coaxial cable
  – Using RG58, RG59, or RG6 cable and F-connector
• Luminance and color signals transmitted independently
  – Historically, luminance information (Black/White/Gray) was introduced in the 1930s
  – Color was added as a sideband signal later in the 1950s

Video Connections – Composite (CVBS)

(Diagram: a VCR connected to an analog TV either by RF or by composite video plus left/right audio)

• VCRs provide two means of connecting to TV:
  1. RF – Tape playback "broadcast" on open channel (usually channel 3 or 4)
  2. Composite Video is one of three cables used to carry A/V line-level signals
• Composite video, also known as Composite Video with Blanking and Sync (CVBS), is a 75 ohm video cable using a yellow RCA plug
• Right and left audio channels use red/white RCA plugs (respectively)
• VCR recordings do not utilize the full NTSC resolution; rather, they only achieve approximately SIF quality (~340x240)
• CVBS improves performance by eliminating the RF carrier and separating A & V

Video Connections: S-Video

(Diagram: a digital satellite receiver connected to an analog TV by S-Video; RG6 cable on the receiver input)

• While most satellite receivers still have RF and CVBS cables, if an analog TV has an S-Video connection, it provides a noticeable picture improvement
• S-Video differs from composite (CVBS) by breaking out the video into two signals:
  1. Y – Luminance (black/white/gray)
  2. C – Chrominance (colors)
• S-Video is provided by a circular jack with two differential signals (4 pins in jack)
• Stereo audio is still provided by Left/Right RCA jacks

Video Connections: Component Video

(Diagram: a DVD player connected to a digital TV by three component cables: Y, Pb, Pr)

• New video sources (like DVD players) now incorporate Component Video outputs
• Component differs from S-Video (Y/C) by further breaking out the chroma signal:
  1. Y – Luminance (black/white/gray)
  2. Pb – Chroma information (Luma – Blue)
  3. Pr – Chroma information (Luma – Red)
• Y'Pb'Pr' specifies how data is physically sent to the TV, not the data format. It can be sent in any format discussed earlier (480i, 480p, 720p, 1080i, etc.)
• While some high-end products use BNC connectors, most Component outputs use three RCA plugs
• Audio is (still) provided separately (will discuss later)


(Diagram: Processing → BT.656, YCbCr 4:2:2 → Video DAC (Encoder) → Display / Monitor)

Output Summary

(Diagram: Analog Camera → Decoder (Video ADC) → Video Processing → Encoder (Video DAC) → Display / Monitor. The camera and display sides use an analog hardware I/F and analog data format; the links into and out of video processing use a digital hardware I/F and digital data format.)

Input and Output

(Diagram: Input → Decoder (Video ADC) → Process → Encoder (Video DAC) → Output, with a digital hardware I/F and digital data format on both sides of video processing)

What if it's a Digital Display?

Digital Display (e.g. DTV, Computer)

(Diagram: Analog Camera → Decoder (Video ADC) → Video Processing → Digital Display. The output to the display stays digital, over a digital hardware I/F and digital data format such as DVI, HDMI, or IEEE 1394.)

Video Connections: DVI

(Diagram: an HDTV receiver connected to a digital TV)

• The latest consumer video interface is the DVI (Digital Visual Interface) connector. While it is the first digital video I/F for the consumer market, many digital computer monitors use it as well. (It transfers 24-bit RGB or 24-bit YCbCr.)
• DVI can be found in three variations:
  – DVI-D – Digital signals (24 signals)
  – DVI-A – Analog signals (5 analog signals similar to VGA: RGBHV)
  – DVI-I (DVI-A/D) – Analog and digital signals (24+5 = 29 signals)
• Supports High-Bandwidth Digital Content Protection (HDCP) to prevent unauthorized copying of material
• Audio is (still) provided separately (will discuss later)

Video Connections: HDMI

(Diagram: an HDTV receiver connected to a digital TV)

1. All-digital HDMI can carry both uncompressed high-definition (HD) video and uncompressed multi-channel audio in all HD formats including 720p, 1080i and even upcoming 1080p
2. A single HDMI cable carries both video and audio
3. Integrated remote provides simple control of your system
4. Automatic format adjustment matches content to preferred viewing format
5. PC compatibility enables viewing of your PC data on your HDTV
6. HDMI is backward compatible with DVI (Digital Visual Interface)

Digital Video Connections: IEEE 1394

(Diagram: a DVD recorder connected to a digital TV)

• High-speed, digital serial interconnect
• No need to convert digital data into analog and tolerate a loss of data integrity
• Physically small: Thin serial cable can replace larger and more expensive ones
• Easy to use: No need for terminators, device IDs, or elaborate setup
• 6-pin connector provides power, 4-pin is very small
• Inexpensive: Priced for consumer products
• Scalable architecture:
  – Supports 63 devices by daisy chaining and branching for true peer-to-peer comm.
• Hot pluggable: Add or remove 1394 devices with the bus active
• Cross platform: Apple (FireWire™), Microsoft, HDTV, Sony (iLink™), etc.
• Fast: Speeds of 100, 200, 400, and 800 Mbps
• Isochronous: Even real-time, multimedia data can be guaranteed its bandwidth for just-in-time delivery


Intro to Video Compression
• Where & Why of Video Compression
• Still Image Compression (JPEG)
• Compensating for Motion (MJPEG, MPEG1/2/4)
• H.26x Standards (ITU)
• AVC (aka H.264, aka MPEG-4p10)
• VC1 (Windows Media Video 9 - WMV9)
• Others
• Video Codec Summary

You are entering… The Software Zone

(Diagram: Input → Decoder → Process → Store / Transmit → Process → Encoder → Output. The decoder and encoder are hardware. In software, a video capture device driver feeds processing (formatting, compression, encryption, etc.) ahead of store/transmit, and processing (decryption, decompression, formatting, etc.) feeds a video display device driver.)

Since Compression/Decompression is so important, let’s take a closer look at it.

Problem: Why compression?

(Diagram: Video Capture Device Driver → Compression → Store / Transmit → Decompression → Video Display Device Driver)

Uncompressed video at 30 frames/s, 4:2:0:

Format          Storage (90 min.)   Transmission
CIF (352x288)   23.3 GBytes         ~4.5 Mbytes/s (36.5 Mbits/s)
D1 (720x480)    83.7 GBytes         ~15.5 Mbytes/s (124.4 Mbits/s)

Without it…
• a movie won’t fit on a CD (800 MBytes) or a DVD (4.7 GBytes)
• it can’t be streamed over DSL (384 Kbits/s – 1.5 Mbits/s), FIOS (5-15 Mbits/s), or common Ethernet (10-100 Mbits/sec)
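The uncompressed figures can be checked with a little arithmetic. The sketch below assumes 8-bit samples, so 4:2:0 averages 12 bits per pixel, and it reports decimal megabytes and gigabytes; the storage totals it prints come out close to, but not exactly at, the values quoted on the slide, which may reflect rounding or a different unit convention.

```python
def raw_video_rate(width, height, fps=30, bits_per_pixel=12):
    """Uncompressed data rate in bytes per second (12 bits/pixel = 8-bit 4:2:0)."""
    return width * height * bits_per_pixel // 8 * fps


for name, (width, height) in {"CIF": (352, 288), "D1": (720, 480)}.items():
    rate = raw_video_rate(width, height)
    movie = rate * 90 * 60  # bytes needed for a 90-minute movie
    print(f"{name}: {rate / 1e6:.1f} MB/s, "
          f"{rate * 8 / 1e6:.1f} Mbit/s, "
          f"{movie / 1e9:.1f} GB for 90 min")
```

Either way, the raw stream is orders of magnitude larger than a CD or DVD and far above typical DSL rates.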

The Generic Image Compression Problem

Encode a digitized image using as few bits as possible while acceptably maintaining its visual appearance

Original: 256 x 256 pixels x 24 bits per pixel = 1,572,864 bits = 196,608 Bytes = 192 KBytes

Reconstruction: 20 KBytes

Image Compression

• JPEG (Joint Photographic Experts Group)
  – Best known standard is IS 10918-1 (ITU-T T.81)
  – JPEG committee reported to 3 international standards organizations (ISO/ITU/IEC)
• JPEG used in variety of applications
  – Printing
  – Digital Cameras
  – Video editing systems
  – Security
  – Medical Imaging
• Typically 10:1 compression
• Still image coding technique to remove spatial redundancy
• Block-based DCT, Huffman Coding, Perceptual Quantization
• Extensions for lossless, progressive coding
• Initially aimed at monochrome
  – Separate compression of components in color image
• Baseline, Progressive, Hierarchical, Lossless modes

JPEG Quality – Compression Factor

Ever wonder why JPEG images can look blocky?

• 72 dpi (i.e. pixels per inch), Compression Factor 15 (23 KB)
• 72 dpi (i.e. pixels per inch), Compression Factor 99 (3 KB)

While the blocky version might have some attractive appeal, it's certainly of lower quality. As we study how JPEG compression works, let's see if we can figure out why this (side-) effect occurs.

JPEG (Block Transform Coding)

Picture Block → DCT → Quantize → Zig-Zag → Run Length Encode → Entropy Encode (Variable Length Coding, VLC) → 01101000100010

(Diagram: the frequency components of a block, running from low frequency to high frequency)

Image compression techniques rely on the fact that the Human Visual System (HVS) is more sensitive to low frequency information.

Discrete Cosine Transform (DCT)

Image domain to frequency domain mapping

(Diagram: a pixel block passes through the DCT; the output block has horizontal frequencies along one axis and vertical frequencies along the other, with the DC coefficient in the corner and AC coefficients elsewhere)

• Each 8x8 block of pixel values is mapped to the frequency domain (64 frequency components)
• DCT input samples are in range +511 to -512. Larger inputs accepted but overflow may occur (e.g. 12-bit JPEG)

DCT Example (Image)

• 4-bit data values: 0 - 15
• 4 x 4 pixel block
• Example uses 4 bits and 4x4 for convenience
  – Normally 8-12 bits and 8x8 blocks are used

DCT Example (Image Frequency)

• 4-bit data values: 0 - 15
• 4 x 4 pixel block

DCT output (input image not reproduced):

8 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

The DC coefficient is the average brightness of the block.

DCT Example (Horizontal)

Low horizontal frequency:

8 15 0 0
0  0 0 0
0  0 0 0
0  0 0 0

High horizontal frequency:

8 0 0 15
0 0 0  0
0 0 0  0
0 0 0  0

DCT Example (Vertical)

Low vertical frequency:

 8 0 0 0
15 0 0 0
 0 0 0 0
 0 0 0 0

High vertical frequency:

 8 0 0 0
 0 0 0 0
 0 0 0 0
15 0 0 0

DCT Example (Checkerboard)

High horizontal and vertical frequency:

8 0 0  0
0 0 0  0
0 0 0  0
0 0 0 15
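For readers who want to experiment, here is a small, unoptimized 2-D DCT. It is a sketch that assumes the orthonormal DCT-II scaling, so its output magnitudes differ from the simplified values on the slides; the structure (where the non-zero coefficients land) is what matters.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for v in range(n):      # vertical frequency index (output row)
        for u in range(n):  # horizontal frequency index (output column)
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[v][u] = scale(u) * scale(v) * total
    return out


flat = [[8] * 4 for _ in range(4)]       # uniform gray block
stripes = [[15, 15, 0, 0]] * 4           # bright left half, dark right half

flat_coeffs = dct_2d(flat)
stripe_coeffs = dct_2d(stripes)
for row in stripe_coeffs:
    print([round(c, 2) for c in row])
```

The uniform block produces only a DC coefficient, while the vertically striped block produces energy only in the first row, i.e. only horizontal frequencies.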



What have we done?

Picture Block → DCT → Quantize → Zig-Zag → Run Length Encode → Entropy Encode → 01101000100010

• So far, we haven’t saved any memory
• We only transformed the image to the frequency domain

Quantization

• Lower frequencies are more important than high ones
• Important frequencies are grouped in the upper left corner
• For best quality and no compression, use 4 bits per pixel
  – 4 bits allows us to keep all of the frequency data
• Remember, JPEG uses 8x8 blocks and 8-12 bits per pixel
• Compression comes from using fewer bits per pixel

# of bits per value, no compression:

4 4 4 4
4 4 4 4
4 4 4 4
4 4 4 4

# of bits per value, with compression:

4 3 3 2
3 2 2 2
3 2 1 1
2 2 1 1

A 50% Memory Savings

(Chart: storage per block, No Compression vs. 50% Compression)

DCT data output:

8 15 0 0
0  0 0 0
0  0 0 0
0  0 0 0

Quantization matrix (# of bits per value):

4 3 3 2
3 2 2 2
3 2 1 1
2 2 1 1

Quantized output:

8 7 0 0
0 0 0 0
0 0 0 0
0 0 0 0

• A 4-bit value can hold 0 to 2^4 - 1 = 15, so the 8 is kept as is
• A 3-bit value can hold 0 to 2^3 - 1 = 7, so the 15 is stored as 7
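The bit-budget view of quantization used in this example can be sketched in a few lines. Two assumptions are worth flagging: the bit-allocation matrix below is one reading of the slide's figure (most bits in the upper-left, low-frequency corner), and the clamp is the slide's simplified model. Baseline JPEG itself quantizes by dividing each coefficient by a step size and rounding.

```python
# Assumed bit allocation: more bits for low frequencies (upper left).
BITS = [
    [4, 3, 3, 2],
    [3, 2, 2, 2],
    [3, 2, 1, 1],
    [2, 2, 1, 1],
]


def quantize(coeffs, bits):
    """Clamp each non-negative coefficient to the largest value its bit budget holds."""
    return [[min(c, 2 ** b - 1) for c, b in zip(coeff_row, bit_row)]
            for coeff_row, bit_row in zip(coeffs, bits)]


dct_output = [
    [8, 15, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

quantized = quantize(dct_output, BITS)
total_bits = sum(sum(row) for row in BITS)
uncompressed_bits = 4 * 16  # 4 bits for each of the 16 values

print(quantized[0])
print(total_bits, "bits instead of", uncompressed_bits)
```

Under this allocation the block needs 34 bits rather than 64, which is roughly the 50% saving named in the slide title.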

Zig-Zag Scanning

Picture Block → DCT → Quantize → Zig-Zag → Run Length Encode → Entropy Encode → 01101000100010

• Turn a 2-dimensional block into a 1-dimensional array

Block positions, numbered row by row:

 0  1  2  3  4  5  6  7
 8  9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63

Zig-zag scan order of each position:

 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
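One compact way to generate a zig-zag order is to sort the block coordinates by anti-diagonal and alternate the direction on each diagonal. The sketch below takes that approach; the function names are made up for this example, and a production codec would normally use a precomputed lookup table instead.

```python
def zigzag_order(n=8):
    """Return (row, col) coordinates of an n x n block in zig-zag order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Cells on the same anti-diagonal share r + c. Odd diagonals are walked
    # with the row increasing, even diagonals with the column increasing.
    return sorted(coords,
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def zigzag_scan(block):
    """Flatten a square 2-D block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


order = zigzag_order(8)
print(order[:6])

quantized = [
    [8, 7, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(quantized))
```

Because the scan visits low frequencies first, the non-zero values cluster at the front of the list and the zeros form one long run at the end.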



Run-length Encoding

Block from Original Image:

15 15 0 0
15 15 0 0
15 15 0 0
15 15 0 0

DCT:

8 15 0 0
0  0 0 0
0  0 0 0
0  0 0 0

Quantize:

8 7 0 0
0 0 0 0
0 0 0 0
0 0 0 0

Zig-Zag Scan: 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Run Length Encode: 807000
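A run-length encoder for the zig-zag output can be sketched as follows. The (zero run, value) pair format and the (0, 0) end-of-block marker are assumptions in the spirit of JPEG; the real standard codes the DC value separately and limits run lengths, which this sketch ignores.

```python
def run_length_encode(values):
    """Encode a list as (zero_run, value) pairs, ending with a (0, 0) marker."""
    pairs = []
    run = 0
    for value in values:
        if value == 0:
            run += 1          # count zeros preceding the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))      # end of block: nothing but zeros remains
    return pairs


scanned = [8, 7] + [0] * 14
print(run_length_encode(scanned))
print(run_length_encode([5, 0, 0, 3, 0, 0, 0, 0]))
```

Sixteen values shrink to three pairs here because the trailing zeros are never written out individually.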

Variable Length Coding

Picture Block → DCT → Quantize → Zig-Zag → Run Length Encode → Entropy Encode → 01101000100010

JPEG Decoding (reverse the process)

01101000100010 → Entropy Decode → Run Length Decode → Reverse Zig-Zag → Inverse Quantize → Inverse DCT → Picture Block

Motion JPEG

(Diagram: a sequence of JPEG frames over time)

M-JPEG or Motion JPEG
• Each video frame compressed as a JPEG image
• Not really covered by JPEG std., but useful tool for compressing motion sequences (prior to arrival of MPEG)
• Frames transmitted / stored sequentially
• Used when each individual frame needs to be independently decoded
  – Security applications
  – Basis of most ‘Non-Linear’ editors
• Non-standardised
• MPEG can be used in JPEG style in controlled environment
• Interchange etc.

MPEG (Moving Picture Experts Group)

M-JPEG or Motion JPEG
• Each video frame compressed as a JPEG image

MPEG
• Uses Motion Compensation to exploit temporal redundancy
• That is, why keep a copy of the same thing over and over?
• Frame Types
  – “I” Frame: Intra-encoded frame, compressed similar to JPEG
  – “P” Frame: Predicted inter-frame, re-uses redundant information between frames

(Diagram: a frame sequence I P P over time, with blocks A, B, and C shown in an I (Intra) frame and a P (Inter) frame)

What if it moves?

• Objects in a frame will probably not stay in the same place
• Even if the objects don’t move, the camera may
• Let’s look at this a different way…

Motion

(Figure: Frame 1, Frame 2, Frame 3)

Divide the Picture into Macroblocks

• Macroblocks are 16x16 luma with their associated chroma blocks
• Most video processing uses macroblocks (DCT uses 8x8 blocks)

Macroblocks and Blocks

(Diagram: a 16 x 16 macroblock made of Y blocks 1 to 4, Cb block 5, and Cr block 6, each 8 x 8)

• Each macroblock consists of four luminance blocks and two chrominance blocks
• Chrominance is sub-sampled in the 4:2:0 format
• Each luminance or chrominance block relates to 8 pixels by 8 lines of Y, Cb or Cr

So, how do we predict the movement of objects?

With the Picture Divided into Macroblocks …

(Figure: Frame 1 and Frame 2) Object has moved from frame to frame

Search Previous Frame for a Given Macroblock

(Figure: a block from Frame 2 is searched for in Frame 1)

• On a byte-by-byte basis, compare the block of interest to the search area in the previous frame to find the best fit
• Very CPU intensive
• Common Methods:
  – Sum of Absolute Differences (SAD) – most common
  – Sum of Squared Error (SSE)
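A brute-force search using SAD might look like the sketch below. It is a minimal illustration: frames are plain lists of rows, the block is tiny, and the helper names (`find_block`, `get_block`, `make_frame`) are invented for this example. Real encoders use much faster search strategies.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_block(prev_frame, block, top, left, search_range):
    """Search prev_frame around (top, left); return (best SAD, dx, dy)."""
    size = len(block)
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the frame
            cost = sad(block, get_block(prev_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def make_frame(obj_top, obj_left, size=8):
    """Black frame with a bright 2 x 2 object at the given position."""
    frame = [[0] * size for _ in range(size)]
    for y in range(obj_top, obj_top + 2):
        for x in range(obj_left, obj_left + 2):
            frame[y][x] = 9
    return frame


frame1 = make_frame(2, 2)                 # object at row 2, column 2
frame2 = make_frame(3, 4)                 # object moved to row 3, column 4
block = get_block(frame2, 3, 4, 2)
print(find_block(frame1, block, 3, 4, 3))
```

The returned offset is the motion vector: it points from the block's position in Frame 2 back to its best match in Frame 1.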


(Figure: the search steps across Frame 1 until the best-fit block is found)

Motion Vectors

(Figure: Frame 1 and Frame 2, with MV(x,y) pointing to the matching block)

• If the block hasn’t moved, just re-use it
• Motion Vector: (x-y offset) specifying a best-fit block from another frame

Reconstructed Image and Block

(Figure: the original block from Frame 2, the block reconstructed from Frame 1 using MV(x,y), and the small error between them)

• Encoders actually contain a partial decoder to reconstruct the image block-by-block
• Any error found between the original and the reconstructed block is captured and compressed using the same techniques used in I-frames (i.e. DCT encoded, as in JPEG)

Block Specified by Vector & Error

• Final result = Motion Vector + Error “overlay”

Decoder – Motion Compensation

• Decoder uses the Motion Vector to move a block from the Reference Frame to its new location
• Add Error to Reference Block to complete the new block

Block from Frame 1 + MV(x,y) + Error = Resultant block (motion & error compensated)
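The decoder side of this step is short enough to sketch directly. The function below assumes frames are lists of rows, that the motion vector is an (dx, dy) offset into the reference frame, and that the error block holds signed differences; the name `motion_compensate` is chosen only for this example.

```python
def motion_compensate(reference_frame, top, left, motion_vector, error_block):
    """Rebuild the block at (top, left) from a reference block plus its error."""
    dx, dy = motion_vector
    size = len(error_block)
    # Fetch the best-fit block the motion vector points to in the reference frame.
    ref_block = [row[left + dx:left + dx + size]
                 for row in reference_frame[top + dy:top + dy + size]]
    # Add the decoded error "overlay" sample by sample.
    return [[ref + err for ref, err in zip(ref_row, err_row)]
            for ref_row, err_row in zip(ref_block, error_block)]


reference = [
    [10, 10, 10, 10],
    [10, 50, 52, 10],
    [10, 54, 56, 10],
    [10, 10, 10, 10],
]
error = [[1, -1],
         [0, 2]]

# The block now sits at row 2, column 2; it came from one row up, one column left.
rebuilt = motion_compensate(reference, 2, 2, (-1, -1), error)
print(rebuilt)
```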

Hide ‘n Seek

(Figure: Frame 1, Frame 2, Frame 3, Frame 4)

• The squares can be encoded using forward prediction from frame to frame
• Since the circle is “hidden” in Frame 2, it cannot be predicted in Frame 3
• It will have to be encoded by DCT without prediction (less compression)

Hindsight is 20/20

(Figure: Frame 1, Frame 2, Frame 3 (B Frame), Frame 4)

• What if Frame 4 could look “back” to see that Frame 3 needs the circle?
• This is called “backward prediction”
• Frame 3 uses “forward prediction” for the square
• Since frame 3 uses both forward and backward, it is a Bi-directional Frame
• B frames reduce the bit-rate needed for the video stream
• B frames may add additional latency at the decoder

Types of Prediction

• Intra (I) Frame
  – Frame is coded based on spatial redundancy only
• Predicted (P) Frame
  – Frame is coded using prediction from prior I or P frame(s)
• Bi-directionally predicted (B) Frame
  – Frame is coded with bi-directional (forward and backward) prediction
  – Prediction based on I and P frames (not other B frames)
  – Not a source of prediction for any other frames
  – Since 2 frames are needed to decode, more memory is needed
  – Frames may be transmitted out of sequence to simplify decoding

(Diagram: a frame sequence made of I, B, and P frames)

MPEG Terminology

(Diagram: Video Sequence → Group of Pictures (GOP), running from I frame to I frame → Picture → Slice → Macroblock (Y blocks 1 to 4, Cb block 5, Cr block 6) → Block)

MPEG-1

• Intended to support display directly from a CD-ROM or similar storage device
  – Used in the Video CD (VCD) format
  – Designed for typical data rate < 1.5Mbps (25:1 compression)
• Designed to provide a digital equivalent of the popular VHS video tape format
  – Format SIF (352 x 240), YUV, 30 frames/sec (non-interlaced)
• How it works:
  – JPEG with Motion Compensation
  – Adaptive perceptual quantization for bit-rate mgmt
  – I, P, and B Pictures
• Encoder is more complex than the decoder
• Stereo Audio (MPEG-1 defines MP3 audio format)

MPEG-2

• The standard for Digital TV (HDTV, DVD)
• Most successful of the MPEG standards
• Typical output rate 2-6 Mbps (40:1 compression)
• To MPEG-1 it adds support for:
  – Rec. 601 format (720 x 480), YUV, 30 frames/sec
  – Interlaced video
  – Multi-Channel Audio (Dolby Digital AC3)
  – Motion Film (Pan/Scan, 3:2 Pulldown)
• Encoder is more complex than the decoder

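MPEG Encoder Block Diagram

(Diagram: Video Source → Intra/Inter selection → Transform → Quantization → Entropy Coding → Buffer, with Rate Control. A feedback path through Inverse Quantization, Inverse Transform, and the Reference Framestore feeds Motion Estimation and Motion Compensation, which produce the Motion Vectors and the Predicted Frame that is subtracted from the source.)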

MPEG-2 Components

• Part 1: Systems (ITU-T Rec. H.222.0)
• Part 2: Video (ITU-T Rec. H.262)
• Part 3: Audio
• Part 4: Compliance testing
• Part 5: Software simulation (TR)
• Part 6: Digital Storage Media Command & Control Extensions (DSM-CC)
• Part 7: Advanced Audio Coding (AAC)
• Part 9: Extension for real time interface for systems decoders
• Part 10: Conformance extensions for DSM-CC
• Part 11: IPMP on MPEG-2 systems

MPEG-2 Profiles

Abbr.    Name                        Picture Coding Types  Chroma Format   Scalable modes
SP       Simple profile              I, P                  4:2:0           none
MP       Main profile                I, P, B               4:2:0           none
SNR      SNR Scalable profile        I, P, B               4:2:0           SNR scalable
Spatial  Spatially Scalable profile  I, P, B               4:2:0           SNR- or spatial-scalable
HP       High profile                I, P, B               4:2:2 or 4:2:0  SNR- or spatial-scalable

Notes:
• Aspect ratios for all levels: square pixels, 4:3, or 16:9
• SNR = signal-to-noise ratio

* Wikipedia

MPEG-2 Levels

Abbr  Name        Frame rates (Hz)                          Max horiz  Max vert  Max luminance samples/sec  Max bit rate in Main profile (Mbit/s)
LL    Low Level   23.976, 24, 25, 29.97, 30                 352        288       3,041,280                  4
ML    Main Level  23.976, 24, 25, 29.97, 30                 720        576       10,368,000                 15
H-14  High 1440   23.976, 24, 25, 29.97, 30, 50, 59.94, 60  1440       1152      47,001,600                 60
HL    High Level  23.976, 24, 25, 29.97, 30, 50, 59.94, 60  1920       1152      62,668,800                 80

Except in High profile, where the luminance constraint is:
• ML: 14,475,600 for 4:2:0; 11,059,200 for 4:2:2
• H-14: 62,668,800 for 4:2:0
• HL: 83,558,400 for 4:2:0

(Max luminance in samples/sec = approximately height x width x frame rate)

* Wikipedia

MPEG-2 Profiles and Levels

• Profile: Well-defined sets of compression techniques (i.e. tools)
  – Simple (SP)
  – Main (MP)
  – 422P
  – SNR
  – Spatial
  – High (HP)
  – Multi-view (MVP)
• Level: Pre-defined formats (picture size, bit rate, etc.)
  – Low: Low quality video, videoconferencing; up to 352 x 240 @ 30 fps
  – Main (ML): DVD, cable, etc.; up to 720 x 480 @ 30 fps (NTSC)
  – High-1440 (H-14): HDTV; 1440 x 1152 @ 60 fps
  – Others are less common
• Common combinations:
  – MP@ML for SDTV distribution, DVD, etc.
  – MP@HL-1440 for HDTV distribution

MPEG-4

• Emerging ISO audio-visual coding standard for multimedia
• Higher compression version of MPEG-2
• Supports new ways for communication, access and manipulation of digital audio-visual data
• Application areas are in the merging worlds of computers, telecom and TV/film
• Defines a popular new audio compression standard
  – Advanced Audio Coding (AAC)

(Diagram: overlapping worlds of ‘TV/film’ (AV-data), ‘Telecom’ (wireless), and ‘Computer’ (interactivity))

MPEG-4 Video Compression

• Based on MPEG-2
• Includes bit rates below 64kbps
• Standardized scene description
• Parametric descriptions of human face and body
• Resynchronization/data recovery tools
• Media objects to represent aural, visual, or audiovisual content
  – Organized in hierarchical fashion
  – Random access to objects
  – Objects have spatial and temporal scalability

MPEG-4 Encoder

MPEG4WORLD

• Single object mode– Treats entire frame as one object (MPEG-2)

• Multiple object mode– code shape, motion and texture of each of the objects– each object coded independently in the bitstream– graphics and video objects can be coded using different techniques– each object can be coded at different spatial and temporal resolutions– the final image is a composition of the individual objects

MPEG-4 Profiles / Levels

MPEG-4 Simple Profile (SP) Levels

Level   Typical Image Size   Typical Frame Rate   Max Bitrate
L1      176x144              15 frames/sec        64 kbps
L2      352x288              15 frames/sec        128 kbps
L3      352x288              30 frames/sec        384 kbps

MPEG-4 Advanced Simple Profile (ASP) and Fine Grain Scalability Profile (FGSP) Levels

Level   Typical Image Size   Max Bitrate
L0      176x144              128 kbps
L1      176x144              128 kbps
L2      352x288              384 kbps
L3      352x288              768 kbps
L4      352x576              3 Mbps
L5      720x576              8 Mbps


History of Video Compression Standards

[Figure: timeline of standards, 1984 to 2004]
ITU-T: H.261, H.263, H.263+, H.263++
ITU-T/ISO: MPEG-2/H.262, H.264 (H.26L)/MPEG-4 Part 10 AVC
ISO: MPEG-1, MPEG-4

• ISO: International Organization for Standardization
– JPEG: Joint Photographic Experts Group
– MPEG: Moving Picture Experts Group

• ITU: International Telecommunication Union
– Standardization of telecommunications formats
– Example: H.320 for ISDN audio-video teleconferencing




H.261
• Standard for videoconferencing
• Low delay, symmetric applications
• Input format CIF (352x288), YUV, 30 frames/sec
• Typical output rate is a multiple of 64 Kbps, 10 frames/sec
• Block-based DCT and Motion Compensation
• No B frames, integer pel motion estimation

H.263
• A standard for videophone
• Very low bit-rate, low delay, symmetric applications
• Input format QCIF (176x144), YUV, 30 frames/sec
• Typical output rate < 28.8 Kbps, 6-10 frames/sec
• Block-based DCT and Motion Compensation
• Syntax has more in common with MPEG than with H.261
• 3D VLC, half pixel (half-pel) Motion Estimation
• Different options:
– Overlapped Motion Compensation
– PB frames
– Arithmetic Encoding




ITU: H.26L (now H.264)
ISO: MPEG-4 Part 10 Advanced Video Coding (AVC)

• Both organizations formed the Joint Video Team (JVT) to define their next generation video codec
• Major benefits:
– More than 50% in bit rate savings as compared to MPEG-2 for the same video image quality
– Maintains high quality video at low and high bit rates
– Network Abstraction Layer (NAL) allows H.264 to be transported over different networks
– Enhanced error robustness, including mobile networks
• Real-time, low end-to-end delay
• Variety of source materials (memory cards, streaming, etc.)


H.264/AVC Enhancements

[Figure: H.264/AVC encoder block diagram. Blocks: Video Source, Coding Control, Intra/Inter selection, Transform, Quantization, Entropy Coding, Inverse Quantization, Inverse Transform, Intra Prediction, De-Blocking Filter, Frame Store, Motion Compensation, Motion Estimation. Signals: Quantized Transform Coefficients, Motion Vectors, Predicted Frame, Bit Stream Out. Blocks carry callouts 1 to 7; two of them are marked "New".]

H.264/AVC Enhancements (Details)
• 1 and 6 are new, the others are improved

1. Intra Prediction Modes (9 4x4 & 4 16x16 modes = 13 modes)
2. 4x4 Integer Transform (fixed)
3. Quantization step sizes increased at a compounding rate of approximately 12.5%
4. Different entropy encoding methods
   • Single Universal VLC and Context Adaptive VLC
   • Context-Based Adaptive Binary Arithmetic Coding (CABAC)
5. No Mismatch between Encoder and Decoder Transforms
6. Signal-adaptive De-blocking filter
7. Improved Motion Estimation
   • Seven block sizes and shapes
   • Multiple reference picture selection
   • 1/4-pel (6 tap filter) Motion Estimation accuracy
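
The compounding step size in item 3 can be sketched numerically. The model below is an assumption, not something stated on the slide: it uses the commonly published behaviour that the H.264 quantizer step size doubles for every increase of 6 in the quantization parameter (QP), with a step of 0.625 at QP 0. The standard defines the steps through tables, so intermediate values differ slightly from this smooth curve.

```python
def approx_qstep(qp: int, base: float = 0.625) -> float:
    """Smooth approximation: the step size doubles every 6 QP steps.

    `base` (step size at QP 0) is an assumed value from published tables.
    """
    return base * 2 ** (qp / 6)

# Growth from one QP to the next is 2**(1/6), i.e. roughly 12% per step,
# in line with the "approximately 12.5%" compounding rate quoted above.
growth = approx_qstep(1) / approx_qstep(0) - 1
print(f"growth per QP step: {growth:.1%}")

for qp in (0, 6, 12, 24, 36, 48):
    print(f"QP {qp:2d}: step size ~ {approx_qstep(qp):.3f}")
```

A compounding scale like this gives fine control at low QP (high quality) and coarse control at high QP, which a linear scale cannot do with the same number of steps.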

Let’s take a closer look at block sizes.


Block Based Motion Compensation

H.263
• 16x16 block size
• 1 motion vector per macroblock

H.264
• 7 block sizes
• Up to 16 motion vectors per macroblock
• 15% bit-rate savings

[Figure: a single 16x16 macroblock (H.263) compared with H.264 macroblock partitions ranging from 16x16 down to 4x4]
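
The link between "7 block sizes" and "up to 16 motion vectors per macroblock" is simple arithmetic. The sketch below assumes the usual H.264 partition list (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4) and one motion vector per partition when the whole macroblock uses a single partition size; the slide itself does not enumerate the sizes.

```python
MACROBLOCK = 16  # luma macroblock is 16x16 samples

# Assumed list of the seven H.264 partition sizes (width, height)
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def vectors_per_macroblock(width: int, height: int) -> int:
    """Motion vectors needed if the macroblock is tiled with one block size."""
    return (MACROBLOCK // width) * (MACROBLOCK // height)

for w, h in BLOCK_SIZES:
    print(f"{w:2d}x{h:<2d}: {vectors_per_macroblock(w, h):2d} motion vector(s)")
```

Smaller partitions follow detailed motion more closely but cost more bits for vectors, so an encoder typically picks the partitioning per macroblock.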

Multiple Reference Frames

H.263 (MPEG2)
• Single reference frame

H.264
• Selection of multiple reference frames


H.264: Baseline & Main Profiles

Common Features
• Motion Prediction: 7 block sizes, ¼ sample accuracy, multiple reference frames
• Intra Prediction: 17 modes
• Reversible Transform & Non-uniform Quantization
• Universal & Context Adaptive VLC (UVLC/CAVLC)
• Loop (de-blocking) Filter

Baseline only
• Arbitrary Slice Ordering
• Redundant Slices
• Flexible Macroblock Ordering

Main only
• B pictures: several prediction modes
• Context Adaptive Binary Arithmetic Coding (CABAC)
• Weighted Prediction
• Adaptive Frame/Field Coding
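
The ¼ sample accuracy listed among the common features depends on interpolating samples between pixel positions. The following one-dimensional sketch assumes the commonly published H.264 luma filter, taps (1, -5, 20, 20, -5, 1) divided by 32 for half-sample positions, with quarter-sample positions formed by averaging neighbours. The slides only mention a "6 tap filter", so the coefficients and rounding here are assumptions, and the helper names are illustrative.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed 6-tap half-sample filter, sum = 32

def clip255(value: int) -> int:
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1]."""
    if i < 2 or i + 3 >= len(row):
        raise ValueError("need two samples to the left and three to the right")
    acc = sum(t * p for t, p in zip(TAPS, row[i - 2:i + 4]))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding

def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1

edge = [10, 10, 10, 50, 50, 50]
print(half_sample(edge, 2), quarter_sample(edge, 2))
```

On a flat area the filter returns the input value unchanged, since the taps sum to 32.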

For a full -- but difficult to read list ...

H.264 Profiles

Feature                             Baseline  Extended  Main  High  High 10  High 4:2:2  High 4:4:4 Predictive
I and P Slices                      Yes       Yes       Yes   Yes   Yes      Yes         Yes
B Slices                            No        Yes       Yes   Yes   Yes      Yes         Yes
SI and SP Slices                    No        Yes       No    No    No       No          No
Multiple Reference Frames           Yes       Yes       Yes   Yes   Yes      Yes         Yes
In-Loop Deblocking Filter           Yes       Yes       Yes   Yes   Yes      Yes         Yes
CAVLC Entropy Coding                Yes       Yes       Yes   Yes   Yes      Yes         Yes
CABAC Entropy Coding                No        No        Yes   Yes   Yes      Yes         Yes
Flexible Macroblock Ordering (FMO)  Yes       Yes       No    No    No       No          No
Arbitrary Slice Ordering (ASO)      Yes       Yes       No    No    No       No          No
Redundant Slices (RS)               Yes       Yes       No    No    No       No          No
Data Partitioning                   No        Yes       No    No    No       No          No
Interlaced Coding (PicAFF, MBAFF)   No        Yes       Yes   Yes   Yes      Yes         Yes
4:2:0 Chroma Format                 Yes       Yes       Yes   Yes   Yes      Yes         Yes
Monochrome Video Format (4:0:0)     No        No        No    Yes   Yes      Yes         Yes
4:2:2 Chroma Format                 No        No        No    No    No       Yes         Yes
4:4:4 Chroma Format                 No        No        No    No    No       No          Yes
8 Bit Sample Depth                  Yes       Yes       Yes   Yes   Yes      Yes         Yes
9 and 10 Bit Sample Depth           No        No        No    No    Yes      Yes         Yes
11 to 14 Bit Sample Depth           No        No        No    No    No       No          Yes
8x8 vs. 4x4 Transform Adaptivity    No        No        No    Yes   Yes      Yes         Yes
Quantization Scaling Matrices       No        No        No    Yes   Yes      Yes         Yes
Separate Cb and Cr QP control       No        No        No    Yes   Yes      Yes         Yes
Separate Color Plane Coding         No        No        No    No    No       No          Yes
Predictive Lossless Coding          No        No        No    No    No       No          Yes

* Wikipedia




VC1 Compression Standard
• VC-1 is an evolution of the conventional DCT-based video codec design also found in H.261, H.263, MPEG-1, MPEG-2, and MPEG-4.
– Same quality in 1/2 the bitrate of MPEG-2 and MPEG-4
• It is widely characterized as an alternative to the latest ITU-T and MPEG video codec standard known as H.264/MPEG-4 AVC.
– Premise: Lower complexity than H.264 MP with comparable compression efficiency
– Main goal of VC-1 development and standardization is to support the compression of interlaced content without first converting it to progressive
• Although widely considered Microsoft’s product, VC-1 actually has 15 companies in its patent pool (as of August 17, 2006).
• As an SMPTE standard, VC-1 is open to implementation by anyone.
• Both HD DVD and Blu-ray Disc have adopted VC-1 (as well as H.264) as a mandatory video standard, meaning their video playback devices will be capable of decoding and playing video content compressed using VC-1.


VC1 Profiles

Feature                                  Simple  Main  Advanced
Baseline intra frame compression         Yes     Yes   Yes
Variable-sized transform                 Yes     Yes   Yes
16-bit transform                         Yes     Yes   Yes
Overlapped transform                     Yes     Yes   Yes
4 motion vector per macroblock           Yes     Yes   Yes
¼ pixel luminance motion compensation    Yes     Yes   Yes
¼ pixel chrominance motion compensation  No      Yes   Yes
Start codes                              No      Yes   Yes
Extended motion vectors                  No      Yes   Yes
Loop filter                              No      Yes   Yes
Dynamic resolution change                No      Yes   Yes
Adaptive macroblock quantisation         No      Yes   Yes
B frames                                 No      Yes   Yes
Intensity compensation                   No      Yes   Yes
Range adjustment                         No      Yes   Yes
Field and frame coding modes             No      No    Yes
GOP Layer                                No      No    Yes
Display metadata                         No      No    Yes

* Wikipedia




Other Compression Options/Topics
• Encoders vs. Decoders
• JPEG2000
• DV
• Real Networks
• DivX
• On2




* Doom9

Comparison

Compression Ratio Rules of Thumb
• Compression ratios to maintain excellent quality:
– 10:1 for general images using JPEG
– 30:1 for general video using H.263 and MPEG-2
– 50:1 for general video using H.264 / MPEG-4 AVC

• Emerging standards offer further improvements (e.g., JPEG2000 at 40:1 generally looks much better than JPEG at 40:1, and H.264 / MPEG-4 AVC produces excellent quality at 60:1 for some types of video content)

• Application-specific conditions can enable much higher compression ratios (e.g., 1000’s:1 when nothing is moving)

• Event-based methods that selectively identify certain important images to keep while discarding others
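
To put the rule-of-thumb ratios into bit rates, the sketch below applies them to one example source. The source format is an assumption chosen for illustration: 720 x 480 at 30 frames/sec with 4:2:0 sampling (12 bits per pixel on average). Applying the JPEG ratio frame by frame is likewise only for comparison.

```python
def raw_bit_rate(width: int, height: int, fps: float, bits_per_pixel: int = 12) -> float:
    """Uncompressed bit rate in bits/sec (12 bits/pixel assumes 4:2:0, 8-bit)."""
    return width * height * bits_per_pixel * fps

# Ratios taken from the rules of thumb above
RULES_OF_THUMB = {
    "JPEG (frame by frame)": 10,
    "H.263 / MPEG-2": 30,
    "H.264 / MPEG-4 AVC": 50,
}

raw = raw_bit_rate(720, 480, 30)
print(f"raw: {raw / 1e6:.1f} Mbit/s")
for codec, ratio in RULES_OF_THUMB.items():
    print(f"{codec}: about {raw / ratio / 1e6:.2f} Mbit/s at {ratio}:1")
```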


Network Bit Rate vs. Compression Ratio

Maximum theoretical frame rates for transmitting generic VHS-quality digital video data (352x240 frames) using various networks and compression techniques

                                  Image Compression   Video Compression   Adv. Video Compression
                                  Compressed 10:1     Compressed 30:1     Compressed 60:1
                                  (e.g., JPEG)        (e.g., MPEG-4)      (e.g., H.264)
Network                  Kbps     Frames / Second     Frames / Second     Frames / Second
GSM Digital Cellular     14       1 / 7               1 / 2               1 / 1
56K Modem (PSTN)         56       1 / 2               2 / 1               4 / 1
DSL or Cable Up-Link     128      1 / 1               4 / 1               8 / 1
Cellphone TV             300      3 / 1               9 / 1               18 / 1
DSL Down-Link            768      8 / 1               23 / 1              45 / 1
Wireless LAN (802.11)    11,000   109 / 1             326 / 1             652 / 1
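
The frame rates in this table can be approximated from the network bit rate and the compression ratio. The slides do not state the pixel depth; the sketch assumes 12 bits per pixel (4:2:0 sampling), which reproduces most of the tabulated values to within rounding. Names are illustrative.

```python
FRAME_BITS = 352 * 240 * 12  # assumed: one raw 352x240 frame at 12 bits/pixel

NETWORKS_KBPS = {
    "GSM Digital Cellular": 14,
    "56K Modem (PSTN)": 56,
    "DSL or Cable Up-Link": 128,
    "Cellphone TV": 300,
    "DSL Down-Link": 768,
    "Wireless LAN (802.11)": 11_000,
}

def max_frame_rate(kbps: float, compression_ratio: float) -> float:
    """Frames/sec that fit in the channel if each frame shrinks by the ratio."""
    return kbps * 1000 * compression_ratio / FRAME_BITS

for name, kbps in NETWORKS_KBPS.items():
    rates = [max_frame_rate(kbps, ratio) for ratio in (10, 30, 60)]
    print(f"{name:24s}" + "".join(f"{rate:10.2f}" for rate in rates))
```

Values below 1 correspond to the table's "1 frame per N seconds" entries; for example, about 0.14 frames/sec is one frame every 7 seconds.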

Conclusions
• H.264 and WMV9 offer good quality at 60:1 compression
– Represents a 2X improvement over previous generation codecs
• Many factors to consider in choosing a video codec
– Latency, source content characteristics, application features (e.g., trick plays), processor performance, memory and compression requirements
• Proliferation of standards makes selecting a standard difficult
– Growing number of video compression standards
– Hardware decisions often made far in advance of product deployment
• Increasing demand for programmable platforms
– Support for multiple standards/algorithms and formats
– Quick deployment of new standards such as H.264 and VC1
– S/W upgrades allow for quality improvements and changing standards
– Full algorithm control
   • Ability to tailor encoder for needs of the applications
   • Access motion vectors to determine activity


To Learn More About Video …
“The Art of Digital Video”, John Watkinson
“Digital Television”, H. Benoit
“Video Demystified”, Keith Jack
“Video Compression Demystified”, Peter Symes
MPEG: www.mpeg.org
JPEG/JBIG Committees: www.jpeg.org

Credits (1)
• UB Video H.264 White Papers
• MPEG information website: www.mpeg.org
• JPEG/JBIG Committees website: www.jpeg.org
• Digital Video and HDTV – Charles Poynton
• Video Demystified – Keith Jack
• The Art of Digital Video – John Watkinson
• Digital Television – H. Benoit
• Video Compression Demystified – Peter Symes
• UB Video / Ateme / Ingenient / MangoDSP


Credits (2)

Permission granted by Scott Specker

Images used by permission from Secrets of Home Theater:

http://www.hometheaterhifi.com/volume_7_4/dvd-benchmark-part-5-progressive-10-2000.html


Clip art & photos shipped with Microsoft Office, or downloaded from Microsoft Clip Art website

Date post:	29-Mar-2016