Page 1: Compressionbasics

Compression Fundamentals

Page 2: Compressionbasics

Topics today…

• Why Compression?
• Information Theory Basics
• Classification of Compression Algorithms
• Data Compression Model
• Compression Performance

Page 3: Compressionbasics

Why Compression?

• Digital representation of analog signals requires huge storage
– A high-quality audio signal requires about 1.5 megabits/sec
– A low-resolution movie (30 frames per second, 640 x 480 pixels per frame, 24 bits per pixel) requires
  • about 221 megabits per second!!
  • roughly 100 gigabytes per hour

• Transferring such files through the limited bandwidth available on networks is challenging.

Page 4: Compressionbasics

Why Compression?

Source                            Bit rate for uncompressed source (approximate)

Telephony (200–3400 Hz)           8,000 samples/second x 12 bits/sample = 96 kbps

Wideband audio (20–20,000 Hz)     44,100 samples/second x 2 channels x 16 bits/sample = 1.412 Mbps

Images                            512 x 512 pixel color image x 24 bits/pixel = 6.3 Mbits/image

Video                             640 x 480 pixel color image x 24 bits/pixel x 30 images/second = 221 Mbps
                                  (a 650-megabyte CD can therefore store only about 23.5 seconds of such video)

HDTV                              1280 x 720 pixel color image x 24 bits/pixel x 60 images/second = 1.3 Gbps

Table 1: Uncompressed source data rates
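
The rates in Table 1 follow directly from the sampling parameters. A small sketch (not part of the original slides) that reproduces the approximate figures:

```python
# Reproduces the approximate uncompressed data rates from Table 1.
# Parameters are taken from the table; printed figures are rounded.

def rate_bps(*factors):
    """Multiply the sampling parameters to get a raw bit rate in bits/second."""
    result = 1
    for f in factors:
        result *= f
    return result

telephony = rate_bps(8000, 12)             # samples/s x bits/sample
wideband  = rate_bps(44100, 2, 16)         # samples/s x channels x bits/sample
image     = 512 * 512 * 24                 # bits per image (not a rate)
video     = rate_bps(640, 480, 24, 30)     # pixels x bits/pixel x frames/s
hdtv      = rate_bps(1280, 720, 24, 60)

print(f"Telephony : {telephony / 1e3:.0f} kbps")       # ~96 kbps
print(f"Wideband  : {wideband / 1e6:.3f} Mbps")        # ~1.411 Mbps
print(f"Image     : {image / 1e6:.1f} Mbits/image")    # ~6.3 Mbits
print(f"Video     : {video / 1e6:.0f} Mbps")           # ~221 Mbps
print(f"HDTV      : {hdtv / 1e9:.2f} Gbps")            # ~1.33 Gbps

# A 650 MB CD holds 650 * 8 = 5200 Mbits, i.e. only ~23.5 seconds of raw video.
print(f"CD holds  : {650 * 8e6 / video:.1f} s of uncompressed video")
```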

Page 5: Compressionbasics

The compression problem

• Efficient digital representation of a source

• Data compression is the representation of the source in digital form with as few bits as possible while maintaining an acceptable loss in fidelity. The source can be data, still images, speech, audio, video or whatever signal needs to be stored & transmitted.

Page 6: Compressionbasics

Synonyms for Data Compression

• Signal compression & signal coding

• Source coding & source coding with a fidelity criterion (in information theory)

• Noiseless & noisy source coding (lossless & lossy compression)
– "Noise" here refers to reconstruction noise

• Bandwidth compression, redundancy removal (more dated terminology, from the 1980s)

Page 7: Compressionbasics

Types of Data Compression Problem

• Distortion-rate problem
– Given a constraint on transmitted data rate or storage capacity, the problem is to compress the source at or below this rate but at the highest fidelity possible
– Examples: voice mail, video conferencing, digital cellular

• Rate-distortion problem
– Given a constraint on the fidelity, the problem is to achieve it with as few bits as possible
– Example: CD-quality audio

Page 8: Compressionbasics

Information Theory Basics

• Representation of data is the combination of information and redundancy

• Data compression is essentially a redundancy-reduction technique

• A data compression scheme can be broadly divided into two phases
– Modeling
– Coding

Page 9: Compressionbasics

Information Theory Basics

• In the modeling phase, information about redundancy is analyzed and represented as a model
– This can be done by observing the empirical distribution of the symbols the source generates

• In the coding phase, the difference between the actual data and the model is coded

Page 10: Compressionbasics

Discrete Memoryless Model

• A source is discrete memoryless if it generates symbols that are statistically independent of one another

• It is described by the source alphabet A = {a1, a2, a3, …, an} and the associated probabilities P = (p(a1), p(a2), p(a3), …, p(an))

• The amount of information content for a source symbol ai is

  I(ai) = −log2 p(ai)

• The base-2 logarithm indicates that the information content is expressed in bits. Higher-probability symbols are coded with fewer bits.

Page 11: Compressionbasics

Discrete Memoryless Model [2]

• Averaging the information content over all symbols, we get the entropy E as follows:

  E = Σi p(ai) I(ai) = −Σi p(ai) log2 p(ai)

• Hence, entropy is the expected length of a binary code over all the symbols.

• Estimation of entropy depends on the observations and the assumptions made about the structure of the source symbols
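
A minimal sketch (not from the slides) showing how the information content I(ai) and the entropy E can be computed for a discrete memoryless source; the alphabet and probabilities are illustrative only:

```python
import math

def information_content(p):
    """I(a_i) = -log2 p(a_i): bits carried by a symbol of probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """E = -sum_i p(a_i) * log2 p(a_i): expected bits per source symbol."""
    return sum(p * information_content(p) for p in probabilities if p > 0)

# Example alphabet probabilities (illustrative only).
P = {"a1": 0.65, "a2": 0.20, "a3": 0.10, "a4": 0.05}

for symbol, p in P.items():
    print(f"I({symbol}) = {information_content(p):.3f} bits")

print(f"E = {entropy(P.values()):.2f} bits/symbol")   # ~1.42 for this example
```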

Page 12: Compressionbasics

Noiseless source coding theorem

• The Noiseless Source Coding Theorem states that any source can be losslessly encoded with a code whose average number of bits per source symbol is arbitrarily close to, but not less than, the source entropy E in bits, by coding infinitely long extensions of the source.
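
As an illustration (not part of the original slides): for a source with dyadic probabilities p = {0.5, 0.25, 0.125, 0.125}, the entropy is

  E = 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75 bits/symbol,

and the prefix code {0, 10, 110, 111} achieves an average codeword length of exactly 1.75 bits/symbol, meeting the entropy bound without coding extensions of the source. For non-dyadic probabilities the bound is only approached as longer and longer blocks of symbols are coded together.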

Page 13: Compressionbasics

Entropy Reduction

• Consider a discrete memoryless source with source alphabet A1 = {α, β, γ, δ} and probabilities p(α) = 0.65, p(β) = 0.20, p(γ) = 0.10, p(δ) = 0.05 respectively

• The entropy of this source is

  E = −(0.65 log2 0.65 + 0.20 log2 0.20 + 0.10 log2 0.10 + 0.05 log2 0.05)
    = 1.42 bits per symbol

• A data source of 2000 symbols can be represented using 2000 x 1.42 = 2840 bits

Page 14: Compressionbasics

Entropy Reduction [2]

• Now assume we know something about the structure of the sequence
– Alphabet A2 = {0, 1, 2, 3}
– Sequence D = 0 1 1 2 3 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3
– p(0) = 0.05, p(1) = 0.10, p(2) = 0.20, and p(3) = 0.65
– E = 1.42 bits per symbol

• Exploiting the correlation between consecutive samples, we attempt to reduce it by computing the residual ri = si − si−1 for each sample si

Page 15: Compressionbasics

Entropy Reduction [3]

• Now
– D = 0 1 0 1 1 0 0 0 0 0 0 0 0 −1 0 0 1 0 0 0
– A2 = {−1, 1, 0}
– p(−1) = 0.05, p(1) = 0.2, and p(0) = 0.75
– E = 0.992 bits per symbol

• With an appropriate entropy coding technique, maximum compression can then be achieved
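
The two entropy figures above can be reproduced with a short sketch (not part of the original slides): it computes the empirical entropy of D, applies the residual transform ri = si − si−1, and recomputes the entropy of the residual sequence.

```python
from collections import Counter
import math

def empirical_entropy(seq):
    """Entropy in bits/symbol from the empirical symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

D = [0, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3]

# Residuals r_i = s_i - s_(i-1); the first sample is kept as-is.
R = [D[0]] + [D[i] - D[i - 1] for i in range(1, len(D))]

print("residual sequence:", R)
print(f"entropy of D: {empirical_entropy(D):.3f} bits/symbol")   # ~1.42
print(f"entropy of R: {empirical_entropy(R):.3f} bits/symbol")   # ~0.992
```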

Page 16: Compressionbasics

Unique Decipherability

• Consider the following table of codewords for the symbols α, β, γ, δ (table not reproduced in this transcript)

• Symbols are encoded with codes A, B and C.

• Consider the string S = ααγαβαδ

Page 17: Compressionbasics

Unique Decipherability [2]

• Deciphering C_A(S) and C_B(S) is unambiguous, and we recover the string S

• C_C(S) is ambiguous and not uniquely decipherable

• Fixed-length codes are always uniquely decipherable.

• Not all variable-length codes are uniquely decipherable.

Page 18: Compressionbasics

Unique Decipherability [3]

• Uniquely decipherable codes maintain the prefix property, i.e. no codeword in the code set forms the prefix of another distinct codeword (a simple check is sketched below)

• Popular variable-length coding techniques
– Shannon-Fano coding
– Huffman coding
– Elias coding
– Arithmetic coding

• Fixed-length codes can be treated as a special case of uniquely decipherable variable-length codes.
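
The prefix-property check mentioned above can be done by comparing every pair of codewords. A minimal sketch (the example code sets are made up, not the codes A, B, C from the earlier table):

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another distinct codeword."""
    for w1 in codewords:
        for w2 in codewords:
            if w1 != w2 and w2.startswith(w1):
                return False
    return True

# Illustrative code sets only.
print(is_prefix_code(["00", "01", "10", "11"]))    # True  - fixed length
print(is_prefix_code(["0", "10", "110", "111"]))   # True  - Huffman-like
print(is_prefix_code(["0", "01", "11"]))           # False - "0" prefixes "01"
```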

Page 19: Compressionbasics

Classification of compression algorithms

[Figure: CODEC block diagram]

Page 20: Compressionbasics

Classification of compression algorithms [2]

• Data compression is a method that takes an input data D and generates a shorter representation of the data, c(D), with fewer bits than D

• The reverse process is called decompression, which takes the compressed data c(D) and generates or reconstructs the data D′

• Sometimes the compression (coding) and decompression (decoding) systems together are called a "CODEC"

Page 21: Compressionbasics

Classification of compression algorithms [3]

• If the reconstructed data D′ is an exact replica of the original data D, we call the algorithm applied to compress D and decompress c(D) lossless. Otherwise the algorithms are lossy

• Text, scientific data and medical images are some of the applications that require lossless compression

• Compression can be static or dynamic, depending on the coding scheme used
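
As a concrete, minimal illustration of lossless compression (not from the slides), using Python's standard-library zlib codec on a made-up, redundant input: the compressor produces c(D), the decompressor reconstructs D′, and for a lossless scheme D′ must equal D exactly.

```python
import zlib

D = b"this is a small, fairly redundant test message " * 50   # made-up input

c_D = zlib.compress(D)           # compression: c(D)
D_prime = zlib.decompress(c_D)   # decompression: D'

print("original size    :", len(D), "bytes")
print("compressed size  :", len(c_D), "bytes")
print("compression ratio:", round(len(D) / len(c_D), 2))
print("lossless (D' == D):", D_prime == D)   # True - exact reconstruction
```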

Page 22: Compressionbasics

Data compression model

A data compression system mainly consists of three major steps
– removal or reduction in data redundancy
– reduction in entropy
– entropy encoding

Page 23: Compressionbasics

Data compression model
REDUCTION IN DATA REDUNDANCY

• Removal or reduction in data redundancy is typically achieved by transforming the original data from one form or representation to another

• Popular transformation techniques are
– Discrete Cosine Transform (DCT)
– Discrete Wavelet Transform (DWT), etc.
This step leads to the reduction of entropy

• For lossless compression this transformation is completely reversible

Page 24: Compressionbasics

Data compression model
REDUCTION IN ENTROPY

• A non-reversible process

• Achieved by dropping insignificant information in the transformed data (Lossy!!!)

• Done by quantization techniques

• The amount of quantization dictates the quality of the reconstructed data

• The entropy of the quantized data is lower than that of the original data, hence more compression (a small transform-and-quantize sketch follows).
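
A hedged sketch of the transform-plus-quantization pipeline described in the last two slides, assuming NumPy and SciPy are available; the test signal and the quantization step size are illustrative only:

```python
# Step 1: a DCT removes redundancy in a correlated signal.
# Step 2: uniform quantization of the coefficients (the lossy step) lowers entropy.
import numpy as np
from scipy.fft import dct, idct

def entropy_bits(symbols):
    """Empirical entropy in bits per symbol."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A smooth, highly correlated source (e.g., one scan line of an image).
n = 256
signal = np.round(100 * np.sin(np.linspace(0, 4 * np.pi, n)) + 128).astype(int)

# Redundancy reduction: the DCT packs most of the energy into a few coefficients.
coeffs = dct(signal.astype(float), norm="ortho")

# Entropy reduction: uniform quantization with a made-up step size.
step = 8.0
quantized = np.round(coeffs / step).astype(int)

print("entropy of raw samples      :", round(entropy_bits(signal), 3), "bits/symbol")
print("entropy of quantized coeffs :", round(entropy_bits(quantized), 3), "bits/symbol")

# Decoder side: dequantize and inverse-transform; the loss shows up as RMSE.
reconstructed = idct(quantized * step, norm="ortho")
rmse = float(np.sqrt(np.mean((signal - reconstructed) ** 2)))
print("reconstruction RMSE         :", round(rmse, 3))
```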

Page 25: Compressionbasics

Compression Performance

• The performance measures of data compression algorithms can be looked at from different perspectives depending on the application requirements
– amount of compression achieved
– objective and subjective quality of the reconstructed data
– relative complexity of the algorithm
– speed of execution, etc.

Page 26: Compressionbasics

Compression Performance
AMOUNT OF COMPRESSION ACHIEVED

• Compression ratio: the ratio of the number of bits needed to represent the original data to the number of bits needed to represent the compressed data

• The compression ratio achievable with a lossless compression scheme is totally input-data dependent.

• Sources with less redundancy have more entropy and hence are more difficult to compress
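
As a worked illustration using the numbers from the earlier entropy-reduction example (not an additional claim from the slides): storing 2000 symbols from the four-symbol alphabet with a fixed-length 2-bit code takes 4000 bits, while an ideal entropy coder needs about 2000 x 1.42 ≈ 2840 bits, giving a compression ratio of roughly 4000 / 2840 ≈ 1.41.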

Page 27: Compressionbasics

Compression Performance
SUBJECTIVE QUALITY METRIC

• MOS: mean observer score, or mean opinion score, is a common measure
– A statistically significant number of observers is randomly chosen to evaluate the visual quality of the reconstructed images.
– Each observer assigns a numeric score to each reconstructed image based on his or her perception of the quality of the image, say within a range of 1–5 (5 being the highest quality and 1 the worst).
– MOS is the average of these scores

Page 28: Compressionbasics

Compression Performance
OBJECTIVE QUALITY METRIC

• Common quality metrics are
– root-mean-squared error (RMSE)
– signal-to-noise ratio (SNR)
– peak signal-to-noise ratio (PSNR)

• If I is an M × N image and I′ is the corresponding reconstructed image after compression and decompression, RMSE is calculated by

  RMSE = sqrt( (1 / MN) Σi Σj [ I(i, j) − I′(i, j) ]² )

• The SNR in decibel units (dB) is then commonly expressed relative to the RMS value of the original image:

  SNR = 20 log10 ( sqrt( (1 / MN) Σi Σj I(i, j)² ) / RMSE ) dB
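
A minimal sketch of these metrics (not from the slides), assuming NumPy and an 8-bit image whose peak value is 255 for PSNR; the two small "images" are random stand-ins:

```python
import numpy as np

def rmse(original, reconstructed):
    """Root-mean-squared error between an M x N image and its reconstruction."""
    diff = original.astype(float) - reconstructed.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def snr_db(original, reconstructed):
    """SNR in dB: RMS of the original image over the RMSE of the error."""
    signal_rms = float(np.sqrt(np.mean(original.astype(float) ** 2)))
    return 20.0 * np.log10(signal_rms / rmse(original, reconstructed))

def psnr_db(original, reconstructed, peak=255.0):
    """PSNR in dB, assuming an 8-bit image with peak value 255."""
    return 20.0 * np.log10(peak / rmse(original, reconstructed))

# Stand-in 8-bit image and a slightly degraded reconstruction.
rng = np.random.default_rng(0)
I = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
I_prime = np.clip(I.astype(int) + rng.integers(-3, 4, size=I.shape), 0, 255)

print(f"RMSE = {rmse(I, I_prime):.3f}")
print(f"SNR  = {snr_db(I, I_prime):.2f} dB")
print(f"PSNR = {psnr_db(I, I_prime):.2f} dB")
```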

Page 29: Compressionbasics

Compression Performance
CODING DELAY AND COMPLEXITY

• Coding delay: a performance measure for compression algorithms where interactive encoding and decoding is the requirement (e.g., interactive video teleconferencing, on-line image browsing, real-time voice communication, etc.)

• The more complex the compression algorithm, the greater the coding delay

• Compression system designers therefore often use a less sophisticated algorithm for such compression systems.

Page 30: Compressionbasics

Compression Performance
CODING DELAY AND COMPLEXITY

• Coding complexity: a performance measure considered where the computational requirement to implement the codec is an important criterion

• MOPS (millions of operations per second) and MIPS (millions of instructions per second) are often used to measure compression performance on a specific computing engine's architecture.

Page 31: Compressionbasics

References

1. Chapter 1 of JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures by Tinku Acharya and Ping-Sing Tsai, John Wiley & Sons
   http://discovery.bits-pilani.ac.in/discipline/csis/vimal/course%2006-07%20Second/MMC/Lectures/cf.doc

2. Chapter 1 of Digital Compression for Multimedia: Principles & Standards by Jerry D. Gibson