INFORMATION THEORY & CODING

Dr. Rui Wang
Associate Professor
Department of Electrical and Electronic Engineering
Office: Nanshan i-Park A7-1107
Email: [email protected]
Thanks to Prof. Qi Wang for creating these slides!
Textbooks and References
Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, 2nd Edition, Wiley-Interscience, 2006.
Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, 1st Edition, Tsinghua University Press, 2003.
Raymond W. Yeung, A First Course in Information Theory, Springer, 2002.
F. J. MacWilliams, N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland, 1977.
Shu Lin, D. J. Costello, Error Control Coding, 2nd Edition, Prentice Hall, 2004.
Original papers in IEEE Transactions on Information Theory.
Assessment

• Quiz: starts from the 3rd week, almost every week.
• Homework: starts from the 2nd week, every week.
• Project: report + Matlab simulation.
• Final Exam.
Policy Reminders
Academic dishonesty consists of misrepresentation by deception or by other fraudulent means and can result in serious consequences, e.g., the grade of zero on an assignment, or loss of credit with a notation on the transcript (“Grade of F assigned for academic dishonesty”).
Note to the Reader:
We have drawn on a number of sources to produce these lecture notes.

These lecture notes are a perpetual work in progress. Please report any typos or other errors by email. Thanks!

We try to prepare these lecture notes carefully, but they are NOT intended to replace the textbook.

For more information, please refer to eee.sustc.edu.cn/p/wangrui.
Office hours: drop by or appointment by email.
A Brief History ∗
Ludwig Boltzmann (1844-1906)

1877 – Showed that thermodynamic entropy is related to the statistical distribution of molecular configurations, with increasing entropy corresponding to increasing randomness:

$$S = k_B \log W, \quad \text{where } W = \frac{N!}{\prod_i N_i!}.$$
∗ For a more complete history, one may check the “Timeline of information theory” page on Wikipedia.
Harry Nyquist (1889-1976)
1924 – Nyquist rate and reconstruction of bandlimited signals from their samples. Also stated the formula R = K log m, where R is the rate of transmission, K is a measure of the number of symbols per second, and m is the number of message amplitudes available. The amount of information that can be transmitted is proportional to the product of bandwidth and time of transmission.
Ralph V. L. Hartley (1888-1970)
1928 – (Inventor of the oscillator.) In the paper entitled “Transmission of Information”, proposed the formula H = n log s, where H is the “information” of the message, s is the number of possible symbols, and n is the length of the message in symbols.
Claude E. Shannon (Apr. 30, 1916 – Feb. 24, 2001)
1938 – In his Master's thesis A Symbolic Analysis of Relay and Switching Circuits at MIT, he demonstrated that electrical application of Boolean algebra could construct and resolve any logical, numerical relationship:

“...possibly the most important, and also the most famous, master's thesis of the century.”
1948 – Efficient source representation, reliable information transmission, digitalization – the foundation of communication and information theory. Made the startling discovery that arbitrarily reliable communications are possible at non-zero rates. Prior to Shannon, it was believed that in order to get arbitrarily low probability of error, the transmission rate must go to zero. His paper “A Mathematical Theory of Communication” proved to be the foundation of modern communication theory.
Quotes
“What made possible, what induced the development of coding as a theory, and the development of very complicated codes, was Shannon's Theorem: he told you that it could be done, so people tried to do it.” – Robert Fano

“Before 1948, there was only the fuzziest idea of what a message was. There was some rudimentary understanding of how to transmit a waveform and process a received waveform, but there was essentially no understanding of how to turn a message into a transmitted waveform.” – Robert Gallager
“To make the chance of error as small as you wish? Nobody had ever thought of that. How he got that insight, how he even came to believe such a thing, I don't know. But almost all modern communication engineering is based on that work.” – Robert Fano
A Brief History (cont')

Richard W. Hamming (1915-1998)
1950 R. Hamming – Developed a family of error-correcting codes

1952 D. Huffman – Efficient source encoding

1950-60's Muller, Reed, Solomon, Bose, Ray-Chaudhuri, Hocquenghem – Algebraic codes

1970's Fano, Viterbi – Convolutional codes

1990's Berrou, Glavieux, Gallager, Lin – Near-capacity-achieving coding schemes: Turbo codes, Low-Density Parity-Check codes

2008 E. Arikan – First practical construction of codes achieving capacity for a wide array of channels: Polar codes
An example

Mars, Mariner IV, '64 using no coding
Mars, Mariner VI, '69 using Reed-Muller coding
Saturn, Voyager, '71 using Golay coding
A Communication System
Info. Source: any source of data we wish to transmit or store
Transmitter: maps the data source to the channel alphabet in an efficient manner
Receiver: maps from the channel back to data to ensure “reliable” reception
Destination: data sink
Question: Under what conditions can the output of the source be conveyed reliably to the destination? What does “reliable” mean? Low probability of error? Low distortion?
An Expanded Communication System
What is the ultimate data compression (answer: the entropy H)? What is the ultimate transmission rate of communication (answer: the channel capacity C)?
Encoders
Source Encoder
map from source to bits
“matched” to the information source
Goal: to get an efficient representation of the source (i.e., least number of bits per second, minimum distortion, etc.)
Channel Encoder
map from bits to channel
depends on the channel available (channel model, bandwidth, noise, distortion, etc.). In communication theory, we work with hypothetical channels which in some way capture the essential features of the physical world.
Goal: to get reliable communication
Source Encoder: Examples

Goal: To get an efficient representation (i.e., a small number of bits) of the source on average.

Example 1: An urn contains 8 numbered balls. One ball is selected. How many binary symbols are required to represent the outcome?
Answer: Require 3 bits to represent any given outcome.
Example 2: Consider a horse race with 8 horses. It was determined that the probability of horse i winning is

$$\Pr[\text{horse } i \text{ wins}] = \left( \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{16}, \tfrac{1}{64}, \tfrac{1}{64}, \tfrac{1}{64}, \tfrac{1}{64} \right)$$

Answer 1: Let's try the code of the previous example. To represent a given outcome, the average number of bits is $\bar{\ell} = 3$.
Answer 2: What if we allow the length of each representation to vary amongst the outcomes, e.g., a Huffman code? The average number of bits is

$$\bar{\ell} = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{16} \cdot 4 + 4 \cdot \tfrac{1}{64} \cdot 6 = 2.$$
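As an aside, here is a minimal Python sketch of the Huffman construction for this distribution (the heap-based helper is our own illustration, not the course's required implementation); it recovers codeword lengths 1, 2, 3, 4, 6, 6, 6, 6 and average length 2:

```python
import heapq

def huffman_code_lengths(probs):
    """Build a Huffman code for the given probabilities and
    return the codeword length assigned to each symbol."""
    # Each heap entry: (probability, tiebreak id, list of symbol indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
lengths = huffman_code_lengths(probs)
print(lengths)                                     # [1, 2, 3, 4, 6, 6, 6, 6]
print(sum(p * l for p, l in zip(probs, lengths)))  # 2.0
```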
Definition: The source entropy H(X) of a random variable X with probability mass function p(x) is defined as

$$H(X) = \sum_x p(x) \log_2 \frac{1}{p(x)}.$$

As we will show later in the course, the most efficient representation has an average codeword length $\bar{\ell}$ satisfying

$$H(X) \le \bar{\ell} < H(X) + 1.$$

For the horse race above,

$$H(X) = \tfrac{1}{2} \log 2 + \tfrac{1}{4} \log 4 + \tfrac{1}{8} \log 8 + \tfrac{1}{16} \log 16 + \tfrac{4}{64} \log 64 = 2.$$

The Huffman code is optimal!
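A quick check of this arithmetic (a minimal sketch; the `entropy` helper is ours, not a library routine):

```python
import math

def entropy(probs):
    """H(X) = sum_x p(x) log2(1/p(x)), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
print(entropy(probs))  # 2.0 -- matches the Huffman average length
```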
Information theory and coding deal with the “typical” or expected behavior of the source.

Entropy is a measure of the average uncertainty associated with the source.

Key connection: typical set ↔ Asymptotic Equipartition Property (AEP) ↔ Law of Large Numbers.
Channel Encoder

Goal: To achieve an economical (high rate) and reliable (low probability of error) transmission of bits over a channel.

With a channel code we add redundancy to the transmitted data sequence, which allows for the correction of errors that are introduced by the channel.
Each transmitted codeword is corrupted by the channel, so each codeword corresponds to a set of possible received vectors.

Specify a set of codewords so that at the receiver it is possible to distinguish which element was sent with high probability.

The channel coding theorem tells us the maximum number of such codewords we can define and still maintain completely distinguishable outputs.
Shannon's Channel Coding Theorem: There is a quantity called the capacity, C, of a channel such that for every rate R < C there exists a sequence of $(2^{nR}, n)$ codes ($2^{nR}$ codewords, n channel uses) such that $\Pr[\text{error}] \to 0$ as $n \to \infty$. Conversely, for any code, if $\Pr[\text{error}] \to 0$ as $n \to \infty$, then $R \le C$.
Example: Binary Symmetric Channel
Assume independent channel uses (i.e., memoryless)
Channel randomly flips the bit with probability p
For p = 0 or p = 1, C = 1 bit/channel use (noiseless channel or inversion channel)
Input channel alphabet = Output channel alphabet = {0, 1}
Worst case: p = 1/2, in which case the input and the output are statistically independent (C = 0)
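Looking ahead: the capacity of the BSC will turn out to be C = 1 − H_b(p), with H_b the binary entropy function. A small sketch consistent with the special cases above (the helper is ours; the formula is a standard result proved later in the course):

```python
import math

def bsc_capacity(p):
    """C = 1 - H_b(p) for a binary symmetric channel with
    crossover probability p (standard result, derived later)."""
    if p in (0.0, 1.0):
        return 1.0
    h_b = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h_b

for p in (0.0, 0.1, 0.5, 1.0):
    print(p, bsc_capacity(p))  # 1 at p = 0 or 1, 0 at p = 1/2
```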
Question: How do we devise codes which perform well on this channel?
Repetition Code
In this code, we repeat each bit an odd number of times. The code consists of two possible codewords:
C = {000 · · · 0, 111 · · · 1}
Decoding by a majority voting scheme: if there are more 0's than 1's, then declare 0; otherwise 1.

Suppose that R = 1/3, i.e., the source output is encoded before transmission by repeating each bit three times.

Example:
Source: 10100 → Encoded: 111 000 111 000 000
Received: 101 011 111 001 100 → Decoded: 11100
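A minimal simulation of this scheme (function names and the random channel are our own illustration):

```python
import random

def encode(bits, n=3):
    """Repeat each source bit n times (n odd)."""
    return [b for b in bits for _ in range(n)]

def bsc(bits, p):
    """Flip each bit independently with probability p."""
    return [b ^ (random.random() < p) for b in bits]

def decode(received, n=3):
    """Majority vote within each block of n bits."""
    return [int(sum(received[i:i + n]) > n // 2)
            for i in range(0, len(received), n)]

source = [1, 0, 1, 0, 0]
rx = bsc(encode(source), p=0.1)
print(decode(rx))  # equals source unless some block saw 2+ flips
```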
The bit error probability $\Pr_e$ is:

$$\Pr_e = \Pr[\text{2 channel errors}] + \Pr[\text{3 channel errors}] = 3p^2(1-p) + p^3 = 3p^2 - 2p^3.$$

If p < 1/2, $\Pr_e$ is less than p. So the repetition code improves the channel's reliability, and for small p the improvement is dramatic.
For R = 1/3, the bit error probability is $\Pr_e = 3p^2 - 2p^3$.

For R = 1/(2m+1), the bit error probability is

$$\Pr_e = \sum_{k=m+1}^{2m+1} \Pr[k \text{ errors out of } 2m+1 \text{ transmitted bits}] = \sum_{k=m+1}^{2m+1} \binom{2m+1}{k} p^k (1-p)^{2m+1-k} = \binom{2m+1}{m+1} p^{m+1} + \text{terms of higher degree in } p.$$

Thus, $\Pr_e \to 0$ as $m \to \infty$. However, $R \to 0$! The repetition code is NOT efficient! Shannon demonstrated that there exist codes which are capacity achieving at non-zero rates.
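The sum is easy to evaluate exactly; a small sketch (our own helper):

```python
from math import comb

def rep_error(p, m):
    """Exact bit error probability of the rate-1/(2m+1) repetition code."""
    n = 2 * m + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m + 1, n + 1))

for m in (1, 2, 5, 10):
    print(m, rep_error(0.1, m))  # shrinks toward 0 as m grows
```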
Hamming Code
(Venn diagram: data bits x1, x2, x3, x4 placed in the overlaps of three parity circles.)

p1 = x1 + x2 + x4
p2 = x1 + x3 + x4
p3 = x2 + x3 + x4

Codeword: (x1 x2 x3 x4 p1 p2 p3)
The (7, 4) Hamming code can correct 1 bit error with rate R = 4/7. This code is much better than the repetition code.

Hamming codes can be computed in linear algebra through matrices. This will be explained later in this course.
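A minimal sketch of (7, 4) encoding and single-error correction built directly from the parity equations above (the syndrome table and function names are our own illustration; the matrix formulation comes later):

```python
def hamming74_encode(x):
    """Encode 4 data bits as (x1 x2 x3 x4 p1 p2 p3), using the
    parity equations from the slide (arithmetic mod 2)."""
    x1, x2, x3, x4 = x
    p1 = (x1 + x2 + x4) % 2
    p2 = (x1 + x3 + x4) % 2
    p3 = (x2 + x3 + x4) % 2
    return [x1, x2, x3, x4, p1, p2, p3]

def hamming74_correct(r):
    """Recompute the parities; the pattern of failures (the syndrome)
    identifies the single flipped position, which we then correct."""
    x1, x2, x3, x4, p1, p2, p3 = r
    s1 = (x1 + x2 + x4 + p1) % 2
    s2 = (x1 + x3 + x4 + p2) % 2
    s3 = (x2 + x3 + x4 + p3) % 2
    # Each nonzero syndrome implicates exactly one position,
    # consistent with the three equations above.
    syndrome_to_pos = {
        (1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 3,
        (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6,
    }
    pos = syndrome_to_pos.get((s1, s2, s3))
    if pos is not None:
        r = r.copy()
        r[pos] ^= 1  # flip the implicated bit back
    return r

cw = hamming74_encode([1, 0, 1, 1])
rx = cw.copy(); rx[2] ^= 1          # introduce one bit error
print(hamming74_correct(rx) == cw)  # True
```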
Review of Probability Theory
Discrete Random Variables
A discrete random variable is used to model a “random experiment” with a finite or countable number of possible outcomes, for example, the toss of a coin, the roll of a die, or the count of the number of telephone calls during a given time.

The sample space S of the experiment is the set of all possible outcomes and contains a finite or countable number of elements. Let S = {ζ1, ζ2, · · ·}.

An event is a subset of S. Events consisting of a single outcome are called elementary events.
Let X be a random variable with sample space $S_X$. A probability mass function (pmf) for X is a mapping $p_X : S_X \to [0, 1]$ from $S_X$ to the closed unit interval [0, 1] satisfying

$$\sum_{x \in S_X} p_X(x) = 1,$$

where the number $p_X(x)$ is the probability that the outcome of the given random experiment is x, i.e., $p_X(x) = \Pr[X = x]$.
Every event $A \subseteq S$ has a probability $p(A) \in [0, 1]$ satisfying the following:

1. $p(A) \ge 0$
2. $p(S) = 1$
3. for $A, B \subseteq S$, $p(A \cup B) = p(A) + p(B)$ if $A \cap B = \emptyset$
Example: A fair coin is tossed N times, and A is the event that an even number of heads occurs. What is Pr[A]?
$$\Pr[A] = \sum_{\substack{k=0 \\ k \text{ even}}}^{N} \Pr[\text{exactly } k \text{ heads occur}] = \sum_{\substack{k=0 \\ k \text{ even}}}^{N} \binom{N}{k} \left(\frac{1}{2}\right)^k \left(\frac{1}{2}\right)^{N-k} = \frac{1}{2^N} \sum_{\substack{k=0 \\ k \text{ even}}}^{N} \binom{N}{k} = \frac{1}{2}.$$
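A quick numerical check of the closed form (our own helper):

```python
from math import comb

def prob_even_heads(N):
    """Exact probability of an even number of heads in N fair tosses."""
    return sum(comb(N, k) for k in range(0, N + 1, 2)) / 2**N

print([prob_even_heads(N) for N in (1, 2, 5, 10)])  # all 0.5
```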
Vector Random Variables

If the elements of $S_X$ are vectors of real numbers, then X is a (real) vector random variable.

Suppose Z is a vector random variable with a sample space in which each element has two components (X, Y), i.e., $S_Z = \{z_1, z_2, \cdots\} = \{(x_1, y_1), (x_2, y_2), \cdots\}$.

The projection of $S_Z$ on its first coordinate is

$$S_X = \{x : \text{for some } y, (x, y) \in S_Z\}.$$

Example: If Z = (X, Y) and $S_Z = \{(0, 0), (1, 0), (1, 1)\}$, then $S_X = S_Y = \{0, 1\}$.
The pmf of a vector random variable Z = (X, Y) is also called the joint pmf of X and Y, and is denoted by

$$p_Z(x, y) = p_{X,Y}(x, y) = \Pr(X = x, Y = y),$$

where the comma in the last equation denotes a logical 'AND' operation.

From $p_{X,Y}(x, y)$, we can find $p_X(x)$ as

$$p_X(x) \equiv p(x) = \sum_{y \in S_Y} p_{X,Y}(x, y);$$

and similarly,

$$p_Y(y) \equiv p(y) = \sum_{x \in S_X} p_{X,Y}(x, y).$$
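Marginalization is just a sum over the other coordinate. A sketch using the sample space of the example above, with probabilities invented purely for illustration:

```python
# Joint pmf of Z = (X, Y) as a dict; these probabilities are made up.
p_xy = {(0, 0): 0.25, (1, 0): 0.25, (1, 1): 0.5}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p  # sum out y
    p_y[y] = p_y.get(y, 0.0) + p  # sum out x

print(p_x)  # {0: 0.25, 1: 0.75}
print(p_y)  # {0: 0.5, 1: 0.5}
```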
Conditional Probability

Let A and B be events, with Pr[A] > 0. The conditional probability of B given that A occurred is

$$\Pr[B|A] = \frac{\Pr[A \cap B]}{\Pr[A]}.$$

Thus, $\Pr[A|A] = 1$, and $\Pr[B|A] = 0$ if $A \cap B = \emptyset$.

If Z = (X, Y) and $p_X(x_k) > 0$, then

$$p_{Y|X}(y_j|x_k) = \Pr[Y = y_j | X = x_k] = \frac{\Pr[X = x_k, Y = y_j]}{\Pr[X = x_k]} = \frac{p_{X,Y}(x_k, y_j)}{p_X(x_k)}.$$
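Continuing the invented joint pmf from the sketch above, conditioning divides by the marginal:

```python
p_xy = {(0, 0): 0.25, (1, 0): 0.25, (1, 1): 0.5}
p_x = {0: 0.25, 1: 0.75}

x = 1  # condition on X = 1
p_y_given_x = {y: p / p_x[x] for (xv, y), p in p_xy.items() if xv == x}
print(p_y_given_x)  # {0: 0.333..., 1: 0.666...}
```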
The random variables X and Y are independent if

$$\forall (x, y) \in S_{X,Y}: \quad p_{X,Y}(x, y) = p_X(x)\, p_Y(y).$$

If X and Y are independent, then

$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} = \frac{p_X(x)\, p_Y(y)}{p_Y(y)} = p_X(x),$$

and

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{p_X(x)\, p_Y(y)}{p_X(x)} = p_Y(y).$$
Expected Value

If X is a random variable, the expected value (or mean) of X, denoted by E[X], is

$$E[X] = \sum_{x \in S_X} x\, p_X(x).$$

The expected value of the random variable f(X) is

$$E[f(X)] = \sum_{x \in S_X} f(x)\, p_X(x).$$

In particular, $E[X^n]$ is the n-th moment of X. The variance of X is the second moment of $X - E[X]$, and can be computed as

$$\mathrm{VAR}[X] = E[X^2] - E[X]^2.$$
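A small sketch computing moments from a pmf given as a dictionary (the fair-die pmf is our own example):

```python
def expect(pmf, f=lambda x: x):
    """E[f(X)] for a pmf given as {value: probability}."""
    return sum(f(x) * p for x, p in pmf.items())

die = {x: 1/6 for x in range(1, 7)}  # fair six-sided die
mean = expect(die)
var = expect(die, lambda x: x**2) - mean**2
print(mean, var)  # 3.5 and ~2.917
```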