Lecture 2:
Basic Information Theory
TSBK01 Image Coding and Data Compression
Jörgen Ahlberg
Div. of Sensor Technology
Swedish Defence Research Agency (FOI)
Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.
Part 1: Information Theory
Claude Shannon: "A Mathematical Theory of Communication", The Bell System Technical Journal, 1948.
Sometimes referred to as Shannon-Weaver, since the standalone publication has a foreword by Weaver. Be careful!
Quotes about Shannon
"What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."
"Today, Shannon's insight helps shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes."
"Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."
Part 2: Stochastic sources
A source outputs symbols X1, X2, ...
Each symbol takes its value from an alphabet A = (a1, a2, ...). Model: P(X1, ..., XN) is assumed to be known for all combinations.
Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, ..., z, A, ..., Z, 1, 2, ..., 9, !, ?, ...).
Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, ..., 255).
Two Special Cases
1. The Memoryless Source
Each symbol independent of the previous ones.
P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn)
2. The Markov Source
Each symbol depends on the previous one.
P(X1, X2, ..., Xn) = P(X1) P(X2|X1) P(X3|X2) ... P(Xn|Xn-1)
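As a minimal numerical illustration of the two factorizations, a short Python sketch is shown below. The probability values are invented for the example; they are not from the lecture.

```python
from math import prod

# Hypothetical binary source, just to illustrate the two factorizations;
# these probabilities are not from the lecture.
p = {'0': 0.6, '1': 0.4}                       # P(symbol), memoryless model
p_cond = {('0', '0'): 0.8, ('0', '1'): 0.2,    # P(next | previous),
          ('1', '0'): 0.3, ('1', '1'): 0.7}    # Markov model

seq = "0110"

# Memoryless: P(X1,...,Xn) = P(X1) P(X2) ... P(Xn)
p_memoryless = prod(p[s] for s in seq)

# Markov: P(X1,...,Xn) = P(X1) P(X2|X1) ... P(Xn|Xn-1)
p_markov = p[seq[0]]
for prev, cur in zip(seq, seq[1:]):
    p_markov *= p_cond[(prev, cur)]

print(p_memoryless, p_markov)   # 0.0576 vs 0.0252
```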
The Markov Source
A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.
[State diagram: three states a, b, c with transition probabilities 1.0, 0.5, 0.7, 0.3, 0.3, 0.2 on the arcs.]
A ternary source with alphabet A = (a, b, c).
The Markov Source
Assume we are in state a, i.e., Xk = a. The probabilities for the next symbol are:
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0
The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1
Analysis and Synthesis
Stochastic models can be used for analysing a source.
Find a model that well represents the real-world source, and then analyse the model instead of the real world.
Stochastic models can be used for synthesizing a source.
Use a random number generator in each step of a Markov model to generate a sequence simulating the source (see the Python sketch below).
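A small Python sketch of such a synthesis for the ternary source above. Note that the transition probabilities out of state c could not be fully recovered from the extracted state diagram, so the assignment 0.5/0.3/0.2 in that row is an assumption.

```python
import random

# Transition probabilities of the ternary Markov source. The rows for
# states a and b follow the slides; the row for state c is an ASSUMPTION,
# since only the values 0.5, 0.3, 0.2 (not their targets) are recoverable
# from the extracted diagram.
P = {
    'a': {'a': 0.3, 'b': 0.7, 'c': 0.0},
    'b': {'a': 0.0, 'b': 0.0, 'c': 1.0},
    'c': {'a': 0.5, 'b': 0.3, 'c': 0.2},   # assumed assignment
}

def synthesize(n, state='a'):
    """Generate n symbols by walking the state diagram."""
    out = []
    for _ in range(n):
        symbols, probs = zip(*P[state].items())
        state = random.choices(symbols, weights=probs)[0]
        out.append(state)
    return ''.join(out)

print(synthesize(40))   # e.g. 'bcabcbccabcc...'
```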
Show plastic slides!
Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
So, let's look at it the way Shannon did.
Assume a memoryless source with alphabet A = (a1, ..., an) and symbol probabilities (p1, ..., pn).
How much information do we get when finding out that the next symbol is ai?
According to Shannon, the self-information of ai is
i(ai) = -log pi = log (1/pi)
Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.
For both events to happen, the probability is pA·pB. However, the amount of information should be added, not multiplied.
Logarithms satisfy this: log(pA·pB) = log pA + log pB.
No, we want the information to increase with decreasing probabilities, so let's use the negative logarithm.
Self Information
Example 1:
Example 2:
Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get hartleys; if you pick the 2-log (like everyone else), you'll get bits.
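The two worked examples did not survive in this copy of the slides. As a hypothetical stand-in, self-information in bits can be computed as below; the probabilities 0.5, 0.9 and 0.1 reuse the coin example from a few slides back, not the lost examples.

```python
from math import log2

def self_information(p):
    """Self-information in bits of an outcome with probability p."""
    return -log2(p)

for p in (0.5, 0.9, 0.1):
    print(f"P = {p}: i = {self_information(p):.3f} bits")
# P = 0.5: i = 1.000 bits
# P = 0.9: i = 0.152 bits
# P = 0.1: i = 3.322 bits
```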
Entropy
The entropy of the source is the average self-information per symbol:
H(X) = Σi pi · i(ai) = -Σi pi log pi
Example: Binary Memoryless Source
BMS output: 0 1 1 0 1 0 0 0 ...
Let p = P(1) (so P(0) = 1 - p). Then
H(X) = -p log p - (1 - p) log (1 - p)
[Plot: H(X) as a function of p, rising from 0 at p = 0 to 1 bit at p = 0.5 and back to 0 at p = 1.]
The uncertainty (information) is greatest when p = 0.5.
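A short sketch (not from the slides) of the binary entropy function, confirming that the uncertainty peaks at one bit for p = 0.5:

```python
from math import log2

def h(p):
    """Binary entropy function in bits (h(0) = h(1) = 0 by convention)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p}: H = {h(p):.3f} bits")   # peaks at 1.000 for p = 0.5
```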
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log N - H is called the redundancy of the source.
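As an illustration of these three properties, entropy and redundancy can be computed directly; the four-symbol distribution below is invented for the example.

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical 4-symbol source
N = len(probs)

H = entropy(probs)
print(f"H          = {H:.3f} bits/symbol")            # 1.750
print(f"log N      = {log2(N):.3f} bits/symbol")      # 2.000, the upper bound
print(f"redundancy = {log2(N) - H:.3f} bits/symbol")  # 0.250
print(f"uniform    = {entropy([1 / N] * N):.3f} bits/symbol (the maximum)")
```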
Part 4: Entropy for Memory Sources
Assume a block of source symbols (X1, ..., Xn) and define the block entropy:
H(X1, ..., Xn) = -Σ P(x1, ..., xn) log P(x1, ..., xn)
That is, the summation is done over all possible combinations of n symbols.
The entropy for a memory source is defined as:
H = lim (n→∞) (1/n) H(X1, ..., Xn)
That is, let the block length go towards infinity, and divide by n to get the number of bits/symbol.
Entropy for a Markov Source
The entropy for a state Sk can be expressed as
Hk = -Σl Pkl log Pkl
where Pkl is the transition probability from state k to state l.
Averaging over all states, we get the entropy for the Markov source as
H = Σk wk Hk
where wk is the probability of being in state k.
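Applied to the ternary source from the earlier slides, with the same assumed row for state c as in the synthesis sketch, the stationary distribution and entropy come out roughly as follows:

```python
import numpy as np

# Transition matrix of the ternary source, states ordered (a, b, c).
# Rows a and b follow the slides; the row for c is the same assumed
# assignment as in the synthesis sketch above.
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.3, 0.2]])

# Stationary state probabilities w solve w P = w, sum(w) = 1
# (left eigenvector of P for eigenvalue 1).
eigvals, eigvecs = np.linalg.eig(P.T)
w = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
w /= w.sum()

# Per-state entropies H_k = -sum_l P_kl log2 P_kl ...
H_state = np.array([-sum(p * np.log2(p) for p in row if p > 0) for row in P])

# ... weighted by the state probabilities: H = sum_k w_k H_k
print("stationary distribution:", np.round(w, 3))               # ~[0.284 0.318 0.398]
print("entropy:", round(float(w @ H_state), 3), "bits/symbol")   # ~0.841
```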
The Run-length Source
Certain sources generate long runs or bursts of equal symbols.
Example: a two-state Markov source (states A and B) where each state switches to the other with probability 1-ρ and stays with probability ρ.
Probability for a burst of length r: P(r) = (1-ρ) ρ^(r-1)
Entropy: HR = -Σ(r=1..∞) P(r) log P(r)
If the average run length is μ, then HR/μ = HM, the entropy of the underlying Markov source.
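A quick numerical check of HR/μ = HM (not from the slides), assuming the symmetric two-state model above with self-transition probability ρ and truncating the infinite sum:

```python
from math import log2

rho = 0.9   # probability of staying in the same state (example value)

# Run-length distribution P(r) = (1 - rho) * rho**(r - 1), truncated where
# the geometric tail becomes negligible.
H_R = 0.0
mu = 0.0
for r in range(1, 2000):
    P_r = (1 - rho) * rho ** (r - 1)
    H_R -= P_r * log2(P_r)
    mu += r * P_r

# Entropy per symbol of the two-state Markov source itself.
H_M = -rho * log2(rho) - (1 - rho) * log2(1 - rho)

print(f"average run length mu = {mu:.2f}")       # ~10
print(f"H_R / mu = {H_R / mu:.4f} bits/symbol")  # ~0.469
print(f"H_M      = {H_M:.4f} bits/symbol")       # ~0.469
```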
Part 5: The Source Coding Theorem
The entropy is the smallest number of bits allowing error-free representation of the source.
Why is this? Let's take a look at typical sequences!
Typical Sequences
Assume a long sequence from a binary memoryless source with P(1) = p.
Among n bits, there will be approximately w = n·p ones. Thus, there are M = (n over w) such typical sequences!
Only these sequences are interesting. All other sequences will appear with smaller probability the larger n is.
How many are the typical sequences?
Using Stirling's approximation, log M = log (n over n·p) ≈ n·h(p), where h(p) = -p log p - (1-p) log (1-p) is the entropy of the source in bits/symbol.
Enumeration of the typical sequences therefore needs log M bits, i.e., approximately h(p) bits per symbol!
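A rough numerical check (not in the slides) that (1/n) log M approaches h(p) as the block length n grows:

```python
from math import comb, log2

def h(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.1
for n in (100, 1_000, 10_000, 100_000):
    w = round(n * p)          # expected number of ones
    M = comb(n, w)            # number of typical sequences
    print(f"n = {n:>6}: (1/n) log2 M = {log2(M) / n:.4f}   h(p) = {h(p):.4f}")
```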
How many bits do we need?
Thus, we need H(X) bits per symbol to code any typical sequence!
The Source Coding Theorem
Does tell us:
that we can represent the output from a source X using H(X) bits/symbol.
that we cannot do better.
Does not tell us:
how to do it.
Summary
The mathematical model of communication.
Source, source coder, channel coder, channel, ...
Rate, entropy, channel capacity.
Information theoretical entities: information, self-information, uncertainty, entropy.
Sources: BMS, Markov, RL.
The Source Coding Theorem.