Lecture 2:
Basic Information Theory
TSBK01 Image Coding and Data Compression
Jörgen Ahlberg
Div. of Sensor Technology
Swedish Defence Research Agency (FOI)
Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.
Part 1: Information Theory
Claude Shannon: "A Mathematical Theory of Communication", The Bell System Technical Journal, 1948.
Sometimes referred to as Shannon-Weaver, since the standalone publication has a foreword by Weaver. Be careful!
Quotes about Shannon
"What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."
"Today, Shannon's insight helps shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes."
"Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."
Part 2: Stochastic sources
A source outputs symbols X1, X2, ...
Each symbol takes its value from an alphabet A = (a1, a2, ...). Model: P(X1, ..., XN) is assumed to be known for all combinations.
Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, ..., z, A, ..., Z, 1, 2, ..., 9, !, ?, ...).
Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, ..., 255).
Two Special Cases
1. The Memoryless Source
Each symbol independent of the previous ones.
P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn)
2. The Markov Source
Each symbol depends on the previous one.
P(X1, X2, ..., Xn) = P(X1) P(X2|X1) P(X3|X2) ... P(Xn|Xn-1)
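As a minimal numerical illustration of the two factorizations, a short Python sketch is shown below. The probability values are invented for the example; they are not from the lecture.

```python
from math import prod

# Hypothetical binary source, just to illustrate the two factorizations;
# these probabilities are not from the lecture.
p = {'0': 0.6, '1': 0.4}                       # P(symbol), memoryless model
p_cond = {('0', '0'): 0.8, ('0', '1'): 0.2,    # P(next | previous),
          ('1', '0'): 0.3, ('1', '1'): 0.7}    # Markov model

seq = "0110"

# Memoryless: P(X1,...,Xn) = P(X1) P(X2) ... P(Xn)
p_memoryless = prod(p[s] for s in seq)

# Markov: P(X1,...,Xn) = P(X1) P(X2|X1) ... P(Xn|Xn-1)
p_markov = p[seq[0]]
for prev, cur in zip(seq, seq[1:]):
    p_markov *= p_cond[(prev, cur)]

print(p_memoryless, p_markov)   # 0.0576 vs 0.0252
```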
The Markov Source
A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.
[State diagram: three states a, b, c with transition probabilities 1.0, 0.5, 0.7, 0.3, 0.3, 0.2 on the arcs.]
A ternary source with alphabet A = (a, b, c).
The Markov Source
Assume we are in state a, i.e., Xk = a. The probabilities for the next symbol are:
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0
The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1
Analysis and Synthesis
Stochastic models can be used for analysing a source.
Find a model that well represents the real-world source, and then analyse the model instead of the real world.
Stochastic models can be used for synthesizing a source.
Use a random number generator in each step of a Markov model to generate a sequence simulating the source (see the Python sketch below).
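A small Python sketch of such a synthesis for the ternary source above. Note that the transition probabilities out of state c could not be fully recovered from the extracted state diagram, so the assignment 0.5/0.3/0.2 in that row is an assumption.

```python
import random

# Transition probabilities of the ternary Markov source. The rows for
# states a and b follow the slides; the row for state c is an ASSUMPTION,
# since only the values 0.5, 0.3, 0.2 (not their targets) are recoverable
# from the extracted diagram.
P = {
    'a': {'a': 0.3, 'b': 0.7, 'c': 0.0},
    'b': {'a': 0.0, 'b': 0.0, 'c': 1.0},
    'c': {'a': 0.5, 'b': 0.3, 'c': 0.2},   # assumed assignment
}

def synthesize(n, state='a'):
    """Generate n symbols by walking the state diagram."""
    out = []
    for _ in range(n):
        symbols, probs = zip(*P[state].items())
        state = random.choices(symbols, weights=probs)[0]
        out.append(state)
    return ''.join(out)

print(synthesize(40))   # e.g. 'bcabcbccabcc...'
```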
Show plastic slides!
Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
So, let's look at it the way Shannon did.
Assume a memoryless source with alphabet A = (a1, ..., an) and symbol probabilities (p1, ..., pn).
How much information do we get when finding out that the next symbol is ai?
According to Shannon, the self-information of ai is
i(ai) = -log pi = log (1/pi)
Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.
For both events to happen, the probability is pA·pB. However, the amount of information should be added, not multiplied.
Logarithms satisfy this: log(pA·pB) = log pA + log pB.
No, we want the information to increase with decreasing probabilities, so let's use the negative logarithm.
Self Information
Example 1:
Example 2:
Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get hartleys; if you pick the 2-log (like everyone else), you'll get bits.
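The two worked examples did not survive in this copy of the slides. As a hypothetical stand-in, self-information in bits can be computed as below; the probabilities 0.5, 0.9 and 0.1 reuse the coin example from a few slides back, not the lost examples.

```python
from math import log2

def self_information(p):
    """Self-information in bits of an outcome with probability p."""
    return -log2(p)

for p in (0.5, 0.9, 0.1):
    print(f"P = {p}: i = {self_information(p):.3f} bits")
# P = 0.5: i = 1.000 bits
# P = 0.9: i = 0.152 bits
# P = 0.1: i = 3.322 bits
```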
Entropy
The entropy of the source is the average self-information per symbol:
H(X) = Σi pi · i(ai) = -Σi pi log pi
Example: Binary Memoryless Source
BMS output: 0 1 1 0 1 0 0 0 ...
Let p = P(1) (so P(0) = 1 - p). Then
H(X) = -p log p - (1 - p) log (1 - p)
[Plot: H(X) as a function of p, rising from 0 at p = 0 to 1 bit at p = 0.5 and back to 0 at p = 1.]
The uncertainty (information) is greatest when p = 0.5.
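A short sketch (not from the slides) of the binary entropy function, confirming that the uncertainty peaks at one bit for p = 0.5:

```python
from math import log2

def h(p):
    """Binary entropy function in bits (h(0) = h(1) = 0 by convention)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p}: H = {h(p):.3f} bits")   # peaks at 1.000 for p = 0.5
```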
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log N - H is called the redundancy of the source.
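As an illustration of these three properties, entropy and redundancy can be computed directly; the four-symbol distribution below is invented for the example.

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical 4-symbol source
N = len(probs)

H = entropy(probs)
print(f"H          = {H:.3f} bits/symbol")            # 1.750
print(f"log N      = {log2(N):.3f} bits/symbol")      # 2.000, the upper bound
print(f"redundancy = {log2(N) - H:.3f} bits/symbol")  # 0.250
print(f"uniform    = {entropy([1 / N] * N):.3f} bits/symbol (the maximum)")
```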
Part 4: Entropy for Memory Sources
Assume a block of source symbols (X1, ..., Xn) and define the block entropy:
H(X1, ..., Xn) = -Σ P(x1, ..., xn) log P(x1, ..., xn)
That is, the summation is done over all possible combinations of n symbols.
The entropy for a memory source is defined as:
H = lim (n→∞) (1/n) H(X1, ..., Xn)
That is, let the block length go towards infinity, and divide by n to get the number of bits/symbol.
Entropy for a Markov Source
The entropy for a state Sk can be expressed as
Hk = -Σl Pkl log Pkl
where Pkl is the transition probability from state k to state l.
Averaging over all states, we get the entropy for the Markov source as
H = Σk wk Hk
where wk is the probability of being in state k.
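Applied to the ternary source from the earlier slides, with the same assumed row for state c as in the synthesis sketch, the stationary distribution and entropy come out roughly as follows:

```python
import numpy as np

# Transition matrix of the ternary source, states ordered (a, b, c).
# Rows a and b follow the slides; the row for c is the same assumed
# assignment as in the synthesis sketch above.
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.3, 0.2]])

# Stationary state probabilities w solve w P = w, sum(w) = 1
# (left eigenvector of P for eigenvalue 1).
eigvals, eigvecs = np.linalg.eig(P.T)
w = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
w /= w.sum()

# Per-state entropies H_k = -sum_l P_kl log2 P_kl ...
H_state = np.array([-sum(p * np.log2(p) for p in row if p > 0) for row in P])

# ... weighted by the state probabilities: H = sum_k w_k H_k
print("stationary distribution:", np.round(w, 3))               # ~[0.284 0.318 0.398]
print("entropy:", round(float(w @ H_state), 3), "bits/symbol")   # ~0.841
```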
The Run-length Source
Certain sources generate long runs or bursts of equal symbols.
Example: a two-state Markov source (states A and B) where each state switches to the other with probability 1-ρ and stays with probability ρ.
Probability for a burst of length r: P(r) = (1-ρ) ρ^(r-1)
Entropy: HR = -Σ(r=1..∞) P(r) log P(r)
If the average run length is μ, then HR/μ = HM, the entropy of the underlying Markov source.
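A quick numerical check of HR/μ = HM (not from the slides), assuming the symmetric two-state model above with self-transition probability ρ and truncating the infinite sum:

```python
from math import log2

rho = 0.9   # probability of staying in the same state (example value)

# Run-length distribution P(r) = (1 - rho) * rho**(r - 1), truncated where
# the geometric tail becomes negligible.
H_R = 0.0
mu = 0.0
for r in range(1, 2000):
    P_r = (1 - rho) * rho ** (r - 1)
    H_R -= P_r * log2(P_r)
    mu += r * P_r

# Entropy per symbol of the two-state Markov source itself.
H_M = -rho * log2(rho) - (1 - rho) * log2(1 - rho)

print(f"average run length mu = {mu:.2f}")       # ~10
print(f"H_R / mu = {H_R / mu:.4f} bits/symbol")  # ~0.469
print(f"H_M      = {H_M:.4f} bits/symbol")       # ~0.469
```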
Part 5: The Source Coding Theorem
The entropy is the smallest number of bits allowing error-free representation of the source.
Why is this? Let's take a look at typical sequences!
Typical Sequences
Assume a long sequence from a binary memoryless source with P(1) = p.
Among n bits, there will be approximately w = n·p ones. Thus, there are M = (n over w) such typical sequences!
Only these sequences are interesting. All other sequences will appear with smaller probability the larger n is.
How many are the typical sequences?
Using Stirling's approximation, log M = log (n over n·p) ≈ n·h(p), where h(p) = -p log p - (1-p) log (1-p) is the entropy of the source in bits/symbol.
Enumeration of the typical sequences therefore needs log M bits, i.e., approximately h(p) bits per symbol!
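A rough numerical check (not in the slides) that (1/n) log M approaches h(p) as the block length n grows:

```python
from math import comb, log2

def h(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.1
for n in (100, 1_000, 10_000, 100_000):
    w = round(n * p)          # expected number of ones
    M = comb(n, w)            # number of typical sequences
    print(f"n = {n:>6}: (1/n) log2 M = {log2(M) / n:.4f}   h(p) = {h(p):.4f}")
```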
How many bits do we need?
Thus, we need H(X) bits per symbol to code any typical sequence!
The Source Coding Theorem
Does tell us:
that we can represent the output from a source X using H(X) bits/symbol.
that we cannot do better.
Does not tell us:
how to do it.
Summary
The mathematical model of communication.
Source, source coder, channel coder, channel, ...
Rate, entropy, channel capacity.
Information theoretical entities: information, self-information, uncertainty, entropy.
Sources: BMS, Markov, RL.
The Source Coding Theorem.