INTRODUCTION
Information transmission takes place when we read newspapers or books, exchange letters, talk on the telephone, work on the internet, attend or deliver a lecture, listen to the radio, watch television, and so on: information is carried from one place to another. However, we seek information only when we are in doubt. We go to an enquiry office, or consult weather forecasts, for information on whether it will rain or not. On the other hand, if an event can occur in just one way, there is no uncertainty about it and no information is conveyed by its occurrence. Information is received from the occurrence of an event only when there was some uncertainty before its occurrence. Naturally, the amount of information received by its occurrence must be equal to the amount of uncertainty prevailing before its occurrence. Thus, uncertainty and information are two sides of the same coin.
(i) SHANNON THEORY:
Due to C. E. Shannon, it deals with mathematical models for communication problems. The concept of 'entropy' given by Shannon in his mathematical model has been found useful in many different disciplines and has penetrated fields as far apart as linguistics, psychology, neurophysiology, economics, business, statistics, biology and thermodynamics.
(ii) CYBERNETICS:
Due to Norbert Wiener, it deals with the communication problems encountered in living beings and social organizations.
(iii) CODING THEORY:
A recently developed subject dealing with the theory of error-correcting codes; it finds application in the problem of determining 'good' encoding schemes to combat errors in transmission.
The development of the subject in each of these branches has been so rapid that even an introduction to their concepts would require a separate volume. In this chapter, we present a brief account of Shannon theory only. The interested reader may consult some book on the subject for an introduction to the other branches and for a detailed study of Shannon theory.
A MEASURE OF INFORMATION:
Consider an event E with probability of occurrence p. Suppose someone reliable comes and gives the message that E has occurred. The question is: what is the amount of information conveyed by this message?
If p is close to 1 (say p = 0.95), then one may argue that the message has conveyed very little information, because it was virtually certain that E would occur. On the other hand, if p is close to 0 (say p = 0.001), then it is almost certain that E will not occur, and consequently the message stating its occurrence is quite unexpected and hence contains a great deal of information.
The above intuitive idea suggests that we should select a decreasing function of p as a measure of information. The function proposed by Shannon is
h(p) = -log p, 0 < p ≤ 1,
which decreases from ∞ (as p → 0) to 0 (at p = 1). With logarithms taken to base 2, the information h(p) is measured in bits.
Consider two events A and B. Assume that P(A) > 0, P(B) > 0. If we are informed that A has occurred, we have received an amount of information equal to -log P(A). The probability that B will then occur is P(B|A). If B has also occurred, the total information received from these two successive messages is
-log P(A) - log P(B|A) = -log[P(A) · P(B|A)] = -log P(AB).
The right-hand side is the information that both A and B have occurred; the same amount of information is obtained if A occurs first and then B. When A and B are independent, P(B|A) = P(B). Let P(A) = P1 and P(B) = P2, so that P(AB) = P1P2; then
h(P1P2) = h(P1) + h(P2) = h(P2P1).
[Figure: The information function h(p), in bits, plotted against p.]
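As an illustration, the following short Python sketch (base-2 logarithms assumed, so that information is in bits) computes h(p) and checks the additivity property h(P1P2) = h(P1) + h(P2) numerically:

import math

def h(p):
    """Information content (in bits) of an event with probability p, 0 < p <= 1."""
    return -math.log2(p)

# h(p) decreases from infinity (p -> 0) to 0 (p = 1)
print(h(0.95))   # ~0.074 bits: a near-certain event conveys little information
print(h(0.001))  # ~9.97 bits: a very unlikely event conveys much information

# Additivity for independent events: h(P1*P2) == h(P1) + h(P2)
p1, p2 = 0.4, 0.25
assert abs(h(p1 * p2) - (h(p1) + h(p2))) < 1e-12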
AXIOMATIC APPROACH TO INFORMATION:
Axiom 1:
The information depends only on P and hence can be written as h(P), where h is some function of P ∈ (0, 1].
Axiom 2:
h(P) is a monotonically decreasing continuous function of P, i.e.
h(P1) > h(P2) if 0 < P1 < P2 ≤ 1. .........(1)
Axiom 3:
No information is conveyed by an event which is sure to occur, i.e.
h(1) = 0. .........(2)
Axiom 4:
For two independent events E1, E2 with probabilities of occurrence P1, P2, the information content of the message which states that both have occurred is equal to the information of the message dealing with E1 alone plus that of the message dealing with E2 alone. That is,
h(P1P2) = h(P1) + h(P2), 0 < P1, P2 ≤ 1. .........(3)
Theorem (Information Characterization):
The function h(P) satisfying Axioms 1 to 4 above can only be of the form
h(P) = -c log P, 0 < P ≤ 1,
where c > 0 is an arbitrary constant.
Proof:
Let us make the transformation
y1 = -log P1, y2 = -log P2. .........(4)
Further,
h(Pi) = h(e^(-yi)) = g(yi), i = 1, 2. .........(5)
Equation (3) can then be written as
g(y1 + y2) = g(y1) + g(y2). .........(6)
Since yi is a decreasing function of Pi, it follows from Axiom 2 that g(yi) is an increasing function of yi, i.e.
g(y1) < g(y2) if 0 ≤ y1 < y2. .........(7)
Now set y1 = y2 = 0 in (6); we get g(0) = 2g(0), so that g(0) = 0, which agrees with Axiom 3, since g(0) = h(1) = 0.
Let now g(1) = c, which with (7) gives
0 = g(0) < g(1) = c.
Applying (6) repeatedly,
g(2) = g(1) + g(1) = 2c,
g(3) = g(2) + g(1) = 3c,
...
g(n) = nc,
where n is an integer. Thus g(y) is proportional to y when y is an integer. Let us now consider the case where y is rational, say y = m/n, with m, n non-negative integers. Then
g(m) = g(m/n + m/n + ... + m/n) (n terms) = n g(m/n).
But g(m) = mc, because m is an integer. Thus g(m/n) = c(m/n), which shows that
g(y) = cy, c > 0, .........(8)
holds for any non-negative rational value of y; by the continuity of g it then holds for all y ≥ 0.
It follows from (4), (5) and (8) that
h(P) = -c log P, c > 0.
This completes the proof.
ENTROPY - THE EXPECTED INFORMATION
Consider a message which states that an event E has occurred; the information conveyed by this message is different from that conveyed by the complementary message stating that E has not occurred. Indeed, if p is the probability of occurrence of E, the information conveyed by the latter message is
h(1 - p) = -log(1 - p).
Observe that h(p) ≠ h(1 - p) unless p = 1/2.
So far as the event E is concerned, the information to be received is either h(p) or h(1 - p), so long as the message conveying the occurrence or non-occurrence of E has not been received. However, one can always compute the expected information content of the message(s) prior to its arrival. The expected information is
H(p, 1 - p) = p h(p) + (1 - p) h(1 - p)
= -p log p - (1 - p) log(1 - p), 0 < p < 1.
The function H(p, 1 - p) is known as the entropy of the probability distribution (p, 1 - p). It may be observed that H(p, 1 - p) is symmetric in p and 1 - p; it is non-negative; and it takes the value zero at p = 0 and p = 1 (with the convention 0 log 0 = 0).
The function H(p, 1 - p) is shown in the figure.
[Figure: The entropy function H(p, 1 - p) (expected information, in bits) plotted against p.]
It may be seen that H(p, 1 - p) is maximum at p = 1/2, and the value of this maximum is
H(1/2, 1/2) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit.
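A minimal Python sketch of the binary entropy function H(p, 1 - p), confirming that its maximum of 1 bit occurs at p = 1/2 (the function name binary_entropy is ours, not from the text):

import math

def binary_entropy(p):
    """Entropy H(p, 1-p) in bits, with the convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0 bit, the maximum
print(binary_entropy(0.95))  # ~0.286 bits, low uncertainty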
ENTROPY AS A MEASURE OF UNCERTAINTY
Consider the probability distribution (0.98, 0.01, 0.01). The occurrence of an event with probability 0.01 gives an amount of information as high as -log2(0.01) = 6.64 bits. But this may happen only 2 times out of 100, and in the other 98 cases the information is as low as -log2(0.98) ≈ 0.0291 bits. The average information received is
H = 0.01(6.64) + 0.01(6.64) + 0.98(0.0291) ≈ 0.161 bits,
which is small, in accordance with the low degree of uncertainty.
Now consider the probability distribution (1/3, 1/3, 1/3): there is a great deal of uncertainty, and hence much information is to be expected. When we have a hundred possibilities, each with probability 0.01, there is even more uncertainty and hence even more information to be expected.
The above examples show that uncertainty and expected information (entropy) are two sides of the same coin. The greater the uncertainty prior to the message, the larger is the amount of information conveyed by it, at least on the average.
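As a numerical check of the examples above, the following Python sketch computes the entropies of the three distributions discussed (the helper name entropy is ours):

import math

def entropy(dist):
    """Entropy in bits of a probability distribution given as a list."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([0.98, 0.01, 0.01]))   # ~0.161 bits: low uncertainty
print(entropy([1/3, 1/3, 1/3]))      # ~1.585 bits: more uncertainty
print(entropy([0.01] * 100))         # ~6.644 bits: even more uncertainty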
REQUIREMENTS ON THE (ENTROPY) UNCERTAINTY FUNCTION
Intuition suggests that the larger the number of equally likely alternatives, the larger is the amount of uncertainty. Let all values of X be equiprobable, each with probability 1/M. Then the first requirement on the uncertainty function is:
R1 (Monotonicity):
H(1/M, 1/M, ..., 1/M) = f(M)
is a monotonically increasing function of M, that is,
M < M' implies f(M) < f(M') for M, M' = 1, 2, 3, ....
Consider now an experiment involving two independent random variables with probability distributions

X:  x1   x2   ...  xM        Y:  y1   y2   ...  yL
P:  1/M  1/M  ...  1/M       Q:  1/L  1/L  ...  1/L

The joint experiment involving X and Y has ML equally likely outcomes, and thus the average uncertainty of the joint experiment is f(ML). Since X and Y are independent, the average uncertainty about Y should not be affected by X. Hence the second requirement is:
R2 (Additivity):
f(ML) = f(M) + f(L), (M, L = 1, 2, 3, ...).
We now remove the restriction of equally likely outcomes and turn to the general case. Divide the range space {x1, x2, ..., xM} of X into two mutually exclusive groups
S1 = {x1, x2, ..., xr}
and
S2 = {xr+1, xr+2, ..., xM}, r = 1, 2, ..., M - 1.
The probability of choosing the group Sg (g = 1, 2) is obtained by summation:
P(Sg) = Σ_{i ∈ Sg} Pi, g = 1, 2.
If group Sg is chosen, then we select xi with (conditional) probability Pi/P(Sg), i ∈ Sg, for g = 1, 2. Before grouping, the average uncertainty of the outcome is H(P1, P2, ..., PM). If we reveal which of the two groups S1, S2 is selected, we remove, on the average, an amount of uncertainty H(P(S1), P(S2)). Thus, on the average, the uncertainty remaining after the group is specified is
P(S1) H(P1/P(S1), P2/P(S1), ..., Pr/P(S1)) + P(S2) H(Pr+1/P(S2), Pr+2/P(S2), ..., PM/P(S2)).
We expect that the average uncertainty before grouping, minus the average uncertainty removed by specifying the group, must be equal to the average uncertainty remaining after the group is specified. Thus the third requirement on the uncertainty function is:
R3 (Grouping):
H(P1, P2, ..., PM) = H(P(S1), P(S2)) + P(S1) H(P1/P(S1), ..., Pr/P(S1)) + P(S2) H(Pr+1/P(S2), ..., PM/P(S2)),
where P(S1) = Σ_{i=1}^{r} Pi and P(S2) = Σ_{i=r+1}^{M} Pi.
For mathematical convenience, we also expect:
R4 (Continuity):
H(P, 1 - P) is a continuous function of P.
It is easily verified that the entropy function
H(P1, P2, ..., PM) = -Σ_{i=1}^{M} Pi log Pi
satisfies all the above requirements. Moreover, it can be shown that it is the only function satisfying R1 to R4.
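The grouping requirement R3 can be verified numerically. A minimal Python sketch, in which the distribution and the split point r = 2 are illustrative choices of ours:

import math

def H(dist):
    """Entropy in bits of a probability distribution given as a list."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

P = [0.4, 0.2, 0.2, 0.1, 0.1]
r = 2                                   # S1 = {x1, x2}, S2 = {x3, x4, x5}
PS1, PS2 = sum(P[:r]), sum(P[r:])       # P(S1), P(S2)

lhs = H(P)
rhs = (H([PS1, PS2])
       + PS1 * H([p / PS1 for p in P[:r]])
       + PS2 * H([p / PS2 for p in P[r:]]))
assert abs(lhs - rhs) < 1e-12           # R3 holds for the entropy function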
THE COMMUNICATION SYSTEM
Information theory is mainly concerned with the analysis of a 'communication system', which has traditionally been represented by the block diagram shown below.
[Figure: Block diagram of a communication system: Source → Encoder → Channel → Decoder → User, with noise entering the channel.]
The various components of the communication system are now explained.
Source:
A source is any device emitting, at each unit of time, one letter (message) from its 'alphabet' S = {S1, S2, S3, ..., Sk} (k ≥ 2). This letter generation is a random operation and therefore may provide information. For the simplest kind of source, we assume that the successive symbols emitted from the source are stochastically independent. Such an information source is called a zero-memory source and is completely described by the source alphabet
S = {Si, i = 1, 2, ..., k}
together with the probabilities of emitting these symbols, say P(Si), i = 1, 2, ..., k. The entropy of {P(Si), i = 1, 2, ..., k} determines the average information provided by the source. The output of the source is called the transmitted message.
ENCODER:
In order to transmit the information output by the source over the communication medium, it becomes necessary to translate the symbols into another language which is more suitable for transmission.
The encoder (also called the transmitter) is a device which converts each source output symbol into a signal suitable for transmission. The input to the encoder is the transmitted message, and its output is called the transmitted signal.
CHANNEL:
The channel is the medium over which the signal is transmitted. It is the intervening medium between the encoder and the decoder and is able to transfer symbols from its input to its output. The input symbol transferred by the channel is the transmitted signal; the output of the channel is called the received signal.
Noise:
The channel may be susceptible to noise, in which case the message signal reaching the decoder differs from the transmitted signal. Noise is a general term for anything which tends to produce errors in transmission. Noise may be regarded as a random process. For example, the noise in a pair of telephone wires may be due to cross-talk from an adjacent pair. Channels without this disturbing factor are called 'noiseless channels'.
DECODER:
The function of the decoder is to reproduce the original transmitted message from the channel output for delivery to the 'destination'. It may be thought of as an inverse operator to the encoder. The two operations may differ somewhat, because the decoder may also be required to combat the noise in the channel. If the channel output is badly perturbed by noise, correct decoding may not be possible and there remains some uncertainty about the original transmitted message. The input to the decoder is the received signal, and its output is called the received message.
DESTINATION:
The destination is the terminating point for the message and may be a recording device, such as photographic film, or a human ear. Basically, information theory is an attempt to construct a mathematical model for each of the blocks of the communication system. Here, however, we shall be mainly concerned with the channel and the encoding procedure.
CHANNEL PROBABILITIES:
Memoryless channel:
Definition:
A memoryless channel is described by an input alphabet A = {a1, a2, ..., ar}, an output alphabet B = {b1, b2, ..., bs}, and a set of conditional probabilities P(bj|ai) for all i and j, where P(bj|ai) is the probability that the output symbol bj will be received if the input symbol ai is sent.
A channel is shown schematically below.
[Figure: A channel mapping the input alphabet {a1, a2, ..., ar} to the output alphabet {b1, b2, ..., bs}.]
BINARY SYMMETRIC CHANNEL
A binary symmetric channel (BSC) has just two input symbols (a1 = 0, a2 = 1) and two output symbols (b1 = 0, b2 = 1), and is symmetric in the sense that
P(b1|a1) = P(b2|a2) = 1 - P and P(b1|a2) = P(b2|a1) = P,
P being the probability of error in transmission. The channel diagram of the BSC is shown below.
[Figure: Channel diagram of the binary symmetric channel: 0 → 0 and 1 → 1 each with probability 1 - P; 0 → 1 and 1 → 0 each with probability P.]
THE CHANNEL MATRIX
A convenient way of describing a channel is to arrange the conditional output probabilities in the following channel matrix:

        b1     b2    ...   bs
  a1   p1|1   p2|1   ...  ps|1
  a2   p1|2   p2|2   ...  ps|2
  .     .      .           .
  ar   p1|r   p2|r   ...  ps|r
Here pj|i = P(bj|ai), i = 1, 2, ..., r, j = 1, 2, ..., s. Each row of the channel matrix corresponds to an input of the channel, and each column corresponds to a channel output. Note that in every channel matrix, the sum of the terms in every row must equal one, i.e.
Σ_{j=1}^{s} pj|i = 1, i = 1, 2, ..., r.
Example:
The channel matrix of the BSC is

  1-P    P
   P    1-P
PROBABILITY RELATIONS IN A CHANNEL
Consider a channel with r input symbols a1, a2, ..., ar and s output symbols b1, b2, ..., bs, and channel matrix

  p1|1   p2|1   ...   ps|1
  p1|2   p2|2   ...   ps|2
   .      .            .
  p1|r   p2|r   ...   ps|r
Let Pi0 = P(ai), i = 1, 2, ..., r, denote the probability that the symbol ai will be selected for transmission through the channel, and let P0j = P(bj), j = 1, 2, ..., s, denote the probability that the output symbol bj will be received as channel output. Then the relations between the probabilities of the various input and output symbols may be obtained as
Σ_{i=1}^{r} Pi0 pj|i = P0j, for j = 1, 2, ..., s. .........(1)
This shows that, given the input probabilities Pi0 and the channel probabilities pj|i, the output probabilities P0j can be computed using (1). Further,
P(ai, bj) = pj|i Pi0 for all i, j, .........(2)
P(ai|bj) = pj|i Pi0 / P0j for all i, j. .........(3)
Relation (2) yields the joint probability of sending the symbol ai and receiving the symbol bj; relation (3) gives the backward channel probability of ai given that the output bj has been received.
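A small Python sketch of relations (1) to (3) for a concrete channel; the 2x3 channel matrix and the input probabilities are illustrative choices of ours:

# Channel with r = 2 inputs and s = 3 outputs.
# Row i of P_channel holds p(b_j | a_i); each row sums to 1.
P_channel = [[0.7, 0.2, 0.1],
             [0.1, 0.3, 0.6]]
P_in = [0.4, 0.6]                      # input probabilities P_i0

r, s = len(P_channel), len(P_channel[0])

# (1) Output probabilities P_0j
P_out = [sum(P_in[i] * P_channel[i][j] for i in range(r)) for j in range(s)]

# (2) Joint probabilities P(a_i, b_j) and (3) backward probabilities P(a_i | b_j)
P_joint = [[P_channel[i][j] * P_in[i] for j in range(s)] for i in range(r)]
P_back = [[P_joint[i][j] / P_out[j] for j in range(s)] for i in range(r)]

print(P_out)            # output distribution; sums to 1
print(P_back[0][0])     # P(a1 | b1)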
NOISELESS CHANNEL
A channel described by a channel matrix with one and only one non-zero element in each column is called a noiseless channel.
Example 1. A BSC with P = 0 or P = 1 is a noiseless channel.
Example 2. The channel represented by the channel matrix

  1/2   1/2    0     0     0
   0     0    3/5   2/5    0
   0     0     0     0     1

is a noiseless channel.
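The defining property, one and only one non-zero element in each column, is easy to test programmatically; a minimal Python sketch (the helper name is_noiseless is ours):

def is_noiseless(channel_matrix):
    """True if every column has exactly one non-zero entry."""
    cols = zip(*channel_matrix)        # transpose: iterate over columns
    return all(sum(1 for p in col if p > 0) == 1 for col in cols)

M = [[1/2, 1/2, 0,   0,   0],
     [0,   0,   3/5, 2/5, 0],
     [0,   0,   0,   0,   1]]
print(is_noiseless(M))                 # True: each output identifies its input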
JOINT AND CONDITIONAL ENTROPIES
Relation between joint and marginal entropies:
Consider two sets of messages:
X = {X1, X2, ..., XM}
and Y = {Y1, Y2, ..., YN},
where the Xi's are the messages sent (channel input) and the Yj's are the messages received (channel output).
Let Pij = P(X = Xi, Y = Yj), i = 1, 2, ..., M, j = 1, 2, ..., N,
denote the probability of the joint event that message Xi is sent and message Yj is received. Our objective is to study the relationships between the joint, conditional and marginal information associated with the bivariate probability distribution (Pij).
Let us define the marginal probability distributions of X and Y by
Pi0 = Σ_{j=1}^{N} Pij and P0j = Σ_{i=1}^{M} Pij for all i and j.
Then the marginal entropies of the two marginal distributions are naturally given by
H(X) = -Σ_{i=1}^{M} Pi0 log Pi0 and H(Y) = -Σ_{j=1}^{N} P0j log P0j.
The entropy H(X) measures the uncertainty of the message sent, and H(Y) performs the same role for the message received.
The joint entropy is the entropy of the joint distribution of the messages sent and received, and is therefore given by
H(X, Y) = -Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pij.
H(X, Y) measures the uncertainty of the message sent and the message received simultaneously.
We observe that
Max H(X, Y) = log MN = log M + log N = Max H(X) + Max H(Y).
CONDITIONAL ENTROPIES
Let Pj|i = Pij / Pi0 denote the conditional probability that Yj is the message received, given that Xi is the message sent. If we vary Yj over the set of messages received for a fixed Xi, we obtain the conditional distribution P1|i, P2|i, ..., PN|i. We therefore define the conditional entropy of Y, given that X = Xi, as
H(Y|X = Xi) = -Σ_{j=1}^{N} Pj|i log Pj|i.
Further, we define the average conditional entropy of Y given X as a weighted average of the H(Y|X = Xi), namely
H(Y|X) = Σ_{i=1}^{M} Pi0 H(Y|X = Xi) = -Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pj|i (since Pij = Pi0 Pj|i).
THEOREM:
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
PROOF:
H(X) + H(Y|X) = -Σ_{i=1}^{M} Pi0 log Pi0 - Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pj|i
= -Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pi0 - Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pj|i
= -Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log (Pi0 Pj|i)
= -Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pij
= H(X, Y).
Similarly, H(X, Y) = H(Y) + H(X|Y). (Proved)
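The identity H(X, Y) = H(X) + H(Y|X) can be checked on any joint distribution; a minimal Python sketch with an illustrative 2x2 joint matrix of our own:

import math

def H(dist):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

P = [[0.3, 0.2],                       # P_ij: joint distribution of (X, Y)
     [0.1, 0.4]]

P_X = [sum(row) for row in P]          # marginals P_i0
H_XY = H([p for row in P for p in row])
H_X = H(P_X)
# H(Y|X) as the weighted average of the conditional entropies H(Y | X = X_i)
H_Y_given_X = sum(P_X[i] * H([p / P_X[i] for p in P[i]]) for i in range(len(P)))

assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12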
MUTUAL INFORMATION:
The expected mutual information:
Let us consider the set of messages sent, X = {X1, X2, ..., XM}, and the set of messages received, Y = {Y1, Y2, ..., YN}. Then the quantity
h(Xi, Yj) = log [Pij / (Pi0 P0j)], i = 1, 2, ..., M, j = 1, 2, ..., N,
is known as the mutual information of the message sent Xi and the message received Yj. The following observations on h(Xi, Yj) are obvious:
(i) h(Xi, Yj) = 0 whenever X, Y are stochastically independent;
(ii) h(Xi, Yj) > or < 0 according as Pj|i > or < P0j for a fixed Xi,
that is, according as Yj is more or less frequently the message received, given that Xi is the message sent. Now, averaging all MN mutual information values with the Pij's as weights, we obtain the expected mutual information of X and Y as
I(X, Y) = Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log [Pij / (Pi0 P0j)].
Theorem:
I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).
Proof:
H(X) - H(X|Y) = -Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pi0 + Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log Pi|j
= Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log (Pi|j / Pi0)
= Σ_{i=1}^{M} Σ_{j=1}^{N} Pij log [Pij / (Pi0 P0j)] (since Pi|j = Pij / P0j)
= I(X, Y).
The other equality follows in the same way.
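Continuing with the same style of joint matrix (again an illustrative choice of ours), the expected mutual information and the identity I(X, Y) = H(X) - H(X|Y) can be verified directly in Python:

import math

P = [[0.3, 0.2],                       # joint distribution P_ij
     [0.1, 0.4]]
P_X = [sum(row) for row in P]          # P_i0
P_Y = [sum(col) for col in zip(*P)]    # P_0j

# I(X, Y) = sum_ij P_ij log2( P_ij / (P_i0 * P_0j) )
I = sum(P[i][j] * math.log2(P[i][j] / (P_X[i] * P_Y[j]))
        for i in range(2) for j in range(2) if P[i][j] > 0)

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

# H(X|Y) as the weighted average of H(X | Y = Y_j)
H_X_given_Y = sum(P_Y[j] * H([P[i][j] / P_Y[j] for i in range(2)])
                  for j in range(2))

assert abs(I - (H(P_X) - H_X_given_Y)) < 1e-12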
ENCODING:
We now consider noiseless channels only.
Definition 1 (Code):
Let S = {S1, S2, ..., Sq} be the source alphabet. Then we define a code as a mapping of all possible sequences of symbols of S into sequences of symbols of some other alphabet Q = {a1, a2, ..., aD}. We call Q the code alphabet.
Definition 2 (Block code):
A code which maps each of the symbols of S into a fixed sequence of symbols of Q is called a block code. The fixed sequences of symbols of Q are called the code words associated with the Si's. For example, S1 may correspond to a1a2 and S2 may correspond to a3a7a8a3, etc.
Definition 3 (Binary code):
A code with Q = {0, 1} is called a binary code.
Example:
A binary block code is given below: S1→0, S2→11, S3→00, S4→11.
UNIQUELY DECODABLE CODE
Definition 1 (Non-singular code):
A block code is said to be non-singular if all the words of the code are distinct.
Example:
A non-singular binary block code is given below:
S1→0, S2→11, S3→00, S4→01.
Even though all the code words are distinct in the above non-singular code, it is still possible for a given sequence of code words to have an ambiguous origin: the sequence 0011 could arise from either S3S2 or S1S1S2.
DEFINITION 2 (UNIQUELY DECODABLE (SEPARABLE) CODE):
A code is said to be uniquely decodable (separable) if every finite sequence of code symbols corresponds to at most one sequence of source symbols.
Example:
The following three codes are uniquely decodable:
(i) S1→00, S2→01, S3→10, S4→11
(ii) S1→0, S2→10, S3→110, S4→1110
(iii) S1→0, S2→01, S3→011, S4→0111
DEFINITION 3 (INSTANTANEOUS CODE):
A uniquely decodable code is said to be instantaneous if it is possible to decode each word in a sequence without reference to succeeding code symbols.
Example:
Code (ii) above is instantaneous, whereas code (iii) is not: in code (iii) one cannot tell whether a 0 ends a word until the next symbol arrives. This suggests that
"every instantaneous code is uniquely decodable, but not conversely."
The various sub-classes of codes are indicated below.
[Figure: Classification of codes: Block / Non-Block; Block → Singular / Non-Singular; Non-Singular → Uniquely decodable / Not uniquely decodable; Uniquely decodable → Instantaneous / Not instantaneous.]
DEFINITION 4 (PREFIX):
Let A and B be two finite (non-empty) sequences of code symbols. Then we say that sequence A is a prefix of sequence B if it is possible to write B as AC for some other sequence C of code symbols.
Example:
The code word 0111 has three prefixes: 011, 01 and 0. It may easily be observed that
"a necessary and sufficient condition for a code to be instantaneous is that no complete code word be a prefix of some other code word."
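The prefix condition can be checked mechanically; a minimal Python sketch (the helper name is_instantaneous is ours):

def is_instantaneous(code_words):
    """True if no complete code word is a prefix of another (prefix condition)."""
    for a in code_words:
        for b in code_words:
            if a != b and b.startswith(a):
                return False
    return True

print(is_instantaneous(["0", "10", "110", "1110"]))   # True  (code (ii))
print(is_instantaneous(["0", "01", "011", "0111"]))   # False (code (iii))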
CONSTRUCTION OF A BINARY INSTANTANEOUS CODE:
Consider a source S = {S1, S2, S3, S4, S5}. Let us start by assigning 0 to the symbol S1, i.e. S1→0. Then all other code words must start with 1. We cannot let S2→1, because this would leave us with no symbol with which to start the remaining three code words. Thus we might have S2→10. This in turn requires that the remaining code words start with 11; we avoid S3→11 for the same reason, and let S3→110. Now the only three-symbol prefix still unused is 111, and thus we might set S4→1110, S5→1111. We have thus constructed the instantaneous code
S1→0, S2→10, S3→110, S4→1110, S5→1111.
The word lengths in the above encoding procedure were forced by selecting S1→0. For example, had we taken S1→00, then we may set S2→01 and still have two prefixes of length 2 unused. We then construct the code
S1→00, S2→01, S3→10, S4→110, S5→111.
We observe that in the first procedure (when we start with 0) the later code words are of larger length than those in the second procedure (when we start with 00).
Shannon-Fano Encoding Procedure
Let the ensemble [S] of messages to be transmitted be given by
[S] = [m1, m2, ..., mN]
and the corresponding probability distribution [P] of transmission be given by
[P] = [P1, P2, ..., PN] (Pi > 0, Σ_{i=1}^{N} Pi = 1).
We shall describe an encoding procedure assigning an efficient uniquely decodable binary code to [S]. The following are two necessary requirements:
i. No complete code word can be the prefix of some other.
ii. The binary digits in each code word appear independently with equal probability.
The Shannon-Fano procedure is as follows:
STEP 1:
Arrange the messages in decreasing order of their probabilities; without loss of generality, let
P1 ≥ P2 ≥ P3 ≥ ... ≥ PN.
Thus we have

  Message   Probability
  m1        P1
  m2        P2
  m3        P3
  ...       ...
  mN        PN
STEP 2:
Partition the set of messages into two most nearly equiprobable groups, say S1 and S2, for example

  Group   Messages          Probability
  S1      m1, m2            P(S1) = P1 + P2
  S2      m3, m4, ..., mN   P(S2) = P3 + ... + PN

such that P(S1) ≈ P(S2).
STEP 3:
Further partition S1 and S2 into two most nearly equiprobable subgroups each, say S11, S12 and S21, S22 respectively.
STEP 4:
Continue partitioning the resulting subgroups into two most nearly equiprobable subgroups until each subgroup contains exactly one message.
STEP 5:
Assign the binary digit 0 to the first position of the code words for the messages of S1, and assign the digit 1 to the first position of the code words for the messages of S2. The assignments of the binary digits to the other positions in the code words are made similarly, in accordance with the further partitioning of S1 and S2.
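A compact recursive Python sketch of the Shannon-Fano procedure described above. The split chooses the partition point that makes the two groups most nearly equiprobable; the function and variable names are ours:

def shannon_fano(probs, prefix=""):
    """Assign binary code words to messages sorted by decreasing probability.

    probs: list of (message, probability) pairs, already sorted (Step 1).
    Returns a dict mapping each message to its code word.
    """
    if len(probs) == 1:
        return {probs[0][0]: prefix or "0"}
    total = sum(p for _, p in probs)
    # Step 2: find the split making the two groups most nearly equiprobable.
    running, best_k, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(probs)):
        running += probs[k - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_k, best_diff = k, diff
    # Steps 3-5: 0 for the first group, 1 for the second, then recurse.
    codes = shannon_fano(probs[:best_k], prefix + "0")
    codes.update(shannon_fano(probs[best_k:], prefix + "1"))
    return codes

msgs = [("m1", 0.4), ("m2", 0.2), ("m3", 0.2), ("m4", 0.1), ("m5", 0.1)]
print(shannon_fano(msgs))
# {'m1': '0', 'm2': '10', 'm3': '110', 'm4': '1110', 'm5': '1111'}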
CONCLUDING REMARK
A new dimension to Shannon theory has been added with the development of what is now called "useful information". The underlying point here is that not all messages have the same importance (utility), and therefore the information measure should include an importance parameter besides the usual probability parameter. For example, a message conveying the greetings of the Prime Minister intuitively carries more useful or relevant information than one from relatives.
Reference Books
i. A Study of Generalized Measures in Information Theory, by Kanti Swarup, P. K. Gupta and Man Mohan.
ii. Dependability and Disposal Decision Problems in Operational Research, by Kanti Swarup & H. C. Jain.
CONTENTS
1. Introduction
2. A measure of information
3. Entropy – the expected information
4. Entropy as a measure of uncertainty
5. Requirements on the (entropy) uncertainty function
6. The communication system
7. Channel probabilities
8. Joint and conditional entropies
9. Mutual information
10. Encoding
11. Concluding remark