
ME597B/Math597G/Phy597C, Spring 2015

1 Introduction to Information and Entropy

Let $X$ be a random variable with discrete outcomes, $\{x_1, \cdots, x_M\}$, where $M \geq 2$. Examples are as follows:

1. A coin having two outcomes of Head and Tail.

2. A six-faced die having six outcomes of 1, 2, 3, 4, 5, and 6.

3. An ideal gas system, contained in a rigid, impermeable (i.e., non-porous), and diathermal vessel, having $N$ molecules. Under (quasi-static) thermodynamic equilibrium conditions, the system may have $M$ energy states, where $2 \leq M \ll N$.

Let $X$ have a probability distribution, described by a probability mass function $\{p_i\}$, where $\sum_{i=1}^{M} p_i = 1$ and $p_i \geq 0\ \forall i$. Now, we pose the following question:

What is the information content of $X$ (i.e., the probability mass function $\{p_i\}$)?

To answer the above question, let us construct a message of finite length $N$, where $N \gg M$. The message could be constructed from independent realizations of the random variable $X$. Let us find out how many binary digits (called bits) are needed to convey this $N$-long message. If there are $R$ bits, then it follows that
$$\left(2^R = M^N\right) \;\Rightarrow\; \left(R \log 2 = N \log M\right) \;\Rightarrow\; \left(R = \frac{N \log M}{\log 2}\right)$$
Taking the logarithm with base 2, it follows that $R = N \log_2 M$. Apparently, an $N$-long string of independent outcomes of the random variable $X$ has an information content of $N \log_2 M$ bits, i.e., this many bits of information will have to be transmitted to convey the message.
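As a concrete numerical check, the short sketch below evaluates $R = N \log_2 M$ for the six-faced die example; the message length $N = 100$ is assumed here purely for illustration.

```python
import math

M = 6     # number of outcomes (six-faced die)
N = 100   # message length, assumed for illustration

# Bits needed with no knowledge of the distribution: R = N * log2(M)
R = N * math.log2(M)
print(f"R = N log2 M = {R:.1f} bits")   # approximately 258.5 bits
```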

The probability distribution $\{p_i\}$ limits the types of messages that are likely to occur. For example, if $p_j \gg p_k$, then it is very unlikely to construct a message with the number of $x_k$'s being larger than the number of $x_j$'s. For $N$ being very large, we expect that $x_j$ will appear approximately $Np_j$ times out of $N$.

Therefore, a typical message will contain $\{n_i \triangleq N p_i;\ i = 1, \cdots, M\}$ symbols arranged in different ways. The number of different arrangements is given by
$$\eta(N, M) \triangleq \frac{N!}{n_1! \cdots n_M!} \quad \text{where } \sum_{i=1}^{M} n_i = N \text{ and } n_i \geq 0 \ \forall i$$
It is noted that $\eta \ll M^N$, which is the maximum possible number of $N$-long messages. Then, it follows by using Stirling's formula (which states $\log_e k! = k \log_e k - k + O(\log_e k)$) that

$$\begin{aligned}
\log_e \eta &= \log_e N! - \sum_{j=1}^{M} \log_e n_j! \approx \left(N \log_e N - N\right) - \sum_{j=1}^{M} \left(n_j \log_e n_j - n_j\right) \\
&= N \log_e N - \sum_{j=1}^{M} n_j \log_e n_j = N \log_e N - \sum_{j=1}^{M} (N p_j) \log_e (N p_j) \\
&= N \log_e N - \left(N \log_e N\right)\left(\sum_{j=1}^{M} p_j\right) - N \sum_{j=1}^{M} p_j \log_e p_j \\
&= -N \sum_{j=1}^{M} p_j \log_e p_j
\end{aligned}$$

Then, to represent one of the "likely" $\eta$ sequences, it takes $\log_2 \eta \approx -N \sum_{j=1}^{M} p_j \log_2 p_j$ bits of information.
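The following sketch compares the exact value of $\log_2 \eta$, computed from the multinomial coefficient via log-gamma, against the Stirling-based estimate $-N \sum_{j=1}^{M} p_j \log_2 p_j$; the three-symbol distribution and the message length are assumed only for illustration.

```python
import math

# Assumed example: a biased 3-symbol source and a message of length N
p = [0.5, 0.3, 0.2]
N = 1000
n = [round(N * pj) for pj in p]   # typical counts n_j ~ N p_j

# Exact log2 of the multinomial coefficient eta = N! / (n_1! ... n_M!)
log2_eta = (math.lgamma(N + 1) - sum(math.lgamma(nj + 1) for nj in n)) / math.log(2)

# Stirling-based estimate: -N * sum_j p_j log2 p_j
estimate = -N * sum(pj * math.log2(pj) for pj in p)

print(f"log2(eta)           = {log2_eta:.1f} bits")
print(f"-N sum p_j log2 p_j = {estimate:.1f} bits")
print(f"N log2 M            = {N * math.log2(len(p)):.1f} bits")  # with no knowledge of {p_i}
```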


Shannon's Theorem states that, as $N \to \infty$, the minimum number of bits necessary to ensure the errors to vanish in $N$ trials is $\log_2 \eta \approx -N \sum_{j=1}^{M} p_j \log_2 p_j$, which is less than the $N \log_2 M$ bits needed in the absence of any knowledge of the probability distribution $\{p_i\}$. The difference per trial can be attributed as the information content $I$ of the probability distribution $\{p_i\}$, i.e., $\frac{N \log_2 M + N \sum_{j=1}^{M} p_j \log_2 p_j}{N}$, from which it follows that $I[\{p_i\}] \triangleq \log_2 M + \sum_{j=1}^{M} p_j \log_2 p_j$.

An alternative representation of information (that is adopted by many authors) is: $I[\{p_i\}] \triangleq \sum_{j=1}^{M} p_j \log_2 p_j$.

2 A Thermodynamic Perspective

Let a vessel with rigid, impermeable, and diathermal boundaries contain $N$ (non-interacting and statistically independent) randomly moving particles. Under a thermodynamic equilibrium condition, let the total energy of these $N$ particles be $E$, which is distributed as follows:

Let the $N$ particles be clustered in $M$ groups, where $M \geq 2$ and $M \ll N$. In group $i$, where $i = 1, 2, \cdots, M$, there are $n_i$ particles such that the expected value of the energy of each particle is $\varepsilon_i$ with standard deviation $\delta_i$. Then it follows that
$$N = \sum_{i=1}^{M} n_i \quad \text{and} \quad E = \sum_{i=1}^{M} n_i \varepsilon_i \quad \text{for a very large } N$$
Let us order these $M$ groups of particles such that $\varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_M$. It is assumed that $\left(\frac{\delta_i}{\varepsilon_i}\right) \ll 1$ and $\left(\frac{\sqrt{\delta_i^2 + \delta_{i+1}^2}}{\varepsilon_{i+1} - \varepsilon_i}\right) \ll 1$. Let us define $p_i \triangleq \frac{n_i}{N}$ and $E_i \triangleq N \varepsilon_i$, where $i = 1, 2, \cdots, M$; obviously, $\sum_{i=1}^{M} p_i = 1$, $E_1 < E_2 < \cdots < E_M$, and $E = \sum_{i=1}^{M} (p_i E_i)$. Then, it follows from the principle of energy minimization at an equilibrium condition that $p_1 > \cdots > p_M$.
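The identity $E = \sum_{i=1}^{M} n_i \varepsilon_i = \sum_{i=1}^{M} p_i E_i$ can be verified numerically; the group counts and single-particle energies in the sketch below are assumed values, not taken from the notes.

```python
# Assumed example: N particles split into M = 3 energy groups
n = [600, 300, 100]        # particles per group; N = sum(n)
eps = [1.0, 2.5, 4.0]      # expected single-particle energies (arbitrary units)

N = sum(n)
p = [ni / N for ni in n]          # p_i = n_i / N
E_group = [N * e for e in eps]    # E_i = N * eps_i

E_from_counts = sum(ni * e for ni, e in zip(n, eps))        # E = sum_i n_i eps_i
E_from_probs = sum(pi * Ei for pi, Ei in zip(p, E_group))   # E = sum_i p_i E_i
print(E_from_counts, E_from_probs)   # both give the same total energy
```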

Let us initiate a quasi-static change through exchange of energy via the diathermal boundaries so that the total energy of the thermodynamic system is now $\tilde{E}$ while the number of particles is still the same. Then these $N$ particles have a new probability distribution $\tilde{p}_i$, $i = 1, 2, \cdots, M$, because the $N$ particles are now distributed among the same groups as $\tilde{n}_i$, $i = 1, 2, \cdots, M$. The following conditions hold at this new condition.
$$\sum_{i=1}^{M} \tilde{p}_i = 1 \quad \text{and} \quad \tilde{E} = \sum_{i=1}^{M} (\tilde{p}_i E_i)$$
Note that the $E_i$'s are unchanged and the expected value of the energy of each of the $\tilde{n}_i$ particles in the $i$th group is still $\varepsilon_i$ for $i = 1, 2, \cdots, M$.

Remark 2.1. The particle energies $\varepsilon_i$ are discrete according to quantum mechanics and their values depend on the volume to which these particles are confined; therefore, the possible values of the total energy $E$ are also discrete. However, for a large volume and consequently a large number of particles, the spacings of the different energy values are so small in comparison to the total energy of the system that the parameter $E$ can be regarded as a continuous variable. Note that this fact prevails regardless of whether the particles are non-interacting or interacting.

Remark 2.2. For a general case, where the vessel boundaries are allowed to be flexible, porous, and diathermal, the specifications of the respective parameters $E$, $V$ and $N$ define a macrostate of the thermodynamic system. However, at the particle level, there is a very large number of ways in which a macrostate $(E, V, N)$ can be realized. As seen above in the case of non-interacting particles, the total energy is simply the sum of the energies of $N$ particles; since these $N$ particles can be arranged in many different ways, each single particle of energy $\varepsilon_i$ can be placed in many different ways to realize the total energy $E$. Each of these different ways specifies a microstate of the system, and the actual number $\Omega$ of these microstates is a function of $E$, $V$ and $N$. In general, the microstates of a given system are generated in quantum mechanics as the independent solutions in the form of wave functions $\psi(r_1, \cdots, r_N)$ of the Schrödinger equation corresponding to the eigenvalue $E$ of the relevant operator. In essence, a given macrostate of the system corresponds to a large number of microstates; in other words, a macrostate is an equivalence class of a large number of microstates. In the absence of any constraints, these microstates are equally probable, i.e., the system is equally likely to be in any one of these microstates at an instant of time.
