Chapter 2

A Measure of Information
2.1 Encoding of Information

In this lecture we will think of a message as a sequence of symbols produced by an information source. The message will appear as a signal, which is a function of one or more independent variables. The signal may be a sound wave, an image, or any of a myriad of physical forms.
We will find in a future lecture that the message may be based on phenomena other than language. A language may be thought of as organized combinations of symbols that express and communicate thoughts and feelings. A message may be an expression in a language, and that is how the term is ordinarily used. However, we want to use the term in a more general form so we can model more than human communication. If we think of a message from a machine or from nature, it is difficult to associate it with thoughts or feelings. We can generalize by noting that we can usually model things, at least conceptually, in terms of their state. A state-space model is a quite general structure. We will therefore say that a message is an expression of the state of a system.[1]
Suppose that we have made some observations and have deduced the state of a system. We may want to record the state so that it can be reset to that value at another time. This is essential, for example, in simulation. To record the state we need to construct a signal that will drive a recording apparatus. A little thought will convince you that recording the state of a system and recording a message are equivalent tasks. Moreover, recording and communication are equivalent from our conceptual viewpoint.

[1] More precisely, it is the mathematical model of a system that has states. A good model will behave like the natural system. The behavior of the model is determined by its state. The current state is deduced from observations.
A recording is done through a signal that is suited to the recording medium. That signal may be produced by a sequence of symbols from an alphabet that we may choose; it is not determined by the system under observation. We simply require that each potential state (or each potential message) be uniquely represented. If there are N different states or messages, then there must be at least N unique arrangements of symbols available to us. In the last lecture we saw that any alphabet with two or more symbols can be used. Moreover, it is not the appearance of the symbols, but their number, that is important in determining their enumeration properties. We will find that at times one can gain some efficiency by properly choosing the size of the recording alphabet, but that is usually a small gain.
The important determiner of the efficiency of a recording code is the manner in which sequences of symbols are associated with the different states. It is the structure of the code, and not the identity of the symbols, that is important. A good code can greatly reduce the amount of data needed to record the state of the model. This is, in fact, the goal of image compression algorithms. A compression system contains an algorithm, which is a model of images, and a method to record the data needed to reproduce the image. The best compression system is the one which reproduces the image with acceptable fidelity with the minimum data.
The amount of information needed to record the state of a system depends upon the probabilities of the various states. It turns out that if we know the state probabilities we can calculate the minimum amount of data that is required on the average. This minimum is given by the entropy of the system, which is closely related to the concept of entropy in physical systems.
A code is an alphabet of symbols and a set of rules that associate events with patterns of symbols from the alphabet. An event is anything that one wishes to express in terms of the code, such as the state of a system. The development of codes depends only upon the abstract notion of events and their associated probabilities. The association with state models is accomplished by interpreting the meaning of event as a particular state.
The ingredients of a code are:

1. an alphabet A of symbols. We don't care about the specific representation of objects in A, only that they be distinguishable and that there be a finite number of them. We will typically represent the alphabet in the form A_r = {a_1, a_2, ..., a_r}. We only care about the size r. The smallest useful alphabet is A_2, whose symbols are customarily represented as A_2 = {0, 1}.
2. a set of events E = {e_1, e_2, ...}. A set of finite size may be indicated as E_n and one of infinite size by E_∞. The specific physical characteristics of the events are of no interest as far as the code is concerned.
3. a rule R that maps events onto codewords, which are combinations of symbols from A.
4. a set of probabilities over E. That is, one can compute the probability of any set B of events from E. If B ⊂ E then one has a probability measure p(B) which satisfies all the rules for probability measures.
A code C can be represented by the above items, and denoted by C(A, E, R, p). We require that R be such that each event is represented by a unique codeword. Let w_i be the codeword for an event e_i, and let n_i be the number of symbols (or length) of w_i. The set W of codewords is represented by the rule R: E → W. For unique decoding we require a rule R^{-1}: W → E.
Let n(e) be the length, or number of symbols, of a word w(e) that represents an event e. The average number of symbols used to represent an event, and thus the average number of symbols per codeword, is

    \bar{n} = \sum_{e \in E} n(e) p(e)    (2.1)
A good code is one which has a small value for \bar{n}. Here we seek to know how to find the minimum value of \bar{n} for a given event space, described by (E, p), over all possible code rules for a given alphabet A_r. We would also like to have guidance on how to find R for a given (E, p, A_r) such that the code is efficient and both R and R^{-1} are practical to implement. Information theory has made considerable progress on all these tasks.
Example 1 (Random Number Generator). A random number generator can produce positive integers from the set {e_1 = 1, e_2 = 2, ..., e_n = n}. This machine has a finite event space. The generator output is to be represented by words whose letters are from a binary code alphabet, A = {A, B}. We are interested in a code that can uniquely represent a sequence of numbers produced by the source.
If n = 2, an obvious code rule is to assign e_1 = 1 → A, e_2 = 2 → B. There will then be a one-to-one relationship between any sequence from the source and a corresponding sequence of letters from the alphabet.
If n = 4 then it is necessary to construct four codewords from an alphabet of two letters. We could use {e_1 → AA, e_2 → AB, e_3 → BA, e_4 → BB}.
By extension, we can construct a code whose words are of length \log_2 n when n is any power of 2. Any n that is not a power of 2 can be accommodated by using codewords whose length is the next integer above \log_2 n.
Codes that are constructed in this manner are uniquely decodable. However, they may not be as efficient as possible. The efficiency cannot be determined without knowing the statistics of the source events.
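As a quick sketch (the function name is our own, not from the text), the fixed-length construction for a binary alphabet can be written in a few lines of Python:

```python
from math import ceil, log2

def fixed_length(n: int) -> int:
    """Binary codeword length needed to give each of n events its own
    fixed-length word: log2(n) symbols when n is a power of 2,
    otherwise the next integer above log2(n)."""
    return ceil(log2(n))

# fixed_length(4) -> 2, fixed_length(5) -> 3
```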
A definition of unique decodability is given below.
2.1.1 Uniquely Decodable Codes

Suppose that one wants to record a sequence of events, which would then produce a sequence of codewords. We would like to decode any sequence of symbols produced by concatenating the codewords produced by the sequence of events. We want to be able to do this without introducing another symbol to serve as a separator, or comma, between codewords in the sequence. Codes with this property are called uniquely decodable.
2.1.2 Instantaneous Codes

We would like to be able to decode each of the codewords in the sequence independently. This imposes a stronger requirement than that of being uniquely decodable. We want to be able to recognize each codeword as soon as it is seen, reading left to right, independent of surrounding codewords. This stronger requirement could impose conditions that make the code longer.
A necessary and sufficient condition for a code to be instantaneous is that no complete codeword be a prefix of some other codeword.
Example 2 (A code that is not instantaneous). The code with words {a = 1, b = 10, c = 100, d = 1000} is uniquely decodable because each codeword is identified by the number of zeros it contains.
e_i   w_i   n_i
e_1   1     1
e_2   01    2
e_3   001   3
e_4   000   3

Table 2.1: An instantaneous code for an alphabet of four symbols.
The sequence b d a c is encoded as 1010001100. One cannot decode a sequence of digits until one is sure all the zeros have been received. For example, the initial sequence 10 may be the symbol b or the beginning of c or d; it cannot be decoded until more digits are seen. Hence, this code is not instantaneous.
To construct an instantaneous code we can use the simple expedient of not using any previously defined codeword as a prefix when defining a new codeword.
Example 3 (An instantaneous code). The code {a = 1, b = 01, c = 001, d = 000} is instantaneous. The decoding rule is to recognize a symbol as soon as its codeword is complete. The sequence b d a c encodes as 010001001.
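The decoding rule for an instantaneous code can be sketched directly; here is a minimal Python version (the function name is ours):

```python
def decode_instantaneous(code: dict, stream: str) -> str:
    """Decode a prefix-free (instantaneous) code by emitting an event
    as soon as its complete codeword is recognized, left to right."""
    inverse = {w: e for e, w in code.items()}   # the rule R^{-1}: W -> E
    events, buffer = [], ""
    for digit in stream:
        buffer += digit
        if buffer in inverse:          # a complete codeword has been seen
            events.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("stream ended inside a codeword")
    return "".join(events)

code = {"a": "1", "b": "01", "c": "001", "d": "000"}
decode_instantaneous(code, "010001001")   # recovers "bdac"
```

Note that the non-instantaneous code of Example 2 would defeat this loop: after reading 10 the decoder cannot yet tell b from the start of c or d.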
Clearly, one can have an instantaneous code that is uniquely decodable. The question is whether there is any advantage in imposing only unique decodability rather than the stronger instantaneous requirement. It turns out that there is no advantage, as will be shown by the analysis developed below. We first state and prove Kraft's inequality and then use it in McMillan's inequality. This establishes the above assertion.
2.1.3 Kraft's Inequality

The Kraft inequality provides the necessary and sufficient conditions for a code to be instantaneous. Suppose that a system has q recognizable events in a set E_q = {e_1, e_2, ..., e_q} which are to be recorded or communicated using an alphabet A_r = {a_1, a_2, ..., a_r}. The events can be any system observables or combination of observables that are of interest. Here we are interested in the number q. A code C is now defined which maps each event onto a codeword w_i = R(e_i), where w_i is constructed of symbols from A_r and is of length n_i. An example of a code with symbols from A_2 and codeword lengths {1, 2, 3, 3} is shown in Table 2.1.
Theorem 4. A sufficient condition for the existence of an instantaneous code with word lengths {n_1, n_2, ..., n_q} is that

    \sum_{i=1}^{q} r^{-n_i} \le 1    (2.2)

For a binary code, which uses the alphabet A_2, the Kraft inequality is

    \sum_{i=1}^{q} 2^{-n_i} \le 1    (2.3)
Example 5. The binary codeword lengths in Example 3 are {1, 2, 3, 3}. The Kraft sum is

    2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1

Hence, an instantaneous code exists, as we have demonstrated above.
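The Kraft sum is easy to check numerically; a small sketch in Python (the function name is ours):

```python
def kraft_sum(lengths, r=2):
    """Left-hand side of the Kraft inequality (2.2) for word lengths
    over an alphabet of r symbols."""
    return sum(r ** -n for n in lengths)

kraft_sum([1, 2, 3, 3])   # 1.0 -> an instantaneous binary code exists
kraft_sum([1, 2, 2, 3])   # 1.125 -> no instantaneous binary code
```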
2.1.4 Proof of the Kraft Inequality

To prove the sufficiency, we will provide a method to actually construct a code with word lengths {n_1, n_2, ..., n_q} when these lengths satisfy (2.2). Let c_i be the number of codewords of length i. In the above example c_1 = 1, c_2 = 1, c_3 = 2. If the maximum codeword length is n, then the counts must satisfy

    \sum_{i=1}^{n} c_i = q    (2.4)
since there must be a codeword for each source event. Using the above definitions, we can write the Kraft inequality as

    \sum_{i=1}^{n} c_i r^{-i} \le 1    (2.5)
Upon multiplying by r^n we have

    \sum_{i=1}^{n} c_i r^{n-i} \le r^n    (2.6)
This inequality may be written out and rearranged to obtain

    c_n \le r^n - c_1 r^{n-1} - c_2 r^{n-2} - \cdots - c_{n-1} r

The term on the right of the inequality must be nonnegative. If we divide it by r and rearrange, we obtain

    c_{n-1} \le r^{n-1} - c_1 r^{n-2} - c_2 r^{n-3} - \cdots - c_{n-2} r

Once again, the term on the right must be nonnegative. We can continue in this fashion to obtain the following sequence of inequalities:

    c_n \le r^n - c_1 r^{n-1} - c_2 r^{n-2} - \cdots - c_{n-1} r    (2.7)
    c_{n-1} \le r^{n-1} - c_1 r^{n-2} - c_2 r^{n-3} - \cdots - c_{n-2} r    (2.8)
    ⋮    (2.9)
    c_3 \le r^3 - c_1 r^2 - c_2 r    (2.10)
    c_2 \le r^2 - c_1 r    (2.11)
    c_1 \le r    (2.12)
The above inequalities now show that we can construct the required codewords. We are required to form c_1 words of length 1, leaving r − c_1 symbols that can serve as the beginning of longer codewords. By adding a new symbol to the end of each of these prefixes we can construct as many as r(r − c_1) = r^2 − c_1 r words of length 2. We are assured by (2.11) that there will be enough words available. From the unused two-symbol prefixes, there will remain r^2 − c_1 r − c_2, which can be used to construct r^3 − c_1 r^2 − c_2 r three-symbol words. Comparing with (2.10) we see that this is a sufficient number. Continuing in this manner, we will find that it is possible to construct the necessary number of words of each length. We have thus demonstrated that (2.2) or (2.5) are sufficient to guarantee the existence of an instantaneous code.
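The construction in the proof can be carried out mechanically. The sketch below (in Python, with names of our own choosing) assigns words in order of increasing length, taking the next unused prefix at each step; this is the so-called canonical form of the construction:

```python
def code_from_lengths(lengths, r=2):
    """Construct an instantaneous (prefix-free) code with the given
    word lengths, assuming they satisfy the Kraft inequality (2.2).
    Each new word is the next unused value, extended with fresh
    symbols whenever the required length grows."""
    assert sum(r ** -n for n in lengths) <= 1, "lengths violate Kraft"
    digits = "0123456789"[:r]
    words, value, prev = [], 0, None
    for n in sorted(lengths):
        if prev is not None:
            value = (value + 1) * r ** (n - prev)
        v, w = value, ""
        for _ in range(n):              # write value as n base-r digits
            w = digits[v % r] + w
            v //= r
        words.append(w)
        prev = n
    return words

code_from_lengths([1, 2, 3, 3])   # ['0', '10', '110', '111']
```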
2.1.5 McMillan's Inequality

Although the above proof shows that satisfaction of the Kraft inequality guarantees the existence of an instantaneous (and therefore a uniquely decodable) code, it does not show that the condition is necessary. Might there be uniquely decodable codes that are more efficient? The analysis below shows that the requirement is also necessary.
Consider raising the summation in the Kraft inequality to a power. Then

    \left(\sum_{i=1}^{q} r^{-n_i}\right)^m = \left(r^{-n_1} + r^{-n_2} + \cdots + r^{-n_q}\right)^m    (2.13)

When this is multiplied out and the terms with equal exponents are gathered together, the expression will be of the form

    \left(\sum_{i=1}^{q} r^{-n_i}\right)^m = \sum_{k=m}^{mn} N_k r^{-k}    (2.14)
where n is the length of the longest codeword and N_k is the number of terms of the form r^{-k}. N_k is also the number of distinct strings of codewords that can be constructed by stringing together m codewords such that the string length is exactly k symbols. If the code is to be uniquely decodable, then this number must be less than the number of possible strings of length k, which is r^k. Therefore, if we substitute r^k (which is at least N_k) in place of N_k, the above equation becomes an inequality:
    \left(\sum_{i=1}^{q} r^{-n_i}\right)^m \le \sum_{k=m}^{mn} r^k r^{-k} = mn - m + 1 \le mn
Let x stand for the sum on the left. For any x > 1, x^m > mn for any n if m is sufficiently large. This would violate the above inequality. Hence, a uniquely decodable code requires x ≤ 1, which is identical to the Kraft inequality (2.2).
McMillan's inequality shows that there is no advantage in not using an instantaneous code. That is, for any uniquely decodable code there is an equivalent instantaneous code.
2.1.6 Efficient Codes

We would like to construct instantaneous codes which use the minimum average number of symbols per event. This will enable us to describe an event in the most efficient manner possible, a fact that is important for storage and transmission of information. The average number of alphabet symbols per codeword is given by (2.1):

    \bar{n} = \sum_{i=1}^{q} n_i p_i

where q is the number of distinct events in E. We want to choose the n_i to satisfy the Kraft inequality and minimize \bar{n}. This is not a trivial matter, but it has been solved by the Huffman coding procedure for the case of statistically independent events. We will examine the Huffman procedure later. For now we will consider the problem of finding a lower bound on \bar{n}.
In our analysis we need to make use of the following theorem. We will find this theorem useful on other occasions.
Theorem 6. Let x_1, x_2, ..., x_q and y_1, y_2, ..., y_q be any two sets of numbers with x_i ≥ 0 and y_i ≥ 0 for 1 ≤ i ≤ q, with

    \sum_{i=1}^{q} x_i = 1  and  \sum_{i=1}^{q} y_i = 1

Then

    \sum_{i=1}^{q} x_i \log_a \frac{1}{x_i} \le \sum_{i=1}^{q} x_i \log_a \frac{1}{y_i}    (2.15)
To prove this we note first that ln x ≤ x − 1, with equality only at x = 1. This can be shown with a graph of the two functions. Then

    \sum_{i=1}^{q} x_i \ln\left(\frac{y_i}{x_i}\right) \le \sum_{i=1}^{q} x_i \left(\frac{y_i}{x_i} - 1\right) = \sum_{i=1}^{q} y_i - \sum_{i=1}^{q} x_i = 0

with equality if and only if x_i = y_i for all i. By expanding the logarithm and dividing through by ln a, the theorem is established for any logarithmic base a.
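Theorem 6 is easy to spot-check numerically. A small Python sketch (helper names are ours) draws random probability vectors and confirms that the left side of (2.15) never exceeds the right:

```python
import random
from math import log2

def cross_sum(x, y):
    """sum_i x_i * log2(1/y_i); equals the entropy of x when y = x."""
    return sum(xi * log2(1 / yi) for xi, yi in zip(x, y))

def normalize(v):
    s = sum(v)
    return [vi / s for vi in v]

# numeric spot-check of Theorem 6 on random probability vectors
random.seed(1)
for _ in range(1000):
    x = normalize([random.random() + 1e-9 for _ in range(5)])
    y = normalize([random.random() + 1e-9 for _ in range(5)])
    assert cross_sum(x, x) <= cross_sum(x, y) + 1e-12
```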
Now, by letting x_i = p(e_i) and y_i = 1/q, i = 1, 2, ..., q, we have (with a = q)

    \sum_{i=1}^{q} p_i \log_q \frac{1}{p_i} \le \sum_{i=1}^{q} p_i \log_q q

or

    \sum_{i=1}^{q} p_i \log_q \frac{1}{p_i} \le 1    (2.16)

with equality if and only if p_i = 1/q for all i. It is common to write the above expression in terms of \log_2, which yields

    \sum_{i=1}^{q} p_i \log_2 \frac{1}{p_i} \le \log_2 q    (2.17)
The above sum is known as the entropy of a source in which the events are statistically independent:

    H(E) = -\sum_{i=1}^{q} p_i \log_2 p_i  bits/event    (2.18)
The entropy, which will be discussed in considerable detail in the next lecture, measures the average amount of uncertainty that an observer has about the next source event. The uncertainty is maximum when all states are equally likely. That is, from (2.18),

    H(E) \le \log_2 q    (2.19)

where q is the size of the event space, with equality if and only if all events are equally likely.
For now, the entropy is just an interesting quantity that enables us to put a lower bound on the average codeword length. Let y_1, y_2, ..., y_q be any numerical quantities that satisfy y_i ≥ 0 for all i and \sum_{i=1}^{q} y_i = 1. Then by Theorem 6 we know that

    \sum_{i=1}^{q} p_i \log \frac{1}{p_i} \le \sum_{i=1}^{q} p_i \log \frac{1}{y_i}    (2.20)
with equality if and only if y_i = p_i for all i. Now let us choose the y_i to be

    y_i = \frac{r^{-n_i}}{\sum_{j=1}^{q} r^{-n_j}}    (2.21)
Then

    H(E) \le -\sum_{i=1}^{q} p_i \log_2 r^{-n_i} + \sum_{i=1}^{q} p_i \log_2\left(\sum_{j=1}^{q} r^{-n_j}\right)
         = \log_2 r \sum_{i=1}^{q} p_i n_i + \log_2\left(\sum_{j=1}^{q} r^{-n_j}\right)
         = \bar{n} \log_2 r + \log_2 \sum_{j=1}^{q} r^{-n_j}    (2.22)
The sum in the last term must satisfy the Kraft inequality if the code is to be uniquely decodable. The logarithm of this term is therefore non-positive. Therefore, we can write the above inequality as

    \bar{n} \log_2 r \ge H(E)

or

    \bar{n} \ge \frac{H(E)}{\log_2 r}    (2.23)
The efficiency of a code is defined as

    \eta = \frac{H(E)}{\bar{n} \log_2 r}    (2.24)
A code will have an efficiency of one when the word lengths from the alphabet A_r are chosen to match the entropy. This issue is explored further in the homework.
A code that is close to the lower bound for the minimum average length can be constructed by a simple method. Choose the codeword lengths to satisfy

    \log_r \frac{1}{p_i} \le n_i < \log_r \frac{1}{p_i} + 1    (2.25)
If p_i = r^{-k_i}, where k_i is an integer, then choose n_i = k_i; otherwise choose n_i to be the first integer greater than -\log_r p_i. The average codeword length can be found by multiplying the above equation through by p_i and summing over all the source events.
    \sum_{i=1}^{q} p_i \log_r \frac{1}{p_i} \le \sum_{i=1}^{q} n_i p_i < \sum_{i=1}^{q} p_i \log_r \frac{1}{p_i} + \sum_{i=1}^{q} p_i    (2.26)

    \frac{H(E)}{\log_2 r} \le \bar{n} < \frac{H(E)}{\log_2 r} + 1    (2.27)
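Rule (2.25) and the bound (2.27) can be checked directly. A sketch in Python for the binary case (function name ours; the probabilities are an arbitrary example, not from the text):

```python
from math import ceil, log2

def shannon_lengths(probs):
    """Binary codeword lengths chosen by rule (2.25):
    n_i = -log2(p_i) when that is an integer, else the next integer up."""
    # the small tolerance guards against float noise when p is a power of 1/2
    return [ceil(-log2(p) - 1e-12) for p in probs]

probs = [0.5, 0.25, 0.15, 0.1]
lengths = shannon_lengths(probs)                  # [1, 2, 3, 4]
H = sum(p * log2(1 / p) for p in probs)           # entropy in bits
nbar = sum(p * n for p, n in zip(probs, lengths))
# the lengths satisfy Kraft, and H <= nbar < H + 1 as (2.27) promises
```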
Example 7 (An image coding problem). A 1024 × 1024 image has eight grey levels. A histogram analysis has found that the levels have the probabilities listed in the table below. Also shown are two instantaneous binary codes that could be used to represent the image. The word lengths of Code B satisfy (2.25), while those of Code A were constructed by the Huffman coding procedure to be discussed in Lecture 3. It is readily verified that both codes satisfy the Kraft inequality and are instantaneous.
p_i    -log_2 p_i   Code A   Code B   n_Ai   n_Bi
0.25   2.00         00       00       2      2
0.2    2.32         10       100      2      3
0.17   2.56         010      010      3      3
0.15   2.74         110      110      3      3
0.1    3.32         111      1110     3      4
0.08   3.64         0111     0111     4      4
0.03   5.06         01100    011000   5      6
0.02   5.64         01101    011010   5      6
The average numbers of digits per codeword are

    \bar{n}_A = \sum_{i=1}^{8} p_i n_{Ai} = 2.73

    \bar{n}_B = \sum_{i=1}^{8} p_i n_{Bi} = 3.08
Clearly, Code B is no prize, since it requires more digits per pixel than would be required by using a straight three-digit binary code. However, Code A uses an average of less than three digits per pixel. The entropy of the image, assuming the pixels are statistically independent, is H = 2.7 bits per pixel. Hence, Code A is actually quite efficient.
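The numbers in Example 7 are easy to verify. A sketch in Python (variable names ours):

```python
from math import log2

p  = [0.25, 0.2, 0.17, 0.15, 0.1, 0.08, 0.03, 0.02]
nA = [2, 2, 3, 3, 3, 4, 5, 5]     # Code A (Huffman) word lengths
nB = [2, 3, 3, 3, 4, 4, 6, 6]     # Code B (rule 2.25) word lengths

avg = lambda ns: sum(pi * ni for pi, ni in zip(p, ns))
H = sum(pi * log2(1 / pi) for pi in p)

# avg(nA) = 2.73, avg(nB) = 3.08, and H is about 2.70 bits per pixel
```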
We will find that the simple Shannon coding technique of (2.25) can actually be quite efficient when groupings of pixels rather than single pixels are used. It is the theoretical basis of computations of asymptotic efficiency of coding methods.
2.2 A Measure of Information

In this lecture we will examine the relationships between events and observations. Suppose that there is a space X of events and a space Y of observations. How much information is provided by a particular observation y ∈ Y about each possible event x ∈ X? This question can be applied to any situation in which there is a need to relate observations to underlying causes. Its roots were in the field of communication, where the space X is the set of symbols or messages and the space Y is the space of signals, but it has far wider application. In general, X and Y are collections of random events connected by some process, as shown in Figure 2.1.
[Figure: Event → Process → Observation]

Figure 2.1: Events and observations may be related by any kind of process. The effect of the process is modeled by the conditional probability function p(y|x).
It is presumed that the joint probability distribution p(x, y) is available for any point (x, y) in the space X × Y. All of our results will ultimately be related to this distribution. The theory will apply if and only if this function can be constructed when modeling a real problem.
The rules of probability can be used to relate the joint probability to the marginal and conditional probabilities.

    p(x) = \sum_{y \in Y} p(x, y)        p(y|x) = \frac{p(x, y)}{p(x)}

    p(y) = \sum_{x \in X} p(x, y)        p(x|y) = \frac{p(x, y)}{p(y)}

[Figure: a binary symmetric channel; inputs x_1, x_2, each with p(x) = 0.5, are connected to outputs y_1, y_2, each with p(y) = 0.5; the direct transitions have p(y|x) = 0.9 and the crossover transitions have p(y|x) = 0.1.]

Figure 2.2: Input/output relationships for a binary symmetric channel with error probability 0.1.
It is common that the conditional relationship p(y|x) is available for a process. This is the normal situation when modeling an observation system, communication channel, recording system, and the like. If one then models the distribution of input events by choosing p(x), it is possible to calculate the joint probability distribution and all of the other probabilities above.
We are interested in the amount of information about the input that is provided by the observed output. Before the observation, a particular input had probability p(x), and after the observation it has probability p(x|y). Any observation of the output changes the probability distribution over the input. Some possible inputs will be made more likely and some less likely. In the extreme, one event may be made certain and the others made impossible. We would like to have a reasonable way to describe the amount of information that is provided about the input by any particular observation.
An example of the common digital communication channel called the binary symmetric channel (BSC) is shown in Figure 2.2. This model is used for many digital communication and recording systems. Both the input and output spaces contain two events. The probability of error is equal to the crossover probability. For this example, it is assumed that the input events are equally likely, which, because of the channel symmetry, causes the output symbols to also be equally likely.

An observer, located at the output side, assumes a uniform probability distribution p(x_1) = p(x_2) = 0.5 before any observation is made. After an observation is made, say y_1, the probabilities change to p(x_1|y_1) = 0.9 and p(x_2|y_1) = 0.1. The "evidence," although not completely certain, points to x_1 as the cause of the observation y_1. How much "information" has the observer received about the input?
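The posterior probabilities quoted here follow from Bayes' rule. A minimal sketch in Python (names ours):

```python
from math import log2

def posterior(p_x, p_y_given_x, y):
    """p(x|y) for each input event x, given the observed output y."""
    joint = {x: p_x[x] * p_y_given_x[x][y] for x in p_x}   # p(x, y)
    p_y = sum(joint.values())                              # marginal p(y)
    return {x: pxy / p_y for x, pxy in joint.items()}

p_x = {"x1": 0.5, "x2": 0.5}
p_y_given_x = {"x1": {"y1": 0.9, "y2": 0.1},
               "x2": {"y1": 0.1, "y2": 0.9}}

post = posterior(p_x, p_y_given_x, "y1")       # x1 -> 0.9, x2 -> 0.1
info_x1 = log2(post["x1"] / p_x["x1"])         # about 0.848 bits, per (2.28)
```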
The measure of information is based on the following definition:
Definition 8. The amount of information provided by the occurrence of the event y ∈ Y about the occurrence of the event x ∈ X is defined as

    I(x; y) = \log \frac{p(x|y)}{p(x)}    (2.28)
The base of the logarithm determines the units of information. When the base is 2, the unit is the bit.[2] This definition provides a natural, intuitive interpretation of "information" that will be developed as we investigate its properties.
The information measure has the property that if the observation y increases the probability of x then it is positive. That is, if p(x|y) > p(x) then I(x; y) > 0. If we expand (2.28) we have

    I(x; y) = -\log p(x) + \log p(x|y)    (2.29)
This expression can be given an intuitive meaning by defining the uncertainty of an event.
Definition 9. The uncertainty of an event x is -\log p(x).
We see that the uncertainty is zero if p(x) = 1 and increases as the probability decreases.
For the BSC of Figure 2.2 the uncertainty about both x_1 and x_2 is -\log_2 0.5 = 1 bit before the observation. This uncertainty can be removed by telling the observer which of two equally likely events has occurred. After observing y_1 the probability of x_1 has changed to p(x_1|y_1) = 0.9, and its uncertainty now is -\log_2 0.9 = 0.152 bit. Its uncertainty has decreased because its probability has increased from its original value. The information received about the event x_1 is 1 − 0.152 = 0.848 bit. At the same time, the probability of x_2 has changed to p(x_2|y_1) = 0.1. Its uncertainty now is -\log_2 0.1 = 3.322 bits. The information received about the event x_2 is 1 − 3.322 = −2.322 bits. If it later turns out that x_1 actually occurred then the information received gave a correct indication and is positive. However, if an error occurred then it is negative.

[2] The units "nat" and "Hartley" are often used with the bases e and 10, respectively. The decimal unit is in honor of communication pioneer R. V. L. Hartley.
The information gained by the observation can then be expressed as

    I(x; y) = initial uncertainty − final uncertainty    (2.30)

The amount of information about x that is produced by the observation y equals the initial uncertainty about x minus the uncertainty of x conditioned on y.

The term "uncertainty" can be related to the feeling of surprise. There is little surprise in the occurrence of an event with probability close to one, but there is great surprise at the occurrence of a rare event.
2.2.1 Self Information

Suppose that an observation y removes all of the uncertainty about x. That is, p(x|y) = 1. This would be the case for an ideal error-free channel. Then, since \log p(x|y) = 0, the amount of information provided must equal the initial uncertainty about x. We call this the self information.
Definition 10. The self information associated with an event x ∈ X is

    I(x) = -\log p(x)    (2.31)
It is common to write I(p) when one wants to focus on the probability p(x) rather than on the event. A plot of I(p) is shown in Figure 2.3. We see that an event has no uncertainty when p = 1 and infinite uncertainty when p = 0. I(p) is a measure of our "surprise" when an event of probability p occurs. We are not very surprised if p ≈ 1 and very surprised if p ≈ 0.
[Figure: plot of I(p) = -\log_2 p for 0 ≤ p ≤ 1; the curve falls toward zero as p approaches 1 and grows without bound as p approaches 0.]

Figure 2.3: Uncertainty of an event as a function of its probability.

We have applied the model of partial information gain to transmission through a binary communication channel. It can also be applied to code applications. We have seen that a set of events can be represented by codewords. When a codeword is transmitted symbol-by-symbol, each symbol adds to the quantity of information carried by the word. As each symbol is received at the other end, it adds information about the event that was transmitted. Each symbol of the received codeword will make some events less likely and others more likely until the final digit makes one event certain and the others impossible. We will explore this when we examine decoding trees below.
2.2.2 Entropy as Average Self Information

Suppose we want to know the average amount of information that is provided by the source. This is the amount of information per input event that will have to be carried by the channel or stored in a recording system. We can find the answer by averaging over the self information of each x ∈ X.
The self information I(x) = -\log p(x) is a random variable whose value is determined by the event x. Being a random variable, it is possible to compute its average value.
    E[I] = \sum_{x \in X} p(x) I(x) = -\sum_{x \in X} p(x) \log p(x) = H(X)    (2.32)
This is recognized as the quantity we named entropy in Lecture 2. Therefore, entropy can be interpreted as the average information of the underlying event space. When the logarithm of base 2 is used, the units are bits per event. We found that the average codeword length \bar{n} cannot be less than H digits. Thus, each digit of a binary code cannot, on the average, carry more than one bit of information.
In Lecture 2 we showed that the entropy for an event space of size q is bounded by

    0 \le H(E) \le \log_2 q    (2.33)

with H(E) = 0 if some p(e_i) = 1 and H(E) = \log_2 q if all p(e_i) = 1/q. In the first case only one of the events can occur, so there is no uncertainty; in the second, any event can occur with equal probability, so the uncertainty is maximum.
The BSC provides a useful example. Let the input probabilities be p(x_1) = p and p(x_2) = 1 − p. The entropy associated with a binary event space is a function of the single variable p, and can be plotted as a simple curve:

    H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)    (2.34)
As illustrated in Figure 2.4, this function is symmetric about p = 0.5, is maximum where the events are equally probable, and is zero where either event is certain. This means that if p ≠ 0.5 it is possible to record or communicate the information from this source at a rate of less than one bit per event. This can, in fact, be approximated closely by Huffman encoding of doublets or triplets from the source.
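The binary entropy function (2.34) is a direct transcription into Python; the guard for p = 0 or 1 uses the convention that 0 log 0 = 0:

```python
from math import log2

def H(p: float) -> float:
    """Binary entropy (2.34) in bits per event; 0*log(0) is taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# H(0.5) = 1.0 (the maximum); H(0.1) = H(0.9), about 0.469 (symmetry about 0.5)
```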
The entropy now has a meaning in terms of the average quantity of information needed to record the output of a source. We have shown that the entropy of a source that generates statistically independent events can be calculated by (2.32). There are many sources that do not fit the independent-event model, and more analysis will be needed to construct a method to compute it for such situations. For example, estimation of the entropy of English text is very difficult because one needs to take all of the constraints of spelling and grammar into account somehow. But we will find that however difficult it is to estimate H, it provides a lower bound on the amount of data needed to represent events. It is therefore a fundamental quantity in the design of efficient communication or recording systems.
[Figure: plot of H(p) for 0 ≤ p ≤ 1; the curve rises from 0 at p = 0 to a maximum of 1 at p = 0.5 and falls back to 0 at p = 1.]

Figure 2.4: Entropy H(p) as a function of p for a binary event space.
It should be noted that entropy is associated with long sequences of events, not with individual events. It is an average over the information produced by many events. It is meaningless to speak of the entropy of an individual event.
2.2.3 Decoding Trees and Partial Information

Let us suppose that events are to be represented by codewords whose symbols are drawn from an alphabet of size r. In an efficient code, codeword length is related to event probability, so that short codewords are associated with events of high probability. We will now see that codeword length is related to entropy.
The uncertainty of an event can be related to the length of its codeword. Events with high probability, and therefore low uncertainty, should get short codewords, while those with low probability, and high uncertainty, should get long codewords. An efficient binary code has words of length -\log_2 p_i \le n_i < -\log_2 p_i + 1. Thus, the binary codeword for an event e_i should have about the same number of digits as the uncertainty measured in bits. They cannot be equal unless the uncertainty is an integer, in which case p_i = 2^{-n_i} and \bar{n} = H(E). The code in Table 2.2 is an example.
i   p_i   -log_2 p_i   w_i   n_i
1   1/2   1            0     1
2   1/4   2            10    2
3   1/8   3            110   3
4   1/8   3            111   3

Table 2.2: Example of a 100% efficient code.

i   p_i   -log_2 p_i   w_i   n_i
1   0.4   1.32         0     1
2   0.3   1.74         10    2
3   0.2   2.32         110   3
4   0.1   3.32         111   3

Table 2.3: Example of a code that is less than 100% efficient.

Suppose now that someone begins to show you a codeword a digit at a time. The first digit gives you some information about the event, the second gives you more, and so on. Finally, when you have seen all the digits in the codeword, you know which event has occurred. Each digit gives you information and reduces your uncertainty. That is why we use the symbol I(e_i): it represents the information required to remove the uncertainty. We equate information to reduction in uncertainty and measure it in bits.
Because \bar{n} ≥ H(E), each digit can convey, on the average, at most one bit of information. We often talk about the symbols '0' and '1' as "bits" when they are really digits. A bit is a unit of information. A digit carries x bits of information, where x depends upon the situation.
A decoding tree for the code in Table 2.2 is shown in Figure 2.5. Each branch shows the digit related to it as well as a pair (p, I(p)). Suppose that you are given a codeword digit by digit. Each digit chooses between an upper or lower branch. The first symbol has probability p = 0.5 of being 1 or 0, so the first digit carries \log_2 2 = 1 bit of information. If the first symbol is '1' you go from node a to node b. Again, either path is equally likely, and so on. At each step the new information removes a full bit of uncertainty. If I(p) is summed along the path it equals the total uncertainty of the event, which, in this case, is an integer equal to the codeword length.
Suppose now that the probabilities are slightly different, so that -\log p_i is not an integer. This situation is shown in Table 2.3. The code tree is shown in Figure 2.6. The probability p of taking a given branch from its node is now not always 0.5, so the uncertainty is not always 1 bit. The uncertainty of each branch is therefore not an integer. The sum of the uncertainties along each path is equal to the uncertainty of the event, and this is close to the codeword length. The average codeword length is now a number slightly greater than the entropy. If one multiplies the probabilities or adds the uncertainties along a path leading to an event, the result is the probability or the uncertainty of the event, shown in columns 2 and 3 of Table 2.3.

[Figure] Figure 2.5: Decoding tree for the code of Table 2.2.

[Figure] Figure 2.6: Decoding tree for the code of Table 2.3.
We now see that there is a close association between information and uncertainty. Each symbol in a codeword carries information that reduces the uncertainty about the event. Low-probability events require more digits in an efficient code because they have more uncertainty.
Each digit of a codeword reduces the uncertainty but, unless it is the last one, does not eliminate it. To make this explicit, let a_{ij} be symbol j of the codeword w_i associated with event e_i. Before any digit has been observed, the uncertainty about e_i is -\log p(e_i). After symbol a_{i1} is observed the probability of e_i has changed to p(e_i|a_{i1}), which is the probability conditioned on the observed symbol. The uncertainty must now be -\log p(e_i|a_{i1}). The difference in the uncertainty is the information provided by a_{i1}.

The information about e_i provided by a_{i1} is

    I(e_i; a_{i1}) = \log\left(\frac{p(e_i|a_{i1})}{p(e_i)}\right)
Suppose that the event space of Table 2.3 is used and that the event e_3 has occurred. The message is, therefore, w_3 = (1, 1, 0). The original probability is p(e_3) = 0.2, which corresponds to 2.32 bits of uncertainty. After the first digit is received the probability is p(e_3|1) = 1/3 and the uncertainty has changed to -\log_2(1/3) = 1.58 bits. The information received was 2.32 − 1.58 = 0.74 bits. Note that this is the quantity on that branch of Figure 2.6. When the second digit is received the probability is p(e_3|11) = 2/3 and the uncertainty is -\log_2(2/3) = 0.58 bits. The information received was 1.58 − 0.58 = 1 bit, which is the quantity on that branch of the tree. The final digit increases the probability to p(e_3|110) = 1 so that the uncertainty is removed. This digit carried the remaining 0.58 bits of information.
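The calculation in this paragraph can be automated for any event. A sketch in Python, using the code of Table 2.3 with words {0, 10, 110, 111} (function and variable names are ours):

```python
from math import log2

p = {"e1": 0.4, "e2": 0.3, "e3": 0.2, "e4": 0.1}
w = {"e1": "0", "e2": "10", "e3": "110", "e4": "111"}

def information_per_digit(event):
    """Bits of information carried by each successive digit of the
    codeword for `event`, from the conditional probabilities p(e | prefix)."""
    word, gains = w[event], []
    prev_unc = -log2(p[event])                     # initial uncertainty
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        # total probability of events still possible after this prefix
        mass = sum(p[e] for e in w if w[e].startswith(prefix))
        unc = -log2(p[event] / mass)               # remaining uncertainty
        gains.append(prev_unc - unc)
        prev_unc = unc
    return gains

information_per_digit("e3")   # about [0.74, 1.00, 0.58] bits, summing to 2.32
```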
This example shows that the amount of information carried by a digit depends upon the event and the preceding digits. Even if the events are independent, it is not necessarily true that the digits in a code sequence are statistically independent. However, we can say that in a 100% efficient code each digit must carry a full bit of information and must be statistically independent of all other digits. To see this latter point, carry out an analysis of the code of Figure 2.5 in a manner that parallels the preceding paragraph.