An Introduction to Information Theory and Entropy - UCLA
Ivo D. Dinov, UCLA Statistics, http://www.stat.ucla.edu/~dinov (Courses & Students)
(Based on references at the end of the manuscript.)

1. Measuring complexity

Researchers in the field of complexity face a classic problem: How can we tell that the system we are looking at is actually a complex system? Should we even be studying such a system? Of course, in practice, we will study the systems that interest us, for whatever reasons, so the problem identified above tends not to be a real problem. On the other hand, having chosen a system to study, we might well ask: How complex is this system? In this more general context, we probably want at least to be able to compare two systems, and be able to say that system A is more complex than system B. Eventually, we would probably like to have some sort of numerical rating scale. We cannot expect to come up with a single universal measure of complexity. The best we are likely to have is a measuring system useful to a particular observer, in a particular context, for a particular purpose. Our focus here will be on measures related to how surprising or unexpected an observation, or event, is. This approach has been described as information theory.

2. Some probability background

There are two main notions of the probability of an event happening. These are:

A frequentist version of probability: In this version, we assume we have a set of possible events, each of which we assume occurs some number of times. Thus, if there are N distinct possible events (x_1, x_2, ..., x_N), no two of which can occur simultaneously, and the events occur with frequencies (n_1, n_2, ..., n_N), we say that the probability of event x_i is given by

    P(x_i) = n_i / Σ_{j=1}^{N} n_j.

This definition has the nice property that Σ_{i=1}^{N} P(x_i) = 1.

An observer-relative (Bayesian) version of probability: In this version, we take a statement of probability to be an assertion about the belief that a specific observer has of the occurrence of a specific event. Note that in this version of probability, it is possible that two different observers may assign different probabilities to the same event. Furthermore, the probability of an event is likely to change as we learn more about the event, or the context of the event.

In some cases, we may be able to find a reasonable correspondence between these two views of probability. In particular, we may sometimes be able to understand the observer-relative version of the probability of an event to be an approximation to the frequentist version, and to view new knowledge as providing us a better estimate of the relative frequencies.

Some probability basics, where A and B are events:
- P(~A) = P(A^c) = 1 - P(A).
- P(A ∪ B) = P(A) + P(B) - P(A ∩ B). We will often denote P(A ∩ B) by P(A, B).
- If P(A, B) = 0, we say A and B are mutually exclusive events.
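A quick numerical illustration of the frequentist definition and the identities above, as a Python sketch (the die, the event sets, and the sample size are illustrative choices, not from the notes):

```python
from collections import Counter
import random

random.seed(0)

# Estimate frequentist probabilities P(x_i) = n_i / (n_1 + ... + n_N) from observed counts
# of simulated die rolls.
draws = [random.randint(1, 6) for _ in range(10_000)]
counts = Counter(draws)
total = sum(counts.values())
prob = {x: n / total for x, n in counts.items()}
print(abs(sum(prob.values()) - 1.0) < 1e-12)              # the P(x_i) sum to 1

# Check P(A U B) = P(A) + P(B) - P(A, B) on this sample space.
P = lambda event: sum(prob.get(x, 0.0) for x in event)
A, B = {1, 2, 3}, {3, 4}
print(abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12)   # True

# Mutually exclusive events: P(A, C) = 0 when A and C share no outcomes.
C = {5, 6}
print(P(A & C))                                           # 0.0
```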

Conditional probability: P(A | B) is the probability of A, given that we know B occurred. The joint probability of both A and B is given by: P(A, B) = P(A | B) P(B). Since P(A, B) = P(B, A), we have Bayes' Theorem:

    P(A | B) × P(B) = P(B | A) × P(A),   so   P(A | B) = P(B | A) × P(A) / P(B).

If two events A and B are such that P(A | B) = P(A), we say that the events A and B are independent. From Bayes' Theorem, we will also have that P(B | A) = P(B), and furthermore, P(A, B) = P(A | B) P(B) = P(A) P(B). This last equation is often taken as the definition of independence.

We have in essence begun here the development of a mathematical methodology for drawing inferences about the world from uncertain knowledge. We could say that our observation of the coin showing heads gives us information about the world. We now develop a formal mathematical definition of the information content of an event, which occurs with a certain probability.

3. Axiomatic Development of Information Theory

We now want to develop a usable measure of the information we get from observing the occurrence of an event having probability p. Our first reduction is to ignore any particular features of the event, and only observe whether or not it happened. In essence this means that we can think of the event as the observance of a symbol whose probability of occurring is p. We will thus be defining the information in terms of the probability p. The following represent a set of reasonable axioms for an information measure I(p):

1. Information is a non-negative quantity: I(p) ≥ 0.
2. If an event has probability 1, we get no information from the occurrence of the event: I(1) = 0.
3. If two independent events occur (whose joint probability is the product of their individual probabilities), then the information we get from observing the events is the sum of the two informations: I(p1*p2) = I(p1) + I(p2).
4. We will want our information measure to be a continuous (and, in fact, monotonic) function of the probability (slight changes in probability should result in slight changes in information).

Corollaries of these axioms include:

1. I(p^2) = I(p*p) = I(p) + I(p) = 2*I(p).
2. Thus, I(p^n) = n*I(p), by induction.
3. I(p) = I((p^(1/m))^m) = m*I(p^(1/m)), so I(p^(1/m)) = (1/m)*I(p), and thus, in general, I(p^(n/m)) = (n/m)*I(p).
4. And thus, by continuity, we get, for 0 < p ≤ 1 and 0 < a: I(p^a) = a*I(p).
5. From these, we can derive the nice form of the information measure:

    I(p) = -log_b(p) = log_b(1/p)

for some log-base b. The base b determines the units we are using. Of course, we can change the units by changing the base, using the formula, for b1, b2, x > 0: log_b1(x) = log_b2(x) / log_b2(b1). Thus, using different bases for the logarithm results in information measures which are just constant multiples of each other, corresponding with measurements in different units:

1. log_2 units are bits (from binary).
2. log_3 units are trits (from trinary).
3. log_e units are nats (from natural logarithm). (We commonly use ln(x) = log_e(x).)
4. log_10 units are Hartleys, after R.V.L. Hartley, 1942.
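A small Python sketch of the information measure and its unit conversions (the function name is ours, purely illustrative): it checks the additivity axiom I(p1*p2) = I(p1) + I(p2) and expresses the same quantity in bits, nats, and Hartleys.

```python
import math

def information(p: float, base: float = 2.0) -> float:
    """I(p) = -log_b(p) = log_b(1/p); base 2 gives bits, e gives nats, 10 gives Hartleys."""
    return -math.log(p, base)

p1, p2 = 0.25, 0.5
# Additivity for independent events: I(p1*p2) = I(p1) + I(p2).
assert abs(information(p1 * p2) - (information(p1) + information(p2))) < 1e-12

# The same information in different units differs only by a constant factor.
p = 1 / 64
bits = information(p, 2)            # 6 bits
nats = information(p, math.e)       # 6 * ln(2)    ≈ 4.159 nats
hartleys = information(p, 10)       # 6 * log10(2) ≈ 1.806 Hartleys
print(bits, nats, hartleys)
print(abs(nats - bits * math.log(2)) < 1e-12)   # change of base is a constant multiple
```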

Unless we want to emphasize the units, we need not bother to specify the base for the logarithm, and simply write log(p). Typically, we think in terms of log_2(p).

Example: Suppose we flip a fair coin once. The outcomes are events H and T, each with probability 1/2, and thus a single flip of a coin gives us -log_2(1/2) = 1 bit of information (whether the outcome is an H or T). Flipping a fair coin n times (or, equivalently, independently flipping n fair coins) gives us -log_2((1/2)^n) = log_2(2^n) = n*log_2(2) = n bits of information. We could randomly generate (see http://socr.stat.ucla.edu/) a sequence of 25 flips as, for example:

    {HTHHTTHTHHHTHTTTHTHHHTHTT}

or, using 1 for H and 0 for T, the 25 bits

    {1011001011101000101110100}.

We thus get the nice fact that n flips of a fair coin gives us n bits of information, and takes n binary digits to specify. That these two quantities are the same reassures us that we have done a reasonable axiomatic definition of an information measure.

4. Some Entropy Theory

Suppose now that we have n symbols {a_1, a_2, ..., a_n}, and some source is providing us with a stream of these symbols. Suppose further that the source emits the symbols with probabilities {p_1, p_2, ..., p_n}, respectively. For now, we also assume that the symbols are emitted independently (successive symbols do not depend in any way on past symbols). What is the average amount of information we get from each symbol we see in the stream?

What we really want here is a weighted average. If we observe the symbol a_i, we will be getting log(1/p_i) information from that particular observation. In a long run of (say N) observations, we will see (approximately) N*p_i occurrences of the symbol a_i (in the frequentist sense, that's what it means to say that the probability of seeing a_i is p_i). Thus, in the N (independent) observations, we will get total information I of

    I = Σ_{i=1}^{n} (N × p_i) × log(1/p_i).

And therefore, the average information we get per symbol observed will be

    I/N = (1/N) Σ_{i=1}^{n} (N × p_i) × log(1/p_i) = Σ_{i=1}^{n} p_i × log(1/p_i).

Note that lim_{x→0} x log(1/x) = 0, so we can, for our purposes, define p_i*log(1/p_i) to be 0 when p_i = 0. This brings us to a fundamental definition. This definition is essentially due to Shannon in 1948, in the seminal papers in the field of information theory.

As we have observed, we have defined information strictly in terms of the probabilities of events. Therefore, let us suppose that we have a set of probabilities (a probability distribution P = {p_1, p_2, ..., p_n}).

Definition: We define the (Shannon-Wiener) entropy of the distribution P by:

    H(P) = Σ_{k=1}^{n} p_k × log(1/p_k) = -Σ_{k=1}^{n} p_k × log(p_k).    (1)

There is an obvious generalization of the entropy for a continuous, rather than discrete, probability distribution P(x):

    H(P) = -∫ P(x) × log(P(x)) dx.    (2)

Another way to think about this is in terms of expected value. Given a PDF/PMF P(x), we can define the expected value of an associated function F(x) by:

    E(F(X)) = ∫ F(x) × P(x) dx.

With this definition, we have that H(P) = E(I(p)). In other words, the entropy of a probability distribution is just the expected value of the information measure of that distribution.
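A minimal Python sketch of definition (1) (the helper name is ours): it computes H(P) as the average information per symbol, with the 0 × log(1/0) = 0 convention.

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """Shannon-Wiener entropy H(P) = sum_k p_k * log(1/p_k), with 0*log(1/0) := 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0.0)

fair_coin   = [0.5, 0.5]
biased_coin = [0.9, 0.1]
degenerate  = [1.0, 0.0]

print(entropy(fair_coin))     # 1.0 bit   (maximum for two outcomes)
print(entropy(biased_coin))   # ~0.469 bits (less surprise on average)
print(entropy(degenerate))    # 0.0 bits  (a certain event carries no information)
```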

We'll discuss the following few important points:

1. What properties does the function H(P) have? For example, does it have extrema, and if so, where?
2. Is entropy a reasonable name for this? In particular, the name entropy is already in use in physics/thermodynamics. How are these uses of the term related to each other?
3. What can we do with this new tool?

5. The Gibbs inequality

First, note that the (natural-log) function ln(x) has derivative 1/x. From this, we find that the tangent to ln(x) at x = 1 is the line y = x - 1. Further, since ln(x) is concave down, we have, for x > 0, that ln(x) ≤ x - 1, with equality only when x = 1.

Now, given two probability distributions, P = {p_1, p_2, ..., p_n} and Q = {q_1, q_2, ..., q_n}, where p_k, q_k ≥ 0 and Σ_{k=1}^{n} p_k = Σ_{k=1}^{n} q_k = 1, we have

    Σ_{k=1}^{n} p_k ln(q_k/p_k) ≤ Σ_{k=1}^{n} p_k (q_k/p_k - 1) = Σ_{k=1}^{n} q_k - Σ_{k=1}^{n} p_k = 1 - 1 = 0,    (3)

with equality only when p_k = q_k for all k. It is easy to see that the inequality actually holds for any log-base, not just base e.

We can now use the Gibbs inequality to find the probability distribution which maximizes the entropy function. Suppose P = {p_1, p_2, ..., p_n} is a probability distribution. Taking q_k = 1/n in (3), we have

    0 ≤ -Σ_{k=1}^{n} p_k log(1/(n p_k)) = Σ_{k=1}^{n} p_k log(n) - Σ_{k=1}^{n} p_k log(1/p_k) = log(n) - H(P),    (4)

with equality only when p_k = 1/n for all k. The last step is the application of the Gibbs inequality (3). What this means is that:

    0 ≤ H(P) ≤ log(n).    (5)

In particular, if for some k_0, p_{k_0} = 1 and p_k = 0 for all k ≠ k_0, we have H(P) = 0. On the other end of the spectrum, the entropy H(P) = log(n) (the maximum possible entropy) only when all of the events (outcomes) have the same probability, p_k = 1/n. That is, the maximum of the entropy function is log(the number of possible events/outcomes), and it occurs when all the events are equally likely. This illustrates the entropy as a measure of uncertainty: high entropy means lots of uncertainty, and low entropy yields high certainty about the outcome of the process/experiment.
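A short numerical check of the Gibbs inequality (3) and of the bound (5), as a Python sketch (the random test distributions are illustrative):

```python
import math, random

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def gibbs_lhs(P, Q):
    """sum_k p_k * log(q_k / p_k); by the Gibbs inequality this is <= 0."""
    return sum(p * math.log2(q / p) for p, q in zip(P, Q) if p > 0)

random.seed(1)
n = 8
# Two random probability distributions on n outcomes.
P = [random.random() for _ in range(n)]; P = [p / sum(P) for p in P]
Q = [random.random() for _ in range(n)]; Q = [q / sum(Q) for q in Q]

print(gibbs_lhs(P, Q) <= 1e-12)                          # True: Gibbs inequality
print(abs(gibbs_lhs(P, P)) < 1e-12)                      # equality when P = Q
print(0 <= entropy(P) <= math.log2(n))                   # True: 0 <= H(P) <= log2(n)
print(abs(entropy([1 / n] * n) - math.log2(n)) < 1e-12)  # the uniform distribution attains the max
```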

Example: How much information is obtained by a single neuropsychiatric (NP) test? First, the maximum information occurs if all outcomes/scores/results of the NP test have equal probability of being observed (e.g., in an AD vs. NoAD test, on average half the subjects should end up having AD and the other half should not have dementia), if we want to maximize the information given by the NP test. Here are several common situations; to indicate the maximum information obtainable from a single NP test result, we use equation (1) with p_k = 1/n:

Experiment/Process Type -> Max Information [plug into equation (1), p_k = 1/n]
- Binary test (AD vs. NoAD): 1 bit = log_2(2)
- Five-level test results [Extreme (E), Severe (S), Moderate (M), MCI, Normal (N)]: 2.3 bits = log_2(5)
- Twelve-level test results (E, E-, S+, S, S-, ..., MCI-, N): 3.6 bits = log_2(12)

Thus, using +/-'s gives the patients/doctors about 1.3 more bits of information, per test level, than without using +/-'s, and about 2.6 bits per grade more than binary (AD vs. NoAD) type test results. This is naturally expected, as we actually have more information available in addition to the presence or absence of AD NP symptoms.

Example: The genetic code provides us with sequences constructed from 4 symbols (A, C, G, T). The maximum average information per symbol is log_2(4) = 2 bits. If the source provides codons (blocks of 3 of these symbols), then the maximum average information is 6 bits per block, as I(p^k) = k×I(p), with p = 1/4. If we used different units, e.g., the natural logarithm, the maximum entropy would be 6 ln(2) ≈ 4.159 nats per block.
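The maximum-information figures above follow directly from the bound (5) with equally likely outcomes; a quick Python check (values rounded as in the text):

```python
import math

# Maximum information (entropy) for an n-outcome test with equally likely outcomes: log2(n).
for name, n in [("Binary (AD vs. NoAD)", 2), ("Five-level", 5), ("Twelve-level", 12)]:
    print(f"{name}: {math.log2(n):.2f} bits")

# Genetic-code example: 4 equally likely nucleotides -> 2 bits/symbol, codons of 3 -> 6 bits/block,
# which is 6*ln(2) ≈ 4.159 nats per block in natural-log units.
print(3 * math.log2(4), "bits per codon block")
print(3 * math.log2(4) * math.log(2), "nats per codon block")
```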

Remarks:

1. First, these definitions of information and entropy may not match some other uses of the terms. For example, if we know that a source will, with equal probability, transmit either the complete text of Hamlet or the complete text of Macbeth (and nothing else), then receiving the complete text of Hamlet provides us with precisely 1 bit of information. Suppose a book contains ASCII characters. If the book is to provide us with information at the maximum rate, then each ASCII character will occur with equal probability; it will be a random sequence of characters.

2. Second, it is important to recognize that our definitions of information and entropy depend only on the probability distribution. In general, it won't make sense for us to talk about the information or the entropy of a source without specifying its probability distribution.

3. Beyond that, it can certainly happen that two different observers of the same data stream have different models of the source, and thus associate different probability distributions with the source. The two observers will then assign different values to the information and entropy associated with the source. This observation accords with our intuition: two people listening to the same tune can get very different information from the music. For example, without appropriate music background, one person may get excited, another one may get bored, yet another one may fall asleep. The first listener, who enjoys music, may assign nonequal probabilities to each sound/chord/epoch, as he/she may anticipate much of where the tune goes. On the contrary, the musical composition may sound like a (random) unsynchronized collection of chords (e.g., abstract jazz) to another listener, and hence the amount of information comprehended by this listener will be significantly higher, as the probabilities that he/she assigns to each note/chord are roughly equal.

A Physical Example (Gas Particles): Let us consider a simple model for an idealized gas. Suppose a cubical volume V contains gas made up of N point particles. Assume also that, through some mechanism, we can determine the location of each particle sufficiently well as to be able to locate it within a box with sides 1/100 of the sides of the containing volume V. There are 10^6 of these small boxes within V.

A frequentist probability model for this system is obtained by measuring the number of particles in each of the 10^6 small boxes at one fixed time, and assigning a probability p_k of finding a gas particle in the k-th small box by counting the number of particles n_k in the box, and dividing by N. That is, p_k = n_k/N. From this probability distribution model, we can calculate the entropy:

    H(P) = Σ_{k=1}^{10^6} p_k log(1/p_k) = Σ_{k=1}^{10^6} (n_k/N) log(N/n_k).

There are a couple of special cases to consider (representing the extrema of the values of the entropy).

1. If the particles are evenly distributed among the 10^6 boxes, then we will have that each n_k = N/10^6, and in this case the entropy will be:

    H(P) = Σ_{k=1}^{10^6} (1/10^6) log(10^6) = log(10^6) = 6 log(10).

This case obviously presents a maximum entropy configuration.

2. At the opposite side of the spectrum, we have all the particles sitting in exactly one small box, and the entropy of each of those configurations is:

    H(P) = Σ_k p_k log(1/p_k) = 1 × log(1) = 0,   as p_{k_0} = 1 and p_k = 0 for k ≠ k_0.

This case obviously presents a minimum entropy configuration.

Notice that these two calculated entropies of the system depend in a strong way on the relative scale of measurement. For example, if the particles are evenly distributed, and we increase our accuracy of measurement by a factor of 10 (i.e., if each small box is 1/1000 of the side of V), then the calculated maximum entropy, in the first case, will be log(10^9) instead of log(10^6). In addition, for physical systems, we know that quantum limits (e.g., Heisenberg uncertainty relations) will give us a bound on the accuracy of our measurements, and thus a more or less natural scale for doing entropy calculations. On the other hand, for macroscopic systems, we are likely to find that we can only make relative rather than absolute entropy calculations.

Third, suppose we generalize our model slightly, and allow the particles to move about within V. A configuration of the system is then simply a list of 10^6 numbers k_b, with 0 ≤ k_b ≤ N (i.e., a list of the numbers of particles in each of the small boxes). Suppose that the motions of the particles are such that, for each particle, there is an equal probability that it will move into any given new small box during one (macroscopic) time step. How likely is it that at some later time we will find the system in a high entropy configuration? How likely is it that, if we start the system in a low entropy configuration, it will stay in a low entropy configuration for an appreciable length of time? If the system is not currently in a maximum entropy configuration, how likely is it that the entropy will increase in succeeding time steps (rather than stay the same or decrease)?

Recall the binomial coefficients (the number of arrangements of n objects taken k at a time, where the order does not matter; combinations):

    C(n, k) = n! / (k! (n - k)!),

and Stirling's approximation of n!: n! ≈ √(2πn) (n/e)^n, for large n.

There are 10^6 configurations with all the particles sitting in exactly one small box, and as we showed above, the entropy of each of those configurations is H(P) = 0. These are obviously minimum entropy configurations.
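A small Python sketch of the box-counting entropy for this gas model (the particle count is illustrative; the 10^6- and 10^9-box grids follow the text): it compares the even arrangement with the all-in-one-box arrangement, and shows the scale dependence noted above.

```python
import math
from collections import Counter

def box_entropy(counts, N):
    """H(P) = sum_k (n_k/N) * log2(N/n_k), summed over occupied boxes."""
    return sum((n / N) * math.log2(N / n) for n in counts.values() if n > 0)

n_boxes = 100**3            # 1/100-side boxes -> 10^6 small boxes in V
N = 4 * n_boxes             # illustrative particle count (4 particles per box on average)

even = Counter({k: N // n_boxes for k in range(n_boxes)})   # particles spread evenly
one_box = Counter({0: N})                                   # everything in a single box

print(box_entropy(even, N), "= log2(10^6) =", math.log2(n_boxes))   # maximum entropy
print(box_entropy(one_box, N))                                      # 0.0, minimum entropy

# Scale dependence: with 1/1000-side boxes there are 10^9 of them, and the maximum
# becomes log2(10^9) instead of log2(10^6).
print(math.log2(1000**3))
```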

If we now consider pairs of small boxes, the number of configurations with all the particles evenly distributed between two boxes is

    C(10^6, 2) ≈ 5 × 10^11,

which is large. The entropy of each of these configurations is

    H(P | all evenly in 2 boxes) = -(1/2) log(1/2) - (1/2) log(1/2) = log(2).

The total number of system configurations, in terms of the number of particles within a small box, is at least 5×10^11 + 10^6. If we start the system in a configuration with entropy 0, then the probability that at some later time it will be in a configuration with entropy H(P) ≥ log(2) will be larger than

    [5×10^11] / [5×10^11 + 10^6] > 1 - 10^(-5),

since P(Event) = |Event| / |S|, and (a + x)/(a + b + x) > a/(a + b) for all x > 0. Here, a = P(all-in-2-boxes), b = P(all-in-one-box), and x = P(all-in-more-than-2-boxes).

As an example at the other end, consider the number of configurations with the particles distributed almost equally, except that half the boxes are short by one particle, and the rest have one extra particle. The number of such configurations is:

    C(10^6, 10^6/2) ≈ 10^(3×10^5).

Each of these configurations has entropy essentially approximately equal to log(10^6). From this, we can conclude that if we start the system in a configuration with entropy of 0 (i.e., all particles in one box), the probability that later it will be in a higher entropy configuration will be larger than 1 - 10^(-3×10^5) ≈ 1.

Similar arguments (with similar results in terms of probabilities) can be made for starting in any configuration with entropy appreciably less than log(10^6) (the maximum). In other words, it is overwhelmingly probable that as time passes, macroscopically, the system will increase in entropy until it reaches the maximum.

In many respects, these general arguments can be thought of as a proof (or at least an explanation) of a version of the second law of thermodynamics: Given any macroscopic system which is free to change configurations, and given any configuration with entropy less than the maximum, there will be overwhelmingly many more accessible configurations with higher entropy than lower entropy, and thus, with probability indistinguishable from 1, the system will (in macroscopic time steps) successively change to configurations with higher entropy until it reaches the maximum.
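The configuration counts above can be checked with logarithms of factorials; a brief Python sketch (the log-gamma route stands in for Stirling's approximation, and the numbers follow the 10^6-box model):

```python
import math

def log10_binomial(n, k):
    """log10 of C(n, k), computed stably via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(10)

boxes = 10**6

# Configurations with all particles split evenly between two boxes: C(10^6, 2) ≈ 5*10^11.
print(math.comb(boxes, 2))                      # 499999500000

# "Almost equal" configurations (half the boxes one particle short, half one over):
# C(10^6, 10^6/2) is astronomically large, about 10^(3*10^5).
print(log10_binomial(boxes, boxes // 2))        # ≈ 301027

# So, starting from an entropy-0 configuration, the chance of later finding a higher-entropy
# configuration exceeds 1 - 10^(-3*10^5), i.e., it is indistinguishable from 1.
```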

7. Shannon's communication theory

In some classic 1948 papers, Claude Shannon laid the foundations for contemporary information, coding, and communication theory. He developed a general model for communication systems, and a set of theoretical tools for analyzing such systems. His basic model consists of three parts: a sender (or source), a channel, and a receiver (or sink). In addition, the model also includes encoding and decoding elements, and noise within the communication channel.

[Figure: Shannon's communication model. Source -> Signal Encoding -> Channel (with Noise) -> Channel Decoding -> Receiver; the source signal enters the channel, and the received signal is decoded at the other end.]

In Shannon's discrete model, it is assumed that the source provides a stream of symbols selected from a finite alphabet A = {a_1, a_2, ..., a_n}, which are then encoded. The code is sent through the channel (and possibly corrupted by noise). At the other end of the channel, the receiver will decode, and derive information from the sequence of symbols.

Given a source of symbols and a channel with noise (in particular, a probability model for these elements), we can talk about the capacity of the channel. The general model Shannon worked with involved two sets of symbols, the input symbols and the output symbols. Let us say the two sets of symbols are A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_m}. Note that we do not necessarily assume the same number of symbols in the two sets. Given the noise in the channel, when symbol b_k comes out of the channel, we cannot be sure which a_l was put in. The channel is characterized by the set of probabilities {P(a_l | b_k)}_{l,k}.

We can then consider various related information and entropy measures. First, we can consider the information we get from observing a symbol b_k. Given a probability model of the source, we have an a priori estimate P(a_l) that symbol a_l will be sent next. Upon observing b_k, we can revise our estimate to P(a_l | b_k). The change in our information (the mutual information) will be given by:

    I(a_l ; b_k) = log(1/P(a_l)) - log(1/P(a_l | b_k)) = log( P(a_l | b_k) / P(a_l) ).    (6)

We have the following properties of this functional:

1. I(a_l; b_k) = I(b_k; a_l)
2. I(a_l; b_k) = log(P(a_l | b_k)) + I(a_l)
3. I(a_l; b_k) ≤ I(a_l)

If a_l and b_k are independent (i.e., P(a_l | b_k) = P(a_l)), then I(a_l; b_k) = 0.

What we often want is to average the mutual information over all the symbols:

    I(a_l ; B) = Σ_{k∈B} P(b_k | a_l) × I(a_l; b_k) = Σ_{k∈B} P(b_k | a_l) × log( P(b_k | a_l) / P(b_k) ),
    I(A ; b_k) = Σ_{l∈A} P(a_l | b_k) × I(a_l; b_k) = Σ_{l∈A} P(a_l | b_k) × log( P(a_l | b_k) / P(a_l) ).    (7)

Therefore

    I(A ; B) = Σ_{l∈A} P(a_l) × I(a_l; B) = Σ_{l∈A} Σ_{k∈B} P(a_l) × P(b_k | a_l) × log( P(a_l | b_k) / P(a_l) )
             = Σ_{l∈A} Σ_{k∈B} P(a_l ; b_k) × log( P(a_l ; b_k) / (P(a_l) × P(b_k)) ).    (8)

We have the properties: I(A; B) ≥ 0, and I(A; B) = 0 if and only if A and B are independent.

Definition: The conditional entropy is defined by

    H(A | B) = Σ_{k∈B} Σ_{l∈A} P(a_l, b_k) × log( 1 / P(a_l | b_k) ).    (9)
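A compact Python sketch of equations (8) and (9) for a small discrete channel (the joint distribution is made up for illustration): it computes H(A), H(B), H(A,B), H(A|B) and I(A;B), and checks the identity I(A;B) = H(A) + H(B) - H(A,B).

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint distribution P(a_l, b_k) for a 2-symbol input A and 2-symbol output B
# (rows: a_1, a_2; columns: b_1, b_2). Values are illustrative only.
joint = [[0.40, 0.10],
         [0.05, 0.45]]

pA = [sum(row) for row in joint]                         # marginal P(a_l)
pB = [sum(col) for col in zip(*joint)]                   # marginal P(b_k)

H_A, H_B = H(pA), H(pB)
H_AB = H(p for row in joint for p in row)                # joint entropy H(A, B)
H_A_given_B = H_AB - H_B                                 # conditional entropy H(A | B)

# Mutual information directly from (8), and via the entropy identity.
I_direct = sum(p * math.log2(p / (pA[i] * pB[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)
I_identity = H_A + H_B - H_AB

print(I_direct, I_identity)                              # equal
print(abs(I_direct - (H_A - H_A_given_B)) < 1e-12)       # I(A;B) = H(A) - H(A|B)
```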

Notice that:

    H(A, B) = Σ_{l∈A} Σ_{k∈B} P(a_l ; b_k) × log( 1 / P(a_l ; b_k) ),
    H(A) = Σ_{l∈A} P(a_l) × log( 1 / P(a_l) ),   H(B) = Σ_{k∈B} P(b_k) × log( 1 / P(b_k) ),    (10)

and

    H(A, B) = H(A) + H(B | A) = H(B) + H(A | B);
    I(A; B) = H(A) + H(B) - H(A, B) = H(A) - H(A | B) = H(B) - H(B | A) ≥ 0.

If we are given a channel, we could ask what is the maximum possible information that can be transmitted through the channel. We could also ask what mix of the symbols {a_l} we should use to achieve the maximum bandwidth. In particular, using the definitions above, we can define the Channel Capacity, C, to be:

    C = max_{P(a)} I(A; B).

We have the nice property that if we are using the channel at its capacity, then for each of the a_l, I(a_l; B) = C, and thus we can maximize channel use by maximizing the use for each symbol independently.

Theorem [Shannon]: For any channel, there exist ways of encoding input symbols such that we can simultaneously utilize the channel as closely as we wish to the capacity, and at the same time have an error rate as close to zero as we wish.

This is actually quite a remarkable theorem. We might naively guess that in order to minimize the error rate, we would have to use more of the channel capacity for error detection/correction, and less for actual transmission of information. Shannon showed that it is possible to keep error rates low and still use the channel for information transmission at (or near) its capacity.

Unfortunately, Shannon's proof has a couple of downsides. The first is that the proof is non-constructive. It doesn't tell us how to construct the coding system to optimize channel use, but only tells us that such a code exists. The second is that in order to use the capacity with a low error rate, we may have to encode very large blocks of data. This means that if we are attempting to use the channel in real time, there may be time lags while we are filling buffers. There is thus still much work possible in the search for efficient coding schemes. Among the things we can do is look at natural coding systems (such as, for example, the DNA coding system, or neural systems) and see how they use the capacity of their channel. It is not unreasonable to assume that evolution will have done a pretty good job of optimizing channel use.
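As a concrete instance of the capacity definition (an illustration of ours, not part of the original notes), the sketch below computes I(A;B) for a binary symmetric channel with crossover probability e as a function of the input mix, and confirms numerically that the maximum, 1 - H(e), is attained at the uniform input:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_mutual_information(pa, e):
    """I(A;B) for a binary symmetric channel: input P(a=1) = pa, crossover probability e."""
    pb1 = pa * (1 - e) + (1 - pa) * e        # P(b = 1)
    return h2(pb1) - h2(e)                   # H(B) - H(B|A)

e = 0.1
best = max((bsc_mutual_information(pa / 1000, e), pa / 1000) for pa in range(1001))
print(best)              # ≈ (0.531, 0.5): the maximum is reached at the uniform input mix
print(1 - h2(e))         # capacity C = 1 - H(0.1) ≈ 0.531 bits per channel use
```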

8. Application of Entropy to modeling DNA sequences

Let us apply some of these ideas to the (general) problem of analyzing genomes. We can start with an example such as the comparatively small genome of Escherichia coli, strain K-12, substrain MG1655, version M52. This example has the convenient features:

1. It has been completely sequenced.
2. The sequence is available for downloading: http://www.genetics.wisc.edu/
3. Annotated versions are available for further work.
4. It is large enough to be interesting (somewhat over 4 mega-bases, or 4 million nucleotides), but not so huge as to be completely unwieldy.
5. This data file begins with:

>gb|U00096|U00096 Escherichia coli K-12 MG1655 complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA

In this exploratory project, our goal will be to apply the information and entropy ideas outlined above to genome analysis. Our first step is to generate a random genome of comparable size to compare things with. We can use SOCR, Excel, SAS, R, C++, Java or other languages/programs to generate a file containing a random sequence of about 4 million letters A, C, G, T. In the actual genome, these letters stand for the nucleotides adenine (A), cytosine (C), guanine (G), and thymine (T). There are other approaches to this process, e.g., randomly shuffling an actual observed genome (thus maintaining the relative proportions of As, Cs, Gs, and Ts).
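A minimal Python sketch of both approaches just mentioned: generating a uniform random genome of a given length, and shuffling an observed sequence so the nucleotide proportions are preserved (the sequence length is illustrative; file handling is omitted).

```python
import random

random.seed(42)

def random_genome(length: int) -> str:
    """Uniform random sequence over the four nucleotides."""
    return "".join(random.choice("ACGT") for _ in range(length))

def shuffled_genome(observed: str) -> str:
    """Permute an observed genome, preserving the relative proportions of A, C, G, T."""
    letters = list(observed)
    random.shuffle(letters)
    return "".join(letters)

observed_chunk = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTC"   # the 34-base chunk used later in the notes
print(random_genome(34))
print(shuffled_genome(observed_chunk))
print(sorted(observed_chunk) == sorted(shuffled_genome(observed_chunk)))  # proportions preserved: True
```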

Part of the justification for this methodology is that actual (identified) coding sections of DNA tend to have a ratio (G + C)/(A + T) ≠ 1. One can hope that important stretches of DNA will have entropy different from other stretches. Of course, as noted above, the entropy measure depends in an essential way on the probability model attributed to the source. We will want to try to build a model that catches important aspects of what we find interesting or significant. We will want to use our knowledge of the systems in which DNA is embedded to guide the development of our models. On the other hand, we probably don't want to constrain the model too much. Remember that information and entropy are measures of unexpectedness. If we constrain our model too much, we won't leave any room for the unexpected! We know, for example, that simple repetitions have low entropy. But if the code being used is redundant (sometimes called degenerate), with multiple encodings for the same symbol (as is the case for DNA codons), what looks to one observer to be a random stream may be recognized by another observer (who knows the code) to be a simple repetition.

The coding sequences for peptides and proteins are encoded via codons, that is, by sequences of blocks of triples of nucleotides. Thus, for example, the codon AGC on mRNA (messenger RNA) codes for the amino acid serine (or, if we happen to be reading in the reverse direction, CGA, it might code for alanine). On DNA, AGC codes for UCG or CGA on the mRNA, and thus could code for cysteine or arginine.

[Figure: The genetic code. Amino acids specified by each codon sequence on mRNA. A = adenine, G = guanine, C = cytosine, T = thymine, U = uracil. Source: http://www.accessexcellence.org/]

Amino acid key for the above figure: Ala = Alanine, Arg = Arginine, Asn = Asparagine, Asp = Aspartic acid, Cys = Cysteine, Gln = Glutamine, Glu = Glutamic acid, Gly = Glycine, His = Histidine, Ile = Isoleucine, Leu = Leucine, Lys = Lysine, Met = Methionine, Phe = Phenylalanine, Pro = Proline, Ser = Serine, Thr = Threonine, Trp = Tryptophane, Tyr = Tyrosine, Val = Valine.

As a first step, consider each of the three-nucleotide codons as a distinct symbol. We can then take a chunk of genome and estimate the probability of occurrence of each codon by simply counting and dividing by the length. At this level, we are assuming we have no knowledge of where codons start, and so in this model, we assume that readout could begin at any nucleotide. We thus use each three adjacent nucleotides. For example, given the DNA chunk (length = 34):

    AGCTTTTCATTCTGACTGCAACGGGCAATATGTC

our codon count yields (in lexicographic order!):

    AAT 1, AAC 1, ACG 1, ACT 1, AGC 1, ATA 1, ATG 1, ATT 1, CAA 2, CAT 1,
    CGG 1, CTG 2, CTT 1, GAC 1, GCA 2, GCT 1, GGC 1, GGG 1, GTC 1, TAT 1,
    TCA 1, TCT 1, TGA 1, TGC 1, TGT 1, TTC 2, TTT 2.

We can then estimate the entropy of this sequence by:

    Σ_{l=1}^{27} p_l × log_2(1/p_l) = 4.7 bits.

The maximum possible entropy for this chunk would be log_2(32) = 5 bits.

Suppose we want to find interesting sections (features) in the genome. As a starting place, we can slide a window over the genome, and estimate the entropy within the window. The plot below shows the entropy estimates for the E. coli genome, within a window of size 3^8 = 6561. The window is slid in steps of size 3^4 = 81. This results in 57,194 snapshots, one for each placement of the window. For comparison, the values for a random genome are also shown. At this level, we can make the simple observation that the actual genome values are quite different from the comparative random string. The values for E. coli range from about 5.8 to about 5.96, while the random values are clustered quite closely above 5.99 (the maximum possible is log_2(4^3) = log_2(64) = 6; recall, I(p) = -log_b(p) = log_b(1/p) and p = 1/64).

[Figure: Comparison of the entropy for the E. coli genome (spiky curve) and a random genome sequence.]
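A Python sketch of the codon-symbol entropy estimate and the sliding-window scan described above (window and step sizes follow the text; the genome file itself and the plotting are omitted):

```python
import math
from collections import Counter

def codon_entropy(seq: str) -> float:
    """Entropy (bits) of the overlapping 3-nucleotide 'codon' symbols in seq."""
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    counts = Counter(triplets)
    n = len(triplets)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

chunk = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTC"
print(round(codon_entropy(chunk), 1), "bits; max =", math.log2(len(chunk) - 2))   # 4.7 bits; max = 5.0

def sliding_window_entropy(genome: str, window: int = 3**8, step: int = 3**4):
    """Entropy estimates for each placement of a sliding window over the genome."""
    return [codon_entropy(genome[i:i + window])
            for i in range(0, len(genome) - window + 1, step)]

# Usage (illustrative): snapshots = sliding_window_entropy(ecoli_sequence)
# For the E. coli genome this yields ~57,000 snapshots, ranging roughly 5.8-5.96 bits,
# versus values just above 5.99 for a comparable random genome (the maximum is log2(64) = 6).
```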

9. Measures of Dimensionality and relation to Entropy

A useful generalization of entropy (as a measure of complexity) was developed by the Hungarian mathematician A. Renyi. The Renyi entropy is defined via the moments of order q of a probability distribution P = {p_i}:

    S_q = (1/(1 - q)) × log( Σ_i p_i^q ).    (11)

Taking the limit as q → 1, we get:

    S_1 = Σ_i p_i log(1/p_i).

The last expression is exactly the entropy we previously defined. So S_q is a generalized entropy for any real number q. The limit of S_q as q → 1 follows from L'Hopital's rule, since both the numerator and the denominator of (11) tend to 0:

    lim_{q→1} S_q = lim_{q→1} [ ∂/∂q log(Σ_i p_i^q) ] / [ ∂/∂q (1 - q) ]
                  = lim_{q→1} [ Σ_i p_i^q log(p_i) / Σ_i p_i^q ] / (-1)
                  = -Σ_i p_i log(p_i) = Σ_i p_i log(1/p_i),

using Σ_i p_i = 1.

Using the Renyi entropy, we can then define a generalized dimension associated with a data set. Suppose a data set is distributed among bins of diameter r; we can let p_i be the probability that a data item falls in the i-th bin (estimated by counting the data elements in the bin, and dividing by the total number of items). We can then define a dimension (for each q):

    D_q = lim_{r→0} (1/(q - 1)) × log( Σ_i p_i^q ) / log(r).    (12)
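A small Python check of definition (11) and its q → 1 limit (the test distribution is arbitrary):

```python
import math

def renyi_entropy(probs, q, base=2.0):
    """S_q = 1/(1-q) * log_b( sum_i p_i^q ), defined for q != 1."""
    return math.log(sum(p**q for p in probs if p > 0), base) / (1.0 - q)

def shannon_entropy(probs, base=2.0):
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

P = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy(P))                 # 1.75 bits
for q in (0.999, 1.001):                  # S_q approaches the Shannon entropy as q -> 1
    print(q, renyi_entropy(P, q))
print(renyi_entropy(P, 0.0))              # q = 0: log2(number of non-empty bins) = 2.0
print(renyi_entropy(P, 2.0))              # q = 2: collision (correlation) entropy ≈ 1.54
```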

Why do we call this a generalized dimension? Consider first D_{q=0}. We define p_i^0 = 0 when p_i = 0. Also, let N_r be the number of non-empty bins of diameter r it takes to cover the data set. Then we have:

    D_0 = lim_{r→0} log( Σ_i p_i^0 ) / log(1/r) = lim_{r→0} log(N_r) / log(1/r).

Definition: D_0 is the Hausdorff dimension, which is sometimes called the fractal dimension of the set.

Examples:

1. 1D: Consider the unit interval [0,1]. Let r_k = 1/2^k. Then N_{r_k} = 2^k, and

    D_0 = lim_{k→∞} log(2^k) / log(2^k) = 1.

2. 2D: Consider the unit square [0,1] × [0,1]. Again, let r_k = 1/2^k. Then N_{r_k} = 2^{2k}, and

    D_0 = lim_{k→∞} log(2^{2k}) / log(2^k) = 2.

3. Between 1D and 2D? Consider the Cantor set. The construction of the Cantor set is done by induction: the Cantor set is what remains from the interval after we have removed middle thirds countably many times. It is an uncountable set, with measure (length) 0. For this set we will let r_k = 1/3^k. Then N_{r_k} = 2^k, and

    D_0 = lim_{k→∞} log(2^k) / log(3^k) = log(2)/log(3) = 0.631.

The Cantor set is a traditional example of a fractal. It is self-similar, and has Hausdorff dimension 0.631, which is strictly greater than its (integer) topological dimension 0. Some nonlinear dynamical systems have trajectories which are locally the product of a Cantor set with a manifold (i.e., Poincare sections are generalized Cantor sets).
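A short numerical illustration of D_0 via box counting for the Cantor set (a sketch of ours; the construction depth and scales are arbitrary, and integer ternary indices are used to avoid rounding issues):

```python
import math

def cantor_indices(depth: int):
    """Ternary indices (digits 0 or 2 only) of the 2^depth intervals of length 3^-depth
    remaining after `depth` middle-third removals."""
    idx = [0]
    for _ in range(depth):
        idx = [3 * i + d for i in idx for d in (0, 2)]
    return idx

depth = 12
indices = cantor_indices(depth)                    # 4096 surviving intervals

# Box-counting estimate of D_0: count non-empty bins of diameter r = 3^-k.
for k in (3, 6, 9, 12):
    occupied = {i // 3 ** (depth - k) for i in indices}
    print(k, len(occupied), math.log(len(occupied)) / math.log(3 ** k))
# Each estimate equals log(2^k)/log(3^k) = log(2)/log(3) ≈ 0.631, the Hausdorff dimension.
```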

Prop

ertie

s of D

q:

1.

Mon

oton

icity

: If

21

qq

≤, t

hen

21

qq

DD

≤.

2.

Frac

tal C

alcu

latio

ns:

If th

e se

t is

stric

tly s

elf-s

imila

r w

ith e

qual

pro

babi

litie

s

p i =

1/N

, the

n th

e ca

lcul

atio

ns a

re tr

ivia

l and

we

do n

ot n

eed

to ta

ke th

e lim

it

as r

0, si

nce

()

()

oD

rN

r

q

NN

q

1

r q

D=

=

×

−=

1

log

log

)lo

g(

1lo

g

10

lim

This

is th

e ca

se, f

or e

xam

ple,

for t

he C

anto

r set

.

Ivo

D. D

inov

UC

LA S

tatis

tics

http

://w

ww

.stat

.ucl

a.ed

u/~d

inov

Cou

rses

& S

tude

nts

3.

Info

rmat

ion

Dim

ensio

n:

=

)lo

g(

1lo

g

0lim

r

iip

ip

r 1

D

(13

)

The

num

erat

or is

just

the

entro

py o

f the

pro

babi

lity

dist

ribut

ion.

4.

Corr

elat

ion

Dim

ensio

n: T

his

dim

ensi

on is

rela

ted

to th

e pr

obab

ility

of f

indi

ng

two

elem

ents

of t

he se

t with

in a

dis

tanc

e r o

f eac

h ot

her.

()

=

)lo

g(

2lo

g

0lim

riip

r

D2

(1

4)

10. M

utua

l Inf

orm

atio

n

Con

side

r a

syst

em w

ith th

e in

put X

and

out

put Y

(X,

Y r

ando

m v

aria

bles

). H

ow c

an

we

mea

sure

the

unc

erta

inty

abo

ut X

afte

r ob

serv

ing

Y? L

et’s

def

ine

cond

ition

al

entr

opy

of X

with

giv

en Y

: H( X

| Y

) = H

(X, Y

) – H

(Y).

,,

(,

)(

,)l

og(

(,

))X

YX

YH

XY

fx

yf

xy

dxdy

∞∞

−∞−∞

=−∫∫

(14)

()

()l

og(

())

YY

HY

fy

fy

dy∞ −∞

=−∫

The

cond

ition

al e

ntro

py

(|

)H

XY

repr

esen

ts th

e am

ount

of t

he u

ncer

tain

ty r

emai

ning

abou

t the

sys

tem

inpu

t X a

fter t

he s

yste

m o

utpu

t Y h

as b

een

obse

rved

. The

nex

t cla

im

Page 14: An Introduction to Information Theory and Entropy - UCLA

Ivo

D. D

inov

UC

LA S

tatis

tics

http

://w

ww

.stat

.ucl

a.ed

u/~d

inov

Cou

rses

& S

tude

nts

is i

ntui

tivel

y cl

ear:

The

diffe

renc

e (

)(

|)

HX

HX

Y−

mus

t re

pres

ent

the

unce

rtain

ty

abou

t the

syst

em in

put t

hat i

s re

solv

ed b

y ob

serv

ing

the

syst

em o

utpu

t:

(,

)(

)(

|)

IX

YH

XH

XY

=−

(15

)

Mut

ual i

nfor

mat

ion

can

qual

itativ

ely

be th

ough

t of

as a

mea

sure

of

how

wel

l one

imag

e ex

plai

ns t

he o

ther

, an

d is

max

imiz

ed a

t th

e op

timal

alig

nmen

t. It

can

be

expr

esse

d in

the

follo

win

g fo

rm:

∑∑

=−

+=

ab

bP

aP

ba

Pb

aP

BA

HB

HA

HB

AI

)(

)(

),

(lo

g)

,(

),

()

()

()

,(

(16

)

The

cond

ition

al p

roba

bilit

y P(

b|a)

is th

e pr

obab

ility

that

B w

ill ta

ke th

e va

lue

b gi

ven

that

A h

as t

he v

alue

a.

The

cond

ition

al e

ntro

py i

s th

eref

ore

the

aver

age

of t

he

entro

py o

f B fo

r eac

h va

lue

of A

, wei

ghte

d ac

cord

ing

to th

e pr

obab

ility

of g

ettin

g th

at

valu

e of

A:

)(

),

()

|(

log

),

()

|(

,

AH

BA

Ha

bP

ba

PA

BH

ba

−=

−=∑

(1

7)

Thus

the

equa

tion

for m

utua

l inf

orm

atio

n ca

n be

rew

rite

as:

)

|(

)(

)|

()

()

,(

BA

HB

HA

BH

AH

BA

I−

=−

=

(18

)

Registration by maximization of mutual information therefore involves finding the transformation that makes image A the best possible predictor for image B within the region of overlap.
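To illustrate the idea, here is a minimal registration sketch (hypothetical NumPy code; the synthetic images, bin count, and integer-shift search are illustrative assumptions, not a method from these notes). It slides one image over the other and keeps the translation that maximizes I(A,B) of eq. (16), estimated from a joint intensity histogram; because mutual information only asks that one image predict the other, it still works when the second image has a different contrast.

```python
import numpy as np

def mutual_information(A, B, bins=32):
    """I(A,B) = H(A) + H(B) - H(A,B), estimated from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(A.ravel(), B.ravel(), bins=bins)
    P = joint / joint.sum()

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return H(P.sum(axis=1)) + H(P.sum(axis=0)) - H(P.ravel())

def register_by_mi(A, B, max_shift=8):
    """Return the integer horizontal shift of B that maximizes I(A, B)."""
    scores = {s: mutual_information(A, np.roll(B, s, axis=1))
              for s in range(-max_shift, max_shift + 1)}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
A = rng.normal(0, 1, (64, 64)).cumsum(axis=1)   # smooth synthetic "image"
B = np.exp(np.roll(A, 4, axis=1) / 4.0)         # translated copy with a different contrast
print(register_by_mi(A, B))                     # -4: the shift that re-aligns B with A
```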

The advantage of mutual information over joint entropy is that it includes the entropies of the separate images. Mutual information and joint entropy are computed for the overlapping parts of the images, and the measures are therefore sensitive to the size and the contents of the overlap. A problem that can occur when using joint entropy on its own is that low values (normally associated with a high degree of alignment) can be found for complete misregistrations. For example, when transforming one image to such an extent that only an area of background overlaps for the two images, the joint histogram will be very sharp, with only a single peak arising from the background.

Mutual information is better equipped to avoid such problems, because it includes the marginal entropies H(A) and H(B). These will have low values when the overlapping part of the images contains only background, and high values when it contains anatomical structure. The marginal entropies will thus balance the measure somewhat by penalizing transformations that decrease the amount of information in the separate images. Consequently, mutual information is less sensitive to overlap than joint entropy, although not completely immune.

11. Normalized Mutual Information

The size of the overlapping part of the images influences the mutual information measure in two ways. First of all, a decrease in overlap decreases the number of samples, which reduces the statistical power of the probability distribution estimation. Secondly, the mutual information measure may actually increase with increasing misregistration (which usually coincides with decreasing overlap). This can occur when the relative areas of object and background even out and the sum of the marginal entropies increases faster than the joint entropy. Studholme et al. proposed a normalized measure of mutual information, which is less sensitive to changes in overlap:

$$NMI \;=\; \frac{H(A) + H(B)}{H(A,B)} \qquad (19)$$

Maes et al. have suggested the use of the Entropy Correlation Coefficient (ECC) as another form of normalized mutual information. NMI and ECC are related in the following manner:
$$ECC \;=\; \frac{2\,I(A,B)}{H(A) + H(B)} \;=\; 2 - \frac{2}{NMI}$$
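A minimal sketch of these two normalized measures (hypothetical NumPy code; the 2×2 joint probability table is an arbitrary illustration) computes NMI from eq. (19) and the ECC, and verifies the stated relation ECC = 2 − 2/NMI.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability array (zeros ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def overlap_measures(P):
    """NMI (eq. 19) and ECC for a joint probability table P(a, b)."""
    H_A  = entropy(P.sum(axis=1))
    H_B  = entropy(P.sum(axis=0))
    H_AB = entropy(P.ravel())
    nmi  = (H_A + H_B) / H_AB
    ecc  = 2.0 * (H_A + H_B - H_AB) / (H_A + H_B)   # 2 I(A,B) / (H(A) + H(B))
    return nmi, ecc

P = np.array([[0.30, 0.10],
              [0.15, 0.45]])          # hypothetical joint PDF of two images' intensities
nmi, ecc = overlap_measures(P)
print(nmi, ecc, 2 - 2 / nmi)          # ecc equals 2 - 2/NMI
```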


12. Kullback-Leibler divergence

It could be useful to define a distance between two vector distributions. If $f_X(\mathbf{x})$ is a distribution of the vector X, and $g_X(\mathbf{x})$ is a different distribution of X, then the distance between these distributions can be written as
$$D(f_X\,\|\,g_X) \;=\; \int_{-\infty}^{\infty} f_X(\mathbf{x})\,\log\frac{f_X(\mathbf{x})}{g_X(\mathbf{x})}\,d\mathbf{x} \qquad (20)$$
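For discrete distributions the integral in eq. (20) becomes a sum; a minimal sketch (hypothetical Python code, arbitrary example PMFs) is shown below. Note that D(f||g) is not symmetric in its arguments, so it is a "distance" only in a loose sense.

```python
import numpy as np

def kl_divergence(f, g):
    """Discrete analogue of eq. (20): D(f || g) = sum f * log(f / g).
    Assumes both arrays are proper PMFs and g > 0 wherever f > 0."""
    mask = f > 0
    return np.sum(f[mask] * np.log(f[mask] / g[mask]))

f = np.array([0.5, 0.3, 0.2])
g = np.array([0.4, 0.4, 0.2])
print(kl_divergence(f, g), kl_divergence(g, f))   # the two directions differ
```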

For a single image, the entropy is normally calculated from the image intensity histogram, in which the probabilities {p1, p2, p3, …, pn} are the histogram entries. If all voxels in an image have the same intensity a, the histogram contains a single non-zero element with probability of 1, and the entropy of this image is 0. If this uniform image were to include some noise, then the histogram will contain a cluster of non-zero entries around a peak at the average intensity value. So the addition of noise to the image tends to equalize the probabilities, which increases the entropy. One consequence is that interpolation of an image may smooth the image, which can reduce the noise and consequently 'sharpen' the histogram. This sharpening of the histograms reduces entropy.
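The following sketch (hypothetical NumPy code; image size, noise level, and bin count are arbitrary choices) reproduces this behaviour: the histogram entropy of a perfectly uniform image is 0, and adding noise spreads the histogram and raises the entropy.

```python
import numpy as np

def histogram_entropy(image, bins=64):
    """Shannon entropy of an image's intensity histogram (natural log)."""
    counts, _ = np.histogram(image, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
uniform = np.full((128, 128), 100.0)             # every voxel has the same intensity
noisy   = uniform + rng.normal(0, 5, uniform.shape)

print(histogram_entropy(uniform))                # 0.0 : a single non-zero histogram bin
print(histogram_entropy(noisy))                  # > 0 : noise spreads the histogram
```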

Application of entropy for intramodality image registration: The goal now is to calculate the entropy of a difference image. If two perfectly aligned identical images are subtracted, the result is an entirely uniform image that has zero entropy. For two images that differ by noise, the histogram will be blurred, giving higher entropy, as is shown in the Figure below. Any misregistration, however, will lead to edge artifacts that further increase the entropy. Very similar images can therefore be registered by iteratively minimizing the entropy of the difference image.
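Here is a toy version of that procedure (hypothetical NumPy code; the synthetic images, fixed intensity range, and integer-shift search are illustrative assumptions). It evaluates the histogram entropy of the difference image at each candidate translation and keeps the one with the lowest value, which recovers the shift that was applied to the second image.

```python
import numpy as np

def difference_entropy(diff, bins=64, vrange=(-20.0, 20.0)):
    """Histogram entropy of a difference image, using a fixed intensity range
    so that a nearly uniform difference image collapses into very few bins."""
    counts, _ = np.histogram(diff, bins=bins, range=vrange)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def best_shift(A, B, max_shift=8):
    """Illustrative 1-D search: slide B horizontally over A and keep the integer
    shift whose difference image has the lowest histogram entropy."""
    scores = {s: difference_entropy(A - np.roll(B, s, axis=1))
              for s in range(-max_shift, max_shift + 1)}
    return min(scores, key=scores.get)

rng = np.random.default_rng(1)
A = rng.normal(0, 1, (64, 64)).cumsum(axis=1)               # smooth synthetic "image"
B = np.roll(A, 3, axis=1) + rng.normal(0, 0.05, A.shape)    # translated, noisy copy
print(best_shift(A, B))                                     # -3: undoes the applied shift
```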

Figure 1: Histogram is blurred if the images are not aligned (Guido Gerig).

13. Joint entropy

Joint entropy measures the amount of information in the two images combined. If these two images are totally unrelated, then the joint entropy will be the sum of the entropies of the individual images. The more similar the images are, the lower the joint entropy compared to the sum of the individual entropies. The concept of joint entropy can be visualized using a joint histogram calculated from the images, as shown in the Figure below. For all voxels in the overlapping regions of the images, we plot the intensity of this voxel in image A against the intensity of the corresponding voxel in image B. The joint histogram can be normalized by dividing by the total number of voxels N, and regarded as a joint probability density function (PDF) P(a,b) of images A and B. The number of elements in the PDF can either be determined by the range of intensity values in the two images, or from a partitioning of the intensity space into "bins".


Definition: The joint entropy H(A,B) is therefore given by:
$$H(A,B) \;=\; -\sum_{a}\sum_{b} P(a,b)\,\log P(a,b) \qquad (21)$$

where a, b represent the original image intensities or the selected intensity bins. As can be seen from the Figure below, the joint histograms disperse or blur with increasing misregistration, and thus the entropy increases.
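The sketch below (hypothetical NumPy code; the synthetic image and translations merely mimic the aligned / 2 mm / 5 mm panels of Figure 2) builds the normalized joint histogram with np.histogram2d and evaluates eq. (21), showing the joint entropy rising as the misregistration grows.

```python
import numpy as np

def joint_entropy(A, B, bins=32):
    """Joint entropy H(A,B) of eq. (21) from a normalized 2-D intensity histogram."""
    counts, _, _ = np.histogram2d(A.ravel(), B.ravel(), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(2)
A = rng.normal(0, 1, (128, 128)).cumsum(axis=1)     # smooth synthetic "image"

for shift in (0, 2, 5):                             # mimic 0, 2, 5 pixel translations
    B = np.roll(A, shift, axis=1)
    print(shift, round(joint_entropy(A, B), 3))     # joint entropy grows with misregistration
```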

Figure 2: Example 2D histograms of the head images: (a) identical MR images, (b) MR and CT images. Panels show the images aligned, translated by 2 mm, and translated by 5 mm. (Hill et al., "Voxel similarity measures for automated image registration," Visualization in Biomedical Computing 1994, Proc. SPIE 2359, pp. 205–216, 1994.)


14. References:

[1] Brillouin, L., Science and Information Theory, Academic Press, New York, 1956.
[2] Brooks, Daniel R., and Wiley, E. O., Evolution as Entropy, Toward a Unified Theory of Biology, Second Edition, University of Chicago Press, Chicago, 1988.
[3] Campbell, Jeremy, Grammatical Man, Information, Entropy, Language, and Life, Simon and Schuster, New York, 1982.
[4] Cover, T. M., and Thomas, J. A., Elements of Information Theory, John Wiley and Sons, New York, 1991.
[5] DeLillo, Don, White Noise, Viking/Penguin, New York, 1984.
[6] Feller, W., An Introduction to Probability Theory and Its Applications, Wiley, New York, 1957.
[7] Feynman, Richard, Feynman Lectures on Computation, Addison-Wesley, Reading, 1996.
[8] Gatlin, L. L., Information Theory and the Living System, Columbia University Press, New York, 1972.
[9] Haken, Hermann, Information and Self-Organization, a Macroscopic Approach to Complex Systems, Springer-Verlag, Berlin/New York, 1988.
[10] Hamming, R. W., Error detecting and error correcting codes, Bell Syst. Tech. J. 29 147, 1950.
[11] Hamming, R. W., Coding and Information Theory, 2nd ed., Prentice-Hall, Englewood Cliffs, 1986.
[12] Hill, R., A First Course in Coding Theory, Clarendon Press, Oxford, 1986.
[13] Hodges, A., Alan Turing: the Enigma, Vintage, London, 1983.
[14] Hofstadter, Douglas R., Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books, New York, 1985.
[15] Jones, D. S., Elementary Information Theory, Clarendon Press, Oxford, 1979.
[16] Knuth, Eldon L., Introduction to Statistical Thermodynamics, McGraw-Hill, New York, 1966.
[17] Landauer, R., Information is physical, Phys. Today, May 1991, 23-29.
[18] Landauer, R., The physical nature of information, Phys. Lett. A, 217 188, 1996.
[19] van Lint, J. H., Coding Theory, Springer-Verlag, New York/Berlin, 1982.
[20] Lipton, R. J., Using DNA to solve NP-complete problems, Science, 268 542–545, Apr. 28, 1995.
[21] MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error Correcting Codes, Elsevier Science, Amsterdam, 1977.
[22] Martin, N. F. G., and England, J. W., Mathematical Theory of Entropy, Addison-Wesley, Reading, 1981.
[23] Maxwell, J. C., Theory of Heat, Longmans, Green and Co, London, 1871.
[24] von Neumann, John, Probabilistic logic and the synthesis of reliable organisms from unreliable components, in Automata Studies (Shannon, McCarthy eds), 1956.
[25] Papadimitriou, C. H., Computational Complexity, Addison-Wesley, Reading, 1994.
[26] Pierce, John R., An Introduction to Information Theory – Symbols, Signals and Noise (second revised edition), Dover Publications, New York, 1980.
[27] Roman, Steven, Introduction to Coding and Information Theory, Springer-Verlag, Berlin/New York, 1997.
[28] Sampson, Jeffrey R., Adaptive Information Processing, an Introductory Survey, Springer-Verlag, Berlin/New York, 1976.
[29] Schroeder, Manfred, Fractals, Chaos, Power Laws, Minutes from an Infinite Paradise, W. H. Freeman, New York, 1991.
[30] Shannon, C. E., A mathematical theory of communication, Bell Syst. Tech. J. 27 379; also p. 623, 1948.
[31] Slepian, D., ed., Key Papers in the Development of Information Theory, IEEE Press, New York, 1974.
[32] Turing, A. M., On computable numbers, with an application to the Entscheidungsproblem, Proc. Lond. Math. Soc. Ser. 2 42, 230; see also Proc. Lond. Math. Soc. Ser. 2 43, 544, 1936.
[33] Zurek, W. H., Thermodynamic cost of computation, algorithmic complexity and the information metric, Nature 341 119-124, 1989.
[34] Buzug, T. M., and Weese, J., "Image registration for DSA quality enhancement", Computerized Imaging Graphics 22 103, 1998.
[35] Tom Carter's Notes: http://cogs.csustan.edu/~tom/


Recommended