Elements of Information Theory
Page 1:

Elements of Information Theory

Page 2:

Materials from the book of T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley.

Page 3:

Measure of Information (of an event)

• Given a probability mass function (pmf) p(x) of a random variable X.

• The information associated with an event of probability p(x) is defined as

I(x) = −log[ p(x) ]      Units: bits

• Less frequent event → MORE information.

• More frequent event → LESS information.

• The base of the log is 2 (we do not lose generality).
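As a quick sanity check, the definition can be sketched in a few lines of Python (the function name `self_information` is our own, not from the slides):

```python
import math

def self_information(p: float) -> float:
    """Information (in bits) of an event with probability p: I = -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# A fair coin flip carries 1 bit; a rarer event carries more bits.
print(self_information(0.5))    # 1.0
print(self_information(0.125))  # 3.0
```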

Page 4: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take).
Page 5: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take).
Page 6: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take).

Discrete Entropy

• Expected value of the information.

• IT IS A SCALAR VALUE.

• It can be considered as a DISPERSION MEASURE of the pmf p(x).

• The notation H(X) means that it is related to the r.v. X.

• H(X) represents the UNCERTAINTY over the values that the random variable X can take.

H(X) = HX = − Σ_{i=1}^{N} p(x=i) log[ p(x=i) ]
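A minimal sketch of this definition in Python (our own helper, with the usual convention that zero-probability terms contribute nothing):

```python
import math

def entropy(pmf):
    """Discrete entropy H(X) = -sum_i p(x=i) * log2 p(x=i), in bits.

    Terms with p = 0 are skipped (the convention 0 * log2(0) = 0).
    """
    assert abs(sum(pmf) - 1.0) < 1e-9, "pmf must sum to 1"
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
```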

Page 7:

Discrete Entropy

Page 8:

Page 9:

Page 10:

Page 11:

The entropy does not depend on the values that the r.v. X can take (in the example above they can be considered generic math variables or simply "letters"…).

Page 12:

IMPORTANT:  

H(X) = 0 if the pmf is of the type 0, 0, 0, 1, 0, …, 0.

H(X) = log N (i.e., its maximum value) if the pmf is of the type 1/N, 1/N, …, 1/N.
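These two extremes are easy to verify numerically; a small sketch with our own entropy helper:

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

N = 8
delta = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # all mass on one value
uniform = [1.0 / N] * N                            # equiprobable values

print(entropy(delta))    # 0.0
print(entropy(uniform))  # log2(8) = 3.0
```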

Page 13:

Entropy: measure of dispersion

• H(X) is a measure of DISPERSION (UNCERTAINTY).

• We do not consider the continuous scenario here: the differential entropy (continuous case) is max when p(x) is a Gaussian density.

MAX DISCRETE ENTROPY: UNIFORM PMF

p(x=i) = 1/N for all i  →  HX = log2 N

MIN DISCRETE ENTROPY: DELTA

A single value has probability 1  →  HX = 0    (using the convention 0 · log2 0 = 0)

Page 14:

Relationship with the variance

• Another dispersion measure is the variance. BUT the variance depends on the support of the r.v. X (i.e., the values that X can take).

• For instance, we can permute the positions of the deltas and the entropy does not change.

In these two pmfs the entropy is the same, but the variance is not!
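This can be illustrated with a small sketch (our own example pmfs, not the ones in the slide's figure): permuting the masses over the same support leaves the entropy unchanged but changes the variance.

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def variance(support, pmf):
    mean = sum(x * p for x, p in zip(support, pmf))
    return sum(p * (x - mean) ** 2 for x, p in zip(support, pmf))

support = [0, 1, 2]
pmf_a = [0.5, 0.3, 0.2]
pmf_b = [0.3, 0.5, 0.2]  # same masses, permuted over the support

print(entropy(pmf_a), entropy(pmf_b))                      # equal
print(variance(support, pmf_a), variance(support, pmf_b))  # different
```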

Page 15:

Page 16:

- Which is the more "informative" system? The first one, with 2 events of probability 0.9 and 0.1, or the second one, with 2 events of probability 0.5 and 0.5?

We have more "questions" (more uncertainty) in the second case…
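Computing the two entropies makes the comparison concrete (a small sketch with our own helper):

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

h1 = entropy([0.9, 0.1])  # roughly 0.47 bits
h2 = entropy([0.5, 0.5])  # 1.0 bit: maximum uncertainty for 2 events
print(h1, h2)
```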

Page 17:


Page 18:

Joint Entropy of two r.v.'s X, Y

H(X,Y) = HXY = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(x=i, y=j) ]

Page 19:

Conditional Entropy - Y|X

H(Y|X) = HY|X = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(y=j | x=i) ]

This quantity needs no introduction… it is a standard entropy!
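A minimal sketch of the definition, using a hypothetical 2×2 joint pmf (the numbers below are our own, not from the slides):

```python
import math

# Hypothetical 2x2 joint pmf p(x, y): rows index x = i, columns index y = j.
joint = [[0.25, 0.25],
         [0.40, 0.10]]

def cond_entropy_y_given_x(joint):
    """H(Y|X) = -sum_{i,j} p(x=i, y=j) * log2 p(y=j | x=i)."""
    h = 0.0
    for row in joint:
        px = sum(row)  # marginal p(x = i)
        for pxy in row:
            if pxy > 0:
                h -= pxy * math.log2(pxy / px)  # p(y|x) = p(x,y) / p(x)
    return h

print(cond_entropy_y_given_x(joint))
```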

Page 20:

Relationship among entropy, joint entropy and conditional entropy

Page 21:

Page 22:

This is the joint pmf…

Are you able to find the marginal and conditional pmfs?

"Joint"

Page 23:

Relative entropy – KL divergence

D(p‖q) = Σ_x p(x) log[ p(x) / q(x) ]

Page 24:

Relative entropy – KL divergence

Note that, again, it is not symmetric; it is quite useful for causality analysis (an important concept, especially in biomedical applications).
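The asymmetry is easy to see numerically; a sketch with the standard KL formula (our own example distributions):

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)).

    Assumes q(x) > 0 wherever p(x) > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # roughly 0.74
print(kl_divergence(q, p))  # roughly 0.53 -- not symmetric!
```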

Page 25:

Page 26:

MUTUAL INFORMATION

I(X;Y) = IXY = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(x=i) p(y=j) / p(x=i, y=j) ]

It is symmetric.
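A small sketch (our own example joint pmf), writing the same formula in the equivalent form I(X;Y) = Σ p(x,y) log2[ p(x,y) / (p(x)p(y)) ], and checking the symmetry by transposing the joint:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{i,j} p(x,y) * log2[ p(x,y) / (p(x) p(y)) ] (>= 0)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

dep = [[0.4, 0.1],
       [0.1, 0.4]]
transposed = [list(col) for col in zip(*dep)]
print(mutual_information(dep))
print(mutual_information(transposed))  # same value: I(X;Y) = I(Y;X)
```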

Page 27:

MUTUAL INFORMATION

IMPORTANT: WE CAN STUDY DEPENDENCE/INDEPENDENCE BETWEEN RANDOM VARIABLES (different from the correlation coefficient, which only captures linear dependence…).

Page 28:

Relationship between ENTROPY and MUTUAL INFORMATION

Page 29:

Ex-Joint

Page 30:

"Information can't hurt"

Page 31:

Page 32:

SUMMARY

• Recall the definitions:

H(X,Y) = HXY = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(x=i, y=j) ]

I(X;Y) = IXY = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(x=i) p(y=j) / p(x=i, y=j) ]

Recall that:

p(x,y) = p(y|x) p(x)        p(x,y) = p(x|y) p(y)

H(X|Y) = HX|Y = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(x=i | y=j) ]

H(Y|X) = HY|X = − Σ_{i=1}^{N} Σ_{j=1}^{L} p(x=i, y=j) log[ p(y=j | x=i) ]

Page 33:

SUMMARY - RELATIONSHIPS

Page 34:

SUMMARY - RELATIONSHIPS

Page 35:

RELATIONSHIPS

(Venn diagram: Red: HX; Yellow: HY; Red + Yellow = HXY (joint); the overlap is IXY; the non-overlapping parts are HX|Y and HY|X.)

Page 36:

RELATIONSHIPS

We can obtain the identities and inequalities:

HXY = HX + HY − IXY
HXY = HX|Y + HY|X + IXY
HXY = HX + HY|X
HXY = HY + HX|Y

HX = HX|Y + IXY
HY = HY|X + IXY

IXY = HX − HX|Y
IXY = HY − HY|X
IXY = HX + HY − HXY
IXY = IYX

HXY ≤ HX + HY
HX ≤ HXY ≤ HX + HY
HY ≤ HXY ≤ HX + HY
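These identities can be checked numerically on any joint pmf; a sketch with a hypothetical 2×3 joint (our own numbers, not from the slides):

```python
import math

def h(probs):
    """Entropy (in bits) of a list of probabilities; 0-probability terms skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = [[0.10, 0.20, 0.15],
         [0.25, 0.05, 0.25]]  # hypothetical 2x3 joint pmf

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
hx, hy = h(px), h(py)
hxy = h([p for row in joint for p in row])

# Direct computations of H(X|Y) and I(X;Y) from their definitions:
hx_given_y = -sum(p * math.log2(p / py[j])
                  for row in joint for j, p in enumerate(row) if p > 0)
ixy = sum(p * math.log2(p / (px[i] * py[j]))
          for i, row in enumerate(joint) for j, p in enumerate(row) if p > 0)

assert abs(hx_given_y - (hxy - hy)) < 1e-12  # HXY = HY + HX|Y
assert abs(ixy - (hx + hy - hxy)) < 1e-12    # IXY = HX + HY - HXY
assert hxy <= hx + hy + 1e-12                # HXY <= HX + HY
print("all identities hold")
```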

Page 37:

Independent Variables

IXY = 0
HX = HX|Y
HY = HY|X
HXY = HX + HY

The joint entropy is max, and I(X;Y) is min (zero).
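A sketch of the independent case (our own marginals): building the joint as the product of the marginals makes HXY = HX + HY and IXY = 0.

```python
import math

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

px = [0.7, 0.3]
py = [0.5, 0.25, 0.25]
# Independence: the joint pmf factorizes, p(x, y) = p(x) * p(y).
joint = [[a * b for b in py] for a in px]

hx, hy = h(px), h(py)
hxy = h([p for row in joint for p in row])
ixy = hx + hy - hxy

print(hxy, hx + hy)  # equal: HXY = HX + HY
print(ixy)           # ~0
```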

Page 38:

Case X = Y (totally dependent)

HXY = HX = HY = IXY
HX|Y = 0
HY|X = 0
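The totally dependent case can be sketched by putting all the mass on the diagonal of the joint pmf (our own example marginal):

```python
import math

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

px = [0.5, 0.3, 0.2]
# X = Y: all probability mass sits on the diagonal of the joint pmf.
joint = [[px[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

hx = h(px)
hxy = h([p for row in joint for p in row])
ixy = hx + hx - hxy  # IXY = HX + HY - HXY, and HY = HX here

print(hx, hxy, ixy)  # all three coincide
```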

Page 39:

Important formulas

• Recall:

0 ≤ HX ≤ log2 N              (min: p(x) delta; max: p(x) uniform)
0 ≤ HY ≤ log2 L              (min: p(y) delta; max: p(y) uniform)
HX (= HY) ≤ HXY ≤ HX + HY    (min: X = Y; max: independent variables)
0 ≤ IXY ≤ HX (= HY)          (min: independent variables; max: X = Y)
0 ≤ HX|Y ≤ HX                (min: X = Y; max: independent variables)
0 ≤ HY|X ≤ HY                (min: X = Y; max: independent variables)

Page 40:

Data-processing inequalities

More processing on the data, more loss of information: if X → Y → Z forms a Markov chain, then I(X;Z) ≤ I(X;Y).
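A sketch of the inequality on a hypothetical Markov chain X → Y → Z built from two noisy binary channels in cascade (all numbers below are our own assumptions):

```python
import math

def mutual_information(joint):
    """I(X;Y) from a joint pmf given as a 2D list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Hypothetical Markov chain X -> Y -> Z: two noisy binary channels in cascade.
px = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.1, 0.9]]  # first channel flips 10% of bits
p_z_given_y = [[0.8, 0.2], [0.2, 0.8]]  # second channel flips 20% of bits

joint_xy = [[px[i] * p_y_given_x[i][j] for j in range(2)] for i in range(2)]
joint_xz = [[sum(px[i] * p_y_given_x[i][j] * p_z_given_y[j][k] for j in range(2))
             for k in range(2)] for i in range(2)]

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(i_xy, i_xz)  # i_xz <= i_xy: the second stage can only lose information
```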

Page 41:

Some material is from the book of T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley.

