
Information Theory Differential Entropy

Differential Entropy (Professor 吳家麟)
Transcript
  • Slide 1: Differential Entropy

  • Slide 2: Definition

    Let X be a random variable with cumulative distribution function
    F(x) = Pr(X \le x). If F(x) is continuous, the r.v. X is said to be
    continuous. Let f(x) = F'(x) when the derivative is defined. If

        \int f(x) \, dx = 1,

    then f(x) is called the probability density function (pdf) of X. The
    set where f(x) > 0 is called the support set of X.

  • Slide 3: Definition

    The differential entropy h(X) of a continuous r.v. X with a density
    function f(x) is defined as

        h(X) = -\int_S f(x) \log f(x) \, dx        (1)

    where S is the support set of the r.v. X. Since h(X) depends only on
    f(x), the differential entropy is sometimes written h(f) rather than
    h(X).
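    Because h(X) = E[-\log f(X)], the defining integral can also be
    approximated by simple Monte Carlo when f is known in closed form.
    The sketch below is illustrative only and not part of the original
    slides; the helper name mc_differential_entropy, the exponential
    test density, and the sample size are assumptions.

        import numpy as np
        from scipy import stats

        def mc_differential_entropy(dist, n=200_000, seed=0):
            """Plug-in Monte Carlo estimate of h(X) = E[-log f(X)] in nats,
            averaging -log f over samples drawn from f itself."""
            x = dist.rvs(size=n, random_state=seed)
            return -np.mean(dist.logpdf(x))

        # Example: Exp(1) has density f(x) = e^{-x} (x >= 0) and h(X) = 1 nat.
        print(mc_differential_entropy(stats.expon()))   # approximately 1.0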

  • Slide 4: Ex. 1 (Uniform distribution)

    Let X be uniform on [0, a], so f(x) = 1/a on [0, a] and 0 elsewhere.
    Then

        h(X) = -\int_0^a \frac{1}{a} \log\frac{1}{a} \, dx = \log a.

    Note: For a < 1, \log a < 0, so unlike discrete entropy, the
    differential entropy can be negative.

  • Slide 5: Ex. 2 (Normal distribution)

    Let X \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/2\sigma^2}. Then

        h(\phi) = -\int \phi \ln \phi
                = -\int \phi(x) \left[ -\frac{x^2}{2\sigma^2}
                        - \ln\sqrt{2\pi\sigma^2} \right] dx
                = \frac{E[X^2]}{2\sigma^2} + \frac{1}{2}\ln 2\pi\sigma^2
                = \frac{1}{2} + \frac{1}{2}\ln 2\pi\sigma^2

  • Slide 6

    Continuing,

        \frac{1}{2} + \frac{1}{2}\ln 2\pi\sigma^2
            = \frac{1}{2}\ln e + \frac{1}{2}\ln 2\pi\sigma^2
            = \frac{1}{2}\ln 2\pi e\sigma^2 \ \text{nats.}

    Changing the base of the logarithm, we have

        h(\phi) = \frac{1}{2}\log_2 2\pi e\sigma^2 \ \text{bits.}
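    As a quick numerical sanity check (not from the slides; the value of
    sigma and the sample size are arbitrary assumptions), the closed form
    above can be compared against a Monte Carlo estimate of E[-ln phi(X)]:

        import numpy as np
        from scipy import stats

        sigma = 2.0                                       # assumed example value
        closed_nats = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
        closed_bits = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

        # Monte Carlo estimate of E[-ln phi(X)] with X ~ N(0, sigma^2)
        phi = stats.norm(scale=sigma)
        x = phi.rvs(size=200_000, random_state=1)
        mc_nats = -np.mean(phi.logpdf(x))

        print(closed_nats, mc_nats)    # agree to roughly two decimal places
        print(closed_bits)             # the same quantity expressed in bits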

  • Slide 7: Theorem 1

    Let X_1, X_2, \ldots, X_n be a sequence of r.v.s drawn i.i.d.
    according to the density f(x). Then

        -\frac{1}{n} \log f(X_1, X_2, \ldots, X_n)
            \to E[-\log f(X)] = h(X) \quad \text{in probability.}

    Proof: The proof follows directly from the weak law of large numbers.
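    The convergence can be observed numerically. This sketch is
    illustrative only; the Gaussian density and the block lengths are
    assumptions, not part of the slides.

        import numpy as np
        from scipy import stats

        phi = stats.norm()                                # f(x): standard normal
        h = 0.5 * np.log(2 * np.pi * np.e)                # h(X) in nats

        for n in (10, 100, 10_000, 1_000_000):
            x = phi.rvs(size=n, random_state=0)
            # -(1/n) log f(X_1, ..., X_n) = -(1/n) * sum_i log f(X_i)
            print(n, -np.mean(phi.logpdf(x)), "-> h(X) =", h)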

  • Slide 8

    Def: For \epsilon > 0 and any n, we define the typical set
    A_\epsilon^{(n)} w.r.t. f(x) as follows:

        A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n :
            \left| -\frac{1}{n} \log f(x_1, x_2, \ldots, x_n) - h(X) \right|
            \le \epsilon \right\},

    where f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i).

  • Slide 9

    Def: The volume Vol(A) of a set A \subset \mathbb{R}^n is defined as

        \mathrm{Vol}(A) = \int_A dx_1 \, dx_2 \cdots dx_n.

    Thm: The typical set A_\epsilon^{(n)} has the following properties:

    1. \Pr\bigl(A_\epsilon^{(n)}\bigr) > 1 - \epsilon for n sufficiently large.
    2. \mathrm{Vol}\bigl(A_\epsilon^{(n)}\bigr) \le 2^{n(h(X)+\epsilon)} for all n.
    3. \mathrm{Vol}\bigl(A_\epsilon^{(n)}\bigr) \ge (1-\epsilon)\,2^{n(h(X)-\epsilon)} for n sufficiently large.

  • Slide 10

    Thm: The set A_\epsilon^{(n)} is the smallest-volume set with
    probability at least 1 - \epsilon, to first order in the exponent.

    The volume of the smallest set that contains most of the probability
    is approximately 2^{nh}. This is an n-dimensional volume, so the
    corresponding side length is (2^{nh})^{1/n} = 2^h. The differential
    entropy is therefore the logarithm of the equivalent side length of
    the smallest set that contains most of the probability. Low entropy
    implies that the r.v. is confined to a small effective volume, and
    high entropy indicates that the r.v. is widely dispersed.

  • Slide 11: Relation of Differential Entropy to Discrete Entropy

    Suppose we divide the range of X into bins of length \Delta. Let us
    assume that the density f(x) is continuous within the bins.

    [Figure: quantization of a continuous r.v., showing the density f(x)
    plotted against x and divided into bins of width \Delta.]

  • Slide 12

    By the mean value theorem, there is a value x_i within each bin such
    that

        f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x) \, dx.

    Consider the quantized r.v. X^\Delta, which is defined by
    X^\Delta = x_i if i\Delta \le X < (i+1)\Delta. Then the probability
    that X^\Delta = x_i is

        p_i = \int_{i\Delta}^{(i+1)\Delta} f(x) \, dx = f(x_i)\,\Delta.

  • Slide 13

    The entropy of the quantized version is

        H(X^\Delta) = -\sum_i p_i \log p_i
                    = -\sum_i f(x_i)\Delta \, \log\bigl(f(x_i)\Delta\bigr)
                    = -\sum_i \Delta f(x_i) \log f(x_i)
                        - \sum_i f(x_i)\Delta \, \log\Delta
                    = -\sum_i \Delta f(x_i) \log f(x_i) - \log\Delta,

    since \sum_i f(x_i)\Delta = \int f(x)\,dx = 1.

  • Slide 14

    If f(x)\log f(x) is Riemann integrable, then

        -\sum_i \Delta f(x_i) \log f(x_i)
            \to -\int f(x)\log f(x)\,dx = h(f)
            \quad \text{as } \Delta \to 0.

    This proves the following.

    Thm: If the density f(x) of the r.v. X is Riemann integrable, then

        H(X^\Delta) + \log\Delta \to h(f) = h(X)
            \quad \text{as } \Delta \to 0.

    Thus the entropy of an n-bit quantization of a continuous r.v. X is
    approximately h(X) + n, since \Delta = 2^{-n} for an n-bit uniform
    quantizer.
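    The theorem can be illustrated by binning a known density and
    computing H(X^\Delta) + \log_2\Delta for shrinking \Delta. This
    sketch is not from the slides; the Gaussian density, the bin widths,
    and the truncation range are assumptions.

        import numpy as np
        from scipy import stats

        phi = stats.norm()                                # f(x): standard normal
        h_bits = 0.5 * np.log2(2 * np.pi * np.e)          # h(X), about 2.047 bits

        for delta in (1.0, 0.5, 0.1, 0.01):
            # Bin probabilities p_i = F((i+1)*delta) - F(i*delta) over a range
            # that carries essentially all of the probability mass.
            edges = np.arange(-10.0, 10.0 + delta, delta)
            p = np.diff(phi.cdf(edges))
            p = p[p > 0]
            H_delta = -np.sum(p * np.log2(p))             # discrete entropy H(X^Delta)
            print(delta, H_delta + np.log2(delta), "-> h(X) =", h_bits)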

  • Slide 15: Joint and Conditional Differential Entropy

        h(X_1, X_2, \ldots, X_n) = -\int f(x^n) \log f(x^n) \, dx^n

        h(X|Y) = -\int f(x, y) \log f(x|y) \, dx \, dy

        h(X|Y) = h(X, Y) - h(Y)

  • Slide 16: Theorem (Entropy of a multivariate normal distribution)

    Let X_1, X_2, \ldots, X_n have a multivariate normal distribution
    with mean \mu and covariance matrix K. Then

        h(X_1, X_2, \ldots, X_n) = h(\mathcal{N}(\mu, K))
            = \frac{1}{2} \log\bigl( (2\pi e)^n |K| \bigr) \ \text{bits,}

    where |K| denotes the determinant of K.
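    A quick numerical check of the formula (illustrative only; the
    covariance matrix below is an arbitrary assumption). scipy's frozen
    multivariate_normal object exposes an entropy() method that returns
    the differential entropy in nats, so the closed form is evaluated in
    nats here as well:

        import numpy as np
        from scipy import stats

        K = np.array([[2.0, 0.5],
                      [0.5, 1.0]])                        # assumed covariance matrix
        n = K.shape[0]

        sign, logdet = np.linalg.slogdet(K)
        closed_nats = 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

        print(closed_nats)
        print(stats.multivariate_normal(mean=np.zeros(n), cov=K).entropy())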

  • Slide 17

    pf: (X_1, X_2, \ldots, X_n) \sim \mathcal{N}(\mu, K), with density

        f(\tilde{x}) = \frac{1}{(\sqrt{2\pi})^n |K|^{1/2}}
            \exp\left( -\tfrac{1}{2} (\tilde{x}-\mu)^T K^{-1} (\tilde{x}-\mu) \right).

    Then

        h(f) = -\int f(\tilde{x}) \left[ -\tfrac{1}{2}
                   (\tilde{x}-\mu)^T K^{-1} (\tilde{x}-\mu)
                   - \ln\bigl( (\sqrt{2\pi})^n |K|^{1/2} \bigr) \right] d\tilde{x}

             = \tfrac{1}{2} E\left[ \sum_{i,j} (X_i - \mu_i)\,(K^{-1})_{ij}\,(X_j - \mu_j) \right]
               + \tfrac{1}{2} \ln (2\pi)^n |K|

             = \tfrac{1}{2} \sum_{i,j} E\bigl[ (X_j - \mu_j)(X_i - \mu_i) \bigr] (K^{-1})_{ij}
               + \tfrac{1}{2} \ln (2\pi)^n |K|

  • Slide 18

    Continuing from Slide 17,

        h(f) = \tfrac{1}{2} \sum_j \sum_i K_{ji} (K^{-1})_{ij}
               + \tfrac{1}{2} \ln (2\pi)^n |K|

             = \tfrac{1}{2} \sum_j (K K^{-1})_{jj} + \tfrac{1}{2} \ln (2\pi)^n |K|

             = \tfrac{1}{2} \sum_j I_{jj} + \tfrac{1}{2} \ln (2\pi)^n |K|

             = \tfrac{n}{2} + \tfrac{1}{2} \ln (2\pi)^n |K|

             = \tfrac{1}{2} \ln (2\pi e)^n |K| \ \text{nats}

             = \tfrac{1}{2} \log (2\pi e)^n |K| \ \text{bits.}

  • Slide 19: Relative Entropy and Mutual Information

        D(f \,\|\, g) = \int f \log \frac{f}{g}

        I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)} \, dx \, dy

        I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X, Y)

        I(X; Y) = D\bigl( f(x, y) \,\|\, f(x) f(y) \bigr)
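    For a bivariate Gaussian with correlation rho, these identities give
    the familiar closed form I(X;Y) = -\tfrac{1}{2}\log(1 - \rho^2). The
    check below is illustrative only; the value of rho is an assumption.

        import numpy as np

        rho = 0.8                                         # assumed correlation
        # Closed form for a bivariate Gaussian (nats): I(X;Y) = -0.5*ln(1 - rho^2)
        I_closed = -0.5 * np.log(1 - rho**2)

        # Via I(X;Y) = h(X) + h(Y) - h(X,Y) with the Gaussian entropy formulas
        K = np.array([[1.0, rho],
                      [rho, 1.0]])
        h_x = h_y = 0.5 * np.log(2 * np.pi * np.e)
        h_xy = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(K))

        print(I_closed, h_x + h_y - h_xy)                 # identical up to rounding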

  • Slide 20: Remark

    The mutual information between two continuous r.v.s is the limit of
    the mutual information between their quantized versions:

        I(X^\Delta; Y^\Delta) = H(X^\Delta) - H(X^\Delta | Y^\Delta)
            \approx \bigl( h(X) - \log\Delta \bigr)
                    - \bigl( h(X|Y) - \log\Delta \bigr)
            = I(X; Y).

  • Slide 21: Properties of D(f \| g), I(X; Y), h(X)

        D(f \,\|\, g) \ge 0

        pf: -D(f \,\|\, g) = \int_S f \log\frac{g}{f}
                           \le \log \int_S f \, \frac{g}{f}   (Jensen's inequality)
                           \le \log 1 = 0.

        I(X; Y) \ge 0

        h(X|Y) \le h(X)

  • Slide 22: Theorems

        h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, X_2, \ldots, X_{i-1})

        h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i)

        h(X + c) = h(X)   (translation does not change the differential entropy)

        h(aX) = h(X) + \log|a|

  • Slide 23

    pf: Let Y = aX. Then f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right), and

        h(aX) = -\int f_Y(y) \log f_Y(y) \, dy
              = -\int \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)
                    \log\left( \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right) \right) dy
              = -\int f_X(x) \log f_X(x) \, dx + \log|a|
              = h(X) + \log|a|.

    Corollary: h(AX) = h(X) + \log|\det(A)|
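    For a Gaussian vector the corollary can be checked directly, since
    AX ~ N(0, A K A^T) and the closed-form entropy from Slide 16 is
    available. This sketch is illustrative only; the matrices K and A and
    the helper name gaussian_entropy_nats are assumptions.

        import numpy as np

        def gaussian_entropy_nats(K):
            """h of N(0, K) in nats: 0.5 * ln((2*pi*e)^n * |K|)."""
            n = K.shape[0]
            return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(K)[1])

        K = np.array([[2.0, 0.3],
                      [0.3, 1.0]])                        # assumed covariance of X
        A = np.array([[1.0, 2.0],
                      [0.0, 3.0]])                        # assumed linear map

        h_X = gaussian_entropy_nats(K)
        h_AX = gaussian_entropy_nats(A @ K @ A.T)         # AX ~ N(0, A K A^T)

        print(h_AX - h_X, np.log(abs(np.linalg.det(A))))  # both equal ln|det A| = ln 3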

  • Slide 24: Theorem

    The multivariate normal distribution maximizes the entropy over all
    distributions with the same covariance.

    Let the random vector X \in \mathbb{R}^n have zero mean and
    covariance K = E[X X^T] (i.e., K_{ij} = E[X_i X_j], 1 \le i, j \le n).
    Then

        h(X) \le \frac{1}{2} \log\bigl( (2\pi e)^n |K| \bigr),

    with equality iff X \sim \mathcal{N}(0, K).
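    As a one-dimensional illustration (not from the slides; the common
    variance value is an arbitrary assumption), a uniform density with
    the same variance has strictly smaller differential entropy than the
    matching Gaussian:

        import numpy as np

        sigma2 = 1.0                                        # assumed common variance
        h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # about 1.419 nats

        # Uniform on [-a, a] with the same variance: a^2 / 3 = sigma^2, h = ln(2a)
        a = np.sqrt(3 * sigma2)
        h_unif = np.log(2 * a)                              # about 1.242 nats

        print(h_unif, "<", h_gauss)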

  • Slide 25: Pf

    Let g(\tilde{x}) be any density satisfying
    \int g(\tilde{x})\, x_i x_j \, d\tilde{x} = K_{ij} for all i, j.
    Let \phi_K be the density of a \mathcal{N}(0, K) vector. Note that
    \log\phi_K(\tilde{x}) is a quadratic form and
    \int \phi_K(\tilde{x})\, x_i x_j \, d\tilde{x} = K_{ij}. Then

        0 \le D(g \,\|\, \phi_K) = \int g \log\frac{g}{\phi_K}
            = -h(g) - \int g \log\phi_K
            = -h(g) - \int \phi_K \log\phi_K
            = -h(g) + h(\phi_K),

  • Slide 26

    where the substitution \int g \log\phi_K = \int \phi_K \log\phi_K
    follows from the fact that g and \phi_K yield the same moments of the
    quadratic form \log\phi_K(\tilde{x}). Hence the Gaussian distribution
    maximizes the entropy over all distributions with the same covariance.

  • Slide 27

    Let X be a random variable with differential entropy h(X). Let
    \hat{X} be an estimate of X, and let E(X - \hat{X})^2 be the expected
    prediction error. Let h(X) be in nats.

    Theorem: For any r.v. X and estimator \hat{X},

        E(X - \hat{X})^2 \ge \frac{1}{2\pi e} e^{2 h(X)},

    with equality iff X is Gaussian and \hat{X} is the mean of X.

  • Slide 28

    pf: Let \hat{X} be any estimator of X. Then

        E(X - \hat{X})^2 \ge \min_{\hat{X}} E(X - \hat{X})^2
                           = E\bigl(X - E(X)\bigr)^2 = \mathrm{var}(X)        (1)

    [the mean of X is the best estimator for X]

        \mathrm{var}(X) \ge \frac{1}{2\pi e} e^{2 h(X)}                       (2)

    [the Gaussian distribution has the maximum entropy for a given
    variance, i.e. h(X) \le \frac{1}{2} \ln 2\pi e \sigma^2]
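    A small numerical check of the bound (illustrative only; the two
    example distributions and the helper name error_lower_bound are
    assumptions): for a Gaussian the bound is met with equality, while
    for an exponential r.v. it is strict.

        import numpy as np

        def error_lower_bound(h_nats):
            """(1 / (2*pi*e)) * exp(2 h), the bound on E(X - Xhat)^2."""
            return np.exp(2 * h_nats) / (2 * np.pi * np.e)

        # Gaussian with variance sigma^2: h = 0.5*ln(2*pi*e*sigma^2), bound = sigma^2.
        sigma2 = 2.0
        h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)
        print(error_lower_bound(h_gauss), "==", sigma2)     # met with equality

        # Exponential(1): h = 1 nat, var = 1; the bound e/(2*pi) ~ 0.43 is strict.
        print(error_lower_bound(1.0), "<", 1.0)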

  • Slide 29

    We have equality in (1) only if \hat{X} is the best estimator (i.e.,
    \hat{X} is the mean of X), and equality in (2) only if X is Gaussian.

    Corollary: Given side information Y and estimator \hat{X}(Y), it
    follows that

        E\bigl(X - \hat{X}(Y)\bigr)^2 \ge \frac{1}{2\pi e} e^{2 h(X|Y)}.

    (Fano's inequality)

