
Information Geometry of MaxEnt Principle

Shun-ichi Amari, RIKEN Brain Science Institute

MaxEnt '07

Information Geometry

[Figure: information geometry (Riemannian manifold, dual affine connections, manifold of probability distributions) at the center, linked to systems theory, information theory, statistics, neural networks, combinatorics, physics, information sciences, mathematics, and AI.]

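Information Geometry?

Example: the family of Gaussian distributions

$$S = \{ p(x; \mu, \sigma) \}, \qquad p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

In general $S = \{ p(x; \theta) \}$, here with $\theta = (\mu, \sigma)$. The manifold $S$ carries

• a Riemannian metric

• dual affine connections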

Manifold of Probability Distributions

$x \in \{1, 2, 3\}$, $M = \{ p(x) \}$, $\mathbf{p} = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$

[Figure: $M$ drawn as the triangle (probability simplex) $p_1 + p_2 + p_3 = 1$ in $(p_1, p_2, p_3)$-space.]

Manifold of Probability Distributions

$x \in \{1, 2, 3\}$, $M = \{ p(x) \}$, with three coordinate systems:

• mixture coordinates: $(p_1, p_2)$, with $p_3 = 1 - p_1 - p_2$

• sphere coordinates: $\xi_i = \sqrt{p_i}$, so that $\xi_1^2 + \xi_2^2 + \xi_3^2 = 1$

• exponential coordinates: $\theta_1 = \log \frac{p_1}{p_3}$, $\theta_2 = \log \frac{p_2}{p_3}$
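The three coordinate systems can be checked numerically. This is a minimal sketch assuming Python with NumPy; the function names are hypothetical, and $\xi_i = \sqrt{p_i}$ is our notation for the sphere coordinates. It maps one illustrative distribution on {1, 2, 3} into each system and inverts the exponential coordinates.

```python
import numpy as np

def mixture_coords(p):
    # (p1, p2); p3 = 1 - p1 - p2 is implied
    return p[:2].copy()

def sphere_coords(p):
    # xi_i = sqrt(p_i); these lie on the unit sphere, sum xi_i^2 = 1
    return np.sqrt(p)

def exp_coords(p):
    # theta_i = log(p_i / p3), i = 1, 2
    return np.log(p[:2] / p[2])

def from_exp_coords(theta):
    # inverse map: p_i proportional to exp(theta_i), with theta_3 = 0
    w = np.exp(np.append(theta, 0.0))
    return w / w.sum()

p = np.array([0.2, 0.3, 0.5])   # illustrative point of M
print(mixture_coords(p))
print(np.sum(sphere_coords(p) ** 2))
print(exp_coords(p))
print(from_exp_coords(exp_coords(p)))
```

The round trip through the exponential coordinates returns the original distribution, since fixing $\theta_3 = 0$ removes the one redundant degree of freedom.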

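Invariance: $S = \{ p(x, \theta) \}$

1. Invariant under reparameterization: $\theta \to \theta'$, $p(x, \theta) = \bar p(x, \theta')$

2. Invariant under a different representation: $y = y(x)$, $\bar p(y, \theta)$

Naive distances are not invariant: $|\theta_1 - \theta_2|^2$ changes under reparameterization, and the $L^2$ distance between densities changes under a change of variables, because $\bar p(y, \theta) = p(x, \theta) \left| \frac{dx}{dy} \right|$:

$$\int |p(x, \theta_1) - p(x, \theta_2)|^2 \, dx \neq \int |\bar p(y, \theta_1) - \bar p(y, \theta_2)|^2 \, dy$$

A divergence $D$ on $S$ should therefore be invariant under both transformations.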
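A numerical illustration of the second invariance requirement, assuming Python with NumPy. The Gaussian pair and the change of variables y = 2x are arbitrary choices for this sketch, and the integrals are approximated by Riemann sums on a fine grid. The squared $L^2$ distance between the densities halves under y = 2x, while the KL-divergence stays at $(\mu_1 - \mu_2)^2 / (2\sigma^2) = 0.5$.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def l2_sq(f, g, dx):
    # Riemann-sum approximation of the integral of |f - g|^2
    return np.sum((f - g) ** 2) * dx

def kl(f, g, dx):
    # Riemann-sum approximation of the integral of f log(f / g)
    return np.sum(f * np.log(f / g)) * dx

mu1, mu2, sigma = 0.0, 1.0, 1.0
x = np.linspace(-20.0, 20.0, 200001)
dx = x[1] - x[0]
p1, p2 = gauss(x, mu1, sigma), gauss(x, mu2, sigma)

# Representation change y = 2x: pbar(y) = p(y / 2) / 2,
# i.e. Gaussians with mean 2*mu and standard deviation 2*sigma
y = np.linspace(-40.0, 40.0, 200001)
dy = y[1] - y[0]
q1, q2 = gauss(y, 2 * mu1, 2 * sigma), gauss(y, 2 * mu2, 2 * sigma)

print("L2:", l2_sq(p1, p2, dx), "->", l2_sq(q1, q2, dy))   # halves
print("KL:", kl(p1, p2, dx), "->", kl(q1, q2, dy))         # unchanged
```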

Two Structures

Riemannian metric: Fisher information, $ds^2 = \sum_{i,j} g_{ij}\, d\theta^i d\theta^j$

Affine connection: geodesic (straight line); how curved is the manifold?

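Riemannian Structure

$$ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta^i d\theta^j = d\theta^T G(\theta)\, d\theta, \qquad G(\theta) = \left( g_{ij}(\theta) \right)$$

Euclidean case: $G = E$ (the identity matrix)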

Kullback-Leibler Divergence

quasi-distance:

$$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$$

• $D[p(x) : q(x)] \geq 0$, with equality iff $p(x) = q(x)$

• asymmetric: $D[p : q] \neq D[q : p]$ in general

• no triangle inequality; it behaves like the square of a distance

• Pythagorean theorem
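A small sketch of the KL-divergence for discrete distributions, assuming Python with NumPy; the two distributions are illustrative. It shows nonnegativity, zero divergence for identical arguments, and asymmetry.

```python
import numpy as np

def kl(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) for discrete p, q.
    Terms with p(x) = 0 contribute 0; q(x) must be > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]
print(kl(p, q), kl(q, p))   # both positive, but not equal
print(kl(p, p))             # zero when the arguments coincide
```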

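KL-divergence and Riemannian Structure

relation:

$$D[p(x, \theta) : p(x, \theta + d\theta)] = \frac{1}{2} \sum_{i,j} g_{ij}(\theta)\, d\theta^i d\theta^j = \frac{1}{2}\, d\theta^T G(\theta)\, d\theta$$

Fisher information matrix:

$$g_{ij}(\theta) = E\left[ \frac{\partial \log p(x, \theta)}{\partial \theta^i} \, \frac{\partial \log p(x, \theta)}{\partial \theta^j} \right]$$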
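The quadratic relation between the KL-divergence and the Fisher metric can be checked on the Bernoulli family, parameterized here by a = P(x = 1); this family is an illustrative choice, not one taken from the slides. The sketch (Python with NumPy assumed) computes g(a) as the expectation of the squared score and compares $D[p_a : p_{a+d}]$ with $\frac{1}{2} g(a) d^2$ for shrinking d.

```python
import numpy as np

def kl_bernoulli(a, b):
    # D[p_a : p_b] for Bernoulli distributions with P(x = 1) = a, b
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def fisher_bernoulli(a):
    # g(a) = E[(d/da log p(x, a))^2]; the score is -1/(1-a) at x = 0 and 1/a at x = 1
    score = np.array([-1.0 / (1 - a), 1.0 / a])
    prob = np.array([1 - a, a])
    return float(np.sum(prob * score ** 2))

a = 0.3
for d in [1e-1, 1e-2, 1e-3]:
    ratio = kl_bernoulli(a, a + d) / (0.5 * fisher_bernoulli(a) * d ** 2)
    print(d, ratio)   # approaches 1 as d -> 0
```

The deviation of the ratio from 1 shrinks roughly in proportion to d, reflecting the third-order term in the expansion.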

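Affine Connection

covariant derivative: $\nabla_X Y$

geodesic $X = X(t)$: $\nabla_{\dot X} \dot X = 0$, a straight line

For the Levi-Civita (Riemannian) connection, geodesics are also curves of minimal length $s = \int_c \sqrt{\sum_{i,j} g_{ij}\, d\theta^i d\theta^j}$.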

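α-connection $\nabla^{(\alpha)}$

• $\alpha = 1$: $\nabla^{(1)}$, exponential connection

• $\alpha = -1$: $\nabla^{(-1)}$, mixture connection

• $\alpha = 0$: $\nabla^{(0)} = \frac{1}{2}\left( \nabla^{(1)} + \nabla^{(-1)} \right)$, Levi-Civita (Riemannian) connection

• duality: $\nabla^{(\alpha)*} = \nabla^{(-\alpha)}$

$\alpha = \pm 1$ is linked to entropy and the KL-divergence; general $\alpha$ to the Rényi-Tsallis entropy.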

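Affine Connections

e-geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) - c(t)$, where $c(t)$ is the normalizing constant

m-geodesic: $r(x, t) = t\, p(x) + (1 - t)\, q(x)$

[Figure: the e- and m-geodesics connecting $q(x)$ and $p(x)$, one for each connection of the dual pair $(\nabla, \nabla^*)$.]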
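A sketch of the two geodesics between discrete distributions, assuming Python with NumPy and illustrative p, q. Here c(t) is computed as the log of the normalizing sum. At t = 1/2 the m-geodesic gives the arithmetic mean of p and q, and the e-geodesic their normalized geometric mean.

```python
import numpy as np

def m_geodesic(p, q, t):
    # r(x, t) = t p(x) + (1 - t) q(x)
    return t * p + (1 - t) * q

def e_geodesic(p, q, t):
    # log r(x, t) = t log p(x) + (1 - t) log q(x) - c(t); c(t) = log of the normalizing sum
    log_r = t * np.log(p) + (1 - t) * np.log(q)
    c = np.log(np.sum(np.exp(log_r)))
    return np.exp(log_r - c)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(t, m_geodesic(p, q, t).round(3), e_geodesic(p, q, t).round(3))
```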

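Duality

$\langle X, Y \rangle = \sum_{i,j} g_{ij} X^i Y^j$

Riemannian geometry: parallel transport by $\nabla$ preserves $\langle X, Y \rangle$ (here $\nabla^* = \nabla$).

Dual connections $(\nabla, \nabla^*)$: $\langle X, Y \rangle$ is preserved when $X$ is transported by $\nabla$ and $Y$ by $\nabla^*$; equivalently,

$$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$$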

Independent Distributions

$S = \{ p(x_1, x_2) \}$, $x_1, x_2 \in \{0, 1\}$

$M = \{ q_1(x_1)\, q_2(x_2) \}$

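Dually flat manifold

θ-coordinates and η-coordinates, with potential functions $\psi(\theta)$ and $\varphi(\eta)$:

$$\psi(\theta) + \varphi(\eta) - \sum_i \theta^i \eta_i = 0$$

$$\eta_i = \frac{\partial \psi(\theta)}{\partial \theta^i}, \qquad \theta^i = \frac{\partial \varphi(\eta)}{\partial \eta_i}$$

$$g_{ij} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}, \qquad g^{ij} = \frac{\partial^2 \varphi}{\partial \eta_i \partial \eta_j}$$

$p(x, \theta) = \exp\left\{ \sum_i \theta^i x_i - \psi(\theta) \right\}$: exponential family

• $\psi$: cumulant generating function

• $\varphi$: negative entropy

A dually flat manifold is characterized by flatness: both $\nabla$ and $\nabla^*$ are flat.

Example: $S = \{ p(x) \}$, $x$ discrete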
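The Legendre relations can be checked on the smallest dually flat example, the Bernoulli family $p(x, \theta) = \exp\{\theta x - \psi(\theta)\}$, $x \in \{0, 1\}$ (a choice made for this sketch; Python with NumPy assumed). Here $\psi(\theta) = \log(1 + e^\theta)$, $\eta = \psi'(\theta)$ is the mean, and $\varphi(\eta)$ is the negative entropy. Finite differences confirm that $\theta = \varphi'(\eta)$ and that the second derivatives of $\psi$ and $\varphi$ are mutually inverse, as $g_{ij}$ and $g^{ij}$ should be.

```python
import numpy as np

# Bernoulli family p(x, theta) = exp(theta * x - psi(theta)), x in {0, 1}
def psi(theta):
    # cumulant generating function (log-normalizer)
    return np.log1p(np.exp(theta))

def eta_of(theta):
    # eta = d psi / d theta = E[x]
    return 1.0 / (1.0 + np.exp(-theta))

def phi(eta):
    # negative entropy of Bernoulli(eta)
    return eta * np.log(eta) + (1 - eta) * np.log(1 - eta)

theta = 0.8
eta = eta_of(theta)
print(psi(theta) + phi(eta) - theta * eta)   # Legendre relation: ~0

h = 1e-4
g = (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / h ** 2      # d^2 psi / d theta^2
g_dual = (phi(eta + h) - 2 * phi(eta) + phi(eta - h)) / h ** 2       # d^2 phi / d eta^2
print(g * g_dual)                                                    # ~1
print((phi(eta + h) - phi(eta - h)) / (2 * h), theta)                # d phi / d eta = theta
```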

Dually Flat Manifold

1. Potential functions: convex, related by the Legendre transformation

2. Divergence $D[p : q]$ (for an exponential family, the KL-divergence)

3. Pythagoras theorem: $D[p : q] + D[q : r] = D[p : r]$ when the m-geodesic from $p$ to $q$ is orthogonal at $q$ to the e-geodesic from $q$ to $r$

4. Projection theorem

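Projection Theorem

• $q = \arg\min_{s \in M} D[p : s]$: $q$ is the m-geodesic projection of $p$ onto $M$ (unique when $M$ is e-flat)

• $q = \arg\min_{s \in M} D[s : p]$: $q$ is the e-geodesic projection of $p$ onto $M$ (unique when $M$ is m-flat)

[Figure: a point $p \in S$ projected to $q$ on the submanifold $M$.]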
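A sketch of both projections onto the independence model $M = \{ q_1(x_1) q_2(x_2) \}$ for $x_1, x_2 \in \{0, 1\}$, assuming Python with NumPy and an illustrative joint p. M is e-flat, so the m-projection (minimizing $D[p : s]$) is unique and equals the product of the marginals of p. For the e-projection (minimizing $D[s : p]$) the sketch uses coordinate-descent (mean-field) updates started from the m-projection; this is one standard way to compute it, not a method taken from the slides, and in general it only guarantees a local minimum.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# joint distribution p(x1, x2) on {0,1}^2 (rows: x1, columns: x2), illustrative
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# m-projection: argmin over independent s of D[p : s] = product of the marginals of p
q_m = np.outer(p.sum(axis=1), p.sum(axis=0))

# e-projection: argmin over independent s of D[s : p], by exact coordinate minimization
# q1(x1) proportional to exp(sum_x2 q2(x2) log p(x1, x2)), and symmetrically for q2
q1, q2 = p.sum(axis=1), p.sum(axis=0)   # start from the m-projection
logp = np.log(p)
for _ in range(200):
    q1 = np.exp(logp @ q2)
    q1 /= q1.sum()
    q2 = np.exp(q1 @ logp)
    q2 /= q2.sum()
q_e = np.outer(q1, q2)

print(q_m)
print(q_e)
print(kl(p, q_m), kl(q_e, p))
```

Because every update exactly minimizes $D[s : p]$ in one factor, the objective never increases, so the result is at least as good as the starting point $q_m$.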

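Applications to Statistics

curved exponential family, inside the exponential family $p(x, \theta) = \exp\{ \theta \cdot x - \psi(\theta) \}$:

$$p(x, u) = \exp\left\{ \theta(u) \cdot x - \psi(\theta(u)) \right\}, \qquad u = (u_1, \dots, u_k)$$

observations $x_1, x_2, \dots, x_n$, with $\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i$

• estimation: $\hat u = \hat u(x_1, \dots, x_n)$, a function of $\bar x$

• testing: $H_0 : u = u_0$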

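High-Order Asymptotics

model $p(x, u)$; observations $x_1, \dots, x_n$; estimator $\hat u = \hat u(x_1, \dots, x_n)$

error covariance:

$$e = E\left[ (\hat u - u)(\hat u - u)^T \right] = \frac{1}{n} G_1 + \frac{1}{n^2} G_2 + \cdots$$

• $G_1 = G^{-1}$: Cramér-Rao bound, attained by first-order efficient estimators

• $G_2 = (\Gamma^{(m)})^2 + 2\,(H^{(e)}_M)^2 + (H^{(m)}_A)^2$, where $H^{(e)}_M$ is the e-curvature of the model $M$, $H^{(m)}_A$ the m-curvature of the ancillary family $A$ associated with the estimator, and $\Gamma^{(m)}$ depends on the parameterization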

Other Applications

• Systems theory
• Information theory
• Neuromanifold
• Belief propagation
• Boosting (Murata-Eguchi)
• Higher-order correlations
• Mathematics --- Orlicz space (Pistone, Grasselli)
• Physics ---

Amari and Nagaoka, Methods of Information Geometry, AMS & Oxford University Press, 2000

Amari, Differential-Geometrical Methods in Statistics, Springer, 1985

Kass and Vos, Geometrical Foundations of Asymptotic Inference, Wiley, 1997

Murray and Rice, Differential Geometry and Statistics, Chapman & Hall, 1993

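Exponential Family: dually flat

$S = \{ p(x, \theta) \}$

$$p(x, \theta) = \exp\left\{ \sum_i \theta^i x_i + k(x) - \psi(\theta) \right\}$$

$$\psi(\theta) = \log \int \exp\left\{ \sum_i \theta^i x_i + k(x) \right\} dx, \qquad \eta_i = E[x_i]$$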

Two coordinate systems

$\theta = (\theta^1, \dots, \theta^n)$: e-flat

$\eta = (\eta_1, \dots, \eta_n)$: m-flat

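Exponential Family, example (1): discrete distributions

$X = \{ x_0, x_1, \dots, x_n \}$, $p_i = \Pr\{ x = x_i \}$

$$p(x, \theta) = \exp\left\{ \sum_{i=1}^{n} \theta^i \delta_i(x) - \psi(\theta) \right\}, \qquad \delta_i(x) = \begin{cases} 1, & x = x_i \\ 0, & \text{otherwise} \end{cases}$$

$$\theta^i = \log \frac{p_i}{p_0}, \qquad \eta_i = E[\delta_i(x)] = p_i$$

$$\psi(\theta) = -\log p_0 = \log\left( 1 + \sum_{i=1}^{n} \exp \theta^i \right)$$

$$\varphi(\eta) = \sum_{i=0}^{n} p_i \log p_i$$

$\varphi$: negative entropy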
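For the discrete family, a short numerical check, assuming Python with NumPy and an illustrative p listed with $p_0$ first: $\psi(\theta) = -\log p_0$, $\eta_i = \partial \psi / \partial \theta^i$ equals $p_i$ (checked by central differences), and $\psi + \varphi - \theta \cdot \eta = 0$.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])   # p_0, p_1, p_2, p_3 (illustrative), n = 3

theta = np.log(p[1:] / p[0])          # theta^i = log(p_i / p_0)
eta = p[1:]                           # eta_i = E[delta_i(x)] = p_i

def psi(th):
    # psi(theta) = log(1 + sum_i exp(theta^i))
    return np.log1p(np.sum(np.exp(th)))

phi = np.sum(p * np.log(p))           # negative entropy

print(psi(theta), -np.log(p[0]))      # psi(theta) = -log p_0

# eta_i = d psi / d theta^i, checked by central differences
h = 1e-6
grad = np.array([(psi(theta + h * e) - psi(theta - h * e)) / (2 * h) for e in np.eye(3)])
print(grad, eta)
print(psi(theta) + phi - theta @ eta)  # Legendre relation: ~0
```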

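example (2): Gaussian distributions

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\} = \exp\left\{ \frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \log\left( \sqrt{2\pi}\,\sigma \right) \right\}$$

$x_1 = x$, $x_2 = x^2$; $\theta^1 = \frac{\mu}{\sigma^2}$, $\theta^2 = -\frac{1}{2\sigma^2}$; $\eta_1 = E[x]$, $\eta_2 = E[x^2]$

example (3): AR model

$x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, $\mathbf{x} = (x_0, x_1, x_2, \dots)$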
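A sketch of the Gaussian coordinates, assuming Python with NumPy and illustrative μ, σ. Writing $\psi = \mu^2 / (2\sigma^2) + \log(\sqrt{2\pi}\,\sigma)$ in terms of θ gives $\psi(\theta) = -(\theta^1)^2 / (4\theta^2) + \frac{1}{2} \log(\pi / (-\theta^2))$ (our own rewriting); central differences of ψ then reproduce $\eta = (E[x], E[x^2]) = (\mu, \mu^2 + \sigma^2)$, and the exponential-family form reproduces the density.

```python
import numpy as np

def to_theta(mu, sigma):
    # sufficient statistics x_1 = x, x_2 = x^2
    return np.array([mu / sigma ** 2, -1.0 / (2 * sigma ** 2)])

def to_eta(mu, sigma):
    # eta_1 = E[x], eta_2 = E[x^2]
    return np.array([mu, mu ** 2 + sigma ** 2])

def psi(theta):
    # mu^2 / (2 sigma^2) + log(sqrt(2 pi) sigma), rewritten in theta (requires theta[1] < 0)
    t1, t2 = theta
    return -t1 ** 2 / (4 * t2) + 0.5 * np.log(np.pi / -t2)

mu, sigma = 1.5, 0.7   # illustrative values
theta, eta = to_theta(mu, sigma), to_eta(mu, sigma)

h = 1e-6
grad = np.array([(psi(theta + h * e) - psi(theta - h * e)) / (2 * h) for e in np.eye(2)])
print(grad, eta)   # d psi / d theta reproduces eta

# the exponential-family form reproduces the Gaussian density at a test point
x0 = 0.3
print(np.exp(theta @ [x0, x0 ** 2] - psi(theta)),
      np.exp(-(x0 - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma))
```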

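Legendre transformation

$$\eta_i = \frac{\partial \psi(\theta)}{\partial \theta^i}, \qquad \theta^i = \frac{\partial \varphi(\eta)}{\partial \eta_i}$$

entropy:

$$H = -\varphi(\eta) = \min_\theta \left\{ \psi(\theta) - \sum_i \theta^i \eta_i \right\}$$

$\psi(\theta)$: cumulant generating function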
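The minimization form of the Legendre transform can be evaluated numerically. This sketch (Python with NumPy assumed, discrete family, illustrative p) minimizes the convex function $\psi(\theta) - \theta \cdot \eta$ by plain gradient descent, whose gradient is $\eta(\theta) - \eta$. The step size 1 is our assumption; it is safe here because the Hessian of ψ has eigenvalues at most 1. The minimum value matches the entropy computed directly.

```python
import numpy as np

def psi(theta):
    # discrete family on {x_0, ..., x_n}: psi(theta) = log(1 + sum_i exp(theta^i))
    return np.log1p(np.sum(np.exp(theta)))

def eta_of(theta):
    # eta(theta) = grad psi(theta) = (p_1, ..., p_n)
    w = np.exp(theta)
    return w / (1.0 + w.sum())

def entropy_by_legendre(eta, steps=5000, lr=1.0):
    # H = min over theta of psi(theta) - theta . eta (a convex problem)
    theta = np.zeros_like(eta)
    for _ in range(steps):
        theta -= lr * (eta_of(theta) - eta)
    return psi(theta) - theta @ eta

p = np.array([0.1, 0.2, 0.3, 0.4])   # illustrative distribution, p_0 first
H_direct = -np.sum(p * np.log(p))
H_legendre = entropy_by_legendre(p[1:])
print(H_direct, H_legendre)
```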

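Divergence

$$D[P : Q] = \psi(\theta_Q) + \varphi(\eta_P) - \theta_Q \cdot \eta_P$$

For an exponential family this is the KL-divergence:

$$KL[P : Q] = \sum_x p(x) \log \frac{p(x)}{q(x)} = D[P : Q]$$

Infinitesimally:

$$D[P : P + dP] = \frac{1}{2} \sum_{i,j} g_{ij}\, d\theta^i d\theta^j$$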
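A check, for the discrete family with an illustrative P (Python with NumPy assumed), that the potential form $\psi(\theta_Q) + \varphi(\eta_P) - \theta_Q \cdot \eta_P$ agrees with the KL-divergence. With Q uniform ($\theta_Q = 0$) it reduces to $\psi(0) - H(P) = \log 4 - H(P)$.

```python
import numpy as np

def coords(p):
    # discrete family with p = (p_0, ..., p_n): theta^i = log(p_i / p_0), eta_i = p_i
    return np.log(p[1:] / p[0]), p[1:]

def psi(theta):
    return np.log1p(np.sum(np.exp(theta)))

def phi(eta):
    # negative entropy as a function of eta, with p_0 = 1 - sum(eta)
    p = np.append(1.0 - eta.sum(), eta)
    return np.sum(p * np.log(p))

def divergence(P, Q):
    # D[P : Q] = psi(theta_Q) + phi(eta_P) - theta_Q . eta_P
    theta_Q, _ = coords(Q)
    _, eta_P = coords(P)
    return psi(theta_Q) + phi(eta_P) - theta_Q @ eta_P

def kl(P, Q):
    return np.sum(P * np.log(P / Q))

P = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.full(4, 0.25)   # uniform, i.e. theta_Q = 0
print(divergence(P, Q), kl(P, Q), np.log(4) + np.sum(P * np.log(P)))
```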

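Pythagorean Theorem

$$D[P : Q] + D[Q : R] = D[P : R]$$

when the m-geodesic connecting $P$ and $Q$ is orthogonal at $Q$ to the e-geodesic connecting $Q$ and $R$.

[Figure: $P$ and $Q$ in an m-flat submanifold, $Q$ and $R$ in an e-flat submanifold, meeting orthogonally at $Q$.]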
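A numerical instance of the Pythagorean theorem on two binary variables, assuming Python with NumPy; P and R are illustrative. Q keeps the marginals of P but is independent, so P and Q lie in the m-flat set of distributions with those marginals, while Q and R lie in the e-flat independence family. The two sides then agree up to rounding.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

P = np.array([[0.4, 0.1],
              [0.2, 0.3]])                    # correlated joint distribution of (x1, x2)
Q = np.outer(P.sum(axis=1), P.sum(axis=0))    # same marginals as P, but independent
R = np.outer([0.7, 0.3], [0.4, 0.6])          # any independent distribution

# P -> Q: within the m-flat set with fixed marginals
# Q -> R: within the e-flat independence family (theta_12 = 0)
lhs = kl(P, Q) + kl(Q, R)
rhs = kl(P, R)
print(lhs, rhs)
```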

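Divergence and Entropy

Maximum entropy at $\theta^i = 0$: $p(x, 0) = \exp\{ -\psi(0) \}$, the uniform distribution

$$D[p(x, \theta) : p(x, 0)] = \psi(0) - H$$

so the equi-divergence surfaces around $p(x, 0)$ are equi-entropy surfaces.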

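Dual Foliation

$\theta = (\theta_1; \theta_2)$, $\eta = (\eta_1; \eta_2)$

$M_1(c) = \{ \eta_1 = c \}$: $\eta_1$ fixed, $\theta_2$ free (m-flat)

$E_2(d) = \{ \theta_2 = d \}$: $\theta_2$ fixed, $\eta_1$ free (e-flat)

$M_1(c) \perp E_2(d)$; with $E_2(0)$, the Pythagorean theorem holds.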

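Maximum Entropy

$$\max H \quad \text{subject to} \quad E[a_i(\mathbf{x})] = c_i, \quad i = 1, \dots, k$$

$\theta = (\theta_1; \theta_2)$, $\eta = (\eta_1; \eta_2)$

With $P_0$ the uniform distribution and $M(c)$ the set of distributions satisfying the constraints, $D[P : P_0] = \psi(0) - H(P)$, so

$$\max_{P \in M(c)} H(P) \iff \min_{P \in M(c)} D[P : P_0]$$

By the Pythagorean theorem, $D[P : P_0] = D[P : \hat P] + D[\hat P : P_0] \geq D[\hat P : P_0]$ for every $P \in M(c)$, so the maximizer is $\hat P$, where $M(c)$ meets the e-flat family $p(\mathbf{x}, \theta) = \exp\{ \sum_i \theta_i a_i(\mathbf{x}) - \psi(\theta) \}$ through $P_0$.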
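A sketch of the maximum-entropy construction on a die with the single constraint E[x] = 4.5, an illustrative choice (Python with NumPy assumed). The solution is sought within the exponential family $p(x, \theta) \propto \exp(\theta x)$ through the uniform $P_0$; since $E_\theta[x]$ increases with θ, bisection solves $E_\theta[x] = c$. The last lines check $D[P : P_0] = D[P : \hat P] + D[\hat P : P_0]$ for another distribution P with the same mean.

```python
import numpy as np

x = np.arange(1, 7)   # faces of a die
c = 4.5               # constraint E[x] = c

def p_theta(theta):
    # exponential family through the uniform P0: p(x, theta) proportional to exp(theta x)
    w = np.exp(theta * x)
    return w / w.sum()

def H(p):
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# E_theta[x] increases with theta, so bisection solves E_theta[x] = c
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if p_theta(mid) @ x < c:
        lo = mid
    else:
        hi = mid
p_hat = p_theta(0.5 * (lo + hi))

P0 = np.full(6, 1 / 6)
P = np.array([0.05, 0.05, 0.1, 0.25, 0.25, 0.3])   # another distribution with mean 4.5
print(p_hat.round(4), p_hat @ x)
print(H(p_hat), H(P))                              # p_hat has the larger entropy
print(kl(P, P0), kl(P, p_hat) + kl(p_hat, P0))     # Pythagorean decomposition
```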

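Simple Example: independence

$\mathbf{x} = (x_1, x_2)$, $x_1, x_2 \in \{0, 1\}$

$$p(\mathbf{x}, \theta) = \exp\left\{ \theta_1 x_1 + \theta_2 x_2 + \theta_{12} x_1 x_2 - \psi(\theta) \right\}$$

$\theta = (\theta_1, \theta_2; \theta_{12})$, $\eta = (\eta_1, \eta_2; \eta_{12})$, with $\eta_i = E[x_i]$, $\eta_{12} = E[x_1 x_2]$

$\max H$ subject to $E[x_i] = c_i$, $i = 1, 2$

solution: $\hat P \in M(c) \cap E(0)$, where $E(0) = \{ \theta_{12} = 0 \}$ is the family of independent distributions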
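For the independence example, the constraint set M(c) with fixed $E[x_1] = c_1$ and $E[x_2] = c_2$ is a one-parameter family indexed by $\eta_{12} = E[x_1 x_2]$. This sketch (Python with NumPy assumed, illustrative $c_1, c_2$) scans that family on a grid; the entropy maximizer lands at $\eta_{12} \approx c_1 c_2$, where $\theta_{12} = \log(p_{11} p_{00} / (p_{10} p_{01})) \approx 0$, i.e. at the independent distribution.

```python
import numpy as np

c1, c2 = 0.3, 0.6   # constraints E[x1] = c1, E[x2] = c2 (illustrative values)

def joint(eta12):
    # M(c): with the marginals fixed, only eta12 = E[x1 x2] is free
    p11 = eta12
    p10 = c1 - eta12             # x1 = 1, x2 = 0
    p01 = c2 - eta12             # x1 = 0, x2 = 1
    p00 = 1 - c1 - c2 + eta12
    return np.array([[p00, p01], [p10, p11]])   # rows: x1, columns: x2

def H(p):
    return float(-np.sum(p * np.log(p)))

# interior of the feasible range, where all four probabilities are positive
lo, hi = max(0.0, c1 + c2 - 1), min(c1, c2)
grid = np.linspace(lo, hi, 20001)[1:-1]
best = grid[np.argmax([H(joint(e)) for e in grid])]

p = joint(best)
theta12 = np.log(p[1, 1] * p[0, 0] / (p[1, 0] * p[0, 1]))
print(best, c1 * c2, theta12)   # maximizer near c1 * c2, theta12 near 0
```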

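Simple example: Gaussian

$$p(x, \theta) = \exp\left\{ \sum_i \theta_i x^i - \psi(\theta) \right\}$$

$\theta = (\theta_1, \theta_2; \theta_3, \dots)$, $\eta = (\eta_1, \eta_2; \eta_3, \dots)$, with $\eta_1 = E[x]$, $\eta_2 = E[x^2]$

Fixing $\eta_1$ and $\eta_2$ and maximizing entropy gives $\theta_3 = \theta_4 = \cdots = 0$, i.e. a Gaussian; for $E[x] = 0$, $E[x^2] = 1$:

$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{x^2}{2} \right\}$$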
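A quick comparison of closed-form differential entropies for three densities sharing the variance $\sigma^2$, assuming Python with NumPy; σ and the Laplace and uniform competitors are illustrative choices. The Gaussian value $\frac{1}{2} \log(2\pi e \sigma^2)$ is the largest, as the maximum-entropy argument predicts.

```python
import numpy as np

sigma = 1.3   # common standard deviation (illustrative)

# differential entropies of three densities with the same variance sigma^2
H_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
b = sigma / np.sqrt(2)          # Laplace scale: variance 2 b^2 = sigma^2
H_laplace = 1 + np.log(2 * b)
w = sigma * np.sqrt(12)         # uniform width: variance w^2 / 12 = sigma^2
H_uniform = np.log(w)

print(H_gauss, H_laplace, H_uniform)   # the Gaussian value is the largest
```

The gaps between the three values do not depend on σ, since every entropy shifts by $\log \sigma$.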

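Time Series

$\mathbf{x} = (x_0, x_1, x_2, \dots)$

AR: $x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$

MA: $x_t = \sum_i b_i \varepsilon_{t-i}$

general: $x_t = \sum_{i \geq 0} h_i \varepsilon_{t-i}$

power spectrum: $S(\omega) = |H(e^{i\omega})|^2$, $H(z) = \sum_i h_i z^{-i}$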

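Geometry

The manifold $\{ S(\omega) \}$ of power spectra is dually flat.

Fisher metric:

$$g_{ij} = \frac{1}{4\pi} \int_{-\pi}^{\pi} \partial_i \log S(\omega)\, \partial_j \log S(\omega)\, d\omega$$

entropy (up to an additive constant):

$$H = \frac{1}{4\pi} \int_{-\pi}^{\pi} \log S(\omega)\, d\omega$$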

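Potentials

The potentials are given in terms of the entropy $H$, up to a constant.

divergence:

$$D[S_1 : S_2] = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left\{ \frac{S_1(\omega)}{S_2(\omega)} - 1 - \log \frac{S_1(\omega)}{S_2(\omega)} \right\} d\omega$$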

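Stochastic Realization

autocorrelations: $c_i = E[x_t x_{t-i}]$, $i = 0, 1, \dots, k$; $\mathbf{c} = (c_0, c_1, \dots, c_k)$ given

$M(\mathbf{c})$: the spectra with these autocorrelations

$$\max_{S \in M(\mathbf{c})} H(S)$$

solution: $\hat S \in M(\mathbf{c}) \cap E$, where $E = AR(k)$ is an exponential family; the maximum-entropy spectrum is that of an AR(k) model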
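The maximum-entropy spectrum for given $c_0, \dots, c_k$ can be computed from the Yule-Walker equations, a standard way to fit an AR(k) model that reproduces the given autocorrelations exactly; the slides do not specify an algorithm, so this choice is ours. The sketch assumes Python with NumPy, illustrative c values, and the convention $c_i = \frac{1}{2\pi} \int S(\omega) \cos(i\omega)\, d\omega$, and it recomputes the autocorrelations from the fitted spectrum as a check.

```python
import numpy as np

def maxent_ar(c):
    """Given autocorrelations c = (c_0, ..., c_k), return AR(k) coefficients a and the
    innovation variance s2 from the Yule-Walker equations sum_j a_j c_|i-j| = c_i."""
    c = np.asarray(c, float)
    k = len(c) - 1
    R = np.array([[c[abs(i - j)] for j in range(k)] for i in range(k)])   # Toeplitz matrix
    a = np.linalg.solve(R, c[1:])
    s2 = c[0] - a @ c[1:]
    return a, s2

def ar_spectrum(a, s2, omega):
    # S(omega) = s2 / |1 - sum_j a_j e^{-i j omega}|^2
    j = np.arange(1, len(a) + 1)
    A = 1 - np.exp(-1j * np.outer(omega, j)) @ a
    return s2 / np.abs(A) ** 2

c = [1.0, 0.6, 0.2]   # illustrative autocorrelations c_0, c_1, c_2
a, s2 = maxent_ar(c)

# check: c_i = (1 / 2 pi) * integral over [-pi, pi) of S(omega) cos(i omega)
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S = ar_spectrum(a, s2, omega)
c_back = [np.mean(S * np.cos(i * omega)) for i in range(len(c))]
print(a, s2, c_back)
```

The mean over an equispaced grid is the rectangle rule for $\frac{1}{2\pi}\int$, which is very accurate for smooth periodic integrands such as this spectrum.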

Dual Problem

inverse autocorrelations $\tilde c_i$, $i = 1, \dots, k$, given

minimize $H$: MA model

inverse autocorrelation:

$$\tilde c_t = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)^{-1} \cos(t\omega)\, d\omega$$

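Entropy and geometry: α-divergence

$$D_\alpha[P : Q] = \frac{4}{1 - \alpha^2} \left\{ 1 - \int p(\mathbf{x})^{\frac{1-\alpha}{2}} q(\mathbf{x})^{\frac{1+\alpha}{2}} d\mathbf{x} \right\}$$

Maximizing the α-entropy under the constraints $E[\mathbf{x}] = \mathbf{c}$ yields a family of probability distributions $p(\mathbf{x}, \mathbf{c})$.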
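A sketch of the α-divergence for discrete distributions, assuming Python with NumPy and illustrative p, q, and using the convention $D_\alpha[p : q] = \frac{4}{1-\alpha^2}\{1 - \sum_x p^{(1-\alpha)/2} q^{(1+\alpha)/2}\}$ (an assumption about the slide's sign convention). In this convention it tends to $KL[p : q]$ as α → -1 and to $KL[q : p]$ as α → 1, and at α = 0 it equals $2\sum(\sqrt{p} - \sqrt{q})^2$.

```python
import numpy as np

def alpha_div(p, q, alpha):
    # D_alpha[p : q] = 4 / (1 - alpha^2) * (1 - sum_x p^{(1-alpha)/2} q^{(1+alpha)/2}),
    # for probability vectors p, q and alpha != +-1
    s = np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))
    return 4.0 / (1 - alpha ** 2) * (1 - s)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
for alpha in [-0.999, 0.0, 0.999]:
    print(alpha, alpha_div(p, q, alpha))
print(kl(p, q), kl(q, p))   # limits at alpha -> -1 and alpha -> 1
```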

Rényi-Tsallis entropy

Manifold of positive measures $\{ m(\mathbf{x}) \}$: α-flat

Tsallis entropy:

$$H_q = \frac{1}{q - 1} \left\{ 1 - \int p(\mathbf{x})^q \, d\mathbf{x} \right\}$$

Entropy (α-entropy) is a fundamental quantity: it arises from a fundamental geometrical structure, and the KL-divergence is derived from it.