Date post: 23-Jun-2020
Page 1: Applications of Information Geometryimage.diku.dk › MLLab › SummerSchools › SlidesAmariIG3.pdf · 2014-09-12 · Applications of Information Geometry Clustering based on Divergence:

Applications of Information Geometry

• Clustering based on Divergence: Cluster Center
• Stochastic Reasoning: Belief Propagation in Graphical Model
• Support Vector Machine
• Bayesian Framework and Restricted Boltzmann Machine
• Natural Gradient in Multilayer Perceptron Learning
• Independent Component Analysis
• Sparse Signal Analysis and Minkovskian Gradient
• Convex Optimization

Clustering of Points: $D = \{\boldsymbol\theta_1, \boldsymbol\theta_2, \boldsymbol\theta_3, \dots, \boldsymbol\theta_N\}$

Clustering: center of a cluster of points

$C = \{\boldsymbol\theta_1, \dots, \boldsymbol\theta_m\}$

Center of C: $\boldsymbol\theta^* = \arg\min_{\boldsymbol\theta}\sum_{i=1}^{m} D[\boldsymbol\theta : \boldsymbol\theta_i]$

[Figure: cluster C with points 1, 2, 3, …, m and its center]

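k-means: clustering algorithm

With the canonical divergence $D[\boldsymbol\theta : \boldsymbol\theta_i] = \psi(\boldsymbol\theta) + \varphi(\boldsymbol\eta_i) - \boldsymbol\theta\cdot\boldsymbol\eta_i$ and $\boldsymbol\eta = \nabla\psi(\boldsymbol\theta)$:

$\frac{\partial}{\partial\boldsymbol\theta}\sum_{i=1}^{m} D[\boldsymbol\theta : \boldsymbol\theta_i] = \sum_{i=1}^{m}(\boldsymbol\eta - \boldsymbol\eta_i) = 0 \quad\Rightarrow\quad \boldsymbol\eta^* = \frac{1}{m}\sum_{i=1}^{m}\boldsymbol\eta_i$

The center $\boldsymbol\eta^*$ is simply the arithmetic average in the dual coordinate system.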

k-means: clustering algorithm

1. Choose k cluster centers.
2. Classify the patterns of D into clusters.
3. Calculate new cluster centers.
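A minimal sketch of the k-means loop above with a divergence in place of the squared distance. It assumes the points are strictly positive discrete probability vectors and the divergence is KL[p_i : c], which in the canonical form corresponds to D[θ_c : θ_i]. Under that assumption the center update is the plain average of the probability vectors, i.e. the arithmetic mean in the expectation (dual) coordinates. The names `kl` and `kl_kmeans` are illustrative.

```python
import numpy as np


def kl(p, q):
    """KL[p : q] = sum_x p(x) log(p(x) / q(x)) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def kl_kmeans(P, centers, iters=20):
    """k-means on probability vectors with the KL divergence.

    Assignment: p_i goes to the center c_k minimizing KL[p_i : c_k].
    Update: the minimizer of sum_i KL[p_i : c] over c is the arithmetic mean
    of the p_i, i.e. the average in the expectation (dual) coordinates.
    """
    P = np.asarray(P, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(iters):
        labels = np.array([np.argmin([kl(p, c) for c in centers]) for p in P])
        for k in range(len(centers)):
            members = P[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return labels, centers
```

Because the update is an average of the η-coordinates, the loop has the same cost as Euclidean k-means. Only the assignment step uses the divergence.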

Voronoi Diagram: Boundaries of Clusters

$D[\boldsymbol\theta_1^* : \boldsymbol\theta] = D[\boldsymbol\theta_2^* : \boldsymbol\theta]$

Total Bregman divergence (Vemuri)

$\mathrm{tBD}(p : q) = \frac{f(p) - f(q) - \nabla f(q)\cdot(p - q)}{\sqrt{1 + |\nabla f(q)|^2}}$

t-center of C

$\boldsymbol\eta^* = \frac{\sum_i w_i\,\boldsymbol\eta_i}{\sum_i w_i}, \qquad w_i = \frac{1}{\sqrt{1 + |\boldsymbol\eta_i|^2}}, \qquad \boldsymbol\eta_i = \nabla f(\boldsymbol\theta_i)$
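For the Euclidean choice f(x) = |x|²/2 the dual coordinates coincide with the points (η_i = x_i), so tBD and the t-center reduce to a few lines. The sketch below assumes that choice (the helper names `tbd_euclid` and `t_center` are hypothetical). It contrasts the t-center with the ordinary mean when one point is a gross outlier.

```python
import numpy as np


def tbd_euclid(p, q):
    """Total Bregman divergence for f(x) = |x|^2 / 2, where grad f(q) = q:
    tBD(p : q) = (|p - q|^2 / 2) / sqrt(1 + |q|^2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.sum((p - q) ** 2) / np.sqrt(1.0 + np.sum(q ** 2))


def t_center(points):
    """t-center: weighted mean of eta_i = grad f(x_i) = x_i with
    w_i = 1 / sqrt(1 + |eta_i|^2). It minimizes sum_i tBD(c : x_i) over c."""
    X = np.asarray(points, float)
    w = 1.0 / np.sqrt(1.0 + np.sum(X ** 2, axis=1))
    return (w[:, None] * X).sum(axis=0) / w.sum()


pts = [[0, 0], [1, 0], [0, 1], [100, 100]]  # last point is an outlier
center = t_center(pts)                      # stays near the three inliers
```

The outlier's weight 1/sqrt(1 + |y|²) shrinks as it moves away, so its pull on the center stays bounded. The ordinary mean, in contrast, moves without limit.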

Conformal change of divergence

$\tilde D[p : q] = \sigma(q)\,D[p : q]$ (for tBD, $\sigma(q) = 1/\sqrt{1 + |\nabla f(q)|^2}$)

$\tilde g_{ij}(p) = \sigma(p)\,g_{ij}(p)$

$\tilde T_{ijk} = \sigma\,(T_{ijk} + s_k g_{ij} + s_j g_{ik} + s_i g_{jk}), \qquad s_i = \partial_i\log\sigma$

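t-center is robust

Data: $\{\boldsymbol\eta_1, \dots, \boldsymbol\eta_n; \boldsymbol y\}$, n points contaminated by one outlier $\boldsymbol y$

$\tilde{\boldsymbol\eta}^* = \boldsymbol\eta^* + \frac{1}{n}\,\boldsymbol z(\boldsymbol\eta^*; \boldsymbol y)$

influence function: $\boldsymbol z(\boldsymbol\eta^*; \boldsymbol y)$

robust: $|\boldsymbol z(\boldsymbol\eta^*; \boldsymbol y)| < c$ for all $\boldsymbol y$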

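Robust: $\boldsymbol z$ is bounded.

Euclidean case: $f(\boldsymbol x) = \frac12|\boldsymbol x|^2$, so $G = I$, $\boldsymbol\eta = \boldsymbol x$ and $w(\boldsymbol y) = 1/\sqrt{1 + |\boldsymbol y|^2}$:

$\boldsymbol z(\boldsymbol\eta^*; \boldsymbol y) = \frac{1}{\bar w}\,\frac{\boldsymbol y - \boldsymbol\eta^*}{\sqrt{1 + |\boldsymbol y|^2}}$ ($\bar w$: mean weight of the n inliers). This stays bounded as $|\boldsymbol y| \to \infty$, whereas for the ordinary mean $\boldsymbol z = \boldsymbol y - \boldsymbol\eta^*$ is unbounded.

The t-center with the outlier minimizes $\sum_{i=1}^{n} w_i\,D[\boldsymbol\eta : \boldsymbol\eta_i] + w(\boldsymbol y)\,D[\boldsymbol\eta : \boldsymbol y]$.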

MPEG7 database
• Great intraclass variability, and small interclass dissimilarity.


Shape representation


First clustering then retrieval

Other TBD applications

Diffusion tensor imaging (DTI) analysis [Vemuri]
• Interpolation
• Segmentation

Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and its Applications to DTI Analysis, IEEE TMI, to appear

Information Geometry of Stochastic Reasoning

Belief Propagation in Graphical Model

• Shun-ichi Amari (RIKEN BSI)
• Shiro Ikeda (Inst. Statist. Math.)
• Toshiyuki Tanaka (Kyoto U.)

Stochastic Reasoning: Graphical Model

$p(x, y, z, r, s)$

$p(x, y, z \mid r, s), \qquad x, y, z, \dots = 1, -1$

Stochastic Reasoning

$q(x_1, x_2, x_3, \dots \mid \text{observation})$

$\boldsymbol x = (x_1, x_2, x_3, \dots), \qquad x_i = 1, -1$

$\hat{\boldsymbol x} = \arg\max q(x_1, x_2, x_3, \dots)$: maximum likelihood

$\hat x_i = \mathrm{sgn}\,E[x_i]$: least bit error rate estimator

Mean Value, Marginalization: projection to independent distributions

$q_0(\boldsymbol x) = q_1(x_1)\,q_2(x_2)\cdots q_n(x_n)$

$q_i(x_i) = \int q(x_1, \dots, x_n)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_n$

$E_q[\boldsymbol x] = E_{q_0}[\boldsymbol x]$

$q(\boldsymbol x) = \exp\Big\{\sum_{i<j} w_{ij}x_ix_j + \sum_i h_ix_i - \psi_q\Big\} = \exp\Big\{\boldsymbol h\cdot\boldsymbol x + \sum_{r=1}^{L} c_r(\boldsymbol x) - \psi_q\Big\}, \qquad x_i = 1, -1$

$c_r(\boldsymbol x)$: interaction term of clique r, e.g. $c_r(\boldsymbol x) = w_{ij}x_ix_j$ for $r = (i, j)$

Boltzmann machine, spin glass, neural networks
Turbo Codes, LDPC Codes

Computationally Difficult

$q(\boldsymbol x) = \exp\Big\{\boldsymbol h\cdot\boldsymbol x + \sum_r c_r(\boldsymbol x) - \psi_q\Big\}, \qquad \boldsymbol\eta_q = E_q[\boldsymbol x] = \sum_{\boldsymbol x}\boldsymbol x\,q(\boldsymbol x)$ (a sum over $2^n$ states)

• mean-field approximation
• belief propagation
• tree propagation, CCCP (convex-concave)

Information Geometry of Mean Field Approximation

• m-projection: $\Pi_0^m q = \arg\min_{p \in M_0} D[q : p]$
• e-projection: $\Pi_0^e q = \arg\min_{p \in M_0} D[p : q]$

$D[q : p] = \sum_{\boldsymbol x} q(\boldsymbol x)\log\frac{q(\boldsymbol x)}{p(\boldsymbol x)}$

$M_0 = \{p(\boldsymbol x) = \prod_i p_i(x_i)\}$: independent distributions
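The exact mean E_q[x] (the m-projection) needs a sum over 2^n states. The naive mean-field approximation instead e-projects q onto M_0, and the means of the resulting product distribution satisfy m_i = tanh(h_i + Σ_j w_ij m_j). The sketch assumes pairwise couplings stored in a symmetric matrix W with zero diagonal, so that xᵀWx/2 = Σ_{i<j} w_ij x_i x_j. The names `exact_means` and `mean_field` are hypothetical.

```python
import itertools
import numpy as np


def exact_means(W, h):
    """E_q[x] for q(x) ∝ exp(x^T W x / 2 + h.x), x in {-1, +1}^n,
    by summing over all 2^n states: exact but exponential in n."""
    n = len(h)
    Z, m = 0.0, np.zeros(n)
    for s in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(s)
        weight = np.exp(0.5 * x @ W @ x + h @ x)
        Z += weight
        m += weight * x
    return m / Z


def mean_field(W, h, sweeps=200):
    """Naive mean field: the product distribution minimizing D[p : q] over M_0
    (e-projection). Its means solve m_i = tanh(h_i + sum_j W_ij m_j);
    sequential (one-site-at-a-time) sweeps move toward that fixed point."""
    m = np.zeros(len(h))
    for _ in range(sweeps):
        for i in range(len(h)):
            m[i] = np.tanh(h[i] + W[i] @ m)
    return m
```

With W = 0 both functions return tanh(h) exactly. With weak couplings the mean-field means are close to, but not equal to, the exact ones.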

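$p^*(\boldsymbol x) = p(\boldsymbol x, \boldsymbol\theta^*) = \Pi_0^m q \in M_0$, with $M_0$ an exponential family in $\boldsymbol\theta$ and $\boldsymbol\eta^* = E_{p^*}[\boldsymbol x]$

tangent vector of $M_0$ at $p^*$: $\boldsymbol t(\boldsymbol x) = \frac{\partial}{\partial\boldsymbol\theta}\log p(\boldsymbol x, \boldsymbol\theta^*) = \boldsymbol x - \boldsymbol\eta^*$

the m-geodesic from $p^*$ to $q$ is orthogonal to $M_0$: $\sum_{\boldsymbol x}\boldsymbol t(\boldsymbol x)\{q(\boldsymbol x) - p^*(\boldsymbol x)\} = 0$

hence $E_q[\boldsymbol x] = E_{p^*}[\boldsymbol x]$

m-projection keeps the expectation of x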

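Information Geometry

$M_0 = \{p_0(\boldsymbol x, \boldsymbol\theta) = \exp\{\boldsymbol h\cdot\boldsymbol x + \boldsymbol\theta\cdot\boldsymbol x - \psi_0(\boldsymbol\theta)\}\}$

$M_r = \{p_r(\boldsymbol x, \boldsymbol\xi_r) = \exp\{\boldsymbol h\cdot\boldsymbol x + c_r(\boldsymbol x) + \boldsymbol\xi_r\cdot\boldsymbol x - \psi_r(\boldsymbol\xi_r)\}\}, \qquad r = 1, \dots, L$

$q(\boldsymbol x) = \exp\Big\{\boldsymbol h\cdot\boldsymbol x + \sum_{r=1}^{L} c_r(\boldsymbol x) - \psi_q\Big\}$

[Figure: $q(\boldsymbol x)$ and the manifolds $M_0$, $M_r$, $M_{r'}$]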

Belief Prop Algorithm

[Figure: manifolds $M_0$, $M_r$, $M_{r'}$ and the beliefs exchanged between cliques r and r']

Belief Propagation

$p_r(\boldsymbol x, \boldsymbol\xi_r) = \exp\{\boldsymbol h\cdot\boldsymbol x + c_r(\boldsymbol x) + \boldsymbol\xi_r\cdot\boldsymbol x - \psi_r(\boldsymbol\xi_r)\}$

$\boldsymbol\theta_r^{t+1} = \pi_{M_0}\,p_r(\boldsymbol x, \boldsymbol\xi_r^{t}) - \boldsymbol\xi_r^{t}$: belief for $c_r(\boldsymbol x)$

$\boldsymbol\xi_r^{t+1} = \sum_{r' \neq r}\boldsymbol\theta_{r'}^{t+1}$

$\boldsymbol\theta_0^{t+1} = \sum_r\boldsymbol\theta_r^{t+1}$

($\pi_{M_0}$: θ-coordinates of the m-projection onto $M_0$)
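The iteration above can be sketched for pairwise cliques c_r(x) = w_r x_i x_j and ±1 variables; both are assumptions, and all function names are hypothetical. Projecting p_r onto M_0 only needs the two means of its own clique, which come from its four states. On a tree the fixed point gives the exact means, which the test checks against brute force.

```python
import itertools
import numpy as np


def exact_means(n, edges, h):
    """E_q[x] for q(x) ∝ exp{h.x + sum_r w_r x_i x_j}, x_i = ±1, by brute force."""
    Z, m = 0.0, np.zeros(n)
    for s in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(s)
        weight = np.exp(h @ x + sum(w * x[i] * x[j] for i, j, w in edges))
        Z += weight
        m += weight * x
    return m / Z


def pair_means(w, a, b):
    """(E[x_i], E[x_j]) under p(x_i, x_j) ∝ exp(w x_i x_j + a x_i + b x_j)."""
    states = [(s, t) for s in (-1.0, 1.0) for t in (-1.0, 1.0)]
    p = np.array([np.exp(w * s * t + a * s + b * t) for s, t in states])
    p /= p.sum()
    mi = sum(pk * s for pk, (s, _) in zip(p, states))
    mj = sum(pk * t for pk, (_, t) in zip(p, states))
    return mi, mj


def belief_propagation(n, edges, h, iters=100):
    """BP in theta/xi form for pairwise cliques c_r(x) = w x_i x_j.

    p_r(x, xi_r) ∝ exp{h.x + c_r(x) + xi_r.x},  M_0 = {exp{h.x + theta.x}}.
    theta_r <- pi_{M_0} p_r(x, xi_r) - xi_r,  xi_r <- sum_{r' != r} theta_r'.
    """
    theta = np.zeros((len(edges), n))  # theta[r]: belief replacing c_r
    for _ in range(iters):
        total = theta.sum(axis=0)
        new = np.zeros_like(theta)
        for r, (i, j, w) in enumerate(edges):
            xi = total - theta[r]
            mi, mj = pair_means(w, h[i] + xi[i], h[j] + xi[j])
            # The m-projection onto M_0 keeps E[x]; its theta-coordinate is arctanh(E[x]) - h.
            # Off the clique that coordinate equals xi, so theta[r] stays zero there.
            new[r, i] = np.arctanh(mi) - h[i] - xi[i]
            new[r, j] = np.arctanh(mj) - h[j] - xi[j]
        theta = new
    theta0 = theta.sum(axis=0)
    return np.tanh(h + theta0)  # E[x] under p_0(x, theta0)
```

Here ξ_r always equals θ_0 − θ_r, so Σ_r ξ_r = (L − 1)θ_0 holds at every step. On graphs with loops the same code gives the usual approximate BP means.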

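Equilibrium of BP: $(\boldsymbol\theta^*, \boldsymbol\xi_r^*)$

1) m-condition: $\boldsymbol\theta^* = \pi_{M_0}\,p_r(\boldsymbol x, \boldsymbol\xi_r^*)$ for all r. $p_0(\boldsymbol x, \boldsymbol\theta^*)$ and all $p_r(\boldsymbol x, \boldsymbol\xi_r^*)$ lie in one m-flat submanifold $M^*$.

2) e-condition: $\boldsymbol\theta^* = \frac{1}{L-1}\sum_r\boldsymbol\xi_r^*$. $q(\boldsymbol x)$ lies in the e-flat submanifold $E^*$ spanned by $p_0(\boldsymbol x, \boldsymbol\theta^*)$ and the $p_r(\boldsymbol x, \boldsymbol\xi_r^*)$.

[Figure: $M_0$, $M_r$, $M_{r'}$, the m-flat submanifold $M^*$ and the e-flat submanifold $E^*$ containing $q$]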

Belief Propagation: e-condition OK

CCCP: m-condition OK

BP: $(\boldsymbol\theta; \boldsymbol\xi_1, \dots, \boldsymbol\xi_L) \to (\boldsymbol\theta'; \boldsymbol\xi'_1, \dots, \boldsymbol\xi'_L)$ satisfies the e-condition at every step

CCCP: $\boldsymbol\theta' = \pi_{M_0}\,p_r(\boldsymbol x, \boldsymbol\xi_r)$ for all r, so the m-condition holds at every step

Convex-Concave Computational Procedure (CCCP): A. Yuille

$F(\boldsymbol\theta) = F_1(\boldsymbol\theta) - F_2(\boldsymbol\theta)$ ($F_1$, $F_2$ convex)

$\nabla F_1(\boldsymbol\theta^{t+1}) = \nabla F_2(\boldsymbol\theta^{t})$

Elimination of double loops
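The CCCP update ∇F_1(θ^{t+1}) = ∇F_2(θ^t) can be illustrated on a one-dimensional toy problem chosen for illustration only: F(θ) = θ⁴/4 − θ²/2 with F_1 = θ⁴/4 and F_2 = θ²/2. This gives θ^{t+1} = (θ^t)^{1/3}. Each step solves a convex problem in closed form, and F never increases. The function name `cccp` is hypothetical.

```python
import numpy as np


def cccp(grad_F1_inv, grad_F2, theta0, iters=100):
    """Convex-concave procedure for F = F1 - F2 (F1, F2 convex):
    each step solves grad F1(theta_{t+1}) = grad F2(theta_t)."""
    theta = theta0
    path = [theta]
    for _ in range(iters):
        theta = grad_F1_inv(grad_F2(theta))
        path.append(theta)
    return np.array(path)


def F(t):
    """Toy objective F = F1 - F2 with F1 = t^4 / 4 and F2 = t^2 / 2."""
    return t ** 4 / 4 - t ** 2 / 2


# grad F1(t) = t^3 has inverse cbrt; grad F2(t) = t.
path = cccp(np.cbrt, lambda t: t, 0.2)
```

No inner optimization loop is needed here because ∇F_1 can be inverted in closed form.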

Geometry of Support Vector Machine: modification of kernel

Simple perceptron: decision surface $f(\boldsymbol x) = \boldsymbol w\cdot\boldsymbol x - b = 0$

distance from the surface: $d = \frac{|\boldsymbol w\cdot\boldsymbol x - b|}{|\boldsymbol w|}$

minimize $|\boldsymbol w|^2$ under the constraint $y_i(\boldsymbol w\cdot\boldsymbol x_i - b) \ge 1$

$L(\boldsymbol w, b, \boldsymbol\alpha) = \frac12|\boldsymbol w|^2 - \sum_i\alpha_i\{y_i(\boldsymbol w\cdot\boldsymbol x_i - b) - 1\}$

$\boldsymbol w = \sum_i\alpha_i y_i\boldsymbol x_i, \qquad \sum_i\alpha_i y_i = 0$

$\alpha_i > 0$: support vector; $\alpha_i = 0$: all other points

$f(\boldsymbol x) = \sum_i\alpha_i y_i\,(\boldsymbol x_i\cdot\boldsymbol x) - b$

Embedding in higher-dimensional space

$\boldsymbol x \to \boldsymbol z(\boldsymbol x), \qquad f(\boldsymbol x) = \boldsymbol w\cdot\boldsymbol z(\boldsymbol x) - b$

z: infinite-dimensional

Kernel trick: $K(\boldsymbol x, \boldsymbol x') = \boldsymbol z(\boldsymbol x)\cdot\boldsymbol z(\boldsymbol x')$

$\int K(\boldsymbol x, \boldsymbol x')\,\phi_i(\boldsymbol x')\,d\boldsymbol x' = \lambda_i\,\phi_i(\boldsymbol x)$

$K(\boldsymbol x, \boldsymbol x') = \sum_{i=1}^{\infty}\lambda_i\,\phi_i(\boldsymbol x)\,\phi_i(\boldsymbol x')$

$f(\boldsymbol x) = \sum_i\alpha_i y_i\,K(\boldsymbol x_i, \boldsymbol x) - b$

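$ds^2 = |\boldsymbol z(\boldsymbol x + d\boldsymbol x) - \boldsymbol z(\boldsymbol x)|^2 = \sum_{i,j}\frac{\partial\boldsymbol z(\boldsymbol x)}{\partial x_i}\cdot\frac{\partial\boldsymbol z(\boldsymbol x)}{\partial x_j}\,dx_i\,dx_j$

$g_{ij}(\boldsymbol x) = \frac{\partial\boldsymbol z(\boldsymbol x)}{\partial x_i}\cdot\frac{\partial\boldsymbol z(\boldsymbol x)}{\partial x_j} = \frac{\partial^2}{\partial x_i\,\partial x'_j}K(\boldsymbol x, \boldsymbol x')\Big|_{\boldsymbol x' = \boldsymbol x}$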

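$\tilde K(\boldsymbol x, \boldsymbol x') = c(\boldsymbol x)\,c(\boldsymbol x')\,K(\boldsymbol x, \boldsymbol x')$

$\tilde g_{ij}(\boldsymbol x) = \{c(\boldsymbol x)\}^2 g_{ij}(\boldsymbol x) + c_i(\boldsymbol x)\,c_j(\boldsymbol x), \qquad c_i = \partial c/\partial x_i$ (Gaussian kernel $K(\boldsymbol x, \boldsymbol x') = \exp\{-|\boldsymbol x - \boldsymbol x'|^2/(2\sigma^2)\}$)

e.g. $c(\boldsymbol x) = \exp\{-\kappa\{f(\boldsymbol x)\}^2\}$, or $c(\boldsymbol x) = \sum_{\boldsymbol x^*}\exp\{-|\boldsymbol x - \boldsymbol x^*|^2/(2\tau^2)\}$ summed over support vectors $\boldsymbol x^*$

Conformal Transformation of Kernel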
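Both kernel-metric formulas can be checked numerically. The sketch computes g_ij(x) = ∂²K/∂x_i∂x'_j at x' = x by central differences (step 1e-4). For a Gaussian kernel with σ = 1 this should give g = I/σ². It also checks the conformal rule g̃ = c²g + c_i c_j with the hypothetical factor c(x) = exp(−κ|x|²). The values of σ and κ are arbitrary assumptions.

```python
import numpy as np

SIGMA, KAPPA = 1.0, 0.5  # hypothetical parameter values


def induced_metric(K, x, h=1e-4):
    """g_ij(x) = d^2 K(x, x') / dx_i dx'_j at x' = x, by central differences."""
    n = len(x)
    E = np.eye(n) * h
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            g[i, j] = (K(x + E[i], x + E[j]) - K(x + E[i], x - E[j])
                       - K(x - E[i], x + E[j]) + K(x - E[i], x - E[j])) / (4 * h * h)
    return g


def gauss(x, y):
    """Gaussian kernel exp(-|x - y|^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * SIGMA ** 2))


def c(x):
    """Hypothetical conformal factor c(x) = exp(-kappa |x|^2)."""
    return np.exp(-KAPPA * np.sum(x ** 2))


def conformal(x, y):
    """Conformally transformed kernel c(x) c(x') K(x, x')."""
    return c(x) * c(y) * gauss(x, y)
```

The formula g̃ = c²g + c_i c_j relies on K(x, x) = 1 and a vanishing first derivative at x' = x, both of which hold for the Gaussian kernel.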

Basic Principles of Unsupervised and Supervised Learning Toward Deep Learning

Shun-ichi Amari (RIKEN Brain Science Institute); collaborators: R. Karakida, M. Okada (U. Tokyo)

Deep Learning
Self-Organization + Supervised Learning

RBM: Restricted Boltzmann Machine
Auto-Encoder, Recurrent Net

Dropout
Contrastive divergence


Boltzmann Machine


RBM:  Restricted Boltzmann Machine


RBM


Simple Hebbian Self‐Organization


self‐organization of  


Self‐Organization


Interaction of Hidden Neurons

Bayesian Duality in Exponential Family

Data x; Parameter (higher-order concepts)

Curved exponential family


RBM= h,     x = Wvx = v        = hW


Gaussian Boltzmann Machine


Equilibrium Solution

Equilibrium Solution

General Solution

You can choose m (≤ k) eigenvalues from

• diagonalized by

Stable Solution: the case of m = k

Contrastive Divergence

• RBM: 2-layered probabilistic neural network (visible layer v, hidden layer h, weights W)
  - No connections within layers
• How to train an RBM: maximum likelihood (ML) learning is hard
  - Many iterations of Gibbs sampling, from the input to equilibrium, demand too much computational time

[Figure: visible layer v and hidden layer h connected by W; Gibbs sampling from the input to equilibrium]
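Here is a textbook CD-1 update (contrastive divergence with a single Gibbs step) for a Bernoulli-Bernoulli RBM. It is an illustrative sketch, not the Gaussian-Bernoulli model used later for ICA, and the names `cd1_update` and `lr` are hypothetical. It replaces the equilibrium statistics that ML learning requires by statistics collected after one sampling step from the data.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def cd1_update(W, b, c, v0, lr=0.1):
    """One CD-1 step for a Bernoulli RBM with energy -v^T W h - b.v - c.h.

    W: (n_visible, n_hidden) weights, b: visible biases, c: hidden biases,
    v0: (batch, n_visible) binary data. Arrays are updated in place and returned.
    """
    ph0 = sigmoid(v0 @ W + c)                  # p(h = 1 | v0): data (positive) phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)                # one Gibbs step back to the visible layer
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)                  # reconstruction (negative) phase
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

Using n Gibbs steps instead of one gives CD_n. As n grows, the negative phase approaches the equilibrium statistics of exact ML learning.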


Contrastive Divergence Solution


Two Manifolds


Geometry of CDn (contrastive divergence)

Bernoulli-Gaussian RBM

ICA
R. Karakida

Simulation

[Figure panels: Sources p(s); Mixing; Input; Output; ICA Solution; Uniform Distribution; CD]

The number of neurons: N = M = 2, σ = 1/2

Independent sources are extracted in the G-B RBM.

Information Geometry of Neuromanifolds

Natural Gradient and Singularities

Shun-ichi Amari, RIKEN Brain Science Institute

Multilayer Perceptron

Mathematical Neurons

$y = \varphi\Big(\sum_i w_ix_i - h\Big) = \varphi(\boldsymbol w\cdot\boldsymbol x)$

[Figure: input x, output y, activation function φ(u) plotted against u]

Multilayer Perceptrons

$y = \sum_i v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x) + n$

$p(y \mid \boldsymbol x; \boldsymbol\theta) = c\,\exp\Big\{-\frac12\{y - f(\boldsymbol x, \boldsymbol\theta)\}^2\Big\}, \qquad f(\boldsymbol x, \boldsymbol\theta) = \sum_i v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x)$

$\boldsymbol x = (x_1, x_2, \dots, x_n)$

$\boldsymbol\theta = (\boldsymbol w_1, \dots, \boldsymbol w_m; v_1, \dots, v_m)$

Multilayer Perceptron: Neuromanifold

$y = f(\boldsymbol x, \boldsymbol\theta) = \sum_i v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x)$

$\boldsymbol\theta = (\boldsymbol w_1, \dots, \boldsymbol w_m; v_1, \dots, v_m)$

neuromanifold $\{f(\boldsymbol x, \boldsymbol\theta)\}$ in the space of functions

Universal Function Approximator

$a(\boldsymbol x) \approx \sum_{i=1}^{m} v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x)$ for large m

Learning from examples

examples $D = \{(\boldsymbol x_1, y_1), \dots, (\boldsymbol x_n, y_n)\}$

learning = estimation: $\hat{\boldsymbol\theta}$, giving $f(\boldsymbol x, \hat{\boldsymbol\theta})$

Backpropagation: gradient learning

examples: $(\boldsymbol x_1, y_1), \dots, (\boldsymbol x_t, y_t)$: training set

$E(\boldsymbol x, y; \boldsymbol\theta) = \frac12\{y - f(\boldsymbol x, \boldsymbol\theta)\}^2 = -\log p(y \mid \boldsymbol x; \boldsymbol\theta) + \text{const}$

$\Delta\boldsymbol\theta_t = -\varepsilon\,\frac{\partial E(\boldsymbol x_t, y_t; \boldsymbol\theta_t)}{\partial\boldsymbol\theta}$

$f(\boldsymbol x, \boldsymbol\theta) = \sum_i v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x), \qquad y = f(\boldsymbol x, \boldsymbol\theta) + n$

Multilayer Perceptron: Neuromanifold, Metric and Topology

$y = f(\boldsymbol x, \boldsymbol\theta) = \sum_i v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x)$

$\boldsymbol\theta = (\boldsymbol w_1, \dots, \boldsymbol w_m; v_1, \dots, v_m)$

neuromanifold $\{f(\boldsymbol x, \boldsymbol\theta)\}$ in the space of functions

Metric: Riemannian manifold

$ds^2 = |d\boldsymbol\theta|^2 = \sum_{i,j} g_{ij}\,d\theta_i\,d\theta_j = d\boldsymbol\theta^T G\,d\boldsymbol\theta$

$g_{ij}(\boldsymbol\theta) = E\Big[\frac{\partial\log p(y \mid \boldsymbol x; \boldsymbol\theta)}{\partial\theta_i}\,\frac{\partial\log p(y \mid \boldsymbol x; \boldsymbol\theta)}{\partial\theta_j}\Big]$: Fisher information
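For the MLP with unit-variance Gaussian noise, the Fisher metric reduces to G(θ) = E_x[∇_θ f ∇_θ fᵀ], since E[(y − f)²] = 1. The sketch assumes φ = tanh and uses an empirical average over sample inputs in place of E_x; the names `grad_f` and `fisher` are hypothetical. Setting v_1 = 0 makes ∂f/∂w_1 vanish, so G becomes singular, which is the situation the singularity slides below describe.

```python
import numpy as np


def grad_f(x, W, v):
    """Gradient of f(x, θ) = sum_i v_i tanh(w_i . x) with θ = (w_1, ..., w_m; v_1, ..., v_m)."""
    u = W @ x                                                # W: (m, n), rows are w_i
    dW = (v * (1 - np.tanh(u) ** 2))[:, None] * x[None, :]   # ∂f/∂w_i = v_i φ'(w_i.x) x
    dv = np.tanh(u)                                          # ∂f/∂v_i = φ(w_i.x)
    return np.concatenate([dW.ravel(), dv])


def fisher(X, W, v):
    """Empirical Fisher metric G = mean over x of ∇f ∇f^T, for y = f(x, θ) + n, n ~ N(0, 1)."""
    J = np.array([grad_f(x, W, v) for x in X])
    return J.T @ J / len(X)
```

At a regular point G has full rank. On the subset v_1 = 0 (and likewise w_1 = w_2), some directions in θ do not change the function, so G has zero eigenvalues and G⁻¹ does not exist.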


Topology: Neuromanifold

• Metrical structure

• Topological structure


singularities

Geometry of singular model

$y = v\,\varphi(\boldsymbol w\cdot\boldsymbol x) + n$

singular: $v\,|\boldsymbol w| = 0$

Parameter Space S

Equivalence for $y = \sum_i v_i\,\varphi(\boldsymbol w_i\cdot\boldsymbol x) + n$:

1) $v_i = 0$: the output does not depend on $\boldsymbol w_i$
2) $\boldsymbol w_i = \boldsymbol w_j$: the output depends only on $v_i + v_j$

$M = S/\!\approx$

Topological Singularities

S: parameter space

M: behavioral space

$M = S/\!\approx$

2 hidden units

$y = v_1\,\varphi(\boldsymbol w_1\cdot\boldsymbol x) + v_2\,\varphi(\boldsymbol w_2\cdot\boldsymbol x) + n$

singular region: $v_1v_2 = 0$ or $\boldsymbol w_1 = \boldsymbol w_2$

Gaussian mixtures

$p(x) = \sum_i v_i\,\phi(x - w_i), \qquad \phi(x) = \frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{x^2}{2}\Big\}$

Gaussian mixture

$p(x; v, w_1, w_2) = (1 - v)\,\phi(x - w_1) + v\,\phi(x - w_2), \qquad \phi(x) = \frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{x^2}{2}\Big\}$

singular: $w_1 = w_2$, $v(1 - v) = 0$

[Figure: parameter space with axes $w_1$, $w_2$, $v$]

Learning, Estimation, and Model Selection

$E_{\rm gen} = E\big[D[p_0(y \mid \boldsymbol x) : p(y \mid \boldsymbol x; \hat{\boldsymbol\theta})]\big]$

$E_{\rm train} = E\big[D[p_{\rm emp}(y \mid \boldsymbol x) : p(y \mid \boldsymbol x; \hat{\boldsymbol\theta})]\big]$

$E_{\rm gen} \approx \frac{d}{2n}, \qquad E_{\rm gen} \approx E_{\rm train} + \frac{d}{n}$ (d: dimension of θ, n: number of examples)

Problem of Backprop

• slow convergence: plateau, saddle
• local minima

$\Delta\boldsymbol\theta_t = -\varepsilon_t\,\nabla l(\boldsymbol x_t, y_t; \boldsymbol\theta_t)$

Flaws of MLP

slow convergence: plateau

local minima

Boosting, Bagging, SVM

[Figure: training error over time]

Steepest Direction: Natural Gradient

$\nabla l = \Big(\frac{\partial l}{\partial\theta_1}, \dots, \frac{\partial l}{\partial\theta_n}\Big)$

$|d\boldsymbol\theta|^2 = \sum_{i,j} g_{ij}\,d\theta_i\,d\theta_j = d\boldsymbol\theta^T G\,d\boldsymbol\theta$

$\tilde\nabla l = G^{-1}\nabla l$

$\Delta\boldsymbol\theta_t = -\varepsilon_t\,\tilde\nabla l(\boldsymbol x_t, y_t; \boldsymbol\theta_t)$

Natural Gradient

maximize $dl = \nabla l\cdot d\boldsymbol\theta$ under $|d\boldsymbol\theta|^2 = \sum_{i,j} g_{ij}\,d\theta_i\,d\theta_j = \varepsilon^2$

Lagrange condition: $\nabla l = 2\lambda\,G\,d\boldsymbol\theta$, so $d\boldsymbol\theta \propto \tilde\nabla l = G^{-1}\nabla l$

$\Delta\boldsymbol\theta_t = -\varepsilon_t\,G^{-1}\nabla l(\boldsymbol x_t, y_t; \boldsymbol\theta_t)$
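A simple case shows why G⁻¹ matters. For the linear Gaussian model y = θ·x + n with n ~ N(0, 1), a deliberately simple stand-in for the MLP (assumption), the Fisher matrix is G = E[xxᵀ]. One natural-gradient step with ε = 1 then lands exactly on the least-squares solution, however badly the inputs are scaled. Plain gradient descent would need a step size below 2/λ_max(G) and would crawl along the poorly scaled directions. The name `natural_gradient_step` is hypothetical.

```python
import numpy as np


def natural_gradient_step(theta, X, y, lr=1.0, damping=0.0):
    """One natural-gradient step for y = θ.x + n, n ~ N(0, 1).

    Loss l(θ) = mean of (y - θ.x)^2 / 2; empirical Fisher G = X^T X / N.
    `damping` (optional) adds a multiple of I to G for numerical safety.
    """
    N = len(y)
    grad = -X.T @ (y - X @ theta) / N
    G = X.T @ X / N + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(G, grad)
```

For nonlinear models such as the MLP, G depends on θ and must be re-estimated as learning proceeds, which motivates the adaptive scheme below.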

Information Geometry of MLP: Natural Gradient Learning

S. Amari; H.Y. Park

$\boldsymbol\theta_{t+1} = \boldsymbol\theta_t - \varepsilon_t\,\hat G_t^{-1}\nabla l(\boldsymbol x_t, y_t; \boldsymbol\theta_t)$

$\hat G_{t+1}^{-1} = (1 + \varepsilon_t)\,\hat G_t^{-1} - \varepsilon_t\,\hat G_t^{-1}\nabla f\,(\nabla f)^T\hat G_t^{-1}$

Adaptive natural gradient learning
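The adaptive update of Ĝ⁻¹ can be read as a first-order (in ε) approximation to inverting the running average (1 − ε)G + ε∇f∇fᵀ, which avoids any matrix inversion inside the learning loop. The sketch below (hypothetical name `update_inverse_fisher`) checks that reading against the exact inverse for a small ε.

```python
import numpy as np


def update_inverse_fisher(G_inv, grad_f, eps):
    """Adaptive estimate of G^{-1}:
    G_inv <- (1 + eps) G_inv - eps (G_inv ∇f)(G_inv ∇f)^T.

    Uses symmetry of G_inv, so G_inv ∇f ∇f^T G_inv = a a^T with a = G_inv ∇f.
    """
    a = G_inv @ grad_f
    return (1 + eps) * G_inv - eps * np.outer(a, a)
```

The cost per step is one matrix-vector product and one outer product, O(d²), instead of the O(d³) needed to invert G directly.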

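Regular statistical model $M = \{p(\boldsymbol x, \boldsymbol\theta)\}$

G: Fisher information

$E[(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)^T] \approx \frac{1}{n}G^{-1}$

$E\big[KL[p(\boldsymbol x, \boldsymbol\theta_0) : p(\boldsymbol x, \hat{\boldsymbol\theta})]\big] \approx \frac12 E\big[(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)^T G\,(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)\big] = \frac{d}{2n}$

AIC, MDL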

Landscape of error at singularity: Milnor attractor

Dynamics of Learning

gradient: $\frac{d\boldsymbol\theta}{dt} = -\nabla l$; natural gradient: $\frac{d\boldsymbol\theta}{dt} = -G^{-1}\nabla l$

$\frac{d\boldsymbol u}{dt} = f(\boldsymbol u, z), \qquad \frac{dz}{dt} = k(\boldsymbol u, z)$

$\frac{d\boldsymbol u}{dz} = \frac{f(\boldsymbol u, z)}{k(\boldsymbol u, z)}$

2 2 1 log

2u z z c

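Coordinate Transformation

$\boldsymbol w = \frac{v_1\boldsymbol w_1 + v_2\boldsymbol w_2}{v_1 + v_2}, \qquad \boldsymbol u = \boldsymbol w_2 - \boldsymbol w_1$

$v = v_1 + v_2, \qquad z = \frac{v_1 - v_2}{v_1 + v_2}$

inverse: $\boldsymbol w_1 = \boldsymbol w - \frac{1 - z}{2}\,\boldsymbol u, \qquad \boldsymbol w_2 = \boldsymbol w + \frac{1 + z}{2}\,\boldsymbol u$

singular region: $\boldsymbol u = 0$ ($\boldsymbol w_1 = \boldsymbol w_2$) or $z = \pm 1$ ($v_1v_2 = 0$)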

Dynamics of Learning: Natural gradient works!!

gradient: $\frac{d\boldsymbol\theta}{dt} = -\nabla l$; natural gradient: $\frac{d\boldsymbol\theta}{dt} = -G^{-1}\nabla l$

$\frac{d\boldsymbol u}{dt} = f(\boldsymbol u, z), \qquad \frac{dz}{dt} = k(\boldsymbol u, z)$

$\frac{d\boldsymbol u}{dz} = \frac{f(\boldsymbol u, z)}{k(\boldsymbol u, z)}$

2 2 1 log

2u z z c


Dynamic vector fields: General case (|z|<1 part stable)

Dynamic vector fields: General case (|z| > 1 part stable)


Adaptive Natural Gradient works well

Signal Processing: ICA (Independent Component Analysis)

$\boldsymbol x_t = A\boldsymbol s_t$

sparse component analysis

positive matrix factorization

Mixture and unmixture of independent signals

$x_i = \sum_{j=1}^{n} A_{ij}\,s_j, \qquad \boldsymbol x = A\boldsymbol s$

[Figure: sources $s_1, s_2, \dots, s_n$ mixed into observations $x_1, x_2, \dots, x_m$]

Independent Component Analysis

$\boldsymbol x = A\boldsymbol s, \qquad x_i = \sum_j A_{ij}\,s_j$

$\boldsymbol y = W\boldsymbol x, \qquad W = A^{-1} \;\Rightarrow\; \boldsymbol y = \boldsymbol s$

observations: $\boldsymbol x(1), \boldsymbol x(2), \dots, \boldsymbol x(t)$; recover: $\boldsymbol s(1), \boldsymbol s(2), \dots, \boldsymbol s(t)$

[Figure: $\boldsymbol s \to A \to \boldsymbol x \to W \to \boldsymbol y$]


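Semiparametric Statistical Model

$p(\boldsymbol x; W, r) = |W|\,r(W\boldsymbol x), \qquad W = A^{-1}$

$r(\boldsymbol s) = \prod_i r_i(s_i)$: unknown

observations: $\boldsymbol x(1), \boldsymbol x(2), \dots, \boldsymbol x(t)$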

Natural Gradient

$\tilde\nabla l = \frac{\partial l(\boldsymbol y, W)}{\partial W}\,W^TW$

Space of Matrices: Lie group

$dX = dW\,W^{-1}$, i.e. $W + dW = (I + dX)\,W$ (non-holonomic basis)

$|dW|^2 = \mathrm{tr}(dX\,dX^T) = \mathrm{tr}(dW\,W^{-1}W^{-T}dW^T)$

$\tilde\nabla l = \frac{\partial l}{\partial W}\,W^TW$

Information Geometry of ICA

natural gradient, estimating function, stability, efficiency

$S = \{p(\boldsymbol y)\}$

$I = \{q_1(y_1)\,q_2(y_2)\cdots q_n(y_n)\}$: independent distributions

$\{p(W\boldsymbol x)\}$

$l(W) = KL[p(\boldsymbol y; W) : q(\boldsymbol y)], \qquad q \in I$

Estimating Functions

$F(\boldsymbol y, W)$: $E_{W,r}[F(\boldsymbol y, W)] = 0$ for every r, and $E_{W',r}[F(\boldsymbol y, W)] \neq 0$ for $W' \neq W$

estimating equation: $\sum_t F(\boldsymbol y_t, W) = 0, \qquad \boldsymbol y_t = W\boldsymbol x_t$

learning: $\Delta W_t = \eta_t\,F(W_t\boldsymbol x_t, W_t)$

Admissible class

$\{R(W)\,F(\boldsymbol y, W)\}$, $R(W)$ nonsingular: all give the same estimating equation $\sum_t F(W\boldsymbol x_t, W) = 0$

canonical estimating function: $F(\boldsymbol y, W) = I - \varphi(\boldsymbol y)\,\boldsymbol y^T$

on-line learning: $\Delta W_t = \eta_t\,R(W_t)\,F(W_t\boldsymbol x_t, W_t)$

{ }i j j iy y y y W
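A batch version of the natural-gradient ICA rule ΔW = η{I − φ(y)yᵀ}W, i.e. the canonical estimating function multiplied on the right by W, can be sketched as follows. It assumes sub-Gaussian sources (uniform here), for which φ(y) = y³ is a suitable choice; for super-Gaussian sources a different φ such as tanh would be needed. The name `ica_natural_gradient` and all numeric settings are hypothetical.

```python
import numpy as np


def ica_natural_gradient(X, lr=0.05, iters=3000):
    """Batch natural-gradient ICA: W <- W + lr (I - E[φ(y) y^T]) W with y = W x.

    X: (n_signals, n_samples) observations. φ(y) = y^3 (assumes sub-Gaussian sources).
    """
    n, N = X.shape
    W = 0.5 * np.eye(n)
    for _ in range(iters):
        Y = W @ X
        F = np.eye(n) - (Y ** 3) @ Y.T / N   # sample mean of the estimating function
        W = W + lr * F @ W
    return W
```

Because the update multiplies by W on the right, the trajectory of W A depends only on W A itself (equivariance), so the speed of separation does not depend on how ill-conditioned the mixing matrix A is.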

Basis Given: overcomplete case, Sparse Solution

$\boldsymbol x_t = A\boldsymbol s_t = \sum_i s_{ti}\,\boldsymbol a_i$ (A: given basis $\boldsymbol a_1, \dots, \boldsymbol a_k$, with more basis vectors than dimensions)

many solutions $\hat{\boldsymbol s}$ of $A\hat{\boldsymbol s} = \boldsymbol x$; sparse: many $\hat s_i = 0$

generalized inverse: $\min\sum_i\hat s_i^2$ ($L_2$-norm)

sparse solution: $\min\sum_i|\hat s_i|$ ($L_1$-norm)

subject to $A\hat{\boldsymbol s} = \boldsymbol x$
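The two criteria can be compared on a tiny overcomplete basis. The L2 solution comes from the pseudo-inverse. The L1 problem is written as a linear program in s = s⁺ − s⁻ with s⁺, s⁻ ≥ 0 and solved with SciPy's `linprog`. The basis, the target and the helper names `min_l2` and `min_l1` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def min_l2(A, x):
    """Generalized-inverse solution: minimize sum_i s_i^2 subject to A s = x."""
    return np.linalg.pinv(A) @ x


def min_l1(A, x):
    """Sparse solution: minimize sum_i |s_i| subject to A s = x,
    as a linear program in s = s_plus - s_minus with s_plus, s_minus >= 0."""
    k = A.shape[1]
    res = linprog(np.ones(2 * k), A_eq=np.hstack([A, -A]), b_eq=x,
                  bounds=(0, None), method="highs")
    return res.x[:k] - res.x[k:]


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # three basis vectors in two dimensions
x = np.array([1.0, 1.0])
```

Here the L2 solution spreads the signal over all three basis vectors, (1/3, 1/3, 2/3). The L1 solution uses the single basis vector (1, 1), giving (0, 0, 1).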

Minkovskian Gradient

$\min\varphi(\boldsymbol\beta)$: convex function; constraint $F(\boldsymbol\beta) \le c$

typical case:

$\varphi(\boldsymbol\beta) = \frac12|\boldsymbol y - X\boldsymbol\beta|^2 = \frac12(\boldsymbol\beta - \boldsymbol\beta^*)^T G\,(\boldsymbol\beta - \boldsymbol\beta^*) + \text{const}, \qquad G = X^TX$

$F(\boldsymbol\beta) = \frac1p\sum_i|\beta_i|^p, \qquad p = 2,\ p = 1,\ p = \frac12$
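One way to read a gradient measured in an L_p (Minkowski) norm, offered as an assumption and not necessarily the exact definition used here, is as the unit-L_p-norm direction d that maximizes g·d. Hölder's inequality gives d_i ∝ sign(g_i)|g_i|^{q−1} with 1/p + 1/q = 1. For p = 2 this is the ordinary normalized gradient. As p → 1 only the coordinate with the largest |g_i| moves, which is the sparse, one-coordinate-at-a-time behaviour. The sketch handles p ≥ 1 only; the name `minkowski_steepest` is hypothetical.

```python
import numpy as np


def minkowski_steepest(g, p):
    """Unit-L_p-norm direction d maximizing g.d (descend along -d); p >= 1.

    p > 1: d_i = sign(g_i) |g_i|^(q-1) / ||g||_q^(q-1), with 1/p + 1/q = 1 (Hölder).
    p = 1: only the coordinate with the largest |g_i| moves (ties: first one).
    """
    g = np.asarray(g, float)
    if p == 1:
        d = np.zeros_like(g)
        i = np.argmax(np.abs(g))
        d[i] = np.sign(g[i])
        return d
    q = p / (p - 1)
    return np.sign(g) * np.abs(g) ** (q - 1) / np.linalg.norm(g, q) ** (q - 1)
```

The maximal rate g·d equals the dual norm ‖g‖_q, so the choice of p decides how the descent effort is spread across coordinates.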

Optimization under Sparsity Condition:

$\boldsymbol y = X\boldsymbol\beta + \boldsymbol n$

$\boldsymbol\beta$: k-sparse

$m > 2k\log n$

$y_t = \sum_{i=1}^{k}\beta_i\,x_{ti} + n_t, \qquad t = 1, 2, \dots, N$

$\boldsymbol y = X\boldsymbol\beta + \boldsymbol n, \qquad X = (\boldsymbol x_1, \boldsymbol x_2, \dots, \boldsymbol x_N)^T$

Linear regression

Overcomplete: N < k, under-determined: infinitely many solutions

Sparsity constraint solves the problem

$\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta}\sum_t u(y_t - \boldsymbol\beta\cdot\boldsymbol x_t)$, e.g. $u(e) = \frac12e^2$

$\sum_t u'(y_t - \hat{\boldsymbol\beta}\cdot\boldsymbol x_t)\,\boldsymbol x_t = 0$

Euclidean case (Gaussian noise): $G\hat{\boldsymbol\beta} = X^T\boldsymbol y, \qquad G = X^TX$

$\hat{\boldsymbol\beta} = (X^TX)^{-1}X^T\boldsymbol y = X^\dagger\boldsymbol y$

Maximum likelihood estimator

Not sparse

Page 105: Applications of Information Geometryimage.diku.dk › MLLab › SummerSchools › SlidesAmariIG3.pdf · 2014-09-12 · Applications of Information Geometry Clustering based on Divergence:

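Sparse Solution

$\min_\beta \varphi(\beta) \iff \min_\beta D[\beta : \beta^*]$, where $D[\beta : \beta^*] = \varphi(\beta) - \varphi(\beta^*) - \nabla\varphi(\beta^*)\cdot(\beta - \beta^*)$ and $\nabla\varphi(\beta^*) = 0$

penalty: $F(\beta) = c$, with $F_p(\beta) = \sum_i |\beta_i|^p$

$p = 0$: $F_0(\beta) = \#\{i : \beta_i \neq 0\}$: sparsest solution

$p = 1$: $F_1(\beta) = \sum_i |\beta_i|$ ($L_1$): $L_1$ solution

$0 < p < 1$: $F_p(\beta)$

$p = 2$: $F_2(\beta) = \sum_i \beta_i^2$: generalized inverse solution

Sparse solution: overcomplete case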
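The effect of $p$ in the overcomplete case can be checked by brute force. The sketch below assumes a hypothetical one-equation system $\beta_1 + 2\beta_2 = 2$ (so every point on that line fits the data exactly) and grid-searches the line for the smallest $F_p(\beta) = \sum_i|\beta_i|^p$. The helper names are ours, and a grid search stands in for a real solver.

```python
def F_p(beta, p):
    """Penalty sum_i |beta_i|^p (no 1/p factor, matching the list above)."""
    return sum(abs(b) ** p for b in beta)

def min_penalty_on_line(p, lo=-1.0, hi=3.0, steps=4000):
    """Grid-search the solutions of beta_1 + 2*beta_2 = 2 for the smallest F_p."""
    best = None
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps      # t plays the role of beta_2
        beta = (2.0 - 2.0 * t, t)           # beta_1 chosen so the equation holds exactly
        val = F_p(beta, p)
        if best is None or val < best[0]:
            best = (val, beta)
    return best[1]

for p in (2, 1, 0.5):
    print(p, min_penalty_on_line(p))
```

For $p = 2$ the search lands on the generalized-inverse point $(0.4, 0.8)$, while $p = 1$ and $p = 1/2$ both pick the sparse point $(0, 1)$.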

Page 106

$\beta_c = \arg\min_\beta D[\beta : \beta^*]$ under $F(\beta) = c$

$\beta_c$: dual geodesic projection of $\beta^*$ onto the surface $F(\beta) = c$ (dual orthogonal projection)

Page 107

Fig. 1: the constraint region $R_c$ for $n = 2$: a) $p > 1$; b) $p = 1$; c) $p < 1$ (non-convex)

Page 108

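$L_1$-constrained optimization: LASSO, LARS

$P_c$ problem: $\min \varphi(\beta)$ under $F(\beta) \le c$; solution $\beta_c^*$

$P_\lambda$ problem: $\min\ \varphi(\beta) + \lambda F(\beta)$; solution $\beta_\lambda^*$

For $p \ge 1$, the solutions $\beta_c^*$ and $\beta_\lambda^*$ coincide with $\lambda = \lambda_c$.

For $p < 1$, $\lambda = \lambda_c$ can be multiple-valued and noncontinuous, and the stability of the two problems differs.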
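For $p = 1$ the $P_\lambda$ problem is the LASSO. The sketch below solves it with iterative shrinkage-thresholding (ISTA), a standard proximal-gradient method that we swap in for illustration; it is not the LARS or Minkowskian-gradient procedure of these slides. The toy data and helper names are our own. Reading off $c = F_1(\beta_\lambda^*)$ gives the corresponding $P_c$ constraint level.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t * ||.||_1: shrink each coordinate toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Solve the P_lambda problem  min 0.5*||y - X b||^2 + lam*||b||_1  by ISTA."""
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the gradient of phi
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of phi(beta) = 0.5*||y - X beta||^2
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

X = np.eye(3)
y = np.array([3.0, -0.5, 1.0])
beta_lam = lasso_ista(X, y, lam=1.0)
c = np.abs(beta_lam).sum()                   # matching constraint level c = F_1(beta_lambda)
print(beta_lam, c)
```

With $X = I$ the answer is plain soft thresholding, $\beta_\lambda^* = (2, 0, 0)$, so $\lambda = 1$ corresponds to $c = 2$.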

Page 109

Projection from $\beta^*$ to $F(\beta) = c$ (information geometry)

Page 110

Fig. 5: subgradient $n$ of $F$ at $\beta_c$, where $F(\beta_c) = c$

Page 111

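LASSO path and LARS path (stagewise solution)

$\min \varphi(\beta)$ under $F(\beta) \le c$: $\beta_c$

$\min\ \varphi(\beta) + \lambda F(\beta)$: $\beta_\lambda$

$c \leftrightarrow \lambda$ correspondence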

Page 112

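Active set and gradient

$A = \{i : \beta_i \neq 0\}$

$\nabla_i F_p(\beta) = \mathrm{sgn}(\beta_i)\,|\beta_i|^{p-1}$, $i \in A$

$p = 1$: $\nabla_i F_1(\beta) = \mathrm{sgn}(\beta_i)$ for $i \in A$; $\nabla_i F_1(\beta) \in [-1, 1]$ for $i \notin A$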
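The $p = 1$ conditions above translate directly into an optimality check for $\min \varphi(\beta) + \lambda F_1(\beta)$. A minimal plain-Python sketch, assuming the caller supplies $\nabla\varphi$ at the candidate point; the function name and the toy case $\varphi(\beta) = \frac12\|\beta - y\|^2$ are our own.

```python
def lasso_kkt_holds(grad_phi, beta, lam, tol=1e-9):
    """Check the optimality conditions of min phi(beta) + lam * F_1(beta).

    Active coordinates (beta_i != 0): grad_phi_i + lam * sgn(beta_i) = 0.
    Inactive coordinates: the subgradient of |beta_i| lies in [-1, 1], so |grad_phi_i| <= lam.
    """
    for g, b in zip(grad_phi, beta):
        if b != 0:
            sgn = 1.0 if b > 0 else -1.0
            if abs(g + lam * sgn) > tol:
                return False
        elif abs(g) > lam + tol:
            return False
    return True

# Toy case phi(beta) = 0.5 * ||beta - y||^2 (X = identity), so grad phi = beta - y.
y = [3.0, -0.5, 1.0]
for beta in ([2.0, 0.0, 0.0], [2.5, 0.0, 0.0]):
    grad = [b - yi for b, yi in zip(beta, y)]
    print(beta, lasso_kkt_holds(grad, beta, lam=1.0))
```

The first candidate satisfies both the active and the inactive conditions; the second fails the active-coordinate equation.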

Page 113

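Solution path

$\nabla\varphi(\beta_c) + \lambda_c\nabla F(\beta_c) = 0$, i.e. $G(\beta_c - \beta^*) + \lambda_c\nabla F(\beta_c) = 0$

On the active set $A$: $\nabla_A\varphi(\beta_c) + \lambda_c\nabla_A F(\beta_c) = 0$

Differentiating with respect to $c$:

$\frac{d\beta_c}{dc} = -\frac{d\lambda_c}{dc}\,K_c^{-1}\nabla F(\beta_c)$, $K_c = G + \lambda_c\nabla\nabla F(\beta_c)$

$L_1$: $\nabla\nabla F_1 = 0$, $\nabla F_1 = (\mathrm{sgn}\,\beta_i)$, so $K_c = G$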
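The path formula can be checked numerically in a case where both coordinates stay active. The sketch below uses illustrative values of our own for $G$ and $\beta^*$ with $F = F_1$. It compares a finite-difference slope $d\beta/dc$ with $-\frac{d\lambda_c}{dc}K_c^{-1}\nabla F$, where $K_c = G$ and $d\lambda_c/dc$ follows from $c = \mathrm{sgn}(\beta)\cdot\beta$.

```python
import numpy as np

# Illustrative values (not from the slides): phi(beta) = 0.5 (beta - beta*)^T G (beta - beta*), F = F_1.
G = np.array([[2.0, 0.5], [0.5, 1.0]])
beta_star = np.array([3.0, 2.0])
s = np.array([1.0, 1.0])                    # grad F_1 = sgn(beta) on the active set A = {1, 2}

def beta_of_lambda(lam):
    """Solve G (beta - beta*) + lam * sgn(beta) = 0; valid while both coordinates stay positive."""
    return beta_star - lam * np.linalg.solve(G, s)

def c_of_lambda(lam):
    return np.abs(beta_of_lambda(lam)).sum()  # constraint level c = F_1(beta_lambda)

lam, h = 0.5, 1e-6
# Finite-difference slope of the path with respect to c.
dbeta_dc_fd = (beta_of_lambda(lam + h) - beta_of_lambda(lam)) / (c_of_lambda(lam + h) - c_of_lambda(lam))

# Path formula: d beta/dc = -(d lambda/dc) K^{-1} grad F, with K = G because grad grad F_1 = 0.
dlam_dc = -1.0 / (s @ np.linalg.solve(G, s))  # from c(lambda) = s . beta(lambda)
dbeta_dc_formula = -dlam_dc * np.linalg.solve(G, s)
print(dbeta_dc_fd, dbeta_dc_formula)
```

Here $d\lambda_c/dc < 0$: enlarging the budget $c$ lowers the multiplier, and the path moves along $K_c^{-1}\nabla F$.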

Page 114

Solution path in the subspace of the active set

$\frac{d\beta_A}{dc} = -\frac{d\lambda_c}{dc}\,K_A^{-1}\nabla_A F$: active direction

turning point: the active set changes, $A \to A'$

Page 115

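Gradient Descent Method

$\nabla L = \{\partial_i L(\theta)\}$: covariant

$\tilde\nabla L = \{g^{ij}\partial_j L\}$: contravariant

$\min_a L(\theta + a)$ under $g_{ij}a^i a^j = \varepsilon^2$

$\theta_{t+1} = \theta_t - c_t\,\tilde\nabla L(\theta_t)$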
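A minimal NumPy sketch of the covariant/contravariant distinction, on a quadratic loss of our own choosing whose metric $g_{ij}$ equals its Hessian $G$. In that special case one natural-gradient step with $c = 1$ is a Newton step and lands on the minimizer, while the ordinary gradient step with the same $c$ overshoots along the stiff direction.

```python
import numpy as np

# Quadratic loss L(theta) = 0.5 (theta - theta*)^T G (theta - theta*)  (illustrative values).
G = np.array([[10.0, 0.0], [0.0, 0.1]])     # metric g_ij, strongly anisotropic
theta_star = np.array([1.0, -2.0])

def grad_L(theta):
    return G @ (theta - theta_star)          # covariant gradient (partial_i L)

theta0 = np.zeros(2)
c = 1.0
# Contravariant (natural) gradient g^{ij} partial_j L, obtained by solving G v = grad L.
nat = theta0 - c * np.linalg.solve(G, grad_L(theta0))
# Ordinary gradient step with the same c.
plain = theta0 - c * grad_L(theta0)
print(nat, plain)
```

`np.linalg.solve` is used instead of forming $G^{-1}$ explicitly, which is the usual numerically safer choice.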

Page 116

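Riemannian metric: $G(a) = g_{ij}a^i a^j$

Minkowskian metric: $G(ta) = t^p G(a)$, $t > 0$

$L_p$-norm: $G(a) = \frac{1}{p}\sum_i |a_i|^p$, $p \ge 1$

Steepest direction of $L$: $\min_a L(\theta + a)$ under $G(a) = \varepsilon$, which gives $\nabla L(\theta) - \lambda\nabla_a G(a) = 0$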

Page 117

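$\frac{\partial G}{\partial a_i} = \mathrm{sgn}(a_i)\,|a_i|^{p-1}$ for $G(a) = \frac{1}{p}\sum_i |a_i|^p$

$f_i = \partial_i F$

$a_i = c\,\mathrm{sgn}(f_i)\,|f_i|^{\frac{1}{p-1}}$

Riemannian case $G(a) = g_{ij}a^i a^j$: $\tilde\nabla F = G^{-1}f$ (natural gradient)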
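The formula $a_i = c\,\mathrm{sgn}(f_i)|f_i|^{1/(p-1)}$ is easy to explore numerically. This plain-Python sketch (the function name and the gradient vector `f` are our own) fixes the constant $c$ by normalizing to $\|a\|_p = 1$. It shows that $p = 2$ returns a direction proportional to $f$, while $p$ close to 1 puts almost all weight on the coordinate with the largest $|f_i|$.

```python
def minkowskian_direction(f, p):
    """Steepest direction under G(a) = (1/p) sum |a_i|^p, scaled so that ||a||_p = 1 (p > 1).

    Each component is proportional to sgn(f_i) |f_i|^(1/(p-1)).
    """
    raw = [(1.0 if fi > 0 else -1.0 if fi < 0 else 0.0) * abs(fi) ** (1.0 / (p - 1.0)) for fi in f]
    norm = sum(abs(r) ** p for r in raw) ** (1.0 / p)
    return [r / norm for r in raw]

f = [3.0, -1.0, 2.0]
print(minkowskian_direction(f, 2.0))    # proportional to f itself (Euclidean case)
print(minkowskian_direction(f, 1.05))   # almost all weight on the largest |f_i|
```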

Page 118

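$\tilde\nabla_i F = c\,\mathrm{sgn}(f_i)\,|f_i|^{\frac{1}{p-1}}$

Euclidean case ($p = 2$): $\tilde\nabla F = c\,f$

$p = 1 + \varepsilon$, $\varepsilon \to 0$: $\tilde\nabla_i F = c\,\mathrm{sgn}(f_i)\,|f_i|^{1/\varepsilon}$, dominated by the components with the largest $|f_i|$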

Page 119

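$i^* = \arg\max_i |f_i|$, i.e. $|f_{i^*}| = \max_j |f_j|$

$\tilde\nabla_i F = \mathrm{sgn}(f_i)$ for $i = i^*$, and $0$ otherwise

$\beta_{t+1} = \beta_t - \eta\,\tilde\nabla F(\beta_t)$: LASSO

Try for various $p$, $p \to 1$

Try for various noise functions

LASSO and flat geometry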
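The $p \to 1$ update moves only the coordinate with the largest $|f_i|$. With a small fixed step $\eta$ this is incremental forward stagewise regression, a method known to be closely related to the LASSO and LARS paths. The NumPy sketch below assumes $\varphi(\beta) = \frac12\|y - X\beta\|^2$, a hypothetical step size `eta`, and toy data of our own.

```python
import numpy as np

def greedy_minkowskian_path(X, y, eta=0.01, n_steps=100):
    """p -> 1 limit of the Minkowskian gradient: move only the coordinate with the
    largest |f_i| by a small fixed step eta (incremental forward stagewise)."""
    beta = np.zeros(X.shape[1])
    path = [beta.copy()]
    for _ in range(n_steps):
        f = X.T @ (X @ beta - y)            # f_i = partial_i phi for phi = 0.5*||y - X beta||^2
        i = int(np.argmax(np.abs(f)))       # i* = argmax_i |f_i|
        if abs(f[i]) < 1e-12:               # already at a stationary point
            break
        beta[i] -= eta * np.sign(f[i])      # beta_{t+1} = beta_t - eta * grad~F
        path.append(beta.copy())
    return np.array(path)

X = np.eye(3)
y = np.array([3.0, -0.5, 1.0])
path = greedy_minkowskian_path(X, y)
print(path[-1])   # only the first coordinate has moved so far
```

Because each step touches one coordinate, the early part of the path stays sparse: after 100 steps only $\beta_1$ is nonzero in this example.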

Page 120

Extended LARS ($p = 1$) and Minkowskian gradient

$\|a\|_p^p = \sum_i |a_i|^p$

$\max_a f\cdot a$ under $\|a\|_p = 1$

$p = 1$ ($L_1$-norm): $a_i = \mathrm{sgn}(f_i)$ if $|f_i| = \max_j |f_j|$, and $a_i = 0$ otherwise

Page 121

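$i^* = \arg\max_i |f_i|$, i.e. $|f_{i^*}| = \max_j |f_j|$

$\tilde\nabla_i F = \mathrm{sgn}(f_i)$ for $i = i^*$, and $0$ otherwise

$\beta_{t+1} = \beta_t - \eta\,\tilde\nabla F(\beta_t)$: LARS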

Page 122

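$c$-trajectory and $\lambda$-trajectory

Ex. 1-dim: $\varphi(\beta) = \frac{1}{2}(\beta - \beta^*)^2$

$f(\beta) = \frac{1}{2}(\beta - \beta^*)^2 + \lambda F(\beta)$, $F(\beta) = 2|\beta|^{1/2}$

$L_{1/2}$ constraint: non-convex optimization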
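The one-dimensional example already shows the non-continuous $\lambda$-trajectory. The sketch below minimizes $f(\beta) = \frac12(\beta - \beta^*)^2 + \lambda\cdot 2|\beta|^{1/2}$ by grid search. It assumes $F(\beta) = 2|\beta|^{1/2}$ as written above, takes $\beta^* = 1$, and uses a grid search of our own rather than a closed-form thresholding rule.

```python
import math

def argmin_half_penalty(beta_star, lam, lo=-0.5, hi=1.5, steps=2000):
    """Grid-search minimizer of f(beta) = 0.5*(beta - beta*)^2 + lam * 2*sqrt(|beta|)."""
    best_val, best_beta = math.inf, None
    for i in range(steps + 1):
        beta = lo + (hi - lo) * i / steps
        val = 0.5 * (beta - beta_star) ** 2 + lam * 2.0 * math.sqrt(abs(beta))
        if val < best_val:
            best_val, best_beta = val, beta
    return best_beta

# lambda-trajectory for beta* = 1: the minimizer shrinks, then jumps to 0.
for lam in (0.01, 0.1, 0.2, 0.3, 0.5, 1.0):
    print(lam, argmin_half_penalty(1.0, lam))
```

In this grid search the minimizer is still about 0.77 at $\lambda = 0.2$ but is exactly 0 at $\lambda = 0.3$. It never passes through the values in between, which is the jump that the convex $p \ge 1$ case does not have.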

Page 123

$P_c$: $\min \varphi(\beta)$ under $F(\beta) = c$

$P_\lambda$: $\nabla f(\beta) = 0$

$\beta_c = R_{\lambda_c}(\beta^*)$: Xu Zongben's operator $R$

Page 124

An Example of the Greedy Path

Figure: greedy path in the ($\beta_1$, $\beta_2$) plane

Page 125

LASSO and LARS:

$p > 1$: $\beta_c$ is non-sparse; $p \le 1$: $\beta_c$ is sparse

Minkowskian gradient: $p = 2$, $p = 1$

Page 126

$\nabla_\beta D[\beta_c : \beta^*] + \lambda_c\nabla F(\beta_c) = 0$, $F(\beta_c) = c$

Page 127

$\frac{d\beta_c}{dc} = -\frac{d\lambda_c}{dc}\,(G + \lambda_c H_c)^{-1}\nabla F(\beta_c)$, $H_c = \nabla\nabla F_p(\beta_c)$

Page 128

Solution path $\beta_c$: not continuous, not monotone (jumps)

Page 129

Linear Programming

$\sum_j A_{ij}x_j \ge b_i$, $\max \sum_i c_i x_i$

inner method: $\psi(x) = -\sum_i \log\left(\sum_j A_{ij}x_j - b_i\right)$

Page 130

Convex Cone Programming

P : positive semi-definite matrix

convex potential function

dual geodesic approach

$Ax = b$, $\min\ c\cdot x$

Support vector machine

Page 131

Convex Programming: Inner Method

LP: $Ax \ge b$, $\min\ c\cdot x$

$\psi(x) = -\sum_i \log\left(\sum_j A_{ij}x_j - b_i\right)$: barrier function

Simplex method; inner method
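A compact sketch of an inner (interior-point) method built on the barrier $\psi(x)$: for increasing $t$, minimize $t\,c\cdot x + \psi(x)$ by damped Newton steps, starting from a strictly feasible point. This is the textbook log-barrier path-following scheme in NumPy, not the curvature-based step-size rule discussed on the next slide. The LP instance and the function name are our own.

```python
import numpy as np

def barrier_lp(c, A, b, x0, t0=1.0, mu=10.0, tol=1e-8, max_newton=50):
    """Minimize c.x subject to A x >= b with a log-barrier path-following method.

    For each t, minimize t * c.x + psi(x), psi(x) = -sum_i log(sum_j A_ij x_j - b_i),
    by damped Newton steps, then increase t. x0 must be strictly feasible.
    """
    x = np.asarray(x0, dtype=float)
    m = len(b)

    def barrier_obj(z, t):
        return t * (c @ z) - np.sum(np.log(A @ z - b))

    t = t0
    while m / t > tol:                                   # m / t bounds the duality gap
        for _ in range(max_newton):
            s = A @ x - b                                # slacks, kept strictly positive
            grad = t * c - A.T @ (1.0 / s)
            hess = A.T @ np.diag(1.0 / s**2) @ A
            dx = -np.linalg.solve(hess, grad)
            if -(grad @ dx) / 2 < 1e-10:                 # Newton decrement test
                break
            step = 1.0
            while np.any(A @ (x + step * dx) - b <= 0):  # stay strictly inside the feasible region
                step *= 0.5
            while (barrier_obj(x + step * dx, t) > barrier_obj(x, t) + 0.25 * step * (grad @ dx)
                   and step > 1e-12):
                step *= 0.5                              # backtracking line search
            x = x + step * dx
        t *= mu
    return x

# minimize x1 + x2  s.t.  x1 >= 0, x2 >= 0, x1 + 2 x2 >= 2, 2 x1 + x2 >= 2
c = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0], [2.0, 1.0]])
b = np.array([0.0, 0.0, 2.0, 2.0])
x_opt = barrier_lp(c, A, b, x0=[2.0, 2.0])
print(x_opt)   # close to the vertex (2/3, 2/3)
```

The iterates stay in the interior and approach the optimal vertex $(2/3, 2/3)$ along the central path, rather than walking along edges as the simplex method does.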

Page 132

Polynomial-Time Algorithm

curvature $H_m^2$: step size

$x_t = \arg\min_x \{t\,c\cdot x + \psi(x)\}$: geodesic

Page 133

Integration of evidences: $x_1, x_2, \dots, x_m$

arithmetic mean, geometric mean, harmonic mean, α-mean

Page 134

Generalized mean: $f$-mean

$f(u)$: monotone; $f$-representation of $u$

$m_f(a, b) = f^{-1}\left\{\frac{f(a) + f(b)}{2}\right\}$

scale free: $m_f(ca, cb) = c\,m_f(a, b)$

α-representation: $f_\alpha(u) = u^{\frac{1-\alpha}{2}}$ ($\alpha \neq 1$), $f_\alpha(u) = \log u$ ($\alpha = 1$)
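A plain-Python sketch of the $f$-mean and of the scale-free property (helper names are ours). The α-representation gives a mean that scales with its arguments, while an arbitrary monotone $f$, here $f(u) = e^u$, does not.

```python
import math

def f_mean(a, b, f, f_inv):
    """Generalized f-mean m_f(a, b) = f^{-1}((f(a) + f(b)) / 2) for a monotone f."""
    return f_inv((f(a) + f(b)) / 2.0)

def alpha_rep(alpha):
    """alpha-representation f_alpha and its inverse (for u > 0)."""
    if alpha == 1:
        return math.log, math.exp
    r = (1.0 - alpha) / 2.0
    return (lambda u: u ** r), (lambda v: v ** (1.0 / r))

f, f_inv = alpha_rep(0.0)                          # f(u) = sqrt(u)
print(f_mean(2.0, 8.0, f, f_inv))                  # ((sqrt 2 + sqrt 8)/2)^2 = 4.5
print(f_mean(20.0, 80.0, f, f_inv))                # scale free: 10 times the value above

# A monotone f that is neither a power nor log: exp breaks scale freeness.
print(f_mean(1.0, 2.0, math.exp, math.log) * 2, f_mean(2.0, 4.0, math.exp, math.log))
```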

Page 135

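α-mean:

$\alpha = -1$: $m(a, b) = \frac{a + b}{2}$

$\alpha = 0$: $m(a, b) = \frac{1}{4}\left(a + b + 2\sqrt{ab}\right) = \left(\frac{\sqrt{a} + \sqrt{b}}{2}\right)^2$

$\alpha = 1$: $m(a, b) = \sqrt{ab}$

$\alpha = 3$: $m(a, b) = \frac{2ab}{a + b}$

$\alpha = \infty$: $m(a, b) = \min(a, b)$; $\alpha = -\infty$: $m(a, b) = \max(a, b)$

α-mean of two distributions: $m_\alpha(p_1(s), p_2(s))$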
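The special cases above follow from a single formula, $m_\alpha(a, b) = \left(\frac{a^{r} + b^{r}}{2}\right)^{1/r}$ with $r = \frac{1-\alpha}{2}$. The plain-Python sketch below (function name ours) evaluates them for $a = 1$, $b = 4$, handling $\alpha = 1$ and $\alpha = \pm\infty$ as the limiting cases.

```python
import math

def alpha_mean(a, b, alpha):
    """alpha-mean of a, b > 0 via the alpha-representation; +/-inf give the limits."""
    if alpha == math.inf:
        return min(a, b)
    if alpha == -math.inf:
        return max(a, b)
    if alpha == 1:
        return math.sqrt(a * b)                      # log representation -> geometric mean
    r = (1.0 - alpha) / 2.0
    return ((a ** r + b ** r) / 2.0) ** (1.0 / r)

a, b = 1.0, 4.0
for alpha in (-math.inf, -1, 0, 1, 3, math.inf):
    print(alpha, alpha_mean(a, b, alpha))
```

The values decrease monotonically in α, from max (4) through arithmetic (2.5), α = 0 (2.25), geometric (2) and harmonic (1.6) to min (1).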

Page 136

Various Means

arithmetic: $\frac{a + b}{2}$

geometric: $\sqrt{ab}$

harmonic: $\frac{2}{\frac{1}{a} + \frac{1}{b}}$

Any other mean?

Page 137

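Family of Distributions $p_1(s), \dots, p_k(s)$

mixture family: $p_{\mathrm{mix}}(s; t) = \sum_{i=1}^{k} t_i\,p_i(s)$, $\sum_i t_i = 1$

exponential family: $\log p_{\exp}(s; t) = \sum_i t_i \log p_i(s) - \psi(t)$, with $\psi(t)$ the normalizing term

$f$-representation: $p(x; t) = f^{-1}\left\{\sum_i t_i f(p_i(x))\right\}$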
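For discrete distributions both families can be computed directly. The plain-Python sketch below (helper names and the two example distributions are our own) shows that with $t = (1/2, 1/2)$ the mixture family gives the arithmetic mean of $p_1, p_2$ and the exponential family gives their normalized geometric mean. These are the $\alpha = -1$ and $\alpha = 1$ means applied pointwise to $p_1(s), p_2(s)$.

```python
import math

def mixture(ps, t):
    """Mixture family p_mix(s; t) = sum_i t_i p_i(s), with sum_i t_i = 1."""
    return [sum(ti * p[s] for ti, p in zip(t, ps)) for s in range(len(ps[0]))]

def exponential(ps, t):
    """Exponential family log p(s; t) = sum_i t_i log p_i(s) - psi(t), psi normalizing."""
    logits = [sum(ti * math.log(p[s]) for ti, p in zip(t, ps)) for s in range(len(ps[0]))]
    psi = math.log(sum(math.exp(v) for v in logits))
    return [math.exp(v - psi) for v in logits]

p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.2, 0.7]
t = [0.5, 0.5]
print(mixture([p1, p2], t))        # arithmetic mean of the two distributions
print(exponential([p1, p2], t))    # normalized geometric mean
```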

Page 138

α‐geodesic projection

robust estimator

