Statistical Neurodynamics of Deep Networks
Shun‐ichi Amari
RIKEN Brain Science Institute
Statistical Neurodynamics
Rozonoer (1969)
Amari (1971; 197…)
Amari et al (2013)
Toyoizumi et al (2015)
Poole, …, Ganguli (2016)
$w_{ij} \sim N(0, 1)$
Macroscopic behaviors common to almost all (typical) networks
Macroscopic variables
activity: $A = \frac{1}{n} \sum_i x_i^2$
distance: $D = D[\mathbf{x} : \mathbf{x}']$
curvature
$A_{l+1} = F(A_l)$
$D_{l+1} = K(D_l)$
Deep Networks
$x_i^{l} = \varphi\left(\sum_j w_{ij}^{l}\, x_j^{l-1} + w_{i0}^{l}\right)$
$A_l = \frac{1}{n_l} \sum_i \left(x_i^{l}\right)^2$
$A_{l+1} = F(A_l)$
$w_{ij} \sim N(0, 1/n)$
$\varphi'(0) = \text{const}$
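The claim that macroscopic variables behave the same way in almost all random networks can be checked numerically. The following is a minimal sketch, assuming $\varphi = \tanh$, equal layer widths, weights drawn from $N(0, 1/n)$, and no bias term; the function name `forward_activities` is hypothetical and not from the slides.

```python
import numpy as np

def forward_activities(x0, depth, phi=np.tanh, seed=0):
    """Propagate x0 through `depth` random layers; return A_l = mean(x_l**2) per layer."""
    rng = np.random.default_rng(seed)
    n = x0.size
    x = np.asarray(x0, dtype=float)
    activities = [float(np.mean(x**2))]
    for _ in range(depth):
        # Assumption: w_ij ~ N(0, 1/n), i.e. standard deviation 1/sqrt(n); bias omitted.
        W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
        x = phi(W @ x)
        activities.append(float(np.mean(x**2)))
    return activities

if __name__ == "__main__":
    x0 = np.random.default_rng(0).normal(size=500)
    # Two independently drawn networks, same input.
    print(forward_activities(x0, depth=4, seed=1))
    print(forward_activities(x0, depth=4, seed=2))
```

For wide layers the two printed sequences should be close to each other, which is the sense in which the activity is a macroscopic variable.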
Pullback Metric
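$ds^2 = \sum_{a,b} g_{ab}^{l}\, dx^a dx^b = \frac{1}{n_l}\, d\mathbf{x}^{l} \cdot d\mathbf{x}^{l}$
$g_{ab}^{l} = \frac{1}{n_l}\, \mathbf{e}_a^{l} \cdot \mathbf{e}_b^{l}$
Poole et al (2016): Deep neural networks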
Dynamics of Activity
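$\tilde{y} = \varphi\left(\sum_k w_k y_k\right) = \varphi(u)$, $u \sim N(0, A_l)$
$A_{l+1} = \frac{1}{n} \sum_i \tilde{y}_i^2 = E[\varphi(u)^2] = \chi_0(A_l)$
$\chi_0(A) = \int \varphi(\sqrt{A}\, v)^2\, Dv$, $v \sim N(0, 1)$
$A_l$ converges to a fixed point $\bar{A} = \chi_0(\bar{A})$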
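The activity recursion can be iterated numerically. The sketch below assumes the map has the form $A_{l+1} = \int \varphi(\sqrt{A_l}\, v)^2\, Dv$ with $Dv$ the standard Gaussian measure, takes $\varphi = \tanh$ as an illustrative choice, and evaluates the integral by Gauss-Hermite quadrature. The names `chi0` and `iterate_activity` are hypothetical.

```python
import numpy as np

# Probabilists' Gauss-Hermite rule: nodes/weights for the weight function exp(-v^2 / 2).
_NODES, _WEIGHTS = np.polynomial.hermite_e.hermegauss(80)
_WEIGHTS = _WEIGHTS / np.sqrt(2.0 * np.pi)  # normalize so the weights sum to 1 (N(0,1) density)

def chi0(A, phi=np.tanh):
    """Approximate E[phi(sqrt(A) v)^2] for v ~ N(0, 1)."""
    return float(np.sum(_WEIGHTS * phi(np.sqrt(A) * _NODES) ** 2))

def iterate_activity(A0, depth, phi=np.tanh):
    """Return [A_0, A_1, ..., A_depth] under A_{l+1} = chi0(A_l)."""
    A = [float(A0)]
    for _ in range(depth):
        A.append(chi0(A[-1], phi))
    return A

if __name__ == "__main__":
    print(iterate_activity(1.0, depth=10))
    # A steeper activation, here tanh(2u) as an arbitrary example, changes the limit.
    print(iterate_activity(1.0, depth=10, phi=lambda u: np.tanh(2.0 * u)))
```

Because $\tanh(u)^2 < u^2$ for $u \neq 0$, the first sequence decreases monotonically under these assumptions; other activations or weight scales can give a nonzero fixed point.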
Dynamics of Metric
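$d\tilde{\mathbf{y}} = B\, d\mathbf{y}$, $B = (B_{kj})$, $B_{kj} = \varphi'(u_k)\, w_{kj}$
$\tilde{\mathbf{e}}_a = B\, \mathbf{e}_a$
$\tilde{g}_{ab} = \frac{1}{n} \sum_k \sum_{j,j'} B_{kj} B_{kj'}\, e_a^{j} e_b^{j'}$
mean field approximation: $E[\varphi'(u_k)^2\, w_{kj} w_{kj'}] = E[\varphi'(u_k)^2]\, E[w_{kj} w_{kj'}]$
$\chi_1(A) = \int \varphi'(\sqrt{A}\, v)^2\, Dv$
$g_{ab}^{l+1} = \chi_1(A_l)\, g_{ab}^{l}$: conformal transformation!
$g_{ab}^{l} \approx \chi_1(\bar{A})^{l}\, g_{ab}^{0}$
rotation, expansion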
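The conformal scaling of the metric can be checked with a single random layer. This is a sketch under stated assumptions: $\varphi = \tanh$, weights $w_{kj} \sim N(0, 1/n)$, and a scaling factor of the form $\int \varphi'(\sqrt{A}\, v)^2\, Dv$. The names `chi1` and `metric_gain_empirical` are hypothetical.

```python
import numpy as np

def chi1(A, deg=80):
    """Approximate E[phi'(sqrt(A) v)^2] for phi = tanh, v ~ N(0, 1), by Gauss-Hermite quadrature."""
    v, w = np.polynomial.hermite_e.hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)
    dphi = 1.0 - np.tanh(np.sqrt(A) * v) ** 2  # derivative of tanh
    return float(np.sum(w * dphi**2))

def metric_gain_empirical(n=2000, A=1.0, seed=0):
    """Ratio of squared lengths (per unit) of a tangent vector after and before one random layer."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n)
    y *= np.sqrt(A / np.mean(y**2))           # input signal with activity exactly A
    e = rng.normal(size=n)                    # an arbitrary tangent vector at y
    W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
    u = W @ y
    e_next = (1.0 - np.tanh(u) ** 2) * (W @ e)  # B e with B_kj = phi'(u_k) w_kj
    return float(np.mean(e_next**2) / np.mean(e**2))

if __name__ == "__main__":
    print("empirical gain:", metric_gain_empirical())
    print("mean-field gain:", chi1(1.0))
```

The two printed numbers should agree up to finite-width fluctuations, independently of the direction of the tangent vector, which is what makes the map conformal in the mean-field picture.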
Dynamics of Curvature
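$H_{ab} = \nabla_a \mathbf{e}_b$
$\tilde{H}_{ab} = \varphi''(u)\,(\mathbf{w} \cdot \mathbf{e}_a)(\mathbf{w} \cdot \mathbf{e}_b) + \varphi'(u)\, \mathbf{w} \cdot \nabla_a \mathbf{e}_b$
$\chi_2(A) = \int \varphi''(\sqrt{A}\, v)^2\, Dv$
$|H_{ab}^{l+1}|^2 = \chi_2(A_l)\,(2\delta_{ab} + 1)\,(g^{l})^2 + \chi_1(A_l)\,|H_{ab}^{l}|^2$, with $g_{ab}^{l} = g^{l}\, \delta_{ab}$ and $\chi_1(A) = \int \varphi'(\sqrt{A}\, v)^2\, Dv$
exponential expansion!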
Dynamics of Distance (Amari, 1974)
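$D(\mathbf{x}, \mathbf{x}') = \frac{1}{n} \sum_i (x_i - x_i')^2$
$C(\mathbf{x}, \mathbf{x}') = \frac{1}{n}\, \mathbf{x} \cdot \mathbf{x}' = \frac{1}{n} \sum_i x_i x_i'$
$D = A + A' - 2C$
$u = \sum_k w_k y_k$, $u' = \sum_k w_k y_k'$, $(u, u') \sim N(0, V)$, $V = \begin{pmatrix} A & C \\ C & A \end{pmatrix}$
$\tilde{C} = E[\varphi(u)\, \varphi(u')] = E[\varphi(\sqrt{A - C}\, \varepsilon + \sqrt{C}\, \nu)\, \varphi(\sqrt{A - C}\, \varepsilon' + \sqrt{C}\, \nu)]$, $\varepsilon, \varepsilon', \nu \sim N(0, 1)$ independent
$D_{l+1} = K(D_l)$
$\left. \frac{dD_{l+1}}{dD_l} \right|_{D_l = 0} = \chi_1(\bar{A})$, with $\chi_1(A) = \int \varphi'(\sqrt{A}\, v)^2\, Dv$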
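The one-layer distance map can be evaluated numerically from the correlation $C$. The sketch below assumes two inputs with equal activity $A$ and overlap $0 \le C \le A$, writes $u = \sqrt{A - C}\, \varepsilon + \sqrt{C}\, \nu$ and $u' = \sqrt{A - C}\, \varepsilon' + \sqrt{C}\, \nu$ with independent standard Gaussians, and uses $\varphi = \tanh$ as an illustrative activation. Since $\varepsilon$ and $\varepsilon'$ are independent given $\nu$, the expectation reduces to a two-dimensional quadrature. The function names are hypothetical.

```python
import numpy as np

# Gauss-Hermite rule normalized to the N(0, 1) density.
NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def next_correlation(A, C, phi=np.tanh):
    """C~ = E[phi(u) phi(u')] with Var(u) = Var(u') = A and Cov(u, u') = C, 0 <= C <= A."""
    # inner[i] = E_eps[ phi(sqrt(A - C) * eps + sqrt(C) * nu_i) ]; rows index nu, columns index eps.
    grid = np.sqrt(A - C) * NODES[None, :] + np.sqrt(C) * NODES[:, None]
    inner = phi(grid) @ WEIGHTS
    # eps and eps' are independent given nu, so the product factorizes into inner**2.
    return float(np.sum(WEIGHTS * inner**2))

def next_distance(A, C, phi=np.tanh):
    """D~ = A~ + A~' - 2 C~ for two inputs of equal activity A."""
    A_next = float(np.sum(WEIGHTS * phi(np.sqrt(A) * NODES) ** 2))
    return 2.0 * A_next - 2.0 * next_correlation(A, C, phi)

if __name__ == "__main__":
    A = 1.0
    for C in (1.0, 0.9, 0.5, 0.0):
        D = 2.0 * A - 2.0 * C
        print(f"D = {D:.2f} -> next D = {next_distance(A, C):.4f}")
```

Iterating this map together with the activity recursion gives the layer-to-layer trajectory of the distance.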
Poole et al (2016): Deep neural networks
Problem!
$D_l = D(\mathbf{x}^{l}, \mathbf{x}'^{l})$
equidistance property: $\bar{D} = K(\bar{D})$
Shattering
Multiplicity
Dynamics of recurrent net
Dropout and backprop
Multilayer Perceptrons
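$y = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$f(\mathbf{x}, \theta) = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m; v_1, \ldots, v_m)$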
Multilayer Perceptron
$y = f(\mathbf{x}, \theta) = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m; v_1, \ldots, v_m)$
neuromanifold
space of functions S
singularities
Geometry of singular model
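$y = v\, \varphi(\mathbf{w} \cdot \mathbf{x}) + n$
$v\, |\mathbf{w}| = 0$
Natural Gradient Stochastic Descent
$\theta_{t+1} = \theta_t - \eta_t\, G^{-1}(\theta_t)\, \nabla l(\mathbf{x}_t, y_t; \theta_t)$
$G = E[\nabla l\, \nabla l^{\top}]$: Fisher Information Matrix
invariant; steepest descent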
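A natural gradient step preconditions the ordinary gradient with the inverse Fisher information matrix. The sketch below is one possible illustration, not the slides' implementation: it estimates $G$ by the average outer product of per-sample loss gradients (the empirical Fisher) and adds a small damping term, which is an assumption introduced here because $G$ becomes degenerate near singular regions. The function name and parameters are hypothetical.

```python
import numpy as np

def natural_gradient_step(theta, per_sample_grads, eta=0.1, damping=1e-3):
    """One update theta - eta * G^{-1} g.

    per_sample_grads: array of shape (num_samples, dim), one loss gradient per sample.
    G is estimated as the mean outer product of these gradients (empirical Fisher);
    `damping` adds damping * I so the linear system stays solvable (an assumption).
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    g = grads.mean(axis=0)                      # average gradient
    G = grads.T @ grads / grads.shape[0]        # empirical Fisher estimate
    G = G + damping * np.eye(g.size)
    return np.asarray(theta, dtype=float) - eta * np.linalg.solve(G, g)

if __name__ == "__main__":
    theta = np.zeros(2)
    grads = np.array([[1.0, 0.0], [0.0, 2.0]])
    print(natural_gradient_step(theta, grads, eta=0.1, damping=0.0))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; for large models one would typically use a structured approximation of $G$ instead.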
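model: 2 hidden neurons
$f(\mathbf{x}, \theta) = w_1\, \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2\, \varphi(\mathbf{J}_2 \cdot \mathbf{x})$
$y = f(\mathbf{x}, \theta)$
$\varphi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-t^2/2}\, dt$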
Singular Region in Parameter Space
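$f(\mathbf{x}, \theta) = w_1\, \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2\, \varphi(\mathbf{J}_2 \cdot \mathbf{x})$
$R(w, \mathbf{J}) = \{\mathbf{J}_1 = \mathbf{J}_2 = \mathbf{J},\ w_1 + w_2 = w\}$
$\cup\ \{w_1 = 0,\ w_2 = w,\ \mathbf{J}_2 = \mathbf{J}\}$
$\cup\ \{w_2 = 0,\ w_1 = w,\ \mathbf{J}_1 = \mathbf{J}\}$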
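The defining feature of the singular region is that distinct parameter values give the same input-output function. A minimal sketch of this check for the two-hidden-neuron model $f = w_1 \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2 \varphi(\mathbf{J}_2 \cdot \mathbf{x})$, assuming the Gaussian cumulative distribution function as $\varphi$ (the identity below holds for any activation); the helper names are hypothetical.

```python
import math
import numpy as np

# Assumed activation: standard normal CDF, applied elementwise.
phi = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))

def f(x, w1, J1, w2, J2):
    """Two-hidden-neuron model evaluated on a batch x of shape (num_points, n)."""
    return w1 * phi(x @ J1) + w2 * phi(x @ J2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 3))
    J = rng.normal(size=3)
    other = rng.normal(size=3)   # arbitrary direction; irrelevant when its weight is 0
    w = 1.0
    a = f(x, 0.3, J, 0.7, J)       # J1 = J2 = J, w1 + w2 = w
    b = f(x, 0.0, other, w, J)     # w1 = 0, w2 = w, J2 = J
    c = f(x, w, J, 0.0, other)     # w2 = 0, w1 = w, J1 = J
    print(np.allclose(a, b), np.allclose(a, c))
```

All three parameter settings realize the single-neuron function $w\, \varphi(\mathbf{J} \cdot \mathbf{x})$, so the parameters are not identifiable there.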
Coordinate transformation
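$\mathbf{v} = \frac{w_1 \mathbf{J}_1 + w_2 \mathbf{J}_2}{w_1 + w_2}$, $w = w_1 + w_2$
$\mathbf{u} = \mathbf{J}_2 - \mathbf{J}_1$, $z = \frac{w_2 - w_1}{w_1 + w_2}$
new coordinates: $(\mathbf{v}, w, \mathbf{u}, z)$
Singular Region $R(w, \mathbf{J})$: $\mathbf{u} = 0$ or $z = \pm 1$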
Milnor attractor
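The coordinate change can be written out directly. The sketch below assumes the convention $\mathbf{v} = (w_1 \mathbf{J}_1 + w_2 \mathbf{J}_2)/(w_1 + w_2)$, $w = w_1 + w_2$, $\mathbf{u} = \mathbf{J}_2 - \mathbf{J}_1$, $z = (w_2 - w_1)/(w_1 + w_2)$; the sign of $z$ in particular is an assumption, and the singular region $\mathbf{u} = 0$ or $z = \pm 1$ does not depend on it. The function names are hypothetical.

```python
import numpy as np

def to_vwuz(w1, J1, w2, J2):
    """Map (w1, J1, w2, J2) to (v, w, u, z); requires w1 + w2 != 0."""
    J1 = np.asarray(J1, dtype=float)
    J2 = np.asarray(J2, dtype=float)
    w = w1 + w2
    v = (w1 * J1 + w2 * J2) / w   # weighted mean of the two weight vectors
    u = J2 - J1                   # difference of the two weight vectors
    z = (w2 - w1) / w             # assumed sign convention
    return v, w, u, z

def on_singular_region(u, z, tol=1e-12):
    """True if u = 0 (the two neurons coincide) or z = +/-1 (one output weight vanishes)."""
    return bool(np.linalg.norm(u) < tol or abs(abs(z) - 1.0) < tol)

if __name__ == "__main__":
    J = np.array([1.0, -2.0])
    K = np.array([0.5, 0.5])
    print(to_vwuz(0.3, J, 0.7, J))   # coinciding neurons: u = 0, v = J
    print(to_vwuz(0.0, K, 2.0, J))   # w1 = 0: z = 1, v = J
    print(on_singular_region(*to_vwuz(0.3, K, 0.7, J)[2:]))
```

In these coordinates the three branches of the singular region collapse to the two simple conditions on $\mathbf{u}$ and $z$, with $\mathbf{v}$ and $w$ playing the role of the identifiable single-neuron parameters.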
Topology of singular R
blow-down coordinates $(\xi, \eta, \mathbf{e})$
$\xi = c_1\, (1 - z^2)\, u^2$
$\eta = c_2\, z\, (1 - z^2)\, u^3$
$\mathbf{u} = u\, \mathbf{e}$, $\mathbf{e} \in S^{n-1}$
Dynamic vector fields: Redundant case