
LETTER Communicated by Steven Zucker

Hebbian Learning of Recurrent Connections: A Geometrical Perspective

Mathieu N. Galtier
[email protected]
Olivier D. Faugeras
[email protected]
NeuroMathComp Project Team, INRIA Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France

Paul C. Bressloff
[email protected]
Department of Mathematics, University of Utah, Salt Lake City, UT 84112, U.S.A., and Mathematical Institute, University of Oxford, Oxford OX1 3LB, U.K.

We show how a Hopfield network with modifiable recurrent connections undergoing slow Hebbian learning can extract the underlying geometry of an input space. First, we use a slow and fast analysis to derive an averaged system whose dynamics derives from an energy function and therefore always converges to equilibrium points. The equilibria reflect the correlation structure of the inputs, a global object extracted through local recurrent interactions only. Second, we use numerical methods to illustrate how learning extracts the hidden geometrical structure of the inputs. Indeed, multidimensional scaling methods make it possible to project the final connectivity matrix onto a Euclidean distance matrix in a high-dimensional space, with the neurons labeled by spatial position within this space. The resulting network structure turns out to be roughly convolutional. The residual of the projection defines the nonconvolutional part of the connectivity, which is minimized in the process. Finally, we show how restricting the dimension of the space where the neurons live gives rise to patterns similar to cortical maps. We motivate this using an energy efficiency argument based on wire length minimization. Finally, we show how this approach leads to the emergence of ocular dominance or orientation columns in primary visual cortex via the self-organization of recurrent rather than feedforward connections. In addition, we establish that the nonconvolutional (or long-range) connectivity is patchy and is co-aligned in the case of orientation learning.

Neural Computation 24, 2346–2383 (2012) © 2012 Massachusetts Institute of Technology


1 Introduction

Activity-dependent synaptic plasticity is generally thought to be the basic cellular substrate underlying learning and memory in the brain. Donald Hebb (1949) postulated that learning is based on the correlated activity of synaptically connected neurons: if both neurons A and B are active at the same time, then the synapses from A to B and from B to A should be strengthened proportionally to the product of the activities of A and B. However, as it stands, Hebb's learning rule diverges. Therefore, various modifications of Hebb's rule have been developed, which basically take one of three forms (see Gerstner & Kistler, 2002, and Dayan & Abbott, 2001). First, a decay term can be added to the learning rule so that each synaptic weight is able to "forget" what it previously learned. Second, each synaptic modification can be normalized or projected onto different subspaces. These constraint-based rules may be interpreted as implementing some form of competition for energy between dendrites and axons (for details, see Miller, 1996; Miller & MacKay, 1996; Ooyen, 2001). Third, a sliding threshold mechanism can be added to Hebbian learning. For instance, a postsynaptic threshold rule consists of multiplying the presynaptic activity by the difference between the current postsynaptic activity and its average, which is referred to as covariance learning (see Sejnowski & Tesauro, 1989). Probably the best known of these rules is the BCM rule presented in Bienenstock, Cooper, and Munro (1982). It should be noted that history-based rules can also be defined without changing the qualitative dynamics of the system. Instead of considering the instantaneous value of the neurons' activity, these rules consider its weighted mean over a time window (see Foldiak, 1991; Wallis & Baddeley, 1997). Recent experimental evidence suggests that learning may also depend on the precise timing of action potentials (see Bi & Poo, 2001). Contrary to most Hebbian rules, which detect only correlations, these rules can also encode causal relationships in the patterns of neural activation. However, the mathematical treatment of these spike-timing-dependent rules is much more difficult than that of rate-based ones.

Hebbian-like learning rules have often been studied within the framework of unsupervised feedforward neural networks (see Oja, 1982; Bienenstock et al., 1982; Miller & MacKay, 1996; Dayan & Abbott, 2001). They also form the basis of most weight-based models of cortical development, assuming fixed lateral connectivity (e.g., Mexican hat) and modifiable vertical connections (see the review of Swindale, 1996).¹ In these developmental models, the statistical structure of input correlations provides a mechanism for spontaneously breaking some underlying symmetry of the neuronal receptive fields, leading to the emergence of feature selectivity.

¹Only a few computational studies consider the joint development of lateral and vertical connections, including Bartsch and Van Hemmen (2001) and Miikkulainen, Bednar, Choe, and Sirosh (2005).


When such correlations are combined with fixed intracortical interactions, there is a simultaneous breaking of translation symmetry across the cortex, leading to the formation of a spatially periodic cortical feature map. A related mathematical formulation of cortical map formation has been developed in Takeuchi and Amari (1979) and Bressloff (2005) using the theory of self-organizing neural fields. Although very irregular, the two-dimensional cortical maps observed at a given stage of development can be unfolded in higher dimensions to obtain smoother geometrical structures. Indeed, Bressloff, Cowan, Golubitsky, Thomas, and Wiener (2001) suggested that the network of orientation pinwheels in V1 is a direct product between a circle for orientation preference and a plane for position, based on a modification of the ice cube model of Hubel and Wiesel (1977). From a more abstract geometrical perspective, Petitot (2003) has associated such a structure with a 1-jet space and used this to develop some applications to computer vision. More recently, more complex geometrical structures, such as spheres and hyperbolic surfaces, that incorporate additional stimulus features such as spatial frequency and texture were considered in Bressloff and Cowan (2003) and Chossat and Faugeras (2009), respectively.

In this letter, we show how geometrical structures related to the distribution of inputs can emerge through unsupervised Hebbian learning applied to recurrent connections in a rate-based Hopfield network. Throughout this letter, the inputs are presented as an external nonautonomous forcing to the system and not as an initial condition, as is often the case in Hopfield networks. It has previously been shown that in the case of a single fixed input, there exists an energy function that describes the joint gradient dynamics of the activity and weight variables (see Dong & Hopfield, 1992). This implies that the system converges to an equilibrium during learning. We use averaging theory to generalize the above result to the case of multiple inputs, under the adiabatic assumption that Hebbian learning occurs on a much slower timescale than both the activity dynamics and the sampling of the input distribution. We then show that the equilibrium distribution of weights, when embedded into R^k for a sufficiently large integer k, encodes the geometrical structure of the inputs. Finally, we numerically show that the embedding of the weights in two dimensions (k = 2) gives rise to patterns that are qualitatively similar to experimentally observed cortical maps, with the emergence of feature columns and patchy connectivity. In contrast to standard developmental models, cortical map formation arises via the self-organization of recurrent connections rather than feedforward connections from an input layer. Although the mathematical formalism we introduce here could be extended to most of the rate-based Hebbian rules in the literature, we present the theory for Hebbian learning with decay because of the simplicity of the resulting dynamics.

The use of geometrical objects to describe the emergence of connectivity patterns has previously been proposed by Amari in a different context.


Based on the theory of information geometry, Amari considers the geometry of the set of all networks and defines learning as a trajectory on this manifold, for perceptron networks in the framework of supervised learning (see Amari, 1998) and for unsupervised Boltzmann machines (see Amari, Kurata, & Nagaoka, 1992). He uses differential and Riemannian geometry to describe an object that is at a larger scale than the cortical maps considered here. Moreover, Zucker and colleagues are currently developing a nonlinear dimensionality-reduction approach to characterize the statistics of natural visual stimuli (see Zucker, Lawlor, & Holtmann-Rice, 2011; Coifman, Maggioni, Zucker, & Kevrekidis, 2005). Although they do not use learning neural networks and stay closer to the field of computer vision than this letter does, their approach turns out to be similar to the geometrical embedding approach we use here.

The structure of the letter is as follows. In section 2, we apply mathematical methods to analyze the behavior of a rate-based learning network. We first introduce a nonautonomous model, which is then averaged in a second step. This allows us to study the stability of the learning dynamics in the presence of multiple inputs by constructing an appropriate energy function. In section 3, we determine the geometrical structure of the equilibrium weight distribution and show how it reflects the structure of the inputs. We also relate this approach to the emergence of cortical maps. Finally, the results are discussed in section 4.

2 Analytical Treatment of a Hopfield-Type Learning Network

2.1 Model

2.1.1 Neural Network Evolution. A neural mass corresponds to a mesoscopic coherent group of neurons. It is convenient to consider neural masses as building blocks for computational simplicity; for their direct relationship to macroscopic measurements of the brain (EEG, MEG, and optical imaging), which average over numerous neurons; and because one can functionally define coherent groups of neurons within cortical columns. For each neural mass i ∈ {1, …, N} (which, with a slight abuse of terminology, we refer to as a neuron in the following), define the mean membrane potential V_i(t) at time t. The instantaneous population firing rate ν_i(t) is linked to the membrane potential through the relation ν_i(t) = s(V_i(t)), where s is a smooth sigmoid function. In the following, we choose

s(v) = S_m / (1 + exp(−4 S′_m (v − φ))),   (2.1)

where S_m, S′_m, and φ are, respectively, the maximal firing rate, the maximal slope, and the offset of the sigmoid.


Consider a Hopfield network of neural masses described by

dV_i/dt (t) = −α V_i(t) + ∑_{j=1}^{N} W_ij(t) s(V_j(t)) + I_i(t).   (2.2)

The first term roughly corresponds to the intrinsic dynamics of the neural mass: it decays exponentially to zero at a rate α if it receives neither external inputs nor spikes from the other neural masses. We will fix the units of time by setting α = 1. The second term corresponds to the rest of the network, sending information through spikes to the given neural mass i, with W_ij(t) the effective synaptic weight from neural mass j. The synaptic weights are time dependent because they evolve according to a continuous-time Hebbian learning rule (see below). The third term, I_i(t), corresponds to an external input to neural mass i, such as information extracted by the retina or thalamocortical connections. We take the inputs to be piecewise constant in time; at regular time intervals, a new input is presented to the network. In this letter, we assume that the inputs are chosen by periodically cycling through a given set of M inputs. An alternative approach would be to randomly select each input from a given probability distribution (see Geman, 1979). It is convenient to introduce vector notation by representing the time-dependent membrane potentials by V ∈ C^1(R_+, R^N), the time-dependent external inputs by I ∈ C^0(R_+, R^N), and the time-dependent network weight matrix by W ∈ C^1(R_+, R^{N×N}). We can then rewrite the above system of ordinary differential equations as a single vector-valued equation,

dV/dt = −V + W · S(V) + I,   (2.3)

where S : R^N → R^N corresponds to the term-by-term application of the sigmoid s: S(V)_i = s(V_i).
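As a minimal illustration of equations 2.1 and 2.3, the following Python sketch evaluates the right-hand side of the network dynamics for a small random network; the parameter values (S_m, S′_m, φ) and the toy network are illustrative assumptions, not the settings of the letter.

```python
# A minimal sketch of the rate dynamics of equation 2.3 with the sigmoid 2.1.
# The parameter values (Sm, Sm_prime, phi) and the toy network are assumptions.
import numpy as np

Sm, Sm_prime, phi = 1.0, 1.0, 1.0     # maximal rate, maximal slope, offset (assumed)

def S(V):
    """Componentwise sigmoid of equation 2.1."""
    return Sm / (1.0 + np.exp(-4.0 * Sm_prime * (V - phi)))

def dV_dt(V, W, I):
    """Right-hand side of equation 2.3 with alpha = 1."""
    return -V + W @ S(V) + I

# Example: N = 5 neurons with random weights and a constant input.
rng = np.random.default_rng(0)
N = 5
W = 0.1 * rng.standard_normal((N, N))
I = rng.uniform(0.0, 1.0, N)
V = np.zeros(N)
print(dV_dt(V, W, I))
```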

2.1.2 Correlation-Based Hebbian Learning. The synaptic weights are assumed to evolve according to a correlation-based Hebbian learning rule of the form

dW_ij/dt = ε ( s(V_i) s(V_j) − μ W_ij ),   (2.4)

where ε is the learning rate, and we have included a decay term in order to stop the weights from diverging. In order to rewrite equation 2.4 in a more compact vector form, we introduce the tensor (or Kronecker) product S(V) ⊗ S(V), so that in component form,

[S(V) ⊗ S(V)]_ij = S(V)_i S(V)_j,   (2.5)


where S is treated as a mapping from R^N to R^N. The tensor product implements Hebb's rule that synaptic modifications involve the product of postsynaptic and presynaptic firing rates. We can then rewrite the combined voltage and weight dynamics as the following nonautonomous (due to the time-dependent inputs) dynamical system:

Σ :   dV/dt = −V + W · S(V) + I,
      dW/dt = ε ( S(V) ⊗ S(V) − μ W ).   (2.6)

Let us make a few remarks about the existence and uniqueness of solutions. First, boundedness of S implies boundedness of the system Σ. More precisely, if I is bounded, the solutions are bounded. To prove this, note that the right-hand side of the equation for W is the sum of a bounded term and a linear decay term in W. Therefore, W is bounded, and, hence, the term W · S(V) is also bounded. The same reasoning applies to V. S being Lipschitz continuous implies that the right-hand side of the system is Lipschitz. This is sufficient to prove the existence and uniqueness of the solution by applying the Cauchy-Lipschitz theorem. In the following, we derive an averaged autonomous dynamical system Σ′, which will be well defined for the same reasons.

2.2 Averaging the System. System Σ is a nonautonomous system that is difficult to analyze because the inputs are periodically changing. It has already been studied in the case of a single input (see Dong & Hopfield, 1992), but it remains to be analyzed in the case of multiple inputs. We show in section A.1 that this system can be approximated by an autonomous Cauchy problem, which is much more convenient to handle. This averaging method exploits the multiple timescales present in the system.

Indeed, it is natural to consider that learning occurs on a much slower timescale than the evolution of the membrane potentials (as determined by α):

ε ≪ 1.   (2.7)

Second, an additional timescale arises from the rate at which the inputs are sampled by the network. That is, the network cycles periodically through M fixed inputs, with the period of cycling given by T. It follows that I is T-periodic and piecewise constant. We assume that the sampling rate is also much slower than the evolution of the membrane potentials:

M/T ≪ 1.   (2.8)


Finally, we assume that the period T is small compared to the timescale of the learning dynamics,

ε ≪ 1/T.   (2.9)

In section A.1, we simplify the system Σ by applying Tikhonov's theorem for slow and fast systems and then classical averaging methods for periodic systems. This leads to the definition of another system Σ′, which is a good approximation of Σ in the asymptotic regime.

In order to define the averaged system Σ′, we need to introduce some additional notation. Let us label the M inputs by I^(a), a = 1, …, M, and denote by V^(a) the fixed-point solution of the equation V^(a) = W · S(V^(a)) + I^(a). If we now introduce the N × M matrices V and I with components V_ia = V_i^(a) and I_ia = I_i^(a), then we define

Σ′ :   dV/dt = −V + W · S(V) + I,
       dW/dt = ε ( (1/M) S(V) · S(V)^T − μ W ).   (2.10)

To illustrate this approximation, we simulate a simple network with both the exact (i.e., Σ) and averaged (i.e., Σ′) evolution equations. For these simulations, the network consists of N = 10 fully connected neurons and is presented with M = 10 different random inputs taken uniformly in the interval [0, 1]^N. For this simulation, we use s(x) = 1/(1 + e^{−4(x−1)}) and μ = 10. Figure 1 (left) shows the percentage of error between the final connectivities for different values of ε and T/M. Figure 1 (right) shows the temporal evolution of the norm of the connectivity for both the exact and averaged systems for T = 10³ and ε = 10⁻³.

In the remainder of the letter, we focus on the system Σ′, whose solutions are close to those of the original system Σ provided condition A.2 in the appendix is satisfied, that is, the network is weakly connected. Finally, note that it is straightforward to extend our approach to time-functional rules (e.g., the sliding threshold or BCM rules described in Bienenstock et al., 1982), which, in this new framework, would be approximated by simple ordinary differential equations (as opposed to time-functional differential equations) provided S is redefined appropriately.
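For readers who want to reproduce the kind of comparison shown in Figure 1, the following sketch integrates the averaged system Σ′ of equation 2.10 by forward Euler; the step size, number of steps, and random inputs are assumptions chosen for illustration rather than the exact settings of the letter.

```python
# A sketch of integrating the averaged system Sigma' (equation 2.10) by forward
# Euler. The step size, number of steps, and random inputs are assumptions.
import numpy as np

def S(V, Sm=1.0, Sm_prime=1.0, phi=1.0):
    """Componentwise sigmoid of equation 2.1 (illustrative parameters)."""
    return Sm / (1.0 + np.exp(-4.0 * Sm_prime * (V - phi)))

N, M = 10, 10
eps, mu = 1e-3, 10.0
dt, steps = 0.1, 50_000

rng = np.random.default_rng(1)
I = rng.uniform(0.0, 1.0, (N, M))      # one column per input pattern
V = np.zeros((N, M))                   # averaged activity, one column per input
W = np.zeros((N, N))

for _ in range(steps):
    U = S(V)
    V += dt * (-V + W @ U + I)                  # fast averaged activity dynamics
    W += dt * eps * (U @ U.T / M - mu * W)      # slow Hebbian learning with decay

print(np.linalg.norm(W))                        # norm of the learned connectivity
```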

2.3 Stability

2.3.1 Lyapunov Function. In the case of a single fixed input (M = 1), systems Σ and Σ′ are equivalent and reduce to the neural network with adapting synapses previously analyzed by Dong and Hopfield (1992).


Figure 1: Comparison of exact and averaged systems. (Left) Percentage of error between final connectivities for the exact and averaged systems. (Right) Temporal evolution of the norm of the connectivities of the exact system Σ and averaged system Σ′.

Under the additional constraint that the weights are symmetric (W_ij = W_ji), these authors showed that the simultaneous evolution of the neuronal activity variables and the synaptic weights can be reexpressed as a gradient dynamical system that minimizes a Lyapunov or energy function of state. We can generalize their analysis to the case of multiple inputs (M > 1) and nonsymmetric weights using the averaged system Σ′. That is, following along lines similar to Dong and Hopfield (1992), we introduce the energy function

E(U, W) = −(1/2)⟨U, W · U⟩ − ⟨I, U⟩ + ⟨1, S⁻¹(U)⟩ + (Mμ/2) ‖W‖²,   (2.11)

where U = S(V), ‖W‖² = ⟨W, W⟩ = ∑_{i,j} W_ij²,

⟨U, W · U⟩ = ∑_{a=1}^{M} ∑_{i,j=1}^{N} U_i^(a) W_ij U_j^(a),   ⟨I, U⟩ = ∑_{a=1}^{M} ∑_{i=1}^{N} I_i^(a) U_i^(a),   (2.12)

and

⟨1, S⁻¹(U)⟩ = ∑_{a=1}^{M} ∑_{i=1}^{N} ∫_0^{U_i^(a)} S⁻¹(ξ) dξ.   (2.13)
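A possible implementation of the energy function 2.11 is sketched below; it uses a closed-form antiderivative of S⁻¹ valid for the sigmoid 2.1, and the parameter values are illustrative assumptions.

```python
# A sketch of evaluating the energy function of equation 2.11, using a closed-form
# antiderivative of S^{-1} for the sigmoid 2.1. Parameter values are assumptions.
import numpy as np

Sm, Sm_prime, phi = 1.0, 1.0, 1.0

def S(V):
    return Sm / (1.0 + np.exp(-4.0 * Sm_prime * (V - phi)))

def int_S_inv(u):
    """Elementwise integral of S^{-1} from 0 to u, valid for 0 < u < Sm."""
    return phi * u - (Sm * np.log(Sm) - (Sm - u) * np.log(Sm - u)
                      - u * np.log(u)) / (4.0 * Sm_prime)

def energy(U, W, I, mu):
    """E(U, W) of equation 2.11; U and I are N x M, W is N x N."""
    M = U.shape[1]
    quadratic = -0.5 * np.sum(U * (W @ U))     # -(1/2) <U, W . U>
    drive     = -np.sum(I * U)                 # -<I, U>
    potential = np.sum(int_S_inv(U))           # <1, S^{-1}(U)>
    decay     = 0.5 * M * mu * np.sum(W**2)    # (M mu / 2) ||W||^2
    return quadratic + drive + potential + decay

# Example with random state variables.
rng = np.random.default_rng(2)
N, M, mu = 10, 5, 10.0
V = rng.uniform(0.0, 2.0, (N, M))
W = 0.01 * rng.standard_normal((N, N)); W = 0.5 * (W + W.T)
I = rng.uniform(0.0, 1.0, (N, M))
print(energy(S(V), W, I, mu))
```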


In contrast to Dong and Hopfield (1992), we do not require a priori that the weight matrix is symmetric. However, it can be shown that the system always converges to a symmetric connectivity pattern. More precisely, A = {(V, W) ∈ R^{N×M} × R^{N×N} : W = W^T} is an attractor of the system Σ′. A proof can be found in section A.2. It can then be shown that on A (symmetric weights), E is a Lyapunov function of the dynamical system Σ′, that is,

dE/dt ≤ 0, and dE/dt = 0 ⟹ dY/dt = 0, where Y = (V, W)^T.

The boundedness of E and the Krasovskii-LaSalle invariance principle then imply that the system converges to an equilibrium (see Khalil & Grizzle, 1996). We thus have

Theorem 1. The initial value problem for the system Σ′ with Y(0) ∈ H converges to an equilibrium state.

Proof. See section A.3.

It follows that neither oscillatory nor chaotic attractor dynamics can occur.

2.3.2 Linear Stability. Although we have shown that the dynamics converges to fixed points, not all of the fixed points are stable. However, we can apply a linear stability analysis to the system Σ′ to derive a simple sufficient condition for a fixed point to be stable. The method we use in the proof could be extended to more complex rules.

Theorem 2. The equilibria of system Σ′ satisfy

V* = (1/(μM)) S(V*) · S(V*)^T · S(V*) + I,
W* = (1/(μM)) S(V*) · S(V*)^T,   (2.14)

and a sufficient condition for stability is

3 S′_m ‖W*‖ < 1,   (2.15)

provided 1 > εμ, which is probably the case since ε ≪ 1.

Proof. See section A.4.

This condition is strikingly similar to that derived in Faugeras, Grimbert, and Slotine (2008) (in fact, it is stronger than the contraction condition they find). It says that the network may converge to a weakly connected configuration.


It also says that the dynamics of V is likely (because the condition is only sufficient) to be contracting and therefore subject to no bifurcations: a fully recurrent learning neural network is likely to have simple dynamics.

2.4 Equilibrium Points. It follows from equation 2.14 that the equilibrium weight matrix W* is given by the correlation matrix of the firing rates. Moreover, in the case of sufficiently large inputs, the matrix of equilibrium membrane potentials satisfies V* ≈ I. More precisely, if |S(I_i^(a))| ≪ |I_i^(a)| for all a = 1, …, M and i = 1, …, N, then we can generate an iterative solution for V* of the form

V* = I + (1/μ) S(I) · S(I)^T · S(I) + h.o.t.

If the inputs are comparable in size to the synaptic weights, then there is no explicit solution for V*. If no input is presented to the network (I = 0), then S(0) ≠ 0 implies that the activity is nonzero, that is, there is spontaneous activity. Combining these observations, we see that the network roughly extracts and stores the correlation matrix of the strongest inputs within the weights of the network.
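The equilibrium 2.14 can also be approximated numerically by a naive fixed-point iteration, as in the following sketch; convergence of this particular iteration is an assumption (it holds for the toy parameters below because the effective coupling 1/(μM) is weak).

```python
# A sketch of approximating the equilibrium 2.14 by fixed-point iteration.
# Convergence is an assumption; it holds here because 1/(mu M) is small.
import numpy as np

def S(V, Sm=1.0, Sm_prime=1.0, phi=1.0):
    return Sm / (1.0 + np.exp(-4.0 * Sm_prime * (V - phi)))

rng = np.random.default_rng(3)
N, M, mu = 10, 10, 10.0
I = rng.uniform(0.0, 1.0, (N, M))

V = I.copy()                                  # start from V ~ I (large-input regime)
for _ in range(500):
    U = S(V)
    V = (U @ U.T @ U) / (mu * M) + I          # first line of equation 2.14
W = (S(V) @ S(V).T) / (mu * M)                # equilibrium weights: scaled rate correlations
print(np.allclose(W, W.T), np.linalg.norm(V - I))
```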

3 Geometrical Structure of Equilibrium Points

3.1 From a Symmetric Connectivity Matrix to a Convolutional Network. So far, neurons have been identified by a label i ∈ {1, …, N}; there is no notion of geometry or space in the preceding results. However, as we show below, the inputs may contain a spatial structure that can be encoded by the final connectivity. In this section, we show that the network behaves as a convolutional network on this geometrical structure. The idea is to interpret the final connectivity as a matrix describing the distance between neurons living in a k-dimensional space. This is quite natural since W* is symmetric and has positive coefficients, properties shared with a Euclidean distance matrix. More specifically, we want to find an integer k ∈ N and N points in R^k, denoted by x_i, i ∈ {1, …, N}, so that the connectivity can roughly be written as W*_ij ≃ g(‖x_i − x_j‖²), where g is a positive decreasing real function. If we manage to do so, then the interaction term in system Σ becomes

{W · S(V)}_i ≃ ∑_{j=1}^{N} g(‖x_i − x_j‖²) S(V(x_j)),   (3.1)

where we redefine the variable V as a field such that V(x_j) = V_j. This equation says that the network is convolutional with respect to the variables x_i, i = 1, …, N, and the associated convolution kernel is g(‖x‖²).


In practice, it is not always possible to find a geometry for which the connectivity is a distance matrix. Therefore, we project the appropriate matrix onto the set of Euclidean distance matrices, that is, the set of matrices M such that M_ij = ‖x_i − x_j‖² with x_i ∈ R^k. More precisely, we define D = g⁻¹(W*), where g⁻¹ is applied to the coefficients of W*. We then search for the distance matrix D_⊥ such that ‖D_∥‖² = ‖D − D_⊥‖² is minimal. In this letter, we consider an L2-norm. Although the choice of an Lp-norm will be motivated by the wire length minimization argument in section 3.3, note that the choice of an L2-norm is somewhat arbitrary and corresponds to penalizing long-distance connections. The minimization turns out to be a least squares minimization whose parameters are the x_i ∈ R^k. This can be implemented by a family of methods known as multidimensional scaling, which are reviewed in Borg and Groenen (2005). In particular, we use the stress majorization or SMACOF algorithm for the stress1 cost function throughout this letter. This leads to writing D = D_⊥ + D_∥ and therefore W*_ij = g(D_⊥ij + D_∥ij).
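As an illustration of this projection step, the sketch below maps a weight matrix to a dissimilarity matrix D = g⁻¹(W*) for the exponential kernel and embeds it with scikit-learn's SMACOF-based MDS. Treating the entries of D as squared distances (and hence passing their square roots to the solver) and the kernel parameters a, λ are assumptions, not the exact pipeline of the letter.

```python
# A sketch of the embedding step of section 3.1: D = g^{-1}(W*) for the exponential
# kernel, projected to point positions with SMACOF via scikit-learn's MDS.
# The kernel parameters and the squared-distance reading of D are assumptions.
import numpy as np
from sklearn.manifold import MDS

def embed(W_star, a=1.0, lam=1.0, k=2, seed=0):
    D = -lam**2 * np.log(np.clip(W_star / a, 1e-12, None))   # g^{-1} for g(x) = a e^{-x/lam^2}
    np.fill_diagonal(D, 0.0)
    dissim = np.sqrt(np.maximum(D, 0.0))                      # plain distances for SMACOF
    mds = MDS(n_components=k, dissimilarity="precomputed",
              n_init=4, random_state=seed)
    return mds.fit_transform(dissim)                          # N x k positions x_i

# Example: a weight matrix that is already a gaussian function of 1D positions.
y = np.linspace(0.0, 1.0, 30)
W_star = np.exp(-(y[:, None] - y[None, :])**2)
x = embed(W_star, k=2)
print(x.shape)                                                # (30, 2)
```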

We now consider two choices of the function g:

1. If g(x) = a(1 − x/λ²) with a > 1 and λ ∈ R₊, then one can always write

   W*_ij = W*(x_i, x_j) = M(x_i, x_j) + g(‖x_i − x_j‖²)   (3.2)

   such that M(x_i, x_j) = −(a/λ²) D_∥ij.

2. If g(x) = a e^{−x/λ²} with a, λ ∈ R₊, then one can always write

   W*_ij = W*(x_i, x_j) = M(x_i, x_j) g(‖x_i − x_j‖²)   (3.3)

   such that M(x_i, x_j) = e^{−D_∥ij/λ²},

where W* is also redefined as a function over the x_i: W*(x_i, x_j) = W*_ij. For obvious reasons, M is called the nonconvolutional connectivity. It is the role of the multidimensional scaling methods to minimize the role of the undetermined function M in the previous equations, that is, ideally having M ≡ 0 (resp. M ≡ 1) for the first (resp. second) choice above. The ideal case of a fully convolutional connectivity can always be obtained if k is large enough. Indeed, proposition 1 shows that D = g⁻¹(W*) satisfies the triangular inequality for matrices (i.e., D_ij ≤ (√D_ik + √D_kj)²) for both choices of g under some plausible assumptions. Therefore, it has all the properties of a distance matrix (symmetric, positive coefficients, and triangular inequality), and one can find points in R^k such that it is the distance matrix of these points provided k ≤ N − 1 is large enough. In this case, the connectivity on the space defined by these points is fully convolutional; equation 3.1 is exactly verified.
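Once positions are available, the split into a convolutional kernel and a nonconvolutional residual can be computed directly; the sketch below does this for the exponential kernel of equation 3.3, with a and λ fixed by hand rather than fitted as in the letter.

```python
# A sketch of splitting learned weights into a convolutional part g and a
# nonconvolutional residual M, for the exponential kernel of equation 3.3.
# The kernel parameters a and lam are assumed rather than optimized.
import numpy as np

def nonconvolutional_part(W_star, x, a=1.0, lam=1.0):
    """Residual M_ij = W*_ij / g(||x_i - x_j||^2) for g(x) = a exp(-x / lam^2)."""
    sq_dist = np.sum((x[:, None, :] - x[None, :, :])**2, axis=-1)
    g = a * np.exp(-sq_dist / lam**2)
    return W_star / g

# Example: if W* is exactly g of the embedded distances, the residual is M = 1.
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, (50, 2))
sq = np.sum((x[:, None, :] - x[None, :, :])**2, axis=-1)
W_star = np.exp(-sq)
print(np.allclose(nonconvolutional_part(W_star, x), 1.0))
```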

Proposition 1. If the neurons are equally excited on average (i.e., ‖S(V_i)‖ = c ∈ R₊), then:

1. If g(x) = a(1 − x/λ²) with a, λ ∈ R₊, then D = g⁻¹(W*) satisfies the triangular inequality.

2. If g(x) = a e^{−x/λ²} with a, λ ∈ R₊, then D = g⁻¹(W*) satisfies the triangular inequality if the following assumption is satisfied:

   arcsin(S(0)) − arcsin(√(a³) − √(a⁶ − a³)) ≥ π/8.   (3.4)

Proof. See section A.5.

3.2 Unveiling the Geometrical Structure of the Inputs. We hypothesize that the space defined by the x_i reflects the underlying geometrical structure of the inputs. We have not found a way to prove this, so we provide numerical examples that illustrate this claim; the following is therefore only a (numerical) proof of concept. For each example, we feed the network with inputs having a defined geometrical structure and then show how this structure can be extracted from the connectivity by the method outlined in section 3.1. In particular, we assume that the inputs are uniformly distributed over a manifold Ω with fixed geometry. This strong assumption amounts to considering that the feedforward connectivity (which we do not consider here) has already properly filtered the information coming from the sensory organs. More precisely, define the set of inputs by the matrix I ∈ R^{N×M} such that I_i^(a) = f(‖y_i − z_a‖_Ω), where the z_a are uniformly distributed points over Ω, the y_i are the positions on Ω that label the ith neuron, and f is a decreasing function on R₊. The norm ‖·‖_Ω is the natural norm defined over the manifold Ω. For simplicity, assume f(x) = f_σ(x) = A e^{−x²/σ²}, so that the inputs are localized bell-shaped bumps on Ω.

3.2.1 Planar Retinotopy. First, we consider a set of spatial gaussian inputs uniformly distributed over a two-dimensional plane: Ω = [0, 1] × [0, 1]. For simplicity, we take N = M = K² and set z_a = y_i for i = a, a ∈ {1, …, M}. (The numerical results show an identical structure for the final connectivity when the y_j correspond to random points, but the analysis is harder.) In the simpler case of one-dimensional gaussians with N = M = K, the input matrix takes the form I = T_{f_σ}, where T_f is a symmetric Toeplitz matrix:

T_f =
⎛ f(0)    f(1)      f(2)      ···   f(K)     ⎞
⎜ f(1)    f(0)      f(1)      ···   f(K − 1) ⎟
⎜ f(2)    f(1)      f(0)      ···   f(K − 2) ⎟
⎜  ⋮       ⋮         ⋱        ⋱      ⋮      ⎟
⎝ f(K)    f(K − 1)  f(K − 2)   ···   f(0)     ⎠ .   (3.5)


Figure 2: Plot of planar retinotopic inputs on Ω = [0, 1] × [0, 1] (left) and final connectivity matrix of the system Σ′ (right). The parameters used for this simulation are s(x) = 1/(1 + e^{−4(x−1)}), l = 1, μ = 10, N = M = 100, σ = 4. (Left) These inputs correspond to Im = 1.

In the two-dimensional case, we set y = (u, v) ∈ Ω and introduce the labeling y_{k+(l−1)K} = (u_k, v_l) for k, l = 1, …, K. It follows that I_i^(a) ∼ exp(−(u_k − u_{k′})²/σ²) exp(−(v_l − v_{l′})²/σ²) for i = k + (l − 1)K and a = k′ + (l′ − 1)K. Hence, we can write I = T_{f_σ} ⊗ T_{f_σ}, where ⊗ is the Kronecker product; the Kronecker product is responsible for the K × K substructure that can be observed in Figure 2, with K = 10. Note that if we were interested in an n-dimensional retinotopy, then the input matrix could be written as a Kronecker product of n Toeplitz matrices. As previously mentioned, the final connectivity matrix roughly corresponds to the correlation matrix of the input matrix. It turns out that the correlation matrix of I is also a Kronecker product of two Toeplitz matrices generated by a single gaussian (with a different standard deviation). Thus, the connectivity matrix has the same basic form as the input matrix when z_a = y_i for i = a. The inputs and stable equilibrium points of the simulated system are shown in Figure 2. The positions x_i of the neurons after multidimensional scaling are shown in Figure 3 for different parameters. Note that we find no significant change in the positions x_i of the neurons when the convolutional kernel g varies (as will also be shown in section 3.3.1). Thus, we show results for only one of these kernels, g(x) = e^{−x}.

If the standard deviation of the inputs is properly chosen, as in Figure 3b, we observe that the neurons are distributed on a regular grid, which is retinotopically organized. In other words, the network has learned the geometric shape of the inputs. This can also be observed in Figure 3d, which corresponds to the same connectivity matrix as in Figure 3b but is represented in three dimensions.



Figure 3: Positions x_i of the neurons after applying multidimensional scaling to the equilibrium connectivity matrix of a learning network of N = 100 neurons driven by planar retinotopic inputs as described in Figure 2. In all figures, the convolution kernel is g(x) = e^{−x}; this choice has virtually no impact on the shape of the figures. (a) Uniformly sampled inputs with M = 100, σ = 1, Im = 1, and k = 2. (b) Uniformly sampled inputs with M = 100, σ = 4, Im = 1, and k = 2. (c) Uniformly sampled inputs with M = 100, σ = 10, Im = 1, and k = 2. (d) Uniformly sampled inputs with M = 100, σ = 4, Im = 0.2, and k = 2. (e) Same as panel b, but in three dimensions: k = 3. (f) Nonuniformly sampled inputs with M = 150, σ = 4, Im = 1, and k = 2. The first 100 inputs are as in panel b, but 50 more inputs of the same type are presented to half of the visual field. This corresponds to the denser part of the picture.


The neurons self-organize on a two-dimensional saddle shape that accounts for the border distortions that can be observed in two dimensions (which we discuss in the next paragraph). If σ is too large, as can be observed in Figure 3c, the final result is poor. Indeed, the inputs are no longer local and cover most of the visual field. Therefore, the neurons saturate, S(V_i) ≃ S_m, for all the inputs, and no structure can be read in the activity variable. If σ is small, the neurons still seem to self-organize (as long as the inputs are not completely localized on single neurons) but with significant border effects.

There are several reasons why we observe border distortions in Figure 3. We believe the most important is an unequal average excitation of the neurons. Indeed, the neurons corresponding to the border of the "visual field" are less excited than the others. For example, consider a neuron on the left border of the visual field. It has no neighbors on its left and therefore is less likely to be excited by its neighbors and is less excited on average. The consequence is that it is less connected to the rest of the network (see, e.g., the top line of the right side of Figure 2) because the connections depend on the level of excitement through the correlation of the activity. Therefore, it is farther away from the other neurons, which is what we observe. When the inputs are localized, the border neurons are even less excited on average and thus are farther away, as shown in Figure 3a.

Another way to get distortions in the positions x_i is to reduce or increase excessively the amplitude Im = max_{i,a} |I_i^(a)| of the inputs. Indeed, if it is small, the equilibrium activity described by equation 2.14 is also small and likely to lie on the flat part of the sigmoid. In this case, neurons tend to be more homogeneously excited and less sensitive to the particular shape of the inputs. Therefore, the network loses some information about the underlying structure of the inputs. Actually, the neurons become relatively more sensitive to the neighborhood structure of the network, and the border neurons behave differently from the rest of the network, as shown in Figure 3e. The parameter μ has much less impact on the final shape since it corresponds only to a homogeneous scaling of the final connectivity.

So far, we have assumed that the inputs were uniformly spread over the manifold Ω. If this assumption is broken, the final positions of the neurons will be affected. As shown in Figure 3f, where 50 inputs were added to the case of Figure 3b in only half of the visual field, the neurons that code for this area tend to be closer together. Indeed, they tend not to be equally excited on average (as assumed in proposition 1), and a distortion effect occurs. This means that a proper understanding of the role of the vertical connectivity would be needed to complete this geometrical picture of the functioning of the network. This is, however, beyond the scope of this letter.

3.2.2 Toroidal Retinotopy. We now assume that the inputs are uniformly distributed over a two-dimensional torus, Ω = T². That is, the input labels z_a are randomly distributed on the torus. The neuron labels y_i are regularly and uniformly distributed on the torus.


Figure 4: Plot of retinotopic inputs on Ω = T² (left) and the final connectivity matrix (right) for the system Σ′. The parameters used for this simulation are s(x) = 1/(1 + e^{−4(x−1)}), l = 1, μ = 10, N = 1000, M = 10,000, σ = 2.

Figure 5: Positions x_i of the neurons for k = 3 after applying the multidimensional scaling methods presented in section 3.1 to the final connectivity matrix shown in Figure 4.

The inputs and final stable weight matrix of the simulated system are shown in Figure 4. The positions x_i of the neurons after multidimensional scaling for k = 3 are shown in Figure 5 and appear to form a cloud of points distributed on a torus. In contrast to the previous example, there are no distortions now because there are no borders on the torus. In fact, the neurons are equally excited on average in this case, which makes proposition 1 valid.
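The toroidal inputs can be generated as in the following sketch, which uses the wrap-around distance on [0, 1)² as ‖·‖_Ω; the grid size, σ, and amplitude are assumed values.

```python
# A sketch of generating the toroidal inputs of section 3.2.2, using the periodic
# (wrap-around) distance on the flat torus [0, 1)^2. Grid size, sigma, and A are assumed.
import numpy as np

def torus_sq_dist(p, q):
    """Squared geodesic distance on the flat torus [0, 1)^2."""
    d = np.abs(p - q)
    d = np.minimum(d, 1.0 - d)
    return np.sum(d**2, axis=-1)

rng = np.random.default_rng(5)
K, M, sigma, A = 20, 1000, 0.1, 1.0
u = (np.arange(K) + 0.5) / K
y = np.stack(np.meshgrid(u, u), axis=-1).reshape(-1, 2)     # N = K^2 regular neuron labels
z = rng.uniform(0.0, 1.0, (M, 2))                           # random input centers z_a
I = A * np.exp(-torus_sq_dist(y[:, None, :], z[None, :, :]) / sigma**2)   # N x M
print(I.shape)
```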


3.3 Links with Neuroanatomy. The brain is subject to energy constraints, which are completely neglected in the above formulation. These constraints most likely have a significant impact on the positions of real neurons in the brain. Indeed, it seems reasonable to assume that the positions and connections of neurons reflect a trade-off between the energy costs of biological tissue and the need to process information effectively. For instance, it has been suggested that a principle of wire length minimization may operate in the brain (see Swindale, 1996; Chklovskii, Schikorski, & Stevens, 2002). In our neural mass framework, one may consider that the more strongly two neural masses are connected, the larger the number of real axons linking the corresponding neurons. Therefore, minimizing axonal length can be read as: the stronger the connection, the closer the neurons, which is consistent with the convolutional part of the weight matrix. However, the underlying geometry of natural inputs is likely to be very high-dimensional, whereas the brain lives in a three-dimensional world. In fact, the cortex is so flat that it is effectively two-dimensional. Hence, the positions of real neurons are different from the positions x_i ∈ R^k in a high-dimensional vector space; since the cortex is roughly two-dimensional, the positions could be realized physically only if k = 2. Therefore, the three-dimensional toric geometry or any higher-dimensional structure could not be perfectly implemented in the cortex without the help of nonconvolutional long-range connections. Indeed, we suggest that the cortical connectivity is made of two parts: (1) a local convolutional connectivity corresponding to the convolutional term g in equations 3.2 and 3.3, which is consistent with the requirements of energy efficiency, and (2) a nonconvolutional connectivity corresponding to the factor M in equations 3.2 and 3.3, which is required in order to represent various stimulus features. If the cortex were higher-dimensional (k ≫ 2), then there would be no nonconvolutional connectivity M, that is, M ≡ 0 for the linear convolutional kernel or M ≡ 1 for the exponential one.

We illustrate this claim by considering two examples based on the functional anatomy of the primary visual cortex: the emergence of ocular dominance columns and of orientation columns, respectively. We proceed by returning to the case of planar retinotopy (see section 3.2.1) but now with additional input structure. In the first case, the inputs are taken to be binocular and isotropic, whereas in the second case, they are taken to be monocular and anisotropic. The details are presented below. Given a set of prescribed inputs, the network evolves according to equation 2.10, and the lateral connections converge to a stable equilibrium. The resulting weight matrix is then projected onto the set of distance matrices for k = 2 (as described in section 3.2.1) using the stress majorization or SMACOF algorithm for the stress1 cost function, as described in Borg and Groenen (2005). We thus assign a position x_i ∈ R² to the ith neuron, i = 1, …, N. (Note that the position x_i extracted from the weights using multidimensional scaling is distinct from the "physical" position y_i of the neuron in the retinocortical plane; the latter determines the center of its receptive field.)


The convolutional connectivity (g in equations 3.2 and 3.3) is therefore completely defined. On the planar map of points x_i, neurons are isotropically connected to their neighbors; the closer the neurons are, the stronger is their convolutional connection. Moreover, since the stimulus feature preferences (orientation, ocular dominance) of each neuron i, i = 1, …, N, are prescribed, we can superimpose these feature preferences on the planar map of points x_i. In both examples, we find that neurons with the same ocular or orientation selectivity tend to cluster together: interpolating these clusters then generates corresponding feature columns. It is important to emphasize that the retinocortical positions y_i do not have any columnar structure, that is, they do not form clusters with similar feature preferences. Thus, in contrast to standard developmental models of vertical connections, the columnar structure emerges from the recurrent weights after learning, which are interpreted as Euclidean distances. It follows that neurons coding for the same feature tend to be strongly connected; indeed, the multidimensional scaling algorithm has the property that it positions strongly connected neurons close together. Equations 3.2 and 3.3 also suggest that the connectivity has a nonconvolutional part, M, which is a consequence of the low dimensionality (k = 2). In order to illustrate the structure of the nonconvolutional connectivity, we select a neuron i in the plane and draw a link from it at position x_i to the neurons at positions x_j for which M(x_i, x_j) is maximal in Figures 7, 8, and 9. We find that M tends to be patchy; it connects neurons having the same feature preferences. In the case of orientation, M also tends to be coaligned, that is, connecting neurons with similar orientation preference along a vector in the plane of the same orientation.

Note that the proposed method for obtaining cortical maps is artificial: first, the network learns its connectivity, and second, the equilibrium connectivity matrix is used to assign a position to the neurons. In biological tissue, the cortical maps are likely to emerge slowly during learning. Therefore, a more biological way of addressing the emergence of cortical maps might consist in repositioning the neurons at each time step, taking the positions of the neurons at time t as the initial condition of the algorithm at time t + 1. This would correspond better to the real effect of the energetic constraints (leading to wire length minimization), which occur not only after learning but also at each time step. However, when learning is finished, the energetic landscape corresponding to the minimization of wire length would still be the same. In fact, repositioning the neurons successively during learning changes only the local minimum to which the algorithm converges: it corresponds to a different initialization of the neurons' positions. Yet we do not think this corresponds to a more relevant minimum, since the positions of the points at time t = 0 are still randomly chosen. Although not biological, the a posteriori method is not significantly worse than applying the algorithm at each time step and gives an intuition of the shapes of the local minima of the functional leading to the minimization of wire length.



Figure 6: Emergence of ocular dominance columns. Analysis of the equilibrium connectivity of a network of N = 1000 neurons exposed to M = 3000 inputs as described in equation 3.6 with σ′ = 10. The parameters used for this simulation are Im = 1, μ = 10, and s(x) = 1/(1 + e^{−4(x−1)}). (a) Relative density of the network assuming that the "weights" of the left-eye neurons are +1 and the "weights" of the right-eye neurons are −1. Thus, a positive (resp. negative) lobe corresponds to a higher number of left-eye neurons (resp. right-eye neurons), and the presence of oscillations implies the existence of ocular dominance columns. The size of the bin used to compute the density is 5. The blue (resp. green) curve corresponds to γ = 0 (resp. γ = 1). It can be seen that the case γ = 1 exhibits significant oscillations, consistent with the formation of ocular dominance columns. (b) Power spectra of the curves plotted in panel a. The dependence of the density and power spectrum on bin size is shown in panels c and d, respectively. The top pictures correspond to the blue curves (i.e., no binocular disparity), and the bottom pictures correspond to the green curves, γ = 1 (i.e., a higher binocular disparity).
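The density and power-spectrum analysis described in this caption can be reproduced along the lines of the sketch below; the placeholder positions and eye labels stand in for the quantities produced by the learning network, and the bin size is an assumed value.

```python
# A sketch of the analysis described in the caption of Figure 6: assign +1 to
# left-eye and -1 to right-eye neurons, bin these signed labels along the 1D
# embedded positions, and take the power spectrum of the resulting density.
# The data below are placeholders, not outputs of the learning network.
import numpy as np

def ocular_density(x, eye_sign, bin_size=5.0):
    """Signed density of eye preference along the embedded coordinate x."""
    edges = np.arange(x.min(), x.max() + bin_size, bin_size)
    density, _ = np.histogram(x, bins=edges, weights=eye_sign)
    return density

# Placeholder data: alternating left/right stripes along a line.
rng = np.random.default_rng(6)
x = rng.uniform(0.0, 200.0, 1000)                        # embedded 1D positions x_i
eye_sign = np.where((x // 20).astype(int) % 2 == 0, 1.0, -1.0)
rho = ocular_density(x, eye_sign)
power = np.abs(np.fft.rfft(rho))**2                      # oscillations <=> dominance columns
print(len(rho), int(power[1:].argmax()) + 1)             # dominant nonzero frequency bin
```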

3.3.1 Ocular Dominance Columns and Patchy Connectivity. In order to construct binocular inputs, we partition the N neurons into two sets, i ∈ {1, …, N/2} and i ∈ {N/2 + 1, …, N}, that code for the left and right eyes, respectively. The ith neuron is then given a retinocortical position y_i, with the y_i uniformly distributed across the line for Figure 6 and across the plane for Figures 7 and 8. We do not assume a priori that there exist any ocular dominance columns; that is, neurons with similar retinocortical positions y_i do not form clusters of cells coding for the same eye.


We then take the ath input to the network to be of the form

Left eye:   I_i^(a) = (1 + γ(a)) e^{−(y_i − z_a)²/σ′²},   i = 1, …, N/2,
Right eye:  I_i^(a) = (1 − γ(a)) e^{−(y_i − z_a)²/σ′²},   i = N/2 + 1, …, N,   (3.6)

where the z_a are randomly generated from [0, 1] in the one-dimensional case and [0, 1]² in the two-dimensional case. For each input a, γ(a) is randomly and uniformly taken in [−γ, γ] with γ ∈ [0, 1] (see Bressloff, 2005). Thus, if γ(a) > 0 (γ(a) < 0), then the corresponding input is predominantly from the left (right) eye.
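A sketch of generating such binocular inputs in one dimension is given below; σ′, the disparity bound, and the retinocortical grid are illustrative choices.

```python
# A sketch of the binocular inputs of equation 3.6 in one dimension. The grid,
# sigma_prime, and the disparity bound gamma_bar are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
N, M = 100, 3000
sigma_p, gamma_bar = 0.1, 1.0
y = np.linspace(0.0, 1.0, N // 2)                 # retinocortical positions (per eye)

z = rng.uniform(0.0, 1.0, M)                      # input centers z_a
gamma = rng.uniform(-gamma_bar, gamma_bar, M)     # per-input ocular disparity gamma(a)
bump = np.exp(-(y[:, None] - z[None, :])**2 / sigma_p**2)
I_left  = (1.0 + gamma)[None, :] * bump           # neurons 1..N/2 (left eye)
I_right = (1.0 - gamma)[None, :] * bump           # neurons N/2+1..N (right eye)
I = np.vstack([I_left, I_right])                  # N x M input matrix
print(I.shape)
```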

First, we illustrate the results of ocular dominance simulations in one dimension in Figure 6. Although it is not biologically realistic, taking the visual field to be one-dimensional makes it possible to visualize the emergence of ocular dominance columns more easily. Indeed, in Figure 6 we analyze the role of the binocular disparity of the network, that is, we change the value of γ. If γ = 0 (the blue curves in Figures 6a and 6b and the top pictures in Figures 6c and 6d), there are virtually no differences between the left and right eyes, and we observe much less segregation than in the case γ = 1 (the green curves in Figures 6a and 6b and the bottom pictures in Figures 6c and 6d). Increasing the binocular disparity between the two eyes results in the emergence of ocular dominance columns. Yet there does not seem to be any spatial scale associated with these columns: they form on various scales, as shown in Figure 6d.

In Figures 7 and 8, we plot the results of ocular dominance simulations in two dimensions. In particular, we illustrate the role of changing the binocular disparity γ, changing the standard deviation of the inputs σ′, and using different convolutional kernels g. We plot the points x_i obtained by performing multidimensional scaling on the final connectivity matrix for k = 2 and superimpose on this the ocular dominance map obtained by interpolating between clusters of neurons with the same eye preference. The convolutional connectivity (g in equations 3.2 and 3.3) is implicitly described by the positions of the neurons: the closer the neurons, the stronger their connections. We also illustrate the nonconvolutional connectivity (M in equations 3.2 and 3.3) by linking one selected neuron to the neurons to which it is most strongly connected. The color of the link refers to the color of the target neuron. The multidimensional scaling algorithm was applied for each set of parameters with different initial conditions, and the best final solution (i.e., the one with the smallest nonconvolutional part) was kept and plotted. The initial conditions were random distributions of neurons or artificially created ocular dominance stripes with different numbers of neurons per stripe. It turns out the algorithm performed better on the latter. (The number of tunable parameters was too high for the system to converge to a global equilibrium for a random initial condition.)


Figure 7: Analysis of the equilibrium connectivity of a modifiable recurrent network driven by two-dimensional binocular inputs. This figure and Figure 8 correspond to particular values of the disparity γ and standard deviation σ′. Each cell shows the profile of the inputs (top), the positions of the neurons for a linear convolutional kernel (middle), and for an exponential kernel (bottom). The parameters of the kernel (a and λ) were automatically chosen to minimize the nonconvolutional part of the connectivity. It can be seen that the choice of the convolutional kernel has little impact on the positions of the neurons. (a, c) These panels correspond to γ = 0.5, which means there is little binocular disparity. Therefore, the nonconvolutional connectivity connects neurons of opposite eye preference more than for γ = 1, as shown in panels b and d. The inputs for panels a and b have a smaller standard deviation than for panels c and d. It can be seen that the neurons coding for the same eye tend to be closer when σ′ is larger. The other parameters used for these simulations are s(x) = 1/(1 + e^{−4(x−1)}), l = 1, μ = 10, N = M = 200.


Figure 8: See the caption of Figure 7.

Our results show that nonconvolutional, or long-range, connections tend to link cells with the same ocular dominance provided the inputs are sufficiently strong and sufficiently different for each eye.


3.3.2 Orientation Columns and Collinear Connectivity. In order to construct oriented inputs, we partition the N neurons into four groups corresponding to the orientation preferences θ ∈ {0, π/4, π/2, 3π/4}; if neuron i belongs to the group with preference θ, then its orientation preference is θ_i = θ. For each group, the neurons are randomly assigned a retinocortical position y_i ∈ [0, 1] × [0, 1]. Again, we do not assume a priori that there exist any orientation columns, that is, neurons with similar retinocortical positions y_i do not form clusters of cells coding for the same orientation preference. Each cortical input I_i^(a) is generated by convolving a thalamic input consisting of an oriented gaussian with a Gabor-like receptive field (as in Miikkulainen et al., 2005). Let R_θ denote a two-dimensional rigid body rotation in the plane with θ ∈ [0, 2π). Then

I_i^(a) = ∫ G_i(ξ − y_i) I_a(ξ − z_a) dξ,   (3.7)

where

G_i(ξ) = G_0(R_{θ_i} ξ)   (3.8)

and G_0(ξ) is the Gabor-like function

G_0(ξ) = A_+ e^{−ξ^T · Σ⁻¹ · ξ} − A_− e^{−(ξ−e_0)^T · Σ⁻¹ · (ξ−e_0)} − A_− e^{−(ξ+e_0)^T · Σ⁻¹ · (ξ+e_0)},

with e_0 = (0, 1) and

Σ = ⎛ σ_large    0        ⎞
    ⎝ 0          σ_small  ⎠ .

The amplitudes A_+, A_− are chosen so that ∫ G_0(ξ) dξ = 0. Similarly, the thalamic input is I_a(ξ) = I(R_{θ′_a} ξ), with I(ξ) the anisotropic gaussian

I(ξ) = e^{−ξ^T · Σ′⁻¹ · ξ},   Σ′ = ⎛ σ′_large    0         ⎞
                                   ⎝ 0           σ′_small  ⎠ .

The input parameters θ′_a and z_a are randomly generated from [0, π) and [0, 1]², respectively. In our simulations, we take σ_large = 0.133…, σ′_large = 0.266…, and σ_small = σ′_small = 0.0333…. The results of our simulations are shown in Figure 9 (left). In particular, we plot the points x_i obtained by performing multidimensional scaling on the final connectivity matrix for k = 2 and superimpose on this the orientation preference map obtained by interpolating between clusters of neurons with the same orientation preference. To avoid border problems, we have zoomed in on the center of the map.
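The overlap integral 3.7 can be evaluated by brute-force summation on a grid, as in the following sketch; the grid resolution and the amplitudes A_+, A_− are assumptions (the letter chooses the amplitudes so that G_0 integrates to zero).

```python
# A sketch of evaluating the oriented input of equations 3.7 and 3.8 by summation
# on a grid: the overlap of a rotated Gabor-like receptive field with a rotated
# anisotropic gaussian. Grid resolution and amplitudes A_plus, A_minus are assumptions.
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def aniso_gauss(xi, sig_large, sig_small):
    """exp(-xi^T Sigma^{-1} xi) with Sigma = diag(sig_large, sig_small)."""
    Sinv = np.diag([1.0 / sig_large, 1.0 / sig_small])
    return np.exp(-np.einsum('...i,ij,...j->...', xi, Sinv, xi))

def gabor(xi, sig_large=0.133, sig_small=0.0333, e0=np.array([0.0, 1.0])):
    """Gabor-like function G_0 with assumed (unbalanced) amplitudes."""
    A_plus, A_minus = 1.0, 0.5
    g = lambda u: aniso_gauss(u, sig_large, sig_small)
    return A_plus * g(xi) - A_minus * g(xi - e0) - A_minus * g(xi + e0)

# Overlap of one receptive field (preference theta_i, center y_i) with one
# thalamic input (orientation theta_a, center z_a).
ax = np.linspace(-1.0, 2.0, 120)
grid = np.stack(np.meshgrid(ax, ax), axis=-1)                    # sampling of the plane
theta_i, y_i = np.pi / 4, np.array([0.5, 0.5])
theta_a, z_a = np.pi / 4, np.array([0.55, 0.5])
G  = gabor((grid - y_i) @ rot(theta_i).T)                        # G_i(xi - y_i)
Ia = aniso_gauss((grid - z_a) @ rot(theta_a).T, 0.266, 0.0333)   # I_a(xi - z_a)
dxi = (ax[1] - ax[0]) ** 2                                       # area element
print(np.sum(G * Ia) * dxi)                                      # approximate I_i^(a)
```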


Figure 9: Emergence of orientation columns. (Left) Plot of the positions x_i of neurons for k = 2 obtained by multidimensional scaling of the weight matrix. Neurons are clustered in orientation columns represented by the colored areas, which are computed by interpolation. The strongest components of the nonconvolutional connectivity (M in equations 3.2 and 3.3) from a particular neuron in the yellow area are illustrated by drawing black links from this neuron to the target neurons. Since yellow corresponds to an orientation of 3π/4, the nonconvolutional connectivity shows the existence of a collinear connectivity as exposed in Bosking, Zhang, Schofield, and Fitzpatrick (1997). The parameters used for this simulation are s(x) = 1/(1 + e^{−4(x−1)}), l = 1, μ = 10, N = 900, M = 9000. (Right) Histogram of the five largest components of the nonconvolutional connectivity for 80 neurons randomly chosen among those shown in the left panel. The abscissa corresponds to the difference in radians between the direction preference of the neuron and the direction of the links between the neuron and the target neurons. This histogram is weighted by the strength of the nonconvolutional connectivity. It shows a preference for co-aligned neurons but also a slight preference for perpendicularly aligned neurons (e.g., neurons of the same orientation but parallel to each other). We chose 80 neurons in the center of the picture because the border effects shown in Figure 3 do not arise there, and this roughly corresponds to the number of neurons in the left panel.

To avoid border problems, we have zoomed in on the center of the map. We also illustrate the nonconvolutional connectivity by linking a group of neurons gathered in an orientation column to all other neurons for which M is maximal. The patchy, anisotropic nature of the long-range connections is clear and is further quantified in the histogram of Figure 9.
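The multidimensional scaling step used to produce such maps can be summarized by the following Python sketch. It recovers positions x_i from a symmetric connectivity matrix via classical MDS and then splits the connectivity into a convolutional part and a residual. The toy matrix, the random seed, and the fixed kernel parameters a and λ are illustrative assumptions; the simulations described above choose a and λ automatically so as to minimize the residual.

    import numpy as np

    rng = np.random.default_rng(1)

    def classical_mds(D2, k=2):
        """Classical multidimensional scaling from a matrix of squared dissimilarities D2."""
        n = D2.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        B = -0.5 * J @ D2 @ J                        # double-centered Gram matrix
        w, V = np.linalg.eigh(B)
        idx = np.argsort(w)[::-1][:k]                # keep the k largest eigenvalues
        return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

    # Toy connectivity built from the linear kernel g(x) = a (1 - x / lam**2) plus noise.
    a, lam = 1.0, 2.0
    pts = rng.random((200, 2))                       # "true" positions, unknown to the method
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    W = a * (1.0 - d2 / lam ** 2) + 0.05 * rng.random((200, 200))
    W = 0.5 * (W + W.T)                              # the learned connectivity is symmetric

    D2 = lam ** 2 * (1.0 - W / a)                    # g^{-1}(W_ij): squared dissimilarities
    x = classical_mds(D2, k=2)                       # embedded positions x_i (Figure 9, left)

    dist2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    M_resid = W - a * (1.0 - dist2 / lam ** 2)       # nonconvolutional residual (M in equations 3.2, 3.3)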

4 Discussion

In this letter, we have shown how a neural network can learn the underlying geometry of a set of inputs. We have considered a fully recurrent neural network whose dynamics is described by a simple nonlinear rate equation, together with unsupervised Hebbian learning with decay that


occurs on a much slower timescale. Although several inputs are periodically presented to the network, so that the resulting dynamical system is nonautonomous, we have shown that such a system has a fairly simple dynamics: the network connectivity matrix always converges to an equilibrium point. We then demonstrated how this connectivity matrix can be expressed as a distance matrix in R^k for sufficiently large k, which can be related to the underlying geometrical structure of the inputs. If the connectivity matrix is embedded in a lower two-dimensional space (k = 2), then the emerging patterns are qualitatively similar to experimentally observed cortical feature maps, although we have considered simplistic stimuli and kept the feedforward connectivity fixed. That is, neurons with the same feature preferences tend to cluster together, forming cortical columns within the embedding space. Moreover, the recurrent weights decompose into a local isotropic convolutional part, which is consistent with the requirements of energy efficiency, and a longer-range nonconvolutional part that is patchy. This suggests a new interpretation of cortical maps: they correspond to two-dimensional embeddings of the underlying geometry of the inputs.

Geometric diffusion methods (see Coifman et al., 2005) are also an efficient way to reveal the underlying geometry of sets of inputs. There are differences with our approach, although both share the same philosophy. The main difference is that geometric harmonics deals with the probability of co-occurrence, whereas our approach is more focused on wiring length, which is indirectly linked to the inputs through Hebbian coincidence. From an algorithmic point of view, our method is concerned with the positions of the N neurons and not the M inputs, which can be a clear advantage in certain regimes. Indeed, we deal with matrices of size N × N, whereas the total size of the inputs is N × M, which is potentially much larger. Finally, this letter is devoted to decomposing the connectivity into a convolutional and a nonconvolutional part, and this is why we focus not only on the spatial structure but also on the shape of the activity equation on this structure. These two results come together when decomposing the connectivity. In fact, this focus on the connectivity was necessary to use the energy minimization argument of section 2.3.1 and to compute the cortical maps in section 3.3.

One of the limitations of applying simple Hebbian learning to recurrent cortical connections is that it takes into account only excitatory connections, whereas 20% of cortical neurons are inhibitory. Indeed, in most developmental models of feedforward connections, it is assumed that the local and convolutional connections in cortex have a Mexican hat shape with negative (inhibitory) lobes for neurons that are sufficiently far from each other. From a computational perspective, it is possible to obtain such a weight distribution by replacing Hebbian learning with some form of covariance learning (see Sejnowski & Tesauro, 1989). However, it is difficult to prove convergence to a fixed point in the case of the covariance learning rule, and the multidimensional scaling method cannot be applied directly


unless the Mexican hat function is truncated so that it is invertible. Another limitation of rate-based Hebbian learning is that it does not take into account causality, in contrast to more biologically detailed mechanisms such as spike-timing-dependent plasticity.

The approach taken here is very different from standard treatments of cortical development (as in Miller, Keller, & Stryker, 1989; Swindale, 1996), in which the recurrent connections are assumed to be fixed and of convolutional Mexican hat form while the feedforward vertical connections undergo some form of correlation-based Hebbian learning. In the latter case, cortical feature maps form in the physical space of retinocortical coordinates y_i, rather than in the more abstract planar space of points x_i obtained by applying multidimensional scaling to recurrent weights undergoing Hebbian learning in the presence of fixed vertical connections. A particular feature of cortical maps formed by modifiable feedforward connections is that the mean size of a column is determined by a Turing-like pattern-forming instability and depends on the length scales of the Mexican hat weight function and the two-point input correlations (see Miller et al., 1989; Swindale, 1996). No such Turing mechanism exists in our approach, so the resulting cortical maps tend to be more fractal-like (many length scales) compared to real cortical maps. Nevertheless, we have established that the geometrical structure of cortical feature maps can also be encoded by modifiable recurrent connections. This should have interesting consequences for models that consider the joint development of feedforward and recurrent cortical connections. One possibility is that the embedding space of points x_i arising from multidimensional scaling of the weights becomes identified with the physical space of retinocortical positions y_i. The emergence of local convolutional structures, together with sparser long-range connections, would then be consistent with energy efficiency constraints in physical space.

Our letter also draws a direct link between the recurrent connectivity of the network and the positions of neurons in some vector space such as R^2. In other words, learning corresponds to moving neurons or nodes so that their final position will match the inputs' geometrical structure. Similarly, the Kohonen algorithm detailed in Kohonen (1990) describes a way to move nodes according to the inputs presented to the network. It also converges toward the underlying geometry of the set of inputs. Although these approaches are not formally equivalent, it seems that both have the same qualitative behavior. However, our method is more general in the sense that no neighborhood structure is assumed a priori; such a structure emerges via the embedding into R^k.

Finally, note that we have used a discrete formalism based on a finite number of neurons. However, the resulting convolutional structure obtained by expressing the weight matrix as a distance matrix in R^k (see equations 3.2 and 3.3) allows us to take an appropriate continuum limit. This then generates a continuous neural field model in the form of an


integro-differential equation whose integral kernel is given by the underlying weight distribution. Neural fields have been used increasingly to study large-scale cortical dynamics (see Coombes, 2005, for a review). Our geometrical learning theory provides a developmental mechanism for the formation of these neural fields. One of the useful features of neural fields from a mathematical perspective is that many of the methods of partial differential equations can be carried over. Indeed, for a general class of connectivity functions defined over continuous neural fields, a reaction-diffusion equation can be derived whose solution approximates the firing rate of the associated neural field (see Degond & Mas-Gallic, 1989; Cottet, 1995; Edwards, 1996). It appears that the necessary connectivity functions are precisely those that can be written in the form 3.2. This suggests that a network that has been trained on a set of inputs with an appropriate geometrical structure behaves as a diffusion equation in a high-dimensional space together with a reaction term corresponding to the inputs.

Appendix

A.1 Proof of the Averaging of System Σ. Here, we show that system Σ can be approximated by the autonomous Cauchy problem Σ′. Indeed, we can simplify system Σ by applying Tikhonov's theorem for slow and fast systems and then classical averaging methods for periodic systems.

A.1.1 Tikhonov's Theorem. Tikhonov's theorem (see Tikhonov, 1952, and Verhulst, 2007, for a clear introduction) deals with slow and fast systems. It says the following:

Theorem 3. Consider the initial value problem

ẋ = f(x, y, t),   x(0) = x_0,   x ∈ R^n,  t ∈ R_+,
ε ẏ = g(x, y, t),   y(0) = y_0,   y ∈ R^m.

Assume that:

1. A unique solution of the initial value problem exists and, we suppose, this holds also for the reduced problem

ẋ = f(x, y, t),   x(0) = x_0,
0 = g(x, y, t),

with solutions x̄(t), ȳ(t).

2. The equation 0 = g(x, y, t) is solved by ȳ(t) = φ(x, t), where φ(x, t) is a continuous function and an isolated root. Also suppose that ȳ(t) = φ(x, t)


is an asymptotically stable solution of the equation dy/dτ = g(x, y, τ) that is uniform in the parameters x ∈ R^n and t ∈ R_+.

3. y(0) is contained in an interior subset of the domain of attraction of ȳ.

Then we have

lim_{ε→0} x_ε(t) = x̄(t),   0 ≤ t ≤ L,
lim_{ε→0} y_ε(t) = ȳ(t),   0 ≤ d ≤ t ≤ L,

with d and L constants independent of ε.

In order to apply Tikhonov's theorem directly to system Σ, we first need to rescale time according to t → εt. This gives

ε dV/dt = −V + W · S(V) + I,
dW/dt = S(V) ⊗ S(V) − μW.

Tikhonov's theorem then implies that solutions of Σ are close to solutions of the reduced system (in the unscaled time variable)

V(t) = W · S(V(t)) + I(t),
dW/dt = ε (S(V) ⊗ S(V) − μW),      (A.1)

provided that the dynamical system Σ in equation 3.6 and equation A.1 are well defined. It is easy to show that both systems are Lipschitz because of the properties of S. Following Faugeras et al. (2008), we know that if

S′_m ‖W‖ < 1,      (A.2)

there exists an isolated root V: R_+ → R^N of the equation V = W · S(V) + I, and V is asymptotically stable. Equation A.2 corresponds to the weakly connected case. Moreover, the initial condition belongs to the basin of attraction of this single fixed point. Note that we require M/T ≪ 1 so that the membrane potentials have sufficient time to approach the equilibrium associated with a given input before the next input is presented to the network. In fact, this assumption makes it reasonable to neglect the transient activity dynamics due to the switching between inputs.

A.1.2 Periodic Averaging. The system given by equation A.1 corresponds to a differential equation for W with T-periodic forcing due to the presence of V on the right-hand side. Since T ≪ ε^{−1}, we can use classical averaging methods to show that solutions of equation A.1 are close to solutions of


the following autonomous system on the time interval [0, 1/ε] (which we suppose is large because ε ≪ 1):

Σ_0 :   V(t) = W · S(V(t)) + I(t),
        dW/dt = ε ( (1/T) ∫_0^T S(V(s)) ⊗ S(V(s)) ds − μW(t) ).      (A.3)

It follows that solutions of Σ are also close to solutions of Σ_0. Finding the explicit solution V(t) for each input I(t) is difficult and requires fixed-point methods (e.g., a Picard algorithm). Therefore, we consider yet another system Σ′ whose solutions are also close to those of Σ_0 and hence Σ. In order to construct Σ′, we need to introduce some additional notation.
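As a minimal illustration of such a fixed-point method, the Python sketch below solves V = W · S(V) + I by Picard iteration; in the weakly connected regime of equation A.2 the map V ↦ W · S(V) + I is a contraction, so the iteration converges. The sigmoid matches the one given in the figure captions, while the tolerance, the iteration cap, the random example, and the function names are our own choices.

    import numpy as np

    def S(v):
        """Sigmoid s(x) = 1 / (1 + exp(-4(x - 1))) used in the simulations; its maximal slope is S'_m = 1."""
        return 1.0 / (1.0 + np.exp(-4.0 * (v - 1.0)))

    def picard_fixed_point(W, I, tol=1e-10, max_iter=1000):
        """Iterate V <- W . S(V) + I until convergence (guaranteed when S'_m ||W|| < 1, equation A.2)."""
        V = I.copy()
        for _ in range(max_iter):
            V_new = W @ S(V) + I
            if np.linalg.norm(V_new - V) < tol:
                break
            V = V_new
        return V_new

    # Example: a random weakly connected network (the scaling keeps the operator norm well below 1).
    rng = np.random.default_rng(0)
    N = 100
    W = rng.standard_normal((N, N)) / (4.0 * np.sqrt(N))
    I = rng.random(N)
    V_star = picard_fixed_point(W, I)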

Let us label the M inputs by I^{(a)}, a = 1, …, M, and denote by V^{(a)} the fixed-point solution of the equation V^{(a)} = W · S(V^{(a)}) + I^{(a)}. Given the periodic sampling of the inputs, we can rewrite equation A.3 as

V^{(a)} = W · S(V^{(a)}) + I^{(a)},
dW/dt = ε ( (1/M) ∑_{a=1}^{M} S(V^{(a)}) ⊗ S(V^{(a)}) − μW(t) ).      (A.4)

If we now introduce the N × M matrices V and I with components V_{ia} = V_i^{(a)} and I_{ia} = I_i^{(a)}, we can eliminate the tensor product and simply write equation A.4 in the matrix form

V = W · S(V) + I,
dW/dt = ε ( (1/M) S(V) · S(V)^T − μW(t) ),      (A.5)

where S(V) ∈ R^{N×M} is such that [S(V)]_{ia} = s(V_i^{(a)}). A second application of Tikhonov's theorem (in the reverse direction) then establishes that solutions of the system Σ_0 (written in the matrix form A.5) are close to solutions of the matrix system

Σ′ :   dV/dt = −V + W · S(V) + I,
       dW/dt = ε ( (1/M) S(V) · S(V)^T − μW(t) ).      (A.6)


In this letter, we focus on system Σ′, whose solutions are close to those of the original system Σ provided condition A.2 is satisfied, that is, the network is weakly connected. Clearly the fixed points (V*, W*) of system Σ satisfy ‖W*‖ ≤ S_m^2/μ. Therefore, equation A.2 says that if S_m^2 S′_m/μ < 1, then Tikhonov's theorem can be applied, and systems Σ and Σ′ can reasonably be considered as good approximations of each other.
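For completeness, system Σ′ of equation A.6 can also be integrated directly; the Python sketch below uses a plain forward Euler scheme and checks that the learned connectivity approaches S(V) S(V)^T/(μM), the equilibrium value quoted in the proof of theorem 2. The step size, duration, network size, and random inputs are arbitrary choices made for this sketch.

    import numpy as np

    rng = np.random.default_rng(2)

    N, M = 50, 100                  # numbers of neurons and of inputs
    eps, mu = 1e-3, 10.0            # slow learning rate and decay (mu = 10 as in the figure captions)
    dt, n_steps = 0.1, 50000

    def S(v):
        return 1.0 / (1.0 + np.exp(-4.0 * (v - 1.0)))

    I = rng.random((N, M))          # N x M input matrix, one column per input
    V = np.zeros((N, M))            # membrane potentials, one column per input
    W = np.zeros((N, N))            # recurrent connectivity

    for _ in range(n_steps):
        SV = S(V)
        V += dt * (-V + W @ SV + I)                      # fast activity equation of Sigma'
        W += dt * eps * (SV @ SV.T / M - mu * W)         # slow Hebbian learning with decay

    # At equilibrium, W should be close to S(V) S(V)^T / (mu M).
    print(np.linalg.norm(W - S(V) @ S(V).T / (mu * M)))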

A.2 Proof of the Convergence to the Symmetric Attractor A. We need to prove two points: (1) A is an invariant set, and (2) for all Y(0) ∈ R^{N×M} × R^{N×N}, Y(t) converges to A as t → +∞. Since R^{N×N} is the direct sum of the set of symmetric connectivities and the set of antisymmetric connectivities, we write W(t) = W_S(t) + W_A(t), ∀t ∈ R_+, where W_S is symmetric and W_A is antisymmetric.

1. In equation 2.10, the right-hand side of the equation for W is symmetric. Therefore, if there exists t_1 ∈ R_+ such that W_A(t_1) = 0, then W remains in A for t ≥ t_1.

2. Projecting the expression for W in equation 2.10 onto the antisymmetric component leads to

dW_A/dt = −εμ W_A(t),      (A.7)

whose solution is W_A(t) = W_A(0) exp(−εμt), ∀t ∈ R_+. Therefore, lim_{t→+∞} W_A(t) = 0, and the system converges exponentially to A.
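Equation A.7 is easy to verify numerically: continuing the Euler sketch of section A.1.2 above (and reusing its definitions of N, M, eps, mu, S, I, and rng), an asymmetric initial connectivity should lose its antisymmetric part by the factor exp(−εμt), up to the discretization error of the scheme.

    # Continuation of the Euler sketch above, now with an asymmetric initial connectivity.
    W = 0.05 * rng.random((N, N))                       # not symmetric
    WA0 = 0.5 * (W - W.T)                               # initial antisymmetric part
    V = np.zeros((N, M))
    T_total = 500.0
    n_steps = 5000
    dt = T_total / n_steps
    for _ in range(n_steps):
        SV = S(V)
        V += dt * (-V + W @ SV + I)
        W += dt * eps * (SV @ SV.T / M - mu * W)
    WA = 0.5 * (W - W.T)
    # The two printed numbers should be approximately equal (equation A.7).
    print(np.linalg.norm(WA) / np.linalg.norm(WA0), np.exp(-eps * mu * T_total))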

A.3 Proof of Theorem 1. Consider the following Lyapunov function (see equation 2.10),

E(U, W) = −(1/2)⟨U, W · U⟩ − ⟨I, U⟩ + ⟨1, S^{−1}(U)⟩ + (μ̄/2) ‖W‖²,      (A.8)

where μ̄ = μM, such that if W = W_S + W_A, where W_S is symmetric and W_A is antisymmetric,

−∇E(U, W) = ( W_S · U + I − S^{−1}(U) ;  U · U^T − μ̄W ).      (A.9)

Therefore, writing the system Σ′, equation 2.10, as

dY/dt = γ( W_S · S(V) + I − S^{−1}(S(V)) ;  S(V) · S(V)^T − μ̄W ) + γ( W_A · S(V) ;  0 ),


where Y = (V, W)^T, we see that

dY/dt = −γ(∇E(σ(V, W))) + Φ(t),      (A.10)

where γ(V, W)^T = (V, εW/M)^T, σ(V, W) = (S(V), W), and Φ: R_+ → H is such that ‖Φ(t)‖ → 0 exponentially as t → +∞ (because the system converges to A). It follows that the time derivative of Ē = E ◦ σ along trajectories is given by

dĒ/dt = ⟨∇Ē, dY/dt⟩ = ⟨∇_V Ē, dV/dt⟩ + ⟨∇_W Ē, dW/dt⟩.      (A.11)

Substituting equation A.10 then yields

dĒ/dt = −⟨∇Ē, γ(∇E ◦ σ)⟩ + ⟨∇Ē, Φ(t)⟩
      = −⟨S′(V) ∇_U E ◦ σ, ∇_U E ◦ σ⟩ − (ε/M) ⟨∇_W E ◦ σ, ∇_W E ◦ σ⟩ + φ(t),      (A.12)

where we have set φ(t) = ⟨∇Ē, Φ(t)⟩.

We have used the chain rule of differentiation, whereby

∇_V Ē = ∇_V (E ◦ σ) = S′(V) ∇_U E ◦ σ,

and S′(V) ∇_U E (without dots) denotes the Hadamard (term-by-term) product, that is,

[S′(V) ∇_U E]_{ia} = s′(V_i^{(a)}) ∂E/∂U_i^{(a)}.

Note that |φ(t)| → 0 exponentially as t → +∞ because ∇Ē is bounded, and S′(V) > 0 because the trajectories are bounded. Thus, there exists t_1 ∈ R_+ such that for all t > t_1 there exists k ∈ R*_+ such that

dĒ/dt ≤ −k ‖∇E ◦ σ‖² ≤ 0.      (A.13)

As in Cohen and Grossberg (1983) and Dong and Hopfield (1992), we apply the Krasovskii-LaSalle invariance principle detailed in Khalil and Grizzle (1996). We check that:


• Ē is lower-bounded. Indeed, V and W are bounded. Given that I and S are also bounded, it is clear that Ē is bounded.
• dĒ/dt is negative semidefinite on the trajectories, as shown in equation A.13.

Then the invariance principle tells us that the solutions of system Σ′ approach the set M = {Y ∈ H : dĒ/dt(Y) = 0}. Equation A.13 implies that M = {Y ∈ H : ∇E ◦ σ = 0}. Since dY/dt = −γ(∇E ◦ σ) and γ ≠ 0 everywhere, M consists of the equilibrium points of the system. This completes the proof.

A.4 Proof of Theorem 2. Denote the right-hand side of system Σ′, equation 2.10, by

F(V, W) = ( −V + W · S(V) + I ;  (ε/M)(S(V) · S(V)^T − μM W) ).

The fixed points satisfy the condition F(V, W) = 0, which immediately leads to equations 2.14. Let us now check the linear stability of this system. The differential of F at (V*, W*) is

dF_{(V*,W*)}(Z, J) = ( −Z + W* · (S′(V*)Z) + J · S(V*) ;  (ε/M)((S′(V*)Z) · S(V*)^T + S(V*) · (S′(V*)Z)^T − μM J) ),

where S′(V*)Z denotes a Hadamard product, that is, [S′(V*)Z]_{ia} = s′(V_i^{*(a)}) Z_i^{(a)}. Assume that there exist λ ∈ C*, (Z, J) ∈ H such that

dF_{(V*,W*)}(Z, J) = λ(Z, J). Taking the second component of this equation and computing the dot product with S(V*) leads to

(λ + εμ) J · S = (ε/M) ((S′Z) · S^T · S + S · (S′Z)^T · S),

where S = S(V*) and S′ = S′(V*). Substituting this expression in the first equation leads to

M(λ + εμ)(λ + 1) Z = (λ/μ + ε) S · S^T · (S′Z) + ε (S′Z) · S^T · S + ε S · (S′Z)^T · S.      (A.14)


Observe that setting ε = 0 in the previous equation leads to an eigenvalue equation for the membrane potential only:

(λ + 1) Z = (1/(μM)) S · S^T · (S′Z).

Since W* = (1/(μM)) S · S^T, this equation implies that λ + 1 is an eigenvalue of the operator X ↦ W* · (S′X). The magnitudes of the eigenvalues are always smaller than the norm of the operator. Therefore, we can say that if 1 > ‖W*‖ S′_m, then all the possible eigenvalues λ must have a negative real part. This sufficient condition for stability is the same as in Faugeras et al. (2008). It says that fixed points sufficiently close to the origin are always stable.

Let us now consider the case ε ≠ 0. Recall that Z is a matrix. We now "flatten" Z by storing its rows in a vector called Z_row. We use the following result in Brewer (1978): the matrix notation of the operator X ↦ A · X · B is A ⊗ B^T, where ⊗ is the Kronecker product. In this formalism, the previous equation becomes

M(λ + εμ)(λ + 1) Z_row = ( (λ/μ + ε) S · S^T ⊗ Id + ε Id ⊗ S^T · S + ε S ⊗ S^T ) · (S′Z)_row,      (A.15)

where we assume that the Kronecker product has priority over the dot product. We focus on the linear operator O defined by the right-hand side and bound its norm. Note that we use the norm ‖W‖_∞ = sup_X ‖W · X‖/‖X‖, which is equal to the largest magnitude of the eigenvalues of W:

‖O‖_∞ ≤ S′_m ( |λ/μ| ‖S · S^T ⊗ Id‖_∞ + ε ‖S · S^T ⊗ Id‖_∞ + ε ‖Id ⊗ S^T · S‖_∞ + ε ‖S ⊗ S^T‖_∞ ).      (A.16)

Define ν_m to be the magnitude of the largest eigenvalue of W* = (1/(μM))(S · S^T). First, note that S · S^T and S^T · S have the same eigenvalues (μM)ν_i but different eigenvectors, denoted by u_i for S · S^T and v_i for S^T · S. In the basis set spanned by the u_i ⊗ v_j, we find that S · S^T ⊗ Id and Id ⊗ S^T · S are diagonal with (μM)ν_i as eigenvalues. Therefore, ‖S · S^T ⊗ Id‖_∞ = (μM)ν_m and ‖Id ⊗ S^T · S‖_∞ = (μM)ν_m. Moreover, observe that

(S^T ⊗ S)^T · (S^T ⊗ S) · (u_i ⊗ v_j) = (S · S^T · u_i) ⊗ (S^T · S · v_j) = (μM)² ν_i ν_j u_i ⊗ v_j.      (A.17)


Therefore, (S^T ⊗ S)^T · (S^T ⊗ S) = (μM)² diag(ν_i ν_j). In other words, S^T ⊗ S is the composition of an orthogonal operator (i.e., an isometry) and a diagonal matrix. It follows immediately that ‖S^T ⊗ S‖ ≤ (μM)ν_m.

Taking the norm of equation A.15 gives

|(λ + εμ)(λ + 1)| ≤ S′_m (|λ| + 3εμ) ν_m.      (A.18)

Define f_ε: C → R such that f_ε(λ) = |λ + εμ| |λ + 1| − (|λ| + 3εμ) S′_m ν_m. We want to find a condition such that f_ε(C_+) > 0, where C_+ is the right half complex plane. This condition on ε, μ, ν_m, and S′_m will be a sufficient condition for linear stability. Indeed, under this condition, we can show that only eigenvalues with a negative real part can meet the necessary condition, equation A.18; a complex number of the right half-plane cannot be an eigenvalue, and thus the system is stable. The case ε = 0 tells us that f_0(C_+) > 0 if 1 > S′_m ν_m. Compute

∂f_ε/∂ε (λ) = μ (ℜ(λ) + με) |λ + 1| / |λ + εμ| − 3μ S′_m ν_m.

If 1 ≥ εμ, which is probably true given that ε ≪ 1, then |λ + 1| / |λ + εμ| ≥ 1. Assuming λ ∈ C_+ leads to

∂f_ε/∂ε (λ) ≥ μ(με − 3 S′_m ν_m) ≥ μ(1 − 3 S′_m ν_m).

Therefore, the condition 3 S′_m ν_m < 1, which implies S′_m ν_m < 1, leads to f_ε(C_+) > 0.
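As a sanity check on theorem 2, the sufficient condition 3 S′_m ν_m < 1 can be evaluated numerically for a given set of firing rates: here S′_m is the maximal slope of s and ν_m the largest eigenvalue magnitude of W* = S · S^T/(μM). The Python sketch below uses a random placeholder for the firing-rate matrix and our own function name; it is an illustration, not part of the proof.

    import numpy as np

    def stability_margin(S_of_V, mu, s_prime_max):
        """Return 3 * S'_m * nu_m, where nu_m is the largest eigenvalue magnitude of
        W* = S(V) S(V)^T / (mu * M). A value below 1 certifies the sufficient condition of theorem 2."""
        N, M = S_of_V.shape
        W_star = S_of_V @ S_of_V.T / (mu * M)
        nu_m = np.max(np.abs(np.linalg.eigvalsh(W_star)))   # W* is symmetric
        return 3.0 * s_prime_max * nu_m

    # Example with the sigmoid s(x) = 1/(1 + exp(-4(x-1))), whose maximal slope is 1.
    rng = np.random.default_rng(3)
    S_of_V = 1.0 / (1.0 + np.exp(-4.0 * (rng.random((50, 100)) - 1.0)))   # placeholder firing rates
    print(stability_margin(S_of_V, mu=10.0, s_prime_max=1.0))             # sufficient condition holds if < 1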

A.5 Proof of Proposition 1. The uniform excitation of the neurons on average reads, for all i = 1, …, N, ‖S(V_i)‖ = c ∈ R_+. For simplicity and without loss of generality, we can assume c = 1 and a ≥ 1 (we can play with a and λ to generalize this to any c, a ∈ R_+ as long as a > c). The triangle inequality we want to prove can therefore be written as

√(g^{−1}(S(V_i) · S(V_j)^T)) ≤ √(g^{−1}(S(V_i) · S(V_k)^T)) + √(g^{−1}(S(V_j) · S(V_k)^T)).      (A.19)

For readability we rewrite u = S(V_i), v = S(V_j), and w = S(V_k). These three vectors lie on the unit sphere and have only positive coefficients. There is a natural distance between these vectors, namely the geodesic angle between them. In other words, consider the intersection of span(u, v) with the unit sphere of R^M. This is a circle on which u and v are located. The angle between the two points is written θ_{u,v}. It is a distance on


the sphere and thus satisfies the triangle inequality

θ_{u,v} ≤ θ_{u,w} + θ_{w,v}.      (A.20)

All these angles belong to [0, π/2) because u, v, and w have only positive coefficients. Observe that g^{−1}(u · v^T) = g^{−1}(cos(θ_{u,v})), and separate the two cases for the choice of the function g:

1. If g(x) = a(1 − x/λ²) with a ≥ 1, then g^{−1}(x) = λ²(1 − x/a). Therefore, define h_1: x ↦ λ √(1 − cos(x)/a). We now want to apply h_1 to equation A.20, but h_1 is monotonic only if x ≤ π/2. Therefore, divide equation A.20 by 2 and apply h_1 on both sides to get

h_1(θ_{u,v}/2) ≤ h_1((1/2)θ_{u,w} + (1/2)θ_{w,v}).

Now consider the function η_a(x, y) = h_1(x + y) − h_1(x) − h_1(y) for x, y ∈ [0, π/4). Because h_1 is increasing, it is clear that ∂η_a/∂x (x, y) ≤ 0 (and similarly for y), so that η_a reaches its maximum for x = y = π/4. Besides, η_a(π/4, π/4) ≤ η_1(π/4, π/4) < 0. This proves that 2h_1((1/2)θ_{u,w} + (1/2)θ_{w,v}) ≤ h_1(θ_{u,w}) + h_1(θ_{w,v}). Moreover, it is easy to observe that 2h_1(x/2) ≥ h_1(x) for all a > 1. This concludes the proof for g(x) = a(1 − x/λ²).

2. If g(x) = a e^{−x/λ²}, then g^{−1}(x) = λ² ln(a/x). As before, define h_2: x ↦ λ √(ln(a/cos(x))). We still want to apply h_2 to equation A.20, but h_2 is not defined for x > π/2, which is likely for the right-hand side of equation A.20. Therefore, we apply h_2 to equation A.20 divided by two and use the fact that h_2 is increasing on [0, π/2). This leads to h_2(θ_{u,v}/2) ≤ h_2((1/2)θ_{u,w} + (1/2)θ_{w,v}). First, we use the convexity of h_2 to get 2h_2(θ_{u,v}/2) ≤ h_2(θ_{u,w}) + h_2(θ_{w,v}), and then we use the fact that 2h_2(x/2) ≥ h_2(x) for x ∈ [0, δ), with δ < π/2. This would conclude the proof, but we have to make sure the angles remain in [0, δ). Actually, we can compute δ, which satisfies 2h_2(δ/2) = h_2(δ). This leads to δ = 2 arccos(√(a³ − √(a⁶ − a³))). In fact, the coefficients of u, v, and w are strictly positive and larger than S(0). Therefore, the angles between them are strictly smaller than π/2. More precisely, θ_{u,v} ∈ [0, π/2 − 2 arcsin(S(0))). Therefore, a necessary condition for the result to be true is 2 arccos(√(a³ − √(a⁶ − a³))) ≥ π/2 − 2 arcsin(S(0)); using the fact that arccos(x) = π/2 − arcsin(x), this leads to arcsin(S(0)) − arcsin(√(a³ − √(a⁶ − a³))) ≥ π/8.
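A quick numerical probe of this proposition (under our own sampling choices, and not a replacement for the proof): draw triples of unit vectors with positive coefficients, map their pairwise dot products through g^{−1} for both kernels, and confirm that the square roots satisfy inequality A.19. The kernel parameters are arbitrary; note that for the exponential kernel the proposition additionally requires the condition on a and S(0) derived above, and that after normalization the sampled coefficients may only approximately respect the s(0) lower bound.

    import numpy as np

    rng = np.random.default_rng(4)

    a, lam, M = 2.0, 1.0, 50
    s0 = 1.0 / (1.0 + np.exp(4.0))            # s(0) for s(x) = 1 / (1 + exp(-4(x - 1)))

    def g_inv_linear(w):                      # inverse of g(x) = a (1 - x / lam**2)
        return lam ** 2 * (1.0 - w / a)

    def g_inv_exp(w):                         # inverse of g(x) = a exp(-x / lam**2)
        return lam ** 2 * np.log(a / w)

    def worst_margin(g_inv, n_trials=10000):
        """Smallest value of d(u,w) + d(w,v) - d(u,v) over random triples; should stay >= 0."""
        worst = np.inf
        for _ in range(n_trials):
            U = rng.uniform(s0, 1.0, size=(3, M))           # positive coefficients
            U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit vectors ("uniform excitation")
            d = lambda p, q: np.sqrt(g_inv(U[p] @ U[q]))
            worst = min(worst, d(0, 2) + d(2, 1) - d(0, 1))
        return worst

    print(worst_margin(g_inv_linear), worst_margin(g_inv_exp))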

Acknowledgments

M.N.G. and O.D.F. were partially funded by ERC advanced grant NerVi no. 227747. M.N.G. was partially funded by the region PACA, France. This letter


was based on work supported in part by the National Science Foundation (DMS-0813677) and by Award KUK-C1-013-4 made by King Abdullah University of Science and Technology. P.C.B. was also partially supported by the Royal Society Wolfson Foundation. This work was partially supported by the IP project BrainScales No. 269921.

References

Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2), 251–276.

Amari, S., Kurata, K., & Nagaoka, H. (1992). Information geometry of Boltzmann machines. IEEE Transactions on Neural Networks, 3(2), 260–271.

Bartsch, A., & Van Hemmen, J. (2001). Combined Hebbian development of geniculocortical and lateral connectivity in a model of primary visual cortex. Biological Cybernetics, 84(1), 41–55.

Bi, G., & Poo, M. (2001). Synaptic modification by correlated activity: Hebb's postulate revisited. Annual Review of Neuroscience, 24, 139.

Bienenstock, E., Cooper, L., & Munro, P. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. J. Neurosci., 2, 32–48.

Borg, I., & Groenen, P. (2005). Modern multidimensional scaling: Theory and applications. New York: Springer-Verlag.

Bosking, W., Zhang, Y., Schofield, B., & Fitzpatrick, D. (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. Journal of Neuroscience, 17(6), 2112–2127.

Bressloff, P. (2005). Spontaneous symmetry breaking in self-organizing neural fields. Biological Cybernetics, 93(4), 256–274.

Bressloff, P. C., & Cowan, J. D. (2003). A spherical model for orientation and spatial frequency tuning in a cortical hypercolumn. Philosophical Transactions of the Royal Society B, 358, 1643–1667.

Bressloff, P., Cowan, J., Golubitsky, M., Thomas, P., & Wiener, M. (2001). Geometric visual hallucinations, Euclidean symmetry and the functional architecture of striate cortex. Phil. Trans. R. Soc. Lond. B, 306(1407), 299–330.

Brewer, J. (1978). Kronecker products and matrix calculus in system theory. IEEE Transactions on Circuits and Systems, 25(9), 772–781.

Chklovskii, D., Schikorski, T., & Stevens, C. (2002). Wiring optimization in cortical circuits. Neuron, 34(3), 341–347.

Chossat, P., & Faugeras, O. (2009). Hyperbolic planforms in relation to visual edges and textures perception. PLoS Computational Biology, 5(12), 367–375.

Cohen, M., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 815–826.

Coifman, R., Maggioni, M., Zucker, S., & Kevrekidis, I. (2005). Geometric diffusions for the analysis of data from sensor networks. Current Opinion in Neurobiology, 15(5), 576–584.

Coombes, S. (2005). Waves, bumps, and patterns in neural field theories. Biological Cybernetics, 93(2), 91–108.


Cottet, G. (1995). Neural networks: Continuous approach and applications to image processing. Journal of Biological Systems, 3, 1131–1139.

Dayan, P., & Abbott, L. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.

Degond, P., & Mas-Gallic, S. (1989). The weighted particle method for convection-diffusion equations: Part 1: The case of an isotropic viscosity. Mathematics of Computation, 53, 485–507.

Dong, D., & Hopfield, J. (1992). Dynamic properties of neural networks with adapting synapses. Network: Computation in Neural Systems, 3(3), 267–283.

Edwards, R. (1996). Approximation of neural network dynamics by reaction-diffusion equations. Mathematical Methods in the Applied Sciences, 19(8), 651–677.

Faugeras, O., Grimbert, F., & Slotine, J.-J. (2008). Absolute stability and complete synchronization in a class of neural fields models. SIAM J. Appl. Math., 61(1), 205–250.

Foldiak, P. (1991). Learning invariance from transformation sequences. Neural Computation, 3(2), 194–200.

Geman, S. (1979). Some averaging and stability results for random differential equations. SIAM J. Appl. Math., 36(1), 86–105.

Gerstner, W., & Kistler, W. M. (2002). Mathematical formulations of Hebbian learning. Biological Cybernetics, 87, 404–415.

Hebb, D. (1949). The organization of behavior: A neuropsychological theory. Hoboken, NJ: Wiley.

Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex. Proc. Roy. Soc. B, 198, 1–59.

Khalil, H., & Grizzle, J. (1996). Nonlinear systems. Upper Saddle River, NJ: Prentice Hall.

Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480.

Miikkulainen, R., Bednar, J., Choe, Y., & Sirosh, J. (2005). Computational maps in the visual cortex. New York: Springer.

Miller, K. (1996). Synaptic economics: Competition and cooperation in synaptic plasticity. Neuron, 17, 371–374.

Miller, K. D., Keller, J. B., & Stryker, M. P. (1989). Ocular dominance column development: Analysis and simulation. Science, 245, 605–615.

Miller, K., & MacKay, D. (1996). The role of constraints in Hebbian learning. Neural Computation, 6, 100–126.

Oja, E. (1982). A simplified neuron model as a principal component analyzer. J. Math. Biology, 15, 267–273.

Ooyen, A. (2001). Competition in the development of nerve connections: A review of models. Network: Computation in Neural Systems, 12(1), 1–47.

Petitot, J. (2003). The neurogeometry of pinwheels as a sub-Riemannian contact structure. Journal of Physiology–Paris, 97(2–3), 265–309.

Sejnowski, T., & Tesauro, G. (1989). The Hebb rule for synaptic plasticity: Algorithms and implementations. In J. H. Byrne & W. O. Berry (Eds.), Neural models of plasticity: Experimental and theoretical approaches (pp. 94–103). Orlando, FL: Academic Press.

Swindale, N. (1996). The development of topography in the visual cortex: A review of models. Network: Computation in Neural Systems, 7(2), 161–247.


Takeuchi, A., & Amari, S. (1979). Formation of topographic maps and columnar microstructures in nerve fields. Biological Cybernetics, 35(2), 63–72.

Tikhonov, A. (1952). Systems of differential equations with small parameters multiplying the derivatives. Matem. Sb., 31(3), 575–586.

Verhulst, F. (2007). Singular perturbation methods for slow-fast dynamics. Nonlinear Dynamics, 50(4), 747–753.

Wallis, G., & Baddeley, R. (1997). Optimal, unsupervised learning in invariant object recognition. Neural Computation, 9(4), 883–894.

Zucker, S., Lawlor, M., & Holtmann-Rice, D. (2011). Third order edge statistics reveal curvature dependency. Journal of Vision, 11, 1073.

Received February 1, 2011; accepted March 1, 2012.

