NN_ReadingCourse/249211
(TENTATIVE) PLAN OF THE COURSE
Introduction
Chapter 1: Basics of statistical mechanics
  The Curie-Weiss model
Chapter 2: Neural networks for associative memory and pattern recognition
Chapter 3: The Hopfield model
  Hopfield model with low load and solution via log-constrained entropy
  Self-averaging, spurious states, phase diagram
  Hopfield model with high load and solution via stochastic stability
Chapter 4: Beyond the Hebbian paradigm
Chapter 5: A gentle introduction to machine learning
Chapter 7: A few remarks on deep learning, "complex" patterns, and outlooks
  Multilayered Boltzmann machines and deep learning
  Mapping Restricted Boltzmann machines onto Hopfield networks
Seminars: Numerical tools for machine learning; non-mean-field neural networks; (bio-)logic gates; maximum entropy approach; Hamilton-Jacobi techniques for mean-field models; …
The machine starts from scratch; it takes 24 hours of training. Input layer: the current screen. Output layer: the button to push. When nothing changes for too long, the program is stopped and started again.
A skilled player…
The neuronal interaction from an electrical perspective
A neuron transmits its information by way of a nerve impulse called an action potential. When an action potential arrives at the synapse's presynaptic terminal button, it may stimulate the release of neurotransmitters. These are released into the synaptic cleft, bind to the receptors of the postsynaptic membrane, and influence another cell, in either an inhibitory or an excitatory way.
Neurons interact at contact points called synapses: a junction between two nerve cells, consisting of a miniature gap across which impulses are carried by a neurotransmitter.
… synaptic vesicles to release neurotransmitter molecules. These molecules diffuse from the presynaptic terminal across the synaptic cleft and bind to their receptor sites on the ligand-gated sodium-ion (Na+) channels. This causes the ligand-gated sodium-ion channels to open, and sodium ions diffuse into the cell, making the membrane potential more positive. If the membrane potential reaches the threshold level, an action potential is produced.
There exist different kinds of neurotransmitters, each associated with different functions and possible pathologies.
Biological neuron model
This model, also known as the spiking neuron model, is a mathematical description of the properties of neurons (and other cells in the nervous system) that generate sharp electrical potentials.
Biological neuron models aim to explain the mechanisms underlying the operation of the nervous system, for instance for the purpose of restoring lost control capabilities. Unlike "artificial neuron" models, biological neuron models allow experimental validation and the use of physical units to describe the experimental procedure associated with the model predictions.
As for the relationship between neuronal membrane currents at the input stage and membrane voltage at the output stage, the most extensive experimental inquiry was made by Hodgkin and Huxley in the early 1950s, using an experimental setup that punctured the cell membrane and allowed them to impose a specific membrane voltage/current (Nobel Prize in Physiology or Medicine, 1963).
Integrate-and-fire model
It models each neuron as a leaky capacitor with membrane resistance Rm, membrane capacitance Cm and resting potential EL. Below the action-potential threshold, the voltage of this capacitor decays (or "leaks") to the resting level EL:

Cm dVm(t)/dt = (EL − Vm(t))/Rm + I,

where I is the injection current. Realistic values for the parameters are EL = −70 mV, Rm = 10 MΩ, Cm = 50 μF, with V(t=0) = EL. To model the spiking of the neuron when it reaches threshold, one assumes that when the membrane potential reaches Vth = −55 mV, the neuron fires a spike and then resets its membrane potential to Vreset = −75 mV.
Indeed, the exact shape of the action potential does not matter here: since all action potentials sent down the axon are, to a good approximation, identical, the only informative feature of a neuron's spiking is the times at which the action potentials occur.
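The integrate-and-fire dynamics can be sketched numerically; a minimal Euler integration of Cm dVm/dt = (EL − Vm)/Rm + I, using the parameter values quoted above (the time step and simulation length are illustrative choices):

```python
import numpy as np

E_L = -70e-3      # resting potential (V)
R_m = 10e6        # membrane resistance (Ohm)
C_m = 50e-6       # membrane capacitance (F), as quoted in the text
V_th = -55e-3     # firing threshold (V)
V_reset = -75e-3  # reset potential (V)

def simulate_lif(I, t_max, dt=1e-2):
    """Euler integration of C_m dV/dt = (E_L - V)/R_m + I.

    Returns the list of spike times; V is reset to V_reset after each spike.
    """
    V = E_L
    spikes = []
    for step in range(int(t_max / dt)):
        V += ((E_L - V) / R_m + I) * dt / C_m
        if V >= V_th:
            spikes.append(step * dt)
            V = V_reset
    return spikes

# Rheobase current: I_th = (V_th - E_L) / R_m = 1.5 nA.
# Below it the neuron never fires; above it, the first spike occurs at
# t = tau * ln((V_inf - E_L)/(V_inf - V_th)), with tau = R_m * C_m.
print(len(simulate_lif(1e-9, t_max=400.0)))   # sub-threshold: no spikes
print(len(simulate_lif(3e-9, t_max=400.0)))   # supra-threshold: fires
```

With these values τ = Rm Cm = 500 s, so the supra-threshold case (I = 3 nA, V∞ = −40 mV) crosses threshold at t = τ ln 2 ≈ 347 s, matching the simulation.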
(a) Leaky integrate-and-fire neuron circuit model. (b) For input I < Ith, Vm(t) never exceeds Vth, hence the neuron never spikes; for I ≥ Ith, the neuron fires when Vm(t) ≥ Vth and immediately resets, i.e. Vm(t) = EL. (c) With higher input (I ≥ Ith), the firing rate (frequency) increases like in a biological neuron, while for low input (I < Ith) the frequency is zero. The output frequency fO vs. input is the signature neuronal function to be mimicked artificially.
Analytical insight into the firing activity of the noisy neuron: estimate the spike density, the role of topology, the mean time taken to reach an absorbing boundary, etc.
The overall input current on a neuron is assumed to be a Poissonian process:
  N_{I,E} = number of active (inhibitory, excitatory) synapses connected to the neuron
  λ_{I,E} = firing rate
  w_{I,E} = magnitude of the input
Stein's model; in the diffusion approximation one obtains Ornstein-Uhlenbeck (OU) processes, with drift μ = N_E λ_E w_E − N_I λ_I w_I and variance σ² = Σ_{q∈{E,I}} N_q λ_q w_q².

HC Tuckwell, Introduction to Theoretical Neurobiology (Cambridge University Press, Cambridge, 1988). HC Tuckwell, Stochastic Processes in the Neurosciences, CBMS-NSF Conference Series in Appl. Math. (1989).

A. Schematic illustration of the network model: individual cells are connected via excitatory (red) and inhibitory (blue) synaptic connections. B. Synaptic connectivity matrix; weights are randomly distributed around a mean value g = −10 mV/Hz. C. Sample network activity. D. Power spectral density of the network mean activity. [A. Hutt, A. Mierau, J. Lefebvre, PLoS ONE (2016)]
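The diffusion approximation of Stein's model can be checked numerically; a sketch with hypothetical input statistics (all parameter values below are illustrative assumptions), verifying the stationary OU mean μτ and variance σ²τ/2 by Euler-Maruyama integration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input statistics: (number of synapses, rate in Hz, jump in V)
N_E, lam_E, w_E = 100, 10.0, 0.5e-3   # excitatory
N_I, lam_I, w_I = 25, 10.0, -1.0e-3   # inhibitory

mu = N_E * lam_E * w_E + N_I * lam_I * w_I             # drift (V/s)
sigma2 = N_E * lam_E * w_E**2 + N_I * lam_I * w_I**2   # diffusion coeff. (V^2/s)

tau = 20e-3        # membrane time constant (s), illustrative
dt, n = 1e-5, 200_000

# Euler-Maruyama integration of  dV = (-V/tau + mu) dt + sqrt(sigma2) dW
V = np.empty(n)
V[0] = 0.0
sd = np.sqrt(sigma2 * dt)
noise = rng.standard_normal(n - 1)
for k in range(n - 1):
    V[k + 1] = V[k] + (-V[k] / tau + mu) * dt + sd * noise[k]

stationary = V[50_000:]        # discard the transient (25 tau)
# OU stationary statistics: mean mu*tau, variance sigma2*tau/2
print(stationary.mean(), mu * tau)
print(stationary.var(), sigma2 * tau / 2)
```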
One of the central goals of research in neuroscience is to understand how the biophysical properties of neurons and neuronal organization combine to provide such impressive computing power and speed. An understanding of biological computation may also lead to solutions for related problems in robotics and data processing using non-biological hardware and software.
Conventional silicon integrated circuits: each logic gate typically obtains inputs from two or three others, and a huge number of independent binary decisions are made in the course of a computation.
Neural computation circuits: each non-linear neural processor (neuron) gets input from hundreds or thousands of others, and a collective solution is computed on the basis of the simultaneous interaction of thousands of devices.
The processing elements, or "neurons", are modeled as amplifiers in conjunction with feedback circuits comprised of wires, resistors and capacitors, organized so as to model the most basic computational features of neurons, i.e., axons, dendritic arborization, and synapses connecting different neurons.
Each amplifier j has an input resistor ρj leading to a reference ground and an input capacitor Cj. Amplifiers have sigmoid, monotonic input-output relations: the function Vj = gj(uj) describes the output voltage Vj of amplifier j due to an input voltage uj.
In order to provide for both excitatory and inhibitory synaptic connections between neurons, each amplifier is given two outputs: a normal (+) output and an inverted (−) output.
A synapse between two neurons is defined by a conductance Tij which connects one of the two outputs of amplifier j to the input of amplifier i. This connection is made with a resistor of value Rij = 1/|Tij|. If the synapse is excitatory (Tij > 0), this resistor is connected to the normal (+) output of amplifier j, and vice versa.
The net input current to any neuron i (and hence the input voltage ui) is the sum of the currents flowing through the set of resistors connecting its input to the outputs of the other neurons.
The circuit also includes an externally supplied input current Ii for each neuron. These inputs can be used to set the general level of excitability of the network through constant biases, which effectively shift the input-output relation along the ui axis.
The equations describing the time evolution of this circuit are:

Ci dui/dt = Σ_{j=1}^{N} Tij Vj − ui/Ri + Ii,

where 1/Ri is the parallel combination of ρi and the Rij:

1/Ri = 1/ρi + Σ_{j=1}^{N} 1/Rij.

For simplicity, set Ci = C, i.e., independent of i (but this is not necessary). Posing T̃ij = Tij/C and Ĩi = Ii/C, the equations become

dui/dt = Σ_{j=1}^{N} T̃ij Vj − ui/(Ri C) + Ĩi.

For an "initial-value" problem, this equation provides a full description of the time evolution of the state of the circuit. Integration of this equation allows any hypothetical network to be simulated.
For a network with symmetric connections (Tij = Tji), these equations always lead to convergence to stable states, in which the outputs of all neurons remain constant (Hopfield, 1984). Moreover, when the width of the amplifier gain curve g(u) is narrow (the high-gain limit), the stable states of a network of N neurons are the local minima of the quantity

E = −(1/2) Σ_{i,j=1}^{N} Tij Vi Vj − Σ_{i=1}^{N} Vi Ii.

The state space over which the circuit operates is the interior of the N-dimensional hypercube defined by Vi ∈ [0, 1]. However, in the high-gain limit the minima only occur at corners of this space → the stable states correspond to those locations in the discrete space consisting of the 2^N corners of this hypercube which minimize the cost function E.
A. Energy-terrain contour map for the flow map shown in B. B. Typical flow map of the neural dynamics for the circuit considered, with symmetric connections (Tij = Tji), in the high-gain limit. C. More complicated dynamics that can occur for unrestricted Tij: limit cycles are possible.
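In the high-gain limit the dynamics reduces to threshold updates on Vi ∈ {0, 1}; a small sketch (random symmetric Tij of illustrative size) verifying that asynchronous updates never increase E = −(1/2)Σ Tij Vi Vj − Σ Vi Ii and end in a stable state:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 20
T = rng.standard_normal((N, N))
T = (T + T.T) / 2            # symmetric connections T_ij = T_ji
np.fill_diagonal(T, 0.0)
I = rng.standard_normal(N)

def energy(V):
    """E = -1/2 sum_ij T_ij V_i V_j - sum_i V_i I_i."""
    return -0.5 * V @ T @ V - I @ V

V = rng.integers(0, 2, N).astype(float)   # random corner of the hypercube
energies = [energy(V)]
for sweep in range(100):
    changed = False
    for i in range(N):                    # asynchronous threshold updates
        new = 1.0 if T[i] @ V + I[i] > 0 else 0.0
        if new != V[i]:
            V[i] = new
            changed = True
            energies.append(energy(V))
    if not changed:                       # fixed point = local minimum of E
        break

assert all(b <= a + 1e-9 for a, b in zip(energies, energies[1:]))
```

Each flip changes E by −h_i ΔV_i ≤ 0, where h_i = Σ_j T_ij V_j + I_i, so the energy is a Lyapunov function of the asynchronous dynamics.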
E. Agliari, A. Barra, L. Dello Schiavo, A. Moro (2016) Complete integrability of information processing by biochemical reactions, Sci. Rep.
E. Agliari et al. (2015) Notes on stochastic (bio)-logic gates: the role of allosteric cooperativity, Sci. Rep.
E. Agliari et al. (2013) Collective behaviours: from biochemical kinetics to electronic circuits, Sci. Rep.
"Universality reloaded"
From the '60s to the '90s, "universality" was a keyword in the statistical-mechanics literature on phase transitions, meant to highlight the robust, structural analogies that several (very disparate) systems share "close to criticality". In recent years, with the extension of the applicability range of statistical mechanics (covering widespread subjects such as biological networks, economic problems, materials science, etc.), we are discovering a novel class of universal behaviors: the main patterns through which systems process information seem to be very similar.
b) ferromagnet: external field → magnetization (self-consistency)
c) cortical neuron: afferent current → spike intensity (response function)
d) chemical reaction
… towards (bio)-logical stochastic computation…
J as a black box storing information
Let us consider a neural network made of N = 4 neurons and P = 2 patterns given by

ξ1 = (−1, +1, +1, −1),  ξ2 = (+1, −1, +1, −1)

and, recalling Jij ∝ Σμ ξiμ ξjμ, we get (up to normalization) the Hebbian matrix

J = [  0 −1  0  0 ]
    [ −1  0  0  0 ]
    [  0  0  0 −1 ]
    [  0  0 −1  0 ]

This corresponds to two flip-flops connected by an AND gate: J stores information in terms of constraints (this is why compression P < N is possible):
Clause 1: σ1 misaligned (i.e., anti-correlated) with σ2, AND
Clause 2: σ3 misaligned (i.e., anti-correlated) with σ4.
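The 4-neuron example can be checked directly; a quick sketch (normalizing by P so the non-zero entries are ±1), which also verifies that both patterns are fixed points of the noiseless dynamics σ → sign(Jσ):

```python
import numpy as np

xi = np.array([[-1, +1, +1, -1],    # pattern xi^1
               [+1, -1, +1, -1]])   # pattern xi^2

# Hebbian kernel J_ij = (1/P) sum_mu xi_i^mu xi_j^mu, diagonal removed
P = xi.shape[0]
J = xi.T @ xi / P
np.fill_diagonal(J, 0)
print(J)
# J couples only the pairs (1,2) and (3,4), with negative sign:
# sigma_1 anti-correlated with sigma_2, sigma_3 anti-correlated with sigma_4.

# Both stored patterns are fixed points of sigma -> sign(J sigma)
for mu in range(P):
    assert np.array_equal(np.sign(J @ xi[mu]), xi[mu])
```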
The TSP problem
"Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?" This is an NP-hard problem. A solution consists of an ordered list of n cities such that the total path length d of the closed tour is the lowest possible.
To "map" this problem onto the computational network, we require a representation scheme which allows the digital output states of the neurons (operating in the high-gain limit) to be decoded into this list: n cities → n² neurons.
Example, n = 5. Sequence: C, A, E, B, D. Total length: d = dCA + dAE + dEB + dBD + dDC.
There are n! states of this form; with an n-fold degeneracy (choice of the initial city) and a 2-fold degeneracy (tour order) ⇒ n!/(2n) distinct closed TSP routes.
The energy function contains four contributions:

E = (A/2) Σ_X Σ_i Σ_{j≠i} V_Xi V_Xj   (non-null if there are two or more non-null entries in the same city row)
  + (B/2) Σ_i Σ_X Σ_{Y≠X} V_Xi V_Yi   (non-null if there are two or more non-null entries in the same position column)
  + (C/2) (Σ_X Σ_i V_Xi − n)²   (non-null if any city or any position is not covered)
  + (D/2) Σ_X Σ_{Y≠X} Σ_i d_XY V_Xi (V_{Y,i+1} + V_{Y,i−1})   (grows with the overall path length).

Comparing with E = −(1/2) Σ T_{Xi,Yj} V_Xi V_Yj − Σ V_Xi I_Xi, the couplings and biases read

T_{Xi,Yj} = −A δ_XY (1 − δ_ij) − B δ_ij (1 − δ_XY) − C − D d_XY (δ_{j,i+1} + δ_{j,i−1}),
I_Xi = +C n   (excitation bias),

where the −C term acts as a global inhibition and the −D d_XY term is the data term.
The convergence of the 10-city analog circuit to a tour. The linear dimension of each square is proportional to the value of V_Xi. a, b, c: intermediate times; d: the final state. The indices in d illustrate how the final state is decoded into a tour (solution of the TSP).
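The TSP couplings can be assembled directly from the T_{Xi,Yj} prescription; a sketch for n = 5 with illustrative penalty strengths A, B, C, D (all numeric values, and the periodic convention for the tour index i ± 1, are assumptions), checking that the resulting connection matrix is symmetric, as required for convergence:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5                                # n cities -> n^2 neurons
d = rng.random((n, n))
d = (d + d.T) / 2                    # symmetric distance matrix
np.fill_diagonal(d, 0.0)

A = B = D = 500.0                    # illustrative penalty strengths
C = 200.0

# T[X,i,Y,j] per the Hopfield-Tank-style prescription
# (Kronecker deltas; tour positions i +/- 1 taken mod n)
T = np.zeros((n, n, n, n))
for X in range(n):
    for i in range(n):
        for Y in range(n):
            for j in range(n):
                T[X, i, Y, j] = (
                    -A * (X == Y) * (i != j)          # row penalty
                    - B * (i == j) * (X != Y)         # column penalty
                    - C                               # global inhibition
                    - D * d[X, Y] * ((j == (i + 1) % n) + (j == (i - 1) % n))
                )

I_bias = C * n * np.ones((n, n))     # excitation bias I_Xi = C n
T_mat = T.reshape(n * n, n * n)
assert np.allclose(T_mat, T_mat.T)   # symmetric couplings
```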
Outlooks
Retrieval: improve the performance of the network; the role of the neural-network topology; relax hypotheses towards a more realistic model.
Learning: quest for a more rigorous and fundamental understanding; solving tasks that are easy for people to perform but hard for people to describe formally, e.g., (informal) language translation.
H. Sompolinsky (1986) Neural networks with nonlinear synapses and a static noise, Phys. Rev. A
B. Wemmenhove, A.C.C. Coolen (2003) Finite connectivity attractor neural networks, J. Phys. A
M. Mezard (2017) Mean-field message-passing equations in the Hopfield model and its generalizations, Phys. Rev. E
J. Tubiana, R. Monasson (2017) Emergence of Compositional Representations in Restricted Boltzmann Machines, Phys. Rev. Lett.
E. Agliari, A. Barra, A. Galluzzi, F. Guerra, F. Moauro (2012) Multitasking Associative Networks, Phys. Rev. Lett.
E. Agliari et al. (2015) Retrieval Capabilities of Hierarchical Networks: From Dyson to Hopfield, Phys. Rev. Lett.
Extensions of the Hebbian kernel
M. Griniasty, M.V. Tsodyks, D.J. Amit (1993) Conversion of Temporal Correlations Between Stimuli to Spatial Correlations Between Attractors, Neur. Comp.
D.J. Amit, N. Brunel, M.V. Tsodyks (1994) Correlations of Cortical Hebbian Reverberations: Theory versus Experiment, J. Neurosci.
L. Cugliandolo, M.V. Tsodyks (1994) Capacity of networks with correlated attractors, J. Phys. A
E. Agliari et al. (2013) Parallel retrieval of correlated patterns: From Hopfield networks to Boltzmann machines, Neur. Net.
Pattern correlation. The Hebbian coupling Jij = (1/N) Σμ ξiμ ξjμ can be generalized in order to include possibly more complex combinations among patterns. For instance

Jij = (1/N) Σ_{μ,ν=1}^{P} ξiμ X_{μν} ξjν,

where X is a symmetric matrix; of course, by taking X = 1 we recover the Hebb coupling. A particular choice was introduced to account for temporal correlations:

X_{μν} = δ_{μν} + a (δ_{μ,ν+1} + δ_{μ,ν−1}),  a ∈ ℝ⁺,

assuming unbiased patterns, mμ = 0 ∀μ.
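The generalized kernel is a one-liner; a sketch (random patterns, illustrative sizes and correlation strength a) showing that X = identity recovers Hebb and that a symmetric X yields a symmetric J:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, a = 50, 5, 0.3                      # illustrative sizes and strength
xi = rng.choice([-1, 1], size=(P, N))     # P random binary patterns

def coupling(X):
    """J_ij = (1/N) sum_{mu,nu} xi_i^mu X_{mu nu} xi_j^nu."""
    return xi.T @ X @ xi / N

J_hebb = coupling(np.eye(P))              # X = identity: plain Hebb

# Temporal-correlation choice (sketched): ones on the diagonal,
# strength a on the first off-diagonals
X_temp = np.eye(P) + a * (np.eye(P, k=1) + np.eye(P, k=-1))
J_temp = coupling(X_temp)

assert np.allclose(J_hebb, xi.T @ xi / N)   # Hebb recovered
assert np.allclose(J_temp, J_temp.T)        # symmetric X -> symmetric J
```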
A more biological topology. Introduce a metric and arrange neurons in a modular way (still connected!); modulate the coupling matrix according to the metric. Different modules can perform different tasks simultaneously: sequential and parallel retrieval. Getting closer to biology, we enrich the emergent phenomenology in the right way!
E. Agliari, A. Barra, A. Galluzzi, F. Guerra, F. Moauro (2012) Multitasking Associative Networks, Phys. Rev. Lett.
E. Agliari et al. (2015) Retrieval capabilities of Hierarchical Networks: from Dyson to Hopfield, Phys. Rev. Lett.
Dyson network: deterministically and recursively built; a complete, weighted graph endowed with a metric.
[Figure: retrieval overlaps and coupling matrix of the Dyson hierarchical network, for k finite and k → ∞.]

Jij = Σ_{l=d_ij}^{k} J_l,

where d_ij is the hierarchical distance between neurons i and j, k is the number of hierarchical levels, and the level couplings J_l decay with l.
Experience: a set of examples (x, y) drawn from an unknown distribution q(x, y).
Learning: adapting the weights {Jij} so that, for a given input x, we can get information about y according to (an approximation of) q(x, y).
The most effective RBMs display a Gaussian hidden layer.
Two-layer Boltzmann machine: ask and read from the visible layer.
E. Agliari, A. Barra (2011) A Hebbian approach to complex-network generation, Europhys. Lett.
E. Agliari, A. Barra, A. De Antoni, A. Galluzzi (2013) Parallel retrieval of correlated patterns: From Hopfield networks to Boltzmann machines, Neur. Net.
E. Agliari, A. Barra, A. Galluzzi, D. Tantari, F. Tavani (2014) Walks in the Statistical Mechanical Formulation of Neural Networks: Alternative Routes to Hebb Prescription, NCTA
Equivalence of Hopfield networks and restricted Boltzmann machines
Bipartite spin glass: digital visible neurons σi = ±1, ∀ i = 1, …, N; analog hidden neurons zμ, Gaussian, ∀ μ = 1, …, P; interlayer couplings i.i.d. with ξiμ ~ (1/2)[δ(ξiμ − 1) + δ(ξiμ + 1)].
Hopfield model on a complete graph: digital neurons σi = ±1, ∀ i = 1, …, N; Hebbian coupling

Jij = (1/N) Σ_{μ=1}^{P} ξiμ ξjμ.
The set of couplings {ξiμ} encodes the learnt patterns {ξμ}. There exists a performance limit for RBMs: N > P.

H_RBM(σ, z | ξ) = −(1/√N) Σ_{i=1}^{N} Σ_{μ=1}^{P} ξiμ σi zμ.
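The equivalence can be verified by brute force on a small system; a sketch under an assumed convention for the joint weight (the placement of β and the 1/√N scaling are assumptions consistent with the Hamiltonian above), showing that integrating out the Gaussian hidden layer reproduces a Hopfield Boltzmann factor with Hebbian kernel:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N, P, beta = 6, 2, 0.7                 # small sizes so sigma can be enumerated
xi = rng.choice([-1, 1], size=(P, N))  # random binary interlayer couplings

# Assumed convention: joint weight
#   w(sigma, z) = exp( -|z|^2/2 + (beta/sqrt(N)) sum_{i,mu} xi_i^mu sigma_i z_mu )
# Gaussian integration over z gives
#   exp( (beta^2 / 2N) sum_mu (sum_i xi_i^mu sigma_i)^2 ),
# i.e. a Hopfield Boltzmann factor with Hebbian kernel J = xi^T xi / N.

def marginal_weight(sigma):
    b = beta / np.sqrt(N) * (xi @ sigma)
    return np.exp(0.5 * b @ b)

J = xi.T @ xi / N                      # Hebbian kernel (diagonal kept)

def hopfield_weight(sigma):
    return np.exp(0.5 * beta**2 * (sigma @ J @ sigma))

for sigma in product([-1, 1], repeat=N):
    s = np.array(sigma)
    assert np.isclose(marginal_weight(s), hopfield_weight(s))
```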
The analog neurons in the hidden layer change continuously in time and their activity can be described by a stochastic differential equation, in analogy with the integrate-and-fire model:

τ dzμ(t)/dt = −zμ(t) + (1/√N) Σ_i ξiμ σi + noise,

whose stationary distribution is Gaussian and factorized over the hidden units:

Pr(z | σ, ξ) = Π_{μ=1}^{P} Pr(zμ | σ, ξ),  Pr(zμ | σ, ξ) ∝ exp[ −(β/2) (zμ − (1/√N) Σ_i ξiμ σi)² ].

The activity of the digital neurons in the visible layer follows a Glauber dynamics, with

Pr(σ | z, ξ) = Π_{i=1}^{N} exp( β σi (1/√N) Σ_{μ=1}^{P} zμ ξiμ ) / [ 2 cosh( β (1/√N) Σ_{μ=1}^{P} zμ ξiμ ) ].
The joint probability can be found by exploiting Bayes' formula: Pr(σ, z | ξ) = Pr(σ | z, ξ) Pr(z | ξ) = Pr(z | σ, ξ) Pr(σ | ξ). In particular, marginalizing over the hidden layer z returns a Hopfield network for σ with Hebbian couplings Jij ∝ Σ_{μ=1}^{P} ξiμ ξjμ.
[Figure: Mattis magnetizations m1, …, m6 versus the dilution d.]
Agliari et al. (2012) Multitasking attractor networks with neuronal threshold noise, Neur. Net.
Agliari et al. (2013) Multi-tasking capabilities at medium load, J. Phys. A
Agliari et al. (2013) Multitasking capabilities near saturation, J. Phys. A
Agliari et al. (2017) Retrieving Infinite Numbers of Patterns in a Spin-Glass Model of Immune Networks, Europhys. Lett.
Once ξ1 is retrieved (m1 = 1 − d), it is convenient to coordinate the free spins to align with the next pattern, say ξ2, instead of letting them align randomly.
Diluted patterns: P(ξiμ = +1) = P(ξiμ = −1) = (1 − d)/2, P(ξiμ = 0) = d, with d > 0 and d finite.
When the dilution scales with N, d = 1 − c/N^γ, the topology of the underlying network and the retrieval capacity are affected.
Below the percolation threshold, the graph is fragmented into cliques; each clique corresponds to a different pattern, i.e. to a different clone.
NT = 10⁴, α = 0.1, δ = γ; γ = 0.9 (left panel) and γ = 0.8 (right panel). Isolated nodes (8856 and 8913, respectively) are omitted.
Above the percolation threshold, the graph forms complex components; different T cells share several B cells → signal interference.
NT = 10⁴, α = 0.1, δ = 1; γ = 0.9 (left panel) and γ = 0.8 (right panel). Isolated nodes (6277 and 6487, respectively) are omitted.
As links emerge, modularity progressively decays and a giant component eventually appears, hindering parallel retrieval!
Bipartite graph G2, made up of NT and NB nodes, where lim_{NB→∞} NB/NT determines the load regime. The couplings in G2 are provided by {ξiμ}, drawn i.i.d. from

P(ξiμ | d) = [(1 − d)/2] (δ_{ξiμ,1} + δ_{ξiμ,−1}) + d δ_{ξiμ,0}.

After marginalization, one obtains a monopartite graph G1, with NT nodes that interact pairwise through the coupling matrix

Jij = Σ_{μ=1}^{NB} ξiμ ξjμ.
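A sketch of the marginalized coupling for diluted patterns (the sizes and dilution below are illustrative assumptions): two T nodes end up coupled in G1 essentially only when they share at least one pattern entry that is non-zero for both, so for strong dilution J is sparse:

```python
import numpy as np

rng = np.random.default_rng(5)
N_T, N_B, d = 2000, 20, 0.8            # illustrative sizes and dilution

# Diluted patterns: P(+1) = P(-1) = (1-d)/2, P(0) = d
xi = rng.choice([-1, 0, 1], size=(N_B, N_T),
                p=[(1 - d) / 2, d, (1 - d) / 2])

# Marginalizing the B layer: pairwise couplings on G1
J = xi.T @ xi
np.fill_diagonal(J, 0)

# Fraction of coupled pairs (links of G1); most pairs stay uncoupled
link_fraction = np.count_nonzero(J) / (N_T * (N_T - 1))
print(link_fraction)
```

Note that J_ij can also vanish by cancellation when two nodes share an even number of patterns with opposite products, which slightly lowers the link fraction below the naive sharing probability.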
with dilution scaling as d = 1 − c/N^γ.
Low/medium storage case: NB ~ NT^δ, δ < 1.
From a graph-theory perspective: for γ ≥ δ, G1 is fragmented into multiple disconnected components, each forming a clique or a collection of cliques connected via a bridge. Each clique corresponds to a pattern → simultaneous recall of multiple patterns is allowed. For γ < δ, G1 can exhibit a giant component, which prevents the system from simultaneous pattern recall.
From a statistical-mechanics perspective: as the load (i.e., NB/NT) grows, i.e. when δ grows, it becomes a source of non-Gaussian interference noise that is non-negligible for γ ≤ δ. If δ = γ, the system is still able to retrieve all the patterns, but with a decreasing recall overlap.
High storage case: NB = α NT, α > 0 (i.e., NB ~ NT^δ with δ = 1, so the borderline case δ = γ). There is Gaussian noise due to the non-condensed patterns, and this is found to destroy the retrieval states: here the system behaves as a spin glass; even extreme dilution of G1 is insufficient to sustain a high pattern load.
For αc² < 1, the typical components in G1 are finite-sized and form cliques whose occurrence frequency decays exponentially with their size. Each clique corresponds to a pattern; this arrangement allows for the simultaneous recall of multiple patterns. For αc² > 1, G1 exhibits a giant component, which can compromise the system's parallel-processing ability.
RS ansatz → critical surface Tc(α, c) separating two distinct phases. For T > Tc, each subsystem behaves as a paramagnet; for T < Tc, each subsystem retrieves one particular pattern (or its inverse), representing parallel retrieval (perfect at zero temperature) of an extensive number of patterns. Tc(α, 1/√α) = 0 ∀ α ≥ 0, so for αc² > 1 no transition at finite temperature away from this phase is possible.
Any new layer improves the performance of the neural network, focusing on finer details.
A two-layer RBM corresponds to a Hopfield model with two- and one-body interactions (maximum entropy: first and second moments) and accomplishes learning once ⟨si sj⟩ and ⟨si⟩ are recovered.
Idea: a p-layer RBM corresponds to a Hopfield model with up to p-body interactions (maximum entropy: up to the p-th moment) and accomplishes learning once ⟨si1 si2 … sip⟩ are recovered; therefore we can describe a richer (more than Gaussian) reality!