Page 1:

Threshold units

A threshold unit has inputs $x_0, \dots, x_n$ weighted by $w_0, \dots, w_n$. Its output is

$o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, and $o = 0$ otherwise.
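A minimal Python sketch of this unit (function name and example values are illustrative, not from the slides):

    def threshold_unit(x, w):
        """Output 1 if the weighted sum of the inputs exceeds 0, else 0."""
        s = sum(wi * xi for wi, xi in zip(w, x))
        return 1 if s > 0 else 0

    # x[0] is conventionally fixed to 1 so that w[0] acts as the threshold/bias.
    print(threshold_unit([1, 0, 1], [-0.5, 0.3, 0.8]))  # prints 1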

Page 2:

History

Spiking neural networks (1990s)
Vapnik (1990) -- support vector machine
Broomhead & Lowe (1988) -- radial basis functions (RBF)
Linsker (1988) -- Infomax principle
Rumelhart, Hinton & Williams (1986) -- back-propagation
Kohonen (1982) -- self-organizing maps
Hopfield (1982) -- Hopfield networks
Minsky & Papert (1969) -- Perceptrons
Rosenblatt (1960) -- perceptron
Minsky (1954) -- neural networks (PhD thesis)
Hebb (1949) -- The Organization of Behavior
McCulloch & Pitts (1943) -- neural networks and artificial intelligence were born

Page 3:

History of Neural Networks

• 1943: McCulloch and Pitts - Modeling the Neuron for Parallel Distributed Processing
• 1958: Rosenblatt - Perceptron
• 1969: Minsky and Papert publish limits on the ability of a perceptron to generalize
• 1970's and 1980's: ANN renaissance
• 1986: Rumelhart, Hinton & Williams present backpropagation
• 1989: Tsividis: Neural Network on a chip

Page 4:

Warren McCulloch

Page 5:

Neural Networks

• McCulloch & Pitts (1943) are generally recognised as the designers of the first neural network

• Many of their ideas still used today (e.g. many simple units combine to give increased computational power and the idea of a threshold)

Page 6:

Neural Networks

• Hebb (1949) developed the first learning rule (on the premise that if two neurons were active at the same time the strength between them should be increased)

Page 8:

Neural Networks

• During the 50’s and 60’s many researchers worked on the perceptron amidst great excitement.

• 1969 saw the death of neural network research for about 15 years – Minsky & Papert

• Only in the mid 80's (Parker and LeCun) was interest revived (in fact, Werbos had discovered the back-propagation algorithm in 1974)

Page 9:

How Does the Brain Work? (1)

NEURON
• The cell that performs information processing in the brain
• Fundamental functional unit of all nervous system tissue

Page 10:

How Does the Brain Work? (2)

Each neuron consists of: SOMA, DENDRITES, AXON, and SYNAPSES.

Page 11:

Biological neurons

(Figure: a neuron with labeled axon, dendrites, synapse, and cell body.)

Page 12:

Neural Networks

• We are born with about 100 billion neurons

• A neuron may connect to as many as 100,000 other neurons

Page 13:

Biological inspiration

(Figure: a neuron with labeled dendrites, soma (cell body), and axon.)

Page 14:

Biological inspiration

(Figure: two neurons with labeled synapses, axon, and dendrites.)

Information transmission happens at the synapses.

Page 15:

Biological inspiration

The spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse.

The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron.

The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron.

The contribution of the signals depends on the strength of the synaptic connection.

Page 16:

Biological Neurons

• The human information processing system consists of the brain; the neuron is its basic building block
 – a cell that communicates information to and from various parts of the body
• Simplest model of a neuron: a threshold unit, i.e., a processing element (PE)
• It collects inputs and produces an output if the sum of the inputs exceeds an internal threshold value

Page 17:

Artificial Neural Nets (ANNs)

• Many neuron-like PEs (units)
 – Input and output units receive and broadcast signals to the environment, respectively
 – Internal units are called hidden units, since they are not in contact with the external environment
 – Units are connected by weighted links (synapses)
• A parallel computation system, because
 – signals travel independently on weighted channels, and units can update their state in parallel
 – however, most NNs can be simulated on serial computers
• A directed graph with edges labeled by weights is typically used to describe the connections among units

Page 18:

Each processing unit has a simple program that: (a) computes a weighted sum of the input data it receives from the units that feed into it, and (b) outputs a single value, which in general is a non-linear function of that weighted sum; this output then becomes an input to the units into which the original unit feeds.

A NODE (figure): input links deliver activations $a_j$ through weights $W_{j,i}$; the input function computes $in_i = \sum_j W_{j,i}\, a_j$; the activation function $g$ then yields the activation level $a_i = g(in_i)$, which is sent along the output links.

Page 19:

g = activation functions for units

Step function (linear threshold unit): $\text{step}(x) = 1$ if $x \ge \text{threshold}$, $0$ if $x < \text{threshold}$

Sign function: $\text{sign}(x) = +1$ if $x \ge 0$, $-1$ if $x < 0$

Sigmoid function: $\text{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
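These three functions are direct to implement; a minimal sketch (assuming the step threshold defaults to 0):

    import math

    def step(x, threshold=0.0):
        return 1 if x >= threshold else 0

    def sign(x):
        return +1 if x >= 0 else -1

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))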

Page 20:

Real vs artificial neurons

(Figure, biological side: axon, dendrites, synapse, cell body.)

Artificial side: a threshold unit with inputs $x_0, \dots, x_n$, weights $w_0, \dots, w_n$, and output $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, and $o = 0$ otherwise.

Page 21:

Artificial neurons

Neurons work by processing information. They receive and provide information in the form of spikes.

The McCulloch-Pitts model (figure): inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feed a single output $y$:

$z = \sum_{i=1}^{n} w_i x_i; \qquad y = H(z)$

where $H$ is a threshold (Heaviside step) function.

Page 22:

Mathematical representation

The neuron calculates a weighted sum of inputs and compares it to a threshold. If the sum is higher than the threshold, the output is set to 1, otherwise to -1.

(Figure label: non-linearity.)

Page 23:

Artificial neurons (figure): inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feed a threshold unit $f$:

$f(x_1, x_2, \dots, x_n) = 1$ if $\sum_{i=1}^{n} x_i w_i \ge \text{threshold}$, and $0$ otherwise.

Page 25:

Basic Concepts

Definition of a node:

• A node is an element which performs the function

$y = f_H\!\left(\sum_i W_i x_i + W_b\right)$

(Figure: inputs 0..n with weights $W_0, \dots, W_n$ and bias weight $W_b$ are summed and passed through $f_H$ to the output; nodes are joined by connections.)
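A minimal sketch of such a node with a hard-threshold $f_H$ (the slides leave $f_H$ generic; this choice is an assumption):

    def node(x, w, w_b):
        """y = f_H(sum(W_i * x_i) + W_b), with f_H taken to be a hard threshold."""
        net = sum(wi * xi for wi, xi in zip(w, x)) + w_b
        return 1 if net >= 0 else 0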

Page 26:

Anatomy of an Artificial Neuron

(Figure: inputs $x_1, \dots, x_i, \dots, x_n$ with weights $w_1, \dots, w_n$, plus a constant input 1 with bias weight $w_0$; $h$ combines the $w_i$ and $x_i$; the activation function $f$ produces the output $y = f(h(w_0, w_i, x_i))$.)

Page 27:

Simple Perceptron

• Binary logic application
• $f_H(x) = u(x)$ [linear threshold]
• $W_i$ = random(-1, 1)
• $Y = u(W_0 X_0 + W_1 X_1 + W_b)$
• Now how do we train it?

(Figure: inputs 0 and 1 with weights $W_0$, $W_1$ and bias $W_b$, summed and passed through $f_H$ to the output.)

Page 28:

• From experience: examples / training data

• Strength of connection between the neurons is stored as a weight-value for the specific connection.

• Learning the solution to a problem = changing the connection weights

Artificial Neuron

(Figures: an artificial neuron and a physical neuron, side by side.)

Page 29:

Mathematical Representation

(Figures: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ and bias $b$ (constant input $x_0$) flow through a summation and an activation $f$ to the output. Stages: inputs, weights, summation, activation, output.)

$\text{net} = \sum_{i=1}^{n} w_i x_i + b, \qquad y = f(\text{net})$

Page 32:

A simple perceptron

• It's a single-unit network
• Change the weight by an amount proportional to the difference between the desired output and the actual output:

Perceptron Learning Rule: $\Delta W_i = \eta\,(D - Y)\, I_i$

where $\eta$ is the learning rate, $D$ the desired output, $Y$ the actual output, and $I_i$ the input.
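A minimal sketch of one application of this rule (illustrative names; taking inputs[0] = 1 so the first weight plays the role of a bias):

    def update_weights(w, inputs, desired, actual, eta=0.1):
        """Apply dW_i = eta * (D - Y) * I_i to every weight."""
        return [wi + eta * (desired - actual) * ii
                for wi, ii in zip(w, inputs)]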

Page 33:

Linear Neurons

•Obviously, the fact that threshold units can only output the values 0 and 1 restricts their applicability to certain problems.

•We can overcome this limitation by eliminating the threshold and simply turning $f_i$ into the identity function, so that we get:

$o_i(t) = \text{net}_i(t)$

•With this kind of neuron, we can build networks with m input neurons and n output neurons that compute a function $f: \mathbb{R}^m \to \mathbb{R}^n$.

Page 34:

Linear Neurons

•Linear neurons are quite popular and useful for applications such as interpolation.

•However, they have a serious limitation: each neuron computes a linear function, and therefore the overall network function $f: \mathbb{R}^m \to \mathbb{R}^n$ is also linear.

•This means that if an input vector x results in an output vector y, then for any factor c the input cx will result in the output cy.

•Obviously, many interesting functions cannot be realized by networks of linear neurons.

Page 35:

Mathematical Representation

Common activation functions:

Sigmoid: $a = f(n) = \dfrac{1}{1 + e^{-n}}$

Hard limit (step): $a = f(n) = 1$ if $n \ge 0$, $0$ if $n < 0$

Linear: $a = f(n) = n$

Gaussian: $a = f(n) = e^{-n^2}$

Page 36:

Gaussian Neurons

•Another type of neuron overcomes this problem by using a Gaussian activation function, e.g.

$f_i(\text{net}_i(t)) = e^{-(\text{net}_i(t))^2}$

(Figure: $f_i(\text{net}_i(t))$ plotted from 0 to 1 against $\text{net}_i(t)$ from -1 to 1.)

Page 37:

Gaussian Neurons

•Gaussian neurons are able to realize non-linear functions.

•Therefore, networks of Gaussian units are in principle unrestricted with regard to the functions that they can realize.

•The drawback of Gaussian neurons is that we have to make sure that their net input does not exceed 1.

•This adds some difficulty to the learning in Gaussian networks.

Page 38:

Sigmoidal Neurons

•Sigmoidal neurons accept any vectors of real numbers as input, and they output a real number between 0 and 1.

•Sigmoidal neurons are the most common type of artificial neuron, especially in learning networks.

•A network of sigmoidal units with m input neurons and n output neurons realizes a network function $f: \mathbb{R}^m \to (0,1)^n$

Page 39:

Sigmoidal Neurons

$f_i(\text{net}_i(t)) = \dfrac{1}{1 + e^{-(\text{net}_i(t) - \theta)/\tau}}$

•The parameter $\tau$ controls the slope of the sigmoid function, while the parameter $\theta$ controls the horizontal offset of the function in a way similar to the threshold neurons.

(Figure: the function plotted against $\text{net}_i(t)$ from -1 to 1, for $\tau = 1$ and $\tau = 0.1$.)
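A minimal sketch of this parameterized sigmoid (assuming the reconstruction above, with theta the offset and tau the slope parameter):

    import math

    def sigmoid_unit(net, theta=0.0, tau=1.0):
        """1 / (1 + e^(-(net - theta)/tau)); a smaller tau gives a steeper curve."""
        return 1.0 / (1.0 + math.exp(-(net - theta) / tau))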

Page 40:

Example: A simple single-unit adaptive network

• The network has 2 inputs and one output. All are binary. The output is
 – 1 if $W_0 I_0 + W_1 I_1 + W_b > 0$
 – 0 if $W_0 I_0 + W_1 I_1 + W_b \le 0$
• We want it to learn simple OR: output a 1 if either $I_0$ or $I_1$ is 1.

Page 41:

Artificial neurons

The McCulloch-Pitts model:

• spikes are interpreted as spike rates;
• synaptic strengths are translated into synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.

Page 42:

Artificial neurons

Nonlinear generalization of the McCulloch-Pitts neuron:

$y = f(x, w)$

where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.

Examples:

sigmoidal neuron: $y = \dfrac{1}{1 + e^{-w^T x - a}}$

Gaussian neuron: $y = e^{-\|x - w\|^2 / (2a^2)}$
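Minimal sketches of the two example neurons (vectors as plain lists; `a` is the slide's parameter, assuming the reconstructed formulas above):

    import math

    def sigmoidal_neuron(x, w, a):
        wTx = sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-wTx - a))

    def gaussian_neuron(x, w, a):
        d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))  # ||x - w||^2
        return math.exp(-d2 / (2.0 * a * a))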

Page 43:

NNs: Dimensions of a Neural Network

– Knowledge about the learning task is given in the form of examples called training examples.
– An ANN is specified by:
 – an architecture: a set of neurons and links connecting neurons, where each link has a weight;
 – a neuron model: the information processing unit of the NN;
 – a learning algorithm: used for training the NN by modifying the weights in order to solve the particular learning task correctly on the training examples.

The aim is to obtain a NN that generalizes well, that is, behaves correctly on new instances of the learning task.

Page 44:

Neural Network Architectures

Many kinds of structures; the main distinction is between two classes:

a) feed-forward (a directed acyclic graph, DAG): links are unidirectional, and there are no cycles

b) recurrent: links form arbitrary topologies, e.g., Hopfield networks and Boltzmann machines

Recurrent networks can be unstable, oscillate, or exhibit chaotic behavior; e.g., given some input values, they can take a long time to compute a stable output, and learning is made more difficult. However, they can implement more complex agent designs and can model systems with state.

We will focus more on feed-forward networks.

Page 55:

Single Layer Feed-forward

(Figure: an input layer of source nodes connected to an output layer of neurons.)

Page 56:

Multi layer feed-forward

(Figure: input layer, hidden layer, output layer; a 3-4-2 network.)

Page 57:

Feed-forward networks:

Advantage: the lack of cycles means computation proceeds uniformly from input units to output units.
- Activation from the previous time step plays no part in the computation, as it is not fed back to an earlier unit.
- The network simply computes a function of the input values that depends on the weight settings; it has no internal state other than the weights themselves.
- With a fixed structure and a fixed activation function g, the functions representable by a feed-forward network are restricted to a certain parameterized structure.

Page 58:

Learning in biological systems

Learning = learning by adaptation

The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit picking behaviour.

At the neural level the learning happens by changing of the synaptic strengths, eliminating some synapses, and building new ones.

Page 59:

Learning as optimisation

The objective of adapting the responses on the basis of the information received from the environment is to achieve a better state. E.g., the animal likes to eat many energy rich, juicy fruits that make its stomach full, and makes it feel happy.

In other words, the objective of learning in biological organisms is to optimise the amount of available resources, happiness, or in general to achieve a closer to optimal state.

Page 60:

Synapse concept

• The synapse's resistance to the incoming signal can be changed during a "learning" process [1949]

Hebb's Rule: if an input of a neuron is repeatedly and persistently causing the neuron to fire, a metabolic change happens in the synapse of that particular input to reduce its resistance.

Page 61:

Neural Network Learning

• Objective of neural network learning: given a set of examples, find parameter settings that minimize the error.
• The programmer specifies:
 - the number of units in each layer
 - the connectivity between units
• Unknowns:
 - the connection weights

Page 62:

Supervised Learning in ANNs

•In supervised learning, we train an ANN with a set of vector pairs, so-called exemplars.

•Each pair (x, y) consists of an input vector x and a corresponding output vector y.

•Whenever the network receives input x, we would like it to provide output y.

•The exemplars thus describe the function that we want to “teach” our network.

•Besides learning the exemplars, we would like our network to generalize, that is, give plausible output for inputs that the network had not been trained with.

Page 63:

Supervised Learning in ANNs

•There is a tradeoff between a network’s ability to precisely learn the given exemplars and its ability to generalize (i.e., inter- and extrapolate).

•This problem is similar to fitting a function to a given set of data points.

•Let us assume that you want to find a fitting function $f: \mathbb{R} \to \mathbb{R}$ for a set of three data points.

•You try to do this with polynomials of degree one (a straight line), two, and nine.

Page 64:

Supervised Learning in ANNs

•Obviously, the polynomial of degree 2 provides the most plausible fit.

(Figure: f(x) vs. x showing fits of degree 1, degree 2, and degree 9 through three data points.)

Page 65:

Overfitting

(Figure: an overfitted model compared with the real distribution.)

Page 70:

Supervised Learning in ANNs

•The same principle applies to ANNs:

• If an ANN has too few neurons, it may not have enough degrees of freedom to precisely approximate the desired function.

• If an ANN has too many neurons, it will learn the exemplars perfectly, but its additional degrees of freedom may cause it to show implausible behavior for untrained inputs; it then generalizes poorly.

•Unfortunately, there are no known equations that could tell you the optimal size of your network for a given application; you always have to experiment.

Page 71:

Learning in Neural Nets

Learning Tasks

Supervised
 Data: labeled examples (input, desired output)
 Tasks: classification, pattern recognition, regression
 NN models: perceptron, Adaline, feed-forward NN, radial basis function, support vector machines

Unsupervised
 Data: unlabeled examples (different realizations of the input)
 Tasks: clustering, content-addressable memory
 NN models: self-organizing maps (SOM), Hopfield networks

Page 72:

Learning Algorithms

Depend on the network architecture:

• Error-correcting learning (perceptron)
• Delta rule (Adaline, Backprop)
• Competitive learning (self-organizing maps)

Page 73:

Perceptrons

• Perceptrons are single-layer feedforward networks
• Each output unit is independent of the others
• Can assume a single output unit
• Activation of the output unit is calculated by:

$O = \text{Step}\!\left(\sum_{j=0}^{n} w_j x_j\right)$

where $x_j$ is the activation of input unit j, and we assume an additional weight and input to represent the threshold.

Page 74:

Perceptron

(Figure: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$, plus $X_0 = 1$ with weight $w_0$, feed a summation unit.)

$O = 1$ if $\sum_{j=0}^{n} w_j x_j > 0$, and $-1$ otherwise.

Page 76:

Perceptron

Rosenblatt (1958) defined a perceptron to be a machine that learns, using examples, to assign input vectors (samples) to different classes, using linear functions of the inputs

Minsky and Papert (1969) instead describe the perceptron as a stochastic gradient-descent algorithm that attempts to linearly separate a set of n-dimensional training data.

Page 77:

Linear Separability

(Figure: (a) + and - points in the x1-x2 plane that a single line can separate; (b) an arrangement of + and - points that no single line can separate.)

Some functions are not representable, e.g., (b) is not linearly separable.

Page 78:

So what can be represented using perceptrons?

(Figure: decision boundaries for AND and OR.)

Representation theorem: 1-layer feedforward networks can only represent linearly separable functions. That is, the decision surface separating positive from negative examples has to be a plane.

Page 79:

Learning Boolean AND

Page 80:

XOR

• No $w_0, w_1, w_2$ satisfy: (Minsky and Papert, 1969)

$w_0 \le 0$
$w_2 + w_0 > 0$
$w_1 + w_0 > 0$
$w_1 + w_2 + w_0 \le 0$

(Adding the two middle inequalities gives $w_1 + w_2 + 2w_0 > 0$, which together with $w_0 \le 0$ contradicts the last one.)

Page 81:

Expressive limits of perceptrons

• Can the XOR function be represented by a perceptron

(a network without a hidden layer)?

XOR cannot be represented.

Page 82:

How can perceptrons be designed?

• The Perceptron Learning Theorem (Rosenblatt, 1960): Given enough training examples, there is an algorithm that will learn any linearly separable function.

Theorem 1 (Minsky and Papert, 1969) The perceptron rule converges to weights that correctly classify all training examples provided the given data set represents a function that is linearly separable

Page 83:

The perceptron learning algorithm

• Inputs: training set $\{(x_1, x_2, \dots, x_n, t)\}$
• Method
 – Randomly initialize the weights $w_i$, $-0.5 \le w_i \le 0.5$
 – Repeat for several epochs until convergence:
  • for each example:
   – Calculate the network output o.
   – Adjust the weights (perceptron training rule):

$\Delta w_i = \eta\,(t - o)\, x_i$
$w_i \leftarrow w_i + \Delta w_i$

where $\eta$ is the learning rate and $(t - o)$ is the error.
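A minimal sketch of the whole algorithm on Boolean OR (illustrative names; each example is a tuple (x1, ..., xn, t)):

    import random

    def train_perceptron(examples, eta=0.1, epochs=50):
        n = len(examples[0]) - 1
        w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # +1 threshold weight
        for _ in range(epochs):
            for ex in examples:
                x, t = [1] + list(ex[:-1]), ex[-1]   # x[0] = 1 carries the threshold
                o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        return w

    print(train_perceptron([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]))  # learns OR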

Page 84:

Why does the method work?

• The perceptron learning rule performs gradient descent in weight space.
 – Error surface: the surface that describes the error on each example as a function of all the weights in the network. A set of weights defines a point on this surface.
 – We look at the partial derivative of the surface with respect to each weight (i.e., the gradient: how much the error would change if we made a small change in each weight). The weights are then altered by an amount proportional to the slope in each direction (corresponding to a weight). Thus the network as a whole moves in the direction of steepest descent on the error surface.
• The error surface in weight space has a single global minimum and no local minima. Gradient descent is guaranteed to find the global minimum, provided the learning rate is not so big that you overshoot it.

Page 85:

Multi-layer, feed-forward networks

Perceptrons are rather weak as computing models since they can only learn linearly separable functions.

Thus, we now focus on multi-layer, feed-forward networks of non-linear sigmoid units, i.e.,

$g(x) = \dfrac{1}{1 + e^{-x}}$

Page 86:

Multi-layer feed-forward networks

Multi-layer, feed-forward networks extend perceptrons (i.e., 1-layer networks) to n-layer networks by partitioning units into layers 0 to L such that:

• the lowermost layer, layer 0, contains the input units;
• the topmost layer, layer L, contains the output units;
• layers 1 to L-1 are the hidden layers.

Connectivity means bottom-up connections only, with no cycles; hence the name "feed-forward" nets.

Input layers transmit input values to the hidden-layer nodes and hence do not perform any computation.

Note: the layer number indicates the distance of a node from the input nodes.

Page 87:

Multilayer feed forward network

(Figure: a layer of input units x0, x1, x2, x3, x4; a layer of hidden units v1, v2, v3; a layer of output units o1, o2.)

Page 88:

Multi-layer feed-forward networks

• Multi-layer feed-forward networks can be trained by back-propagation, provided the activation function g is a differentiable function.
 – Threshold units don't qualify, but the sigmoid function does.
• Back-propagation learning is a gradient-descent search through the parameter space to minimize the sum-of-squares error.
 – It is the most common algorithm for learning in multilayer networks.

Page 89:

Sigmoid units

(Figure: a unit with inputs $x_0, \dots, x_n$, weights $w_0, \dots, w_n$, and net input $a = \sum_{i=0}^{n} w_i x_i$.)

Sigmoid unit for g:

$\sigma(a) = \dfrac{1}{1 + e^{-a}}$

$\dfrac{\partial \sigma(a)}{\partial a} = \sigma(a)\,(1 - \sigma(a))$

This is g' (the basis for gradient descent).
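A quick numerical check of the identity g'(a) = g(a)(1 - g(a)), as a sketch (comparing against a central finite difference):

    import math

    def g(a):
        return 1.0 / (1.0 + math.exp(-a))

    def g_prime(a):
        return g(a) * (1.0 - g(a))

    eps = 1e-6
    a = 0.5
    print(g_prime(a), (g(a + eps) - g(a - eps)) / (2 * eps))  # nearly equal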

Page 90:

Determining optimal network structure

A weak point of fixed-structure networks: a poor choice can lead to poor performance.

Too small a network: the model is incapable of representing the desired function.

Too big a network: it will be able to memorize all examples, forming a large lookup table, but will not generalize well to inputs that have not been seen before.

Thus finding a good network structure is another example of a search problem. Some approaches search for a solution with genetic algorithms, but using GAs is very CPU-intensive.

Page 91:

Learning rate

• Ideally, each weight should have its own learning rate

• As a substitute, each neuron or each layer could have its own rate

• Learning rates should be inversely proportional to the square root of the number of inputs to the neuron

Page 92:

Setting the parameter values

• How are the weights initialized?
• Do weights change after the presentation of each pattern or only after all patterns of the training set have been presented?
• How is the value of the learning rate chosen?
• When should training stop?
• How many hidden layers and how many nodes in each hidden layer should be chosen to build a feedforward network for a given problem?
• How many patterns should there be in a training set?
• How does one know that the network has learnt something useful?

Page 93:

When should neural nets be used for learning a problem?

• If instances are given as attribute-value pairs.
 – Pre-processing required: continuous input values should be scaled to the [0-1] range, and discrete values need to be converted to Boolean features.
• If there is noise in the training examples.
• If a long training time is acceptable.

Page 94:

Neural Networks: Advantages

• Distributed representations
• Simple computations
• Robust with respect to noisy data
• Robust with respect to node failure
• Empirically shown to work well for many problem domains
• Parallel processing

Page 95:

Neural Networks: Disadvantages

• Training is slow
• Interpretability is hard
• Network topology layouts are ad hoc
• Can be hard to debug
• May converge to a local, not global, minimum of error
• May be hard to describe a problem in terms of features with numerical values

Page 96:

Back-propagation Algorithm

(Figure: neuron j with inputs $y_0 = +1, y_1(n), \dots, y_i(n)$ and weights $w_{j0}(n) = b_j(n), w_{j1}(n), \dots, w_{ji}(n)$; the sum $\text{net}_j(n)$ passes through $f(\cdot)$ to give $y_j(n)$.)

$\text{net}_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$

$y_j(n) = f(\text{net}_j(n))$

$e_j(n) = d_j(n) - y_j(n)$

Total error: $\mathcal{E}(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$, where C is the set of all output neurons.

Average squared error: $\mathcal{E}_{av} = \frac{1}{N} \sum_{n=1}^{N} \mathcal{E}(n)$, where N = the number of items in the training set.

Page 97:

Back-propagation Algorithm

By the chain rule,

$\dfrac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = \dfrac{\partial \mathcal{E}(n)}{\partial e_j(n)} \dfrac{\partial e_j(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial \text{net}_j(n)} \dfrac{\partial \text{net}_j(n)}{\partial w_{ji}(n)} = -\,e_j(n)\, f'(\text{net}_j(n))\, y_i(n)$

since

$\partial \mathcal{E}(n)/\partial e_j(n) = e_j(n)$, as $\mathcal{E}(n) = \frac{1}{2}\sum_{j \in C} e_j^2(n)$;
$\partial e_j(n)/\partial y_j(n) = -1$, as $e_j(n) = d_j(n) - y_j(n)$;
$\partial y_j(n)/\partial \text{net}_j(n) = f'(\text{net}_j(n))$, as $y_j(n) = f(\text{net}_j(n))$;
$\partial \text{net}_j(n)/\partial w_{ji}(n) = y_i(n)$, as $\text{net}_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$.

Gradient descent:

$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$

with error term $\delta_j(n) = e_j(n)\, f'(\text{net}_j(n))$.

If $y_j(n) = \varphi(\text{net}_j(n)) = \dfrac{1}{1 + \exp(-\text{net}_j(n))}$, then $f'(\text{net}_j(n)) = y_j(n)\,[1 - y_j(n)]$.

Page 98:

Back-propagation Algorithm

Neuron k is an output node:

$\delta_k(n) = e_k(n)\, f'(\text{net}_k(n)) = [d_k(n) - y_k(n)]\; y_k(n)\,[1 - y_k(n)]$

Neuron j is a hidden node (feeding each output neuron k through weight $w_{kj}$):

$\delta_j(n) = f'(\text{net}_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) = y_j(n)\,[1 - y_j(n)] \sum_k \delta_k(n)\, w_{kj}(n)$

Weight adjustment: $\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$ (learning rate × local gradient × input signal), and

$w_{ji}(n+1) = w_{ji}(n) + \Delta w_{ji}(n)$
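A minimal sketch of one back-propagation step for a single hidden layer of sigmoid units, following the update rules above (illustrative names; biases omitted to keep the sketch short):

    import math

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    def backprop_step(x, d, W_h, W_o, eta=0.5):
        """W_h: rows of hidden-unit weights; W_o: rows of output-unit weights."""
        # Forward pass.
        y_h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
        y_o = [sigmoid(sum(w * h for w, h in zip(row, y_h))) for row in W_o]
        # Output deltas: (d_k - y_k) * y_k * (1 - y_k).
        d_o = [(dk - yk) * yk * (1 - yk) for dk, yk in zip(d, y_o)]
        # Hidden deltas: y_j * (1 - y_j) * sum_k delta_k * w_kj.
        d_h = [yj * (1 - yj) * sum(d_o[k] * W_o[k][j] for k in range(len(W_o)))
               for j, yj in enumerate(y_h)]
        # Weight adjustment: w += eta * delta * input signal.
        for k, row in enumerate(W_o):
            for j in range(len(row)):
                row[j] += eta * d_o[k] * y_h[j]
        for j, row in enumerate(W_h):
            for i in range(len(row)):
                row[i] += eta * d_h[j] * x[i]
        return y_o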

Page 100:

(Figure: the error surface plotted over weights W1 and W2, each ranging from -3.000 to 6.000, with the error axis from 0.0 to 14.0.)

Page 101:

Brain vs. Digital Computers (1)

Computers require hundreds of cycles to simulate the firing of a neuron; the brain can fire all of its neurons in a single step.

Parallelism: serial computers require billions of cycles to perform some tasks that the brain does in less than a second, e.g., face recognition.

Page 102:

What are Neural Networks

• An interconnected assembly of simple processing elements, units, neurons or nodes, whose functionality is loosely based on the animal neuron

• The processing ability of the network is stored in the interunit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.

Page 103:

Definition of Neural Network

A Neural Network is a system composed of many simple processing elements operating in parallel which can acquire, store, and utilize experiential knowledge.

Page 104:

Architecture

• Connectivity:
 – fully connected
 – partially connected
• Feedback:
 – feedforward network: no feedback
  • simpler, more stable, proven most useful
 – recurrent network: feedback from output to input units
  • complex dynamics, may be unstable
• Number of layers, i.e., presence of hidden layers

Page 105:

Feedforward, Fully-Connected with One Hidden Layer

(Figure: inputs enter the input layer; nodes and connections lead through the hidden layer to the output layer, which produces the outputs.)

Page 106:

Hidden Units

• Layer of nodes between input and output nodes

• Allow a network to learn non-linear functions

• Allow the net to represent combinations of the input features

Page 107:

Learning Algorithms

• How the network learns the relationship between the inputs and outputs
• The type of algorithm used depends on the type of network: architecture, type of learning, etc.
• Back-propagation: the most popular
 – modifications exist: QuickProp, Delta-bar-Delta
• Others: conjugate gradient descent, Levenberg-Marquardt, k-means, Kohonen, standard pseudo-inverse (SVD) linear optimization

Page 108:

Types of Networks

• Multilayer Perceptron
• Radial Basis Function
• Kohonen
• Linear
• Hopfield
• Adaline/Madaline
• Probabilistic Neural Network (PNN)
• General Regression Neural Network (GRNN)
• and at least thirty others

Page 109:

• A Neural Network is a system composed of many simple processing elements operating in parallel which can acquire, store, and utilize experiential knowledge
• Basic Artificial Model
 – Consists of simple processing elements called neurons, units or nodes
 – Each neuron is connected to other nodes with an associated weight (strength) which typically multiplies the signal transmitted. Each neuron has a single threshold value
• Characterization
 – Architecture: the pattern of nodes and connections between them
 – Learning algorithm, or training method: method for determining the weights of the connections
 – Activation function: function that produces an output based on the input values received by the node

Page 110:

Perceptrons

• First studied in the late 1950s
• Also known as Layered Feed-Forward Networks
• The only efficient learning element at that time was for single-layered networks
• Today, used as a synonym for a single-layer, feed-forward network


Page 112:

Single Layer Perceptron

(Figure: inputs X1, X2, ..., Xn with weights w1, w2, ..., wn feed a unit whose output is OUT = F(NET).)

The squashing function F need not be sigmoidal.

Page 113:

Perceptron Architecture

Page 114:

Single-Neuron Perceptron

Page 115:

Decision Boundary

Page 116:

Example OR

Page 117:

OR solution

Page 118:

Multiple-Neuron Perceptron

Page 119:

Learning Rule Test Problem

Page 120:

Starting Point

Page 121:

Tentative Learning Rule

Page 122:

Second Input Vector

Page 123:

Third Input Vector

Page 124:

Unified Learning Rule

Page 125:

Multiple-Neuron Perceptron

Page 126:

Apple / Banana Example

Page 127:

Apple / Banana Example, Second iteration

Page 128:

Apple / Banana Example, Check

Page 129:

Historical Note

There was great interest in perceptrons in the '50s and '60s, centred on the work of Rosenblatt. This was crushed by the publication of "Perceptrons" by Minsky and Papert.

• The EOR (exclusive-OR) problem: regions must be linearly separable.

(Figure: the four points (0,0), (0,1), (1,0), (1,1), whose XOR classes no single line can separate.)

• Training problems, especially with higher-order nets.

Page 130:

MLP is used to describe any general feedforward (no recurrent connections) network.

However, we will concentrate on nets with units arranged in layers.

(Figure: a layered network with inputs x1, ..., xn.)

Page 131:

NB: different books refer to the above as either a 4-layer network (counting layers of neurons) or a 3-layer network (counting layers of adaptive weights). We will follow the latter convention.

1st question: what do the extra layers gain you? Start by looking at what a single layer can't do.

(Figure: the layered network with inputs x1, ..., xn.)

Page 132:

XOR problem

XOR (exclusive OR) problem:

0 + 0 = 0
1 + 1 = 2 = 0 (mod 2)
1 + 0 = 1
0 + 1 = 1

The perceptron does not work here: a single layer generates a linear decision boundary.

Page 133:

Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units.

(Figure: two first-layer units, 1 and 2, feed a second-layer unit 3 with weights +1 and +1.)
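One hand-built two-layer threshold net that computes XOR, as a sketch (these particular weights are a common construction, not taken from the slide's figure):

    def step(a):
        return 1 if a >= 0 else 0

    def xor_net(x1, x2):
        h1 = step(x1 + x2 - 0.5)    # fires for OR
        h2 = step(1.5 - x1 - x2)    # fires for NAND
        return step(h1 + h2 - 1.5)  # fires only when both fire

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_net(a, b))  # 0, 1, 1, 0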

Page 134:

This is now a linearly separable problem! For the 4 points {(-1,1), (-1,-1), (1,1), (1,-1)}, the classes are always linearly separable if we want to have three points in one class.

Page 135:

Three-layer networks

(Figure: inputs x1, x2, ..., xn feed hidden layers, which feed the output.)

Page 136:

What do each of the layers do?

1st layer draws linear boundaries

2nd layer combines the boundaries

3rd layer can generate arbitrarily complex boundaries

Page 137:

The 2nd layer can also be viewed as using local knowledge, while the 3rd layer acts globally.

With sigmoidal activation functions, one can show that a 3-layer net can approximate any function to arbitrary accuracy: the property of Universal Approximation.

The proof works by thinking of a superposition of sigmoids. It is not practically useful, as it may need an arbitrarily large number of units; it is more of an existence proof.

For a 2-layer net, the same is true providing the function is continuous and maps one finite-dimensional space to another.

Page 139:

Example: Learning addition

First find the outputs $O_I$, $O_{II}$. In order to do this, propagate the inputs forward, first finding the outputs of the neurons of the hidden layer:

$O_1 = \sigma\big(\textstyle\sum_i W_{1i} X_i\big) = \sigma(W_{10} \cdot 1 + W_{11} X_1 + W_{12} X_2)$
$O_2 = \sigma\big(\textstyle\sum_i W_{2i} X_i\big) = \sigma(W_{20} \cdot 1 + W_{21} X_1 + W_{22} X_2)$
$O_3 = \sigma\big(\textstyle\sum_i W_{3i} X_i\big) = \sigma(W_{30} \cdot 1 + W_{31} X_1 + W_{32} X_2)$

Page 140:

Example: Learning addition

Then find the outputs of the neurons of the output layer:

$O_I = \sigma\big(\textstyle\sum_i W_{Ii} O_i\big) = \sigma(W_{I0} \cdot 1 + W_{I1} O_1 + W_{I2} O_2 + W_{I3} O_3)$
$O_{II} = \sigma\big(\textstyle\sum_i W_{IIi} O_i\big) = \sigma(W_{II0} \cdot 1 + W_{II1} O_1 + W_{II2} O_2 + W_{II3} O_3)$

Page 141:

Example: Learning addition

Now propagate the errors back. To do that, first find the errors for the output layer, and update the weights between the hidden layer and the output layer:

$\delta_I = O_I (1 - O_I)(t_I - O_I)$
$\delta_{II} = O_{II} (1 - O_{II})(t_{II} - O_{II})$

$\Delta W_{I0} = \eta\, \delta_I, \quad \Delta W_{I1} = \eta\, \delta_I O_1, \quad \Delta W_{I2} = \eta\, \delta_I O_2, \quad \Delta W_{I3} = \eta\, \delta_I O_3$

$\Delta W_{II0} = \eta\, \delta_{II}, \quad \Delta W_{II1} = \eta\, \delta_{II} O_1, \quad \Delta W_{II2} = \eta\, \delta_{II} O_2, \quad \Delta W_{II3} = \eta\, \delta_{II} O_3$

Page 142:

Example: Learning addition

And backpropagate the errors to the hidden layer.

Page 143:

Example: Learning addition

Backpropagate the errors to the hidden layer:

$\delta_1 = O_1 (1 - O_1)\,(\delta_I W_{I1} + \delta_{II} W_{II1})$
$\delta_2 = O_2 (1 - O_2)\,(\delta_I W_{I2} + \delta_{II} W_{II2})$
$\delta_3 = O_3 (1 - O_3)\,(\delta_I W_{I3} + \delta_{II} W_{II3})$

Then compute the weight changes between the input layer and the hidden layer:

$\Delta W_{10} = \eta\, \delta_1, \quad \Delta W_{11} = \eta\, \delta_1 X_1, \quad \Delta W_{12} = \eta\, \delta_1 X_2$
$\Delta W_{20} = \eta\, \delta_2, \quad \Delta W_{21} = \eta\, \delta_2 X_1, \quad \Delta W_{22} = \eta\, \delta_2 X_2$
$\Delta W_{30} = \eta\, \delta_3, \quad \Delta W_{31} = \eta\, \delta_3 X_1, \quad \Delta W_{32} = \eta\, \delta_3 X_2$

Page 144:

Example: Learning addition

Finally, update all the weights:

$W_{Ii} \leftarrow W_{Ii} + \Delta W_{Ii}$ and $W_{IIi} \leftarrow W_{IIi} + \Delta W_{IIi}$ for $i = 0, 1, 2, 3$;
$W_{ji} \leftarrow W_{ji} + \Delta W_{ji}$ for $j = 1, 2, 3$ and $i = 0, 1, 2$.
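A self-contained numeric sketch of one such training step for the 2-3-2 network (the weights, inputs, and targets are made-up illustrative values; X[0] = 1 and a leading 1 in the hidden outputs carry the bias weights):

    import math

    def sig(a):
        return 1.0 / (1.0 + math.exp(-a))

    eta = 0.5
    X = [1, 0, 1]                          # X0 = 1, then inputs X1, X2
    W_h = [[0.1, 0.2, -0.1],               # weights into hidden units 1..3
           [0.0, 0.3, 0.2],
           [-0.2, 0.1, 0.1]]
    W_o = [[0.2, 0.1, -0.1, 0.3],          # weights into outputs I, II
           [0.0, -0.2, 0.1, 0.2]]
    t = [0, 1]                             # targets t_I, t_II

    # Forward pass (O_h[0] = 1 is the bias input to the output layer).
    O_h = [1] + [sig(sum(w * x for w, x in zip(row, X))) for row in W_h]
    O = [sig(sum(w * o for w, o in zip(row, O_h))) for row in W_o]

    # Output and hidden deltas, then the weight updates from the slides.
    d_out = [o * (1 - o) * (tk - o) for o, tk in zip(O, t)]
    d_hid = [O_h[j] * (1 - O_h[j]) * sum(d_out[k] * W_o[k][j] for k in range(2))
             for j in range(1, 4)]
    W_o = [[w + eta * d * o for w, o in zip(row, O_h)] for row, d in zip(W_o, d_out)]
    W_h = [[w + eta * d * x for w, x in zip(row, X)] for row, d in zip(W_h, d_hid)]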

