Laboratório de Automação e Robótica - A. Bauchspiess – Soft Computing - Neural Networks and Fuzzy Logic
The McCulloch Neuron (1943)
g = step function. The Euclidean space ℜⁿ is divided into two regions, A and B:

  a = g( Σ_{i=1..n} w_i p_i − b ) = g( w pᵗ − b ) ∈ {0, 1}

[Figure: neuron with inputs p₁ … pₙ, weights w₁ … wₙ and bias b; for n = 2 the line w₁p₁ + w₂p₂ = b separates region A from region B in the (p₁, p₂) plane]
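The thresholding rule can be sketched in a few lines of Python (an illustrative example, not part of the original slides; the weights w = (1, 1) and bias b = 1.5 are hypothetical values chosen so the neuron computes a logical AND):

```python
# McCulloch-Pitts neuron: a = g(sum(w_i * p_i) - b), g = step function
def step(s):
    return 1 if s >= 0 else 0

def neuron(p, w, b):
    return step(sum(wi * pi for wi, pi in zip(w, p)) - b)

# For n = 2 the line w1*p1 + w2*p2 = b splits the plane into regions A and B.
# With the (hypothetical) choice w = (1, 1), b = 1.5 the neuron computes AND:
for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, neuron(p, (1, 1), 1.5))
```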
The McCulloch Neuron – as a pattern classifier
[Figure: two scatter plots of classes "o" and "x". Left: linearly separable collections; right: non-separable collections]

Some Boolean functions of two variables represented in the binary plane.
Linear and Non-Linear Classifiers
There exist m = 2^(2ⁿ) possible logical functions connecting n binary inputs to one binary output.

n | # of binary patterns | # of logical functions | # linearly separable | % linearly separable
1 |  2 |          4 |         4 | 100
2 |  4 |         16 |        14 | 87.5
3 |  8 |        256 |       104 | 40.6
4 | 16 |     65,536 |     1,772 | 2.9
5 | 32 | 4.3 x 10⁹  |    94,572 | 2.2 x 10⁻³
6 | 64 | 1.8 x 10¹⁹ | 5,028,134 | 3.1 x 10⁻¹³
The logical functions of one variable: A, Ā, 0, 1.
The 16 logical functions of two variables: A, Ā, B, B̄, 0, 1, A∧B, A∨B, A∧B̄, A∨B̄, Ā∧B, Ā∨B, Ā∧B̄, Ā∨B̄, A⊕B, ¬(A⊕B).
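The n = 2 row of the table can be checked by brute force (a sketch; the grid of weight and bias values searched is an arbitrary choice, sufficient for this tiny space):

```python
# Count how many of the 16 Boolean functions of two inputs a single
# threshold neuron can realize; the table's n = 2 row says 14.
import itertools

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

def separable(truth):
    # search a small grid of weights/bias for a separating threshold unit
    grid = [x / 2 for x in range(-8, 9)]
    for w1, w2, b in itertools.product(grid, grid, grid):
        if all((1 if w1*p1 + w2*p2 - b >= 0 else 0) == t
               for (p1, p2), t in zip(inputs, truth)):
            return True
    return False

count = sum(separable(t) for t in itertools.product([0, 1], repeat=4))
print(count)  # 14 of 16: only XOR and XNOR are not linearly separable
```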
Two-Step Binary Perceptron

[Figure: two-layer network of binary neurons; hidden neurons 3, 4 and 5 feed output neuron 6]
Neuron 6 implements a logical AND function by choosing

  b₆ = Σ_{i=3..5} w_i6

For example:

  w₃₆ = w₄₆ = w₅₆ = 1 and b₆ = 3  ⇒  a₆ = 1 if and only if a₃ = a₄ = a₅ = 1
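The AND choice above can be verified directly (a small sketch; the hidden activations a₃, a₄, a₅ are enumerated here rather than computed by a first layer):

```python
# Output neuron 6 with w36 = w46 = w56 = 1 and b6 = 3 fires only when
# all three hidden activations are 1, i.e. it implements a logical AND.
import itertools

def neuron6(a3, a4, a5, w=(1, 1, 1), b=3):
    s = w[0]*a3 + w[1]*a4 + w[2]*a5
    return 1 if s >= b else 0

for a in itertools.product([0, 1], repeat=3):
    print(a, neuron6(*a))  # only (1, 1, 1) -> 1
```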
Three Step Binary Perceptron
[Figure: three-step binary perceptron. Inputs p₁ and p₂ feed first-layer neurons that define the half-planes A, Ā, B, B̄; the following layers (neurons 9, 10, …) combine them, e.g. a₁₁ = A ∧ B]
Neurons and Artificial Neural Networks

§ Micro-structure: characteristics of each neuron in the network
§ Meso-structure: organization of the network
§ Macro-structure: association of networks, possibly with some analytical processing approach, for complex problems
[Figure: neuron with inputs p₁ … pₙ, weights w₁ … wₙ, a summing junction and a bias input b]

Bias: with p = 0, an output ≠ 0 is still possible!
Typical activation functions

Linear (purelin): Hopfield, BSB
  f(s) = s

Signal (hardlims): Perceptron
  f(s) = +1 if s ≥ 0;  −1 if s < 0

Step (hardlim): Perceptron, BAM
  f(s) = 1 if s ≥ 0;  0 if s < 0

Threshold with memory: Hopfield, BAM
  f(s) = +1 if s > 0;  −1 if s < 0;  unchanged if s = 0
Typical activation functions

BSB or Logical Threshold (satlin, satlins): BSB
  f(s) = +K if s ≥ +K;  s if −K < s < +K;  −K if s ≤ −K

Logistic (logsig): Perceptron, Hopfield, BAM, BSB
  f(s) = 1 / (1 + e⁻ˢ)

Hyperbolic Tangent (tansig): Perceptron, Hopfield, BAM, BSB
  f(s) = tanh(s) = (1 − e⁻²ˢ) / (1 + e⁻²ˢ)
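The activation functions above translate directly into code (a sketch; the function names follow the MatLab toolbox names quoted on the slides):

```python
import math

# Activation functions from the two slides above
def purelin(s):        return s                      # linear
def hardlims(s):       return 1 if s >= 0 else -1    # signal
def hardlim(s):        return 1 if s >= 0 else 0     # step
def satlins(s, K=1.0): return max(-K, min(K, s))     # BSB / logical threshold
def logsig(s):         return 1.0 / (1.0 + math.exp(-s))
def tansig(s):         return math.tanh(s)  # == (1 - e^(-2s)) / (1 + e^(-2s))

print(hardlim(-0.3), satlins(2.5), logsig(0))
```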
Meso-Structure – Network Organization...
- # of neurons per layer
- # of network layers
- connection type (forward, backward, lateral)
1- Multilayer Feedforward
Multilayer Perceptron (MLP)
Meso-Structure – Network Organization...
2- Single Layer laterally connected (BSB (self-feedback), Hopfield)
3 – Bilayer Feedforward/Feedback
Meso-Structure – Network Organization
4 – Multilayer Cooperative/Comparative Network
5 – Hybrid Network
[Figure: hybrid network composed of Sub-network 1 and Sub-network 2]
Neural Macro-Structure
[Figure: macro-structure example: Network 1 feeds Networks 2a, 2b and 2c, which feed Network 3]

- # of networks
- connection type
- size of networks
- degree of connectivity
Supervised Learning

§ Delta Rule → Perceptron:  δ ≡ d − y,  w ← w + µ δ x
§ Widrow-Hoff delta rule (LMS) → ADALINE, MADALINE:  w_ij ← w_ij + µ δ_j x_ij / Σ_k x_k²
§ Generalized Delta Rule

µ – learning rate

[Figure: adaptive element with input x and error δ = d − y fed back to adjust the weights]
Delta rule → Perceptron

Perceptron – Rosenblatt, 1957

[Figure: perceptron neuron j with inputs p_1j … p_nj, weights w_1j … w_nj, bias b_j, sum s_j and step output y_j]

Dynamics:
  s_j = Σ_i w_ij p_ij + b_j
  y_j = f(s_j) = 1 if s_j ≥ 0;  0 if s_j < 0

Delta Rule:
  δ_j = d_j − y_j
  w_ij ← w_ij + µ δ_j x_ij

µ – learning rate;  δ_j = 0 → the weight is not changed.

Psychological reasoning:
- positive reinforcement
- negative reinforcement
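The delta rule can be exercised on the (linearly separable) AND function (a sketch, not from the slides; µ = 0.25 and 20 epochs are arbitrary but sufficient choices):

```python
# Perceptron trained with the delta rule w <- w + mu*delta*x on AND.
patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
mu = 0.25

for epoch in range(20):
    for p, d in patterns:
        s = w[0]*p[0] + w[1]*p[1] + b
        y = 1 if s >= 0 else 0
        delta = d - y                                  # delta_j = d_j - y_j
        w = [wi + mu*delta*pi for wi, pi in zip(w, p)] # w <- w + mu*delta*x
        b = b + mu*delta                               # bias as weight on input 1

print([1 if w[0]*p[0] + w[1]*p[1] + b >= 0 else 0 for p, _ in patterns])
# -> [0, 0, 0, 1], the AND truth table
```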
ADALINE and MADALINE

Widrow & Hoff, 1960 – (Multiple) Adaptive Linear Element

Training:
  y_j = Σ_i w_ij p_ij + b_j

  ε_j = d_j − s_j = d_j − (Σ_i w_ij p_ij + b_j)    (Obs: ε_j ≡ δ_j)

Delta Rule:  w_ij ← w_ij + µ δ_j x_ij

Widrow-Hoff delta rule (LMS – Least Mean Squared algorithm):
  w_ij ← w_ij + µ ε_j x_ij / Σ_k x_k²

0.1 < µ < 1 – trade-off between stability and convergence speed.
MatLab: NEWLIN, NEWLIND, ADAPT, LEARNWH
LMS Algorithm

Objective: learn a function f: ℜⁿ → ℜ from the samples (x_k, d_k).
{x_k}, {d_k} and {e_k} → stationary stochastic processes
e = d − y → actual stochastic error

  y = Σ_{i=1..n} x_i w_i = x wᵗ    → linear neuron

Expected value (assuming w deterministic):
  E[e²] = E[(d − y)²] = E[(d − x wᵗ)²] = E[d²] − 2 E[dx] wᵗ + w E[xᵗx] wᵗ

With E[xᵗx] ≡ R → input autocorrelation matrix, and E[dx] ≡ P → cross-correlation vector:
  E[e²] = E[d²] − 2 P wᵗ + w R wᵗ

Setting the partial derivatives to 0 for the optimal w*:
  0 = 2 w* R − 2 P
  w* = P R⁻¹    Optimal analytic solution of the optimization (solvelin.m)
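The analytic solution w* = P R⁻¹ can be checked on made-up samples, estimating R and P by sample averages (hypothetical data generated by d = 2x₁ − x₂, so the optimum must come out as w* = (2, −1)):

```python
# Analytic LMS solution w* = P R^-1 on a 2-input linear neuron.
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ds = [2.0*x1 - 1.0*x2 for x1, x2 in xs]   # d generated by the rule d = 2*x1 - x2

n = len(xs)
# R = E[x^t x] and P = E[d x], estimated by sample averages
R = [[sum(x[i]*x[j] for x in xs)/n for j in range(2)] for i in range(2)]
P = [sum(d*x[i] for x, d in zip(xs, ds))/n for i in range(2)]

# w* solves w R = P (R symmetric); 2x2 case solved by Cramer's rule
det = R[0][0]*R[1][1] - R[0][1]*R[1][0]
w = ((P[0]*R[1][1] - P[1]*R[0][1]) / det,
     (R[0][0]*P[1] - R[1][0]*P[0]) / det)
print(w)  # recovers (2.0, -1.0)
```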
Iterative LMS Algorithm

Objective: adaptively learn a function f: ℜⁿ → ℜ from the samples (x_k, d_k).

Knowing P and R (with R⁻¹ existing), then for some w:
  ∇_w E[e²] = 2 w R − 2 P

Post-multiplying by ½ R⁻¹:
  ½ ∇_w E[e²] R⁻¹ = w − P R⁻¹ = w − w*

  w* = w − ½ ∇_w E[e²] R⁻¹

  w_{k+1} = w_k − c_k ∇_w E[e²] R⁻¹    (c_k = ½ → Newton's method)

LMS hypothesis:  E[e²_{k+1} | e²₀, e²₁, …, e²_k] = e²_k

How to cautiously find new (better) values for w_i, the free parameters?
Iterative LMS Algorithm...

Iterative (adaptive) solution (the optimal solution is never reached!)

Gradient of e²_k with respect to w:
  ∇_w e²_k = [∂e²_k/∂w_1, …, ∂e²_k/∂w_n]
           = [∂(d_k − y_k)²/∂w_1, …, ∂(d_k − y_k)²/∂w_n]
           = [−2(d_k − y_k) ∂y_k/∂w_1, …, −2(d_k − y_k) ∂y_k/∂w_n]
           = −2 e_k [∂y_k/∂w_1, …, ∂y_k/∂w_n]
           = −2 e_k [x_1k, …, x_nk] = −2 e_k x_kᵗ    (since y_k = x_k wᵗ)

Assuming R = I → estimated steepest descent algorithm:
  w_{k+1} = w_k − c_k ∇_w e²_k

so the LMS algorithm reduces to:
  w_{k+1} = w_k + 2 c_k e_k x_k

MADALINE (i – input, j – neuron), with normalization:
  w_ij ← w_ij + µ δ_j x_ij / Σ_k x_k²
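The update w_{k+1} = w_k + 2 c_k e_k x_k can be run as a loop (a sketch with made-up data; the target rule d = 2x₁ − x₂, the fixed step c = 0.05 and the 2000 samples are arbitrary choices):

```python
import random

# Iterative LMS: w_{k+1} = w_k + 2*c*e_k*x_k on a noiseless linear target,
# so the weights should converge toward (2, -1).
random.seed(0)
w = [0.0, 0.0]
c = 0.05

for k in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    d = 2.0*x[0] - 1.0*x[1]
    y = w[0]*x[0] + w[1]*x[1]                    # linear neuron y = x w^t
    e = d - y                                    # stochastic error
    w = [wi + 2*c*e*xi for wi, xi in zip(w, x)]  # LMS update

print([round(wi, 3) for wi in w])
```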
The Multilayer Perceptron - The Generalized Delta Rule

Rumelhart, Hinton and Williams, PDP/MIT, 1986

[Figure: three-layer MLP with inputs p₁ = x_1^(0), p₂ = x_2^(0), p₃ = x_3^(0), hidden outputs x_1^(1), x_2^(1), x_3^(1) and outputs x_1^(2) = y₁, x_2^(2) = y₂]

Neuron Dynamics: Processing Element (PE) j in layer k, input i:

  s_j^(k) = w_0j^(k) + Σ_i w_ij^(k) x_i^(k−1)
  x_j^(k) = f(s_j^(k))

with f (activation function) continuous and differentiable.

Turning Point Question: how to find the error associated with an internal neuron?
The generalized delta rule

Training: quadratic error
  ε² = Σ_{j=1..m} (d_j − y_j)²

Weights of PE j:       w_j^(k) = (w_0j^(k), w_1j^(k), …, w_mj^(k))
Input vector of PE j:  x_j^(k−1) = (1, x_1j^(k−1), …, x_nj^(k−1))

Instantaneous gradient:
  ∇_j^(k) = ∂ε²/∂w_j^(k) = [∂ε²/∂w_0j^(k), ∂ε²/∂w_1j^(k), …, ∂ε²/∂w_mj^(k)]

With s_j^(k) = w_j^(k) x_j^(k−1) → ∂s_j^(k)/∂w_j^(k) = x_j^(k−1), so

  ∇_j^(k) = ∂ε²/∂w_j^(k) = (∂ε²/∂s_j^(k)) x_j^(k−1)

Defining the quadratic derivative error as
  δ_j^(k) = −½ ∂ε²/∂s_j^(k)

gives
  ∇_j^(k) = −2 δ_j^(k) x_j^(k−1)

Gradient of the error with respect to the weights as a function of the former layer signals!
The generalized delta rule...

For the output layer, the quadratic derivative error is:

  δ_j^(k) = −½ ∂/∂s_j^(k) Σ_{i=1..N_k} (d_i − y_i)² = −½ ∂/∂s_j^(k) Σ_{i=1..N_k} (d_i − f(s_i^(k)))²

The partial derivatives are 0 for i ≠ j:

  δ_j^(k) = −½ ∂(d_j − f(s_j^(k)))² / ∂s_j^(k) = (d_j − f(s_j^(k))) f′(s_j^(k)) = (d_j − x_j^(k)) f′(s_j^(k))

The output error associated with PE j, in the last layer:
  ε_j^(k) = d_j − x_j^(k) = d_j − y_j

giving:
  δ_j^(k) = ε_j^(k) · f′(s_j^(k))

Remember: the activation function f is continuous and differentiable.
The generalized delta rule...

For a hidden layer k, the quadratic derivative error can be calculated using the linear outputs of layer k+1 (chain rule):

  δ_j^(k) = −½ ∂ε²/∂s_j^(k) = −½ Σ_{i=1..N_{k+1}} (∂ε²/∂s_i^(k+1)) (∂s_i^(k+1)/∂s_j^(k)) = Σ_{i=1..N_{k+1}} δ_i^(k+1) ∂s_i^(k+1)/∂s_j^(k)

Taking into account that
  s_i^(k+1) = w_0i^(k+1) + Σ_{l=1..N_k} w_li^(k+1) f(s_l^(k))

we get
  δ_j^(k) = Σ_{i=1..N_{k+1}} δ_i^(k+1) ∂/∂s_j^(k) ( w_0i^(k+1) + Σ_{l=1..N_k} w_li^(k+1) f(s_l^(k)) )

Considering that ∂f(s_l^(k))/∂s_j^(k) = 0 if l ≠ j, and that ∂f(s_j^(k))/∂s_j^(k) = f′(s_j^(k)), we have:

  δ_j^(k) = ( Σ_{i=1..N_{k+1}} δ_i^(k+1) w_ji^(k+1) ) · f′(s_j^(k)) ≡ ε_j^(k) · f′(s_j^(k))

Finally, the quadratic derivative error for a hidden layer:
  δ_j^(k) = ε_j^(k) · f′(s_j^(k)),  with  ε_j^(k) = Σ_{i=1..N_{k+1}} δ_i^(k+1) w_ji^(k+1)
The "Error Backpropagation" algorithm

1. w_ij^(k) ← random, to initialize the network weights.
2. For (x, d), a training pair, obtain y. Feedforward propagation: ε² = Σ_{j=1..m} (d_j − y_j)²
3. k ← last layer
4. For each element j in layer k, do:
   Compute ε_j^(k) using
     ε_j^(k) = d_j − x_j^(k) = d_j − y_j                if k is the last layer,
     ε_j^(k) = Σ_{i=1..N_{k+1}} δ_i^(k+1) w_ji^(k+1)    if it is a hidden layer;
   Compute δ_j^(k) = ε_j^(k) · f′(s_j^(k)).
5. k ← k − 1; if k > 0 go to step 4, else continue.
6. w_ij^(k)(n+1) = w_ij^(k)(n) + 2 µ δ_j^(k) x_i^(k−1)
7. For the next training pair, go to step 2.
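The steps above can be transcribed for a tiny 2-2-1 network with logistic activations (a sketch, not the course's code; as a sanity check, the backpropagated gradient of ε² is compared with a numerical finite difference, and the two must agree):

```python
import math, random

f  = lambda s: 1/(1 + math.exp(-s))     # logistic activation
df = lambda s: f(s)*(1 - f(s))          # its derivative f'(s)

random.seed(1)
# W1[j] = [w0, w1, w2] for hidden PE j; W2 = [w0, w1, w2] for the output PE
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    s1 = [w[0] + w[1]*x[0] + w[2]*x[1] for w in W1]   # s_j^(1)
    x1 = [f(s) for s in s1]                           # x_j^(1)
    s2 = W2[0] + W2[1]*x1[0] + W2[2]*x1[1]            # s^(2)
    return s1, x1, s2, f(s2)                          # y = x^(2)

x, d = (1.0, 0.0), 1.0
s1, x1, s2, y = forward(x)

delta2 = (d - y)*df(s2)                                # output: delta = eps*f'(s)
delta1 = [delta2*W2[j+1]*df(s1[j]) for j in range(2)]  # hidden: eps = sum delta*w

# gradient of eps^2 = (d - y)^2 w.r.t. W1[0][1], from nabla = -2*delta*x
grad = -2*delta1[0]*x[0]

# numerical check of the same derivative
h = 1e-6
W1[0][1] += h
e_plus = (d - forward(x)[3])**2
W1[0][1] -= 2*h
e_minus = (d - forward(x)[3])**2
W1[0][1] += h
num = (e_plus - e_minus)/(2*h)
print(abs(grad - num) < 1e-6)
```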
The Backpropagation Algorithm in practice

1 – In the standard form BP is very slow.
2 – BP pathologies: paralysis in regions of small gradient.
3 – Initial conditions can lead to local minima.
4 – Stop conditions: number of epochs, ∆w_ij < ϵ.
5 – BP variants:
  - trainbpm (with momentum)
  - trainbpx (adaptive learning rate)
  - ....
  - trainlm (Levenberg-Marquardt – J, Jacobian):  ∆W^(k) = (Jᵗ J + µ I)⁻¹ Jᵗ e

[Figure: network energy E over the states, showing stored patterns, a spurious pattern, an initial value and the recovered pattern; and an illustrative quadratic error e²(w_ij) as a function of the weights, where a "bad start" converges to a local minimum and a "good start" reaches the optimum]

Obs: the error surface is normally unknown. Steepest descent → go in the opposite direction of the local gradient ("downhill").
Computational Tools
• SNNS
• MatLab - Neural Network Toolbox
• NeuralWorks
• Java
• C++
• Hardware Implementations of ANNs
SNNS - Stuttgart Neural Network Simulator
MatLab - complete environment
- System Simulation
- Training
- Control

[Figure: Simulink block diagrams: a neural network built from netsum, tansig and purelin blocks with matrix gains, unit delays and discrete state-space plant models; and a Model Reference Controller (reference, plant output, control signal) driving a 4th-order liquid-level model (qi → h4)]
Demonstration - Perceptron

% Perceptron
% Training an ANN to learn to classify a non-linear problem
% Input patterns
P = [0 0 0 0 1 1 1 1
     0 0 1 1 0 0 1 1
     0 1 0 1 0 1 0 1]
% Target
% T = [1 0 1 1 1 0 1 0]   % linearly separable
T = [1 0 0 1 1 0 1 0]     % non-separable
% Try with Rosenblatt's Perceptron
net = newp(P,T,'hardlim')
% train the network
net = train(net,P,T)
Y = sim(net,P)

Non-separable target:  T = 1 0 0 1 1 0 1 0   Y = 1 0 1 0 0 0 1 0   (fails)
Separable target:      T = 1 0 1 1 1 0 1 0   Y = 1 0 1 1 1 0 1 0   (learns)
Demonstration - OCR
[Figure: training vectors for the character patterns, clean and with 20 % noise; ANN with inputs p₁ … p₆₃ and outputs y₁ … y₁₆]
Demonstration – OCR...
Training with 10 x (0, 10, 20, 30, 40, 50) % noise

[Figure: % of misclassifications of the neural OCR classifier vs. test noise: one curve for the error without noisy training patterns, one for the error using noisy training patterns (up to the given % of bits flipped)]

With some noisy training patterns → the network learns how to treat "any" noise.
Demonstration – LMS, ADALINE, FIR

  y(k) = w₀u(k) + w₁u(k−1) + w₂u(k−2) + … + wₙu(k−n)

  Y(z)/U(z) = w₀ + w₁z⁻¹ + w₂z⁻² + … + wₙz⁻ⁿ

Obs: a FIR model (zeros only) is always stable; an IIR model is more compact but can be unstable.

(TDL – Tapped Delay Line)    Sampling time: Ts = 0.1 sec

[Figure: identification run over 150 sec; the system changes at 80 sec:
  g₁ = 1 / (s² + 0.2s + 1)  for 0 – 79.9 sec
  g₂ = 3 / (s² + 2s + 1)    for 80 – 150 sec]
Demo – LMS, ADALINE, FIR...

% ADALINE - Adaptive dynamic system identification
% First sampled system - until 80 sec
g1 = tf(1,[1 .2 1]), gd1 = c2d(g1,.1)
% System changes dramatically - after 80 sec
g2 = tf(3,[1 2 1]), gd2 = c2d(g2,.1)
% Pseudo-Random Binary Signal - good for identification
u = idinput(120*10,'PRBS',[0 0.01],[-1 1]);
% time vector ...
[y1,t1,x1] = lsim(gd1,u1,t1);
[y2,t2,x2] = lsim(gd2,u2,t2,x1);
% Create a new ADALINE network with delayed inputs (FIR)
% Learning rate = 0.09
net = newlin(t,y,[1 2 3 4 5 6 7 8 9 10],0.09)
[net,Y,E] = adapt(net,t,y)
% design an average transfer function
netd = newlind(t,y)
Demo – LMS, ADALINE, FIR...
[Figure: verification signals and errors:
  u = idinput(1500,'PRBS',[0 0.01]), n = 10, lr = 0.1 → RMSE Set 1 = 6.5742
  u = idinput(1200,'PRBS',[0 0.05]), n = 10, lr = 0.1 → RMSE Set 2 = 22.7817]

The ADALINE learns the system AND also changes in the dynamics!!
But in another frequency range it is not so good... (needs adjusting of the TDL, lr, Ts)