
Chapter 6: Artificial Neural Networks Part 2 of 3 (Sections 6.4 – 6.6)

Page 1: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

1

Chapter 6: Artificial Neural Networks
Part 2 of 3 (Sections 6.4 – 6.6)

Asst. Prof. Dr. Sukanya Pongsuparb
Dr. Srisupa Palakvangsa Na Ayudhya
Dr. Benjarath Pupacdi

SCCS451 Artificial Intelligence, Week 12

Page 2: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

2

Agenda

- Multi-layer Neural Network
- Hopfield Network

Page 3: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

3

Multilayer Neural Networks

A multilayer perceptron is a feedforward neural network with ≥ 1 hidden layers.

Single-layer vs. Multi-layer Neural Networks

[Figure: a single-layer perceptron (inputs x1 and x2 with weights w1 and w2, a linear combiner with threshold, a hard limiter, output Y) compared with a multilayer network (input layer, first hidden layer, second hidden layer, output layer; input signals flow in, output signals flow out).]

Page 4: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

4

Roles of Layers

Input Layer
- Accepts input signals from the outside world
- Distributes the signals to neurons in the hidden layer
- Usually does not do any computation

Output Layer (computational neurons)
- Accepts output signals from the previous hidden layer
- Outputs to the outside world
- Knows the desired outputs

Hidden Layer (computational neurons)
- Determines its own desired outputs

Page 5: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

5

Hidden (Middle) Layers

- Neurons in hidden layers are unobservable through the inputs and outputs of the network.
- Their desired output is unknown (hidden) from the outside and is determined by the layer itself.
- 1 hidden layer is enough for continuous functions; 2 hidden layers are needed for discontinuous functions.
- Practical applications mostly use 3 layers.
- More layers are possible, but each additional layer increases the computing load exponentially.

Page 6: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

6

How do multilayer neural networks learn?

More than a hundred different learning algorithms are available for multilayer ANNs.
The most popular method is back-propagation.

Page 7: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

7

Back-propagation Algorithm

In a back-propagation neural network, the learning algorithm has 2 phases:
1. Forward propagation of inputs
2. Backward propagation of errors

The algorithm loops over the 2 phases until the errors obtained are lower than a certain threshold.

Learning is done in a similar manner as in a perceptron:
- A set of training inputs is presented to the network.
- The network computes the outputs.
- The weights are adjusted to reduce errors.

The activation function used is a sigmoid function:

Y_sigmoid = 1 / (1 + e^(-X))

Page 8: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

8

Common Activation Functions

Step function:    Y = 1 if X >= 0, 0 if X < 0
Sign function:    Y = +1 if X >= 0, -1 if X < 0
Sigmoid function: Y = 1 / (1 + e^(-X))
Linear function:  Y = X

Hard-limit functions (step and sign) are often used in decision-making neurons for classification and pattern recognition.
The sigmoid function is popular in back-propagation networks; its output is a real number in the [0, 1] range.
The linear function is often used for linear approximation.
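A minimal sketch of these four activation functions in plain Python (the function names are illustrative, not from the slides):

import math

def step(x):
    # Hard limiter: 1 if X >= 0, else 0
    return 1 if x >= 0 else 0

def sign(x):
    # Sign function: +1 if X >= 0, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):
    # Sigmoid: real-valued output in the (0, 1) range
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    # Linear: output equals the weighted input
    return x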

Page 9: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

9

3-layer Back-propagation Neural Network

[Figure: a three-layer back-propagation network. The input layer has neurons 1, 2, …, i, …, n receiving x1, x2, …, xi, …, xn; the hidden layer has neurons 1, 2, …, j, …, m connected by weights wij; the output layer has neurons 1, 2, …, k, …, l producing y1, y2, …, yk, …, yl through weights wjk. Input signals propagate forward; error signals propagate backward.]

Page 10: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

10

How a neuron determines its output

Very similar to the perceptron:

1. Compute the net weighted input:
   X = Σ (i = 1 to n) x_i w_i - θ

2. Pass the result to the activation function:
   Y_sigmoid = 1 / (1 + e^(-X))

[Figure: the same three-layer network, highlighting one hidden neuron j with inputs 2, 5, 1, 8 and weights 0.1, 0.2, 0.5, 0.3.]

Example: let θ = 0.2.
X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) - 0.2 = 3.9
Y = 1 / (1 + e^(-3.9)) = 0.98

The output 0.98 is then passed on as an input signal to the neurons of the next layer.
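As a check on the worked example, a small plain-Python sketch (variable names are illustrative) computing the net weighted input and the sigmoid output:

import math

inputs  = [2, 5, 1, 8]
weights = [0.1, 0.2, 0.5, 0.3]
theta   = 0.2

# Step 1: net weighted input  X = sum(x_i * w_i) - theta
x_net = sum(x * w for x, w in zip(inputs, weights)) - theta   # 3.9

# Step 2: pass X through the sigmoid activation function
y = 1.0 / (1.0 + math.exp(-x_net))                            # ~0.98

print(x_net, round(y, 2))   # 3.9 0.98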

Page 11: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

11

How the errors propagate backward

The errors are computed in a similar manner to the errors in the perceptron:
Error = the output we want - the output we get

e_k(p) = y_d,k(p) - y_k(p)   (error at an output neuron k at iteration p)

[Figure: the same three-layer network, highlighting output neuron k at iteration p; error signals flow backward.]

Example: suppose the desired output is 1 and the actual output is 0.98. Then
e_k(p) = 1 - 0.98 = 0.02

Page 12: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

12

Back-Propagation Training Algorithm
Step 1: Initialization

Randomly define the weights and thresholds θ such that the numbers fall within a small range:

(-2.4 / F_i , +2.4 / F_i)

where F_i is the total number of inputs of neuron i. The weight initialization is done on a neuron-by-neuron basis.

[Figure: the random weight range ±2.4/F_i shrinks as F_i grows.]
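A sketch of this initialization rule in plain Python (random.uniform is from the standard library; names are illustrative), drawing each weight of a neuron from (-2.4/F_i, +2.4/F_i):

import random

def init_neuron_weights(num_inputs):
    """Weights and threshold for one neuron, drawn from (-2.4/Fi, +2.4/Fi)."""
    bound = 2.4 / num_inputs
    weights = [random.uniform(-bound, bound) for _ in range(num_inputs)]
    theta = random.uniform(-bound, bound)
    return weights, theta

# Example: a hidden neuron with 4 inputs
w, theta = init_neuron_weights(4)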

Page 13: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

13

Back-Propagation Training Algorithm
Step 2: Activation

Propagate the input signals forward from the input layer to the output layer.

X = Σ (i = 1 to n) x_i w_i - θ

Y_sigmoid = 1 / (1 + e^(-X)),  Y ∈ [0, 1]

[Figure: the same three-layer network; input signals propagate forward. The earlier hidden-neuron example is repeated: with inputs 2, 5, 1, 8, weights 0.1, 0.2, 0.5, 0.3 and θ = 0.2, X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) - 0.2 = 3.9 and Y = 1 / (1 + e^(-3.9)) = 0.98.]

Page 14: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

14

Back-Propagation Training Algorithm
Step 3: Weight Training

There are 2 types of weight training:
1. For the output layer neurons
2. For the hidden layer neurons

*** It is important to understand that first the input signals propagate forward, and then the errors propagate backward to help train the weights. ***

In each iteration (p + 1), the weights are updated based on the weights from the previous iteration p.
The signals keep flowing forward and backward until the errors are below some preset threshold value.

Page 15: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

15

3.1 Weight Training (Output layer neurons)

These formulas are used to perform weight corrections.

w_j,k(p+1) = w_j,k(p) + Δw_j,k(p)

Δw_j,k(p) = α · y_j(p) · δ_k(p)

δ_k(p) = y_k(p) · [1 - y_k(p)] · e_k(p)

[Figure: output neuron k at iteration p, receiving y_1 … y_m from the hidden layer through weights w_1,k … w_m,k and producing y_k(p); its error is e_k(p) = y_d,k(p) - y_k(p), and δ is the error gradient.]

Page 16: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

16

w_j,k(p+1) = w_j,k(p) + Δw_j,k(p)
  (we want to compute Δw_j,k(p); w_j,k(p) we already know)

Δw_j,k(p) = α · y_j(p) · δ_k(p)
  (α is predefined, y_j(p) is known, and we know how to compute δ_k(p))

δ_k(p) = y_k(p) · [1 - y_k(p)] · e_k(p),  where  e_k(p) = y_d,k(p) - y_k(p)
  (we know how to compute all of these)

We do the above for each of the weights of the output layer neurons.
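A sketch of this output-layer correction in plain Python (names are illustrative): compute the error gradient δ_k from the neuron's output and error, then update each incoming weight.

def update_output_weights(y_hidden, y_k, y_desired, w_k, alpha):
    """One output neuron k: returns (new_weights, delta_k).
    y_hidden  -- outputs y_j(p) of the hidden layer
    y_k       -- actual output of neuron k
    y_desired -- desired output y_d,k(p)
    w_k       -- current weights w_j,k(p) from each hidden neuron j to k
    """
    e_k = y_desired - y_k                       # e_k(p) = y_d,k(p) - y_k(p)
    delta_k = y_k * (1.0 - y_k) * e_k           # error gradient
    new_w = [w + alpha * y_j * delta_k          # w_j,k(p+1) = w_j,k(p) + alpha*y_j*delta_k
             for w, y_j in zip(w_k, y_hidden)]
    return new_w, delta_k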

Page 17: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

17

3.2 Weight Training (Hidden layer neurons)

These formulas are used to perform weight corrections.

w_i,j(p+1) = w_i,j(p) + Δw_i,j(p)

Δw_i,j(p) = α · x_i(p) · δ_j(p)

δ_j(p) = y_j(p) · [1 - y_j(p)] · Σ (k = 1 to l) δ_k(p) · w_j,k(p)

[Figure: hidden neuron j at iteration p, receiving x_1 … x_n through weights w_1,j … w_n,j and sending its output to output neurons 1 … k … l; the error gradients δ_k propagate back to j through the weights w_j,k.]

Page 18: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

18

δ_j(p) = y_j(p) · [1 - y_j(p)] · Σ (k = 1 to l) δ_k(p) · w_j,k(p)
  (we want to compute δ_j(p); y_j(p) is known, and the δ_k(p) propagate back from the output layer through the known weights w_j,k(p))

Δw_i,j(p) = α · x_i(p) · δ_j(p)
  (α is predefined and x_i(p) is a known input)

w_i,j(p+1) = w_i,j(p) + Δw_i,j(p)

We do the above for each of the weights of the hidden layer neurons.
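A matching sketch for a hidden neuron j in plain Python (names are illustrative): its gradient δ_j is formed from the output-layer gradients δ_k propagated back through the weights w_j,k.

def update_hidden_weights(x_inputs, y_j, w_j, deltas_out, w_out_from_j, alpha):
    """One hidden neuron j: returns (new_weights, delta_j).
    x_inputs     -- input signals x_i(p)
    y_j          -- output of hidden neuron j
    w_j          -- current weights w_i,j(p) from each input i to j
    deltas_out   -- error gradients delta_k(p) of the output layer
    w_out_from_j -- weights w_j,k(p) from neuron j to each output neuron k
    """
    back_sum = sum(d_k * w_jk for d_k, w_jk in zip(deltas_out, w_out_from_j))
    delta_j = y_j * (1.0 - y_j) * back_sum       # delta_j(p)
    new_w = [w + alpha * x_i * delta_j           # w_i,j(p+1) = w_i,j(p) + alpha*x_i*delta_j
             for w, x_i in zip(w_j, x_inputs)]
    return new_w, delta_j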

Page 19: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

19

[Figure: iteration p = 1, the input signals propagate forward through the input, hidden and output layers.]

Page 20: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

20

[Figure: iteration p = 1, the error signals propagate backward and the weights of the output and hidden layers are trained.]

Page 21: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

21

After the weights are trained in p = 1, we go back to Step 2 (Activation) and compute the outputs for the new weights.

If the errors obtained with the updated weights are still above the error threshold, we start weight training for p = 2.

Otherwise, we stop.

Page 22: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

22

[Figure: iteration p = 2, the input signals propagate forward again with the updated weights.]

Page 23: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

23

[Figure: iteration p = 2, the error signals propagate backward and the weights are trained again.]

Page 24: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

24

Example: 3-layer ANN for XOR

[Figure: the four XOR input points (0, 0), (0, 1), (1, 0) and (1, 1) plotted in the x1–x2 (input1–input2) plane.]

XOR is not a linearly separable function.

A single-layer ANN (the perceptron) cannot deal with problems that are not linearly separable. We cope with such problems using multi-layer neural networks.

Page 25: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

25

Example: 3-layer ANN for XOR

[Figure: a three-layer network for XOR. Input neurons 1 and 2 (non-computing) receive x1 and x2; hidden neurons 3 and 4 are reached through weights w13, w23, w14, w24; output neuron 5 produces y5 through weights w35 and w45; each computing neuron also has a threshold input fixed at -1.]

The net weighted input of each neuron is X = Σ x_i w_i - θ.

Let α = 0.1.

Page 26: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

26

Example: 3-layer ANN for XOR

Training set: x1 = x2 = 1 and y_d,5 = 0. Let α = 0.1.

Initial weights and thresholds:
w13 = 0.5, w23 = 0.4, θ3 = 0.8
w14 = 0.9, w24 = 1.0, θ4 = -0.1
w35 = -1.2, w45 = 1.1, θ5 = 0.3

Forward propagation:
y3 = sigmoid(0.5 + 0.4 - 0.8) = 0.5250
y4 = sigmoid(0.9 + 1.0 + 0.1) = 0.8808
y5 = sigmoid(-0.63 + 0.9689 - 0.3) = 0.5097

Error: e = 0 - 0.5097 = -0.5097
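A short plain-Python sketch checking this forward pass (weight names follow the slide):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights and thresholds from the slide
w13, w23, theta3 = 0.5, 0.4, 0.8
w14, w24, theta4 = 0.9, 1.0, -0.1
w35, w45, theta5 = -1.2, 1.1, 0.3

x1, x2, y_desired = 1, 1, 0

y3 = sigmoid(x1 * w13 + x2 * w23 - theta3)   # sigmoid(0.1)    = 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - theta4)   # sigmoid(2.0)    = 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - theta5)   # sigmoid(0.0389) = 0.5097

e = y_desired - y5                           # -0.5097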

Page 27: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

27

Example: 3-layer ANN for XOR (2)

Back-propagation of error (p = 1, output layer)

δ5 = y5 · (1 - y5) · e = 0.5097 × (1 - 0.5097) × (-0.5097) = -0.1274

Δw_j,k(p) = α · y_j(p) · δ_k(p)
Δw3,5(1) = 0.1 × 0.5250 × (-0.1274) = -0.0067

w_j,k(p+1) = w_j,k(p) + Δw_j,k(p)
w3,5(2) = -1.2 - 0.0067 = -1.2067

Page 28: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

28

Example: 3-layer ANN for XOR (3)

Back-propagation of error (p = 1, output layer)

δ5 = -0.1274

Δw4,5(1) = 0.1 × 0.8808 × (-0.1274) = -0.0112
w4,5(2) = 1.1 - 0.0112 = 1.0888

Page 29: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

29

Example: 3-layer ANN for XOR (4)

Back-propagation of error (p = 1, output layer)

Δθ_k(p) = α · y(p) · δ_k(p), with y = -1 for the threshold input
Δθ5(1) = 0.1 × (-1) × (-0.1274) = 0.0127

θ5(p+1) = θ5(p) + Δθ5(p)
θ5(2) = 0.3 + 0.0127 = 0.3127

Page 30: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

30

Example: 3-layer ANN for XOR (5)

Back-propagation of error (p = 1, input layer)

δ_j(p) = y_j(p) · (1 - y_j(p)) · Σ [δ_k(p) · w_j,k(p)]  (sum over all k)
δ3(p) = 0.525 × (1 - 0.525) × (-0.1274 × (-1.2)) = 0.0381

Δw_i,j(p) = α · x_i(p) · δ_j(p)
Δw1,3(1) = 0.1 × 1 × 0.0381 = 0.0038

w_i,j(p+1) = w_i,j(p) + Δw_i,j(p)
w1,3(2) = 0.5 + 0.0038 = 0.5038

Page 31: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

31

Example: 3-layer ANN for XOR (6)

Back-propagation of error (p = 1, input layer)

δ4(p) = 0.8808 × (1 - 0.8808) × (-0.1274 × 1.1) = -0.0147

Δw1,4(1) = 0.1 × 1 × (-0.0147) = -0.0015
w1,4(2) = 0.9 - 0.0015 = 0.8985

Page 32: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

32

Example: 3-layer ANN for XOR (7)

Back-propagation of error (p = 1, input layer)

δ3(p) = 0.0381, δ4(p) = -0.0147

Δw2,3(1) = 0.1 × 1 × 0.0381 = 0.0038
w2,3(2) = 0.4 + 0.0038 = 0.4038

Page 33: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

33

Example: 3-layer ANN for XOR (8)

Back-propagation of error (p = 1, input layer)

Δw2,4(1) = 0.1 × 1 × (-0.0147) = -0.0015
w2,4(2) = 1.0 - 0.0015 = 0.9985

Page 34: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

34

Example: 3-layer ANN for XOR (9)

Back-propagation of error (p = 1, input layer)

Δθ3(1) = 0.1 × (-1) × 0.0381 = -0.0038
θ3(2) = 0.8 - 0.0038 = 0.7962

Page 35: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

35

Example: 3-layer ANN for XOR (10)

Back-propagation of error (p = 1, input layer)

Δθ4(1) = 0.1 × (-1) × (-0.0147) = 0.0015
θ4(2) = -0.1 + 0.0015 = -0.0985

Page 36: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

36

Example: 3-layer ANN for XOR (11)

Updated weights and thresholds after the first iteration (p = 1):
w13 = 0.5038, w23 = 0.4038, θ3 = 0.7962
w14 = 0.8985, w24 = 0.9985, θ4 = -0.0985
w35 = -1.2067, w45 = 1.0888, θ5 = 0.3127

Now the 1st iteration (p = 1) is finished. The weight training process is repeated until the sum of squared errors is less than 0.001 (the error threshold).
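A compact plain-Python sketch of the whole procedure for this XOR network, starting from the example's initial weights with α = 0.1 and stopping when the sum of squared errors over the four training examples falls below 0.001. The exact epoch count depends on the update order and so may differ from the 224 epochs quoted later in the slides.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights/thresholds from the example
w13, w23, theta3 = 0.5, 0.4, 0.8
w14, w24, theta4 = 0.9, 1.0, -0.1
w35, w45, theta5 = -1.2, 1.1, 0.3
alpha = 0.1

training_set = [((1, 1), 0), ((0, 1), 1), ((1, 0), 1), ((0, 0), 0)]

sse, epoch = 1.0, 0
while sse >= 0.001 and epoch < 100_000:
    sse, epoch = 0.0, epoch + 1
    for (x1, x2), y_d in training_set:
        # Forward pass
        y3 = sigmoid(x1 * w13 + x2 * w23 - theta3)
        y4 = sigmoid(x1 * w14 + x2 * w24 - theta4)
        y5 = sigmoid(y3 * w35 + y4 * w45 - theta5)
        e = y_d - y5
        sse += e * e
        # Backward pass: output neuron 5 first, hidden deltas use the old w35, w45
        d5 = y5 * (1 - y5) * e
        d3 = y3 * (1 - y3) * d5 * w35
        d4 = y4 * (1 - y4) * d5 * w45
        # Weight and threshold corrections
        w35 += alpha * y3 * d5;  w45 += alpha * y4 * d5;  theta5 += alpha * -1 * d5
        w13 += alpha * x1 * d3;  w23 += alpha * x2 * d3;  theta3 += alpha * -1 * d3
        w14 += alpha * x1 * d4;  w24 += alpha * x2 * d4;  theta4 += alpha * -1 * d4

print(epoch, sse)   # epochs needed to reach the 0.001 threshold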

Page 37: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

37

Learning Curve for XOR

[Figure: sum-squared network error vs. epoch (log scale) over 224 epochs; the error drops below the 10^-3 threshold.]

The curve shows the ANN learning speed.
224 epochs, or 896 iterations, were required.

Page 38: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

38

Final Results

[Figure: the trained XOR network. Approximate final weights and thresholds: w13 ≈ 4.7, w23 ≈ 4.8, w14 ≈ 6.4, w24 ≈ 6.4, w35 ≈ -10.4, w45 ≈ 9.8, θ3 ≈ 7.3, θ4 ≈ 2.8, θ5 ≈ 4.6.]

Training again with different initial values may produce a different set of final weights. Any result is acceptable so long as the sum of squared errors is below the preset error threshold.

Page 39: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

39

Final Results

Inputs (x1, x2) | Desired output y_d | Actual output y5 | Error e
(1, 1)          | 0                  | 0.0155           | -0.0155
(0, 1)          | 1                  | 0.9849           |  0.0151
(1, 0)          | 1                  | 0.9849           |  0.0151
(0, 0)          | 0                  | 0.0175           | -0.0175

Sum of squared errors = 0.0010

A different result is possible for different initial weights, but the result always satisfies the chosen error criterion.

Page 40: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

40

McCulloch-Pitts Model: XOR Operation

[Figure: a hand-crafted three-layer network for XOR. Inputs x1 and x2 feed hidden neurons 3 and 4 with weights +1.0; neuron 3 has threshold +1.5 and neuron 4 has threshold +0.5; output neuron 5 combines them with weights -2.0 and +1.0 and threshold +0.5.]

Activation function: sign function

Page 41: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

41

Decision Boundary

[Figure: (a) decision boundary constructed by hidden neuron 3: x1 + x2 - 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 - 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network.]

Page 42: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

42

Problems of Back-Propagation

- Not similar to the process of a biological neuron
- Heavy computing load

Page 43: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

43

Accelerated Learning in Multi-layer NN (1)

Represent the sigmoid function by a hyperbolic tangent:

Y_tanh = 2a / (1 + e^(-bX)) - a

where a and b are constants. Suitable values: a = 1.716 and b = 0.667.
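A sketch of this hyperbolic-tangent style activation in plain Python (a and b as on the slide):

import math

def tanh_activation(x, a=1.716, b=0.667):
    # Y = 2a / (1 + e^(-b*X)) - a, ranging over (-a, +a)
    return 2.0 * a / (1.0 + math.exp(-b * x)) - a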

Page 44: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

44

Accelerated Learning in Multi-layer NN (2)

Include a momentum term in the delta rule:

Δw_j,k(p) = β · Δw_j,k(p - 1) + α · y_j(p) · δ_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.

This equation is called the generalized delta rule.
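A sketch of the generalized delta rule in plain Python (names are illustrative): the previous weight change is remembered and scaled by β.

def momentum_update(w_jk, prev_delta_w, y_j, delta_k, alpha=0.1, beta=0.95):
    # Generalized delta rule:
    # delta_w(p) = beta * delta_w(p-1) + alpha * y_j(p) * delta_k(p)
    delta_w = beta * prev_delta_w + alpha * y_j * delta_k
    return w_jk + delta_w, delta_w   # new weight and the change to remember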

Page 45: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

45

Learning with Momentum

[Figure: training for 126 epochs with momentum; sum-squared error vs. epoch (log scale) and learning rate vs. epoch.]

Reduced from 224 to 126 epochs.

Page 46: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

46

Accelerated Learning in Multi-layer NN (3)

Adaptive learning rate: idea
- A small learning rate gives a smooth learning curve.
- A large learning rate gives fast learning, but training may become unstable.

Heuristic rule (see the sketch below):
- Increase the learning rate when the change of the sum of squared errors has the same algebraic sign for several consecutive epochs.
- Decrease the learning rate when the sign alternates for several consecutive epochs.
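A plain-Python sketch of this heuristic. The growth and decay factors and the window of "several" epochs are illustrative choices, not values from the slides.

def adapt_learning_rate(rate, sse_history, window=4, grow=1.05, shrink=0.7):
    """Adjust the learning rate from the recent changes of the sum-squared error."""
    if len(sse_history) < window + 1:
        return rate
    recent = sse_history[-(window + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    same_sign = all(c > 0 for c in changes) or all(c < 0 for c in changes)
    alternating = all(c1 * c2 < 0 for c1, c2 in zip(changes, changes[1:]))
    if same_sign:
        return rate * grow      # error keeps moving in one direction: speed up
    if alternating:
        return rate * shrink    # error oscillates: slow down
    return rate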

Page 47: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

47

Effect of Adaptive Learning Rate

[Figure: training for 103 epochs with an adaptive learning rate; sum-squared error vs. epoch (log scale) and learning rate vs. epoch.]

Page 48: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

48

Momentum + Adaptive Learning Rate

[Figure: training for 85 epochs with both momentum and an adaptive learning rate; sum-squared error vs. epoch (log scale) and learning rate vs. epoch.]

Page 49: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

49

The Hopfield Network

- Neural networks were designed by analogy with the brain, which has associative memory: we can recognize a familiar face in an unfamiliar environment, and our brain can recognize certain patterns even though some information about them differs from what we remember.
- Multilayer ANNs are not intrinsically intelligent.
- Recurrent neural networks (RNNs) are used to emulate human associative memory.
- The Hopfield network is an RNN.

Page 50: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

50

The Hopfield Network: Goal

- Goal: to recognize a pattern even if some parts are not the same as what the network was trained to remember.
- The Hopfield network is a single-layer network.
- It is recurrent: the network outputs are calculated and then fed back to adjust the inputs. The process continues until the outputs become constant.
- Let's see how it works.

Page 51: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

51

Single-layer n-neuron Hopfield Network

[Figure: a single-layer n-neuron Hopfield network. Neurons 1, 2, …, i, …, n take input signals x1 … xn and produce output signals y1 … yn, which are fed back as the next inputs.]

Page 52: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

52

Activation Function

- If the neuron's weighted input is greater than zero, the output is +1.
- If the neuron's weighted input is less than zero, the output is -1.
- If the neuron's weighted input is zero, the output remains in its previous state.

Y_sign = +1 if X > 0; -1 if X < 0; Y (previous state) if X = 0
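A sketch of this activation rule in plain Python: the neuron keeps its previous output when the weighted input is exactly zero.

def hopfield_sign(x, previous_y):
    # +1 if X > 0, -1 if X < 0, previous state if X == 0
    if x > 0:
        return 1
    if x < 0:
        return -1
    return previous_y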

Page 53: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

53

Hopfield Network Current State

The current state of the network is determined by the current outputs, i.e. the state vector:

Y = [y1, y2, …, yn]^T

Page 54: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

54

What can it recognize?

- n = the number of inputs (one per neuron).
- Each input can be +1 or -1.
- There are 2^n possible sets of inputs/outputs, i.e. patterns.
- M = the total number of patterns the network was trained with, i.e. the total number of patterns we want the network to be able to recognize.

Page 55: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

55

Example: n = 3, 2^3 = 8 possible states

[Figure: the 8 possible states drawn as the corners of a cube in (y1, y2, y3) space: (1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1), (-1, -1, -1).]

Page 56: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

56

Weights

Weights between neurons are usually represented in matrix form:

W = Σ (m = 1 to M) Y_m Y_m^T - M·I

where I is the identity matrix. Once the weights are calculated, they remain fixed.

For example, let's train the 3-neuron network to recognize the following 2 patterns (M = 2, n = 3):

Y1 = [1, 1, 1]^T    Y2 = [-1, -1, -1]^T

Page 57: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

57

Weights (2)

M = 2, Y1 = [1, 1, 1]^T, Y2 = [-1, -1, -1]^T, so Y1^T = [1, 1, 1] and Y2^T = [-1, -1, -1].

Thus we can determine the weight matrix as follows:

W = Y1 Y1^T + Y2 Y2^T - 2I
  = [1 1 1; 1 1 1; 1 1 1] + [1 1 1; 1 1 1; 1 1 1] - 2·[1 0 0; 0 1 0; 0 0 1]
  = [0 2 2; 2 0 2; 2 2 0]
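A sketch of this weight-matrix calculation using numpy (an assumed dependency, not mentioned in the slides):

import numpy as np

Y1 = np.array([1, 1, 1])
Y2 = np.array([-1, -1, -1])

M = 2
W = np.outer(Y1, Y1) + np.outer(Y2, Y2) - M * np.eye(3, dtype=int)
# W == [[0, 2, 2],
#       [2, 0, 2],
#       [2, 2, 0]]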

Page 58: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

58

How is the Hopfield network tested?

Given an input vector X, we calculate the output in a similar manner to before:

Y_m = sign(W X_m - θ),  m = 1, 2, …, M

where θ is the threshold vector; in this case all thresholds are set to zero.

Y1 = sign([0 2 2; 2 0 2; 2 2 0] · [1, 1, 1]^T - [0, 0, 0]^T) = [1, 1, 1]^T
Y2 = sign([0 2 2; 2 0 2; 2 2 0] · [-1, -1, -1]^T - [0, 0, 0]^T) = [-1, -1, -1]^T
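A small numpy sketch of this test (numpy assumed, as above); applying Y = sign(W·X - θ) with all thresholds zero returns both training patterns unchanged. Note that np.sign would return 0 for a zero input, which does not arise here.

import numpy as np

W = np.array([[0, 2, 2],
              [2, 0, 2],
              [2, 2, 0]])
theta = np.zeros(3)

X1 = np.array([1, 1, 1])
X2 = np.array([-1, -1, -1])

Y1 = np.sign(W @ X1 - theta)   # [ 1.  1.  1.]  == X1, a stable state
Y2 = np.sign(W @ X2 - theta)   # [-1. -1. -1.]  == X2, a stable state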

Page 59: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

59

Stable States

As we see, Y1 = X1 and Y2 = X2. Thus both states are said to be stable (also called fundamental states, or fundamental memories).

Page 60: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

60

Unstable States

With 3 neurons in the network, there are 8 possible states. The remaining 6 states are unstable.

[Table: each unstable state, presented as the iteration-0 input, differs from a fundamental memory in a single bit and converges to that memory, (1, 1, 1) or (-1, -1, -1), within one iteration.]

Page 61: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

61

Error Correction Network

Each of the unstable states represents a single error, compared to the fundamental memory.
The Hopfield network can therefore act as an error correction network.

Page 62: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

62

The Hopfield Network

- The Hopfield network can store a set of fundamental memories.
- It can recall those fundamental memories when presented with inputs that may be exactly those memories or slightly different.
- However, it may not always recall correctly. Let's see an example.

Page 63: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

63

Ex: When Hopfield Network cannot recall

Fundamental memories:
X1 = (+1, +1, +1, +1, +1)
X2 = (+1, -1, +1, -1, +1)
X3 = (-1, +1, -1, +1, -1)

Let the probe vector be X = (+1, +1, -1, +1, +1).
It is very similar to X1, but the network recalls it as X3.
This is a problem with the Hopfield network.

Page 64: Chapter 6:  Artificial Neural Networks Part 2 of 3  (Sections 6.4  –  6.6)

64

Storage capacity of the Hopfield Network

Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly.

The maximum number of fundamental memories M_max that can be stored in an n-neuron recurrent network is limited by:

M_max = 0.15 n