323-670 Artificial Intelligence
Chapter 6: Neural Networks
Hopfield Networks
• Hopfield [1982]: a theory of memory (p. 490)
• Model of content-addressable memory (p. 491)
  – distributed representation
  – distributed, asynchronous control
  – content-addressable memory
  – fault tolerance
• Figure 18.1 (p. 490)
  – black unit = active
  – white unit = inactive
Hopfield Networks
– units are connected to each other with weighted, symmetric connections
– a positive weight indicates that the two units tend to activate each other
– a negative weight allows an active unit to deactivate a neighboring unit
Parallel relaxation algorithm
The network operates as follows:
1) a random unit is chosen
2) if any of its neighbors are active, the unit computes the sum of the weights on the connections to those active neighbors
3) if the sum is positive, the unit becomes active; otherwise it becomes inactive
4) another random unit is chosen, and the process repeats until the network reaches a stable state (i.e., until no unit can change state)
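A minimal Python sketch of parallel relaxation (the weight-matrix format, the 0/1 state encoding, and the stability heuristic are illustrative assumptions, not from the text):

```python
import random

def parallel_relaxation(weights, state, max_steps=10000):
    """Relax a Hopfield network until no unit wants to change state.

    weights: symmetric n x n matrix (weights[i][j] == weights[j][i], zero diagonal)
    state:   list of n activations, 1 = active, 0 = inactive
    """
    n = len(state)
    unchanged = 0                      # consecutive updates with no state change
    for _ in range(max_steps):
        i = random.randrange(n)        # choose a random unit
        # sum of weights on connections to active neighbors
        total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
        new_state = 1 if total > 0 else 0
        if new_state == state[i]:
            unchanged += 1
            if unchanged >= 10 * n:    # heuristic: assume the network is stable
                return state
        else:
            state[i] = new_state
            unchanged = 0
    return state
```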
Hopfield Networks
• Figure 18.1 (p. 490): a Hopfield network
  – an active (black) unit with a positive connection will attempt to activate the unit connected to it
• Figure 18.2 (p. 491): four stable states, i.e., the stored patterns
  – given any set of weights and any initial state, the parallel relaxation algorithm will settle into one of these four states
Hopfield Networks
• Figure 18.3 (p. 491): model of content-addressable memory
  – to retrieve a pattern, we set the activities of the units to correspond to a portion of that pattern
  – the network will then settle into the stable state that best matches the partial pattern
  – the stable states are local minima; the network moves to the nearest one
• Figure 18.4 (p. 492): what a Hopfield network computes, i.e., movement from one state to another
Hopfield Networks
– Problem: sometimes the network cannot find the globally best solution; it gets stuck in a local minimum because the units settle into stable states via a completely distributed algorithm
– for example, in Figure 18.4, if the network reaches stable state A, no single unit is willing to change its state in order to move uphill, so the network will never reach the globally optimal state B
Perceptron
A perceptron (Rosenblatt, 1962) models a neuron by taking a weighted sum of its inputs and outputting 1 if the sum is greater than some adjustable threshold value (otherwise it outputs 0).
Figures 18.5-18.7 (pp. 493-494): threshold functions
Figure 18.8: $g(x) = \sum_{i=1}^{n} w_i x_i$
output(x) = 1 if g(x) > 0, else 0
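A sketch of a single perceptron unit under these definitions (the weights, threshold, and inputs in the example are hypothetical):

```python
def perceptron_output(weights, threshold, inputs):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    g = sum(w * x for w, x in zip(weights, inputs))
    return 1 if g > threshold else 0

# hypothetical example: two inputs acting as a logical AND
print(perceptron_output([1.0, 1.0], 1.5, [1, 1]))  # -> 1
print(perceptron_output([1.0, 1.0], 1.5, [1, 0]))  # -> 0
```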
Perceptron
With two inputs, setting g(x) to zero gives the decision boundary:
$g(x) = w_0 + w_1 x_1 + w_2 x_2 = 0$, i.e., $x_2 = -(w_1/w_2)\,x_1 - (w_0/w_2)$, the equation of a line
the location of the line is determined by the weights w0, w1, and w2
if an input vector lies on one side of the line, the perceptron outputs 1; if it lies on the other side, the perceptron outputs 0
Decision surface: a line that correctly separates the training instances corresponds to a perfectly functioning perceptron
See Figure 18.9 (p. 496)
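For instance, with hypothetical weights $w_0 = -1$, $w_1 = 1$, $w_2 = 1$ (not taken from the text), the line is $x_2 = -x_1 + 1$. The input (1, 1) gives $g = -1 + 1 + 1 = 1 > 0$, so the perceptron outputs 1; the input (0, 0) gives $g = -1 < 0$, so it outputs 0.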
Decision surface
the absolute value of g(x) tells how far a given input vector x lies from the decision surface
so it tells us how good a set of weights is
let w be the weight vector (w0, w1, ..., wn)
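More precisely (a standard fact, added here as a supplement to the text), the distance from x to the decision surface is g(x) scaled by the length of the weight vector, with the bias weight w0 excluded from the norm:

$d(\mathbf{x}) = \dfrac{|g(\mathbf{x})|}{\sqrt{w_1^2 + \cdots + w_n^2}}$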
Multilayer perceptron
• Figure 18.10 (p. 497): adjusting the weights by gradient descent (hill-climbing downhill)
• see the Fixed-Increment Perceptron Learning algorithm; a sketch follows this list
• Figure 18.11 (p. 499): a perceptron learning to solve a classification problem (K = 10, K = 100, K = 635)
• Figure 18.12 (p. 500): XOR is not linearly separable
• we need a multilayer perceptron to solve the XOR problem
• see Figure 18.13 (p. 500), where x1 = 1 and x2 = 1
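A minimal sketch of fixed-increment perceptron learning (the training data, learning rate, and stopping test are illustrative; the threshold is folded in as a bias weight w0 with a constant input of 1):

```python
def train_perceptron(samples, n_inputs, epochs=100, eta=1.0):
    """Fixed-increment rule: on each error, move w toward the misclassified input.

    samples: list of (inputs, target) pairs, target in {0, 1}
    """
    w = [0.0] * (n_inputs + 1)             # w[0] is the bias weight
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            x = [1.0] + list(inputs)       # constant input for the bias
            g = sum(wi * xi for wi, xi in zip(w, x))
            output = 1 if g > 0 else 0
            if output != target:           # fixed-increment correction
                for i in range(len(w)):
                    w[i] += eta * (target - output) * x[i]
                errors += 1
        if errors == 0:                    # converged: every sample classified
            break
    return w

# hypothetical linearly separable problem: logical OR
print(train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)], 2))
```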
Backpropagation Algorithm
• Parker [1985], Rumelhart et al. [1986]
• a fully connected, feedforward, multilayer network (Figure 18.14, p. 502)
• fast, resistant to damage, learns efficiently (see Figure 18.15, p. 503)
• used for classification problems
• uses a sigmoid (S-shaped) activation function: it produces a real value between 0 and 1 as output (see Figure 18.16, p. 503)
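The description matches the standard logistic sigmoid; a one-line Python version (restated inside the later sketches so each stays self-contained):

```python
import math

def sigmoid(x):
    """S-shaped activation: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```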
Backpropagation Algorithm
• Figure 18.14 (p. 502): start with a random set of weights; the network adjusts its weights each time it sees an input-output pair
• each pair requires two stages:
1) a forward pass: present a sample input to the network and let activations flow until they reach the output layer
2) a backward pass: the network's actual output (from the forward pass) is compared with the target output, and error estimates are computed for the output units
Backpropagation Algorithm
• the weights connected to the output units can be adjusted to reduce those errors
• we can then use the error estimates of the output units to derive error estimates for the units in the hidden layers
• finally, errors are propagated back to the connections stemming from the input units
Backpropagation Algorithm (pp. 504-506)
• initialize the weights to small random values (-0.1 to 0.1)
• initialize the activation of the thresholding unit
• set the learning rate
• choose an input-output pair
• oj = the network's actual output (the value the network computes)
• yj = the target output (the true value of the training data)
• adjust the weights between the hidden layer and the output layer (w2ij)
• adjust the weights between the input layer and the hidden layer (w1ij); both updates appear in the sketch below

input layer (units xi) --w1ij--> hidden layer (units hj) --w2ij--> output layer (units oj)
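A sketch of one such update in Python, using the standard sigmoid-derivative form of the error estimates (the learning rate and the omission of bias weights are simplifications, not the book's exact formulation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, w1, w2, eta=0.5):
    """One input-output pair: forward pass, backward pass, weight updates.

    x:  input vector;  y: target output vector
    w1: w1[k][i], input k -> hidden i;  w2: w2[i][j], hidden i -> output j
    """
    # forward pass: activations flow through to the output layer
    h = [sigmoid(sum(x[k] * w1[k][i] for k in range(len(x))))
         for i in range(len(w1[0]))]
    o = [sigmoid(sum(h[i] * w2[i][j] for i in range(len(h))))
         for j in range(len(w2[0]))]
    # backward pass: error estimates for the output units (yj - oj)
    d2 = [o[j] * (1 - o[j]) * (y[j] - o[j]) for j in range(len(o))]
    # error estimates for the hidden units, derived from the output errors
    d1 = [h[i] * (1 - h[i]) * sum(d2[j] * w2[i][j] for j in range(len(d2)))
          for i in range(len(h))]
    # adjust w2 (hidden -> output), then w1 (input -> hidden)
    for i in range(len(h)):
        for j in range(len(d2)):
            w2[i][j] += eta * d2[j] * h[i]
    for k in range(len(x)):
        for i in range(len(d1)):
            w1[k][i] += eta * d1[i] * x[k]
    return o
```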
Backpropagation Algorithm
• backpropagation updates its weights after seeing each input-output pair; after it has seen all the input-output pairs (and adjusted its weights that many times), one epoch has been completed
• training for more epochs generally improves the network's performance
• we can speed up learning by adding a momentum term (see the equation on p. 506 and the standard form after this list)
• perceptron convergence theorem (Rosenblatt 1962): guarantees that the perceptron will find a solution, provided that one exists...
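The momentum idea in its standard form (the book's exact equation on p. 506 is not reproduced here): each weight change blends the new gradient step with the previous change,

$\Delta w_{ij}(t) = \eta\,\delta_j\,x_i + \alpha\,\Delta w_{ij}(t-1)$

where $\alpha$ is the momentum coefficient, so consistent changes accumulate speed while oscillating ones cancel out.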
Backpropagation Algorithm
Generalization (Figure 18.17, p. 508)
• a good network should be capable of storing entire training sets, with a weight setting that describes the mapping in general, for all cases, rather than the individual input-output pairs
Reinforcement Learning
• uses a punishment-and-reward system (as with animals); a sketch follows this list
1) the network is presented with a sample input from the training set
2) the network computes what it thinks the output should be
3) the network is supplied with a real-valued judgment by a teacher
   – a positive value indicates good performance
   – a negative value indicates bad performance
4) the network adjusts its weights, and the process repeats
• the goal is to receive positive values, i.e., good performance
• this is a form of supervised learning (a teacher supplies the judgment)
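One simple way to learn from reward alone is random perturbation hill-climbing; this illustrative sketch follows the loop above but is not the textbook's algorithm, and evaluate() is a hypothetical stand-in for the teacher:

```python
import random

def reinforcement_train(evaluate, n_weights, steps=1000, scale=0.1):
    """Reward-driven training: keep only weight changes the teacher rewards.

    evaluate(weights) plays the role of the teacher: it returns a real-valued
    judgment of the network's behavior, higher meaning better performance.
    """
    weights = [random.uniform(-0.1, 0.1) for _ in range(n_weights)]
    best = evaluate(weights)
    for _ in range(steps):
        trial = [w + random.gauss(0, scale) for w in weights]  # perturb weights
        reward = evaluate(trial)
        if reward > best:              # keep changes that earn a higher reward
            weights, best = trial, reward
    return weights
```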
Unsupervised Learning
• no feedback is given for its outputs; no teacher is required
• given a set of input data, the network is allowed to discover regularities and relations between the different parts of the input
• feature discovery: Figure 18.18 (p. 511), data for unsupervised learning
• 3 types of animals... 1) mammals 2) reptiles 3) birds
Unsupervised Learning
• we need to ensure that only one of the three output units becomes active for any given input
• see Figure 18.19 (p. 512): a competitive learning network
• uses winner-take-all behavior
Unsupervised Learning
A simple competitive learning algorithm (pp. 512-513); see the sketch after this list:
1) present an input vector
2) calculate the initial activation for each output unit
3) let the output units fight until only one is active
4) adjust the weights on the input lines that lead to the single active output unit: increase the weights on connections between the active output unit and the active input units (this makes it more likely that the output unit will be active the next time the pattern is presented)
5) repeat steps 1 to 4 for all input patterns, for many epochs
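A sketch of one winner-take-all pass in Python; the update rule used here (move the winner's weights toward the input) is the common formulation, and the text's exact normalization may differ:

```python
def competitive_step(weight_rows, x, eta=0.2):
    """One pass of winner-take-all learning for a single input vector.

    weight_rows[j] holds the input weights of output unit j.
    """
    # 2) initial activation of each output unit = weighted sum of the input
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    # 3) the units "fight": only the most active one stays on
    winner = max(range(len(activations)), key=lambda j: activations[j])
    # 4) strengthen the winner's connections to the active inputs
    weight_rows[winner] = [w + eta * (xi - w)
                           for w, xi in zip(weight_rows[winner], x)]
    return winner
```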
Recurrent Networks
• Jordan [1986]
• used in temporal AI tasks: planning, natural language processing
• we need more than a single output vector; we need a series of output vectors
• Figure 18.22 (p. 518): a Jordan network
• Figure 18.23 (p. 519): a recurrent network with a mental model
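A sketch of one time step of a Jordan-style network (the shapes and the simplification that the state units hold an exact copy of the previous output are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def jordan_step(x, state, w_hidden, w_out):
    """One time step: the previous output, held in the state units, is
    concatenated with the current input, so repeated calls emit a series
    of output vectors.

    w_hidden: one weight column per hidden unit, length len(x) + len(state)
    w_out:    one weight column per output unit, length = number of hidden units
    """
    full_input = list(x) + list(state)          # input units + state units
    h = [sigmoid(sum(v * w for v, w in zip(full_input, col)))
         for col in w_hidden]
    o = [sigmoid(sum(hi * w for hi, w in zip(h, col))) for col in w_out]
    return o, o                                 # new output is also the new state
```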
The End