323-670 Artificial Intelligence
Chapter 6: Neural Networks
Hopfield Networks
• Hopfield [1982]: a theory of memory (p. 490)
• Model of content-addressable memory (p. 491)
  – distributed representation
  – distributed, asynchronous control
  – content-addressable memory
  – fault tolerance
• Figure 18.1 (p. 490)
  – black unit = active
  – white unit = inactive
Hopfield Networks
– units are connected to each other with weighted, symmetric connections
– a positive weight indicates that the two units tend to activate each other
– a negative weight allows an active unit to deactivate a neighboring unit
Parallel relaxation algorithm
The network operates as follows:
1) a random unit is chosen
2) if any of its neighbors are active, the unit computes the sum of the weights on the connections to those active neighbors
3) if the sum is positive, the unit becomes active; otherwise it becomes inactive
4) another random unit is chosen, and the process repeats until the network reaches a stable state (i.e., until no unit can change state)
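A minimal Python sketch of parallel relaxation (the weight-matrix format, the 0/1 state encoding, and the stability heuristic are illustrative assumptions, not from the text):

```python
import random

def parallel_relaxation(weights, state, max_steps=10000):
    """Relax a Hopfield network until no unit wants to change state.

    weights: symmetric n x n matrix (weights[i][j] == weights[j][i], zero diagonal)
    state:   list of n activations, 1 = active, 0 = inactive
    """
    n = len(state)
    unchanged = 0                      # consecutive updates with no state change
    for _ in range(max_steps):
        i = random.randrange(n)        # choose a random unit
        # sum of weights on connections to active neighbors
        total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
        new_state = 1 if total > 0 else 0
        if new_state == state[i]:
            unchanged += 1
            if unchanged >= 10 * n:    # heuristic: assume the network is stable
                return state
        else:
            state[i] = new_state
            unchanged = 0
    return state
```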
Hopfield Networks
• Figure 18.1 (p. 490): a Hopfield network
  – an active (black) unit with a positive connection will attempt to activate the unit connected to it
• Figure 18.2 (p. 491): four stable states, i.e., the stored patterns
  – given any set of weights and any initial state, the parallel relaxation algorithm will settle into one of these four states
Hopfield Networks
• Figure 18.3 (p. 491): model of content-addressable memory
  – to retrieve a pattern, we set the activities of the units to correspond to a portion of that pattern
  – the network will then settle into the stable state that best matches the partial pattern
  – the stable states are local minima; the network moves to the nearest one
• Figure 18.4 (p. 492): what a Hopfield network computes, i.e., movement from one state to another
Hopfield Networks
– Problem: sometimes the network cannot find the globally best solution; it gets stuck in a local minimum because the units settle into stable states via a completely distributed algorithm
– for example, in Figure 18.4, if the network reaches stable state A, no single unit is willing to change its state in order to move uphill, so the network will never reach the globally optimal state B
Perceptron
A perceptron (Rosenblatt, 1962) models a neuron by taking a weighted sum of its inputs and outputting 1 if the sum is greater than some adjustable threshold value (otherwise it outputs 0).
Figures 18.5-18.7 (pp. 493-494): threshold functions
Figure 18.8: $g(x) = \sum_{i=1}^{n} w_i x_i$
output(x) = 1 if g(x) > 0, else 0
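A sketch of a single perceptron unit under these definitions (the weights, threshold, and inputs in the example are hypothetical):

```python
def perceptron_output(weights, threshold, inputs):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    g = sum(w * x for w, x in zip(weights, inputs))
    return 1 if g > threshold else 0

# hypothetical example: two inputs acting as a logical AND
print(perceptron_output([1.0, 1.0], 1.5, [1, 1]))  # -> 1
print(perceptron_output([1.0, 1.0], 1.5, [1, 0]))  # -> 0
```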
Perceptron
With two inputs, setting g(x) to zero gives the decision boundary:
$g(x) = w_0 + w_1 x_1 + w_2 x_2 = 0$, i.e., $x_2 = -(w_1/w_2)\,x_1 - (w_0/w_2)$, the equation of a line
the location of the line is determined by the weights w0, w1, and w2
if an input vector lies on one side of the line, the perceptron outputs 1; if it lies on the other side, the perceptron outputs 0
Decision surface: a line that correctly separates the training instances corresponds to a perfectly functioning perceptron
See Figure 18.9 (p. 496)
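For instance, with hypothetical weights $w_0 = -1$, $w_1 = 1$, $w_2 = 1$ (not taken from the text), the line is $x_2 = -x_1 + 1$. The input (1, 1) gives $g = -1 + 1 + 1 = 1 > 0$, so the perceptron outputs 1; the input (0, 0) gives $g = -1 < 0$, so it outputs 0.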
Decision surface
the absolute value of g(x) tells how far a given input vector x lies from the decision surface
so it tells us how good a set of weights is
let w be the weight vector (w0, w1, ..., wn)
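More precisely (a standard fact, added here as a supplement to the text), the distance from x to the decision surface is g(x) scaled by the length of the weight vector, with the bias weight w0 excluded from the norm:

$d(\mathbf{x}) = \dfrac{|g(\mathbf{x})|}{\sqrt{w_1^2 + \cdots + w_n^2}}$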
Multilayer perceptron
• Figure 18.10 (p. 497): adjusting the weights by gradient descent (hill-climbing downhill)
• see the Fixed-Increment Perceptron Learning algorithm; a sketch follows this list
• Figure 18.11 (p. 499): a perceptron learning to solve a classification problem (K = 10, K = 100, K = 635)
• Figure 18.12 (p. 500): XOR is not linearly separable
• we need a multilayer perceptron to solve the XOR problem
• see Figure 18.13 (p. 500), where x1 = 1 and x2 = 1
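A minimal sketch of fixed-increment perceptron learning (the training data, learning rate, and stopping test are illustrative; the threshold is folded in as a bias weight w0 with a constant input of 1):

```python
def train_perceptron(samples, n_inputs, epochs=100, eta=1.0):
    """Fixed-increment rule: on each error, move w toward the misclassified input.

    samples: list of (inputs, target) pairs, target in {0, 1}
    """
    w = [0.0] * (n_inputs + 1)             # w[0] is the bias weight
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            x = [1.0] + list(inputs)       # constant input for the bias
            g = sum(wi * xi for wi, xi in zip(w, x))
            output = 1 if g > 0 else 0
            if output != target:           # fixed-increment correction
                for i in range(len(w)):
                    w[i] += eta * (target - output) * x[i]
                errors += 1
        if errors == 0:                    # converged: every sample classified
            break
    return w

# hypothetical linearly separable problem: logical OR
print(train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)], 2))
```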
Backpropagation Algorithm
• Parker [1985], Rumelhart et al. [1986]
• a fully connected, feedforward, multilayer network (Figure 18.14, p. 502)
• fast, resistant to damage, learns efficiently (see Figure 18.15, p. 503)
• used for classification problems
• uses a sigmoid (S-shaped) activation function: it produces a real value between 0 and 1 as output (see Figure 18.16, p. 503)
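The description matches the standard logistic sigmoid; a one-line Python version (restated inside the later sketches so each stays self-contained):

```python
import math

def sigmoid(x):
    """S-shaped activation: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```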
Backpropagation Algorithm
• Figure 18.14 (p. 502): start with a random set of weights; the network adjusts its weights each time it sees an input-output pair
• each pair requires two stages:
1) a forward pass: present a sample input to the network and let activations flow until they reach the output layer
2) a backward pass: the network's actual output (from the forward pass) is compared with the target output, and error estimates are computed for the output units
Backpropagation Algorithm
• the weights connected to the output units can be adjusted to reduce those errors
• we can then use the error estimates of the output units to derive error estimates for the units in the hidden layers
• finally, errors are propagated back to the connections stemming from the input units
Backpropagation Algorithm (pp. 504-506)
• initialize the weights to small random values (-0.1 to 0.1)
• initialize the activation of the thresholding unit
• set the learning rate
• choose an input-output pair
• oj = the network's actual output (the value the network computes)
• yj = the target output (the true value of the training data)
• adjust the weights between the hidden layer and the output layer (w2ij)
• adjust the weights between the input layer and the hidden layer (w1ij); both updates appear in the sketch below

input layer (units xi) --w1ij--> hidden layer (units hj) --w2ij--> output layer (units oj)
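A sketch of one such update in Python, using the standard sigmoid-derivative form of the error estimates (the learning rate and the omission of bias weights are simplifications, not the book's exact formulation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, w1, w2, eta=0.5):
    """One input-output pair: forward pass, backward pass, weight updates.

    x:  input vector;  y: target output vector
    w1: w1[k][i], input k -> hidden i;  w2: w2[i][j], hidden i -> output j
    """
    # forward pass: activations flow through to the output layer
    h = [sigmoid(sum(x[k] * w1[k][i] for k in range(len(x))))
         for i in range(len(w1[0]))]
    o = [sigmoid(sum(h[i] * w2[i][j] for i in range(len(h))))
         for j in range(len(w2[0]))]
    # backward pass: error estimates for the output units (yj - oj)
    d2 = [o[j] * (1 - o[j]) * (y[j] - o[j]) for j in range(len(o))]
    # error estimates for the hidden units, derived from the output errors
    d1 = [h[i] * (1 - h[i]) * sum(d2[j] * w2[i][j] for j in range(len(d2)))
          for i in range(len(h))]
    # adjust w2 (hidden -> output), then w1 (input -> hidden)
    for i in range(len(h)):
        for j in range(len(d2)):
            w2[i][j] += eta * d2[j] * h[i]
    for k in range(len(x)):
        for i in range(len(d1)):
            w1[k][i] += eta * d1[i] * x[k]
    return o
```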
Backpropagation Algorithm
• backpropagation updates its weights after seeing each input-output pair; after it has seen all the input-output pairs (and adjusted its weights that many times), one epoch has been completed
• training for more epochs generally improves the network's performance
• we can speed up learning by adding a momentum term (see the equation on p. 506 and the standard form after this list)
• perceptron convergence theorem (Rosenblatt 1962): guarantees that the perceptron will find a solution, provided that one exists...
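The momentum idea in its standard form (the book's exact equation on p. 506 is not reproduced here): each weight change blends the new gradient step with the previous change,

$\Delta w_{ij}(t) = \eta\,\delta_j\,x_i + \alpha\,\Delta w_{ij}(t-1)$

where $\alpha$ is the momentum coefficient, so consistent changes accumulate speed while oscillating ones cancel out.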
Backpropagation Algorithm
Generalization (Figure 18.17, p. 508)
• a good network should be capable of storing entire training sets, with a weight setting that describes the mapping in general, for all cases, rather than the individual input-output pairs
Reinforcement Learning
• uses a punishment-and-reward system (as with animals); a sketch follows this list
1) the network is presented with a sample input from the training set
2) the network computes what it thinks the output should be
3) the network is supplied with a real-valued judgment by a teacher
   – a positive value indicates good performance
   – a negative value indicates bad performance
4) the network adjusts its weights, and the process repeats
• the goal is to receive positive values, i.e., good performance
• this is a form of supervised learning (a teacher supplies the judgment)
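One simple way to learn from reward alone is random perturbation hill-climbing; this illustrative sketch follows the loop above but is not the textbook's algorithm, and evaluate() is a hypothetical stand-in for the teacher:

```python
import random

def reinforcement_train(evaluate, n_weights, steps=1000, scale=0.1):
    """Reward-driven training: keep only weight changes the teacher rewards.

    evaluate(weights) plays the role of the teacher: it returns a real-valued
    judgment of the network's behavior, higher meaning better performance.
    """
    weights = [random.uniform(-0.1, 0.1) for _ in range(n_weights)]
    best = evaluate(weights)
    for _ in range(steps):
        trial = [w + random.gauss(0, scale) for w in weights]  # perturb weights
        reward = evaluate(trial)
        if reward > best:              # keep changes that earn a higher reward
            weights, best = trial, reward
    return weights
```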
Unsupervised Learning
• no feedback is given for its outputs; no teacher is required
• given a set of input data, the network is allowed to discover regularities and relations between the different parts of the input
• feature discovery: Figure 18.18 (p. 511), data for unsupervised learning
• 3 types of animals... 1) mammals 2) reptiles 3) birds
Unsupervised Learning
• we need to ensure that only one of the three output units becomes active for any given input
• see Figure 18.19 (p. 512): a competitive learning network
• uses winner-take-all behavior
Unsupervised Learning
A simple competitive learning algorithm (pp. 512-513); see the sketch after this list:
1) present an input vector
2) calculate the initial activation for each output unit
3) let the output units fight until only one is active
4) adjust the weights on the input lines that lead to the single active output unit: increase the weights on connections between the active output unit and the active input units (this makes it more likely that the output unit will be active the next time the pattern is presented)
5) repeat steps 1 to 4 for all input patterns, for many epochs
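A sketch of one winner-take-all pass in Python; the update rule used here (move the winner's weights toward the input) is the common formulation, and the text's exact normalization may differ:

```python
def competitive_step(weight_rows, x, eta=0.2):
    """One pass of winner-take-all learning for a single input vector.

    weight_rows[j] holds the input weights of output unit j.
    """
    # 2) initial activation of each output unit = weighted sum of the input
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    # 3) the units "fight": only the most active one stays on
    winner = max(range(len(activations)), key=lambda j: activations[j])
    # 4) strengthen the winner's connections to the active inputs
    weight_rows[winner] = [w + eta * (xi - w)
                           for w, xi in zip(weight_rows[winner], x)]
    return winner
```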
Recurrent Networks
• Jordan [1986]
• used in temporal AI tasks: planning, natural language processing
• we need more than a single output vector; we need a series of output vectors
• Figure 18.22 (p. 518): a Jordan network
• Figure 18.23 (p. 519): a recurrent network with a mental model
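A sketch of one time step of a Jordan-style network (the shapes and the simplification that the state units hold an exact copy of the previous output are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def jordan_step(x, state, w_hidden, w_out):
    """One time step: the previous output, held in the state units, is
    concatenated with the current input, so repeated calls emit a series
    of output vectors.

    w_hidden: one weight column per hidden unit, length len(x) + len(state)
    w_out:    one weight column per output unit, length = number of hidden units
    """
    full_input = list(x) + list(state)          # input units + state units
    h = [sigmoid(sum(v * w for v, w in zip(full_input, col)))
         for col in w_hidden]
    o = [sigmoid(sum(hi * w for hi, w in zip(h, col))) for col in w_out]
    return o, o                                 # new output is also the new state
```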
The End