Radial Basis Functions

November 4, 2010 Neural Networks Lecture 15: Radial Basis Functions

If we are using such linear interpolation, then our radial basis function (RBF) φ₀, which weights an input vector based on its distance to a neuron's reference (weight) vector, is φ₀(D) = D⁻¹.

(In the following, to keep things simple, we will assume that the network has only one output neuron. However, any number of output neurons could be implemented.)

For the training samples x_p, p = 1, …, P₀, surrounding the new input x, we find for the network's output o:

$$o = \frac{1}{P_0} \sum_{p=1}^{P_0} d_p\, f(\mathbf{x}_p), \qquad \text{where } d_p = \varphi_0(\|\mathbf{x} - \mathbf{x}_p\|)$$
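As an illustration, this inverse-distance interpolation can be sketched in a few lines of Python. The function and variable names are hypothetical, and φ₀(D) = D⁻¹ is passed in as the default weighting function:

```python
import math

def interpolate(x, samples, phi0=lambda D: 1.0 / D):
    """o = (1/P0) * sum_p d_p * f(x_p), with d_p = phi0(||x - x_p||)
    and phi0(D) = 1/D as the default RBF.
    `samples` is a list of (x_p, f(x_p)) pairs surrounding the new input x."""
    P0 = len(samples)
    total = 0.0
    for x_p, f_xp in samples:
        dist = math.dist(x, x_p)        # Euclidean distance ||x - x_p||
        total += phi0(dist) * f_xp      # distance-based weight times sample value
    return total / P0

# Two surrounding samples, both at distance 1, with values 2.0 and 4.0:
o = interpolate((0.0, 0.0), [((1.0, 0.0), 2.0), ((0.0, 1.0), 4.0)])
# d_p = 1 for both samples, so o = (2.0 + 4.0) / 2 = 3.0
```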
Since it is difficult to define what "surrounding" should mean, it is common to consider all P training samples and use any monotonically decreasing RBF φ:

$$o = \frac{1}{P} \sum_{p=1}^{P} d_p\, \varphi(\|\mathbf{x} - \mathbf{x}_p\|)$$

This, however, implies a network that has as many hidden nodes as there are training samples. This is unacceptable because of its computational complexity and likely poor generalization ability – the network resembles a look-up table.
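In code, the all-samples variant is a straight sum over the training set. A unit-width Gaussian stands in for the monotonically decreasing RBF φ (an illustrative choice; the text leaves φ open), and all names are hypothetical:

```python
import math

def rbf_output(x, samples, phi=lambda D: math.exp(-D * D)):
    """o = (1/P) * sum_p d_p * phi(||x - x_p||) over ALL P training samples.
    d_p is the desired output of sample p; phi is any monotonically
    decreasing RBF (here a unit-width Gaussian)."""
    P = len(samples)
    return sum(d_p * phi(math.dist(x, x_p)) for x_p, d_p in samples) / P

# Querying at a training point: the nearby sample dominates, the far one
# contributes almost nothing (5 * exp(-9) is tiny).
o = rbf_output((0.0, 0.0), [((0.0, 0.0), 1.0), ((3.0, 0.0), 5.0)])
```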
It is more useful to have fewer neurons and accept that the training set cannot be learned 100% accurately:

$$o = \frac{1}{N} \sum_{i=1}^{N} w_i\, \varphi(\|\mathbf{x} - \boldsymbol{\mu}_i\|)$$

Here, ideally, each reference vector μ_i of these N neurons should be placed in the center of an input-space cluster of training samples with identical (or at least similar) desired output w_i.

To learn near-optimal values for the reference vectors and the output weights, we can – as usual – employ gradient descent.
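A forward pass through such an N-neuron network is then a short sketch; the Gaussian φ and all parameter values below are illustrative assumptions:

```python
import math

def network_output(x, mus, ws, phi=lambda D: math.exp(-D * D)):
    """o = (1/N) * sum_i w_i * phi(||x - mu_i||), with N << P hidden
    neurons whose reference vectors mu_i replace the raw training samples."""
    N = len(mus)
    return sum(w_i * phi(math.dist(x, mu_i)) for mu_i, w_i in zip(mus, ws)) / N

# Two hidden neurons; the query sits exactly on the first reference vector:
o = network_output((0.0, 0.0), mus=[(0.0, 0.0), (2.0, 0.0)], ws=[2.0, 4.0])
# phi(0) = 1 and phi(2) = exp(-4) ~ 0.018, so o ~ (2 + 0.073) / 2 ~ 1.037
```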
The RBF Network

Example: network function f: ℝ³ → ℝ
[Figure: feed-forward RBF network for f: ℝ³ → ℝ. The input vector (x₀ = 1, x₁, x₂, x₃) feeds an RBF layer of four hidden units with reference vectors and spreads (μ₁, σ₁), …, (μ₄, σ₄); their responses are combined with the weights w₁, …, w₄ (plus bias weight w₀) in the output layer, which produces the output o₁.]
Radial Basis FunctionsRadial Basis FunctionsFor a fixed number of neurons N, we could learn the For a fixed number of neurons N, we could learn the following output weights and reference vectors:following output weights and reference vectors:
To do this, we first have to define an error function E:To do this, we first have to define an error function E:
Taken together, we get:Taken together, we get:
November 4, 2010 Neural Networks Lecture 15: Radial Basis Functions
5
NN
N Nw
Nw
,...,,,..., 11
1
P
p
P
pppp odEE
1 1
2)(
2
1
N
iipipp wdE μx
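The error function is a plain sum of squared per-sample errors; a minimal sketch with hypothetical names and values:

```python
def sse(targets, outputs):
    """E = sum_p E_p = sum_p (d_p - o_p)^2 over the P training samples."""
    return sum((d_p - o_p) ** 2 for d_p, o_p in zip(targets, outputs))

E = sse(targets=[1.0, 0.0, 1.0], outputs=[0.5, 0.5, 1.0])
# E = 0.25 + 0.25 + 0.0 = 0.5
```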
Learning in RBF Networks

Then the error gradient with regard to w₁, …, w_N is:

$$\frac{\partial E_p}{\partial w_i} = -2\,(d_p - o_p)\, \varphi(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|)$$

For μ_{i,j}, the j-th vector component of μ_i, we get:

$$\frac{\partial E_p}{\partial \mu_{i,j}} = -2\,(d_p - o_p)\, w_i\, \frac{\partial\, \varphi(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|)}{\partial \mu_{i,j}}$$
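The w-gradient can be sanity-checked numerically. The sketch below assumes a Gaussian φ(D) = exp(−D²) and small hypothetical parameter values, then compares the analytic gradient −2(d_p − o_p)φ(‖x_p − μ_i‖) against a central finite difference:

```python
import math

phi = lambda D: math.exp(-D * D)   # assumed RBF (the derivation holds for any phi)

def E_p(ws, mus, x_p, d_p):
    """Per-sample error E_p = (d_p - sum_i w_i phi(||x_p - mu_i||))^2."""
    o_p = sum(w * phi(math.dist(x_p, mu)) for w, mu in zip(ws, mus))
    return (d_p - o_p) ** 2

ws, mus = [0.5, -0.3], [(0.0, 0.0), (1.0, 1.0)]
x_p, d_p, i = (0.5, 0.0), 1.0, 0

o_p = sum(w * phi(math.dist(x_p, mu)) for w, mu in zip(ws, mus))
analytic = -2.0 * (d_p - o_p) * phi(math.dist(x_p, mus[i]))

eps = 1e-6
hi = ws.copy(); hi[i] += eps       # perturb w_i up and down
lo = ws.copy(); lo[i] -= eps
numeric = (E_p(hi, mus, x_p, d_p) - E_p(lo, mus, x_p, d_p)) / (2 * eps)
# analytic and numeric agree up to finite-difference error
```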
The vector length (‖…‖) expression is inconvenient, because it is the square root of the given vector multiplied by itself.

To eliminate this difficulty, we introduce a function R with R(D²) = φ(D) and substitute φ(‖x_p − μ_i‖) = R(‖x_p − μ_i‖²).

This leads to a simplified differentiation:

$$\frac{\partial\, \varphi(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|)}{\partial \mu_{i,j}} = \frac{\partial\, R(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2)}{\partial \mu_{i,j}} = R'(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2)\, \frac{\partial \|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2}{\partial \mu_{i,j}}$$
Together with the following derivative …

$$\frac{\partial \|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2}{\partial \mu_{i,j}} = -2\,(x_{p,j} - \mu_{i,j})$$

… we finally get the result for our error gradient:

$$\frac{\partial E_p}{\partial \mu_{i,j}} = 4\, w_i\, (d_p - o_p)\, R'(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2)\, (x_{p,j} - \mu_{i,j})$$
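This μ-gradient can likewise be verified by finite differences. The sketch assumes a Gaussian R(s) = exp(−s) with σ = 1, so R′(s) = −exp(−s); all parameter values are hypothetical:

```python
import math

R  = lambda s: math.exp(-s)    # assumed Gaussian node function, sigma = 1
dR = lambda s: -math.exp(-s)   # its derivative R'(s)

def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def E_p(ws, mus, x_p, d_p):
    o_p = sum(w * R(sqdist(x_p, mu)) for w, mu in zip(ws, mus))
    return (d_p - o_p) ** 2

ws, mus = [0.5, -0.3], [[0.0, 0.0], [1.0, 1.0]]
x_p, d_p, i, j = (0.5, 0.0), 1.0, 0, 0

o_p = sum(w * R(sqdist(x_p, mu)) for w, mu in zip(ws, mus))
s = sqdist(x_p, mus[i])
analytic = 4.0 * ws[i] * (d_p - o_p) * dR(s) * (x_p[j] - mus[i][j])

eps = 1e-6
mus[i][j] += eps; e_hi = E_p(ws, mus, x_p, d_p)   # central finite difference
mus[i][j] -= 2 * eps; e_lo = E_p(ws, mus, x_p, d_p)
mus[i][j] += eps                                  # restore mu
numeric = (e_hi - e_lo) / (2 * eps)
# the two values match up to finite-difference error
```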
This gives us the following updating rules:

$$\Delta w_i = \eta_i\, (d_p - o_p)\, \varphi(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|)$$

$$\Delta \mu_{i,j} = -\eta_{i,j}\, w_i\, (d_p - o_p)\, R'(\|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2)\, (x_{p,j} - \mu_{i,j})$$

where the (positive) learning rates η_i and η_{i,j} could be chosen individually for each parameter w_i and μ_{i,j}.

As usual, we can start with random parameters and then iterate these rules for learning until a given error threshold is reached.
If the node function is given by a Gaussian, then:

$$R(D^2) = \exp\left( -\frac{D^2}{\sigma^2} \right)$$

As a result:

$$R'(D^2) = -\frac{1}{\sigma^2} \exp\left( -\frac{D^2}{\sigma^2} \right)$$
The specific update rules are now:

$$\Delta w_i = \eta_i\, (d_p - o_p)\, \exp\left( -\frac{\|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2}{\sigma^2} \right)$$

and

$$\Delta \mu_{i,j} = \eta_{i,j}\, w_i\, (d_p - o_p)\, \exp\left( -\frac{\|\mathbf{x}_p - \boldsymbol{\mu}_i\|^2}{\sigma^2} \right) (x_{p,j} - \mu_{i,j})$$
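A minimal online training loop with these Gaussian update rules might look as follows. N, σ, the learning rates, the epoch count, and the toy data are all illustrative assumptions, and the constant factors from the derivation are absorbed into the learning rates:

```python
import math, random

def train_gaussian_rbf(samples, N, sigma=1.0, eta_w=0.05, eta_mu=0.05, epochs=500):
    """Online gradient descent on w_i and mu_ij using the Gaussian rules:
    delta w_i   = eta_w  * (d_p - o_p) * exp(-||x_p - mu_i||^2 / sigma^2)
    delta mu_ij = eta_mu * w_i * (d_p - o_p) * exp(...) * (x_pj - mu_ij)"""
    random.seed(0)                       # reproducible random initialization
    dim = len(samples[0][0])
    mus = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(N)]
    ws = [random.uniform(-1.0, 1.0) for _ in range(N)]
    errors = []                          # sum of squared errors per epoch
    for _ in range(epochs):
        sse = 0.0
        for x_p, d_p in samples:
            acts = [math.exp(-sum((a - b) ** 2 for a, b in zip(x_p, mu)) / sigma ** 2)
                    for mu in mus]
            err = d_p - sum(w * a for w, a in zip(ws, acts))
            sse += err * err
            for i in range(N):
                ws[i] += eta_w * err * acts[i]
                for j in range(dim):
                    mus[i][j] += eta_mu * ws[i] * err * acts[i] * (x_p[j] - mus[i][j])
        errors.append(sse)
    return ws, mus, errors

# Toy 1-D problem: fit three points; the squared error shrinks during training.
samples = [((-1.0,), 1.0), ((0.0,), 0.0), ((1.0,), 1.0)]
ws, mus, errors = train_gaussian_rbf(samples, N=3)
```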
It turns out that, particularly for Gaussian RBFs, it is more efficient and typically leads to better results to use partially offline training:

First, we use any clustering procedure (e.g., k-means) to estimate cluster centers, which are then used to set the values of the reference vectors μ_i and their spreads (standard deviations) σ_i.

Then we use the gradient descent method described above to determine the weights w_i.
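The hybrid scheme can be sketched as: cluster the inputs with k-means to place the reference vectors, then learn only the output weights by the gradient rule. Everything here (data, N, σ = 1, learning rate, iteration counts) is an illustrative assumption:

```python
import math, random

def kmeans(points, N, iters=50):
    """Plain k-means: returns N cluster centers for the input points."""
    random.seed(1)
    centers = random.sample(points, N)
    for _ in range(iters):
        clusters = [[] for _ in range(N)]
        for p in points:
            idx = min(range(N), key=lambda k: math.dist(p, centers[k]))
            clusters[idx].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster went empty
                centers[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centers

def train_weights(samples, mus, eta=0.1, epochs=200):
    """Centers mu_i are fixed by k-means; only the output weights are
    learned, via delta w_i = eta * (d_p - o_p) * exp(-||x_p - mu_i||^2)."""
    ws = [0.0] * len(mus)
    for _ in range(epochs):
        for x_p, d_p in samples:
            acts = [math.exp(-math.dist(x_p, mu) ** 2) for mu in mus]
            err = d_p - sum(w * a for w, a in zip(ws, acts))
            for i, a in enumerate(acts):
                ws[i] += eta * err * a
    return ws

# Two well-separated clusters with desired outputs 0 and 1:
samples = [((0.0, 0.0), 0.0), ((0.2, 0.0), 0.0),
           ((3.0, 3.0), 1.0), ((3.0, 3.2), 1.0)]
mus = kmeans([x for x, _ in samples], N=2)
ws = train_weights(samples, mus)
predict = lambda x: sum(w * math.exp(-math.dist(x, mu) ** 2) for w, mu in zip(ws, mus))
```

After training, queries near each cluster center reproduce that cluster's desired output, which is exactly the "reference vector in the center of a cluster with similar desired output" picture from the text.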