Optimizing number of hidden neurons in neural networks

Janusz A. Starzyk
School of Electrical Engineering and Computer Science
Ohio University, Athens, Ohio, U.S.A.

IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, February 2007
Outline

- Neural networks – multi-layer perceptron
- Overfitting problem
- Signal-to-noise ratio figure (SNRF)
- Optimization using the signal-to-noise ratio figure
- Experimental results
- Conclusions
Neural networks – multi-layer perceptron (MLP)

Hidden layer: y_1 = W_1 x,  z_1 = f(y_1)
Output layer: y_2 = W_2 z_1,  z_2 = f(y_2)

Inputs x → Outputs z
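As a concrete illustration of the two-layer mapping above, here is a minimal forward-pass sketch in plain Python. The tanh activation and the example weight values are illustrative assumptions; the slides do not specify f or any weights.

```python
import math

def mlp_forward(x, W1, W2, f=math.tanh):
    """Two-layer MLP: y1 = W1 x, z1 = f(y1); y2 = W2 z1, z2 = f(y2)."""
    y1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]   # hidden pre-activations
    z1 = [f(v) for v in y1]                                     # hidden outputs
    y2 = [sum(w * zi for w, zi in zip(row, z1)) for row in W2]  # output pre-activations
    return [f(v) for v in y2]                                   # network outputs z

# Hypothetical example: 2 inputs, 3 hidden neurons, 1 output
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
W2 = [[0.6, -0.1, 0.2]]
z = mlp_forward([1.0, 2.0], W1, W2)
```

The number of rows in W1 is the number of hidden neurons, which is the quantity this talk optimizes.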
Neural networks – multi-layer perceptron (MLP)

- Efficient mapping from inputs to outputs
- Powerful universal function approximator
- The numbers of inputs and outputs are determined by the data
- The number of hidden neurons determines the fitting accuracy and is therefore critical

[Figure: training data and the MLP's function approximation]
Overfitting problem

- Generalization: a model trained on data (x, y) should predict correct outputs y' for new data x'
- Overfitting: the model overestimates the function complexity, which degrades generalization capability
- Bias/variance dilemma: excessive hidden neurons lead to overfitting

[Figure: training data, the desired function, and an overfitted function; the overfitted function deviates from the desired values on new testing data]
Overfitting problem

Avoiding overfitting: cross-validation and early stopping
- Split all available training data (x, y) into a training set (x, y) and a testing set (x', y')
- Train the MLP on the training set (training error e_train) and evaluate it on the testing set (testing error e_test)
- Stopping criterion: e_test starts to increase, or e_train and e_test start to diverge

[Figure: fitting error vs. number of hidden neurons; e_train keeps decreasing while e_test reaches its minimum at the optimum number of hidden neurons]
Overfitting problem

Open questions for cross-validation and early stopping:
- How to divide the available data? Data held out for testing is wasted for training
- When to stop?
- Can the testing error capture the generalization error?

[Figure: fitting error (e_train, e_test) vs. number of hidden neurons, with the optimum number marked]
Overfitting problem

Desired:
• A quantitative measure of the useful information left unlearned in e_train
• Automatic recognition of overfitting

[Figures: the same training data fitted by an overfitted function, a cubic function, and a well-fitting function, each evaluated against desired and predicted values on a testing set]
Signal-to-noise ratio figure (SNRF)

- Sampled data: function value + noise
- Error signal: approximation error component + noise component
  - the noise part should not be learned
  - the useful signal should be reduced
- Assumptions: continuous function, white Gaussian noise (WGN)
- Signal-to-noise ratio figure (SNRF): signal energy / noise energy
- Compare SNRF_e with SNRF_WGN. Should learning stop? Learning should continue if useful signal is left unlearned, and stop if noise dominates the error signal.
Signal-to-noise ratio figure (SNRF) – one-dimensional case

[Figures: training data with a quadratic fitting function, and the resulting error signal]

e_i = s_i + n_i,  i = 1, 2, ..., N

The error signal e_i is the sum of an approximation error component s_i and a noise component n_i. How can the levels of these two components be measured?
Signal-to-noise ratio figure (SNRF) – one-dimensional case

e_i = s_i + n_i
E_e = E_s + E_n

Total error energy:

E_e = C(e_i, e_i) = Σ_{i=1}^{N} e_i^2

Neighboring samples of the useful signal are highly correlated, while WGN samples are uncorrelated:

E_s ≈ C(e_i, e_{i+1})
C(n_i, n_{i+1}) ≈ 0

Hence the noise energy is E_n = E_e − E_s.
Signal-to-noise ratio figure (SNRF) – one-dimensional case

SNRF_e = E_s / E_n = C(e_i, e_{i+1}) / ( C(e_i, e_i) − C(e_i, e_{i+1}) )

For white Gaussian noise:

SNRF_WGN = C(n_i, n_{i+1}) / ( C(n_i, n_i) − C(n_i, n_{i+1}) )

μ_SNRF_WGN(N) = 0
σ_SNRF_WGN(N) = 1/√N
Signal-to-noise ratio figure (SNRF) – one-dimensional case

[Figure: histogram of the SNRF for WGN with 2^16 samples; mean 0, standard deviation 0.0039 ≈ 1/√(2^16)]

Hypothesis test at the 5% significance level:

th_SNRF_WGN(N) = μ_SNRF_WGN(N) + 1.7 σ_SNRF_WGN(N)
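With μ = 0 and σ = 1/√N, this hypothesis test reduces to a one-line threshold; a minimal sketch:

```python
import math

def snrf_wgn_threshold(n):
    """One-dimensional stopping threshold: mu + 1.7*sigma = 1.7/sqrt(N)
    for an error signal of N samples (5% significance level)."""
    return 1.7 / math.sqrt(n)

# The histogram on this slide used 2**16 WGN samples,
# for which sigma = 1/sqrt(2**16) = 1/256, about 0.0039.
```

If SNRF_e of the training error falls below this value, the remaining error is statistically indistinguishable from WGN.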
Signal-to-noise ratio figure (SNRF) – multi-dimensional case

Signal and noise levels are estimated within a neighborhood of each sample. For sample p and its M nearest neighbors:

E_sp = Σ_{i=1}^{M} w_pi e_p e_pi,  p = 1, 2, ..., N

with inverse-distance weights

w_pi = d_pi^{-1} / Σ_{i=1}^{M} d_pi^{-1},  i = 1, 2, ..., M

where d_pi is the distance from sample p to its i-th nearest neighbor and e_pi is the error at that neighbor.
Signal-to-noise ratio figure (SNRF) – multi-dimensional case

Summing over all samples:

E_s = Σ_{p=1}^{N} E_sp = Σ_{p=1}^{N} Σ_{i=1}^{M} w_pi e_p e_pi

E_n = E_e − E_s = Σ_{p=1}^{N} e_p^2 − Σ_{p=1}^{N} Σ_{i=1}^{M} w_pi e_p e_pi

SNRF_e = E_s / E_n = ( Σ_{p=1}^{N} Σ_{i=1}^{M} w_pi e_p e_pi ) / ( Σ_{p=1}^{N} e_p^2 − Σ_{p=1}^{N} Σ_{i=1}^{M} w_pi e_p e_pi )
Signal-to-noise ratio figure (SNRF) – multi-dimensional case

When the error signal is pure WGN (e_i = n_i):

SNRF_WGN = ( Σ_{p=1}^{N} Σ_{i=1}^{M} w_pi n_p n_pi ) / ( Σ_{p=1}^{N} n_p^2 − Σ_{p=1}^{N} Σ_{i=1}^{M} w_pi n_p n_pi )

μ_SNRF_WGN(N) = 0
σ_SNRF_WGN(N) = √2/√N

th_SNRF_WGN(N) = μ_SNRF_WGN(N) + 1.2 σ_SNRF_WGN(N)

With M = 1, the multi-dimensional threshold ≈ the one-dimensional threshold (1.2 · √2 ≈ 1.7).
Optimization using SNRF

When noise dominates the error signal, little useful information is left unlearned, and learning should stop.

Procedure:
1. Start with a small network
2. Train the MLP to obtain the training error e_train
3. Compare SNRF_e with the threshold th_SNRF_WGN
4. If SNRF_e ≥ th_SNRF_WGN, add hidden neurons and repeat from step 2

Stopping criterion: SNRF_e < th_SNRF_WGN
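The growth loop above can be sketched end to end. As a stand-in for retraining an MLP with more hidden neurons (a full trainer is beyond a sketch), model complexity here is the number of trigonometric harmonics fitted by least squares to the noisy 0.4 sin(x) + 0.5 target used later in the experiments; the harmonic model, grid size, and noise level are illustrative assumptions:

```python
import math
import random

def snrf_1d(e):
    """One-dimensional SNRF of the error signal."""
    c_self = sum(x * x for x in e)
    c_next = sum(a * b for a, b in zip(e, e[1:]))
    return c_next / (c_self - c_next)

def fit_harmonics(x, y, k):
    """Least-squares fit of mean + first k sine/cosine harmonics
    on a uniform grid over one period; returns fitted values."""
    n = len(x)
    model = [sum(y) / n] * n
    for m in range(1, k + 1):
        a = 2.0 / n * sum(yi * math.sin(m * xi) for xi, yi in zip(x, y))
        b = 2.0 / n * sum(yi * math.cos(m * xi) for xi, yi in zip(x, y))
        model = [mi + a * math.sin(m * xi) + b * math.cos(m * xi)
                 for mi, xi in zip(model, x)]
    return model

random.seed(0)
N = 512
x = [2 * math.pi * i / N for i in range(N)]
y = [0.4 * math.sin(xi) + 0.5 + random.gauss(0.0, 0.05) for xi in x]

threshold = 1.7 / math.sqrt(N)        # th_SNRF_WGN for the 1-D case
stop_k = None
for k in range(1, 11):                # grow model complexity step by step
    fit = fit_harmonics(x, y, k)
    e_train = [yi - fi for yi, fi in zip(y, fit)]
    if snrf_1d(e_train) < threshold:  # noise dominates: stop growing
        stop_k = k
        break
```

In the slide's procedure, the inner step would instead retrain the MLP with one more hidden neuron and recompute SNRF_e of its training error.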
Optimization using SNRF

The same criterion optimizes the number of iterations in back-propagation training, avoiding overfitting through overtraining:
1. Set the structure of the MLP
2. Train the MLP with back-propagation iterations to obtain e_train
3. Compare SNRF_e with th_SNRF_WGN
4. If SNRF_e ≥ th_SNRF_WGN, keep training with more iterations
Experimental results

Optimizing the number of iterations: approximating a noise-corrupted 0.4 sin(x) + 0.5

[Figures: testing performance after 10 iterations and after 200 iterations, each showing testing data and approximated values]
Optimization using SNRF

Optimizing the order of a fitting polynomial

[Figures: training data, testing data, and the desired function; training error, testing error, generalization error, SNRF, and the stopping threshold vs. the order of the fitting polynomial, with a zoomed view of the errors near the optimum order]
Experimental results

Optimizing the number of hidden neurons: two-dimensional function

[Figures: training data sampled from the two-dimensional function; SNRF of the error signal vs. number of hidden neurons, with the stopping threshold; training and testing MSE vs. number of hidden neurons]
Experimental results

[Figures: difference between the desired function and the approximating function using 25 neurons, and using 35 neurons]
Experimental results

Mackey-Glass database: the MLP predicts the following sample from every 7 consecutive samples

[Figures: SNRF of the error signal vs. number of hidden neurons, with the stopping threshold; training and testing MSE vs. number of hidden neurons]
Experimental results

The error signal shows a WGN characteristic.

[Figures: error signal obtained in OAA, and its autocorrelation]
Experimental results

Puma robot arm dynamics database: 8 inputs (positions, velocities, torques), predicting angular acceleration with an MLP

[Figures: SNRF of the error signal vs. number of hidden neurons, with the stopping threshold; training and testing MSE vs. number of hidden neurons, with a 6th-degree polynomial fit of the testing performance]
Conclusions

- A quantitative criterion based on the SNRF to optimize the number of hidden neurons in an MLP
- Overfitting is detected from the training error only; no separate test set is required
- The criterion is simple, easy to apply, efficient, and effective
- It extends to optimizing other parameters of neural networks and other fitting problems