Chapter 7: Introduction to Back Propagation Neural Networks (BPNN)
KH Wong
Introduction
• Very popular
• A high-performance, multi-class classifier
• Successful in handwritten optical character recognition (OCR), speech recognition, image noise removal, etc.
• Easy to implement
– Slow in learning
– Fast in classification
http://www.ninds.nih.gov/disorders/brain_basics/ninds_neuron.htm
http://yann.lecun.com/exdb/mnist/
Overview
• Back Propagation Neural Networks (BPNN)
– Part 1: Feed-forward processing (classification, also called recognition)
– Part 2: Feed-backward processing (training the network), which also includes forward processing
• Appendix: a MATLAB example is explained
• Source: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
Theory of Back Propagation Neural Net (BPNN)
• Use many samples to train the weights (W) and biases (b), so the network can classify an unknown input into different classes
• Will explain
– How to use the network after training: the forward pass (classify, or recognize, an input)
– How to train the network: how to train the weights and biases (using forward and backward passes)
Motivation
• Biological findings inspire the development of neural nets: inputs are weighted, combined by a logic function, and produce an output
• Biological relation: the inputs correspond to the dendrites; the neuron body applies the logic function and produces the output
[Figure: X = inputs, W = weights, neuron (logic function), output]
Optical character recognition (OCR) example
• Training: train the system first by presenting many samples to the network
• Recognition: when an image is input to the system, the network tells what character it is
[Figure: the neural net maps the input image to outputs; for the digit shown, Output3 = '1' and all other outputs = '0'. Training determines the weights (W) and biases (b).]
Part 1: Classification in action (also called the recognition process)
Forward pass of the Back Propagation Neural Net (BPNN)
• Assume the weights (W) and biases (b) have already been found by training (to be discussed in Part 2)
Recognition: assume the weights (W) and biases (b) were found earlier
[Figure: each pixel of the input image is X(u,v); for the example digit, Output0 = 0, Output1 = 0, Output2 = 0, Output3 = 1]
Neurons in BPNN
• Inside each neuron:

For layer $l$ there is a set of inputs $X^l=[x^l(1),x^l(2),x^l(3),\dots,x^l(K)]$ and, for each neuron, a set of weights $W^l=[\omega^l(1),\omega^l(2),\omega^l(3),\dots,\omega^l(K)]$ and a bias $b^l$, such that

$$Y^l=f(u^l),\quad\text{with } u^l=\sum_{k=1}^{K}\omega^l(k)\,x^l(k)+b^l$$

Typically $f$ is a logistic (sigmoid) function, i.e.

$$f(u)=\frac{1}{1+e^{-u}},\quad\text{therefore}\quad y^l=f(u^l)=\frac{1}{1+e^{-\left(\sum_{k=1}^{K}\omega^l(k)x^l(k)+b^l\right)}}$$

[Figure: inputs $x^l(1),x^l(2),\dots,x^l(N)$ with weights $\omega^l(1),\omega^l(2),\dots,\omega^l(N)$ feeding one output neuron]
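As a quick check of the formulas above, here is a minimal MATLAB sketch of one neuron; the values of x, w, and b are illustrative, not from the tutorial code:

x = [0.5; 0.9; 0.1];   % K=3 inputs (illustrative values)
w = [0.2; -0.4; 0.7];  % one weight per input
b = 0.1;               % bias
u = w'*x + b;          % u = sum_k w(k)*x(k) + b
y = 1/(1+exp(-u))      % y = f(u), logistic (sigmoid) activation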
Multi-layer structure of a BP neural network

A layer has multiple neurons, and each neuron has its own $f(\cdot)$, weights, and bias, such that for each neuron

$$y^l=f(u^l)$$

where $X^l$ = set of inputs, $Y^l$ = set of outputs, $W^l$ = set of weights, and $b^l$ = set of biases of layer $l$.

[Figure: input layer; hidden layer $l$ with input $x^l$; hidden layer $l+1$ with input $x^{l+1}$; output layer]
Neurons in the multi-layer structure
• Between any two neighboring layers, a set of neurons can be found.

With inputs $x^l$ at layer $l$ and weights $W^l$ for each neuron,

$$y^l=f(u^l),\quad\text{with } u^l=\sum_{k=1}^{K}\omega^l(k)\,x^l(k)+b^l$$

and the output of layer $l$ is the input of layer $l+1$: $y^l=x^{l+1}$.

[Figure: each neuron takes inputs $x^l(1),\dots,x^l(K)$ with weights $\omega^l(1),\dots,\omega^l(K)$, forms $u^l$, and outputs $f(u^l)$, which becomes $x^{l+1}$ at layer $l+1$]
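A hedged MATLAB sketch of this layer-to-layer rule; the sizes and values are illustrative, and the sigmoid is written inline so the snippet is self-contained:

x1 = rand(9,1);        % inputs to layer l (e.g., 9 pixels)
W1 = rand(5,9)-0.5;    % 5 neurons, each with 9 weights (illustrative)
b1 = rand(5,1)-0.5;    % one bias per neuron
u1 = W1*x1 + b1;       % u^l for all 5 neurons at once
x2 = 1./(1+exp(-u1))   % y^l = f(u^l) becomes the input x^(l+1)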
BPNN forward pass
• The forward pass finds the output when an input is given. For example:
• Assume we have used N=60,000 images to train a network to recognize c=10 numerals.
• When an unknown image is given to the input, the output neuron corresponding to the correct answer will give the highest output level.
[Figure: input image feeding a network with 10 output neurons for the digits 0, 1, 2, ..., 9]
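A minimal sketch of how the answer is read off the output layer; the output values are illustrative:

A2 = [0.02 0.01 0.05 0.91 0.03 0.02 0.01 0.04 0.02 0.01]'; % 10 outputs for digits 0..9
[maxval, idx] = max(A2);  % the output neuron with the highest level wins
digit = idx - 1           % MATLAB indices start at 1, so neuron idx encodes digit idx-1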
Architecture (exercise: write the formulas for A1(i=4) and A2(k=3); how many inputs, hidden neurons, outputs, and weights are in each layer?)

• Input: P = 9x1, indexed by j
• A1: hidden layer 1 = 5 neurons, indexed by i; $W^{l=1}$ is 9x5, $b^{l=1}$ is 5x1 (layer l=1; S1 is generated here)
• A2: layer 2 = 3 output neurons, indexed by k; $W^{l=2}$ is 5x3, $b^{l=2}$ is 3x1 (layer l=2; S2 is generated here)

For hidden neuron i (inputs P(j=1), ..., P(j=9); bias $b_1(i)$):

$$A1(i)=\frac{1}{1+e^{-\left(\omega^{l=1}(j=1,i)P(j=1)+\omega^{l=1}(j=2,i)P(j=2)+\dots+\omega^{l=1}(j=9,i)P(j=9)+b_1(i)\right)}}$$

For output neuron k (inputs A1(i=1), ..., A1(i=5); bias $b_2(k)$):

$$A2(k)=\frac{1}{1+e^{-\left(\omega^{l=2}(i=1,k)A1(i=1)+\omega^{l=2}(i=2,k)A1(i=2)+\dots+\omega^{l=2}(i=5,k)A1(i=5)+b_2(k)\right)}}$$
Answer (exercise: write the values for A1(i=4) and A2(k=3))
• P = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]
• $W^{l=1}$ (the weights into neuron i=4) = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]
• $b^{l=1}(i=4)$ = 0.1441
• %Find A1(i=4)
• A1(i=4) = 1/(1+exp(-($\omega^{l=1}$*P + $b^{l=1}$))) = 0.49, i.e.

$$A1(i=4)=\frac{1}{1+e^{-\left(\omega^{l=1}(j=1,i=4)P(j=1)+\omega^{l=1}(j=2,i=4)P(j=2)+\dots+b_1(i=4)\right)}}$$

• How many inputs, hidden neurons, outputs, weights, and biases are in each layer?
• Answer: inputs = 9, hidden neurons = 5, outputs = 3; weights in the hidden layer (layer 1) = 9x5, weights in the output layer (layer 2) = 5x3; 5 biases in the hidden layer (layer 1), 3 biases in the output layer (layer 2)
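A hedged MATLAB check of this answer, assuming (as the slide suggests) that the nine values shown are the weights into neuron i=4 and that 0.1441 is $b^{l=1}(i=4)$; the slide reports approximately 0.49, and the exact value depends on the actual trained weight matrix:

P  = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]';
w4 = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127];
b4 = 0.1441;
A1_i4 = 1/(1+exp(-(w4*P + b4)))  % compare with the value reported on the slide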
Numerical example: architecture of the example

• Input layer: 9x1 pixels
• Hidden layer: weights $W^l$ = 5 neurons x 9 inputs for each neuron; biases $b^l$ = 5 neurons x 1 (1 bias for each neuron); each neuron has weights $W^l$, a bias $b^l$, and an activation $f(\cdot)$
• Output layer: 3x1
Part 2: Feed-backward processing (training the network)
Backward pass of the Back Propagation Neural Net (BPNN) (training)
Ref: http://en.wikipedia.org/wiki/Backpropagation
Feed backward stage

[Figure: at layer $l$, Part 1, feed-forward (studied before), computes $x^{l+1}=f\!\left(W^l x^l+b^l\right)$ from $x^l$; Part 2, feed-backward, passes the error from layer $l+1$ back to layer $l$]

We will explain why and prove the equations in the following slides.
For training we need to find $\partial E/\partial w$. Why?
The criteria to train a network
• Based on the overall error function, with N samples and c classes to be learned:

$$\text{Overall error } E=\frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{c}\left(t_k^n-y_k^n\right)^2$$

Error of the n-th training sample:

$$E_n=\frac{1}{2}\left\|t^n-y^n\right\|^2=\frac{1}{2}\sum_{k=1}^{c}\left(t_k^n-y_k^n\right)^2$$

where $t^n$ = the given true class of the n-th training sample (the teacher), and $y^n$ = the output class of the n-th training sample at the output of the feed-forward network.
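A minimal MATLAB sketch of this error measure, written as in the appendix program (which uses a mean instead of a sum, only rescaling E); T and A2 here are small illustrative c x N matrices:

T  = [1 0 0; 0 1 0; 0 0 1];                   % illustrative targets, c=3, N=3
A2 = [0.9 0.1 0.2; 0.1 0.8 0.1; 0.1 0.2 0.7]; % illustrative network outputs
e  = T - A2;                                  % t_k^n - y_k^n for all k, n
E  = 0.5*mean(mean(e.*e))                     % cf. E = (1/2)*sum of squared errors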
Theory
• For a neuron j, by definition the output is

$$y_j=f(u_j),\quad u_j=\sum_{k=1}^{n}x_k w_{kj}+b_j,\quad\text{so } y_j=f\!\left(\sum_{k=1}^{n}x_k w_{kj}+b_j\right)$$

with k = 1, 2, ..., n indexing the inputs to neuron j; the output of neuron j is $y_j$.
• $E=\frac{1}{2}\sum_j\left(t_j-y_j\right)^2$ is the overall squared error at the output, where t = target (teacher) and y = actual output.
• We want to find $\partial E/\partial w_{ij}$; by the chain rule,

$$\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial u_j}\,\frac{\partial u_j}{\partial w_{ij}}\quad\text{---(1)}$$
Learning by gradient descent
• In each learning cycle (epoch), a new w is calculated using

$$w_{new}=w_{old}+\Delta w,\quad \Delta w=-\eta\frac{\partial E}{\partial w},\quad\text{i.e. } w_{new}=w_{old}-\eta\frac{\partial E}{\partial w}$$

• If we want E to decrease in every learning cycle (learning by gradient descent), make $\Delta w=-\eta\,\partial E/\partial w$; to do it slowly, use a small positive learning factor ($\eta\approx 0.1$). (The gradient descent method is explained in the next slide.)
• That's why we need $\partial E/\partial w$.
• For the same argument, $b_{new}=b_{old}-\eta\,\partial E/\partial b$.
We need to find $\partial E/\partial w$. Why?
• Taylor series, by definition:

$$E(w+\Delta w)=E(w)+\frac{\partial E}{\partial w}\Delta w+\dots\quad\text{---(*)}$$

Here $E(w_{new})=E(w_{old}+\Delta w)$, and we set

$$\Delta w=-\eta\frac{\partial E}{\partial w}\quad\text{---(**)}$$

where $\eta$ is a small positive term that sets the learning rate. Putting (**) into (*) gives

$$E(w_{new})-E(w_{old})=-\eta\left(\frac{\partial E}{\partial w}\right)^{2}$$

Since $\eta\left(\partial E/\partial w\right)^{2}$ is always positive, $E(w_{new})<E(w_{old})$.
Conclusion: setting $\Delta w=-\eta\,\partial E/\partial w$ will decrease E.

Using Taylor series:
http://www.fepress.org/files/math_primer_fe_taylor.pdf
http://en.wikipedia.org/wiki/Taylor's_theorem
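A tiny numerical illustration of this update rule; the toy error function E(w) = (w-3)^2 and all values are illustrative:

eta = 0.1; w = 0;      % small positive learning factor, initial weight
for epoch = 1:50
    dEdw = 2*(w-3);    % dE/dw for the toy E(w) = (w-3)^2
    w = w - eta*dEdw;  % w_new = w_old - eta*dE/dw; E decreases each step
end
w                      % w has moved close to the minimizer w = 3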
Theory (continued)
• An input $x_i$ is connected to neuron j through $w_{ij}$. We want to see how $w_{ij}$ affects E, so from (1), by the chain rule,

$$\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial u_j}\,\frac{\partial u_j}{\partial w_{ij}}=\text{term1}\cdot\text{term2}\cdot\text{term3}$$

with, as before,

$$y_j=f(u_j),\quad u_j=\sum_{k=1}^{n}x_k w_{kj}+b_j$$

[Figure: input $x_i$ connected through weight $w_{ij}$ to neuron j, which has internal value $u_j$ and output $y_j$]
Case 1: if neuron j is at the output layer

$$\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial u_j}\,\frac{\partial u_j}{\partial w_{ij}}=\text{term1}\cdot\text{term2}\cdot\text{term3}$$

term1: since $E=\frac{1}{2}\sum_j\left(t_j-y_j\right)^2$ is measured at the output,
$$\text{term1}=\frac{\partial E}{\partial y_j}=2\cdot\frac{1}{2}\left(t_j-y_j\right)\cdot(-1)=-(t_j-y_j)$$

term2: (see the appendix)
$$\text{term2}=\frac{\partial y_j}{\partial u_j}=\frac{\partial f(u_j)}{\partial u_j}=f'(u_j)=f(u_j)\left(1-f(u_j)\right)$$

term3: since $b_j$ is a constant,
$$\text{term3}=\frac{\partial u_j}{\partial w_{ij}}=\frac{\partial}{\partial w_{ij}}\left(\sum_i x_i w_{ij}+b_j\right)=x_i$$

Hence
$$\frac{\partial E}{\partial w_{ij}}=-(t_j-y_j)\,f(u_j)\left(1-f(u_j)\right)x_i=\delta_j\,x_i$$

where the sensitivity is
$$\delta_j=-(t_j-y_j)\,f(u_j)\left(1-f(u_j)\right)\quad\text{---(2)}$$

[Figure: neuron j as an output neuron, with input $x_i$, weight $w_{ij}$, internal value $u_j$, output $y_j$, and true target class $t_j$]
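Mapping equation (2) to code, a minimal sketch of one output-layer weight update; the scalar values are illustrative, and note that for the sigmoid $f(u_j)(1-f(u_j))=y_j(1-y_j)$:

y = 0.8; t = 1.0;            % actual output and target (illustrative)
x_i = 0.6; w_ij = 0.3;       % one input and its weight (illustrative)
eta = 0.1;                   % learning factor
delta = -(t - y)*y*(1 - y);  % sensitivity, eq. (2)
w_ij = w_ij - eta*delta*x_i  % w_new = w_old - eta*dE/dw = w_old - eta*delta*x_i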
Case 2: if neuron j is at a hidden layer, its output $y_j$ affects all L neurons connected to it in the next layer.

$$\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial u_j}\,\frac{\partial u_j}{\partial w_{ij}}=\text{term1}\cdot\text{term2}\cdot\text{term3}$$

term1: because $y_j$ affects all L neurons in the next (output) layer,
$$\text{term1}=\frac{\partial E}{\partial y_j}=\sum_{l=1}^{L}\frac{\partial E}{\partial u_l}\,\frac{\partial u_l}{\partial y_j}\quad\text{(term1A}\cdot\text{term1B, summed over } l\text{)}$$

term1A: $\partial E/\partial u_l=\delta_l$ (see eq. (2)).
term1B: since $u_l=\sum_j w_{j,l}\,y_j+b_l$, we get $\partial u_l/\partial y_j=w_{j,l}$.

terms 2 and 3 are the same as in the previous slide, hence

$$\frac{\partial E}{\partial w_{ij}}=\left(\sum_{l=1}^{L}\delta_l\,w_{j,l}\right)f(u_j)\left(1-f(u_j)\right)x_i\quad\text{---(3)}$$

[Figure: hidden neuron j with output $y_j$ feeding output neurons l = 1, 2, ..., L (values $u_l$, outputs $y_l$) through weights $w_{j,l}$. In the program: $w_{ij}$ is W1, $w_{j,l}$ is W2, the input $x_i$ to the hidden neuron j is P(:,i), and $f'(u_j)$ for this hidden neuron is df1.]
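A minimal vectorized sketch of the hidden-layer sensitivity in eq. (3), written to match s1 = diag(df1)*W2'*s2 in the appendix program; the sizes and values are illustrative:

s2 = [0.05; -0.02; 0.01];  % output-layer sensitivities (illustrative)
W2 = rand(3,5)-0.5;        % weights from the 5 hidden to the 3 output neurons
A1 = rand(5,1);            % hidden-layer outputs y_j (illustrative)
df1 = A1.*(1-A1);          % f'(u_j) = f(u_j)(1-f(u_j)) for each hidden neuron
s1 = (W2'*s2).*df1         % (sum_l delta_l*w_jl)*f'(u_j), a 5x1 vector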
1, ljw
After all are found
•
Neural Networks NN ver. 4h 25
w
Eww
w
Ew
www
E
w
oldnew
oldnew
methoddecent graident
theusing minimized is so
all update tostep thisuse can We
ijw
E
Training• How to train the neurons: how to train the weights (W) and biases
(b) (use forward, backward passes)• Initialize W and b randomly• Iter=1: all_epochs (or break when E is very small)
– For all training samples {• Forward pass (same as the recognition process in part1) for
each output neuron:– Use training samples: Xclass_t : feed forward to find y.
– E=error_function(y-t)• Backward pass:
– From the output layer find and b to reduce Error E. Find all s(of the output)
– Calculate and b of all hidden layers,}
Neural Networks NN ver. 4h26
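A hedged skeleton of the training procedure above; epochs and Q are placeholders, and the runnable version with all details is in the appendix code:

epochs = 10; Q = 9;  % placeholder sizes for the sketch
for itr = 1:epochs
    for i = 1:Q      % each training sample
        % forward pass: compute A1, A2 from P(:,i), W1, b1, W2, b2
        % backward pass: s2 at the output layer (eq. 2), then s1 (eq. 3)
        % updates: W2 = W2 - 0.1*s2*A1', b2 = b2 - 0.1*s2; similarly W1, b1
    end
    % recompute the overall error E and break early when it is small enough
end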
Summary
• Learn what Back Propagation Neural Networks (BPNN) are
• Learn the forward pass
• Learn the backward pass and the training of the BPNN network
References
• Wiki
– http://en.wikipedia.org/wiki/Backpropagation
– http://en.wikipedia.org/wiki/Convolutional_neural_network
• MATLAB programs
– Neural Network for pattern recognition tutorial: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
– CNN MATLAB example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
Appendices
Appendix 1: Sigmoid function f(u) and its derivative f'(u)
$$f(u)=\frac{1}{1+e^{-\beta u}};\ \text{for simplicity, set }\beta=1,\ \text{so } f(u)=\frac{1}{1+e^{-u}}$$

$$f'(u)=\frac{df(u)}{du}=\frac{d}{du}\left(\frac{1}{1+e^{-u}}\right)=\frac{e^{-u}}{\left(1+e^{-u}\right)^{2}}\quad\text{(using the chain rule)}$$

$$=\frac{1}{1+e^{-u}}\cdot\frac{e^{-u}}{1+e^{-u}}=\frac{1}{1+e^{-u}}\cdot\frac{\left(1+e^{-u}\right)-1}{1+e^{-u}}=f(u)\left(1-f(u)\right)$$

Hence

$$\frac{df(u)}{du}=f'(u)=f(u)\left(1-f(u)\right)$$
http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1
http://mathworld.wolfram.com/SigmoidFunction.html
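A quick numerical sanity check of this result in MATLAB; the helper f is defined inline and the test point is arbitrary:

f = @(u) 1./(1+exp(-u));           % the sigmoid, with beta = 1
u = 0.7; h = 1e-6;
numeric  = (f(u+h)-f(u-h))/(2*h);  % central-difference estimate of f'(u)
analytic = f(u).*(1-f(u));         % the closed form derived above
[numeric analytic]                 % the two values agree closely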
Alternative derivation (for the output layer, in each neuron)

• Since the error of the n-th sample is
$$E=\frac{1}{2}\left\|t_n-y_n\right\|^{2}=\frac{1}{2}\left(t_n-f(u_n)\right)^{2}$$
because $y_n=f(u_n)$ is the current output and $t_n$ is the truth, or target (teacher):

(i) $\dfrac{\partial E}{\partial y_n}=-(t_n-y_n)$

(ii) since $y_n=f(u_n)$ and f is the sigmoid, $\dfrac{\partial y_n}{\partial u_n}=f'(u_n)=f(u_n)\left(1-f(u_n)\right)$

(iii) since $u_n=\sum_i\omega_i x_i+b$, $\dfrac{\partial u_n}{\partial b}=1$

• From (i), (ii), and (iii), at the output layer (the last layer),
$$\frac{\partial E}{\partial b}=\frac{\partial E}{\partial y_n}\,\frac{\partial y_n}{\partial u_n}\,\frac{\partial u_n}{\partial b}=-(t_n-y_n)\,f'(u_n)=\delta\ \text{(the sensitivity)}\quad\text{---(iv)}$$

[Figure: the output (last) layer, with t = target (teacher) and y = output, feeding δ back to the previous layer]
Derivation (continued)

• Also from (iii), since $u_n=\sum_i\omega_i x_i+b$, we have $\dfrac{\partial u_n}{\partial\omega}=x$, so for each input x and weight ω,
$$\frac{\partial E}{\partial\omega}=-(t_n-y_n)\,f'(u_n)\,x\quad\text{---(v)}$$

• For each learning phase, a new ω is calculated:
$$\omega_{new}=\omega_{old}-\eta\frac{\partial E}{\partial\omega}\quad\text{---(vi)}$$

• If we want E to decrease in every learning cycle, make $\Delta\omega=-\eta\,\partial E/\partial\omega$; to do it slowly, use a small positive learning factor η. (This is the gradient descent method; see the earlier slide.)
• For the same argument, using (iv),
$$b_{new}=b_{old}-\eta\frac{\partial E}{\partial b}=b_{old}-\eta\,\delta$$
BPNN example in MATLAB
Based on: Neural Network for pattern recognition tutorial
http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
Example: a simple BPNN
• Number of classes (number of output neurons) = 3
• Input: 9 pixels; each input is a 3x3 image
• Training samples = 3 for each class
• Number of hidden layers = 1
• Number of neurons in the hidden layer = 5
Display of testing patterns
[Figure: the 3x3 testing patterns displayed by the program]
Architecture
(This slide repeats the architecture diagram and formulas shown earlier for the exercise: input P = 9x1 indexed by j; hidden layer A1 = 5 neurons indexed by i, with $W^{l=1}$ = 9x5 and $b^{l=1}$ = 5x1; output layer A2 = 3 neurons indexed by k, with $W^{l=2}$ = 5x3 and $b^{l=2}$ = 3x1.)
%source : http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
clear memory %comment added by kh wong
clear all
clc
nump=3; % number of classes
n=3;    % number of images per class
% training images reshaped into columns in P
% image size (3x3) reshaped to (1x9)

% training images
P=[196 35 234 232 59 244 243 57 226; ...
   188 15 236 244 44 228 251 48 230; ... % class 1
   246 48 222 225 40 226 208 35 234; ...
   255 223 224 255 0 255 249 255 235; ...
   234 255 205 251 0 251 238 253 240; ... % class 2
   232 255 231 247 38 246 190 236 250; ...
   25 53 224 255 15 25 249 55 235; ...
   24 25 205 251 10 25 238 53 240; ... % class 3
   22 35 231 247 38 24 190 36 250]';

% testing images
N=[208 16 235 255 44 229 236 34 247; ...
   245 21 213 254 55 252 215 51 249; ... % class 1
   248 22 225 252 30 240 242 27 244; ...
   255 241 208 255 28 255 194 234 188; ...
   237 243 237 237 19 251 227 225 237; ... % class 2
   224 251 215 245 31 222 233 255 254; ...
   25 21 208 255 28 25 194 34 188; ...
   27 23 237 237 19 21 227 25 237; ... % class 3
   24 49 215 245 31 22 233 55 254]';

% Normalization
P=P/256;
N=N/256;
% display the training images
figure(1),
for i=1:n*nump
    im=reshape(P(:,i), [3 3]);
    % remove the line below to reflect the true data input
    % im=imresize(im,20); % resize the image to make it clear
    subplot(nump,n,i),imshow(im);
    title(strcat('Train image/Class #', int2str(ceil(i/n))))
end
% display the testing images
figure,
for i=1:n*nump
    im=reshape(N(:,i), [3 3]);
    % remove the line below to reflect the true data input
    % im=imresize(im,20); % resize the image to make it clear
    subplot(nump,n,i),imshow(im);title(strcat('test image #', int2str(i)))
end
% targets
T=[ 1 1 1 0 0 0 0 0 0
    0 0 0 1 1 1 0 0 0
    0 0 0 0 0 0 1 1 1 ];

S1=5; % number of neurons in the hidden layer
S2=3; % number of output neurons (= number of classes)

[R,Q]=size(P);
epochs = 10000;   % number of iterations
goal_err = 10e-5; % goal error
a=0.3;            % define the range of the random initial values
b=-0.3;
W1=a + (b-a)*rand(S1,R);  % weights between input and hidden neurons
W2=a + (b-a)*rand(S2,S1); % weights between hidden and output neurons
b1=a + (b-a)*rand(S1,1);  % biases of the hidden neurons
b2=a + (b-a)*rand(S2,1);  % biases of the output neurons
n1=W1*P;
A1=logsig(n1); % feedforward the first time
n2=W2*A1;
A2=logsig(n2); % feedforward the first time
e=A2-T; % actually e=T-A2 in the main loop
error =0.5*mean(mean(e.*e)); % better to say e=T-A2, but no harm to error here
nntwarn off
for itr =1:epochs
    if error <= goal_err
        break
    else
        for i=1:Q % i is the index to a column in P (9x9); each column P(:,i)
            % is a training sample image: 9 training samples, 3 for each class
            % A1=5x9: outputs of the hidden layer and inputs to the output layer
            % A2=3x9: outputs of the output layer
            % T=true class; each column in T is for 1 training sample
            % hidden_layer=1, output_layer=2
            df1=dlogsig(n1,A1(:,i)); % df1 is 5x1 for the 5 neurons in the hidden layer
            df2=dlogsig(n2,A2(:,i)); % df2 is 3x1 for the output neurons
            % s2 is sigma2 = sensitivity2 from the output layer, equation (2)
            s2 = -1*diag(df2) * e(:,i); % e=T-A2; df2=f'=f(1-f) of layer 2

            %s1=5x1
            s1 = diag(df1)* W2'* s2; % eq (3), feedback from s2 to s1
            % dW = -eta*s2*x in the slides; eta=0.1; s2 is found; x is A1
            % W2 is 3x5: each output neuron receives 5 inputs
            % from the 5 hidden neurons in the hidden layer; update W2
            % sigma2 = s2 = -1*diag(df2)*e(:,i); e=T-A2; df2=f'=f(1-f) of layer 2
            % delta_W2 = -learning_rate*sigma2*input_to_output_layer
            % delta_W2 = -0.1*sigma2*A1
            W2 = W2-0.1*s2*A1(:,i)'; % learning rate=0.1, eq (2), output case
            % 3x5 = 3x5 - (3x1*1x5)
            % A1 = 5 hidden neuron outputs (5 hidden neurons)
            % A1(:,i)' = 1x5 = outputs of the hidden layer
            b2 = b2-0.1*s2; % bias (threshold) update
            % 3x1 = 3x1 - 3x1
            % P(:,i) = 9x1 = input to the hidden layer
            % s1 = 5x1 because each hidden node has 1 sensitivity (sigma)
            W1 = W1-0.1*s1*P(:,i)'; % update W1 in layer 1, see eq (3), hidden case
            % 5x9 = 5x9 - (5x1*1x9), since P is 9x9 and for an i, P(:,i)' = 1x9

            b1 = b1-0.1*s1; % bias (threshold) update
            % 5x1 = 5x1 - 5x1
            A1(:,i)=logsig(W1*P(:,i)+b1); % forward
            % 5x1 = 5x1
            A2(:,i)=logsig(W2*A1(:,i)+b2); % forward
            % 3x1 = 3x1
        end
        e = T - A2; % for this e, put a -ve sign when finding s2
        error =0.5*mean(mean(e.*e));
        disp(sprintf('Iteration :%5d mse :%12.6f',itr,error));
        mse(itr)=error;
    end
end
threshold=0.9; % threshold of the system (higher threshold = more accuracy)

% training images result
%TrnOutput=real(A2)
TrnOutput=real(A2>threshold)

% applying test images to NN, TESTING BEGINS HERE
n1=W1*N;
A1=logsig(n1);
n2=W2*A1;
A2test=logsig(n2);

% testing images result
%TstOutput=real(A2test)
TstOutput=real(A2test>threshold)

% recognition rate
wrong=size(find(TstOutput-T),1);
recognition_rate=100*(size(N,2)-wrong)/size(N,2)
% end of code