Artificial Neural Networks 2
Morten Nielsen
Department of Systems Biology, DTU
Outline
• Optimization procedures
– Gradient descent (this you already know)
• Network training
– back propagation
– cross-validation
– over-fitting
– examples
– deep learning
Neural network. Error estimate
[Figure: network with inputs I1, I2, weights w1, w2, and a single output neuron o with a linear activation function]
Neural networks
Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - γ∇F(a) for γ > 0 a small enough number, then F(b) < F(a).
Gradient descent (example)
Gradient descent. Example
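The example itself is shown graphically on the slides; as a minimal numeric illustration (my own, not necessarily the one used in the figures):

\[ F(x) = x^2,\quad a = 2,\quad \gamma = 0.1:\qquad b = a - \gamma F'(a) = 2 - 0.1\cdot 4 = 1.6,\qquad F(b) = 2.56 < F(a) = 4 \]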
Weights are changed in the opposite direction of the gradient of the error
[Figure: network with inputs I1, I2, weights w1, w2, and a single output neuron o with a linear activation function]
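Written out for this single-layer network, a minimal sketch (assuming the squared error E = ½(o - t)² and the linear output o = Σ_i w_i I_i, consistent with the rest of the slides):

\[ \Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i} = -\varepsilon\,(o - t)\,I_i \]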
Network architecture
[Figure: feed-forward network with input layer (Ik), hidden layer (hj/Hj) connected to the input by weights vjk, and output layer (o/O) connected to the hidden layer by weights wj]
What about the hidden layer?
Hidden to output layer
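The equations on this slide are not reproduced in the extracted text. As a reference sketch, assuming sigmoid activations g and the squared error E = ½(O - t)² (consistent with the worked exercise later in the deck), the hidden-to-output update is:

\[ o = \sum_j w_j H_j,\qquad O = g(o),\qquad E = \tfrac{1}{2}(O - t)^2 \]
\[ \Delta w_j = -\varepsilon \frac{\partial E}{\partial w_j} = -\varepsilon\,(O - t)\,g'(o)\,H_j = -\varepsilon\,(O - t)\,O(1 - O)\,H_j \]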
Input to hidden layer
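Similarly, as a sketch under the same assumptions, applying the chain rule one layer further back gives the input-to-hidden update:

\[ h_j = \sum_k v_{jk} I_k,\qquad H_j = g(h_j) \]
\[ \Delta v_{jk} = -\varepsilon \frac{\partial E}{\partial v_{jk}} = -\varepsilon\,(O - t)\,g'(o)\,w_j\,g'(h_j)\,I_k \]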
Summary
Ik = X[0][k]
Hj = X[1][j]
Oi = X[2][i]
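As a minimal sketch (my own illustration, not code from the course), the same bookkeeping in Python, with sigmoid activations and the X[layer][neuron] layout from the slide:

```python
import math

def g(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(I, v, w):
    """Forward pass storing all activations in one array X, as on the slide:
    X[0][k] = input I_k, X[1][j] = hidden H_j, X[2][i] = output O_i."""
    X = [list(I), [], []]
    for j in range(len(v)):                                      # input -> hidden
        X[1].append(g(sum(v[j][k] * X[0][k] for k in range(len(I)))))
    X[2].append(g(sum(w[j] * X[1][j] for j in range(len(w)))))   # hidden -> output
    return X
```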
Can you do it yourself?
[Figure: network with weights v11=1, v21=-1, v12=1, v22=1 (input to hidden), w1=-1, w2=1 (hidden to output), and inputs I1=1, I2=1]
• What is the output (O) from the network?
• What are the wj and vjk values after the update if the target value is 0 and the learning rate ε=0.5?
Can you do it yourself (ε=0.5). Has the error decreased?
[Figure: the same network shown before and after the weight update; the "after" values are left blank to be filled in. First forward-pass values before the update: h1=2, H1=0.88]
Can you do it yourself (ε=0.5). Has the error decreased?
[Figure: as above, with the remaining forward-pass values before the update filled in: h2=0, H2=0.5, o=-0.38, O=0.41]
Can you do it yourself?
[Figure: network with v11=1, v21=-1, v12=1, v22=1, w1=-1, w2=1, inputs I1=1, I2=1, and forward-pass values h1=2, H1=0.88, h2=0, H2=0.5, o=-0.38, O=0.41]
• What is the output (O) from the network?
• What are the wj and vjk values after the update if the target value is 0?
Can you do it yourself (ε=0.5). Has the error decreased?
[Figure: before the update: v11=1, v21=-1, v12=1, v22=1, w1=-1, w2=1, h1=2, H1=0.88, h2=0, H2=0.5, o=-0.38, O=0.41. After the update: v11=1.005, v21=-1.01, v12=1.005, v22=0.99, w1=-1.043, w2=0.975, h1=2.01, H1=0.882, h2=-0.02, H2=0.495, o=-0.44, O=0.39. The output has moved from 0.41 to 0.39, so the error with respect to the target 0 has decreased]
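A short Python sketch (my own reconstruction, assuming sigmoid activations and no bias terms, which reproduces the numbers above) of the forward pass and one gradient-descent update:

```python
import math

def g(x):                                   # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

# Network from the exercise
I = [1.0, 1.0]                              # inputs I1, I2
v = [[1.0, 1.0],                            # v[j][k]: v11=1,  v12=1
     [-1.0, 1.0]]                           #          v21=-1, v22=1
w = [-1.0, 1.0]                             # w1=-1, w2=1
t, eps = 0.0, 0.5                           # target value and learning rate

# Forward pass
h = [sum(v[j][k] * I[k] for k in range(2)) for j in range(2)]   # h1=2, h2=0
H = [g(x) for x in h]                                           # H1~0.88, H2=0.5
o = sum(w[j] * H[j] for j in range(2))                          # o~-0.38
O = g(o)                                                        # O~0.41

# One gradient-descent update
d_o = (O - t) * O * (1 - O)                 # delta at the output neuron
for j in range(2):
    d_j = d_o * w[j] * H[j] * (1 - H[j])    # delta at hidden neuron j
    w[j] -= eps * d_o * H[j]                # w1~-1.043, w2~0.975
    for k in range(2):
        v[j][k] -= eps * d_j * I[k]         # v11~1.005, v21~-1.01, ...
```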
Sequence encoding
• Change in weight is linearly dependent on input value
• “True” sparse encoding (i.e., 1/0) is therefore highly inefficient
• Sparse input is therefore most often encoded as
– +1/-1 or 0.9/0.05
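A minimal sketch of what such an encoding could look like (illustrative only; the alphabet ordering and exact values are assumptions):

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"           # the 20 standard amino acids

def sparse_encode(sequence, on=0.9, off=0.05):
    """Encode each residue as 20 values: 'on' at the matching position, 'off' elsewhere."""
    x = []
    for aa in sequence:
        x.extend(on if aa == a else off for a in ALPHABET)
    return x

# A 9-mer peptide becomes 9 * 20 = 180 input values
x = sparse_encode("FMIDWILDA")
```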
Sequence encoding - rescaling
• Rescaling the input values
If the input to a neuron (o or h) is too large or too small, g′ is close to zero and the weights are hardly changed. Optimal performance is obtained when o and h are close to 0.5.
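The reason, for a sigmoid activation g, is that the derivative can be written in terms of the output itself:

\[ g(x) = \frac{1}{1 + e^{-x}},\qquad g'(x) = g(x)\bigl(1 - g(x)\bigr) \]

which is largest when the activation g(x) is close to 0.5 and vanishes when the neuron saturates near 0 or 1.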
Training and error reduction
Size matters
Example
Neural network training. Cross validation
Cross validation
Train on 4/5 of the data
Test on 1/5
=> Produces 5 different neural networks, each with a different prediction focus
Neural network training curve
Maximum test set performance
Most capable of generalizing
5-fold training
Which network to choose?
5-fold training
Conventional 5-fold cross-validation
“Nested” 5-fold cross-validation
When to be careful
• When data is scarce, the difference in performance obtained using “conventional” versus “nested” cross-validation can be very large
• When data is abundant, the difference is small, and “nested” cross-validation performance might even be higher than that of “conventional” cross-validation, due to the ensemble aspect of the “nested” approach (a sketch of the two partition schemes is given below)
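To make the difference concrete, a minimal sketch of the two partition schemes (the function names and fold handling are my own; training itself is omitted):

```python
def conventional_cv(folds):
    """Conventional 5-fold CV: each fold is used once as the test set."""
    for i in range(5):
        test = folds[i]
        train = [x for j in range(5) if j != i for x in folds[j]]
        yield train, test                       # 5 networks in total

def nested_cv(folds):
    """'Nested' 5-fold CV: for each test fold, the remaining data is split again
    so that training, early stopping, and evaluation use disjoint data."""
    for i in range(5):                          # outer loop: evaluation fold
        test = folds[i]
        for j in range(5):
            if j == i:
                continue
            stop = folds[j]                     # inner loop: stopping fold
            train = [x for k in range(5) if k not in (i, j) for x in folds[k]]
            yield train, stop, test             # 4 networks per test fold -> ensemble
```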
Do hidden neurons matter?
• The environment matters
NetMHCpan
Context matters
• FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
• FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
• DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
• DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101
Example
Summary
• Gradient descent is used to determine the updates for the synapses in the neural network
• Some relatively simple math defines the gradients
– Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
– Hidden layers are a bit more complex, but still OK
• Always train networks using a test set to stop training
– Be careful when reporting predictive performance
• Use “nested” cross-validation for small data sets
• And hidden neurons do matter (sometimes)
And some more stuff for the long cold winter nights
• Can it be made differently?
Predicting accuracy
• Can it be made differently?
Reliability
• Identification of position-specific receptor-ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system
Master thesis by Frederik Otzen Bagger and Piotr Chmura
Making sense of ANN weights
Deep learning
http://www.slideshare.net/hammawan/deep-neural-networks
Deep(er) Network architecture
Deeper Network architecture
[Figure: deep network with input layer l (inputs Il), first hidden layer k (h1k/H1k) connected to the input by weights ukl, second hidden layer j (h2j/H2j) connected by weights vjk, and output layer (h3/H3) connected by weights wj]
Network architecture (hidden to hidden)
Network architecture (input to hidden)
Speed: use deltas
[Figure: back-propagation through layers l -> k -> j; the delta values dj and dk are propagated backwards through the weights vjk and ukl]
Bishop, Christopher (1995). Neural networks for pattern recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.
Use deltas
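The delta equations themselves are in the slide figures; as a reference sketch in the notation of the diagram above (layers l -> k -> j), the deltas are computed once per neuron and reused for every weight update:

\[ \delta_{\text{output}} = (O - t)\,g'(o),\qquad \delta_k = g'(h_k)\sum_j \delta_j\, v_{jk} \quad (j \text{ runs over the layer above } k) \]
\[ \Delta v_{jk} = -\varepsilon\,\delta_j\,H_k,\qquad \Delta u_{kl} = -\varepsilon\,\delta_k\,I_l \]

Each weight is then touched only once per example, which is why the CPU time scales linearly with the number of weights.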
Deep learning – time is not an issue
[Plots: CPU time (CPU(u), 0-250) as a function of the number of weights (17,000-20,000), and the number of weights (16,000-19,500) as a function of the number of hidden layers (1-5)]
Deep learning
http://www.slideshare.net/hammawan/deep-neural-networks
Auto-encoder
Deep learning
http://www.slideshare.net/hammawan/deep-neural-networks
Pan-specific prediction methods
NetMHC NetMHCpan
Example

Peptide    Amino acids of HLA pockets  HLA    Aff
VVLQQHSIA  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.131751
SQVSFQQPL  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.487500
SQCQAIHNV  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.364186
LQQSTYQLV  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.582749
LQPFLQPQL  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.206700
VLAGLLGNV  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.727865
VLAGLLGNV  YFAVWTWYGEKVHTHVDTLLRYHY    A0202  0.706274
VLAGLLGNV  YFAEWTWYGEKVHTHVDTLVRYHY    A0203  1.000000
VLAGLLGNV  YYAVLTWYGEKVHTHVDTLVRYHY    A0206  0.682619
VLAGLLGNV  YYAVWTWYRNNVQTDVDTLIRYHY    A6802  0.407855
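Conceptually (a hedged sketch reusing the sparse_encode example from the sequence-encoding section; the exact NetMHCpan encoding differs in detail), the pan-specific input is simply the peptide and the HLA pocket pseudo-sequence encoded together:

```python
def pan_specific_input(peptide, hla_pseudo):
    """Concatenate the encodings of the peptide and the HLA pocket pseudo-sequence,
    so that a single network can learn binding across alleles."""
    return sparse_encode(peptide) + sparse_encode(hla_pseudo)

x = pan_specific_input("VVLQQHSIA", "YFAVLTWYGEKVHTHVDTLVRYHY")
```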
Going Deep – One hidden layer
[Plots: training and test MSE, and test PCC, as a function of the number of iterations (0-500) for a network with a single hidden layer of 20 neurons]
Going Deep – 3 hidden layers
[Plots: training and test MSE, and test PCC, as a function of the number of iterations (0-500) for architectures 20, 20+20, and 20+20+20 hidden neurons]
Going Deep – more than 3 hidden layers
[Plots: training and test MSE, and test PCC, as a function of the number of iterations (0-500) for architectures 20, 20+20, 20+20+20, 20+20+20+20, and 20+20+20+20+20 hidden neurons]
Going Deep – Using Auto-encoders
[Plots: training and test MSE, and test PCC, as a function of the number of iterations (0-500) for architectures 20, 20+20, 20+20+20, 20+20+20+20, 20+20+20+20+20, and 20+20+20+20+Auto (auto-encoder pre-trained)]
Deep learning
http://www.slideshare.net/hammawan/deep-neural-networks
Conclusions (2)
• Implementing deep networks using deltas¹ makes CPU time scale linearly with respect to the number of weights
– So going deep is not more CPU intensive than going wide
• Back-propagation is an efficient method for NN training for shallow networks with up to 3 hidden layers
• For deeper networks, pre-training is required, using for instance auto-encoders
¹ Bishop, Christopher (1995). Neural networks for pattern recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.