Date posted: 18-Jan-2018
Uploaded by: christian-cole
Optimization methods
Morten Nielsen
Department of Systems Biology, DTU
IIB-INTECH, UNSAM, Argentina
Minimization
The path to the closest local minimum = local minimization
The path to the global minimum
*Adapted from slides by Chen Keasar, Ben-Gurion University
Outline
• Optimization procedures
  – Gradient descent
  – Monte Carlo
• Overfitting
  – Cross-validation
• Method evaluation
Linear methods. Error estimate
[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]
Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if
$b = a - \varepsilon \nabla F(a)$
for ε > 0 a small enough number, then F(b) < F(a) (provided the gradient of F at a is not zero)
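A minimal sketch of this update rule in Python. The function F(x) = (x - 3)^2, the step size of 0.1, and the names `gradient_descent` and `grad_F` are hypothetical choices for illustration, not taken from the slides:

```python
def gradient_descent(grad, a, step=0.1, n_iter=100):
    """Repeatedly move from a in the direction of the negative gradient."""
    for _ in range(n_iter):
        a = a - step * grad(a)  # b = a - step * F'(a)
    return a


# Hypothetical example function F(x) = (x - 3)^2, with gradient 2 * (x - 3).
def grad_F(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_F, a=0.0)
print(x_min)  # approaches the minimum at x = 3
```

With too large a step the iteration can overshoot the minimum and diverge, which is why the step has to be "small enough".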
Gradient descent (example)
Gradient descent
Weights are changed in the opposite direction of the gradient of the error
Gradient descent (Linear function)
Weights are changed in the opposite direction of the gradient of the error
[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]
Gradient descent. Example
Weights are changed in the opposite direction of the gradient of the error
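A sketch of the derivation behind the rule that weights are changed in the opposite direction of the gradient of the error. It assumes a squared error with target value t and a step size written as ε; both are assumptions, chosen because they reproduce the numbers in the worked table of the exercise:

```latex
\begin{aligned}
o &= \sum_i w_i I_i \\
E &= \tfrac{1}{2}\,(o - t)^2 \\
\frac{\partial E}{\partial w_i} &= (o - t)\, I_i \\
w_i' &= w_i - \varepsilon \frac{\partial E}{\partial w_i}
      = w_i - \varepsilon\,(o - t)\, I_i
\end{aligned}
```

A weight whose input is zero has a zero gradient and therefore does not change.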
Gradient descent. Doing it yourself
Weights are changed in the opposite direction of the gradient of the error
Input: I1 = 1, I2 = 0
Initial weights: W1 = 0.1, W2 = 0.1
Linear function with output o
What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use ε = 0.1 and t = 1)?
Fill out the table
itr  W1    W2   O
0    0.1   0.1
1
2
Answer
itr  W1    W2   O
0    0.1   0.1  0.1
1    0.19  0.1  0.19
2    0.27  0.1  0.27
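The exercise can be checked with a short Python sketch. It assumes the error is E = ½(o - t)², so that the update is w_i ← w_i - ε(o - t)I_i; this form is an assumption, but it reproduces the table values. The names `forward` and `backward` are illustrative:

```python
def forward(weights, inputs):
    """Forward step: prediction of the linear function, o = sum_i w_i * I_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def backward(weights, inputs, o, t, eps):
    """Backward step: w_i <- w_i - eps * (o - t) * I_i (assumed squared error)."""
    return [w - eps * (o - t) * x for w, x in zip(weights, inputs)]


inputs = [1.0, 0.0]
weights = [0.1, 0.1]
eps, t = 0.1, 1.0

history = []
for itr in range(3):
    o = forward(weights, inputs)
    error = 0.5 * (o - t) ** 2
    history.append((itr, weights[0], weights[1], o, error))
    print(f"{itr}  W1={weights[0]:.3f}  W2={weights[1]:.3f}  O={o:.3f}  E={error:.3f}")
    weights = backward(weights, inputs, o, t, eps)
```

Under these assumptions W1 goes from 0.1 to 0.19 to 0.271 (0.27 in the table after rounding), W2 stays at 0.1 because its input is 0, and the error decreases at every iteration.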
Monte Carlo
Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is unfeasible or impossible to compute an exact result with a deterministic algorithm.
Or when you are too stupid to do the math yourself?
Example: Estimating π by Independent Monte-Carlo Samples
Suppose we throw darts randomly (and uniformly) at the square:
Algorithm:
For i = [1..ntrials]
    x = (random# in [0..r])
    y = (random# in [0..r])
    distance = sqrt(x^2 + y^2)
    if distance ≤ r
        hits++
End
Output: π ≈ 4 · hits / ntrials
Adapted from course slides by Craig Douglas
http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html
Estimating π
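The dart-throwing algorithm can be sketched in Python as follows. The darts land in the square [0, r] × [0, r], of which the quarter circle of radius r covers a fraction π/4, so the estimate is taken to be 4 · hits / ntrials. The function name `estimate_pi` and the `seed` parameter are illustrative additions:

```python
import math
import random


def estimate_pi(ntrials, r=1.0, seed=None):
    """Estimate pi from the fraction of random darts that land inside
    the quarter circle of radius r within the square [0, r] x [0, r]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x = rng.uniform(0.0, r)
        y = rng.uniform(0.0, r)
        if math.sqrt(x * x + y * y) <= r:
            hits += 1
    return 4.0 * hits / ntrials


print(estimate_pi(100_000, seed=1))
```

The statistical error of the estimate shrinks only as 1/sqrt(ntrials), so each additional digit of accuracy costs about 100 times more samples.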
Monte Carlo (Minimization)
[Figure: moves labeled dE < 0 and dE > 0]
The Traveling Salesman
Adapted from www.mpp.mpg.de/~caldwell/ss11/ExtraTS.pdf
Gibbs sampler. Monte Carlo simulations
RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA
GSLFVYNITTNKYKAFLDKQ
SALLSSDITASVNCAK
GFKGEQGPKGEPDVFKELKVHHANENI
SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE
E1 = 5.4, E2 = 5.7: dE > 0; Paccept = 1
E1 = 5.4, E2 = 5.2: dE < 0; 0 < Paccept < 1
Note the sign. Maximization
Monte Carlo Temperature
• What is the Monte Carlo temperature?
• Say dE=-0.2, T=1
• T=0.001
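The two temperature cases can be worked through with a small Python sketch. It assumes a Metropolis-style acceptance rule for maximization, Paccept = min(1, exp(dE/T)); this formula is an assumption, chosen to agree with the rule that moves with dE > 0 are always accepted. The name `p_accept` is illustrative:

```python
import math


def p_accept(dE, T):
    """Acceptance probability of a move when maximizing E
    (assumed Metropolis rule: min(1, exp(dE / T)))."""
    if dE >= 0:
        return 1.0
    return math.exp(dE / T)


for T in (1.0, 0.001):
    print(f"dE=-0.2  T={T}  Paccept={p_accept(-0.2, T):.3g}")
```

Under this rule a move with dE = -0.2 is accepted about 82% of the time at T = 1 and essentially never at T = 0.001, so a high temperature allows unfavorable moves and a low temperature suppresses them.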
MC minimization
Monte Carlo - Examples
• Why a temperature?
Local minima
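A sketch of why the temperature matters for local minima. It uses a hypothetical one-dimensional double-well function with a local minimum near x = +1 and the global minimum near x = -1. The search starts in the local minimum. Because this is minimization, the sign is flipped relative to the maximization case: downhill moves are always accepted and uphill moves are accepted with probability exp(-dE/T). The function, step size and temperatures are illustrative assumptions:

```python
import math
import random


def energy(x):
    # Hypothetical double-well function: local minimum near x = +1,
    # global minimum near x = -1, barrier near x = 0.
    return (x * x - 1.0) ** 2 + 0.3 * x


def mc_minimize(T, n_steps=20000, x0=1.0, max_step=0.5, seed=0):
    """Monte Carlo minimization; returns the best (x, energy) visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)
        e_new = energy(x_new)
        dE = e_new - e
        # Downhill moves are always accepted, uphill moves only sometimes.
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


print("T=0.3  :", mc_minimize(T=0.3))
print("T=1e-6 :", mc_minimize(T=1e-6))
```

At the very low temperature the search behaves like a local minimizer and stays in the well where it started; at the higher temperature it can cross the barrier and reach the deeper well.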
Stabilization matrix method
• A prediction method contains a very large set of parameters
  – A matrix for predicting binding for 9-mer peptides has 9x20 = 180 weights
• Overfitting is a problem
Data driven method training
[Figure: temperature plotted against years]
Regression methods. The mathematics
$y = ax + b$
2 parameter model
Good description, poor fit
$y = ax^6 + bx^5 + cx^4 + dx^3 + ex^2 + fx + g$
7 parameter model
Poor description, good fit
Model over-fitting
Stabilization matrix method (Ridge regression). The mathematics
$y = ax + b$
2 parameter model
Good description, poor fit
$y = ax^6 + bx^5 + cx^4 + dx^3 + ex^2 + fx + g$
7 parameter model
Poor description, good fit
SMM training
Evaluate on 600 MHC:peptide binding data
λ = 0: PCC = 0.70
λ = 0.1: PCC = 0.78
Stabilization matrix method. The analytic solution
Each peptide is represented as 9*20 = 180 numbers
H is a stack of such vectors of 180 values
t is the target value (the measured binding)
λ is a parameter introduced to suppress the effect of noise in the experimental data and lower the effect of overfitting
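A minimal sketch of an analytic solution of this kind. It assumes the standard ridge regression form w = (HᵀH + λI)⁻¹Hᵀt; the exact expression and any scaling of λ used on the slide are not given here, so treat the formula as an assumption. The toy data uses 2 inputs per row instead of 180, and the helper names are illustrative:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def smm_weights(H, t, lam):
    """Return w = (H^T H + lam * I)^-1 H^T t (assumed ridge form)."""
    n = len(H[0])
    A = [[sum(row[i] * row[j] for row in H) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * ti for row, ti in zip(H, t)) for i in range(n)]
    return solve(A, b)


# Hypothetical toy data: 3 data points, 2 inputs each.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [1.0, 2.0, 3.0]
print(smm_weights(H, t, lam=0.0))  # unregularized least squares
print(smm_weights(H, t, lam=1.0))  # weights shrunk toward zero
```

On this toy data λ = 0 recovers the weights (1, 2) that fit the targets exactly, and λ = 1 gives the smaller weights (0.875, 1.375).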
SMM - Stabilization matrix method
[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]
Per target error:
Global error:
Sum over weights
Sum over data points
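A hedged sketch of gradient descent on a per target error. The error expressions are assumptions: the global error is taken as E = ½ Σ_data (o - t)² + λ Σ_weights w², and the per target error as ½(o - t)² + (λ/N) Σ_weights w², with N the number of data points. The function name `smm_gd`, the step size and the toy data are illustrative:

```python
def smm_gd(H, t, lam, eps=0.05, n_epochs=2000):
    """Gradient descent on the assumed per target error
    0.5 * (o - t)^2 + (lam / N) * sum_i w_i^2."""
    w = [0.0] * len(H[0])
    lam_pt = lam / len(H)  # assumed "lambda per target"
    for _ in range(n_epochs):
        for row, target in zip(H, t):
            o = sum(wi * xi for wi, xi in zip(w, row))
            # Gradient of the per target error with respect to each weight.
            w = [wi - eps * ((o - target) * xi + 2.0 * lam_pt * wi)
                 for wi, xi in zip(w, row)]
    return w


# Hypothetical toy data: 3 data points, 2 inputs each.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [1.0, 2.0, 3.0]
print(smm_gd(H, t, lam=0.0))
print(smm_gd(H, t, lam=1.0))
```

The weights are updated after every data point, so each step uses only the error of that single target plus its share of the penalty on the weights.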
SMM - Stabilization matrix method
Do it yourself
[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]
λ per target
SMM - Stabilization matrix method
Monte Carlo
[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]
Global:
• Make random change to weights
• Calculate change in “global” error
• Update weights if MC move is accepted
Note difference between MC and GD in the use of “global” versus “per target” error
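The three Monte Carlo steps can be sketched as follows. The global error is assumed to be E = ½ Σ_data (o - t)² + λ Σ_weights w², and the acceptance rule is assumed to be the Metropolis rule for minimization (accept if dE ≤ 0, otherwise with probability exp(-dE/T)). The temperature, step size, toy data and function names are illustrative:

```python
import math
import random


def global_error(w, H, t, lam):
    """Assumed global error: squared error summed over all data points
    plus lam times the sum of squared weights."""
    fit = sum((sum(wi * xi for wi, xi in zip(w, row)) - ti) ** 2
              for row, ti in zip(H, t))
    return 0.5 * fit + lam * sum(wi * wi for wi in w)


def smm_mc(H, t, lam, T=0.001, n_steps=20000, max_step=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(H[0])
    e = global_error(w, H, t, lam)
    for _ in range(n_steps):
        # 1. Make a random change to one weight.
        w_new = w[:]
        w_new[rng.randrange(len(w))] += rng.uniform(-max_step, max_step)
        # 2. Calculate the change in the global error.
        e_new = global_error(w_new, H, t, lam)
        dE = e_new - e
        # 3. Update the weights if the MC move is accepted.
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            w, e = w_new, e_new
    return w, e


# Hypothetical toy data: 3 data points, 2 inputs each.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [1.0, 2.0, 3.0]
print(smm_mc(H, t, lam=0.0))
```

Every proposed move is scored against all data points at once, whereas the gradient descent update uses one target at a time.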
Training/evaluation procedure
• Define method
• Select data
• Deal with data redundancy
  – In method (sequence weighting)
  – In data (Hobohm)
• Deal with over-fitting either
  – in method (SMM regularization term) or
  – in training (stop fitting on test set performance)
• Evaluate method using cross-validation
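A small doit script
//home/user1/bin/doit_ex
#! /bin/tcsh
# Loop over alleles, lambda values, and the 5 cross-validation partitions
foreach a ( `cat allelefile` )
    mkdir -p $a
    cd $a
    foreach l ( 0 1 2.5 5 10 20 30 )
        mkdir -p l.$l
        cd l.$l
        foreach n ( 0 1 2 3 4 )
            # Train on partition n, then score the matching evaluation set
            smm -nc 500 -l $l train.$n > mat.$n
            pep2score -mat mat.$n eval.$n > eval.$n.pred
        end
        # Pool the predictions from all partitions and report the correlation
        echo $a $l `cat eval.?.pred | grep -v "#" | gawk '{print $2,$3}' | xycorr`
        cd ..
    end
    cd ..
end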
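The evaluation logic of the script can be sketched in Python: train on all partitions but one, predict the held-out partition, pool the predictions from all partitions, and compute one correlation. This is an illustration only. It assumes that `xycorr` reports a Pearson correlation, and it builds each training set by holding one fold out, whereas the script reads prepared `train.$n` and `eval.$n` files. The toy model (a single slope fitted through the origin) and all function names are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def cross_validate(folds, train, predict):
    """folds: list of (inputs, targets); each fold is held out once.
    Returns the correlation over the pooled held-out predictions."""
    preds, targets = [], []
    for n, (eval_x, eval_t) in enumerate(folds):
        train_x = [x for m, (xs, _) in enumerate(folds) if m != n for x in xs]
        train_t = [v for m, (_, ts) in enumerate(folds) if m != n for v in ts]
        model = train(train_x, train_t)
        preds += [predict(model, x) for x in eval_x]
        targets += eval_t
    return pearson(preds, targets)


# Hypothetical toy model: least-squares slope through the origin.
def train_slope(xs, ts):
    return sum(x * v for x, v in zip(xs, ts)) / sum(x * x for x in xs)


def predict_slope(slope, x):
    return slope * x


folds = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0]), ([5.0], [10.0])]
pcc = cross_validate(folds, train_slope, predict_slope)
print(pcc)
```

Pooling the held-out predictions before computing the correlation, as the script does with `cat eval.?.pred`, gives one performance value per allele and λ that is based only on data the model was not trained on.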