Date post: 23-Dec-2015

Binding and Kinetics for Experimental Biologists

Lecture 2 Evolutionary Computing: Initial Estimate Problem

Petr Kuzmič, Ph.D.
BioKin, Ltd.

WATERTOWN, MASSACHUSETTS, U.S.A.

INNOVATION LECTURES (INNOlEC)

BKEB Lec 2: Evolutionary Computing 2

Lecture outline

• The problem:

Fitting nonlinear data usually requires an initial estimate of model parameters. This initial estimate must be close enough to the “true” values.

• The solution:

Use a data-fitting method that does not depend on initial estimates.

• An implementation:

The Differential Evolution algorithm (Price et al., 2005).

• An example:

Kinetics of forked DNA binding to the protein-protein complex formed by the DNA-polymerase sliding clamp (gp45) and the clamp loader (gp44/62).

The ultimate goal of analyzing kinetic / binding data

SELECT AMONG POSSIBLE MOLECULAR MECHANISMS

[Diagram: EXPERIMENTAL DATA (signal vs. concentration) and A VARIETY OF POSSIBLE MECHANISMS (mechanism A, mechanism B, mechanism C) are passed to the computer, which selects the most plausible model. Example mechanism (competitive?): E + S ⇌ E.S → E + P; E + I ⇌ E.I]

Most models in natural sciences are nonlinear

LINEAR VS. NONLINEAR MODELS

Linear

y = A + k x

Nonlinear

y = A [1 - exp(-k x)]

[Two plots over x = 0 to 1: the nonlinear example y = 2 [1 - exp(-5 x)] and the linear example y = 2 + 5 x]
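
One practical consequence of this distinction can be sketched in a few lines. For the linear model y = A + k x, least squares has a closed-form solution, so no initial estimate is needed. The function name `fit_line` and the synthetic, noise-free data are illustrative assumptions, not part of the lecture.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = A + k*x; no starting guess required."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx               # slope
    A = mean_y - k * mean_x     # intercept
    return A, k


# Synthetic data generated from the linear example y = 2 + 5 x
xs = [i / 10 for i in range(11)]
ys = [2.0 + 5.0 * x for x in xs]

A, k = fit_line(xs, ys)
print(f"A = {A:.3f}, k = {k:.3f}")
```

No comparable formula exists for y = A [1 - exp(-k x)], because k sits inside the exponential; the fit has to proceed iteratively from a starting guess.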

We need initial estimates of model parameters

NONLINEAR MODELS REQUIRE INITIAL ESTIMATES OF PARAMETERS

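[Diagram: A GIVEN MODEL (E + S ⇌ E.S → E + P with rate constants k+1, k-1, k+2; E + I ⇌ E.I with rate constants k+3, k-3), the ESTIMATED PARAMETERS (k+1, k-1, k+2, k+3, k-3) and the EXPERIMENTAL DATA (signal vs. concentration) are passed to the computer, which returns the REFINED PARAMETERS (k+1, k-1, k+2, k+3, k-3)]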

The Initial Estimate Problem:

• Estimated parameters must be “close enough”.

• How can we guess them?

• How can we be sure that they are “close enough”?

The crux of the problem: Finding global minima

• Least-squares fitting only goes "downhill"

• How do we know where to start?

[Plot: SUM OF SQUARED DEVIATIONS, (data - model)², versus MODEL PARAMETER, with the global minimum marked]

data - model = "residual"
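
The "downhill only" behaviour can be illustrated with a toy example. The curve `ssq` below is a hypothetical stand-in for a sum-of-squares surface with one local and one global minimum; it is not derived from any data in this lecture, and plain gradient descent stands in for a least-squares optimizer.

```python
def ssq(p):
    """Hypothetical sum-of-squares curve with a local and a global minimum."""
    return (p * p - 1.0) ** 2 + 0.3 * p


def slope(p):
    """Derivative of ssq with respect to the model parameter p."""
    return 4.0 * p * (p * p - 1.0) + 0.3


def go_downhill(p, step=0.01, n_steps=5000):
    """Plain gradient descent: every move goes downhill from the current value."""
    for _ in range(n_steps):
        p -= step * slope(p)
    return p


left = go_downhill(-2.0)    # initial estimate on the left of the barrier
right = go_downhill(2.0)    # initial estimate on the right of the barrier

print(f"start -2.0 -> p = {left:.3f}, ssq = {ssq(left):.3f}")
print(f"start +2.0 -> p = {right:.3f}, ssq = {ssq(right):.3f}")
```

Both runs stop at a point where the slope is zero, but only the run started at -2.0 reaches the lower of the two minima. The outcome depends entirely on where the search starts.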

Charles Darwin to the rescue

BIOLOGICAL EVOLUTION IMITATED IN "DE"

ISBN-10: 3540209506

Charles Darwin (1809-1882)

Specialized numerical software: DynaFit

DOWNLOAD: http://www.biokin.com/dynafit

Kuzmic (2009) Meth. Enzymol., 467, 247-280

DynaFit implements the Differential Evolution algorithm for global sum-of-squares minimization.

Biological metaphor: "Gene, allele"

BIOLOGY

• gene: ...AAGTCG...GTAACCGG...
• four-letter alphabet
• variable length
• example: "keratin"

COMPUTER

• sequence of bits representing a number: ...01110011000001101101110011...
• two-letter alphabet
• fixed length (16 or 32 bits)
• examples: "KM", "kcat"

"Chromosome, genotype, phenotype"

BIOLOGY

• genotype: ...AAGTCGGTTCGGAAGTCGGTTTA... (genes: keratin, oncoprotein)
• phenotype

COMPUTER

• particular combination of all model parameters
• 011010110110011110011010001111101101: KM = 4.56, Vmax = 1.23, Kis = 78.9 (full set of parameters)
• model:

$$v = V_{max} \frac{[S]/K_M}{1 + [S]/K_M + [S]^2/(K_M K_{is})}$$

"Organism, fitness"

BIOLOGY

• genotype: ...AAGTCGGTTCGGAAGTCGGTTTA... (genes: keratin, oncoprotein)
• FITNESS: "agreement" with the environment

COMPUTER

• FITNESS: agreement between the data and the model

[Plot: v versus [S] for [S] = 0 to 80, with Vmax = 1.3, KM = 9.1, Kis = 137.8]
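
A minimal sketch of how such a fitness could be computed. It assumes that the three parameters Vmax, KM and Kis belong to a substrate-inhibition rate law, v = Vmax ([S]/KM) / (1 + [S]/KM + [S]²/(KM Kis)); that functional form, the function names, and the synthetic data are assumptions made for illustration. A lower sum of squares corresponds to a higher fitness.

```python
def rate(S, Vmax, KM, Kis):
    """Assumed substrate-inhibition rate law (illustrative, see lead-in)."""
    return Vmax * (S / KM) / (1.0 + S / KM + S * S / (KM * Kis))


def sum_of_squares(params, S_values, v_values):
    """Sum of squared residuals (data - model); smaller means fitter."""
    Vmax, KM, Kis = params
    return sum((v - rate(S, Vmax, KM, Kis)) ** 2
               for S, v in zip(S_values, v_values))


# Synthetic, noise-free "data" simulated with the parameter values shown on the plot
S_values = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0]
true_params = (1.3, 9.1, 137.8)            # Vmax, KM, Kis
v_values = [rate(S, *true_params) for S in S_values]

fit_good = sum_of_squares(true_params, S_values, v_values)
fit_poor = sum_of_squares((1.23, 4.56, 78.9), S_values, v_values)
print(fit_good, fit_poor)
```

The parameter set that generated the data has a sum of squares of zero; any other set scores higher and is therefore less fit.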

"Population"

BIOLOGY COMPUTER

• low fitness: Vmax KM Kis

• medium fitness: Vmax KM Kis

• high fitness: Vmax KM Kis

DE Population size in DynaFit

• number of population members per optimized model parameter

• number of population members per order of magnitude

"Sexual reproduction, crossover"

BIOLOGY COMPUTER

mother: 01101011011001111001101 00011111011

father: 01101011011001111001101 11100011011

"sexual mating" with probability pcross, at a random crossover point

child: 01101011011001111001101 11100011011

(each bit string encodes Vmax, KM, Kis)
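
The bit-string picture above can be sketched as a single-point crossover. This is an illustration of the metaphor only: practical Differential Evolution codes usually recombine real-valued parameters rather than bits, and which parent contributes which side of the crossover point is an arbitrary choice here. The function name is hypothetical.

```python
import random


def single_point_crossover(mother, father, rng):
    """Return (child, point): mother's bits before the point, father's after."""
    if len(mother) != len(father):
        raise ValueError("parents must have equal length")
    point = rng.randrange(1, len(mother))   # 1 .. len-1, so both parents contribute
    return mother[:point] + father[point:], point


# Bit strings as shown on the slide (head + tail)
mother = "01101011011001111001101" + "00011111011"
father = "01101011011001111001101" + "11100011011"

rng = random.Random(0)                      # fixed seed for reproducibility
child, point = single_point_crossover(mother, father, rng)
print(point, child)
```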

"Mutation, genetic diversity"

BIOLOGY COMPUTER

father: 01101011011 001111001101 11100011011 (Vmax, KM, Kis)

mutation

mutant father: 11100111011 001011010101 11001011001 (Vmax(*), KM(*), Kis(*))

"Mutation, genetic diversity"

aunt #1: 01101011011 001111001101 11100011011 (Vmax(1), KM(1), Kis(1))

aunt #2: 11100111011 001011010101 11001011001 (Vmax(2), KM(2), Kis(2))

THE "DIFFERENTIAL" IN DIFFERENTIAL EVOLUTION ALGORITHM - STEP 1

Compute difference between two randomly chosen “auntie” phenotypes

subtract: aunt #2 minus aunt #1

11100111011 001011010101 11001011001 (Vmax(2)-Vmax(1), KM(2)-KM(1), Kis(2)-Kis(1))

"Mutation, genetic diversity"

father: 01101011011 001111001101 11100011011 (Vmax, KM, Kis)

THE "DIFFERENTIAL" IN DIFFERENTIAL EVOLUTION ALGORITHM - STEP 2

Add weighted difference between two “aunt” phenotypes to “father”

add a fraction of aunt #2 minus aunt #1:

11100111011 001011010101 11001011001 (Vmax(2)-Vmax(1), KM(2)-KM(1), Kis(2)-Kis(1))

mutant father: 11100111011 001011010101 11001011001 (Vmax*, KM*, Kis*)

"Mutation, genetic diversity"

THE "DIFFERENTIAL" IN DIFFERENTIAL EVOLUTION ALGORITHM

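EXAMPLE: Michaelis-Menten equation

$$v = V_{max} \frac{[S]}{[S] + K_M}$$

$$K_M^{*} = K_M + F \left( K_M^{(1)} - K_M^{(2)} \right)$$

Here K_M is the "father" value, K_M^{(1)} and K_M^{(2)} are the values carried by “aunt 1" and “aunt 2", K_M^{*} is the "mutant father", and F is the weight (fraction), also called the mutation rate. The two aunts are chosen at random, so the order of subtraction does not matter.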
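
A minimal sketch of this mutation step, applied to every parameter at once: the mutant equals the father plus F times the difference between two aunts. The function name `mutate` and the numerical values are illustrative assumptions.

```python
def mutate(father, aunt1, aunt2, F):
    """Differential mutation: father + F * (aunt1 - aunt2), parameter by parameter."""
    return [f + F * (a1 - a2) for f, a1, a2 in zip(father, aunt1, aunt2)]


# Hypothetical parameter sets, ordered as (Vmax, KM, Kis)
father = [1.0, 4.0, 80.0]
aunt1 = [2.0, 6.0, 100.0]
aunt2 = [1.0, 2.0, 60.0]

mutant = mutate(father, aunt1, aunt2, F=0.5)
print(mutant)   # [1.5, 6.0, 100.0]
```

Because the step size is taken from differences within the current population, mutations are large while the population is widely spread and shrink automatically as it converges.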

DE “undocumented” settings in DynaFit

These DE tuning constants are “undocumented” in the DynaFit distribution:

• probability that “child” inherits “father's” genes, not “mother's” genes

• fractional difference F used in mutations: $K_M^{*} = K_M + F (K_M^{(1)} - K_M^{(2)})$

• six different mutation strategies

"Selection"

• high fitness (low sum of squares): 0110101101100111100110100011111011 (Vmax, KM, Kis)
  BIOLOGY: more likely to breed
  COMPUTER: more likely to be carried to the next generation

• low fitness (high sum of squares): 0000000000111111111111100000000000 (Vmax, KM, Kis)
  BIOLOGY: less likely to breed
  COMPUTER: less likely to be carried to the next generation

Basic Differential Evolution Algorithm - Summary

1 Randomly create the initial population (size N)

Repeat until almost all population members have very high fitness:

2 Evaluate fitness: sum of squares for all population members

3 Mutation: random gene modification (mutate father, weight F)

4 Sexual reproduction: random crossover with probability Pcross

5 Natural selection: keep child in gene pool if more fit than mother
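
The five steps can be sketched in plain Python. This is a simplified illustration, not DynaFit's implementation. Several choices are assumptions: the stand-in model y = A [1 - exp(-k x)], a search over log10 of each parameter within fixed bounds, a fixed number of generations in place of a convergence test, and one gene always taken from the mutant so that the child differs from its mother.

```python
import math
import random


def model(x, A, k):
    return A * (1.0 - math.exp(-k * x))


def sum_of_squares(genes, xs, ys):
    """Fitness: genes are log10(A), log10(k); lower sum of squares is fitter."""
    A, k = (10.0 ** g for g in genes)
    return sum((y - model(x, A, k)) ** 2 for x, y in zip(xs, ys))


def differential_evolution(xs, ys, lo=-6.0, hi=6.0, n_pop=40,
                           F=0.7, p_cross=0.9, n_gen=300, seed=1):
    rng = random.Random(seed)
    n_par = 2
    # Step 1: random initial population spanning twelve orders of magnitude
    pop = [[rng.uniform(lo, hi) for _ in range(n_par)] for _ in range(n_pop)]
    # Step 2: evaluate fitness of every member
    ssq = [sum_of_squares(member, xs, ys) for member in pop]
    for _ in range(n_gen):
        for i in range(n_pop):
            mother = pop[i]
            others = [j for j in range(n_pop) if j != i]
            f, a1, a2 = rng.sample(others, 3)
            # Step 3: mutation, father + F * (aunt 1 - aunt 2)
            mutant = [pop[f][p] + F * (pop[a1][p] - pop[a2][p])
                      for p in range(n_par)]
            # Step 4: crossover, each gene comes from the mutant with probability p_cross
            forced = rng.randrange(n_par)
            child = [mutant[p] if (rng.random() < p_cross or p == forced)
                     else mother[p] for p in range(n_par)]
            child = [min(hi, max(lo, g)) for g in child]   # keep within bounds
            # Step 5: selection, the child replaces the mother only if it is fitter
            child_ssq = sum_of_squares(child, xs, ys)
            if child_ssq < ssq[i]:
                pop[i], ssq[i] = child, child_ssq
    best = min(range(n_pop), key=lambda i: ssq[i])
    return [10.0 ** g for g in pop[best]], ssq[best]


# Synthetic, noise-free data from y = 2 [1 - exp(-5 x)]
xs = [i / 10 for i in range(11)]
ys = [model(x, 2.0, 5.0) for x in xs]

(A, k), best_ssq = differential_evolution(xs, ys)
print(f"A = {A:.3f}, k = {k:.3f}, sum of squares = {best_ssq:.2e}")
```

No initial estimate of A or k is supplied; only the search bounds are. The price is the number of model evaluations: 40 members over 300 generations means 12,000 child evaluations for a two-parameter problem.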

Example: DNA + clamp / clamp loader complex

DETERMINE ASSOCIATION AND DISSOCIATION RATE CONSTANTS IN AN A + B ⇌ AB SYSTEM

Courtesy of Senthil Perumal, Penn State University (Steven Benkovic lab)

see Lecture 1 for details

Example: DynaFit script for Differential Evolution

INSERT A SINGLE LINE IN THE [TASK] SECTION

constraints !

Example: Initial population

BOTH RATE CONSTANTS SPAN TWELVE ORDERS OF MAGNITUDE

Example: The evolutionary process

SNAPSHOTS OF k1 / k2 CORRELATION DIAGRAM - SPACED BY 10 “GENERATIONS”

[Eight panels, one per snapshot: generations 0, 10, 20, 30, 40, 50, 60, 70]

Example: Final population

BOTH RATE CONSTANTS SPAN AT MOST ±30% RANGE RELATIVE TO NOMINAL VALUE

Example: The “fittest” member of final population

THIS IS (PRESUMABLY) THE GLOBAL MINIMUM OF SUM-OF-SQUARES

[Plot: data, model, and residuals; compare with the “good” estimate from Lecture 1]

Example: Comparison of DE and regular data fitting

DIFFERENTIAL EVOLUTION (DE) FOUND THE SAME FIT AS THE “GOOD” ESTIMATE

initial estimate | sum of squares | relative sum of sq. | “best-fit” constants
“good” (Lecture 1): k1 = 1, k2 = 1 | 0.002308 | 1.00 | k1 = 2.2 ± 0.5, k2 = 0.030 ± 0.015
“bad” (Lecture 1): k1 = 100, k2 = 0.01 | 0.002354 | 1.02 | k1 = 0.2 ± 3.4, k2 = 0.2 ± 0.6
1000 random estimates: k1 = 10^-6 to 10^+6, k2 = 10^-6 to 10^+6 | 0.002308 | 1.00 | k1 = 2.2 ± 0.5, k2 = 0.030 ± 0.015

Significant disadvantage of DE: very slow

DYNAFIT CAN TAKE MULTIPLE DAYS TO RUN A COMPLEX PROBLEM

DynaFit 4.065 on DNA / clamp / clamp loader example:

algorithm | computation time | relative time
Levenberg-Marquardt with two restarts | 0.88 sec | 1
Differential Evolution with four restarts (population size: 1000) | 12 min 31 sec | 853

At this ratio:

Levenberg-Marquardt | Differential Evolution
1 second | 15 minutes
1 minute | 15 hours
10 minutes | 6 days

Example of Differential Evolution in DynaFit

J. Biol. Chem. 283, 11677 (2007)

This took one week of computing on the Linux cluster in Heidelberg.

Example: Systematic scan of many initial estimates

CAREFUL! THIS IS FASTER THAN DIFFERENTIAL EVOLUTION BUT DOES NOT ALWAYS WORK

ALGORITHM

1. generate all possible combinations of rate constants

2. compute initial sum of squares for each combination

3. rank combinations by initial sum of squares

4. select the best N combinations

5. perform a full fit for those N

6. rank results again

7 × 7 = 49 combinations of kon and koff
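
Steps 1 to 4 of this scan can be sketched as follows. The stand-in model y = A [1 - exp(-k x)], the grid of seven values per parameter (powers of ten), and all function names are assumptions for illustration; the lecture's example scans kon and koff for a binding model instead. Steps 5 and 6 require a full least-squares refinement and are not shown.

```python
import itertools
import math


def model(x, A, k):
    return A * (1.0 - math.exp(-k * x))


def sum_of_squares(params, xs, ys):
    A, k = params
    return sum((y - model(x, A, k)) ** 2 for x, y in zip(xs, ys))


def scan_initial_estimates(xs, ys, grid, n_best):
    """Rank every grid combination by its initial sum of squares; keep the best."""
    combos = list(itertools.product(grid, repeat=2))            # step 1
    scored = [(sum_of_squares(c, xs, ys), c) for c in combos]   # step 2
    scored.sort(key=lambda item: item[0])                       # step 3
    return scored[:n_best]                                      # step 4


# Synthetic, noise-free data from y = 2 [1 - exp(-5 x)]
xs = [i / 10 for i in range(11)]
ys = [model(x, 2.0, 5.0) for x in xs]

grid = [10.0 ** e for e in range(-3, 4)]     # 7 values: 0.001 ... 1000
best = scan_initial_estimates(xs, ys, grid, n_best=20)

for ssq, (A, k) in best[:3]:
    print(f"A = {A:g}, k = {k:g}, initial sum of squares = {ssq:.4g}")
```

Each of the retained combinations would then serve as the initial estimate for a full fit, and the refined results would be ranked again.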

Example: Systematic scan – Phase 1

AFTER EVALUATING THE INITIAL SUM OF SQUARES FOR ALL 49 COMBINATIONS OF k1 and k2

Example: Systematic scan – Phase 2

AFTER RANKING THE INITIAL ESTIMATES AND SELECTING 20 BEST ONES BY SUM OF SQUARES

Example: Systematic scan – Phase 3

AFTER PERFORMING FULL REFINEMENT FOR 20 BEST ESTIMATES OUT OF 49 TRIED

“success zone”

The best initial estimates do not produce the best refined solution!

Summary and conclusions

1. Finding good-enough initial estimates is a very difficult problem.

2. One should use system-specific information as much as possible. This includes using the literature and/or general principles for “intelligent” guesses.

3. Always use the “Try” method in DynaFit to display the initial fit. Make sure that the initial estimate is at least approximately correct.

4. The Differential Evolution algorithm almost always helps. However, it can be excruciatingly slow (running typically for multiple hours).

5. The systematic scan (task = estimate) sometimes helps. However, the “best” initial estimates almost never produce the desired solution!

6. DynaFit is not a “silver bullet”: You must still use your brain a lot.

