RA Fisher 1890 - 1962

Date post: 13-Jan-2016
Page 1

RA Fisher 1890 - 1962

“Natural selection is a mechanism for generating an exceedingly high degree of improbability”

Page 2

Testing for the Extreme Value Domain of Attraction of Beneficial Fitness Effects

Craig J. Beisel

Bioinformatics and Computational Biology

Department of Mathematics

Page 3

Concepts

Natural Selection: The differential survival and reproduction of individuals within a population based on hereditary characteristics.

Page 4

Concepts

Adaptation: The adjustment of an organism or population to a new or altered environment through genetic changes brought about by natural selection.

Page 5

Concepts

Phenotype: The overall attributes of an organism arising from the interaction of its genotype with the environment.

Page 6

Concepts

Genotype: The specific genetic makeup of an individual.

Page 7

Concepts

Fitness: Describes the ability of a genotype to reproduce.

More formally, it is defined as the ratio of a genotype's count after one generation to its count before.
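As a small illustration of the ratio definition, the sketch below computes absolute fitness from hypothetical before/after counts and then a selection coefficient, taking the conventional s = w_mutant / w_wildtype − 1 (an assumption; the slide does not define s, although it is used later). The function names and counts are illustrative only.

```python
def absolute_fitness(count_before: int, count_after: int) -> float:
    """Absolute fitness: copies of a genotype after one generation divided by copies before."""
    if count_before <= 0:
        raise ValueError("the genotype must be present before the generation")
    return count_after / count_before


def selection_coefficient(w_mutant: float, w_wildtype: float) -> float:
    """Conventional selection coefficient of a mutant relative to the wild type (assumed form)."""
    return w_mutant / w_wildtype - 1.0


# Hypothetical counts, for illustration only.
w_wt = absolute_fitness(1000, 2000)       # 2.0
w_mut = absolute_fitness(100, 220)        # 2.2
s = selection_coefficient(w_mut, w_wt)    # approximately 0.1
print(w_wt, w_mut, round(s, 6))
```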

Page 8

Concepts

Fitness Landscape: A function mapping each genotype to its fitness.

Page 9

Concepts

Fitness Distribution: The distribution of fitness over all possible genotypes in a fixed environment.

[Figure: fitness distribution with regions labeled Lethal, Moderate, and High]

Page 10

Mutational Landscape Model

John Maynard Smith (1920 – 2004)

First remarked that adaptation does not take place in phenotypic space, but in sequence space…

Page 11

Mutational Landscape Model

Gillespie (1983)

Given a sequence of nucleotides of length $L$:

There are $4^L$ possible sequences.

Each sequence has $3L$ neighboring sequences that are exactly one point mutation away, since each of the $L$ sites can change to any of the 3 other nucleotides.
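The counts $4^L$ and $3L$ can be checked by brute force for a short sequence. This is a minimal sketch assuming the alphabet ACGT; the helper names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def one_step_neighbors(seq: str) -> list[str]:
    """Every sequence exactly one point mutation away from `seq`."""
    neighbors = []
    for pos, base in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != base:
                neighbors.append(seq[:pos] + alt + seq[pos + 1:])
    return neighbors


def all_sequences(length: int) -> list[str]:
    """Enumerate all 4**length sequences (feasible only for tiny lengths)."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=length)]


seq = "ACGTA"
nbrs = one_step_neighbors(seq)
print(len(nbrs))              # 15, i.e. 3 * len(seq)
print(len(all_sequences(3)))  # 64, i.e. 4**3
```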

Page 12

Mutational Landscape Model

Additionally, if we assume strong selection and weak mutation (SSWM), then we can ignore the possibility of clonal interference.

Formally, $2Ns \gg 1$ and $N\mu < 1$, where $N$ is the population size, $s$ the selection coefficient, and $\mu$ the mutation rate.

Therefore, each new mutant will fix (or be lost) in the population before the next mutant arises.

Also, double mutants and neutral/deleterious mutations can be ignored.

Page 13

Mutational Landscape Model

Consider a sequence in an environment where it is currently the most fit.

A small change in the environment shifts it to be the $i$th most fit sequence among its one-step mutant neighbors, where $i$ is small.

Page 14

Mutational Landscape Model

There are then $i-1$ more fit sequences to which the population could move.

Notice that the fitnesses of these sequences lie in the tail of the fitness distribution.

Page 15

Mutational Landscape Model

We would like to find the probability that the population fixes mutant $j$ when starting from sequence $i$.

Since we are dealing only with the tail of the fitness distribution, we can apply extreme value theory (EVT).

Page 16

Orr’s One Step Model

Assumptions

The fitness distribution is in the Gumbel domain of attraction, and therefore the fitness excesses of the $i-1$ more fit one-step mutants over the wild type can be treated as draws from an ‘exponential’ distribution, by the generalized Pareto distribution (GPD) limit for the tail.

This allows a result that is independent of the underlying fitness distribution.

Page 17

Orr’s One Step Model

Lemma

Let $X_1, \dots, X_n$ be iid observations with $X_i \sim \mathrm{Exp}(\lambda)$, and let $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$ be their order statistics in descending order.

Then the spacings $\Delta X_i = X_{(i)} - X_{(i+1)}$ (with $X_{(n+1)} = 0$) are independent and exponentially distributed, with

$E(\Delta X_i) = E(\Delta X_1)/i$

Sukhatme (1937)
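A quick Monte Carlo check of the Lemma, written as a sketch: it draws iid Exp(rate) samples, sorts them in descending order, and averages the spacings ΔX_i = X_(i) − X_(i+1) with X_(n+1) = 0. If the Lemma holds, i·E(ΔX_i) should be roughly the same for every i (namely 1/rate). The sample size, replicate count, and seed are arbitrary choices.

```python
import random


def mean_spacings(n: int, reps: int, rate: float = 1.0, seed: int = 1) -> list[float]:
    """Monte Carlo estimates of E(dX_i), i = 1..n, for n iid Exp(rate) draws.

    Order statistics are sorted in descending order and
    dX_i = X_(i) - X_(i+1), with X_(n+1) = 0.
    """
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        xs = sorted((rng.expovariate(rate) for _ in range(n)), reverse=True)
        xs.append(0.0)  # X_(n+1) = 0
        for i in range(n):
            totals[i] += xs[i] - xs[i + 1]
    return [t / reps for t in totals]


means = mean_spacings(n=5, reps=50_000)
for i, m in enumerate(means, start=1):
    # If the Lemma holds, i * E(dX_i) is about E(dX_1) = 1 / rate for every i.
    print(i, round(m, 3), round(i * m, 3))
```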

Page 18

Orr’s One Step Model

Since $\pi_j \approx 2s_j$ (Haldane 1927), where $\pi_j$ is the fixation probability of mutant $j$ and $s_j$ its selection coefficient…
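Under SSWM, and assuming each of the $i-1$ fitter neighbors arises at the same rate (no mutation bias), the probability of moving from the wild type (rank $i$) to mutant $j$ is the fixation probability of $j$ normalized over all fitter mutants. A sketch of this step, following the form used by Orr (2002):

```latex
P_{ij} = \frac{\pi_j}{\sum_{k=1}^{i-1} \pi_k}
       \approx \frac{2 s_j}{\sum_{k=1}^{i-1} 2 s_k}
       = \frac{s_j}{\sum_{k=1}^{i-1} s_k},
\qquad j = 1, \dots, i-1 .
```

The factor 2 cancels, so only the relative sizes of the selection coefficients matter.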

Page 19

Orr’s One Step Model

Taking the expected value…
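A sketch of the expectation step, assuming $P_{ij} \approx s_j / \sum_k s_k$ (from $\pi_j \approx 2s_j$) and, as in the Lemma, that the fitness advantages of the $i-1$ fitter mutants over the wild type are iid exponential. Writing $W_{(k)}$ for the fitness of rank $k$ (the wild type has rank $i$), the scaled spacings $k\,\Delta W_k$ are iid exponential, so by symmetry each ratio $k\,\Delta W_k / \sum_m m\,\Delta W_m$ has expectation $1/(i-1)$:

```latex
\begin{aligned}
s_j &= W_{(j)} - W_{(i)} = \sum_{k=j}^{i-1} \Delta W_k,
  \qquad \Delta W_k = W_{(k)} - W_{(k+1)},
  \qquad E(\Delta W_k) = \frac{E(\Delta W_1)}{k},\\
\sum_{m=1}^{i-1} s_m &= \sum_{m=1}^{i-1} m\,\Delta W_m,\\
E(P_{ij}) &= \sum_{k=j}^{i-1} \frac{1}{k}\,
  E\!\left(\frac{k\,\Delta W_k}{\sum_{m=1}^{i-1} m\,\Delta W_m}\right)
  = \frac{1}{i-1} \sum_{k=j}^{i-1} \frac{1}{k}.
\end{aligned}
```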

Page 20

Orr’s One Step Model

Notice that we have an expression for the expected transition probability that is independent of the fitnesses of the individual sequences and depends only on $i$ and $j$.
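Because the expression depends only on $i$ and $j$, it can be tabulated exactly. The sketch below uses the closed form commonly quoted from Orr (2002), $P_{ij} = \frac{1}{i-1}\sum_{k=j}^{i-1} 1/k$, with exact fractions so that each row visibly sums to one; the function name is hypothetical.

```python
from fractions import Fraction


def orr_transition_probability(i: int, j: int) -> Fraction:
    """Expected probability of moving from the wild type ranked i to the
    fitter one-step mutant ranked j (1 <= j < i): 1/(i-1) * sum_{k=j}^{i-1} 1/k."""
    if not 1 <= j < i:
        raise ValueError("need 1 <= j < i")
    return sum((Fraction(1, k) for k in range(j, i)), Fraction(0)) / (i - 1)


i = 5
row = [orr_transition_probability(i, j) for j in range(1, i)]
print([str(p) for p in row])  # ['25/48', '13/48', '7/48', '1/16']
print(sum(row))               # 1
```

The best mutant ($j = 1$) is the most likely destination, but with probability only about one half here, so the population often takes a smaller step.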

Page 21

Orr’s One Step Model

Can this model be validated empirically?

Page 22

Orr’s One Step Model

Experimental Evolution

Natural isolate ID11: ~3% different from G4

Microviridae; host: E. coli

5577 bp

Page 23

Orr’s One Step Model

20 one-step walks; 9 observed mutations

Rokyta et al. (2005)

Page 24

Orr’s One Step Model

Concluded that Orr’s transition probabilities did not explain the data as well as the Wahl model, even after correcting the model for mutation bias.

Page 25

Orr’s One Step Model

Where did Orr go wrong?

Perhaps the tail of the fitness distribution is not in the Gumbel domain of attraction, and is therefore not exponentially distributed?

Page 26

Extreme Value Theory

Extreme Value Theory: The field of statistical theory that attempts to describe the distribution of the extreme values (maxima and minima) of a sample from a given probability distribution.

Page 27

Extreme Value Theory

Notice that the extreme values of a sample generally fall in the tail of the underlying probability distribution. For example, the maximum of a sample of size 10 from a standard normal distribution…
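The point can be illustrated by simulation (a sketch, not the figure from the talk): repeatedly draw 10 standard normal values and keep the maximum. The maxima average roughly 1.54, and about 82% of them exceed 1, even though a single N(0, 1) draw exceeds 1 only about 16% of the time. The replicate count and seed are arbitrary.

```python
import random
import statistics


def sample_maxima(sample_size: int, reps: int, seed: int = 10) -> list[float]:
    """Draw `reps` samples of `sample_size` standard normals; return each sample's maximum."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(sample_size)) for _ in range(reps)]


maxima = sample_maxima(sample_size=10, reps=20_000)
mean_max = statistics.fmean(maxima)
share_above_1 = sum(m > 1.0 for m in maxima) / len(maxima)
print(round(mean_max, 2))       # close to 1.54
print(round(share_above_1, 2))  # close to 0.82
```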

Page 28

Extreme Value Theory

Since only the tail must be considered, many results of extreme value theory are independent of the underlying probability distribution.

In fact, EVT shows that almost all commonly used probability distributions can be classified into three groups by their tail behavior.

Page 29

Extreme Value Theory

These three types are…

Weibull: finite-tail distributions

Gumbel: most common distributions (exponential, normal, gamma, etc.)

Fréchet: heavy-tail distributions (e.g., Cauchy)

Page 30

Extreme Value Theory

EVT allows all three types of tail behavior to be described by the Generalized Pareto Distribution (GPD).

tau: scale; kappa: shape
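For reference, one common way to write the GPD with scale τ and shape κ is F(x) = 1 − (1 + κx/τ)^(−1/κ) for x ≥ 0, with the exponential 1 − e^(−x/τ) as the κ → 0 limit. This sign convention is an assumption (some texts flip the sign of the shape parameter); under it, κ < 0 gives a finite upper endpoint −τ/κ (Weibull domain), κ = 0 the Gumbel domain, and κ > 0 a heavy tail (Fréchet domain). A minimal sketch:

```python
import math


def gpd_cdf(x: float, tau: float, kappa: float) -> float:
    """GPD cdf F(x) = 1 - (1 + kappa*x/tau)**(-1/kappa) for x >= 0 (assumed convention).
    kappa = 0 is the exponential limit; kappa < 0 has upper endpoint -tau/kappa."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    if x <= 0:
        return 0.0
    if kappa == 0:
        return 1.0 - math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    if z <= 0:  # at or beyond the upper endpoint (kappa < 0)
        return 1.0
    return 1.0 - z ** (-1.0 / kappa)


def gpd_pdf(x: float, tau: float, kappa: float) -> float:
    """GPD density f(x) = (1/tau) * (1 + kappa*x/tau)**(-1/kappa - 1) on its support."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    if x < 0:
        return 0.0
    if kappa == 0:
        return math.exp(-x / tau) / tau
    z = 1.0 + kappa * x / tau
    if z <= 0:
        return 0.0
    return z ** (-1.0 / kappa - 1.0) / tau


print(gpd_cdf(1.0, 1.0, 1e-9), gpd_cdf(1.0, 1.0, 0.0))  # both about 0.632
print(gpd_cdf(0.5, 1.0, -2.0))  # 1.0: with tau = 1, kappa = -2 the upper endpoint is 0.5
```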

Page 32

Extreme Value Theory

The GPD not only provides the natural alternative distribution for testing against the exponential in this context; the null model, $\kappa = 0$, is also nested within it, which allows the application of maximum likelihood and likelihood ratio testing.

Page 33

Maximum Likelihood and LRT

The log-likelihood for the GPD is given by…
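Assuming the parameterization $F(x) = 1 - (1 + \kappa x/\tau)^{-1/\kappa}$, i.e. density $f(x) = \tau^{-1}(1 + \kappa x/\tau)^{-1/\kappa - 1}$, the log-likelihood of observations $x_1, \dots, x_n$ takes the standard form below (valid when $1 + \kappa x_i/\tau > 0$ for every $i$). The likelihood ratio statistic compares the GPD fit with the exponential fit, whose MLE is $\hat\tau_0 = \bar{x}$:

```latex
\ell(\tau,\kappa) =
\begin{cases}
-n\log\tau - \left(1 + \dfrac{1}{\kappa}\right)\displaystyle\sum_{i=1}^{n} \log\!\left(1 + \dfrac{\kappa x_i}{\tau}\right), & \kappa \neq 0,\\[2ex]
-n\log\tau - \dfrac{1}{\tau}\displaystyle\sum_{i=1}^{n} x_i, & \kappa = 0,
\end{cases}
\qquad
\Lambda = 2\left[\ell(\hat\tau,\hat\kappa) - \ell(\bar{x}, 0)\right].
```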

Page 34

Maximum Likelihood and LRT

Distribution of the LRT statistic?

Although a common approximation is to assume a chi-squared distribution with one degree of freedom, this does not appear to be the case here.

The distribution of the test statistic was therefore calculated using a parametric bootstrap.
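A minimal sketch of a parametric bootstrap for this test, assuming the parameterization F(x) = 1 − (1 + κx/τ)^(−1/κ). It is not the authors' code: the GPD fit uses a crude grid over θ = κ/τ, for which the likelihood is maximized in closed form by κ̂(θ) = mean(log(1 + θx_i)) and τ̂ = κ̂/θ, and it excludes κ < −1, where the GPD likelihood is unbounded. The grid size, the upper limit u_max, the number of replicates, and the simulated data are all illustrative choices.

```python
import math
import random


def exp_loglik_max(x):
    """Maximized exponential log-likelihood (the MLE of tau is the sample mean)."""
    n = len(x)
    return -n * math.log(sum(x) / n) - n


def gpd_loglik_max(x, grid_size=400, u_max=20.0):
    """Approximate maximized GPD log-likelihood by a grid over theta = kappa / tau.
    For fixed theta, kappa_hat = mean(log(1 + theta * x)) and tau_hat = kappa_hat / theta,
    giving the profile value -n*log(tau_hat) - n*(1 + kappa_hat)."""
    n, x_max = len(x), max(x)
    best = -math.inf
    for g in range(grid_size):
        u = -0.999 + (u_max + 0.999) * g / (grid_size - 1)  # u = theta * x_max > -1
        if abs(u) < 1e-9:
            continue  # theta = 0 is the exponential (null) model
        theta = u / x_max
        kappa = sum(math.log1p(theta * xi) for xi in x) / n
        if kappa < -1.0:
            continue  # the likelihood is unbounded for kappa < -1
        tau = kappa / theta
        best = max(best, -n * math.log(tau) - n * (1.0 + kappa))
    return best


def lrt_statistic(x):
    """LRT statistic for H0: kappa = 0 (exponential) against the GPD, floored at 0."""
    return max(0.0, 2.0 * (gpd_loglik_max(x) - exp_loglik_max(x)))


def bootstrap_pvalue(x, reps=200, seed=1):
    """Parametric bootstrap: simulate same-size data sets from the fitted exponential
    null and count statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n, tau0 = len(x), sum(x) / len(x)
    observed = lrt_statistic(x)
    exceed = sum(
        lrt_statistic([rng.expovariate(1.0 / tau0) for _ in range(n)]) >= observed
        for _ in range(reps)
    )
    return observed, (exceed + 1) / (reps + 1)


# Hypothetical data: 9 simulated effects, not the Rokyta et al. (2005) measurements.
data_rng = random.Random(7)
data = [data_rng.uniform(0.01, 0.5) for _ in range(9)]
observed, p_value = bootstrap_pvalue(data)
print(round(observed, 3), round(p_value, 3))
```

Because the statistic is scale-invariant, simulating from the fitted exponential or from Exp(1) gives the same null distribution; the fitted mean is used only for readability.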

Page 35

Maximum Likelihood and LRT

Power

The probability of rejecting the null when the alternative is true:

1 - P(Type II error)

Can we hope to reject the null with a given data set?

Page 37

Maximum Likelihood and LRT

Sensitivity Analysis

Determine the inflation of the Type I error rate under violations of the null assumptions.

If the null is rejected, what is the chance that the rejection was due to an inflated alpha caused by violations of the assumptions of the null hypothesis?

Page 38

Maximum Likelihood and LRT

Violations of the Null Assumptions

1. Small-effect mutations have a low probability of fixation and therefore may not be observed.

2. Observations include measurement error, which may be normal or log-normal.

Page 40

Maximum Likelihood and LRT

The GPD is stable under shifts of the threshold, so analyze the data relative to the smallest observed value!
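The threshold-stability property can be checked numerically. Assuming the parameterization with survival function S(x) = (1 + κx/τ)^(−1/κ), the excess X − u of a GPD variable over a threshold u (given X > u) is again GPD with the same shape κ and scale τ + κu. In particular, exponential (κ = 0) data stay exponential after the shift, so subtracting the smallest observed value does not change the hypothesis being tested. A sketch:

```python
import math


def gpd_survival(x: float, tau: float, kappa: float) -> float:
    """P(X > x) = (1 + kappa*x/tau)**(-1/kappa); exp(-x/tau) when kappa = 0."""
    if x <= 0:
        return 1.0
    if kappa == 0:
        return math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    return 0.0 if z <= 0 else z ** (-1.0 / kappa)


def exceedance_survival(y: float, u: float, tau: float, kappa: float) -> float:
    """P(X - u > y | X > u), computed directly from the original GPD."""
    return gpd_survival(u + y, tau, kappa) / gpd_survival(u, tau, kappa)


u = 0.8
for kappa in (-0.5, 0.0, 0.3):
    for y in (0.1, 0.5, 1.0):
        direct = exceedance_survival(y, u, 1.0, kappa)
        shifted = gpd_survival(y, 1.0 + kappa * u, kappa)  # GPD(tau + kappa*u, kappa)
        print(kappa, y, round(direct, 6), round(shifted, 6))  # the two columns agree
```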

Page 42

Maximum Likelihood and LRT

If measurement error is not considered and our test rejects, we are likely safe in our conclusion, assuming the error is small.

If we fail to reject, it is likely due to the loss of power encountered when operating under a false null hypothesis.

In this case, we must reanalyze our data incorporating measurement error.

Page 43

Maximum Likelihood and LRT

The likelihood for normal or log-normal measurement error, conditional on the GPD, has no closed form ;(
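Although there is no closed form, the density of a single observation with normal measurement error, Y = X + E with X ~ GPD(τ, κ) and E ~ N(0, σ²), can be evaluated by one-dimensional numerical integration. The sketch below assumes the parameterization F(x) = 1 − (1 + κx/τ)^(−1/κ), covers only the normal-error case, and uses Simpson's rule; for κ = 0 it can be checked against the known exponentially modified Gaussian density. For κ < −1 the GPD density is unbounded at its endpoint, and a plain Simpson rule is only a rough approximation there.

```python
import math


def gpd_pdf(x: float, tau: float, kappa: float) -> float:
    """GPD density (1/tau)(1 + kappa*x/tau)**(-1/kappa - 1); zero off the support."""
    if x < 0:
        return 0.0
    if kappa == 0:
        return math.exp(-x / tau) / tau
    z = 1.0 + kappa * x / tau
    return 0.0 if z <= 0 else z ** (-1.0 / kappa - 1.0) / tau


def density_with_normal_error(y, tau, kappa, sigma, n=2000, width=10.0):
    """Density of Y = X + E, X ~ GPD(tau, kappa), E ~ N(0, sigma^2), via Simpson's rule
    over x in [y - width*sigma, y + width*sigma] clipped to the GPD support (n even)."""
    lo = max(0.0, y - width * sigma)
    hi = y + width * sigma
    if kappa < 0:
        hi = min(hi, -tau / kappa)  # finite upper endpoint
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    norm = sigma * math.sqrt(2.0 * math.pi)

    def integrand(x):
        e = (y - x) / sigma
        return gpd_pdf(x, tau, kappa) * math.exp(-0.5 * e * e) / norm

    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0


def emg_density(y, tau, sigma):
    """Closed form for kappa = 0 (exponentially modified Gaussian), used as a check."""
    z = y / sigma - sigma / tau
    normal_cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.exp(0.5 * (sigma / tau) ** 2 - y / tau) * normal_cdf / tau


for y in (-0.2, 0.05, 0.5, 2.0):
    print(y, density_with_normal_error(y, 1.0, 0.0, 0.1), emg_density(y, 1.0, 0.1))
```

A full likelihood would sum the logs of these densities over the observations, which is one reason evaluating and optimizing it is expensive.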

Page 45

Maximum Likelihood and LRT

Standard optimization procedures fail to converge…

Page 46

Metropolis-Hastings and Bayesian Methods

MH Algorithm

Given X(t):

1. Generate Y(t) ~ g(y - X(t)).

2. Take X(t+1) = Y(t) with probability min(1, f(Y(t))/f(X(t))), and X(t+1) = X(t) otherwise.

If g(z) is normal (or any other symmetric density), the proposal terms cancel from the acceptance ratio, and under mild regularity conditions the chain converges to the target posterior f.
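A minimal random-walk Metropolis sketch of the two steps above, with a normal proposal so that g cancels from the acceptance ratio. The target f is passed as a log-density, an implementation choice that avoids numerical underflow. In the talk f would be the posterior of the GPD and measurement-error parameters, which is not reproduced here, so a standard normal target stands in for it; the step size, chain length, and burn-in are arbitrary.

```python
import math
import random
import statistics


def metropolis(log_target, x0, n_steps, step_sd, seed=0):
    """Random-walk Metropolis. log_target(x) returns log f(x) up to an additive
    constant; x0 must have positive target density."""
    rng = random.Random(seed)
    x, log_fx = x0, log_target(x0)
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step_sd)  # step 1: Y(t) ~ g(y - X(t)), g normal
        log_fy = log_target(y)
        # step 2: accept with probability min(1, f(Y(t)) / f(X(t)))
        if log_fy >= log_fx or rng.random() < math.exp(log_fy - log_fx):
            x, log_fx = y, log_fy
            accepted += 1
        chain.append(x)  # X(t+1)
    return chain, accepted / n_steps


chain, acc_rate = metropolis(lambda t: -0.5 * t * t, x0=3.0, n_steps=50_000, step_sd=2.4)
kept = chain[1_000:]  # discard burn-in
print(round(statistics.fmean(kept), 2), round(statistics.pvariance(kept), 2), round(acc_rate, 2))
```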

Page 47

Metropolis-Hastings and Bayesian Methods

tau = 1, kappa = -2, sigma = 0.1: posterior mean = -1.64, 95% CI = (-2.70, -0.826)

Page 48

Metropolis-Hastings and Bayesian Methods

tau = 1, kappa = -2, sigma = 0.1: posterior mean = 0.893, 95% CI = (0.509, 1.41)

Page 49

Metropolis-Hastings and Bayesian Methods

tau = 1, kappa = -2, sigma = 0.1: posterior mean = -1.818, CI = (-2.23, -1.47)

Page 50

Metropolis-Hastings and Bayesian Methods

tau = 1, kappa = -2, sigma = 0.1: posterior mean = 0.083, 95% CI = (0.034, 0.160)

Page 51

Thanks to…

Darin Rokyta

Paul Joyce

Holly Wichman

Jim Bull

IBEST

NIH

E. coli

Page 52

References

Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution 38:1116–1129.

Gillespie, J. H. 1991. The causes of molecular evolution. Oxford Univ. Press, New York.

Gumbel, E. J. 1958. Statistics of Extremes. Columbia Univ. Press, New York.

Orr, H. A. 2002. The population genetics of adaptation: The adaptation of DNA sequences. Evolution 56:1317–1330.

Orr, H. A. 2003a. The distribution of fitness effects among beneficial mutations. Genetics 163:1519–1526.

Rokyta, D. R., P. Joyce, S. B. Caudle, and H. A. Wichman. 2005. An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nat. Genet. 37:441–444.

Rokyta, D. R., C. J. Beisel, and P. Joyce. 2006. Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. J. Theor. Biol. 243:114–120.

Beisel, C. J., D. R. Rokyta, H. A. Wichman, and P. Joyce. Testing the extreme value domain of attraction for beneficial fitness effects. Submitted to Genetics.

