RA Fisher 1890 - 1962

"Natural selection is a mechanism for generating an exceedingly high degree of improbability"

Testing for the Extreme Value Domain of Attraction of Beneficial Fitness Effects

Craig J. Beisel
Bioinformatics and Computational Biology
Department of Mathematics
[email protected]

Concepts

Natural Selection: The differential survival and reproduction of individuals within a population based on hereditary characteristics.

Adaptation: The adjustment of an organism or population to a new or altered environment through genetic changes brought about by natural selection.

Phenotype: The overall attributes of an organism arising from the interaction of its genotype with the environment.

Genotype: The specific genetic makeup of an individual.

Fitness: Describes the ability of a genotype to reproduce. More formally, it is defined as the ratio of the count of a genotype after one generation to its count before.

Fitness Landscape: A function mapping genotype into fitness.

Fitness Distribution: The distribution of fitness for every possible genotype in a fixed environment.

[Figure: fitness distribution, with regions labeled Lethal, Moderate, and High]

Mutational Landscape Model

John Maynard Smith (1920 - 2004)

First remarked that adaptation does not take place in phenotypic space, but in sequence space…


Gillespie (1983)

Given a sequence of nucleotides of length $L$, there are $4^L$ possible sequences.

Each sequence has $3L$ neighboring sequences which are exactly one point mutation away.
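The two counts above can be checked by brute force for a short sequence. This is a minimal sketch; the function name `one_step_neighbors` and the example sequence are illustrative choices, not part of the original talk.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def one_step_neighbors(seq):
    """Return every sequence exactly one point mutation away from seq."""
    neighbors = []
    for pos, base in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != base:
                # Swap in one of the three other nucleotides at this position.
                neighbors.append(seq[:pos] + alt + seq[pos + 1:])
    return neighbors

L = 4  # kept small so that all 4**L sequences can be enumerated
all_sequences = ["".join(p) for p in product(NUCLEOTIDES, repeat=L)]
neighbors = one_step_neighbors("ACGT")

print(len(all_sequences), 4 ** L)  # number of possible sequences
print(len(neighbors), 3 * L)       # number of one-step neighbors
```

For a genome of a few thousand bases the full sequence space cannot be enumerated, but the $3L$ one-step neighborhood stays small enough to reason about, which is what the model relies on.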


Additionally, if we assume Strong Selection and Weak Mutation (SSWM), then we can ignore the possibility of clonal interference.

Formally, $2Ns \gg 1$ and $N\mu < 1$.

Therefore new mutants will fix (or not) in the population before the next mutant arises.

Also, double mutants and neutral/deleterious mutations can be ignored.


Consider a sequence in an environment where it is currently the most fit.

A small change occurs in the environment which shifts it to be the $i$th most fit sequence among its one-step mutant neighbors, where $i$ is small.

There are then $i-1$ more fit sequences which the population could move to.

Notice that the fitnesses of these sequences are in the tail of the fitness distribution.


We would like to find the probability of the population fixing mutant $j$ when starting with sequence $i$.

Since we are dealing with only the tail of the fitness distribution, we can apply EVT.

Orr's One Step Model

Assumptions

The fitness distribution is in the Gumbel domain of attraction, and therefore the fitnesses of the $i-1$ more fit one-step mutants can be considered to be drawn from an exponential distribution (the GPD with shape zero).

This will allow a result which is independent of the underlying fitness distribution.


Lemma

Let $X_1, \ldots, X_n$ be iid observations where $X_i \sim \mathrm{Exp}$, and let $X_{(1)} \geq \cdots \geq X_{(n)}$ be their corresponding order statistics, ranked from largest to smallest.

Then the spacings defined by $\Delta X_i = X_{(i)} - X_{(i+1)}$, $i = 1, \ldots, n-1$, are exponentially distributed and

$E(\Delta X_i) = E(\Delta X_1) / i$

Sukhatme (1937)
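The spacing result can be checked by simulation. The sketch below assumes the order statistics are ranked from largest to smallest, so the $i$th spacing is the gap between the $i$th and $(i+1)$th largest values, with the value below the smallest taken as 0. Under that convention the expected $i$th spacing of exponential draws with mean `theta` is `theta / i`. The function name, sample size, and seed are illustrative.

```python
import random

def mean_spacings(n, theta, reps, seed=0):
    """Estimate the mean gap between the i-th and (i+1)-th largest of n iid
    exponential draws with mean theta, for i = 1..n (a 0 is appended so the
    last gap is the smallest draw itself)."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        # expovariate takes a rate, so the mean is 1 / rate.
        xs = sorted((rng.expovariate(1.0 / theta) for _ in range(n)), reverse=True)
        xs.append(0.0)
        for i in range(n):
            totals[i] += xs[i] - xs[i + 1]
    return [t / reps for t in totals]

theta = 2.0
estimates = mean_spacings(n=5, theta=theta, reps=20000)
for i, est in enumerate(estimates, start=1):
    print(i, round(est, 3), "theory:", round(theta / i, 3))
```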


Since the probability of fixation of mutant $j$ is approximately $2s_j$ (Haldane 1927)…

Taking the expected value…

Notice, we have an expression for the expected transition probability which is independent of the fitness of the individual sequences and depends only on $i$ and $j$.
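The equations on these slides did not survive extraction. As a hedged illustration, the sketch below uses the closed form usually attributed to Orr (2002), $E[P_{ij}] = \frac{1}{i-1}\sum_{k=j}^{i-1}\frac{1}{k}$, and compares it with a simulation in which the top $i$ fitnesses are exponential tail draws and the chance of moving to mutant $j$ is $s_j / \sum_k s_k$. The formula as written here and the function names are assumptions brought in for the example, not text recovered from the talk.

```python
import random

def expected_transition(i, j):
    """Closed form attributed to Orr (2002): expected probability of moving
    from the sequence of fitness rank i to the mutant of rank j < i."""
    return sum(1.0 / k for k in range(j, i)) / (i - 1)

def simulated_transition(i, reps=20000, seed=1):
    """Monte Carlo estimate of the same probabilities for j = 1..i-1."""
    rng = random.Random(seed)
    totals = [0.0] * (i - 1)
    for _ in range(reps):
        # Fitnesses of the top i sequences: exponential draws, fittest first.
        w = sorted((rng.expovariate(1.0) for _ in range(i)), reverse=True)
        # Fitness advantage over the current sequence (rank i). Dividing by
        # w[i - 1] would give s_k, but that factor cancels in the ratio below.
        s = [w[k] - w[i - 1] for k in range(i - 1)]
        total = sum(s)
        for k in range(i - 1):
            totals[k] += s[k] / total  # fixation probability 2*s_k, normalized
    return [t / reps for t in totals]

i = 5
theory = [expected_transition(i, j) for j in range(1, i)]
simulated = simulated_transition(i)
for j, (t, m) in enumerate(zip(theory, simulated), start=1):
    print(j, round(t, 3), round(m, 3))
```

The simulation never uses the mean of the exponential, which is the sense in which the expected transition probability depends only on the ranks.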


Can this model be validated empirically?


Experimental Evolution

Natural isolate ID11 (~3% different from G4)
Microviridae
Host: E. coli
5577 bp

20 one-step walks
9 observed mutations

Rokyta et al. (2005)


Concluded that Orr's transition probabilities did not explain the data as well as the Wahl model, even after correcting the model for mutation bias.


Where did Orr go wrong?

Perhaps the tail of the fitness distribution is not in the Gumbel domain of attraction and therefore not exponentially distributed?

Extreme Value Theory

Field of statistical theory which attempts to describe the distribution of extreme values (maxima and minima) of a sample from a given probability distribution.


Notice that extreme values of a sample generally fall in the tail of the underlying probability distribution. For example, the maximum of a sample of size 10 from a standard normal distribution…
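The example can be reproduced numerically. This sketch draws many samples of size 10 from a standard normal, keeps each maximum, and reports where the maxima sit; the cutoff of 1.0 is an arbitrary choice for illustration.

```python
import random

def sample_maxima(sample_size, reps, seed=0):
    """Maximum of each of `reps` standard normal samples of the given size."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(sample_size))
            for _ in range(reps)]

maxima = sample_maxima(sample_size=10, reps=20000)
mean_max = sum(maxima) / len(maxima)
frac_above_one = sum(m > 1.0 for m in maxima) / len(maxima)

print("mean of the maxima:", round(mean_max, 2))
print("fraction of maxima above 1.0:", round(frac_above_one, 2))
```

A single standard normal draw exceeds 1.0 only about 16% of the time, so most of the maxima land in a region that is far out in the upper tail for an individual observation.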


Since the tail is all that must be considered, many results of extreme value theory are independent of the underlying probability distribution.

In fact, EVT shows almost all probability distributions can be classified into three groups by their tail behavior.


These three types are…

Weibull: finite-tail distributions
Gumbel: most common distributions (exponential, normal, gamma, etc.)
Fréchet: heavy-tail distributions (Cauchy)


EVT allows all three types of tail behavior to be described by the Generalized Pareto Distribution (GPD).

tau: scale
kappa: shape
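The GPD formula itself was an image and is not recoverable from the text. The sketch below assumes a common parameterization, $F(x) = 1 - (1 + \kappa x/\tau)^{-1/\kappa}$ for $\kappa \neq 0$ and $F(x) = 1 - e^{-x/\tau}$ for $\kappa = 0$, which may differ from the one shown on the slide. Under this assumed form, $\kappa < 0$ gives a finite right endpoint at $-\tau/\kappa$ (Weibull type), $\kappa = 0$ is the exponential (Gumbel type), and $\kappa > 0$ is heavy tailed (Fréchet type). The function names are illustrative.

```python
import math
import random

def gpd_cdf(x, tau, kappa):
    """P(X <= x) for a GPD with scale tau > 0 and shape kappa (assumed form)."""
    if x <= 0:
        return 0.0
    if abs(kappa) < 1e-12:
        return 1.0 - math.exp(-x / tau)  # exponential special case
    z = 1.0 + kappa * x / tau
    if z <= 0:
        return 1.0  # only reachable for kappa < 0: x is past -tau/kappa
    return 1.0 - z ** (-1.0 / kappa)

def gpd_sample(tau, kappa, rng):
    """Draw one GPD value by inverting the CDF at a uniform random number."""
    u = rng.random()
    if abs(kappa) < 1e-12:
        return -tau * math.log(1.0 - u)
    return tau / kappa * ((1.0 - u) ** (-kappa) - 1.0)

rng = random.Random(0)
for kappa in (-1.0, 0.0, 1.0):
    draws = [gpd_sample(1.0, kappa, rng) for _ in range(5)]
    print(kappa, [round(d, 3) for d in draws])
```

With `tau=1` and `kappa=-1` this form reduces to the uniform distribution on [0, 1], which makes a convenient sanity check.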


The GPD not only provides the natural alternative distribution for testing against the exponential in this context; the null model of $\kappa = 0$ is also nested within it, which allows the application of Maximum Likelihood and Likelihood Ratio Testing.

Maximum Likelihood and LRT

Log-Likelihood for the GPD is given…
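The log-likelihood on this slide was an image. Assuming the density $f(x) = \frac{1}{\tau}(1 + \kappa x/\tau)^{-1/\kappa - 1}$ (and $\frac{1}{\tau}e^{-x/\tau}$ at $\kappa = 0$), which may not match the slide's notation exactly, the log-likelihood of $n$ observations is $-n\log\tau - (1/\kappa + 1)\sum_i \log(1 + \kappa x_i/\tau)$. A minimal sketch follows; the data values are made up for illustration.

```python
import math

def gpd_loglik(data, tau, kappa):
    """GPD log-likelihood under the assumed parameterization; returns -inf
    when a parameter value puts any observation outside the support."""
    if tau <= 0:
        return -math.inf
    n = len(data)
    if abs(kappa) < 1e-12:
        # Exponential null model (kappa = 0).
        return -n * math.log(tau) - sum(data) / tau
    total = 0.0
    for x in data:
        ratio = kappa * x / tau
        if ratio <= -1.0:
            return -math.inf  # observation beyond the endpoint -tau/kappa
        total += math.log1p(ratio)
    return -n * math.log(tau) - (1.0 / kappa + 1.0) * total

data = [0.1, 0.5, 1.2]           # hypothetical fitness effects
tau_null = sum(data) / len(data)  # exponential MLE of the scale is the mean

print(gpd_loglik(data, tau_null, 0.0))  # maximized null log-likelihood
print(gpd_loglik(data, 1.0, 0.5))       # one point in the alternative
print(gpd_loglik(data, 1.0, -2.0))      # -inf: 0.5 and 1.2 lie past tau/2
```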


Distribution of the LRT test statistic?

Although a common approximation is to assume Chi-squared with one degree of freedom, this does not appear to be the case here.

Distribution of the test statistic was calculated using parametric bootstrap.
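A parametric bootstrap of the LRT statistic could look like the sketch below. It is an illustration, not the procedure used in the talk. It assumes the GPD density $\frac{1}{\tau}(1 + \kappa x/\tau)^{-1/\kappa - 1}$, replaces a proper optimizer with a coarse grid search, restricts $\kappa$ to $[-0.9, 2.0]$ so the likelihood stays bounded, and uses made-up data.

```python
import math
import random

def gpd_loglik(data, tau, kappa):
    """GPD log-likelihood (assumed parameterization); -inf outside the support."""
    n = len(data)
    if abs(kappa) < 1e-12:
        return -n * math.log(tau) - sum(data) / tau
    total = 0.0
    for x in data:
        ratio = kappa * x / tau
        if ratio <= -1.0:
            return -math.inf
        total += math.log1p(ratio)
    return -n * math.log(tau) - (1.0 / kappa + 1.0) * total

KAPPAS = [k / 10 for k in range(-9, 21)]  # coarse shape grid; contains 0.0

def lrt_statistic(data):
    """2 * (max log-lik under the GPD - max log-lik under the exponential)."""
    mean = sum(data) / len(data)
    null = gpd_loglik(data, mean, 0.0)  # exponential MLE is tau = mean
    taus = [mean * 1.15 ** k for k in range(-12, 13)]  # contains tau = mean
    alt = max(gpd_loglik(data, t, k) for k in KAPPAS for t in taus)
    return 2.0 * (alt - null)

def bootstrap_p_value(data, reps=200, seed=0):
    """Compare the observed statistic with statistics simulated under the null."""
    rng = random.Random(seed)
    observed = lrt_statistic(data)
    exceed = 0
    for _ in range(reps):
        # The statistic is scale invariant, so unit-mean exponential data
        # stand in for data simulated at the fitted null scale.
        sim = [rng.expovariate(1.0) for _ in range(len(data))]
        if lrt_statistic(sim) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (reps + 1)

# Hypothetical fitness effects, measured relative to the smallest observed.
data = [0.02, 0.05, 0.11, 0.18, 0.27, 0.31, 0.44, 0.52, 0.60, 0.75]
statistic, p_value = bootstrap_p_value(data)
print(statistic, p_value)
```

Because the grid contains the null fit itself, the statistic can never be negative. A real analysis would use a numerical optimizer and many more bootstrap replicates.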


Power

Probability of rejecting the null when the alternative is true.

$1 - P(\text{Type II error})$

Can we hope to reject the null with a given data set?



Sensitivity Analysis

Determine the inflation of the Type I error rate under violations of the null.

If the null is rejected, what is the chance that the rejection was due to inflation of alpha caused by violations of the assumptions of the null hypothesis?


Violations of the Null Assumptions

1. Small-effect mutations have a low probability of fixation and therefore may not be observed.

2. Observations include measurement error, which may be normal or log-normal.



The GPD is stable under shifts of the threshold, so analyze the data relative to the smallest observed value!
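The threshold stability can be verified directly. Under the assumed survival function $S(x) = (1 + \kappa x/\tau)^{-1/\kappa}$ (a common parameterization, not confirmed by the slide), the excess over a threshold $u$ is again GPD with the same shape $\kappa$ and scale $\tau + \kappa u$. Missing the small-effect mutations therefore changes the scale but not the shape being tested. The numbers below are illustrative.

```python
import math

def gpd_survival(x, tau, kappa):
    """P(X > x) for x >= 0 under the assumed GPD parameterization."""
    if abs(kappa) < 1e-12:
        return math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    return z ** (-1.0 / kappa) if z > 0 else 0.0

def excess_survival(y, u, tau, kappa):
    """P(X - u > y | X > u): survival of the excess over threshold u."""
    return gpd_survival(u + y, tau, kappa) / gpd_survival(u, tau, kappa)

tau, u = 1.0, 0.3
for kappa in (-0.5, 0.0, 0.5):
    for y in (0.1, 0.5, 1.0):
        shifted = excess_survival(y, u, tau, kappa)
        rescaled = gpd_survival(y, tau + kappa * u, kappa)
        # The two columns agree: same shape kappa, scale tau + kappa * u.
        print(kappa, y, round(shifted, 6), round(rescaled, 6))
```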



If measurement error is not considered and our test rejects, it is likely that we are safe in our conclusion, assuming the error is small.

In the event that we fail to reject, it is likely due to the loss of power encountered when operating under a false null hypothesis.

In this case, we must reanalyze our data incorporating measurement error.


The likelihood equation of normal or lognormal measurement error conditional on the GPD has no closed form ;(

Standard optimization procedures fail to converge…

Metropolis-Hastings and Bayesian Methods

MH Algorithm

Given X(t):

1. Generate Y(t) ~ g(y - x(t))

2. Take X(t+1) = Y(t) with probability min(1, f(Y(t))/f(X(t))), and X(t+1) = X(t) otherwise

If g(z) is normal (symmetric) then convergence to the posterior is assured
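The two steps above translate into a short random-walk sampler. In this sketch the target `log_target` is an unnormalized standard normal used as a stand-in. The talk's actual target, the posterior of the GPD parameters with measurement error, is not recoverable from the slides. The step size, chain length, and burn-in are arbitrary illustrative values.

```python
import math
import random

def metropolis(log_f, x0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler with a symmetric normal proposal."""
    rng = random.Random(seed)
    x = x0
    log_fx = log_f(x)
    chain = []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)  # step 1: propose Y(t) ~ g(y - x(t))
        log_fy = log_f(y)
        # Step 2: accept with probability min(1, f(y) / f(x)), compared on
        # the log scale to avoid underflow. 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_fy - log_fx:
            x, log_fx = y, log_fy
        chain.append(x)  # X(t+1) is either the proposal or the old state
    return chain

def log_target(x):
    """Stand-in target: log of an unnormalized standard normal density."""
    return -0.5 * x * x

chain = metropolis(log_target, x0=3.0, step=1.0, n_samples=20000)
kept = chain[2000:]  # discard an initial burn-in
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
print(round(mean, 2), round(var, 2))
```

Only the ratio f(Y(t))/f(X(t)) enters the acceptance step, so the target needs to be known only up to a constant, which is why the method suits a posterior with no closed-form normalizer.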


[Figure: four panels, each labeled tau=1, kappa=-2, sigma=.1]

Panel 1: mean=-1.64, 95% CI=(-.826, -2.70)
Panel 2: mean=.893, 95% CI=(.509, 1.41)
Panel 3: mean=-1.818, CI=(-1.47, -2.23)
Panel 4: mean=.083, 95% CI=(.034, .160)

Thanks to…

Darin Rokyta
Paul Joyce
Holly Wichman
Jim Bull
IBEST
NIH
E. coli

References

Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution 38:1116–1129.

Gillespie, J. H. 1991. The causes of molecular evolution. Oxford Univ. Press, New York.

Gumbel, E. J. 1958. Statistics of Extremes. Columbia Univ. Press, New York.

Orr, H. A. 2002. The population genetics of adaptation: The adaptation of DNA sequences. Evolution 56:1317–1330.

Orr, H. A. 2003a. The distribution of fitness effects among beneficial mutations. Genetics 163:1519–1526.

Rokyta, D. R., Joyce, P., Caudle, S. B., and Wichman, H. A. 2005. An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nat. Genet. 37:441–444.

Rokyta, D. R., Beisel, C. J., and Joyce, P. 2006. Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. Journal of Theoretical Biology 243(1):114–120.

Beisel, C. J., Rokyta, D. R., Wichman, H. A., and Joyce, P. Testing the extreme value domain of attraction for beneficial fitness effects. (Submitted, Genetics)
