Comparative computational biology

Posted on 24-Feb-2016



Positive selection

What is positive selection?

Positive selection is selection for a particular trait, and it results in the increased frequency of an allele in a population.

Recurrent adaptation

[Figure: phylogeny of species A, B and C]

Intraspecific level: positive selection driving continuous adaptation to changes in the environment

Woolhouse et al., 2002, Nat. Genet.

Positive selection and recurrent selective sweeps

Positive selection and dynamics of polymorphisms

[Figure: allele frequencies in host-pathogen interactions; panels: Host, Pathogen]

Fixation of alleles

Diversifying positive selection

[Figure: phylogeny of species A, B and C]

Interspecific level: positive selection driving divergence

Why is it interesting to identify traits that have undergone, or are undergoing, positive selection?

Function, evolution, environment, …

How can we detect positive selection?

By comparing homologous sequences from different individuals and identifying an unexpectedly high proportion of non-neutral mutational changes.

Positive selection acts on beneficial mutations, which give rise to changes in the amino acid sequences of proteins.

Rate of synonymous mutations (no amino acid change): dS or KS

Rate of non-synonymous mutations (amino acid change): dN or KA
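To make the distinction concrete, here is a minimal Python sketch that labels a single-nucleotide change in a codon as synonymous or non-synonymous. It assumes the standard genetic code, built from the usual 64-letter table string with "*" for stop codons; `classify_mutation` is a hypothetical helper written for this illustration, not part of any library.

```python
from itertools import product

# Standard genetic code (assumed): codons in UCAG order (UUU, UUC, UUA, UUG, UCU, ...),
# "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_mutation(codon: str, position: int, new_base: str) -> str:
    """Hypothetical helper: label a change at a 0-based codon position."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[mutant] == CODON_TABLE[codon]:
        return "synonymous"
    return "non-synonymous"


print(classify_mutation("CCC", 2, "U"))  # CCC -> CCU, Pro -> Pro: synonymous
print(classify_mutation("UUU", 0, "G"))  # UUU -> GUU, Phe -> Val: non-synonymous
```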

To measure positive selection:

Codon    1    2    3    4    5
Seq 1   CCC  UUU  GGG  UUA  UUU
        Pro  Phe  Gly  Leu  Phe
Seq 2   CCC  UUC  GAG  CUA  GUA
        Pro  Phe  Glu  Leu  Val

How many possible non-synonymous and synonymous mutations are there?
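The amino acid rows of the alignment can be checked by translating both sequences. This sketch assumes the standard genetic code; `translate` is an illustrative helper that ignores any trailing partial codon.

```python
from itertools import product

# Standard genetic code (assumed), codons in UCAG order; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Illustrative helper: translate RNA codon by codon into one-letter codes."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


seq1 = "CCCUUUGGGUUAUUU"
seq2 = "CCCUUCGAGCUAGUA"
print(translate(seq1))  # PFGLF (Pro Phe Gly Leu Phe)
print(translate(seq2))  # PFELV (Pro Phe Glu Leu Val)
```

Codon 3 of Seq 2, GAG, translates to E (Glu); alanine would require a GCN codon.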

We need to know the degeneracy of each codon position to compute the number of possible synonymous sites in each codon (S):

Proline: CCU, CCC, CCA, CCG

The first two positions are non-degenerate sites (every possible nucleotide change results in an amino acid change), each counted as S = 0 (N = 1).

The third position is a fourfold degenerate site (every possible nucleotide is tolerated without an amino acid change), counted as S = 1 (N = 0).

Possible synonymous changes for proline: S = 0 + 0 + 1 = 1


Phenylalanine: UUU, UUC

The first two positions are non-degenerate sites, each counted as S = 0 (N = 1).

The third position is a twofold degenerate site (two of the four nucleotides give the same amino acid, so one of the three possible changes is synonymous), counted as S = 1/3 (N = 2/3).

Possible synonymous changes for phenylalanine: S = 0 + 0 + 1/3 = 1/3
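The per-position counting on these two slides can be sketched in Python, assuming the standard genetic code. The hypothetical helper `synonymous_alternatives` counts, for each position, how many of the three alternative nucleotides keep the amino acid (0 for a non-degenerate site, 1 for a twofold and 3 for a fourfold degenerate site), and `synonymous_sites` adds these up in thirds. Changes to a stop codon are simply counted as non-synonymous, a simplification of this sketch.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code (assumed), codons in UCAG order; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_alternatives(codon: str) -> list[int]:
    """Per position: how many of the 3 alternative bases keep the amino acid."""
    counts = []
    for pos in range(3):
        same = 0
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    same += 1
        counts.append(same)
    return counts


def synonymous_sites(codon: str) -> Fraction:
    """S for one codon: each position contributes (synonymous alternatives) / 3."""
    return sum((Fraction(c, 3) for c in synonymous_alternatives(codon)), Fraction(0))


for codon in ["CCC", "UUU"]:
    print(codon, synonymous_alternatives(codon), "S =", synonymous_sites(codon))
# CCC [0, 0, 3] S = 1
# UUU [0, 0, 1] S = 1/3
```

N for a codon is then 3 - S, so proline has N = 2 and phenylalanine N = 8/3.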

1. Proline (CCC): S = 0 + 0 + 1 = 1

2. Phenylalanine (UUU, UUC): S = 0 + 0 + 1/3 = 1/3

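3. Glycine (GGG): S = 0 + 0 + 1 = 1 (only the Seq 1 codon is counted here; averaging with Seq 2's GAG, for which S = 1/3, as is done for codons 4 and 5, would give S = 2/3)

4. Leucine: for UUA, S = 1/3 + 0 + 1/3 = 2/3; for CUA, S = 1/3 + 0 + 1 = 4/3. Taking the average of these: S = 1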

5. Phenylalanine (UUU): S = 1/3; valine (GUA): S = 1. Taking the average: S = 2/3

For the whole sequence: S = 1 + 1/3 + 1 + 1 + 2/3 = 4

N = total number of sites (5 codons x 3 = 15) - S = 15 - 4 = 11
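Summing over the alignment can be sketched as follows, assuming the standard genetic code. The illustrative function `sequence_sites` averages S over the two sequences at every codon and takes N as 3 x (number of codons) - S. Because it averages every codon, including codon 3 (GGG vs GAG), it returns S = 11/3 and N = 34/3; the slide's totals of 4 and 11 count only GGG at codon 3.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code (assumed), codons in UCAG order; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon: str) -> Fraction:
    """S for one codon: 1/3 for every single-base change that keeps the amino acid."""
    s = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    s += Fraction(1, 3)
    return s


def split_codons(seq: str) -> list[str]:
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def sequence_sites(seq1: str, seq2: str):
    """Illustrative: per-codon S averaged over both sequences, total S and N."""
    codons1, codons2 = split_codons(seq1), split_codons(seq2)
    if len(codons1) != len(codons2):
        raise ValueError("sequences must have the same number of codons")
    per_codon = [(synonymous_sites(a) + synonymous_sites(b)) / 2
                 for a, b in zip(codons1, codons2)]
    s_total = sum(per_codon, Fraction(0))
    return per_codon, s_total, 3 * len(codons1) - s_total


per_codon, s_total, n_total = sequence_sites("CCCUUUGGGUUAUUU", "CCCUUCGAGCUAGUA")
print([str(x) for x in per_codon])  # ['1', '1/3', '2/3', '1', '2/3']
print(s_total, n_total)             # 11/3 34/3
```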


Counts of synonymous and non-synonymous differences between the two sequences (Sd and Nd)


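Calculate Sd and Nd for each codon:

1. CCC --> CCC: Sd = 0, Nd = 0

2. UUU --> UUC: Sd = 1, Nd = 0

3. GGG --> GAG: Sd = 0, Nd = 1

4. UUA --> CUA: Sd = 1, Nd = 0

5. UUU --> GUA: two positions differ, so this could have happened in two ways:
Route 1: UUU --> GUU (Nd = 1) --> GUA (Sd = 1), giving Sd = 1, Nd = 1
Route 2: UUU --> UUA (Nd = 1) --> GUA (Nd = 1), giving Sd = 0, Nd = 2
Average: Sd = 0.5, Nd = 1.5

Total Sd = 0 + 1 + 0 + 1 + 0.5 = 2.5, total Nd = 0 + 0 + 1 + 0 + 1.5 = 2.5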
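The route-averaging for codon 5 generalizes to any pair of codons. This sketch, assuming the standard genetic code, enumerates every order in which the differing positions can change, counts synonymous and non-synonymous steps along each route, and averages over routes. Skipping routes that pass through an intermediate stop codon is a common convention adopted here as an assumption; `codon_differences` and `sequence_differences` are illustrative names.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard genetic code (assumed), codons in UCAG order; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_differences(c1: str, c2: str) -> tuple[Fraction, Fraction]:
    """Average (Sd, Nd) between two codons over all single-step routes."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return Fraction(0), Fraction(0)
    sd_sum, nd_sum, n_paths = Fraction(0), Fraction(0), 0
    for order in permutations(positions):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if step != c2 and CODON_TABLE[step] == "*":
                valid = False  # route passes through a stop codon: skip it
                break
            if CODON_TABLE[step] == CODON_TABLE[current]:
                sd += 1
            else:
                nd += 1
            current = step
        if valid:
            sd_sum += sd
            nd_sum += nd
            n_paths += 1
    if n_paths == 0:
        raise ValueError("every route passes through a stop codon")
    return sd_sum / n_paths, nd_sum / n_paths


def sequence_differences(seq1: str, seq2: str) -> tuple[Fraction, Fraction]:
    """Sum codon-by-codon Sd and Nd over two aligned coding sequences."""
    sd_sum, nd_sum = Fraction(0), Fraction(0)
    for i in range(0, len(seq1) - 2, 3):
        sd, nd = codon_differences(seq1[i:i + 3], seq2[i:i + 3])
        sd_sum += sd
        nd_sum += nd
    return sd_sum, nd_sum


print(codon_differences("UUU", "GUA"))  # (Fraction(1, 2), Fraction(3, 2))
sd_seq, nd_seq = sequence_differences("CCCUUUGGGUUAUUU", "CCCUUCGAGCUAGUA")
print(sd_seq, nd_seq)  # 5/2 5/2
```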

Finally, we can compute the rates of synonymous and non-synonymous changes:

Possible synonymous sites: S = 4
Possible non-synonymous sites: N = 11
Synonymous changes: Sd = 2.5
Non-synonymous changes: Nd = 2.5

Synonymous rate: KS = Sd / S = 2.5/4 = 0.625

Non-synonymous rate: KA = Nd / N = 2.5/11 = 0.227
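The arithmetic above is a ratio of counts to sites, so KS and KA here are uncorrected proportions of differences per site. The sketch below reproduces it from the slide's counts. The optional `jukes_cantor` function is an addition the slides do not use: it applies the Jukes-Cantor correction for multiple substitutions at the same site, as counting methods such as Nei-Gojobori's commonly do. Both function names are illustrative.

```python
import math


def divergence_rates(sd: float, nd: float, s: float, n: float):
    """Return (KS, KA, KA/KS) as uncorrected proportions of differences per site."""
    ks = sd / s
    ka = nd / n
    return ks, ka, ka / ks


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Counts from the worked example on the slide
ks, ka, ratio = divergence_rates(sd=2.5, nd=2.5, s=4, n=11)
print(f"KS = {ks:.3f}, KA = {ka:.3f}, KA/KS = {ratio:.2f}")  # KS = 0.625, KA = 0.227, KA/KS = 0.36
print(f"JC-corrected: dS = {jukes_cantor(ks):.3f}, dN = {jukes_cantor(ka):.3f}")
```

With saturation this strong (KS = 0.625 on a toy alignment), the correction changes the numbers a lot, which is one reason real analyses use much longer sequences.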

Evaluating the effect of selection by computing the ratio of non-synonymous to synonymous rates, KA/KS:

Neutral evolution: KA = KS (e.g. pseudogenes)

Purifying selection: KA < KS (e.g. housekeeping genes)

Positive selection: KA > KS (e.g. immune-related genes)

KA or dN: rate of non-synonymous divergence

KS or dS: rate of synonymous divergence
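The three cases amount to comparing KA/KS with 1. A minimal sketch of that comparison follows; the `tolerance` band treated as neutral is a hypothetical parameter, since real analyses test statistically whether the ratio differs from 1 rather than using a fixed cut-off.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.0) -> str:
    """Illustrative: label KA/KS as neutral, purifying or positive.

    tolerance is a hypothetical band around 1 that is treated as neutral.
    """
    if ks <= 0:
        raise ValueError("KS must be positive")
    ratio = ka / ks
    if abs(ratio - 1) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1 else "purifying"


print(classify_selection(0.227, 0.625))  # purifying
print(classify_selection(0.9, 0.3))      # positive
```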


For the example above: KA/KS = 0.227/0.625 = 0.36

[Figure: tree of species A, B and C; PN/PS measured within species, KA/KS between species]

Positive selection affects gene evolution both during species divergence and during the evolution of a population.

Within species:

Non-synonymous proportion of polymorphisms: PN = Nd / N

Synonymous proportion of polymorphisms: PS = Sd / S


We can use the two ratios, KA/KS and PN/PS, to infer when selection has acted (or is acting):

KA/KS (between species): past selection (during speciation)

PN/PS (within species): present-day selection (adaptation)

McDonald-Kreitman (MK) test: contrasting within- and between-species variation

Question: Are the adaptive mutations in the alcohol dehydrogenase (Adh) gene of Drosophila species a result of selection during species divergence or of current positive selection?

Drosophila alcohol dehydrogenase (Adh) dataset

Repl: non-synonymous (replacement); Syn: synonymous; Fixed: fixed differences (substitutions) between species; Poly: polymorphisms within species

The proportion of non-synonymous fixed differences between species is much higher than the proportion of non-synonymous polymorphisms.

The MK test contrasts synonymous and non-synonymous differences within and between species.

The resulting 2 x 2 contingency table can be tested with a G-test.
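A sketch of a G-test of independence on the MK table, in Python. The counts are those published by McDonald and Kreitman (1991) for Adh (7 replacement and 17 synonymous fixed differences; 2 replacement and 42 synonymous polymorphisms); the table itself is not reproduced in this transcript, so check them against the slide. Williams' correction and the p-value from `math.erfc` (the chi-square survival function for 1 degree of freedom) are choices of this sketch, and `g_test_2x2` is an illustrative name.

```python
import math


def g_test_2x2(table, williams=True):
    """Illustrative G-test of independence for a 2 x 2 table [[a, b], [c, d]].

    Returns (G, p); p = erfc(sqrt(G / 2)) is the chi-square tail with 1 df.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # 0 * log(0) is taken as 0
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    g *= 2
    if williams:
        # Williams' correction factor for a 2 x 2 table
        q = 1 + (n * sum(1 / r for r in row_totals) - 1) * (
            n * sum(1 / col for col in col_totals) - 1
        ) / (6 * n)
        g /= q
    return g, math.erfc(math.sqrt(g / 2))


# Rows: fixed differences, polymorphisms; columns: replacement (Repl), synonymous (Syn).
adh = [[7, 17], [2, 42]]

fixed_repl = adh[0][0] / sum(adh[0])  # 7 / 24
poly_repl = adh[1][0] / sum(adh[1])   # 2 / 44
g_raw, _ = g_test_2x2(adh, williams=False)
g_adj, p = g_test_2x2(adh)
print(f"Non-synonymous share: fixed {fixed_repl:.2f}, polymorphic {poly_repl:.3f}")
print(f"G = {g_raw:.2f} (uncorrected), {g_adj:.2f} (Williams), p = {p:.4f}")
```

With these counts G is about 7.9 uncorrected and about 7.4 after Williams' correction, with p < 0.01, in line with the conclusion below.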

Conclusion from the MK test: the Adh locus in Drosophila accumulated adaptive mutations (i.e. was under positive selection) when the Drosophila species diverged.

One problem with the “counting methods”

Sometimes the signal of selection is not very strong: positive selection may act on only one or a few particular codons, or in one particular branch, and averaging over the whole gene and all lineages dilutes it.

Evolutionary models can be used to detect selection in particular codons or branches.
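Codon substitution models treat ω = dN/dS as a parameter of the substitution process; site models let ω vary among codons, branch models among lineages, and the models are fitted by maximum likelihood, for example with codeml in PAML. The sketch below only builds the instantaneous rate matrix of a basic single-ω model in the style of Goldman and Yang (1994), assuming equal codon frequencies and no scaling; κ (`kappa`) is the transition/transversion rate ratio. `rate_matrix` is an illustrative name, and no likelihood fitting is shown.

```python
from itertools import product

# Standard genetic code (assumed), codons in UCAG order; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """A <-> G and C <-> U are transitions; other changes are transversions."""
    return (x in PURINES) == (y in PURINES)


def rate_matrix(kappa: float, omega: float) -> list[list[float]]:
    """Unscaled rates between the 61 sense codons (equal frequencies assumed).

    q_ij = pi_j, times kappa for a transition and times omega for a
    non-synonymous change, when the codons differ at exactly one position;
    0 otherwise. The diagonal makes each row sum to zero.
    """
    n = len(SENSE_CODONS)
    pi = 1.0 / n
    q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE_CODONS):
        for j, cj in enumerate(SENSE_CODONS):
            diff = [k for k in range(3) if ci[k] != cj[k]]
            if len(diff) != 1:
                continue
            k = diff[0]
            rate = pi
            if is_transition(ci[k], cj[k]):
                rate *= kappa
            if CODON_TABLE[ci] != CODON_TABLE[cj]:
                rate *= omega
            q[i][j] = rate
        q[i][i] = -sum(q[i])
    return q


Q = rate_matrix(kappa=2.0, omega=0.5)
idx = {c: i for i, c in enumerate(SENSE_CODONS)}
print(len(Q))                     # 61 sense codons
print(Q[idx["CCC"]][idx["CCU"]])  # synonymous transition: kappa * pi
print(Q[idx["GGG"]][idx["GAG"]])  # non-synonymous transition: kappa * omega * pi
```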