Evolutionary Change in Nucleotide Sequences

Post on 11-Feb-2016


Evolutionary Change in Nucleotide Sequences. Dan Graur. - PowerPoint PPT Presentation


1

Evolutionary Change in Nucleotide Sequences

Dan Graur

2

So far, we described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation in a single individual, progressively increase their frequency and ultimately become fixed in the population.

3

We may look at the process from a different point of view. An allele that becomes fixed is different in its sequence from the allele that it replaces. That is, the substitution of a new allele for an old one is the substitution of a new sequence for a previous sequence.


4

If we use a time scale in which one time unit is larger than the time of fixation, then the DNA sequence at any given locus will appear to change with time:

1. actgggggtaaactatcggtatagatcat
2. actgggggttaactatcggtatagatcat
3. actgggggtgaactatcggtatagatcat
4. actgggggtgaactatcggtacagatcat

5


Nucleotide Substitution

6

To study the dynamics of nucleotide substitution, we must make several assumptions regarding the probability of substitution of a nucleotide by another.

7

Jukes & Cantor’s one-parameter model

8

Assumption:

• Substitutions occur with equal probabilities among the four nucleotide types.

9

If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_A(t)$, that this site will be occupied by A at time t?

10

Since we start with A, $P_A(0) = 1$. At time 1, the probability of still having A at this site is

$$P_A(1) = 1 - 3\alpha$$

where 3α is the probability of A changing to T, C, or G, and 1 − 3α is the probability that A has remained unchanged.

11

To derive the probability of having A at time 2, we consider two possible scenarios:

1. The nucleotide has remained unchanged from time 0 to time 2.

2. The nucleotide has changed to T, C, or G at time 1, but has reverted to A at time 2.

12

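$$P_A(2) = (1 - 3\alpha)P_A(1) + \alpha\left[1 - P_A(1)\right]$$

The first term corresponds to scenario 1. The second corresponds to scenario 2: the site is occupied by a nucleotide other than A at time 1 with probability $1 - P_A(1)$, and that nucleotide changes back to A with probability α.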

13

The following equation applies to any t and t + 1:

$$P_A(t+1) = (1 - 3\alpha)P_A(t) + \alpha\left[1 - P_A(t)\right]$$

14

We can rewrite the equation in terms of the amount of change in $P_A(t)$ per unit time as:

$$\Delta P_A(t) = P_A(t+1) - P_A(t) = -3\alpha P_A(t) + \alpha\left[1 - P_A(t)\right] = -4\alpha P_A(t) + \alpha$$

15

We approximate the discrete-time process by a continuous-time model by regarding $\Delta P_A(t)$ as the rate of change at time t:

$$\frac{dP_A(t)}{dt} = -4\alpha P_A(t) + \alpha$$

16

The solution is:

$$P_A(t) = \frac{1}{4} + \left(P_A(0) - \frac{1}{4}\right)e^{-4\alpha t}$$

17

In the Jukes and Cantor model, the probability of each of the four nucleotides at equilibrium (t = ∞) is 1/4.

$$P_A(0) = 0:\quad P_A(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$$

$$P_A(0) = 1:\quad P_A(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$$
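A minimal Python sketch, assuming a hypothetical rate α = 0.001 per time unit, compares the discrete recursion with the continuous solution. The recursion can be rewritten as $P_A(t+1) - \tfrac{1}{4} = (1 - 4\alpha)\left(P_A(t) - \tfrac{1}{4}\right)$, so it shrinks the distance from 1/4 by a factor (1 − 4α) per time unit, whereas the continuous model uses $e^{-4\alpha t}$; the two agree closely when α is small. The function names are illustrative only.

```python
import math


def pa_discrete(alpha: float, t: int, pa0: float = 1.0) -> float:
    """Iterate P_A(t+1) = (1 - 3*alpha) * P_A(t) + alpha * (1 - P_A(t)) for t steps."""
    pa = pa0
    for _ in range(t):
        pa = (1 - 3 * alpha) * pa + alpha * (1 - pa)
    return pa


def pa_continuous(alpha: float, t: float, pa0: float = 1.0) -> float:
    """Solution of dP_A/dt = -4*alpha*P_A + alpha: 1/4 + (P_A(0) - 1/4) e^(-4 alpha t)."""
    return 0.25 + (pa0 - 0.25) * math.exp(-4 * alpha * t)


alpha = 0.001  # hypothetical rate of each specific substitution per time unit
for t in (0, 100, 1000, 10000):
    # both columns start at P_A(0) = 1 and approach the equilibrium value 1/4
    print(t, round(pa_discrete(alpha, t), 4), round(pa_continuous(alpha, t), 4))
```

Setting `pa0=0.0` gives the other special case, in which the frequency of A rises from 0 toward 1/4.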

18

So far, we treated $P_A(t)$ as a probability.

However, $P_A(t)$ can also be interpreted as the frequency of A in a DNA sequence at time t.

For example, if we start with a sequence made of adenines only, then $P_A(0) = 1$, and $P_A(t)$ is the expected frequency of A in the sequence at time t.

The expected frequency of A in the sequence at equilibrium will be 1/4, and so will the expected frequencies of T, C, and G.

19

After reaching equilibrium no further change in the nucleotide frequencies is expected to occur. However, the actual frequencies of the nucleotides will remain unchanged only in DNA sequences of infinite length. In practice, fluctuations in nucleotide frequencies are likely to occur.
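To see such fluctuations, the following Python sketch (a hypothetical simulation, not part of the original material) evolves an all-A sequence under a discrete-time version of the Jukes and Cantor scheme: in each time unit, every site changes with probability 3α to one of the other three nucleotides, chosen with equal probability. The rate, number of steps, and seed are arbitrary assumptions.

```python
import random


def simulate_jc(length: int, alpha: float, steps: int, seed: int = 1) -> float:
    """Evolve an all-A sequence for `steps` time units under a discrete-time
    Jukes-Cantor scheme and return the final frequency of A."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    seq = ["A"] * length
    for _ in range(steps):
        for i, base in enumerate(seq):
            # with probability 3*alpha the site changes to one of the other three
            if rng.random() < 3 * alpha:
                seq[i] = rng.choice([b for b in "ACGT" if b != base])
    return seq.count("A") / length


for length in (100, 1000):
    freq_a = simulate_jc(length, alpha=0.01, steps=300)
    # observed frequency of A and its departure from the equilibrium value 1/4
    print(length, freq_a, abs(freq_a - 0.25))
```

After 300 steps the expected frequency is essentially 1/4, so any remaining departure is sampling fluctuation; it is typically larger for the shorter sequence.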

20

21

Kimura’s two-parameter model

22

Assumptions:

• The rate of transitional substitution at each nucleotide site is α per unit time.

• The rate of each type of transversional substitution is β per unit time.

23

$\alpha / \beta \approx 5\text{--}10$

24

If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_{AA}(t)$, that this site will be occupied by A at time t?

25

After one time unit, the probability of A changing into G is α, the probability of A changing into C is β, and the probability of A changing into T is β. Thus, the probability of A remaining unchanged after one time unit is:

$$P_{AA}(1) = 1 - \alpha - 2\beta$$

26

To derive the probability of having A at time 2, we consider four possible scenarios:

27

1. A remained unchanged at t = 1 and t = 2

28

2. A changed into G at t = 1 and reverted by a transition to A at t = 2

29

3. A changed into C at t = 1 and reverted by a transversion to A at t = 2

30

4. A changed into T at t = 1 and reverted by a transversion to A at t = 2
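Adding the four scenarios, each as the product of the two one-step probabilities given above (a worked step not spelled out on the slides), yields:

$$P_{AA}(2) = (1-\alpha-2\beta)^2 + \alpha\cdot\alpha + \beta\cdot\beta + \beta\cdot\beta = (1-\alpha-2\beta)^2 + \alpha^2 + 2\beta^2$$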

31

$$X(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$$

X(t) = the probability that the nucleotide at a site at time t is identical to that at time 0.

At equilibrium (t = ∞), the equation reduces to X(∞) = 1/4. Thus, as in Jukes and Cantor's model, the equilibrium frequencies of the four nucleotides are 1/4.

3 probabilities

32

$$Y(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} - \frac{1}{2}e^{-2(\alpha+\beta)t}$$

Y(t) = the probability that the initial nucleotide and the nucleotide at time t differ from each other by a transition.

Because of the symmetry of the substitution scheme, $Y(t) = P_{AG}(t) = P_{GA}(t) = P_{TC}(t) = P_{CT}(t)$.


33

$$Z(t) = \frac{1}{4} - \frac{1}{4}e^{-4\beta t}$$

Z(t) = the probability that the nucleotide at time t and the initial nucleotide differ by a specific type of transversion.


34

Each nucleotide is subject to two types of transversion but only one type of transition. Therefore, the probability that the initial nucleotide and the nucleotide at time t differ by a transversion is 2Z(t), twice the probability that they differ by a specific type of transversion. The three probabilities account for all outcomes:

$$X(t) + Y(t) + 2Z(t) = 1$$
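As a numerical check, the following Python sketch evaluates X(t), Y(t), and Z(t) and compares them with transition probabilities computed directly from the continuous-time substitution scheme, approximating the matrix exponential exp(Qt) by repeated squaring. The rate matrix Q, the nucleotide ordering, and the parameter values are assumptions made for illustration.

```python
import math

BASES = "AGCT"  # purines (A, G) first, then pyrimidines (C, T)
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def kimura_xyz(alpha: float, beta: float, t: float) -> tuple[float, float, float]:
    """X(t), Y(t), Z(t) of the two-parameter model for rates alpha and beta."""
    e4b = math.exp(-4 * beta * t)
    e2ab = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e4b + 0.5 * e2ab
    y = 0.25 + 0.25 * e4b - 0.5 * e2ab
    z = 0.25 - 0.25 * e4b
    return x, y, z


def rate_matrix(alpha: float, beta: float) -> list[list[float]]:
    """Instantaneous rates: alpha for the transition, beta for each transversion."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = alpha if TRANSITION_PARTNER[a] == b else beta
        q[i][i] = -(alpha + 2 * beta)  # each row sums to zero
    return q


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def expm(q: list[list[float]], t: float, squarings: int = 20) -> list[list[float]]:
    """Approximate exp(Qt) as (I + Qt/2^s)^(2^s) using s repeated squarings."""
    n = 2 ** squarings
    m = [[(1.0 if i == j else 0.0) + q[i][j] * t / n for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        m = matmul(m, m)
    return m


alpha, beta, t = 0.3, 0.05, 2.0  # hypothetical values
x, y, z = kimura_xyz(alpha, beta, t)
p = expm(rate_matrix(alpha, beta), t)  # row 0 = starting from A
print(round(x + y + 2 * z, 12))  # 1.0
print(abs(p[0][0] - x) < 1e-5, abs(p[0][1] - y) < 1e-5, abs(p[0][2] - z) < 1e-5)  # True True True
```

Column 1 of row 0 is A→G (a transition, Y), and columns 2 and 3 are A→C and A→T (the two transversions, each Z).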

35

Problem with the “t” approach. Too long even for Methuselah, who is said to have lived 969 years (Genesis 5:27).

36

37

38


39

40

NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES

41

After two nucleotide sequences diverge from each other, each of them will start accumulating nucleotide substitutions.

If two sequences of length N differ from each other at n sites, then the proportion of differences, n/N, is referred to as the degree of divergence or normalized Hamming distance (the Hamming distance itself is the count n).

Degrees of divergence are usually expressed as percentages (n/N × 100%).
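A minimal Python sketch of n and n/N, assuming the two sequences are already aligned, have equal length, and contain no gaps. Sequences 1 and 4 from the earlier illustration of a changing locus serve as a toy input; the function names are hypothetical.

```python
def hamming(seq1: str, seq2: str) -> int:
    """Number of sites n at which two aligned sequences of equal length differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


def divergence_percent(seq1: str, seq2: str) -> float:
    """Degree of divergence n/N expressed as a percentage."""
    return 100.0 * hamming(seq1, seq2) / len(seq1)


s1 = "actgggggtaaactatcggtatagatcat"
s4 = "actgggggtgaactatcggtacagatcat"
print(hamming(s1, s4))                       # 2
print(round(divergence_percent(s1, s4), 2))  # 6.9
```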

42

43

The observed number of differences is likely to be smaller than the actual number of substitutions due to multiple hits at the same site.
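The four sequences used earlier to illustrate a locus changing over time already contain a multiple hit: the tenth site changes from a to t and then from t to g. This Python sketch counts substitutions along the lineage by comparing consecutive sequences (which assumes at most one substitution per site between consecutive time points, as holds in that illustration) and compares the total with the differences between the first and last sequences.

```python
# The four time points of the illustrated locus
lineage = [
    "actgggggtaaactatcggtatagatcat",
    "actgggggttaactatcggtatagatcat",
    "actgggggtgaactatcggtatagatcat",
    "actgggggtgaactatcggtacagatcat",
]


def differences(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Substitutions along the lineage, counted step by step
substitutions = sum(differences(a, b) for a, b in zip(lineage, lineage[1:]))
# Differences visible when only the first and last sequences are compared
observed = differences(lineage[0], lineage[-1])
print(substitutions, observed)  # 3 2
```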

44

13 substitutions = 3 differences

45

46

Number of substitutions between two noncoding (NOT protein-coding) sequences

47

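The one-parameter model

The probability that the two sequences are different at a site at time t is p = 1 − I(t), with I(t) the probability that they are identical at that site. Under the one-parameter model:

$$p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$$

where α is the probability of a change from one nucleotide to another in one unit time, and t is the time of divergence.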

48

Problem: t and α are usually not known. Instead, we compute K, the number of substitutions per site since the time of divergence between the two sequences.
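A short derivation, assuming (as in the one-parameter model) that each of the two sequences accumulates 3αt substitutions per site during the t time units since divergence, so that K = 2 × 3αt = 6αt. Solving the expression for p for 8αt then gives K in terms of the observable p alone:

$$e^{-8\alpha t} = 1 - \frac{4}{3}p \quad\Longrightarrow\quad K = 6\alpha t = \frac{3}{4}\,(8\alpha t) = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$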

49

L = number of sites compared between the two sequences.
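A minimal Python sketch of the one-parameter estimate K = −(3/4) ln(1 − 4p/3) and its sampling variance. The variance formula V(K) = p(1 − p) / [L(1 − 4p/3)²] is the standard one from the literature and is an assumption here, as are the function names and the example counts.

```python
import math


def jc_distance(p: float) -> float:
    """K = -(3/4) ln(1 - 4p/3), substitutions per site under the one-parameter model."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 3/4")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_variance(p: float, L: int) -> float:
    """Assumed sampling variance V(K) = p(1 - p) / [L (1 - 4p/3)^2]."""
    return p * (1 - p) / (L * (1 - 4 * p / 3) ** 2)


n, L = 50, 500  # hypothetical: 50 differences among L = 500 compared sites
p = n / L
K = jc_distance(p)
print(round(K, 4), round(math.sqrt(jc_variance(p, L)), 4))
```

Here K ≈ 0.107 exceeds p = 0.1, reflecting the correction for multiple hits; the estimate is undefined once p reaches 3/4.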

50

In the two-parameter model:

The differences between two sequences are classified into transitions and transversions.

P = proportion of transitional differences
Q = proportion of transversional differences
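For reference, Kimura's two-parameter estimate of K is commonly written in terms of P and Q as follows (quoted from the standard literature rather than from these slides). When P = p/3 and Q = 2p/3, it reduces to the one-parameter estimate $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$.

$$K = \frac{1}{2}\ln\left(\frac{1}{1-2P-Q}\right) + \frac{1}{4}\ln\left(\frac{1}{1-2Q}\right)$$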

51

$$V(K) = \frac{1}{L}\left[P\left(\frac{1}{1-2P-Q}\right)^2 + Q\left(\frac{1}{2-4P-2Q} + \frac{1}{2-4Q}\right)^2 - \left(\frac{P}{1-2P-Q} + \frac{Q}{2-4P-2Q} + \frac{Q}{2-4Q}\right)^2\right]$$
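A minimal Python sketch of the two-parameter computation: classify the differences between two aligned, gap-free sequences into transitions (A↔G, C↔T) and transversions, then compute K and V(K). It assumes Kimura's estimate K = ½ ln[1/(1 − 2P − Q)] + ¼ ln[1/(1 − 2Q)] from the standard literature, together with the variance formula above written as [Pa² + Qb² − (Pa + Qb)²]/L with a = 1/(1 − 2P − Q) and b = 1/(2 − 4P − 2Q) + 1/(2 − 4Q). The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_proportions(seq1: str, seq2: str) -> tuple[float, float, int]:
    """Return (P, Q, L) for two aligned, gap-free DNA sequences of equal length."""
    a, b = seq1.upper(), seq2.upper()
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    L = len(a)
    return transitions / L, transversions / L, L


def k2p_distance(P: float, Q: float) -> float:
    """K = 1/2 ln[1/(1 - 2P - Q)] + 1/4 ln[1/(1 - 2Q)]."""
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K is undefined for these proportions")
    return 0.5 * math.log(1 / w1) + 0.25 * math.log(1 / w2)


def k2p_variance(P: float, Q: float, L: int) -> float:
    """V(K) = [P a^2 + Q b^2 - (P a + Q b)^2] / L."""
    a = 1 / (1 - 2 * P - Q)
    b = 1 / (2 - 4 * P - 2 * Q) + 1 / (2 - 4 * Q)
    return (P * a**2 + Q * b**2 - (P * a + Q * b) ** 2) / L


# Toy input: sequences 1 and 4 from the illustration of a changing locus
P, Q, L = transition_transversion_proportions(
    "actgggggtaaactatcggtatagatcat", "actgggggtgaactatcggtacagatcat"
)
print(P, Q, L, k2p_distance(P, Q), k2p_variance(P, Q, L))
```

In the toy input both differences (a→g and t→c) are transitions, so Q = 0 and only the first logarithm contributes.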

52

53

Numerical example (2P-model)

54

There are substitution schemes with more than two parameters!