
ASBP Training_Alignment and Phylogeny

Date post: 06-Apr-2018
Upload: rodrigo-cordero

Transcript
  • 8/3/2019 ASBP Training_Alignment and Phylogeny


    III. Evolutionary Change in DNA Sequences

    Kinds of questions

    Identification of INDIVIDUALS: does the fish in the freezer match the carcass in the field?

    Detecting RELATEDNESS: can kin selection (i.e. a high level of relatedness) explain cooperative courtship behavior?

    Assigning INDIVIDUALS to POPULATIONS: do fish populations across the Bohol Sea show sufficient differentiation to allow us to assign unknown samples to a source population with a high level of confidence?

    Defining the structure of POPULATIONS: what forces could explain the genetic differentiation among populations of rabbitfish in the western Philippines?

    Identifying SPECIES boundaries: are these two forms of rockfish a single species or two distinct species?

    PHYLOGENETIC TREES: where do whales (Cetaceans) fit in a phylogenetic tree of mammalian groups? What is the grand arrangement of the tree of life in terms of kingdoms and phyla?

    WHY USE MOLECULAR MARKERS?

    Only genetically transmitted traits are informative for phylogeny estimation

    Molecular markers open the whole biological world to genetic scrutiny

    Genetic markers access an almost unlimited pool of genetic variability

    Molecular data distinguish homology (common ancestry) from analogy (convergence from different ancestors)

    Provide a common yardstick for measuring divergence

    Facilitate mechanistic appraisals of evolution

    PHYLOGENY and SYSTEMATICS

    How are taxa arranged in the tree of life?

    MORPHOLOGY

    MOLECULAR APPROACHES

    [Figure: phylogenetic tree of Gibbon, Orangutan, Human, Chimpanzee and Gorilla, with the values 0.092, 0.060, 0.019 and 0.0075 marked on the tree]

    Terms

    Nodes (terminal: observed taxa; internal: hypothetical ancestors)

    Dichotomous or polytomous (uncertainty of relationships or multiple simultaneous branching)

    Rooted vs unrooted trees

    Clades and ingroups: monophyletic vs paraphyletic

    Ingroup and outgroup

    Nucleotide Difference Between Sequences

    A simple measure of the extent of sequence divergence is p, the proportion of nucleotide sites at which the two sequences are different. This is estimated by:

    $$\hat{p} = n_d / n$$

    where $n_d$ is the number of sites at which the two sequences differ and $n$ is the total number of sites compared. This is called the p distance. Although the overall nucleotide difference.
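    The p distance can be sketched in a few lines of Python. This is a minimal illustration, assuming the two sequences are already aligned, gap-free and of equal length; the function name `p_distance` is hypothetical and not part of the course material.

```python
def p_distance(seq_x: str, seq_y: str) -> float:
    """Proportion of aligned sites at which the two sequences differ (n_d / n)."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_x:
        raise ValueError("sequences must not be empty")
    # n_d: number of sites where the two nucleotides are different
    n_d = sum(1 for a, b in zip(seq_x, seq_y) if a != b)
    return n_d / len(seq_x)


# Two differences (sites 5 and 10) out of ten sites compared
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```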

    Different types of nucleotide pairs between X and Y

    Class                        Nucleotide pair              Total
    Identical nucleotides        AA    TT    CC    GG
      frequency                  O1    O2    O3    O4         O
    Transition-type pair         AG    GA    TC    CT
      frequency                  P11   P12   P21   P22        P
    Transversion-type pair       AT    TA    AC    CA
      frequency                  Q11   Q12   Q21   Q22
                                 TG    GT    CG    GC
      frequency                  Q31   Q32   Q41   Q42        Q

    Transition/Transversion ratio

    $$R = P / Q$$

    R is usually 0.5 to 2.0 in many nuclear genes. In mtDNA it can be as high as 15.

    R is subject to a large sampling error when the number of nucleotides examined (n) is small.

    $$V(\hat{R}) = \hat{R}^2 \left( \frac{1}{n\hat{P}} + \frac{1}{n\hat{Q}} \right)$$

    Assumption

    $$P_{11} = P_{12};\; P_{21} = P_{22};\; Q_{11} = Q_{12};\; Q_{21} = Q_{22};\; Q_{31} = Q_{32};\; Q_{41} = Q_{42}$$
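    As a rough sketch of how P, Q, R and V(R) could be computed from a pair of aligned sequences, assuming the sequences contain only the unambiguous bases A, C, G and T (no gaps). The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion(seq_x: str, seq_y: str) -> tuple[float, float]:
    """Return (P, Q): proportions of transition-type and transversion-type pairs."""
    if len(seq_x) != len(seq_y) or not seq_x:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq_x)
    return transitions / n, transversions / n


def ratio_and_variance(p_ts: float, q_tv: float, n: int) -> tuple[float, float]:
    """Return R = P/Q and V(R) = R^2 (1/(nP) + 1/(nQ))."""
    if p_ts <= 0 or q_tv <= 0:
        raise ValueError("R and V(R) need at least one transition and one transversion")
    r = p_ts / q_tv
    return r, r ** 2 * (1 / (n * p_ts) + 1 / (n * q_tv))


P, Q = transition_transversion("AAGGCCTTAA", "GAGGTCTAAC")
print(P, Q, ratio_and_variance(P, Q, 10))
```

    With only ten sites the variance is as large as R itself, which illustrates the sampling-error remark above.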

    Estimation of the number of substitutions

    When p is large, it underestimates the number of substitutions. Why?

    It does not account for backward and parallel substitutions

    A number of mathematical models have been developed to address this. We will discuss:

    Jukes and Cantor's Method

    Kimura's Two-Parameter Method

    Tajima and Nei's Method

    Tamura's Method

    Tamura and Nei's Method

    Jukes-Cantor model

    Assumes that nucleotide substitution occurs at any nucleotide site with equal frequency

    At each site, a nucleotide changes to one of the three remaining nucleotides with a probability of $\alpha$ per year

    Probability of change in a nucleotide = rate of substitution

    $$r = 3\alpha$$

    Jukes-Cantor model

         A    T    C    G
    A    -    α    α    α
    T    α    -    α    α
    C    α    α    -    α
    G    α    α    α    -

    Consider two sequences X and Y

    Let $q_t$ = proportion of identical nucleotides at time t

    Let $p_t = 1 - q_t$ = proportion of different nucleotides

    Probability that a site with identical nucleotides in X and Y at time t will remain identical at t+1:

    $$(1-r)^2 \approx 1 - 2r$$

    Probability that a site with different nucleotides in X and Y will become identical at t+1:

    $$2 \cdot \frac{r}{3}(1-r) = \frac{2r(1-r)}{3} \approx \frac{2r}{3}$$

    Deriving a value for d

    $$q_{t+1} = (1-2r)\,q_t + \frac{2r}{3}(1-q_t)$$

    $$q_{t+1} - q_t = \frac{2r}{3} - \frac{8r}{3}\,q_t$$

    Using a continuous-time model, with dq/dt representing $q_{t+1} - q_t$:

    $$\frac{dq}{dt} = \frac{2r}{3} - \frac{8r}{3}\,q$$

    The solution of this equation with the initial condition q = 1 at t = 0 is:

    $$q = 1 - \frac{3}{4}\left(1 - e^{-8rt/3}\right)$$
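    A quick numerical check can make the step from the discrete recurrence to the continuous solution more concrete. The sketch below iterates the recurrence and compares it with the closed form; the rate r = 0.001 and the function names are illustrative assumptions only.

```python
import math


def q_recurrence(r: float, steps: int) -> float:
    """Iterate q_{t+1} = (1 - 2r) q_t + (2r/3)(1 - q_t), starting from q_0 = 1."""
    q = 1.0
    for _ in range(steps):
        q = (1 - 2 * r) * q + (2 * r / 3) * (1 - q)
    return q


def q_closed_form(r: float, t: float) -> float:
    """Continuous-time solution q = 1 - (3/4)(1 - exp(-8rt/3))."""
    return 1 - 0.75 * (1 - math.exp(-8 * r * t / 3))


# For a small rate the two agree closely; both approach 0.25 as t grows
for t in (0, 100, 1000, 10000):
    print(t, q_recurrence(0.001, t), q_closed_form(0.001, t))
```

    The limit of 0.25 is the chance that two unrelated sequences share a nucleotide at a site when all four nucleotides are equally frequent.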

    Under our present model, the expected number of nucleotide substitutions per site (d) for the two sequences is 2rt. Therefore, d is given by:

    $$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$

    where p = 1 - q is the proportion of different nucleotides between X and Y. An estimate $\hat{d}$ can be obtained by using $\hat{p}$. The large-sample variance of $\hat{d}$ is:

    $$V(\hat{d}) = \frac{9p(1-p)}{(3-4p)^2\,n}$$
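    The Jukes-Cantor distance and its large-sample variance translate directly into code. A minimal sketch, assuming p has already been estimated from n aligned sites; the function name is hypothetical.

```python
import math


def jukes_cantor(p: float, n: int) -> tuple[float, float]:
    """Return (d, V(d)) under the Jukes-Cantor model.

    d    = -(3/4) ln(1 - 4p/3)
    V(d) = 9 p (1 - p) / ((3 - 4p)^2 n)
    """
    if not 0 <= p < 0.75:
        # the logarithm is undefined once p reaches the saturation value 3/4
        raise ValueError("p must satisfy 0 <= p < 0.75")
    d = -0.75 * math.log(1 - 4 * p / 3)
    variance = 9 * p * (1 - p) / ((3 - 4 * p) ** 2 * n)
    return d, variance


d, var = jukes_cantor(0.3, 100)
print(d, var, math.sqrt(var))  # distance, variance, standard error
```

    For p = 0.3 the corrected distance is noticeably larger than p, in line with the underestimation discussed earlier.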

    Kimura Two Parameter model

    Considers the higher rate of transitional vs transversional nucleotide substitution: $\alpha$ and $2\beta$

    Total substitution rate per year: $r = \alpha + 2\beta$

    [Figure: substitution scheme among the four nucleotides A, G, T and C]

    Kimura Two Parameter model

         A    T    C    G
    A    -    β    β    α
    T    β    -    α    β
    C    β    α    -    β
    G    α    β    β    -

    Deriving d

    $$P = \frac{1}{4}\left(1 - 2e^{-4(\alpha+\beta)t} + e^{-8\beta t}\right)$$

    $$Q = \frac{1}{2}\left(1 - e^{-8\beta t}\right)$$

    where t is the time since the two sequences diverged. Then:

    $$d = 2rt = 2\alpha t + 4\beta t = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$$

    Variance of d (Kimura's model)

    The variance of d is:

    $$V(d) = \frac{1}{n}\left[c_1^2 P + c_3^2 Q - (c_1 P + c_3 Q)^2\right]$$

    where

    $$c_1 = \frac{1}{1 - 2P - Q}, \quad c_2 = \frac{1}{1 - 2Q}, \quad c_3 = \frac{c_1 + c_2}{2}$$
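    The Kimura two-parameter formulas can be checked with a round trip: generate the expected P and Q from chosen rates, then recover d from them. This is a sketch under assumed values ($\alpha$ = 0.02, $\beta$ = 0.005, t = 5 are illustrative only); the function names are hypothetical.

```python
import math


def expected_p_q(alpha: float, beta: float, t: float) -> tuple[float, float]:
    """Expected transition (P) and transversion (Q) proportions after time t."""
    P = 0.25 * (1 - 2 * math.exp(-4 * (alpha + beta) * t) + math.exp(-8 * beta * t))
    Q = 0.5 * (1 - math.exp(-8 * beta * t))
    return P, Q


def kimura_2p(P: float, Q: float, n: int) -> tuple[float, float]:
    """Return (d, V(d)) for Kimura's two-parameter model."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: sequences are too divergent")
    d = -0.5 * math.log(a) - 0.25 * math.log(b)
    c1, c2 = 1 / a, 1 / b
    c3 = (c1 + c2) / 2
    variance = (c1 ** 2 * P + c3 ** 2 * Q - (c1 * P + c3 * Q) ** 2) / n
    return d, variance


P, Q = expected_p_q(alpha=0.02, beta=0.005, t=5)
d, var = kimura_2p(P, Q, n=500)
# d should recover 2*alpha*t + 4*beta*t = 0.3 up to rounding error
print(P, Q, d, var)
```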

    Notes:

    In both the Kimura and Jukes-Cantor models, the expected frequencies of A, C, T and G will eventually become equal to 0.25.

    Both models make no assumption about the initial frequencies. This property makes the two models applicable to a wider range of conditions than many other models.

    There is no need to assume the stationarity of nucleotide frequencies for estimating d.

    Tajima-Nei (Equal-input) model

         A      T      C      G
    A    -      αgT    αgC    αgG
    T    αgA    -      αgC    αgG
    C    αgA    αgT    -      αgG
    G    αgA    αgT    αgC    -

    Tajima-Nei (Equal-input) model

    A similar model was proposed independently by Felsenstein (1981) and Tajima and Nei (1982)

    It is necessary to assume stationarity of nucleotide frequencies for estimating the number of nucleotide substitutions:

    $$d = -b \ln\left(1 - \frac{p}{b}\right)$$

    where

    $$b = \frac{1}{2}\left[1 - \sum_{i=1}^{4} g_i^2 + \frac{p^2}{c}\right]$$

    and c is given by:

    $$c = \sum_{i=1}^{3}\sum_{j=i+1}^{4} \frac{x_{ij}^2}{2 g_i g_j}$$

    where $g_i$ is the frequency of the i-th nucleotide and $x_{ij}$ (i < j) is the relative frequency of the nucleotide pair i and j between the two sequences.
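    A possible sketch of the Tajima-Nei distance for two aligned sequences follows. It rests on several assumptions that are not stated on the slides: the sequences contain only A, T, C and G, the nucleotide frequencies $g_i$ are estimated as the average over both sequences, and $x_{ij}$ is taken as the proportion of sites showing the unordered pair i, j. The function name is hypothetical.

```python
import math
from itertools import combinations

BASES = "ATCG"


def tajima_nei_distance(seq_x: str, seq_y: str) -> float:
    """d = -b ln(1 - p/b), with b = (1/2)[1 - sum(g_i^2) + p^2 / c]."""
    if len(seq_x) != len(seq_y) or not seq_x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_x)
    # g_i: nucleotide frequencies, averaged over the two sequences (assumption)
    g = {base: (seq_x.count(base) + seq_y.count(base)) / (2 * n) for base in BASES}
    # count sites for each unordered pair of different nucleotides
    pair_counts: dict[frozenset, int] = {}
    for a, b in zip(seq_x, seq_y):
        if a != b:
            key = frozenset((a, b))
            pair_counts[key] = pair_counts.get(key, 0) + 1
    p = sum(pair_counts.values()) / n
    if p == 0:
        return 0.0  # identical sequences
    c = 0.0
    for i, j in combinations(BASES, 2):
        x_ij = pair_counts.get(frozenset((i, j)), 0) / n
        if x_ij > 0:
            c += x_ij ** 2 / (2 * g[i] * g[j])
    b = 0.5 * (1 - sum(freq ** 2 for freq in g.values()) + p ** 2 / c)
    if p >= b:
        raise ValueError("distance undefined: p is too large for this model")
    return -b * math.log(1 - p / b)


# Equal base frequencies and equally frequent pair types give b = 3/4,
# so the result coincides with the Jukes-Cantor distance for p = 0.2.
x = "ACGT" * 6 + "AAATTC"
y = "ACGT" * 6 + "TCGCGG"
print(tajima_nei_distance(x, y))
```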

    Tamura model

    In Kimura's model, the frequencies of the four nucleotides eventually become 0.25. In real data, however, nucleotide frequencies are rarely equal and the GC content is often quite different from 0.5 (in Drosophila, for example, it is 0.1).

    Tamura's (1992) model was developed as an extension of Kimura's model to the case of low or high GC content.

    $$d = -h \ln\left(1 - \frac{P}{h} - Q\right) - \frac{1}{2}(1-h)\ln(1 - 2Q)$$

    where $h = 2\theta(1-\theta)$ and $\theta$ is the GC content
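    A short sketch of Tamura's distance, assuming P, Q and the GC content have already been estimated from the data. The function name and the example values are hypothetical. At a GC content of 0.5, h equals 1/2 and the expression reduces to Kimura's two-parameter distance, which gives a convenient sanity check.

```python
import math


def tamura_distance(P: float, Q: float, gc_content: float) -> float:
    """d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q), with h = 2*theta*(1 - theta)."""
    if not 0 < gc_content < 1:
        raise ValueError("GC content must lie strictly between 0 and 1")
    h = 2 * gc_content * (1 - gc_content)
    a = 1 - P / h - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: sequences are too divergent")
    return -h * math.log(a) - 0.5 * (1 - h) * math.log(b)


# Same P and Q, two different GC contents: the estimate grows as the
# GC content moves away from 0.5.
print(tamura_distance(0.05, 0.02, 0.5))
print(tamura_distance(0.05, 0.02, 0.2))
```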

    Tamura model

         A      T      C      G
    A    -      βθ2    βθ1    αθ1
    T    βθ2    -      αθ1    βθ1
    C    βθ2    αθ2    -      βθ1
    G    αθ2    βθ2    βθ1    -

    θ1 = gG + gC

    θ2 = gA + gT

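    Tamura-Nei model

    The model of Hasegawa et al. (1985), used in their maximum likelihood method, is a hybrid of Kimura's model and the equal-input model, and considers both the transition/transversion and GC content biases mentioned earlier. The formula for d is quite complicated but similar to that of Tamura and Nei's model, of which it is a special case.

    $$\begin{aligned} d = &-\frac{2 g_A g_G}{g_R}\ln\left[1 - \frac{g_R}{2 g_A g_G}P_1 - \frac{1}{2 g_R}Q\right] \\ &-\frac{2 g_T g_C}{g_Y}\ln\left[1 - \frac{g_Y}{2 g_T g_C}P_2 - \frac{1}{2 g_Y}Q\right] \\ &-2\left[g_R g_Y - \frac{g_A g_G g_Y}{g_R} - \frac{g_T g_C g_R}{g_Y}\right]\ln\left[1 - \frac{1}{2 g_R g_Y}Q\right] \end{aligned}$$

    where $g_R = g_A + g_G$, $g_Y = g_T + g_C$, $P_1$ and $P_2$ are the proportions of transitional differences between purines (A, G) and between pyrimidines (T, C), and Q is the proportion of transversional differences.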
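    Because the Tamura-Nei formula is easy to mistype, a small function may help. This sketch assumes the usual definitions $g_R = g_A + g_G$ and $g_Y = g_T + g_C$, with $P_1$ the proportion of A/G transitional differences and $P_2$ the proportion of T/C transitional differences, and it takes the third coefficient as $g_R g_Y - g_A g_G g_Y / g_R - g_T g_C g_R / g_Y$. The function name and argument layout are hypothetical.

```python
import math


def tamura_nei_distance(P1: float, P2: float, Q: float, g: dict[str, float]) -> float:
    """Tamura-Nei distance from transition proportions P1 (A/G), P2 (T/C),
    transversion proportion Q and nucleotide frequencies g["A"], g["T"], g["C"], g["G"]."""
    gA, gT, gC, gG = g["A"], g["T"], g["C"], g["G"]
    gR = gA + gG  # purine frequency
    gY = gT + gC  # pyrimidine frequency
    k1 = 2 * gA * gG / gR
    k2 = 2 * gT * gC / gY
    k3 = 2 * (gR * gY - gA * gG * gY / gR - gT * gC * gR / gY)
    a1 = 1 - P1 / k1 - Q / (2 * gR)
    a2 = 1 - P2 / k2 - Q / (2 * gY)
    a3 = 1 - Q / (2 * gR * gY)
    if min(a1, a2, a3) <= 0:
        raise ValueError("distance undefined: sequences are too divergent")
    return -k1 * math.log(a1) - k2 * math.log(a2) - k3 * math.log(a3)


# With equal nucleotide frequencies and P1 = P2 = P/2 the formula reduces
# to Kimura's two-parameter distance for the same P and Q.
equal = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
print(tamura_nei_distance(0.05, 0.05, 0.05, equal))
```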

    Gamma Distances

    For the distances listed so far, the rate of nucleotide substitution is assumed to be the same for all nucleotide sites. In reality, this assumption rarely holds, and the rate varies from site to site.

    Statistical analyses of the substitution rate at different nucleotide sites suggest that the rate variation approximately follows a gamma distribution.
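    The slides do not give a gamma distance formula, so the following is only an illustration using the commonly cited gamma version of the Jukes-Cantor distance, d = (3/4) a [(1 - 4p/3)^(-1/a) - 1], where a is the shape parameter of the gamma distribution. The function name and the parameter values are assumptions made for this sketch.

```python
import math


def jukes_cantor_gamma(p: float, a: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates (shape parameter a)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if a <= 0:
        raise ValueError("the gamma shape parameter must be positive")
    return 0.75 * a * ((1 - 4 * p / 3) ** (-1 / a) - 1)


def jukes_cantor_plain(p: float) -> float:
    """Ordinary Jukes-Cantor distance, for comparison."""
    return -0.75 * math.log(1 - 4 * p / 3)


# A small shape parameter (strong rate variation) gives a larger estimate;
# a very large one approaches the ordinary Jukes-Cantor distance.
for shape in (0.5, 1.0, 5.0, 1e6):
    print(shape, jukes_cantor_gamma(0.3, shape), jukes_cantor_plain(0.3))
```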

    Comparison of Different Distance Measures

    [Figure: estimated number of substitutions per site (y-axis, 0 to 2) plotted against the expected number of substitutions per site (x-axis, 0 to 1.8) for the Tamura-Nei, Tamura, Kimura, Jukes-Cantor and p distances]

    Alignment of Nucleotide Sequences

    Two sequences:

    ATGCGTCGTT
    ATCCGCGAT

    Possible alignment 1:

    ATGCGTCGTT
    ATCCG_CGAT

    Possible alignment 2:

    ATGC_GTCGTT
    AT_CCG_CGAT

    Methods

    Similarity index: Needleman and Wunsch (1970)

    Alignment distance: Sellers (1974)

    E = Min (w1 + w2)

    w1 and w2 are penalties for a mismatch and a gap (e.g. 1 and 4). The gap penalty is a function of the gap length. Similarly, mismatches can be divided into transitional and transversional mismatches, with different penalties given to each.
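    The idea of choosing the alignment with the smallest total penalty can be sketched with a standard dynamic-programming table. This is a simplified illustration, not the exact procedure of Sellers (1974): it assumes a fixed penalty per gap position (a linear gap cost) and a single mismatch penalty, using the example values 1 and 4. The function name is hypothetical.

```python
def alignment_distance(seq_x: str, seq_y: str, mismatch: int = 1, gap: int = 4) -> int:
    """Smallest total penalty over all global alignments of seq_x and seq_y."""
    rows, cols = len(seq_x) + 1, len(seq_y) + 1
    # cost[i][j]: cheapest alignment of seq_x[:i] with seq_y[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap  # seq_x[:i] aligned against gaps only
    for j in range(1, cols):
        cost[0][j] = j * gap  # seq_y[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if seq_x[i - 1] == seq_y[j - 1] else mismatch
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitution,  # align the two nucleotides
                cost[i - 1][j] + gap,               # gap in seq_y
                cost[i][j - 1] + gap,               # gap in seq_x
            )
    return cost[-1][-1]


# The one-gap alignment of these two sequences has two mismatches and one gap,
# for a total penalty of 2 * 1 + 1 * 4 = 6.
print(alignment_distance("ATGCGTCGTT", "ATCCGCGAT"))  # 6
```

    With these penalties an alignment with three gaps and one mismatch would cost 13, so the one-gap alignment is preferred.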

    Alignment of Multiple Sequences

    It is customary to use a progressive alignment algorithm: pairs of sequences with small distances are aligned first, and the alignment of more distantly related sequences is done progressively for larger and larger groups.

    Pairs of sequences are aligned using the progressive alignment algorithm. Groups of sequences are aligned with each other using a profile alignment algorithm.

    Handling sequence gaps in estimation of evolutionary distances

    Complete deletion: delete all sites with gaps from the data analysis. Generally desirable because different regions of DNA sequences often evolve differently.

    Pairwise deletion: if the number of nucleotides involved in the gaps is small and gaps are distributed more or less at random, distances may be computed for each pair of sequences, ignoring only those gaps that occur in the two sequences compared.

    Example

    Sequence 1    A-AC-GGAT-AGGA-ATAAA
    Sequence 2    AT-CC?GATAA?GAAAAC-A
    Sequence 3    ATTCC-GA?TACGATA-AGA

    Differences/Comparisons

    Option               Sequence                     (1,2)    (1,3)    (2,3)
    Complete deletion    1  A C G A A G A A A A       1/10     0/10     1/10
                         2  A C G A A G A A C A
                         3  A C G A A G A A A A
    Pairwise deletion    1  A-AC-GGAT-AGGA-ATAAA      2/12     3/13     3/14
                         2  AT-CC?GATAA?GAAAAC-A
                         3  ATTCC-GA?TACGATA-AGA
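    The two gap-handling options can be reproduced with a few lines of code. The sketch below uses the three 20-site sequences of the example and treats both "-" and "?" as missing; the function names are hypothetical. Counting directly from these sequences gives 13 comparable sites for the pair (1,3) under pairwise deletion.

```python
from itertools import combinations

MISSING = {"-", "?"}  # gap and missing-data symbols used in the example


def complete_deletion(seqs: list[str]) -> list[str]:
    """Drop every column in which any sequence has a gap or missing symbol."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] not in MISSING for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def count_differences(seq_x: str, seq_y: str) -> tuple[int, int]:
    """Return (differences, sites compared), skipping columns missing in either sequence."""
    sites = [(a, b) for a, b in zip(seq_x, seq_y) if a not in MISSING and b not in MISSING]
    return sum(a != b for a, b in sites), len(sites)


alignment = [
    "A-AC-GGAT-AGGA-ATAAA",
    "AT-CC?GATAA?GAAAAC-A",
    "ATTCC-GA?TACGATA-AGA",
]
trimmed = complete_deletion(alignment)

for i, j in combinations(range(3), 2):
    complete = count_differences(trimmed[i], trimmed[j])
    pairwise = count_differences(alignment[i], alignment[j])
    print((i + 1, j + 1), "complete:", complete, "pairwise:", pairwise)
# (1, 2) complete: (1, 10) pairwise: (2, 12)
# (1, 3) complete: (0, 10) pairwise: (3, 13)
# (2, 3) complete: (1, 10) pairwise: (3, 14)
```

    Pairwise deletion keeps more sites per comparison, but each pair is then compared over a different set of sites.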


    Assignment:

    Reading assignments

    ClustalX-

    http://inn-prot.weizmann.ac.i./software/ClustalX.html

    http://www.biozentrum.unibas.ch/`biphit/slustal/ClustalX_help.html

    Mega 2

    http://www.megasoftware.net/
