Date post: 20-Oct-2020
  • Reconstructing phylogenies: how? how well? why?

    Joe Felsenstein

    Department of Genome Sciences and Department of Biology

    University of Washington, Seattle

    Reconstructing phylogenies: how? how well? why? – p.1/29

  • A review that asks these questions

    What are some of the strengths and weaknesses of different ways of reconstructing evolutionary trees (phylogenies)?

    How can we find out how accurate we may have been in reconstructing the phylogeny?

    Why do we want to reconstruct it? What are phylogenies used for?

  • What does “tree space” (with branch lengths) look like?

    [Figure: an example of three species (A, B, C) with a clock, plotted on axes t1 and t2, with regions labeled "OK", "not possible", "trifurcation", etc.; a second panel shows what the space looks like when we consider all three possible topologies.]

  • For one tree topology

    The space of trees varying all 2n − 3 branch lengths, each a nonnegative number, defines an “orthant” (open corner) of a (2n − 3)-dimensional real space:

    [Figure: an unrooted tree with tips A to F and branch lengths v1 to v9, next to a sketch of the orthant with two "walls", a "floor", and an axis labeled v9.]

  • Through the looking-glass

    Shrinking one of the n − 3 interior branches to 0, we arrive at a trifurcation:

    [Figure: the tree with tips A to F and branch lengths v1 to v9; the same tree with one interior branch shrunk to zero, leaving branch lengths v1 to v8; and two further trees on the same tips, each with branch lengths v1 to v9.]

    Here, as we pass “through the looking glass” we also touch the space for two other tree topologies, and we could decide to enter either.

  • The graph of all trees of 5 species

    The space of all these orthants, one for each topology, connecting ones that share faces (looking glasses):

    [Figure: the 15 unrooted tree topologies for the five species A to E, drawn as the nodes of a graph.]

    The Schoenberg graph (all 15 trees of size 5 connected by NNIs)
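
    The graph of 15 trees can be rebuilt by brute force. What follows is a minimal sketch, not the method used to draw the figure. It assumes two facts about five-tip unrooted trees: each is fixed by its two cherries (pairs of sister tips), and a nearest-neighbor interchange (NNI) across an interior branch swaps one tip of a cherry with the lone middle tip. The names `all_trees` and `nni_neighbors` are made up for this illustration.

```python
from itertools import combinations

SPECIES = "ABCDE"

def all_trees(species=SPECIES):
    """Every unrooted bifurcating topology on five tips.

    A tree is stored as a frozenset holding its two cherries; the
    leftover tip hangs off the middle of the tree.
    """
    trees = set()
    for pair1 in combinations(species, 2):
        rest = [s for s in species if s not in pair1]
        for pair2 in combinations(rest, 2):
            trees.add(frozenset([frozenset(pair1), frozenset(pair2)]))
    return trees

def nni_neighbors(tree, species=SPECIES):
    """Trees reachable from `tree` by one nearest-neighbor interchange."""
    cherry1, cherry2 = tuple(tree)
    (middle,) = set(species) - cherry1 - cherry2
    neighbors = set()
    for cherry, other in ((cherry1, cherry2), (cherry2, cherry1)):
        for tip in cherry:
            # Swap `tip` with the middle tip across the interior branch
            # that separates `cherry` from the rest of the tree.
            new_cherry = (cherry - {tip}) | {middle}
            neighbors.add(frozenset([new_cherry, other]))
    return neighbors

trees = all_trees()
edges = {frozenset([t, u]) for t in trees for u in nni_neighbors(t)}
print(len(trees), "trees;", len(edges), "NNI edges")
```

    Each tree has two interior branches and two rearrangements per branch, so every node of the graph should have four neighbors.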

  • There are very large numbers of trees

    For 21 species, the number of possible unrooted tree topologies is about 8.2 × 10^21, within two orders of magnitude of Avogadro’s Number (about 6.02 × 10^23): it is

    3 × 5 × 7 × 9 × 11 × 13 × 15 × 17 × 19 × 21 × 23 × 25 × 27 × 29 × 31 × 33 × 35 × 37

    = 8,200,794,532,637,891,559,375

    ... and that’s not even asking about how hard it is to optimize the 39 branch lengths for each of these trees.

    Relatedly, for most methods, finding the best tree is NP-hard, and not easy to approximate either.
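
    The product on this slide is 3 × 5 × ... × (2n − 5) for n species, and each tree carries 2n − 3 branch lengths. A short sketch of the arithmetic follows; the function names are illustrative only, and the comparison against Avogadro's constant is printed rather than assumed.

```python
from math import prod

AVOGADRO = 6.02214076e23  # Avogadro constant (per mole), exact SI value

def unrooted_topologies(n_species):
    """Number of unrooted bifurcating topologies: 3 x 5 x ... x (2n - 5)."""
    if n_species < 3:
        raise ValueError("need at least 3 species")
    return prod(range(3, 2 * n_species - 4, 2))

def branches(n_species):
    """Branch lengths to optimize on one unrooted bifurcating tree: 2n - 3."""
    return 2 * n_species - 3

for n in (5, 21, 23):
    count = unrooted_topologies(n)
    print(n, count, branches(n), count > AVOGADRO)
```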

  • Parsimony methods

    [Figure: a tree with tips Alpha, Delta, Gamma, Beta, Epsilon; its branches are marked with the sites that change on them: 1, 2, 3, 4, 4, 5, 5, 6.]

                 Sites
    Species   1  2  3  4  5  6

    Alpha     T  A  G  C  A  T
    Beta      C  A  A  G  C  T
    Gamma     T  C  G  G  C  T
    Delta     T  C  G  C  A  A
    Epsilon   C  A  A  C  A  T
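
    Counting the changes a tree requires can be done one site at a time with Fitch's algorithm. The sketch below uses the data from this slide. The topology in `TREE` is an assumption, since the figure's branching order does not survive in the transcript. It was chosen because it gives one change at sites 1, 2, 3 and 6 and two changes at sites 4 and 5, which matches the site labels on the figure's branches.

```python
# Bases at sites 1..6 for each species, from the slide's table.
DATA = {
    "Alpha":   "TAGCAT",
    "Beta":    "CAAGCT",
    "Gamma":   "TCGGCT",
    "Delta":   "TCGCAA",
    "Epsilon": "CAACAT",
}

# Assumed topology, written as nested pairs (the root position is arbitrary
# and does not affect the count).
TREE = (("Alpha", ("Delta", "Gamma")), ("Beta", "Epsilon"))

def fitch(node, site):
    """Return (possible_states, changes) for the subtree at `node`."""
    if isinstance(node, str):
        return {DATA[node][site]}, 0
    left_set, left_cost = fitch(node[0], site)
    right_set, right_cost = fitch(node[1], site)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change needed.
        return common, left_cost + right_cost
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree):
    """Minimum number of changes at each site on `tree`."""
    n_sites = len(next(iter(DATA.values())))
    return [fitch(tree, s)[1] for s in range(n_sites)]

per_site = parsimony_score(TREE)
print(per_site, sum(per_site))
```

    A search for the most parsimonious tree would evaluate this score over many topologies and keep the smallest total.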

  • Advantages and disadvantages of parsimony methods

    Disadvantage: not model-based so people think it makes no assumptions.

    Advantage: reasonably fast, no search of branch lengths needed and quick to compute the criterion.

    Advantage: good statistical properties when amounts of change are small.

    Disadvantage: statistical misbehavior (inconsistency) when some nearby branches on the tree are long (Long Branch Attraction).

    Disadvantage: likely to make you think you have William of Ockham’s endorsement.

    Disadvantage: may lead to the delusion that you know exactly what happened in evolution, in detail.

  • Distance matrix methods

    The sequences:

    A   CCTAACCTCTGACCC ...
    B   CGTAACCTCCGGCCC ...
    C   CGTAACCTCTGGCCC ...
    D   CGCAACCTCTGGCTC ...
    E   CCTAACCTCTGGCCC ...

    yield distances:

    [Figure: a symmetric 5 × 5 matrix of observed pairwise distances among A to E, with zeros on the diagonal.]

    A suggested tree:

    [Figure: an unrooted tree joining A to E, with a length on each of its seven branches.]

    predicts:

    [Figure: a symmetric 5 × 5 matrix of the distances predicted by the tree.]

    compare: alter tree until predictions match observed distances as closely as possible
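
    The first step, turning sequences into distances, can be sketched as follows. This is an illustration under stated assumptions: it uses only the 15-site fragments visible on the slide, so its numbers will not reproduce the slide's matrix. The Jukes-Cantor correction is an assumed model, as the slide does not say which distance was used.

```python
from itertools import combinations
from math import log

# The 15-site fragments shown on the slide (the slide elides the rest).
SEQS = {
    "A": "CCTAACCTCTGACCC",
    "B": "CGTAACCTCCGGCCC",
    "C": "CGTAACCTCTGGCCC",
    "D": "CGCAACCTCTGGCTC",
    "E": "CCTAACCTCTGGCCC",
}

def p_distance(s, t):
    """Fraction of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)

def jc_distance(s, t):
    """Jukes-Cantor corrected distance (assumed model, not the slide's)."""
    p = p_distance(s, t)
    return -0.75 * log(1.0 - 4.0 * p / 3.0)

for x, y in combinations(SEQS, 2):
    print(x, y,
          round(p_distance(SEQS[x], SEQS[y]), 3),
          round(jc_distance(SEQS[x], SEQS[y]), 3))
```

    A distance method would then adjust the tree and its branch lengths until the path lengths between tips match these values as closely as possible.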

  • Advantages and disadvantages of distance methods

    Advantage: model-based so assumptions are clearer.

    Advantage: it’s geometry so mathematical scientists love it.

    Advantage: often fast (especially Neighbor-Joining method), can handle large numbers of sequences.

    Disadvantage: does not use the data with full statistical efficiency.

    Advantage: when tested by simulation, found to be surprisingly efficient anyway.

    Disadvantage: cannot easily propagate some information about local features in the sequences from one distance calculation to another.

    Disadvantage: it’s geometry so mathematical scientists hang onto it beyond the point of reason.

  • Maximum likelihood

    [Figure: a tree with tips A, C, C, C, G, interior nodes x, y, z, w, and branch lengths t1 to t8.]

    The t_i are “branch lengths” (rate × time).

    To compute the likelihood for one site, sum over all possible states (bases) at interior nodes:

    \[
    L(i) = \sum_{x} \sum_{y} \sum_{z} \sum_{w}
           \mathrm{Prob}(w)\, \mathrm{Prob}(x \mid w, t_7)\,
           \mathrm{Prob}(A \mid x, t_1)\, \mathrm{Prob}(C \mid x, t_2)\, \mathrm{Prob}(z \mid w, t_8)\,
           \mathrm{Prob}(C \mid z, t_3)\, \mathrm{Prob}(y \mid z, t_6)\, \mathrm{Prob}(C \mid y, t_4)\, \mathrm{Prob}(G \mid y, t_5)
    \]
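
    The sum over interior states can be evaluated directly, or with the sums moved inward one node at a time (often called the pruning algorithm). The sketch below does both for the tree in the formula and checks that they agree. The Jukes-Cantor model, the uniform root probabilities of 1/4, and the branch lengths in `T` are all assumptions made for illustration; none of them comes from the slide.

```python
from itertools import product
from math import exp

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability of ending at base b after length t from a."""
    e = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

# Hypothetical branch lengths t1..t8 (index 0 unused).
T = [None, 0.1, 0.2, 0.15, 0.1, 0.3, 0.05, 0.2, 0.25]

def likelihood_brute_force():
    """Sum over all 4^4 assignments of bases to interior nodes x, y, z, w."""
    total = 0.0
    for x, y, z, w in product(BASES, repeat=4):
        total += (0.25 * jc_prob(w, x, T[7])
                  * jc_prob(x, "A", T[1]) * jc_prob(x, "C", T[2])
                  * jc_prob(w, z, T[8])
                  * jc_prob(z, "C", T[3]) * jc_prob(z, y, T[6])
                  * jc_prob(y, "C", T[4]) * jc_prob(y, "G", T[5]))
    return total

def likelihood_pruning():
    """Same quantity, computed from the tips toward the root."""
    # Probability of the data below each interior node, given its base.
    below_x = {x: jc_prob(x, "A", T[1]) * jc_prob(x, "C", T[2]) for x in BASES}
    below_y = {y: jc_prob(y, "C", T[4]) * jc_prob(y, "G", T[5]) for y in BASES}
    below_z = {z: jc_prob(z, "C", T[3])
                  * sum(jc_prob(z, y, T[6]) * below_y[y] for y in BASES)
               for z in BASES}
    return sum(0.25
               * sum(jc_prob(w, x, T[7]) * below_x[x] for x in BASES)
               * sum(jc_prob(w, z, T[8]) * below_z[z] for z in BASES)
               for w in BASES)

print(likelihood_brute_force(), likelihood_pruning())
```

    The brute-force sum grows as 4 to the power of the number of interior nodes, while the inward version does a fixed amount of work per node, which is what makes likelihood usable on larger trees.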

  • Advantages and disadvantages of likelihood

    Advantage: uses a model, so assumptions are clear.

    Advantage: fully statistically efficient.

    Disadvantage: computationally slower.

    Advantage: statistical testing by likelihood ratio tests available.

    Disadvantage: can’t use the likelihood ratio test (LRT) to test tree topologies.

  • Bayesian inference methods

    Basically uses the likelihood machinery, and adds priors on parameters and on trees.

    Implemented by Markov chain Monte Carlo methods to sample from the posterior on trees (or parameters, or both).

    Very popular right now.

    Advantage: interpretation is straightforward, once the assumptions are met.

    Advantage: gives you what you want, the probability of the result.

    Disadvantage: how long is long enough to run the MCMC?

    Disadvantage: where do we get priors from, what effect do they have?

    Disadvantage: they keep chanting in unison “We are the statisticians of Bayes – you will be assimilated.”

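
    A toy version of the machinery may help make the two questions about priors and run length concrete. The sketch below samples a single branch length, not a tree, with a random-walk Metropolis sampler. The data (4 differences in 15 sites), the Jukes-Cantor likelihood, the exponential prior, the step size, and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random

N_SITES, N_DIFF = 15, 4  # hypothetical data: 4 differences in 15 aligned sites

def log_posterior(t, prior_mean):
    """Log of (Jukes-Cantor likelihood x exponential prior), up to a constant."""
    if t <= 0.0:
        return -math.inf
    p = -0.75 * math.expm1(-4.0 * t / 3.0)  # chance that a site differs
    log_lik = N_DIFF * math.log(p) + (N_SITES - N_DIFF) * math.log(1.0 - p)
    return log_lik - t / prior_mean

def metropolis(n_steps, prior_mean, step=0.1, seed=1):
    """Sample the branch length t with a symmetric random-walk proposal."""
    rng = random.Random(seed)
    t, samples = 0.2, []
    current = log_posterior(t, prior_mean)
    for _ in range(n_steps):
        proposal = t + rng.uniform(-step, step)
        candidate = log_posterior(proposal, prior_mean)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            t, current = proposal, candidate
        samples.append(t)
    return samples

for prior_mean in (0.1, 10.0):
    draws = metropolis(20000, prior_mean)[2000:]  # drop an assumed burn-in
    print(prior_mean, round(sum(draws) / len(draws), 3))
```

    Running it with a tight prior and with a nearly flat one shows how far the prior can pull the posterior mean when the data are few.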

  • Aren’t these graphical models?

    [Figure: the tree drawn as a graphical model, with node variables x0 to x9 repeated in five layers and branch lengths v1 to v9 shared across the layers.]

    (You have to imagine it going back 500 layers or so.) The problem is to use the data, which is at the tips but not available for the interior nodes, to infer the topology and branch lengths of the tree that is shared by all sites.

  • Could we use graphical model machinery here?

    Like Molière’s character who is delighted to discover that he’s been speaking prose all his life, we found we had already been using the relevant Graphical Model machinery since about 1973.

    So alas there was nothing to gain.

    The same thing is true for statistical genetics, where the graphical model machinery reinvents the standard “peeling” algorithms for computing likelihoods on pedigrees, in use since 1970.

  • Bootstrap sampling of phylogenies

    [Figure: the original data matrix (sequences × sites).]

  • Draw columns randomly with replacement

    [Figure: the original data (sequences × sites) with its estimate of the tree; bootstrap sample #1 is drawn from it: sample same number of sites, with replacement.]

  • Make a tree from that resampled data set

    [Figure: as before, with bootstrap sample #1 now yielding bootstrap estimate of the tree, #1.]

  • Draw another bootstrap sample

    [Figure: as before, with bootstrap sample #2 drawn from the original data in the same way (and so on).]

  • ... and get a tree for it too. And so on.

    [Figure: as before, with bootstrap sample #2 yielding bootstrap estimate of the tree, #2 (and so on).]
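
    The resampling step on these slides can be sketched in a few lines. This is an illustration only: the function name `bootstrap_sample` is made up, the toy alignment reuses the five-species data from the parsimony slide, and the tree-building step is left out because it depends on the method chosen.

```python
import random

def bootstrap_sample(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` maps each sequence name to a string; all strings have the
    same length. The result has the same number of sites as the original.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

# Toy alignment: the six sites from the parsimony slide.
alignment = {
    "Alpha":   "TAGCAT",
    "Beta":    "CAAGCT",
    "Gamma":   "TCGGCT",
    "Delta":   "TCGCAA",
    "Epsilon": "CAACAT",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_sample(alignment, rng) for _ in range(100)]
print(replicates[0])
```

    Each replicate keeps whole columns together, so the association between species at a site is preserved; only which sites are present, and how often, changes. A tree would then be estimated from every replicate with the same method used on the original data.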

  • Summarizing the cloud of trees by support for branches

    [Figure: a tree of Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, Crab-E.Mac, BarbMacaq, Tarsier, and Lemur, with its interior branches labeled by bootstrap support: 80, 72, 74, 99, 99, 100, 77, 42, 35, 49, 84.]
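
    One way to turn a cloud of bootstrap trees into support values is to count how often each group appears. The sketch below is a simplification under stated assumptions: it treats trees as rooted nested tuples and counts clades, whereas the figure labels branches of a tree; the three trees in `boot_trees` are made up purely for illustration.

```python
from collections import Counter

def clades(tree):
    """Return (tips_below, set_of_clades) for a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips, found = frozenset(), set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)
    return tips, found

def support(trees):
    """Percentage of trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree)[1])
    return {clade: 100.0 * k / len(trees) for clade, k in counts.items()}

# Three made-up bootstrap trees on four tips, for illustration only.
boot_trees = [
    ((("Human", "Chimp"), "Gorilla"), "Orang"),
    ((("Human", "Chimp"), "Gorilla"), "Orang"),
    ((("Human", "Gorilla"), "Chimp"), "Orang"),
]

for clade, pct in sorted(support(boot_trees).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(pct, 1))
```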

  • Some alternatives to bootstrapping

    Parametric bootstrapping – same, but simulate data sets from our best estimate of the tree instead of sampling sites.

    Bayesian inference of course gets statistical support information from the posterior.

    The Kishino-Hasegawa-Templeton test (KHT test), which compares prespecified trees to each other by paired sites tests.

  • Why want to know the tree?

    It affects all parts of the genomes – it is the essential part of propagating information about the evolution of one part of the genome to inquiries about another part.

    The standard method for finding functional regions of the genome is now using “PhyloHMMs” which use Hidden Markov Model machinery together with phylogenies to find regions that have unusually low rates of evolution.

  • Another kind of tree: the coalescent

    Coalescent trees are trees of ancestry of copies of a single gene locus within a species. They are weakly inferrable as most have only a few sites (SNPs) varying among individuals.

    Since each coalescent tree applies to a very short region of genome, maybe as little as one gene, there is less interest in the tree.

    But they do illuminate the values of parameters such as population size, migration rates, recombination rates etc. This allows us to accumulate information across different loci (genes).

    To do this we have to sum over our uncertainty about the tree by using MCMC methods, accumulating the information (as log likelihood or using Bayesian machinery) to make inferences about the parameters.

    This is the interface between within-species population genetics and between-species work on phylogenies.

    It is also the statistical foundation of inferences from mitochondrial genealogies (“mitochondrial Eve”) and Y chromosome genealogies, and of the samples from the rest of the genome that are now being added to this.

  • A coalescent

    [Figure: a coalescent tree, drawn against a time axis.]
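
    The waiting times in a coalescent tree can be simulated with very little code. The sketch below assumes the standard neutral (Kingman) coalescent with constant population size, and measures time in rescaled units in which any one pair of lineages coalesces at rate 1. It draws only the times between coalescences, not the tree shape, and the function name is illustrative.

```python
import random

def coalescent_times(n_samples, rng):
    """Waiting times while k = n, n-1, ..., 2 lineages remain.

    With k lineages there are k(k-1)/2 pairs, each coalescing at rate 1,
    so the wait until the next coalescence is exponential with that rate.
    """
    return [rng.expovariate(k * (k - 1) / 2.0)
            for k in range(n_samples, 1, -1)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10
heights = [sum(coalescent_times(n, rng)) for _ in range(20000)]
mean_height = sum(heights) / len(heights)
expected = 2.0 * (1.0 - 1.0 / n)  # expected tree height in these units
print(round(mean_height, 3), expected)
```

    Because the rate falls as lineages merge, most of the tree's height is spent waiting for the last few lineages to coalesce, which is part of why a single locus says little about the tree but many loci together say a good deal about population parameters.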

  • Yet another kind of tree: trees of gene families

    Gene duplications in evolution create new genes. Both the new gene and the original one then evolve.

    [Figure: a tree of genes drawn inside the species tree (species boundary shown) for Frog, Human, Monkey, and Squirrel. A gene duplication gives rise to copies a and b in Human, Monkey, and Squirrel. Redrawn as a tree of genes, the tips are Frog, Human a, Monkey a, Squirrel a, Human b, Monkey b, Squirrel b, and the a and b subtrees are marked "These two trees should be identical".]

    Some forks are gene duplications, leading to subtrees that are all supposed to have the same phylogeny as they are in the same set of species. Example: Hemoglobin proteins.

  • References

    Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Book you and all your friends must rush out and buy]

    Semple, C. and M. Steel. 2003. Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications, 24. Oxford University Press. [More rigorous mathematical treatment]

    Yang, Z. 2007. Computational Molecular Evolution. Oxford Series in Ecology and Evolution. Oxford University Press, Oxford. [Careful survey of molecular phylogeny methods, from a leader]

    For a list of 348 phylogeny programs, many available free, see

    http://evolution.gs.washington.edu/phylip/software.html
