1) Origins of Classification - Organization of variation
2) Modern Systematics - Taxonomy and phylogenetics
3) Cladistics - Shared derived characters - Outgroup - Parsimony
4) Maximum Likelihood and Bayesian Inference
Lecture 2: Principles of Phylogenetics
Origins of Biological Classification
Aristotle, 384-322 BC
“An effort to show the relationships of living things as a scala naturae” [1]
[1] C. Singer, A Short History of Biology (1931)
Scala Naturae, from Charles Bonnet's Œuvres d'histoire naturelle et de philosophie, 1781
Linnaeus, 1707-1778
"God created, Linnaeus organized."
Systematics
Phylogenetic Systematics - Relationships reflected in taxonomy
vertebral column
complete jaw
“bony vertebrates”
4 legs
amniotic egg
Maxilla separated from quadratojugal by jugal
Anatomy of a phylogenetic tree
Node
Outgroup
Terminal taxa
Terminal branch
Sister-taxa
Internal branch
older splits
younger splits
Common Ancestor
Bifurcating vs multifurcating trees
polytomy, trichotomy
Willi Hennig, a German entomologist, developed the field of “Phylogenetic Systematics”, which provides a framework for reconstructing phylogenies and using them to study evolutionary history
Hennig (1950)
Cladistics - Builds trees by identifying monophyletic groups - All other widely used methods are derived from it
How do you identify synapomorphies?
Close Outgroups
Distant Outgroups
Amphioxus (Cephalochordate)
Principle of Parsimony
Heuristic = educated guess; rule of thumb; common sense; a general way to approach problem solving.
Character matrix (rows = characters, columns = taxa; 1 = derived state, 0 = ancestral state; outgroup = happy)
character        wiley  rr  bugs  daffy  tweety  happy
1) Gloves:         0    0    1     0      0       0
2) Long ears:      1    0    1     0      0       0
3) Beak:           0    1    0     1      1       0
4) Tail:           1    1    1     1      1       0
5) Appendages:     1    1    1     1      1       0
6) Feathers:       0    1    0     1      1       0
7) Thumb:          1    0    1     1      1       0
Make a tree:
1) use only derived character states
2) minimize evolutionary change
[Figure: trees built from successive character subsets; tips are bugs, happy, wiley, daffy, tweety, rr]
Characters 4 & 5: + tail, + appendages
Characters 3 & 6: + beak, + feathers
Characters 3, 4, 5, & 6: + beak, + tail, + appendages, + feathers
Characters 1, 2, 3, 4, 5, & 6: + beak, + gloves, + long ears, + tail, + appendages, + feathers
Autapomorphy: phylogenetically uninformative
Characters 1, 2, 3, 4, 5, 6, & 7: + beak, + gloves, + long ears, + tail, + appendages, + feathers, and thumb reconstructed either as one gain and one loss (+ thumb, - thumb) or as two gains (+ thumb, + thumb)
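The rule "minimize evolutionary change" can be made concrete by counting the steps each character needs on a candidate tree. The sketch below uses Fitch's algorithm, which the slides do not name, and a reading of the cartoon character matrix in which rows follow the numbered character order. The two topologies, `TREE_A` and `TREE_B`, are hypothetical candidates chosen for illustration and are not taken from the slide figures.

```python
# Character matrix as read from the slides (assumed row order: characters 1-7).
TAXA = ["wiley", "rr", "bugs", "daffy", "tweety", "happy"]
MATRIX = {
    "gloves":     [0, 0, 1, 0, 0, 0],
    "long ears":  [1, 0, 1, 0, 0, 0],
    "beak":       [0, 1, 0, 1, 1, 0],
    "tail":       [1, 1, 1, 1, 1, 0],
    "appendages": [1, 1, 1, 1, 1, 0],
    "feathers":   [0, 1, 0, 1, 1, 0],
    "thumb":      [1, 0, 1, 1, 1, 0],
}


def fitch(tree, states):
    """Return (possible ancestral states, steps) for a nested-tuple binary tree."""
    if isinstance(tree, str):            # a tip: its state is observed
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                           # children agree: no change needed here
        return shared, left_steps + right_steps
    # children disagree: one change is required on this part of the tree
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree):
    """Total parsimony steps over all characters, plus the per-character counts."""
    per_char = {}
    for name, row in MATRIX.items():
        states = dict(zip(TAXA, row))
        per_char[name] = fitch(tree, states)[1]
    return sum(per_char.values()), per_char


# Hypothetical candidate topologies (assumptions, not the slide's trees).
TREE_A = ("happy", (("wiley", "bugs"), ("rr", ("daffy", "tweety"))))
TREE_B = ("happy", (("wiley", "rr"), ("bugs", ("daffy", "tweety"))))

if __name__ == "__main__":
    for label, tree in [("A", TREE_A), ("B", TREE_B)]:
        total, per_char = tree_length(tree)
        print(label, total, per_char)
```

Under these assumptions tree A needs fewer steps than tree B, and the thumb character costs two steps on tree A, which matches the two alternative thumb reconstructions (gain plus loss, or two gains).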
Finding the Most Parsimonious Tree
1) Exhaustive Search
2) Branch and Bound Search
3) Heuristic Search
1) Exhaustive Search
with stepwise addition of taxa
Exhaustive Searches Rarely Used
The number of bifurcating unrooted trees:
$N = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
where n = the number of terminal taxa
For 6 taxa: 105 trees
For 20 taxa: about $2 \times 10^{20}$ trees
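The growth in tree numbers can be checked directly. This is a minimal sketch of the formula for the number of bifurcating unrooted trees; the function name `n_unrooted_trees` is an illustrative choice, not something from the lecture.

```python
from math import factorial


def n_unrooted_trees(n):
    """Number of bifurcating unrooted trees for n terminal taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 terminal taxa")
    # (2n-5)! / (2^(n-3) * (n-3)!), always an exact integer
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    for n in (4, 5, 6, 10, 20):
        print(n, n_unrooted_trees(n))
```

Because Python integers have arbitrary precision, the count for 20 taxa is computed exactly rather than as a floating-point approximation.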
3) Heuristic Search
No guarantee the best tree will be found
Impossible to “pass through” poorer trees to get to more parsimonious ones
Purines: Adenine, Guanine
Pyrimidines: Thymine, Cytosine
Transitions: purine to purine (A ↔ G) or pyrimidine to pyrimidine (C ↔ T)
Transversions: purine to pyrimidine, or the reverse
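The distinction between transitions and transversions can be expressed as a small classifier. This is an illustrative sketch; the function names `classify_substitution` and `count_ts_tv` are assumptions, and only the four DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(a, b):
    """Return 'none', 'transition', or 'transversion' for a pair of bases."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES:
        raise ValueError("expected bases from A, C, G, T")
    if a == b:
        return "none"
    # same chemical class (both purines or both pyrimidines) means a transition
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1, seq2):
    """Count observed (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify_substitution(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


if __name__ == "__main__":
    print(count_ts_tv("ACGT", "GCTT"))
```

These are counts of observed differences only; multiple substitutions at the same site are not visible to this kind of tally.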
The Problem with Parsimony:
Molecular Phylogenetics
Multiple substitutions at a single site can lead to “long-branch attraction”
Weighted Parsimony
(Unweighted) Parsimony
Maximum Likelihood
1) Start with one tree
2) Sum probs across all ancestral reconstructions
3) Sum log probs across sites (equivalently, multiply the per-site probs)
4) Repeat for all trees (in a heuristic search)
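The likelihood calculation can be illustrated in its simplest case: two aligned sequences joined by a single branch. With only two tips there are no internal nodes, so the sum over ancestral reconstructions drops out and only the combination across sites remains. The sketch assumes the Jukes-Cantor model named in this lecture, equal base frequencies, and a simple grid search over the branch length `d` (expected substitutions per site); the function names are illustrative.

```python
import math


def jc_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a branch length d under Jukes-Cantor for two sequences."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e      # probability a site shows the same base
    p_diff = 0.25 - 0.25 * e      # probability of one specific different base
    # each site contributes log(base frequency * transition probability)
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def ml_distance(seq1, seq2, grid_step=0.001, d_max=3.0):
    """Grid search for the branch length with the highest log-likelihood."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    n_same = len(seq1) - n_diff
    best_d, best_ll = None, -math.inf
    for i in range(1, int(d_max / grid_step) + 1):
        d = i * grid_step
        ll = jc_log_likelihood(d, n_same, n_diff)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d, best_ll


if __name__ == "__main__":
    s1 = "A" * 100
    s2 = "C" * 10 + "A" * 90            # 10 differences in 100 sites
    d_hat, _ = ml_distance(s1, s2)
    analytic = -0.75 * math.log(1 - 4 * 0.1 / 3)
    print(d_hat, analytic)
```

For this model the grid estimate should agree with the closed-form Jukes-Cantor distance to within the grid spacing. On a real tree, the same per-site calculation would additionally sum over the unobserved states at every internal node.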
A, C, G, T: 4 bases, 6 different types of substitutions
But…we don’t know:
Simplest Model: Jukes-Cantor (JC)
All 6 substitutions - equal probability (α)
Kimura 2-parameter model (K2P)
α = transitions, β = transversions
General Time Reversible (GTR)
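The first two models can be written as closed-form substitution probabilities over a branch of length t. This sketch assumes α and β are instantaneous rates per specific substitution (under JC, α is the rate to each of the three other bases; under K2P, α is the transition rate and β the rate of each of the two possible transversions). GTR has no equally compact closed form and is left out. Function names are illustrative.

```python
import math


def jc_probs(alpha, t):
    """Jukes-Cantor: (P[same base], P[one specific other base]) after time t."""
    e = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * e
    p_change = 0.25 - 0.25 * e
    return p_same, p_change


def k2p_probs(alpha, beta, t):
    """Kimura 2-parameter: (P[same], P[transition], P[one specific transversion])."""
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_same, p_ts, p_tv


if __name__ == "__main__":
    print(jc_probs(0.1, 1.0))
    print(k2p_probs(0.2, 0.05, 1.0))
```

Two sanity checks follow from the definitions: each row of probabilities sums to 1 (same + transition + two transversions), and K2P with α = β gives the same values as JC.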
Where do the parameter values come from?
Wait…we’re using a tree to infer the model parameters that we will then use to find…the best tree?
[Figure: example substitutions labeled as transitions (ts) and transversions (tv)]
Maximum Likelihood Operationally
1. Select a model of sequence evolution; infer parameter values
2. With fixed parameter values, search tree space heuristically, with branch swapping
3. Select the topology that yields the greatest likelihood for the
Summary
Symmetrical Branch Lengths
Asymmetrical Branch Lengths
Positively misleading
Disadvantages of ML
Bayesian Phylogenetic Inference
Similar to ML except:
1. Model parameters:
2. Simultaneously search
Pr(p|k)
Bayesian Phylogenetic Inference
3. Save trees
Tree topology
Model parameters
Bayesian Phylogenetic Inference - Searching for trees and parameters
Markov-Chain Monte Carlo Search
Start: random tree, model parameter values. Calculate likelihood (L).
Slightly change the tree and/or parameter values; re-calculate L.
Accept or reject new tree/parameter values based on L scores.
Better (higher) L scores are always accepted; lower scores are accepted with a probability that depends on the ratio of the new score to the old one (Metropolis sampling, a stochastic “hill-climbing” algorithm)
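The accept/reject rule can be sketched on a deliberately small problem. The example below samples a single parameter, the Jukes-Cantor branch length `d` between two sequences, rather than trees plus model parameters as a real phylogenetic MCMC would. It assumes a flat prior on d between 0 and `d_max`, a symmetric uniform proposal, and a fixed random seed; all names and settings are illustrative.

```python
import math
import random


def log_likelihood(d, n_same, n_diff):
    """Jukes-Cantor log-likelihood of branch length d for two aligned sequences."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def metropolis(n_same, n_diff, n_steps=20000, step=0.05, d_max=10.0, seed=1):
    """Metropolis sampling of d under a flat prior on (0, d_max)."""
    rng = random.Random(seed)
    d = 1.0                                    # arbitrary starting value
    ll = log_likelihood(d, n_same, n_diff)
    samples = []
    for _ in range(n_steps):
        proposal = d + rng.uniform(-step, step)    # slightly change the parameter
        if 0.0 < proposal < d_max:                 # outside the prior: reject
            new_ll = log_likelihood(proposal, n_same, n_diff)
            # better scores always accepted; worse ones with probability L_new / L_old
            if new_ll >= ll or rng.random() < math.exp(new_ll - ll):
                d, ll = proposal, new_ll
        samples.append(d)                          # save the current state
    return samples


if __name__ == "__main__":
    samples = metropolis(n_same=90, n_diff=10)
    kept = samples[2000:]                          # discard burn-in
    print(sum(kept) / len(kept))
```

Because worse states are sometimes accepted, the chain can move away from a local optimum, and the saved samples approximate a distribution over values rather than a single best estimate.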
Advantages of Bayesian Inference
1) Simultaneous exploration of parameter space and trees
2) Support for clades: evaluated across a large set of likely trees
3) MCMC: Faster
Reed et al. (2002)
ML heuristic search: 93 days
MCMC search: 9 days
Nearly identical topologies