Post on 16-Jul-2015
transcript
Matthew Wesley Pennell
PhD Candidate - Bioinformatics and Computational Biology
Institute for Bioinformatics and Evolutionary Studies
University of Idaho
MODELS, MEANINGS, AND MACROEVOLUTION
How can statistical models help us understand the drivers of long-term
evolutionary change?
What we talk about when we talk about
MACROEVOLUTION
We know the ingredients of evolutionary change within populations
Mutation
Selection
Drift
Gene flow
But how do these work together to
SHAPE DIVERSITY?
Simone Des Roches
Long term dynamics of evolutionary processes
MACROEVOLUTION
Peter Park
Time
Daniel Berner
Time
F(time)
Models for continuous traits
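Brownian motion: random walk
Ornstein-Uhlenbeck: random walk with a central tendency
Early Burst: evolution is rapid early & slows down over time
[Figure: one panel per model, trait axis running from -∞ to ∞; Θ marks the Ornstein-Uhlenbeck central tendency and t the Early Burst time axis]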
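A minimal sketch of the three continuous-trait models as discretised random walks for a single lineage. The function name, the Euler time-stepping, and all default parameter values are illustrative assumptions; Early Burst is written here with a rate that changes as sigma^2 * exp(r * t), with r < 0.

```python
import math
import random


def simulate_path(model, n_steps=1000, total_time=1.0, sigma=1.0,
                  alpha=5.0, theta=0.0, r=-3.0, x0=0.0, seed=None):
    """Simulate one trait trajectory under "BM", "OU" or "EB".

    Hypothetical helper: parameter names and defaults are for
    illustration only and are not taken from the talk.
    """
    rng = random.Random(seed)
    dt = total_time / n_steps
    x = x0
    path = [x0]
    for step in range(n_steps):
        t = step * dt
        if model == "BM":      # random walk
            drift, sd = 0.0, sigma
        elif model == "OU":    # random walk pulled toward theta
            drift, sd = alpha * (theta - x), sigma
        elif model == "EB":    # random walk whose rate decays through time
            drift, sd = 0.0, sigma * math.exp(0.5 * r * t)
        else:
            raise ValueError(f"unknown model: {model}")
        x += drift * dt + sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


bm = simulate_path("BM", seed=1)
ou = simulate_path("OU", seed=1)
eb = simulate_path("EB", seed=1)
```

Plotting the three paths against time gives pictures comparable to the panels on the slide.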
Models for discrete traits
Mk: transitions are instantaneous & occur at some constant rate
Threshold: character states are determined by a continuous “liability”
[Figure: states 0 and 1 linked by transition rates q01 and q10]
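For the two-state Mk model, the probability of ending in each state after time t has a closed form. The sketch below assumes q01 is the rate from state 0 to state 1 and q10 the reverse; the function name is hypothetical.

```python
import math


def mk2_transition_probs(q01, q10, t):
    """Transition probability matrix P(t) for a two-state Mk model.

    Rows index the starting state, columns the ending state.
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)   # start in 0, end in 1
    p10 = q10 / total * (1.0 - decay)   # start in 1, end in 0
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]


P = mk2_transition_probs(q01=1.0, q10=3.0, t=0.5)
```

As t grows, each row converges on the stationary frequencies q10/(q01+q10) and q01/(q01+q10).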
GEIGER
Pennell et al. 2014 Bioinformatics
https://github.com/mwpennell/geiger-v2
What can we learn about evolution
FROM TRAIT MODELS?
Macroevolutionary Dynamics
Population processes
Statistical descriptors
In order to make ANY interpretation of the model, we need to know if our model is actually explaining our data
1. Is the model capturing the variation in the data we have observed?
2. What about the data we haven’t?
[Figure: four scatterplots, each annotated R2 = 0.67, p = 0.002]
Is the model
APPROPRIATE
and if not...
WHAT ARE WE MISSING?
Linear regression models
[Figure: Cook’s distance for each observation]
[Figure: residuals plotted against fitted values]
Assessing the adequacy of
PHYLOGENETIC TRAIT MODELS
Establishing scope
Univariate, quantitative traits
[Figure axis: trait value]
Models that predict multivariate normal data
Fit a model to comparative data
Use fitted parameters to simulate data
Compare observed to simulated data
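The three steps can be sketched outside of any phylogenetic setting. The example below fits a normal model to some observations, simulates new datasets from the fitted parameters, and asks where the observed test statistic falls among the simulated ones. The choice of a normal model, of skewness as the test statistic, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness, used here as an example test statistic."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def adequacy_pvalue(observed, test_stat, n_sims=1000, seed=0):
    """Parametric-bootstrap check of a normal model (illustrative only)."""
    rng = random.Random(seed)
    # 1. Fit the model to the data
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    # 2. Use the fitted parameters to simulate data
    obs = test_stat(observed)
    sims = [test_stat([rng.gauss(mu, sd) for _ in observed])
            for _ in range(n_sims)]
    # 3. Compare observed to simulated (two-tailed)
    upper = sum(s >= obs for s in sims) / n_sims
    lower = sum(s <= obs for s in sims) / n_sims
    return min(1.0, 2.0 * min(upper, lower))


skewed = [math.exp(i / 10) for i in range(60)]
p = adequacy_pvalue(skewed, skewness)
```

A small p-value means the observed statistic sits in the tail of what the fitted model produces, so the model is missing something about the data.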
Old idea in statistics
[Diagram relating θ, Pr(θ|D), and Pr(D|θ)]
Parametric bootstrapping
Posterior predictive simulation
But new in comparative biology
Parametric bootstrapping (Boettiger et al. 2012 Evolution)
Posterior predictive simulation (Slater and Pennell 2014 Sys Bio)
If we re-ran evolution, how likely are we to see a data set like ours?
SIMILAR: Model is likely adequate
DIFFERENT: Model is likely inadequate
How similar is similar?
Problem: No two datasets are exactly alike
Solution: Use test statistics to summarize data in meaningful ways
Problem: Species are not independent data points
Solution: Calculate test statistics on contrasts rather than the data
Independent contrasts
[Figure: three-tip tree (A, B, C) with contrasts Ci and Cj]
n-1 contrasts for n tips
Under Brownian motion, C ~ Normal(0, σ)
Felsenstein 1985 Am Nat; Felsenstein 1973 Am J Hum Gen
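A minimal sketch of the contrast calculation on a three-tip tree. The nested-tuple tree encoding, the branch lengths, and the trait values are hypothetical; the recursion itself follows the standard pruning scheme, where each internal node yields one standardized contrast and passes a weighted average up the tree.

```python
import math


def pic(node, traits):
    """Return (node value, adjusted branch length, contrasts below node).

    A tip is ("name", branch_length); an internal node is
    (left_child, right_child, branch_length). Hypothetical encoding.
    """
    if len(node) == 2:                                # tip
        name, bl = node
        return traits[name], bl, []
    left, right, bl = node
    x1, v1, c1 = pic(left, traits)
    x2, v2, c2 = pic(right, traits)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)         # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)   # weighted average
    bl_adj = bl + v1 * v2 / (v1 + v2)                 # extra uncertainty
    return value, bl_adj, c1 + c2 + [contrast]


# ((A:1, B:1):1, C:2) with made-up trait values
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
root_value, _, contrasts = pic(tree, {"A": 1.0, "B": 3.0, "C": 5.0})
```

Three tips give two contrasts, in line with the n-1 rule.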
For non-Brownian models
Problem: Contrasts will no longer be normally distributed
Solution: Use model parameters to standardize branch lengths by the expected (co)variance that will accumulate along them
Refer to rescaled tree as a unit tree
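As a sketch of what the rescaling means for a single branch, the function below returns the variance expected to accumulate between two times measured from the root. It covers only Brownian motion (constant rate sigma2) and Early Burst (rate sigma2 * exp(r * t)), which is an assumed parameterization; the function and argument names are hypothetical.

```python
import math


def rescale_branch(t_start, t_end, sigma2, r=0.0):
    """Expected variance accumulated along a branch from t_start to t_end.

    With r == 0 this is Brownian motion; otherwise the rate is assumed
    to change through time as sigma2 * exp(r * t) (Early Burst).
    """
    if abs(r) < 1e-12:
        return sigma2 * (t_end - t_start)
    # integral of sigma2 * exp(r * t) dt from t_start to t_end
    return sigma2 * (math.exp(r * t_end) - math.exp(r * t_start)) / r


early = rescale_branch(0.0, 1.0, sigma2=1.0, r=-1.0)
late = rescale_branch(1.0, 2.0, sigma2=1.0, r=-1.0)
```

With a negative r, branches near the root are stretched relative to branches near the tips, so that a Brownian motion with rate 1 on the rescaled tree has the same expected variances as the fitted model on the original tree.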
Test statistics
Slope of contrasts vs. ancestral state
Slope of contrasts vs. expected variances
Slope of contrasts vs. node height
Mean of squared contrasts
Coefficient of variation of contrasts
KS-Test for normality of contrasts
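A rough sketch of how these six summaries could be computed from a vector of contrasts. Several details are assumptions the slide does not spell out: absolute values of the contrasts are used for the slopes and the coefficient of variation (the sign of a contrast is arbitrary), and the KS statistic compares the contrasts to a normal distribution with mean 0 and variance equal to the mean of squared contrasts. All names are hypothetical.

```python
import math
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def ks_distance_to_normal(values, sd):
    """Kolmogorov-Smirnov D between values and Normal(0, sd)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


def contrast_test_statistics(contrasts, expected_var, node_height,
                             ancestral_state):
    abs_c = [abs(c) for c in contrasts]
    mean_sq = statistics.fmean([c * c for c in contrasts])
    return {
        "mean_sq": mean_sq,
        "cv_abs": statistics.stdev(abs_c) / statistics.fmean(abs_c),
        "slope_var": ols_slope(expected_var, abs_c),
        "slope_height": ols_slope(node_height, abs_c),
        "slope_anc": ols_slope(ancestral_state, abs_c),
        "ks_d": ks_distance_to_normal(contrasts, math.sqrt(mean_sq)),
    }


stats = contrast_test_statistics(
    contrasts=[-1.0, 1.0, -2.0, 2.0],
    expected_var=[1.0, 1.0, 2.0, 2.0],
    node_height=[4.0, 3.0, 2.0, 1.0],
    ancestral_state=[0.0, 1.0, 0.0, 1.0],
)
```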
Simulating datasets for comparison
Simulate a lot of new datasets on unit tree
Use a BM model with a rate of 1
Calculate test statistics on simulated dataset
Putting it all together...
[Workflow diagram with six numbered steps and labels TX, TY, and BM: Fit model; Unit tree; Test stat; Sim data; Test stat x m; Compare]
ARBUTUS
Pennell et al. 2015 Am Nat
https://github.com/mwpennell/arbutus
Cornwell et al. 2014 J Ecology
[Figure: phylogeny of vascular plant orders and families, from Monilophytes, Cycadales, Gnetales, and Pinales through Nymphaeales, Magnoliales, Poales, and Asparagales to Lamiales and Solanales; clades labelled Arec., Ast., Ast2., Bras., Cary., Eric., Fab., Gymn., Magn., Mono., Myrt., Prot., Rosid.]
Traits: Leaf N, SLA, Max height, Leaf size, Seed mass
Specific Leaf Area: 72 datasets (20 - 2,200 species)
Seed mass: 226 datasets (20 - 22,817 species)
Leaf Nitrogen Content: 39 datasets (20 - 936 species)
Kleyer et al. 2008 J Ecology; Kew Seed Information Database 2014; Wright et al. 2004 Nature; Zanne et al. 2014 Nature
Empirical analyses
1. Fit Brownian motion, Ornstein-Uhlenbeck, and Early Burst to each dataset
2. Calculate relative support using AIC
3. Assess adequacy of best-fitting model
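Step 2 can be sketched as follows. The log-likelihoods and parameter counts below are made-up numbers for one hypothetical dataset, not results from the analysis; only the AIC and Akaike-weight formulas are standard.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_lik


def aic_weights(aics):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aics.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: v / total for m, v in rel.items()}


# Hypothetical fits: (log-likelihood, number of parameters)
fits = {"BM": (-120.4, 2), "OU": (-117.9, 3), "EB": (-120.3, 3)}
weights = aic_weights({m: aic(ll, k) for m, (ll, k) in fits.items()})
best_model = max(weights, key=weights.get)
```

The best-supported model is then the one carried forward to the adequacy check in step 3; a high AIC weight says nothing on its own about whether that model is adequate.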
[Figure: model support (AIC weight) for Brownian motion, Ornstein-Uhlenbeck, and Early burst in each dataset (1 - 337)]
Pennell et al. 2015 Am Nat
Specific Leaf Area: model deviations detected in 32/72 datasets
Seed mass: model deviations detected in 153/226 datasets
Leaf Nitrogen Content: model deviations detected in 19/39 datasets
Simple, commonly used models are often (woefully) inadequate
But we already knew that...
We are (often) here
This is how we learn about biology
Learn about our data
Phylogenetic error (topology and branch lengths)
Measurement error
Biologically interesting “outlier” species
Learn about evolutionary processes
Time heterogeneous models
Different models for different parts of the tree
Biologically motivated models
Understanding how and why a model fails can provide new biological insights
1. Is the model capturing the variation in the data we have observed?
2. What about the data we haven’t?
True diversity
Sampled diversity
If missing data is non-random, model parameters will be biased
HOW MANY SPECIES ARE WOODY?
Willem van Aken
True diversity?
Known species: 316,000
Trait data: 49,000
Genetic data: 55,000
Sampling bias is
EVERYWHERE
Hinchliff and Smith 2014 PLoS ONE
Sampling bias in...
The groups we choose to study (Pennell, Sarver, and Harmon 2012 PLoS ONE)
And the traits we choose to measure (Uyeda, Caetano, and Pennell 2015 Sys Bio)
MISSING DATA HAS STRUCTURE
100% HERBACEOUS
100% WOODY
Gnangarra Willem van Aken
Microcoelia (Orchid family)
[Figure: 30 species, of which 12 are scored herbaceous (H) and 18 are unscored (?); scale marked 0, 12, 18, 30, with a W / ? / H key]
Strong Prior: Pr(All are H) = 1
Weak Prior: Pr(All are H) = 0.42; Pr(At least 15 are H) = 0.90
Sampling with replacement (Binomial)
Sampling without replacement (Hypergeometric)
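The sampling-without-replacement calculation can be sketched for a single genus. The sketch assumes, from what survives of the figure, a genus of 30 species with 12 scored and all 12 herbaceous, plus a flat prior over the true number of woody species. Under those assumptions the hypergeometric posterior gives the 0.42 and 0.90 quoted for the weak prior, with "at least 15" read as at least 15 of the 18 unscored species. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def posterior_woody_count(n_species, n_sampled, n_sampled_woody):
    """Posterior over the true number of woody species in a genus.

    Assumes a flat prior on 0..n_species and sampling without
    replacement (hypergeometric likelihood).
    """
    n_sampled_herb = n_sampled - n_sampled_woody
    weights = {}
    for k in range(n_species + 1):          # k = true number of woody species
        # hypergeometric numerator; the denominator does not depend on k
        weights[k] = comb(k, n_sampled_woody) * comb(n_species - k,
                                                     n_sampled_herb)
    total = sum(weights.values())
    return {k: Fraction(w, total) for k, w in weights.items()}


post = posterior_woody_count(n_species=30, n_sampled=12, n_sampled_woody=0)
p_all_herbaceous = post[0]                          # 13/31, about 0.42
p_at_most_3_woody = sum(post[k] for k in range(4))  # about 0.90
```

Exact fractions are used so that the posterior sums to exactly 1; converting with float() gives the decimals.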
Distribution of woodiness bimodal
791 genera with records for >10 species
[Figure: probability density of the percentage of woody species per genus (0 to 100) under the strong and weak priors; annotated counts 411, 271, and 58, keyed to W and H]
Global proportion of woody species
[Figure: probability density of the global percentage of woody species under the strong and weak priors; axis marked 44, 46, 48]
Taking the dataset at face value: 59% woody
[Figure: woody and herbaceous species in Monilophytes, Gymnosperms, Basal Angiosperms, Monocots, and Eudicots]
FitzJohn, Pennell, et al. 2014 J Ecology
Can use estimated sampling proportions in model-based analyses
So we have a good model and have incorporated sampling error...
WHAT CAN WE SAY?
Macroevolutionary Dynamics
Population processes
Statistical descriptors
Strict pop gen interpretation
Brownian Motion: Δz = σdW
Mutation-Drift Equilibrium: Δz = 2VM
Hansen and Martins 1996 Evolution
Quantitative genetics interpretation
Δz: change in trait mean; σ: rate; dW: diffusion process; VM: mutational variance
Lynch and Hill 1986 Evolution
By fitting alternative models we can evaluate the effects of microevolutionary processes over long time scales
But such intuitive interpretations are
LIKELY NAÏVE
Micro to Macro: Use population estimates to predict divergence over long time scales (Hansen 2012 Book Chapter; Estes and Arnold 2007 Am Nat)
Macro to Micro: Use phylogenetic models to estimate population level parameters (Lynch 1990 Am Nat)
THE NUMBERS DON’T ADD UP!
Macroevolutionary Dynamics
Population processes
Statistical descriptors
Macroevolutionary models may reflect dynamics of adaptive landscapes rather than evolution along an adaptive landscape
Pennell et al. 2014 TREE; Pennell and Harmon 2013 NYAS; Pennell 2015 Sys Bio
Simpson 1944 Tempo and Mode
Dynamics of adaptive landscapes
Adaptive radiation
Adaptive zones
Red Queen (Van Valen)
Escape and radiate
Punctuated equilibrium
Diversity dependence
Key innovations
Ephemeral divergence
What about punctuated equilibrium?
Eldredge and Gould 1972; Gould and Eldredge 1977 Paleobiology
[Figure: morphology plotted against time]
This is confusing (to everyone)
Is evolution gradual or pulsed?
Is trait evolution (mainly) associated with speciation?
Is evolution during speciation adaptive or neutral?
Does species selection drive evolutionary trends?
Each can be tested with a specific macroevolutionary model
Macroevolutionary Dynamics
Population processes
Statistical descriptors
Nothing.
Models purely phenomenological
Capture patterns not processes
A case study:
EVOLUTION OF KARYOTYPES
To a geneticist many of the comparisons (i.e., between karyotypes of different species) will seem of little significance, because to him [sic] it is not the shapes and sizes of chromosomes which are important, but the genes contained in them.
T. H. Morgan et al. 1925
Physical linkage keeps genes together
Genetic material lost/gained when mutations change chromosome number/form
Structural changes may be involved in adaptation and speciation
New chromosomes arise from
Duplications (including polyploidy)
Fissions - chromosome breaks into two
Fusions - two chromosomes come together
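Sex chromosomes are natural
EVOLUTION EXPERIMENTS
[Diagram: XX / XY and ZZ / ZW systems; males are hemizygous in XY systems and homozygous in ZW systems, females the reverse]
[Diagram: fusion of an autosome (A) with the Y (Y-A, giving X1X2Y) or with the X (X-A, giving XY1Y2)]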
Pennell et al. in press PLoS Genetics
[Table: FISHES and SQUAMATE REPTILES; rows Y-A fusion, X-A fusion, Total XY, W-A fusion, Z-A fusion, Total ZW; values as extracted: 109, 423, 120400, 3802, 24024]
Pennell et al. in press PLoS Genetics
Data from Tree of Sex Consortium 2014
Both highly significant (Fisher’s exact test)
[Figure: phylogeny of fish genera, from Xiphophorus, Gambusia, and Poecilia to Osteoglossum, Brienomyrus, and Acipenser; tips coded XY, ZW, YA/WA, or XA/ZA]
Pennell et al. in press PLoS Genetics
[Figure: phylogeny of squamate reptile genera, from Enhydrina, Disteira, and Hydrophis to Delma, Lialis, and Dibamus; tips coded XY, ZW, YA/WA, or XA/ZA]
Pennell et al. in press PLoS Genetics
[Diagram: character states XY, XY Fused, ZW, and ZW Fused, shown beside the nucleotides C, A, T, G]
[Figure: probability density of the difference in fusion rate between Y and other sex chromosomes]
Pennell et al. in press PLoS Genetics
Y chromosomes fuse with autosomes more frequently than X, W, or Z
How can this help us understand
EVOLUTIONARY PROCESSES?
Neutral case
[Table: expected Y, W, and Z fusions (≈, >, or < X fusions) under male-biased mutation, a Bateman gradient, and all else equal]
Consistent with an excess of Y-A fusions? Male-biased mutation: NO; Bateman gradient: NO; All else equal: NO
Direct fitness effects
[Diagram: autosome (A) fused to the X or the Y]
Causes expression changes near breakpoints
Direct fitness effects (deleterious)
Consistent with an excess of Y-A fusions? Male-biased mutation: YES; Bateman gradient: YES; All else equal: NO
Sexually antagonistic selection
[Figure: fitness of allele A in males and females]
If A fuses with Y, allele will only be found in males (assuming no recombination between X and Y)
Consistent with an excess of Y-A fusions? Male-biased mutation: YES; Bateman gradient: NO; All else equal: NO
Most scenarios inconsistent with excess of Y-A fusions
Those that remain:
Fusions deleterious + male-biased mutation
Fusions deleterious + Bateman gradient
Fusions driven by sexually antagonistic selection + male-biased mutation (requires very high male-biased mutation rate)
Phylogenetic models used not mechanistic
But model fits can ground truth theoretical analyses
Macroevolutionary Dynamics
Population processes
Statistical descriptors
Software for fitting models
Assessing model adequacy
Incorporating sampling bias
Punctuated equilibrium
Chromosome fusions
Because I can’t eat phylogenies
Institute for Bioinformatics & Evolutionary Studies
Luke Harmon
Jack Sullivan, Scott Nuismer, Paul Joyce, Arne Mooers
David Tank, Larry Forney
Rich FitzJohn, Josef Uyeda, Jon Eastman, David Bapst
Michael Alfaro, Steve Arnold, Frank Burbrink, Will Cornwell, Bernie Crespi, Joe Felsenstein, David Green, Paul Harnik
Mark Kirkpatrick, Craig Miller, Brian O’Meara, Erica Bree Rosenblum, Carl Simpson, Graham Slater, David Swofford, Amy Zanne
Joseph Brown, Daniel Caetano, Simone Des Roches, Travis Hagey, Kayla Hardwick, Denim Jochimsen
Suzanne Joneson, Rafael Maia, Eliot Miller, Tom Poorten, James Rosindell, Jamie Voyles
Simon Uribe-Convers, Tyler Hether, Brice Sarver
All y’all
Lisha Abendroth, Eva Top
Roxana Hickey
My family