Accommodating clustered divergences in phylogenetic inference

Post on 12-Apr-2017


Jamie R. Oaks1,2

1 Department of Biology, University of Washington

2 Department of Biological Sciences, Auburn University

October 21, 2015

© 2007 Boris Kulikov boris-kulikov.blogspot.com

Clustered diversification Jamie Oaks – phyletica.org 1/27

- Phylogenetics is rapidly progressing as an endeavor of statistical inference

- “Big data” present exciting possibilities and computational challenges

- Exciting opportunities to develop new ways to study biology in the light of phylogeny


Current state of phylogenetics

- Assumption: Divergences are independent across the tree

- We know this assumption is frequently violated

- Why account for this non-independence?

  1. Improve inference

  2. Provide a framework for studying processes of co-diversification

- This is a model-choice problem


Divergence model choice

[Figure: three trees, T1, T2, and T3, whose divergences are assigned to one (τ1), two (τ1, τ2), or three (τ1, τ2, τ3) divergence times]

Inferring co-diversification

[Figure: the five divergence models, m1 to m5, for three trees T1, T2, and T3; each model assigns the divergences to one, two, or three divergence times (τ1, τ2, τ3)]

J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150


We want to infer m and T given DNA sequence alignments X


$p(m_i \mid X) \propto p(X \mid m_i)\, p(m_i)$


$p(X \mid m_i) = \int_{\theta} p(X \mid \theta, m_i)\, p(\theta \mid m_i)\, d\theta$
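Written out with its normalizing constant, the posterior probability of a model is a standard application of Bayes' rule. The sum over candidate models $m_j$ is an expansion added here for clarity and does not appear on the slide:

```latex
p(m_i \mid X)
  = \frac{p(X \mid m_i)\, p(m_i)}{\sum_{j} p(X \mid m_j)\, p(m_j)},
\qquad
p(X \mid m_i)
  = \int_{\theta} p(X \mid \theta, m_i)\, p(\theta \mid m_i)\, d\theta
```

The denominator requires the marginal likelihood of every model, which is why both the integral and the number of models matter.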


- Divergence times

- Gene trees

- Substitution parameters

- Demographic parameters


Challenges:

1. Cannot solve all the integrals analytically

   - Numerical approximation via approximate-likelihood Bayesian computation (ABC)

2. Sampling over all possible models

   - 5 taxa = 52 models
   - 10 taxa = 115,975 models
   - 20 taxa = 51,724,158,235,372 models!!
   - A “diffuse” Dirichlet process prior (DPP)
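The model counts listed for 5, 10, and 20 taxa are Bell numbers: the number of ways to partition n labeled taxa into non-empty groups (here, divergence events). A minimal sketch using the Bell triangle follows; the function name `bell_number` is illustrative and not part of any software mentioned in the talk.

```python
def bell_number(n):
    """Return the number of ways to partition n labeled items into non-empty groups."""
    # Build the Bell triangle row by row; the last entry of row n is B(n).
    row = [1]
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row.
        next_row = [row[-1]]
        for value in row:
            # Each further entry adds the entry above-left to the entry on the left.
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[-1]


for n_taxa in (5, 10, 20):
    print(n_taxa, "taxa:", bell_number(n_taxa), "models")
```

The count grows faster than exponentially, so enumerating every model quickly becomes impractical as taxa are added.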


“Easy” as ABC

[Figure: three sets of aligned DNA sequences, labeled S1, S2, and S3]

[Figure: three-dimensional plot with axes S1, S2, and S3, each running from 0.0 to 1.0]
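The general idea behind ABC can be sketched with a rejection sampler: draw parameters from the prior, simulate a dataset, reduce it to summary statistics, and keep the draws whose summaries fall close to the observed ones. The toy model below (a single parameter `theta` giving the per-site probability that two sequences differ, a uniform prior, and one summary statistic) is an assumption chosen for illustration. It is not the model or the algorithm used by dpp-msbayes, and all function names are hypothetical.

```python
import random


def simulate_summary(theta, n_sites, rng):
    """Simulate one dataset and reduce it to a summary statistic:
    the proportion of sites that differ between two sequences."""
    differences = sum(1 for _ in range(n_sites) if rng.random() < theta)
    return differences / n_sites


def abc_rejection(observed_summary, n_sites, n_samples, tolerance, seed=1):
    """Approximate the posterior of theta by rejection sampling."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        # Draw a parameter value from the (assumed) uniform prior.
        theta = rng.uniform(0.0, 0.5)
        simulated = simulate_summary(theta, n_sites, rng)
        # Keep the draw only if its simulated summary is close to the observed one.
        if abs(simulated - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted


posterior_sample = abc_rejection(observed_summary=0.1, n_sites=200,
                                 n_samples=5000, tolerance=0.01)
print(len(posterior_sample), "accepted draws")
print("approximate posterior mean:", sum(posterior_sample) / len(posterior_sample))
```

The accepted draws approximate the posterior without ever evaluating a likelihood; the price is the information lost by summarizing the data and by the tolerance.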


Sampling divergence models—a novel approach

- The divergence models are ways of assigning our taxa to events

- A Dirichlet process prior (DPP) model is a convenient and flexible solution

- Common Bayesian approach to assigning variables to an unknown number of categories

- Controlled by “concentration” parameter: α

Peter Dirichlet


[Figure: tree of taxon-to-event assignments, with branches labeled $\frac{\alpha}{\alpha+1}$ and $\frac{1}{\alpha+1}$ at the first split and $\frac{\alpha}{\alpha+2}$, $\frac{1}{\alpha+2}$, and $\frac{2}{\alpha+2}$ at the second]

Prior probability of each of the five divergence models:

$\left(\frac{\alpha}{\alpha+1}\right)\left(\frac{\alpha}{\alpha+2}\right)$ = 0.067 (α = 0.5), 0.758 (α = 10.0)

$\left(\frac{\alpha}{\alpha+1}\right)\left(\frac{1}{\alpha+2}\right)$ = 0.133 (α = 0.5), 0.076 (α = 10.0)

$\left(\frac{\alpha}{\alpha+1}\right)\left(\frac{1}{\alpha+2}\right)$ = 0.133 (α = 0.5), 0.076 (α = 10.0)

$\left(\frac{1}{\alpha+1}\right)\left(\frac{\alpha}{\alpha+2}\right)$ = 0.133 (α = 0.5), 0.076 (α = 10.0)

$\left(\frac{1}{\alpha+1}\right)\left(\frac{2}{\alpha+2}\right)$ = 0.533 (α = 0.5), 0.015 (α = 10.0)
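The five prior probabilities can be reproduced by assigning taxa to events one at a time: each taxon starts a new event with probability proportional to α, or joins an existing event with probability proportional to the number of taxa already in it. The sketch below assumes this sequential (Chinese restaurant process) construction of the Dirichlet process; the function name and the list encoding of a model are illustrative only.

```python
def dpp_partition_probability(assignment, alpha):
    """Prior probability of one assignment of taxa to divergence events.

    `assignment` gives an event label for each taxon in order; for example
    [0, 0, 1] means the first two taxa share an event and the third has its own.
    """
    probability = 1.0
    event_sizes = {}
    for n_placed, event in enumerate(assignment):
        if event in event_sizes:
            # Join an existing event: weight is the event's current size.
            probability *= event_sizes[event] / (alpha + n_placed)
            event_sizes[event] += 1
        else:
            # Start a new event: weight is alpha (the first taxon always does this).
            probability *= alpha / (alpha + n_placed)
            event_sizes[event] = 1
    return probability


# The five ways to assign three taxa to divergence events.
models = [[0, 1, 2], [0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
for alpha in (0.5, 10.0):
    probs = [dpp_partition_probability(m, alpha) for m in models]
    print("alpha =", alpha, [round(p, 3) for p in probs])
```

A small α concentrates prior mass on models with few, shared divergence events, while a large α favors every taxon diverging independently.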

New method: dpp-msbayes

- Flexible Dirichlet-process prior (DPP) over all possible divergence models

- Flexible priors on parameters to avoid strongly weighted posteriors

- Multi-processing to accommodate genomic datasets

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150


dpp-msbayes: Simulation-based assessment

Validation:

- Simulate 50,000 datasets and analyze each under the same model

Robustness:

- Simulate datasets that violate model assumptions and analyze each of them


dpp-msbayes: Validation results

[Figure: true probability of one divergence (y-axis, 0.0 to 1.0) plotted against posterior probability of one divergence (x-axis, 0.0 to 1.0)]

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
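One way a plot of true against estimated probability could be built from simulation replicates is to bin the replicates by their estimated posterior probability and, within each bin, compute the proportion for which the one-divergence model was in fact true. The sketch below assumes that approach; the binning scheme, the function name, and the toy inputs are hypothetical and are not taken from the talk.

```python
def calibration_curve(posterior_probs, truths, n_bins=10):
    """Bin replicates by estimated posterior probability and return, for each
    non-empty bin, (mean posterior probability, proportion where the model is true)."""
    bins = [[] for _ in range(n_bins)]
    for prob, truth in zip(posterior_probs, truths):
        # A probability of exactly 1.0 goes in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((prob, truth))
    curve = []
    for members in bins:
        if members:
            mean_prob = sum(p for p, _ in members) / len(members)
            true_rate = sum(1 for _, t in members if t) / len(members)
            curve.append((mean_prob, true_rate))
    return curve


# Toy input: estimated probabilities and whether one divergence was the truth.
example_probs = [0.05, 0.05, 0.95, 0.95]
example_truths = [False, False, True, True]
print(calibration_curve(example_probs, example_truths))
```

Points near the diagonal indicate that the estimated posterior probabilities behave like true probabilities across repeated simulations.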

dpp-msbayes: Robustness results

[Figure: two panels plotting true probability of one divergence (y-axis, 0.0 to 1.0) against posterior probability of one divergence (x-axis, 0.0 to 1.0)]

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150

dpp-msbayes: Performance

- New method for estimating shared evolutionary history shows:

  1. Model-choice accuracy
  2. Robustness to model violations
  3. Power to detect variation in divergence times
  4. It’s fast!

- A new tool for biologists to leverage comparative genomic data to explore processes of co-diversification

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150


Empirical applications

Did repeated fragmentation of islands during inter-glacial rises in sea level promote diversification?


Climate-driven diversification


Results

[Figure: posterior probability (y-axis, 0.00 to 0.10) of the number of divergence events (x-axis, 1 to 21)]

[Figure: sea level (m; y-axis, 0 to -100) over time (kya; x-axis, 0 to 500)]

J. R. Oaks (2014). BMC Evolutionary Biology 14: 150

More data!

- Collecting genomic data from taxa co-distributed across Southeast Asian Islands and Mainland

- Preliminary results for 1000 loci from 5 pairs of Gekko mindorensis populations


[Figure: 2ln(Bayes factor) (y-axis, -5.0 to 3.0) for each number of divergence events, |τ| (x-axis, 1 to 5)]
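A Bayes factor is the ratio of posterior odds to prior odds, so a value of 2ln(Bayes factor) above zero means the data raised the support for a hypothesis relative to its prior. The sketch below assumes each hypothesis ("exactly k divergence events") is compared against its complement (any other number of events); the slide does not state the comparison used, and the function name and input values are illustrative.

```python
import math


def two_ln_bayes_factor(prior_prob, posterior_prob):
    """Return 2*ln(BF) for a hypothesis versus its complement,
    where BF = posterior odds / prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return 2.0 * math.log(posterior_odds / prior_odds)


# Hypothetical values: prior probability 0.2, posterior probability 0.5.
print(two_ln_bayes_factor(prior_prob=0.2, posterior_prob=0.5))
```

Reporting Bayes factors rather than raw posterior probabilities removes the direct influence of the prior on the number of events, which matters when that prior is far from uniform.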

Diversification across African rainforests

- Did climate cycles drive diversification and community assembly across rainforest taxa?

- Preliminary results with 300 loci from 3 taxa


[Figure: 2ln(Bayes factor) (y-axis, -1.5 to 1.5) for each number of divergence events, |τ| (x-axis, 1 to 3)]

Conclusions

- New method for estimating shared evolutionary history

  - Shows good “frequentist” behavior
  - Relatively robust to model violations

- Finding support for temporally clustered divergences in multiple systems

- However, there is a lot of uncertainty!


Current work: More power

- Full-likelihood Bayesian implementation

  - Uses all the information in the data
  - Applicable to deeper timescales

- Analytically integrate over gene trees [1]

  - Very efficient numerical approximation of posterior
  - Applicable to NGS datasets

[1] D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932


Next step: A general framework

- Develop a framework for inferring shared divergences across phylogenies

- Generalize Bayesian phylogenetics to incorporate shared divergences

- Sample models numerically via reversible-jump Markov chain Monte Carlo

Benefits:

- Improve phylogenetic inference

- Framework for studying processes of co-diversification

[Figure: trees T1, T2, and T3 with shared divergence times τ1 and τ2]


Everything is on GitHub...

Software:

- dpp-msbayes: https://github.com/joaks1/dpp-msbayes

- PyMsBayes: https://joaks1.github.io/PyMsBayes

- ABACUS: Approximate BAyesian C UtilitieS. https://github.com/joaks1/abacus

Open-Science Notebook:

- msbayes-experiments: https://github.com/joaks1/msbayes-experiments

Acknowledgments

Ideas and feedback:

- Leache Lab

- Minin Lab

- Holder Lab

- Brown Lab/KU Herpetology

Computation:

Funding:

Photo credits:

- Rafe Brown, Cam Siler, Jesse Grismer, & Jake Esselstyn

- FMNH Philippine Mammal Website:

  - D.S. Balete, M.R.M. Duya, & J. Holden

- PhyloPic!

Questions?

joaks@auburn.edu
