Date post: 31-Jul-2020
Fast, accurate species tree estimation with ASTRID Pranjal Vachaspati
Page 1: Fast, accurate species tree estimation with ASTRID (tandy.cs.illinois.edu/astrid-ssb-v2.pdf)

Fast, accurate species tree estimation with ASTRID

Pranjal Vachaspati

Page 2:

ASTRID estimates species trees

INPUT: Gene trees

OUTPUT: Species tree computed from internode distance matrices

Similar approach to NJst (Liu & Yu, 2011)

Page 3:

Internode distance matrices
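
As a rough illustration of what an internode distance matrix holds, the sketch below counts, for every pair of taxa in a gene tree, the number of internal nodes on the path between them, then averages those counts over all gene trees that contain both taxa. The nested-tuple tree encoding and the function names are assumptions made for this example, not ASTRID's actual data structures.

```python
from collections import defaultdict, deque
from itertools import count


def adjacency(tree):
    """Build an undirected adjacency map from a nested-tuple tree (leaves are strings)."""
    adj = defaultdict(set)
    ids = count()

    def visit(node):
        if isinstance(node, str):
            return node
        me = next(ids)  # hypothetical integer id for an internal node
        for child in node:
            c = visit(child)
            adj[me].add(c)
            adj[c].add(me)
        return me

    root = visit(tree)
    if len(adj[root]) == 2:
        # A degree-2 root is not an internal node of the unrooted tree: bypass it.
        a, b = adj.pop(root)
        adj[a].discard(root)
        adj[b].discard(root)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def internode_distances(tree):
    """Map each sorted leaf pair to the number of internal nodes between the leaves."""
    adj = adjacency(tree)
    leaves = [n for n in adj if isinstance(n, str)]
    dist = {}
    for src in leaves:
        edges = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search counting edges from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in edges:
                    edges[v] = edges[u] + 1
                    queue.append(v)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = edges[dst] - 1  # internal nodes = edges - 1
    return dist


def average_internode_distances(trees):
    """Average each pair's distance over the trees in which both taxa appear."""
    total = defaultdict(float)
    seen = defaultdict(int)
    for tree in trees:
        for pair, d in internode_distances(tree).items():
            total[pair] += d
            seen[pair] += 1
    return {pair: total[pair] / seen[pair] for pair in total}
```

Averaging only over the trees that contain both taxa means a gene tree with missing taxa still contributes to the pairs it does cover.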

Page 4:

Distance methods for species tree estimation

Distance methods take a distance matrix as input and output a species tree

NJst (predecessor to ASTRID) used neighbor-joining

ASTRID normally uses FastME-2 (Lefort, Desper & Gascuel 2015)

- Faster than neighbor-joining
- More accurate

Can also use other methods (BIONJ*, RapidNJ, UPGMA*) in certain cases
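
For readers unfamiliar with how a distance method turns a matrix into a tree, here is a minimal sketch of textbook neighbor-joining, the method the slide attributes to NJst. It returns the topology only, with no branch lengths. It is not FastME and not ASTRID's code; the function name and the frozenset-keyed distance dictionary are assumptions chosen for the example.

```python
from itertools import combinations


def neighbor_joining(taxa, dist):
    """Return an unrooted topology as nested tuples.

    taxa: list of leaf names.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(taxa)
    d = dict(dist)  # copy so the caller's matrix is left untouched
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the neighbor-joining Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        merged = (a, b)
        for c in nodes:
            if c != a and c != b:
                # Distance from the new internal node to every remaining node.
                d[frozenset((merged, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [merged]
    return tuple(nodes)  # the last three (or fewer) nodes meet at one internal node
```

Each pass recomputes every row sum, so this sketch takes cubic time in the number of taxa, which is fine for illustration but not for large datasets.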

Page 5:

Evaluating ASTRID with simulated data

Page 6:

High accuracy on datasets with ILS

48-taxon simulated data based on avian biological dataset

High ILS (47% AD - average distance between true gene trees and true species tree)
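
The slides define AD only as the average distance between the true gene trees and the true species tree. Assuming that distance is the normalized Robinson-Foulds distance (an assumption, since the slide does not name the metric), AD could be computed along the lines below. Trees are nested tuples with string leaves, all on the same leaf set and fully resolved; the function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions, each stored as the side without the anchor leaf."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed reference leaf so each split has one canonical form
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def normalized_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by its maximum, 2(n - 3), for binary trees."""
    n = len(leaf_set(tree_a))
    differing = bipartitions(tree_a) ^ bipartitions(tree_b)
    return len(differing) / (2 * (n - 3))


def average_distance(species_tree, gene_trees):
    """AD under the stated assumption: mean normalized RF distance to the species tree."""
    return sum(normalized_rf(species_tree, g) for g in gene_trees) / len(gene_trees)
```

With this reading, an AD of 47% would mean that a true gene tree disagrees with the species tree on roughly half of its internal branches on average.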

Page 7:

High accuracy on datasets with ILS

37-taxon simulated data based on mammalian biological dataset

Moderate ILS (29% AD - average distance between true gene trees and true species tree)

Page 8:

ASTRID is extremely fast

Simulated ILS dataset from ASTRAL-II paper (Mirarab & Warnow, 2015)

1000 gene trees, high ILS

Completes in just 30 seconds!

Competing methods much slower: ASTRAL 5.7.3 took 90 seconds for 100 taxa, 2 hours for 1000 taxa

Page 9:

ASTRID can scale to extremely large datasets

43,183-taxon supertree dataset based on RNAsim simulation

ASTRID completed in 5 hours 42 minutes (FastME+NNIs as distance method)

Running time dominated by distance method

- with RapidNJ, only 25 minutes! (but worse accuracy)

High RAM requirements - 132 GB for FastME analysis.

ASTRAL could not run (ran out of memory); MRL took over 24 hours

Page 10:

Getting ASTRID: github.com/pranjalv123/ASTRID

Available for Linux, Mac, and Windows

GPL-licensed

Page 11:

Using ASTRID

Easy-to-use command line interface, similar to ASTRAL:

ASTRID -i <input tree file> -o <output tree file>

Usually, this is sufficient!
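
When ASTRID is one step in a larger pipeline, the same invocation can be scripted. The sketch below wraps the `-i`/`-o` command shown above using Python's standard `subprocess` module. It assumes the executable is named `ASTRID` and is on the `PATH`; the helper names and the file names in the usage note are hypothetical.

```python
import subprocess


def astrid_command(input_trees, output_tree, executable="ASTRID"):
    """Build the argument list for the basic invocation shown on the slide."""
    return [executable, "-i", input_trees, "-o", output_tree]


def run_astrid(input_trees, output_tree, executable="ASTRID"):
    """Run ASTRID and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(astrid_command(input_trees, output_tree, executable), check=True)
```

For example, `run_astrid("genes.trees", "species.tre")` would read gene trees from `genes.trees` and write the estimated species tree to `species.tre`. Passing the arguments as a list avoids shell quoting problems with unusual file names.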

Page 12:

Summary

ASTRID: accurate, extremely fast species tree estimation

Available at github.com/pranjalv123/ASTRID

Join the ASTRID user group at https://groups.google.com/d/forum/astrid-users or [email protected]

Page 13:

Using ASTRID: multi-individual datasets

Create a file mapping individual names to species names:

File species_mapping.txt:

species1:indiv1,indiv2,indiv3
species2:indiv4,indiv5
…

ASTRID -i input.trees -o output.trees -a species_mapping.txt
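
For datasets with many individuals, the mapping file is easier to generate than to type. The sketch below writes a dictionary out in the `species:indiv1,indiv2` layout shown above. It assumes one species per line, which is inferred from the slide rather than from ASTRID's documentation, and the function names are hypothetical.

```python
def format_mapping(species_to_individuals):
    """Render {species: [individuals]} as one 'species:indiv1,indiv2' line per species."""
    seen = set()
    lines = []
    for species, individuals in species_to_individuals.items():
        for name in individuals:
            if name in seen:
                # An individual can belong to only one species.
                raise ValueError(f"individual {name!r} is listed more than once")
            seen.add(name)
        lines.append(f"{species}:{','.join(individuals)}")
    return "\n".join(lines) + "\n"


def write_mapping(path, species_to_individuals):
    """Write the mapping to a file that can be passed to ASTRID with -a."""
    with open(path, "w") as handle:
        handle.write(format_mapping(species_to_individuals))
```

Calling `write_mapping("species_mapping.txt", {"species1": ["indiv1", "indiv2", "indiv3"], "species2": ["indiv4", "indiv5"]})` would produce the two example lines from the slide.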

