PHYLOGENOMIC APPLICATIONS OF REPETITIVE ELEMENTS
Daniel Vitales
Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona)
Laboratori de Botànica, Facultat de Farmàcia i Ciències de l’Alimentació, Universitat de Barcelona
I Simposio Anual de Botánica Española (1st Annual Symposium of Spanish Botany):
Phylogenomics for understanding the diversity and evolution of complex plant groups
February 8th, 2020
REPETITIVE ELEMENTS: INTRODUCTION
"REPEATOME": REPETITIVE FRACTION OF THE GENOME
TANDEM REPEATS
• TANDEM REPEAT GENES
  • RIBOSOMAL DNA
  • OTHER MULTIPLE-COPY GENES (e.g. histones)
• SATELLITES
  • SATELLITES
  • MINISATELLITES
  • MICROSATELLITES
DISPERSED REPEATS
• TRANSPOSONS
  • DNA TRANSPOSONS
  • RNA TRANSPOSONS
    • LTR RETROTRANSPOSONS (e.g. copia, gypsy)
    • NON-LTR RETROTRANSPOSONS (e.g. LINEs, SINEs)
• OTHER DISPERSED (e.g. tRNA-like, retropseudogenes)
Species                     Genome size    Repeat content    Note
Genlisea aurea              63.6 Mbp                         SMALLEST PLANT GENOME
Arabidopsis thaliana        125 Mbp        25%
Sugar beet Beta vulgaris    758 Mbp        63%
Broad bean Vicia faba       12000 Mbp      85%
Rye Secale cereale          8800 Mbp       92%
Onion Allium cepa           15100 Mbp      95%
Paris japonica              149000 Mbp                       LARGEST PLANT GENOME
Human Homo sapiens          3000 Mbp       >50%
Liu et al. 2013. International journal of molecular sciences, 14(7), 13559-13576.
7th Workshop on the Application of Next Generation Sequencing to Repetitive DNA Analysis in Plants. Ceske Budejovice. 22-24 May 2018. http://repeatexplorer.org/
Plant genome composition
Plant repeatome composition
REPETITIVE ELEMENTS: INTRODUCTION
Caveats of using repetitive elements for phylogenetic reconstruction
• Limitations caused by the short length of NGS reads
• Repeat length > read length
• Copies of a repetitive element accumulate mutations and diverge over time
• (unless they are homogenized by concerted evolution)
Nieto Feliner & Rosselló. 2012. Plant Genome Diversity Volume 1, 171-193.
[Figure: repeats and reads]
e.g. retrotransposon length: ~1000-20000 bp; read length: 100-300 nt
REPETITIVE ELEMENTS: INTRODUCTION
Shotgun genomic sequencing
Reads
Identification of sequence clusters
Reconstruction of repetitive elements
• Dispersed RE (e.g. transposons)
• Tandem repeats (e.g. rDNA, satellites)
Each cluster is a set of reads that frequently overlap and that belong to the same family of repetitive elements.
Novák, P., Neumann, P., Pech, J., Steinhaisl, J., & Macas, J. (2013). RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics, 29(6), 792-793.
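The statement that each cluster is a set of frequently overlapping reads can be illustrated with a small graph sketch. RepeatExplorer uses a more elaborate graph-based clustering; the connected-components grouping below is a simplified stand-in, and the read names and overlap pairs are hypothetical input (for instance, pairs reported by an all-to-all similarity search).

```python
from collections import defaultdict


def cluster_reads(overlaps):
    """Group reads into clusters, taken here as connected components
    of the overlap graph (a simplification, see the note above).

    overlaps: iterable of (read_a, read_b) pairs (hypothetical hits).
    Returns a list of sets of read names, largest cluster first.
    """
    graph = defaultdict(set)
    for a, b in overlaps:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collects every read reachable from `start`.
        stack = [start]
        component = set()
        while stack:
            read = stack.pop()
            if read in component:
                continue
            component.add(read)
            stack.extend(graph[read] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


# Hypothetical overlaps: r1-r2-r3 form one repeat family, r4-r5 another.
overlaps = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
clusters = cluster_reads(overlaps)
```

Reads without any overlap never enter the graph, which mirrors the idea that single-copy sequences are rarely sampled twice at low coverage.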
REPETITIVE ELEMENTS: CHARACTERIZATION
7th Workshop on the Application of Next Generation Sequencing to Repetitive DNA Analysis in Plants. Ceske Budejovice. 22-24 May 2018. http://repeatexplorer.org/
Cluster annotation and quantification
[Figure: proportion of reads in clusters CL1 to CL21]
Novák et al. 2014. PloS one, 9(6).
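Because the reads are a random low-coverage sample of the genome, the proportion of reads assigned to a cluster is commonly read as an estimate of the genome proportion of that repeat. A minimal sketch of this quantification follows; the cluster names and read counts are hypothetical, and `total_reads` is assumed to be the number of all reads analyzed, including those not assigned to any cluster.

```python
def genome_proportions(cluster_read_counts, total_reads):
    """Return the percentage of analyzed reads that fall in each cluster.

    cluster_read_counts: dict mapping cluster name -> number of reads.
    total_reads: number of reads analyzed in total (assumed to include
                 reads that were not assigned to any cluster).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        name: 100.0 * count / total_reads
        for name, count in cluster_read_counts.items()
    }


# Hypothetical read counts for three clusters out of 100,000 reads.
counts = {"CL1": 12000, "CL2": 8000, "CL3": 500}
props = genome_proportions(counts, total_reads=100000)
```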
REPETITIVE ELEMENTS: PHYLOGENOMIC METHODS
Phylogenetic reconstruction based on comparative repeat abundances
Dodsworth et al. 2015. Systematic Biology, 64(1), 112-126.
[Figure panels: Fritillaria, Asclepias, Orobanche, Fabeae, Drosophila]
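Abundance-based reconstruction treats the read count of each cluster in each species as a quantitative character. The sketch below shows only one preparatory step, rescaling a species-by-cluster abundance matrix to a bounded range so that it can be coded as continuous characters. The species names and counts are hypothetical, and the 0-65 target range is an assumption (a range commonly cited for continuous characters in the parsimony software TNT), not something stated on the slide.

```python
def scale_abundances(matrix, max_state=65.0):
    """Rescale cluster read counts so that the largest value equals max_state.

    matrix: dict mapping species -> list of read counts, one per cluster,
            with clusters in the same order for every species.
    max_state: upper bound of the target range (assumed, see note above).
    """
    largest = max(max(row) for row in matrix.values())
    if largest <= 0:
        raise ValueError("matrix must contain at least one positive count")
    factor = largest / max_state
    return {
        species: [count / factor for count in row]
        for species, row in matrix.items()
    }


# Hypothetical read counts for three clusters in two species.
abundances = {
    "species_A": [1300, 260, 0],
    "species_B": [650, 520, 130],
}
scaled = scale_abundances(abundances)
```

Dividing every count by the same factor keeps the relative abundances between species unchanged, which is the property the comparison relies on.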
REPETITIVE ELEMENTS: PHYLOGENOMIC METHODS
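\[
\text{Similarity}_{A\text{-}B}\,[\mathrm{CL}_n]
= \frac{\text{Observed } N_{\text{edges}}\ A\text{-}B\ [\mathrm{CL}_n]}
       {\text{Expected } N_{\text{edges}}\ A\text{-}B\ [\mathrm{CL}_n]}
= \frac{\text{Observed } N_{\text{edges}}\ A\text{-}B\ [\mathrm{CL}_n]}
       {\dfrac{(N_{\text{reads}}\,A + N_{\text{reads}}\,B)\ [\mathrm{CL}_n]}
              {N_{\text{reads total}}\ [\mathrm{CL}_n]}}
\]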
Vitales, Garcia & Dodsworth. 2019. BioRxiv. doi: https://doi.org/10.1101/624064
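The observed/expected edge ratio defined on this slide can be sketched in a few lines. In this sketch the expected number of edges is taken as a given input rather than recomputed, the edge counts are hypothetical, and the conversion of similarities to distances by taking the inverse is an assumption added for illustration.

```python
def similarity(observed_edges, expected_edges):
    """Observed/expected number of edges between species A and B
    within one cluster."""
    if expected_edges <= 0:
        raise ValueError("expected_edges must be positive")
    return observed_edges / expected_edges


def to_distance(sim):
    """Turn a similarity into a distance by taking its inverse
    (an assumed transformation, see the note above)."""
    return float("inf") if sim == 0 else 1.0 / sim


# Hypothetical (observed, expected) edge counts for one cluster.
edge_counts = {
    ("A", "B"): (80, 100),
    ("A", "C"): (20, 100),
    ("B", "C"): (25, 100),
}
sims = {pair: similarity(obs, exp) for pair, (obs, exp) in edge_counts.items()}
dists = {pair: to_distance(s) for pair, s in sims.items()}
```

In this toy example A and B share the most edges relative to expectation, so they end up with the smallest pairwise distance.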
REPETITIVE ELEMENTS: PHYLOGENOMIC METHODS
Phylogenetic reconstruction based on repeat similarities
Straub et al. 2012. American Journal of Botany, 99(2), 349-364.
Asclepias Sonoran Desert Clade
[Figure panels: repeat abundances, repeat similarities]
[Figure panels: Genome representation, Repeat types]
Bello et al. 2012. Annals of Botany 112, 1597-1612.
GS (pg): 13.6, 8.3, 10.7, 12.4, 11.8, 10.6, 9.5, 13.4, 10.1
Rosato et al. 2018. Annals of Botany 122(3), 387-395.
Species                 N pop (N ind)    ITR site N
Heliocauta atlantica    3 (10)           6-17
A. clavatus             14 (38)          0-14
A. homogamos            2 (8)            0
A. linearilobus         3 (5)            19
A. maroccanus           3 (9)            0
A. monanthos            3 (10)           0-4
A. radiatus             3 (8)            0
A. pyrethrum            2 (6)            26-45
A. valentinus           9 (31)           0-10
Interstitial telomeric-like repeats (ITR) variability
Rosato et al. 2017. PloS one, 12(10).
REPETITIVE ELEMENTS: ANACYCLUS CASE STUDY
Hypothesis: activation of the repeat machinery drives homoploid changes in GS
Karyological 45S rDNA site phenotypes
Vitales et al. 2019. Annals of Botany (in press). doi: https://doi.org/10.1093/aob/mcz183
Conservation levels of highly abundant TEs are decoupled from the actual GS of the species
REPETITIVE ELEMENTS: ANACYCLUS CASE STUDY
Comparative repeat composition of Anacyclus species
Sequence conservation by differential stringency mapping
Alternative hypothesis: recombination events between homologous chromosomes derived from distinct genomes (i.e. from homoploid hybridization) lead to chromosome arm exchanges, which would result in different genome sizes.
SUMMARY
• Shallow sequencing of gDNA (genome skimming) can yield an in-depth characterization of repetitive DNA.
• Genomic repeat abundances and repeat sequence similarities contain phylogenetic signal and can be used as complementary markers to infer evolutionary histories.
• Combining phylogenetic approaches based on repeat abundances and on repeat sequence similarities can help to understand the mechanisms governing genome and repeatome evolution.
• Further development of these methods should focus on automating the data processing and on obtaining support values for phylogenetic trees and networks.
Acknowledgements:
Institut Botànic de Barcelona, CSIC-ICUB: Sònia Garcia, Teresa Garnatje, Jaume Pellicer, Joan Pere Pascual
Real Jardín Botánico, CSIC: Gonzalo Nieto-Feliner, Inés Álvarez, Javier Fuertes
Universitat de Barcelona: Joan Vallès, Oriane Hidalgo
Institute of Biophysics, Brno: Aleš Kovařík
Jardí Botànic de la Universitat de València: Marcela Rosato, Josep Antoni Rosselló
University of Bedfordshire: Steven Dodsworth