The evolution, diversity, and host associations of rhabdoviruses

Ben Longdon,1,* Gemma G. R. Murray,1 William J. Palmer,1 Jonathan P. Day,1 Darren J. Parker,2,3 John J. Welch,1 Darren J. Obbard,4 and Francis M. Jiggins1

1 Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, 2 School of Biology, University of St Andrews, St Andrews, KY19 9ST, UK, 3 Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland and 4 Institute of Evolutionary Biology, and Centre for Immunity Infection and Evolution, University of Edinburgh, Edinburgh, EH9 3JT, UK

*Corresponding author: E-mail: b.longdon@gen.cam.ac.uk

Virus Evolution, 2015, 1(1): vev014. doi: 10.1093/ve/vev014. Research article.

© The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Metagenomic studies are leading to the discovery of a hidden diversity of RNA viruses. These new viruses are poorly characterized and new approaches are needed to predict the host species these viruses pose a risk to. The rhabdoviruses are a diverse family of RNA viruses that includes important pathogens of humans, animals, and plants. We have discovered thirty-two new rhabdoviruses through a combination of our own RNA sequencing of insects and searching public sequence databases. Combining these with previously known sequences we reconstructed the phylogeny of 195 rhabdovirus sequences, and produced the most in-depth analysis of the family to date. In most cases we know nothing about the biology of the viruses beyond the host they were identified from, but our dataset provides a powerful phylogenetic approach to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. By reconstructing ancestral and present host states we found that switches between major groups of hosts have occurred rarely during rhabdovirus evolution. This allowed us to propose seventy-six new likely vector-borne vertebrate viruses among viruses identified from vertebrates or biting insects. Based on currently available data, our analysis suggests it is likely there was a single origin of the known plant viruses and arthropod-borne vertebrate viruses, while vertebrate- and arthropod-specific viruses arose at least twice. There are also few transitions between aquatic and terrestrial ecosystems. Viruses also cluster together at a finer scale, with closely related viruses tending to be found in closely related hosts. Our data therefore suggest that throughout their evolution, rhabdoviruses have occasionally jumped between distantly related host species before spreading through related hosts in the same environment. This approach offers a way to predict the most probable biology and key traits of newly discovered viruses.

Key words: virus; host shift; arthropod; insect; rhabdoviridae; mononegavirales.

1. Introduction

RNA viruses are an abundant and diverse group of pathogens. In the past, viruses were typically isolated from hosts displaying symptoms of infection, before being characterized morphologically and then sequenced following PCR (Liu, Vijayendran, and Bonning 2011; Lipkin and Anthony 2015). PCR-based detection of novel RNA viruses is problematic as there is no single conserved region of the genome shared by all viruses from a single family, let alone across all RNA viruses. High-throughput next-generation sequencing technology has revolutionized virus discovery, allowing rapid detection and sequencing of divergent virus sequences simply by sequencing total RNA from infected individuals (Liu, Vijayendran, and Bonning 2011; Lipkin and Anthony 2015).

One particularly diverse family of RNA viruses is the Rhabdoviridae. Rhabdoviruses are negative-sense single-stranded RNA viruses in the order Mononegavirales (Dietzgen and Kuzmin 2012). They infect an extremely broad range of hosts and have been discovered in plants, fish, mammals, reptiles, and a broad range of insects and other arthropods (Bourhy et al. 2005). The family includes important pathogens of humans and livestock. Perhaps the most well-known is rabies virus, which can infect a diverse array of mammals and causes a fatal infection, killing 59,000 people per year with an estimated economic cost of $8.6 billion (US) (Hampson et al. 2015). Other rhabdoviruses, such as vesicular stomatitis virus and bovine ephemeral fever virus, are important pathogens of domesticated animals, while others are pathogens of crops (Dietzgen and Kuzmin 2012).

Arthropods play a key role in the transmission of many rhabdoviruses. Many viruses found in vertebrates have also been detected in arthropods, including sandflies, mosquitoes, ticks, and midges (Walker, Blasdell, and Joubert 2012). The rhabdoviruses that infect plants are also often transmitted by arthropods (Hogenhout, Redinbaugh, and Ammar 2003), and some that infect fish can potentially be vectored by ectoparasitic copepod sea-lice (Pfeilputzien 1978; Ahne et al. 2002). Moreover, insects are biological vectors: rhabdoviruses replicate upon infection of insect vectors (Hogenhout, Redinbaugh, and Ammar 2003). Other rhabdoviruses are insect-specific. In particular, the sigma viruses are a clade of vertically transmitted viruses that infect dipterans and are well-studied in Drosophila (Longdon et al. 2011a,b; Longdon and Jiggins 2012). Recently, a number of rhabdoviruses have been found to be associated with a wide array of insect and other arthropod species, suggesting they may be common arthropod viruses (Li et al. 2015; Walker et al. 2015). Furthermore, a number of arthropod genomes contain integrated endogenous viral elements (EVEs) with similarity to rhabdoviruses, suggesting that these species have been infected with rhabdoviruses at some point in their history (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012; Aiewsakun and Katzourakis 2015).

Here, we explore the diversity of the rhabdoviruses and examine how they have switched between different host taxa during their evolutionary history. Insects infected with rhabdoviruses commonly become paralysed on exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012). We exploited this fact to screen field collections of flies from several continents for novel rhabdoviruses that were then sequenced using metagenomic RNA-sequencing (RNA-seq). Additionally, we searched for rhabdovirus-like sequences in publicly available RNA-seq data. We identified thirty-two novel rhabdovirus-like sequences from a wide array of invertebrates and plants, and combined them with recently discovered viruses to produce the most comprehensive phylogeny of the rhabdoviruses to date. For many of the viruses we do not know their true host range, so we used the phylogeny to identify a large number of new likely vector-borne viruses and to reconstruct the evolutionary history of this diverse group of viruses.

2. Methods

2.1 Discovery of new rhabdoviruses by RNA sequencing

Diptera (flies, mostly Drosophilidae) were collected in the field from Spain, USA, Kenya, France, Ghana, and the UK (Supplementary Data S1). Infection with rhabdoviruses can cause Drosophila and other insects to become paralysed after exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012), so we enriched our sample for infected individuals by exposing them to CO2 at 12°C for 15 min, only retaining individuals that showed symptoms of paralysis 30 min later. We extracted RNA from seventy-nine individual insects (details in Supplementary Data S1) using Trizol reagent (Invitrogen) and combined the extracts into two pools (retaining non-pooled individual RNA samples). RNA was then rRNA depleted with the Ribo-Zero Gold kit (Epicenter, USA) and used to construct Truseq total RNA libraries (Illumina). Libraries were constructed and sequenced by BGI (Hong Kong) on an Illumina Hi-Seq 2500 (one lane, 100-bp paired-end reads, generating 175 million reads). Sequences were quality trimmed with Trimmomatic (v3): Illumina adapters were clipped, bases were removed from the beginning and end of reads if quality dropped below a threshold, sequences were trimmed if the average quality within a window fell below a threshold, and reads <20 bp in length were removed. We de novo assembled the RNA-seq reads with Trinity (release 25 February 2013) using default settings and the jaccard clip option for high gene density. The assembly was then searched using tblastn to identify rhabdovirus-like sequences, with known rhabdovirus coding sequences as the query. Any contigs with high sequence similarity to rhabdoviruses were then reciprocally compared to GenBank cDNA and RefSeq nucleotide databases using tblastn and only retained if they most closely matched a virus-like sequence. Raw read data were deposited in the NCBI Sequence Read Archive (SRP057824). Putative viral sequences have been submitted to GenBank (accession numbers in Supplementary Tables S1 and S2).
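The read-trimming rules described above (end trimming, a sliding-window average, and a minimum length) can be sketched in a few lines of Python. This is a simplified illustration and not Trimmomatic's exact algorithm; the function name `trim_read` is hypothetical, and the thresholds `leading`, `trailing`, `window`, and `min_avg` are placeholder assumptions because the text does not report the values used. Only the 20-bp minimum length comes from the text.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, min_len=20):
    """Return the (start, end) slice of a read to keep, or None to discard it.

    quals is a list of per-base Phred quality scores. All thresholds except
    min_len are assumed placeholder values, not those used in the study.
    """
    start, end = 0, len(quals)
    # Remove low-quality bases from the beginning and end of the read.
    while start < end and quals[start] < leading:
        start += 1
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # Slide a window along the read and cut at the first window whose
    # average quality falls below min_avg.
    for i in range(start, max(start, end - window + 1)):
        win = quals[i:i + window]
        if sum(win) / len(win) < min_avg:
            end = i
            break
    # Discard reads that end up shorter than the minimum length.
    if end - start < min_len:
        return None
    return start, end
```

A read with two poor bases at the start and three at the end keeps only its high-quality core, while a read whose quality collapses after ten bases is discarded for being too short.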

As the RNA-seq was performed on pooled samples, we assigned rhabdovirus sequences to individual insects by PCR on RNA from individual samples. cDNA was produced using Promega GoScript Reverse Transcriptase and random-hexamer primers, and PCR performed using primers designed from the rhabdovirus sequences. Infected host species were identified by sequencing the mitochondrial gene COI. We were unable to identify the host species of the virus from a Drosophila affinis sub-group species (sequences appear similar to both D. affinis and the closely related Drosophila athabasca), despite the addition of further mitochondrial and nuclear sequences to increase confidence. In all cases, we confirmed that viruses were only present in cDNA and not in non-reverse-transcription (RT) controls (i.e., DNA) by PCR, and so they cannot be integrated into the insect genome (i.e., endogenous virus elements or EVEs [Katzourakis and Gifford 2010]). COI primers were used as a positive control for the presence of DNA in the non-RT template.

We identified sigma virus sequences in RNA-seq data from Drosophila montana (Parker et al. 2015). We used RT-PCR on an infected fly line to amplify the virus sequence and carried out additional Sanger sequencing with primers designed using the RNA-seq assembly. Additional virus sequences were identified from an RNA-seq analysis of pools of wild caught Drosophila: DImmSV from Drosophila immigrans (collection and sequencing described [van Mierlo et al. 2014]), DTriSV from a pool of Drosophila tristis, and SDefSV from Scaptodrosophila deflexa (both Darren Obbard, unpublished data). GenBank accession numbers for new virus sequences are KR822817, KR822816, KR822823, KR822813, KR822820, KR822821, KR822822, KR822815, KR822824, KR822812, KR822811, KR822814, and KR822818. A full list of accessions can be found in Supplementary Tables S1 and S2.

2.2 Discovery of rhabdoviruses in public sequence databases

Rhabdovirus L gene sequences were used as queries to search (tblastn) expressed sequence tag and transcriptome shotgun assembly databases (NCBI). All sequences were reciprocally BLAST searched against GenBank cDNA and RefSeq databases and only retained if they matched a virus-like sequence. We used two approaches to examine whether sequences were present as RNA but not DNA. First, where assemblies of whole-genome shotgun sequences were available, we used BLAST to test whether sequences were integrated into the host genome. Second, for the virus sequences in the butterfly Pararge aegeria and the medfly Ceratitis capitata, we were able to obtain infected samples to confirm whether sequences are only present in RNA by performing PCR on both genomic DNA and cDNA as described above (samples kindly provided by Casper Breuker, Melanie Gibbs, and Philip Leftwich, respectively).
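The reciprocal filtering step can be illustrated with a short Python sketch that keeps a candidate sequence only when its best reciprocal hit is virus-like. It assumes the BLAST results were saved in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, bit score in column 12). The function names and the `viral_subjects` set are hypothetical; how a subject is judged "virus-like" is not specified in the text and is left to the caller here.

```python
def best_hits(tabular_lines):
    """Map each query ID to its (subject ID, bit score) with the highest bit score."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def retain_viral(tabular_lines, viral_subjects):
    """Return the queries whose top-scoring hit is in the set of virus-like subjects."""
    hits = best_hits(tabular_lines)
    return sorted(q for q, (subject, _) in hits.items() if subject in viral_subjects)
```

Under this sketch, a contig that hits a virus weakly but a host gene strongly is dropped, which mirrors the "most closely matched" criterion in spirit only.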

2.3 Phylogenetic analysis

All available rhabdovirus-like sequences were downloaded from GenBank (accessions in Supplementary Data S2). Amino acid sequences for the L gene (encoding the RNA-dependent RNA polymerase, or RDRP) were used to infer the phylogeny (L gene sequences), as they contain conserved domains that can be aligned across this diverse group of viruses. Sequences were aligned with MAFFT (Katoh and Standley 2013) under default settings, and then poorly aligned and divergent sites were removed with either TrimAl (v1.3, strict settings, implemented on Phylemon v2.0 server, alignment) (Capella-Gutierrez, Silla-Martinez, and Gabaldon 2009) or Gblocks (v0.91b, selecting smaller final blocks, allowing gap positions, and less strict flanking positions to produce a less stringent selection, alignment) (Talavera and Castresana 2007). These resulted in alignments of 1,492 and 829 amino acids, respectively.
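As a rough illustration of what removing poorly aligned sites means, the sketch below drops alignment columns that are mostly gaps. It is deliberately much simpler than the criteria used by TrimAl or Gblocks, which also consider conservation and flanking positions; the function name and the `max_gap_fraction` value are assumptions made for illustration only.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of gap characters exceeds max_gap_fraction.

    alignment is a list of equal-length aligned sequences using '-' for gaps.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        col for col in range(n_cols)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    # Rebuild every sequence from the retained columns only.
    return ["".join(seq[col] for col in keep) for seq in alignment]
```

With three sequences, a column containing two gaps is removed while a column containing one gap is kept.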

Phylogenetic trees were inferred using Maximum Likelihood in PhyML (v3.0) (Guindon et al. 2010) using the LG substitution model (Le and Gascuel 2008) (preliminary analysis confirmed the results were robust to the amino acid substitution model selected), with a gamma distribution of rate variation with four categories and a sub-tree pruning and regrafting topology searching algorithm. Branch support was estimated using Approximate Likelihood-Ratio Tests (aLRT) that are reported to outperform bootstrap methods (Anisimova and Gascuel 2006). Figures were created using FIGTREE (v. 1.4) (Rambaut 2011).

2.4 Analysis of phylogenetic structure between viruses taken from different hosts and ecologies

We measured the degree of phylogenetic structure between virus sequences identified in different categories of host (arthropods, vertebrates, and plants) and ecosystems (terrestrial and aquatic). Following Bhatia et al. (2013), we measured the degree of genetic structure between virus sequences from different groups of hosts/ecosystems using Hudson's Fst estimator (Hudson, Slatkin, and Maddison 1992). We calculated Fst as $F_{ST} = 1 - H_w / H_b$, where $H_w$ and $H_b$ are the mean number of differences between sequences within and between populations, respectively, and a population is a host category or ecosystem. The significance of this value was tested by comparison with 1,000 replicates with host categories randomly permuted over sequences. We also measured the clustering of these categories over our phylogeny using the genealogical sorting index (GSI), a measure of the degree of exclusive ancestry of a group on a rooted genealogy (Cummings, Neel, and Shaw 2008), for each of our host association categories. The index was estimated using the genealogicalSorting R package (Bazinet, Myers, and Khatavkar 2009), with significance estimated by permutation. The tree was pruned to remove strains that could not be assigned to one of the host association categories under consideration. Finally, since arthropods are the most sampled host, we tested for evidence of genetic structure within the arthropod-associated viruses that would suggest co-divergence with their hosts or preferential host-switching between closely related hosts. We calculated the Pearson correlation coefficient of the evolutionary distances between viruses and the evolutionary distances between their hosts, and tested for significance by permutation (as in Hommola et al. [2009]). We used the patristic distances of our ML tree for the virus data and a time-tree of arthropod genera using published estimates of divergence dates (Jeyaprakash and Hoy 2009; Misof et al. 2014).
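The Fst permutation test can be sketched as follows. The sketch takes Fst as one minus the ratio of the mean pairwise differences within populations (Hw) to the mean between populations (Hb), and compares the observed value with values obtained after shuffling the host labels. It is a minimal illustration under simplifying assumptions: within-population pairs are pooled across populations rather than averaged per population, differences are simple mismatch counts on aligned sequences, and all function names are hypothetical.

```python
import random
from itertools import combinations


def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def hudson_fst(seqs, labels):
    """Fst = 1 - Hw / Hb, pooling all within-population pairs (a simplification)."""
    within, between = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = count_differences(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    h_w = sum(within) / len(within)
    h_b = sum(between) / len(between)
    return 1 - h_w / h_b


def permutation_test(seqs, labels, n_perm=1000, seed=0):
    """Return the observed Fst and a one-sided permutation P-value."""
    rng = random.Random(seed)
    observed = hudson_fst(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly permute host categories over sequences
        if hudson_fst(seqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The `(hits + 1) / (n_perm + 1)` form is a common convention that avoids reporting a P-value of exactly zero; the text does not say which convention was used.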

2.5 Reconstruction of host associations

Viruses were categorized as having one of four types of host association: arthropod-specific, vertebrate-specific, arthropod-vectored plant, or arthropod-vectored vertebrate. However, the host association of some viruses is uncertain when they have been isolated from vertebrates, biting arthropods, or plant-sap-feeding arthropods. Due to limited sampling, it was not clear whether viruses isolated from vertebrates were vertebrate-specific or arthropod-vectored vertebrate viruses, whether viruses isolated from biting arthropods were arthropod-specific viruses or arthropod-vectored vertebrate viruses, or whether viruses isolated from plant-sap-feeding arthropods were arthropod-specific or arthropod-vectored plant viruses.

We classified a virus from a nematode as having its own host category. We classified three of the fish-infecting dimarhabdoviruses as vertebrate-specific, based on the fact that they can be transmitted via immersion in water containing virus during experimental conditions (Bootsma, Dekinkelin, and Leberre 1975; Dorson et al. 1987; Haenen and Davidse 1993) and the widely held belief amongst the fisheries community that these viruses are not typically vectored (Ahne et al. 2002). However, there is some evidence these viruses can be transmitted by arthropods (sea lice) in experiments (Pfeilputzien 1978; Ahne et al. 2002), and so we would recommend this be interpreted with some caution. Additionally, although we classified the viruses identified in sea-lice as having biting arthropod hosts, they may be crustacean-specific. The two viruses from Lepeophtheirus salmonis do not seem to infect the fish they parasitize and are present in all developmental stages of the lice, suggesting they may be transmitted vertically (Okland et al. 2014).

We simultaneously estimated both the current and ancestral host associations and the phylogeny of the viruses using a Bayesian analysis implemented in BEAST v1.8 (Drummond et al. 2012; Weinert et al. 2012). Because meaningful branch lengths are essential for this analysis (uncertainty about branch lengths will feed into uncertainty about the estimates), we used a subset of the sites and strains used in the maximum likelihood (ML) analysis. We retained 189 taxa: all rhabdoviruses excluding the divergent fish-infecting novirhabdovirus clade and the virus from Hydra, as well as the viruses from Lolium perenne and Conwentzia psociformis, which had a large number of missing sites. Sequences were trimmed to a conserved region of 414 amino acids where data were recorded for most of these viruses (the Gblocks alignment trimmed further by eye).

We used the host-association categories described above, which included ambiguous states. To describe amino acid evolution, we used an LG substitution model with gamma distributed rate variation across sites (Le and Gascuel 2008) and an uncorrelated lognormal relaxed clock model of rate variation among lineages (Drummond et al. 2006). To describe the evolution of the host associations, we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
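To make the idea of an asymmetric transition rate matrix concrete, the sketch below builds a small rate matrix in which the rate from state i to state j can differ from the rate from j to i, and converts it into transition probabilities over a branch of length t. It is a toy illustration and not the BEAST implementation: the two-state example and its rate values are invented, the function names are hypothetical, and the matrix exponential is approximated with a truncated Taylor series that is only suitable for small values of rate times branch length.

```python
def build_rate_matrix(rates, n_states):
    """Build a rate matrix Q from {(i, j): rate}; (i, j) and (j, i) may differ."""
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in rates.items():
        q[i][j] = rate
    for i in range(n_states):
        # The diagonal is still zero here, so each row sums to zero afterwards.
        q[i][i] = -sum(q[i])
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, terms=40):
    """Approximate P(t) = exp(Q t) with a truncated Taylor series."""
    n = len(q)
    qt = [[x * t for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds (Q t)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical rates: switching from state 0 to 1 is five times faster than 1 to 0.
Q = build_rate_matrix({(0, 1): 0.5, (1, 0): 0.1}, n_states=2)
P = transition_probs(Q, t=1.0)
```

In this toy case the probability of ending in state 1 after starting in state 0 is larger than the reverse, which is the asymmetry a symmetric matrix could not represent.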

Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.

3. Results

3.1 Novel rhabdoviruses from RNA-seq

To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST, and used PCR to identify which samples these sequences came from.

This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of approximately 12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see 'Methods' section for details).

To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans, and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L gene.
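Predicting genes from ORFs can be illustrated with a minimal scanner that reports each stretch running from a start codon to the first in-frame stop codon. This is only a sketch under stated assumptions: it scans the three forward frames of a DNA-alphabet (cDNA) sequence, uses ATG as the start codon and the three standard stop codons, and ignores the transcription termination signals that the authors used to delimit genes. The function name and the `min_codons` cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) positions of forward-strand ORFs.

    Each ORF runs from the first ATG in a frame to the first in-frame stop
    codon (end is exclusive and includes the stop codon).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the previous stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAACCCGGGTAACC")` finds the single ORF `ATGAAACCCGGGTAA` at positions 2 to 17.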

Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses, or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod-associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.

[Figure 1: genome maps plotted against sequence length in nucleotides (axis 0 to 15000) for Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus, and Drosophila affinis sigmavirus. Legend: N, P, M, G, L, X, Accessory.]

Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2 sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U's) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13-kb long and contain five core genes 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16-kb long (Walker et al. 2011, 2015).

3.2 New rhabdoviruses from public databases

We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (Fleas, Crustacea, Lepidoptera, Diptera), one from a Cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).

Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria, and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has very atypical genome organization, or has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori that we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua that contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).

3.3 Rhabdovirus phylogeny

To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.

We recovered all of the major clades described previously (Fig. 2), and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.

3.4 Predicted host associations of viruses

With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insects, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see 'Methods' section).
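The idea of a tip being ambiguous between states can be made concrete with a short sketch. It assumes one common encoding, in which an ambiguous observation is represented by marking every state it is compatible with; the state labels, the `AMBIGUITY` table, and the function `tip_partials` are illustrative names chosen here and do not reproduce the input format of the software used in the study.

```python
# The five fully resolved host-association states described in the text.
STATES = (
    "vertebrate-specific",
    "arthropod-specific",
    "arthropod-vectored vertebrate",
    "arthropod-vectored plant",
    "nematode",
)

# Isolation source -> states compatible with that observation (assumed encoding).
AMBIGUITY = {
    "vertebrate": {"vertebrate-specific", "arthropod-vectored vertebrate"},
    "biting arthropod": {"arthropod-specific", "arthropod-vectored vertebrate"},
    "sap-feeding arthropod": {"arthropod-specific", "arthropod-vectored plant"},
    "unknown": set(STATES),
}


def tip_partials(observation):
    """Return a 0/1 vector over STATES marking which states a tip may occupy."""
    # A known association is compatible only with itself.
    allowed = AMBIGUITY.get(observation, {observation})
    unrecognized = allowed - set(STATES)
    if unrecognized:
        raise ValueError(f"unrecognized state(s): {sorted(unrecognized)}")
    return [1 if state in allowed else 0 for state in STATES]


print(tip_partials("biting arthropod"))  # [0, 1, 1, 0, 0]
print(tip_partials("nematode"))          # [0, 0, 0, 0, 1]
```

Under this kind of encoding, a virus from a biting arthropod contributes no information about the choice between the two states it is compatible with, so any resolution has to come from its position in the phylogeny.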

This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
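The assignment rule used above (a host association is accepted only when its posterior support exceeds 0.95) can be sketched as follows. The posterior probabilities and virus names are hypothetical, and `assign_host` is a name introduced for this example only.

```python
def assign_host(posteriors, threshold=0.95):
    """Return the best-supported state if its posterior exceeds threshold, else None."""
    state, support = max(posteriors.items(), key=lambda item: item[1])
    return state if support > threshold else None


# Hypothetical posterior probabilities for two viruses sampled from biting arthropods.
examples = {
    "virus_x": {"arthropod-vectored vertebrate": 0.98, "arthropod-specific": 0.02},
    "virus_y": {"arthropod-vectored vertebrate": 0.60, "arthropod-specific": 0.40},
}

for name, posteriors in examples.items():
    print(name, assign_host(posteriors))
```

Here `virus_x` would be assigned a host association, whereas `virus_y` would be left unassigned because neither state reaches the threshold.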

Figure 2. ML phylogeny of the Rhabdoviridae (tree image not reproduced; tip labels, known hosts, and host-category codes are omitted here). (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis (arthropod-vectored plant, arthropods, vertebrate specific, arthropod-vectored vertebrate, nematode). Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod specific; BA, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed), as identified as an outgroup in (Li et al. 2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks (?) represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. The Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4.

To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and for one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
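The design of this check, in which the viruses with known host associations are split into non-overlapping sets that are masked one set at a time, can be sketched as below. The virus names, the random seed, and the function `holdout_sets` are placeholders introduced for illustration; re-running the phylogenetic analysis on each masked dataset is not shown.

```python
import random


def holdout_sets(names, set_size=10, seed=0):
    """Shuffle names and split them into consecutive, non-overlapping sets."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


# Placeholder names standing in for the ninety-nine viruses with known hosts.
known = [f"virus_{i:02d}" for i in range(99)]
sets = holdout_sets(known)
print([len(s) for s in sets])  # nine sets of ten and one set of nine

# Accuracy implied by the counts reported in the text:
# 95 correct with strong support plus 1 correct with weak support, out of 99.
correct = 95 + 1
print(f"{correct}/{len(known)} = {correct / len(known):.0%}")  # 96/99 = 97%
```

Sampling without replacement means that every virus with a known association is masked exactly once across the ten runs.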

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
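The logic of a permutation test for genetic differentiation can be illustrated with a deliberately simplified sketch. It assumes a Hudson-style estimator, Fst = 1 − (mean within-group distance / mean between-group distance), and uses single numbers in place of sequences, with the absolute difference as the distance; the estimator choice, the toy data, and the function names are assumptions for this example and may differ from the estimator applied to the alignment in the study.

```python
import random
from itertools import combinations


def fst(values, labels):
    """Fst as 1 - (mean within-group distance / mean between-group distance)."""
    within, between = [], []
    for i, j in combinations(range(len(values)), 2):
        distance = abs(values[i] - values[j])
        (within if labels[i] == labels[j] else between).append(distance)
    # Assumes at least two groups and a non-zero mean between-group distance.
    return 1 - (sum(within) / len(within)) / (sum(between) / len(between))


def permutation_p(values, labels, n_perm=999, seed=1):
    """Observed Fst and the proportion of label shuffles giving Fst >= observed."""
    rng = random.Random(seed)
    observed = fst(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst(values, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two host groups whose "sequences" are clearly separated.
values = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
labels = ["arthropod"] * 5 + ["vertebrate"] * 5
observed, p_value = permutation_p(values, labels)
print(f"Fst = {observed:.2f}, P = {p_value:.3f}")
```

Shuffling the host labels breaks any association between host group and sequence, so the shuffled values of Fst describe what would be expected if there were no differentiation between groups.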

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
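To make the notion of counting host-switches concrete, the sketch below tallies changes of state along the branches of a single tree whose node states are treated as known. This is a simplification: the estimates reported above integrate over uncertainty in ancestral states, topology, and root, which this example does not attempt. The toy tree, the state codes, and the function `count_switches` are hypothetical.

```python
from collections import Counter


def count_switches(parent_of, state_of):
    """Count state changes along branches, keyed by (ancestor state, descendant state)."""
    switches = Counter()
    for child, parent in parent_of.items():
        if state_of[parent] != state_of[child]:
            switches[(state_of[parent], state_of[child])] += 1
    return switches


# Toy tree: each node maps to its parent; the root has no entry.
parent_of = {"n1": "root", "n2": "root", "t1": "n1", "t2": "n1", "t3": "n2", "t4": "n2"}

# Hypothetical states: VV = arthropod-vectored vertebrate, A = arthropod-specific,
# VS = vertebrate-specific.
state_of = {"root": "VV", "n1": "VV", "n2": "A", "t1": "VV", "t2": "VS", "t3": "A", "t4": "A"}

for (source, target), n in sorted(count_switches(parent_of, state_of).items()):
    print(f"{source} -> {target}: {n}")
```

In this toy tree one switch occurs on an internal branch (VV to A) and one on a terminal branch (VV to VS); the latter is the kind of change that is hardest to distinguish from an error in host assignment.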

Vertebrate-specific viruses have arisen once in the lyssaviruses clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
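A correlation between two sets of pairwise distances cannot be tested with a standard significance test, because the pairwise values are not independent; a permutation approach is therefore used. The sketch below shows one such scheme (a Mantel-style test in which the hosts are shuffled among the viruses). The toy positions, distance matrices, and function names are assumptions for illustration, and the exact permutation scheme used in the study may differ.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(virus_d, host_d, n_perm=999, seed=1):
    """Observed correlation and the proportion of host shuffles giving r >= observed."""
    n = len(virus_d)
    pairs = list(combinations(range(n), 2))
    virus_pairs = [virus_d[i][j] for i, j in pairs]

    def r_for(order):
        # order[i] is the host assigned to virus i under this permutation.
        return pearson(virus_pairs, [host_d[order[i]][order[j]] for i, j in pairs])

    order = list(range(n))
    observed = r_for(order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if r_for(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def distance_matrix(positions):
    """Pairwise absolute differences between positions on a line."""
    return [[abs(a - b) for b in positions] for a in positions]


# Toy data: seven viruses whose relatedness broadly tracks that of their hosts.
virus_d = distance_matrix([0, 1, 2, 3, 4, 5, 7])
host_d = distance_matrix([0, 1, 2, 3, 4, 5, 6])
r, p_value = mantel(virus_d, host_d)
print(f"r = {r:.2f}, P = {p_value:.3f}")
```

Permuting whole hosts, rather than individual pairwise distances, keeps the internal structure of each distance matrix intact while breaking the link between them.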

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare. NCBI Sequence Read Archive Data: SRP057824

Results from testing ancestral trait reconstructions predictions: http://dx.doi.org/10.6084/m9.figshare.1538584

L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067

TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069

Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068

Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083

Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082

BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431

BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922

Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

BL and FMJ are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ's College, Cambridge (BL). GGRM is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to DJO.

Supplementary data

Supplementary data are available at Virus Evolution online.

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV infected fly line; Casper Breuker and Melanie Gibbs for PAegRV samples; and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R. et al. (2015) 'Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host', Nucleic Acids Research, 43: 6191–206.

Ahne, W. et al. (2002) 'Spring Viremia of Carp (SVC)', Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P. and Katzourakis, A. (2015) 'Endogenous Viruses: Connecting Recent and Ancient Viral Evolution', Virology, 479–480C: 26–37.

Anisimova, M. and Gascuel, O. (2006) 'Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative', Systematic Biology, 55: 539–52.

Aznar-Lopez, C. et al. (2013) 'Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats', Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A. and Taylor, D. J. (2012) 'Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila', Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D. and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G. et al. (2013) 'Estimating and Interpreting FST: The Impact of Rare Variants', Genome Research, 23: 1514–21.

10 | Virus Evolution 2015 Vol 1 No 1

Bootsma, R., Dekinkelin, P. and Leberre, M. (1975) 'Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus', Journal of Fish Biology, 7: 269–76.

Bourhy, H. et al. (2005) 'Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene', Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009) 'trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses', Bioinformatics, 25: 1972–3.

Chiba, S. et al. (2011) 'Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes', PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C. and Shaw, K. L. (2008) 'A Genealogical Approach to Quantifying Lineage Divergence', Evolution, 62: 2411–22.

Dietzgen, R. G. and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M. et al. (1987) 'Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus', Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J. et al. (2006) 'Relaxed Phylogenetics and Dating with Confidence', PLoS Biology, 4: e88.

Drummond, A. J. et al. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.

Dudas, G. and Obbard, D. J. (2015) 'Are Arthropods at the Heart of Virus Evolution?', eLife, 4: e06837.

Edwards, C. J. et al. (2011) 'Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline', Current Biology, 21: 1251–8.

Faria, N. R. et al. (2013) 'Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints', Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P. et al. (2011) 'Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality', Molecular Biology and Evolution, 29: 381–90.

Guindon, S. et al. (2010) 'New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0', Systematic Biology, 59: 307–21.

Haenen, O. and Davidse, A. (1993) 'Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout', Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K. et al. (2015) 'Estimating the Global Burden of Endemic Canine Rabies', PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G. and Ammar, E. D. (2003) 'Plant and Animal Rhabdovirus Host Range: A Bug's View', Trends in Microbiology, 11: 264–71.

Hommola, K. et al. (2009) 'A Permutation Test of Host-Parasite Cospeciation', Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992) 'Estimation of Levels of Gene Flow from DNA Sequence Data', Genetics, 132: 583–9.

Jeyaprakash, A. and Hoy, M. A. (2009) 'First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny', Experimental and Applied Acarology, 47: 1–18.

Katoh, K. and Standley, D. M. (2013) 'MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability', Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A. and Gifford, R. J. (2010) 'Endogenous Viral Elements in Animal Genomes', PLoS Genetics, 6: e1001191.

L'Heritier, P. H. and Teissier, G. (1937) 'Une Anomalie Physiologique Hereditaire Chez la Drosophile', Comptes Rendus de l'Academie des Sciences Paris, 231: 192–4.

Le, S. Q. and Gascuel, O. (2008) 'An Improved General Amino Acid Replacement Matrix', Molecular Biology and Evolution, 25: 1307–20.

Lemey, P. et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.

Li, C. X. et al. (2015) 'Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses', eLife, 10.7554/eLife.05378.

Lipkin, W. I. and Anthony, S. J. (2015) 'Virus Hunting', Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D. and Bonning, B. C. (2011) 'Next Generation Sequencing Technologies for Insect Virus Discovery', Viruses, 3: 1849–69.

Longdon, B. and Jiggins, F. M. (2012) 'Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?', Proceedings of the Royal Society B: Biological Sciences, 279: 3889–98.

Longdon, B. and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J. and Jiggins, F. M. (2010) 'Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny', Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L. and Jiggins, F. M. (2012) 'The Sigma Viruses of Drosophila', in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B. et al. (2011a) 'Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts', PLoS Pathogens, 7: e1002260.

Longdon, B. et al. (2011b) 'Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila', Biology Letters, 7: 747–50.

Longdon, B. et al. (2011c) 'Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep', Genetics, 188: 141–50.

Longdon, B. et al. (2014) 'The Evolution and Genetics of Virus Host Shifts', PLoS Pathogens, 10: e1004395.

Longdon, B. et al. (2015) 'The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts', PLoS Pathogens, 11: e1004728.

Minin, V. N. and Suchard, M. A. (2008) 'Counting Labeled Transitions in Continuous-Time Markov Models of Evolution', Journal of Mathematical Biology, 56: 391–412.

Misof, B. et al. (2014) 'Phylogenomics Resolves the Timing and Pattern of Insect Evolution', Science, 346: 763–7.

Okland, A. L. et al. (2014) 'Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)', PLoS One, 9: e112517.

Omland, K. E. (1999) 'The Assumptions and Challenges of Ancestral State Reconstructions', Systematic Biology, 48: 604–11.

Parker, D. J. et al. (2015) 'How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?', Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) 'Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)', Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) 'FigTree' (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A. and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) 'Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses', Science, 207: 989–91.

Shroyer, D. A. and Rosen, L. (1983) 'Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus', Genetics, 104: 649–59.

Streicker, D. G. et al. (2010) 'Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats', Science, 329: 676–9.

Talavera, G. and Castresana, J. (2007) 'Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments', Systematic Biology, 56: 564–77.

van Mierlo, J. T. et al. (2014) 'Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi', PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R. and Joubert, D. A. (2012) 'Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes', in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J. et al. (2011) 'Rhabdovirus Accessory Genes', Virus Research, 162: 110–25.

Walker, P. J. et al. (2015) 'Evolution of Genome Size and Complexity in the Rhabdoviridae', PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M. and Snyder, M. (2009) 'RNA-Seq: A Revolutionary Tool for Transcriptomics', Nature Reviews Genetics, 10: 57–63.

Webster, C. L. et al. (2015) 'The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster', PLoS Biology, 13: e1002210.

Weinert, L. A. et al. (2012) 'Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication', Biology Letters, 8: 829–32.


viruses in the order Mononegavirales (Dietzgen and Kuzmin 2012). They infect an extremely broad range of hosts and have been discovered in plants, fish, mammals, reptiles, and a broad range of insects and other arthropods (Bourhy et al. 2005). The family includes important pathogens of humans and livestock. Perhaps the most well-known is rabies virus, which can infect a diverse array of mammals and causes a fatal infection, killing 59,000 people per year with an estimated economic cost of $8.6 billion (US) (Hampson et al. 2015). Other rhabdoviruses, such as vesicular stomatitis virus and bovine ephemeral fever virus, are important pathogens of domesticated animals, while others are pathogens of crops (Dietzgen and Kuzmin 2012).

Arthropods play a key role in the transmission of many rhabdoviruses. Many viruses found in vertebrates have also been detected in arthropods, including sandflies, mosquitoes, ticks, and midges (Walker, Blasdell, and Joubert 2012). The rhabdoviruses that infect plants are also often transmitted by arthropods (Hogenhout, Redinbaugh, and Ammar 2003), and some that infect fish can potentially be vectored by ectoparasitic copepod sea-lice (Pfeilputzien 1978; Ahne et al. 2002). Moreover, insects are biological vectors: rhabdoviruses replicate upon infection of insect vectors (Hogenhout, Redinbaugh, and Ammar 2003). Other rhabdoviruses are insect-specific. In particular, the sigma viruses are a clade of vertically transmitted viruses that infect dipterans and are well-studied in Drosophila (Longdon et al. 2011a,b; Longdon and Jiggins 2012). Recently, a number of rhabdoviruses have been found to be associated with a wide array of insect and other arthropod species, suggesting they may be common arthropod viruses (Li et al. 2015; Walker et al. 2015). Furthermore, a number of arthropod genomes contain integrated endogenous viral elements (EVEs) with similarity to rhabdoviruses, suggesting that these species have been infected with rhabdoviruses at some point in their history (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012; Aiewsakun and Katzourakis 2015).

Here, we explore the diversity of the rhabdoviruses and examine how they have switched between different host taxa during their evolutionary history. Insects infected with rhabdoviruses commonly become paralysed on exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012). We exploited this fact to screen field collections of flies from several continents for novel rhabdoviruses that were then sequenced using metagenomic RNA-sequencing (RNA-seq). Additionally, we searched for rhabdovirus-like sequences in publicly available RNA-seq data. We identified thirty-two novel rhabdovirus-like sequences from a wide array of invertebrates and plants, and combined them with recently discovered viruses to produce the most comprehensive phylogeny of the rhabdoviruses to date. For many of the viruses we do not know their true host range, so we used the phylogeny to identify a large number of new likely vector-borne viruses and to reconstruct the evolutionary history of this diverse group of viruses.

2 Methods

2.1 Discovery of new rhabdoviruses by RNA sequencing

Diptera (flies, mostly Drosophilidae) were collected in the field from Spain, USA, Kenya, France, Ghana, and the UK (Supplementary Data S1). Infection with rhabdoviruses can cause Drosophila and other insects to become paralysed after exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012), so we enriched our sample for infected individuals by exposing them to CO2 at 12°C for 15 min, only retaining individuals that showed symptoms of paralysis 30 min later. We extracted RNA from seventy-nine individual insects (details in Supplementary Data S1) using Trizol reagent (Invitrogen) and combined the extracts into two pools (retaining non-pooled individual RNA samples). RNA was then rRNA depleted with the Ribo-Zero Gold kit (Epicentre, USA) and used to construct TruSeq total RNA libraries (Illumina). Libraries were constructed and sequenced by BGI (Hong Kong) on an Illumina Hi-Seq 2500 (one lane, 100-bp paired end reads, generating 175 million reads). Sequences were quality trimmed with Trimmomatic (v3): Illumina adapters were clipped, bases were removed from the beginning and end of reads if quality dropped below a threshold, sequences were trimmed if the average quality within a window fell below a threshold, and reads <20 bp in length were removed. We de novo assembled the RNA-seq reads with Trinity (release 25 February 2013) using default settings and the jaccard clip option for high gene density. The assembly was then searched using tblastn to identify rhabdovirus-like sequences, with known rhabdovirus coding sequences as the query. Any contigs with high sequence similarity to rhabdoviruses were then reciprocally compared to GenBank cDNA and RefSeq nucleotide databases using tblastn and only retained if they most closely matched a virus-like sequence. Raw read data were deposited in the NCBI Sequence Read Archive (SRP057824). Putative viral sequences have been submitted to GenBank (accession numbers in Supplementary Tables S1 and S2).
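The read-trimming steps described above (end clipping, sliding-window trimming, and a minimum length filter) can be illustrated with a minimal sketch. This is not Trimmomatic's implementation, and the thresholds `leading`, `trailing`, `window`, and `min_avg` are hypothetical defaults, since the quality thresholds used are not stated here; only the 20 bp minimum length comes from the text.

```python
from typing import List, Optional, Tuple


def trim_read(quals: List[int], leading: int = 3, trailing: int = 3,
              window: int = 4, min_avg: float = 15.0,
              min_len: int = 20) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice of a read to keep, or None to drop it.

    quals holds one Phred quality score per base. All thresholds except
    min_len are illustrative assumptions.
    """
    start, end = 0, len(quals)
    # Remove low-quality bases from the beginning of the read.
    while start < end and quals[start] < leading:
        start += 1
    # Remove low-quality bases from the end of the read.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # Cut at the first full window whose mean quality is below min_avg.
    for i in range(start, max(start, end - window + 1)):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # Discard reads shorter than min_len after trimming.
    if end - start < min_len:
        return None
    return start, end
```

A caller would apply the returned slice to both the base string and the quality string of each read, and skip reads for which the function returns `None`.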

As the RNA-seq was performed on pooled samples, we assigned rhabdovirus sequences to individual insects by PCR on RNA from individual samples. cDNA was produced using Promega GoScript Reverse Transcriptase and random-hexamer primers, and PCR performed using primers designed from the rhabdovirus sequences. Infected host species were identified by sequencing the mitochondrial gene COI. We were unable to identify the host species of the virus from a Drosophila affinis sub-group species (sequences appear similar to both D. affinis and the closely related Drosophila athabasca), despite the addition of further mitochondrial and nuclear sequences to increase confidence. In all cases, we confirmed that viruses were only present in cDNA and not in non-reverse-transcription (RT) controls (i.e., DNA) by PCR, and so they cannot be integrated into the insect genome (i.e., endogenous virus elements or EVEs [Katzourakis and Gifford 2010]). COI primers were used as a positive control for the presence of DNA in the non-RT template.

We identified sigma virus sequences in RNA-seq data from Drosophila montana (Parker et al. 2015). We used RT-PCR on an infected fly line to amplify the virus sequence and carried out additional Sanger sequencing with primers designed using the RNA-seq assembly. Additional virus sequences were identified from an RNA-seq analysis of pools of wild caught Drosophila: DImmSV from Drosophila immigrans (collection and sequencing described [van Mierlo et al. 2014]), DTriSV from a pool of Drosophila tristis, and SDefSV from Scaptodrosophila deflexa (both Darren Obbard, unpublished data). GenBank accession numbers for new virus sequences are KR822817, KR822816, KR822823, KR822813, KR822820, KR822821, KR822822, KR822815, KR822824, KR822812, KR822811, KR822814, and KR822818. A full list of accessions can be found in Supplementary Tables S1 and S2.

2.2 Discovery of rhabdoviruses in public sequence databases

Rhabdovirus L gene sequences were used as queries to search (tblastn) expressed sequence tag and transcriptome shotgun assembly databases (NCBI). All sequences were reciprocally BLAST searched against GenBank cDNA and RefSeq databases and only retained if they matched a virus-like sequence. We used two approaches to examine whether sequences were present as RNA but not DNA. First, where assemblies of whole-genome shotgun sequences were available, we used BLAST to test whether sequences were integrated into the host genome. Second, for the virus sequences in the butterfly Pararge aegeria and the medfly Ceratitis capitata, we were able to obtain infected samples to confirm whether sequences are only present in RNA by performing PCR on both genomic DNA and cDNA as described above (samples kindly provided by Casper Breuker, Melanie Gibbs, and Philip Leftwich, respectively).
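The reciprocal-search filter can be sketched as follows: for each candidate contig, keep it only if its best-scoring database hit looks viral. This is an illustrative simplification, not the procedure actually run. The `Hit` tuple layout and the keyword test on the subject description are assumptions made for the example, and in practice the hits would be parsed from BLAST output.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (contig id, subject description, bit score).
Hit = Tuple[str, str, float]


def virus_like_contigs(hits: Iterable[Hit], keyword: str = "virus") -> List[str]:
    """Return contigs whose top-scoring hit description contains keyword."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        contig, _, score = hit
        # Track the highest-scoring hit seen so far for each contig.
        if contig not in best or score > best[contig][2]:
            best[contig] = hit
    # Retain a contig only when its best match is to a virus-like sequence.
    return sorted(c for c, (_, desc, _) in best.items()
                  if keyword in desc.lower())
```

Matching on a keyword is crude; a taxonomy lookup on the subject accession would be a more robust way to decide whether the best hit is viral.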

2.3 Phylogenetic analysis

All available rhabdovirus-like sequences were downloaded from GenBank (accessions in Supplementary Data S2). Amino acid sequences for the L gene (encoding the RNA Dependent RNA Polymerase or RDRP) were used to infer the phylogeny (L gene sequences), as they contain conserved domains that can be aligned across this diverse group of viruses. Sequences were aligned with MAFFT (Katoh and Standley 2013) under default settings, and then poorly aligned and divergent sites were removed with either TrimAl (v1.3, strict settings, implemented on the Phylemon v2.0 server; alignment available online) (Capella-Gutierrez, Silla-Martinez, and Gabaldon 2009) or Gblocks (v0.91b, selecting smaller final blocks, allowing gap positions and less strict flanking positions to produce a less stringent selection; alignment available online) (Talavera and Castresana 2007). These resulted in alignments of 1,492 and 829 amino acids, respectively.

Phylogenetic trees were inferred using Maximum Likelihood in PhyML (v3.0) (Guindon et al. 2010) using the LG substitution model (Le and Gascuel 2008) (preliminary analysis confirmed the results were robust to the amino acid substitution model selected), with a gamma distribution of rate variation with four categories and a sub-tree pruning and regrafting topology searching algorithm. Branch support was estimated using Approximate Likelihood-Ratio Tests (aLRT) that are reported to outperform bootstrap methods (Anisimova and Gascuel 2006). Figures were created using FIGTREE (v. 1.4) (Rambaut 2011).

2.4 Analysis of phylogenetic structure between viruses taken from different hosts and ecologies

We measured the degree of phylogenetic structure between virus sequences identified in different categories of host (arthropods, vertebrates, and plants) and ecosystems (terrestrial and aquatic). Following Bhatia et al. (2013), we measured the degree of genetic structure between virus sequences from different groups of hosts/ecosystems using Hudson's Fst estimator (Hudson, Slatkin, and Maddison 1992). We calculated Fst as $F_{ST} = 1 - H_w / H_b$, where $H_w$ is the mean number of differences between sequences within populations, $H_b$ is the mean number of differences between sequences from different populations, and a population is a host category or ecosystem. The significance of this value was tested by comparison with 1,000 replicates with host categories randomly permuted over sequences. We also measured the clustering of these categories over our phylogeny using the genealogical sorting index (GSI), a measure of the degree of exclusive ancestry of a group on a rooted genealogy (Cummings, Neel, and Shaw 2008), for each of our host association categories. The index was estimated using the genealogicalSorting R package (Bazinet, Myers, and Khatavkar 2009), with significance estimated by permutation. The tree was pruned to remove strains that could not be assigned to one of the host association categories under consideration. Finally, since arthropods are the most sampled host, we tested for evidence of genetic structure within the arthropod-associated viruses that would suggest co-divergence with their hosts or preferential host-switching between closely related hosts. We calculated the Pearson correlation coefficient of the evolutionary distances between viruses and the evolutionary distances between their hosts, and tested for significance by permutation (as in Hommola et al. [2009]). We used the patristic distances of our ML tree for the virus data and a time-tree of arthropod genera, using published estimates of divergence dates (Jeyaprakash and Hoy 2009; Misof et al. 2014).
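The Fst calculation and its permutation test can be sketched as below. This is a minimal illustration under stated assumptions rather than the code used for the analysis: sequences are assumed to be aligned strings of equal length, the within-population term is taken as the mean of the per-population means, and the function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def n_diff(a: str, b: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def hudson_fst(seqs: Sequence[str], labels: Sequence[str]) -> float:
    """Fst = 1 - (mean differences within) / (mean differences between)."""
    groups: Dict[str, List[str]] = {}
    for seq, label in zip(seqs, labels):
        groups.setdefault(label, []).append(seq)
    # Mean pairwise differences inside each population with >= 2 sequences.
    within = [mean([n_diff(a, b) for a, b in combinations(g, 2)])
              for g in groups.values() if len(g) > 1]
    # All pairwise differences between sequences from different populations.
    between = [n_diff(a, b)
               for g1, g2 in combinations(list(groups.values()), 2)
               for a in g1 for b in g2]
    return 1.0 - mean(within) / mean(between)


def permutation_test(seqs: Sequence[str], labels: Sequence[str],
                     n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed Fst, one-sided p-value from permuted labels)."""
    rng = random.Random(seed)
    observed = hudson_fst(seqs, labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign host categories
        if hudson_fst(seqs, shuffled) >= observed:
            as_extreme += 1
    # Add one to numerator and denominator so p is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Because shuffling preserves the number of sequences per category, each permuted data set has the same population sizes as the observed one, which is the null model described in the text.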

2.5 Reconstruction of host associations

Viruses were categorized as having one of four types of host association: arthropod-specific, vertebrate-specific, arthropod-vectored plant, or arthropod-vectored vertebrate. However, the host association of some viruses is uncertain when they have been isolated from vertebrates, biting-arthropods, or plant-sap-feeding arthropods. Due to limited sampling, it was not clear whether viruses isolated from vertebrates were vertebrate specific or arthropod-vectored vertebrate viruses, whether viruses isolated from biting-arthropods were arthropod specific viruses or arthropod-vectored vertebrate viruses, or whether viruses isolated from plant-sap-feeding arthropods were arthropod-specific or arthropod-vectored plant viruses.

We classified a virus from a nematode as having its own host category. We classified three of the fish-infecting dimarhabdoviruses as vertebrate specific, based on the fact they can be transmitted via immersion in water containing virus during experimental conditions (Bootsma, Dekinkelin, and Leberre 1975; Dorson et al. 1987; Haenen and Davidse 1993) and the widely held belief amongst the fisheries community that these viruses are not typically vectored (Ahne et al. 2002). However, there is some evidence these viruses can be transmitted by arthropods (sea lice) in experiments (Pfeilputzien 1978; Ahne et al. 2002), and so we would recommend this be interpreted with some caution. Additionally, although we classified the viruses identified in sea-lice as having biting arthropod hosts, they may be crustacean-specific. The two viruses from Lepeophtheirus salmonis do not seem to infect the fish they parasitize and are present in all developmental stages of the lice, suggesting they may be transmitted vertically (Okland et al. 2014).

We simultaneously estimated both the current and ancestral host associations, and the phylogeny of the viruses, using a Bayesian analysis implemented in BEAST v1.8 (Drummond et al. 2012; Weinert et al. 2012). Because meaningful branch lengths are essential for this analysis (uncertainty about branch lengths will feed into uncertainty about the estimates), we used a subset of the sites and strains used in the maximum likelihood (ML) analysis. We retained 189 taxa: all rhabdoviruses excluding the divergent fish-infecting novirhabdovirus clade and the virus from Hydra, as well as the viruses from Lolium perenne and Conwentzia psociformis, which had a large number of missing sites. Sequences were trimmed to a conserved region of 414 amino acids where data was recorded for most of these viruses (the Gblocks alignment trimmed further by eye).

We used the host-association categories described above, which included ambiguous states. To describe amino acid evolution, we used an LG substitution model with gamma distributed rate variation across sites (Le and Gascuel 2008) and an uncorrelated lognormal relaxed clock model of rate variation among lineages (Drummond et al. 2006). To describe the evolution of the host associations, we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.

Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
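To make the burn-in and effective sample size (ESS) checks concrete, the sketch below discards the first 30% of a sampled parameter trace and computes a rough ESS from the sample autocorrelation. It is an assumption-laden illustration: the estimator shown (summing autocorrelations up to the first non-positive lag) is a simple textbook approach and is not claimed to match the one Tracer uses, and the function names are hypothetical.

```python
from typing import List, Sequence


def discard_burn_in(samples: Sequence[float], fraction: float = 0.3) -> List[float]:
    """Drop the first `fraction` of an MCMC trace."""
    return list(samples[int(len(samples) * fraction):])


def effective_sample_size(samples: Sequence[float]) -> float:
    """Rough ESS: n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    m = sum(samples) / n
    var = sum((x - m) ** 2 for x in samples) / n
    if var == 0:
        return float(n)  # a constant trace carries no autocorrelation signal
    tau = 1.0
    for lag in range(1, n):
        rho = sum((samples[i] - m) * (samples[i + lag] - m)
                  for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break  # truncate the sum at the first non-positive lag
        tau += 2 * rho
    return n / tau
```

A trace would pass the check described in the text if `effective_sample_size(discard_burn_in(trace))` exceeds 200 for every parameter.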

3 Results

3.1 Novel rhabdoviruses from RNA-seq

To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST, and used PCR to identify which samples these sequences came from.

This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of 12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see 'Methods' section for details).

To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans, and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L gene.
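ORF-based gene prediction of this kind can be sketched as a scan from a start codon to the first in-frame stop codon, repeated along the sequence. The example below is a simplified illustration: it works on a single coding-sense strand, and it does not model the transcription termination signal that the authors used to delimit successive ORFs. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str, start_from: int = 0) -> Optional[Tuple[int, int]]:
    """Find the ORF starting at the first ATG at or after start_from.

    Returns (start, end) with end exclusive and including the stop codon,
    or None if there is no start codon or no in-frame stop codon.
    """
    seq = seq.upper().replace("U", "T")
    start = seq.find("ATG", start_from)
    if start == -1:
        return None
    # Step through codons in frame with the start codon.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


def successive_orfs(seq: str) -> List[Tuple[int, int]]:
    """Collect non-overlapping ORFs, each searched for after the previous one."""
    orfs: List[Tuple[int, int]] = []
    pos = 0
    while True:
        orf = first_orf(seq, pos)
        if orf is None:
            break
        orfs.append(orf)
        pos = orf[1]  # resume the search after the previous stop codon
    return orfs
```

Comparing the number and order of ORFs returned against the five core genes expected for a rhabdovirus is one simple way to flag candidate accessory genes such as the additional ORF between P and M.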

Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses, or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.

[Figure 1 image: genome maps for Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus, and Drosophila affinis sigmavirus. Horizontal axis: sequence length (nucleotides), 0 to 15,000. Gene key: N, P, M, G, L, X, Accessory.]

Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2 sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U's) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13 kb long and contain five core genes 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16 kb long (Walker et al. 2011, 2015).

3.2 New rhabdoviruses from public databases

We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (Fleas, Crustacea, Lepidoptera, Diptera), one from a Cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1,000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).

Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria, and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori that we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua that contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).

3.3 Rhabdovirus phylogeny

To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA Dependent RNA Polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
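A comparison of this kind, counting the nodes found in both trees, can be sketched by listing the set of tips below every internal node of each tree and intersecting the two lists. The representation below (rooted trees as nested tuples of tip names) and the function names are assumptions for illustration, and branch support is ignored.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a tip name or a tuple of subtrees


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the tip set beneath every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def tips(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        # An internal node's tip set is the union of its children's tip sets.
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def shared_nodes(tree_a: Tree, tree_b: Tree) -> Tuple[int, int]:
    """Return (nodes of tree_a also present in tree_b, nodes in tree_a)."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a)
```

For unrooted comparisons the same idea applies to bipartitions rather than clades, so that the result does not depend on where each tree is rooted.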

We recovered all of the major clades described previously (Fig. 2) and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.

3.4 Predicted host associations of viruses

With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g., mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insects, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see 'Methods' section).
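The state coding described in this paragraph can be sketched as follows. The five state names are taken from the text; the dictionary `ALLOWED`, its keys, and the function `tip_partials` are hypothetical illustrations of how an ambiguous tip can be represented as a set of permitted states, and do not reproduce the configuration used in the actual analysis.

```python
# The five unambiguous host-association states named in the text.
STATES = (
    "vertebrate-specific",
    "arthropod-specific",
    "arthropod-vectored vertebrate",
    "arthropod-vectored plant",
    "nematode",
)

# Hypothetical mapping from isolation source to the states that remain possible.
ALLOWED = {
    "vertebrate": {"vertebrate-specific", "arthropod-vectored vertebrate"},
    "biting arthropod": {"arthropod-specific", "arthropod-vectored vertebrate"},
    "sap-sucking insect": {"arthropod-specific", "arthropod-vectored plant"},
    "uncertain": set(STATES),  # ambiguous across all five states
}


def tip_partials(source):
    """Return a 0/1 vector over STATES marking which states a tip may take."""
    allowed = ALLOWED[source]
    return [1 if state in allowed else 0 for state in STATES]


for source in ALLOWED:
    print(source, tip_partials(source))
```

A tip with a known association would have a single 1 in its vector, so the ambiguous tips differ only in that the reconstruction is left to choose among the permitted states.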

This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
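As a check on the arithmetic, the per-category counts in this paragraph can be summed directly. The sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# source of isolation -> (viruses found, {predicted association: count})
counts = {
    "biting arthropods": (52, {"arthropod-vectored vertebrate": 45,
                               "arthropod-specific": 6}),
    "vertebrates": (30, {"arthropod-vectored vertebrate": 22,
                         "vertebrate-specific": 2}),
    "plant-sap-feeding arthropods": (7, {"plant-associated": 3,
                                         "arthropod-associated": 2}),
}

total_ambiguous = sum(found for found, _ in counts.values())
total_assigned = sum(sum(pred.values()) for _, pred in counts.values())
unresolved = total_ambiguous - total_assigned

print(total_ambiguous, total_assigned, unresolved)  # 89 80 9
```

The three groups sum to eighty-nine viruses with ambiguous associations, of which eighty receive a strongly supported prediction and nine remain unresolved.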

To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without

B Longdon et al | 5

Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod-specific; BA, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed) as identified as an outgroup in (Li et al. 2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks (?) represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4. (continued)

6 | Virus Evolution 2015 Vol 1 No 1

Figure 2. Continued. Panel (B): the dimarhabdovirus supergroup, including the sigma viruses. Each tip is labelled with the virus name, the host category code, and the known hosts.


replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g., a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
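The sampling scheme of this test can be sketched in a few lines. This is only an illustration of drawing test sets without replacement and of the resulting accuracy; the virus labels, the seed, and the function `make_test_sets` are hypothetical, and the re-analysis of each set is not reproduced here.

```python
import random


def make_test_sets(viruses, set_size=10, seed=0):
    """Shuffle once, then cut into consecutive sets so each virus is masked exactly once."""
    pool = list(viruses)
    random.Random(seed).shuffle(pool)
    return [pool[i:i + set_size] for i in range(0, len(pool), set_size)]


# Placeholder labels standing in for the ninety-nine viruses with known associations.
viruses = [f"virus_{i:02d}" for i in range(99)]
test_sets = make_test_sets(viruses)
sizes = [len(s) for s in test_sets]  # nine sets of ten and one set of nine

# Outcomes reported in the text: 95 correct with strong support, 1 correct with weak support.
correct, total = 95 + 1, 99
accuracy = correct / total

print(sizes)
print(f"{accuracy:.1%} of reconstructions correct")
```

With the outcomes reported above, 96 of 99 reconstructions are correct, which rounds to 97%.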

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g., cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.

Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 2). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
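A correlation between two distance matrices is usually tested by permutation, because pairwise distances are not independent observations. The sketch below shows one common approach, a Mantel-style test in which the taxon labels of one matrix are shuffled; it is an assumption that a test of this general kind applies here, and the function names, the toy matrices, and the number of permutations are all hypothetical rather than taken from the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper(matrix, order):
    """Upper-triangle entries of a distance matrix after relabelling taxa by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(virus_d, host_d, n_perm=999, seed=1):
    """Observed correlation and one-sided permutation P value."""
    identity = list(range(len(virus_d)))
    v = upper(virus_d, identity)
    observed = pearson(v, upper(host_d, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)                      # shuffle host labels only
        if pearson(v, upper(host_d, perm)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: six taxa whose host distances are exactly twice their virus distances.
virus_d = [[abs(i - j) for j in range(6)] for i in range(6)]
host_d = [[2 * abs(i - j) for j in range(6)] for i in range(6)]
r, p = mantel(virus_d, host_d)
print(f"r = {r:.2f}, permutation P = {p:.3f}")
```

Shuffling labels, rather than individual distances, keeps the structure of each matrix intact, so the null distribution reflects only the loss of the pairing between viruses and hosts.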

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g., symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e., recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare and the NCBI Sequence Read Archive.

NCBI Sequence Read Archive data: SRP057824
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

BL and FMJ are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ's College, Cambridge (BL). GGRM is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to DJO.

Supplementary data

Supplementary data is available at Virus Evolution online

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R. et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.

Ahne, W. et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P. and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.

Anisimova, M. and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.

Aznar-Lopez, C. et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A. and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D. and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G. et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.

10 | Virus Evolution 2015 Vol 1 No 1

Bootsma, R., Dekinkelin, P. and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.

Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.

Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C. and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.

Dietzgen, R. G. and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.

Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.

Dudas, G. and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.

Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.

Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.

Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.

Haenen, O. and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G. and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.

Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.

Jeyaprakash, A. and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.

Katoh, K. and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A. and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.

L’Heritier, P. H. and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.

Le, S. Q. and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.

Lemey, P. et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.

Li, C. X. et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.

Lipkin, W. I. and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D. and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.

Longdon, B. and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.

Longdon, B. and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J. and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L. and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.

Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.

Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.

Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.

Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.

Minin, V. N. and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.

Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.

Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.

Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.

Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A. and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A. and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G. and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R. and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants with Large Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M. and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.


BLAST searched against GenBank cDNA and RefSeq databases and only retained if they matched a virus-like sequence. We used two approaches to examine whether sequences were present as RNA but not DNA. First, where assemblies of whole-genome shotgun sequences were available, we used BLAST to test whether sequences were integrated into the host genome. Second, for the virus sequences in the butterfly Pararge aegeria and the medfly Ceratitis capitata, we were able to obtain infected samples to confirm whether sequences are only present in RNA by performing PCR on both genomic DNA and cDNA as described above (samples kindly provided by Casper Breuker and Melanie Gibbs, and by Philip Leftwich, respectively).

2.3 Phylogenetic analysis

All available rhabdovirus-like sequences were downloaded from GenBank (accessions in Supplementary Data S2). Amino acid sequences for the L gene (encoding the RNA-dependent RNA polymerase, or RDRP) were used to infer the phylogeny (L gene sequences deposited on Figshare), as they contain conserved domains that can be aligned across this diverse group of viruses. Sequences were aligned with MAFFT (Katoh and Standley 2013) under default settings, and then poorly aligned and divergent sites were removed with either TrimAl (v1.3, strict settings, implemented on the Phylemon v2.0 server; alignment deposited on Figshare) (Capella-Gutierrez, Silla-Martinez and Gabaldon 2009) or Gblocks (v0.91b, selecting smaller final blocks, allowing gap positions and less strict flanking positions to produce a less stringent selection; alignment deposited on Figshare) (Talavera and Castresana 2007). These resulted in alignments of 1492 and 829 amino acids, respectively.

Phylogenetic trees were inferred using Maximum Likelihood in PhyML (v3.0) (Guindon et al. 2010) using the LG substitution model (Le and Gascuel 2008) (preliminary analysis confirmed the results were robust to the amino acid substitution model selected), with a gamma distribution of rate variation with four categories and a sub-tree pruning and regrafting topology searching algorithm. Branch support was estimated using Approximate Likelihood-Ratio Tests (aLRT), which are reported to outperform bootstrap methods (Anisimova and Gascuel 2006). Figures were created using FigTree (v1.4) (Rambaut 2011).

2.4 Analysis of phylogenetic structure between viruses taken from different hosts and ecologies

We measured the degree of phylogenetic structure between virus sequences identified in different categories of host (arthropods, vertebrates, and plants) and ecosystems (terrestrial and aquatic). Following Bhatia et al. (2013), we measured the degree of genetic structure between virus sequences from different groups of hosts/ecosystems using Hudson’s Fst estimator (Hudson, Slatkin and Maddison 1992). We calculated Fst as 1 minus the ratio of the mean number of differences between sequences within populations to the mean number of differences between sequences from different populations, where a population is a host category or ecosystem. The significance of this value was tested by comparison with 1000 replicates with host categories randomly permuted over sequences. We also measured the clustering of these categories over our phylogeny using the genealogical sorting index (GSI), a measure of the degree of exclusive ancestry of a group on a rooted genealogy (Cummings, Neel and Shaw 2008), for each of our host association categories. The index was estimated using the genealogicalSorting R package (Bazinet, Myers and Khatavkar 2009), with significance estimated by permutation. The tree was pruned to remove strains that could not be assigned to one of the host association categories under consideration. Finally, since arthropods are the most sampled host, we tested for evidence of genetic structure within the arthropod-associated viruses that would suggest co-divergence with their hosts or preferential host-switching between closely related hosts. We calculated the Pearson correlation coefficient of the evolutionary distances between viruses and the evolutionary distances between their hosts, and tested for significance by permutation (as in Hommola et al. [2009]). We used the patristic distances of our ML tree for the virus data and a time-tree of arthropod genera, using published estimates of divergence dates (Jeyaprakash and Hoy 2009; Misof et al. 2014).
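As an illustration of the Fst calculation and permutation test described in this section, the following is a minimal sketch rather than the code used in the study. It assumes aligned sequences of equal length and, as a simplification, pools all within-group pairs before averaging. The function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def n_differences(a, b):
    """Count the aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def hudson_fst(seqs, labels):
    """Fst = 1 - (mean within-group differences / mean between-group differences)."""
    within, between = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = n_differences(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 1.0 - mean_within / mean_between


def permutation_test(seqs, labels, n_perm=1000, seed=0):
    """Return the observed Fst and the share of label permutations with Fst >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign host categories to sequences
        if hudson_fst(seqs, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data (hypothetical): two host categories, two sequences each.
toy_seqs = ["AAAA", "AAAT", "GGGG", "GGGA"]
toy_labels = ["arthropod", "arthropod", "vertebrate", "vertebrate"]
toy_fst, toy_p = permutation_test(toy_seqs, toy_labels, n_perm=200)
```

In the toy data the within-group pairs differ at far fewer sites than the between-group pairs, so the estimate is well above zero. Real alignments would also need a rule for gaps and missing sites, which this sketch ignores.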

2.5 Reconstruction of host associations

Viruses were categorized as having one of four types of host association: arthropod-specific, vertebrate-specific, arthropod-vectored plant, or arthropod-vectored vertebrate. However, the host association of some viruses is uncertain when they have been isolated from vertebrates, biting arthropods, or plant-sap-feeding arthropods. Due to limited sampling, it was not clear whether viruses isolated from vertebrates were vertebrate-specific or arthropod-vectored vertebrate viruses, whether viruses isolated from biting arthropods were arthropod-specific or arthropod-vectored vertebrate viruses, or whether viruses isolated from plant-sap-feeding arthropods were arthropod-specific or arthropod-vectored plant viruses.

We classified a virus from a nematode as having its own host category. We classified three of the fish-infecting dimarhabdoviruses as vertebrate-specific based on the fact they can be transmitted via immersion in water containing virus during experimental conditions (Bootsma, Dekinkelin and Leberre 1975; Dorson et al. 1987; Haenen and Davidse 1993) and the widely held belief amongst the fisheries community that these viruses are not typically vectored (Ahne et al. 2002). However, there is some evidence these viruses can be transmitted by arthropods (sea lice) in experiments (Pfeilputzien 1978; Ahne et al. 2002), and so we would recommend this be interpreted with some caution. Additionally, although we classified the viruses identified in sea lice as having biting arthropod hosts, they may be crustacean-specific. The two viruses from Lepeophtheirus salmonis do not seem to infect the fish they parasitize and are present in all developmental stages of the lice, suggesting they may be transmitted vertically (Okland et al. 2014).

We simultaneously estimated both the current and ancestral host associations, and the phylogeny of the viruses, using a Bayesian analysis implemented in BEAST v1.8 (Drummond et al. 2012; Weinert et al. 2012). Because meaningful branch lengths are essential for this analysis (uncertainty about branch lengths will feed into uncertainty about the estimates), we used a subset of the sites and strains used in the maximum likelihood (ML) analysis. We retained 189 taxa: all rhabdoviruses excluding the divergent fish-infecting novirhabdovirus clade and the virus from Hydra, as well as the viruses from Lolium perenne and Conwentzia psociformis, which had a large number of missing sites. Sequences were trimmed to a conserved region of 414 amino acids where data was recorded for most of these viruses (the Gblocks alignment trimmed further by eye).

We used the host-association categories described above, which included ambiguous states. To describe amino acid evolution we used an LG substitution model with gamma distributed rate variation across sites (Le and Gascuel 2008) and an uncorrelated lognormal relaxed clock model of rate variation among lineages (Drummond et al. 2006). To describe the evolution of the host associations we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
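To make the idea of an asymmetric transition rate matrix concrete, the sketch below builds a small continuous-time Markov model over the four host-association states and computes transition probabilities along a branch. It is an illustration only and does not reproduce the BEAST implementation. The rate values are invented, the helper names are hypothetical, and restricting the matrix to the switches listed above is an assumption made for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q * t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def rate_matrix(rates, states):
    """Build Q from {(source, target): rate}; diagonals make each row sum to zero."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        q[idx[src]][idx[dst]] = r
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


STATES = ["arthropod-specific", "arthropod-vectored vertebrate",
          "vertebrate-specific", "arthropod-vectored plant"]
# Invented rates: each direction of a switch has its own value (asymmetric).
RATES = {
    ("arthropod-specific", "arthropod-vectored vertebrate"): 0.30,
    ("arthropod-vectored vertebrate", "arthropod-specific"): 0.05,
    ("arthropod-specific", "arthropod-vectored plant"): 0.10,
    ("arthropod-vectored plant", "arthropod-specific"): 0.02,
    ("arthropod-vectored vertebrate", "vertebrate-specific"): 0.04,
    ("vertebrate-specific", "arthropod-vectored vertebrate"): 0.01,
}
Q = rate_matrix(RATES, STATES)
P = mat_exp(Q, 1.0)  # P[i][j]: probability of state j after time 1, starting in state i
```

Because the two directions of each switch have separate rates, P[0][1] and P[1][0] differ, which is what distinguishes this model from a symmetric one.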

Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
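The two checks mentioned here, discarding a burn-in and inspecting the effective sample size (ESS), can be sketched as follows. This is a generic autocorrelation-based estimate and is not claimed to match the calculation performed by Tracer. The function names and the simulated chains are hypothetical.

```python
import random


def discard_burn_in(samples, fraction=0.3):
    """Drop the first `fraction` of an MCMC trace."""
    return samples[int(len(samples) * fraction):]


def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Simulated traces (hypothetical): one uncorrelated, one strongly autocorrelated.
rng = random.Random(1)
iid_chain = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ar_chain = [0.0]
for _ in range(1999):
    ar_chain.append(0.9 * ar_chain[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(discard_burn_in(iid_chain))
ess_ar = effective_sample_size(discard_burn_in(ar_chain))
```

The autocorrelated chain yields a much smaller ESS than the uncorrelated one for the same number of retained samples, which is why a threshold such as the 200 quoted above is applied to ESS rather than to raw chain length.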

3 Results

3.1 Novel rhabdoviruses from RNA-seq

To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST and used PCR to identify which samples these sequences came from.

This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of ~12.5 kb (Longdon, Obbard and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see ‘Methods’ section for details).

To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans, and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L genes.
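A simple way to picture ORF-based gene prediction is a scan for start-to-stop stretches in each reading frame. The sketch below is a generic illustration under stated assumptions: the input is given in coding (mRNA) sense, only the three forward frames are searched, and no use is made of transcription termination signals. The function name and the minimum-length threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=50):
    """Return (start, end, frame) for each ATG-to-stop ORF; 0-based, end exclusive.

    The length filter counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy sequence (hypothetical): one short ORF in the third reading frame.
toy_seq = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
toy_orfs = find_orfs(toy_seq, min_codons=3)
```

On the toy sequence the scan reports a single ORF spanning positions 2 to 23 in frame 2. Comparing predicted ORF positions with the expected N-P-M-G-L order is then what reveals additional ORFs such as the one between P and M.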

Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses, or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod-associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.

Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2-sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U’s) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13 kb long and contain five core genes, 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16 kb long (Walker et al. 2011, 2015).

[Figure 1 image not reproduced. Viruses shown: Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus, Drosophila affinis sigmavirus. Horizontal axis: sequence length (nucleotides), 0 to 15000. Gene key: N, P, M, G, L, X, Accessory.]

3.2 New rhabdoviruses from public databases

We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (fleas, Crustacea, Lepidoptera, Diptera), one from a cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).

Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria, and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori, which we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua, which contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).

3.3 Rhabdovirus phylogeny

To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
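One way to count nodes shared between two rooted trees is to compare the sets of tips descending from each internal node. The sketch below illustrates that idea on toy trees written as nested tuples. It is an assumption about how such a comparison could be made, not a description of the procedure used for the 149/188 figure, and the function names are hypothetical.

```python
def clades(tree):
    """Return the clades of a rooted tree, each as a frozenset of tip names.

    A tree is a nested tuple; a tip is a string.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)  # record every internal node by the tips below it
        return tips

    visit(tree)
    return found


def shared_nodes(tree_a, tree_b):
    """Return (number of clades of tree_a also present in tree_b, clades in tree_a)."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a)


# Toy trees (hypothetical): they disagree only on the grouping within A, B, C.
ml_tree = ((("A", "B"), "C"), ("D", "E"))
bayes_tree = ((("A", "C"), "B"), ("D", "E"))
n_shared, n_total = shared_nodes(ml_tree, bayes_tree)
```

For the toy trees, three of the four internal nodes of the first tree are recovered in the second. Restricting the comparison to strongly supported nodes would require support values on each node, which this sketch does not model.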

We recovered all of the major clades described previously (Fig. 2), and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.

3.4 Predicted host associations of viruses

With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insects, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see ‘Methods’ section).

This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.

To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without


[Figure 2, panel A: image not reproduced. The extracted tip labels give each virus name together with its host (for example bats, ticks, fish, flowering plants, aphids, leafhoppers, mosquitoes, drosophilid fruit flies) and a short host-association code (e.g. VS, BA, A, P). Labeled clades: novirhabdoviruses; cyto- and nucleo- rhabdoviruses; lyssaviruses. The position of the clade shown in Fig. 2B is marked. Color key (associated hosts): arthropod-vectored plant; arthropods; vertebrate specific; arthropod-vectored vertebrate; nematode; low support or omitted. Scale bar: 0.4.]

Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod associated viruses, the plant infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from h
ost

-sta

tere

con

stru

ctio

no

ras

soci

atio

ns

wit

hlt

095

sup

po

rtT

he

tree

was

infe

rred

fro

mL

gen

ese

qu

ence

su

sin

gth

eG

blo

cks

alig

nm

ent

Th

e

colu

mn

so

fte

xtar

eth

evi

rus

nam

eth

eh

ost

cate

gory

use

dfo

rre

con

stru

ctio

ns

and

kno

wn

ho

sts

(fro

mle

ftto

righ

t)C

od

esfo

rth

eh

ost

cate

gori

esar

eV

Sve

rteb

rate

-sp

ecifi

cV

Va

rth

rop

od

-vec

tore

dve

rteb

rate

Aa

rth

rop

od

spec

ific

BS

biti

ng-

arth

rop

od

(am

bigu

ou

sst

ate)

Vv

erte

brat

e(a

mbi

guo

us

stat

e)A

Pp

lan

t-sa

p-f

eed

ing-

arth

rop

od

(am

bigu

ou

sst

ate)

UH

un

cert

ain

-ho

st(a

mbi

guo

us

acro

ssal

lsta

tes)

an

dN

nem

atod

eN

ames

inbo

ldan

du

nd

erli

ned

are

vi-

ruse

sd

isco

vere

din

this

stu

dy

Th

etr

eeis

roo

ted

wit

hth

eC

hu

viru

scl

ade

(ro

ot

coll

apse

d)a

sid

enti

fied

asan

ou

tgro

up

in(L

iet

al2

015)

but

we

no

teth

isgi

ves

the

sam

ere

sult

asm

idp

oin

tan

dth

em

ole

cula

rcl

ock

roo

tin

gN

od

esla

-

bell

edw

ith

qu

esti

on

mar

ks(

)re

pre

sen

tn

od

esw

ith

aLR

T(a

pp

roxi

mat

eli

keli

ho

od

rati

ote

st)

stat

isti

cal

sup

po

rtva

lues

less

than

075

Sca

leba

rsh

ow

sn

um

ber

of

amin

o-a

cid

subs

titu

tio

ns

per

site

Bay

esia

nM

CC

tree

use

dto

infe

r

ance

stra

ltra

its

issh

ow

nin

Sup

ple

men

tary

Figu

reS4

(co

nti

nu

ed)

6 | Virus Evolution 2015 Vol 1 No 1

[Figure 2B: phylogeny panel, not reproduced here. Labelled clades: dimarhabdovirus supergroup, sigma viruses. Columns of text give the virus name, the host category used for reconstructions, and known hosts. Scale bar = 0.4. Panel A is marked as continuing into this panel.]

Figure 2. Continued.

B Longdon et al | 7

replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g., a change in host association on a terminal branch). Note there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
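As a quick check on the counts reported above, the tally below converts them into an overall accuracy. It is only an arithmetic sketch: the function name `holdout_summary` and its argument names are hypothetical, and the three input numbers are the ones stated in the paragraph (ninety-five correct with strong support, one correct with weak support, three incorrect).

```python
def holdout_summary(correct_strong, correct_weak, incorrect):
    """Summarise a hold-out test of host-state reconstruction (hypothetical helper)."""
    total = correct_strong + correct_weak + incorrect
    return {
        "total": total,
        # Fraction of held-out viruses whose true host association was recovered.
        "accuracy": (correct_strong + correct_weak) / total,
        # Fraction recovered correctly AND with strong posterior support.
        "strong_and_correct": correct_strong / total,
    }


# Counts taken from the text: 95 strong, 1 weak, 3 incorrect.
summary = holdout_summary(correct_strong=95, correct_weak=1, incorrect=3)
print(f"accuracy = {summary['accuracy']:.3f}")
```

The resulting fraction, 96/99, rounds to the 97% accuracy quoted in the Discussion.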

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
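To make the idea of an Fst permutation test concrete, here is a minimal sketch. It assumes a simplified, pooled form of the sequence-based estimator of Hudson, Slatkin, and Maddison (1992), Fst = 1 - Hw/Hb, where Hw and Hb are the mean pairwise distances within and between host groups. The function names, the toy distance matrix, and the labels are all hypothetical, and this is not necessarily the exact procedure used for the analysis reported above.

```python
import itertools
import random


def hudson_fst(dist, labels):
    """Pooled Fst = 1 - Hw/Hb from a symmetric distance matrix and group labels."""
    within, between = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    hw = sum(within) / len(within)    # mean distance within host groups
    hb = sum(between) / len(between)  # mean distance between host groups
    return 1.0 - hw / hb


def permutation_p(dist, labels, n_perm=999, seed=1):
    """One-sided P value: how often shuffled host labels give Fst >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if hudson_fst(dist, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two tight groups separated by a large distance.
toy_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]
toy_labels = ["arthropod", "arthropod", "vertebrate", "vertebrate"]
toy_fst, toy_p = permutation_p(toy_dist, toy_labels, n_perm=99)
print(toy_fst, toy_p)
```

With only four sequences the toy P value cannot be small. The point is the mechanics: host labels are shuffled while the distances stay fixed, so the null distribution reflects no association between host group and sequence similarity.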

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g., cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.

Vertebrate-specific viruses have arisen once in the lyssaviruses clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear if it is vertebrate-specific or vector-borne from our reconstructions). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant, H. lupulus, appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses are associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013) and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
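The correlation between virus distance and host distance can be illustrated with a Mantel-style permutation test. The sketch below is an assumption-laden illustration, not the authors' procedure (the reference list cites Hommola et al. 2009 for a permutation test of this kind): it assumes one virus per host, two symmetric distance matrices in the same taxon order, and hypothetical function names and toy data throughout.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise(matrix, order):
    """Upper-triangle distances after relabelling taxa according to `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_style_test(virus_d, host_d, n_perm=999, seed=1):
    """Correlate virus and host distances; permute host labels for a P value."""
    identity = list(range(len(virus_d)))
    x = pairwise(virus_d, identity)
    observed = pearson(x, pairwise(host_d, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # reassign hosts to viruses at random
        if pearson(x, pairwise(host_d, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): host distances are exactly twice the virus distances.
positions = [0, 1, 3, 7]
toy_virus_d = [[abs(a - b) for b in positions] for a in positions]
toy_host_d = [[2 * abs(a - b) for b in positions] for a in positions]
toy_r, toy_p = mantel_style_test(toy_virus_d, toy_host_d, n_perm=99)
print(toy_r, toy_p)
```

Rows and columns of the host matrix are permuted together, which keeps each matrix internally consistent and breaks only the pairing between viruses and hosts. This matters because pairwise distances are not independent observations, so a standard parametric P value for the correlation would be misleading.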

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases, we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggests that viruses preferentially shift between more closely related species (Fig. 3; Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g., symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e., recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare.

NCBI Sequence Read Archive Data: SRP057824
Results from testing ancestral trait reconstructions predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.

Supplementary data

Supplementary data is available at Virus Evolution online.

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV infected fly line; Casper Breuker and Melanie Gibbs for PAegRV samples; and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R., et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.

Ahne, W., et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P., and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.

Anisimova, M., and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.

Aznar-Lopez, C., et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G., et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.


Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.

Bourhy, H., et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.

Chiba, S., et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, Plos Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.

Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M., et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J., et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.

Drummond, A. J., et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.

Dudas, G., and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.

Edwards, C. J., et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.

Faria, N. R., et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P., et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.

Guindon, S., et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.

Haenen, O., and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K., et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.

Hommola, K., et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.

Jeyaprakash, A., and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.

Katoh, K., and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A., and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, Plos Genetics, 6: e1001191.

L’Heritier, P. H., and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.

Le, S. Q., and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.

Lemey, P., et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.

Li, C. X., et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, doi: 10.7554/eLife.05378.

Lipkin, W. I., and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D., and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.

Longdon, B., and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.

Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R., and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B., et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.

Longdon, B., et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.

Longdon, B., et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.

Longdon, B., et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.

Longdon, B., et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.

Minin, V. N., and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.

Misof, B., et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.

Okland, A. L., et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, Plos One, 9: e112517.

Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.

Parker, D. J., et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.


evolution of the host associations, we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
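To make the idea of an asymmetric transition rate matrix concrete, the following minimal sketch builds a small continuous-time Markov rate matrix in which the rate from one host association to another differs from the reverse rate, and converts it to transition probabilities over a branch. The three states, the rate values, and the function names are illustrative assumptions, not the parameters or code used in the BEAST analysis.

```python
import math

# Hypothetical host-association states and rates, chosen only for illustration.
STATES = ["arthropod-specific", "arthropod-vectored vertebrate", "vertebrate-specific"]

def build_rate_matrix(rates, n_states):
    """Build a CTMC rate matrix Q; rates[(i, j)] is the rate from state i to j."""
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in rates.items():
        q[i][j] = rate
    for i in range(n_states):
        # Each row of Q must sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(n_states) if j != i)
    return q

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_probabilities(q, t, terms=20):
    """Approximate P(t) = exp(Qt) with a Taylor series plus scaling and squaring."""
    n = len(q)
    norm = max(abs(q[i][i]) for i in range(n)) * t
    squarings = math.ceil(math.log2(norm / 0.5)) if norm > 0.5 else 0
    a = [[q[i][j] * t / 2 ** squarings for j in range(n)] for i in range(n)]
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    result, term = identity, identity
    for k in range(1, terms + 1):
        # term holds A^k / k! after this update.
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Asymmetric by construction: rate 1 -> 0 differs from rate 0 -> 1, and so on.
RATES = {(1, 0): 0.8, (0, 1): 0.1, (1, 2): 0.5, (2, 1): 0.05}
Q = build_rate_matrix(RATES, len(STATES))
P = transition_probabilities(Q, t=1.0)
```

A symmetric model would force each pair of forward and reverse rates to be equal; here P[1][0] and P[0][1] differ because the underlying rates do.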

Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
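The two convergence steps described here, discarding a 30% burn-in and checking that the effective sample size (ESS) exceeds 200, can be sketched as follows. This is a simplified illustration: the ESS estimator below truncates the autocorrelation sum at the first non-positive lag, which is an assumption made for brevity and is not necessarily the estimator implemented in Tracer.

```python
def discard_burn_in(samples, fraction=0.30):
    """Drop the first `fraction` of an MCMC trace."""
    start = int(len(samples) * fraction)
    return samples[start:]

def effective_sample_size(samples):
    """Simple ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            # Stop at the first non-positive autocorrelation (a common heuristic).
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def passes_ess_check(trace, burn_in=0.30, threshold=200):
    """True if the post-burn-in trace has ESS above the threshold."""
    return effective_sample_size(discard_burn_in(trace, burn_in)) > threshold
```

A strongly autocorrelated trace (for example, one that sits at one value for half the run and another value for the rest) yields an ESS far below its length, which is the situation the >200 criterion is meant to rule out.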

3 Results

3.1 Novel rhabdoviruses from RNA-seq

To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST, and used PCR to identify which samples these sequences came from.

This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of 12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see ‘Methods’ section for details).

To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans, and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L gene.
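As a rough illustration of ORF-based gene prediction, the sketch below scans the three forward reading frames of a sequence for stretches running from the first start codon to the next in-frame stop codon. It assumes the input is DNA written in the coding (positive) sense, and it ignores the transcription termination signals that the authors used to delimit ORFs; the function name and the length threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Return (start, end) positions of ORFs in the three forward frames.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                # First start codon since the previous stop in this frame.
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)
```

Ordering the resulting ORFs along the genome and comparing them with the canonical N-P-M-G-L arrangement is one way to spot an extra ORF between P and M, or between G and L, of the kind described above.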

Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses, or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.

Figure 1. Genome organization of newly discovered viruses. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U’s) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases, and key features were verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13-kb long and contain five core genes, 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16-kb long (Walker et al. 2011, 2015).

[Figure 1 graphic not reproduced. Genomes shown: Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus, Drosophila affinis sigmavirus. Gene legend: N, P, M, G, L, X, Accessory. Horizontal axis: sequence length (nucleotides), 0 to 15,000.]

3.2 New rhabdoviruses from public databases

We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (Fleas, Crustacea, Lepidoptera, Diptera), one from a Cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).

Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly Pararge aegeria, and the medfly Ceratitis capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and that the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be endogenous viral elements (EVEs) inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori that we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua that contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).

3.3 Rhabdovirus phylogeny

To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
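A count such as "149/188 nodes found in both reconstructions" amounts to comparing the sets of clades implied by two trees. The sketch below shows one way to do this for rooted trees written as nested tuples of tip names; the representation and function names are assumptions for illustration, and real analyses would normally read Newick files with a phylogenetics library.

```python
def clades(tree):
    """Return the set of clades (as frozensets of tip names) at internal nodes."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found

def shared_nodes(tree_a, tree_b):
    """Return (number of clades of tree_a also present in tree_b, clades in tree_a)."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a)

# Toy example with hypothetical tip names: the trees disagree on one node.
ml_tree = ((("A", "B"), "C"), ("D", "E"))
bayes_tree = ((("A", "C"), "B"), ("D", "E"))
shared, total = shared_nodes(ml_tree, bayes_tree)
```

In the toy example three of the four internal nodes of the first tree are recovered in the second; the one conflicting node is the grouping of A with B rather than with C.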

We recovered all of the major clades described previously (Fig. 2), and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.

3.4 Predicted host associations of viruses

With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases, the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insect, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis, we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see ‘Methods’ section).

This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.

Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS = vertebrate-specific, VV = arthropod-vectored vertebrate, A = arthropod-specific, BA = biting-arthropod (ambiguous state), V = vertebrate (ambiguous state), AP = plant-sap-feeding-arthropod (ambiguous state), UH = uncertain-host (ambiguous across all states), and N = nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed) as identified as an outgroup in Li et al. (2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. The Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4.

[Figure 2 graphic not reproduced. Clade labels in panel A: novirhabdoviruses, cyto- and nucleo-rhabdoviruses, lyssaviruses. Clade labels in panel B: sigma viruses, dimarhabdovirus supergroup. Branch color key (associated hosts): arthropod-vectored plant, arthropods, vertebrate-specific, arthropod-vectored vertebrate, nematode, low support or omitted. Scale bar: 0.4.]

To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five/ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases, the method would, correctly, report high levels of uncertainty in all reconstructed states.
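The masking design described above (ninety-nine viruses with known associations split at random, without replacement, into nine sets of ten and one set of nine) and the subsequent scoring can be sketched as follows. The virus names, the seed, and the function names are placeholders; the re-run of the phylogenetic analysis for each masked set is not shown, and predictions are assumed to arrive as (state, posterior support) pairs.

```python
import random

def masking_sets(viruses, set_size=10, seed=0):
    """Partition viruses into disjoint random sets of at most `set_size`."""
    rng = random.Random(seed)
    shuffled = list(viruses)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]

def score_predictions(true_states, predictions, strong=0.9):
    """Tally predictions; predictions[name] = (predicted_state, posterior_support)."""
    tally = {"correct_strong": 0, "correct_weak": 0, "incorrect": 0}
    for name, (state, support) in predictions.items():
        if state != true_states[name]:
            tally["incorrect"] += 1
        elif support > strong:
            tally["correct_strong"] += 1
        else:
            tally["correct_weak"] += 1
    return tally

viruses = [f"virus_{i:02d}" for i in range(99)]  # placeholder names
sets = masking_sets(viruses)
```

Because the sets are disjoint and together cover every virus, each known host association is masked exactly once across the ten re-runs.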

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
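The logic of a permutation test for clustering by host group can be illustrated with a deliberately simplified statistic. The sketch below does not compute Fst or GSI; as a stand-in, it uses the difference between the mean between-group and mean within-group pairwise distance, and obtains a P value by shuffling the host labels. The distance matrix, labels, and function names are hypothetical.

```python
import random
from itertools import combinations

def clustering_statistic(dist, labels):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)

def permutation_p_value(dist, labels, n_perm=999, seed=1):
    """One-sided P value: how often shuffled labels cluster as strongly as observed."""
    rng = random.Random(seed)
    observed = clustering_statistic(dist, labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if clustering_statistic(dist, shuffled) >= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the observed labelling is counted.
    return observed, (as_extreme + 1) / (n_perm + 1)

# Toy data: two host groups of four taxa, close within groups and distant between.
labels = ["arthropod"] * 4 + ["plant"] * 4
dist = [[0.0 if i == j else (1.0 if labels[i] == labels[j] else 5.0)
         for j in range(8)] for i in range(8)]
observed, p_value = permutation_p_value(dist, labels)
```

With 999 permutations, the smallest attainable P value is 0.001, which is why results of this kind are typically reported as P < 0.001 or P = 0.001.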

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
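Summaries of the form "modal estimate, median, CIs" can be derived from a posterior sample of transition counts. The sketch below, which uses made-up counts and hypothetical function names, computes the mode, the median, and a 95% highest posterior density (HPD) interval taken as the narrowest window containing 95% of the samples. It is not the code that produced the values reported above.

```python
import math
from collections import Counter
from statistics import median

def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]

def summarise_counts(samples):
    """Mode, median, and 95% HPD interval of sampled transition counts."""
    mode = Counter(samples).most_common(1)[0][0]
    return {"mode": mode, "median": median(samples), "hpd95": hpd_interval(samples)}

# Hypothetical posterior sample of the number of host switches of one type.
toy_counts = [2] * 50 + [3] * 30 + [4] * 17 + [5] * 2 + [9] * 1
summary = summarise_counts(toy_counts)
```

A right-skewed sample like this one shows why the modal estimate can sit below the median, as in the transition counts reported in the text.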

Vertebrate-specific viruses have arisen once in the lyssaviruses clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear if it is vertebrate-specific or vector-borne from our reconstructions). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time, we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally; it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
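The distance-correlation test described above can be illustrated with a small Mantel-style permutation sketch. It correlates the pairwise entries of two distance matrices and builds a null distribution by shuffling which host belongs to which virus. The function names, the one-sided P-value, and the toy distances are assumptions for illustration only; this is not the code or data behind the values reported here.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def upper_triangle(matrix, order):
    """Distances for every unordered pair, with rows/columns taken in `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate two distance matrices; permute host labels for a P-value."""
    rng = random.Random(seed)
    identity = list(range(len(virus_dist)))
    virus_pairs = upper_triangle(virus_dist, identity)
    observed = pearson(virus_pairs, upper_triangle(host_dist, identity))
    at_least_as_large = 1  # the observed arrangement counts once
    for _ in range(n_perm):
        shuffled = identity[:]
        rng.shuffle(shuffled)  # reassign hosts to viruses at random
        r = pearson(virus_pairs, upper_triangle(host_dist, shuffled))
        if r >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (n_perm + 1)


# Toy data (hypothetical): five viruses and their hosts placed on a line,
# so the two distance matrices are related but not identical.
virus_pos = [0.0, 1.0, 2.0, 6.0, 7.0]
host_pos = [0.0, 0.5, 3.0, 5.0, 9.0]
virus_dist = [[abs(a - b) for b in virus_pos] for a in virus_pos]
host_dist = [[abs(a - b) for b in host_pos] for a in host_pos]

r, p = mantel_test(virus_dist, host_dist)
print(f"correlation = {r:.2f}, one-sided P = {p:.3f}")
```

Permuting rows and columns together keeps each host distance matrix internally consistent, which is why the labels are shuffled rather than the individual pairwise distances.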

8 | Virus Evolution, 2015, Vol. 1, No. 1

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test: P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
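A minimal sketch of a differentiation test of this kind is given below. It computes an Fst-like statistic from a pairwise distance matrix (one minus the ratio of mean within-group to mean between-group distance, in the spirit of Hudson, Slatkin, and Maddison 1992) and shuffles the terrestrial/aquatic labels to obtain a P-value. The function names, the pooling of all within-group pairs, and the toy distances are assumptions made for illustration; they are not the analysis behind the values reported here.

```python
import random
from itertools import combinations


def fst_from_distances(dist, labels):
    """Fst-like statistic: 1 - (mean within-group / mean between-group distance)."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    h_within = sum(within) / len(within)
    h_between = sum(between) / len(between)
    return 1 - h_within / h_between


def fst_permutation_test(dist, labels, n_perm=999, seed=1):
    """Shuffle group labels to ask how often Fst is at least as large by chance."""
    rng = random.Random(seed)
    observed = fst_from_distances(dist, labels)
    at_least_as_large = 1  # the observed labelling counts once
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst_from_distances(dist, shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (n_perm + 1)


# Toy data (hypothetical): four terrestrial and three aquatic viruses
# placed on a line so that the two groups are clearly separated.
positions = [0, 1, 2, 3, 10, 11, 12]
labels = ["terrestrial"] * 4 + ["aquatic"] * 3
dist = [[abs(a - b) for b in positions] for a in positions]

fst, p = fst_permutation_test(dist, labels)
print(f"Fst = {fst:.2f}, P = {p:.3f}")
```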

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009), and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

B. Longdon et al. | 9

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
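The logic of this hold-out check can be sketched as follows. Viruses with known host associations are split at random into non-overlapping sets, each set is masked in turn, and the predictions are scored against the known states. The sketch mirrors the partition described in the Results (99 viruses giving nine sets of ten and one set of nine), but everything else is an assumption for illustration: `clade_majority` is a deliberately crude stand-in for the Bayesian reconstruction, and the virus names and states are invented toy data.

```python
import random
from collections import Counter


def make_holdout_sets(names, set_size, seed=1):
    """Shuffle names and split them into non-overlapping hold-out sets."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size]
            for i in range(0, len(shuffled), set_size)]


def holdout_accuracy(known_states, predict, set_size=10, seed=1):
    """Mask each hold-out set in turn and score predictions against the truth.

    `predict(masked_states, name)` stands in for the full reconstruction;
    it only sees the states of viruses outside the current hold-out set.
    """
    correct = 0
    for held_out in make_holdout_sets(sorted(known_states), set_size, seed):
        masked = {k: v for k, v in known_states.items() if k not in held_out}
        for name in held_out:
            if predict(masked, name) == known_states[name]:
                correct += 1
    return correct / len(known_states)


def clade_majority(masked_states, name):
    """Toy predictor: the most common known state in the virus's clade."""
    clade = name.split("_")[0]
    votes = Counter(state for other, state in masked_states.items()
                    if other.split("_")[0] == clade)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three clades, each dominated by one host state,
# plus one virus whose state is irregular for its clade.
clade_state = ["arthropod-specific", "arthropod-vectored vertebrate",
               "vertebrate-specific"]
known = {f"clade{i % 3}_virus{i}": clade_state[i % 3] for i in range(99)}
known["clade0_virus0"] = "vertebrate-specific"  # recent switch on a tip

sets = make_holdout_sets(sorted(known), 10)
accuracy = holdout_accuracy(known, clade_majority)
print(sorted(len(s) for s in sets))
print(f"accuracy = {accuracy:.1%}")
```

In this toy example the only misclassified virus is the one with an irregular state for its clade, which is the same kind of failure described above for recent host changes on terminal branches.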

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare. NCBI Sequence Read Archive Data: SRP057824.

Results from testing ancestral trait reconstructions predictions: http://dx.doi.org/10.6084/m9.figshare.1538584

L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ's College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.

Supplementary data

Supplementary data are available at Virus Evolution online.

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV-infected fly line; Casper Breuker and Melanie Gibbs for PAegRV samples; and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R., et al. (2015) 'Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host', Nucleic Acids Research, 43: 6191–206.

Ahne, W., et al. (2002) 'Spring Viremia of Carp (SVC)', Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P., and Katzourakis, A. (2015) 'Endogenous Viruses: Connecting Recent and Ancient Viral Evolution', Virology, 479–480C: 26–37.

Anisimova, M., and Gascuel, O. (2006) 'Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative', Systematic Biology, 55: 539–52.

Aznar-Lopez, C., et al. (2013) 'Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats', Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) 'Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila', Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G., et al. (2013) 'Estimating and Interpreting FST: The Impact of Rare Variants', Genome Research, 23: 1514–21.


Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) 'Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus', Journal of Fish Biology, 7: 269–76.

Bourhy, H., et al. (2005) 'Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene', Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) 'trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses', Bioinformatics, 25: 1972–3.

Chiba, S., et al. (2011) 'Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes', PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) 'A Genealogical Approach to Quantifying Lineage Divergence', Evolution, 62: 2411–22.

Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M., et al. (1987) 'Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus', Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J., et al. (2006) 'Relaxed Phylogenetics and Dating with Confidence', PLoS Biology, 4: e88.

Drummond, A. J., et al. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.

Dudas, G., and Obbard, D. J. (2015) 'Are Arthropods at the Heart of Virus Evolution?', eLife, 4: e06837.

Edwards, C. J., et al. (2011) 'Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline', Current Biology, 21: 1251–8.

Faria, N. R., et al. (2013) 'Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints', Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P., et al. (2011) 'Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality', Molecular Biology and Evolution, 29: 381–90.

Guindon, S., et al. (2010) 'New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0', Systematic Biology, 59: 307–21.

Haenen, O., and Davidse, A. (1993) 'Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout', Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K., et al. (2015) 'Estimating the Global Burden of Endemic Canine Rabies', PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) 'Plant and Animal Rhabdovirus Host Range: A Bug's View', Trends in Microbiology, 11: 264–71.

Hommola, K., et al. (2009) 'A Permutation Test of Host-Parasite Cospeciation', Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) 'Estimation of Levels of Gene Flow from DNA Sequence Data', Genetics, 132: 583–9.

Jeyaprakash, A., and Hoy, M. A. (2009) 'First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny', Experimental and Applied Acarology, 47: 1–18.

Katoh, K., and Standley, D. M. (2013) 'MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability', Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A., and Gifford, R. J. (2010) 'Endogenous Viral Elements in Animal Genomes', PLoS Genetics, 6: e1001191.

L'Heritier, P. H., and Teissier, G. (1937) 'Une Anomalie Physiologique Hereditaire Chez la Drosophile', Comptes Rendus de l'Academie des Sciences Paris, 231: 192–4.

Le, S. Q., and Gascuel, O. (2008) 'An Improved General Amino Acid Replacement Matrix', Molecular Biology and Evolution, 25: 1307–20.

Lemey, P., et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.

Li, C. X., et al. (2015) 'Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses', eLife, 10.7554/eLife.05378.

Lipkin, W. I., and Anthony, S. J. (2015) 'Virus Hunting', Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D., and Bonning, B. C. (2011) 'Next Generation Sequencing Technologies for Insect Virus Discovery', Viruses, 3: 1849–69.

Longdon, B., and Jiggins, F. M. (2012) 'Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?', Proceedings of the Royal Society B, 279: 3889–98.

Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) 'Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny', Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) 'The Sigma Viruses of Drosophila', in Dietzgen, R., and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B., et al. (2011a) 'Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts', PLoS Pathogens, 7: e1002260.

Longdon, B., et al. (2011b) 'Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila', Biology Letters, 7: 747–50.

Longdon, B., et al. (2011c) 'Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep', Genetics, 188: 141–50.

Longdon, B., et al. (2014) 'The Evolution and Genetics of Virus Host Shifts', PLoS Pathogens, 10: e1004395.

Longdon, B., et al. (2015) 'The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts', PLoS Pathogens, 11: e1004728.

Minin, V. N., and Suchard, M. A. (2008) 'Counting Labeled Transitions in Continuous-Time Markov Models of Evolution', Journal of Mathematical Biology, 56: 391–412.

Misof, B., et al. (2014) 'Phylogenomics Resolves the Timing and Pattern of Insect Evolution', Science, 346: 763–7.

Okland, A. L., et al. (2014) 'Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)', PLoS One, 9: e112517.

Omland, K. E. (1999) 'The Assumptions and Challenges of Ancestral State Reconstructions', Systematic Biology, 48: 604–11.

Parker, D. J., et al. (2015) 'How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?', Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) 'Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)', Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) 'FigTree' (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) 'Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses', Science, 207: 989–91.

Shroyer, D. A., and Rosen, L. (1983) 'Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus', Genetics, 104: 649–59.

Streicker, D. G., et al. (2010) 'Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats', Science, 329: 676–9.

Talavera, G., and Castresana, J. (2007) 'Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments', Systematic Biology, 56: 564–77.

van Mierlo, J. T., et al. (2014) 'Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi', PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) 'Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large, Complex Genomes', in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J., et al. (2011) 'Rhabdovirus Accessory Genes', Virus Research, 162: 110–25.

Walker, P. J., et al. (2015) 'Evolution of Genome Size and Complexity in the Rhabdoviridae', PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M., and Snyder, M. (2009) 'RNA-Seq: A Revolutionary Tool for Transcriptomics', Nature Reviews Genetics, 10: 57–63.

Webster, C. L., et al. (2015) 'The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster', PLoS Biology, 13: e1002210.

Weinert, L. A., et al. (2012) 'Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication', Biology Letters, 8: 829–32.


databases, and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13-kb long and contain five core genes, 3'-N-P-M-G-L-5' (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16-kb long (Walker et al. 2011, 2015).

3.2 New rhabdoviruses from public databases

We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (Fleas, Crustacea, Lepidoptera, Diptera), one from a Cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1,000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).

Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria, and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and that the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori that we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua that contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).

3.3 Rhabdovirus phylogeny

To reconstruct the evolution of the Rhabdoviridae we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA Dependent RNA Polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
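A count of shared nodes of the kind quoted above can be sketched by representing each tree as nested tuples of tip names, collecting every clade as a set of tips, and intersecting the two collections. The representation, the function names, the treatment of both trees as rooted, and the five-taxon toy trees are assumptions for illustration; this is not how the published comparison was necessarily carried out.

```python
def clades(tree):
    """Return every non-root clade of a nested-tuple tree as a frozenset of tips."""
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip is just its name
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    root = tips(tree)
    found.discard(root)  # the root clade is shared trivially
    return found


def shared_nodes(tree_a, tree_b):
    """Number of clades of tree_a also present in tree_b, and tree_a's total."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a)


# Toy trees (hypothetical) that disagree only on the placement of B and C.
ml_tree = ((("A", "B"), "C"), ("D", "E"))
bayes_tree = ((("A", "C"), "B"), ("D", "E"))

shared, total = shared_nodes(ml_tree, bayes_tree)
print(f"{shared}/{total} nodes found in both trees")  # prints "2/3 nodes found in both trees"
```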

We recovered all of the major clades described previously (Fig. 2), and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.

3.4 Predicted host associations of viruses

With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insect, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see 'Methods' section).

This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses, and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses, and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
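The per-group counts quoted above can be tallied as a quick consistency check. The short sketch below sums the viruses found in each ambiguous group and the viruses given a strongly supported prediction. The dictionary layout is an assumption for illustration, as is the reading that viruses not listed with a prediction in each group are those without strong posterior support.

```python
# Counts quoted in the text: group -> (viruses found, {predicted state: count}).
groups = {
    "biting arthropods": (52, {"arthropod-vectored vertebrate": 45,
                               "arthropod-specific": 6}),
    "vertebrates": (30, {"arthropod-vectored vertebrate": 22,
                         "vertebrate-specific": 2}),
    "plant-sap-feeding arthropods": (7, {"plant-associated": 3,
                                         "arthropod-associated": 2}),
}

total_ambiguous = sum(found for found, _ in groups.values())
total_assigned = sum(sum(pred.values()) for _, pred in groups.values())

for name, (found, pred) in groups.items():
    # Assumption: viruses with no listed prediction lacked strong support.
    unresolved = found - sum(pred.values())
    print(f"{name}: {found} found, {unresolved} without strong support")

print(f"ambiguous viruses: {total_ambiguous}")   # 52 + 30 + 7 = 89
print(f"strongly supported: {total_assigned}")   # 51 + 24 + 5 = 80
```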

To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without


[Figure 2, panel A: phylogeny image not reproduced. Clade labels: novirhabdoviruses; cyto- and nucleo-rhabdoviruses; lyssaviruses. Key to associated hosts: arthropod-vectored plant; arthropods; vertebrate specific; arthropod-vectored vertebrate; nematode; low support or omitted. Scale bar: 0.4.]

Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod specific; BS, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed) as identified as an outgroup in (Li et al. 2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks (?) represent nodes with aLRT (approxi

mat

eli

keli

ho

od

rati

ote

st)

stat

isti

cal

sup

po

rtva

lues

less

than

075

Sca

leba

rsh

ow

sn

um

ber

of

amin

o-a

cid

subs

titu

tio

ns

per

site

Bay

esia

nM

CC

tree

use

dto

infe

r

ance

stra

ltra

its

issh

ow

nin

Sup

ple

men

tary

Figu

reS4

(co

nti

nu

ed)

6 | Virus Evolution 2015 Vol 1 No 1

Figure 2. Continued. Panel B (tree image not reproduced): the dimarhabdovirus supergroup, including the sigma viruses, joined to the tree in Fig. 2A. Each tip is labelled with the virus name, the host category code, and the known hosts. Scale bar: 0.4.

B Longdon et al | 7

replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
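The tally in this paragraph can be checked with a little arithmetic. The sketch below is only an illustration: the function name `summarise_holdout` and its argument names are our own labels, and the counts are the ones quoted in the text (95 correct with strong support, 1 correct with weak support, 3 incorrect).

```python
def summarise_holdout(strong_correct, weak_correct, incorrect):
    """Summarise a hold-out test of host-state reconstruction.

    All names are illustrative; only the counts come from the text.
    """
    total = strong_correct + weak_correct + incorrect
    correct = strong_correct + weak_correct
    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total,                 # any support level
        "strong_fraction": strong_correct / total,   # support > 0.9 only
    }


summary = summarise_holdout(strong_correct=95, weak_correct=1, incorrect=3)
print(f"{summary['correct']}/{summary['total']} correct "
      f"({summary['accuracy']:.1%})")
```

Counting both strongly and weakly supported reconstructions as correct gives 96 of 99, which is consistent with the figure of 97% quoted in the Discussion.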

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
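To make the idea of an Fst permutation test concrete, here is a minimal sketch. It assumes the estimator Fst = 1 − Hw/Hb (the general form in Hudson, Slatkin, and Maddison 1992), pools all within-group pairs for simplicity, and uses a made-up distance matrix; the function names `hudson_fst` and `permutation_p` are hypothetical, and the exact settings used in the study may differ.

```python
import random
from itertools import combinations


def hudson_fst(dist, labels):
    """Fst = 1 - Hw/Hb from a pairwise distance matrix.

    Simplification (our assumption): within-group pairs are pooled
    across groups rather than averaged group by group.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    h_w = sum(within) / len(within)
    h_b = sum(between) / len(between)
    return 1.0 - h_w / h_b


def permutation_p(dist, labels, n_perm=999, seed=1):
    """One-sided P value: how often shuffled labels give Fst >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if hudson_fst(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented): two host groups of four sequences each,
# distance 0.1 within a group and 1.0 between groups.
toy_labels = ["arthropod"] * 4 + ["vertebrate"] * 4
toy_dist = [[0.0 if i == j else (0.1 if toy_labels[i] == toy_labels[j] else 1.0)
             for j in range(8)] for i in range(8)]

fst, p_value = permutation_p(toy_dist, toy_labels)
print(f"Fst = {fst:.2f}, permutation P = {p_value:.3f}")
```

Shuffling the host labels while keeping the distance matrix fixed gives the null distribution; a small P value indicates more differentiation between host groups than expected by chance.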

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny, or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
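The modal estimate, median, and interval reported for each type of host-switch are summaries of a posterior sample of transition counts. The sketch below shows one simple way to compute such summaries. The draws are invented for illustration, the function names are hypothetical, and we assume an equal-tailed 95% interval, which may not match the interval type used in the study.

```python
from collections import Counter
from statistics import median


def quantile(ordered, q):
    """Quantile by linear interpolation between neighbouring ranks."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarise_transitions(draws, mass=0.95):
    """Mode (of rounded counts), median, and equal-tailed interval."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    rounded = Counter(round(d) for d in ordered)
    return {
        "mode": rounded.most_common(1)[0][0],
        "median": median(ordered),
        "interval": (quantile(ordered, tail), quantile(ordered, 1.0 - tail)),
    }


# Invented posterior draws of the number of host-switches of one type.
toy_draws = [1, 2, 2, 2, 3, 3, 4, 5]
summary = summarise_transitions(toy_draws)
print(summary)
```

A right-skewed sample like this one has a mode below its median, the same qualitative pattern as the first estimate in the text (modal estimate = 2, median = 3.1).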

Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions if it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
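A correlation between two distance matrices cannot be tested with the usual parametric P value, because pairwise distances are not independent. A permutation test avoids this. The sketch below is a generic Mantel-style test on invented data, not necessarily the exact procedure used in the study; the function names and the toy distances are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Off-diagonal distances after relabelling rows/columns by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate two distance matrices; permute host labels for the null."""
    rng = random.Random(seed)
    order = list(range(len(virus_dist)))
    x = upper_triangle(virus_dist, order)
    observed = pearson(x, upper_triangle(host_dist, order))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(host_dist, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented): host distances are a linear function of virus distances.
positions = [0, 1, 2, 4, 7]
toy_virus = [[abs(a - b) for b in positions] for a in positions]
toy_host = [[0 if a == b else 2 * abs(a - b) + 1 for b in positions]
            for a in positions]

r, p_value = mantel_test(toy_virus, toy_host)
print(f"r = {r:.2f}, permutation P = {p_value:.3f}")
```

Rows and columns of the host matrix are permuted together, which keeps each host's set of distances intact while breaking the pairing between virus and host.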

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences, from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009), and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling: for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be accurately inferred. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare.

NCBI Sequence Read Archive data: SRP057824

Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584

L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067

TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069

Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068

Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083

Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082

BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431

BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922

Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.

Supplementary data

Supplementary data are available at Virus Evolution online.

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV infected fly line; Casper Breuker and Melanie Gibbs for PAegRV samples; and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R. et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.

Ahne, W. et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P. and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.

Anisimova, M. and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.

Aznar-Lopez, C. et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G. et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.

Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.

Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.

Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.

Dietzgen, R. G. and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.

Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.

Dudas, G. and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.

Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.

Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.

Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.

Haenen, O. and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.

Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.

Jeyaprakash, A. and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.

Katoh, K. and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A. and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.

LrsquoHeritier P H and Teissier G (1937) lsquoUne AnomaliePhysiologique Hereditaire Chez la Drosophilersquo Comptes Rendusde lrsquoAcademie des Sciences Paris 231 192ndash4

Le S Q and Gascuel O (2008) lsquoAn Improved General AminoAcid Replacement Matrixrsquo Molecular Biology and Evolution 251307ndash20

Lemey P et al (2014) lsquoUnifying Viral Genetics and HumanTransportation Data to Predict the Global TransmissionDynamics of Human Influenza H3N2rsquo PLoS Pathogens 10e1003932

Li C X et al (2015) lsquoUnprecedented Genomic Diversity of RNAViruses in Arthropods Reveals the Ancestry of Negative-SenseRNA Virusesrsquo eLife 107554eLife05378

Lipkin W I and Anthony S J (2015) lsquoVirus Huntingrsquo Virology479ndash480C 194ndash9

Liu S Vijayendran D and Bonning B C (2011) lsquoNextGeneration Sequencing Technologies for Insect VirusDiscoveryrsquo Viruses 3 1849ndash69

Longdon B and Jiggins F M (2012) lsquoVertically TransmittedViral Endosymbionts of Insects Do Sigma Viruses WalkAlonersquo Proceedings of the Biological Sciences 279 3889ndash98

mdashmdash and Walker P J (2011) ICTV sigmavirus species and genusproposal lthttpictvonlineorgproposals2011007a-dVAv2Sigmaviruspdfgt

mdashmdash Obbard D J and Jiggins F M (2010) lsquoSigma Viruses fromThree Species of Drosophila form a Major New Clade in theRhabdovirus Phylogenyrsquo Proceedings of the Royal Society B 27735ndash44

mdashmdash Wilfert L and Jiggins F M (2012) lsquoThe Sigma Viruses ofDrosophilarsquo in Dietzgen R and Kuzmin I (eds) RhabdovirusesMolecular Taxonomy Evolution Genomics Ecology Cytopathologyand Control pp 117-132 Norfolk UK Caister Academic Press

mdashmdash et al (2011a) lsquoHost Phylogeny Determines Viral Persistenceand Replication in Novel Hostsrsquo PLoS Pathogens 7 e1002260

mdashmdash et al (2011b) lsquoHost Switching by a Vertically-TransmittedRhabdovirus in Drosophilarsquo Biology Letters 7 747ndash50

mdashmdash et al (2011c) lsquoRhabdoviruses in Two Species ofDrosophila Vertical Transmission and a Recent SweeprsquoGenetics 188 141ndash50

mdashmdash et al (2014) lsquoThe Evolution and Genetics of Virus HostShiftsrsquo PLoS Pathogens 10 e1004395

mdashmdash et al (2015) lsquoThe Causes and Consequences of Changes inVirulence following Pathogen Host Shiftsrsquo PLoS Pathogens 11e1004728

Minin V N and Suchard M A (2008) lsquoCounting LabeledTransitions in Continuous-Time Markov Models of EvolutionrsquoJournal of Mathematical Biology 56 391ndash412

Misof B et al (2014) lsquoPhylogenomics Resolves the Timing andPattern of Insect Evolutionrsquo Science 346 763ndash7

Okland A L et al (2014) lsquoGenomic Characterization andPhylogenetic Position of Two New Species in RhabdoviridaeInfecting the Parasitic Copepod Salmon Louse (Lepeophtheirussalmonis)rsquo Plos One 9 e112517

Omland K E (1999) lsquoThe Assumptions and Challenges ofAncestral State Reconstructionsrsquo Systems Biology 48 604ndash11

Parker D J et al (2015) lsquoHow Consistent are the TranscriptomeChanges Associated with Cold Acclimation in Two Speciesof the Drosophila Virilis Grouprsquo Heredity (Edinburgh) 11513ndash21

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G., et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T., et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large, Complex Genomes’, in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J., et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J., et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L., et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A., et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.

12 | Virus Evolution 2015 Vol 1 No 1

[Figure 2, panel A: phylogenetic tree image. Clade labels: novirhabdoviruses; cyto- and nucleo-rhabdoviruses; lyssaviruses. Legend (associated hosts): arthropod-vectored plant; arthropods; vertebrate specific; arthropod-vectored vertebrate; nematode; low support or omitted.]

Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod specific; BA, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed) as identified as an outgroup in (Li et al. 2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks (?) represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4. (continued)


[Figure 2, panel B: phylogenetic tree image of the dimarhabdovirus supergroup, including the sigma viruses, with columns giving virus name, host category, and known hosts.]

Figure 2. Continued.


replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found that there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
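The bookkeeping behind this kind of validation can be sketched in a few lines of Python. This is only an illustrative tally, not the analysis pipeline used in the study: the function name `summarise_validation`, the record layout, and the individual support values are all hypothetical, chosen so that the counts match those reported above (ninety-five correct with strong support, one correct with weak support, three incorrect).

```python
from collections import Counter


def summarise_validation(results, strong=0.9):
    """Tally masked-host reconstructions.

    results: iterable of (true_state, predicted_state, posterior_support).
    A reconstruction is 'strong' when its support exceeds the threshold.
    """
    tally = Counter()
    for true_state, predicted_state, support in results:
        if predicted_state != true_state:
            tally["incorrect"] += 1
        elif support > strong:
            tally["correct_strong"] += 1
        else:
            tally["correct_weak"] += 1
    total = sum(tally.values())
    correct = tally["correct_strong"] + tally["correct_weak"]
    accuracy = correct / total if total else float("nan")
    return dict(tally), accuracy


# Hypothetical records shaped like the counts reported in the text
# (the host codes and support values are placeholders, not real data).
records = (
    [("VV", "VV", 0.99)] * 95   # correct, strong support
    + [("A", "A", 0.80)]        # correct, weak support
    + [("VS", "VV", 0.95)] * 3  # incorrect reconstructions
)
tally, accuracy = summarise_validation(records)
print(tally, round(accuracy, 2))
```

With these placeholder records, ninety-six of ninety-nine reconstructions are correct, which is the proportion summarised as 97% in the Discussion.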

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001; Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
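To make the logic of a permutation test for genetic differentiation concrete, the following Python sketch computes a simple distance-based Fst and its permutation P-value. It rests on stated assumptions: the estimator is a Hudson-style ratio (one minus mean within-group distance over mean between-group distance), which may differ from the estimator actually used in the study, and the function names and toy data are hypothetical.

```python
import random
from itertools import combinations


def fst_from_distances(dist, labels):
    """Hudson-style Fst: 1 - (mean within-group / mean between-group distance)."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return 1.0 - (sum(within) / len(within)) / (sum(between) / len(between))


def fst_permutation_test(dist, labels, n_perm=999, seed=1):
    """Shuffle host labels to ask how often chance gives Fst >= observed."""
    rng = random.Random(seed)
    observed = fst_from_distances(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst_from_distances(dist, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the P-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: twelve sequences placed on a line, two host groups.
positions = [0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25]
labels = ["arthropod"] * 6 + ["vertebrate"] * 6
dist = [[abs(a - b) for b in positions] for a in positions]
fst, p_value = fst_permutation_test(dist, labels)
print(round(fst, 3), p_value)
```

Shuffling the labels keeps the distances fixed and breaks only the link between sequence and host group, which is what makes the resulting null distribution appropriate for asking whether host groups are more differentiated than expected by chance.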

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.

Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
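A permutation-based correlation between two distance matrices can be illustrated with a Mantel-style test. The sketch below is an assumption about the general form of such a test rather than a description of the exact procedure used here; `pearson`, `mantel_test`, and the toy matrices are hypothetical names and data.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mantel_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate two distance matrices; permute taxa of one to get a P-value."""
    n = len(virus_dist)
    pairs = list(combinations(range(n), 2))
    x = [virus_dist[i][j] for i, j in pairs]
    observed = pearson(x, [host_dist[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        # Relabel rows and columns together so the matrix stays a valid
        # distance matrix under each permutation.
        rng.shuffle(order)
        y = [host_dist[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: host distances are exactly twice the virus distances.
positions = [0, 1, 3, 7, 15, 31, 63, 127]
virus_dist = [[abs(a - b) for b in positions] for a in positions]
host_dist = [[2 * d for d in row] for row in virus_dist]
r, p_value = mantel_test(virus_dist, host_dist)
print(round(r, 3), p_value)
```

Permuting whole taxa, rather than individual matrix entries, respects the non-independence of pairwise distances that share a taxon, which is why an ordinary significance test for a correlation coefficient would not be appropriate for this kind of data.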

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007; Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare. NCBI Sequence Read Archive data: SRP057824

Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584

L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.

Supplementary data

Supplementary data are available at Virus Evolution online.

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R., et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.

Ahne, W., et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P., and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.

Anisimova, M., and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.

Aznar-Lopez, C., et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G., et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.


Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.

Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.

Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.

Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.

Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.

Dudas, G., and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.

Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.

Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.

Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.

Haenen, O., and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.

Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.

Jeyaprakash, A., and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.

Katoh, K., and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A., and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.

L’Heritier, P. H., and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.

Le, S. Q., and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.

Lemey, P. et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.

Li, C. X. et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.

Lipkin, W. I., and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D., and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.

Longdon, B., and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.

Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>

Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.

Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.

Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.

Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.

Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.

Minin, V. N., and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.

Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.

Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.

Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.

Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>

Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large, Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.


Figure 2 Continued. Panel B: the dimarhabdovirus supergroup, including the sigma viruses. [Tree image not reproduced. Tip labels give each virus name, the host or hosts it was identified from, and a host-association code (V, VV, VS, BA, A or P).]

replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and for one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases, the method would (correctly) report high levels of uncertainty in all reconstructed states.
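The validation counts in this paragraph can be tallied as a quick arithmetic check. The snippet below is only an illustrative sketch: the variable names are hypothetical, and the counts are simply those stated in the text (ninety-five correct with strong support, one correct with weak support, three incorrect).

```python
# Counts as stated in the text (variable names are illustrative only).
strong_support_correct = 95   # correct host association, posterior support > 0.9
weak_support_correct = 1      # correct host association, weaker support
incorrect = 3                 # reconstruction returned a false host association

total_tests = strong_support_correct + weak_support_correct + incorrect
accuracy = (strong_support_correct + weak_support_correct) / total_tests

print(f"{total_tests} test reconstructions, {accuracy:.0%} correct")
```

On these counts, 96 of 99 reconstructions are correct, which rounds to the 97% accuracy quoted in the Discussion.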

We checked for evidence of sampling bias in our data by testing whether sample size predicts the rate to or from a host association (Lemey et al. 2014). We found that there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
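To illustrate the kind of test described here, the following minimal sketch computes a simplified Hudson-style Fst (one minus the ratio of mean within-group to mean between-group pairwise differences) and assesses it by shuffling host labels. It is not the code used in this study, and the estimator may differ from the one the authors applied. The sequences, labels and function names are hypothetical toy values chosen only to make the example run.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def simple_fst(seqs, labels):
    """Simplified Fst = 1 - Hw / Hb, pooling all within-group pairs."""
    within, between = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    hw = sum(within) / len(within)
    hb = sum(between) / len(between)
    return 1 - hw / hb


def permutation_test(seqs, labels, n_perm=999, seed=1):
    """Shuffle host labels to build a null distribution for Fst."""
    rng = random.Random(seed)
    observed = simple_fst(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if simple_fst(seqs, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy alignment: three "arthropod" and three "vertebrate" sequences.
toy_seqs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT",
            "GGGGGGGG", "GGGGGGGC", "GGGGGGCC"]
toy_labels = ["arthropod"] * 3 + ["vertebrate"] * 3

observed_fst, p_value = permutation_test(toy_seqs, toy_labels)
print(f"Fst = {observed_fst:.2f}, permutation P = {p_value:.3f}")
```

With only six toy sequences the test has very little power, since few distinct labelings exist. Real analyses would use the full alignment and many more permutations.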

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of the 188 internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or of some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology and the root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to being a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.

Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in the fish dimarhabdoviruses (for one of the fish-infecting clades, it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once, in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders and leafhoppers. The mode of transmission and biology of these viruses are yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time, we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015), although contamination could also explain this result. Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as in other insect orders. Overall, our results suggest that sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group), it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests that rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
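A correlation between two sets of pairwise distances is commonly assessed with a Mantel-style permutation test, because the pairs are not independent. The sketch below shows that general technique under stated assumptions. The exact permutation scheme used in this study is not described in this section, and the distance matrices, positions and function names here are hypothetical toy values.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(virus_d, host_d, n_perm=999, seed=1):
    """Permute taxa in one matrix to test the virus-host distance correlation."""
    rng = random.Random(seed)
    identity = list(range(len(virus_d)))
    virus_pairs = upper_triangle(virus_d, identity)
    observed = pearson(virus_pairs, upper_triangle(host_d, identity))
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel hosts, keeping the matrix structure intact
        if pearson(virus_pairs, upper_triangle(host_d, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def distance_matrix(positions):
    """Toy distances: absolute differences between positions on a line."""
    return [[abs(a - b) for b in positions] for a in positions]


# Hypothetical toy data for five virus-host pairs.
virus_d = distance_matrix([0.0, 1.5, 2.0, 5.0, 7.0])
host_d = distance_matrix([0.0, 1.0, 2.0, 5.0, 6.0])

observed_r, p_value = mantel_test(virus_d, host_d)
print(f"r = {observed_r:.2f}, permutation P = {p_value:.3f}")
```

Permuting whole taxa, rather than individual pairwise distances, preserves the dependence among entries that share a virus or host, which is why this design is usually preferred for distance matrices.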

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test, P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences, we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases, we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare. NCBI Sequence Read Archive Data: SRP057824.

Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584

L gene sequences fasta httpdxdoiorg106084m9figshare1425067TrimAl alignment fasta httpdxdoiorg106084m9figshare1425069Gblocks alignment fasta httpdxdoiorg106084m9figshare1425068Phylogenetic tree Gblocks alignment httpdxdoiorg106084m9figshare1425083Phylogenetic tree TrimAl alignment httpdxdoiorg106084m9figshare1425082BEAST alignment fasta httpdxdoiorg106084m9figshare1425431BEAUti xml file httpdxdoiorg106084m9figshare1431922Bayesian analysis tree httpdxdoiorg106084m9figshare1425436

Funding

BL and FMJ are supported by a NERC grant (NEL0042321)a European Research Council grant (281668 DrosophilaInfec-tion) a Junior Research Fellowship from Christrsquos CollegeCambridge (BL) GGRM is supported by an MRC student-ship The metagenomic sequencing of viruses fromDimmigrans Dtristis and Sdeflexa was supported by a Well-come Trust fellowship (WT085064) to DJO

Supplementary data

Supplementary data is available at Virus Evolution online

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV in-fected fly line Casper Breuker and Melanie Gibbs for PAegRVsamples and Philip Leftwich for CCapSV samples Thanksto everyone who provided fly collections Thanks to two re-viewers and Oliver Pybus for useful comments

Conflict of interest None declared

ReferencesAguiar E R et al (2015) lsquoSequence-Independent Characteriza-

tion of Viruses Based on the Pattern of Viral Small RNAs Pro-duced by the Hostrsquo Nucleic Acids Research 43 6191ndash206

Ahne W et al (2002) lsquoSpring Viremia of Carp (SVC)rsquo Diseases ofAquatic Organisms 52 261ndash72

Aiewsakun P and Katzourakis A (2015) lsquoEndogenous VirusesConnecting Recent and Ancient Viral Evolutionrsquo Virology 479-480C 26ndash37

Anisimova M and Gascuel O (2006) lsquoApproximate Likelihood-Ratio Test for Branches A Fast Accurate and PowerfulAlternativersquo Systems Biology 55 539ndash52

Aznar-Lopez C et al (2013) lsquoDetection of Rhabdovirus Viral RNAin Oropharyngeal Swabs and Ectoparasites of Spanish BatsrsquoJournal of General Virology 94 69ndash75

Ballinger M J Bruenn J A and Taylor D J (2012) lsquoPhylogenyIntegration and Expression of Sigma Virus-like Genes inDrosophilarsquo Molecular Phylogenetics Evolution 65 251ndash8

Bazinet A Myers D and Khatavkar P (2009) genealogicalSort-ing v091

Bhatia G et al (2013) lsquoEstimating and Interpreting FST TheImpact of Rare Variantsrsquo Genome Research 23 1514ndash21

10 | Virus Evolution 2015 Vol 1 No 1

Bootsma, R., Dekinkelin, P. and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.

Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.

Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C. and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.

Dietzgen, R. G. and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.

Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.

Dudas, G. and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.

Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.

Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.

Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.

Haenen, O. and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G. and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.

Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.

Jeyaprakash, A. and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.

Katoh, K. and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A. and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.

L’Heritier, P. H. and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.

Le, S. Q. and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.

Lemey, P. et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.

Li, C. X. et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.

Lipkin, W. I. and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D. and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.

Longdon, B. and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Royal Society B: Biological Sciences, 279: 3889–98.

Longdon, B. and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J. and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L. and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.

Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.

Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.

Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.

Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.

Minin, V. N. and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.

Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.

Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.

Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.

Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A. and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A. and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G. and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R. and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M. and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.


replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of ninety-nine viruses with strong posterior support (> 0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases, the method would, correctly, report high levels of uncertainty in all reconstructed states.
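The tally reported in this paragraph can be reproduced with a short script. The following is a minimal sketch in Python, not the authors' code: the `Reconstruction` record, the `summarise` function, the state names, and the posterior values are hypothetical placeholders, and only the counts (ninety-five strongly supported, one weakly supported, and three incorrect reconstructions) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Reconstruction:
    """One held-out virus: known host state, inferred state, posterior support."""
    true_state: str
    predicted_state: str
    posterior: float


def summarise(results, strong_threshold=0.9):
    """Count correct (strong or weak support) and incorrect reconstructions."""
    correct = [r for r in results if r.predicted_state == r.true_state]
    strong = sum(1 for r in correct if r.posterior > strong_threshold)
    return {
        "n": len(results),
        "correct_strong": strong,
        "correct_weak": len(correct) - strong,
        "incorrect": len(results) - len(correct),
        "accuracy": len(correct) / len(results),
    }


# Placeholder records that reproduce only the counts given in the text;
# the state names and posterior values are illustrative, not real data.
results = (
    [Reconstruction("vector-borne", "vector-borne", 0.99)] * 95
    + [Reconstruction("vector-borne", "vector-borne", 0.73)]
    + [Reconstruction("arthropod-specific", "vector-borne", 0.95)] * 3
)
summary = summarise(results)
print(summary["correct_strong"], summary["correct_weak"], summary["incorrect"])
print(round(summary["accuracy"], 2))
```

With these counts, 96 of the 99 held-out host associations are recovered, which corresponds to an accuracy of about 97%.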

We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.

3.5 Ancestral host associations and host-switches

Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
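A permutation test of this kind can be sketched as follows. This is an illustrative Python example, not the authors' implementation: it assumes a Hudson-style estimator, Fst = 1 - Hw/Hb, in which Hw and Hb are the mean pairwise distances within and between host groups (pooled over all pairs for simplicity), and it assesses significance by shuffling host labels across sequences. The function names and the toy distance matrix are hypothetical.

```python
import itertools
import random


def hudson_fst(dist, labels):
    """Fst = 1 - Hw/Hb from a pairwise distance matrix and group labels."""
    within, between = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        target = within if labels[i] == labels[j] else between
        target.append(dist[i][j])
    h_within = sum(within) / len(within)     # mean distance within groups
    h_between = sum(between) / len(between)  # mean distance between groups
    return 1.0 - h_within / h_between


def fst_permutation_test(dist, labels, n_perm=999, seed=1):
    """Return the observed Fst and a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = hudson_fst(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sequence and host group
        if hudson_fst(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (an assumption for illustration): ten sequences on a line,
# forming two well-separated groups of five.
positions = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
labels = ["arthropod"] * 5 + ["vertebrate"] * 5
dist = [[abs(a - b) for b in positions] for a in positions]
fst, p_value = fst_permutation_test(dist, labels)
print(fst, p_value)
```

Shuffling labels rather than sequences keeps the group sizes fixed, so the null distribution reflects only the loss of association between host group and genetic distance.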

Our Bayesian analysis allowed us to infer the ancestral host association of 176 of the 188 internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.

Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.

Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.

There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time, we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.

Within the arthropod-associated viruses (the most sampled host group), it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
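The correlation between virus and host distances, and a permutation-based P value for it, can be illustrated with a small Python sketch. It is not the authors' code: it assumes that each virus is assigned a single host and that significance is assessed by shuffling host assignments across viruses (pairwise distances are not independent, so a standard test of the correlation coefficient would be inappropriate). The function names and the toy distances are hypothetical.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def virus_host_correlation(virus_dist, host_dist, host_of):
    """Correlate virus-virus distance with host-host distance over all virus pairs."""
    pairs = list(itertools.combinations(range(len(host_of)), 2))
    v = [virus_dist[i][j] for i, j in pairs]
    h = [host_dist[host_of[i]][host_of[j]] for i, j in pairs]
    return pearson(v, h)


def permutation_test(virus_dist, host_dist, host_of, n_perm=999, seed=1):
    """Return the observed correlation and a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = virus_host_correlation(virus_dist, host_dist, host_of)
    shuffled = list(host_of)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign hosts to viruses at random
        if virus_host_correlation(virus_dist, host_dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (an assumption for illustration): six viruses and six hosts whose
# pairwise distances match exactly, giving a perfect positive correlation.
n = 6
virus_dist = [[abs(i - j) for j in range(n)] for i in range(n)]
host_dist = [[abs(a - b) for b in range(n)] for a in range(n)]
host_of = list(range(n))  # virus i was sampled from host i
r, p_value = permutation_test(virus_dist, host_dist, host_of)
print(r, p_value)
```

A positive correlation with a small permutation P value indicates that closely related viruses tend to occur in closely related hosts; on its own it does not distinguish co-divergence from preferential host shifts, which is why the incongruence between the two phylogenies matters for the interpretation given above.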

We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences, we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases, we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are, of course, limited by sampling: for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups such as cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare. NCBI Sequence Read Archive Data: SRP057824

Results from testing ancestral trait reconstructions predictions: http://dx.doi.org/10.6084/m9.figshare.1538584

L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436


Walker P J Blasdell K R and Joubert D A (2012)lsquoEphemeroviruses Athropod-borne Rhabdoviruses ofRuminants with Large Complex Genomesrsquo in Dietzgen R Gand Kuzmin I V (eds) Rhabdoviruses Molecular TaxonomyEvolution Genomics Ecology Host-Vector InteractionsCytopathology and Control pp 59ndash88 Norfolk UK CaisterAcademic Press

mdashmdash et al (2011) lsquoRhabdovirus Accessory Genesrsquo Virus Research162 110ndash25

mdashmdash et al (2015) lsquoEvolution of Genome Size and Complexity inthe Rhabdoviridaersquo PLoS Pathogens 11 e1004664

Wang Z Gerstein M and Snyder M (2009) lsquoRNA-Seq ARevolutionary Tool for Transcriptomicsrsquo Nature ReviewGenetics 10 57ndash63

Webster C L et al (2015) lsquoThe Discovery Distribution andEvolution of Viruses Associated with Drosophila melanogasterPLoS Biology 13 e1002210

Weinert L A et al (2012) lsquoMolecular Dating of Human-to-BovidHost Jumps by Staphylococcus aureus Reveals an Associationwith the Spread of Domesticationrsquo Biology Letters 8 829ndash32

Virus Evolution, 2015, Vol. 1, No. 1

the clades of fish and cetacean viruses and the clade of viruses from sea-lice.

4 Discussion

Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.

In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.

We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).

There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).

Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
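The permutation test summarised in the Figure 3 caption can be illustrated with a small sketch. The Python below is a generic Mantel-style permutation of host labels, and is not necessarily the procedure the authors ran. The function names `pearson` and `permutation_correlation`, the default of 999 permutations, and the square distance-matrix inputs are all assumptions made for illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_correlation(virus_dist, host_dist, n_perm=999, seed=0):
    """Correlate two square distance matrices and test the result by permutation.

    virus_dist[i][j] and host_dist[i][j] are hypothetical pairwise evolutionary
    distances for the same ordered set of virus-host pairs.
    """
    n = len(virus_dist)
    pairs = list(combinations(range(n), 2))  # each unordered pair once
    virus = [virus_dist[i][j] for i, j in pairs]
    observed = pearson(virus, [host_dist[i][j] for i, j in pairs])

    rng = random.Random(seed)
    labels = list(range(n))
    at_least_as_large = 0
    for _ in range(n_perm):
        # Shuffle which host is attached to which virus, keeping both
        # distance matrices internally consistent.
        rng.shuffle(labels)
        host = [host_dist[labels[i]][labels[j]] for i, j in pairs]
        if pearson(virus, host) >= observed:
            at_least_as_large += 1

    # One-sided p-value; the observed arrangement counts as one permutation.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value
```

Permuting whole rows and columns together, rather than individual matrix cells, keeps the non-independence of pairwise distances intact, which is the reason a permutation test is preferred here over a standard correlation test.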

Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
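The logic of the validation step, hiding a known host and checking whether it is recovered, can be sketched in a few lines of Python. This is only an illustration: the authors used Bayesian reconstruction in BEAST, whereas the stand-in predictor here simply copies the host of the nearest remaining tip. The function name `leave_one_out_accuracy`, the distance-matrix input, and the toy labels are all hypothetical.

```python
def leave_one_out_accuracy(tip_dist, hosts):
    """Hide each tip's host in turn and predict it from the closest other tip.

    tip_dist[i][j] is a hypothetical tree (patristic) distance between tips
    i and j; hosts[i] is the known host category of tip i. Nearest-neighbour
    prediction is a deliberately simple stand-in for a full ancestral state
    reconstruction.
    """
    n = len(hosts)
    correct = 0
    for i in range(n):
        # Closest tip other than the one whose host label is hidden.
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: tip_dist[i][j])
        if hosts[nearest] == hosts[i]:
            correct += 1
    return correct / n


# Toy example: two tight clades, plus one tip whose host differs from that
# of its close relatives (a recent host change on a terminal branch).
positions = [0, 1, 2, 10, 11, 12]
tip_dist = [[abs(a - b) for b in positions] for a in positions]
hosts = ["arthropod", "arthropod", "arthropod",
         "vertebrate", "vertebrate", "arthropod"]
accuracy = leave_one_out_accuracy(tip_dist, hosts)
```

In the toy data the only tip that is mispredicted is the last one, whose host differs from that of its nearest relatives. This mirrors the failure case described above, where recent host changes on terminal branches cannot be recovered from phylogenetic position alone.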

Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced with an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.

Data availability

Data are available through Figshare and the NCBI Sequence Read Archive (SRP057824).

Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436

Funding

B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.

Supplementary data

Supplementary data are available at Virus Evolution online.

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.

Conflict of interest: None declared.

References

Aguiar, E. R., et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.

Ahne, W., et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.

Aiewsakun, P., and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.

Anisimova, M., and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.

Aznar-Lopez, C., et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.

Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.

Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.

Bhatia, G., et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.

Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.

Bourhy, H., et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.

Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.

Chiba, S., et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.

Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.

Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.

Dorson, M., et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.

Drummond, A. J., et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.

Drummond, A. J., et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.

Dudas, G., and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.

Edwards, C. J., et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.

Faria, N. R., et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.

Fort, P., et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.

Guindon, S., et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.

Haenen, O., and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.

Hampson, K., et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.

Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.

Hommola, K., et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.

Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.

Jeyaprakash, A., and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.

Katoh, K., and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.

Katzourakis, A., and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.

L’Heritier, P. H., and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.

Le, S. Q., and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.

Lemey, P., et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.

Li, C. X., et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.

Lipkin, W. I., and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D., and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.

Longdon, B., and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.

Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R., and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B., et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.

Longdon, B., et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.

Longdon, B., et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.

Longdon, B., et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.

Longdon, B., et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.

Minin, V. N., and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.

Misof, B., et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.

Okland, A. L., et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.

Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.

Parker, D. J., et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G., et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T., et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J., et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J., et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L., et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A., et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.

12 | Virus Evolution 2015 Vol 1 No 1

reduce the likelihood of sampling biases associated with PCRwhere people look for similar viruses in related hostsTherefore the pattern of viruses forming clades based on thehost taxa they infect is likely to be robust However sampling isbiased towards arthropods and it is possible that there may bea great undiscovered diversity of rhabdoviruses in other organ-isms (Dudas and Obbard 2015)

Our conclusions are likely to be robust to biases in the dataor limitations in the analysis By reconstructing host associa-tions using the Bayesian methods in the BEAST software(Drummond et al 2012) we have avoided most of the simplify-ing assumptions of earlier methods (eg symmetric transitionrate matrices lack of uncertainty associated with estimates)Nonetheless all such methods depend on there being some ofsort of lsquoprocess homogeneityrsquo over the phylogeny (Omland1999) Such analyses are of course limited by sampling for ex-ample if a past host is now extinct it will never be recon-structed as an ancestral state Nevertheless previous studieshave shown that the method is relatively robust to uneven sam-pling across hosts (Edwards et al 2011) Furthermore when wehave viruses from under-sampled groups like cnidarians funginematodes they fall outside the main clades of viruses that weare analysing The limitations of the approach are evident inour results we were unable to reconstruct the host associationsof the root or most basal nodes of the phylogeny The recon-structions were however very successful within clades thatwere strongly associated with a single host or clades where theless common hosts tend to form distinct subclades As a resultof this high level of phylogenetic structure our approach wasable to reconstruct the current host associations of many vi-ruses for which we had incomplete knowledge of their hostrange To check that this approach is reliable we repeated theanalysis on datasets where we deleted the information aboutwhich hosts well-characterized viruses infect Our analysis wasfound to be robust with 97 of reconstructions being accurateThe method only failed for strains with irregular host associa-tions for their location in the phylogeny (ie recent changes inhost on terminal branches)mdasha limitation that would be ex-pected for such an analysis

Rhabdoviruses infect a diverse range of host species includ-ing a large number of arthropods Our search has unearthed alarge number of novel rhabdovirus genomes suggesting that weare only just beginning to uncover the diversity of these virusesThe host associations of these viruses have been highly con-served across their evolutionary history which provides a pow-erful tool to identify previously unknown arboviruses The largenumber of viruses being discovered through metagenomic stud-ies (Aguiar et al 2015 Webster et al 2015 Li et al 2015) meansthat in the future we will be faced by an increasingly large num-ber of viral sequences with little knowledge of the biology of thevirus Our phylogenetic approach could be extended to predictkey biological traits in other groups of pathogens where ourknowledge is incomplete However there are limitations to thismethod and the rapid evolution of RNA viruses may mean thatsome traits change too quickly to accurately infer traitsTherefore such an approach should complement and not re-place examining the basic biology of novel viruses

Data availability

Data are available through Figshare NCBI Sequence ReadArchive Data SRP057824

Results from testing ancestral trait reconstructions predic-tions httpdxdoiorg106084m9figshare1538584

L gene sequences fasta httpdxdoiorg106084m9figshare1425067TrimAl alignment fasta httpdxdoiorg106084m9figshare1425069Gblocks alignment fasta httpdxdoiorg106084m9figshare1425068Phylogenetic tree Gblocks alignment httpdxdoiorg106084m9figshare1425083Phylogenetic tree TrimAl alignment httpdxdoiorg106084m9figshare1425082BEAST alignment fasta httpdxdoiorg106084m9figshare1425431BEAUti xml file httpdxdoiorg106084m9figshare1431922Bayesian analysis tree httpdxdoiorg106084m9figshare1425436

Funding

BL and FMJ are supported by a NERC grant (NEL0042321)a European Research Council grant (281668 DrosophilaInfec-tion) a Junior Research Fellowship from Christrsquos CollegeCambridge (BL) GGRM is supported by an MRC student-ship The metagenomic sequencing of viruses fromDimmigrans Dtristis and Sdeflexa was supported by a Well-come Trust fellowship (WT085064) to DJO

Supplementary data

Supplementary data is available at Virus Evolution online

Acknowledgements

Many thanks to Mike Ritchie for providing the DMonSV in-fected fly line Casper Breuker and Melanie Gibbs for PAegRVsamples and Philip Leftwich for CCapSV samples Thanksto everyone who provided fly collections Thanks to two re-viewers and Oliver Pybus for useful comments

Conflict of interest None declared

ReferencesAguiar E R et al (2015) lsquoSequence-Independent Characteriza-

tion of Viruses Based on the Pattern of Viral Small RNAs Pro-duced by the Hostrsquo Nucleic Acids Research 43 6191ndash206

Ahne W et al (2002) lsquoSpring Viremia of Carp (SVC)rsquo Diseases ofAquatic Organisms 52 261ndash72

Aiewsakun P and Katzourakis A (2015) lsquoEndogenous VirusesConnecting Recent and Ancient Viral Evolutionrsquo Virology 479-480C 26ndash37

Anisimova M and Gascuel O (2006) lsquoApproximate Likelihood-Ratio Test for Branches A Fast Accurate and PowerfulAlternativersquo Systems Biology 55 539ndash52

Aznar-Lopez C et al (2013) lsquoDetection of Rhabdovirus Viral RNAin Oropharyngeal Swabs and Ectoparasites of Spanish BatsrsquoJournal of General Virology 94 69ndash75

Ballinger M J Bruenn J A and Taylor D J (2012) lsquoPhylogenyIntegration and Expression of Sigma Virus-like Genes inDrosophilarsquo Molecular Phylogenetics Evolution 65 251ndash8

Bazinet A Myers D and Khatavkar P (2009) genealogicalSort-ing v091

Bhatia G et al (2013) lsquoEstimating and Interpreting FST TheImpact of Rare Variantsrsquo Genome Research 23 1514ndash21

10 | Virus Evolution 2015 Vol 1 No 1

Bootsma R Dekinkelin P and Leberre M (1975) lsquoTransmissionExperiments with Pike Fry (Esox-Lucius L) RhabdovirusrsquoJournal of Fish Biology 7 269ndash76

Bourhy H et al (2005) lsquoPhylogenetic Relationships AmongRhabdoviruses Inferred Using the L Polymerase Genersquo Journalof General Virology 86 2849ndash58

Capella-Gutierrez S Silla-Martinez J M and Gabaldon T(2009) lsquotrimAl A Tool for Automated Alignment Trimming inLarge-Scale Phylogenetic Analysesrsquo Bioinformatics 25 1972ndash3

Chiba S et al (2011) lsquoWidespread Endogenization of GenomeSequences of Non-Retroviral RNA Viruses into PlantGenomesrsquo Plos Pathogens 7 e1002146

Cummings M P Neel M C and Shaw K L (2008) lsquoAGenealogical Approach to Quantifying Lineage DivergencersquoEvolution 62 2411ndash22

Dietzgen R G and Kuzmin I V (2012) Rhabdoviruses MolecularTaxonomy Evolution Genomics Ecology Host-Vector InteractionsCytopathology and Control Norfolk UK Caister Academic Press

Dorson M et al (1987) lsquoSusceptibility of Pike (Esox-Lucius) toDifferent Salmonid Viruses (Ipn Vhs Ihn) and to the PerchRhabdovirusrsquo Bulletin Francais De La Peche Et De La Pisciculture307 91ndash101

Drummond A J et al (2006) lsquoRelaxed Phylogenetics and Datingwith Confidencersquo PLoS Biology 4 e88

mdashmdash et al (2012) lsquoBayesian Phylogenetics with BEAUti and theBEAST 17rsquo Molecular Biology and Evolution 29 1969ndash73

Dudas G and Obbard D J (2015) lsquoAre Arthropods at the Heart ofVirus Evolutionrsquo eLife 4 e06837

Edwards C J et al (2011) lsquoAncient Hybridization and An IrishOrigin for the Modern Polar Bear Matrilinersquo Current Biology 211251ndash8

Faria N R et al (2013) lsquoSimultaneously Reconstructing ViralCross-Species Transmission History and Identifying theUnderlying Constraintsrsquo Philosophical Transactions of the RoyalSociety of London B Biological Sciences 368 20120196

Fort P et al (2011) lsquoFossil Rhabdoviral Sequences Integrated intoArthropod Genomes Ontogeny Evolution and PotentialFunctionalityrsquo Molecular Biology and Evolution 29 381ndash90

Guindon S et al (2010) lsquoNew Algorithms and Methods toEstimate Maximum-Likelihood Phylogenies Assessing thePerformance of PhyML 30rsquo Systems Biology 59 307ndash21

Haenen O and Davidse A (1993) lsquoComparativePathogenicity of Two Strains of Pike Fry Rhabdovirus andSpring Viremia of Carp Virus for Young Roach CommonCarp Grass Carp and Rainbow Troutrsquo Diseases of AquaticOrganisms 15 87ndash92

Hampson K et al (2015) lsquoEstimating the Global Burden ofEndemic Canine Rabiesrsquo PLoS Neglected Tropical Diseases 9e0003709

Hogenhout S A Redinbaugh M G and Ammar E D (2003)lsquoPlant and Animal Rhabdovirus Host Range A Bugrsquos ViewrsquoTrends in Microbiology 11 264ndash71

Hommola K et al (2009) lsquoA Permutation Test of Host-ParasiteCospeciationrsquo Molecuar Biology and Evolution 26 1457ndash68

Hudson R R Slatkin M and Maddison W P (1992) lsquoEstimationof Levels of Gene Flow from DNA Sequence Datarsquo Genetics 132583ndash9

Jeyaprakash A and Hoy M A (2009) lsquoFirst Divergence TimeEstimate of Spiders Scorpions Mites and Ticks (SubphylumChelicerata) Inferred from Mitochondrial PhylogenyrsquoExperimental and Applied Acarology 47 1ndash18

Katoh K and Standley D M (2013) lsquoMAFFT Multiple SequenceAlignment Software Version 7 Improvements in Performanceand Usabilityrsquo Molecular Biology and Evolution 30 772ndash80

Katzourakis A and Gifford R J (2010) lsquoEndogenous ViralElements in Animal Genomesrsquo Plos Genetics 6 e1001191

LrsquoHeritier P H and Teissier G (1937) lsquoUne AnomaliePhysiologique Hereditaire Chez la Drosophilersquo Comptes Rendusde lrsquoAcademie des Sciences Paris 231 192ndash4

Le S Q and Gascuel O (2008) lsquoAn Improved General AminoAcid Replacement Matrixrsquo Molecular Biology and Evolution 251307ndash20

Lemey P et al (2014) lsquoUnifying Viral Genetics and HumanTransportation Data to Predict the Global TransmissionDynamics of Human Influenza H3N2rsquo PLoS Pathogens 10e1003932

Li C X et al (2015) lsquoUnprecedented Genomic Diversity of RNAViruses in Arthropods Reveals the Ancestry of Negative-SenseRNA Virusesrsquo eLife 107554eLife05378

Lipkin, W. I. and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.

Liu, S., Vijayendran, D. and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.

Longdon, B. and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Royal Society B: Biological Sciences, 279: 3889–98.

Longdon, B. and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.

Longdon, B., Obbard, D. J. and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.

Longdon, B., Wilfert, L. and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.

Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.

Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.

Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.

Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.

Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.

Minin, V. N. and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.

Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.

Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.

Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.

Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila virilis Group?’, Heredity (Edinburgh), 115: 13–21.

Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.

Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.

Rambaut, A. and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.

Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.

Shroyer, D. A. and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.

Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.

Talavera, G. and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.

van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.

Walker, P. J., Blasdell, K. R. and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants with Large Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.

Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.

Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.

Wang, Z., Gerstein, M. and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.

Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.

Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.

12 | Virus Evolution 2015 Vol 1 No 1

Bootsma R Dekinkelin P and Leberre M (1975) lsquoTransmissionExperiments with Pike Fry (Esox-Lucius L) RhabdovirusrsquoJournal of Fish Biology 7 269ndash76

Bourhy H et al (2005) lsquoPhylogenetic Relationships AmongRhabdoviruses Inferred Using the L Polymerase Genersquo Journalof General Virology 86 2849ndash58

Capella-Gutierrez S Silla-Martinez J M and Gabaldon T(2009) lsquotrimAl A Tool for Automated Alignment Trimming inLarge-Scale Phylogenetic Analysesrsquo Bioinformatics 25 1972ndash3

Chiba S et al (2011) lsquoWidespread Endogenization of GenomeSequences of Non-Retroviral RNA Viruses into PlantGenomesrsquo Plos Pathogens 7 e1002146

Cummings M P Neel M C and Shaw K L (2008) lsquoAGenealogical Approach to Quantifying Lineage DivergencersquoEvolution 62 2411ndash22

Dietzgen R G and Kuzmin I V (2012) Rhabdoviruses MolecularTaxonomy Evolution Genomics Ecology Host-Vector InteractionsCytopathology and Control Norfolk UK Caister Academic Press

Dorson M et al (1987) lsquoSusceptibility of Pike (Esox-Lucius) toDifferent Salmonid Viruses (Ipn Vhs Ihn) and to the PerchRhabdovirusrsquo Bulletin Francais De La Peche Et De La Pisciculture307 91ndash101

Drummond A J et al (2006) lsquoRelaxed Phylogenetics and Datingwith Confidencersquo PLoS Biology 4 e88

mdashmdash et al (2012) lsquoBayesian Phylogenetics with BEAUti and theBEAST 17rsquo Molecular Biology and Evolution 29 1969ndash73

Dudas G and Obbard D J (2015) lsquoAre Arthropods at the Heart ofVirus Evolutionrsquo eLife 4 e06837

Edwards C J et al (2011) lsquoAncient Hybridization and An IrishOrigin for the Modern Polar Bear Matrilinersquo Current Biology 211251ndash8

Faria N R et al (2013) lsquoSimultaneously Reconstructing ViralCross-Species Transmission History and Identifying theUnderlying Constraintsrsquo Philosophical Transactions of the RoyalSociety of London B Biological Sciences 368 20120196

Fort P et al (2011) lsquoFossil Rhabdoviral Sequences Integrated intoArthropod Genomes Ontogeny Evolution and PotentialFunctionalityrsquo Molecular Biology and Evolution 29 381ndash90
