The evolution, diversity, and host associations of rhabdoviruses
Ben Longdon1, Gemma G. R. Murray1, William J. Palmer1, Jonathan P. Day1, Darren J. Parker2,3, John J. Welch1, Darren J. Obbard4, and Francis M. Jiggins1
1Department of Genetics, University of Cambridge, Cambridge CB2 3EH, 2School of Biology, University of St Andrews, St Andrews KY19 9ST, UK, 3Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland, and 4Institute of Evolutionary Biology and Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh EH9 3JT, UK
Corresponding author: E-mail: b.longdon@gen.cam.ac.uk
Abstract
Metagenomic studies are leading to the discovery of a hidden diversity of RNA viruses. These new viruses are poorly characterized, and new approaches are needed to predict the host species these viruses pose a risk to. The rhabdoviruses are a diverse family of RNA viruses that includes important pathogens of humans, animals, and plants. We have discovered thirty-two new rhabdoviruses through a combination of our own RNA sequencing of insects and searching public sequence databases. Combining these with previously known sequences, we reconstructed the phylogeny of 195 rhabdovirus sequences and produced the most in-depth analysis of the family to date. In most cases we know nothing about the biology of the viruses beyond the host they were identified from, but our dataset provides a powerful phylogenetic approach to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. By reconstructing ancestral and present host states, we found that switches between major groups of hosts have occurred rarely during rhabdovirus evolution. This allowed us to propose seventy-six new likely vector-borne vertebrate viruses among viruses identified from vertebrates or biting insects. Based on currently available data, our analysis suggests it is likely there was a single origin of the known plant viruses and arthropod-borne vertebrate viruses, while vertebrate- and arthropod-specific viruses arose at least twice. There are also few transitions between aquatic and terrestrial ecosystems. Viruses also cluster together at a finer scale, with closely related viruses tending to be found in closely related hosts. Our data therefore suggest that, throughout their evolution, rhabdoviruses have occasionally jumped between distantly related host species before spreading through related hosts in the same environment. This approach offers a way to predict the most probable biology and key traits of newly discovered viruses.
Key words: virus; host shift; arthropod; insect; Rhabdoviridae; Mononegavirales
© The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Virus Evolution, 2015, 1(1): vev014
doi: 10.1093/ve/vev014
Research article
1. Introduction
RNA viruses are an abundant and diverse group of pathogens. In the past, viruses were typically isolated from hosts displaying symptoms of infection, before being characterized morphologically and then sequenced following PCR (Liu, Vijayendran, and Bonning 2011; Lipkin and Anthony 2015). PCR-based detection of novel RNA viruses is problematic, as there is no single conserved region of the genome shared by all viruses from a single family, let alone across all RNA viruses. High-throughput next-generation sequencing technology has revolutionized virus discovery, allowing rapid detection and sequencing of divergent virus sequences simply by sequencing total RNA from infected individuals (Liu, Vijayendran, and Bonning 2011; Lipkin and Anthony 2015).
One particularly diverse family of RNA viruses is the Rhabdoviridae. Rhabdoviruses are negative-sense single-stranded RNA viruses in the order Mononegavirales (Dietzgen and Kuzmin 2012). They infect an extremely broad range of hosts and have been discovered in plants, fish, mammals, reptiles, and a broad range of insects and other arthropods (Bourhy et al. 2005). The family includes important pathogens of humans and livestock. Perhaps the most well known is rabies virus, which can infect a diverse array of mammals and causes a fatal infection, killing 59,000 people per year with an estimated economic cost of $8.6 billion (US) (Hampson et al. 2015). Other rhabdoviruses, such as vesicular stomatitis virus and bovine ephemeral fever virus, are important pathogens of domesticated animals, while others are pathogens of crops (Dietzgen and Kuzmin 2012).
Arthropods play a key role in the transmission of many rhabdoviruses. Many viruses found in vertebrates have also been detected in arthropods, including sandflies, mosquitoes, ticks, and midges (Walker, Blasdell, and Joubert 2012). The rhabdoviruses that infect plants are also often transmitted by arthropods (Hogenhout, Redinbaugh, and Ammar 2003), and some that infect fish can potentially be vectored by ectoparasitic copepod sea-lice (Pfeilputzien 1978; Ahne et al. 2002). Moreover, insects are biological vectors; rhabdoviruses replicate upon infection of insect vectors (Hogenhout, Redinbaugh, and Ammar 2003). Other rhabdoviruses are insect-specific. In particular, the sigma viruses are a clade of vertically transmitted viruses that infect dipterans and are well studied in Drosophila (Longdon et al. 2011a,b; Longdon and Jiggins 2012). Recently, a number of rhabdoviruses have been found to be associated with a wide array of insect and other arthropod species, suggesting they may be common arthropod viruses (Li et al. 2015; Walker et al. 2015). Furthermore, a number of arthropod genomes contain integrated endogenous viral elements (EVEs) with similarity to rhabdoviruses, suggesting that these species have been infected with rhabdoviruses at some point in their history (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012; Aiewsakun and Katzourakis 2015).
Here, we explore the diversity of the rhabdoviruses and examine how they have switched between different host taxa during their evolutionary history. Insects infected with rhabdoviruses commonly become paralysed on exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012). We exploited this fact to screen field collections of flies from several continents for novel rhabdoviruses, which were then sequenced using metagenomic RNA-sequencing (RNA-seq). Additionally, we searched for rhabdovirus-like sequences in publicly available RNA-seq data. We identified thirty-two novel rhabdovirus-like sequences from a wide array of invertebrates and plants, and combined them with recently discovered viruses to produce the most comprehensive phylogeny of the rhabdoviruses to date. For many of the viruses we do not know their true host range, so we used the phylogeny to identify a large number of new likely vector-borne viruses and to reconstruct the evolutionary history of this diverse group of viruses.
2. Methods
2.1 Discovery of new rhabdoviruses by RNA sequencing
Diptera (flies, mostly Drosophilidae) were collected in the field from Spain, USA, Kenya, France, Ghana, and the UK (Supplementary Data S1). Infection with rhabdoviruses can cause Drosophila and other insects to become paralysed after exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012), so we enriched our sample for infected individuals by exposing them to CO2 at 12°C for 15 min, only retaining individuals that showed symptoms of paralysis 30 min later. We extracted RNA from seventy-nine individual insects (details in Supplementary Data S1) using Trizol reagent (Invitrogen) and combined the extracts into two pools (retaining non-pooled individual RNA samples). RNA was then rRNA depleted with the Ribo-Zero Gold kit (Epicentre, USA) and used to construct Truseq total RNA libraries (Illumina). Libraries were constructed and sequenced by BGI (Hong Kong) on an Illumina Hi-Seq 2500 (one lane, 100-bp paired-end reads, generating 175 million reads). Sequences were quality trimmed with Trimmomatic (v3): Illumina adapters were clipped, bases were removed from the beginning and end of reads if quality dropped below a threshold, sequences were trimmed if the average quality within a window fell below a threshold, and reads <20 bp in length were removed. We de novo assembled the RNA-seq reads with Trinity (release 25 February 2013) using default settings and the jaccard clip option for high gene density. The assembly was then searched using tblastn to identify rhabdovirus-like sequences, with known rhabdovirus coding sequences as the query. Any contigs with high sequence similarity to rhabdoviruses were then reciprocally compared to GenBank cDNA and RefSeq nucleotide databases using tblastn and only retained if they most closely matched a virus-like sequence. Raw read data were deposited in the NCBI Sequence Read Archive (SRP057824). Putative viral sequences have been submitted to GenBank (accession numbers in Supplementary Tables S1 and S2).
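The read-filtering steps described above can be illustrated with a minimal Python sketch. It is not Trimmomatic itself: the function name trim_read and the quality thresholds (end_q, window, window_q) are hypothetical values chosen for illustration, since the thresholds used are not stated here. Only the 20-bp minimum length is taken from the text, and adapter clipping is omitted.

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], end_q: int = 3, window: int = 4,
              window_q: float = 15.0,
              min_len: int = 20) -> Optional[Tuple[str, List[int]]]:
    """Quality-trim one read; return None if it ends up shorter than min_len.

    end_q, window and window_q are illustrative assumptions, not the
    settings used in the study.
    """
    # 1. Remove low-quality bases from the beginning and end of the read.
    start, end = 0, len(seq)
    while start < end and quals[start] < end_q:
        start += 1
    while end > start and quals[end - 1] < end_q:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # 2. Slide a window along the read and cut at the first window whose
    #    mean quality falls below the threshold.
    for i in range(max(len(seq) - window + 1, 0)):
        if sum(quals[i:i + window]) / window < window_q:
            seq, quals = seq[:i], quals[:i]
            break

    # 3. Discard reads that are now too short.
    if len(seq) < min_len:
        return None
    return seq, quals
```

A read that fails the length filter is returned as None, so a caller can drop it (and, for paired-end data, decide what to do with its mate).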
As the RNA-seq was performed on pooled samples, we assigned rhabdovirus sequences to individual insects by PCR on RNA from individual samples. cDNA was produced using Promega GoScript Reverse Transcriptase and random-hexamer primers, and PCR was performed using primers designed from the rhabdovirus sequences. Infected host species were identified by sequencing the mitochondrial gene COI. We were unable to identify the host species of the virus from a Drosophila affinis sub-group species (sequences appear similar to both D. affinis and the closely related Drosophila athabasca), despite the addition of further mitochondrial and nuclear sequences to increase confidence. In all cases, we confirmed by PCR that viruses were only present in cDNA and not in non-reverse-transcription (RT) controls (i.e., DNA), and so they cannot be integrated into the insect genome (i.e., endogenous virus elements or EVEs [Katzourakis and Gifford 2010]). COI primers were used as a positive control for the presence of DNA in the non-RT template.
We identified sigma virus sequences in RNA-seq data from Drosophila montana (Parker et al. 2015). We used RT-PCR on an infected fly line to amplify the virus sequence and carried out additional Sanger sequencing with primers designed using the RNA-seq assembly. Additional virus sequences were identified from an RNA-seq analysis of pools of wild-caught Drosophila: DImmSV from Drosophila immigrans (collection and sequencing described in [van Mierlo et al. 2014]), DTriSV from a pool of Drosophila tristis, and SDefSV from Scaptodrosophila deflexa (both Darren Obbard, unpublished data). GenBank accession numbers for new virus sequences are KR822817, KR822816, KR822823, KR822813, KR822820, KR822821, KR822822, KR822815, KR822824, KR822812, KR822811, KR822814, and KR822818. A full list of accessions can be found in Supplementary Tables S1 and S2.
2.2 Discovery of rhabdoviruses in public sequence databases
Rhabdovirus L gene sequences were used as queries to search (tblastn) expressed sequence tag and transcriptome shotgun assembly databases (NCBI). All sequences were reciprocally BLAST searched against GenBank cDNA and RefSeq databases and only retained if they matched a virus-like sequence. We used two approaches to examine whether sequences were present as RNA but not DNA. First, where assemblies of whole-genome shotgun sequences were available, we used BLAST to test whether sequences were integrated into the host genome. Second, for the virus sequences in the butterfly Pararge aegeria and the medfly Ceratitis capitata, we were able to obtain infected samples to confirm whether sequences are only present in RNA by performing PCR on both genomic DNA and cDNA as described above (samples kindly provided by Casper Breuker/Melanie Gibbs and Philip Leftwich, respectively).
2.3 Phylogenetic analysis
All available rhabdovirus-like sequences were downloaded from GenBank (accessions in Supplementary Data S2). Amino acid sequences for the L gene (encoding the RNA-dependent RNA polymerase, or RDRP) were used to infer the phylogeny (L gene sequences), as they contain conserved domains that can be aligned across this diverse group of viruses. Sequences were aligned with MAFFT (Katoh and Standley 2013) under default settings, and then poorly aligned and divergent sites were removed with either TrimAl (v1.3, strict settings, implemented on Phylemon v2.0 server, alignment) (Capella-Gutierrez, Silla-Martinez, and Gabaldon 2009) or Gblocks (v0.91b, selecting smaller final blocks, allowing gap positions, and less strict flanking positions to produce a less stringent selection, alignment) (Talavera and Castresana 2007). These resulted in alignments of 1,492 and 829 amino acids, respectively.
Phylogenetic trees were inferred using Maximum Likelihood in PhyML (v3.0) (Guindon et al. 2010) using the LG substitution model (Le and Gascuel 2008) (preliminary analysis confirmed the results were robust to the amino acid substitution model selected), with a gamma distribution of rate variation with four categories and a sub-tree pruning and regrafting topology searching algorithm. Branch support was estimated using Approximate Likelihood-Ratio Tests (aLRT), which are reported to outperform bootstrap methods (Anisimova and Gascuel 2006). Figures were created using FIGTREE (v1.4) (Rambaut 2011).
2.4 Analysis of phylogenetic structure between viruses taken from different hosts and ecologies
We measured the degree of phylogenetic structure between virus sequences identified in different categories of host (arthropods, vertebrates, and plants) and ecosystems (terrestrial and aquatic). Following Bhatia et al. (2013), we measured the degree of genetic structure between virus sequences from different groups of hosts/ecosystems using Hudson's Fst estimator (Hudson, Slatkin, and Maddison 1992). We calculated Fst as $F_{ST} = 1 - H_w / H_b$, where $H_w$ is the mean number of differences between sequences within populations, $H_b$ is the mean number of differences between sequences from different populations, and a population is a host category or ecosystem. The significance of this value was tested by comparison with 1,000 replicates with host categories randomly permuted over sequences. We also measured the clustering of these categories over our phylogeny using the genealogical sorting index (GSI), a measure of the degree of exclusive ancestry of a group on a rooted genealogy (Cummings, Neel, and Shaw 2008), for each of our host association categories. The index was estimated using the genealogicalSorting R package (Bazinet, Myers, and Khatavkar 2009), with significance estimated by permutation. The tree was pruned to remove strains that could not be assigned to one of the host association categories under consideration. Finally, since arthropods are the most sampled host, we tested for evidence of genetic structure within the arthropod-associated viruses that would suggest co-divergence with their hosts or preferential host-switching between closely related hosts. We calculated the Pearson correlation coefficient of the evolutionary distances between viruses and the evolutionary distances between their hosts, and tested for significance by permutation (as in Hommola et al. [2009]). We used the patristic distances of our ML tree for the virus data and a time-tree of arthropod genera, using published estimates of divergence dates (Jeyaprakash and Hoy 2009; Misof et al. 2014).
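As a rough illustration of the Fst permutation test described in this section, the following Python sketch computes one minus the ratio of mean within-population to mean between-population pairwise differences, then compares the observed value with values obtained after shuffling host labels. The function names are hypothetical, the sequences are assumed to be aligned and of equal length, and all within-population pairs are pooled, which is a simplification of the published estimator rather than the authors' implementation.

```python
import random
from itertools import combinations
from typing import Sequence


def n_diff(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def hudson_fst(seqs: Sequence[str], labels: Sequence[str]) -> float:
    """1 - (mean within-population differences / mean between-population)."""
    within, between = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = n_diff(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    h_w = sum(within) / len(within)
    h_b = sum(between) / len(between)
    if h_b == 0:  # no differences between populations: no structure
        return 0.0
    return 1.0 - h_w / h_b


def permutation_p(seqs: Sequence[str], labels: Sequence[str],
                  n_perm: int = 1000, seed: int = 1) -> float:
    """Share of label permutations with Fst at least as large as observed."""
    rng = random.Random(seed)
    observed = hudson_fst(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute host categories over sequences
        if hudson_fst(seqs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling preserves the number of sequences per category, so the null distribution reflects the same sampling imbalance as the observed data.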
2.5 Reconstruction of host associations
Viruses were categorized as having one of four types of host association: arthropod-specific, vertebrate-specific, arthropod-vectored plant, or arthropod-vectored vertebrate. However, the host association of some viruses is uncertain when they have been isolated from vertebrates, biting arthropods, or plant-sap-feeding arthropods. Due to limited sampling, it was not clear whether viruses isolated from vertebrates were vertebrate-specific or arthropod-vectored vertebrate viruses; whether viruses isolated from biting arthropods were arthropod-specific viruses or arthropod-vectored vertebrate viruses; or whether viruses isolated from plant-sap-feeding arthropods were arthropod-specific or arthropod-vectored plant viruses.
We classified a virus from a nematode as having its own host category. We classified three of the fish-infecting dimarhabdoviruses as vertebrate-specific, based on the fact that they can be transmitted via immersion in water containing virus under experimental conditions (Bootsma, Dekinkelin, and Leberre 1975; Dorson et al. 1987; Haenen and Davidse 1993) and the widely held belief amongst the fisheries community that these viruses are not typically vectored (Ahne et al. 2002). However, there is some evidence that these viruses can be transmitted by arthropods (sea lice) in experiments (Pfeilputzien 1978; Ahne et al. 2002), and so we would recommend this be interpreted with some caution. Additionally, although we classified the viruses identified in sea lice as having biting arthropod hosts, they may be crustacean-specific. The two viruses from Lepeophtheirus salmonis do not seem to infect the fish they parasitize and are present in all developmental stages of the lice, suggesting they may be transmitted vertically (Okland et al. 2014).
We simultaneously estimated both the current and ancestral host associations, and the phylogeny of the viruses, using a Bayesian analysis implemented in BEAST v1.8 (Drummond et al. 2012; Weinert et al. 2012). Because meaningful branch lengths are essential for this analysis (uncertainty about branch lengths will feed into uncertainty about the estimates), we used a subset of the sites and strains used in the maximum likelihood (ML) analysis. We retained 189 taxa: all rhabdoviruses excluding the divergent fish-infecting novirhabdovirus clade and the virus from Hydra, as well as the viruses from Lolium perenne and Conwentzia psociformis, which had a large number of missing sites. Sequences were trimmed to a conserved region of 414 amino acids where data were recorded for most of these viruses (the Gblocks alignment trimmed further by eye).
We used the host-association categories described above, which included ambiguous states. To describe amino acid evolution, we used an LG substitution model with gamma-distributed rate variation across sites (Le and Gascuel 2008) and an uncorrelated lognormal relaxed clock model of rate variation among lineages (Drummond et al. 2006). To describe the evolution of the host associations, we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
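To make the idea of an asymmetric transition rate matrix concrete, the sketch below builds a small continuous-time Markov chain over the four host associations and computes transition probabilities along a branch of length t as exp(Qt). This is an illustration only and not the BEAST model: the rate values are invented, the matrix exponential is approximated in pure Python, and restricting non-zero rates to the state pairs named above is an assumption made for the example.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

STATES = [
    "arthropod-specific",
    "vertebrate-specific",
    "arthropod-vectored vertebrate",
    "arthropod-vectored plant",
]

# Hypothetical rates (changes per unit branch length), for illustration only.
# Forward and reverse rates differ, which is what "asymmetric" means here.
EXAMPLE_RATES: Dict[Tuple[str, str], float] = {
    ("arthropod-specific", "arthropod-vectored vertebrate"): 0.20,
    ("arthropod-vectored vertebrate", "arthropod-specific"): 0.05,
    ("arthropod-specific", "arthropod-vectored plant"): 0.10,
    ("arthropod-vectored plant", "arthropod-specific"): 0.02,
    ("arthropod-vectored vertebrate", "vertebrate-specific"): 0.08,
    ("vertebrate-specific", "arthropod-vectored vertebrate"): 0.01,
}


def build_rate_matrix(rates: Dict[Tuple[str, str], float]) -> Matrix:
    """Build Q from directed rates; each row of Q sums to zero."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for (source, target), rate in rates.items():
        q[STATES.index(source)][STATES.index(target)] = rate
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q: Matrix, t: float, terms: int = 20,
                             squarings: int = 10) -> Matrix:
    """Approximate P(t) = exp(Qt) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes (Q * scale)^k / k!
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


if __name__ == "__main__":
    p = transition_probabilities(build_rate_matrix(EXAMPLE_RATES), t=1.0)
    for state, row in zip(STATES, p):
        print(state, [round(x, 3) for x in row])
```

Each row of the resulting matrix gives the probabilities of ending in each host association after a branch of length t, given the starting state for that row.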
Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
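The two checks mentioned here, discarding a burn-in and inspecting effective sample size (ESS), can be sketched as follows. This is a simplified autocorrelation-based estimate and is not the estimator implemented in Tracer; the function names and the rule of stopping at the first non-positive autocorrelation are assumptions made for illustration.

```python
from typing import List, Sequence


def drop_burnin(samples: Sequence[float], burnin_percent: int = 30) -> List[float]:
    """Discard the first burnin_percent of an MCMC trace."""
    cut = len(samples) * burnin_percent // 100
    return list(samples[cut:])


def effective_sample_size(samples: Sequence[float]) -> float:
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(centred[i] * centred[i + lag]
                  for i in range(n - lag)) / (n * var)
        if rho <= 0:  # stop once the autocorrelation has died out
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

A strongly autocorrelated trace yields an ESS far below the number of retained samples, which is the situation a threshold such as 200 is meant to guard against.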
3. Results
3.1 Novel rhabdoviruses from RNA-seq
To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST and used PCR to identify which samples these sequences came from.
This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown, and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of 12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see 'Methods' section for details).
To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans, and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L genes.
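A simple way to picture ORF-based gene prediction is the sketch below, which scans the three forward reading frames for a start codon followed in frame by a stop codon. It assumes the sequence is supplied in coding (mRNA-sense) orientation and does not model the transcription termination signals between rhabdovirus genes; the function name find_orfs and the min_codons cut-off are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of ORFs in the three forward frames.

    Coordinates are 0-based and end-exclusive, and each ORF runs from an
    ATG to the first in-frame stop codon (inclusive). min_codons counts
    codons before the stop and is an illustrative threshold.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                found_stop = j + 3 <= len(seq)
                if found_stop and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after this ORF
            else:
                i += 3
    return sorted(orfs)
```

For example, calling find_orfs on a short test sequence with min_codons=5 returns only those ATG-to-stop stretches that span at least five codons before the stop.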
Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses, or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod-associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.
Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2-sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U's) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13 kb long and contain five core genes 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16 kb long (Walker et al. 2011, 2015).
[Figure 1 panel. Genomes shown: Scaptodrosophila deflexa sigmavirus; Drosophila sturtevanti rhabdovirus; Drosophila tristis sigmavirus; Drosophila montana sigmavirus; Drosophila subobscura rhabdovirus; Drosophila algonquin sigmavirus; Pararge aegeria rhabdovirus; Ceratitis capitata sigmavirus; Drosophila busckii rhabdovirus; Drosophila sturtevanti sigmavirus; Drosophila immigrans sigmavirus; Drosophila ananassae sigmavirus; Drosophila affinis sigmavirus. Horizontal axis: sequence length (nucleotides), 0 to 15,000. Gene legend: N, P, M, G, L, X, Accessory.]
3.2 New rhabdoviruses from public databases
We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (Fleas, Crustacea, Lepidoptera, Diptera), one from a Cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1,000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).
Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria, and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and that the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori, which we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua, which contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).
3.3 Rhabdovirus phylogeny
To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
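Counting the nodes shared by two tree reconstructions amounts to comparing the sets of tips descending from each internal node. The sketch below does this for rooted trees written as nested Python tuples; that representation, the function names, and the toy trees are assumptions for illustration and do not reflect the software used for the comparison reported here.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the tip set below every internal node (root included)."""
    found: Set[FrozenSet[str]] = set()

    def tips(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):  # a tip
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def shared_nodes(tree_a: Tree, tree_b: Tree) -> Tuple[int, int]:
    """Number of clades of tree_a also present in tree_b, and tree_a's total."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a)


if __name__ == "__main__":
    ml_tree = ((("A", "B"), "C"), ("D", "E"))
    bayes_tree = ((("A", "C"), "B"), ("D", "E"))
    print(shared_nodes(ml_tree, bayes_tree))
```

In the toy example the two trees disagree only on the placement of tips A, B, and C within one clade, so all other internal nodes are counted as shared.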
We recovered all of the major clades described previously (Fig. 2) and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.
3.4 Predicted host associations of viruses
With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases, the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g., mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insect, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis, we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see 'Methods' section).
This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without
[Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabd… Tip labels give each virus name together with its associated hosts. Legend of host associations: arthropod-vectored plant; arthropods; vertebrate specific; arthropod-vectored vertebrate; nematode; low support or omitted. Labelled clades: lyssaviruses; cyto- and nucleo-rhabdoviruses; novirhabdoviruses. The connection to panel B is marked "Fig 2B"; scale bar 0.4.]
ovi
ruse
san
un
assi
gned
gro
up
of
arth
rop
od
asso
ciat
edvi
ruse
sth
ep
lan
tin
fect
ing
cyto
-an
dn
ucl
eo-r
hab
do
viru
ses
asw
ella
sth
eve
rteb
rate
spec
ific
lyss
avir
use
s(B
)sh
ow
sth
ed
imar
hab
do
viru
ssu
per
gro
up
wh
ich
isp
red
om
inan
tly
com
po
sed
of
arth
rop
od
-vec
tore
dve
rteb
rate
viru
ses
alo
ng
wit
hth
ear
thro
po
d-s
pec
ific
sigm
avi
rus
clad
eB
ran
ches
are
colo
red
base
do
nth
e
Bay
esia
nh
ost
asso
ciat
ion
reco
nst
ruct
ion
anal
ysis
Bla
ckre
pre
sen
tsta
xao
mit
ted
fro
mh
ost
-sta
tere
con
stru
ctio
no
ras
soci
atio
ns
wit
hlt
095
sup
po
rtT
he
tree
was
infe
rred
fro
mL
gen
ese
qu
ence
su
sin
gth
eG
blo
cks
alig
nm
ent
Th
e
colu
mn
so
fte
xtar
eth
evi
rus
nam
eth
eh
ost
cate
gory
use
dfo
rre
con
stru
ctio
ns
and
kno
wn
ho
sts
(fro
mle
ftto
righ
t)C
od
esfo
rth
eh
ost
cate
gori
esar
eV
Sve
rteb
rate
-sp
ecifi
cV
Va
rth
rop
od
-vec
tore
dve
rteb
rate
Aa
rth
rop
od
spec
ific
BS
biti
ng-
arth
rop
od
(am
bigu
ou
sst
ate)
Vv
erte
brat
e(a
mbi
guo
us
stat
e)A
Pp
lan
t-sa
p-f
eed
ing-
arth
rop
od
(am
bigu
ou
sst
ate)
UH
un
cert
ain
-ho
st(a
mbi
guo
us
acro
ssal
lsta
tes)
an
dN
nem
atod
eN
ames
inbo
ldan
du
nd
erli
ned
are
vi-
ruse
sd
isco
vere
din
this
stu
dy
Th
etr
eeis
roo
ted
wit
hth
eC
hu
viru
scl
ade
(ro
ot
coll
apse
d)a
sid
enti
fied
asan
ou
tgro
up
in(L
iet
al2
015)
but
we
no
teth
isgi
ves
the
sam
ere
sult
asm
idp
oin
tan
dth
em
ole
cula
rcl
ock
roo
tin
gN
od
esla
-
bell
edw
ith
qu
esti
on
mar
ks(
)re
pre
sen
tn
od
esw
ith
aLR
T(a
pp
roxi
mat
eli
keli
ho
od
rati
ote
st)
stat
isti
cal
sup
po
rtva
lues
less
than
075
Sca
leba
rsh
ow
sn
um
ber
of
amin
o-a
cid
subs
titu
tio
ns
per
site
Bay
esia
nM
CC
tree
use
dto
infe
r
ance
stra
ltra
its
issh
ow
nin
Sup
ple
men
tary
Figu
reS4
(co
nti
nu
ed)
6 | Virus Evolution 2015 Vol 1 No 1
Kern Canyon virus
Mossuril virus
Yata virus
Drosophila obscura sigma virus
Gray Lodge virus
Oak Vale virus
Perinet virus
Morreton virus
Durham virus
Grass carp rhabdovirus
Wuhan Insect virus 7
Sripur virus
Itacaiunas virus
Spodoptera exigua TSA
Siniperca chuatsi virus
Fikirini bat rhabdovirus
Koolpinyah virus
Rochambeau virus
Tench rhabdovirus
Caligus rogercresseyi 11125273 TSA
Garba virus
Nishimuro virus
Keuraliba virus
New Minto virus
Long Island tick rhabdovirus
Klamath virus
Huangpi Tick Virus 3
Wuhan House Fly Virus 1
Dolphin rhabdovirus
Culex tritaeniorhynchus rhabdovirus
Scaptodrosophila deflexa sigma virus
Spring viremia of carp virus
Tibrogargan virus
La Joya virus
Vesicular stomatitis virus Alagoas Indiana 3
Berrimah virus
Taishun Tick Virus
Bovine ephmeral fever virus
Radi virus
Conwentzia psociformis TSA
Wuhan Louse Fly Virus 10
Oita virus
y
Chandipura virus
Vesicular stomatitis virus Indiana
Yongjia Tick Virus 2
Jurona virusYug Bogdanovac virus
Vesicular stomatitis virus New Jersey
Bole Tick Virus 2
Curionopolis virus
Shayang Fly Virus 2
Ceratitis capitata sigma virus
Mount Elgon bat virus
Lepeophtheirus salmonis rhabdovirus 127
Scophthalmus maximus rhabdovirus
Wuhan Fly Virus 2
Drosophila montana sigma virus
Wuhan Louse Fly Virus 5
Hart Park virus
Arboretum virus
Ord River virus
Bas Congo virus
Santa barbara virus
Wuhan Louse Fly Virus 9
Puerto Almendras virus
Lepeophtheirus salmonis rhabdovirus 9
Landjia virus
Sena Madureira virus
Isfahan virus
Sunguru virus
Pike fry rhabdovirus
Iriri virus
Caligus rogercresseyi 11114047 TSA
Wuhan Louse Fly Virus 8
Tacheng Tick Virus 3
Coastal Plains virus
Chaco virus
Drosophila ananassae sigma virus
Bahia Grande virus
Marco virus
Almpiwar virus
Malakal virus
Wuhan Tick Virus 1
Aruac virus
Sawgrass virus
Vesicular stomatitis virus Cocal
Parry Creek virus
Drosophila melanogaster sigma virus HAP23 isolate
Niakha virus
Drosophila melanogaster sigma virus AP30 isolate
Drosophila sturtvanti sigma virus
Joinjakaka virus
Moussa virus
Nkolbisson virus
Sweetwater Branch virus
Kwatta virus
Humulus lupulus TSA
Muscina stabulans sigma virus
Drosophila immigrans sigma virus
Eel Virus European X
Vesicular stomatitis virus New Jersey Hazelhurst
Barur virus
Connecticut virus
Starry flounder rhabdovirus
Harlingen virus
Inhangapi virus
North Creek Virus
Fukuoka virus
Kamese virus
Pararge aegeria rhabdovirus
Kimberley virus
Malpais Spring virus
Mosqueiro virus
Adelaide River virus
Flanders virus
Tupaia virus
Bivens Arm virus
Wuhan Louse Fly Virus 11
Drosophila tristis sigma virus
Muir Springs virus
Manitoba virus
Carajas oncolytic virus
Drosophila affinis sigma virusDrosophila affinis or athabasca sigma virus
Beaumont virus
Maraba virus
Wongabel virus
Ngaingan virus
Xiburema virus
Bats
Mosquitoes birds and mammals including humans
Mosquitoes
Drosophilid fruit fly
Mosquitoes
Mosquitoes and swine
Mosquitoes and sandflies
Sandflies
Birds
Grass carp
Aphid or its parasitoid wasp
Sandflies
Midges
Beet army worm moth
Mandarin fish
Bats
Cattle
Mosquitoes
Tench
Sea louse
Birds
Wild boar
Rodents
Ticks
Ticks
Voles
Ticks
Muscid house fly
Dolphins and porpoise
Mosquitoes
Drosophilid fruit fly
Common carp
Midges and bovids
Mosquitoes and rodents
Mammals including humans
Cattle
Ticks
Midges mosquitoes and ruminants
Sandflies
Lacewing
Louse fly
Bats
Sandflies and mammals including humans
Mammals including humans sandflies and mosquitoes
Ticks
MosquitoesSandflies
Mammals including humans biting and non-biting diptera
Ticks
Midges and mammals
Diptera species (Muscid house fly and Calliphorid laterine fly)
Tephritid fruit fly
Bats
Sea louse
Cultured turbot
Muscid house fly
Drosophilid fruit fly
Louse fly
Mosquitoes and birds
Mosquitoes
Mosquitoes
Humans
Psychodidae drain fly
Louse fly
Mosquitoes
Sea louse
Birds
Lizards
Mosquitoes ticks sandflies mammals including humans
Domestic chickens
Northern pike
Sandflies
Sea louse
Louse fly
Ticks
Bovids
Lizards
Drosophilid fruit fly
Mosquitoes
Lizards
Lizards
Mosquitoes
Ticks
Mosquitoes and birds
Ticks
Mites mosquitoes and mammals
Mosquitoes
Drosophilid fruit fly
Sandflies
Drosophilid fruit fly
Drosophilid fruit fly
Mosquitoes and cattle
Mosquitoes
Mosquitoes and humans
Midges and cattle
Mosquitoes ticks and mammals
Hops
False stable fly
Drosophilid fruit fly
European eel
Mammals including humans biting and non-biting diptera
Ticks mosquitoes fleas and mammals
Ticks and rabbits
Starry flounder
Mosquitoes
Sandflies and rodents
Mosquitoes
Midges mosquitoes and cattle
Mosquitoes and humans
Speckled wood butterfly
Midges mosquitoes and cattle
Mosquitoes
Mosquitoes
Cattle
Mosquitoes and birds
Tree shrews
Midges and cattle
Louse fly
Drosophilid fruit fly
Mosquitoes
Mosquitoes
Sandflies
Drosophilid fruit flyDrosophilid fruit fly
Mosquitoes
Sandflies
Midges and birds
Midges cattle and macropods
Mosquitoes
V
VV
BA
A
BA
VV
BA
BA
V
V
A
BA
BA
A
VS
V
V
BA
V
BA
V
V
V
BA
BA
V
BA
A
V
BA
A
VS
VV
VV
V
BA
BA
VV
BA
BA
V
VV
VV
BA
BABA
VV
BA
VV
A
A
V
BA
V
A
A
BA
VV
BA
BA
V
A
BA
BA
BA
V
V
VV
V
VS
BA
BA
BA
BA
V
V
A
BA
V
V
BA
BA
VV
BA
VV
BA
A
BA
A
A
VV
BA
VV
VV
VV
P
A
A
V
VV
VV
VV
V
BA
VV
BA
VV
VV
A
VV
BA
BA
V
VV
V
VV
BA
A
BA
BA
BA
AA
BA
BA
VV
VV
BA
04
sigma
viruses
dimarhabdovirus supergroup
BFig 2A
Figure 2 Continued
B Longdon et al | 7
replacement from the ninety-nine viruses in our data withknown host associations) These analyses correctly returnedthe true host association for ninety-fiveninety-nine viruseswith strong posterior support (gt09) and one with weak support(mean supportfrac14 099 rangefrac14 073ndash100) All three cases inwhich the reconstruction returned a false host association in-volved anomalous sequences (eg a change in host associationon a terminal branch) Note there would be no failure in caseswhere there was no phylogenetic clustering of host associa-tions In such cases the method wouldmdashcorrectlymdashreport highlevels of uncertainty in all reconstructed states
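The bookkeeping of this validation scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the virus names, the state labels in `ALL_STATES`, and the helper names are hypothetical placeholders, and the Bayesian reconstruction itself (run in BEAST in the study) is not reproduced. The sketch shows how ninety-nine viruses split into nine sets of ten and one set of nine when sampled without replacement, how a held-out set can be marked as ambiguous across all states, and how ninety-six correct reconstructions out of ninety-nine correspond to roughly 97% accuracy.

```python
import random

# Placeholder state labels, assumed for illustration only.
ALL_STATES = ("vertebrate-specific", "vectored-vertebrate",
              "arthropod-specific", "vectored-plant")


def make_validation_sets(viruses, set_size=10, seed=0):
    """Split viruses into disjoint sets (sampling without replacement)."""
    rng = random.Random(seed)
    shuffled = list(viruses)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


def mask_hosts(known_hosts, masked):
    """Give masked viruses every possible state; keep the others fixed."""
    masked = set(masked)
    return {virus: set(ALL_STATES) if virus in masked else {state}
            for virus, state in known_hosts.items()}


def accuracy(n_correct, n_total):
    """Fraction of held-out viruses whose true state was recovered."""
    return n_correct / n_total


# Hypothetical toy data: 99 viruses with arbitrary known states.
known_hosts = {f"virus_{i:02d}": ALL_STATES[i % len(ALL_STATES)]
               for i in range(99)}

validation_sets = make_validation_sets(known_hosts)
set_sizes = sorted(len(s) for s in validation_sets)

# Input for the first re-analysis: one set masked, all others known.
first_run_input = mask_hosts(known_hosts, validation_sets[0])

print(set_sizes)
# 95 strongly supported + 1 weakly supported correct calls out of 99.
print(f"{accuracy(95 + 1, 99):.0%}")
```

Each masked dictionary would then be passed to whatever reconstruction software is in use; the masking step only changes which tip states are treated as known.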
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
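A permutation test of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a distance-based Fst of the form 1 - Hw/Hb (mean within-group over mean between-group pairwise distance, in the spirit of Hudson, Slatkin, and Maddison 1992), two host groups only, and a small made-up distance matrix. The function names and toy data are hypothetical.

```python
import itertools
import random


def mean_pairwise(dist, members):
    """Mean distance over all pairs of sequences within one group."""
    pairs = list(itertools.combinations(members, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def fst_two_groups(dist, labels):
    """Distance-based Fst = 1 - Hw / Hb for exactly two groups."""
    first, second = sorted(set(labels))
    g1 = [i for i, lab in enumerate(labels) if lab == first]
    g2 = [i for i, lab in enumerate(labels) if lab == second]
    h_within = (mean_pairwise(dist, g1) + mean_pairwise(dist, g2)) / 2
    h_between = sum(dist[i][j] for i in g1 for j in g2) / (len(g1) * len(g2))
    return 1 - h_within / h_between


def permutation_test(dist, labels, n_perm=999, seed=1):
    """Shuffle host labels to ask how often Fst is at least as large."""
    rng = random.Random(seed)
    observed = fst_two_groups(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst_two_groups(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): sequences 0-3 from one host group, 4-7 from another.
labels = ["arthropod"] * 4 + ["vertebrate"] * 4
n = len(labels)
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 1.0)
         for j in range(n)] for i in range(n)]

fst, p_value = permutation_test(dist, labels)
print(f"Fst = {fst:.2f}, permutation P = {p_value:.3f}")
```

Shuffling labels while keeping the distance matrix fixed preserves group sizes, so the null distribution reflects only the absence of association between host group and sequence similarity.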
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
Vertebrate-specific viruses have arisen once in the lyssaviruses clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group), it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
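The logic of correlating two sets of pairwise distances and assessing significance by permutation can be sketched as below. This is a Mantel-style illustration under stated assumptions, and it may differ in detail from the permutation procedure used in the study (which cites Hommola et al. 2009). The function names and the toy distances are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_values(dist, order):
    """Distances for every pair of tips, with tips relabelled by `order`."""
    return [dist[order[i]][order[j]]
            for i, j in itertools.combinations(range(len(order)), 2)]


def distance_correlation_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate virus and host distances; permute hosts to get a P value."""
    identity = list(range(len(virus_dist)))
    virus_values = pairwise_values(virus_dist, identity)
    observed = pearson(virus_values, pairwise_values(host_dist, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # reassign hosts to viruses at random
        if pearson(virus_values, pairwise_values(host_dist, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): positions on a line stand in for phylogenetic distance.
virus_pos = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
host_pos = [0.0, 1.1, 1.9, 3.2, 4.1, 5.0]
virus_dist = [[abs(a - b) for b in virus_pos] for a in virus_pos]
host_dist = [[abs(a - b) for b in host_pos] for a in host_pos]

r, p_value = distance_correlation_test(virus_dist, host_dist)
print(f"correlation = {r:.2f}, permutation P = {p_value:.3f}")
```

Permuting whole tips, rather than individual pairwise distances, is the important design choice here: pairwise distances are not independent of one another, so a standard correlation P value would overstate significance.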
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses, and the clade of viruses from sea-lice.
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences, we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive Data: SRP057824
Results from testing ancestral trait reconstructions predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ's College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.
Supplementary data
Supplementary data is available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R., et al. (2015) 'Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host', Nucleic Acids Research, 43: 6191–206.
Ahne, W., et al. (2002) 'Spring Viremia of Carp (SVC)', Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) 'Endogenous Viruses: Connecting Recent and Ancient Viral Evolution', Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) 'Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative', Systematic Biology, 55: 539–52.
Aznar-Lopez, C., et al. (2013) 'Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats', Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) 'Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila', Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G., et al. (2013) 'Estimating and Interpreting FST: The Impact of Rare Variants', Genome Research, 23: 1514–21.
Bootsma R Dekinkelin P and Leberre M (1975) lsquoTransmissionExperiments with Pike Fry (Esox-Lucius L) RhabdovirusrsquoJournal of Fish Biology 7 269ndash76
Bourhy H et al (2005) lsquoPhylogenetic Relationships AmongRhabdoviruses Inferred Using the L Polymerase Genersquo Journalof General Virology 86 2849ndash58
Capella-Gutierrez S Silla-Martinez J M and Gabaldon T(2009) lsquotrimAl A Tool for Automated Alignment Trimming inLarge-Scale Phylogenetic Analysesrsquo Bioinformatics 25 1972ndash3
Chiba S et al (2011) lsquoWidespread Endogenization of GenomeSequences of Non-Retroviral RNA Viruses into PlantGenomesrsquo Plos Pathogens 7 e1002146
Cummings M P Neel M C and Shaw K L (2008) lsquoAGenealogical Approach to Quantifying Lineage DivergencersquoEvolution 62 2411ndash22
Dietzgen R G and Kuzmin I V (2012) Rhabdoviruses MolecularTaxonomy Evolution Genomics Ecology Host-Vector InteractionsCytopathology and Control Norfolk UK Caister Academic Press
Dorson M et al (1987) lsquoSusceptibility of Pike (Esox-Lucius) toDifferent Salmonid Viruses (Ipn Vhs Ihn) and to the PerchRhabdovirusrsquo Bulletin Francais De La Peche Et De La Pisciculture307 91ndash101
Drummond A J et al (2006) lsquoRelaxed Phylogenetics and Datingwith Confidencersquo PLoS Biology 4 e88
mdashmdash et al (2012) lsquoBayesian Phylogenetics with BEAUti and theBEAST 17rsquo Molecular Biology and Evolution 29 1969ndash73
Dudas G and Obbard D J (2015) lsquoAre Arthropods at the Heart ofVirus Evolutionrsquo eLife 4 e06837
Edwards C J et al (2011) lsquoAncient Hybridization and An IrishOrigin for the Modern Polar Bear Matrilinersquo Current Biology 211251ndash8
Faria N R et al (2013) lsquoSimultaneously Reconstructing ViralCross-Species Transmission History and Identifying theUnderlying Constraintsrsquo Philosophical Transactions of the RoyalSociety of London B Biological Sciences 368 20120196
Fort P et al (2011) lsquoFossil Rhabdoviral Sequences Integrated intoArthropod Genomes Ontogeny Evolution and PotentialFunctionalityrsquo Molecular Biology and Evolution 29 381ndash90
Guindon S et al (2010) lsquoNew Algorithms and Methods toEstimate Maximum-Likelihood Phylogenies Assessing thePerformance of PhyML 30rsquo Systems Biology 59 307ndash21
Haenen O and Davidse A (1993) lsquoComparativePathogenicity of Two Strains of Pike Fry Rhabdovirus andSpring Viremia of Carp Virus for Young Roach CommonCarp Grass Carp and Rainbow Troutrsquo Diseases of AquaticOrganisms 15 87ndash92
Hampson K et al (2015) lsquoEstimating the Global Burden ofEndemic Canine Rabiesrsquo PLoS Neglected Tropical Diseases 9e0003709
Hogenhout S A Redinbaugh M G and Ammar E D (2003)lsquoPlant and Animal Rhabdovirus Host Range A Bugrsquos ViewrsquoTrends in Microbiology 11 264ndash71
Hommola K et al (2009) lsquoA Permutation Test of Host-ParasiteCospeciationrsquo Molecuar Biology and Evolution 26 1457ndash68
Hudson R R Slatkin M and Maddison W P (1992) lsquoEstimationof Levels of Gene Flow from DNA Sequence Datarsquo Genetics 132583ndash9
Jeyaprakash A and Hoy M A (2009) lsquoFirst Divergence TimeEstimate of Spiders Scorpions Mites and Ticks (SubphylumChelicerata) Inferred from Mitochondrial PhylogenyrsquoExperimental and Applied Acarology 47 1ndash18
Katoh K and Standley D M (2013) lsquoMAFFT Multiple SequenceAlignment Software Version 7 Improvements in Performanceand Usabilityrsquo Molecular Biology and Evolution 30 772ndash80
Katzourakis A and Gifford R J (2010) lsquoEndogenous ViralElements in Animal Genomesrsquo Plos Genetics 6 e1001191
LrsquoHeritier P H and Teissier G (1937) lsquoUne AnomaliePhysiologique Hereditaire Chez la Drosophilersquo Comptes Rendusde lrsquoAcademie des Sciences Paris 231 192ndash4
Le S Q and Gascuel O (2008) lsquoAn Improved General AminoAcid Replacement Matrixrsquo Molecular Biology and Evolution 251307ndash20
Lemey P et al (2014) lsquoUnifying Viral Genetics and HumanTransportation Data to Predict the Global TransmissionDynamics of Human Influenza H3N2rsquo PLoS Pathogens 10e1003932
Li C X et al (2015) lsquoUnprecedented Genomic Diversity of RNAViruses in Arthropods Reveals the Ancestry of Negative-SenseRNA Virusesrsquo eLife 107554eLife05378
Lipkin W I and Anthony S J (2015) lsquoVirus Huntingrsquo Virology479ndash480C 194ndash9
Liu S Vijayendran D and Bonning B C (2011) lsquoNextGeneration Sequencing Technologies for Insect VirusDiscoveryrsquo Viruses 3 1849ndash69
Longdon B and Jiggins F M (2012) lsquoVertically TransmittedViral Endosymbionts of Insects Do Sigma Viruses WalkAlonersquo Proceedings of the Biological Sciences 279 3889ndash98
Longdon B and Walker P J (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>
Longdon B Obbard D J and Jiggins F M (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’ Proceedings of the Royal Society B 277 35–44
Longdon B Wilfert L and Jiggins F M (2012) ‘The Sigma Viruses of Drosophila’ in Dietzgen R G and Kuzmin I V (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control pp 117–32 Norfolk UK: Caister Academic Press
Longdon B et al (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’ PLoS Pathogens 7 e1002260
Longdon B et al (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’ Biology Letters 7 747–50
Longdon B et al (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’ Genetics 188 141–50
Longdon B et al (2014) ‘The Evolution and Genetics of Virus Host Shifts’ PLoS Pathogens 10 e1004395
Longdon B et al (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’ PLoS Pathogens 11 e1004728
Minin V N and Suchard M A (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’ Journal of Mathematical Biology 56 391–412
Misof B et al (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’ Science 346 763–7
Okland A L et al (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod Salmon Louse (Lepeophtheirus salmonis)’ PLoS One 9 e112517
Omland K E (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’ Systematic Biology 48 604–11
Parker D J et al (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila virilis Group?’ Heredity (Edinburgh) 115 13–21
Pfeilputzien C (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’ Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health 25 319–23
Rambaut A (2011) ‘FigTree’ (v1.3 ed) <http://tree.bio.ed.ac.uk/software/figtree/>
Rambaut A and Drummond A J (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>
Rosen L (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’ Science 207 989–91
Shroyer D A and Rosen L (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’ Genetics 104 649–59
Streicker D G et al (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’ Science 329 676–9
Talavera G and Castresana J (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’ Systematic Biology 56 564–77
van Mierlo J T et al (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’ PLoS Pathogens 10 e1004256
Walker P J Blasdell K R and Joubert D A (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants with Large Complex Genomes’ in Dietzgen R G and Kuzmin I V (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control pp 59–88 Norfolk UK: Caister Academic Press
Walker P J et al (2011) ‘Rhabdovirus Accessory Genes’ Virus Research 162 110–25
Walker P J et al (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’ PLoS Pathogens 11 e1004664
Wang Z Gerstein M and Snyder M (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’ Nature Reviews Genetics 10 57–63
Webster C L et al (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’ PLoS Biology 13 e1002210
Weinert L A et al (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’ Biology Letters 8 829–32
12 | Virus Evolution 2015 Vol 1 No 1
viruses in the order Mononegavirales (Dietzgen and Kuzmin 2012). They infect an extremely broad range of hosts and have been discovered in plants, fish, mammals, reptiles and a broad range of insects and other arthropods (Bourhy et al. 2005). The family includes important pathogens of humans and livestock. Perhaps the most well-known is rabies virus, which can infect a diverse array of mammals and causes a fatal infection, killing 59,000 people per year with an estimated economic cost of $8.6 billion (US) (Hampson et al. 2015). Other rhabdoviruses, such as vesicular stomatitis virus and bovine ephemeral fever virus, are important pathogens of domesticated animals, while others are pathogens of crops (Dietzgen and Kuzmin 2012).
Arthropods play a key role in the transmission of many rhabdoviruses. Many viruses found in vertebrates have also been detected in arthropods, including sandflies, mosquitoes, ticks and midges (Walker, Blasdell, and Joubert 2012). The rhabdoviruses that infect plants are also often transmitted by arthropods (Hogenhout, Redinbaugh, and Ammar 2003), and some that infect fish can potentially be vectored by ectoparasitic copepod sea-lice (Pfeilputzien 1978; Ahne et al. 2002). Moreover, insects are biological vectors: rhabdoviruses replicate upon infection of insect vectors (Hogenhout, Redinbaugh, and Ammar 2003). Other rhabdoviruses are insect-specific. In particular, the sigma viruses are a clade of vertically transmitted viruses that infect dipterans and are well-studied in Drosophila (Longdon et al. 2011a,b; Longdon and Jiggins 2012). Recently, a number of rhabdoviruses have been found to be associated with a wide array of insect and other arthropod species, suggesting they may be common arthropod viruses (Li et al. 2015; Walker et al. 2015). Furthermore, a number of arthropod genomes contain integrated endogenous viral elements (EVEs) with similarity to rhabdoviruses, suggesting that these species have been infected with rhabdoviruses at some point in their history (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012; Aiewsakun and Katzourakis 2015).
Here, we explore the diversity of the rhabdoviruses and examine how they have switched between different host taxa during their evolutionary history. Insects infected with rhabdoviruses commonly become paralysed on exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012). We exploited this fact to screen field collections of flies from several continents for novel rhabdoviruses that were then sequenced using metagenomic RNA-sequencing (RNA-seq). Additionally, we searched for rhabdovirus-like sequences in publicly available RNA-seq data. We identified thirty-two novel rhabdovirus-like sequences from a wide array of invertebrates and plants, and combined them with recently discovered viruses to produce the most comprehensive phylogeny of the rhabdoviruses to date. For many of the viruses, we do not know their true host range, so we used the phylogeny to identify a large number of new likely vector-borne viruses and to reconstruct the evolutionary history of this diverse group of viruses.
2. Methods
2.1 Discovery of new rhabdoviruses by RNA sequencing
Diptera (flies, mostly Drosophilidae) were collected in the field from Spain, USA, Kenya, France, Ghana and the UK (Supplementary Data S1). Infection with rhabdoviruses can cause Drosophila and other insects to become paralysed after exposure to CO2 (Rosen 1980; Shroyer and Rosen 1983; Longdon, Wilfert, and Jiggins 2012), so we enriched our sample for infected individuals by exposing them to CO2 at 12 °C for 15 min, only retaining individuals that showed symptoms of paralysis 30 min later. We extracted RNA from seventy-nine individual insects (details in Supplementary Data S1) using Trizol reagent (Invitrogen) and combined the extracts into two pools (retaining non-pooled individual RNA samples). RNA was then rRNA depleted with the Ribo-Zero Gold kit (Epicentre, USA) and used to construct TruSeq total RNA libraries (Illumina). Libraries were constructed and sequenced by BGI (Hong Kong) on an Illumina Hi-Seq 2500 (one lane, 100-bp paired end reads, generating 175 million reads). Sequences were quality trimmed with Trimmomatic (v3): Illumina adapters were clipped, bases were removed from the beginning and end of reads if quality dropped below a threshold, sequences were trimmed if the average quality within a window fell below a threshold, and reads <20 bp in length were removed. We de novo assembled the RNA-seq reads with Trinity (release 25 February 2013) using default settings and the jaccard clip option for high gene density. The assembly was then searched using tblastn to identify rhabdovirus-like sequences, with known rhabdovirus coding sequences as the query. Any contigs with high sequence similarity to rhabdoviruses were then reciprocally compared to GenBank cDNA and RefSeq nucleotide databases using tblastn and only retained if they most closely matched a virus-like sequence. Raw read data were deposited in the NCBI Sequence Read Archive (SRP057824). Putative viral sequences have been submitted to GenBank (accession numbers in Supplementary Tables S1 and S2).
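The read-trimming steps described above can be illustrated with a short sketch. This is a simplified Python illustration of the general idea, not Trimmomatic's own implementation. The function name `trim_read` and the values of `end_q`, `window` and `window_q` are assumptions made for the example, since the text does not report the thresholds used; only the 20-bp minimum length comes from the text.

```python
def trim_read(seq, quals, end_q=3, window=4, window_q=15, min_len=20):
    """Quality-trim one read; return (seq, quals) or None if it is too short.

    end_q, window and window_q are assumed values chosen for illustration.
    """
    # Remove low-quality bases from the beginning and end of the read.
    start, end = 0, len(quals)
    while start < end and quals[start] < end_q:
        start += 1
    while end > start and quals[end - 1] < end_q:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # Cut the read at the first window whose mean quality is below window_q.
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            seq, quals = seq[:i], quals[:i]
            break

    # Discard reads shorter than the minimum length (20 bp in the text).
    if len(seq) < min_len:
        return None
    return seq, quals
```

A read whose quality collapses near its 3' end is shortened at the first failing window, while a read that ends up shorter than 20 bp is dropped entirely.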
As the RNA-seq was performed on pooled samples, we assigned rhabdovirus sequences to individual insects by PCR on RNA from individual samples. cDNA was produced using Promega GoScript Reverse Transcriptase and random-hexamer primers, and PCR performed using primers designed from the rhabdovirus sequences. Infected host species were identified by sequencing the mitochondrial gene COI. We were unable to identify the host species of the virus from a Drosophila affinis sub-group species (sequences appear similar to both D. affinis and the closely related Drosophila athabasca), despite the addition of further mitochondrial and nuclear sequences to increase confidence. In all cases, we confirmed that viruses were only present in cDNA and not in non-reverse-transcription (RT) controls (i.e., DNA) by PCR, and so they cannot be integrated into the insect genome (i.e., endogenous virus elements or EVEs [Katzourakis and Gifford 2010]). COI primers were used as a positive control for the presence of DNA in the non-RT template.
We identified sigma virus sequences in RNA-seq data from Drosophila montana (Parker et al. 2015). We used RT-PCR on an infected fly line to amplify the virus sequence and carried out additional Sanger sequencing with primers designed using the RNA-seq assembly. Additional virus sequences were identified from an RNA-seq analysis of pools of wild caught Drosophila: DImmSV from Drosophila immigrans (collection and sequencing described in van Mierlo et al. 2014), DTriSV from a pool of Drosophila tristis, and SDefSV from Scaptodrosophila deflexa (both Darren Obbard, unpublished data). GenBank accession numbers for new virus sequences are KR822817, KR822816, KR822823, KR822813, KR822820, KR822821, KR822822, KR822815, KR822824, KR822812, KR822811, KR822814 and KR822818. A full list of accessions can be found in Supplementary Tables S1 and S2.
2.2 Discovery of rhabdoviruses in public sequence databases
Rhabdovirus L gene sequences were used as queries to search (tblastn) expressed sequence tag and transcriptome shotgun assembly databases (NCBI). All sequences were reciprocally BLAST searched against GenBank cDNA and RefSeq databases and only retained if they matched a virus-like sequence. We used two approaches to examine whether sequences were present as RNA but not DNA. First, where assemblies of whole-genome shotgun sequences were available, we used BLAST to test whether sequences were integrated into the host genome. Second, for the virus sequences in the butterfly Pararge aegeria and the medfly Ceratitis capitata, we were able to obtain infected samples to confirm whether sequences are only present in RNA by performing PCR on both genomic DNA and cDNA as described above (samples kindly provided by Casper Breuker/Melanie Gibbs and Philip Leftwich, respectively).
2.3 Phylogenetic analysis
All available rhabdovirus-like sequences were downloaded from GenBank (accessions in Supplementary Data S2). Amino acid sequences for the L gene (encoding the RNA Dependent RNA Polymerase or RDRP) were used to infer the phylogeny (L gene sequences), as they contain conserved domains that can be aligned across this diverse group of viruses. Sequences were aligned with MAFFT (Katoh and Standley 2013) under default settings, and then poorly aligned and divergent sites were removed with either TrimAl (v1.3, strict settings, implemented on Phylemon v2.0 server alignment) (Capella-Gutierrez, Silla-Martinez, and Gabaldon 2009) or Gblocks (v0.91b, selecting smaller final blocks, allowing gap positions and less strict flanking positions to produce a less stringent selection alignment) (Talavera and Castresana 2007). These resulted in alignments of 1492 and 829 amino acids, respectively.
Phylogenetic trees were inferred using Maximum Likelihood in PhyML (v3.0) (Guindon et al. 2010) using the LG substitution model (Le and Gascuel 2008) (preliminary analysis confirmed the results were robust to the amino acid substitution model selected), with a gamma distribution of rate variation with four categories and a sub-tree pruning and regrafting topology searching algorithm. Branch support was estimated using Approximate Likelihood-Ratio Tests (aLRT) that are reported to outperform bootstrap methods (Anisimova and Gascuel 2006). Figures were created using FIGTREE (v. 1.4) (Rambaut 2011).
2.4 Analysis of phylogenetic structure between viruses taken from different hosts and ecologies
We measured the degree of phylogenetic structure between virus sequences identified in different categories of host (arthropods, vertebrates and plants) and ecosystems (terrestrial and aquatic). Following Bhatia et al. (2013), we measured the degree of genetic structure between virus sequences from different groups of hosts/ecosystems using Hudson’s Fst estimator (Hudson, Slatkin, and Maddison 1992). We calculated Fst as $F_{st} = 1 - H_w / H_b$, where $H_w$ is the mean number of differences between sequences within populations and $H_b$ is the mean number of differences between sequences from different populations, and where a population is a host category or ecosystem. The significance of this value was tested by comparison with 1000 replicates with host categories randomly permuted over sequences. We also measured the clustering of these categories over our phylogeny using the genealogical sorting index (GSI), a measure of the degree of exclusive ancestry of a group on a rooted genealogy (Cummings, Neel, and Shaw 2008), for each of our host association categories. The index was estimated using the genealogicalSorting R package (Bazinet, Myers, and Khatavkar 2009), with significance estimated by permutation. The tree was pruned to remove strains that could not be assigned to one of the host association categories under consideration. Finally, since arthropods are the most sampled host, we tested for evidence of genetic structure within the arthropod-associated viruses that would suggest co-divergence with their hosts or preferential host-switching between closely related hosts. We calculated the Pearson correlation coefficient of the evolutionary distances between viruses and the evolutionary distances between their hosts and tested for significance by permutation (as in Hommola et al. [2009]). We used the patristic distances of our ML tree for the virus data and a time-tree of arthropod genera, using published estimates of divergence dates (Jeyaprakash and Hoy 2009; Misof et al. 2014).
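The Fst permutation test described above can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions, not the analysis code used in the study: distances are plain counts of differing sites between aligned sequences, all within-group pairs are pooled into one mean (rather than averaging per population), and the names `hamming`, `hudson_fst` and `permutation_p` are hypothetical.

```python
import random
from itertools import combinations

def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def hudson_fst(seqs, labels):
    """Fst = 1 - Hw/Hb from mean pairwise differences within and between groups."""
    within, between = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    if not within or not between or sum(between) == 0:
        raise ValueError("need within- and between-group pairs with differences")
    hw = sum(within) / len(within)    # mean differences within groups
    hb = sum(between) / len(between)  # mean differences between groups
    return 1 - hw / hb

def permutation_p(seqs, labels, n_perm=1000, seed=0):
    """One-sided p-value: share of label permutations with Fst >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute host categories over sequences
        if hudson_fst(seqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, `permutation_p(["AAAA", "AAAT", "TTTT", "TTTA"], ["a", "a", "b", "b"])` compares the observed Fst for the two labelled groups against 1000 random relabellings of the same four sequences.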
2.5 Reconstruction of host associations
Viruses were categorized as having one of four types of host association: arthropod-specific, vertebrate-specific, arthropod-vectored plant, or arthropod-vectored vertebrate. However, the host association of some viruses is uncertain when they have been isolated from vertebrates, biting-arthropods or plant-sap-feeding arthropods. Due to limited sampling, it was not clear whether viruses isolated from vertebrates were vertebrate specific or arthropod-vectored vertebrate viruses, or whether viruses isolated from biting-arthropods were arthropod specific viruses or arthropod-vectored vertebrate viruses, or if viruses isolated from plant-sap-feeding arthropods were arthropod-specific or arthropod-vectored plant viruses.
We classified a virus from a nematode as having its own host category. We classified three of the fish-infecting dimarhabdoviruses as vertebrate specific, based on the fact they can be transmitted via immersion in water containing virus during experimental conditions (Bootsma, Dekinkelin, and Leberre 1975; Dorson et al. 1987; Haenen and Davidse 1993) and the widely held belief amongst the fisheries community that these viruses are not typically vectored (Ahne et al. 2002). However, there is some evidence these viruses can be transmitted by arthropods (sea lice) in experiments (Pfeilputzien 1978; Ahne et al. 2002), and so we would recommend this be interpreted with some caution. Additionally, although we classified the viruses identified in sea-lice as having biting arthropod hosts, they may be crustacean-specific. The two viruses from Lepeophtheirus salmonis do not seem to infect the fish they parasitize and are present in all developmental stages of the lice, suggesting they may be transmitted vertically (Okland et al. 2014).
We simultaneously estimated both the current and ancestral host associations, and the phylogeny of the viruses, using a Bayesian analysis implemented in BEAST v1.8 (Drummond et al. 2012; Weinert et al. 2012). Because meaningful branch lengths are essential for this analysis (uncertainty about branch lengths will feed into uncertainty about the estimates), we used a subset of the sites and strains used in the maximum likelihood (ML) analysis. We retained 189 taxa: all rhabdoviruses excluding the divergent fish-infecting novirhabdovirus clade and the virus from Hydra, as well as the viruses from Lolium perenne and Conwentzia psociformis, which had a large number of missing sites. Sequences were trimmed to a conserved region of 414 amino acids where data were recorded for most of these viruses (the Gblocks alignment trimmed further by eye).
We used the host-association categories described above, which included ambiguous states. To describe amino acid evolution, we used an LG substitution model with gamma distributed rate variation across sites (Le and Gascuel 2008) and an uncorrelated lognormal relaxed clock model of rate variation among lineages (Drummond et al. 2006). To describe the evolution of the host associations, we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
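To make the idea of an asymmetric transition rate matrix concrete, the sketch below simulates host-association changes along a single branch of a continuous-time Markov chain in which the rate from state X to state Y may differ from the rate from Y to X. It is a forward simulation written for illustration only: it is not the BEAST implementation, and it is not the conditional Markov Jumps estimator of Minin and Suchard (2008), which counts transitions given the observed tip states. The function name `simulate_branch` and the rate values in the usage note are hypothetical.

```python
import random

def simulate_branch(start, rates, t, rng):
    """Simulate host-state changes along a branch of length t.

    rates maps (source, destination) pairs to transition rates; because
    (X, Y) and (Y, X) are separate keys, the matrix can be asymmetric.
    Returns the state at the end of the branch and the list of jumps.
    """
    state, elapsed, jumps = start, 0.0, []
    while True:
        # Rates of leaving the current state, by destination.
        out = {dst: r for (src, dst), r in rates.items()
               if src == state and r > 0}
        total = sum(out.values())
        if total == 0:
            return state, jumps  # no way out of this state
        elapsed += rng.expovariate(total)  # waiting time to the next jump
        if elapsed > t:
            return state, jumps
        # Choose the destination with probability proportional to its rate.
        pick, acc = rng.uniform(0, total), 0.0
        for dst, r in out.items():
            acc += r
            if pick <= acc:
                break
        jumps.append((state, dst))
        state = dst
```

For instance, with made-up rates `{("arthropod-specific", "arthropod-vectored vertebrate"): 0.2, ("arthropod-vectored vertebrate", "arthropod-specific"): 0.05}`, calling `simulate_branch("arthropod-specific", rates, 1.0, random.Random(1))` returns the end state and the jumps that occurred on that branch. Repeating the simulation many times would give the unconditional distribution of jump counts under those assumed rates.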
Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
3. Results
3.1 Novel rhabdoviruses from RNA-seq
To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST and used PCR to identify which samples these sequences came from.
This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown, and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of 12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see ‘Methods’ section for details).
To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L gene.
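ORF-based gene prediction of this kind can be sketched as a scan from the first start codon to the first in-frame stop codon. The Python below is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes the sequence is supplied in the coding (positive) sense, the function name `find_orf` is hypothetical, and the `search_from` argument stands in for the position just after the previous gene's transcription termination sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orf(seq, search_from=0):
    """Locate the ORF starting at the first ATG at or after search_from.

    Returns (start, end) with end exclusive and including the stop codon,
    or None if there is no start codon or no in-frame stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    start = seq.find("ATG", search_from)
    if start == -1:
        return None
    # Step through codons in frame with the start codon.
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return start, pos + 3
    return None
```

Applying the function repeatedly, each time setting `search_from` to the end of the previous ORF (or to the end of the next termination signal, if that is located separately), would give one candidate ORF per gene along the genome.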
Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod-associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.
Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2 sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U’s) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13-kb long and contain five core genes 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16-kb long (Walker et al. 2011, 2015).
[Figure 1 graphic not reproduced. Horizontal axis: sequence length (nucleotides), 0 to 15000. Gene key: N, P, M, G, L, X, Accessory. Genomes shown: Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus, Drosophila affinis sigmavirus.]
3.2 New rhabdoviruses from public databases
We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (Fleas, Crustacea, Lepidoptera, Diptera), one from a Cnidarian (Hydra) and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).
Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has very atypical genome organization, or has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori that we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua that contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).
3.3 Rhabdovirus phylogeny
To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA Dependent RNA Polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
We recovered all of the major clades described previously (Fig. 2) and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.
3.4 Predicted host associations of viruses
With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases, the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g., mosquitoes, sandflies, ticks, midges and sea lice) are arthropod specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insect and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis, we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see ‘Methods’ section).
This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without
[Figure 2A graphic not reproduced. Tip labels gave each virus name and its associated hosts. Host association key: arthropod-vectored plant, arthropods, vertebrate specific, arthropod-vectored vertebrate, nematode, low support or omitted. Labeled clades: novirhabdoviruses, cyto- and nucleo-rhabdoviruses, lyssaviruses; the dimarhabdovirus supergroup is shown in Fig. 2B.]
Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod associated viruses, the plant infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences
sin
gth
eG
blo
cks
alig
nm
ent
Th
e
colu
mn
so
fte
xtar
eth
evi
rus
nam
eth
eh
ost
cate
gory
use
dfo
rre
con
stru
ctio
ns
and
kno
wn
ho
sts
(fro
mle
ftto
righ
t)C
od
esfo
rth
eh
ost
cate
gori
esar
eV
Sve
rteb
rate
-sp
ecifi
cV
Va
rth
rop
od
-vec
tore
dve
rteb
rate
Aa
rth
rop
od
spec
ific
BS
biti
ng-
arth
rop
od
(am
bigu
ou
sst
ate)
Vv
erte
brat
e(a
mbi
guo
us
stat
e)A
Pp
lan
t-sa
p-f
eed
ing-
arth
rop
od
(am
bigu
ou
sst
ate)
UH
un
cert
ain
-ho
st(a
mbi
guo
us
acro
ssal
lsta
tes)
an
dN
nem
atod
eN
ames
inbo
ldan
du
nd
erli
ned
are
vi-
ruse
sd
isco
vere
din
this
stu
dy
Th
etr
eeis
roo
ted
wit
hth
eC
hu
viru
scl
ade
(ro
ot
coll
apse
d)a
sid
enti
fied
asan
ou
tgro
up
in(L
iet
al2
015)
but
we
no
teth
isgi
ves
the
sam
ere
sult
asm
idp
oin
tan
dth
em
ole
cula
rcl
ock
roo
tin
gN
od
esla
-
bell
edw
ith
qu
esti
on
mar
ks(
)re
pre
sen
tn
od
esw
ith
aLR
T(a
pp
roxi
mat
eli
keli
ho
od
rati
ote
st)
stat
isti
cal
sup
po
rtva
lues
less
than
075
Sca
leba
rsh
ow
sn
um
ber
of
amin
o-a
cid
subs
titu
tio
ns
per
site
Bay
esia
nM
CC
tree
use
dto
infe
r
ance
stra
ltra
its
issh
ow
nin
Sup
ple
men
tary
Figu
reS4
(co
nti
nu
ed)
6 | Virus Evolution 2015 Vol 1 No 1
Kern Canyon virus
Mossuril virus
Yata virus
Drosophila obscura sigma virus
Gray Lodge virus
Oak Vale virus
Perinet virus
Morreton virus
Durham virus
Grass carp rhabdovirus
Wuhan Insect virus 7
Sripur virus
Itacaiunas virus
Spodoptera exigua TSA
Siniperca chuatsi virus
Fikirini bat rhabdovirus
Koolpinyah virus
Rochambeau virus
Tench rhabdovirus
Caligus rogercresseyi 11125273 TSA
Garba virus
Nishimuro virus
Keuraliba virus
New Minto virus
Long Island tick rhabdovirus
Klamath virus
Huangpi Tick Virus 3
Wuhan House Fly Virus 1
Dolphin rhabdovirus
Culex tritaeniorhynchus rhabdovirus
Scaptodrosophila deflexa sigma virus
Spring viremia of carp virus
Tibrogargan virus
La Joya virus
Vesicular stomatitis virus Alagoas Indiana 3
Berrimah virus
Taishun Tick Virus
Bovine ephmeral fever virus
Radi virus
Conwentzia psociformis TSA
Wuhan Louse Fly Virus 10
Oita virus
y
Chandipura virus
Vesicular stomatitis virus Indiana
Yongjia Tick Virus 2
Jurona virusYug Bogdanovac virus
Vesicular stomatitis virus New Jersey
Bole Tick Virus 2
Curionopolis virus
Shayang Fly Virus 2
Ceratitis capitata sigma virus
Mount Elgon bat virus
Lepeophtheirus salmonis rhabdovirus 127
Scophthalmus maximus rhabdovirus
Wuhan Fly Virus 2
Drosophila montana sigma virus
Wuhan Louse Fly Virus 5
Hart Park virus
Arboretum virus
Ord River virus
Bas Congo virus
Santa barbara virus
Wuhan Louse Fly Virus 9
Puerto Almendras virus
Lepeophtheirus salmonis rhabdovirus 9
Landjia virus
Sena Madureira virus
Isfahan virus
Sunguru virus
Pike fry rhabdovirus
Iriri virus
Caligus rogercresseyi 11114047 TSA
Wuhan Louse Fly Virus 8
Tacheng Tick Virus 3
Coastal Plains virus
Chaco virus
Drosophila ananassae sigma virus
Bahia Grande virus
Marco virus
Almpiwar virus
Malakal virus
Wuhan Tick Virus 1
Aruac virus
Sawgrass virus
Vesicular stomatitis virus Cocal
Parry Creek virus
Drosophila melanogaster sigma virus HAP23 isolate
Niakha virus
Drosophila melanogaster sigma virus AP30 isolate
Drosophila sturtvanti sigma virus
Joinjakaka virus
Moussa virus
Nkolbisson virus
Sweetwater Branch virus
Kwatta virus
Humulus lupulus TSA
Muscina stabulans sigma virus
Drosophila immigrans sigma virus
Eel Virus European X
Vesicular stomatitis virus New Jersey Hazelhurst
Barur virus
Connecticut virus
Starry flounder rhabdovirus
Harlingen virus
Inhangapi virus
North Creek Virus
Fukuoka virus
Kamese virus
Pararge aegeria rhabdovirus
Kimberley virus
Malpais Spring virus
Mosqueiro virus
Adelaide River virus
Flanders virus
Tupaia virus
Bivens Arm virus
Wuhan Louse Fly Virus 11
Drosophila tristis sigma virus
Muir Springs virus
Manitoba virus
Carajas oncolytic virus
Drosophila affinis sigma virusDrosophila affinis or athabasca sigma virus
Beaumont virus
Maraba virus
Wongabel virus
Ngaingan virus
Xiburema virus
Bats
Mosquitoes birds and mammals including humans
Mosquitoes
Drosophilid fruit fly
Mosquitoes
Mosquitoes and swine
Mosquitoes and sandflies
Sandflies
Birds
Grass carp
Aphid or its parasitoid wasp
Sandflies
Midges
Beet army worm moth
Mandarin fish
Bats
Cattle
Mosquitoes
Tench
Sea louse
Birds
Wild boar
Rodents
Ticks
Ticks
Voles
Ticks
Muscid house fly
Dolphins and porpoise
Mosquitoes
Drosophilid fruit fly
Common carp
Midges and bovids
Mosquitoes and rodents
Mammals including humans
Cattle
Ticks
Midges mosquitoes and ruminants
Sandflies
Lacewing
Louse fly
Bats
Sandflies and mammals including humans
Mammals including humans sandflies and mosquitoes
Ticks
MosquitoesSandflies
Mammals including humans biting and non-biting diptera
Ticks
Midges and mammals
Diptera species (Muscid house fly and Calliphorid laterine fly)
Tephritid fruit fly
Bats
Sea louse
Cultured turbot
Muscid house fly
Drosophilid fruit fly
Louse fly
Mosquitoes and birds
Mosquitoes
Mosquitoes
Humans
Psychodidae drain fly
Louse fly
Mosquitoes
Sea louse
Birds
Lizards
Mosquitoes ticks sandflies mammals including humans
Domestic chickens
Northern pike
Sandflies
Sea louse
Louse fly
Ticks
Bovids
Lizards
Drosophilid fruit fly
Mosquitoes
Lizards
Lizards
Mosquitoes
Ticks
Mosquitoes and birds
Ticks
Mites mosquitoes and mammals
Mosquitoes
Drosophilid fruit fly
Sandflies
Drosophilid fruit fly
Drosophilid fruit fly
Mosquitoes and cattle
Mosquitoes
Mosquitoes and humans
Midges and cattle
Mosquitoes ticks and mammals
Hops
False stable fly
Drosophilid fruit fly
European eel
Mammals including humans biting and non-biting diptera
Ticks mosquitoes fleas and mammals
Ticks and rabbits
Starry flounder
Mosquitoes
Sandflies and rodents
Mosquitoes
Midges mosquitoes and cattle
Mosquitoes and humans
Speckled wood butterfly
Midges mosquitoes and cattle
Mosquitoes
Mosquitoes
Cattle
Mosquitoes and birds
Tree shrews
Midges and cattle
Louse fly
Drosophilid fruit fly
Mosquitoes
Mosquitoes
Sandflies
Drosophilid fruit flyDrosophilid fruit fly
Mosquitoes
Sandflies
Midges and birds
Midges cattle and macropods
Mosquitoes
V
VV
BA
A
BA
VV
BA
BA
V
V
A
BA
BA
A
VS
V
V
BA
V
BA
V
V
V
BA
BA
V
BA
A
V
BA
A
VS
VV
VV
V
BA
BA
VV
BA
BA
V
VV
VV
BA
BABA
VV
BA
VV
A
A
V
BA
V
A
A
BA
VV
BA
BA
V
A
BA
BA
BA
V
V
VV
V
VS
BA
BA
BA
BA
V
V
A
BA
V
V
BA
BA
VV
BA
VV
BA
A
BA
A
A
VV
BA
VV
VV
VV
P
A
A
V
VV
VV
VV
V
BA
VV
BA
VV
VV
A
VV
BA
BA
V
VV
V
VV
BA
A
BA
BA
BA
AA
BA
BA
VV
VV
BA
04
sigma
viruses
dimarhabdovirus supergroup
BFig 2A
Figure 2 Continued
B Longdon et al | 7
replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of ninety-nine viruses with strong posterior support (> 0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
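The hold-out check above reduces to a simple tally. The following sketch is illustrative only and is not the authors' code. It assumes each masked virus yields a (true state, reconstructed state, posterior support) triple and uses the 0.9 threshold quoted in the text. The state labels and support values are placeholders; only the counts (95, 1 and 3 out of 99) come from the text.

```python
from collections import Counter


def summarise_holdout(results, strong=0.9):
    """Tally hold-out reconstructions.

    results: iterable of (true_state, reconstructed_state, posterior) triples.
    strong:  posterior threshold above which support counts as strong.
    """
    tally = Counter()
    for true_state, reconstructed_state, posterior in results:
        if reconstructed_state != true_state:
            tally["incorrect"] += 1
        elif posterior > strong:
            tally["correct_strong"] += 1
        else:
            tally["correct_weak"] += 1
    total = sum(tally.values())
    accuracy = (tally["correct_strong"] + tally["correct_weak"]) / total
    return dict(tally), accuracy


# Placeholder triples arranged to match the counts reported in the text.
results = (
    [("VV", "VV", 0.99)] * 95   # correct, strong support
    + [("A", "A", 0.73)]        # correct, weak support (value is illustrative)
    + [("VS", "VV", 0.95)] * 3  # incorrect reconstructions
)
tally, accuracy = summarise_holdout(results)
print(tally, f"{accuracy:.2f}")  # accuracy is 96/99, i.e. about 0.97
```

Under these counts, 96 of 99 reconstructions are correct, which matches the 97% accuracy quoted in the Discussion.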
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found that there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
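To make the Fst permutation test concrete, here is a minimal sketch under stated assumptions. It uses a simplified, pooled form of the Hudson, Slatkin and Maddison (1992) estimator, Fst = 1 - Hw/Hb, where Hw and Hb are the mean pairwise differences within and between groups. The toy sequences, group labels and function names are hypothetical, and the exact estimator and settings used in the study may differ.

```python
import itertools
import random


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def fst_pooled(seqs, labels):
    """Fst = 1 - Hw/Hb with pairwise differences pooled over all groups."""
    within, between = [], []
    for i, j in itertools.combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 1 - mean_within / mean_between


def permutation_test(seqs, labels, n_perm=999, seed=1):
    """Shuffle host labels to build a null distribution for Fst."""
    rng = random.Random(seed)
    observed = fst_pooled(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst_pooled(seqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy alignment: two host groups, three sequences each.
seqs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATA",
        "TTTTTTTT", "TTTTTTTA", "TTTTTTAT"]
labels = ["arthropod"] * 3 + ["plant"] * 3
observed_fst, p_value = permutation_test(seqs, labels)
print(observed_fst, p_value)
```

With only six sequences there are few distinct label assignments, so the smallest achievable P value is coarse; real datasets with many sequences per group do not have this limitation.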
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of the 188 internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny while accounting for uncertainty about ancestral states, the tree topology and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
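As a rough illustration of what counting host-switches means, the sketch below tallies state changes along the branches of a single tree whose nodes already carry reconstructed host states. This is a deliberate simplification and not the method used here: the estimates above integrate over uncertainty in ancestral states, topology and root, whereas this sketch assumes one fixed, fully labelled tree. The tree, node names and states are hypothetical.

```python
from collections import Counter


def count_transitions(parent_of, state_of):
    """Count (parent state, child state) changes over all branches.

    parent_of: dict mapping each non-root node to its parent node.
    state_of:  dict mapping every node to its host-association code.
    """
    counts = Counter()
    for child, parent in parent_of.items():
        source, target = state_of[parent], state_of[child]
        if source != target:
            counts[(source, target)] += 1
    return counts


# Hypothetical four-tip tree with host codes as used in Figure 2.
parent_of = {"n1": "root", "n2": "root",
             "t1": "n1", "t2": "n1", "t3": "n2", "t4": "n2"}
state_of = {"root": "VV", "n1": "VV", "n2": "VV",
            "t1": "VV", "t2": "A", "t3": "VS", "t4": "VV"}

switches = count_transitions(parent_of, state_of)
for (source, target), n in sorted(switches.items()):
    print(f"{source} -> {target}: {n}")
```

Repeating such a count over many trees sampled from a posterior distribution would give a distribution of switch counts, which is closer in spirit to the modal and median estimates reported above.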
Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (for one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders and leafhoppers. The mode of transmission and biology of these viruses are yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015), although contamination could also explain this result. Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years related sigmaviruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
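The distance correlation described above can be sketched as a permutation test on pairwise distances. This is a minimal sketch under stated assumptions, not the procedure used in the study: it assumes a symmetric matrix of virus distances, a symmetric matrix of host distances and a mapping from each virus to its host, and it builds the null distribution by shuffling that mapping. All names and toy values are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def virus_host_correlation(virus_dist, host_dist, host_of):
    """Correlate virus-virus distances with distances between their hosts."""
    pairs = list(itertools.combinations(range(len(host_of)), 2))
    x = [virus_dist[i][j] for i, j in pairs]
    y = [host_dist[host_of[i]][host_of[j]] for i, j in pairs]
    return pearson(x, y)


def permutation_test(virus_dist, host_dist, host_of, n_perm=999, seed=1):
    """Shuffle which host each virus came from to build the null."""
    rng = random.Random(seed)
    observed = virus_host_correlation(virus_dist, host_dist, host_of)
    shuffled = list(host_of)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if virus_host_correlation(virus_dist, host_dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: four viruses, each from a different host, where the
# host distances mirror the virus distances exactly.
virus_dist = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]]
host_dist = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]]
host_of = [0, 1, 2, 3]
observed_r, p_value = permutation_test(virus_dist, host_dist, host_of)
print(observed_r, p_value)
```

Pairwise distances are not independent observations, which is why significance is assessed by permutation rather than by a standard test on the correlation coefficient.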
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test: P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses, in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive data: SRP057824.
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
BL and FMJ are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection) and a Junior Research Fellowship from Christ's College, Cambridge (BL). GGRM is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to DJO.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R. et al. (2015) 'Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host', Nucleic Acids Research, 43: 6191–206.
Ahne, W. et al. (2002) 'Spring Viremia of Carp (SVC)', Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) 'Endogenous Viruses: Connecting Recent and Ancient Viral Evolution', Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) 'Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative', Systematic Biology, 55: 539–52.
Aznar-Lopez, C. et al. (2013) 'Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats', Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) 'Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila', Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G. et al. (2013) 'Estimating and Interpreting FST: The Impact of Rare Variants', Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) 'Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus', Journal of Fish Biology, 7: 269–76.
Bourhy, H. et al. (2005) 'Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene', Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) 'trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses', Bioinformatics, 25: 1972–3.
Chiba, S. et al. (2011) 'Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes', PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) 'A Genealogical Approach to Quantifying Lineage Divergence', Evolution, 62: 2411–22.
Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M. et al. (1987) 'Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus', Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J. et al. (2006) 'Relaxed Phylogenetics and Dating with Confidence', PLoS Biology, 4: e88.
Drummond, A. J. et al. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., and Obbard, D. J. (2015) 'Are Arthropods at the Heart of Virus Evolution?', eLife, 4: e06837.
Edwards, C. J. et al. (2011) 'Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline', Current Biology, 21: 1251–8.
Faria, N. R. et al. (2013) 'Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints', Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P. et al. (2011) 'Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality', Molecular Biology and Evolution, 29: 381–90.
Guindon, S. et al. (2010) 'New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0', Systematic Biology, 59: 307–21.
Haenen, O., and Davidse, A. (1993) 'Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout', Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K. et al. (2015) 'Estimating the Global Burden of Endemic Canine Rabies', PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) 'Plant and Animal Rhabdovirus Host Range: A Bug's View', Trends in Microbiology, 11: 264–71.
Hommola, K. et al. (2009) 'A Permutation Test of Host-Parasite Cospeciation', Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) 'Estimation of Levels of Gene Flow from DNA Sequence Data', Genetics, 132: 583–9.
Jeyaprakash, A., and Hoy, M. A. (2009) 'First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny', Experimental and Applied Acarology, 47: 1–18.
Katoh, K., and Standley, D. M. (2013) 'MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability', Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A., and Gifford, R. J. (2010) 'Endogenous Viral Elements in Animal Genomes', PLoS Genetics, 6: e1001191.
L'Heritier, P. H., and Teissier, G. (1937) 'Une Anomalie Physiologique Hereditaire Chez la Drosophile', Comptes Rendus de l'Academie des Sciences Paris, 231: 192–4.
Le, S. Q., and Gascuel, O. (2008) 'An Improved General Amino Acid Replacement Matrix', Molecular Biology and Evolution, 25: 1307–20.
Lemey, P. et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.
Li, C. X. et al. (2015) 'Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses', eLife, 10.7554/eLife.05378.
Lipkin, W. I., and Anthony, S. J. (2015) 'Virus Hunting', Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D., and Bonning, B. C. (2011) 'Next Generation Sequencing Technologies for Insect Virus Discovery', Viruses, 3: 1849–69.
Longdon, B., and Jiggins, F. M. (2012) 'Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?', Proceedings of the Biological Sciences, 279: 3889–98.
Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal. http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf
Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) 'Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny', Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) 'The Sigma Viruses of Drosophila', in Dietzgen, R., and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B. et al. (2011a) 'Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts', PLoS Pathogens, 7: e1002260.
Longdon, B. et al. (2011b) 'Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila', Biology Letters, 7: 747–50.
Longdon, B. et al. (2011c) 'Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep', Genetics, 188: 141–50.
Longdon, B. et al. (2014) 'The Evolution and Genetics of Virus Host Shifts', PLoS Pathogens, 10: e1004395.
Longdon, B. et al. (2015) 'The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts', PLoS Pathogens, 11: e1004728.
Minin, V. N., and Suchard, M. A. (2008) 'Counting Labeled Transitions in Continuous-Time Markov Models of Evolution', Journal of Mathematical Biology, 56: 391–412.
Misof, B. et al. (2014) 'Phylogenomics Resolves the Timing and Pattern of Insect Evolution', Science, 346: 763–7.
Okland, A. L. et al. (2014) 'Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)', PLoS One, 9: e112517.
Omland, K. E. (1999) 'The Assumptions and Challenges of Ancestral State Reconstructions', Systematic Biology, 48: 604–11.
Parker, D. J. et al. (2015) 'How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?', Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.
Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.
Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.
Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.
van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.
Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.
Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.
Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.
12 | Virus Evolution 2015 Vol 1 No 1
BLAST searched against GenBank cDNA and RefSeq databases and only retained if they matched a virus-like sequence. We used two approaches to examine whether sequences were present as RNA but not DNA. First, where assemblies of whole-genome shotgun sequences were available, we used BLAST to test whether sequences were integrated into the host genome. Second, for the virus sequences in the butterfly Pararge aegeria and the medfly Ceratitis capitata, we were able to obtain infected samples to confirm whether sequences are only present in RNA by performing PCR on both genomic DNA and cDNA as described above (samples kindly provided by Casper Breuker and Melanie Gibbs, and Philip Leftwich, respectively).
2.3 Phylogenetic analysis
All available rhabdovirus-like sequences were downloaded from GenBank (accessions in Supplementary Data S2). Amino acid sequences for the L gene (encoding the RNA-dependent RNA polymerase, or RDRP) were used to infer the phylogeny (L gene sequences), as they contain conserved domains that can be aligned across this diverse group of viruses. Sequences were aligned with MAFFT (Katoh and Standley 2013) under default settings, and then poorly aligned and divergent sites were removed with either TrimAl (v1.3, strict settings, implemented on Phylemon v2.0 server; alignment) (Capella-Gutierrez, Silla-Martinez, and Gabaldon 2009) or Gblocks (v0.91b, selecting smaller final blocks, allowing gap positions and less strict flanking positions to produce a less stringent selection; alignment) (Talavera and Castresana 2007). These resulted in alignments of 1,492 and 829 amino acids, respectively.
Phylogenetic trees were inferred using maximum likelihood (ML) in PhyML (v3.0) (Guindon et al. 2010) using the LG substitution model (Le and Gascuel 2008) (preliminary analysis confirmed the results were robust to the amino acid substitution model selected), with a gamma distribution of rate variation with four categories and a sub-tree pruning and regrafting topology searching algorithm. Branch support was estimated using Approximate Likelihood-Ratio Tests (aLRT), which are reported to outperform bootstrap methods (Anisimova and Gascuel 2006). Figures were created using FigTree (v1.4) (Rambaut 2011).
2.4 Analysis of phylogenetic structure between viruses taken from different hosts and ecologies
We measured the degree of phylogenetic structure between virus sequences identified in different categories of host (arthropods, vertebrates and plants) and ecosystems (terrestrial and aquatic). We measured the degree of genetic structure between virus sequences from different groups of hosts/ecosystems using Hudson’s Fst estimator (Hudson, Slatkin, and Maddison 1992), as in Bhatia et al. (2013). We calculated Fst as 1 − (mean number of differences between sequences within populations)/(mean number of differences between sequences between populations), where a population is a host category or ecosystem. The significance of this value was tested by comparison with 1,000 replicates with host categories randomly permuted over sequences. We also measured the clustering of these categories over our phylogeny using the genealogical sorting index (GSI), a measure of the degree of exclusive ancestry of a group on a rooted genealogy (Cummings, Neel, and Shaw 2008), for each of our host association categories. The index was estimated using the genealogicalSorting R package (Bazinet, Myers, and Khatavkar 2009), with significance estimated by permutation. The tree was pruned to remove strains that could not be assigned to one of the host association categories under consideration. Finally, since arthropods are the most sampled host, we tested for evidence of genetic structure within the arthropod-associated viruses that would suggest co-divergence with their hosts or preferential host-switching between closely related hosts. We calculated the Pearson correlation coefficient of the evolutionary distances between viruses and the evolutionary distances between their hosts, and tested for significance by permutation (as in Hommola et al. [2009]). We used the patristic distances of our ML tree for the virus data and a time-tree of arthropod genera, using published estimates of divergence dates (Jeyaprakash and Hoy 2009; Misof et al. 2014).
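The Fst calculation and its permutation test can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the toy sequences and the pooling of all within-group pairs into a single mean are assumptions made for illustration only.

```python
import random
from itertools import combinations

def n_diffs(a, b):
    """Number of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def hudson_fst(seqs, labels):
    """Fst = 1 - (mean within-group differences) / (mean between-group differences).

    Simplification (assumption): all within-group pairs are pooled into one mean.
    """
    within, between = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = n_diffs(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(d)
    return 1 - (sum(within) / len(within)) / (sum(between) / len(between))

def permutation_p(seqs, labels, n_perm=1000, seed=1):
    """Permute host labels over sequences and count replicates with Fst >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if hudson_fst(seqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: two "arthropod" and two "plant" sequences.
toy_seqs = ["AAAA", "AAAT", "GGGG", "GGGC"]
toy_labels = ["arthropod", "arthropod", "plant", "plant"]
toy_fst, toy_p = permutation_p(toy_seqs, toy_labels)
```

Shuffling the label list keeps the number of sequences per host category fixed, so each replicate differs from the observed data only in which sequences carry which label.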
2.5 Reconstruction of host associations
Viruses were categorized as having one of four types of host association: arthropod-specific, vertebrate-specific, arthropod-vectored plant, or arthropod-vectored vertebrate. However, the host association of some viruses is uncertain when they have been isolated from vertebrates, biting arthropods or plant-sap-feeding arthropods. Due to limited sampling, it was not clear whether viruses isolated from vertebrates were vertebrate-specific or arthropod-vectored vertebrate viruses; whether viruses isolated from biting arthropods were arthropod-specific viruses or arthropod-vectored vertebrate viruses; or whether viruses isolated from plant-sap-feeding arthropods were arthropod-specific or arthropod-vectored plant viruses.
We classified a virus from a nematode as having its own host category. We classified three of the fish-infecting dimarhabdoviruses as vertebrate-specific based on the fact that they can be transmitted via immersion in water containing virus during experimental conditions (Bootsma, Dekinkelin, and Leberre 1975; Dorson et al. 1987; Haenen and Davidse 1993) and the widely held belief amongst the fisheries community that these viruses are not typically vectored (Ahne et al. 2002). However, there is some evidence these viruses can be transmitted by arthropods (sea lice) in experiments (Pfeilputzien 1978; Ahne et al. 2002), and so we would recommend this be interpreted with some caution. Additionally, although we classified the viruses identified in sea lice as having biting arthropod hosts, they may be crustacean-specific. The two viruses from Lepeophtheirus salmonis do not seem to infect the fish they parasitize and are present in all developmental stages of the lice, suggesting they may be transmitted vertically (Okland et al. 2014).
We simultaneously estimated both the current and ancestral host associations, and the phylogeny of the viruses, using a Bayesian analysis implemented in BEAST v1.8 (Drummond et al. 2012; Weinert et al. 2012). Because meaningful branch lengths are essential for this analysis (uncertainty about branch lengths will feed into uncertainty about the estimates), we used a subset of the sites and strains used in the maximum likelihood (ML) analysis. We retained 189 taxa: all rhabdoviruses excluding the divergent fish-infecting novirhabdovirus clade and the virus from Hydra, as well as the viruses from Lolium perenne and Conwentzia psociformis, which had a large number of missing sites. Sequences were trimmed to a conserved region of 414 amino acids where data were recorded for most of these viruses (the Gblocks alignment trimmed further by eye).
We used the host-association categories described above, which included ambiguous states. To describe amino acid evolution, we used an LG substitution model with gamma-distributed rate variation across sites (Le and Gascuel 2008) and an uncorrelated lognormal relaxed clock model of rate variation among lineages (Drummond et al. 2006). To describe the evolution of the host associations, we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2, we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
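To make the idea of an asymmetric transition rate matrix concrete, the sketch below builds a small rate matrix Q and computes the transition probabilities P(t) = exp(Qt) along a branch of length t. It is a toy illustration in plain Python, not BEAST's implementation; the state codes follow the host categories used here, but the rate values and function names are hypothetical.

```python
import math

def build_q(states, rates):
    """Build a rate matrix from a dict mapping (from_state, to_state) -> rate.

    Pairs missing from the dict get rate 0; each diagonal entry is set so that
    its row sums to zero.
    """
    n = len(states)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(states):
        for j, b in enumerate(states):
            if i != j:
                q[i][j] = rates.get((a, b), 0.0)
        q[i][i] = -math.fsum(q[i])
    return q

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """exp(Q*t) by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]  # m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical rates: A = arthropod-specific, VV = arthropod-vectored vertebrate,
# VS = vertebrate-specific. Direct A <-> VS switches are given rate 0.
states = ["A", "VV", "VS"]
rates = {("A", "VV"): 0.3, ("VV", "A"): 0.1, ("VV", "VS"): 0.05, ("VS", "VV"): 0.2}
p = mat_exp(build_q(states, rates), 1.0)
```

Because the rate from A to VV differs from the rate from VV to A, the resulting probabilities p[0][1] and p[1][0] also differ, which is what "asymmetric" means in this context.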
Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
3 Results
3.1 Novel rhabdoviruses from RNA-seq
To search for new rhabdoviruses, we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de novo assembly by BLAST and used PCR to identify which samples these sequences came from.
This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown, and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of ~12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see ‘Methods’ section for details).
To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L genes.
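The ORF designation rule given in the Figure 1 legend (first start codon after the transcription termination sequence of the previous ORF, read to the first stop codon) can be sketched as follows. This is a simplified illustration, not the procedure used to annotate the genomes: it assumes a coding-sense sequence written in the DNA alphabet (so the 7 U's appear as seven T's), it starts the first ORF at the first ATG in the sequence, and the function names and toy sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
TERMINATION = "T" * 7  # the 7 U's of the termination signal, in the DNA alphabet

def next_orf(seq, search_from=0):
    """Return (start, end) for the first ATG at or after search_from, read in
    frame to the first stop codon (end is exclusive), or None if no ORF is found."""
    start = seq.find("ATG", search_from)
    if start == -1:
        return None
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOPS:
            return start, pos + 3
    return None

def predict_orfs(seq):
    """Chain ORFs: after each ORF, skip to the next termination signal and
    start the following ORF at the first ATG downstream of it."""
    orfs = []
    search_from = 0
    while True:
        orf = next_orf(seq, search_from)
        if orf is None:
            break
        orfs.append(orf)
        term = seq.find(TERMINATION, orf[1])
        if term == -1:
            break
        search_from = term + len(TERMINATION)
    return orfs

# Hypothetical toy sequence with two short ORFs separated by a termination signal.
toy = "CC" + "ATGAAATAA" + "GG" + "TTTTTTT" + "C" + "ATGCCCGGGTGA" + "AA"
toy_orfs = predict_orfs(toy)
```

Real annotation would also need to handle the genomic (negative) strand, minimum ORF lengths and imperfect termination signals, none of which are attempted here.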
Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod-associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.
Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2 sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as the first start codon following the transcription termination sequence (7 U’s) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13 kb long and contain five core genes 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16 kb long (Walker et al. 2011, 2015).
Viruses shown: Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus, Drosophila affinis sigmavirus. Horizontal axis: sequence length (nucleotides), 0–15,000. Gene key: N, P, M, G, L, X, Accessory.
3.2 New rhabdoviruses from public databases
We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (fleas, Crustacea, Lepidoptera, Diptera), one from a cnidarian (Hydra) and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1,000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).
Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and that the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be endogenous viral elements (EVEs) inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori, which we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua, which contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).
3.3 Rhabdovirus phylogeny
To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
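One simple way to count nodes shared between two rooted trees on the same set of tips is to compare the sets of tips descending from each internal node. The sketch below does this for trees written as nested tuples. It is an assumption that this matches how the node comparison above was made; the toy trees and function names are hypothetical.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) for every internal
    node of a rooted tree given as nested tuples with string tips."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found

def shared_nodes(tree_a, tree_b):
    """Clades present in both trees."""
    return clades(tree_a) & clades(tree_b)

# Hypothetical five-tip trees that disagree only on the placement of B and C.
ml_tree = ((("A", "B"), "C"), ("D", "E"))
bayes_tree = ((("A", "C"), "B"), ("D", "E"))
shared = shared_nodes(ml_tree, bayes_tree)
```

In this toy case, three of the four internal nodes in each tree are shared (the root clade, {A, B, C} and {D, E}); only the innermost pairing differs.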
We recovered all of the major clades described previously (Fig. 2) and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.
3.4 Predicted host associations of viruses
With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases, the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insects and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis, we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see ‘Methods’ section).
This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
To test the accuracy of our predictions of current host asso-ciations we randomly selected a set of viruses with known asso-ciations re-assigned their host association as ambiguousbetween all possible states (a greater level of uncertainty thanwe generally attributed to viruses in our data) and re-ran ouranalysis We repeated this ten times for nine sets of ten virusesand one set of nine viruses (randomly sampling without
B Longdon et al | 5
Hyd
ra m
agn
ipap
illat
a T
SA
Nor
ther
n ce
real
mos
aic
viru
sP
lan
oco
ccu
s ci
tri T
SA
Bok
eloh
bat
lyss
aviru
s
Tac
heng
Tic
k V
irus
7
Ikom
a ly
ssav
irus
Wuh
an In
sect
viru
s 6
Mu
sca
do
mes
tica
TS
A
Wuh
an F
ly V
irus
3
Shu
anga
o In
sect
Viru
s 6
Per
sim
mon
viru
s A
Hira
me
rhab
dovi
rus
Shu
anga
o B
edbu
g V
irus
2
Wuh
an In
sect
viru
s 4
Jing
shan
Fly
Viru
s 2
Ker
ria
lacc
a T
SA
Wuh
an M
osqu
ito V
irus
9
Shi
mon
i bat
viru
s
Aus
tral
ian
bat l
yssa
viru
s b
Mok
ola
viru
s is
olat
e 86
100C
AM
Irku
t viru
s
Lo
tus
corn
icu
latu
s T
SA
Lago
s ba
t viru
s i8
619N
GA
Aus
tral
ian
bat l
yssa
viru
s a
Spo
dopt
era
frug
iper
da r
habd
oviru
s
Lettu
ce n
ecro
tic y
ello
ws
viru
s
Fra
nkl
inie
lla o
ccid
enta
lis T
SA
Lo
lium
per
enn
e T
SA
Mai
ze Ir
ania
n m
osai
c vi
rus
Son
chus
yel
low
net
Dro
sop
hila
stu
rtva
nti
rh
abd
ovi
rus
Mai
ze m
osai
c vi
rus
Mok
ola
viru
s 86
101R
CA
Eur
opea
n ba
t lys
savi
rus
RV
9 1
Oro
psy
lla s
ilan
tiew
i TS
A
Orc
hid
fleck
viru
s
Wes
t Cau
casi
an b
at v
irus
Lettu
ce y
ello
w m
ottle
viru
s
Taa
stru
p vi
rus
Ag
ave
teq
uila
na
TS
A
Wuh
an In
sect
viru
s 5
Far
min
gton
viru
s
Wuh
an H
ouse
Fly
Viru
s 2
Rab
ies
viru
s
Ric
e ye
llow
stu
nt v
irus
Tar
o ve
in c
hlor
osis
viru
s
Vira
l hem
orrh
agic
sep
ticem
ia v
irus
Lyss
aviru
s O
zern
oe
Fox
feca
l rha
bdov
irus
Duv
enha
ge v
irus
8613
2SA
Mai
ze fi
ne s
trea
k vi
rus
Infe
ctio
us h
aem
atop
oiet
ic n
ecro
sis
viru
s
Med
icag
o sa
tiva
TS
A
San
xia
Wat
er S
trid
er V
irus
5
Tri
od
ia s
ylvi
na
TS
A
Sha
yang
Fly
Viru
s 3
Lago
s ba
t viru
s K
E13
1
Eur
opea
n ba
t lys
savi
rus
1 89
18F
RA
Dro
sop
hila
su
bo
bsc
ura
rh
abd
ovi
rus
Khu
jand
lyss
aviru
sE
urop
ean
bat l
yssa
viru
s 2
9018
HO
L
Egg
plan
t mot
tled
dwar
f viru
s
Soy
bean
cys
t nem
atod
e vi
rus
Sna
kehe
ad r
habd
oviru
s
Wuh
an A
nt V
irus
Dro
sop
hila
bu
scki
i rh
abd
ovi
rus
Ara
van
viru
s
Hyd
ra (
Cni
daria
n)
Cer
eals
and
leaf
hopp
ers
Citr
us m
ealy
bug
Bat
s
Tic
ks
Afr
ican
Civ
ets
Aph
id o
r its
par
asito
id w
asp
Mus
cid
hous
e fly
Dip
tera
spe
cies
(C
allip
horid
and
Sac
opha
gid
flies
)
Dip
tera
and
Lep
idop
tera
Per
sim
mon
tree
Fis
h
Bed
bug
Aph
id o
r its
par
asito
id w
asp
Sar
coph
agid
fles
h fly
Sca
le in
sect
Mos
quito
es
Bat
s
Bat
s an
d hu
man
s
Mam
mal
s sp
ecie
s
Bat
s
Flo
wer
ing
plan
t
Mam
mal
s sp
ecie
s
Bat
s an
d hu
man
s
Fal
l arm
y w
orm
mot
h
Lettu
ce o
ther
dic
ot p
lant
s an
d ap
hids
Wes
tern
flow
er th
rip
Rye
gra
ss
Cer
eals
and
pla
ntho
pper
Flo
wer
ing
plan
t and
aph
id
Dro
soph
ilid
frui
t fly
Mai
ze a
nd p
lant
hopp
ers
Mam
mal
s sp
ecie
s
Bat
s
Fle
a
Flo
wer
ing
plan
t
Bat
s
Lettu
ce a
nd a
phid
Leaf
hopp
er
Flo
wer
ing
plan
t
Aph
id o
r its
par
asito
id w
asp
Bird
spe
cies
Mus
cid
hous
e fly
Mam
mal
s sp
ecie
s in
clud
ing
hum
ans
Ric
e an
d le
afho
pper
s
Tar
o
Fis
h
Hum
ans
Fox
(fe
cal s
ampl
e)
Hum
ans
and
bats
Mai
ze a
nd le
afho
pper
Fis
h
Alfa
lfa
Wat
er S
trid
er
Ora
nge
swift
mot
h
Dip
tera
spe
cies
(C
allip
horid
and
Mus
cid
flies
)
Bat
s
Mam
mal
s sp
ecie
s
Dro
soph
ilid
frui
t fly
Bat
sH
uman
s an
d ba
ts
Egg
plan
t
Nem
atod
e
Fis
h
Japa
nese
car
pent
er a
nt
Dro
soph
ilid
frui
t fly
Bat
s
PAP
VS
BA
VS
AP
A AA PBA
AP
A AP
BA
VS
VS
VS
VS
P VS
VS
A PA P PPA P VS
VS
BA
P VS
PAP
PAP
VA VS
P P VS
UH
VS
PPAAA VS
VS
A VS
VS
PA A VS
N
Ass
ocia
ted
host
s A
rthro
pod-
vect
ored
pla
nt
Arth
ropo
ds
Ver
tebr
ate
spec
ific
Fig
2B0
4
Arth
ropo
d-ve
ctor
ed v
erte
brat
e Lo
w s
uppo
rt or
om
itted
N
emat
ode
lyssaviruses
cyto- and nucleo- rhabdoviruses
novi
rhab
dovi
ruse
s
A
Figu
re2
ML
ph
ylo
gen
yo
fth
eR
habd
ovir
idae
(A
)sh
ow
sth
eba
sal
fish
-in
fect
ing
no
virh
abd
ovi
ruse
san
un
assi
gned
gro
up
of
arth
rop
od
asso
ciat
edvi
ruse
sth
ep
lan
tin
fect
ing
cyto
-an
dn
ucl
eo-r
hab
do
viru
ses
asw
ella
sth
eve
rteb
rate
spec
ific
lyss
avir
use
s(B
)sh
ow
sth
ed
imar
hab
do
viru
ssu
per
gro
up
wh
ich
isp
red
om
inan
tly
com
po
sed
of
arth
rop
od
-vec
tore
dve
rteb
rate
viru
ses
alo
ng
wit
hth
ear
thro
po
d-s
pec
ific
sigm
avi
rus
clad
eB
ran
ches
are
colo
red
base
do
nth
e
Bay
esia
nh
ost
asso
ciat
ion
reco
nst
ruct
ion
anal
ysis
Bla
ckre
pre
sen
tsta
xao
mit
ted
fro
mh
ost
-sta
tere
con
stru
ctio
no
ras
soci
atio
ns
wit
hlt
095
sup
po
rtT
he
tree
was
infe
rred
fro
mL
gen
ese
qu
ence
su
sin
gth
eG
blo
cks
alig
nm
ent
Th
e
colu
mn
so
fte
xtar
eth
evi
rus
nam
eth
eh
ost
cate
gory
use
dfo
rre
con
stru
ctio
ns
and
kno
wn
ho
sts
(fro
mle
ftto
righ
t)C
od
esfo
rth
eh
ost
cate
gori
esar
eV
Sve
rteb
rate
-sp
ecifi
cV
Va
rth
rop
od
-vec
tore
dve
rteb
rate
Aa
rth
rop
od
spec
ific
BS
biti
ng-
arth
rop
od
(am
bigu
ou
sst
ate)
Vv
erte
brat
e(a
mbi
guo
us
stat
e)A
Pp
lan
t-sa
p-f
eed
ing-
arth
rop
od
(am
bigu
ou
sst
ate)
UH
un
cert
ain
-ho
st(a
mbi
guo
us
acro
ssal
lsta
tes)
an
dN
nem
atod
eN
ames
inbo
ldan
du
nd
erli
ned
are
vi-
ruse
sd
isco
vere
din
this
stu
dy
Th
etr
eeis
roo
ted
wit
hth
eC
hu
viru
scl
ade
(ro
ot
coll
apse
d)a
sid
enti
fied
asan
ou
tgro
up
in(L
iet
al2
015)
but
we
no
teth
isgi
ves
the
sam
ere
sult
asm
idp
oin
tan
dth
em
ole
cula
rcl
ock
roo
tin
gN
od
esla
-
bell
edw
ith
qu
esti
on
mar
ks(
)re
pre
sen
tn
od
esw
ith
aLR
T(a
pp
roxi
mat
eli
keli
ho
od
rati
ote
st)
stat
isti
cal
sup
po
rtva
lues
less
than
075
Sca
leba
rsh
ow
sn
um
ber
of
amin
o-a
cid
subs
titu
tio
ns
per
site
Bay
esia
nM
CC
tree
use
dto
infe
r
ance
stra
ltra
its
issh
ow
nin
Sup
ple
men
tary
Figu
reS4
(co
nti
nu
ed)
6 | Virus Evolution 2015 Vol 1 No 1
Kern Canyon virus
Mossuril virus
Yata virus
Drosophila obscura sigma virus
Gray Lodge virus
Oak Vale virus
Perinet virus
Morreton virus
Durham virus
Grass carp rhabdovirus
Wuhan Insect virus 7
Sripur virus
Itacaiunas virus
Spodoptera exigua TSA
Siniperca chuatsi virus
Fikirini bat rhabdovirus
Koolpinyah virus
Rochambeau virus
Tench rhabdovirus
Caligus rogercresseyi 11125273 TSA
Garba virus
Nishimuro virus
Keuraliba virus
New Minto virus
Long Island tick rhabdovirus
Klamath virus
Huangpi Tick Virus 3
Wuhan House Fly Virus 1
Dolphin rhabdovirus
Culex tritaeniorhynchus rhabdovirus
Scaptodrosophila deflexa sigma virus
Spring viremia of carp virus
Tibrogargan virus
La Joya virus
Vesicular stomatitis virus Alagoas Indiana 3
Berrimah virus
Taishun Tick Virus
Bovine ephmeral fever virus
Radi virus
Conwentzia psociformis TSA
Wuhan Louse Fly Virus 10
Oita virus
y
Chandipura virus
Vesicular stomatitis virus Indiana
Yongjia Tick Virus 2
Jurona virusYug Bogdanovac virus
Vesicular stomatitis virus New Jersey
Bole Tick Virus 2
Curionopolis virus
Shayang Fly Virus 2
Ceratitis capitata sigma virus
Mount Elgon bat virus
Lepeophtheirus salmonis rhabdovirus 127
Scophthalmus maximus rhabdovirus
Wuhan Fly Virus 2
Drosophila montana sigma virus
Wuhan Louse Fly Virus 5
Hart Park virus
Arboretum virus
Ord River virus
Bas Congo virus
Santa barbara virus
Wuhan Louse Fly Virus 9
Puerto Almendras virus
Lepeophtheirus salmonis rhabdovirus 9
Landjia virus
Sena Madureira virus
Isfahan virus
Sunguru virus
Pike fry rhabdovirus
Iriri virus
Caligus rogercresseyi 11114047 TSA
Wuhan Louse Fly Virus 8
Tacheng Tick Virus 3
Coastal Plains virus
Chaco virus
Drosophila ananassae sigma virus
Bahia Grande virus
Marco virus
Almpiwar virus
Malakal virus
Wuhan Tick Virus 1
Aruac virus
Sawgrass virus
Vesicular stomatitis virus Cocal
Parry Creek virus
Drosophila melanogaster sigma virus HAP23 isolate
Niakha virus
Drosophila melanogaster sigma virus AP30 isolate
Drosophila sturtvanti sigma virus
Joinjakaka virus
Moussa virus
Nkolbisson virus
Sweetwater Branch virus
Kwatta virus
Humulus lupulus TSA
Muscina stabulans sigma virus
Drosophila immigrans sigma virus
Eel Virus European X
Vesicular stomatitis virus New Jersey Hazelhurst
Barur virus
Connecticut virus
Starry flounder rhabdovirus
Harlingen virus
Inhangapi virus
North Creek Virus
Fukuoka virus
Kamese virus
Pararge aegeria rhabdovirus
Kimberley virus
Malpais Spring virus
Mosqueiro virus
Adelaide River virus
Flanders virus
Tupaia virus
Bivens Arm virus
Wuhan Louse Fly Virus 11
Drosophila tristis sigma virus
Muir Springs virus
Manitoba virus
Carajas oncolytic virus
[Figure 2, continued (panel B, continuing from Fig. 2A). Tip labels give each virus name (for example Drosophila affinis sigma virus, Beaumont virus, Maraba virus, Wongabel virus, Ngaingan virus and Xiburema virus), the host or hosts it was identified from (for example bats, mosquitoes, sandflies, ticks, midges, Drosophilid fruit flies, louse flies, sea lice, fish, birds, cattle and other mammals including humans) and a host-association code. Labelled clades: sigma viruses; dimarhabdovirus supergroup. Scale bar: 0.4.]
B Longdon et al | 7
replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and for one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found that there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
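As an illustration of how an Fst permutation test of this kind can work, the following is a minimal sketch that assumes an estimator of the form Fst = 1 - Hw/Hb (mean pairwise difference within groups over mean pairwise difference between groups), in the spirit of Hudson, Slatkin and Maddison (1992). The function names, the pooling of within-group pairs and the toy sequences are assumptions made for the example and are not taken from the analysis reported here.

```python
import itertools
import random


def pairwise_diff(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fst(group1, group2):
    """Fst = 1 - Hw / Hb, pooling all within-group pairs (a simplification)."""
    within = [pairwise_diff(a, b)
              for group in (group1, group2)
              for a, b in itertools.combinations(group, 2)]
    between = [pairwise_diff(a, b) for a in group1 for b in group2]
    hw = sum(within) / len(within)
    hb = sum(between) / len(between)
    if hb == 0:
        return 0.0  # identical sequences everywhere: no differentiation
    return 1.0 - hw / hb


def fst_permutation_test(group1, group2, n_perm=999, seed=1):
    """Shuffle host-group labels to get a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = fst(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy alignment: two host groups of three short sequences each.
arthropod_toy = ["AAAA", "AAAT", "AATA"]
vertebrate_toy = ["TTTT", "TTTA", "TTAT"]
observed_fst, p_value = fst_permutation_test(arthropod_toy, vertebrate_toy)
print(observed_fst, p_value)
```

With so few toy sequences the permutation P value cannot become small; the point is only the logic of comparing the observed statistic with its distribution under shuffled host labels.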
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of the 188 internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders and leafhoppers. The mode of transmission and biology of these viruses are yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years related sigmaviruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation, Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
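The logic of testing a correlation between two distance matrices by permutation can be sketched as follows. This is a generic Mantel-style stand-in, not the specific procedure used for the analysis above; the function names and the toy distance matrices are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def upper_triangle(matrix, order):
    """Upper-triangle distances after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def distance_correlation_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate virus and host distances; permute host labels for a P value."""
    rng = random.Random(seed)
    identity = list(range(len(virus_dist)))
    virus_vec = upper_triangle(virus_dist, identity)
    observed = pearson(virus_vec, upper_triangle(host_dist, identity))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(virus_vec, upper_triangle(host_dist, perm)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical symmetric distance matrices for four virus-host pairs.
host_toy = [[0, 1, 4, 5],
            [1, 0, 3, 6],
            [4, 3, 0, 2],
            [5, 6, 2, 0]]
virus_toy = [[0, 2, 7, 9],
             [2, 0, 5, 11],
             [7, 5, 0, 3],
             [9, 11, 3, 0]]
r, p = distance_correlation_test(virus_toy, host_toy)
print(r, p)
```

Permuting the host labels, rather than the individual distances, keeps the dependence among entries of the same matrix intact, which is why a simple test on the raw pairs would be inappropriate here.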
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses in
8 | Virus Evolution 2015 Vol 1 No 1
the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling: for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
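The 97% figure can be checked against the counts given in the Results: of ninety-nine masked viruses, ninety-five were recovered with strong support, one with weak support, and three were reconstructed incorrectly. The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
# Counts as reported in the Results for the masked-host validation.
n_masked = 99
n_correct_strong = 95   # correct, posterior support > 0.9
n_correct_weak = 1      # correct, but with weak support
n_incorrect = 3         # false host association returned

n_correct = n_correct_strong + n_correct_weak
assert n_correct + n_incorrect == n_masked

accuracy = n_correct / n_masked
print(f"{accuracy:.1%}")
```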
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive data: SRP057824
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
BL and FMJ are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ's College, Cambridge (BL). GGRM is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to DJO.
Supplementary data
Supplementary data is available at Virus Evolution online
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R. et al. (2015) 'Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host', Nucleic Acids Research, 43: 6191–206.
Ahne, W. et al. (2002) 'Spring Viremia of Carp (SVC)', Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) 'Endogenous Viruses: Connecting Recent and Ancient Viral Evolution', Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) 'Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative', Systematic Biology, 55: 539–52.
Aznar-Lopez, C. et al. (2013) 'Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats', Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) 'Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila', Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.9.1.
Bhatia, G. et al. (2013) 'Estimating and Interpreting FST: The Impact of Rare Variants', Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) 'Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus', Journal of Fish Biology, 7: 269–76.
Bourhy, H. et al. (2005) 'Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene', Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) 'trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses', Bioinformatics, 25: 1972–3.
Chiba, S. et al. (2011) 'Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes', Plos Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) 'A Genealogical Approach to Quantifying Lineage Divergence', Evolution, 62: 2411–22.
Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M. et al. (1987) 'Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus', Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J. et al. (2006) 'Relaxed Phylogenetics and Dating with Confidence', PLoS Biology, 4: e88.
Drummond, A. J. et al. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., and Obbard, D. J. (2015) 'Are Arthropods at the Heart of Virus Evolution?', eLife, 4: e06837.
Edwards, C. J. et al. (2011) 'Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline', Current Biology, 21: 1251–8.
Faria, N. R. et al. (2013) 'Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints', Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P. et al. (2011) 'Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality', Molecular Biology and Evolution, 29: 381–90.
Guindon, S. et al. (2010) 'New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0', Systematic Biology, 59: 307–21.
Haenen, O., and Davidse, A. (1993) 'Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout', Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K. et al. (2015) 'Estimating the Global Burden of Endemic Canine Rabies', PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) 'Plant and Animal Rhabdovirus Host Range: A Bug's View', Trends in Microbiology, 11: 264–71.
Hommola, K. et al. (2009) 'A Permutation Test of Host-Parasite Cospeciation', Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) 'Estimation of Levels of Gene Flow from DNA Sequence Data', Genetics, 132: 583–9.
Jeyaprakash, A., and Hoy, M. A. (2009) 'First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny', Experimental and Applied Acarology, 47: 1–18.
Katoh, K., and Standley, D. M. (2013) 'MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability', Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A., and Gifford, R. J. (2010) 'Endogenous Viral Elements in Animal Genomes', Plos Genetics, 6: e1001191.
L'Heritier, P. H., and Teissier, G. (1937) 'Une Anomalie Physiologique Hereditaire Chez la Drosophile', Comptes Rendus de l'Academie des Sciences Paris, 231: 192–4.
Le, S. Q., and Gascuel, O. (2008) 'An Improved General Amino Acid Replacement Matrix', Molecular Biology and Evolution, 25: 1307–20.
Lemey, P. et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.
Li, C. X. et al. (2015) 'Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses', eLife, 10.7554/eLife.05378.
Lipkin, W. I., and Anthony, S. J. (2015) 'Virus Hunting', Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D., and Bonning, B. C. (2011) 'Next Generation Sequencing Technologies for Insect Virus Discovery', Viruses, 3: 1849–69.
Longdon, B., and Jiggins, F. M. (2012) 'Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?', Proceedings of the Biological Sciences, 279: 3889–98.
Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.
Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) 'Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny', Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) 'The Sigma Viruses of Drosophila', in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B. et al. (2011a) 'Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts', PLoS Pathogens, 7: e1002260.
Longdon, B. et al. (2011b) 'Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila', Biology Letters, 7: 747–50.
Longdon, B. et al. (2011c) 'Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep', Genetics, 188: 141–50.
Longdon, B. et al. (2014) 'The Evolution and Genetics of Virus Host Shifts', PLoS Pathogens, 10: e1004395.
Longdon, B. et al. (2015) 'The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts', PLoS Pathogens, 11: e1004728.
Minin, V. N., and Suchard, M. A. (2008) 'Counting Labeled Transitions in Continuous-Time Markov Models of Evolution', Journal of Mathematical Biology, 56: 391–412.
Misof, B. et al. (2014) 'Phylogenomics Resolves the Timing and Pattern of Insect Evolution', Science, 346: 763–7.
Okland, A. L. et al. (2014) 'Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)', Plos One, 9: e112517.
Omland, K. E. (1999) 'The Assumptions and Challenges of Ancestral State Reconstructions', Systematic Biology, 48: 604–11.
Parker, D. J. et al. (2015) 'How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?', Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) 'Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)', Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) 'FigTree' (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) 'Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses', Science, 207: 989–91.
Shroyer, D. A., and Rosen, L. (1983) 'Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus', Genetics, 104: 649–59.
Streicker, D. G. et al. (2010) 'Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats', Science, 329: 676–9.
Talavera, G., and Castresana, J. (2007) 'Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments', Systematic Biology, 56: 564–77.
van Mierlo, J. T. et al. (2014) 'Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi', PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) 'Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants with Large Complex Genomes', in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J. et al. (2011) 'Rhabdovirus Accessory Genes', Virus Research, 162: 110–25.
Walker, P. J. et al. (2015) 'Evolution of Genome Size and Complexity in the Rhabdoviridae', PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M., and Snyder, M. (2009) 'RNA-Seq: A Revolutionary Tool for Transcriptomics', Nature Reviews Genetics, 10: 57–63.
Webster, C. L. et al. (2015) 'The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster', PLoS Biology, 13: e1002210.
Weinert, L. A. et al. (2012) 'Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication', Biology Letters, 8: 829–32.
evolution of the host associations we used a strict clock model and a discrete asymmetric transition rate matrix (allowing transitions to and from a host association to take place at different rates), as previously used to model migrations between discrete geographic locations (Edwards et al. 2011) and host switches (Weinert et al. 2012; Faria et al. 2013). We also examined how often these viruses jumped between different classes of hosts using reconstructed counts of biologically feasible changes of host association and their HPD confidence intervals (CIs) using Markov Jumps (Minin and Suchard 2008). These included switches between arthropod-specific and both arthropod-vectored vertebrate and arthropod-vectored plant states, and between vertebrate-specific and arthropod-vectored vertebrate states. We used a constant population size coalescent prior for the relative node ages (using a birth-death prior gave equivalent results) and the BEAUti v1.8 default priors for all other parameters (Drummond et al. 2012) (BEAUti xml available as Supplementary Material). In Figure 2 we have transferred the ancestral state reconstruction from the BEAST tree to the ML tree.
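To make the idea of an asymmetric transition rate matrix restricted to biologically feasible changes more concrete, the sketch below simulates host-association changes along a single lineage and counts each type of jump. It is a plain forward simulation with made-up rates, not the Markov Jumps estimator of Minin and Suchard (2008), which computes expected counts conditional on the observed tip states; all names and rate values are hypothetical.

```python
import random

STATES = [
    "arthropod-specific",
    "arthropod-vectored vertebrate",
    "arthropod-vectored plant",
    "vertebrate-specific",
]

# Hypothetical asymmetric rates: only the feasible changes named in the text
# are present, and each direction has its own rate.
RATES = {
    ("arthropod-vectored vertebrate", "arthropod-specific"): 0.30,
    ("arthropod-specific", "arthropod-vectored vertebrate"): 0.05,
    ("arthropod-vectored plant", "arthropod-specific"): 0.10,
    ("arthropod-specific", "arthropod-vectored plant"): 0.05,
    ("arthropod-vectored vertebrate", "vertebrate-specific"): 0.40,
    ("vertebrate-specific", "arthropod-vectored vertebrate"): 0.02,
}


def simulate_jumps(start, duration, rates, rng):
    """Simulate a continuous-time Markov chain and count each type of jump."""
    counts = {pair: 0 for pair in rates}
    state, elapsed = start, 0.0
    while True:
        exits = [(dst, r) for (src, dst), r in rates.items() if src == state and r > 0]
        total = sum(r for _, r in exits)
        if total == 0:
            break  # absorbing state: no feasible change from here
        elapsed += rng.expovariate(total)  # waiting time to the next change
        if elapsed > duration:
            break
        pick = rng.uniform(0, total)  # choose a destination in proportion to rate
        for dst, r in exits:
            pick -= r
            if pick <= 0:
                break
        counts[(state, dst)] += 1
        state = dst
    return state, counts


rng = random.Random(42)
final_state, jump_counts = simulate_jumps("arthropod-vectored vertebrate", 10.0, RATES, rng)
print(final_state)
for pair, count in jump_counts.items():
    print(pair, count)
```

Direct changes between, for example, vertebrate-specific and arthropod-specific states have no entry in the rate table, so the simulation can only reach one from the other via an arthropod-vectored state.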
Convergence was assessed using Tracer v1.6 (Rambaut and Drummond 2007), and a burn-in of 30% was removed prior to the construction of a consensus tree, which included a description of ancestral host associations in the output file. High effective sample sizes were achieved for all parameters (>200). Previous simulations in the context of biogeographical inference have shown that the approach is robust to sampling bias (Edwards et al. 2011). However, to confirm this, following Lemey et al. (2014), we tested whether sample size predicts rate to or from a host association.
3 Results
3.1 Novel rhabdoviruses from RNA-seq
To search for new rhabdoviruses we collected a variety of different species of flies, screened them for CO2 sensitivity, which is a common symptom of infection, and sequenced total RNA of these flies by RNA-seq. We identified rhabdovirus-like sequences from a de-novo assembly by BLAST, and used PCR to identify which samples these sequences came from.
This approach resulted in eleven rhabdovirus-like sequences from nine (possibly ten) species of fly. Seven of these viruses were previously unknown and four had been reported previously from shorter sequences (Supplementary Tables S1 and S2). The novel viruses were highly divergent from known viruses. Sigma viruses known from other species of Drosophila typically have genomes of ~12.5 kb (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b), and six of our sequences were approximately this size, suggesting they are near-complete genomes. None of the viruses discovered in our RNA-seq data were integrated into the host genome (see 'Methods' section for details).
To investigate the putative gene content of the viruses, we predicted genes based on open reading frames (ORFs). For the viruses with apparently complete genomes (Fig. 1), we found that those from Drosophila ananassae, D. affinis, D. immigrans and Drosophila sturtevanti contained ORFs corresponding to the five core genes found across all rhabdoviruses, with an additional ORF between the P and M genes. This is the location of the X gene found in sigma viruses, and in three of the four novel viruses it showed BLAST sequence similarity to the X gene of sigma viruses. The virus from Drosophila busckii did not contain an additional ORF between the P and M genes, but instead contained an ORF between the G and L genes.
Using the phylogeny described below, we have classified our newly discovered viruses as either sigma viruses, rhabdoviruses or other viruses, and named them after the host species they were identified from (Fig. 1) (Longdon and Walker 2011). We also found one other novel mononegavirales-like sequence from Drosophila unispina that groups with a recently discovered clade of arthropod-associated viruses (Nyamivirus clade [Li et al. 2015]; see Supplementary Table S5 and the full phylogeny), as well as five other RNA viruses from various families (data not shown), confirming our approach can detect a wide range of divergent viruses.
Figure 1. Genome organization of newly discovered viruses from metagenomic RNA sequencing of CO2 sensitive flies. Putative genes are shown in color; non-coding regions are shown in black. ORFs were designated as running from the first start codon following the transcription termination sequence (7 U's) of the previous ORF to the first stop codon. Dotted lines represent parts of the genome not sequenced. These viruses were either from our own RNA-seq data or were first found in public databases, and key features were verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically ~11–13 kb long and contain five core genes, 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to ~16 kb long (Walker et al. 2011, 2015).
[Figure 1 panel: genomes drawn against sequence length (nucleotides; axis 0 to 15,000) for Scaptodrosophila deflexa sigmavirus, Drosophila sturtevanti rhabdovirus, Drosophila tristis sigmavirus, Drosophila montana sigmavirus, Drosophila subobscura rhabdovirus, Drosophila algonquin sigmavirus, Pararge aegeria rhabdovirus, Ceratitis capitata sigmavirus, Drosophila busckii rhabdovirus, Drosophila sturtevanti sigmavirus, Drosophila immigrans sigmavirus, Drosophila ananassae sigmavirus and Drosophila affinis sigmavirus. Gene key: N, P, M, G, L, X, Accessory.]
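The ORF designation rule in the Figure 1 legend (first start codon after the previous ORF's termination signal, read through to the first stop codon) can be sketched as below. This is a simplified illustration, assuming the sequence is supplied in the coding (mRNA) sense with T standing in for U, that ATG is the only start codon, and that the termination signal is an exact run of seven T's; the function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, terminator="TTTTTTT"):
    """Return (start, end) coordinates of successive ORFs, end exclusive."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    search_from = 0
    while True:
        start = seq.find("ATG", search_from)  # first start codon downstream
        if start == -1:
            break
        end = None
        for i in range(start, len(seq) - 2, 3):  # read in frame to the first stop
            if seq[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        if end is None:
            break  # no in-frame stop codon: treat the ORF as incomplete
        orfs.append((start, end))
        term = seq.find(terminator, end)  # termination signal of this gene
        if term == -1:
            break
        search_from = term + len(terminator)
    return orfs


# Toy sequence with two short ORFs separated by a run of seven T's.
toy = ("CC" + "ATG" + "AAA" + "CCC" + "TAA" + "GG" + "TTTTTTT"
       + "GG" + "ATG" + "GGG" + "TTT" + "TGA" + "CC")
print(find_orfs(toy))
```

Real genomes would need more care than this, for example tolerance of imperfect termination signals and a minimum ORF length, which the sketch leaves out.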
3.2 New rhabdoviruses from public databases
We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (fleas, Crustacea, Lepidoptera, Diptera), one from a cnidarian (Hydra), and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).
Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria, and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and that the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting that it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be endogenous viral elements (EVEs) inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori, which we also found in the silkworm genome, and for the L gene sequence from Spodoptera exigua, which contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).
3.3 Rhabdovirus phylogeny
To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
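One simple way to count nodes shared by two trees, as in the ML versus Bayesian comparison above, is to describe each internal node by the set of tips beneath it and intersect the two collections. The sketch below assumes rooted trees written as nested Python tuples of tip names; the representation and the function names are hypothetical, and the text does not say how the comparison was actually computed.

```python
def clades(tree):
    """Return (tips, clade_set) for a tree given as nested tuples of tip names.
    Every internal node contributes the frozenset of tips found beneath it."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips = frozenset()
    found = set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)
    return tips, found


def shared_nodes(tree_a, tree_b):
    """Return (number of clades present in both trees, clades in tree_a)."""
    _, clades_a = clades(tree_a)
    _, clades_b = clades(tree_b)
    return len(clades_a & clades_b), len(clades_a)
```

For two five-tip trees that disagree only on the placement of one tip, `shared_nodes` reports three shared nodes out of four.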
We recovered all of the major clades described previously (Fig. 2) and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.
3.4 Predicted host associations of viruses
With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases, the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges, and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insects, and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis, we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see 'Methods' section).
This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. [Tree figure not reproduced. Clade labels: novirhabdoviruses, cyto- and nucleo-rhabdoviruses, and lyssaviruses (panel A); sigma viruses and dimarhabdovirus supergroup (panel B). Legend (associated hosts): arthropod-vectored plant, arthropods, vertebrate specific, arthropod-vectored vertebrate, nematode, low support or omitted. Scale bar: 0.4.] Branches are colored based on the Bayesian host association reconstruction analysis. Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod specific; BA, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed), as identified as an outgroup in Li et al. (2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. The Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4.
To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data), and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases, the method would, correctly, report high levels of uncertainty in all reconstructed states.
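The hold-out design described above (ninety-nine viruses with known associations split, without replacement, into nine sets of ten and one set of nine, each masked in turn) can be sketched as follows. This only illustrates the bookkeeping; the function names, the seed, and the dictionary layout are assumptions for the example, and the reconstruction itself (run in BEAST in the study) is not reproduced here.

```python
import random


def holdout_sets(names, set_size=10, seed=0):
    """Shuffle `names` and cut them into consecutive sets of `set_size`.
    The last set holds the remainder. Sampling is without replacement,
    so every name lands in exactly one set."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


def mask_states(known, held_out, all_states):
    """Copy `known` as sets of allowed states, making each held-out virus
    ambiguous between all possible states."""
    masked = {virus: {state} for virus, state in known.items()}
    for virus in held_out:
        masked[virus] = set(all_states)
    return masked


def accuracy(known, predicted):
    """Fraction of predicted viruses whose predicted state matches `known`."""
    correct = sum(1 for virus, state in predicted.items() if known[virus] == state)
    return correct / len(predicted)
```

With ninety-nine names and the default set size, `holdout_sets` returns ten sets, and 96 correct calls out of 99 gives an accuracy of roughly 0.97.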
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found that there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001; Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod-specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
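Transition counts of the kind quoted above are summaries of a posterior sample. A minimal sketch of how a mode, a median, and an equal-tailed interval could be read off such a sample is given below; the function name and the nearest-rank interval rule are assumptions for illustration, and the study's own numbers came from its BEAST analysis, not from this code.

```python
import math
import statistics


def summarize_counts(samples, mass=0.95):
    """Summarize posterior samples of a transition count.
    Returns the modal value, the median, and an equal-tailed interval
    found by nearest-rank lookup in the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lower = ordered[math.floor(tail * (n - 1))]
    upper = ordered[math.ceil((1.0 - tail) * (n - 1))]
    return {
        "mode": statistics.mode(ordered),
        "median": statistics.median(ordered),
        "interval": (lower, upper),
    }
```

Because such posteriors are often right-skewed, the mode can sit below the median, which is the pattern seen in the first estimate reported above.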
Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (in one of the fish-infecting clades, it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time, we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group), it is common to find closely related viruses in closely related hosts (Fig. 2). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
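A permutation test for the correlation between two pairwise distance matrices can be sketched as a Mantel-style procedure: compute Pearson's correlation over the upper triangles, then repeatedly shuffle the labels of one matrix to build a null distribution. The text says only that the P value was "based on permutation", so the Mantel framing, the function names, and the default of 999 permutations are assumptions made for this example.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Pairwise distances above the diagonal, with rows relabelled by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel(virus_dist, host_dist, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation P value)."""
    rng = random.Random(seed)
    identity = list(range(len(virus_dist)))
    x = upper_triangle(virus_dist, identity)
    observed = pearson(x, upper_triangle(host_dist, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(x, upper_triangle(host_dist, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)
```

Shuffling whole rows and columns together, not individual cells, respects the non-independence of pairwise distances that share a taxon.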
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences, we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases, we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups such as cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced with an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive Data: SRP057824
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection) and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R., et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.
Ahne, W., et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.
Aznar-Lopez, C., et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G., et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.
10 | Virus Evolution 2015 Vol 1 No 1
Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.
Bourhy, H., et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.
Chiba, S., et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.
Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M., et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J., et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.
Drummond, A. J., et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.
Edwards, C. J., et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.
Faria, N. R., et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P., et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.
Guindon, S., et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.
Haenen, O., and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K., et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.
Hommola, K., et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.
Jeyaprakash, A., and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.
Katoh, K., and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A., and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.
L’Heritier, P. H., and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.
Le, S. Q., and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.
Lemey, P., et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.
Li, C. X., et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, doi: 10.7554/eLife.05378.
Lipkin, W. I., and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D., and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.
Longdon, B., and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.
Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.
Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R., and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–32. Norfolk, UK: Caister Academic Press.
Longdon, B., et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.
Longdon, B., et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.
Longdon, B., et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.
Longdon, B., et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.
Longdon, B., et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.
Minin, V. N., and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.
Misof, B., et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.
Okland, A. L., et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.
Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.
Parker, D. J., et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.
Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.
Streicker, D. G., et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.
Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.
van Mierlo, J. T., et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large, Complex Genomes’, in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J., et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.
Walker, P. J., et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.
Webster, C. L., et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.
Weinert, L. A., et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.
databases and key features verified by PCR and Sanger sequencing. Rhabdovirus genomes are typically 11–13 kb long and contain five core genes, 3′-N-P-M-G-L-5′ (Dietzgen and Kuzmin 2012). However, a number of groups of rhabdoviruses contain additional accessory genes and can be up to 16 kb long (Walker et al. 2011, 2015).
3.2 New rhabdoviruses from public databases
We identified a further twenty-six novel rhabdovirus-like sequences by searching public databases of assembled RNA-seq data with BLAST. These included nineteen viruses from arthropods (fleas, Crustacea, Lepidoptera, Diptera), one from a cnidarian (Hydra) and five from plants (Supplementary Table S3). Of these viruses, nineteen had sufficient amounts of coding sequence (>1,000 bp) to include in the phylogenetic analysis (Supplementary Table S3), whilst the remainder were too short (Supplementary Table S4).
Four viruses from databases had near-complete genomes based on their size. These were from the moth Triodia sylvina, the house fly Musca domestica (99% nucleotide identity to Wuhan house fly virus 2 [Li et al. 2015]), the butterfly P. aegeria and the medfly C. capitata, all of which contain ORFs corresponding to the five core rhabdovirus genes. The sequence from C. capitata had an additional ORF between the P and M genes with BLAST sequence similarity to the X gene in sigma viruses. There were several unusual sequences. First, in the virus from P. aegeria there appear to be two full-length glycoprotein ORFs between the M and L genes (we confirmed by Sanger sequencing that both exist and that the stop codon between the two genes was not an error). Second, the Agave tequilana transcriptome contained an L gene ORF on a contig that was the length of a typical rhabdovirus genome but did not appear to contain typical gene content, suggesting that it has a very atypical genome organization, has been misassembled, or is integrated into its host plant genome (Chiba et al. 2011). Finally, the virus from Hydra magnipapillata contained six predicted genes, but the L gene (RDRP) ORF was unusually long. Some of the viruses we detected may be EVEs inserted into the host genome and subsequently expressed (Aiewsakun and Katzourakis 2015). For example, this is likely the case for the sequence from the silkworm Bombyx mori, which we also found in the silkworm genome, and the L gene sequence from Spodoptera exigua, which contains stop codons. Under the assumption that viruses integrated into host genomes once infected those hosts, this does not affect our conclusions below about the host range of these viruses (Katzourakis and Gifford 2010; Fort et al. 2011; Ballinger, Bruenn, and Taylor 2012). We also found nine other novel mononegavirales-like sequences that group with recently discovered clades of insect viruses (Li et al. 2015) (see Supplementary Table S5 and Supplementary Fig. S4).
3.3 Rhabdovirus phylogeny
To reconstruct the evolution of the Rhabdoviridae, we have produced the most complete phylogeny of the group to date (Fig. 2). We aligned the relatively conserved L gene (RNA-dependent RNA polymerase) from our newly discovered viruses with sequences of known rhabdoviruses to give an alignment of 195 rhabdoviruses (and twenty-six other mononegavirales as an outgroup). We reconstructed the phylogeny using different sequence alignments and methodologies, and these all gave qualitatively similar results, with the same major clades being reconstructed. The ML and Bayesian relaxed clock phylogenies were very similar: 149/188 nodes are found in both reconstructions, and only two nodes present in the Bayesian relaxed clock tree with strong support are absent from the ML tree with strong support. These are found in a single basal clade of divergent but uniformly arthropod-specific strains, where the difference in topology will have no consequence for inference of host association. This suggests that our analysis is robust to the assumptions of a relaxed molecular clock. The branching order between the clades in the dimarhabdovirus supergroup was generally poorly supported and differed between the methods and alignments. Eight sequences that we discovered were not included in this analysis as they were considered too short, but their closest BLAST hits are listed in Supplementary Table S4.
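The node comparison between the ML and Bayesian trees can be illustrated with a minimal sketch. It assumes a node counts as shared when both trees contain a clade with exactly the same set of tips; the nested-tuple tree encoding, the function names and the toy trees are illustrative assumptions and are not taken from the study.

```python
def clades(tree):
    """Collect the tip sets of all internal nodes in a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):          # a tip is a plain name
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)                 # record this internal node's clade
        return members

    tips(tree)
    return found


def shared_nodes(tree_a, tree_b):
    """Return (number of clades of tree_a also present in tree_b, clades in tree_a)."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a)


# Toy trees with hypothetical tip names; they differ only in one shallow clade.
ml_tree = ((("v1", "v2"), "v3"), ("v4", "v5"))
bayes_tree = ((("v1", "v3"), "v2"), ("v4", "v5"))

shared, total = shared_nodes(ml_tree, bayes_tree)
print(f"{shared}/{total} nodes found in both trees")
```

On the real data the same kind of count would be taken over the 188 internal nodes, typically with a phylogenetics library rather than hand-written tuples.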
We recovered all of the major clades described previously (Fig. 2), and found that the majority of known rhabdoviruses belong to the dimarhabdovirus clade (Fig. 2B). The RNA-seq viruses from Drosophila fall into either the sigma virus clade (Fig. 2B) or the arthropod clade sister to the cyto- and nucleo-rhabdoviruses (Fig. 2A). The viruses from sequence databases are diverse, coming from almost all of the major clades with the exception of the lyssaviruses.
3.4 Predicted host associations of viruses
With a few exceptions, rhabdoviruses are either arthropod-vectored viruses of plants or vertebrates, or are vertebrate- or arthropod-specific. In many cases, the only information about a virus is the host from which it was isolated. Therefore, a priori, it is not clear whether viruses isolated from vertebrates are vertebrate-specific or arthropod-vectored, or whether viruses isolated from biting arthropods (e.g. mosquitoes, sandflies, ticks, midges and sea lice) are arthropod-specific or also infect vertebrates. Likewise, it is not clear whether viruses isolated from sap-sucking insects (all Hemiptera: aphids, leafhoppers, scale insects and mealybugs) are arthropod-specific or arthropod-vectored plant viruses. By combining data on the ambiguous and known host associations with phylogenetic information, we were able to predict both the ancestral and present host associations of these viruses. To do this, we used a Bayesian phylogenetic analysis that simultaneously estimated the phylogeny and host association of our data. In the analysis, we defined our host associations either as vertebrate-specific, arthropod-specific, arthropod-vectored vertebrate, arthropod-vectored plant, nematode, or as ambiguous between two (and in one case all five) of these states (see ‘Methods’ section).
This approach identified a large number of viruses that are likely to be new arthropod-vectored vertebrate viruses (Fig. 2B). Of eighty-nine viruses with ambiguous host associations, eighty were assigned a host association with strong posterior support (>0.95). Of the fifty-two viruses found in biting arthropods, forty-five were predicted to be arthropod-vectored vertebrate viruses and six to be arthropod-specific. Of the thirty viruses found in vertebrates, twenty-two were predicted to be arthropod-vectored vertebrate viruses and two were predicted to be vertebrate-specific (both fish viruses). Of the seven viruses found in plant-sap-feeding arthropods (Fig. 2A), three were predicted to be plant-associated and two arthropod-associated.
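As a quick consistency check on the counts in the paragraph above, the per-group figures can be summed: the three sampled groups contain 52 + 30 + 7 = 89 viruses, of which 80 received a strongly supported assignment. The sketch below only restates that arithmetic; the dictionary layout and variable names are illustrative assumptions.

```python
# Counts as stated in the text: viruses found in each sampled group, and how
# many of them were assigned each host association with strong support.
groups = {
    "biting arthropods": {
        "found": 52,
        "assigned": {"arthropod-vectored vertebrate": 45, "arthropod-specific": 6},
    },
    "vertebrates": {
        "found": 30,
        "assigned": {"arthropod-vectored vertebrate": 22, "vertebrate-specific": 2},
    },
    "plant-sap-feeding arthropods": {
        "found": 7,
        "assigned": {"plant-associated": 3, "arthropod-associated": 2},
    },
}

total_ambiguous = sum(g["found"] for g in groups.values())
total_assigned = sum(sum(g["assigned"].values()) for g in groups.values())
unassigned = total_ambiguous - total_assigned

print(total_ambiguous, total_assigned, unassigned)
```

The difference between the two totals is the number of viruses whose host association could not be assigned with strong support.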
[Figure 2, panels A and B: tree image and tip-label columns not reproduced here.] Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis (legend: arthropod-vectored plant; arthropods; vertebrate specific; arthropod-vectored vertebrate; nematode; low support or omitted). Black represents taxa omitted from host-state reconstruction or associations with <0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod specific; BA, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed) as identified as an outgroup in Li et al. (2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks (?) represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. The Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4.
To test the accuracy of our predictions of current host associations, we randomly selected a set of viruses with known associations, re-assigned their host association as ambiguous between all possible states (a greater level of uncertainty than we generally attributed to viruses in our data) and re-ran our analysis. We repeated this ten times, for nine sets of ten viruses and one set of nine viruses (randomly sampling without replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of the ninety-nine viruses with strong posterior support (>0.9) and for one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases, the method would correctly report high levels of uncertainty in all reconstructed states.
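The masking test described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the virus labels, function names and the scoring helper are hypothetical, and the reconstruction step itself (a BEAST run in the study) is not reproduced. Only the partition into hold-out sets and the tally of reported outcomes are shown.

```python
import random


def make_holdout_sets(viruses, set_size=10, seed=0):
    """Split viruses into disjoint sets, i.e. sampling without replacement."""
    rng = random.Random(seed)
    shuffled = list(viruses)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


def score(true_states, predictions, threshold=0.9):
    """Tally reconstructions as correct (strong or weak support) or wrong."""
    tally = {"correct_strong": 0, "correct_weak": 0, "wrong": 0}
    for virus, (state, support) in predictions.items():
        if state != true_states[virus]:
            tally["wrong"] += 1
        elif support > threshold:
            tally["correct_strong"] += 1
        else:
            tally["correct_weak"] += 1
    return tally


# Hypothetical labels standing in for the 99 viruses with known host associations.
viruses = [f"virus_{i:02d}" for i in range(99)]
holdout_sets = make_holdout_sets(viruses)
set_sizes = sorted(len(s) for s in holdout_sets)

# Outcome as reported in the text: 95 strong, 1 weak, 3 false reconstructions.
reported = {"correct_strong": 95, "correct_weak": 1, "wrong": 3}
accuracy = (reported["correct_strong"] + reported["correct_weak"]) / sum(reported.values())
print(set_sizes, f"{accuracy:.1%}")
```

Ninety-six correct reconstructions out of ninety-nine is about 97%, which matches the accuracy quoted for this test.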
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
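A rough sketch of how an Fst value and a permutation P value of this kind can be computed is given below. It assumes the sequence-based form Fst = 1 - Hw/Hb (mean within-group over mean between-group pairwise difference), in the spirit of the Hudson, Slatkin, and Maddison (1992) reference; the exact estimator and test used in the study are described in its Methods and may differ. The toy sequences, function names and number of permutations are illustrative assumptions, and each group is assumed to hold at least two aligned sequences.

```python
import random
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distance(pairs):
    distances = [p_distance(a, b) for a, b in pairs]
    return sum(distances) / len(distances)


def fst(group1, group2):
    """Fst = 1 - Hw / Hb for two groups of aligned sequences."""
    within = (mean_distance(combinations(group1, 2))
              + mean_distance(combinations(group2, 2))) / 2
    between = mean_distance(product(group1, group2))
    return 1 - within / between if between > 0 else 0.0


def permutation_test(group1, group2, n_perm=999, seed=1):
    """Shuffle group labels to get a one-sided P value for the observed Fst."""
    rng = random.Random(seed)
    observed = fst(group1, group2)
    pooled = list(group1) + list(group2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(group1)], pooled[len(group1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy alignments standing in for sequences from two host groups.
arthropod_like = ["AAAA", "AAAT", "AATA"]
plant_like = ["TTTT", "TTTA", "TTAT"]
observed_fst, p_value = permutation_test(arthropod_like, plant_like)
print(round(observed_fst, 3), p_value)
```

With only six toy sequences the smallest attainable P value is limited by the number of distinct label splits, so the example shows the mechanics rather than a realistic significance level.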
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of 188 of the internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
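Summaries of this kind (a modal estimate, a median and an interval) are typically derived from per-sample transition counts across the posterior. The sketch below assumes one estimated count per posterior sample, takes the mode after rounding to whole transitions and uses the 2.5% and 97.5% quantiles as the interval. These choices, the function name and the sample values are illustrative assumptions, not the study's actual output or procedure.

```python
import statistics


def summarise_counts(samples):
    """Return (modal whole-number count, median, lower and upper 95% bounds)."""
    modal = statistics.mode([round(x) for x in samples])
    median = statistics.median(samples)
    # 39 cut points for n=40: the first is the 2.5% and the last the 97.5% quantile.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return modal, median, cuts[0], cuts[-1]


# Hypothetical posterior samples of the number of host-switches of one type.
samples = [2.0, 2.1, 1.9, 3.2, 2.2, 4.8, 2.0, 3.1, 2.4, 5.3]
modal, median, lower, upper = summarise_counts(samples)
print(modal, median, lower, upper)
```

A right-skewed set of samples like this one gives a median above the mode, which is the same qualitative pattern as a modal estimate of 2 alongside a median of 3.1.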
Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (for one of the fish-infecting clades, it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time, we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015) (although contamination could also explain this result). Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years, related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group), it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation, Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
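A permutation test of the correlation between two distance matrices can be sketched as below. This is a simplified Mantel-style label permutation, offered as an illustration under stated assumptions rather than the procedure used for Fig. 3: it assumes exactly one host per virus and square symmetric matrices in the same order, and the function names and toy positions are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def upper_triangle(matrix, order):
    """Pairwise entries of a matrix after relabelling rows/columns by order."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate the two matrices, then permute host labels for a null."""
    rng = random.Random(seed)
    identity = list(range(len(virus_dist)))
    x = upper_triangle(virus_dist, identity)
    observed = pearson(x, upper_triangle(host_dist, identity))
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(host_dist, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed): taxa placed on a line so distances are |a - b|.
virus_pos = [0, 1, 2, 3, 4]
host_pos = [0, 1, 2, 3, 5]
virus_dist = [[abs(a - b) for b in virus_pos] for a in virus_pos]
host_dist = [[abs(a - b) for b in host_pos] for a in host_pos]
r, p_value = mantel_test(virus_dist, host_dist)
print(f"correlation = {r:.2f}, permutation P = {p_value:.3f}")
```

Permuting whole rows and columns together, rather than individual matrix entries, keeps the non-independence of pairwise distances intact under the null.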
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
8 | Virus Evolution 2015 Vol 1 No 1
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here, we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences, we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases, we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
B Longdon et al | 9
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are, of course, limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
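The difference between symmetric and asymmetric transition rate matrices mentioned above can be illustrated with a two-state continuous-time Markov chain, for which the transition probabilities have a closed form. This is a deliberately reduced sketch, not the multi-state BEAST model used in this study; the two states, the rate values, and the function name are assumptions chosen for illustration.

```python
import math


def transition_probs(rate_01, rate_10, t):
    """Transition probability matrix P(t) for a two-state Markov chain.

    rate_01 is the rate of switching from state 0 to state 1, rate_10 the
    reverse. For the rate matrix Q = [[-rate_01, rate_01],
    [rate_10, -rate_10]], the closed form is
    P01(t) = rate_01 / (rate_01 + rate_10) * (1 - exp(-(rate_01 + rate_10) t)).
    """
    total = rate_01 + rate_10
    decay = math.exp(-total * t)
    p01 = rate_01 / total * (1.0 - decay)
    p10 = rate_10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical states: 0 = vector-borne, 1 = arthropod-specific.
# Asymmetric rates let losses of a host be more likely than gains.
asymmetric = transition_probs(rate_01=2.0, rate_10=0.5, t=1.0)
# A symmetric model forces both directions to be equally likely.
symmetric = transition_probs(rate_01=1.0, rate_10=1.0, t=1.0)

print("asymmetric P(0->1), P(1->0):", asymmetric[0][1], asymmetric[1][0])
print("symmetric  P(0->1), P(1->0):", symmetric[0][1], symmetric[1][0])
```

Under the symmetric model the direction of a host switch cannot be favoured a priori, which is one reason an asymmetric model is the less restrictive choice when directions of host shifts are of interest.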
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive data: SRP057824
Results from testing ancestral trait reconstructions predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences fasta: http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment fasta: http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R. et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.
Ahne, W. et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.
Aznar-Lopez, C. et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v091.
Bhatia, G. et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.
Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.
Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.
Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.
Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.
Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.
Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.
Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.
Haenen, O., and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.
Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.
Jeyaprakash, A., and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.
Katoh, K., and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A., and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.
L’Heritier, P. H., and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.
Le, S. Q., and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.
Lemey, P. et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.
Li, C. X. et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.
Lipkin, W. I., and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D., and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.
Longdon, B., and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.
Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.
Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R., and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.
Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.
Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.
Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.
Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.
Minin, V. N., and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.
Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.
Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.
Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.
Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.
Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.
Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.
Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.
van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G., and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.
Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.
Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.
Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.
Figure 2. ML phylogeny of the Rhabdoviridae. (A) shows the basal fish-infecting novirhabdoviruses, an unassigned group of arthropod-associated viruses, the plant-infecting cyto- and nucleo-rhabdoviruses, as well as the vertebrate-specific lyssaviruses. (B) shows the dimarhabdovirus supergroup, which is predominantly composed of arthropod-vectored vertebrate viruses, along with the arthropod-specific sigma virus clade. Branches are colored based on the Bayesian host association reconstruction analysis (associated hosts: arthropod-vectored plant, arthropods, vertebrate specific, arthropod-vectored vertebrate, nematode). Black represents taxa omitted from host-state reconstruction or associations with < 0.95 support. The tree was inferred from L gene sequences using the Gblocks alignment. The columns of text are the virus name, the host category used for reconstructions, and known hosts (from left to right). Codes for the host categories are: VS, vertebrate-specific; VV, arthropod-vectored vertebrate; A, arthropod specific; BA, biting-arthropod (ambiguous state); V, vertebrate (ambiguous state); AP, plant-sap-feeding-arthropod (ambiguous state); UH, uncertain-host (ambiguous across all states); and N, nematode. Names in bold and underlined are viruses discovered in this study. The tree is rooted with the Chuvirus clade (root collapsed), as identified as an outgroup in Li et al. (2015), but we note this gives the same result as midpoint and the molecular clock rooting. Nodes labelled with question marks (?) represent nodes with aLRT (approximate likelihood ratio test) statistical support values less than 0.75. Scale bar shows number of amino-acid substitutions per site. Bayesian MCC tree used to infer ancestral traits is shown in Supplementary Figure S4.
replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of ninety-nine viruses with strong posterior support (> 0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note that there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
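As an illustration only, the bookkeeping behind this kind of check can be sketched in Python. This is not the authors' pipeline: the function name, the record layout, and the toy values are assumptions made for the example; only the 0.9 support threshold is taken from the text.

```python
from collections import Counter

def tally_validation(records, threshold=0.9):
    """Classify masked viruses by whether the reconstructed state matches the known one.

    records: iterable of (true_state, inferred_state, posterior_support) tuples.
    (Hypothetical layout, chosen for illustration.)
    """
    counts = Counter()
    for true_state, inferred_state, support in records:
        if inferred_state != true_state:
            counts["incorrect"] += 1
        elif support > threshold:
            counts["correct_strong"] += 1
        else:
            counts["correct_weak"] += 1
    return counts

# Toy records (invented values, not data from the study)
toy_records = [
    ("vector-borne", "vector-borne", 0.99),
    ("arthropod-specific", "arthropod-specific", 0.73),
    ("vertebrate-specific", "vector-borne", 0.95),
]
toy_counts = tally_validation(toy_records)
toy_accuracy = (toy_counts["correct_strong"] + toy_counts["correct_weak"]) / len(toy_records)
```

Applied to the counts reported above, ninety-five strongly supported and one weakly supported correct reconstruction out of ninety-nine correspond to an accuracy of 96/99, or about 97%.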
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found that there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates, and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants, and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
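The logic of a permutation test on Fst can be illustrated with a short Python sketch. It assumes a simple distance-based form, Fst = 1 - (mean within-group distance) / (mean between-group distance), pooled over all pairs; the exact estimator, the function names, and the toy distance matrix are assumptions for illustration and are not taken from the analysis described here.

```python
import random
from itertools import combinations

def fst_from_distances(dist, groups):
    """Distance-based Fst: 1 - mean within-group distance / mean between-group distance."""
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        (within if groups[i] == groups[j] else between).append(dist[i][j])
    return 1.0 - (sum(within) / len(within)) / (sum(between) / len(between))

def fst_permutation_test(dist, groups, n_perm=999, seed=1):
    """Shuffle host-group labels to build a null distribution for the observed Fst."""
    rng = random.Random(seed)
    observed = fst_from_distances(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst_from_distances(dist, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the P value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Toy pairwise distances for four sequences (invented values)
toy_dist = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
toy_groups = ["arthropod", "arthropod", "plant", "plant"]
toy_fst, toy_p = fst_permutation_test(toy_dist, toy_groups, n_perm=99)
```

With only four sequences the toy P value cannot be small; the point of the sketch is the label-shuffling step, which breaks any association between sequence and host group while keeping group sizes fixed.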
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of the 188 internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few rare transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology, and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (for one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses are yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015), although contamination could also explain this result. Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation, Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
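A correlation between two distance matrices cannot be tested with a standard Pearson P value, because pairwise distances are not independent observations. A minimal Python sketch of one common workaround, a Mantel-style permutation of taxon labels, is given here. It is a simplified stand-in and not necessarily the permutation scheme used in this study; the function names and toy matrices are assumptions for illustration.

```python
import math
import random

def upper_triangle(matrix, order):
    """Flatten the upper triangle of a symmetric matrix after relabelling taxa by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def matrix_correlation_permutation_test(virus_dist, host_dist, n_perm=999, seed=1):
    """Correlate virus and host distances; permute host labels to obtain a one-sided P value."""
    rng = random.Random(seed)
    order = list(range(len(virus_dist)))
    virus_vec = upper_triangle(virus_dist, order)
    observed = pearson(virus_vec, upper_triangle(host_dist, order))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel hosts, keeping the matrix structure intact
        if pearson(virus_vec, upper_triangle(host_dist, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy symmetric distance matrices for four virus-host pairs (invented values)
toy_virus = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 2], [5, 4, 2, 0]]
toy_host = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 2], [5, 4, 2, 0]]
toy_r, toy_p_value = matrix_correlation_permutation_test(toy_virus, toy_host, n_perm=99)
```

Permuting whole rows and columns together, rather than shuffling individual distances, preserves the dependence among distances that share a taxon.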
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test: P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses: in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive data: SRP057824.
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
BL and FMJ are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ’s College, Cambridge (BL). GGRM is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to DJO.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV infected fly line; Casper Breuker and Melanie Gibbs for PAegRV samples; and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R., et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.
Ahne, W., et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.
Aznar-Lopez, C., et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G., et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.
Bourhy, H., et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.
Chiba, S., et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.
Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M., et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J., et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.
Drummond, A. J., et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.
Edwards, C. J., et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.
Faria, N. R., et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P., et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.
Guindon, S., et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.
Haenen, O., and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K., et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.
Hommola, K., et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.
Jeyaprakash, A., and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.
Katoh, K., and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A., and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.
L’Heritier, P. H., and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.
Le, S. Q., and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.
Lemey, P., et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.
Li, C. X., et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.
Lipkin, W. I., and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D., and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.
Longdon, B., and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Royal Society B, 279: 3889–98.
Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.
Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B., et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.
Longdon, B., et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.
Longdon, B., et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.
Longdon, B., et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.
Longdon, B., et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.
Minin, V. N., and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.
Misof, B., et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.
Okland, A. L., et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.
Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.
Parker, D. J., et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.
Shroyer, D. A., and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.
Streicker, D. G., et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.
Talavera, G., and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.
van Mierlo, J. T., et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J., et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.
Walker, P. J., et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M., and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.
Webster, C. L., et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.
Weinert, L. A., et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.
12 | Virus Evolution 2015 Vol 1 No 1
Kern Canyon virus
Mossuril virus
Yata virus
Drosophila obscura sigma virus
Gray Lodge virus
Oak Vale virus
Perinet virus
Morreton virus
Durham virus
Grass carp rhabdovirus
Wuhan Insect virus 7
Sripur virus
Itacaiunas virus
Spodoptera exigua TSA
Siniperca chuatsi virus
Fikirini bat rhabdovirus
Koolpinyah virus
Rochambeau virus
Tench rhabdovirus
Caligus rogercresseyi 11125273 TSA
Garba virus
Nishimuro virus
Keuraliba virus
New Minto virus
Long Island tick rhabdovirus
Klamath virus
Huangpi Tick Virus 3
Wuhan House Fly Virus 1
Dolphin rhabdovirus
Culex tritaeniorhynchus rhabdovirus
Scaptodrosophila deflexa sigma virus
Spring viremia of carp virus
Tibrogargan virus
La Joya virus
Vesicular stomatitis virus Alagoas Indiana 3
Berrimah virus
Taishun Tick Virus
Bovine ephmeral fever virus
Radi virus
Conwentzia psociformis TSA
Wuhan Louse Fly Virus 10
Oita virus
y
Chandipura virus
Vesicular stomatitis virus Indiana
Yongjia Tick Virus 2
Jurona virusYug Bogdanovac virus
Vesicular stomatitis virus New Jersey
Bole Tick Virus 2
Curionopolis virus
Shayang Fly Virus 2
Ceratitis capitata sigma virus
Mount Elgon bat virus
Lepeophtheirus salmonis rhabdovirus 127
Scophthalmus maximus rhabdovirus
Wuhan Fly Virus 2
Drosophila montana sigma virus
Wuhan Louse Fly Virus 5
Hart Park virus
Arboretum virus
Ord River virus
Bas Congo virus
Santa barbara virus
Wuhan Louse Fly Virus 9
Puerto Almendras virus
Lepeophtheirus salmonis rhabdovirus 9
Landjia virus
Sena Madureira virus
Isfahan virus
Sunguru virus
Pike fry rhabdovirus
Iriri virus
Caligus rogercresseyi 11114047 TSA
Wuhan Louse Fly Virus 8
Tacheng Tick Virus 3
Coastal Plains virus
Chaco virus
Drosophila ananassae sigma virus
Bahia Grande virus
Marco virus
Almpiwar virus
Malakal virus
Wuhan Tick Virus 1
Aruac virus
Sawgrass virus
Vesicular stomatitis virus Cocal
Parry Creek virus
Drosophila melanogaster sigma virus HAP23 isolate
Figure 2 Continued. Panel B: the dimarhabdovirus supergroup, with the sigma viruses labelled as a clade (panel A is shown in Fig. 2A). Tip labels give each virus name, the hosts it was identified from and a host-category code (V, VV, VS, BA, A or P).
replacement from the ninety-nine viruses in our data with known host associations). These analyses correctly returned the true host association for ninety-five of ninety-nine viruses with strong posterior support (>0.9) and one with weak support (mean support = 0.99, range = 0.73–1.00). All three cases in which the reconstruction returned a false host association involved anomalous sequences (e.g. a change in host association on a terminal branch). Note there would be no failure in cases where there was no phylogenetic clustering of host associations. In such cases the method would, correctly, report high levels of uncertainty in all reconstructed states.
We checked for evidence of sampling bias in our data by testing whether sample size predicts rate to or from a host association (Lemey et al. 2014). We found there is a high level of uncertainty around all rate estimates, but that there is no pattern of increased rate to or from states that are more frequently sampled.
3.5 Ancestral host associations and host-switches
Viral sequences from arthropods, vertebrates and plants form distinct clusters in the phylogeny (Fig. 2). To quantify this genetic structure, we calculated the Fst statistic between the sequences of viruses from different groups of hosts. There is strong evidence of genetic differentiation between the sequences from arthropods, plants and vertebrates (P < 0.001, Supplementary Fig. S1). Similarly, viruses isolated from the same host group tend to cluster together on the tree (GSI analysis permutation tests: arthropod hosts GSI = 0.43, P < 0.001; plant hosts GSI = 0.46, P < 0.001; vertebrate hosts GSI = 0.46, P < 0.001).
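As an illustration of the kind of Fst permutation test described above, the following minimal sketch assumes a Hudson-style estimator, Fst = 1 − Hw/Hb, where Hw and Hb are the mean pairwise distances within and between host groups (here pooled over all pairs), and obtains a P value by shuffling host labels. The function names, the toy distance matrix and the number of permutations are illustrative assumptions, not the settings used in this study.

```python
import random
from itertools import combinations


def hudson_fst(dist, labels):
    """Return Fst = 1 - Hw/Hb for a distance matrix and one label per sequence."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(dist[i][j])
        else:
            between.append(dist[i][j])
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 1.0 - mean_within / mean_between


def fst_permutation_test(dist, labels, n_perm=999, seed=1):
    """Shuffle host labels to see how often chance gives an Fst this large."""
    rng = random.Random(seed)
    observed = hudson_fst(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if hudson_fst(dist, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the P value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two host groups of five sequences each, with
# small distances within a group and large distances between groups.
labels = ["arthropod"] * 5 + ["vertebrate"] * 5
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 1.0)
         for j in range(10)] for i in range(10)]
fst, p_value = fst_permutation_test(dist, labels)
print(f"Fst = {fst:.2f}, P = {p_value:.3f}")
```

With strongly structured toy data such as this, almost no shuffled labelling matches the observed Fst, so the permutation P value is small.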
Our Bayesian analysis allowed us to infer the ancestral host association of 176 of the 188 internal nodes on the phylogenetic tree (support > 0.95); however, we could not infer the host association of the root of the phylogeny or some of the more basal nodes. A striking pattern that emerged is that switches between major groups of hosts have occurred rarely during the evolution of the rhabdoviruses (Fig. 2). There are a few transitions on terminal branches (Santa Barbara virus and the virus identified from the plant Humulus lupulus), but these could represent errors in the host assignment (e.g. cross-species contamination) as well as recent host shifts. Our analysis allows us to estimate the number of times the viruses have switched between major host groups across the phylogeny, while accounting for uncertainty about ancestral states, the tree topology and root. We found strong evidence of only two types of host-switch across our phylogeny: two transitions from being an arthropod-vectored vertebrate virus to being arthropod specific (modal estimate = 2, median = 3.1, CIs = 1.9–5.4) and three transitions from being an arthropod-vectored vertebrate virus to a vertebrate-specific virus (modal estimate = 3, median = 3.1, CIs = 2.9–5.2). We could not determine the direction of the host shifts into the other host groups.
Vertebrate-specific viruses have arisen once in the lyssavirus clade (Dietzgen and Kuzmin 2012), as well as at least once in fish dimarhabdoviruses (for one of the fish-infecting clades it is unclear from our reconstructions whether it is vertebrate-specific or vector-borne). There has also likely been a single transition to being arthropod-vectored vertebrate viruses in the dimarhabdovirus clade.
Insect-vectored plant viruses in our dataset have arisen once in the cyto- and nucleo-rhabdoviruses, although the ancestral state of these viruses is uncertain. A single virus identified from the hop plant H. lupulus appears to fall within the dimarhabdovirus clade. However, this may be because the plant was contaminated with insect matter, as the same RNA-seq dataset contains COI sequences with high similarity to thrips.
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica and louse flies removed from the skin of bats. For the first time, we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015), although contamination could also explain this result. Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L’Heritier and Teissier 1937). In the last few years, related sigmaviruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group), it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson’s correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation, Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
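The correlation between virus and host evolutionary distances can be sketched as follows. This is a simplified illustration that assumes precomputed distance matrices and a permutation of the virus-to-host assignments; it is not the exact procedure or software used for the analysis above, and all names and toy values are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def distance_correlation(virus_dist, host_dist, host_of):
    """Correlate virus-virus distance with host-host distance over all virus pairs."""
    pairs = list(combinations(range(len(host_of)), 2))
    x = [virus_dist[i][j] for i, j in pairs]
    y = [host_dist[host_of[i]][host_of[j]] for i, j in pairs]
    return pearson(x, y)


def correlation_permutation_test(virus_dist, host_dist, host_of,
                                 n_perm=999, seed=1):
    """Shuffle which host each virus is assigned to and recompute the correlation."""
    rng = random.Random(seed)
    observed = distance_correlation(virus_dist, host_dist, host_of)
    shuffled = list(host_of)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if distance_correlation(virus_dist, host_dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): four viruses and their four hosts placed on a line,
# so that virus distances mirror host distances exactly.
positions = [0.0, 1.0, 5.0, 6.0]
virus_dist = [[abs(a - b) for b in positions] for a in positions]
host_dist = [[abs(a - b) for b in positions] for a in positions]
host_of = [0, 1, 2, 3]  # virus i was found in host i
r, p_value = correlation_permutation_test(virus_dist, host_dist, host_of)
print(f"r = {r:.2f}")
```

Permuting the host assignments, rather than treating every pairwise distance as an independent observation, respects the fact that each virus contributes to many pairs.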
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts; there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007, Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses, in the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here, we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses, and twenty-six were from arthropods. Using these sequences, we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases, we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part, this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Figure S2).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
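The accuracy figure quoted above follows from the validation counts reported in the Results (ninety-five correct with strong support, one correct with weak support and three incorrect out of ninety-nine). A minimal sketch of how such a tally could be scored is given below; the record layout, the host-state names and the individual support values are hypothetical placeholders chosen only to reproduce those counts.

```python
from collections import Counter


def score_reconstructions(results, threshold=0.9):
    """Tally (true_state, predicted_state, posterior_support) records."""
    tally = Counter()
    for true_state, predicted, support in results:
        if predicted != true_state:
            tally["incorrect"] += 1
        elif support > threshold:
            tally["correct_strong"] += 1
        else:
            tally["correct_weak"] += 1
    return tally


# Hypothetical records matching the reported counts (95 strong, 1 weak, 3 wrong).
results = (
    [("vector-borne", "vector-borne", 0.99)] * 95
    + [("vector-borne", "vector-borne", 0.73)]
    + [("arthropod-specific", "vector-borne", 0.95)] * 3
)
tally = score_reconstructions(results)
accuracy = (tally["correct_strong"] + tally["correct_weak"]) / len(results)
print(f"{accuracy:.0%} of reconstructions correct")
```

Counting weakly supported but correct reconstructions as successes gives 96 of 99, which rounds to the 97% quoted in the text.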
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare and the NCBI Sequence Read Archive (SRP057824).
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection) and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R. et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.
Ahne, W. et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P. and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.
Anisimova, M. and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.
Aznar-Lopez, C. et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A. and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D. and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G. et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P. and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.
Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.
Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C. and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.
Dietzgen, R. G. and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.
Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.
Dudas, G. and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.
Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.
Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.
Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.
Haenen, O. and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G. and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.
Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.
Jeyaprakash, A. and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.
Katoh, K. and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A. and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.
L’Heritier, P. H. and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.
Le, S. Q. and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.
Lemey, P. et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.
Li, C. X. et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.
Lipkin, W. I. and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D. and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.
Longdon, B. and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Biological Sciences, 279: 3889–98.
Longdon, B. and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.
Longdon, B., Obbard, D. J. and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L. and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.
Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.
Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.
Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.
Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.
Minin, V. N. and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.
Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.
Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.
Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.
Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?’, Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A. and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.
Shroyer, D. A. and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus’, Genetics, 104: 649–59.
Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.
Talavera, G. and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.
van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R. and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.
Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M. and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.
Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.
Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.
12 | Virus Evolution 2015 Vol 1 No 1
replacement from the ninety-nine viruses in our data withknown host associations) These analyses correctly returnedthe true host association for ninety-fiveninety-nine viruseswith strong posterior support (gt09) and one with weak support(mean supportfrac14 099 rangefrac14 073ndash100) All three cases inwhich the reconstruction returned a false host association in-volved anomalous sequences (eg a change in host associationon a terminal branch) Note there would be no failure in caseswhere there was no phylogenetic clustering of host associa-tions In such cases the method wouldmdashcorrectlymdashreport highlevels of uncertainty in all reconstructed states
We checked for evidence of sampling bias in our data by test-ing whether sample size predicts rate to or from a host association(Lemey et al 2014) We found there is a high level of uncertaintyaround all rate estimates but that there is no pattern of increasedrate to or from states that are more frequently sampled
35 Ancestral host associations and host-switches
Viral sequences from arthropods vertebrates and plants formdistinct clusters in the phylogeny (Fig 2) To quantify this geneticstructure we calculated the Fst statistic between the sequences ofviruses from different groups of hosts There is strong evidence ofgenetic differentiation between the sequences from arthropodsplants and vertebrates (Plt 0001 Supplementary Fig S1)Similarly viruses isolated from the same host group tend to clus-ter together on the tree (GSI analysis permutation tests arthro-pod hosts GSIfrac14 043 Plt 0001 plant hosts GSIfrac14 046 Plt 0001vertebrate hosts GSIfrac14 046 Plt 0001)
Our Bayesian analysis allowed us to infer the ancestral hostassociation of 176 of 188 of the internal nodes on the phyloge-netic tree (support gt 095) however we could not infer the hostassociation of the root of the phylogeny or some of the morebasal nodes A striking pattern that emerged is that switches be-tween major groups of hosts have occurred rarely during theevolution of the rhabdoviruses (Fig 2) There are a few rare tran-sitions on terminal branches (Santa Barbara virus and the vi-rus identified from the plant Humulus lupulus) but these couldrepresent errors in the host assignment (eg cross-speciescontamination) as well as recent host shifts Our analysis al-lows us to estimate the number of times the viruses haveswitched between major host groups across the phylogenywhile accounting for uncertainty about ancestral states the treetopology and root We found strong evidence of only two typesof host-switch across our phylogeny two transitions from beingan arthropod-vectored vertebrate virus to being arthropod spe-cific (modal estimatefrac14 2 medianfrac14 31 CIsfrac14 19ndash54) and threetransitions from being an arthropod-vectored vertebrate virusto a vertebrate-specific virus (modal estimatefrac14 3 medianfrac14 31CIsfrac14 29ndash52) We could not determine the direction of the hostshifts into the other host groups
Vertebrate-specific viruses have arisen once in the lyssavi-ruses clade (Dietzgen and Kuzmin 2012) as well as at least oncein fish dimarhabdoviruses (in one of the fish-infecting clades itis unclear if it is vertebrate-specific or vector-borne from our re-constructions) There has also likely been a single transition tobeing arthropod-vectored vertebrate viruses in the dimarhabo-dovirus clade
Insect-vectored plant viruses in our dataset have arisen oncein the cyto- and nucleo- rhabdoviruses although the ancestralstate of these viruses is uncertain A single virus identified fromthe hop plant Hlupulus appears to fall within the dimarhabdovi-rus clade However this may be because the plant was
contaminated with insect matter as the same RNA-seq datasetcontains COI sequences with high similarity to thrips
There are two large clades of arthropod-specific viruses. The first is a sister group to the large clade of plant viruses. This novel group of predominantly insect-associated viruses is associated with a broad range of insects, including flies, butterflies, moths, ants, thrips, bedbugs, fleas, mosquitoes, water striders, and leafhoppers. The mode of transmission and biology of these viruses is yet to be examined. The second clade of insect-associated viruses is the sigma virus clade (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011b,c; Longdon, Wilfert, and Jiggins 2012). These are derived from vector-borne dimarhabdoviruses that have lost their vertebrate host and become vertically transmitted viruses of insects (Longdon and Jiggins 2012). They are common in Drosophilidae, and our results suggest that they may be widespread throughout the Diptera, with occurrences in the Tephritid fruit fly C. capitata, the stable fly Muscina stabulans, several divergent viruses in the housefly M. domestica, and louse flies removed from the skin of bats. For the first time we have found sigma-like viruses outside of the Diptera, with two Lepidoptera-associated viruses and a virus from an aphid/parasitoid wasp. All of the sigma viruses characterized to date have been vertically transmitted (Longdon and Jiggins 2012), but some of the recently described viruses may be transmitted horizontally: it has been speculated that the viruses from louse flies may infect bats (Aznar-Lopez et al. 2013), and Shayang Fly Virus 2 has been reported in two fly species (Li et al. 2015), although contamination could also explain this result. Drosophila sigma virus genomes are characterized by an additional X gene between the P and M genes (Longdon et al. 2010). Interestingly, the two louse fly viruses with complete genomes, Wuhan insect virus 7 from an aphid/parasitoid, and P. aegeria rhabdovirus do not have a putative X gene. The first sigma virus was discovered in Drosophila melanogaster in 1937 (L'Heritier and Teissier 1937). In the last few years related sigma viruses have been found in other Drosophila species and a Muscid fly (Longdon, Obbard, and Jiggins 2010; Longdon et al. 2011a,b; Longdon and Jiggins 2012), and here we have found sigma-like viruses in a diverse array of Diptera species, as well as in other insect orders. Overall, our results suggest sigma-like viruses may be associated with a wide array of insect species.
Within the arthropod-associated viruses (the most sampled host group) it is common to find closely related viruses in closely related hosts (Fig. 1). Viruses isolated from the same arthropod orders tended to cluster together on the tree (GSI analysis permutation tests: Diptera GSI = 0.57, P < 0.001; Hemiptera GSI = 0.34, P < 0.001; Ixodida GSI = 0.38, P < 0.001; Lepidoptera GSI = 0.15, P = 0.089). This is also reflected in a positive correlation between the evolutionary distance between the viruses and the evolutionary distance between their arthropod hosts (Pearson's correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001 based on permutation; Fig. 3 and Supplementary Fig. S2). Because the virus phylogeny is incongruent with that of the respective hosts, this suggests rhabdoviruses preferentially host shift between closely related species (Longdon et al. 2011a, 2014).
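To illustrate the logic of a permutation test on the correlation between two distance matrices, the following minimal Python sketch permutes the tip labels of one matrix and recomputes Pearson's correlation each time (a Mantel-style test). It is an illustration under stated assumptions, not the code used for the analysis reported here: the function names (`pearson`, `upper_triangle`, `mantel_test`), the number of permutations, and the one-sided P value are all choices made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def upper_triangle(matrix, order):
    """Pairwise distances above the diagonal, with tips relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(virus_dist, host_dist, n_perm=999, seed=0):
    """Observed correlation and one-sided permutation P value.

    Rows and columns of the host matrix are permuted together, which breaks
    the pairing between viruses and hosts while keeping each matrix intact.
    """
    rng = random.Random(seed)
    identity = list(range(len(virus_dist)))
    virus_pairs = upper_triangle(virus_dist, identity)
    observed = pearson(virus_pairs, upper_triangle(host_dist, identity))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(virus_pairs, upper_triangle(host_dist, perm)) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `virus_dist` and `host_dist` are assumed to be square, symmetric matrices (lists of lists) indexed in the same tip order, for example patristic distances from the virus tree and from a host tree.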
We also find viruses clustering on the phylogeny based on the ecosystem of their hosts: there is strong evidence of genetic differentiation between viruses from terrestrial and aquatic hosts (Fst permutation test P = 0.007; Supplementary Fig. S3; permutation tests: terrestrial hosts = 0.52, aquatic hosts = 0.29, P < 0.001 for both). There has been one shift from terrestrial to aquatic hosts during the evolution of the basal novirhabdoviruses, which have a wide host range in fish. There have been other terrestrial to aquatic shifts in the dimarhabdoviruses, in
8 | Virus Evolution 2015 Vol 1 No 1
the clades of fish and cetacean viruses and the clade of viruses from sea-lice.
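The idea behind a permutation test of differentiation between terrestrial and aquatic hosts can be sketched as follows. This is a minimal Python illustration, not the analysis code: it assumes the estimator of Hudson, Slatkin, and Maddison (1992), Fst = 1 - Hw/Hb, applied to a matrix of pairwise distances between virus sequences, and the names `hudson_fst` and `fst_permutation_test` are hypothetical.

```python
import random
from itertools import combinations


def hudson_fst(dist, labels):
    """Fst = 1 - Hw/Hb from pairwise distances and one group label per tip.

    Hw is the mean within-group distance (averaged over groups that contain
    at least two tips); Hb is the mean between-group distance.
    """
    within = {}
    between = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.setdefault(labels[i], []).append(dist[i][j])
        else:
            between.append(dist[i][j])
    h_w = sum(sum(v) / len(v) for v in within.values()) / len(within)
    h_b = sum(between) / len(between)
    return 1 - h_w / h_b


def fst_permutation_test(dist, labels, n_perm=999, seed=0):
    """Observed Fst and the proportion of label shuffles with Fst at least as large."""
    rng = random.Random(seed)
    observed = hudson_fst(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved, membership is not
        if hudson_fst(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling the ecosystem labels across tips gives the distribution of Fst expected if ecosystem were unrelated to position on the phylogeny; a small P value indicates that viruses from the same ecosystem are closer to one another than expected by chance.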
4 Discussion
Viruses are ubiquitous in nature, and recent developments in high-throughput sequencing technology have led to the discovery and sequencing of a large number of novel viruses in arthropods (Li et al. 2015; Walker et al. 2015; Webster et al. 2015). Here we have identified forty-three novel virus-like sequences from our own RNA-seq data and public sequence repositories. Of these, thirty-two were rhabdoviruses and twenty-six were from arthropods. Using these sequences we have produced the most extensive phylogeny of the Rhabdoviridae to date, including a total of 195 virus sequences.
In most cases we know nothing about the biology of the viruses beyond the host they were isolated from, but our analysis provides a powerful way to predict which are vector-borne viruses and which are specific to vertebrates or arthropods. We have identified a large number of new likely vector-borne viruses: of eighty-five rhabdoviruses identified from vertebrates or biting insects, we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses). The majority of known rhabdoviruses are arboviruses, and all of these fall in a single clade known as the dimarhabdoviruses. In addition to the arboviruses, we also identified two clades of likely insect-specific viruses associated with a wide range of species.
We found that shifts between distantly related hosts are rare in the rhabdoviruses, which is consistent with previous observations that both rhabdoviruses of vertebrates (rabies virus in bats) and invertebrates (sigma viruses in Drosophilidae) show a declining ability to infect hosts more distantly related to their natural host (Streicker et al. 2010; Longdon et al. 2011a; Faria et al. 2013). It is thought that sigma viruses may sometimes jump into distantly related but highly susceptible species (Longdon et al. 2011a, 2014, 2015), but our results suggest that this rarely happens between major groups such as vertebrates and arthropods. It is nonetheless surprising that arthropod-specific viruses have arisen rarely, as one might naively assume that there would be fewer constraints on vector-borne viruses losing one of their hosts. However, this would involve evolving a new transmission route among insects, and this may be an important constraint. Within the major clades, closely related viruses often infect closely related hosts (Fig. 2). For example, within the dimarhabdoviruses, viruses identified from mosquitoes, ticks, Drosophila, Muscid flies, Lepidoptera, and sea-lice all tend to cluster together (Fig. 2B). However, it is also clear that the virus phylogeny does not mirror the host phylogeny, and our data on the clustering of hosts across the virus phylogeny therefore suggest that viruses preferentially shift between more closely related species (Fig. 3, Supplementary Figs S1 and S2) in the same environment (Supplementary Fig. S3).
There has been a near four-fold increase in the number of recorded rhabdovirus sequences in the last 5 years. In part this may be due to the falling cost of sequencing transcriptomes (Wang, Gerstein, and Snyder 2009) and initiatives to sequence large numbers of insects and other arthropods (Misof et al. 2014). The use of high-throughput sequencing technologies should
Figure 3. The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus). Closely related viruses tend to be found in closely related hosts. Permutation tests find a significant positive correlation (correlation = 0.36, 95% CIs = 0.34–0.38, P < 0.001) between host and virus evolutionary distance (see Supplementary Fig. S2).
B Longdon et al | 9
reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g., symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of 'process homogeneity' over the phylogeny (Omland 1999). Such analyses are of course limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups like cnidarians, fungi, and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e., recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
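The masking check described above can be illustrated with a short Python sketch. The reconstruction in this study was carried out in BEAST; the sketch below deliberately swaps in a much simpler stand-in predictor (the host state of the nearest tip with a known state) purely to show the mask, predict, and compare loop. The names `nearest_neighbour_state` and `masked_accuracy` and the toy data in the usage note are assumptions made for the example.

```python
def nearest_neighbour_state(dist, states, target):
    """Predict the state of `target` from the closest tip with a known state.

    This is a stand-in for a model-based reconstruction, not the Bayesian
    method used in the study.
    """
    candidates = [j for j in range(len(states)) if j != target and states[j] is not None]
    closest = min(candidates, key=lambda j: dist[target][j])
    return states[closest]


def masked_accuracy(dist, states):
    """Mask each known state in turn, predict it, and return the fraction correct."""
    known = [i for i, state in enumerate(states) if state is not None]
    correct = 0
    for i in known:
        masked = list(states)
        masked[i] = None  # hide the true host association of this tip
        if nearest_neighbour_state(dist, masked, i) == states[i]:
            correct += 1
    return correct / len(known)
```

With tips placed at positions 0, 1, 2, 10, 11, 12, and 14 on a line, and states arthropod, arthropod, arthropod, vertebrate, vertebrate, vertebrate, arthropod, the only tip predicted wrongly is the last one, which mimics a recent host change on a terminal branch; the masked accuracy is therefore 6/7.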
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced by an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare.
NCBI Sequence Read Archive data: SRP057824
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection), and a Junior Research Fellowship from Christ's College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis, and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R., et al. (2015) 'Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host', Nucleic Acids Research, 43: 6191–206.
Ahne, W., et al. (2002) 'Spring Viremia of Carp (SVC)', Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P., and Katzourakis, A. (2015) 'Endogenous Viruses: Connecting Recent and Ancient Viral Evolution', Virology, 479–480C: 26–37.
Anisimova, M., and Gascuel, O. (2006) 'Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative', Systematic Biology, 55: 539–52.
Aznar-Lopez, C., et al. (2013) 'Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats', Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A., and Taylor, D. J. (2012) 'Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila', Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D., and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G., et al. (2013) 'Estimating and Interpreting FST: The Impact of Rare Variants', Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P., and Leberre, M. (1975) 'Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus', Journal of Fish Biology, 7: 269–76.
Bourhy, H., et al. (2005) 'Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene', Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M., and Gabaldon, T. (2009) 'trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses', Bioinformatics, 25: 1972–3.
Chiba, S., et al. (2011) 'Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes', PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C., and Shaw, K. L. (2008) 'A Genealogical Approach to Quantifying Lineage Divergence', Evolution, 62: 2411–22.
Dietzgen, R. G., and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M., et al. (1987) 'Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus', Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J., et al. (2006) 'Relaxed Phylogenetics and Dating with Confidence', PLoS Biology, 4: e88.
Drummond, A. J., et al. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., and Obbard, D. J. (2015) 'Are Arthropods at the Heart of Virus Evolution?', eLife, 4: e06837.
Edwards, C. J., et al. (2011) 'Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline', Current Biology, 21: 1251–8.
Faria, N. R., et al. (2013) 'Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints', Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P., et al. (2011) 'Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality', Molecular Biology and Evolution, 29: 381–90.
Guindon, S., et al. (2010) 'New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0', Systematic Biology, 59: 307–21.
Haenen, O., and Davidse, A. (1993) 'Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout', Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K., et al. (2015) 'Estimating the Global Burden of Endemic Canine Rabies', PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G., and Ammar, E. D. (2003) 'Plant and Animal Rhabdovirus Host Range: A Bug's View', Trends in Microbiology, 11: 264–71.
Hommola, K., et al. (2009) 'A Permutation Test of Host-Parasite Cospeciation', Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M., and Maddison, W. P. (1992) 'Estimation of Levels of Gene Flow from DNA Sequence Data', Genetics, 132: 583–9.
Jeyaprakash, A., and Hoy, M. A. (2009) 'First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny', Experimental and Applied Acarology, 47: 1–18.
Katoh, K., and Standley, D. M. (2013) 'MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability', Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A., and Gifford, R. J. (2010) 'Endogenous Viral Elements in Animal Genomes', PLoS Genetics, 6: e1001191.
L'Heritier, P. H., and Teissier, G. (1937) 'Une Anomalie Physiologique Hereditaire Chez la Drosophile', Comptes Rendus de l'Academie des Sciences Paris, 231: 192–4.
Le, S. Q., and Gascuel, O. (2008) 'An Improved General Amino Acid Replacement Matrix', Molecular Biology and Evolution, 25: 1307–20.
Lemey, P., et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.
Li, C. X., et al. (2015) 'Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses', eLife, doi: 10.7554/eLife.05378.
Lipkin, W. I., and Anthony, S. J. (2015) 'Virus Hunting', Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D., and Bonning, B. C. (2011) 'Next Generation Sequencing Technologies for Insect Virus Discovery', Viruses, 3: 1849–69.
Longdon, B., and Jiggins, F. M. (2012) 'Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?', Proceedings of the Biological Sciences, 279: 3889–98.
Longdon, B., and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>.
Longdon, B., Obbard, D. J., and Jiggins, F. M. (2010) 'Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny', Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L., and Jiggins, F. M. (2012) 'The Sigma Viruses of Drosophila', in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B., et al. (2011a) 'Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts', PLoS Pathogens, 7: e1002260.
Longdon, B., et al. (2011b) 'Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila', Biology Letters, 7: 747–50.
Longdon, B., et al. (2011c) 'Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep', Genetics, 188: 141–50.
Longdon, B., et al. (2014) 'The Evolution and Genetics of Virus Host Shifts', PLoS Pathogens, 10: e1004395.
Longdon, B., et al. (2015) 'The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts', PLoS Pathogens, 11: e1004728.
Minin, V. N., and Suchard, M. A. (2008) 'Counting Labeled Transitions in Continuous-Time Markov Models of Evolution', Journal of Mathematical Biology, 56: 391–412.
Misof, B., et al. (2014) 'Phylogenomics Resolves the Timing and Pattern of Insect Evolution', Science, 346: 763–7.
Okland, A. L., et al. (2014) 'Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)', PLoS One, 9: e112517.
Omland, K. E. (1999) 'The Assumptions and Challenges of Ancestral State Reconstructions', Systematic Biology, 48: 604–11.
Parker, D. J., et al. (2015) 'How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila Virilis Group?', Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) 'Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)', Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) 'FigTree' (v1.3 ed.) <http://tree.bio.ed.ac.uk/software/figtree/>.
Rambaut, A., and Drummond, A. J. (2007) Tracer v1.6 <http://beast.bio.ed.ac.uk/Tracer>.
Rosen, L. (1980) 'Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses', Science, 207: 989–91.
Shroyer, D. A., and Rosen, L. (1983) 'Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito Culex-Quinquefasciatus', Genetics, 104: 649–59.
Streicker, D. G., et al. (2010) 'Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats', Science, 329: 676–9.
Talavera, G., and Castresana, J. (2007) 'Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments', Systematic Biology, 56: 564–77.
van Mierlo, J. T., et al. (2014) 'Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi', PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R., and Joubert, D. A. (2012) 'Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants, with Large Complex Genomes', in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J., et al. (2011) 'Rhabdovirus Accessory Genes', Virus Research, 162: 110–25.
Walker, P. J., et al. (2015) 'Evolution of Genome Size and Complexity in the Rhabdoviridae', PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M., and Snyder, M. (2009) 'RNA-Seq: A Revolutionary Tool for Transcriptomics', Nature Reviews Genetics, 10: 57–63.
Webster, C. L., et al. (2015) 'The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster', PLoS Biology, 13: e1002210.
Weinert, L. A., et al. (2012) 'Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication', Biology Letters, 8: 829–32.
12 | Virus Evolution 2015 Vol 1 No 1
the clades of fish and cetacean viruses and the clade of virusesfrom sea-lice
4 Discussion
Viruses are ubiquitous in nature and recent developments inhigh-throughput sequencing technology have led to the discov-ery and sequencing of a large number of novel viruses in arthro-pods (Li et al 2015 Walker et al 2015 Webster et al 2015) Herewe have identified forty-three novel virus-like sequences fromour own RNA-seq data and public sequence repositories Ofthese thirty-two were rhabdoviruses and twenty-six were fromarthropods Using these sequences we have produced the mostextensive phylogeny of the Rhabdoviridae to date including a to-tal of 195 virus sequences
In most cases we know nothing about the biology of theviruses beyond the host they were isolated from but our analy-sis provides a powerful way to predict which are vector-borneviruses and which are specific to vertebrates or arthropodsWe have identified a large number of new likely vector-borne vi-rusesmdashof eighty-five rhabdoviruses identified from vertebratesor biting insects we predict that seventy-six are arthropod-borne viruses of vertebrates (arboviruses) The majority ofknown rhabdoviruses are arboviruses and all of these fall in asingle clade known as the dimarhabdoviruses In addition tothe arboviruses we also identified two clades of likely insect-specific viruses associated with a wide range of species
We found that shifts between distantly related hosts arerare in the rhabdoviruses which is consistent with previous
observations that both rhabdoviruses of vertebrates (rabies vi-rus in bats) and invertebrates (sigma viruses in Drosophilidae)show a declining ability to infect hosts more distantly related totheir natural host (Streicker et al 2010 Longdon et al 2011aFaria et al 2013) It is thought that sigma viruses may some-times jump into distantly related but highly susceptible species(Longdon et al 2011a 2014 2015) but our results suggest thatthis rarely happens between major groups such as vertebratesand arthropods It is nonetheless surprising that arthropod-spe-cific viruses have arisen rarely as one might naively assumethat there would be fewer constraints on vector-borne viruseslosing one of their hosts However this would involve evolvinga new transmission route among insects and this may be animportant constraint Within the major clades closely relatedviruses often infect closely related hosts (Fig 2) For examplewithin the dimarhabdoviruses viruses identified from mosqui-toes ticks Drosophila Muscid flies Lepidoptera and sea-lice alltend to cluster together (Fig 2B) However it is also clear thatthe virus phylogeny does not mirror the host phylogeny andour data on the clustering of hosts across the virus phylogenytherefore suggests that viruses preferentially shift betweenmore closely related species (Fig 3 Supplementary Figs S1 andS2) in the same environment (Supplementary Fig S3)
There has been a near four-fold increase in the number of re-corded rhabdovirus sequences in the last 5 years In part thismay be due to the falling cost of sequencing transcriptomes(Wang Gerstein and Snyder 2009) and initiatives to sequencelarge numbers of insect and other arthropods (Misof et al 2014)The use of high-throughput sequencing technologies should
Figure 3 The relationship between the evolutionary distance between viruses and the evolutionary distance between their arthropod hosts (categorized by genus)
Closely related viruses tend to be found in closely related hosts Permutation tests find a significant positive correlation (correlationfrac14 036 95 CIsfrac14034ndash038
Plt0001) between host and virus evolutionary distance (see Supplementary Figure S2)
B Longdon et al | 9
reduce the likelihood of sampling biases associated with PCRwhere people look for similar viruses in related hostsTherefore the pattern of viruses forming clades based on thehost taxa they infect is likely to be robust However sampling isbiased towards arthropods and it is possible that there may bea great undiscovered diversity of rhabdoviruses in other organ-isms (Dudas and Obbard 2015)
Our conclusions are likely to be robust to biases in the dataor limitations in the analysis By reconstructing host associa-tions using the Bayesian methods in the BEAST software(Drummond et al 2012) we have avoided most of the simplify-ing assumptions of earlier methods (eg symmetric transitionrate matrices lack of uncertainty associated with estimates)Nonetheless all such methods depend on there being some ofsort of lsquoprocess homogeneityrsquo over the phylogeny (Omland1999) Such analyses are of course limited by sampling for ex-ample if a past host is now extinct it will never be recon-structed as an ancestral state Nevertheless previous studieshave shown that the method is relatively robust to uneven sam-pling across hosts (Edwards et al 2011) Furthermore when wehave viruses from under-sampled groups like cnidarians funginematodes they fall outside the main clades of viruses that weare analysing The limitations of the approach are evident inour results we were unable to reconstruct the host associationsof the root or most basal nodes of the phylogeny The recon-structions were however very successful within clades thatwere strongly associated with a single host or clades where theless common hosts tend to form distinct subclades As a resultof this high level of phylogenetic structure our approach wasable to reconstruct the current host associations of many vi-ruses for which we had incomplete knowledge of their hostrange To check that this approach is reliable we repeated theanalysis on datasets where we deleted the information aboutwhich hosts well-characterized viruses infect Our analysis wasfound to be robust with 97 of reconstructions being accurateThe method only failed for strains with irregular host associa-tions for their location in the phylogeny (ie recent changes inhost on terminal branches)mdasha limitation that would be ex-pected for such an analysis
Rhabdoviruses infect a diverse range of host species includ-ing a large number of arthropods Our search has unearthed alarge number of novel rhabdovirus genomes suggesting that weare only just beginning to uncover the diversity of these virusesThe host associations of these viruses have been highly con-served across their evolutionary history which provides a pow-erful tool to identify previously unknown arboviruses The largenumber of viruses being discovered through metagenomic stud-ies (Aguiar et al 2015 Webster et al 2015 Li et al 2015) meansthat in the future we will be faced by an increasingly large num-ber of viral sequences with little knowledge of the biology of thevirus Our phylogenetic approach could be extended to predictkey biological traits in other groups of pathogens where ourknowledge is incomplete However there are limitations to thismethod and the rapid evolution of RNA viruses may mean thatsome traits change too quickly to accurately infer traitsTherefore such an approach should complement and not re-place examining the basic biology of novel viruses
Data availability
Data are available through Figshare NCBI Sequence ReadArchive Data SRP057824
Results from testing ancestral trait reconstructions predic-tions httpdxdoiorg106084m9figshare1538584
L gene sequences fasta httpdxdoiorg106084m9figshare1425067TrimAl alignment fasta httpdxdoiorg106084m9figshare1425069Gblocks alignment fasta httpdxdoiorg106084m9figshare1425068Phylogenetic tree Gblocks alignment httpdxdoiorg106084m9figshare1425083Phylogenetic tree TrimAl alignment httpdxdoiorg106084m9figshare1425082BEAST alignment fasta httpdxdoiorg106084m9figshare1425431BEAUti xml file httpdxdoiorg106084m9figshare1431922Bayesian analysis tree httpdxdoiorg106084m9figshare1425436
Funding
BL and FMJ are supported by a NERC grant (NEL0042321)a European Research Council grant (281668 DrosophilaInfec-tion) a Junior Research Fellowship from Christrsquos CollegeCambridge (BL) GGRM is supported by an MRC student-ship The metagenomic sequencing of viruses fromDimmigrans Dtristis and Sdeflexa was supported by a Well-come Trust fellowship (WT085064) to DJO
Supplementary data
Supplementary data is available at Virus Evolution online
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV in-fected fly line Casper Breuker and Melanie Gibbs for PAegRVsamples and Philip Leftwich for CCapSV samples Thanksto everyone who provided fly collections Thanks to two re-viewers and Oliver Pybus for useful comments
Conflict of interest None declared
ReferencesAguiar E R et al (2015) lsquoSequence-Independent Characteriza-
tion of Viruses Based on the Pattern of Viral Small RNAs Pro-duced by the Hostrsquo Nucleic Acids Research 43 6191ndash206
Ahne W et al (2002) lsquoSpring Viremia of Carp (SVC)rsquo Diseases ofAquatic Organisms 52 261ndash72
Aiewsakun P and Katzourakis A (2015) lsquoEndogenous VirusesConnecting Recent and Ancient Viral Evolutionrsquo Virology 479-480C 26ndash37
Anisimova M and Gascuel O (2006) lsquoApproximate Likelihood-Ratio Test for Branches A Fast Accurate and PowerfulAlternativersquo Systems Biology 55 539ndash52
Aznar-Lopez C et al (2013) lsquoDetection of Rhabdovirus Viral RNAin Oropharyngeal Swabs and Ectoparasites of Spanish BatsrsquoJournal of General Virology 94 69ndash75
Ballinger M J Bruenn J A and Taylor D J (2012) lsquoPhylogenyIntegration and Expression of Sigma Virus-like Genes inDrosophilarsquo Molecular Phylogenetics Evolution 65 251ndash8
Bazinet A Myers D and Khatavkar P (2009) genealogicalSort-ing v091
Bhatia G et al (2013) lsquoEstimating and Interpreting FST TheImpact of Rare Variantsrsquo Genome Research 23 1514ndash21
10 | Virus Evolution 2015 Vol 1 No 1
Bootsma R Dekinkelin P and Leberre M (1975) lsquoTransmissionExperiments with Pike Fry (Esox-Lucius L) RhabdovirusrsquoJournal of Fish Biology 7 269ndash76
Bourhy H et al (2005) lsquoPhylogenetic Relationships AmongRhabdoviruses Inferred Using the L Polymerase Genersquo Journalof General Virology 86 2849ndash58
Capella-Gutierrez S Silla-Martinez J M and Gabaldon T(2009) lsquotrimAl A Tool for Automated Alignment Trimming inLarge-Scale Phylogenetic Analysesrsquo Bioinformatics 25 1972ndash3
Chiba S et al (2011) lsquoWidespread Endogenization of GenomeSequences of Non-Retroviral RNA Viruses into PlantGenomesrsquo Plos Pathogens 7 e1002146
Cummings M P Neel M C and Shaw K L (2008) lsquoAGenealogical Approach to Quantifying Lineage DivergencersquoEvolution 62 2411ndash22
Dietzgen R G and Kuzmin I V (2012) Rhabdoviruses MolecularTaxonomy Evolution Genomics Ecology Host-Vector InteractionsCytopathology and Control Norfolk UK Caister Academic Press
Dorson M et al (1987) lsquoSusceptibility of Pike (Esox-Lucius) toDifferent Salmonid Viruses (Ipn Vhs Ihn) and to the PerchRhabdovirusrsquo Bulletin Francais De La Peche Et De La Pisciculture307 91ndash101
Drummond A J et al (2006) lsquoRelaxed Phylogenetics and Datingwith Confidencersquo PLoS Biology 4 e88
mdashmdash et al (2012) lsquoBayesian Phylogenetics with BEAUti and theBEAST 17rsquo Molecular Biology and Evolution 29 1969ndash73
Dudas G and Obbard D J (2015) lsquoAre Arthropods at the Heart ofVirus Evolutionrsquo eLife 4 e06837
Edwards C J et al (2011) lsquoAncient Hybridization and An IrishOrigin for the Modern Polar Bear Matrilinersquo Current Biology 211251ndash8
Faria N R et al (2013) lsquoSimultaneously Reconstructing ViralCross-Species Transmission History and Identifying theUnderlying Constraintsrsquo Philosophical Transactions of the RoyalSociety of London B Biological Sciences 368 20120196
Fort P et al (2011) lsquoFossil Rhabdoviral Sequences Integrated intoArthropod Genomes Ontogeny Evolution and PotentialFunctionalityrsquo Molecular Biology and Evolution 29 381ndash90
Guindon S et al (2010) lsquoNew Algorithms and Methods toEstimate Maximum-Likelihood Phylogenies Assessing thePerformance of PhyML 30rsquo Systems Biology 59 307ndash21
Haenen O and Davidse A (1993) lsquoComparativePathogenicity of Two Strains of Pike Fry Rhabdovirus andSpring Viremia of Carp Virus for Young Roach CommonCarp Grass Carp and Rainbow Troutrsquo Diseases of AquaticOrganisms 15 87ndash92
Hampson K et al (2015) lsquoEstimating the Global Burden ofEndemic Canine Rabiesrsquo PLoS Neglected Tropical Diseases 9e0003709
Hogenhout S A Redinbaugh M G and Ammar E D (2003)lsquoPlant and Animal Rhabdovirus Host Range A Bugrsquos ViewrsquoTrends in Microbiology 11 264ndash71
Hommola K et al (2009) lsquoA Permutation Test of Host-ParasiteCospeciationrsquo Molecuar Biology and Evolution 26 1457ndash68
Hudson R R Slatkin M and Maddison W P (1992) lsquoEstimationof Levels of Gene Flow from DNA Sequence Datarsquo Genetics 132583ndash9
Jeyaprakash A and Hoy M A (2009) lsquoFirst Divergence TimeEstimate of Spiders Scorpions Mites and Ticks (SubphylumChelicerata) Inferred from Mitochondrial PhylogenyrsquoExperimental and Applied Acarology 47 1ndash18
Katoh K and Standley D M (2013) lsquoMAFFT Multiple SequenceAlignment Software Version 7 Improvements in Performanceand Usabilityrsquo Molecular Biology and Evolution 30 772ndash80
Katzourakis A and Gifford R J (2010) lsquoEndogenous ViralElements in Animal Genomesrsquo Plos Genetics 6 e1001191
LrsquoHeritier P H and Teissier G (1937) lsquoUne AnomaliePhysiologique Hereditaire Chez la Drosophilersquo Comptes Rendusde lrsquoAcademie des Sciences Paris 231 192ndash4
Le S Q and Gascuel O (2008) lsquoAn Improved General AminoAcid Replacement Matrixrsquo Molecular Biology and Evolution 251307ndash20
Lemey P et al (2014) lsquoUnifying Viral Genetics and HumanTransportation Data to Predict the Global TransmissionDynamics of Human Influenza H3N2rsquo PLoS Pathogens 10e1003932
Li C X et al (2015) lsquoUnprecedented Genomic Diversity of RNAViruses in Arthropods Reveals the Ancestry of Negative-SenseRNA Virusesrsquo eLife 107554eLife05378
Lipkin W I and Anthony S J (2015) lsquoVirus Huntingrsquo Virology479ndash480C 194ndash9
Liu S Vijayendran D and Bonning B C (2011) lsquoNextGeneration Sequencing Technologies for Insect VirusDiscoveryrsquo Viruses 3 1849ndash69
Longdon B and Jiggins F M (2012) lsquoVertically TransmittedViral Endosymbionts of Insects Do Sigma Viruses WalkAlonersquo Proceedings of the Biological Sciences 279 3889ndash98
mdashmdash and Walker P J (2011) ICTV sigmavirus species and genusproposal lthttpictvonlineorgproposals2011007a-dVAv2Sigmaviruspdfgt
mdashmdash Obbard D J and Jiggins F M (2010) lsquoSigma Viruses fromThree Species of Drosophila form a Major New Clade in theRhabdovirus Phylogenyrsquo Proceedings of the Royal Society B 27735ndash44
mdashmdash Wilfert L and Jiggins F M (2012) lsquoThe Sigma Viruses ofDrosophilarsquo in Dietzgen R and Kuzmin I (eds) RhabdovirusesMolecular Taxonomy Evolution Genomics Ecology Cytopathologyand Control pp 117-132 Norfolk UK Caister Academic Press
mdashmdash et al (2011a) lsquoHost Phylogeny Determines Viral Persistenceand Replication in Novel Hostsrsquo PLoS Pathogens 7 e1002260
mdashmdash et al (2011b) lsquoHost Switching by a Vertically-TransmittedRhabdovirus in Drosophilarsquo Biology Letters 7 747ndash50
mdashmdash et al (2011c) lsquoRhabdoviruses in Two Species ofDrosophila Vertical Transmission and a Recent SweeprsquoGenetics 188 141ndash50
mdashmdash et al (2014) lsquoThe Evolution and Genetics of Virus HostShiftsrsquo PLoS Pathogens 10 e1004395
mdashmdash et al (2015) lsquoThe Causes and Consequences of Changes inVirulence following Pathogen Host Shiftsrsquo PLoS Pathogens 11e1004728
Minin V N and Suchard M A (2008) lsquoCounting LabeledTransitions in Continuous-Time Markov Models of EvolutionrsquoJournal of Mathematical Biology 56 391ndash412
Misof B et al (2014) lsquoPhylogenomics Resolves the Timing andPattern of Insect Evolutionrsquo Science 346 763ndash7
Okland A L et al (2014) lsquoGenomic Characterization andPhylogenetic Position of Two New Species in RhabdoviridaeInfecting the Parasitic Copepod Salmon Louse (Lepeophtheirussalmonis)rsquo Plos One 9 e112517
Omland K E (1999) lsquoThe Assumptions and Challenges ofAncestral State Reconstructionsrsquo Systems Biology 48 604ndash11
Parker D J et al (2015) lsquoHow Consistent are the TranscriptomeChanges Associated with Cold Acclimation in Two Speciesof the Drosophila Virilis Grouprsquo Heredity (Edinburgh) 11513ndash21
Pfeilputzien C (1978) lsquoExperimental Transmission of SpringViremia of Carp Through Carp Lice (Argulus-Foliaceus)rsquoZentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary
B Longdon et al | 11
Medicine Series B-Infectious Diseases Immunology Food HygieneVeterinary Public Health 25 319ndash23
Rambaut A (2011) lsquoFigTreersquo (v13 ed) lthttptreebioedacuksoftwarefigtreegt
Rambaut A and Drummond A J (2007) Tracer v16 lthttpbeastbioedacukTracergt
Rosen L (1980) lsquoCarbon-dioxide Sensitivity in MosquitosInfected with Sigma Vesicular Stomatitis and OtherRhabdovirusesrsquo Science 207 989ndash91
Shroyer D A and Rosen L (1983) lsquoExtrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the MosquitoCulex-Quinquefasciatusrsquo Genetics 104 649ndash59
Streicker D G et al (2010) lsquoHost Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in BatsrsquoScience 329 676ndash9
Talavera G and Castresana J (2007) lsquoImprovement of PhylogeniesAfter Removing Divergent and Ambiguously Aligned Blocksfrom Protein Sequence Alignmentsrsquo Systems Biology 56 564ndash77
van Mierlo J T et al (2014) lsquoNovel Drosophila Viruses EncodeHost-specific Suppressors of RNAirsquo PLoS Pathogens 10 e1004256
Walker P J Blasdell K R and Joubert D A (2012)lsquoEphemeroviruses Athropod-borne Rhabdoviruses ofRuminants with Large Complex Genomesrsquo in Dietzgen R Gand Kuzmin I V (eds) Rhabdoviruses Molecular TaxonomyEvolution Genomics Ecology Host-Vector InteractionsCytopathology and Control pp 59ndash88 Norfolk UK CaisterAcademic Press
mdashmdash et al (2011) lsquoRhabdovirus Accessory Genesrsquo Virus Research162 110ndash25
mdashmdash et al (2015) lsquoEvolution of Genome Size and Complexity inthe Rhabdoviridaersquo PLoS Pathogens 11 e1004664
Wang Z Gerstein M and Snyder M (2009) lsquoRNA-Seq ARevolutionary Tool for Transcriptomicsrsquo Nature ReviewGenetics 10 57ndash63
Webster C L et al (2015) lsquoThe Discovery Distribution andEvolution of Viruses Associated with Drosophila melanogasterPLoS Biology 13 e1002210
Weinert L A et al (2012) lsquoMolecular Dating of Human-to-BovidHost Jumps by Staphylococcus aureus Reveals an Associationwith the Spread of Domesticationrsquo Biology Letters 8 829ndash32
12 | Virus Evolution 2015 Vol 1 No 1
reduce the likelihood of sampling biases associated with PCR, where people look for similar viruses in related hosts. Therefore, the pattern of viruses forming clades based on the host taxa they infect is likely to be robust. However, sampling is biased towards arthropods, and it is possible that there may be a great undiscovered diversity of rhabdoviruses in other organisms (Dudas and Obbard 2015).
Our conclusions are likely to be robust to biases in the data or limitations in the analysis. By reconstructing host associations using the Bayesian methods in the BEAST software (Drummond et al. 2012), we have avoided most of the simplifying assumptions of earlier methods (e.g. symmetric transition rate matrices, lack of uncertainty associated with estimates). Nonetheless, all such methods depend on there being some sort of ‘process homogeneity’ over the phylogeny (Omland 1999). Such analyses are, of course, limited by sampling; for example, if a past host is now extinct, it will never be reconstructed as an ancestral state. Nevertheless, previous studies have shown that the method is relatively robust to uneven sampling across hosts (Edwards et al. 2011). Furthermore, when we have viruses from under-sampled groups such as cnidarians, fungi and nematodes, they fall outside the main clades of viruses that we are analysing. The limitations of the approach are evident in our results: we were unable to reconstruct the host associations of the root or most basal nodes of the phylogeny. The reconstructions were, however, very successful within clades that were strongly associated with a single host, or clades where the less common hosts tend to form distinct subclades. As a result of this high level of phylogenetic structure, our approach was able to reconstruct the current host associations of many viruses for which we had incomplete knowledge of their host range. To check that this approach is reliable, we repeated the analysis on datasets where we deleted the information about which hosts well-characterized viruses infect. Our analysis was found to be robust, with 97% of reconstructions being accurate. The method only failed for strains with irregular host associations for their location in the phylogeny (i.e. recent changes in host on terminal branches), a limitation that would be expected for such an analysis.
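The logic of this masking check can be illustrated with a minimal sketch. It is not the BEAST analysis itself: the predictor below is a deliberately simple stand-in (a majority vote among the other members of the same clade), and the names `clade_majority_predictor`, `masked_accuracy`, the virus labels and the clade assignments are all hypothetical, chosen only to show how hiding a known host and re-predicting it yields an accuracy score.

```python
from collections import Counter
from typing import Callable, Dict, Optional

# A host map assigns each virus a host category, or None if the host is hidden.
HostMap = Dict[str, Optional[str]]
Predictor = Callable[[str, HostMap], Optional[str]]


def clade_majority_predictor(clade_of: Dict[str, str]) -> Predictor:
    """Stand-in predictor (an assumption, not the paper's method):
    guess the most common known host among other viruses in the same clade."""

    def predict(virus: str, hosts: HostMap) -> Optional[str]:
        votes = Counter(
            host
            for other, host in hosts.items()
            if other != virus
            and host is not None
            and clade_of[other] == clade_of[virus]
        )
        # No labelled relatives in the clade: no prediction can be made.
        return votes.most_common(1)[0][0] if votes else None

    return predict


def masked_accuracy(known_hosts: Dict[str, str], predict: Predictor) -> float:
    """Hide each known host in turn, re-predict it, and return the
    fraction of hidden hosts that were recovered correctly."""
    correct = 0
    for virus, true_host in known_hosts.items():
        masked: HostMap = dict(known_hosts)
        masked[virus] = None  # delete the host information for this virus
        if predict(virus, masked) == true_host:
            correct += 1
    return correct / len(known_hosts)


if __name__ == "__main__":
    # Hypothetical toy data: clade B contains one virus (v7) whose host
    # differs from its relatives, mimicking a recent host change on a tip.
    clade_of = {"v1": "A", "v2": "A", "v3": "A",
                "v4": "B", "v5": "B", "v6": "B", "v7": "B"}
    known_hosts = {"v1": "arthropod", "v2": "arthropod", "v3": "arthropod",
                   "v4": "vertebrate", "v5": "vertebrate",
                   "v6": "vertebrate", "v7": "arthropod"}
    accuracy = masked_accuracy(known_hosts, clade_majority_predictor(clade_of))
    print(f"Fraction of masked hosts recovered: {accuracy:.3f}")
```

In this toy example six of the seven masked hosts are recovered; the single failure is the virus whose host differs from that of its close relatives, the same kind of case in which the real analysis is expected to fail.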
Rhabdoviruses infect a diverse range of host species, including a large number of arthropods. Our search has unearthed a large number of novel rhabdovirus genomes, suggesting that we are only just beginning to uncover the diversity of these viruses. The host associations of these viruses have been highly conserved across their evolutionary history, which provides a powerful tool to identify previously unknown arboviruses. The large number of viruses being discovered through metagenomic studies (Aguiar et al. 2015; Webster et al. 2015; Li et al. 2015) means that in the future we will be faced with an increasingly large number of viral sequences with little knowledge of the biology of the virus. Our phylogenetic approach could be extended to predict key biological traits in other groups of pathogens where our knowledge is incomplete. However, there are limitations to this method, and the rapid evolution of RNA viruses may mean that some traits change too quickly to be inferred accurately. Therefore, such an approach should complement, and not replace, examining the basic biology of novel viruses.
Data availability
Data are available through Figshare. NCBI Sequence Read Archive data: SRP057824
Results from testing ancestral trait reconstruction predictions: http://dx.doi.org/10.6084/m9.figshare.1538584
L gene sequences (fasta): http://dx.doi.org/10.6084/m9.figshare.1425067
TrimAl alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425069
Gblocks alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425068
Phylogenetic tree, Gblocks alignment: http://dx.doi.org/10.6084/m9.figshare.1425083
Phylogenetic tree, TrimAl alignment: http://dx.doi.org/10.6084/m9.figshare.1425082
BEAST alignment (fasta): http://dx.doi.org/10.6084/m9.figshare.1425431
BEAUti xml file: http://dx.doi.org/10.6084/m9.figshare.1431922
Bayesian analysis tree: http://dx.doi.org/10.6084/m9.figshare.1425436
Funding
B.L. and F.M.J. are supported by a NERC grant (NE/L004232/1), a European Research Council grant (281668, DrosophilaInfection) and a Junior Research Fellowship from Christ’s College, Cambridge (B.L.). G.G.R.M. is supported by an MRC studentship. The metagenomic sequencing of viruses from D. immigrans, D. tristis and S. deflexa was supported by a Wellcome Trust fellowship (WT085064) to D.J.O.
Supplementary data
Supplementary data are available at Virus Evolution online.
Acknowledgements
Many thanks to Mike Ritchie for providing the DMonSV-infected fly line, Casper Breuker and Melanie Gibbs for PAegRV samples, and Philip Leftwich for CCapSV samples. Thanks to everyone who provided fly collections. Thanks to two reviewers and Oliver Pybus for useful comments.
Conflict of interest: None declared.
References
Aguiar, E. R. et al. (2015) ‘Sequence-Independent Characterization of Viruses Based on the Pattern of Viral Small RNAs Produced by the Host’, Nucleic Acids Research, 43: 6191–206.
Ahne, W. et al. (2002) ‘Spring Viremia of Carp (SVC)’, Diseases of Aquatic Organisms, 52: 261–72.
Aiewsakun, P. and Katzourakis, A. (2015) ‘Endogenous Viruses: Connecting Recent and Ancient Viral Evolution’, Virology, 479–480C: 26–37.
Anisimova, M. and Gascuel, O. (2006) ‘Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative’, Systematic Biology, 55: 539–52.
Aznar-Lopez, C. et al. (2013) ‘Detection of Rhabdovirus Viral RNA in Oropharyngeal Swabs and Ectoparasites of Spanish Bats’, Journal of General Virology, 94: 69–75.
Ballinger, M. J., Bruenn, J. A. and Taylor, D. J. (2012) ‘Phylogeny, Integration and Expression of Sigma Virus-like Genes in Drosophila’, Molecular Phylogenetics and Evolution, 65: 251–8.
Bazinet, A., Myers, D. and Khatavkar, P. (2009) genealogicalSorting v0.91.
Bhatia, G. et al. (2013) ‘Estimating and Interpreting FST: The Impact of Rare Variants’, Genome Research, 23: 1514–21.
Bootsma, R., Dekinkelin, P. and Leberre, M. (1975) ‘Transmission Experiments with Pike Fry (Esox-Lucius L.) Rhabdovirus’, Journal of Fish Biology, 7: 269–76.
Bourhy, H. et al. (2005) ‘Phylogenetic Relationships Among Rhabdoviruses Inferred Using the L Polymerase Gene’, Journal of General Virology, 86: 2849–58.
Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009) ‘trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses’, Bioinformatics, 25: 1972–3.
Chiba, S. et al. (2011) ‘Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes’, PLoS Pathogens, 7: e1002146.
Cummings, M. P., Neel, M. C. and Shaw, K. L. (2008) ‘A Genealogical Approach to Quantifying Lineage Divergence’, Evolution, 62: 2411–22.
Dietzgen, R. G. and Kuzmin, I. V. (2012) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control. Norfolk, UK: Caister Academic Press.
Dorson, M. et al. (1987) ‘Susceptibility of Pike (Esox-Lucius) to Different Salmonid Viruses (Ipn, Vhs, Ihn) and to the Perch Rhabdovirus’, Bulletin Francais De La Peche Et De La Pisciculture, 307: 91–101.
Drummond, A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.
Drummond, A. J. et al. (2012) ‘Bayesian Phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29: 1969–73.
Dudas, G. and Obbard, D. J. (2015) ‘Are Arthropods at the Heart of Virus Evolution?’, eLife, 4: e06837.
Edwards, C. J. et al. (2011) ‘Ancient Hybridization and An Irish Origin for the Modern Polar Bear Matriline’, Current Biology, 21: 1251–8.
Faria, N. R. et al. (2013) ‘Simultaneously Reconstructing Viral Cross-Species Transmission History and Identifying the Underlying Constraints’, Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368: 20120196.
Fort, P. et al. (2011) ‘Fossil Rhabdoviral Sequences Integrated into Arthropod Genomes: Ontogeny, Evolution, and Potential Functionality’, Molecular Biology and Evolution, 29: 381–90.
Guindon, S. et al. (2010) ‘New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0’, Systematic Biology, 59: 307–21.
Haenen, O. and Davidse, A. (1993) ‘Comparative Pathogenicity of Two Strains of Pike Fry Rhabdovirus and Spring Viremia of Carp Virus for Young Roach, Common Carp, Grass Carp and Rainbow Trout’, Diseases of Aquatic Organisms, 15: 87–92.
Hampson, K. et al. (2015) ‘Estimating the Global Burden of Endemic Canine Rabies’, PLoS Neglected Tropical Diseases, 9: e0003709.
Hogenhout, S. A., Redinbaugh, M. G. and Ammar, E. D. (2003) ‘Plant and Animal Rhabdovirus Host Range: A Bug’s View’, Trends in Microbiology, 11: 264–71.
Hommola, K. et al. (2009) ‘A Permutation Test of Host-Parasite Cospeciation’, Molecular Biology and Evolution, 26: 1457–68.
Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992) ‘Estimation of Levels of Gene Flow from DNA Sequence Data’, Genetics, 132: 583–9.
Jeyaprakash, A. and Hoy, M. A. (2009) ‘First Divergence Time Estimate of Spiders, Scorpions, Mites and Ticks (Subphylum: Chelicerata) Inferred from Mitochondrial Phylogeny’, Experimental and Applied Acarology, 47: 1–18.
Katoh, K. and Standley, D. M. (2013) ‘MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability’, Molecular Biology and Evolution, 30: 772–80.
Katzourakis, A. and Gifford, R. J. (2010) ‘Endogenous Viral Elements in Animal Genomes’, PLoS Genetics, 6: e1001191.
L’Heritier, P. H. and Teissier, G. (1937) ‘Une Anomalie Physiologique Hereditaire Chez la Drosophile’, Comptes Rendus de l’Academie des Sciences Paris, 231: 192–4.
Le, S. Q. and Gascuel, O. (2008) ‘An Improved General Amino Acid Replacement Matrix’, Molecular Biology and Evolution, 25: 1307–20.
Lemey, P. et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.
Li, C. X. et al. (2015) ‘Unprecedented Genomic Diversity of RNA Viruses in Arthropods Reveals the Ancestry of Negative-Sense RNA Viruses’, eLife, 10.7554/eLife.05378.
Lipkin, W. I. and Anthony, S. J. (2015) ‘Virus Hunting’, Virology, 479–480C: 194–9.
Liu, S., Vijayendran, D. and Bonning, B. C. (2011) ‘Next Generation Sequencing Technologies for Insect Virus Discovery’, Viruses, 3: 1849–69.
Longdon, B. and Jiggins, F. M. (2012) ‘Vertically Transmitted Viral Endosymbionts of Insects: Do Sigma Viruses Walk Alone?’, Proceedings of the Royal Society B: Biological Sciences, 279: 3889–98.
Longdon, B. and Walker, P. J. (2011) ICTV sigmavirus species and genus proposal. <http://ictvonline.org/proposals/2011.007a-dV.A.v2.Sigmavirus.pdf>
Longdon, B., Obbard, D. J. and Jiggins, F. M. (2010) ‘Sigma Viruses from Three Species of Drosophila form a Major New Clade in the Rhabdovirus Phylogeny’, Proceedings of the Royal Society B, 277: 35–44.
Longdon, B., Wilfert, L. and Jiggins, F. M. (2012) ‘The Sigma Viruses of Drosophila’, in Dietzgen, R. and Kuzmin, I. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Cytopathology and Control, pp. 117–132. Norfolk, UK: Caister Academic Press.
Longdon, B. et al. (2011a) ‘Host Phylogeny Determines Viral Persistence and Replication in Novel Hosts’, PLoS Pathogens, 7: e1002260.
Longdon, B. et al. (2011b) ‘Host Switching by a Vertically-Transmitted Rhabdovirus in Drosophila’, Biology Letters, 7: 747–50.
Longdon, B. et al. (2011c) ‘Rhabdoviruses in Two Species of Drosophila: Vertical Transmission and a Recent Sweep’, Genetics, 188: 141–50.
Longdon, B. et al. (2014) ‘The Evolution and Genetics of Virus Host Shifts’, PLoS Pathogens, 10: e1004395.
Longdon, B. et al. (2015) ‘The Causes and Consequences of Changes in Virulence following Pathogen Host Shifts’, PLoS Pathogens, 11: e1004728.
Minin, V. N. and Suchard, M. A. (2008) ‘Counting Labeled Transitions in Continuous-Time Markov Models of Evolution’, Journal of Mathematical Biology, 56: 391–412.
Misof, B. et al. (2014) ‘Phylogenomics Resolves the Timing and Pattern of Insect Evolution’, Science, 346: 763–7.
Okland, A. L. et al. (2014) ‘Genomic Characterization and Phylogenetic Position of Two New Species in Rhabdoviridae Infecting the Parasitic Copepod, Salmon Louse (Lepeophtheirus salmonis)’, PLoS One, 9: e112517.
Omland, K. E. (1999) ‘The Assumptions and Challenges of Ancestral State Reconstructions’, Systematic Biology, 48: 604–11.
Parker, D. J. et al. (2015) ‘How Consistent are the Transcriptome Changes Associated with Cold Acclimation in Two Species of the Drosophila virilis Group?’, Heredity (Edinburgh), 115: 13–21.
Pfeilputzien, C. (1978) ‘Experimental Transmission of Spring Viremia of Carp Through Carp Lice (Argulus-Foliaceus)’, Zentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary Medicine Series B-Infectious Diseases Immunology Food Hygiene Veterinary Public Health, 25: 319–23.
Rambaut, A. (2011) ‘FigTree’ (v1.3 ed.). <http://tree.bio.ed.ac.uk/software/figtree/>
Rambaut, A. and Drummond, A. J. (2007) Tracer v1.6. <http://beast.bio.ed.ac.uk/Tracer>
Rosen, L. (1980) ‘Carbon-dioxide Sensitivity in Mosquitos Infected with Sigma, Vesicular Stomatitis, and Other Rhabdoviruses’, Science, 207: 989–91.
Shroyer, D. A. and Rosen, L. (1983) ‘Extrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the Mosquito, Culex-Quinquefasciatus’, Genetics, 104: 649–59.
Streicker, D. G. et al. (2010) ‘Host Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in Bats’, Science, 329: 676–9.
Talavera, G. and Castresana, J. (2007) ‘Improvement of Phylogenies After Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments’, Systematic Biology, 56: 564–77.
van Mierlo, J. T. et al. (2014) ‘Novel Drosophila Viruses Encode Host-specific Suppressors of RNAi’, PLoS Pathogens, 10: e1004256.
Walker, P. J., Blasdell, K. R. and Joubert, D. A. (2012) ‘Ephemeroviruses: Arthropod-borne Rhabdoviruses of Ruminants with Large, Complex Genomes’, in Dietzgen, R. G. and Kuzmin, I. V. (eds) Rhabdoviruses: Molecular Taxonomy, Evolution, Genomics, Ecology, Host-Vector Interactions, Cytopathology and Control, pp. 59–88. Norfolk, UK: Caister Academic Press.
Walker, P. J. et al. (2011) ‘Rhabdovirus Accessory Genes’, Virus Research, 162: 110–25.
Walker, P. J. et al. (2015) ‘Evolution of Genome Size and Complexity in the Rhabdoviridae’, PLoS Pathogens, 11: e1004664.
Wang, Z., Gerstein, M. and Snyder, M. (2009) ‘RNA-Seq: A Revolutionary Tool for Transcriptomics’, Nature Reviews Genetics, 10: 57–63.
Webster, C. L. et al. (2015) ‘The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster’, PLoS Biology, 13: e1002210.
Weinert, L. A. et al. (2012) ‘Molecular Dating of Human-to-Bovid Host Jumps by Staphylococcus aureus Reveals an Association with the Spread of Domestication’, Biology Letters, 8: 829–32.
12 | Virus Evolution 2015 Vol 1 No 1
Bootsma R Dekinkelin P and Leberre M (1975) lsquoTransmissionExperiments with Pike Fry (Esox-Lucius L) RhabdovirusrsquoJournal of Fish Biology 7 269ndash76
Bourhy H et al (2005) lsquoPhylogenetic Relationships AmongRhabdoviruses Inferred Using the L Polymerase Genersquo Journalof General Virology 86 2849ndash58
Capella-Gutierrez S Silla-Martinez J M and Gabaldon T(2009) lsquotrimAl A Tool for Automated Alignment Trimming inLarge-Scale Phylogenetic Analysesrsquo Bioinformatics 25 1972ndash3
Chiba S et al (2011) lsquoWidespread Endogenization of GenomeSequences of Non-Retroviral RNA Viruses into PlantGenomesrsquo Plos Pathogens 7 e1002146
Cummings M P Neel M C and Shaw K L (2008) lsquoAGenealogical Approach to Quantifying Lineage DivergencersquoEvolution 62 2411ndash22
Dietzgen R G and Kuzmin I V (2012) Rhabdoviruses MolecularTaxonomy Evolution Genomics Ecology Host-Vector InteractionsCytopathology and Control Norfolk UK Caister Academic Press
Dorson M et al (1987) lsquoSusceptibility of Pike (Esox-Lucius) toDifferent Salmonid Viruses (Ipn Vhs Ihn) and to the PerchRhabdovirusrsquo Bulletin Francais De La Peche Et De La Pisciculture307 91ndash101
Drummond A J et al (2006) lsquoRelaxed Phylogenetics and Datingwith Confidencersquo PLoS Biology 4 e88
mdashmdash et al (2012) lsquoBayesian Phylogenetics with BEAUti and theBEAST 17rsquo Molecular Biology and Evolution 29 1969ndash73
Dudas G and Obbard D J (2015) lsquoAre Arthropods at the Heart ofVirus Evolutionrsquo eLife 4 e06837
Edwards C J et al (2011) lsquoAncient Hybridization and An IrishOrigin for the Modern Polar Bear Matrilinersquo Current Biology 211251ndash8
Faria N R et al (2013) lsquoSimultaneously Reconstructing ViralCross-Species Transmission History and Identifying theUnderlying Constraintsrsquo Philosophical Transactions of the RoyalSociety of London B Biological Sciences 368 20120196
Fort P et al (2011) lsquoFossil Rhabdoviral Sequences Integrated intoArthropod Genomes Ontogeny Evolution and PotentialFunctionalityrsquo Molecular Biology and Evolution 29 381ndash90
Guindon S et al (2010) lsquoNew Algorithms and Methods toEstimate Maximum-Likelihood Phylogenies Assessing thePerformance of PhyML 30rsquo Systems Biology 59 307ndash21
Haenen O and Davidse A (1993) lsquoComparativePathogenicity of Two Strains of Pike Fry Rhabdovirus andSpring Viremia of Carp Virus for Young Roach CommonCarp Grass Carp and Rainbow Troutrsquo Diseases of AquaticOrganisms 15 87ndash92
Hampson K et al (2015) lsquoEstimating the Global Burden ofEndemic Canine Rabiesrsquo PLoS Neglected Tropical Diseases 9e0003709
Hogenhout S A Redinbaugh M G and Ammar E D (2003)lsquoPlant and Animal Rhabdovirus Host Range A Bugrsquos ViewrsquoTrends in Microbiology 11 264ndash71
Hommola K et al (2009) lsquoA Permutation Test of Host-ParasiteCospeciationrsquo Molecuar Biology and Evolution 26 1457ndash68
Hudson R R Slatkin M and Maddison W P (1992) lsquoEstimationof Levels of Gene Flow from DNA Sequence Datarsquo Genetics 132583ndash9
Jeyaprakash A and Hoy M A (2009) lsquoFirst Divergence TimeEstimate of Spiders Scorpions Mites and Ticks (SubphylumChelicerata) Inferred from Mitochondrial PhylogenyrsquoExperimental and Applied Acarology 47 1ndash18
Katoh K and Standley D M (2013) lsquoMAFFT Multiple SequenceAlignment Software Version 7 Improvements in Performanceand Usabilityrsquo Molecular Biology and Evolution 30 772ndash80
Katzourakis A and Gifford R J (2010) lsquoEndogenous ViralElements in Animal Genomesrsquo Plos Genetics 6 e1001191
LrsquoHeritier P H and Teissier G (1937) lsquoUne AnomaliePhysiologique Hereditaire Chez la Drosophilersquo Comptes Rendusde lrsquoAcademie des Sciences Paris 231 192ndash4
Le S Q and Gascuel O (2008) lsquoAn Improved General AminoAcid Replacement Matrixrsquo Molecular Biology and Evolution 251307ndash20
Lemey P et al (2014) lsquoUnifying Viral Genetics and HumanTransportation Data to Predict the Global TransmissionDynamics of Human Influenza H3N2rsquo PLoS Pathogens 10e1003932
Li C X et al (2015) lsquoUnprecedented Genomic Diversity of RNAViruses in Arthropods Reveals the Ancestry of Negative-SenseRNA Virusesrsquo eLife 107554eLife05378
Lipkin W I and Anthony S J (2015) lsquoVirus Huntingrsquo Virology479ndash480C 194ndash9
Liu S Vijayendran D and Bonning B C (2011) lsquoNextGeneration Sequencing Technologies for Insect VirusDiscoveryrsquo Viruses 3 1849ndash69
Longdon B and Jiggins F M (2012) lsquoVertically TransmittedViral Endosymbionts of Insects Do Sigma Viruses WalkAlonersquo Proceedings of the Biological Sciences 279 3889ndash98
mdashmdash and Walker P J (2011) ICTV sigmavirus species and genusproposal lthttpictvonlineorgproposals2011007a-dVAv2Sigmaviruspdfgt
mdashmdash Obbard D J and Jiggins F M (2010) lsquoSigma Viruses fromThree Species of Drosophila form a Major New Clade in theRhabdovirus Phylogenyrsquo Proceedings of the Royal Society B 27735ndash44
mdashmdash Wilfert L and Jiggins F M (2012) lsquoThe Sigma Viruses ofDrosophilarsquo in Dietzgen R and Kuzmin I (eds) RhabdovirusesMolecular Taxonomy Evolution Genomics Ecology Cytopathologyand Control pp 117-132 Norfolk UK Caister Academic Press
mdashmdash et al (2011a) lsquoHost Phylogeny Determines Viral Persistenceand Replication in Novel Hostsrsquo PLoS Pathogens 7 e1002260
mdashmdash et al (2011b) lsquoHost Switching by a Vertically-TransmittedRhabdovirus in Drosophilarsquo Biology Letters 7 747ndash50
mdashmdash et al (2011c) lsquoRhabdoviruses in Two Species ofDrosophila Vertical Transmission and a Recent SweeprsquoGenetics 188 141ndash50
mdashmdash et al (2014) lsquoThe Evolution and Genetics of Virus HostShiftsrsquo PLoS Pathogens 10 e1004395
mdashmdash et al (2015) lsquoThe Causes and Consequences of Changes inVirulence following Pathogen Host Shiftsrsquo PLoS Pathogens 11e1004728
Minin V N and Suchard M A (2008) lsquoCounting LabeledTransitions in Continuous-Time Markov Models of EvolutionrsquoJournal of Mathematical Biology 56 391ndash412
Misof B et al (2014) lsquoPhylogenomics Resolves the Timing andPattern of Insect Evolutionrsquo Science 346 763ndash7
Okland A L et al (2014) lsquoGenomic Characterization andPhylogenetic Position of Two New Species in RhabdoviridaeInfecting the Parasitic Copepod Salmon Louse (Lepeophtheirussalmonis)rsquo Plos One 9 e112517
Omland K E (1999) lsquoThe Assumptions and Challenges ofAncestral State Reconstructionsrsquo Systems Biology 48 604ndash11
Parker D J et al (2015) lsquoHow Consistent are the TranscriptomeChanges Associated with Cold Acclimation in Two Speciesof the Drosophila Virilis Grouprsquo Heredity (Edinburgh) 11513ndash21
Pfeilputzien C (1978) lsquoExperimental Transmission of SpringViremia of Carp Through Carp Lice (Argulus-Foliaceus)rsquoZentralblatt Fur Veterinarmedizin Reihe B-Journal of Veterinary
B Longdon et al | 11
Medicine Series B-Infectious Diseases Immunology Food HygieneVeterinary Public Health 25 319ndash23
Rambaut A (2011) lsquoFigTreersquo (v13 ed) lthttptreebioedacuksoftwarefigtreegt
Rambaut A and Drummond A J (2007) Tracer v16 lthttpbeastbioedacukTracergt
Rosen L (1980) lsquoCarbon-dioxide Sensitivity in MosquitosInfected with Sigma Vesicular Stomatitis and OtherRhabdovirusesrsquo Science 207 989ndash91
Shroyer D A and Rosen L (1983) lsquoExtrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the MosquitoCulex-Quinquefasciatusrsquo Genetics 104 649ndash59
Streicker D G et al (2010) lsquoHost Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in BatsrsquoScience 329 676ndash9
Talavera G and Castresana J (2007) lsquoImprovement of PhylogeniesAfter Removing Divergent and Ambiguously Aligned Blocksfrom Protein Sequence Alignmentsrsquo Systems Biology 56 564ndash77
van Mierlo J T et al (2014) lsquoNovel Drosophila Viruses EncodeHost-specific Suppressors of RNAirsquo PLoS Pathogens 10 e1004256
Walker P J Blasdell K R and Joubert D A (2012)lsquoEphemeroviruses Athropod-borne Rhabdoviruses ofRuminants with Large Complex Genomesrsquo in Dietzgen R Gand Kuzmin I V (eds) Rhabdoviruses Molecular TaxonomyEvolution Genomics Ecology Host-Vector InteractionsCytopathology and Control pp 59ndash88 Norfolk UK CaisterAcademic Press
mdashmdash et al (2011) lsquoRhabdovirus Accessory Genesrsquo Virus Research162 110ndash25
mdashmdash et al (2015) lsquoEvolution of Genome Size and Complexity inthe Rhabdoviridaersquo PLoS Pathogens 11 e1004664
Wang Z Gerstein M and Snyder M (2009) lsquoRNA-Seq ARevolutionary Tool for Transcriptomicsrsquo Nature ReviewGenetics 10 57ndash63
Webster C L et al (2015) lsquoThe Discovery Distribution andEvolution of Viruses Associated with Drosophila melanogasterPLoS Biology 13 e1002210
Weinert L A et al (2012) lsquoMolecular Dating of Human-to-BovidHost Jumps by Staphylococcus aureus Reveals an Associationwith the Spread of Domesticationrsquo Biology Letters 8 829ndash32
12 | Virus Evolution 2015 Vol 1 No 1
Medicine Series B-Infectious Diseases Immunology Food HygieneVeterinary Public Health 25 319ndash23
Rambaut A (2011) lsquoFigTreersquo (v13 ed) lthttptreebioedacuksoftwarefigtreegt
Rambaut A and Drummond A J (2007) Tracer v16 lthttpbeastbioedacukTracergt
Rosen L (1980) lsquoCarbon-dioxide Sensitivity in MosquitosInfected with Sigma Vesicular Stomatitis and OtherRhabdovirusesrsquo Science 207 989ndash91
Shroyer D A and Rosen L (1983) lsquoExtrachromosomal-Inheritance of Carbon-dioxide Sensitivity in the MosquitoCulex-Quinquefasciatusrsquo Genetics 104 649ndash59
Streicker D G et al (2010) lsquoHost Phylogeny Constrains Cross-Species Emergence and Establishment of Rabies Virus in BatsrsquoScience 329 676ndash9
Talavera G and Castresana J (2007) lsquoImprovement of PhylogeniesAfter Removing Divergent and Ambiguously Aligned Blocksfrom Protein Sequence Alignmentsrsquo Systems Biology 56 564ndash77
van Mierlo J T et al (2014) lsquoNovel Drosophila Viruses EncodeHost-specific Suppressors of RNAirsquo PLoS Pathogens 10 e1004256
Walker P J Blasdell K R and Joubert D A (2012)lsquoEphemeroviruses Athropod-borne Rhabdoviruses ofRuminants with Large Complex Genomesrsquo in Dietzgen R Gand Kuzmin I V (eds) Rhabdoviruses Molecular TaxonomyEvolution Genomics Ecology Host-Vector InteractionsCytopathology and Control pp 59ndash88 Norfolk UK CaisterAcademic Press
mdashmdash et al (2011) lsquoRhabdovirus Accessory Genesrsquo Virus Research162 110ndash25
mdashmdash et al (2015) lsquoEvolution of Genome Size and Complexity inthe Rhabdoviridaersquo PLoS Pathogens 11 e1004664
Wang Z Gerstein M and Snyder M (2009) lsquoRNA-Seq ARevolutionary Tool for Transcriptomicsrsquo Nature ReviewGenetics 10 57ndash63
Webster C L et al (2015) lsquoThe Discovery Distribution andEvolution of Viruses Associated with Drosophila melanogasterPLoS Biology 13 e1002210
Weinert L A et al (2012) lsquoMolecular Dating of Human-to-BovidHost Jumps by Staphylococcus aureus Reveals an Associationwith the Spread of Domesticationrsquo Biology Letters 8 829ndash32
12 | Virus Evolution 2015 Vol 1 No 1