INVESTIGATION

Pervasive Recombination and Sympatric Genome Diversification Driven by Frequency-Dependent Selection in Borrelia burgdorferi, the Lyme Disease Bacterium

James Haven,*,1 Levy C. Vargas,† Emmanuel F. Mongodin,‡ Vincent Xue,§ Yozen Hernandez,† Pedro Pagan,† Claire M. Fraser-Liggett,‡ Steven E. Schutzer,** Benjamin J. Luft,†† Sherwood R. Casjens,‡‡ and Wei-Gang Qiu*,†,§§,2

*Department of Biology, The Graduate Center, City University of New York, New York, New York 10016, †Department of Biological Sciences and The Center for Gene Structure and Function and §Department of Computer Science, Hunter College, City University of New York, New York, New York 10065, ‡Institute for Genome Sciences, University of Maryland BioPark, Baltimore, Maryland 21201, **Department of Medicine, University of Medicine and Dentistry of New Jersey–New Jersey Medical School, Newark, New Jersey 07103, ††Department of Medicine, Health Science Center, Stony Brook University, Stony Brook, New York 11794, ‡‡Department of Pathology, Division of Molecular Cell Biology and Immunology, University of Utah School of Medicine, Salt Lake City, Utah 84112, and §§National Evolutionary Synthesis Center, Durham, North Carolina 27705

ABSTRACT How genomic diversity within bacterial populations originates and is maintained in the presence of frequent recombination is a central problem in understanding bacterial evolution. Natural populations of Borrelia burgdorferi, the bacterial agent of Lyme disease, consist of diverse genomic groups co-infecting single individual vertebrate hosts and tick vectors. To understand mechanisms of sympatric genome differentiation in B. burgdorferi, we sequenced and compared 23 genomes representing major genomic groups in North America and Europe. Linkage analysis of >13,500 single-nucleotide polymorphisms revealed pervasive horizontal DNA exchanges. Although three times more frequent than point mutation, recombination is localized and weakly affects genome-wide linkage disequilibrium. We show by computer simulations that, while enhancing population fitness, recombination constrains neutral and adaptive divergence among sympatric genomes through periodic selective sweeps. In contrast, simulations of frequency-dependent selection with recombination produced the observed pattern of a large number of sympatric genomic groups associated with major sequence variations at the selected locus. We conclude that negative frequency-dependent selection targeting a small number of surface-antigen loci (ospC in particular) sufficiently explains the maintenance of sympatric genome diversity in B. burgdorferi without adaptive divergence. We suggest that pervasive recombination makes it less likely for local B. burgdorferi genomic groups to achieve host specialization. B. burgdorferi genomic groups in the northeastern United States are thus best viewed as constituting a single bacterial species, whose generalist nature is a key to its rapid spread and human virulence.

Copyright © 2011 by the Genetics Society of America

doi: 10.1534/genetics.111.130773

Manuscript received May 25, 2011; accepted for publication August 18, 2011

Supporting information is available online at http://www.genetics.org/content/suppl/2011/09/02/genetics.111.130773.DC1.

Sequence data from this article have been deposited in the NCBI BioProject database under accession nos. PRJNA3, PRJNA28633, PRJNA19839, PRJNA29359, PRJNA28629, PRJNA29357, PRJNA21003, PRJNA19835, PRJNA28627, PRJNA21001, PRJNA29361, PRJNA28621, PRJNA19837, PRJNA20999, PRJNA28631, PRJNA29363, PRJNA17057, PRJNA19841, PRJNA12554, PRJNA28625, PRJNA29573, PRJNA19843, and PRJNA28635.

1 Present address: Odum School of Ecology, University of Georgia, Athens, GA 30602.

2 Corresponding author: Department of Biological Sciences, Hunter College of the City University of New York, 695 Park Ave., New York, NY 10065. E-mail: weigang@genectr.hunter.cuny.edu

Genetics, Vol. 189, 951–966, November 2011

GENETIC discontinuity, the basis of biodiversity, is ubiquitous in prokaryotes as well as in eukaryotes. Most bacterial populations display a highly clonal genetic structure, in which the observable number of multilocus genotypes is far fewer than the number expected under the assumption of free recombination (Maynard Smith et al. 1993). Bacterial clonality was originally thought of as a result of a lack or rarity of recombination among asexually reproducing and independently evolving clones (Ochman and Selander 1984). Since then, molecular surveys of natural bacterial populations using protein electrophoresis, multilocus sequence typing (MLST), and whole-genome sequencing have revealed that horizontal genetic exchange is in fact often more frequent than point mutation in bacteria, including species known as strongly clonal (Maynard Smith et al. 1993; Feil and Spratt 2001; Didelot and Maiden 2010; Retchless and Lawrence 2010). A paradox thereby arises as to how distinct clonal groups may arise and be maintained in local bacterial populations in the absence of intrinsic gene-flow barriers akin to reproductive isolation between eukaryotic species. A widely held hypothesis is adaptive specialization, in which distinct genotypes coexisting within local bacterial populations are thought to represent ecologically differentiated subpopulations or “ecotypes” (Cohan 2002; Koeppel et al. 2008). In the ecotype model, though, natural selection needs to be persistent and strong enough to overcome the homogenizing effects of genetic exchange (Doolittle and Papke 2006; Fraser et al. 2007; Lawrence and Retchless 2010). Frequency-dependent selection (FDS) is a form of balancing selection capable of maintaining genetic diversity at an antigenic locus in a pathogen without the assumption of differentially adapted alleles (Levin 1988; Takahata and Nei 1990; Wiener 1996). To date, however, genome-wide consequences of FDS are largely unknown, and FDS as a possible common cause of sympatric genome diversity in bacteria has not received nearly as much attention as the ecotype model.

Here we use comparative population genomics and computer simulation to understand the origin and maintenance of high local genomic diversity in natural populations of Borrelia burgdorferi sensu lato (“B. burgdorferi s.l.” hereafter), a bacterial species complex that includes Lyme disease agents. In many ways B. burgdorferi s.l. represents an ideal system for studying the processes of bacterial divergence in nature. First, B. burgdorferi s.l. shows a strong biogeographic structure as a result of being an obligate and tick-borne parasite of vertebrate hosts (Kurtenbach et al. 2006; Hoen et al. 2009). Local B. burgdorferi s.l. populations, especially those in the northeastern and midwestern United States, appear to have single recent origins so that possibilities such as constant migration maintaining local genetic diversities could be excluded (Qiu et al. 2008; Brisson et al. 2010). Second, local B. burgdorferi s.l. populations are highly diverse, consisting of a large number of distinct genomic groups coexisting in a strictly sympatric fashion, often co-infecting a single individual vertebrate host or tick vector (Wang et al. 1999b; Qiu et al. 2002; Brisson and Dykhuizen 2004; Bunikis et al. 2004). Third, the population structure of B. burgdorferi s.l. is representative of bacterial species in that it is neither strictly clonal nor freely recombining (Qiu et al. 2004; Travinsky et al. 2010).

The selective mechanisms underlying the origin and maintenance of local genomic diversity in B. burgdorferi s.l. remain unclear. Initial surveys of genetic diversity at loci under strong immune-escape selection revealed evenly distributed allele frequencies consistent with frequency-dependent selection (Qiu et al. 1997, 2002). An alternative, although not necessarily mutually exclusive, hypothesis is that distinct clonal groups are maintained by host specialization (Brisson and Dykhuizen 2004, 2006). Here, we compare newly and previously sequenced genomes to investigate how sympatric genome diversification in B. burgdorferi s.l. might have originated and be maintained under the joint influences of recombination and natural selection. We first estimated recombination rates on the core, most conserved parts of the B. burgdorferi s.l. genome, which revealed pervasive localized recombination throughout the genome. We then tested the compatibility of this observation with various forms of natural selection using computer simulations, which showed that frequency-dependent selection acting on a few antigenic loci is capable of maintaining distinct genomic groups in local B. burgdorferi s.l. populations despite frequent and pervasive recombination. In contrast, frequent recombination makes it less likely that these genomic groups are maintained by host specialization.

Materials and Methods

Strain source, genome sequencing, and ortholog identification

We compared 23 completed and draft B. burgdorferi s.l. genomes and focused on the comparison of 14 conspecific genomes of the genospecies B. burgdorferi sensu stricto (B. burgdorferi s.s.) (Table 1) (Fraser et al. 1997; Casjens et al. 2000, 2011a, 2011b; Glockner et al. 2004, 2006; Schutzer et al. 2011). Ten of the B. burgdorferi s.s. genomes were from sympatric isolates from the northeastern United States (Table 1). The new genomes were sequenced by the following procedures. Briefly, genomic DNA was obtained from 10 ml of low-passage, log-phase cultures as described previously (Qiu et al. 2008). Genomes were sequenced to an estimated eightfold coverage by the random shotgun method followed by the Sanger DNA sequencing method, as described in Nelson et al. (2004). For plasmid closure, one small insert plasmid library (2–3 kb) and one medium insert plasmid library (7–8 kb) were constructed for each strain and sequenced to approximately fivefold and threefold coverages, respectively. The sequences from each strain were assembled using a combination of the TIGR Assembler (Sutton et al. 1995) and the Celera Assembler (Myers et al. 2000).

Replicon identities of contigs were determined by comparisons with the completed genomes (B31, JD1, N40, and 297), using NUCMER (Delcher et al. 2002). ORFs on each contig were identified by using GLIMMER (Delcher et al. 1999). Orthologous ORFs were identified by clustering them into homologous protein families using all-against-all BLASTp (Altschul et al. 1997) followed by MCL (Enright et al. 2002). Orthologs were distinguished from paralogs on the basis of gene order, using customized synteny maps. Protein sequence alignments were constructed using ClustalW (Larkin et al. 2007). Codon alignments were derived from protein alignments using customized PERL scripts based on BioPerl (Stajich et al. 2002). Whole-plasmid sequence alignments were obtained by using ClustalW-MPI (Li 2003).


Linkage analysis

We identified SNPs segregating among conspecific B. burgdorferi s.s. strains using LDhat (version 2.1) and used LDhat again to estimate recombination rates between all pairs of SNP sites (McVean et al. 2002). In LDhat the recombination rate is estimated on the basis of the “four-gamete test” of recombination (Hudson and Kaplan 1985). SNP pairs showing scores of >2.0 or <2.0 in a likelihood test (implemented in LDhat) have significantly high and low recombination rates, respectively, relative to the average rate of all SNP pairs. Results of linkage analysis were formatted using customized PERL scripts and plotted using the software packages Circos (Krzywinski et al. 2009) and R (http://r-project.org).
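The four-gamete idea mentioned above can be illustrated with a small sketch. This is not LDhat's code or its likelihood test; it only shows the underlying criterion, assuming biallelic SNP columns with no missing data. The function names and the toy columns are hypothetical.

```python
from itertools import combinations

def four_gamete_incompatible(site_a, site_b):
    """True if two biallelic sites carry all four two-site haplotypes.

    Under an infinite-sites model without recurrent mutation, observing
    all four combinations implies at least one recombination event
    between the two sites.
    """
    return len(set(zip(site_a, site_b))) == 4

def incompatible_pairs(columns):
    """Return index pairs of SNP columns that fail the four-gamete test."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(columns), 2)
        if four_gamete_incompatible(a, b)
    ]

# Toy data (hypothetical): each string is one SNP column across six genomes.
columns = ["AAAGGG", "CCTTCC", "AAGGGG"]
print(incompatible_pairs(columns))
```

Only the first two columns carry all four haplotypes (AC, AT, GT, GC), so they are the only pair flagged.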

Phylogenetic reconstruction

We used a strongly linked portion (bbb01–bbb14) of the cp26 plasmid to infer phylogenetic relationships among B. burgdorferi s.s. clonal groups. We obtained a concatenated alignment of 14 genes from 15 strains (14 B. burgdorferi s.s. and SV1 used as an outgroup). We used two methods of phylogenetic reconstruction, including a Bayesian method using MrBayes (v2.1) (Huelsenbeck and Ronquist 2001) and a maximum-likelihood method with 100 bootstrapped alignments using DNAML in PHYLIP (Felsenstein 1989). Branch support was measured with posterior probabilities obtained from MrBayes and bootstrap values from PHYLIP. Phylogenies and gene trees were plotted and annotated using the R package APE (Paradis et al. 2004) and FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Estimating recombination rates using sister genomes

Sequence differences between two recently diverged phylogenetic sister strains are more likely than those between distantly related genomes to be due to single, rather than multiple, events of point mutation or recombination. As such, direct comparison of sister-group genomes reveals the relative rates of recombination and point mutation (Guttman and Dykhuizen 1994). Each SNP segregating between a pair of sister genomes was identified as a point mutation if it was unique across all aligned sequences and as recombination if it also occurred in non–sister-group genomes (i.e., a homoplasy). In estimating the number of gene-conversion events, consecutive SNPs sharing the same phylogenetic pattern were combined into a single count.
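The classification rule above can be sketched as follows. This is one possible reading of the rule, not the authors' script: a SNP counts as a mutation when either sister's base is absent from every other genome, and as recombination when both bases also occur elsewhere. The strain names, the "pattern" encoding, and the toy alignment are hypothetical.

```python
def classify_sister_snps(columns, strains, sister_a, sister_b):
    """Label each site that differs between two sister genomes.

    columns: list of strings, one per aligned site, bases in `strains` order.
    Returns (site_index, label, pattern) tuples, where pattern records which
    strains share sister_a's base (the site's phylogenetic pattern).
    """
    ia, ib = strains.index(sister_a), strains.index(sister_b)
    calls = []
    for pos, col in enumerate(columns):
        a, b = col[ia], col[ib]
        if a == b:
            continue  # not segregating between the sisters
        others = [base for k, base in enumerate(col) if k not in (ia, ib)]
        label = "recombination" if (a in others and b in others) else "mutation"
        calls.append((pos, label, tuple(base == a for base in col)))
    return calls

def count_events(calls):
    """Count mutations, and recombination events after merging consecutive
    recombinant SNPs that share the same pattern into a single event."""
    mutations = sum(1 for _, label, _ in calls if label == "mutation")
    events, previous = 0, None
    for _, label, pattern in calls:
        if label == "recombination":
            if pattern != previous:
                events += 1
            previous = pattern
        else:
            previous = None
    return mutations, events

# Toy alignment (hypothetical): sisters S1 and S2 plus two other genomes.
strains = ["S1", "S2", "O1", "O2"]
columns = ["ATAA", "GGGG", "CTCT", "AGAG", "CTTC"]
calls = classify_sister_snps(columns, strains, "S1", "S2")
print(count_events(calls))
```

In the toy data the first site is a unique change in S2, sites three and four share one pattern and collapse into one event, and the last site has a different pattern and counts as a second event.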

Table 1 Strain and genome sources

| Strain | Genospecies | ospC type | Geographic origin | Biological source (a) | Project accession (b) | Genome report |
|---|---|---|---|---|---|---|
| B31 | B. burgdorferi s.s. | A | New York | I. scapularis | PRJNA3 | Fraser et al. (1997) |
| 64b | B. burgdorferi s.s. | B1 | New York | Human | PRJNA28633 | Schutzer et al. (2011) |
| ZS7 | B. burgdorferi s.s. | B2 | Germany | I. ricinus | PRJNA19839 | Schutzer et al. (2011) |
| JD1 | B. burgdorferi s.s. | C | Massachusetts | I. scapularis | PRJNA29359 | Schutzer et al. (2011) |
| CA-11.2A | B. burgdorferi s.s. | D | California | I. pacificus | PRJNA28629 | Schutzer et al. (2011) |
| N40 | B. burgdorferi s.s. | E | New York | I. scapularis | PRJNA29357 | Schutzer et al. (2011) |
| 72a | B. burgdorferi s.s. | G | New York | Human | PRJNA21003 | Schutzer et al. (2011) |
| 156a | B. burgdorferi s.s. | H | New York, US | Human | PRJNA19835 | Schutzer et al. (2011) |
| WI91-23 | B. burgdorferi s.s. | I | Wisconsin, US | Bird | PRJNA28627 | Schutzer et al. (2011) |
| 118a | B. burgdorferi s.s. | J | New York | Human | PRJNA21001 | Schutzer et al. (2011) |
| 297 | B. burgdorferi s.s. | K | Connecticut | Human | PRJNA29361 | Schutzer et al. (2011) |
| 29805 | B. burgdorferi s.s. | M | Connecticut | I. scapularis | PRJNA28621 | Schutzer et al. (2011) |
| Bol26 | B. burgdorferi s.s. | S | Italy | I. ricinus | PRJNA19837 | Schutzer et al. (2011) |
| 94a | B. burgdorferi s.s. | U | New York | Human | PRJNA20999 | Schutzer et al. (2011) |
| SV1 | B. finlandensis | | Finland | I. ricinus | PRJNA28631 | Casjens et al. (2011a) |
| DN127 | B. bissettii | | California | I. pacificus | PRJNA29363 | S. E. Schutzer et al. (unpublished results) |
| PKo | B. afzelii | | Germany | Human | PRJNA17057 | Glockner et al. (2006) |
| ACA-1 | B. afzelii | | Sweden | Human | PRJNA19841 | Casjens et al. (2011b) |
| PBi | B. bavariensis | | Germany | Human | PRJNA12554 | Glockner et al. (2004) |
| PBr | B. garinii | | Denmark | Human | PRJNA28625 | Casjens et al. (2011b) |
| Far04 | B. garinii | | Denmark | Bird | PRJNA29573 | Casjens et al. (2011b) |
| VS116 | B. valaisiana | | Switzerland | I. ricinus | PRJNA19843 | S. E. Schutzer et al. (unpublished results) |
| A14S | B. spielmani | | The Netherlands | I. ricinus | PRJNA28635 | S. E. Schutzer et al. (unpublished results) |

(a) I. scapularis, I. pacificus, and I. ricinus, Ixodes (tick) vectors; Human, tissues from Lyme disease patients; Bird, bird blood.
(b) NCBI BioProject accession (www.ncbi.nlm.nih.gov/bioproject).

Codon-based simulations of bacterial genome evolution

Codon-based fitness functions: We simulated a population with a constant size of N = 500 haploid genomes. Each genome consisted of four disjoint 500-codon genes. When a gene evolved neutrally, fitness of each codon ($w$) remained at 1 regardless of synonymous or nonsynonymous substitutions. When a gene was under purifying selection (mimicking a housekeeping locus), each nonsynonymous substitution (relative to the founding amino acid state) resulted in a codon with its fitness reduced by a fraction: $w = 1 - s_{\mathrm{pur}}$. When a gene was under negative frequency-dependent selection, fitness of a codon was a linearly decreasing function of the population frequency of its amino acid state: $w = 1 - s_{\mathrm{fds}} x_i$, where $x_i$ is the population frequency of amino acid state $i$ and $s_{\mathrm{fds}}$ is the selection coefficient (Takahata and Nei 1990; Neuhauser 1999). When a gene was under directional selection, each nonsynonymous substitution resulted in a codon with a fitness $w = 1$ while fitness of codons with the founding amino acid state was reduced: $w = 1 - s_{\mathrm{dir}}$. Fitness of an individual was the multiplicative product of the fitness of its composite codons: $W = \prod w$.
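The four codon fitness rules and the multiplicative genome fitness can be written compactly. The following is a minimal sketch, not the authors' PERL program; the function names and the `mode` strings are hypothetical, and the selection coefficients are the settings listed in Table 2.

```python
from collections import Counter
from math import prod

# Selection coefficients as listed in Table 2.
S_PUR, S_DIR, S_FDS = 0.1, 0.01, 0.01

def amino_acid_frequencies(states):
    """Population frequency x_i of each amino acid state at one codon site."""
    total = len(states)
    return {aa: count / total for aa, count in Counter(states).items()}

def codon_fitness(mode, aa, founder_aa, freqs=None):
    """Fitness w of one codon under the selection regime of its gene."""
    if mode == "neutral":
        return 1.0
    if mode == "purifying":      # any change from the founder is penalized
        return 1.0 if aa == founder_aa else 1.0 - S_PUR
    if mode == "directional":    # the founder state is penalized instead
        return 1.0 - S_DIR if aa == founder_aa else 1.0
    if mode == "fds":            # common states are penalized more
        return 1.0 - S_FDS * freqs[aa]
    raise ValueError(f"unknown selection mode: {mode}")

def genome_fitness(codon_fitnesses):
    """Individual fitness W as the product of its codon fitnesses."""
    return prod(codon_fitnesses)

# Toy site (hypothetical): three genomes carry K, one carries E.
freqs = amino_acid_frequencies(list("KKKE"))
print(codon_fitness("fds", "K", "K", freqs), codon_fitness("fds", "E", "K", freqs))
```

Under the frequency-dependent rule the rare state E has the higher fitness, which is what lets multiple alleles persist at the selected locus.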

Evolutionary algorithms: To closely represent the B. burgdorferi s.l. genome, the codon composition and substitutions were simulated on the basis of the codon usage table (Nakamura et al. 2000) and a 4:1 transition-to-transversion ratio (W. G. Qiu, unpublished results, based on maximum-likelihood analysis of B. burgdorferi s.l. orthologs). In the beginning of each generation, a total number of U mutations (Poisson distributed) were added to each individual at randomly chosen codon positions. A chosen codon was replaced by one of the nine possible codons that differ from the parental codon by one base (stop codons excluded). We then introduced a total number of R gene-conversion events (Poisson distributed) into each individual, using the Didelot and Falush algorithm with a tract length of d = 30 codons (exponentially distributed) (Didelot and Falush 2007). Briefly, the loci were assumed to be sufficiently far apart so that each gene-conversion event affected only a single gene. For each recipient genome, a donor genome was randomly chosen from the population. A gene-conversion event spans the first codon of a gene with the probability of $d/(bd + L - b)$, where $d$ is the mean of exponentially distributed tract lengths of a gene-conversion event, $b$ is the number of genes, and $L$ is the total length of genes. Gene conversion starts within a gene block with the probability of $1/(bd + L - b)$.
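A sketch of a single gene-conversion event under the start probabilities given above may help. It assumes b equal-length genes concatenated in one list of codons, reads the second probability as applying to each non-first codon (so that the weights sum to 1), and draws the tract length from an exponential distribution rounded up to whole codons and truncated at the gene boundary. The function names and the rounding choice are assumptions, not details taken from the authors' program.

```python
import math
import random

def start_weights(n_genes, gene_len, mean_tract):
    """Per-codon start probabilities: d/(bd + L - b) for the first codon of
    each gene and 1/(bd + L - b) for every other codon."""
    total_len = n_genes * gene_len
    denom = n_genes * mean_tract + total_len - n_genes
    weights = []
    for _ in range(n_genes):
        weights.append(mean_tract / denom)
        weights.extend([1.0 / denom] * (gene_len - 1))
    return weights

def gene_conversion(recipient, donor, n_genes, gene_len, mean_tract, rng):
    """Copy one donor tract into a copy of the recipient genome."""
    weights = start_weights(n_genes, gene_len, mean_tract)
    start = rng.choices(range(len(recipient)), weights=weights, k=1)[0]
    # Exponential tract length with mean d, at least one codon (assumption).
    tract = max(1, math.ceil(rng.expovariate(1.0 / mean_tract)))
    # Each event affects a single gene, so stop at the end of that gene.
    gene_end = (start // gene_len + 1) * gene_len
    end = min(start + tract, gene_end)
    converted = recipient[:start] + donor[start:end] + recipient[end:]
    return converted, start, end

# Toy genomes (hypothetical) with the Table 2 layout: b = 4, 500 codons, d = 30.
rng = random.Random(1)
recipient, donor = ["AAA"] * 2000, ["GGG"] * 2000
converted, start, end = gene_conversion(recipient, donor, 4, 500, 30, rng)
print(start, end)
```

With b = 4, L = 2000, and d = 30 the denominator is 2116, so the first codon of each gene is 30 times more likely to be hit than any other single codon, which accounts for tracts that begin upstream of the gene.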

Genetic drift and natural selection were applied in a single step following algorithms by Hudson and Kaplan (1995) and Takahata and Nei (1990). Specifically, an individual was randomly chosen from the parental pool and a random number $1 \ge p \ge 0$ was generated. If fitness of an individual $W \ge p$, this individual was accepted as a parent for the next-generation population. Otherwise it was rejected and the next individual was chosen with replacement. The above process continued until the constant population size N was reached. To improve computational efficiency and reduce rounding errors, fitness of each individual was calculated using logarithms and normalized to between 0.5 and 1, using a linear extrapolation $W_{\mathrm{norm}} = \{1 + (W - W_{\min})/(W_{\max} - W_{\min})\}/2$, where $W_{\mathrm{norm}}$ is the normalized fitness, and $W_{\min}$ and $W_{\max}$ are, respectively, the minimum and maximum fitness values in a population.
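The combined drift-and-selection step is an accept/reject sampler, which can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: fitness is accumulated as a sum of logarithms and exponentiated before normalization, and a population in which every individual has the same fitness is given a normalized fitness of 1 (the paper does not say how that case was handled). All function names are hypothetical.

```python
import math
import random

def individual_fitness(codon_fitnesses):
    """Multiply codon fitnesses on the log scale to limit rounding error."""
    return math.exp(sum(math.log(w) for w in codon_fitnesses))

def normalize(fitnesses):
    """Rescale fitness to [0.5, 1]: W_norm = {1 + (W - Wmin)/(Wmax - Wmin)}/2."""
    w_min, w_max = min(fitnesses), max(fitnesses)
    if w_max == w_min:  # assumption: identical fitness means no selection
        return [1.0] * len(fitnesses)
    return [(1.0 + (w - w_min) / (w_max - w_min)) / 2.0 for w in fitnesses]

def next_generation(population, fitnesses, rng):
    """Sample parents with replacement, accepting each with probability W_norm."""
    w_norm = normalize(fitnesses)
    n = len(population)
    parents = []
    while len(parents) < n:
        i = rng.randrange(n)             # drift: random draw from the pool
        if w_norm[i] >= rng.random():    # selection: accept if W >= p
            parents.append(population[i])
    return parents

# Toy population (hypothetical labels and fitness values).
rng = random.Random(0)
offspring = next_generation(["a", "b", "c"], [0.8, 0.9, 1.0], rng)
print(offspring)
```

Because normalized fitness never falls below 0.5, at least half of all draws are accepted on average, which bounds the number of rejections per generation.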

Program validation: At regular intervals during simulation, a sample of n individuals was drawn and assayed for nucleotide diversity ($\pi$). We kept a record of reproductive (but not recombination) ancestors of each individual during the simulation so that gene genealogy of sampled individuals could be reconstructed. To validate the simulation program, we first simulated genome evolution under strictly neutral conditions by setting all selection coefficients to zero and compared the $\pi$ values of the sampled alleles with the neutral expectation $\pi_0 = 2N\mu$. We ran neutral evolution under three gene-conversion rates, R = 0, R = U, and R = 5U, and named these models NEU1, NEU2, and NEU3, respectively. Second, we simulated genome evolution under the influence of purifying selection at one locus and at the same three levels of gene conversion. We named these simulations background-selection models (BKG1, BKG2, and BKG3). The expected nucleotide diversity at a neutral locus linked to a locus under background selection is reduced relative to $\pi_0$. We compared the $\pi$ values of the neutral locus under the BKG simulation models with an analytical expectation of background selection: $\pi_{\mathrm{bkg}} = \pi_0 \exp\{-U/(s + R)\}$ (Hudson and Kaplan 1995). Third, we simulated genome evolution with adaptive mutations at one locus and named these simulations directional-selection models (DIR1, DIR2, and DIR3). Fourth, we simulated genome evolution with negative frequency-dependent selection at one locus and named the simulations FDS1, FDS2, and FDS3 models. We compared nucleotide diversity and coalescence trees resulting from DIR and FDS models with expected patterns such as those from Takahata and Nei (1990) and Neuhauser (1999).
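The two analytical expectations used for validation are easy to evaluate alongside a sample estimate of nucleotide diversity. The sketch below assumes that s in the background-selection formula is the purifying selection coefficient from Table 2 and that U and R enter as the per-genome rates listed there; the paper does not state which values were substituted, so the printed numbers are illustrative only. Function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average pairwise differences per site among equal-length sequences."""
    per_site = [
        sum(x != y for x, y in zip(a, b)) / len(a)
        for a, b in combinations(sequences, 2)
    ]
    return sum(per_site) / len(per_site)

def neutral_expectation(n_pop, mu):
    """pi_0 = 2 N mu for a haploid population."""
    return 2 * n_pop * mu

def background_expectation(pi0, u, s, r):
    """pi_bkg = pi_0 * exp(-U / (s + R))."""
    return pi0 * math.exp(-u / (s + r))

# Settings taken from Table 2; using s = s_pur here is an assumption.
N, MU, U, S_PUR = 500, 1e-6, 0.06, 0.1
pi0 = neutral_expectation(N, MU)
for label, r in (("BKG1", 0.0), ("BKG2", U), ("BKG3", 5 * U)):
    print(label, background_expectation(pi0, U, S_PUR, r))

# Toy sample (hypothetical sequences) for the empirical estimate.
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

The formula predicts that the reduction in diversity shrinks as R grows, since recombination uncouples the neutral locus from the locus under purifying selection.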

Testing frequency-dependent selection vs. directional selection: After validating the simulation program by running it under each of the neutral (NEU) and selection models (BKG, DIR, and FDS) alone, we simulated B. burgdorferi s.l. genome evolution under mixed, more realistic models. To test maintenance of sympatric clonal groups by frequency-dependent selection, we applied negative frequency-dependent selection at one locus and purifying selection at other loci (FDS and BKG models, Figure 1A). To test genome divergence driven by directional selection, we created two subpopulations and applied directional selection at one locus and purifying selection at other loci (DIR and BKG models, Figure 1B). The two subpopulations underwent natural selection and genetic drift separately (mimicking independent adaptive evolution within different host species) but they were allowed to recombine (simulating co-infection of a single host).

Parameters of the simulation are summarized in Table 2. Simulation programs were written in PERL and are available upon request. A simulation with a genome size of 2000 codons and N = 500 individuals evolving for 5000 generations took ~12 hr on a microcomputer with a 3.3-GHz CPU.

Results

Genomic and ortholog alignments

We compared the core, mostly syntenic parts of the B. burgdorferi s.l. genome, consisting of the main chromosome, the linear plasmid lp54, and the circular plasmid cp26. These three replicons comprise ~65% of the total ~1.5 million bases of a B. burgdorferi s.l. genome (Casjens et al. 2000). The main chromosomes and the cp26 and lp54 plasmids are for the most part syntenic among the genomes, with the exception of two large-scale genome variations: the variable right end of the main chromosome and the PFam54 gene cluster on the lp54 plasmid (Huang et al. 2004; Wywial et al. 2009). We obtained genomic alignments of the cp26 and lp54 plasmids (the PFam54 gene cluster excluded) totaling 73,451 bases and 837 alignments of orthologous ORF families with a combined alignment length of 989,679 nucleotides (Table 3).

Pervasive localized recombination

Figure 2 shows the recombination rates between pairs of SNPs based on the genomic alignment of cp26 plasmids from 14 B. burgdorferi s.s. strains. SNPs showing high recombination rates (red lines) were all located within 500 bases of each other. The ospC (encoding outer surface protein C) locus and its surrounding regions showed the highest levels of sequence polymorphism and localized recombination. Linkage disequilibrium (LD) grew stronger as the distance from ospC increased and the region directly opposite to ospC, including genes bbb01–bbb14, was the most strongly linked (Figure 2). Nevertheless, localized recombination rates were high throughout the cp26 plasmid (Figure 2). In contrast, SNPs situated ≥500 bases from each other had low recombination rates and high LD (blue lines) (Figure 2). The lp54 plasmid and the main chromosome showed similarly pervasive localized recombination and global clonality, with a recombination and sequence-diversity hotspot at dbpA (encoding decorin-binding protein A) on lp54 and a high-polymorphism hotspot at lmp1 (encoding surface-located membrane protein 1) on the main chromosome (Supporting Information, Figure S1 and Figure S2).

Figure 1 Simulation algorithms. (A) Frequency-dependent selection. A population of identical genomes consisting of disjoint genes (each with 500 codons) was created at the start of simulation. The genomes evolved under a uniform mutation rate, a uniform gene-conversion rate, genetic drift, frequency-dependent selection at one locus (in red), purifying selection at another locus (in blue), and neutral sequence variations at a third locus (in green). (B) Adaptive divergence. Two subpopulations underwent independent directional selection and genetic drift, but were allowed to recombine freely at the start of each generation. One locus was under directional selection (in red), another under purifying selection (in blue), and a third under neutral evolution (in green).

Table 2 Simulation parameters

| Symbol | Description and settings |
|---|---|
| $N$ | Population size ($N = 500$), held constant |
| $G$ | Total number of generations ($G = 5000$) |
| $\mu$, $U$ | Per-site and per-genome rates of mutations ($\mu = 10^{-6}$, $U = 0.06$), Poisson distributed and uniform across the genome |
| $R$ | Per-genome rate of recombination ($R = 0$, $R = U$, $R = 5U$), Poisson distributed and uniform across the genome |
| $L$ | Gene length ($L = 500$ codons) |
| $b$ | Number of genes ($b = 4$) |
| $d$ | Tract lengths of gene conversion ($d = 30$ codons), exponentially distributed |
| $s_{\mathrm{pur}}$, $s_{\mathrm{dir}}$, $s_{\mathrm{fds}}$ | Selection coefficients of purifying ($s_{\mathrm{pur}} = 0.1$), directional ($s_{\mathrm{dir}} = 0.01$), and frequency-dependent ($s_{\mathrm{fds}} = 0.01$) selection |
| $n$, $f$ | Sample size ($n = 30$ individuals), taken at every $f = 50$ generations |
| $w$, $W$ | Fitness of a codon and an individual haploid genome |
| $\pi$, $K$ | Average pairwise nucleotide differences within and between populations |

Phylogeny of B. burgdorferi s.s. genomic groups

Using the concatenated bbb01–bbb14 sequences, we obtainedan intraspecific phylogeny of B. burgdorferi s.s. genomic

groups (Figure 3). This tree improves upon previously pub-lished phylogenies of the B. burgdorferi s.s. genomic groupsby gaining significant statistical support for most of thebranches (Bunikis et al. 2004; Margos et al. 2008; Qiu et al.2008; Barbour and Travinsky 2010). This and other intraspe-cific phylogenies of B. burgdorferi s.l. isolates share the char-acteristics of nearly one-to-one association between thegenomic lineages and the major ospC alleles, with rare excep-tions of the same ospC allele appearing polyphyletically dueto recombination (Qiu et al. 2008; Barbour and Travinsky2010). Using a bootstrap cutoff value of 75%, we identified

Table 3 Genomic and ORF alignments

Genomic alignments Orthologous ORF alignments

Length(bases) No. SNPsa

% SNPdensity

No. orthologfamilies

Total alignmentlength (bases) No. SNPs

% SNPdensity

cp26 26,591 1,267 4.76 26 26,451 826 3.12lp54b 46,860 870 1.86 61 53,383 661 1.24Main chromosome NAc NA NA 750 909,845 12,311 1.35Total 73,451 2,137 2.90 837 989,679 13,798 1.39a Two-state sites only.b PFam54 gene array excluded for lack of synteny.c Not available.

Figure 2 Recombination rates and nucleotide polymor-phisms on cp26. All values were based on a genomicalignment of cp26 plasmids from 14 B. burgdorferi s.s.genomes. Four data tracks are shown, representing (start-ing from the outside) the circular plasmid cp26 (black, B31coordinates), ORFs (orange and yellow, indicating oppositecoding directions), average nucleotide diversity (green,120-base window size and 3-base window step; lowerand upper gray lines indicating 0.15 and 0.3 differencesper site, respectively), and recombination rates betweenpairs of SNPs. A red line links two SNPs with a high re-combination rate and a blue line links two SNPs with a lowrecombination rate relative to the average rate (McVeanet al. 2002).

956 J. Haven et al.

three pairs of strongly supported sister-group genomes, consisting of two North American pairs (118a-72a and 156a-297) and one European pair (Bol26-ZS7). These sister groups are strongly supported by a maximum-likelihood phylogeny inferred using all SNPs on the main chromosome, lp54, and cp26 (E. F. Mongodin, unpublished results).

Rates and tract lengths of gene conversion

The method of distinguishing SNPs caused by mutation from those caused by recombination, based on sister-group comparisons, is illustrated in Figure S3. Sequence comparisons between pairs of sister-group genomes revealed an approximately 3:1 ratio of recombination to point-mutation rates (Table 4). Including a large number of appropriate outgroup genomes as references is critical. For example, our sample of 14 B. burgdorferi s.s. genomes contains only two European strains, ZS7 and Bol26. As a result, most of the SNPs segregating between ZS7 and Bol26 appeared to be mutations and the ratio of recombination to mutation was <1 (Table 4). When we expanded the outgroup genomes to include nine European strains, most of the SNPs were found in these sympatric, albeit non-B. burgdorferi s.s., strains, resulting in a recombination-to-mutation ratio similar to those estimated using the two North American sister-genome pairs (Table 4).
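The sister-group logic can be sketched in a few lines, following the definitions given in the Table 4 footnotes: a SNP between two sister genomes is counted as a mutation when one of its alleles is unique to a sister, and as recombination when both alleles also occur among the outgroup genomes; consecutive recombinant SNPs sharing the same pattern are merged into one event. The function names, the encoding of a "phylogenetic pattern" as the set of outgroups carrying the second sister's allele, and the toy sequences are assumptions for illustration only.

```python
def classify_site(a, b, outgroup_column):
    """Classify one aligned site between sister genomes.

    Returns None for an invariant site, 'rec' if both sister alleles are
    also seen in the outgroups (a homoplasy), and 'mut' otherwise.
    """
    if a == b:
        return None
    return "rec" if (a in outgroup_column and b in outgroup_column) else "mut"

def count_events(sister_a, sister_b, outgroups):
    """Count point mutations and gene-conversion events between two sisters."""
    mut = rec = 0
    prev_pattern = None
    for i, (a, b) in enumerate(zip(sister_a, sister_b)):
        column = tuple(g[i] for g in outgroups)
        kind = classify_site(a, b, column)
        if kind is None:
            continue  # invariant sites do not break a run of SNPs
        if kind == "mut":
            mut += 1
            prev_pattern = None
        else:
            # Assumed pattern encoding: which outgroups carry sister_b's allele.
            pattern = tuple(x == b for x in column)
            if pattern != prev_pattern:
                rec += 1  # a new pattern starts a new gene-conversion event
            prev_pattern = pattern
    return mut, rec

# Hypothetical toy data: three SNPs between the sisters (columns 2, 5, 7).
sister_a = "ACGTACGT"
sister_b = "ACCTATGA"
outgroups = ["ACCTATGT", "ACGTACGT"]
result = count_events(sister_a, sister_b, outgroups)
```

In the toy data, the SNPs at columns 2 and 5 share one outgroup pattern and collapse into a single event, while the allele at column 7 is absent from both outgroups and is counted as a mutation. The sketch also makes the outgroup sensitivity discussed above concrete: removing the first outgroup would turn both recombinant SNPs into apparent mutations.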

The ZS7-Bol26 comparison was also informative on the tract length of gene conversion. Bol26 has a horizontally transferred ospC allele based on a multilocus sequence phylogeny (Qiu et al. 2008). It was unknown at the time whether the recombination event involved the ospC locus alone, its neighboring genes, or the entire cp26 plasmid. We first obtained gene trees using ortholog alignments on the cp26 plasmids, which indicated that phylogenetic inconsistencies involving Bol26 were limited to three loci: bbb18 (guaA, coding for GMP synthase), bbb19 (ospC), and bbb22 (function unknown). Instead of grouping with the allele from ZS7, its conspecific sister group and closest known relative, bbb22 from Bol26 groups with that from the B. afzelii strain PKo. A genomic alignment of the bbb17–bbb22 region showed nearly identical sequences between Bol26 and PKo in a region spanning the entire bbb22 locus (Figure S4). On the basis of shifts of phylogenetic relatedness along the plasmid alignment, we were able to identify both breakpoints of this cross-genospecies gene-conversion event and estimated its tract length to be between 1862 and 1903 bases (Figure S4). There were at least two additional gene-conversion events at or near ospC, which introduced into Bol26 an ospC S-type allele from a conspecific donor (Qiu et al. 2008) and a guaA allele from an unknown donor. We therefore determined that at least three separate gene-conversion events have caused the phylogenetically inconsistent guaA, ospC, and bbb22 alleles in Bol26, each originating from a different donor cell and affecting approximately a single gene.
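One way to turn "shifts of phylogenetic relatedness" into lower and upper bounds on a tract length is sketched below. Each informative site is labeled by whether the recipient matches the putative donor ("D") or its sister genome ("S"); the minimum tract spans the outermost donor-like sites of the longest donor-like run, and the maximum extends to just inside the flanking sister-like sites. The function name, the labeling scheme, and the coordinates are hypothetical and are not the bbb22 breakpoints reported in Figure S4.

```python
def tract_bounds(sites, aln_length):
    """Bound the length of a gene-conversion tract.

    sites: list of (position, label) sorted by 1-based position, where label
    is 'D' (recipient matches donor) or 'S' (recipient matches sister).
    Returns (min_length, max_length) for the longest run of 'D' sites,
    or None if there is no donor-like site.
    """
    best = None  # (run_length, first_index, last_index)
    i = 0
    while i < len(sites):
        if sites[i][1] == "D":
            j = i
            while j + 1 < len(sites) and sites[j + 1][1] == "D":
                j += 1
            if best is None or (j - i + 1) > best[0]:
                best = (j - i + 1, i, j)
            i = j + 1
        else:
            i += 1
    if best is None:
        return None
    _, first, last = best
    min_len = sites[last][0] - sites[first][0] + 1
    # Flanking sister-like sites (or the alignment edges) cap the tract.
    left_flank = sites[first - 1][0] if first > 0 else 0
    right_flank = sites[last + 1][0] if last + 1 < len(sites) else aln_length + 1
    max_len = right_flank - left_flank - 1
    return min_len, max_len

# Hypothetical informative sites along a 1200-base alignment.
example_sites = [(100, "S"), (250, "D"), (300, "D"), (900, "D"), (1000, "S")]
bounds = tract_bounds(example_sites, aln_length=1200)
```

The gap between the two bounds shrinks as informative sites become denser near the breakpoints, which is why a narrow interval such as the one reported above requires closely spaced diagnostic SNPs on both sides of the tract.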

Simulation results

We used computer simulations to test which of two forms of positive natural selection (adaptive divergence or frequency-dependent selection) is more compatible with the observed patterns of genetic linkage and polymorphisms. Specifically, we tested the ability of a selective mechanism to produce the following characteristics of the B. burgdorferi s.s. population structure: (1) localized reduction in linkage disequilibrium within a genome-wide clonal frame, (2)

Figure 3 Plasmid-based phylogeny and ospC diversity. (A) Phylogeny of B. burgdorferi s.s. strains, rooted with the B. finlandensis strain SV1. The tree is based on the concatenated coding sequences including and between bbb01 and bbb14 on cp26, a region with strong linkage disequilibrium (Figure 2). Branch support values were obtained by using MrBayes (Ronquist and Huelsenbeck 2003) and DNAML with SEQBOOT (Felsenstein 1989). (B) A neighbor-joining tree of ospC nucleotide sequences from all 23 genomes. The fourteen B. burgdorferi s.s. strains are highlighted. The other 9 strains represent the following genospecies: B. finlandensis (SV1), B. bissettii (DN127), B. afzelii (PKo and ACA-1), B. bavariensis (PBi), B. garinii (PBr and Far04), B. spielmanii (A14S), and B. valaisiana (VS116).

Frequency-Dependent Selection in Borrelia 957

a high recombination rate and a high level of sequence polymorphism at the locus under positive natural selection, and (3) a strong association between genomic lineages and large sequence variations at the positively selected locus.

Program validation: Simulations of genome evolution under single models produced results close to theoretical expectations. First, gene genealogies of the final simulated samples fit theoretical expectations. For example, the total coalescence time of the final sample was the shortest under the directional selection (DIR) model, longer under the background selection (BKG) model, even longer under the neutral evolution (NEU) model, and the longest under the frequency-dependent selection (FDS) model (Figure 4). Second, the equilibrium values of nucleotide diversity in simulated populations reached the theoretically predicted levels of $\pi_0 = 2N\mu$ under neutral models and $\pi = \pi_0 \exp\{-U/(s + R)\}$ in models of background selection (Figure S5). Simulation under models of directional selection (DIR1–3) resulted in low genomic polymorphisms, as expected from recurrent selective sweeps (Figure S6, top row). In the negative frequency-dependent selection (FDS1–3) models, sequence diversity increased quickly at the target locus (Figure S6, bottom row). Sequence diversity at linked neutral loci increased at slower rates and became increasingly independent as the recombination rate increased, also as expected.
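As a rough numerical illustration of these two expectations, the sketch below plugs in the shared simulation parameters from the Figure 5 legend (N = 500, U = 0.06 per genome per generation, 2000 codons), reading the background-selection expectation as π = π0 exp(−U/(s + R)). The selection coefficient `s = 0.1` is a hypothetical value chosen only for illustration, and treating the genome-wide U as the deleterious mutation rate is a simplifying assumption, so the numbers are not the values plotted in Figure S5.

```python
import math

N = 500           # constant population size (Figure 5 legend)
U = 0.06          # substitutions per genome per generation (Figure 5 legend)
L_CODONS = 2000   # haploid genome length in codons (Figure 5 legend)

# Per-site mutation rate, assuming U is spread evenly over 3 * L_CODONS sites.
mu = U / (3 * L_CODONS)

# Neutral expectation for a haploid population: pi0 = 2 * N * mu.
pi0 = 2 * N * mu

def background_pi(pi0, U, s, R):
    """Expected diversity under background selection: pi0 * exp(-U / (s + R))."""
    return pi0 * math.exp(-U / (s + R))

s = 0.1  # hypothetical selection coefficient against deleterious mutations
pi_no_rec = background_pi(pi0, U, s, R=0.0)
pi_high_rec = background_pi(pi0, U, s, R=5 * U)
```

Under these assumptions recombination moves the expected diversity back toward the neutral level, which matches the qualitative statement that linked loci evolve more independently as the recombination rate increases.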

Adaptive divergence: In the absence of recombination (DIR and BKG1), the nucleotide divergence ($K$) between the two subpopulations at the neutral locus increased at the expected rate $K = 2\mu t$ (an additional validation of the simulation program), while levels of divergence were higher at the positively selected locus and lower at the negatively selected locus (Figure 5A). In the presence of recombination, sequence divergence at all loci failed to increase with time and quickly settled to levels similar to that of within-subpopulation polymorphisms (Figure 5, B and C). The linkage analysis among SNPs identified by mixing the two subpopulations showed localized reduction in LD and strong LD among distant SNPs (Figure 6A). There was no increase of effective recombination rates or levels of sequence polymorphism at the positively selected locus relative to the neutral loci (Figure 6A).

Frequency-dependent selection: In the absence of recombination, within-population sequence diversity at all loci increased linearly over time (Figure 5D). With higher recombination rates, loci evolved more independently and sequence diversity eventually stabilized (or was expected to stabilize) (Figure 5, E and F). Although recombination slowed the rise of sequence diversity at the locus under positive selection, it had the beneficial effect of reducing deleterious

Table 4 Relative rates of recombination to mutation based on sister-group genomes

Sister genomes  Outgroup genomes        Replicon  SNP density (%)  Mut (a)  Rec (b)  Rec/mut
118a-72a        American and European   cp26 (c)  0.081            7        7        1.0
                                        lp54      0.306            16       82       5.1
                                        Main chr  0.139            203      564      2.78
156a-297        American and European   cp26 (c)  0.233            13       25       1.92
                                        lp54      0.323            24       66       2.75
                                        Main chr  NA (d)           NA (d)   NA (d)   NA (d)
ZS7-Bol26       No European strains     cp26 (c)  0.043            8        0        0
                                        lp54      0.145            30       16       0.53
                                        Main chr  0.088            544      73       0.134
ZS7-Bol26       With European strains   cp26 (c)  0.034            2        4        2.0
                                        lp54      0.144            13       38       2.92
                                        Main chr  0.090            139      461      3.31
Average (SD) (ZS7-Bol26/no European strains results excluded)                        2.72 (1.20)

chr, chromosome.
(a) Number of point mutations between sister genomes. These SNPs are unique.
(b) Number of SNPs between sister genomes due to recombination. These SNPs occur in outgroup genomes as well (i.e., homoplasies). Consecutive SNPs sharing the same phylogenetic pattern were counted as a single gene-conversion event.
(c) On cp26, the guaA, ospC, and bbb22 genes were excluded because of uncertainties in counting overlapping gene-conversion events.
(d) Data not available because the main chromosome of 297 is not sequenced.
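The Rec/mut column and the summary row of Table 4 follow directly from the Mut and Rec counts, as the short check below shows. The counts are copied from the table; the variable names and the use of the sample standard deviation are assumptions, since the table does not state which SD estimator was used.

```python
from statistics import mean, stdev

# (sister pair, outgroup set, replicon, Mut, Rec), copied from Table 4.
rows = [
    ("118a-72a", "American and European", "cp26", 7, 7),
    ("118a-72a", "American and European", "lp54", 16, 82),
    ("118a-72a", "American and European", "Main chr", 203, 564),
    ("156a-297", "American and European", "cp26", 13, 25),
    ("156a-297", "American and European", "lp54", 24, 66),
    ("ZS7-Bol26", "No European strains", "cp26", 8, 0),
    ("ZS7-Bol26", "No European strains", "lp54", 30, 16),
    ("ZS7-Bol26", "No European strains", "Main chr", 544, 73),
    ("ZS7-Bol26", "With European strains", "cp26", 2, 4),
    ("ZS7-Bol26", "With European strains", "lp54", 13, 38),
    ("ZS7-Bol26", "With European strains", "Main chr", 139, 461),
]

# Recombination-to-mutation ratio for every comparison.
ratios = {(pair, og, rep): rec / mut for pair, og, rep, mut, rec in rows}

# Summary row: exclude the ZS7-Bol26 comparison that lacks European outgroups.
kept = [r for (pair, og, rep), r in ratios.items() if og != "No European strains"]
avg, sd = mean(kept), stdev(kept)

# Share of SNPs attributable to recombination implied by the average ratio.
rec_fraction = avg / (1 + avg)
```

Computed from the unrounded counts, the average and SD land close to the 2.72 (1.20) reported in the table, and the implied recombinant share of SNPs is a little under three-quarters.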

Figure 4 Simulation validation: coalescence trees. Gene genealogies of the final samples of 30 individuals after simulated evolution of 5000 generations are shown. The scale bar indicates 500 generations. Genomes evolved without recombination and under (A) neutral evolution (NEU), (B) background selection (BKG), (C) directional selection (DIR), and (D) frequency-dependent selection (FDS).


polymorphisms at the locus under purifying selection (blue lines in Figure 5, D–F). As in the adaptive-divergence model, a pattern of global high LD and localized LD reduction was reproduced, with much smaller distances between the highly recombining SNPs than between those resulting from the adaptive-divergence model (Figure 6B). Also differing from the adaptive-divergence model, the locus under frequency-dependent selection displayed much higher effective recombination rates than the neutral locus (Figure 6B). Closely mirroring the pattern of genetic diversity at ospC, with a recombination rate of R = U, the positively selected locus showed a star-like tree with strong sequence clustering and a one-to-one association of the major-group alleles with the genome lineages (Figure 7).

The simulation results could be summarized as follows: (1) Both the adaptive-divergence and the frequency-dependent selection models produced a similar pattern of genome-wide clonality with localized reduction of LD (Figure 6); (2) the high effective recombination rates and levels of sequence polymorphism at the selected loci (e.g., ospC) are more consistent with the frequency-dependent selection model than with the adaptive-divergence model (Figures 2 and 6); and (3) the frequency-dependent selection model produced the observed patterns of a star-like gene tree at the positively selected locus and a one-to-one association of major-group alleles at this locus with major genome lineages (compare Figures 3 and 7).

Discussion

Despite a highly clonal population structure, B. burgdorferi s.l. recombines more frequently than it mutates and its recombination takes the form of horizontal gene transfers of

Figure 5 Simulation results: sequence diversity. We simulated evolution of a bacterial genome consisting of four protein-coding genes and plotted nucleotide divergence between two subpopulations ($K$) and polymorphisms within populations ($\pi$) every 100 generations. One locus (red lines) was under either directional selection or frequency-dependent selection. A second locus was under purifying selection (blue lines) and the last two loci (green lines) were under neutral evolution. In simulations of adaptive divergence (A–C, DIR and BKG models), two subpopulations diverged from an ancestral population and underwent independent directional selection and genetic drift, but were allowed to recombine at each generation. A single population was tracked in simulations of frequency-dependent selection (D–F, FDS and BKG models). Three levels of gene conversion ($R = 0$, $U$, and $5U$) were simulated. Shared parameters included a constant population size of $N = 500$ individuals, a haploid genome of $L = 2000$ codons, an average mutation rate of $U = 0.06$ nucleotide substitutions per genome per generation, and an average gene-conversion tract length of $d = 30$ codons.


generally short DNA pieces (Dykhuizen and Baranton 2001; Qiu et al. 2004). By using genome sequences of a large number of sympatric genomic groups, the present study obtained genome-wide, robust estimates of the rate, tract length, and patterns of recombination in B. burgdorferi s.s. Our characterization of the genome-wide linkage structure in B. burgdorferi s.s. is based on estimating recombination rates and tract lengths by analyzing linkage between SNP pairs and by comparing sister-group genomes. LDhat has been widely used for estimating recombination rates in bacteria, using multilocus and genome sequences (Wirth et al. 2007; Touchon et al. 2009). We are aware of some potential biases in the sister-group method. For instance, identification of homoplasy is sensitive to the choice and number of outgroups. A small number of outgroup genomes leads to an underestimation of the recombination-to-mutation ratio because of a decreased chance of identifying donor sequences. Using distantly related genomes as sister-group or outgroup genomes, on the other hand, may lead to an overestimation of recombination rates due to recurrent mutations. In addition, it is possible that homoplastic SNPs are caused by recurrent “hotspot” mutations due to positive natural selection rather than recombination (Chattopadhyay et al. 2009). However, we found no evidence supporting this possibility in our samples. For instance, the hotspot-mutation hypothesis predicts a higher proportion of nonsynonymous SNPs at homoplastic sites than at phylogenetically consistent sites. In comparing ORF sequences on the lp54 and cp26 plasmids between a pair of sister groups (156a-297), we found no evidence for accelerated nonsynonymous changes at homoplastic sites (Figure S3B). Alternative methods exist for estimating rates and tract lengths of gene conversion, including GenCo (Gay et al. 2007), ClonalFrame (Didelot and Falush 2007), and ClonalOrigin (Didelot et al. 2010). While these methods can be more powerful with the use of explicit models of gene conversion, their computation time is long and Bayesian estimates failed to converge for our data sets. In addition, effects of natural selection were not explicitly modeled in these coalescence-based simulations. In practice, direct estimates of recombination rates and phylogeny (using, e.g., LDhat and PHYLIP as in the present study) are comparable to those from model-based analysis (Didelot and Falush 2007; Didelot et al. 2010).

We further tested selective mechanisms responsible for the observed patterns of linkage and polymorphisms in the B. burgdorferi s.s. genome by using computer simulations. A technical novelty of the present study, in comparison with previous sequence-based forward-evolution simulations (Fraser et al. 2007, 2009), is that with the use of codon-based algorithms, we were able to simulate major forms of natural selection operating on a bacterial genome, including purifying selection, adaptive divergence, and frequency-dependent selection.

Patterns of recombination in B. burgdorferi s.l.

Four conclusions could be drawn on the nature of recombination in B. burgdorferi s.l. First, localized recombination is pervasive across the B. burgdorferi s.l. genome. While a few surface-protein loci appear to be recombination hotspots, this appearance is perhaps more indicative of natural

Figure 6 Simulation results: recombination rates. See Figure 2 legend for descriptions of data tracks. (A) Recombination rates calculated on the basis of a sample of 50 individuals from two subpopulations evolving under the adaptive divergence model (DIR and BKG, R = 5U). One locus (DIR) was under directional selection, another locus (PUR) under purifying selection, and the last two loci (NEU) under neutral evolution. (B) Recombination rates calculated on the basis of a sample of 30 individuals from a population evolving under the frequency-dependent selection model (FDS and BKG, R = 5U). One locus (FDS) was under frequency-dependent selection, another locus (PUR) under purifying selection, and the last two loci (NEU) under neutral evolution. In both simulations, the genome was linear (not circular as depicted) and loci were sufficiently far apart that a single gene-conversion event affected only one locus.


selection than of a propensity for recombination (Vos 2009). Nucleotide polymorphisms are retained at surface-antigen loci like ospC by positive natural selection (Wang et al. 1999a; Barbour and Travinsky 2010). More directly, we found at least three independent gene-conversion events at the guaA-ospC-bbb22 loci in a single B. burgdorferi s.s. genome (Figure S4). In other words, bacterial loci under positive natural selection may have high effective, but not actual, gene-conversion rates (Vos 2009). Second, recombination maintains a majority of sequence polymorphisms within B. burgdorferi s.l. populations. About three-quarters of sequence diversity in B. burgdorferi s.l. is due to the reassortment of preexisting sequence variations through localized recombination and only one-quarter is due to de novo point mutations. This estimate is remarkably similar regardless of the strains and replicons used for comparison (Table 4). It is also nearly identical to our earlier estimates based on comparisons of isolates belonging to the same genomic groups (Qiu et al. 2004). Third, there is a considerable amount of cross-species genetic exchange among sympatric genomes. Although the European isolates ZS7 and Bol26 are conspecific with other B. burgdorferi s.s. strains from North America, there is a deficiency of shared SNPs between the two geographically separated populations (Table 4, “No EU strains”). In contrast, SNPs segregating between ZS7 and Bol26 are mostly shared with those in the non-B. burgdorferi s.s. yet sympatric European strains, suggesting a high rate of cross-species recombination (Table 4, “With EU strains”). Fourth, although LDhat analysis suggests that the tract lengths of gene-conversion events in B. burgdorferi s.s. are small (<500 bases), the actual tract lengths of individual gene-conversion events may be considerably longer because multiple events could have occurred at a single locus (e.g., at ospC). In what seems to be a single event of gene conversion, a DNA fragment of a minimum length of ~1900 bases from a close relative of the B. afzelii strain PKo was incorporated into the homologous locus of the B. burgdorferi s.s. strain Bol26 (Figure S4). This observed tract length is close to the experimentally estimated tract length of between 950 and 1850 bases in Helicobacter pylori (Lin et al. 2009). To summarize, we conclude that recombination in B. burgdorferi s.l. is pervasive across its genome, is localized with short tract lengths (<2000 bp), occurs approximately three times more frequently than mutations, and occurs frequently among sympatric genospecies.

Maintenance of genome clusters by frequency-dependent selection

Pervasive and frequent recombination suggests that natural selection plays a large role in the maintenance of distinct clonal groups coexisting within natural bacterial populations, but the exact forms of natural selection may vary from species to species (Doolittle and Papke 2006; Fraser et al. 2009). A widely held view is that sympatric bacterial genomic groups represent ecotypes, each one adapting to its microhabitat (Majewski and Cohan 1999; Cohan 2002; Koeppel et al. 2008). In the case of B. burgdorferi s.s., it has been proposed that genomic groups with distinct ospC alleles differ in their host preferences (Brisson and Dykhuizen 2004, 2006). The ecotype model has been challenged by computer simulations, which showed that neutrally evolving genomes would merge into a single cluster when the recombination rate is above three times the mutation rate (Fraser et al. 2007). Our simulation went further by showing that in the absence of a recombination barrier, nascent adaptive or neutral sequence divergence among the genome lineages is ephemeral and highly vulnerable to homogenization by periodic selective sweeps. For instance, sequence divergence between two independently adapted subpopulations virtually disappeared with a recombination rate as low as the

Figure 7 Simulation results: genome genealogy. Thirty individuals were sampled from a simulated population ($N = 500$) after evolving for 5000 generations. The genome evolved under the influence of frequency-dependent selection at one locus and purifying selection at a second locus and with a gene-conversion rate equal to the mutation rate ($R = U$, FDS and BKG). (A) Genealogy of 30 individual genomes. (B) Neighbor-joining tree of nucleotide sequences at the positively selected locus. The simulation produced the following patterns observed in B. burgdorferi s.s. (Figure 3): two founder lineages persisting in the final sample, genomic sequences forming distinct clusters (labeled “A–H”, especially those at the positively selected locus), and an association of genomic groups with major sequence variations at the positively selected locus.


mutation rate (Figure 5B). With an estimated recombination rate three times that of the mutation rate and with a limited number of competent host species in a given locality, it is unlikely that partitioning of host species is the primary mechanism maintaining the stable coexistence of ~20 distinct B. burgdorferi s.s. genomic groups in the northeastern United States. Nor is it likely to be the primary mechanism maintaining the high sequence diversity at positively selected loci such as ospC, dbpA, and lmp1. To sustain independent adaptive evolution of bacterial genomic groups, it is necessary that recombination rates decline precipitously with increasing sequence distances (Fraser et al. 2007; Lawrence and Retchless 2010).

In contrast, it is well known that negative frequency-dependent selection is capable of maintaining a high allelic diversity at surface-antigen loci of bacterial and viral pathogens as well as at the antigen-receptor (e.g., MHC) loci of vertebrates (Levin 1988; Takahata and Nei 1990; Wiener 1996). The adequacy of negative frequency-dependent selection for maintaining sympatric genetic diversity in an asexual species without niche partitioning has recently been demonstrated empirically (Weeks and Hoffmann 2008). Our results suggest that in a bacterial population where the recombination rate is of the same magnitude as the mutation rate, the diversifying and balancing effects of FDS at individual loci extend genome-wide. The population-genomic effects of FDS are perhaps best understood from the shape of the genome genealogy. Mirroring the gene genealogy at the selected locus, the genealogy of genomes evolving under FDS is strongly balanced and characterized by long internal and short external branches (Figure 7). Such a genome genealogy has the following consequences. First, as a direct consequence of this balanced genome genealogy, allele frequencies at all genomic loci tend to be evenly distributed (Figure 4). Indeed, allele frequency distributions in natural populations of B. burgdorferi s.s. are more even than expected from neutral evolution (Qiu et al. 1997, 2002; Rannala et al. 2000). Second, a genome genealogy with long internal and short terminal branches results in a clustering of genomic sequences into distinct groups, with large sequence differences between the groups and sequence homogeneity within the groups (Figure 7). Indeed, isolates within B. burgdorferi s.s. populations form, without exception, a large number of distinct sequence clusters regardless of the genetic marker used (Wang et al. 1999b; Margos et al. 2008; Qiu et al. 2008; Travinsky et al. 2010). Third, such genome-sequence clusters are associated with major alleles at the locus targeted by FDS (Figure 7). In the core, conserved parts of the B. burgdorferi s.s. genome, consisting of the main chromosome and the lp54 and cp26 plasmids, ospC displays the strongest association of major alleles with genomic lineages (with rare exceptions caused by horizontal transfers of ospC alleles, e.g., in Bol26) (Figure 3). This pattern suggests that FDS operating at ospC is a dominant selective mechanism maintaining the sympatric genome diversity in B. burgdorferi. Fourth, due to the prolonged persistence of founder lineages in a population, the level of sequence polymorphisms within populations could be as high as the level of sequence divergence between populations (e.g., between the A and C alleles in Figure 7). In the B. burgdorferi s.l. genome, a number of surface-protein loci display levels of polymorphisms within genospecies close to the levels of sequence divergence between the genospecies [e.g., ospC (Figure 3), dbpA (Figure S1), and lmp1 (Figure S2)].

While specific molecular functions of OspC are not yet clear, it is known that ospC is required for host infection and is among the most differentially expressed genes during host invasion (Brooks et al. 2003; Grimm et al. 2004; Liang et al. 2004; Tilly et al. 2006; Antonara et al. 2010). Considering that ospC is a serotype determinant of B. burgdorferi s.l., negative FDS operating at ospC is highly plausible. Presumably, cells with rare amino acid types in OspC would encounter weaker adaptive immune responses from the host population than cells with common OspC alleles (Wang et al. 1999b; Barbour and Travinsky 2010).
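The rare-allele advantage described above can be illustrated with a deliberately simple deterministic model in which an allele's relative fitness falls linearly with its own frequency. The linear fitness function, the strength `s = 0.5`, the infinite-population update, and the function names are all assumptions chosen for illustration; the codon-based forward simulations of this study additionally include mutation, drift, gene conversion, and purifying selection.

```python
def fds_fitness(freqs, s=0.5):
    """Relative fitness that declines linearly with an allele's own frequency."""
    return [1.0 - s * f for f in freqs]

def next_generation(freqs, s=0.5):
    """One deterministic (infinite-population) update under negative FDS."""
    w = fds_fitness(freqs, s)
    mean_w = sum(f * wi for f, wi in zip(freqs, w))
    return [f * wi / mean_w for f, wi in zip(freqs, w)]

# Hypothetical starting frequencies of three alleles at the selected locus.
start = [0.7, 0.2, 0.1]
after_one = next_generation(start)

freqs = start
for _ in range(200):
    freqs = next_generation(freqs)
```

In this toy model the rarest allele gains frequency in the first generation and the three alleles approach equal frequencies over time, which is the balancing behavior that lets many OspC types persist together rather than one type sweeping to fixation.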

Implications for B. burgdorferi s.l. speciation and comparisons with other models of bacterial evolution

The ecotype model: Implicit in the frequency-dependent selection model is that sympatric B. burgdorferi s.s. genomic groups in the northeastern United States represent variations within a single, generalist species rather than individually adapted ecotypes. Here we use the word “species” to refer to a shared population pipeline of natural selection and genetic drift among individual genomes as specified in our computational model (Figure 1A) (Hey 2006). On the basis of the FDS model, we predict that it is unlikely that sympatric B. burgdorferi s.s. genomic groups would diverge without bound and eventually become host-specialized species. Rather, the overall genetic diversity of individual B. burgdorferi s.s. populations is expected to reach a steady state as the diversifying effect of FDS is balanced by the homogenizing effects of genetic drift, recombination, and purifying selection (Figure 5, E and F). On the basis of the assumption that all B. burgdorferi s.s. genomic groups share a uniform evolutionary process (e.g., by sharing the same transmission cycle consisting of a single tick species as the vector and the same set of vertebrate species as hosts), we think it appropriate to consider the B. burgdorferi s.s. groups in the northeastern United States as constituting a single ecotype or a single ecological species. Evidence supporting B. burgdorferi s.s. as a single, generalist species includes, first, that the host range of a number of B. burgdorferi s.s. genomic groups in the northeastern United States spans three or more mammalian orders (Hanincova et al. 2006). Second, consistent with the FDS model and less compatible with the host-specialization model, a number of recently dispersed B. burgdorferi s.s. genomic groups flourish in two separate transmission cycles (Europe and North America), which differ in tick-vector species and presumably in host-species composition as well (Qiu et al. 2008). Third, the severe human virulence of B. burgdorferi s.s. is itself evidence that it is a generalist parasite of mammalian hosts: although humans are not its natural reservoir hosts, B. burgdorferi s.s. is capable of invading and infecting human tissues. Although B. burgdorferi s.s. strains appear to vary in human virulence (Dykhuizen et al. 2008; Wormser et al. 2008), it has been argued that associations between human pathogenicity and genomic groups are weak and all groups have the potential to cause invasive infections in humans (Alghaferi et al. 2005; Jones et al. 2006).

It follows from the FDS model that geographic isolation may be a prerequisite for host specialization and speciation in B. burgdorferi s.l. Biogeographic patterns of B. burgdorferi s.l. support this prediction. Globally, most B. burgdorferi s.l. genospecies have distinct geographic distributions (Kurtenbach et al. 2006). Exceptional coexistence of multiple genospecies in a single geographic region (e.g., B. burgdorferi s.s., B. afzelii, and B. garinii in Europe) may result from secondary contact of previously isolated populations. Although they are members of a single genospecies, the European and North American B. burgdorferi s.s. populations have diverged and share only a few genomic groups due to recent trans-Atlantic migrations (Margos et al. 2008; Qiu et al. 2008). In the United States, the northeastern and midwestern B. burgdorferi s.s. populations have differentiated significantly (Qiu et al. 2008; Hoen et al. 2009; Brisson et al. 2010). Population divergence in B. burgdorferi s.l. as a biogeographic process is harder to understand under the niche-partitioning model. The ecotype and niche-partitioning models would predict that sympatric speciation is common in B. burgdorferi s.l. and that there is a strong association of host-species composition with the composition of B. burgdorferi s.l. genomic groups. As far as we know, these predictions are either not supported by empirical observations or yet to be tested.

Contrary to the proposal by Brisson and Dykhuizen (2004), the FDS models are likely to be more parsimonious than ecotype models for explaining the maintenance of bacterial genome diversity. For instance, while we used a single mutation rate, recombination rate, and selection coefficient in the present FDS simulations, Majewski and Cohan (1999) used two mutation rates, three recombination rates, and two selection parameters for simulating the stable coexistence of two ecotypes. Nevertheless, our computational analysis is preliminary and not a full test of the ecotype model. For instance, we did not test the possibility of each genomic group adapting to multiple host species with slight fitness differences. In addition, the ecotype model and the frequency-dependent selection model may operate simultaneously in a population.

The “epidemic” model: The “epidemic” model explains the maintenance of sympatric genomic groups with frequent recombination by the rapid growth of high-fitness clones within populations of pathogenic bacteria (Maynard Smith et al. 1993; Smith et al. 2000; Feil 2004). We reject the epidemic model as a major mechanism for the maintenance of B. burgdorferi s.s. genomic groups on the basis of the following considerations. First, the epidemic model predicts a rapid turnover of dominant clonal groups rather than a stable coexistence of genomic groups over time and space as observed in B. burgdorferi s.s. populations. Second, the epidemic model assumes the existence of a diverse, low-frequency reservoir of genomic groups, which serves as the source of new genotypes through recombination among the “background genomes”. Extensive surveys of B. burgdorferi s.s. populations in the United States and elsewhere in the past decades find no evidence for a shadow population with unknown genomic groups in any locality. In contrast, the FDS model has no need to assume a background population that is more diverse than what could be sampled. Third, the epidemic model requires that recombination is more frequent among the background genomes than among the high-fitness clones. In comparison, the FDS model does not require that recombination rates differ between genomic groups. Nevertheless, clonal expansion could still play a role in the maintenance of genomic groups in B. burgdorferi s.l. So far we have tested the FDS model only under a constant population size, which is unrealistic since the size of natural B. burgdorferi s.l. populations is likely to fluctuate. We expect that population growth combined with FDS would further boost genomic diversities in bacteria, a hypothesis that could be tested by further simulations.

The “species-less” model: As an obligate parasite B. burg-dorferi s.l. may not be representative of other, free-livingbacteria. It is unclear whether and to what extent FDS playsa role in the maintenance of sympatric genomic diversity inother bacterial pathogens. However, comparisons of wholegenomes of bacterial strains revealed that, similar to ospC inB. burgdorferi s.l., as few as one to two loci in the wholebacterial genome display highly elevated effective recombina-tion rates and high sequence variability (Vos 2009; Didelotand Maiden 2010). Also similar to ospC, most such loci codefor surface structures directly involved in interactions withthe host, such as the rlr operon coding for a pilus in Strepto-coccus pneumonia (Lefebure and Stanhope 2007), the rfb op-eron coding for the O-antigen and the fim locus coding for anadhesion in Escherichia coli (Touchon et al. 2009), and theompA locus coding for an outer-surface protein in Chlamydiatrachomatis (Gomes et al. 2007). A further and key similarityto ospC is that many of these genes (e.g., rfb in E. coli) alsoappear to be lineage-defining genes associated with genomicgroups in a bacterial pathogen (Lawrence and Retchless2010). These similarities suggest that maintenance of sym-patric genomic diversity by FDS may be a general evolution-ary mechanism operating in free-living bacterial pathogens aswell. Curiously, with a few exceptions (e.g., Touchon et al.2009), most authors regard genomic groups of these bacterialpathogens as adapted to host or predator species and havenot considered diversifying selection such as FDS as a pos-sible contributing factor (Vos 2009; Lawrence and Retchless2010). Such models are thus essentially ecotype models andthereby suffer from the same difficulties in explaining howpositively selected loci are protected from selective sweeps ina population with frequent recombination. 
Although FDS may be considered a form of diversifying and balancing selection, it differs from balancing selection caused by ecological adaptation in not assuming specialized adaptations of individual alleles. As such, the FDS model is a selective, but nonadaptive model. In fact, FDS is maladaptive by elevating the deleterious mutation load at housekeeping loci when the recombination rate is low (blue lines in Figure 5D, E, and F).

To conclude, we showed that the high sympatric genomic diversity in natural B. burgdorferi s.l. populations (such as those in the northeastern United States) could sufficiently and parsimoniously be accounted for by negative frequency-dependent selection targeting a small number of surface-antigen loci and ospC in particular. Since recombination in B. burgdorferi s.l. is localized and not overly frequent, the diversifying and balancing effects of FDS extend genome-wide and the clustering of genomic sequences ensues. Meanwhile, recombination is pervasive and effective enough to reduce the load of deleterious mutations and to prevent neutral and adaptive divergence (e.g., host specialization) among the sympatric genomic groups. FDS may be a common selective mechanism maintaining sympatric genomic diversity in free-living bacterial pathogens as well, with the implication that, as in eukaryotes, speciation in bacteria is as much a biogeographic process as an ecological or adaptive process (Papke and Ward 2004). Although computationally slow and inefficient relative to coalescence simulations, our codon-based forward-simulation approach proves to be effective for testing hypotheses on bacterial genome evolution owing to more straightforward ways of simulating recombination and diverse forms of natural selection operating in molecular sequences.
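The balancing effect of negative frequency-dependent selection can be illustrated with a minimal forward-simulation sketch. This is not the codon-based simulator used in this study: it assumes a single haploid locus with a fixed set of alleles, Wright-Fisher resampling at constant population size, no mutation or recombination, and a hypothetical linear fitness function w = 1 - s·f (f is the current allele frequency). The function name `simulate_nfds` and all of its parameters are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_nfds(n_alleles=5, pop_size=1000, generations=200,
                  strength=0.9, seed=0):
    """Haploid Wright-Fisher resampling with negative frequency-dependent fitness.

    Hypothetical fitness function: w_i = 1 - strength * f_i, where f_i is the
    current frequency of allele i, so rarer alleles are fitter.
    Returns a list of per-generation allele counts.
    """
    if n_alleles < 2 or pop_size < n_alleles:
        raise ValueError("need at least two alleles and pop_size >= n_alleles")
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in [0, 1)")
    rng = random.Random(seed)
    # Start far from equilibrium: allele 0 is common, the others are rare.
    rare = max(1, pop_size // 100)
    counts = [pop_size - rare * (n_alleles - 1)] + [rare] * (n_alleles - 1)
    history = [counts[:]]
    for _ in range(generations):
        freqs = [c / pop_size for c in counts]
        # Sampling weight = frequency x fitness; a lost allele stays lost
        # because this sketch has no mutation.
        weights = [f * (1.0 - strength * f) for f in freqs]
        offspring = rng.choices(range(n_alleles), weights=weights, k=pop_size)
        tally = Counter(offspring)
        counts = [tally.get(a, 0) for a in range(n_alleles)]
        history.append(counts[:])
    return history


if __name__ == "__main__":
    # Compare final allele frequencies without (strength=0) and with selection.
    for label, s in (("neutral", 0.0), ("negative FDS", 0.9)):
        final = simulate_nfds(strength=s, seed=1)[-1]
        print(label, [round(c / 1000, 3) for c in final])
```

Setting `strength=0.0` reduces the sketch to pure drift, which gives a neutral baseline for comparison. Under the assumed fitness function, rare alleles are expected to rise toward roughly equal frequencies instead of drifting to loss.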

Acknowledgments

We thank two anonymous reviewers for their meticulous and constructive critiques of the manuscript. We thank Lia Di for preparing figures. Funding for this study includes grants GM083722 (to W.Q.), GM60665 (to J.H.), AI37256 (to B.J.L.), AI49003 (to S.R.C.), AI30071 (to C.M.F. and S.E.S.), and RR03037 (to Hunter College) from the National Institutes of Health, the Tami Fund, and the Lyme Disease Association. Additionally, W.Q. is grateful for a Sabbatical Scholarship awarded by the National Evolutionary Synthesis Center (National Science Foundation grant EF-0905606).

Literature Cited

Alghaferi, M. Y., J. M. Anderson, J. Park, P. G. Auwaerter, J. N. Aucott et al., 2005 Borrelia burgdorferi ospC heterogeneity among human and murine isolates from a defined region of northern Maryland and southern Pennsylvania: lack of correlation with invasive and noninvasive genotypes. J. Clin. Microbiol. 43: 1879–1884.

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Antonara, S., L. Ristow, J. McCarthy, and J. Coburn, 2010 Effect of Borrelia burgdorferi OspC at the site of inoculation in mouse skin. Infect. Immun. 78: 4723–4733.

Barbour, A. G., and B. Travinsky, 2010 Evolution and distribution of the ospC gene, a transferable serotype determinant of Borrelia burgdorferi. mBio 1: e00153-10.

Brisson, D., and D. E. Dykhuizen, 2004 ospC diversity in Borrelia burgdorferi: different hosts are different niches. Genetics 168: 713–722.

Brisson, D., and D. E. Dykhuizen, 2006 A modest model explains the distribution and abundance of Borrelia burgdorferi strains. Am. J. Trop. Med. Hyg. 74: 615–622.

Brisson, D., M. F. Vandermause, J. K. Meece, K. D. Reed, and D. E. Dykhuizen, 2010 Evolution of northeastern and midwestern Borrelia burgdorferi, United States. Emerg. Infect. Dis. 16: 911–917.

Brooks, C. S., P. S. Hefty, S. E. Jolliff, and D. R. Akins, 2003 Global analysis of Borrelia burgdorferi genes regulated by mammalian host-specific signals. Infect. Immun. 71: 3371–3383.

Bunikis, J., U. Garpmo, J. Tsao, J. Berglund, D. Fish et al., 2004 Sequence typing reveals extensive strain diversity of the Lyme borreliosis agents Borrelia burgdorferi in North America and Borrelia afzelii in Europe. Microbiology 150: 1741–1755.

Casjens, S., N. Palmer, R. van Vugt, W. M. Huang, B. Stevenson et al., 2000 A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochete Borrelia burgdorferi. Mol. Microbiol. 35: 490–516.

Casjens, S. R., C. M. Fraser-Liggett, E. F. Mongodin, W.-G. Qiu, J. J. Dunn et al., 2011a Whole genome sequence of an unusual Borrelia burgdorferi sensu lato isolate. J. Bacteriol. 193: 1489–1490.

Casjens, S. R., E. F. Mongodin, W.-G. Qiu, C. M. Fraser-Liggett, J. J. Dunn et al., 2011b Whole genome sequence of two Borrelia afzelii and two Borrelia garinii Lyme disease agent isolates. J. Bacteriol. (in press).

Chattopadhyay, S., S. J. Weissman, V. N. Minin, T. A. Russo, D. E. Dykhuizen et al., 2009 High frequency of hotspot mutations in core genes of Escherichia coli due to short-term positive selection. Proc. Natl. Acad. Sci. USA 106: 12412–12417.

Cohan, F. M., 2002 What are bacterial species? Annu. Rev. Microbiol. 56: 457–487.

Delcher, A. L., D. Harmon, S. Kasif, O. White, and S. L. Salzberg, 1999 Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27: 4636–4641.

Delcher, A. L., A. Phillippy, J. Carlton, and S. L. Salzberg, 2002 Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30: 2478–2483.

Didelot, X., and D. Falush, 2007 Inference of bacterial microevolution using multilocus sequence data. Genetics 175: 1251–1266.

Didelot, X., and M. C. J. Maiden, 2010 Impact of recombination on bacterial evolution. Trends Microbiol. 18: 315–322.

Didelot, X., D. Lawson, A. Darling, and D. Falush, 2010 Inference of homologous recombination in bacteria using whole-genome sequences. Genetics 186: 1435–1449.

Doolittle, W. F., and R. T. Papke, 2006 Genomics and the bacterial species problem. Genome Biol. 7: 116.

Dykhuizen, D. E., and G. Baranton, 2001 The implications of a low rate of horizontal transfer in Borrelia. Trends Microbiol. 9: 344–350.

Dykhuizen, D. E., D. Brisson, S. Sandigursky, G. P. Wormser, J. Nowakowski et al., 2008 The propensity of different Borrelia burgdorferi sensu stricto genotypes to cause disseminated infections in humans. Am. J. Trop. Med. Hyg. 78: 806–810.

Enright, A. J., S. Van Dongen, and C. A. Ouzounis, 2002 An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30: 1575–1584.

Feil, E. J., 2004 Small change: keeping pace with microevolution. Nat. Rev. Microbiol. 2: 483–495.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Felsenstein, J., 1989 PHYLIP - Phylogeny Inference Package. Cladistics 5: 164–166.

Fraser, C., W. P. Hanage, and B. G. Spratt, 2007 Recombination and the nature of bacterial speciation. Science 315: 476–480.

Fraser, C., E. J. Alm, M. F. Polz, B. G. Spratt, and W. P. Hanage, 2009 The bacterial species challenge: making sense of genetic and ecological diversity. Science 323: 741–746.

Fraser, C. M., S. Casjens, W. M. Huang, G. G. Sutton, R. Clayton et al., 1997 Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390: 580–586.

Gay, J., S. Myers, and G. McVean, 2007 Estimating meiotic gene conversion rates from population genetic data. Genetics 177: 881.

Glockner, G., R. Lehmann, A. Romualdi, S. Pradella, U. Schulte-Spechtel et al., 2004 Comparative analysis of the Borrelia garinii genome. Nucleic Acids Res. 32: 6038–6046.

Glockner, G., U. Schulte-Spechtel, M. Schilhabel, M. Felder, J. Suhnel et al., 2006 Comparative genome analysis: selection pressure on the Borrelia vls cassettes is essential for infectivity. BMC Genomics 7: 211.

Gomes, J. P., W. J. Bruno, A. Nunes, N. Santos, C. Florindo et al., 2007 Evolution of Chlamydia trachomatis diversity occurs by widespread interstrain recombination involving hotspots. Genome Res. 17: 50–60.

Grimm, D., K. Tilly, R. Byram, P. E. Stewart, J. G. Krum et al., 2004 Outer-surface protein C of the Lyme disease spirochete: a protein induced in ticks for infection of mammals. Proc. Natl. Acad. Sci. USA 101: 3142–3147.

Guttman, D. S., and D. E. Dykhuizen, 1994 Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266: 1380–1383.

Hanincova, K., K. Kurtenbach, M. Diuk-Wasser, B. Brei, and D. Fish, 2006 Epidemic spread of Lyme borreliosis, northeastern United States. Emerg. Infect. Dis. 12: 604–611.

Hey, J., 2006 On the failure of modern species concepts. Trends Ecol. Evol. (Amst.) 21: 447–450.

Hoen, A. G., G. Margos, S. J. Bent, M. A. Diuk-Wasser, A. Barbour et al., 2009 Phylogeography of Borrelia burgdorferi in the eastern United States reflects multiple independent Lyme disease emergence events. Proc. Natl. Acad. Sci. USA 106: 15013–15018.

Huang, W. M., M. Robertson, J. Aron, and S. Casjens, 2004 Telomere exchange between linear replicons of Borrelia burgdorferi. J. Bacteriol. 186: 4134–4141.

Hudson, R. R., and N. L. Kaplan, 1985 Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164.

Hudson, R. R., and N. L. Kaplan, 1995 Deleterious background selection with recombination. Genetics 141: 1605–1617.

Huelsenbeck, J. P., and F. Ronquist, 2001 MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755.

Jones, K. L., L. J. Glickstein, N. Damle, V. K. Sikand, G. McHugh et al., 2006 Borrelia burgdorferi genetic markers and disseminated disease in patients with early Lyme disease. J. Clin. Microbiol. 44: 4407–4413.

Koeppel, A., E. B. Perry, J. Sikorski, D. Krizanc, A. Warner et al., 2008 Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics. Proc. Natl. Acad. Sci. USA 105: 2504–2509.

Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne et al., 2009 Circos: an information aesthetic for comparative genomics. Genome Res. 19: 1639–1645.

Kurtenbach, K., K. Hanincová, J. Tsao, G. Margos, D. Fish et al., 2006 Fundamental processes in the evolutionary ecology of Lyme borreliosis. Nat. Rev. Microbiol. 4: 660–669.

Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan et al., 2007 Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948.

Lawrence, J. G., and A. C. Retchless, 2010 The myth of bacterial species and speciation. Biol. Philos. 25: 569–588.

Lefebure, T., and M. J. Stanhope, 2007 Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 8: R71.

Levin, B. R., 1988 Frequency-dependent selection in bacterial populations. Philos. Trans. R. Soc. Lond. B Biol. Sci. 319: 459–472.

Li, K. B., 2003 ClustalW-MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics 19: 1585–1586.

Liang, F. T., J. Yan, M. L. Mbow, S. L. Sviat, R. D. Gilmore et al., 2004 Borrelia burgdorferi changes its surface antigenic expression in response to host immune responses. Infect. Immun. 72: 5759–5767.

Lin, E. A., X. S. Zhang, S. M. Levine, S. R. Gill, D. Falush et al., 2009 Natural transformation of Helicobacter pylori involves the integration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog. 5: e1000337.

Majewski, J., and F. M. Cohan, 1999 Adapt globally, act locally: the effect of selective sweeps on bacterial sequence diversity. Genetics 152: 1459–1474.

Margos, G., A. G. Gatewood, D. M. Aanensen, K. Hanincová, D. Terekhova et al., 2008 MLST of housekeeping genes captures geographic population structure and suggests a European origin of Borrelia burgdorferi. Proc. Natl. Acad. Sci. USA 105: 8730–8735.

Maynard Smith, J., N. Smith, M. O’Rourke, and B. Spratt, 1993 How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90: 4384–4388.

McVean, G., P. Awadalla, and P. Fearnhead, 2002 A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160: 1231–1241.

Myers, E. W., G. G. Sutton, A. L. Delcher, I. M. Dew, D. P. Fasulo et al., 2000 A whole-genome assembly of Drosophila. Science 287: 2196–2204.

Nakamura, Y., T. Gojobori, and T. Ikemura, 2000 Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 28: 292.

Nelson, K. E., D. E. Fouts, E. F. Mongodin, J. Ravel, R. T. DeBoy et al., 2004 Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species. Nucleic Acids Res. 32: 2386–2395.

Neuhauser, C., 1999 The ancestral graph and gene genealogy under frequency-dependent selection. Theor. Popul. Biol. 56: 203–214.

Ochman, H., and R. K. Selander, 1984 Evidence for clonal population structure in Escherichia coli. Proc. Natl. Acad. Sci. USA 81: 198–201.

Papke, R. T., and D. M. Ward, 2004 The importance of physical isolation to microbial diversification. FEMS Microbiol. Ecol. 48: 293–303.

Paradis, E., J. Claude, and K. Strimmer, 2004 APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20: 289–290.

Qiu, W. G., E. M. Bosler, J. R. Campbell, G. D. Ugine, I. N. Wang et al., 1997 A population genetic study of Borrelia burgdorferi sensu stricto from eastern Long Island, New York, suggested frequency-dependent selection, gene flow and host adaptation. Hereditas 127: 203–216.

Qiu, W.-G., D. E. Dykhuizen, M. S. Acosta, and B. J. Luft, 2002 Geographic uniformity of the Lyme disease spirochete (Borrelia burgdorferi) and its shared history with tick vector (Ixodes scapularis) in the northeastern United States. Genetics 160: 833–849.

Qiu, W.-G., S. E. Schutzer, J. F. Bruno, O. Attie, Y. Xu et al., 2004 Genetic exchange and plasmid transfers in Borrelia burgdorferi sensu stricto revealed by three-way genome comparisons and multilocus sequence typing. Proc. Natl. Acad. Sci. USA 101: 14150–14155.

Qiu, W.-G., J. F. Bruno, W. D. McCaig, Y. Xu, I. Livey et al., 2008 Wide distribution of a high-virulence Borrelia burgdorferi clone in Europe and North America. Emerg. Infect. Dis. 14: 1097–1104.

Rannala, B., W. G. Qiu, and D. E. Dykhuizen, 2000 Methods for estimating gene frequencies and detecting selection in bacterial populations. Genetics 155: 499–508.

Retchless, A. C., and J. G. Lawrence, 2010 Phylogenetic incongruence arising from fragmented speciation in enteric bacteria. Proc. Natl. Acad. Sci. USA 107: 11453–11458.

Ronquist, F., and J. P. Huelsenbeck, 2003 MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.

Schutzer, S. E., C. M. Fraser-Liggett, S. R. Casjens, W. G. Qiu, J. J. Dunn et al., 2011 Whole-genome sequences of thirteen isolates of Borrelia burgdorferi. J. Bacteriol. 193: 1018–1020.

Smith, J. M., E. J. Feil, and N. H. Smith, 2000 Population structure and evolutionary dynamics of pathogenic bacteria. BioEssays 22: 1115–1122.

Stajich, J. E., D. Block, K. Boulez, S. E. Brenner, S. A. Chervitz et al., 2002 The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12: 1611–1618.

Sutton, G. G., O. White, M. D. Adams, and A. R. Kerlavage, 1995 TIGR Assembler: a new tool for assembling large shotgun sequencing projects. Genome Sci. Technol. 1: 9–19.

Takahata, N., and M. Nei, 1990 Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics 124: 967–978.

Tilly, K., J. G. Krum, A. Bestor, M. W. Jewett, D. Grimm et al., 2006 Borrelia burgdorferi OspC protein required exclusively in a crucial early stage of mammalian infection. Infect. Immun. 74: 3554–3564.

Touchon, M., C. Hoede, O. Tenaillon, V. Barbe, S. Baeriswyl et al., 2009 Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5: e1000344.

Travinsky, B., J. Bunikis, and A. G. Barbour, 2010 Geographic differences in genetic locus linkages for Borrelia burgdorferi. Emerg. Infect. Dis. 16: 1147–1150.

Vos, M., 2009 Why do bacteria engage in homologous recombination? Trends Microbiol. 17: 226–232.

Wang, G., A. P. van Dam, and J. Dankert, 1999a Evidence for frequent OspC gene transfer between Borrelia valaisiana sp. nov. and other Lyme disease spirochetes. FEMS Microbiol. Lett. 177: 289–296.

Wang, I. N., D. E. Dykhuizen, W. Qiu, J. J. Dunn, E. M. Bosler et al., 1999b Genetic diversity of ospC in a local population of Borrelia burgdorferi sensu stricto. Genetics 151: 15–30.

Weeks, A. R., and A. A. Hoffmann, 2008 Frequency-dependent selection maintains clonal diversity in an asexual organism. Proc. Natl. Acad. Sci. USA 105: 17872–17877.

Wiener, P., 1996 Inferring frequency dependent selection from the molecular evolution of a rapidly evolving virus: a theoretical investigation. Proc. Biol. Sci. 263: 1283–1289.

Wirth, T., G. Morelli, B. Kusecek, A. van Belkum, C. van der Schee et al., 2007 The rise and spread of a new pathogen: seroresistant Moraxella catarrhalis. Genome Res. 17: 1647–1656.

Wormser, G. P., D. Brisson, D. Liveris, K. Hanincova, S. Sandigursky et al., 2008 Borrelia burgdorferi genotype predicts the capacity for hematogenous dissemination during early Lyme disease. J. Infect. Dis. 198: 1358–1364.

Wywial, E., J. Haven, S. R. Casjens, Y. A. Hernandez, S. Singh et al., 2009 Fast, adaptive evolution at a bacterial host-resistance locus: the PFam54 gene array in Borrelia burgdorferi. Gene 445: 26–37.

Communicating editor: J. Lawrence

GENETICS Supporting Information

http://www.genetics.org/content/suppl/2011/09/02/genetics.111.130773.DC1

Pervasive Recombination and Sympatric Genome Diversification Driven by Frequency-Dependent Selection in Borrelia burgdorferi, the Lyme Disease Bacterium

James Haven, Levy C. Vargas, Emmanuel F. Mongodin, Vincent Xue, Yozen Hernandez, Pedro Pagan, Claire M. Fraser-Liggett, Steven E. Schutzer, Benjamin J. Luft, Sherwood R. Casjens, and Wei-Gang Qiu

Copyright © 2011 by the Genetics Society of America
DOI: 10.1534/genetics.111.130773

(A)
Strains    Codon 60   Codon 67   Codon 185  Codon 188  Codon 219  Codon 268  Codon 275
156a       TTA (L)    AAC (N)    GAA (E)    GTC (V)    AGT        GCA (A)    AGT (S)
297        ..T (F)    .G. (S)    A.. (K)    .A. (E)    ...        A.. (T)    ..C (S)
JD1        ...        ...        ...        ...        .A.        ...        ..C
94a        ..T        ...        ...        ...        .A.        ...        ..C
118a       ..T        ...        ...        ...        ... (S)    A.. (T)    ..C
72a        ..T        ...        ...        ...        .A. (N)    ... (A)    ..C
B31        ...        ...        ...        ...        .A.        ...        ..C
64b        ...        ...        ...        ...        ...        A..        ..C
29805      ...        ...        ...        .A.        .A.        ...        ..C
BOL26      ..T        ...        ...        ...        .A.        ...        ...
ZS7        ..T        ...        ...        ...        .A.        ...        ...
CA-11.2A   ...        ...        ...        ...        .A.        ...        ..C
WI91-23    ...        ...        ...        .A.        .A.        ...        ..C
N40        ..T        ...        ...        ...        .A.        ...        ..C
           Rec        Mut        Mut        Rec        Rec        2 Recs     Rec
           Nonsyn     Nonsyn     Nonsyn     Nonsyn     Nonsyn     Nonsyn     Syn

(B)
               Syn   Nonsyn
Mutation       23    17
Recombination  77    52

[Supporting figure: simulation trajectories over 5000 generations (x-axis: Generation) at recombination rates R=0, R=U, and R=5U. Panels A–C: Neutral Evolution (NEU). Panels D–F: Neutral & Purifying Selection (BKG). Legend entries: Under neutral evolution; Neutral Expectation; Under purifying selection; BKG Expectation; Fitness (secondary axis, 0 to 1).]

[Supporting figure: simulation trajectories over 5000 generations (x-axis: Generation) at recombination rates R=0, R=U, and R=5U. Panels A–C: Neutral & Directional Selection (DIR). Panels D–F: Neutral & Frequency-Dependent Selection (FDS). Legend entries: Under directional selection; Under freq dep selection; Under neutral evolution; Neutral Expectation.]

