Genome Sequences of Three Agrobacterium Biovars Help Elucidate the Evolution of
Multi-Chromosome Genomes in Bacteria†

Running Title: Agrobacterium Biovar II and III genome sequences

Steven C. Slater1, Barry S. Goldman2, Brad Goodner3, João C. Setubal4,5*, Stephen K.
Farrand6, Eugene W. Nester7, Thomas J. Burr8, Lois Banta9, Allan W. Dickerman5, Ian
Paulsen10, Leon Otten11, Garret Suen12‡, Roy Welch12, Nalvo F. Almeida5,13, Frank
Arnold3, Oliver T. Burton9, Zijin Du2, Adam Ewing3, Eric Godsy2, Sara Heisel2, Kathryn
L. Houmiel14,15, Jinal Jhaveri5§, Jing Lu2, Nancy M. Miller2, Stacie Norton2, Qiang
Chen14, Waranyoo Phoolcharoen14, Victoria Ohlin3, Dan Ondrusek3, Nicole Pride3,
Shawn L. Stricklin2, Jian Sun5¶, Cathy Wheeler3||, Lindsey Wilson3, Huijun Zhu2 and
Derek W. Wood7,15

1Great Lakes Bioenergy Research Center, 1550 Linden Dr., University of Wisconsin,
Madison, WI 53706; 2Monsanto Company, 800 North Lindbergh Boulevard, St. Louis,
MO 63167; 3Department of Biology, Hiram College, Hiram, OH 44234; 4Department of
Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
24060; 5Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State
University, Blacksburg, VA 24060; 6Department of Microbiology, University of Illinois
at Urbana-Champaign, Urbana, IL 61801; 7Department of Microbiology, University of
Washington, Seattle WA 98195; 8Department of Plant Pathology, Cornell University,
NYSAES, Geneva, NY 14456; 9Department of Biology, Williams College,
Williamstown, MA 01267; 10Department of Chemistry and Biomolecular Sciences,
Macquarie University, North Ryde, Australia NSW2109; 11Institute of Plant Molecular
Biology, Strasbourg, 67084 France; 12Department of Biology, Syracuse University,
Syracuse, NY 13244; 13Department of Computing and Statistics, Federal University of
Mato Grosso do Sul, Campo Grande, Brazil; 14The Biodesign Institute, Arizona State
University, 1001 S. McAllister Ave., Tempe, AZ, 85287; 15Department of Biology,
Seattle Pacific University, Seattle, WA 98119

Current addresses: ‡Great Lakes Bioenergy Research Center, University of Wisconsin-Madison,
Madison, WI USA 53706-1521; §Weather Bill, San Francisco, CA 94108; ¶La
Jolla Institute for Allergy & Immunology, La Jolla, CA 92037; ||Cathy Wheeler,
Department of Biology, John Carroll University, Cleveland, OH 44118

*Corresponding Author. Mailing Address: Virginia Bioinformatics Institute,
Washington St., MC 0477, Blacksburg, VA 24060. Phone: (540) 231-9464. Fax: (540)
231-2606. E-mail: [email protected].

†Supplemental material can be found at http://www.agrobacterium.org and
http://agro.vbi.vt.edu/public

J. Bacteriol. doi:10.1128/JB.01779-08. JB Accepts, published online ahead of print on 27 February 2009.
Copyright © 2009, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.
Downloaded from http://jb.asm.org/ on February 14, 2018 by guest

ABSTRACT

The family Rhizobiaceae contains plant-associated bacteria with critical roles in
ecology and agriculture. Within this family, many Rhizobium and Sinorhizobium strains
are nitrogen-fixing plant mutualists, while many strains designated as Agrobacterium are
plant pathogens. These contrasting lifestyles are primarily dependent on the transmissible
plasmids each strain harbors. Members of Rhizobiaceae also have diverse genome
architectures that include single chromosomes, multiple chromosomes, and plasmids of
various sizes. Agrobacterium strains have been divided into three Biovars, based on
physiological and biochemical properties. The genome of a Biovar I strain, A.
tumefaciens C58, has been previously sequenced. In this study the genomes of the
Biovar II strain A. radiobacter K84, a commercially available biological control strain
that inhibits certain pathogenic agrobacteria, and the Biovar III strain A. vitis S4, a
narrow host range strain that infects grapes and induces a hypersensitive response on
non-host plants, were fully sequenced and annotated. Comparison with other sequenced
members of the α-proteobacteria provides new data on the evolution of multi-partite
bacterial genomes. Primary chromosomes show extensive conservation of both gene
content and order. In contrast, secondary chromosomes share smaller percentages of
genes, and conserved gene order is restricted to short blocks. We propose that secondary
chromosomes originated from an ancestral plasmid to which genes have been transferred
from a progenitor primary chromosome. Similar patterns are observed in select β- and
γ-proteobacteria species. Together these results define the evolution of chromosome
architecture and gene content among the Rhizobiaceae and support a generalized
mechanism for second chromosome formation among bacteria.

INTRODUCTION

The family Rhizobiaceae (order Rhizobiales) of the α-proteobacteria includes the
plant pathogens Agrobacterium and the nitrogen-fixing plant mutualists Rhizobium and
Sinorhizobium. Members house single and multiple chromosome arrangements, linear
replicons, and plasmids of various sizes. Genes for pathogenicity, mutualism, and other
symbiotic properties are primarily encoded on large transmissible plasmids. Given the
promiscuous nature of these elements, different genomic lineages within the
Rhizobiaceae exhibit a variety of symbiotic phenotypes that range from pathogenesis to
nitrogen-fixing mutualism.

Agrobacterium taxonomy and phylogeny display a marked disparity. Empirically,
Agrobacterium is grouped into five species based on the disease phenotype associated
with the resident disease-inducing plasmid: A. tumefaciens causes crown gall on
dicotyledonous plants including stone fruit and nut trees, A. rubi causes crown gall on
raspberry, A. vitis causes gall formation that is limited to grape, A. rhizogenes causes
hairy root disease, and A. radiobacter is avirulent. An alternative classification scheme
groups Agrobacterium into three biovars based on physiological and biochemical
properties without consideration of disease phenotype. Whole genome and molecular
marker comparisons indicate that Agrobacterium strains are derived from multiple
chromosomal lineages (see below) (19, 26, 51, 52). The species and biovar
classification schemes do not coincide well, in large part because the disease-inducing
plasmids are readily transmissible. The history of Agrobacterium classification was
recently reviewed by Young (52).

Representative genomes from all three Agrobacterium biovars are now available.
The genome of the biovar I strain A. tumefaciens C58 (C58) has one circular
and one linear chromosome (19, 51). The genome sequences for representatives of the
two remaining biovars are presented here. Agrobacterium radiobacter K84 (K84), an
avirulent biovar II strain, is a widely used biological control agent for preventing crown
gall disease in the field (25, 35). A. vitis S4 (S4), a virulent biovar III strain, is
phenotypically distinct from strains of A. tumefaciens in two significant ways. First,
whereas A. tumefaciens infects many host species, A. vitis causes crown gall only on
grapevines (2, 4). Second, A. vitis induces necrosis on grapevine roots and a
hypersensitive response on non-host plants (3, 22).

This study examines the evolution of genome architecture among Agrobacterium,
selected sequenced members of the Rhizobiales, and additional bacteria that harbor
multiple chromosomes. The biovar I genome of C58 harbors a linear chromosome II
derived from a plasmid to which large blocks of DNA, including rRNA operons and
other essential genes, have transferred from Chromosome I (19, 51). While the
sequencing of S4 and K84 was motivated by the need to have full genomic sequences for
at least one biovar II representative and at least one biovar III representative, we
found that their genomes, together with those of C58 and other Rhizobiales species,
enabled us to infer a general model for bacterial genome evolution. Crucial for this
inference is the complex (for bacteria) replicon architecture of all three Agrobacterium
genomes. Data provided here, and additional evidence (41, 49), support our model as a
generalized mechanism of genome evolution among bacteria that harbor multiple
chromosomes.

MATERIALS AND METHODS

DNA sequencing and assembly. Two DNA libraries (insert sizes 2-4 kb and 4-8
kb) were generated for each Agrobacterium genome by mechanical shearing of DNA and
cloning into pUC18, followed by a shotgun sequencing approach. The reads (~87,000 for
K84 and ~82,000 for S4) were assembled and edited using Phred, Phrap, and Consed (13,
14, 20). Gaps were closed by sequencing specific products. All rRNA operons were
amplified with specific flanking primers, sequenced and assembled individually. All
nucleotides with Phred scores less than 40 were re-sequenced using an independent PCR
fragment as template. The error rate is estimated to be less than 1:10,000.
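
The Phred threshold of 40 and the quoted error rate are linked by the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The short Python sketch below works through that arithmetic. The function names are illustrative only and are not part of Phred, Phrap, or Consed.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an estimated error probability p."""
    return -10 * math.log10(p)


# A score of 40 corresponds to roughly one expected error per 10,000 bases,
# which matches the threshold and the error-rate estimate given in the text.
for q in (20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

Under this definition, re-sequencing every position scoring below 40 is one way to arrive at a per-base error estimate of less than 1:10,000.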
Comparative genomics analyses. Ortholog families were obtained with
orthoMCL (32). Ortholog alignments were obtained with custom Perl scripts. Circular
representations of these alignments were obtained with the tool genomeViz (17).
Analysis of potential intragenome transfers (Tables S6-S24) involved the Multi-Genome
Homology Comparison (38) and Phylogenetic Profiler (33) Web-based tools. Completed
bacterial genomes listed in NCBI as having more than one chromosome were initially
examined, and only those cases where the additional chromosome(s) carried a substantial
number of essential genes were retained. Within this subset, three cases in which two
or more closely related genera appear to share a common origin of additional
chromosomes were analyzed in greater detail. If intragenome transfer is a robust
explanation for the origin of additional chromosomes, then the “transferred” genes should
occur in clusters within which the synteny from the initial ancestral chromosome I was
maintained. The additional chromosomes of two related genera A and B were searched
for such shared gene clusters that are present on chromosome I of a unichromosomal
relative C but are no longer found on chromosome I in genera A and B. An initial
similarity lower limit of 60% identity was used, and once clusters were identified the
lower limit was adjusted to 40% identity to determine the fullest extent of each shared
gene cluster. Preliminary versions of Tables S6-S24 were checked against the ortholog
alignments, and minor corrections and additions were made to obtain the final set of
tables.
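
The two-threshold cluster search described above (seed at 60% identity, then relax to 40% to find the full extent) can be illustrated with a minimal Python sketch. This is not the procedure implemented by the Web-based tools used in the study. It assumes a hypothetical input: a list holding, for each gene in order along a secondary chromosome, the percent identity of its best match on chromosome I of the unichromosomal relative (0 when there is no match). The function name, the `min_seed_genes` parameter, and the example values are all assumptions, and the sketch does not check that gene order on the relative's chromosome is conserved.

```python
from typing import List, Tuple


def find_shared_clusters(
    identities: List[float],
    seed_identity: float = 60.0,    # initial lower limit from the text
    extend_identity: float = 40.0,  # relaxed lower limit from the text
    min_seed_genes: int = 2,        # assumption: a seed needs two adjacent genes
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) gene indices of candidate clusters."""
    clusters = []
    n = len(identities)
    i = 0
    while i < n:
        if identities[i] < seed_identity:
            i += 1
            continue
        # Pass 1: collect a run of adjacent genes that meet the stricter limit.
        j = i
        while j + 1 < n and identities[j + 1] >= seed_identity:
            j += 1
        if j - i + 1 < min_seed_genes:
            i = j + 1
            continue
        # Pass 2: relax the limit and grow the cluster in both directions.
        start, end = i, j
        while start > 0 and identities[start - 1] >= extend_identity:
            start -= 1
        while end + 1 < n and identities[end + 1] >= extend_identity:
            end += 1
        clusters.append((start, end))
        i = end + 1
    return clusters


# Invented identities for eleven consecutive genes.
example = [10, 45, 65, 70, 50, 0, 62, 30, 61, 63, 41]
print(find_shared_clusters(example))
```

Seeding strictly and extending leniently keeps weak, isolated matches from founding a cluster while still letting a well-supported cluster absorb its more divergent neighbors.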
Analysis of the repABC systems of Agrobacterium. The RepA, RepB, and
RepC protein sequences from Agrobacterium tumefaciens were used as a query against
the NCBI database as of May 2007 using the NCBI BlastP program (1). The top 100
matches were used for analysis. The sequences of each protein were aligned using the
MUSCLE program (11). Phylogenetic and molecular evolutionary analyses were
conducted using MEGA, version 4 (44).
Phylogenetic comparisons among the Rhizobiaceae. Phylogenetic analysis was
performed on a dataset of 507 homologous protein groups selected from 19 species of
Rhizobiales organisms (taxa listed in Table S4 (50); results in Fig. 1). The genes were
selected strictly from the primary chromosome of each genome. A group was allowed to
lack a representative from one or two genomes. Three hundred and seventeen homologous
groups contained all genomes, 146 were missing one genome, and 45 were missing two
genomes (Table S5 (50)). Homolog groups with more than one entry for a genome were
not used. Sequences in each homolog group were trimmed by fit to a hidden Markov
model (HMM) using the HMMer package (10) and then aligned using MUSCLE (11)
with default parameters, as described previously (45). The concatenation of 119,758
aligned positions was analyzed with the program RAxML (43) using the
GAMMA-distributed WAG substitution model. Bootstrapping was performed using the
non-parametric (slow) method for 100 replicates.
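
The selection rule for homolog groups stated above (no genome represented more than once, and at most two genomes absent) can be expressed as a small filter. The Python sketch below is an illustration under assumed inputs, not the authors' pipeline: `groups` is a hypothetical mapping from a group identifier to the genome of each member protein, and the genome and group names are invented.

```python
from collections import Counter
from typing import Dict, Iterable, List


def select_homolog_groups(
    groups: Dict[str, List[str]],
    genomes: Iterable[str],
    max_missing: int = 2,
) -> Dict[str, int]:
    """Map each retained group id to the number of genomes it is missing."""
    genome_set = set(genomes)
    selected = {}
    for group_id, member_genomes in groups.items():
        counts = Counter(member_genomes)
        # Discard groups with more than one entry for any genome.
        if any(n > 1 for n in counts.values()):
            continue
        missing = len(genome_set - set(counts))
        if missing <= max_missing:
            selected[group_id] = missing
    return selected


# Toy example with four genomes.
toy_genomes = ["A", "B", "C", "D"]
toy_groups = {
    "g1": ["A", "B", "C", "D"],       # complete, kept
    "g2": ["A", "B", "C"],            # one genome missing, kept
    "g3": ["A", "A", "B", "C", "D"],  # duplicate entry for A, dropped
    "g4": ["A"],                      # three genomes missing, dropped
}
print(select_homolog_groups(toy_groups, toy_genomes))
```

Tallying the returned values by number of missing genomes gives the kind of breakdown reported in Table S5.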
Comparative single protein analysis. Genomes were clustered by gene content by
constructing a matrix of pairwise distances between bacterial proteomes (results in Fig.
S1). Pairwise distances were estimated using the following procedure. Using NCBI
BlastP, each protein in genome A was compared to the proteome of genome B. The
similarity of the top hit in genome B was noted for each protein in genome A. All such A
to B comparisons were summarized by calculating the percentage of the proteins in A
that had a match in B of at least a given similarity. That is, a table was created showing
what fraction of proteome A had matches of at least 100% identity in B, what fraction had
matches of at least 99% identity, what fraction had at least 98%, and so on. If A and B
were the same proteome, this table would contain values of 1.0 for all percent identity
levels from 100% to 1%. A histogram was generated for each genomic comparison and the
area above the histogram was measured. This area represents the sum of the differences
between the actual fractions observed and those that would arise from having identical
proteomes; this is a distance measure. The proteomic comparison was repeated for all
possible pairwise comparisons. To generate the actual distances used in the phylogeny
reconstruction, we compared the pairs of organisms in both directions (A→B and B→A)
and averaged the histogram areas.
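
The distance measure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the input to each direction is assumed to be a list with one top-hit percent identity per query protein (0 when BlastP returns no hit), the histogram bins are assumed to be one percentage point wide, and the area is left unnormalized. All function and variable names are hypothetical.

```python
from typing import List


def directed_area(top_hit_identities: List[float]) -> float:
    """Area above the cumulative identity histogram for one direction (A to B)."""
    n = len(top_hit_identities)
    if n == 0:
        raise ValueError("proteome has no proteins")
    area = 0.0
    for threshold in range(100, 0, -1):  # 100%, 99%, ..., 1%
        # Fraction of proteins whose best match meets this identity level.
        fraction = sum(1 for x in top_hit_identities if x >= threshold) / n
        # Identical proteomes would give 1.0 at every level.
        area += 1.0 - fraction
    return area


def proteome_distance(a_to_b: List[float], b_to_a: List[float]) -> float:
    """Average of the two directed areas, as described in the text."""
    return (directed_area(a_to_b) + directed_area(b_to_a)) / 2


# Invented example: three proteins in A, two in B.
print(proteome_distance([100.0, 80.0, 0.0], [90.0, 35.0]))
```

With these assumptions the distance is 0 for identical proteomes and reaches 100 when no protein in either proteome has a match in the other.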
A dendrogram illustrating how the genomes cluster (and their relative distances)
with this scheme can be easily derived from the matrix of pairwise distances using the
neighbor-joining method implemented in PHYLIP (15). This proteomic comparison
method also appears robust with respect to highly divergent and even largely disjoint
protein sets: the archaeon Pyrococcus furiosus branches deeply from the
γ-proteobacteria, while the small genome of Buchnera aphidicola, for example, clusters
with Wigglesworthia (data not shown). In spirit, this clustering method is similar to the
more rigorous average amino acid identity (AAI) measure proposed by Konstantinidis
and Tiedje (31); like theirs, our method shows that entire proteome comparisons largely
recapitulate standard 16S rRNA phylogeny yet provide insights into the correlation of
genome and ecological role as well as highlighting possible horizontal gene transfer.

A. radiobacter K84 and A. vitis S4 genome sequences. The annotated genome
sequences of both A. vitis S4 (GenBank CP000633 through CP000639) and A.
radiobacter K84 (GenBank CP000628 through CP000632) are available from GenBank
and from the Agrobacterium Genomes Database (40).

RESULTS

Sequencing and annotation of representative genomes from Agrobacterium
Biovars II and III. Representative genomes from all three Agrobacterium biovars are
now available. The genome of the Biovar I strain A. tumefaciens C58 (C58) was
sequenced by our group and has been recently revised and updated (19, 42, 51). The
genome sequences for representatives of the two remaining biovars are presented here.

Table 1 compares the general features of C58, K84, and S4, and Tables S1-S3
provide a more detailed picture of each genome. The three sequenced Agrobacterium
biovars have distinct genome architectures. The genomes of C58 and S4 contain two true
chromosomes, which we define as replicons containing both rRNA operons and genes
essential for prototrophic growth. C58, however, has one circular and one linear
chromosome (19, 51) while S4 has two circular chromosomes. In both strains, the larger
chromosome (chromosome I) contains an origin of replication that is similar to other
chromosomal origins within the α-proteobacteria (24), while chromosome II has a
repABC origin of replication typical of the large plasmids within the Rhizobiaceae. C58
contains two plasmids, pTiC58 and pAtC58 (19, 51; Tables 1 and S1), whereas S4 has
five plasmids (Tables 1 and S2). K84, in contrast, has a single circular chromosome, a
second 2.65 Mb replicon and three plasmids (Tables 1 and S3): pAgK84 (44 kb) (30),
pAgK84b (185 kb; pNOC) (7), and pAgK84c (388 kb; pAgK434) (9). Like the second
chromosomes of C58 and S4, the 2.65 Mb replicon contains a plasmid-type repABC
origin. However, it lacks the rRNA operons and does not contain the extensive sets of
essential metabolic genes found on the second chromosomes of C58 and S4. It does
contain at least one gene that is likely to be essential: L-seryl-tRNA selenium transferase
(Arad7947).
Multi-protein phylogeny of new genomes shows Agrobacterium to be
paraphyletic. The relationships among C58, S4, K84 and 16 previously sequenced
genomes in the Rhizobiales were investigated by maximum likelihood phylogenetic
analysis. Protein alignments were performed for 507 single-copy orthologous gene
families located on primary chromosomes that are likely to have tracked the vertical
component of ancestry (Fig. 1; Tables S4 and S5 (50)). Analysis of the concatenated
dataset produces a single topology with 100% a posteriori support for all branches within
the Rhizobiaceae, which is consistent with results of Williams et al. (48). This
phylogenetic reconstruction finds S4 to group with C58 and K84 to group with two
Rhizobium genomes (R. leguminosarum and R. etli). The lineage uniting K84 with
Rhizobium has a substantial branch length, while S4 and C58 appear to have separated
soon after the divergence of Sinorhizobium.

Whole-genome similarity plots support these findings (Fig. S1). The
neighbor-joining tree of the distances measured from these plots gives the same topology
and similar relative branch lengths within the Rhizobiaceae as the maximum likelihood
tree analysis (Fig. S2). These large-scale investigations provide a well-defined
phylogenetic basis for uniting biovar II (represented by K84) with Rhizobium.
RepABC replication origins are not linearly descendant among secondary
chromosomes and large plasmids. Plasmid replication among the Rhizobiaceae is
generally under the control of the RepABC system (5, 47). Plasmid origins of replication
are typically considered characteristic of a plasmid since replication is required for
transmission. Thus, we would predict that the repABC genes, which are generally found
in an operon, evolved as a single unit on the plasmids and second chromosomes for
which they mediate replication. Phylogenetic analyses of these gene lineages, however,
indicate a lack of evolutionary congruence with the species tree (Fig. 1) among the
repABC systems of plasmids and of the second largest replicons of the three biovars
(Figs. S5-S6). Therefore one cannot infer an ancestry for repABC genes that does not
invoke continuous horizontal gene transfer of these genes. Individual repABC genes show
a similar lack of evolutionary congruence within replicons (the RepA and RepB trees,
while congruent to each other, are not congruent to the RepC tree; Figs. S7-S8),
suggesting that plasmid evolution is mediated both by the frequent movement of plasmids
among strains and by exchange of the individual repABC genes within replicons. We note
that the repABC genes in the second largest replicons of C58, S4 and K84 do not form an
operon. In a wider evolutionary perspective, congruence among repABC genes generally
does hold. For example, even though the repC genes appear to move easily within
families, they move less easily within orders, and rarely outside of an order (Fig. 2).
These findings are consistent with recent work by Cevallos et al. (5) and confirm that the
intragenomic movement of genes across replicons includes the replication systems.
Conservation of gene content and order is much greater on primary
chromosomes than on secondary chromosomes. The C58 chromosome I shares
large-scale synteny with the chromosome of Sinorhizobium meliloti 1021 and with the
chromosome of the more distantly related Mesorhizobium loti MAFF303099 (19, 51).
Subsequent analyses show conservation of gene order and content among primary
ancestral chromosomes of other Rhizobiales (Brucella, Bradyrhizobium, Mesorhizobium,
and Rhizobium strains, Ochrobactrum anthropi, and Azorhizobium caulinodans) (37, 41).
Given these relationships, we might expect the secondary chromosomes and large
replicons within Agrobacterium and across the Rhizobiales to display similar syntenic
relationships. Although some conservation of gene content is apparent, these replicons
lack the large-scale conservation of gene order seen among the primary chromosomes
(Fig. 3). Where gene order has been retained, it is limited to small blocks of genes. These
contrasting findings led us to examine the origins of the large secondary replicons.

Secondary chromosomes originated from intragenomic transfers from
primary chromosomes to ancestral plasmids. In spite of the lack of large-scale
synteny across the secondary chromosomes and large replicons of the Rhizobiales,
evidence supports a common origin for chromosomes II of C58 and S4 and the 2.65 Mbp
replicon of K84. Of the 3,382 genes shared by all three genomes, 291 are located on
chromosomes II of C58 and S4, and on the 2.65 Mbp replicon of K84. This represents
16%, 27%, and 12% of the genes on each of the respective DNA molecules (40). In
addition, six gene clusters are shared by chromosomes II of C58 and S4, by the 2.65 Mbp
replicon of K84, and by plasmids p42e of R. etli and pRL11 of R. leguminosarum (Fig. 3,
Tables S6 and S10 (50)).

Comparisons among the Rhizobiales suggest that gene transfer from primary
chromosomes to ancestral plasmids resulted in secondary chromosomes. Because these
transfers occur within the same genome (and can potentially occur between any pair of
replicons), we term them intragenomic gene transfers. Under this model, translocated
genes would be expected to occur in clusters that retain synteny with the ancestral
chromosome, and this is clearly observed (Fig. 3). All fully sequenced genomes in the
Brucella/Ochrobactrum clade (5 sequenced strains), two members of the genus
Sinorhizobium and the mixed Agrobacterium/Rhizobium clade (5 sequenced strains)
possess multiple chromosomes or a large replicon with some chromosomal
characteristics. Moreover, except for the Brucellae, all these members carry one or more
plasmids.
All fully sequenced Rhizobiales species that harbor multiple replicons have at
least one RepABC replicon. We suggest that the common ancestor of this order was a
uni-chromosomal strain that acquired a single ancestral plasmid of this class, here
referred to as the Intragenomic Translocation Recipient (ITR) (Fig. 4). The best evidence
for the existence of this ancestral plasmid is three gene clusters shared by almost all fully
sequenced Rhizobiales (in addition to repABC). As shown in Fig. 5 and Table S6 (50), in
29 out of 32 cases these four clusters are found in secondary large replicons. The three
exceptions (A. vitis (minCDE), O. anthropi (hutIHGU) and A. radiobacter (hutIHGU))
can be explained by subsequent retrotransfers to the primary chromosome from the ITR,
based on analysis of adjacent syntenic regions shared with chromosome II of their nearest
sequenced relatives. Moreover, three of these clusters (minCDE, hutIHGU, and repABC)
are not seen in the uni-chromosomal genome of Azorhizobium caulinodans, a Rhizobiales
member, suggesting that the ITR plasmid brought those genes to the ancestral strain and
that the fourth gene cluster (pca) later moved from the ancestral chromosome to the ITR
plasmid.

At some point the Brucella/Ochrobactrum clade diverged from the lineage that
gave rise to the family Rhizobiaceae (Fig. 1). The transfer of chromosomal genes to the
ITR plasmid took place independently in the Brucella/Ochrobactrum clade (also
hypothesized in (36)) and in the Rhizobiaceae family. In the Brucella/Ochrobactrum
clade there have been 25 intragenomic transfers from the primary chromosome to the ITR
plasmid, as shown by the fact that these 25 clusters are shared by all of the sequenced
members of the Brucella/Ochrobactrum clade (Table S7 (50)) and that these clusters are
still found in the primary chromosome of S. meliloti. Twenty more transfers have occurred
since Brucella diverged from Ochrobactrum (Table S8 (50)). In fact, the recently
sequenced genome of Brucella suis ATCC 23445 (NC_010169.1) shows that another 220
kb section, found in chromosome I for all other fully sequenced Brucellae, is now part of
its chromosome II (46). In Sinorhizobium meliloti the ancestral ITR plasmid evolved into
the pSymB plasmid, with one intragenomic transfer event from the chromosome to the
ITR plasmid occurring prior to its divergence from the Agrobacterium/Rhizobium clade
and three events after (Table S9 (50)).
Among the Rhizobiaceae, at least two gene clusters transferred to the ancestral
ITR plasmid prior to the divergence of the clade that includes the biovar I/III strains from
the biovar II clade that includes K84, Rhizobium etli CFN42, and R. leguminosarum bv.
viciae 3841. These transfers include a cluster containing genes encoding a glutamate
synthase and glutamine synthetase III (Fig. 3B; Table S10 (50)). After this divergence,
there was at least one intragenomic transfer to the ITR plasmid before it became
chromosome II for Agrobacterium biovar I/III strains (Fig. 3B; Table S11 (50)).
Subsequently, transfers to chromosome II have occurred that are unique to biovars I or III
(19). For example, there have been at least seven large-scale gene transfer events,
ranging from 10 kbp to 220 kbp, and a few smaller transfer events between the ancestral
chromosome and chromosome II of C58 that did not occur in S4 (Fig. 3B; Table S12
(50)). In a separate but parallel track, there was at least one intragenomic transfer to the
ITR plasmid ancestral of K84 (2.65 Mbp replicon), R. etli (plasmid p42e) and R.
leguminosarum (plasmid pRL11) (Table S13 (50)). None of the secondary replicons in
this branch has reached chromosome status yet.

We observe that among Rhizobiales, another evolutionary path seems to be that of
integration of the ancestral ITR plasmid into the main chromosome. The best example of
this path is Bradyrhizobium strains. All fully sequenced Bradyrhizobium strains have
very large chromosomes (B. japonicum USDA 110 has a single chromosome larger than
nine Mbp (29)) and only one strain (Bradyrhizobium sp. BTAi1) has a plasmid that might
serve to nucleate second chromosomes. However, the presence of ITR plasmid gene
clusters and other plasmid genes in the chromosomes of these species (also seen in
Mesorhizobium main chromosomes) suggests integration of one or more plasmids into
the ancestral chromosome (Fig. 4).

Intragenomic flow from chromosomes to large plasmids mediates second 348
chromosome formation in other bacteria. A plasmid-based mechanism of secondary 349
chromosome formation was first proposed with the genome sequence of the two 350
chromosomes of Vibrio cholerae, based solely on the presence of plasmid replication 351
functions (12). The extensive data for the Rhizobiales just described goes well beyond 352
just replication functions, and we now provide evidence for two more examples of 353
extensive intragenomic gene transfer to a new chromosome based on published genomes 354
sequences. First, among the γ-Proteobacteria, the example of Vibrio is much older and 355
on February 14, 2018 by guest
http://jb.asm.org/
Dow
nloaded from
18
complex than first proposed. Strains of Photobacterium were once considered to be 356
within the genus Vibrio and multiple lines of evidence support Vibrio and 357
Photobacterium as sister genera. Both genera have two chromosomes and sequences are 358
available for P. profundum and four Vibrio species. Phylogenetic analysis of several 359
conserved proteins showed that among the available sequenced genomes, Aeromonas 360
hydrophila is the closest relative with a single chromosome. Comparative analyses
support six gene cluster transfers from the ancestral chromosome I to the plasmid
progenitor of chromosome II (itself defined by seven unique gene clusters) prior to the
divergence of the sister genera Photobacterium and Vibrio; seven additional gene cluster
transfers to chromosome II of the common ancestor of all the sequenced Vibrio strains;
and 29 transfers unique to the Photobacterium side (Fig. S3; Tables S14-S17 (50)).
Second, in the β-Proteobacteria, the genus Burkholderia was subdivided several years
ago, with some members of Burkholderia, along with some stragglers from other genera,
reclassified into the genus Ralstonia. Several lines of evidence support a very close
relationship between Burkholderia and Ralstonia, and each consists of species with
two or three chromosomes. The most closely related sequenced genomes with a single
chromosome are those from the genus Bordetella; B. bronchiseptica was used as the
comparison genome for this analysis. Analysis of chromosome II sequences from five
Burkholderia species and three Ralstonia species indicates that the second
chromosomes of Burkholderia and Ralstonia share a common origin, with 11 gene cluster
transfers from the ancestral chromosome to a plasmid progenitor (defined by two unique
gene clusters) (Fig. S4; Tables S18 and S19 (50)). After the divergence of these two
clades, 12 additional transfers to chromosome II are unique to the Burkholderia
bichromosome ancestor, and 24 are unique to the Ralstonia bichromosome ancestor (Fig.
S4; Tables S20 and S21 (50)). Within a subset of Burkholderia strains there is a third,
plasmid-based chromosome to which four gene clusters were transferred from either
chromosome I or chromosome II (Fig. S4; Table S22 (50)). Taken together, these data
support a generalized mechanism of secondary chromosome formation among bacteria.
DISCUSSION

Within the Rhizobiaceae, available evidence strongly supports a mixed
Agrobacterium/Rhizobium clade containing two subclades. One subclade includes the
Biovar II agrobacteria (e.g., K84) and certain fast-growing rhizobia, including R. etli
and R. leguminosarum. The second subclade includes the Biovar I (e.g., C58) and
Biovar III (e.g., S4) lineages, which separated from each other after diverging from the
Biovar II lineage. Linearization of the Biovar I chromosome appears to have been a
seminal event in this radiation (42).
Analysis of complete genome sequences within the Rhizobiales allows a more
precise definition of phylogenetic relationships. While it has long been known that gene
transfer can occur between organisms, the picture that emerges from our study is of a
group characterized by composite genomes in which genes of all classes migrate not only
between organisms (19, 51) but also intracellularly among chromosomal and
plasmid replicons. In the Rhizobiaceae, such movements, together with chromosomal
rearrangements, have not completely disrupted the backbone of the ancestral
chromosome. In contrast, while second chromosomes and evolving plasmid-based large
replicons have some overlapping gene content, they display significant loss of gene order.
In Biovar I and III agrobacteria these movements produced second chromosomes derived
from plasmids, while in the Biovar II strain K84 the plasmid-based replicon has yet to
reach second-chromosome status.
Although it is clear that the 2.65 Mb replicon of K84, the second chromosomes of
C58 and S4, and large plasmids in other members of the Rhizobiales have evolved from a
common plasmid ancestor, the repABC genes involved in replication initiation, copy
number control, and partition on these molecules are phylogenetically distinct, even
within a single organism. These findings show that repABC genes, like other genes, are
being exchanged among replicons. This may reflect selective pressure to move from
incompatibility to coexistence in genomes with multiple repABC-based replicons. It also
means there is no internal standard by which to directly compare replicon lineages among
these plasmids.
Our data show a common mechanism of secondary chromosome formation in the
Rhizobiaceae and other bacteria. A prerequisite for this evolution is the intracellular
presence of a second replicon capable of stably and efficiently replicating large DNA
molecules. The repABC-type replicons that are widely distributed among the Rhizobiales
fall into this class, and have produced second chromosomes in addition to large replicons
such as the 2.65 Mb K84 replicon and the Sym plasmids of nitrogen-fixing members of
the Rhizobiaceae (6, 8, 16, 18, 19, 21, 37, 51, 53). In A. tumefaciens, it has been shown
that chromosome II is replicated concurrently with chromosome I; such overall genome
synchrony probably allowed intragenomic transfers to be maintained (27, 28). Most of
the large gene movements have been from the ancestral chromosome to plasmid
replicons, with only rare retrotransfers. While plasmids can undergo large gene
rearrangements, losses, and insertions, available evidence suggests there are some
constraints on large-scale rearrangements of the bacterial chromosome (23, 34, 39).
The advantage of multiple chromosomes is unclear, but we speculate that they
may permit further accumulation of genes when the primary replicon cannot support
further enlargement. Within the Rhizobiales, different species appear to
handle gene accumulation in different ways. Bradyrhizobium and Mesorhizobium
species have very large chromosomes and few, if any, plasmids, which are relatively
small. In contrast, Agrobacterium and Rhizobium strains have multiple chromosomes or
large replicons that show gene accumulation, and anywhere from one to six plasmids.
These differences may suggest that chromosomal origins differ in their ability to replicate
molecules larger than about five or six Mbp, with multiple chromosomes providing an
alternative reservoir for newly acquired DNA. Alternatively, the initial movement of a
few essential gene clusters to a plasmid replicon may be simply a historical contingency
with no attached selective advantage. Additional essential gene transfers would simply
solidify the essential nature of the new replicon. An evaluation of the selective advantage
hypothesis is needed, but regardless of the reason, it is clear that the genetic organization
of even essential genes in bacteria is much more complex and fluid than has been imagined.
ACKNOWLEDGMENTS

This work was supported by National Science Foundation grants 0333297 and
0603491 to EWN and 0736671 to SCS, by grants from the M. J. Murdock Charitable Trust
Life Sciences program (2004262:JVZ and 2006245:JVZ) to DWW, by a science
education grant from the Howard Hughes Medical Institute (52005125) to BG, by a
Conselho Nacional de Desenvolvimento Científico e Tecnológico fellowship
(#200447/2007-6) to NFA, and by the Monsanto Company. Special thanks go to the more
than 450 undergraduate students at Hiram College, Oregon State University, Seattle
Pacific University, Arizona State University, University of North Carolina, Washington
University in St. Louis, and Williams College who contributed to the deep annotation of
all three Agrobacterium genomes between 2004 and 2008.
REFERENCES

1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller,
and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Res. 25:3389-402.
2. Burr, T. J., C. Bazzi, S. Sule, and L. Otten. 1998. Crown gall of grape: biology
of Agrobacterium vitis and the development of disease control strategies. Plant
Dis. 82:1288-1297.
3. Burr, T. J., A. L. Bishop, B. H. Katz, L. M. Blanchard, and C. Bazzi. 1987. A
root-specific decay of grapevine caused by Agrobacterium tumefaciens and A.
radiobacter biovar 3. Phytopathology 77:1424-1427.
4. Burr, T. J., and L. Otten. 1999. Crown gall of grape: biology and disease
management. Annu. Rev. Phytopathol. 37:53-80.
5. Cevallos, M. A., R. Cervantes-Rivera, and R. M. Gutierrez-Rios. 2008. The
repABC plasmid family. Plasmid 60:19-37.
6. Chain, P. S., D. J. Comerci, M. E. Tolmasky, F. W. Larimer, S. A. Malfatti,
L. M. Vergez, F. Aguero, M. L. Land, R. A. Ugalde, and E. Garcia. 2005.
Whole-genome analyses of speciation events in pathogenic Brucellae. Infect.
Immun. 73:8353-61.
7. Clare, B. G., A. Kerr, and D. A. Jones. 1990. Characteristics of the nopaline
catabolic plasmid in Agrobacterium strains K84 and K1026 used for biological
control of crown gall disease. Plasmid 23:126-37.
8. DelVecchio, V. G., V. Kapatral, R. J. Redkar, G. Patra, C. Mujer, T. Los, N.
Ivanova, I. Anderson, A. Bhattacharyya, A. Lykidis, G. Reznik, L. Jablonski,
N. Larsen, M. D'Souza, A. Bernal, M. Mazur, E. Goltsman, E. Selkov, P. H.
Elzer, S. Hagius, D. O'Callaghan, J. J. Letesson, R. Haselkorn, N. Kyrpides,
and R. Overbeek. 2002. The genome sequence of the facultative intracellular
pathogen Brucella melitensis. Proc. Natl. Acad. Sci. U. S. A. 99:443-8.
9. Donner, S. C., D. A. Jones, N. C. McClure, G. M. Rosewarne, M. E. Tate, A.
Kerr, N. N. Fajardo, and B. G. Clare. 1993. Agrocin 434, a new plasmid
encoded agrocin from the biocontrol Agrobacterium strains K84 and K1026,
which inhibits biovar 2 agrobacteria. Physiol. Mol. Plant Pathol. 42:185-194.
10. Eddy, S. R. 1998. Profile hidden Markov models. Bioinformatics 14:755-763.
11. Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with
reduced time and space complexity. BMC Bioinformatics 5:113.
12. Egan, E. S., M. A. Fogel, and M. K. Waldor. 2005. Divided genomes:
negotiating the cell cycle in prokaryotes with multiple chromosomes. Mol.
Microbiol. 56:1129-1138.
13. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces
using phred. II. Error probabilities. Genome Res. 8:186-94.
14. Ewing, B., L. Hillier, M. C. Wendl, and P. Green. 1998. Base-calling of
automated sequencer traces using phred. I. Accuracy assessment. Genome Res.
8:175-85.
15. Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2).
Cladistics 5:164-166.
16. Galibert, F., T. M. Finan, S. R. Long, A. Puhler, P. Abola, F. Ampe, F.
Barloy-Hubler, M. J. Barnett, A. Becker, P. Boistard, G. Bothe, M. Boutry,
L. Bowser, J. Buhrmester, E. Cadieu, D. Capela, P. Chain, A. Cowie, R. W.
Davis, S. Dreano, N. A. Federspiel, R. F. Fisher, S. Gloux, T. Godrie, A.
Goffeau, B. Golding, J. Gouzy, M. Gurjal, I. Hernandez-Lucas, A. Hong, L.
Huizar, R. W. Hyman, T. Jones, D. Kahn, M. L. Kahn, S. Kalman, D. H.
Keating, E. Kiss, C. Komp, V. Lelaure, D. Masuy, C. Palm, M. C. Peck, T. M.
Pohl, D. Portetelle, B. Purnelle, U. Ramsperger, R. Surzycki, P. Thebault, M.
Vandenbol, F.-J. Vorholter, S. Weidner, D. H. Wells, K. Wong, K.-C. Yeh,
and J. Batut. 2001. The composite genome of the legume symbiont
Sinorhizobium meliloti. Science 293:668-672.
17. Ghai, R., and T. Chakraborty. 2007. Comparative microbial genome
visualization using GenomeViz. Methods Mol. Biol. 395:97-108.
18. Gonzalez, V., R. I. Santamaria, P. Bustos, I. Hernandez-Gonzalez, A.
Medrano-Soto, G. Moreno-Hagelsieb, S. C. Janga, M. A. Ramirez, V.
Jimenez-Jacinto, J. Collado-Vides, and G. Davila. 2006. The partitioned
Rhizobium etli genome: genetic and metabolic redundancy in seven interacting
replicons. Proc. Natl. Acad. Sci. U. S. A. 103:3834-3839.
19. Goodner, B., G. Hinkle, S. Gattung, N. Miller, M. Blanchard, B. Qurollo, B.
S. Goldman, Y. W. Cao, M. Askenazi, C. Halling, L. Mullin, K. Houmiel, J.
Gordon, M. Vaudin, O. Iartchouk, A. Epp, F. Liu, C. Wollam, M. Allinger,
D. Doughty, C. Scott, C. Lappas, B. Markelz, C. Flanagan, C. Crowell, J.
Gurson, C. Lomo, C. Sear, G. Strub, C. Cielo, and S. Slater. 2001. Genome
sequence of the plant pathogen and biotechnology agent Agrobacterium
tumefaciens C58. Science 294:2323-2328.
20. Gordon, D., C. Abajian, and P. Green. 1998. Consed: a graphical tool for
sequence finishing. Genome Res. 8:195-202.
21. Halling, S. M., B. D. Peterson-Burch, B. J. Bricker, R. L. Zuerner, Z. Qing,
L. L. Li, V. Kapur, D. P. Alt, and S. C. Olsen. 2005. Completion of the genome
sequence of Brucella abortus and comparison to the highly similar genomes of
Brucella melitensis and Brucella suis. J. Bacteriol. 187:2715-26.
22. Herlache, T. C., H. S. Zhang, C. L. Ried, S. A. Carle, P. Basaran, M. Thaker,
A. T. Burr, and T. J. Burr. 2001. Mutations that affect Agrobacterium
vitis-induced grape necrosis also alter its ability to cause a hypersensitive
response on tobacco. Phytopathology 91:966-972.
23. Hughes, D. 2000. Evaluating genome dynamics: the constraints on
rearrangements within bacterial genomes. Genome Biol. 1:REVIEWS0006.
24. Ioannidis, P., J. C. Hotopp, P. Sapountzis, S. Siozios, G. Tsiamis, S. R.
Bordenstein, L. Baldo, J. H. Werren, and K. Bourtzis. 2007. New criteria for
selecting the origin of DNA replication in Wolbachia and closely related bacteria.
BMC Genomics 8:182.
25. Jones, D. A., M. H. Ryder, B. G. Clare, S. K. Farrand, and A. Kerr. 1991.
Biological control of crown gall using Agrobacterium strains K84 and K1026, p.
161-170. In H. Komada, K. Kiritani, and J. Bay-Peterson (ed.), The biological
control of plant diseases. Food and Fertilizer Technology Center for the Asian and
Pacific Region, Taipei, Taiwan.
26. Jumas-Bilak, E., S. Michaux-Charachon, G. Bourg, M. Ramuz, and A.
Allardet-Servent. 1998. Unconventional genomic organization in the alpha
subgroup of the Proteobacteria. J. Bacteriol. 180:2749-55.
27. Kahng, L. S., and L. Shapiro. 2001. The CcrM DNA methyltransferase of
Agrobacterium tumefaciens is essential, and its activity is cell cycle regulated. J.
Bacteriol. 183:3065-3075.
28. Kahng, L. S., and L. Shapiro. 2003. Polar localization of replicon origins in the
multipartite genomes of Agrobacterium tumefaciens and Sinorhizobium meliloti.
J. Bacteriol. 185:3384-91.
29. Kaneko, T., Y. Nakamura, S. Sato, K. Minamisawa, T. Uchiumi, S.
Sasamoto, A. Watanabe, K. Idesawa, M. Iriguchi, K. Kawashima, M.
Kohara, M. Matsumoto, S. Shimpo, H. Tsuruoka, T. Wada, M. Yamada, and
S. Tabata. 2002. Complete genomic sequence of nitrogen-fixing symbiotic
bacterium Bradyrhizobium japonicum USDA110. DNA Res. 9:189-197.
30. Kim, J. G., B. K. Park, S. U. Kim, D. Choi, B. H. Nahm, J. S. Moon, J. S.
Reader, S. K. Farrand, and I. Hwang. 2006. Bases of biocontrol: sequence
predicts synthesis and mode of action of agrocin 84, the Trojan Horse antibiotic
that controls crown gall. Proc. Natl. Acad. Sci. U. S. A. 103:8846-51.
31. Konstantinidis, K. T., and J. M. Tiedje. 2005. Towards a genome-based
taxonomy for prokaryotes. J. Bacteriol. 187:6258-64.
32. Li, L., C. J. Stoeckert, Jr., and D. S. Roos. 2003. OrthoMCL: identification of
ortholog groups for eukaryotic genomes. Genome Res. 13:2178-89.
33. Markowitz, V. M., E. Szeto, K. Palaniappan, Y. Grechkin, K. Chu, I. M.
Chen, I. Dubchak, I. Anderson, A. Lykidis, K. Mavromatis, N. N. Ivanova,
and N. C. Kyrpides. 2008. The integrated microbial genomes (IMG) system in
2007: data content and analysis tool extensions. Nucleic Acids Res. 36:D528-33.
34. Miesel, L., A. Segall, and J. R. Roth. 1994. Construction of chromosomal
rearrangements in Salmonella by transduction: inversions of non-permissive
segments are not lethal. Genetics 137:919-32.
35. Moore, L. W., and G. Warren. 1979. Agrobacterium radiobacter strain K84 and
biological control of crown gall. Annu. Rev. Phytopathol. 17:163-179.
36. Moreno, E., A. Cloeckaert, and I. Moriyon. 2002. Brucella evolution and
taxonomy. Vet. Microbiol. 90:209-27.
37. Paulsen, I. T., R. Seshadri, K. E. Nelson, J. A. Eisen, J. F. Heidelberg, T. D.
Read, R. J. Dodson, L. Umayam, L. M. Brinkac, M. J. Beanan, S. C.
Daugherty, R. T. Deboy, A. S. Durkin, J. F. Kolonay, R. Madupu, W. C.
Nelson, B. Ayodeji, M. Kraul, J. Shetty, J. Malek, S. E. Van Aken, S.
Riedmuller, H. Tettelin, S. R. Gill, O. White, S. L. Salzberg, D. L. Hoover, L.
E. Lindler, S. M. Halling, S. M. Boyle, and C. M. Fraser. 2002. The Brucella
suis genome reveals fundamental similarities between animal and plant pathogens
and symbionts. Proc. Natl. Acad. Sci. U. S. A. 99:13148-13153.
38. Peterson, J. D., L. A. Umayam, T. Dickinson, E. K. Hickey, and O. White.
2001. The Comprehensive Microbial Resource. Nucleic Acids Res. 29:123-5.
39. Rocha, E. P. 2006. Inference and analysis of the relative stability of bacterial
chromosomes. Mol. Biol. Evol. 23:513-22.
40. Setubal, J. C. 2008. JCSlab Genome Databases http://agro.vbi.vt.edu/public.
[Online.]
41. Setubal, J. C., D. Wood, T. Burr, S. Farrand, B. Goldman, B. Goodner, L.
Otten, and S. Slater. 2009. The genomics of Agrobacterium: insights into
pathogenicity, biocontrol, and evolution, p. 91-112. In R. Jackson (ed.), Plant
Pathogenic Bacteria: Genomics and Molecular Biology. Caister Academic Press,
Norfolk, UK.
42. Slater, S., J. C. Setubal, B. Goodner, Y. Zhou, K. Houmiel, J. Sun, B. S.
Goldman, S. K. Farrand, W. M. Huang, S. Casjens, R. Kaul, Q. Chen, T.
Burr, E. Nester, R. Kadoi, T. Ostheimer, N. Pride, A. Sabo, E. Henry, E.
Telepak, L. Wilson, A. Harkleroad, and D. Wood. Submitted. Evolution and
distribution of linear chromosomes in plant symbionts of the Rhizobiaceae.
BMC Genomics.
43. Stamatakis, A., T. Ludwig, and H. Meier. 2005. RAxML-III: a fast program for
maximum likelihood-based inference of large phylogenetic trees. Bioinformatics
21:456-63.
44. Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol.
24:1596-9.
45. Tian, Y., and A. W. Dickerman. 2007. GeneTrees: a phylogenomics resource
for prokaryotes. Nucleic Acids Res. 35:D328-31.
46. Wattam, A. R., K. P. Williams, E. E. Snyder, N. F. Almeida Jr., M. Shukla,
A. W. Dickerman, O. R. Crasta, R. Kenyon, J. Lu, J. M. Shallom, H. Yoo, T.
A. Ficht, R. M. Tsolis, C. Munk, R. Tapia, C. S. Han, J. C. Detter, D. Bruce,
T. S. Brettin, B. W. Sobral, S. M. Boyle, and J. C. Setubal. 2009. Analysis of
ten Brucella genomes reveals evidence for horizontal gene transfer despite a
preferred intracellular lifestyle. J. Bacteriol. (submitted)
47. Weaver, K. E. 2007. Emerging plasmid-encoded antisense RNA regulated
systems. Curr. Opin. Microbiol. 10:110-6.
48. Williams, K. P., B. W. Sobral, and A. W. Dickerman. 2007. A robust species
tree for the alphaproteobacteria. J. Bacteriol. 189:4578-86.
49. Wong, K., and G. B. Golding. 2003. A phylogenetic analysis of the pSymB
replicon from the Sinorhizobium meliloti genome reveals a complex evolutionary
history. Can. J. Microbiol. 49:269-80.
50. Wood, D. W. 2008. Agrobacterium.org: An online resource for the
Agrobacterium research community http://www.agrobacterium.org. [Online.]
51. Wood, D. W., J. C. Setubal, R. Kaul, D. E. Monks, J. P. Kitajima, V. K.
Okura, Y. Zhou, L. Chen, G. E. Wood, N. F. Almeida, L. Woo, Y. C. Chen, I.
T. Paulsen, J. A. Eisen, P. D. Karp, D. Bovee, P. Chapman, J. Clendenning,
G. Deatherage, W. Gillet, C. Grant, T. Kutyavin, R. Levy, M. J. Li, E.
McClelland, A. Palmieri, C. Raymond, G. Rouse, C. Saenphimmachak, Z. N.
Wu, P. Romero, D. Gordon, S. P. Zhang, H. Y. Yoo, Y. M. Tao, P. Biddle, M.
Jung, W. Krespan, M. Perry, B. Gordon-Kamm, L. Liao, S. Kim, C.
Hendrick, Z. Y. Zhao, M. Dolan, F. Chumley, S. V. Tingey, J. F. Tomb, M. P.
Gordon, M. V. Olson, and E. W. Nester. 2001. The genome of the natural
genetic engineer Agrobacterium tumefaciens C58. Science 294:2317-2323.
52. Young, J. M. 2008. Agrobacterium: taxonomy of plant-pathogenic Rhizobium
species, p. 184-220. In T. Tzfira and V. Citovsky (ed.), Agrobacterium: from
biology to biotechnology. Springer, New York.
53. Young, J. P., L. C. Crossman, A. W. Johnston, N. R. Thomson, Z. F.
Ghazoui, K. H. Hull, M. Wexler, A. R. Curson, J. D. Todd, P. S. Poole, T. H.
Mauchline, A. K. East, M. A. Quail, C. Churcher, C. Arrowsmith, I.
Cherevach, T. Chillingworth, K. Clarke, A. Cronin, P. Davis, A. Fraser, Z.
Hance, H. Hauser, K. Jagels, S. Moule, K. Mungall, H. Norbertczak, E.
Rabbinowitsch, M. Sanders, M. Simmonds, S. Whitehead, and J. Parkhill.
2006. The genome of Rhizobium leguminosarum has recognizable core and
accessory components. Genome Biol. 7:R34.
FIGURE LEGENDS

Figure 1. Phylogenetic tree relating 19 genomes in the Rhizobiales. The tree was
inferred from 119,758 aligned protein positions from 509 genes located strictly on
the primary chromosome in each genome. Bootstrap support was 100% for all nodes
except that linking Bradyrhizobium and Nitrobacter, which was 98 out of 100.

Figure 2. Phylogenetic analysis of RepC proteins among the Rhizobiaceae. Organism
name is followed by the NCBI GenInfo Identifier (GI) number. Red indicates membership in
the Rhizobiales, purple Sphingomonadales, blue Rhodospirillales, green
Rhodobacterales, and orange Caulobacterales.

Figure 3. Gene conservation among replicons of the Rhizobiales. The graphic depicts
ortholog gene alignments, shown from the outer circle and moving inward: Sinorhizobium
meliloti 1021 (NC_003047.1), Rhizobium leguminosarum bv. viciae 3841
(NC_008380.1), Rhizobium etli CFN42 (NC_007761.1), K84, S4, C58, Ochrobactrum
anthropi ATCC 49188 (NC_009668.1), and Brucella suis 1330 (NC_004310.3). Top: the
alignment is anchored by C58 chromosome I; bottom: the alignment is anchored by C58
chromosome II. The anchor replicon itself is represented by the circle bordered by scales
with marks every 1/8 of its total size. Each gene is colored according to its replicon of
origin: blue for primary chromosome, green for secondary chromosomes (including the K84
2.65 Mb replicon), and orange for plasmids. Note that in all circles except the anchor,
the location of a gene in the figure is not tied to its physical position in that genome. At
higher resolution (40) it is possible to see that many genes in the non-anchor circles occur
consecutively in their respective replicons, thus representing syntenic blocks or clusters.
The positions of clusters that occur in C58 and are listed in supplementary Tables S9,
S13-S15 (51) are indicated by outermost arc sections painted in black. Each such arc is
labeled as Sx-y, where x is the supplementary table number and y is the order of the
cluster in the table. The top alignment is predominantly blue, reflecting the high degree of
conservation among Rhizobiales primary chromosomes. The bottom alignment is a
mixture of blue, green, and orange, reflecting the mosaic nature of chromosome II and
hinting at the various genomic transfers hypothesized to have taken place, as explained in
the text.

Figure 4. Reconstruction of the origin of secondary chromosomes and related large
replicons within the Rhizobiales through transfers of gene clusters from the primordial
chromosome to what originally was a repABC-type plasmid (called here the
Intragenomic Translocation Recipient or ITR plasmid).

Figure 5. Key gene clusters present on the ITR plasmid progenitor of chromosome II and
related large replicons during evolution of the Rhizobiales. C58 is the reference, and its
genes are represented as arrows consistent with the strand they are found on in the
deposited genome sequence. Genes for the other genomes were aligned with the C58
genes and are represented with circles or squares. Circles/squares are connected with
lines when corresponding genes are consecutive. A black (or gray) circle means that the
represented gene is in a secondary chromosome or plasmid; a black (or gray) square
means that the represented gene is in the primary chromosome. A black circle or square
means that the alignment to the C58 ortholog covered 80% or more of both genes; a gray
circle or square means the alignment covered less than 80%. Gene numbering is shown for
C58, S4, K84, R. etli CFN42, R. leguminosarum bv. viciae 3841, S. meliloti 1021, B. suis
1330, and O. anthropi ATCC 49188.
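The marker convention in the Figure 5 legend can be restated as a small decision rule. The sketch below is illustrative only: the function name, the replicon category strings, and the representation of coverage as fractions are assumptions made here, not part of the original analysis, and the reading that coverage below 80% of either gene yields a gray marker is an interpretation of the legend.

```python
from typing import Tuple


def marker_for(replicon_kind: str, cov_c58: float, cov_other: float) -> Tuple[str, str]:
    """Return (shape, shade) for one aligned gene, following the Figure 5 legend.

    replicon_kind: hypothetical label for where the non-C58 gene resides.
    cov_c58, cov_other: fraction (0 to 1) of each gene covered by the alignment.
    """
    # Shape encodes the replicon: squares for the primary chromosome,
    # circles for secondary chromosomes and plasmids.
    if replicon_kind == "primary_chromosome":
        shape = "square"
    elif replicon_kind in ("secondary_chromosome", "plasmid"):
        shape = "circle"
    else:
        raise ValueError(f"unknown replicon kind: {replicon_kind!r}")

    # Shade encodes alignment coverage: black only when the alignment
    # covers 80% or more of BOTH genes (assumed reading of the legend).
    shade = "black" if min(cov_c58, cov_other) >= 0.80 else "gray"
    return shape, shade


if __name__ == "__main__":
    print(marker_for("plasmid", 0.95, 0.85))
    print(marker_for("primary_chromosome", 0.90, 0.50))
```

Under these assumptions, a plasmid-borne gene aligned over 95% and 85% of the two genes would be drawn as a black circle, while a primary-chromosome gene aligned over only half of its own length would be drawn as a gray square.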
Organism                    A. tumefaciens C58   A. radiobacter K84   A. vitis S4
biovariant                  Biovar I             Biovar II            Biovar III
genome size (bp)            5,674,260            7,273,300            6,320,946
%GC content                 59.0                 59.9                 57.5
chromosomes (a)             2                    1                    2
plasmids                    2                    4 (b)                5
protein coding genes
  total                     5,385                6,752                5,479
  functionality assigned    3,516                5,099                3,897
  conserved hypothetical    1,287                1,201                1,282
  hypothetical              582                  452                  300
pseudogenes                 28                   68                   90
RNA genes
  rRNA operons              4                    3                    4
  tRNAs                     56                   51                   54
  other RNAs (c)            26                   23                   30
genomic islands
  total                     38                   59                   20
  average size (kb)         23.3                 28.2                 33.0

(a) A chromosome is defined here as a replicon harboring rRNA operons and essential genes.
(b) The 2.65 Mb replicon, which does not meet our definition of a chromosome, is included here (see Results).
(c) Includes tmRNAs, SRP RNAs, suhB, riboswitches, and miscellaneous features.
Table 1. Summary of genome features from sequenced Agrobacterium strains.
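As a quick arithmetic check on Table 1, the three annotation classes of protein-coding genes can be summed and compared with the reported totals, and the share of genes with an assigned function derived from them. This is a minimal sketch: the numbers are transcribed from the table, while the dictionary layout and function names are hypothetical. Because the three classes already add up to each total, pseudogenes appear to be counted separately.

```python
# Protein-coding gene counts transcribed from Table 1.
# The dictionary layout and key names are illustrative assumptions.
TABLE1 = {
    "C58": {"total": 5385, "assigned": 3516, "conserved_hypothetical": 1287, "hypothetical": 582},
    "K84": {"total": 6752, "assigned": 5099, "conserved_hypothetical": 1201, "hypothetical": 452},
    "S4": {"total": 5479, "assigned": 3897, "conserved_hypothetical": 1282, "hypothetical": 300},
}


def categories_sum_to_total(row: dict) -> bool:
    """True if the three annotation classes add up to the reported total."""
    parts = row["assigned"] + row["conserved_hypothetical"] + row["hypothetical"]
    return parts == row["total"]


def percent_assigned(row: dict) -> float:
    """Percentage of protein-coding genes with an assigned function."""
    return 100.0 * row["assigned"] / row["total"]


if __name__ == "__main__":
    for strain, row in TABLE1.items():
        print(strain, categories_sum_to_total(row), round(percent_assigned(row), 1))
```

On these figures, K84 has the largest share of functionally assigned genes and C58 the smallest, with S4 in between.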
[Figure 1: phylogenetic tree. Tip labels: Parvibaculum lavamentivorans DS-1; Xanthobacter autotrophicus Py2; Azorhizobium caulinodans ORS 571; Rhodopseudomonas palustris BisA53; Bradyrhizobium japonicum USDA 110; Nitrobacter hamburgensis X14; Aurantimonas sp. SI85-9A1; Mesorhizobium loti MAFF303099; Mesorhizobium sp. BNC1; Bartonella quintana str. Toulouse; Ochrobactrum anthropi ATCC 49188; Brucella abortus bv. 1 str. 9-941; Brucella melitensis 16M; Sinorhizobium meliloti 1021; Agrobacterium vitis S4; Agrobacterium tumefaciens str. C58; Agrobacterium radiobacter K84; Rhizobium leguminosarum bv. viciae; Rhizobium etli CFN 42. Scale bar: 0.1.]
[Figure 2: phylogenetic tree of RepC proteins. Tip labels give the organism or replicon name followed by the GI number.]
[Figure 3: circular ortholog alignments with panels labeled C58 Chromosome I and C58 Chromosome II. Outer arc labels: S3-1 to S3-4, S7-1, S7-2, S8-1, and S9-1 to S9-19.]
[Figure 4: diagram labels]
Diverged before ITR plasmid entry OR loss of ITR plasmid (e.g., Azorhizobium, Xanthobacter)
Integration of ITR plasmid into Chromosome (e.g., Mesorhizobium, Bradyrhizobium)
Loss of ITR plasmid (e.g., Bartonella, Parvibaculum)
Unichromosome Rhizobiales Ancestor with Intragenome Transfer Recipient (ITR) plasmid
ITR with 3 key gene Clusters (Figure 4, Supplementary Table S3)
ITR with 4 key gene Clusters (Figure 4, Supplementary Table S3)
Unichromosome Rhizobiaceae Ancestor with ITR plasmid
Unichromosome Agrobacterium/Rhizobium Ancestor with ITR plasmid
Bichromosomal Biovar 1/Biovar 3 Ancestor
transfer of pca gene cluster
25 shared cluster transfers by Brucella-Ochrobactrum Ancestor (Supplementary Table S4)
20 shared cluster transfers (Supplementary Table S5)
1 shared cluster transfer
3 cluster transfers (Supplementary Table S6)
2 shared cluster transfers (Supplementary Table S7)
Bv2-specific cluster transfers (Supplementary Table S10)
1 shared cluster transfer + LGT (Supplementary Table S8)
Bv3-specific cluster transfers
Bv1-specific cluster transfers (Supplementary Table S9)
Ancestral Bichromosome Brucella
Ancestral Bichromosome Ochrobactrum
Ancestral Sinorhizobium with ITR plasmid = pSymB
Ancestral Biovar 2
Ancestral Bichromosomal Biovar 3
Ancestral Bichromosomal Biovar 1
[Figure 5: gene cluster labels]
pcaGHCD,Q,R,IJF
  Agro C58 cII: Atu4538-4549
  Agro vitis S4 cII: Avi6130-6117
  Agro K84 2651r: Arad9505-9490
  R.etli p42e: PE00055-60, CH03444-3, PE00200-00203
  R.leg pRL11: pRL110085-90, RL3905-4, pRL110296-89
  S.meli pSymB: SMb20575-80, SMb20583, SMb20586-20589
  B.suis 1330 cII: Bra0647-0636
  O.anthropi cII: Oant_3729-3718
repABC
  Agro C58 cII: Atu3922-3924
  Agro vitis S4 cII: Avi5002-5000
  Agro K84 2651r: Arad7003-7000
  R.etli p42e: PE00459-457
  R.leg pRL11: pRL110003-01
  S.meli pSymB: SMb20044-20046
  B.suis 1330 cII: Bra 0001, 1203-02
  O.anthropi cII: Oant_4454-4456
minCDE
  Agro C58 cII: Atu3247-3249
  Agro vitis S4 cI: Avi3506-3508
  Agro K84 2651r: Arad8858-8856
  R.etli p42e: PE00407-00409
  R.leg pRL11: pRL110544-0546
  S.meli pSymB: SMb21522-21524
  B.suis 1330 cII: Bra0323-0321
  O.anthropi cII: Oant_2972-2970
hutIHGU
  Agro C58 cII: Atu3931-3936
  Agro vitis S4 cII: Avi5956-60, 63
  Agro K84 cI: Arad4562-4566
  R.etli p42e: PE00070-00075
  R.leg pRL11: pRL110203-0208
  S.meli pSymB: SMb21163-21166, SMc00673, SMb20048
  B.suis 1330 cII: Bra0932-0927
  O.anthropi cI: Oant_1433-1438