Environmental Microbiology (2005)
7(12), 2011–2026; doi:10.1111/j.1462-2920.2005.00918.x
© 2005 Society for Applied Microbiology and Blackwell Publishing Ltd
Original Article
LGT and phylogenetic assignment of metagenomic clones
C. L. Nesbø, Y. Boucher, M. Dlutek and W. F. Doolittle
Received 29 April 2005; accepted 1 August 2005. For correspondence. E-mail cnesbo@dal.ca; Tel. (+1) 902 494 2968; Fax (+1) 902 494 1355.
Lateral gene transfer and phylogenetic assignment of environmental fosmid clones
Camilla L. Nesbø,1 Yan Boucher,2 Marlena Dlutek1 and W. Ford Doolittle1
1Department of Biochemistry and Molecular Biology, Dalhousie University and Genome Atlantic, 5850 College Street, Halifax, Nova Scotia, Canada B3H 1X5.
2Department of Biological Sciences, Macquarie University, Sydney, NSW, Australia.
Summary
Metagenomic data, especially sequence data from large insert clones, are most useful when reasonable inferences about the phylogenetic origins of inserts can be made. Often, clones that bear phylotypic markers (usually ribosomal RNA genes) are sought, but sometimes phylogenetic assignments have been based on the preponderance of BLAST hits obtained with predicted protein coding sequences (CDSs). Here we use a cloning method that greatly enriches for ribosomal RNA-bearing fosmid clones to ask two questions: (i) how reliably can we judge the phylogenetic origin of a clone (that is, its rRNA phylotype) from the sequences of its CDSs, and (ii) how much lateral gene transfer (LGT) do we see, as assessed by CDSs of different phylogenetic origins on the same fosmid? We sequenced 12 rRNA-containing fosmid clones obtained from libraries constructed using DNA isolated from Baltimore harbour sediments. Three of the clones are from bacterial candidate divisions for which no cultured representatives are available, and thus represent the first protein coding sequences from these major bacterial lineages. The amount of LGT was assessed by making phylogenetic trees of all the CDSs in the fosmid clones and comparing the phylogenetic position of each CDS to the rRNA phylotype. We find that the majority of CDSs in each fosmid (57–96%) agree with their respective rRNA genes. However, we also find that a significant fraction of the CDSs in each fosmid (7–44%) has been acquired by LGT. In several cases we can infer co-transfer of functionally related genes and generate hypotheses about the mechanism and ecological significance of transfer.
Introduction
Metagenomics, or culture-independent genome analysis, is increasingly being used in microbial ecology studies (Riesenfeld et al., 2004). In one of the first metagenome studies, Rondon and colleagues (2000) isolated DNA from soil and cloned it into a BAC vector to construct a 'soil metagenome' library. They screened the library for expression of heterologous genes from the inserts and found antibacterial, lipase, amylase, nuclease and haemolytic activities. In another pioneering study, Beja and colleagues (2000) identified a novel type of rhodopsin, proteorhodopsin, on a genomic fragment from an uncultured γ-proteobacterium. This novel type of phototrophy in proteobacteria plays an important role in marine ecosystems (Beja et al., 2001; de la Torre et al., 2003), and DeLong and collaborators have shown how readily genomic, physiological and ecological data can be incorporated into a new interdisciplinary science, 'environmental genomics'.
In metagenomic libraries, clones containing a phylogenetic anchor such as rRNA genes are particularly useful, as identification of the cloned fragment's original host is greatly facilitated (Riesenfeld et al., 2004). One significant problem associated with efficient screening of BAC and fosmid libraries for bacterial rDNA-containing clones is the presence of DNA from the host used in cloning (i.e. Escherichia coli DNA). This hinders detection with the commonly used universal bacterial 16S rRNA primers, and several alternative screening procedures have been developed for identifying rRNA-containing clones. For instance, Suzuki and colleagues (2004) used a screening method based on length heterogeneity of the internal transcribed spacer (ITS) region, as well as the presence and location of tRNA-Ala, to identify rRNA gene-containing BAC clones. In some cases (de la Torre et al., 2003), the phylogenetic origins of large-insert clones that lack phylogenetic anchors have been inferred from the preponderance of best BLAST hits to GenBank sequences of known phylogenetic origin.
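Inferring a clone's origin from the preponderance of best BLAST hits amounts to a majority vote over its CDSs. The sketch below assumes each CDS has already been reduced to the taxonomic group of its best GenBank hit; the function name, the input format and the 50% threshold are illustrative assumptions, not the procedure used by de la Torre et al. (2003).

```python
from collections import Counter
from typing import Optional, Sequence


def assign_by_best_hits(best_hit_groups: Sequence[Optional[str]],
                        min_fraction: float = 0.5) -> Optional[str]:
    """Majority-vote phylotype call from per-CDS best BLAST hits.

    best_hit_groups holds one entry per CDS: the taxonomic group of its
    best GenBank hit, or None if the CDS had no significant hit.
    Returns the most common group if it accounts for more than
    min_fraction of the CDSs that have a hit; otherwise None.
    """
    hits = [g for g in best_hit_groups if g is not None]
    if not hits:
        return None
    group, count = Counter(hits).most_common(1)[0]
    return group if count / len(hits) > min_fraction else None


# Hypothetical clone: three delta hits, one gamma hit, one CDS without a hit.
print(assign_by_best_hits(["delta", "delta", "gamma", None, "delta"]))  # delta
```

Because every CDS votes independently, a large transferred island can shift or split such a vote; how often that happens is one of the questions this study addresses.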
Here we have identified rRNA clones by utilizing 23S-rRNA-intron-encoded homing endonucleases. Such endonucleases are encoded by group I introns that are
found in the 23S rRNA genes of eukaryotic chloroplasts and mitochondria (Cannone et al., 2002), as well as in a few bacteria (Nesbø and Doolittle, 2003). They cleave, with high specificity, conserved sequences in intron-free 23S rRNA genes. The recognition sites are usually located in the most conserved part of the host rRNA gene and, being 15–25 bp long, are highly specific to rRNA genes. The slow evolutionary rate of these sites, together with the tolerance of homing endonucleases for minor sequence changes, means that rRNA genes from a wide range of taxa can usually be cut. Three such enzymes belonging to the LAGLIDADG family have been particularly well characterized: I-CeuI (from the chloroplast 23S gene of Chlamydomonas eugametos) (Marshall and Lemieux, 1992), I-CreI (from the chloroplast 23S gene of Chlamydomonas reinhardtii) (Chevalier et al., 2003) and I-DmoI (from the 23S gene of the crenarchaeon Desulphurococcus mobilis) (Aagaard et al., 1997). These enzymes have different cutting sites and specificities. The enzyme used here, I-CeuI, is commercially available from New England Biolabs (http://www.neb.com) and targets a 19-bp cut site at 23S rRNA position 1923 (relative to the E. coli 23S rRNA) that is conserved in most bacteria.
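The combination of a long recognition site with tolerance for a few substitutions can be pictured as a mismatch-tolerant (Hamming-distance) scan. This is a deliberately crude model: real homing endonucleases tolerate substitutions unevenly across their site, and it scans the forward strand only. The site string and the max_mismatches value below are hypothetical placeholders, not the I-CeuI specification, which should be taken from New England Biolabs.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(seq: str, site: str, max_mismatches: int = 2) -> List[int]:
    """0-based start positions where `site` matches `seq` on the forward
    strand with at most `max_mismatches` substitutions (no indels)."""
    seq, site = seq.upper(), site.upper()
    k = len(site)
    return [i for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], site) <= max_mismatches]


toy_site = "ACGTACGTAC"  # hypothetical placeholder, not the I-CeuI site
print(find_sites("TTTACGTACGTACTTT", toy_site))                    # [3]
print(find_sites("TTTACGTTCGTACTTT", toy_site, max_mismatches=1))  # [3]
```

A per-position weight matrix would be the natural next refinement, since the text notes that cutting efficiency drops for divergent sequences rather than failing abruptly.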
Here we present the sequences of 12 environmental fosmid clones: 10 that contain about 1000 bp of the 23S rRNA gene, one that contains both 23S rRNA and 16S rRNA genes, and one that contains 1079 bp of the 16S rRNA gene. The metagenomic libraries containing these clones were constructed using DNA isolated from anaerobic sediments from Baltimore harbour. Microbial communities from these sediments have previously been shown to be capable of reductive dechlorination of PCBs (Holoman et al., 1998). The taxonomic position of each fosmid clone was assessed using its rRNA gene. The amount of lateral gene transfer (LGT) was assessed by making phylogenetic trees of all the predicted protein coding sequences (CDSs) in the fosmid clones and comparing the phylogenetic position of each CDS to that indicated by the rRNA gene.
Results and discussion
Two different types of fosmid libraries were made from the anaerobic sediment DNA. The first used the pCC1FOS vector from Epicentre (B1BF1), and the second a modification of that vector containing an I-CeuI site for specifically cloning DNA fragments containing 23S rRNA genes (B1BCF1, B1DCF1, B3CF5, B1DCF5). The B1BF1 library contained about 10 000 clones. The I-CeuI libraries were considerably smaller, and we identified only 49 clones with unique 23S rRNA end-sequences. However, assuming one to three bacterial rRNA-containing clones among every 100 clones (Suzuki et al., 2004), and considering that not all bacterial 23S rRNA genes are cut by I-CeuI, the number of clones obtained is close to expected values.
End-sequencing and subcloning analyses of 'normal' fosmids
To obtain information on the diversity of genomic fragments captured in the B1BF1 library, we obtained 576 end-sequences, resulting in 565 unique sequences with an average of 408 high-quality base pairs, corresponding to 232 kb of environmental DNA. Among the sequences we identified a 16S rRNA sequence in B1BF110d03, which was fully sequenced (see below), as well as one 23S rRNA-containing clone. We also attempted to identify 23S rRNA-containing clones by screening ten 96-well plates from the B1BF1 library using I-CeuI. However, of four clones that appeared to be cut by I-CeuI, only one proved to contain 23S rRNA, and it was fully sequenced (B1BF1a01; see below).
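The bookkeeping behind these end-sequence figures (collapsing duplicate reads, then summing read lengths) can be sketched as follows. Treating "unique" as exact sequence identity after quality trimming is an assumption, since the text does not say how duplicates were detected; the function name is illustrative.

```python
from typing import Dict, Tuple


def summarize_end_reads(reads: Dict[str, str]) -> Tuple[int, float, int]:
    """Collapse identical quality-trimmed end reads and summarize them.

    reads maps read IDs to trimmed sequences. Returns
    (number of unique sequences, mean length in bp, total bp).
    """
    unique = {seq.upper() for seq in reads.values() if seq}
    if not unique:
        return 0, 0.0, 0
    total = sum(len(s) for s in unique)
    return len(unique), total / len(unique), total


# Toy input: r1 and r2 are the same sequence in different case.
print(summarize_end_reads({"r1": "ACGT", "r2": "acgt", "r3": "GGCCAA"}))  # (2, 5.0, 10)
```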
The distribution of G+C content of the end-sequences, significant hits to proteins in GenBank (based on BLASTX results with e-values < 1e-10), and matches to proteins that have been assigned to COG categories are shown in supplemental Fig. S1A–C. As observed by Treusch and colleagues (2004), the distribution of COG categories is similar to that observed for single genomes of cultivated organisms, suggesting that it represents an average of the genomes in this habitat. Taken together, the large variation in G+C content and the wide functional and phylogenetic diversity of the sequences suggest that we have sampled sequences originating from a large diversity of genomes.
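The two per-read quantities summarized here, G+C content and the best significant BLASTX hit, can be computed along these lines. The sketch assumes BLAST+ default 12-column tabular output (-outfmt 6) and uses the e-value cut-off stated in the text; the function names and the toy report lines are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple

# Column order of the default BLAST+ tabular report (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def gc_percent(seq: str) -> float:
    """G+C content (%) computed over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def best_significant_hits(lines: Iterable[str],
                          max_evalue: float = 1e-10) -> Dict[str, Tuple[str, float]]:
    """Best-scoring hit per query among hits with e-value < max_evalue.

    Returns {query_id: (subject_id, evalue)}; queries without a
    significant hit are absent from the result.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue >= max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bits)
    return {q: (s, e) for q, (s, e, _) in best.items()}


report = [  # hypothetical BLASTX lines for two end reads
    "read1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t150",
    "read1\tprotB\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-20\t120",
    "read2\tprotC\t50.0\t50\t25\t0\t1\t150\t1\t50\t1e-05\t40",
]
print(best_significant_hits(report))  # {'read1': ('protA', 1e-30)}
print(gc_percent("GGCCAATTNN"))       # 50.0
```

Ranking by bit score rather than e-value keeps ties between very small e-values from being resolved arbitrarily.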
End-sequencing of I-CeuI fosmid libraries
Four libraries were made using the pCC1FOSCeuI23S vector. High-quality end-sequences of 91 clones revealed 62 unique clones, of which at least 49 (79%) contained about 1000 bp of 23S rRNA. Eight clones (12.9%) did not contain a 23S rRNA, and for five clones we could not obtain high-quality sequence from the end that should contain the 23S rRNA. The 23S rRNA fragments showed highest similarity in BLASTN searches to sequences from several different bacterial groups: α-proteobacteria (1), δ-proteobacteria (15), γ-proteobacteria (9), Firmicutes (8), Planctomycetes (6), Bacteroidetes (1), Actinobacteria (1), Spirochaetes (1) and Chlamydiae/Verrucomicrobia (6). These sequence tags were not long enough to make well-supported phylogenetic trees (average sequence length = 431 bp), but they give a rough indication of the diversity captured by this method.
These results demonstrate the efficiency of using intron-encoded endonucleases to specifically clone rRNA-containing DNA fragments. Screening a 'normal' fosmid
library for rRNA genes usually yields about 1% positive clones (Suzuki et al., 2004). Although our pCC1FOSCeuI23S libraries contained only 62 unique clones, at least 49 of them (79%) contained a 23S gene, which is the equivalent of screening 4900 clones from a 'normal' fosmid library. Also, the peripheral location of the rDNA on the cloned fragments greatly facilitates screening and sequencing. The method is also unlikely to have the same biases as polymerase chain reaction (PCR) screening for the recovery of rDNA-containing clones. For I-CeuI, recovery of positives depends on a single ~20 bp DNA region (rather than two, as for PCR) and tolerates a different type of degeneracy in this region (a drop in cutting efficiency for divergent sequences). Although the I-CeuI recognition sequence is specific to bacteria (except Actinobacteria), other homing endonucleases such as I-CreI and I-DmoI could be used to recover archaeal and actinobacterial DNA fragments.
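The "4900 clones" equivalence follows from dividing the number of positives by the roughly 1% hit rate expected for a conventional library. A small worked sketch using the counts given in the text (the function name is illustrative):

```python
def equivalent_library_size(positives: int, background_rate: float) -> float:
    """Clones that would have to be screened in an unenriched library to
    expect `positives` hits, given a per-clone hit rate."""
    if not 0 < background_rate <= 1:
        raise ValueError("background_rate must be in (0, 1]")
    return positives / background_rate


positives, unique_clones = 49, 62
print(round(equivalent_library_size(positives, 0.01)))  # 4900 (1% hit rate)
print(round(equivalent_library_size(positives, 0.03)))  # 1633 (3% hit rate)
print(round((positives / unique_clones) / 0.01))        # 79 (fold enrichment over 1%)
```

Using the upper end of the one-to-three-per-100 range cited from Suzuki and colleagues (2004), the same 49 positives correspond to roughly 1600 conventionally screened clones.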
Phylogenetic analyses of rRNA genes demonstrate the recovery of protein-coding genes from a wide diversity of bacterial lineages
Twelve rRNA-containing fosmid clones were fully sequenced. The annotation of these clones is given in Tables S1–12 (see Supplementary material), and Fig. 1 gives an overview of the fosmid clones' phylogenetic affiliation. Figure 2A shows the phylogenetic tree estimated from the ~1000 bp 23S rRNA tags of the I-CeuI fosmids; for seven of the fosmid clones this 23S tag could be used to assign the clone to a specific bacterial lineage. We have 23S rRNA-containing fragments from two δ-proteobacteria, two γ-proteobacteria, one ε-proteobacterium, one β-proteobacterium (for which we also have the 16S rRNA) and one taxon from the phylum Chloroflexi. Two fosmid clones contained 16S rRNA genes, B1BF110d03 and B1BF11a01, and phylogenetic analyses placed these sequences within the Flavobacteriaceae (Fig. 2C) and the β-proteobacteria (Fig. 2B) respectively.
For four fosmid clones (b1bcf11f04, b1dcf51c12, b3cf12f09 and b1dcf51a06), the 23S rRNA tag did not cluster with any specific 23S lineage. For these clones we attempted to obtain the 16S rRNA sequence using one specific 23S rRNA primer and a universal 16S primer. We obtained four 16S-23S rRNA sequences that showed 98–99% identity to the 23S fragment in b1bcf11f04 (715 bp overlap). One of these amplicons was fully sequenced, and phylogenetic analyses showed that it belongs to candidate division WS3 (Dojka et al., 1998) (Fig. 2B). Because b1dcf51c12 clusters significantly with b1bcf11f04 in the 23S rRNA tree (Fig. 2A), we also assigned this clone to the WS3 division. For b3cf12f09, we obtained two different 16S-23S rRNA clones that showed 100% and 99% identity to the 23S fragment of this clone (281 bp overlap), and phylogenetic analyses showed that this clone belongs to candidate division OP8 (Hugenholtz et al., 1998) (Fig. 2B). The ITS regions of both the WS3 and the OP8 rRNA operons contained tRNA-Ile and tRNA-Ala. For b1dcf51a06, no 16S rRNA sequence could be obtained.
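The identity values quoted for the 16S-23S amplicons are measured over the region where they overlap the fosmid's 23S tag. How gaps and the denominator were handled is not stated, so the sketch below makes one common choice (skip gapped columns, divide by compared columns) and should be read as an assumption; it also expects the two sequences to be pre-aligned.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> Tuple[float, int]:
    """Identity (%) over the overlap of two pre-aligned sequences.

    Both strings must come from the same alignment (equal length).
    Columns where either sequence has a gap ('-') are excluded, so the
    returned column count is the length of the compared overlap.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


pid, overlap = percent_identity("ACGT-ACGTA", "ACGTTACGTT")
print(round(pid, 1), overlap)  # 88.9 9
```

Under this definition, 98–99% identity over a 715-bp overlap allows roughly 7–14 differing positions.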
Most protein coding sequences are in agreement with the adjacent rRNA genes in phylogenetic analyses
Phylogenetic trees were obtained for all predicted CDSs of each sequenced fosmid clone. We compared the phylogenetic placement of each CDS to the phylogeny suggested by the rRNA. If the phylogeny of a CDS suggested that it belonged to another bacterial group, and this relationship was supported in bootstrap analyses, acquisition by LGT was inferred for that CDS. For the clone where no specific phylogenetic relationship could be inferred (b1dcf51a06), and for the fosmid clones whose rRNA showed that they originated from a bacterium with no cultivated representative, we classified as likely instances of LGT all CDSs that clustered specifically (with bootstrap support) with another bacterial group. A summary of the phylogenetic analysis of the rRNA genes, as well as of all protein coding CDSs, is given in Table 1.
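The scoring rule just described can be written as a small decision function. This is a sketch under stated assumptions: the 70% bootstrap cut-off is a placeholder (this section does not give one), and the CdsPlacement record and group labels are hypothetical inputs that would in practice come from inspecting each CDS tree.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed bootstrap cut-off (%); this section of the paper does not state one.
BOOTSTRAP_CUTOFF = 70.0


@dataclass
class CdsPlacement:
    name: str
    sister_group: Optional[str]  # lineage the CDS clusters with; None if unresolved
    bootstrap: float             # support (%) for that clustering


def classify(cds: CdsPlacement, rrna_group: Optional[str],
             cutoff: float = BOOTSTRAP_CUTOFF) -> str:
    """Label a CDS as 'agrees', 'LGT' or 'unresolved'.

    rrna_group is the lineage indicated by the clone's rRNA gene, or None
    when that lineage has no cultivated representative (or no specific
    affiliation could be inferred). In the None case, any well-supported
    clustering with a named group counts as LGT.
    """
    if cds.sister_group is None or cds.bootstrap < cutoff:
        return "unresolved"
    if rrna_group is not None and cds.sister_group == rrna_group:
        return "agrees"
    return "LGT"


print(classify(CdsPlacement("ORF1", "delta", 95), "delta"))  # agrees
print(classify(CdsPlacement("ORF2", "alpha", 88), "delta"))  # LGT
print(classify(CdsPlacement("ORF3", "alpha", 40), "delta"))  # unresolved
```

CDSs that the authors left unscored, such as those with too-short alignments or placements in mixed clades, would fall under "unresolved" in this scheme.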
The majority of the CDSs agreed with their respective rRNA phylogeny: 57–96% (average 76.8%) of the CDSs that gave good alignments and robust phylogenies showed the same phylogenetic relationship as the rRNA genes. This was also true for the fosmid clones from bacterial lineages with no cultivated representative, as most CDSs from these clones did not cluster with any specific lineage or had no, or only a few, significant matches in GenBank (Fig. 1). However, for these clones the number of CDSs that robustly agree with the rRNA topology is problematic to calculate, as they may or may not fall into well-supported clades when more sequences from these phyla become available. The fosmid clones with the highest number of congruent trees are b1bf11a01, which originated from a β-proteobacterium very similar to Thiobacillus denitrificans and in which 96% of the CDSs with robust phylogenies agree with the rRNA genes, and b1dcf13c08, which originated from an ε-proteobacterium and in which 90% of the 'treeable' CDSs agree with the rRNA.
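Each per-clone agreement figure is the number of agreeing CDSs divided by the number of CDSs with robust ("treeable") phylogenies. The sketch below uses three (agreeing, treeable) pairs taken from Table 1; that the reported 76.8% average is an unweighted mean of per-clone percentages is an assumption, and an average over this three-clone subset is not expected to reproduce it.

```python
from statistics import mean
from typing import Dict, Tuple


def agreement_percent(agreeing: int, treeable: int) -> float:
    """Share (%) of CDSs with robust phylogenies that match the rRNA phylotype."""
    if treeable <= 0:
        raise ValueError("need at least one CDS with a supported phylogeny")
    return 100.0 * agreeing / treeable


# (agreeing, treeable) pairs for three clones, as listed in Table 1.
clones: Dict[str, Tuple[int, int]] = {
    "b1dcf13c08": (19, 21),
    "b1dcf13f01": (7, 10),
    "b1bcf11d04": (12, 18),
}
per_clone = {name: round(agreement_percent(*counts)) for name, counts in clones.items()}
print(per_clone)  # {'b1dcf13c08': 90, 'b1dcf13f01': 70, 'b1bcf11d04': 67}
print(round(mean(agreement_percent(*c) for c in clones.values()), 1))  # 75.7
```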
High levels of LGT detected in phylogenetic analyses
Phylogenetic analyses showed that 7–44% (average 17%) of the CDSs have been acquired by LGT from distantly related bacterial lineages (Fig. 1, Table 1). For many of the fosmid clones there were additional CDSs that probably have also been involved in LGT; these cases were not scored as LGT either because the CDS was too
Fig. 1. Overview of the sequenced fosmid clones. Yellow CDSs are suggested to have been acquired by LGT, and blue CDSs have no significant match in GenBank. A, α-proteobacteria; B, β-proteobacteria; D, δ-proteobacteria; E, ε-proteobacteria; G, γ-proteobacteria; C, Cyanobacteria; CB, Chlorobi-Bacteroidetes; F, Firmicutes; P, proteobacteria; CH, Chloroflexi; TD, Thermus-Deinococcus group; ACT, Actinobacteria; PL, Planctomycetes; SPIR, Spirochaetes; THER, Thermotogales; AQ, Aquificales; FUSO, Fusobacteria; ARCH, Archaea; EUK, Eukaryotes; ENV, environmental sequence; c, clusters robustly within a mixed clade in phylogenetic trees; –, no significant match in GenBank. Uppercase, supported by phylogenetic analysis; lowercase, suggested by BLAST searches, as there was no supported phylogeny. The low-quality region in b1dcf13c08 (position 1192–1342) is indicated by a black box. The orange shadings indicate LGT-CDSs that are found in more than one fosmid.
[Figure not reproduced. Labels: ORFAN; "A conjugative transposon obtained from a Bacteroides bacterium". Clones shown, with their rRNA-based assignment: b1dcf51a06 (unknown), b1dcf13f01 (Chloroflexi), b3cf12f09 (candidate division OP8), b1bcf11f04 (candidate division WS3), b1bcf51c12 (candidate division WS3), b1bcf11h03 (δ-proteobacteria), b1bcf11d04 (δ-proteobacteria), b1dcf13c08 (ε-proteobacteria), b1dcf12d07 (γ-proteobacteria), b1bcf11c04 (γ-proteobacteria), b1bf11a01 (β-proteobacteria), b1bf110d03 (Flavobacteriaceae).]
Fig. 2. rRNA phylogenies.
A. The minimum evolution tree estimated from LogDet distances of the 23S tags from the I-CeuI fosmids (984 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood topology (GTR + G + I) was similar, except that the δ-proteobacteria were paraphyletic, with the γ- and β-proteobacteria clustering within the δ-proteobacteria. Moreover, b1bcf11d04 fell at the bottom of this clade.
B. The minimum evolution tree estimated from LogDet distances of the 16S sequences (1243 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood (GTR + G + I) topology was identical. However, there were several differences in the backbone of the tree, with, for instance, Geobacter clustering with Firmicutes. The trees in both A and B were rooted with the Thermotoga maritima sequence.
C. The minimum evolution tree estimated from LogDet distances of the closest matches of the 16S fragment in b1bf110d03 (1046 positions in alignment). The Maximum Likelihood (GTR + G + I) topology was identical.
For all three trees, numbers on branches refer to bootstrap values from the minimum evolution analysis (italic) and from the Maximum Likelihood analysis (plain text). If both bootstrap values were above 70%, this is indicated by a grey circle, while a black circle indicates that all three values were above 90%.
[Tree panels not reproduced. Panel A (scale bar 0.05 substitutions/site) labels the clades candidate division WS3, candidate division OP8, Chloroflexi, Epsilonproteobacteria, Deltaproteobacteria, Betaproteobacteria and Gammaproteobacteria, and includes the fosmid sequences b3cf12f09, b1dcf13f01, b1dcf51a06, b1bcf11f04, b1dcf51c12, b3cf12d07, b1dcf13c08, b1bcf11d04, b1bcf11h03, b1bf11a01 and b1bcf11c04. Panel B (scale bar 0.05 substitutions/site) labels candidate division OP8 (b3cf12f09), candidate division WS3 (b1bcf11f04) and Betaproteobacteria (b1bf11a01). Panel C (scale bar 0.01 substitutions/site) places b1bf110d03 among Flavobacteriaceae sequences, with the annotation "isolated from estuarine and salt marsh sediments".]
short to obtain reliable alignments, the CDS was found in a 'mixed' clade also containing genes from the same bacterial group, or the CDS was found outside its group but did not cluster with any specific lineage. For three of the clones, more than 30% of the CDSs have been acquired by LGT (Table 1); two of these are from candidate divisions and one is from a δ-proteobacterium. For all three of these fosmids there appears to have been a transfer of a large island of genes from a phylogenetically distant lineage. Specifically, we infer an α-proteobacterial island in b3cf12f09, a δ-proteobacterial island in b1dcf51c12, and an archaeal/β-proteobacterial island in b1bcf11d04 (Fig. 1). It should be noted that the proportions of foreign genes identified here might not represent the proportion of foreign genes in the respective genomes that we have sampled, but rather the amount of LGT to be expected when sequencing environmental fosmid clones. For instance, in some genomes LGT might be enriched in certain variable parts of the genome. Indeed, the distribution of proteins that match COG categories was significantly different (P = 1.3e-13, χ² test) from that observed for the end-sequencing of 'normal' fosmid clones (supplemental Fig. S1), the main difference being proportionally more J, K, U, F and H category sequences in the full fosmid sequences, and more L, P, R and S category sequences among the end-sequences. When comparing the distributions of broader COG groups (i.e. informational, metabolism, etc.), however, the two data sets were significantly different only when the poorly characterized categories (R, S) were included. If such genes are transferred more frequently than genes in the other categories, then we would be underestimating the level of LGT to be expected when analysing metagenomic clones.
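The comparison of COG category distributions is a χ² test of homogeneity on a data-set-by-category table of counts. The paper does not give the counts, so the example below uses a hypothetical 2 × 2 table; the function computes the Pearson statistic and its degrees of freedom in plain Python.

```python
from typing import Sequence, Tuple


def chi_square_contingency(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an
    r x c table of counts (e.g. rows = data sets, columns = COG categories)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected == 0:
                raise ValueError("drop empty rows/columns before testing")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_sums) - 1)
    return stat, dof


# Hypothetical counts for two COG categories in two data sets.
stat, dof = chi_square_contingency([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```

For one degree of freedom the upper-tail probability equals erfc(√(χ²/2)), available as math.erfc; for larger tables, scipy.stats.chi2.sf(stat, dof) gives it. Categories with small expected counts make the χ² approximation unreliable, and pooling them into broader groups is one way to raise those counts.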
Interestingly, in b1bcf11d04 the transfer vector for one of the acquired gene clusters could be identified. ORF6 encodes an acetyl transferase, and ORF8, ORF9 and ORF10 encode subunits of an acyl-CoA synthase: two α-subunits and one β-subunit. Phylogenetic analyses suggested that all four CDSs have been acquired by LGT, likely from a β-proteobacterium. The β-proteobacteria have in turn likely acquired the acyl-CoA synthase genes from Archaea (Fig. 3). In support of the archaeal origin of these genes, the acyl-CoA synthase in b1bcf11d04 has a domain organization similar to that of the two-subunit acetyl-CoA synthase in Pyrococcus spp. (Sanchez et al., 2000). Furthermore, these genes have been transferred multiple times, and the transfers involved all three domains of life [Fig. 3 (Andersson et al., 2003)]. These transferred CDSs are preceded by an integrase gene (ORF3), a transposase gene (ORF4) and an integrase/transposase gene (ORF5; COG2801, Tra5, which contains an integrase core domain; Table S7), which were probably responsible for transferring this cluster into this genome.
The α-proteobacterial island in the b3cf12f09 clone encodes a wide range of functions, and no typical mobile elements were identified. However, as this island extends to the 3′ end of the fosmid, mobile genes might be found further downstream. The first CDS of this island encodes a DnaJ-class chaperone (ORF29), which is truncated at the 5′ end. This pseudogene still shows 65% protein identity to a homologue in Magnetospirillum magnetotacticum (Table S3); hence it probably represents a very recent transfer (or rearrangement). Another possibility is that this fosmid is a chimera. However, the G+C content of the CDSs in the α-proteobacterial island (59.5%) is very similar to that of the rest of the fosmid (59.6%; supplemental Table S3). Also, further upstream there is a proteobacterial transposase
Fig. 3. Maximum Likelihood phylogeny of acetyl-CoA synthetase (ADP-forming) homologues estimated using PMBML (459 positions in alignment). The sequences were obtained by BLAST searches of GenBank with the b1bcf11d04 ORF8 and ORF10 sequences; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted with Entamoeba histolytica. Numbers on branches refer to bootstrap support obtained using PMBML (bold), PUZZLEBOOT (plain text) and Neighbour-joining (italic). If all bootstrap values were above 70%, this is indicated by a grey circle, while a black circle indicates that all three values were above 80%.
[Tree not reproduced. Taxa shown include eukaryotes (e.g. Entamoeba histolytica, Giardia intestinalis, Spironucleus barkhanus), Archaea (e.g. Pyrococcus furiosus, Archaeoglobus fulgidus, Sulfolobus solfataricus, Methanosarcina mazei) and Bacteria (e.g. Bordetella, Ralstonia, Burkholderia and Bradyrhizobium spp.), together with b1bcf11d04 ORF8 and ORF10.]
Table 1. Summary of phylogenetic analyses.
Columns: Fosmid; Phylogeny (23S rRNA, 16S rRNA, CDSs); LGT (a) (evidence from phylogenetic trees and from phylogenetic distribution or genome context; total); ORFans (b).

b1dcf51a06
23S rRNA: No clear affiliation with existing sequences.
16S rRNA: Could not be amplified.
CDSs: Most CDSs have no or only a few significant matches in GenBank.
LGT evidence: ORF4 clusters with Leptospira interrogans within a mixed clade; however, L. interrogans has several paralogues, and this gene appears to have been frequently transferred and is likely to be a transfer. ORF20 clusters with Methanosarcina within the δ-proteobacteria. ORF19 clusters with Geobacter but is mostly found in methanogens. ORF17 and ORF18 have homologues in methanogens only.
LGT total: 4 CDSs (19% of the total) have been acquired by LGT.
ORFans: 33 (38)

b1dcf13f01
23S rRNA: Clusters with Dehalococcoides ethenogenes/Chloroflexus aurantiacus; 23S rRNA sequence of too poor quality to include in the tree.
CDSs: 7 of 10 CDSs (70%) with supported phylogenetic topologies agree with the 23S fragment. In addition, 6 CDSs only hit Chloroflexus aurantiacus.
LGT evidence: Two CDSs have likely been acquired through LGT: one clusters with high support with Thermotoga maritima (ORF16), and one clusters within the eukaryotes (ORF25). ORF2 has significant homologues only in Crocosphaera watsonii.
LGT total: 3 CDSs (11% of total) have been acquired by LGT.
ORFans: 14 (5)

b3cf12f09
23S rRNA: Candidate division OP8 bacterium.
16S rRNA: Candidate division OP8 bacterium.
CDSs: Most CDSs agree with the rRNA genes and do not cluster within any specific bacterial group.
LGT evidence: Phylogenetic analysis suggests that 10 CDSs have likely been acquired by LGT; 8 of these have been acquired from an α-proteobacterium and are found linked. Three CDSs found linked to CDSs for which phylogenetic analyses suggest LGT have also likely been acquired by LGT.
LGT total: 13 CDSs (32% of total) have been acquired by LGT. ORF16 is a transposase of proteobacterial origin and shows lower G+C content than the rest of the fosmid. Twelve of the transferred CDSs (ORF29–41) are linked, and all appear to have been acquired from an α-proteobacterium.
ORFans: 22 (9)

b1bcf11f04
23S rRNA: Candidate division WS3 bacterium.
16S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage; among these is the highly conserved DnaE gene.
LGT evidence: Two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group.
LGT total: 2 CDSs (9% of total) have been acquired by LGT.
ORFans: 26 (14)
b1bcf51c12
23S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs have no or only a few significant matches in GenBank.
LGT evidence: ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1bcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, however with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within the Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, however with no bootstrap support; it is likely that this CDS was also transferred as part of the δ-proteobacterial island.
LGT total: 8 CDSs (44% of total) have been acquired by LGT. One large 'island' of δ-proteobacterial origin.
ORFans: 22 (29)

b1bcf11h03
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium.
LGT evidence: Six CDSs have likely been acquired by LGT. ORF8 clusters with Clostridium thermocellum and Treponema denticola. ORF18 is separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp. ORF23 is found in a mixed clade and appears to have been frequently transferred. ORF28 clusters with β-proteobacteria, ORF29 clusters with γ-proteobacteria, and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria.
LGT total: 6 CDSs (17% of total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12, as well as to the Chlorobium lineage.
ORFans: 6 (1)
Table 1. Cont. Columns: Fosmid | Phylogeny (23S rRNA; 16S rRNA; CDSs) | LGT (a) (phylogenetic trees; phylogenetic distribution or genome context; total) | ORFans (b)
b1bcf11d04
23S rRNA: δ-Proteobacterium; 16S rRNA: –; CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment.
LGT: Six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has also been transferred to other δ-proteobacteria. Three CDSs (ORF3–ORF5), encoding an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and are neighbours of ORF19, which has also been acquired from Firmicutes. Total: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposases) for 8 of the transferred genes.
ORFans: –
b1dcf13c08
rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni; CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA.
LGT: ORF4 clusters with Geobacter and Clostridium, and ORF23 has no homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria. Total: 3 CDSs (7% of total) have been acquired by LGT.
ORFans: 10 (3)
b3cf12d07
rRNA: γ-Proteobacterium; clusters within the γ-proteobacteria in LogDet distance trees but at the base of the γ- and β-proteobacteria in the best Maximum Likelihood tree; CDSs: only 7 CDSs give supported phylogenies, and of these 4 (57%) agree with the rRNA.
LGT: ORF7 clusters within the β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees, or that had only divergent homologues or no significant homologues in GenBank, may also have been acquired by LGT; in support of this, ORF26 encodes a transposase. Total: 2 CDSs (8% of total) have been acquired by LGT; ORF16–ORF25 were not included in the estimate because of limited evidence for their transfer.
ORFans: 23 (23)
b1bcf11c04
23S rRNA: γ-Proteobacterium; 16S rRNA: –; CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA.
LGT: Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within the β-proteobacteria. Three genes that show incongruent phylogenies, but with low bootstrap support, found close to ORF3 and ORF34 have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred. Total: 5 CDSs (13% of total) have been acquired by LGT.
ORFans: 3 (1)
b1bf11a01
23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity in 23S rRNA); 16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity in 16S rRNA); CDSs: high degree of gene synteny compared with T. denitrificans; 29 CDSs have their best BLAST match in T. denitrificans; 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes.
LGT: One CDS, ORF30 (RsuA), clusters with γ-proteobacteria and has no homologue in T. denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and T. denitrificans. ORF29 has no significant homologues in proteobacteria. Total: 4 CDSs (8% of total) have been acquired by LGT.
ORFans: 3 (2)
b1bf11d10
23S rRNA: –; 16S rRNA: a Flavobacteriaceae bacterium, among sequenced genomes most closely related to Cytophaga hutchinsonii; CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment.
LGT: ORF5 and ORF10 have no close homologues in other Bacteroidetes, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroidetes. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron. Total: 3 CDSs (10% of total) have likely been acquired by LGT; the transposon was not included, as it has been transferred within the Bacteroidetes.
ORFans: 10 (3)
(a) Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or by clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
(b) ORFans were classified as CDSs with no significant match in GenBank. Matches to sequences in the environmental portion of GenBank were not considered. The proportion of protein coding DNA that has no match in GenBank is given in parentheses.
Table 1. Cont.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, because many of the CDSs found in the transposons in B. thetaiotaomicron are not present in the fragment we have sequenced, and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with this transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA and has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobes or, in the case of Symbiobacterium thermophilum, microaerophilic, and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4 kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as they are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly
Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
[Fig. 4 tree image not reproduced.]
related to the OmpA found in this gene cluster and was not included in the alignment.
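The gene-order argument can be illustrated with a minimal sketch that asks whether a set of genes occurs as one contiguous block, in the same or fully reversed order, in each clone or genome. The function name `cluster_is_syntenic` and all gene orders below are hypothetical placeholders, not the annotated orders of b1dcf51c12, C. tepidum or b1bcf11h03; the gene names are taken in the order they are listed in the text, which need not be their order on the chromosome.

```python
from typing import Sequence


def cluster_is_syntenic(cluster: Sequence[str], genome: Sequence[str]) -> bool:
    """True if `cluster` occurs as a contiguous block in `genome`, either in
    the given order or fully reversed (i.e. encoded on the opposite strand)."""
    forward = list(cluster)
    backward = forward[::-1]
    n = len(forward)
    for start in range(len(genome) - n + 1):
        window = list(genome[start:start + n])
        if window == forward or window == backward:
            return True
    return False


# Hypothetical gene orders, for illustration only.
cluster = ["ompA", "tolB", "tonB", "exbD", "tolQ"]
genomes = {
    "clone_X": ["orf1", "ompA", "tolB", "tonB", "exbD", "tolQ", "orf7"],
    "genome_Y": ["tolQ", "exbD", "tonB", "tolB", "ompA"],          # reversed block
    "genome_Z": ["ompA", "tolB", "orf3", "tonB", "exbD", "tolQ"],  # block interrupted
}
for name, genes in genomes.items():
    print(name, cluster_is_syntenic(cluster, genes))
```

A shared, uninterrupted block in distantly related taxa is what makes one transfer of the whole fragment more parsimonious than separate transfers of each gene; the check says nothing about the direction of transfer, which still has to come from the trees.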
We also identified some mobile genes that might be involved in biodegradation of pollutants by searching the Pfam database. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione S-transferase (GST; ORF36) gene that was flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bcf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS falls in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that also have been acquired by LGT. The phylogeny of the neighbouring gene ORF30, which encodes an S4 domain protein, suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
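The flanking inverted repeat can be checked by comparing one arm with the reverse complement of the other. The sketch below is a minimal, ungapped version; it assumes the coordinates in the text are 1-based and inclusive, with the second pair written high-to-low because that arm is read on the opposite strand. The helper names are illustrative, and the toy sequence stands in for the b1bf11a01 sequence, which is not reproduced here. Each arm spans 20 positions, so 80% identity corresponds to 16 matching bases.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N string (other IUPAC codes raise KeyError)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("arms must have equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def inverted_repeat_identity(seq: str, left: tuple, right: tuple) -> float:
    """Compare the two arms of a putative inverted repeat.

    `left` is (start, end) on the forward strand; `right` is (start, end)
    written high-to-low, i.e. read on the reverse strand. Coordinates are
    1-based and inclusive."""
    left_start, left_end = left
    right_high, right_low = right
    left_arm = seq[left_start - 1:left_end]
    right_arm = reverse_complement(seq[right_low - 1:right_high])
    return percent_identity(left_arm, right_arm)


# Toy example: an 8-bp arm, a 10-bp spacer, then the arm's reverse complement.
arm = "ACGGTCAT"
toy = arm + "T" * 10 + reverse_complement(arm)
print(inverted_repeat_identity(toy, (1, 8), (26, 19)))  # 100.0
```

For the real fosmid the call would be `inverted_repeat_identity(seq, (34021, 34040), (36693, 36674))`.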
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs that showed a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–S12). ORF20 in the δ-proteobacterial clone b1bcf11h3 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show higher G + C content than the rest of the fosmid, with 66.4%, 65.7% and 64.7%, compared with 52.5% for the rest of the fosmid. All these CDSs
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these sequences formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
[Fig. 5 tree image not reproduced.]
cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content than the rest of the fosmid clone, at 25.7%. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
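The screen described above can be written as a short sketch. It assumes that "10% higher or lower" means a difference of at least 10 percentage points from the G + C content of the whole fosmid; this reading fits the reported values (ORF20, 47.5% versus 36.6%, is counted, whereas ORF9 in b1dcf13c08, 25.7% versus 34.7%, is described only as marginally lower). The function names are hypothetical, and using the rest of the fosmid instead of the whole fosmid as background, as is done for b3cf12d07, would shift the numbers slightly.

```python
def gc_percent(seq: str) -> float:
    """G + C content in percent, ignoring characters other than A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def flag_gc_outliers(cds_seqs: dict, fosmid_seq: str, threshold: float = 10.0) -> dict:
    """Return {cds_id: deviation} for CDSs whose G + C content differs from the
    whole-fosmid value by at least `threshold` percentage points."""
    background = gc_percent(fosmid_seq)
    flagged = {}
    for cds_id, seq in cds_seqs.items():
        deviation = gc_percent(seq) - background
        if abs(deviation) >= threshold:
            flagged[cds_id] = round(deviation, 1)
    return flagged


# Applying the same threshold to two of the reported percentages.
reported = {"b1bcf11h3 ORF20": (47.5, 36.6), "b1dcf13c08 ORF9": (25.7, 34.7)}
for name, (cds_gc, clone_gc) in reported.items():
    diff = cds_gc - clone_gc
    print(name, round(diff, 1), abs(diff) >= 10.0)
```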
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
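The two quantities compared here can be computed per clone and then compared between the two groups of clones. The text does not say which form of t-test was used; the sketch below assumes Welch's unequal-variance test and computes only the statistic and its degrees of freedom (a P-value additionally needs the t distribution, for example through SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`). The per-clone input format is hypothetical, and ORFan status is assumed to have been decided beforehand, ignoring matches to the environmental portion of GenBank.

```python
import math
from statistics import mean, variance


def orfan_proportions(cds_lengths, has_match):
    """Per-clone (fraction of CDSs that are ORFans,
    fraction of coding bases that lie in ORFans)."""
    if len(cds_lengths) != len(has_match) or not cds_lengths:
        raise ValueError("need one match flag per CDS")
    n_orfans = sum(1 for matched in has_match if not matched)
    orfan_bases = sum(length for length, matched in zip(cds_lengths, has_match) if not matched)
    return n_orfans / len(cds_lengths), orfan_bases / sum(cds_lengths)


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy clone: three CDSs of 300, 600 and 100 bp, two of them without a GenBank match.
print(orfan_proportions([300, 600, 100], [True, False, False]))
```

Each clone contributes one value per quantity, so `welch_t` would be called with one list of per-clone proportions for uncultivated lineages and one for lineages with cultivated representatives.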
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–ORF9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–ORF13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
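The domain-architecture argument amounts to grouping proteins by their ordered domain content and asking which proteins share the fused arrangement. A minimal sketch, assuming domain calls have already been made (for example by conserved domain searches of the kind mentioned above); the protein identifiers and domain labels are hypothetical, and the N-to-C order of the two domains in the fusion is an assumption, since the text does not state it.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Map each ordered domain architecture (a tuple of domain labels)
    to the list of protein identifiers that have exactly that architecture."""
    groups = defaultdict(list)
    for protein_id, domains in proteins.items():
        groups[tuple(domains)].append(protein_id)
    return dict(groups)


# Hypothetical proteins with simplified domain labels.
proteins = {
    "fosmid_ORF17": ["PolC_exonuclease", "DinG_helicase"],
    "firmicute_protein_1": ["PolC_exonuclease", "DinG_helicase"],
    "proteobacterial_protein_1": ["DinG_helicase"],
}
fusion = ("PolC_exonuclease", "DinG_helicase")
print(group_by_architecture(proteins)[fusion])
```

The tree-based observation in the text is the complement of this grouping: if all fusion proteins fall in one DinG clade and no single-domain proteins fall inside it, a single origin of the fusion is the simplest explanation.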
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own
genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the REAL Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). In order to obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After the vector was purified from the gel, it was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step in the CopyControl™ Fosmid Library Production Kit protocol.
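The adaptor sequence given above can be checked directly for its two sites. This sketch assumes the 27-bp I-CeuI recognition sequence TAACTATAACGGTCCTAAGGTAGCGAA and the PmlI site CACGTG, which PmlI cuts between CAC and GTG to leave blunt ends; it searches the given strand only and is an illustration, not part of the published protocol.

```python
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor sequence from the text
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"      # assumed I-CeuI recognition sequence
PMLI_SITE = "CACGTG"                             # PmlI recognition sequence (CAC^GTG)
PMLI_CUT_OFFSET = 3                              # blunt cut after the third base of the site


def find_all(seq: str, site: str) -> list:
    """0-based start positions of every occurrence of `site` in `seq`."""
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


iceui_hits = find_all(ADAPTOR, ICEUI_SITE)
pmli_hits = find_all(ADAPTOR, PMLI_SITE)
print("I-CeuI site at", iceui_hits, "PmlI site at", pmli_hits)

# Blunt PmlI cut on the shown strand: the adaptor is split after CAC.
cut = pmli_hits[0] + PMLI_CUT_OFFSET
print(ADAPTOR[:cut], "|", ADAPTOR[cut:])
```

The PmlI site lies immediately 3′ of the I-CeuI site without overlapping it, so cutting the vector with both enzymes leaves one I-CeuI end and one blunt end; insert DNA that is end-repaired and then cut with I-CeuI carries the same combination of ends, which is presumably what makes the cloning directional.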
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region, position 1192–1342 in b1dcf13c08, we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where CDSs with no match in GenBank were identified, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If an
[Fig. 6 tree image not reproduced; one clade is marked 'PolC + DinG fusion proteins, same domain structure as b1bcf11f04 ORF17'.]
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3.
alternative CDS that did have a match in GenBank was obtained using ORF-finder, then that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
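The two filtering rules described above (drop calls shorter than 100 bp; of two overlapping calls, keep the one with significant GenBank homologues) can be sketched as follows. The record layout is hypothetical, Glimmer itself is not called, and the handling of overlaps where both or neither call has a homologue is an assumption: such pairs are simply reported for the ORF-finder/BLASTX follow-up described in the text.

```python
from dataclasses import dataclass


@dataclass
class CandidateCDS:
    name: str
    start: int            # 1-based, inclusive; start <= end regardless of strand
    end: int              # 1-based, inclusive
    has_homologue: bool   # significant BLASTP match in GenBank

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: CandidateCDS, b: CandidateCDS) -> bool:
    return a.start <= b.end and b.start <= a.end


def filter_cds(candidates, min_length=100):
    """Apply the length cut-off, then resolve overlapping pairs in favour of the
    call with a GenBank homologue. Returns (kept CDSs, unresolved overlapping pairs)."""
    kept = [c for c in candidates if c.length >= min_length]
    dropped = set()
    unresolved = []
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a.name in dropped or b.name in dropped or not overlaps(a, b):
                continue
            if a.has_homologue and not b.has_homologue:
                dropped.add(b.name)
            elif b.has_homologue and not a.has_homologue:
                dropped.add(a.name)
            else:
                unresolved.append((a.name, b.name))  # left for ORF-finder/BLASTX
    return [c for c in kept if c.name not in dropped], unresolved


calls = [
    CandidateCDS("orfA", 1, 90, False),      # 90 bp: below the cut-off
    CandidateCDS("orfB", 100, 400, True),
    CandidateCDS("orfC", 350, 700, False),   # overlaps orfB, no homologue
    CandidateCDS("orfD", 800, 1000, False),
    CandidateCDS("orfE", 950, 1200, False),  # overlaps orfD, neither has a homologue
]
kept, unresolved = filter_cds(calls)
print([c.name for c in kept], unresolved)
```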
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and the 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were constructed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (bootstrap analysis with 100 replicates and global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a different phylogenetic grouping from that observed for the rRNA (with bootstrap support >50%), the CDS was classified as acquired by LGT. It should be noted that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. For some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases, we classified the CDS as acquired by LGT, but it should be noted that for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs, we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses, we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses, we did not use global rearrangements, and we did only one random addition of sequences per bootstrap replicate.
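The decision rule for counting a CDS as laterally transferred can be summarized as a small function. This is a sketch of the final classification step only, under stated assumptions: group labels are at the level of the major groups used in the text (so within-group transfers are invisible by construction), "supported" means bootstrap support above 50%, and a CDS in a mixed clade is counted as LGT only when that placement is itself supported. The function and argument names are hypothetical, and tree building is not shown.

```python
from typing import Optional


def classify_cds(rrna_group: str,
                 cds_group: Optional[str],
                 bootstrap: Optional[float],
                 mixed_clade: bool = False) -> str:
    """Return 'LGT', 'agrees' or 'unresolved' for one CDS.

    rrna_group  -- major bacterial group of the clone according to its rRNA
    cds_group   -- major group the CDS clusters with in the protein tree
    bootstrap   -- bootstrap support (%) for that placement
    mixed_clade -- True if the CDS sits in a clade mixing several major groups
    """
    if cds_group is None or bootstrap is None or bootstrap <= 50:
        return "unresolved"   # no supported placement: not counted either way
    if mixed_clade:
        return "LGT"          # frequent transfer; donor and recipient unidentifiable
    if cds_group != rrna_group:
        return "LGT"          # supported grouping with a different major group
    return "agrees"


examples = [
    ("delta-proteobacteria", "Firmicutes", 85.0, False),
    ("delta-proteobacteria", "delta-proteobacteria", 97.0, False),
    ("delta-proteobacteria", "Firmicutes", 45.0, False),
    ("gamma-proteobacteria", "gamma-proteobacteria", 70.0, True),
]
for rrna, cds, support, mixed in examples:
    print(rrna, cds, support, mixed, "->", classify_cds(rrna, cds, support, mixed))
```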
All sequences have been submitted to GenBank with accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with E-value < 1e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits (E-value < 1e−10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–S12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com.
found in the 23S rRNA of eukaryotic chloroplasts and mitochondria (Cannone et al., 2002), as well as in a few bacteria (Nesbø and Doolittle, 2003). They very specifically cleave conserved sequences in intron-free 23S rRNA genes. The recognition sites are usually located in the most conserved part of the host rRNA gene and, being 15–25 bp long, are highly specific to rRNA genes. The slow evolutionary rate of these sites, together with the tolerance of homing endonucleases for minor sequence changes, means that rRNA genes from a wide range of taxa can usually be cut. Three such enzymes belonging to the LAGLIDADG family have been particularly well characterized: I-CeuI (from the chloroplast 23S gene of Chlamydomonas eugametos) (Marshall and Lemieux, 1992), I-CreI (from the chloroplast 23S gene of Chlamydomonas reinhardtii) (Chevalier et al., 2003) and I-DmoI (from the 23S gene of the crenarchaeon Desulfurococcus mobilis) (Aagaard et al., 1997). These enzymes have different cutting sites and specificities. The enzyme used here, I-CeuI, is commercially available from New England Biolabs (http://www.neb.com) and targets a 19-bp cut site at 23S rRNA position 1923 (relative to the E. coli 23S rRNA) that is conserved in most bacteria.
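The tolerance of homing endonucleases for minor changes in their recognition sites can be illustrated with a simple in silico scan that reports every window of a sequence lying within a few mismatches of a target site, on either strand. This is only a sketch: the 19-bp `PLACEHOLDER_SITE` below is a made-up sequence, not the I-CeuI recognition site, and a Hamming-distance cutoff is a crude stand-in for the enzyme's real, position-dependent cleavage efficiency.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(seq: str, site: str, max_mismatches: int = 2) -> List[Tuple[int, str, int]]:
    """Return (0-based start, strand, mismatches) for every window of `seq`
    within `max_mismatches` of `site` ('+') or of its reverse complement ('-')."""
    seq, site = seq.upper(), site.upper()
    targets = {"+": site, "-": reverse_complement(site)}
    k = len(site)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, target in targets.items():
            mismatches = hamming(window, target)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Hypothetical 19-bp placeholder; NOT the real I-CeuI recognition sequence.
PLACEHOLDER_SITE = "ACGTTAGCCATGGCAATCG"

if __name__ == "__main__":
    # A fragment carrying the site with one substitution (index 5: A -> C)
    variant = PLACEHOLDER_SITE[:5] + "C" + PLACEHOLDER_SITE[6:]
    print(find_sites("GG" + variant + "GG", PLACEHOLDER_SITE))
```

Scanning the forward sequence against both the site and its reverse complement keeps all coordinates on the forward strand, which is convenient when mapping hits back onto an annotated fosmid.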
Here we present the sequences of 12 environmental fosmid clones: 10 that contain about 1000 bp of the 23S rRNA gene, one that contains both 23S rRNA and 16S rRNA genes, and one that contains 1079 bp of the 16S rRNA gene. The metagenomic libraries containing these clones were constructed using DNA isolated from anaerobic sediments from Baltimore harbour. Microbial communities from these sediments have previously been shown to be capable of reductive dechlorination of PCBs (Holoman et al., 1998). The taxonomic position of each fosmid clone was assessed using its rRNA gene. The amount of lateral gene transfer (LGT) was assessed by making phylogenetic trees of all the predicted protein coding sequences (CDSs) in the fosmid clones and comparing the phylogenetic position of each CDS to that indicated by the rRNA gene.
Results and discussion
Two different types of fosmid libraries were made from the anaerobic sediment DNA. The first used the pCC1FOS vector from Epicentre (B1BF1), and the second a modification of that vector containing an I-CeuI site for specifically cloning DNA fragments containing 23S rRNA genes (B1BCF1, B1DCF1, B3CF5, B1DCF5). The B1BF1 library contained about 10 000 clones. The I-CeuI libraries were considerably smaller, and we identified only 49 clones with unique 23S rRNA end-sequences. However, assuming one to three bacterial rRNA-containing clones among every 100 clones (Suzuki et al., 2004), and considering that not all bacterial 23S rRNA genes are cut by I-CeuI, the number of clones obtained is close to expected values.
End-sequencing and subcloning analyses of 'normal' fosmids
To obtain information on the diversity of genomic fragments captured in the B1BF1 library, we obtained 576 end-sequences, resulting in 565 unique sequences with an average of 408 high-quality base pairs, corresponding to 232 kb of environmental DNA. Among these sequences we identified a 16S rRNA sequence in B1BF110d03, which was fully sequenced (see below), as well as one 23S rRNA-containing clone. We also attempted to identify 23S rRNA-containing clones by screening 10 96-well plates from the B1BF1 library using I-CeuI. However, of the four clones that appeared to be cut by I-CeuI, only one proved to contain 23S rRNA and was fully sequenced (B1BF1a01; see below).
The distribution of G + C content of the end-sequences, their significant hits to proteins in GenBank (based on BLASTX results with E-values < 1e−10), and their matches to proteins that have been assigned to COG categories are shown in supplemental Fig. S1A–C. As observed by Treusch and colleagues (2004), the distribution of the COG categories is similar to that observed for single genomes of cultivated organisms, suggesting that it represents an average of the genomes in this habitat. Taken together, the large G + C content variation and the wide functional and phylogenetic diversity of the sequences suggest that we have sampled sequences originating from a large diversity of genomes.
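G + C content distributions such as the one in supplemental Fig. S1B can be tabulated per sequence and then binned. The sketch below assumes that ambiguous bases (N and other IUPAC codes) are simply ignored and uses a hypothetical 5% bin width; the paper does not state how its distribution was binned.

```python
from collections import Counter
from typing import Dict, Iterable


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / (gc + at)


def gc_histogram(seqs: Iterable[str], bin_width: float = 5.0) -> Dict[float, int]:
    """Count sequences per G + C bin; keys are the lower bin edges in percent."""
    counts = Counter((gc_content(s) // bin_width) * bin_width for s in seqs)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    reads = ["ATGCGCGCAT", "GGCCGGCCAA", "ATATATGCAT"]  # toy end-sequences
    print(gc_histogram(reads))
```

Ignoring ambiguous bases keeps low-quality stretches of an end-read from dragging its G + C value towards zero.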
End-sequencing of I-CeuI fosmid libraries
Four libraries were made using the pCC1FOSCeuI23S vector. High-quality end-sequences of 91 clones revealed 62 unique clones, of which at least 49 (79%) contained 1000 bp of 23S rRNA. Eight clones (12.9%) did not contain a 23S rRNA gene, and for five clones we could not obtain high-quality sequence from the end that should contain the 23S rRNA. In BLASTN searches, the 23S rRNA fragments showed highest similarity to sequences from several different bacterial groups: α-proteobacteria (1), δ-proteobacteria (15), γ-proteobacteria (9), Firmicutes (8), Planctomycetes (6), Bacteroidetes (1), Actinobacteria (1), Spirochaetes (1) and Chlamydiae/Verrucomicrobia (6). These sequence tags were not long enough to make well-supported phylogenetic trees (average sequence length = 431 bp); however, they give a rough indication of the diversity captured by this method.
These results demonstrate the efficiency of using intron-encoded endonucleases to specifically clone rRNA-containing DNA fragments. Screening a 'normal' fosmid library for rRNA genes usually yields about 1% positive clones (Suzuki et al., 2004). Although our pCC1FOSCeuI23S libraries contained only 62 unique clones, at least 49 of them (79%) contained a 23S gene, which is the equivalent of screening 4900 clones from a 'normal' fosmid library. Also, the peripheral location of the rDNA on the DNA fragments greatly facilitates screening and sequencing. The method is also unlikely to have the same biases as polymerase chain reaction (PCR) screening for the recovery of rDNA-containing clones. For I-CeuI, recovery of positives depends on a single ~20 bp DNA region (rather than two, as for PCR) and allows for a different type of degeneracy in this region (a drop in cutting efficiency for divergent sequences). Although the I-CeuI recognition sequence is specific to bacteria (except Actinobacteria), other homing endonucleases such as I-CreI and I-DmoI could be used to recover archaeal and actinobacterial DNA fragments.
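The enrichment arithmetic above can be made explicit. This sketch takes the hit rate of the I-CeuI libraries (49 positives among 62 unique clones) and asks how many clones of a conventional library would be needed to obtain the same number of positives at a given background rate. The default of 1 positive per 100 clones follows the roughly 1% cited above; the function names are illustrative.

```python
def hit_rate(positives: int, screened: int) -> float:
    """Fraction of screened clones that carry the target gene."""
    return positives / screened


def equivalent_library_size(positives: int, positives_per_100: float = 1.0) -> float:
    """Clones a conventional library would need to yield `positives` hits at the background rate."""
    return positives * 100 / positives_per_100


def fold_enrichment(positives: int, screened: int, positives_per_100: float = 1.0) -> float:
    """Hit rate of the enriched library relative to the background rate."""
    return hit_rate(positives, screened) * 100 / positives_per_100


if __name__ == "__main__":
    print(round(100 * hit_rate(49, 62)))          # 79 (% of unique clones with 23S rRNA)
    print(equivalent_library_size(49))           # 4900.0
    print(round(fold_enrichment(49, 62), 1))      # 79.0
    print(round(equivalent_library_size(49, 3)))  # 1633
```

At 1 positive per 100 clones this reproduces the 4900-clone equivalent and a roughly 79-fold enrichment; at the upper end of the one to three per 100 range assumed earlier, the equivalent conventional library shrinks to about 1633 clones.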
Phylogenetic analyses of rRNA genes demonstrate the recovery of protein-coding genes from a wide diversity of bacterial lineages
Twelve rRNA-containing fosmid clones were fully sequenced. The annotation of these clones is given in Tables S1–S12 (see Supplementary material), and Fig. 1 gives an overview of the phylogenetic affiliation of the fosmid clones. Figure 2A shows the phylogenetic tree estimated from the 1000 bp 23S rRNA fragment of the I-CeuI fosmids; for seven of the fosmid clones, this 23S tag could be used to assign the clone to a specific bacterial lineage. We have 23S rRNA-containing fragments from two δ-proteobacteria, two γ-proteobacteria, one ε-proteobacterium, one β-proteobacterium (for which we also have the 16S rRNA) and one taxon from the phylum Chloroflexi. Two fosmid clones, B1BF110d03 and B1BF11a01, contained 16S rRNA genes, and phylogenetic analyses placed these sequences within the Flavobacteriaceae and the β-proteobacteria respectively (Fig. 2B and C).
For four fosmid clones (b1bcf11f04, b1dcf51c12, b3cf12f09 and b1dcf51a06), the 23S rRNA tag did not cluster with any specific 23S lineage. For these clones, we attempted to obtain the 16S rRNA sequence by using one specific 23S rRNA primer and a universal 16S primer. We obtained four 16S–23S rRNA sequences that showed 98–99% identity to the 23S fragment in b1bcf11f04 (715 bp overlap). One of these amplicons was fully sequenced, and phylogenetic analyses showed that it belongs to the candidate division WS3 (Dojka et al., 1998) (Fig. 2B). Because b1dcf51c12 clusters significantly with b1bcf11f04 in the 23S rRNA tree (Fig. 2A), we also assigned this clone to the WS3 division. For b3cf12f09, we obtained two different 16S–23S rRNA clones that showed 100% and 99% identity to the 23S fragment of this clone (281 bp overlap), and phylogenetic analyses showed that this clone belongs to the candidate division OP8 (Hugenholtz et al., 1998) (Fig. 2B). The ITS regions of both the WS3 and the OP8 rRNA operons contained tRNA-Ile and tRNA-Ala. For b1dcf51a06, no 16S rRNA sequence could be obtained.
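Identities such as the 98–99% over a 715 bp overlap reported above are computed over the aligned overlap between two sequences. A minimal sketch, assuming the two sequences are already aligned to equal length with '-' marking gaps, and that any column containing a gap is excluded from the comparison (tools differ in whether internal gaps count as mismatches):

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> Tuple[float, int]:
    """Percent identity and number of compared columns for two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped, so unaligned
    overhangs and internal indels do not count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y  # bool adds as 0 or 1
    if compared == 0:
        raise ValueError("sequences do not overlap")
    return 100.0 * matches / compared, compared


if __name__ == "__main__":
    # Toy alignment: an amplicon extending beyond a shorter fosmid 23S tag
    print(percent_identity("--ACGTACGTAC", "TTACGTACGAAC"))
```

Reporting the number of compared columns alongside the identity, as the text does with "715 bp overlap", matters because a high identity over a short overlap is weaker evidence for linking an amplicon to a clone.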
Most protein coding sequences are in agreement with the adjacent rRNA genes in phylogenetic analyses
Phylogenetic trees were obtained for all predicted CDSs of each sequenced fosmid clone. We compared the phylogenetic placement of each CDS to the phylogeny suggested by the rRNA. If the phylogeny of the CDS suggested that it belonged to another bacterial group, and this relationship was supported in bootstrap analyses, acquisition by LGT was inferred for the CDS. For the clone where no specific phylogenetic relationship could be inferred (b1dcf51a06), and for the fosmid clones whose rRNA showed that they originated from a bacterium with no cultivated representative, we classified as likely instances of LGT all CDSs that clustered specifically (with bootstrap support) with another bacterial group. A summary of the phylogenetic analysis of the rRNA genes, as well as of all protein-coding CDSs, is given in Table 1.
The majority of the CDSs agreed with their respective rRNA phylogeny: 57–96% (average 76.8%) of the CDSs that gave good alignments and robust phylogenies showed the same phylogenetic relationship as the rRNA genes. This was also true for the fosmid clones from bacterial lineages with no cultivated representative, as most CDSs from these clones did not cluster with any specific lineage or had no or only a few significant matches in GenBank (Fig. 1). However, for these clones the number of CDSs that robustly agree with the rRNA topology is difficult to calculate, as they may or may not fall into well-supported clades when more sequences from these phyla become available. The fosmid clones with the highest proportion of congruent trees are b1bf11a01, which originated from a β-proteobacterium very similar to Thiobacillus denitrificans and in which 96% of the CDSs with robust phylogenies agree with the rRNA genes, and b1dcf13c08, which originated from an ε-proteobacterium and in which 90% of the 'treeable' CDSs agree with the rRNA.
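The per-clone congruence figures (57–96%, average 76.8%) are proportions of robustly placed CDSs that agree with the rRNA, summarized across fosmids. The sketch below assumes that 'robust' means a CDS labelled either congruent or LGT (unresolved CDSs are left out of the denominator) and that the average is an unweighted mean of the per-clone percentages; both are assumptions, and the clone names and counts are invented for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def congruence_percent(labels: List[str]) -> float:
    """Percent of robustly placed CDSs ('congruent' or 'LGT') that agree with the rRNA."""
    counts = Counter(labels)
    robust = counts["congruent"] + counts["LGT"]
    if robust == 0:
        raise ValueError("no robustly placed CDSs")
    return 100.0 * counts["congruent"] / robust


def summarize(per_clone: Dict[str, List[str]]) -> Tuple[float, float, float]:
    """Minimum, maximum and unweighted mean of the per-clone congruence percentages."""
    pct = [congruence_percent(labels) for labels in per_clone.values()]
    return min(pct), max(pct), mean(pct)


if __name__ == "__main__":
    hypothetical = {
        "clone_A": ["congruent"] * 9 + ["LGT"] + ["unresolved"] * 4,
        "clone_B": ["congruent"] * 3 + ["LGT"] + ["unresolved"] * 10,
    }
    print(summarize(hypothetical))  # (75.0, 90.0, 82.5)
```

An unweighted mean gives every clone equal influence; pooling all CDSs before dividing would instead weight clones with many treeable CDSs more heavily, so the two conventions can give noticeably different averages.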
Fig. 1. Overview of the sequenced fosmid clones. Yellow CDSs are suggested to have been acquired by LGT, and blue CDSs have no significant match in GenBank. A, α-proteobacteria; B, β-proteobacteria; D, δ-proteobacteria; E, ε-proteobacteria; G, γ-proteobacteria; C, Cyanobacteria; CB, Chlorobi-Bacteroidetes; F, Firmicutes; P, proteobacteria; CH, Chloroflexi; TD, Thermus-Deinococcus group; ACT, Actinobacteria; PL, Planctomycetes; SPIR, Spirochaetes; THER, Thermotogales; AQ, Aquificales; FUSO, Fusobacteria; ARCH, Archaea; EUK, Eukaryotes; ENV, environmental sequence; c, clusters robustly within a mixed clade in phylogenetic trees; –, no significant match in GenBank. Uppercase: supported by phylogenetic analysis; lowercase: suggested by BLAST searches, as there was no supported phylogeny. The low-quality region in b1dcf13c08 (positions 1192–1342) is indicated by a black box. The orange shading indicates LGT-CDSs that are found in more than one fosmid. Other labels in the figure: ORFAN; 'A conjugative transposon obtained from a Bacteroides bacterium'.
[Figure panels (rRNA affiliation, clone): unknown, b1dcf51a06; Chloroflexi, b1dcf13f01; candidate division OP8, b3cf12f09; candidate division WS3, b1bcf11f4; candidate division WS3, b1bcf51c12; δ-proteobacteria, b1bcf11h03; δ-proteobacteria, b1bcf11d04; ε-proteobacteria, b1dcf13c08; γ-proteobacteria, b1dcf12d07; γ-proteobacteria, b1bcf11c04; β-proteobacteria, b1bf11a01; Flavobacteriaceae, b1bf110d03]
Fig. 2. rRNA phylogenies.
A. The minimum evolution tree estimated from LogDet distances of the 23S tag from the CeuI fosmids (984 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood topology (GTR + Γ + I) was similar, except that the δ-proteobacteria were paraphyletic, with the γ- and β-proteobacteria clustering within the δ-proteobacteria. Moreover, b1bcf11d04 fell at the bottom of this clade.
B. The minimum evolution tree estimated from LogDet distances of the 16S sequences (1243 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood (GTR + Γ + I) topology was identical. However, there were several differences in the backbone of the tree, with, for instance, Geobacter clustering with Firmicutes. The trees in both A and B were rooted by the Thermotoga maritima sequence.
C. The minimum evolution tree estimated from LogDet distances of the closest matches of the 16S fragment in b1bf110d03 (1046 positions in alignment). The Maximum Likelihood (GTR + Γ + I) topology was identical.
For all three trees, numbers on branches refer to bootstrap values from the minimum evolution analysis (italic) and from the Maximum Likelihood analysis (plain text). If both bootstrap values were above 70%, this is indicated by a grey circle, while a black circle indicates that both values were above 90%.
High levels of LGT detected in phylogenetic analyses
Phylogenetic analyses showed that 7–44% (average 17%) of the CDSs have been acquired by LGT from distantly related bacterial lineages (Fig. 1, Table 1). For many of the fosmid clones there were additional CDSs that have probably also been involved in LGT; these cases were not scored as LGT, either because the CDS was too short to obtain reliable alignments, because the CDS was found in a 'mixed' clade also containing genes from the same bacterial group, or because the CDS was found outside its group but did not cluster with any specific lineage. For three of the clones, more than 30% of the CDSs have been acquired by LGT (Table 1); two of these are from candidate divisions and one is from a δ-proteobacterium. For all three of these fosmids, there appears to have been a transfer of a large island of genes from a phylogenetically distant lineage. Specifically, we infer an α-proteobacterial island in b3cf12f09, a δ-proteobacterial island in b1dcf51c12 and an archaeal/β-proteobacterial island in b1bcf11d04 (Fig. 1). It should be noted that the proportions of foreign genes identified here might not represent the proportion of foreign genes in the respective genomes that we have sampled, but rather the amount of LGT to be expected when sequencing environmental fosmid clones. For instance, in some genomes LGT might be enriched in certain variable parts of the genome. Indeed, the distribution of proteins that match COG categories was significantly different (P = 1.3 × 10⁻¹³ in a χ² test) from what we observed for the end-sequencing of 'normal' fosmid clones (supplemental Fig. S1), the main difference being proportionally more J, K, U, F and H category sequences in the full fosmid sequences and more L, P, R and S category sequences among the end-sequences. When comparing the distributions of different COG groups (i.e. informational, metabolism, etc.), however, the two data sets were significantly different only when the poorly characterized categories (R, S) were included. If such genes are more frequently transferred than genes in the other categories, then we would be underestimating the level of LGT to be expected when analysing metagenomic clones.
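The comparison of COG-category distributions described above is a χ² test of homogeneity on a datasets × categories count table. A minimal sketch that computes the Pearson statistic and its degrees of freedom; the counts in the usage example are invented, and the paper does not specify which categories were merged or which software was used.

```python
from typing import Sequence, Tuple


def chi_square_homogeneity(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a
    datasets x categories table of counts.

    Categories with a zero total are dropped, since their expected counts would be 0.
    """
    col_totals = [sum(col) for col in zip(*table)]
    keep = [j for j, total in enumerate(col_totals) if total > 0]
    rows = [[row[j] for j in keep] for row in table]
    col_totals = [col_totals[j] for j in keep]
    row_totals = [sum(r) for r in rows]
    if any(t == 0 for t in row_totals):
        raise ValueError("every dataset needs at least one count")
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


if __name__ == "__main__":
    # Hypothetical counts: rows = full fosmids, end-sequences; columns = three COG categories
    print(chi_square_homogeneity([[12, 30, 8], [25, 15, 20]]))
```

To turn the statistic into a P-value, one would typically pass it and the degrees of freedom to `scipy.stats.chi2.sf`, or run `scipy.stats.chi2_contingency` directly on the same table.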
Interestingly, in b1bcf11d04 the transfer vector for one of the acquired gene clusters could be identified. ORF6 encodes an acetyl transferase, and ORF8, ORF9 and ORF10 encode subunits of an acyl-CoA synthase (two α-subunits and one β-subunit). Phylogenetic analyses suggested that all four CDSs have been acquired by LGT, likely from a β-proteobacterium. The β-proteobacteria have in turn likely acquired the acyl-CoA synthase genes from Archaea (Fig. 3). In support of an archaeal origin of these genes, the acyl-CoA synthase in b1bcf11d04 has a domain organization similar to that of the two-subunit acetyl-CoA synthase in Pyrococcus spp. (Sanchez et al., 2000). Furthermore, these genes have been transferred multiple times, and the transfers involved all three domains of life (Fig. 3; Andersson et al., 2003). These transferred CDSs are preceded by an integrase gene (ORF3), a transposase gene (ORF4) and an integrase/transposase gene (ORF5; COG2801, Tra5, which contains an integrase core domain; Table S7), which were probably responsible for transferring this cluster into this genome. The α-proteobacterial island in the b3cf12f09 clone encodes a wide range of functions, and no typical mobile elements were identified. However, as this island extends to the 3′ end of the fosmid, mobile genes might be found further downstream. The first CDS of this island encodes a DnaJ-class chaperone (ORF29) that is truncated at the 5′ end. This pseudogene still shows 65% protein identity to a homologue in Magnetospirillum magnetotacticum (Table S3), so it probably represents a very recent transfer (or rearrangement). Another possibility is that this fosmid is a chimera. However, the G + C content of the CDSs in the α-proteobacterial island (59.5% G + C) is very similar to that of the rest of the fosmid (59.6% G + C; supplemental Table S3). Also, further upstream there is a proteobacterial transposase
Fig. 3. Maximum Likelihood phylogeny of acetyl-CoA synthetase (ADP-forming) homologues estimated using PMBML (459 positions in alignment). The sequences were obtained by BLAST searches of GenBank with the b1bcf11d04 ORF8 and ORF10 sequences, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Entamoeba histolytica. Numbers on branches refer to bootstrap support obtained using PMBML (bold), PUZZLEBOOT (plain text) and Neighbour-joining (italic). If all bootstrap values were above 70%, this is indicated by a grey circle, while a black circle indicates that all three values were above 80%.
Table 1. Summary of phylogenetic analyses.
Columns: Fosmid | Phylogeny: 23S rRNA; 16S rRNA; CDSs (phylogenetic trees; phylogenetic distribution or genome context) | LGT^a | ORFans^b | Total

b1dcf51a06
23S rRNA: No clear affiliation with existing sequences.
16S rRNA: Could not be amplified.
CDSs: Most CDSs have no or only a few significant matches in GenBank. ORF4 clusters with Leptospira interrogans within a mixed clade; however, L. interrogans has several paralogues, and this gene appears to have been frequently transferred and is likely to be a transfer. ORF20 clusters with Methanosarcina within δ-proteobacteria. ORF19 clusters with Geobacter but is mostly found in methanogens. ORF17 and ORF18 have homologues in methanogens only.
LGT: 4 CDSs (19% of the total) have been acquired by LGT.
33 (38)

b1dcf13f01
23S rRNA: Clusters with Dehalococcoides ethenogenes; the Chloroflexus aurantiacus 23S rRNA sequence was of too poor quality to include in the tree.
CDSs: 7 of 10 CDSs (70%) with supported phylogenetic topologies agree with the 23S fragment. In addition, 6 CDSs only hit Chloroflexus aurantiacus. Two CDSs have likely been acquired through LGT: one clusters with high support with Thermotoga maritima (ORF16), and one clusters within the eukaryotes (ORF25). ORF2 has significant homologues only in Crocosphaera watsonii.
LGT: 3 CDSs (11% of total) have been acquired by LGT.
14 (5)

b3cf12f09
23S rRNA: Candidate division OP8 bacterium.
16S rRNA: Candidate division OP8 bacterium.
CDSs: Most CDSs agree with the rRNA genes and do not cluster within any specific bacterial group. Phylogenetic analysis suggests that 10 CDSs have likely been acquired by LGT; 8 of these have been acquired from an α-proteobacterium and are found linked. Three CDSs found linked to CDSs for which phylogenetic analyses suggest LGT have also likely been acquired by LGT.
LGT: 13 CDSs (32% of total) have been acquired by LGT. ORF16 is a transposase of proteobacterial origin and shows lower G + C content than the rest of the fosmid. Twelve of the transferred CDSs (ORF29–41) are linked, and all appear to have been acquired from an α-proteobacterium.
22 (9)

b1bcf11f04
23S rRNA: Candidate division WS3 bacterium.
16S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage; among these was the highly conserved DnaE gene. Two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group.
LGT: 2 CDSs (9% of total) have been acquired by LGT.
26 (14)
b1dcf51c12
rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs have no or only a few significant matches in GenBank. ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1dcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, however with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within the Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, however with no bootstrap support; it is likely that this CDS was also transferred as part of a δ-proteobacterial island.
LGT, total (a): 8 CDSs (44% of the total) have been acquired by LGT, forming one large "island" of δ-proteobacterial origin.
ORFans (b): 22% (29%)

b1bcf11h03
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium. Six CDSs have likely been acquired by LGT. ORF8 clusters with Clostridium thermocellum and Treponema denticola. ORF18 is found separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp. ORF23 is found in a mixed clade and appears to have been frequently transferred. ORF28 clusters with β-proteobacteria, ORF29 clusters with γ-proteobacteria, and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria.
LGT, total (a): 6 CDSs (17% of the total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12, as well as to the Chlorobium lineage.
ORFans (b): 6% (1%)
b1bcf11d04
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment. Six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has also been transferred to other δ-proteobacteria. Three CDSs (ORF3–5), encoding an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 mainly have homologues in the Firmicutes and are neighbours of ORF19, which has also been acquired from the Firmicutes.
LGT, total (a): 12 CDSs (31% of the total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
ORFans (b): –

b1dcf13c08
rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni.
CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA. ORF4 clusters with Geobacter and Clostridium, and ORF23 does not have homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria.
LGT, total (a): 3 CDSs (7% of the total) have been acquired by LGT.
ORFans (b): 10% (3%)

b3cf12d07
rRNA: γ-Proteobacterium. Clusters within the γ-proteobacteria in LogDet distance trees, but at the base of the γ-proteobacteria and β-proteobacteria in the best maximum likelihood tree.
CDSs: Only 7 CDSs give supported phylogenies. Of these, 4 (57%) agree with the rRNA. ORF7 clusters within the β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees had only divergent homologues or no significant homologues in GenBank, and may also have been acquired by LGT. In support of this, ORF26 encodes a transposase.
LGT, total (a): 2 CDSs (8% of the total) have been acquired by LGT. ORF16–ORF25 were not included in this estimate because of limited evidence for their transfer.
ORFans (b): 23% (23%)
b1bcf11c04
23S rRNA: γ-Proteobacterium.
16S rRNA: –
CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA. Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within the β-proteobacteria. Three genes that show incongruent phylogenies, but with low bootstrap support, and that are found close to ORF3 and ORF34 have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred.
LGT, total (a): 5 CDSs (13% of the total) have been acquired by LGT.
ORFans (b): 3% (1%)

b1bf11a01
23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA).
16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA).
CDSs: High degree of gene synteny compared with T. denitrificans; 29 CDSs have their best BLAST match in T. denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes. One, ORF30 (RsuA), clusters with γ-proteobacteria and has no homologue in T. denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and T. denitrificans. ORF29 has no significant homologues in proteobacteria.
LGT, total (a): 4 CDSs (8% of the total) have been acquired by LGT.
ORFans (b): 3% (2%)

b1bf110d03
23S rRNA: –
16S rRNA: A Flavobacteriaceae bacterium, among sequenced genomes most closely related to Cytophaga hutchinsonii.
CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment. ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron.
LGT, total (a): 3 CDSs (10% of the total) have likely been acquired by LGT. The transposon was not included, as it has been transferred within the Bacteroides.
ORFans (b): 10% (3%)

(a) Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or by clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group).
(b) ORFans were classified as CDSs with no significant match in GenBank. Matches to sequences in the environmental portion of GenBank were not considered. Given in parentheses is the proportion of protein coding DNA that has no match in GenBank.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al. 2003), Bacteroides fragilis (Kuwahara et al. 2004) and Porphyromonas gingivalis (Nelson et al. 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, since many of the CDSs found in the B. thetaiotaomicron transposons are not present in the fragment we have sequenced. It is also likely that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were transferred along with this transposon; additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al. 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
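The counting rule stated in the Experimental procedures (a CDS is scored as laterally transferred only when its tree places it, with bootstrap support above 50%, in a different major bacterial group from the rRNA) explains why this transposon is excluded. A minimal sketch of that decision, offered as an illustration rather than the authors' pipeline: the function name `classify_cds`, the string group labels and the choice to treat support at or below the threshold as unresolved are assumptions, and the distribution-pattern criterion of Table 1, note a, is not modelled.

```python
def classify_cds(rrna_group: str, cds_group: str, bootstrap: float,
                 threshold: float = 50.0) -> str:
    """Score one CDS against the rRNA phylotype of its fosmid (hypothetical helper).

    Groups are assumed to be given at the level of major bacterial groups or
    phyla (e.g. "gamma-proteobacteria", "Bacteroidetes"), so transfers within
    a group can never be scored as LGT.
    """
    if bootstrap <= threshold:
        # Placement not supported: neither agreement nor LGT can be claimed.
        return "unresolved"
    if cds_group == rrna_group:
        return "agrees"
    return "LGT"


# A supported placement inside the fosmid's own major group is not counted.
print(classify_cds("Bacteroidetes", "Bacteroidetes", 95))          # agrees
# A supported placement in another major group is scored as LGT.
print(classify_cds("delta-proteobacteria", "Firmicutes", 72))      # LGT
```

The group label carries the whole "major group" judgement here; choosing labels at a finer rank would turn within-group transfers, such as the Bacteroidales transposon, into counted events.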
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobic or, in the case of Symbiobacterium thermophilum, microaerophilic, and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were only obtained for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and b1dcf51c12, probably as two separate events. Moreover, for b1dcf51c12, the fusA paralogue discussed above may have been transferred as part of this gene cluster, as they are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is only distantly related to the OmpA found in this gene cluster and was not included in the alignment.

Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMB/ML (661 positions in alignment). The sequences were obtained by BLAST searches of GenBank with the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
We also identified some mobile genes that might be involved in biodegradation of pollutants by searching the Pfam database. In one of the γ-proteobacterial fosmids, b1bcf11c04, we identified a glutathione-S-transferase (GST, ORF36) gene that was flanked by an acetyltransferase gene (ORF35) and a transporter (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bcf11c04 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al. 1998; Muller et al. 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS is found in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene, ORF30 (an S4 domain protein), suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
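Checking an inverted repeat of this kind amounts to comparing one flank with the reverse complement of the other. The sketch below is a minimal illustration that assumes an ungapped comparison of two equal-length segments given as 1-based inclusive forward-strand coordinates; the paper does not say how its 80% identity was computed, and `revcomp`, `percent_identity` and `inverted_repeat_identity` are hypothetical helpers.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


def inverted_repeat_identity(seq: str, left: tuple, right: tuple) -> float:
    """Compare the left segment with the reverse complement of the right one.

    left and right are 1-based inclusive (start, end) pairs on the forward strand.
    """
    left_seg = seq[left[0] - 1:left[1]]
    right_seg = seq[right[0] - 1:right[1]]
    return percent_identity(left_seg, revcomp(right_seg))


arm = "AAAACCCGGG"
toy = arm + "T" * 10 + "CCCGGGTTTA"  # right arm = revcomp(arm) with one change
print(inverted_repeat_identity(toy, (1, 10), (21, 30)))  # 90.0
```

For the b1bf11a01 repeat, the corresponding call would be `inverted_repeat_identity(seq, (34021, 34040), (36674, 36693))`, with `seq` the assembled fosmid sequence (not reproduced here).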
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman 1997). We identified only eight CDSs that showed a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h03 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show a higher G + C content than the rest of the fosmid, with 66.4%, 65.7% and 64.7%, compared with 52.5% for the rest of the fosmid. All these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content than the rest of the fosmid clone, at 25.7%. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.

Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMB/ML (135 positions in alignment). The sequences were obtained by BLAST searches of GenBank with the b1dcf51c12 ORF7 sequence, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from the Chlamydiaceae, as these sequences formed a long unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
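The G + C screen described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 10% threshold is read as an absolute difference of 10 percentage points from the G + C content of the whole fosmid (a reading consistent with the values quoted, e.g. 47.5% against 36.6% qualifying while 25.7% against 34.7% is described as marginal), only unambiguous A/C/G/T bases are counted, and `gc_percent` and `flag_gc_outliers` are hypothetical names.

```python
def gc_percent(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(fosmid: str, cds_coords: dict, threshold: float = 10.0):
    """Flag CDSs whose G + C differs from the whole fosmid by >= threshold points.

    cds_coords maps a CDS name to 1-based inclusive (start, end) coordinates.
    Returns (fosmid G + C, {name: difference in percentage points}).
    """
    baseline = gc_percent(fosmid)
    flagged = {}
    for name, (start, end) in cds_coords.items():
        diff = gc_percent(fosmid[start - 1:end]) - baseline
        if abs(diff) >= threshold:
            flagged[name] = round(diff, 1)
    return baseline, flagged


toy = "AT" * 30 + "GC" * 20  # 100 bp, 40% G + C
print(flag_gc_outliers(toy, {"a": (1, 60), "b": (41, 80), "d": (21, 80)}))
# (40.0, {'a': -40.0, 'b': 10.0})
```

Whether the baseline should be the complete fosmid or the fosmid minus the CDS is a design choice; the text uses both, and excluding the CDS slightly enlarges the difference for long CDSs.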
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. These fosmid clones represent, to our knowledge, the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank or with only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases (P = 0.02) with no match in GenBank (excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
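The text does not say which form of t-test was used. The sketch below assumes a two-sample Student's t-test with pooled variance, comparing per-clone proportions (for example, the percentage of CDSs that are ORFans) between clones from uncultivated lineages and clones from lineages with cultivated representatives; `pooled_t` is a hypothetical helper returning the t statistic and its degrees of freedom. A P-value then comes from the t distribution with those degrees of freedom; `scipy.stats.ttest_ind` computes the same statistic together with a two-sided P-value.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance (hypothetical helper).

    Returns (t, degrees_of_freedom). Each group needs at least two values,
    and the groups must not both be constant (pooled variance of zero).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy per-clone proportions (illustrative numbers, not the study's data).
t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4
```

With only four uncultivated-lineage clones in one group, the pooled-variance assumption matters; a Welch test would be the usual alternative if the two groups' spreads differ.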
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have either no significant match in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage are available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those we observed in the two WS3 clones.
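The two ORFan figures reported per clone in Table 1 (the share of CDSs without a significant GenBank match and, in parentheses, the share of protein coding DNA without one) can be computed as below. A minimal sketch, assuming the first figure is a percentage of CDSs and that each CDS has already been flagged by a BLASTP search ignoring the environmental portion of GenBank (Table 1, note b); the `Cds` record and `orfan_summary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cds:
    name: str
    length_bp: int
    has_genbank_match: bool  # significant BLASTP hit outside environmental GenBank


def orfan_summary(cdss):
    """Return (% of CDSs that are ORFans, % of coding bp in ORFans), 1 decimal."""
    if not cdss:
        raise ValueError("need at least one CDS")
    orfans = [c for c in cdss if not c.has_genbank_match]
    pct_cds = 100.0 * len(orfans) / len(cdss)
    coding_bp = sum(c.length_bp for c in cdss)
    pct_bp = 100.0 * sum(c.length_bp for c in orfans) / coding_bp
    return round(pct_cds, 1), round(pct_bp, 1)


clone = [
    Cds("ORF1", 900, True),
    Cds("ORF2", 300, False),  # a short ORFan
    Cds("ORF3", 600, True),
    Cds("ORF4", 1200, True),
]
print(orfan_summary(clone))  # (25.0, 10.0)
```

The two figures diverge when ORFans are short, as in the toy example; this is the pattern expected for the OP8 clone, whose ORFans tend to be small, whereas the large ORFans of the WS3 clones push the coding-DNA share up.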
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al. 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). In order to obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to make a blunt site. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
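As a quick consistency check on the adaptor sequence given above, one can locate the two sites it is meant to supply. The sketch assumes the 27-bp I-CeuI recognition sequence TAACTATAACGGTCCTAAGGTAGCGAA (as listed in New England Biolabs' enzyme documentation) and the PmlI site CACGTG, which PmlI cuts to leave blunt ends; `find_all` is a hypothetical helper that reports 1-based start positions on the given strand only.

```python
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"  # assumed 27-bp I-CeuI recognition site
PMLI_SITE = "CACGTG"                        # PmlI site, cut CAC^GTG (blunt)
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"


def find_all(seq: str, motif: str) -> list:
    """1-based start positions of motif in seq (this strand only, overlaps allowed)."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = seq.find(motif, i + 1)
    return hits


print(len(ADAPTOR), find_all(ADAPTOR, ICEUI_SITE), find_all(ADAPTOR, PMLI_SITE))
# 35 [3] [30]
```

The I-CeuI site occupies positions 3–29 and the PmlI site 30–35, so the blunt PmlI end sits immediately downstream of the I-CeuI site, matching the cut-then-blunt preparation of pCC1FosCeuI23S described above.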
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17, estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. A label in the tree marks the PolC + DinG fusion proteins that have the same domain structure as b1bcf11f04 ORF17.

Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to more than 8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region, position 1192–1342 in b1dcf13c08, we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified with the run-glimmer2 script using the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where identified CDSs had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If ORF-finder yielded an alternative CDS that did have a match in GenBank, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
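The two CDS curation rules above (discard predictions shorter than 100 bp; when two predictions overlap, keep the one with a significant GenBank homologue) can be sketched as follows. This is a minimal illustration that assumes gene predictions are already available as simple records with a precomputed "has a GenBank hit" flag; the `Cds` type and `select_cdss` function are hypothetical names, and the sketch does not run GLIMMER, ORF-finder or BLAST.

```python
from dataclasses import dataclass
from typing import List

MIN_LENGTH_BP = 100  # predictions shorter than this are discarded


@dataclass
class Cds:
    """A predicted coding sequence on the fosmid (hypothetical record type)."""
    name: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    has_genbank_hit: bool   # True if a significant homologue was found in GenBank

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Cds, b: Cds) -> bool:
    """True if the two predictions share at least one base."""
    return a.start <= b.end and b.start <= a.end


def select_cdss(predictions: List[Cds]) -> List[Cds]:
    """Apply the length filter, then resolve overlaps in favour of CDSs with GenBank hits."""
    long_enough = [p for p in predictions if p.length >= MIN_LENGTH_BP]
    kept: List[Cds] = []
    for cds in sorted(long_enough, key=lambda p: p.start):
        clashes = [k for k in kept if overlaps(k, cds)]
        if not clashes:
            kept.append(cds)
        elif cds.has_genbank_hit and not any(k.has_genbank_hit for k in clashes):
            # Replace the unsupported overlapping prediction(s) with the supported one.
            kept = [k for k in kept if all(k is not c for c in clashes)]
            kept.append(cds)
        # Otherwise the earlier prediction is kept; in the study such regions were
        # re-examined with ORF-finder and BLASTX, which this sketch does not attempt.
    return sorted(kept, key=lambda p: p.start)


preds = [Cds("a", 1, 300, False), Cds("b", 250, 600, True),
         Cds("c", 700, 750, True), Cds("d", 800, 1200, False)]
print([c.name for c in select_cdss(preds)])
```

Here "c" is dropped by the length filter and "b" replaces the overlapping, unsupported "a".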
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and of the 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01 we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of the protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were estimated for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a major bacterial group other than that of the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a phylogenetic grouping different from that observed for the rRNA (with bootstrap support > 50%), the CDS was classified as acquired by LGT. Note that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. In some of these trees the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but for such phylogenies it is not possible to identify the donors and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
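Two of the alignment-editing steps described above, removing sequences that are much shorter than the rest and deleting gap-rich regions, can be sketched as below. This is a simplification under stated assumptions: gap-rich regions are approximated column by column, and the thresholds (`min_rel_length=0.5`, `max_gap_fraction=0.5`) are hypothetical values, since the text gives no numeric cut-offs. Trimming of near-identical sequences and the tree building itself are not shown.

```python
from typing import Dict


def clean_alignment(aln: Dict[str, str], min_rel_length: float = 0.5,
                    max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Filter an aligned protein set (all rows the same length, '-' marks a gap)."""
    lengths = {name: len(seq.replace("-", "")) for name, seq in aln.items()}
    longest = max(lengths.values())
    # 1. Drop sequences that are considerably shorter than the longest one.
    kept = {n: s for n, s in aln.items() if lengths[n] >= min_rel_length * longest}
    if not kept:
        return {}
    ncols = len(next(iter(kept.values())))
    # 2. Keep only columns whose gap fraction is at or below the threshold.
    good_cols = [i for i in range(ncols)
                 if sum(s[i] == "-" for s in kept.values()) / len(kept) <= max_gap_fraction]
    return {n: "".join(s[i] for i in good_cols) for n, s in kept.items()}


aln = {"x": "MKV-LA", "y": "MKVQLA", "z": "M-----"}
print(clean_alignment(aln))                        # z is dropped as too short
print(clean_alignment(aln, max_gap_fraction=0.4))  # the half-gapped column is removed too
```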
All sequences have been submitted to GenBank with accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes for Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP: Phylogeny Inference Package. Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic Bacterium porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP: Phylogenetic Analysis Using Parsimony (and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with exp < 1.0e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG categories of the BLAST hits (exp < 1.0e−10). Black bars refer to end-sequences and grey bars to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
library for rRNA genes usually yields about 1% positive clones (Suzuki et al., 2004). Although our pCC1FOSCeuI23S libraries contained only 62 unique clones, at least 49 of them (79%) contained a 23S gene, which is the equivalent of screening 4900 clones from a 'normal' fosmid library. Also, the peripheral location of the rDNA on the DNA fragments greatly facilitates screening and sequencing. The I-CeuI approach is also unlikely to have the same biases as polymerase chain reaction (PCR) screening for the recovery of rDNA-containing clones. With I-CeuI, recovery of positives depends on a single ~20 bp DNA region (rather than two, as for PCR), and it tolerates a different type of degeneracy in this region (a drop in cutting efficiency for divergent sequences). Although the I-CeuI recognition sequence is specific to bacteria (except Actinobacteria), other homing endonucleases such as I-CreI and I-DmoI could be used to recover archaeal and actinobacterial DNA fragments.
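The enrichment claim above is simple arithmetic: at a hit rate of about 1% in a conventional library, recovering 49 rRNA-bearing clones would require screening roughly 49 / 0.01 = 4900 clones. A minimal sketch using only the numbers given in the text:

```python
unique_clones = 62        # unique clones in the I-CeuI-enriched libraries
clones_with_23s = 49      # clones that contained a 23S rRNA gene
normal_hit_rate = 0.01    # ~1% positives when screening a conventional library

enrichment_fraction = clones_with_23s / unique_clones
equivalent_screen = clones_with_23s / normal_hit_rate

print(f"{enrichment_fraction:.0%} of clones carry a 23S gene")
print(f"equivalent to screening ~{round(equivalent_screen)} conventional clones")
```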
Phylogenetic analyses of rRNA genes demonstrate the recovery of protein-coding genes from a wide diversity of bacterial lineages
Twelve rRNA-containing fosmid clones were fully sequenced. The annotation of these clones is given in Tables S1–12 (see Supplementary material), and Fig. 1 gives an overview of the phylogenetic affiliation of the fosmid clones. Figure 2A shows the phylogenetic tree estimated from the 1000 bp 23S rRNA fragment of the I-CeuI fosmids; for seven of the fosmid clones this 23S tag could be used to assign the clone to a specific bacterial lineage. We have 23S rRNA-containing fragments from two δ-proteobacteria, two γ-proteobacteria, one ε-proteobacterium, one β-proteobacterium (for which we also have the 16S rRNA) and one taxon from the phylum Chloroflexi. Two fosmid clones, B1BF110d03 and B1BF11a01, contained 16S rRNA genes, and phylogenetic analyses placed these sequences within the Flavobacteriaceae and the β-proteobacteria respectively (Fig. 2B and C).
For four fosmid clones, b1bcf11f04, b1dcf51c12, b3cf12f09 and b1dcf51a06, the 23S rRNA tag did not cluster with any specific 23S lineage. For these clones we attempted to obtain the 16S rRNA sequence by using one specific 23S rRNA primer and a universal 16S primer. We obtained four 16S-23S rRNA sequences that showed 98–99% identity to the 23S fragment in b1bcf11f04 (715 bp overlap). One of these amplicons was fully sequenced, and phylogenetic analyses showed that it belongs to the candidate division WS3 (Dojka et al., 1998) (Fig. 2B). Because b1dcf51c12 clusters significantly with b1bcf11f04 in the 23S rRNA tree (Fig. 2A), we also assigned this clone to the WS3 division. For b3cf12f09 we obtained two different 16S-23S rRNA clones that showed 100% and 99% identity to the 23S fragment of this clone (281 bp overlap), and phylogenetic analyses showed that this clone belongs to the candidate division OP8 (Hugenholtz et al., 1998) (Fig. 2B). The ITS regions of both the WS3 and the OP8 rRNA operons contained tRNA-Ile and tRNA-Ala. For b1dcf51a06 no 16S rRNA sequence could be obtained.
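The identities quoted above (for example 98–99% over a 715 bp overlap) are fractions of matching positions in the region where the 16S-23S amplicon and the fosmid 23S fragment overlap. A minimal sketch of that calculation, assuming the two sequences have already been aligned over the overlap (equal length, '-' for gaps); the function name and the choice to skip gapped columns are illustrative assumptions, not the authors' documented procedure.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical example: a 715-position overlap with 10 mismatches.
overlap_a = "A" * 715
overlap_b = "A" * 705 + "C" * 10
print(f"{percent_identity(overlap_a, overlap_b):.1f}%")
```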
Most protein coding sequences are in agreement with the adjacent rRNA genes in phylogenetic analyses
Phylogenetic trees were obtained for all predicted CDSs of each sequenced fosmid clone. We compared the phylogenetic placement of each CDS with the phylogeny suggested by the rRNA. If the phylogeny of the CDS suggested that it belonged to another bacterial group, and this relationship was supported in bootstrap analyses, acquisition by LGT was inferred for the CDS. For the clone where no specific phylogenetic relationship could be inferred (b1dcf51a06), and for the fosmid clones whose rRNA showed that they originated from a bacterium with no cultivated representative, we classified as likely instances of LGT all CDSs that clustered specifically (with bootstrap support) with another bacterial group. A summary of the phylogenetic analysis of the rRNA genes and of all protein-coding CDSs is given in Table 1.
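The decision rule above can be written as a small function. This is a sketch under simplifying assumptions: each CDS tree is assumed to have already been reduced to the set of bacterial groups in the supported clade that contains the fosmid CDS, plus its bootstrap value; `None` as the rRNA group stands for a clone with no specific affiliation (b1dcf51a06) or from a division with no cultivated representative. The > 50% cut-off is the one stated in the Methods; the function and argument names are hypothetical.

```python
from typing import FrozenSet, Optional

BOOTSTRAP_CUTOFF = 50.0  # percent; support must exceed this value


def is_lgt(rrna_group: Optional[str], cds_clade_groups: FrozenSet[str],
           bootstrap: float) -> bool:
    """Classify a fosmid CDS as laterally acquired (True) or not (False).

    rrna_group       -- group assigned from the clone's rRNA, or None when the clone has
                        no specific affiliation or belongs to an uncultivated division
    cds_clade_groups -- bacterial groups in the supported clade containing the CDS
                        (empty if the CDS does not cluster with any specific lineage)
    bootstrap        -- bootstrap support (%) for that clade
    """
    if not cds_clade_groups or bootstrap <= BOOTSTRAP_CUTOFF:
        return False  # no supported alternative placement
    if rrna_group is None:
        return True   # any supported clustering with another group is scored as LGT
    # Only between-group transfers are scored: a mixed clade that also contains
    # members of the rRNA group is not counted.
    return rrna_group not in cds_clade_groups


print(is_lgt("delta", frozenset({"alpha"}), 85))           # supported foreign placement
print(is_lgt("delta", frozenset({"delta", "gamma"}), 90))  # mixed clade incl. own group
```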
The majority of the CDSs agreed with their respective rRNA phylogeny: 57–96% (average 76.8%) of the CDSs that gave good alignments and robust phylogenies showed the same phylogenetic relationship as the rRNA genes. This was also true for the fosmid clones from bacterial lineages with no cultivated representative, as most CDSs from these clones did not cluster with any specific lineage or had no or only a few significant matches in GenBank (Fig. 1). However, for these clones the number of CDSs that robustly agree with the rRNA topology is problematic to calculate, as they may or may not fall into well-supported clades when more sequences from these phyla become available. The fosmid clones with the highest proportion of congruent trees are b1bf11a01, which originated from a β-proteobacterium very similar to Thiobacillus denitrificans and in which 96% of the CDSs with robust phylogenies agree with the rRNA genes, and b1dcf13c08, which originated from an ε-proteobacterium and in which 90% of the 'treeable' CDSs agree with the rRNA.
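The congruence figures above are proportions: for each clone, the number of treeable CDSs whose supported placement agrees with the rRNA, divided by the number of treeable CDSs. A minimal sketch using the two clones whose counts are spelled out in Table 1 (7 of 10 for b1dcf13f01, 12 of 18 for b1bcf11d04); the full per-clone counts behind the reported 76.8% average are not reproduced here, and treating that average as an unweighted mean of per-clone values is an assumption.

```python
from statistics import mean


def congruence(agreeing: int, treeable: int) -> float:
    """Percentage of treeable CDSs whose phylogeny agrees with the rRNA."""
    return 100.0 * agreeing / treeable


# Counts taken from Table 1; the other clones are omitted.
counts = {"b1dcf13f01": (7, 10), "b1bcf11d04": (12, 18)}
per_clone = {name: congruence(*c) for name, c in counts.items()}
for name, pct in per_clone.items():
    print(f"{name}: {pct:.0f}%")
print(f"unweighted mean of these clones: {mean(per_clone.values()):.1f}%")
```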
Fig. 1. Overview of the sequenced fosmid clones. Yellow CDSs are suggested to have been acquired by LGT, and blue CDSs have no significant match in GenBank. A, α-proteobacteria; B, β-proteobacteria; D, δ-proteobacteria; E, ε-proteobacteria; G, γ-proteobacteria; C, Cyanobacteria; CB, Chlorobi-Bacteroidetes; F, Firmicutes; P, proteobacteria; CH, Chloroflexi; TD, Thermus-Deinococcus group; ACT, Actinobacteria; PL, Planctomycetes; SPIR, Spirochaetes; THER, Thermotogales; AQ, Aquificales; FUSO, Fusobacteria; ARCH, Archaea; EUK, Eukaryotes; ENV, environmental sequence. Further symbols mark CDSs that cluster robustly within a mixed clade in phylogenetic trees and (–) CDSs with no significant match in GenBank. Uppercase, supported by phylogenetic analysis; lowercase, suggested by BLAST searches as there was no supported phylogeny. The low-quality region in b1dcf13c08 (position 1192–1342) is indicated by a black box. Orange shading indicates LGT-CDSs that are found in more than one fosmid. Clones shown (affiliation): b1dcf51a06 (unknown); b1dcf13f01 (Chloroflexi); b3cf12f09 (candidate division OP8); b1bcf11f4 (candidate division WS3); b1bcf51c12 (candidate division WS3); b1bcf11h03 (δ-proteobacteria); b1bcf11d04 (δ-proteobacteria); b1dcf13c08 (ε-proteobacteria); b1dcf12d07 (γ-proteobacteria); b1bcf11c04 (γ-proteobacteria); b1bf11a01 (β-proteobacteria); b1bf110d03 (Flavobacteriaceae).

Fig. 2. rRNA phylogenies. A. The minimum evolution tree estimated from LogDet distances of the 23S tag from the CeuI fosmids (984 positions in alignment). For the sequences from the fosmid clones the Maximum Likelihood topology (GTR + Γ + I) was similar, except that the δ-proteobacteria were paraphyletic, with the γ- and β-proteobacteria clustering within the δ-proteobacteria; moreover, b1bcf11d04 fell at the bottom of this clade. B. The minimum evolution tree estimated from LogDet distances of the 16S sequences (1243 positions in alignment). For the sequences from the fosmid clones the Maximum Likelihood (GTR + Γ + I) topology was identical. However, there were several differences in the backbone of the tree, with, for instance, Geobacter clustering with Firmicutes. The trees in both A and B were rooted by the Thermotoga maritima sequence. C. The minimum evolution tree estimated from LogDet distances of the closest matches of the 16S fragment in b1bf110d03 (1046 positions in alignment). The Maximum Likelihood (GTR + Γ + I) topology was identical. For all three trees, numbers on branches refer to bootstrap values from the minimum evolution analysis (italic) and from the Maximum Likelihood analysis (plain text). If both bootstrap values were above 70% this is indicated by a grey circle, while a black circle indicates that all three values were above 90%.

High levels of LGT detected in phylogenetic analyses

Phylogenetic analyses showed that 7–44% (average 17%) of the CDSs have been acquired by LGT from distantly related bacterial lineages (Fig. 1, Table 1). For many of the fosmid clones there were additional CDSs that have probably also been involved in LGT. These cases were not scored as LGT because the CDS was too short to obtain reliable alignments, because the CDS was found in a 'mixed' clade that also contained genes from the same bacterial group, or because the CDS was found outside its group but did not cluster with any specific lineage. For three of the clones more than 30% of the CDSs have been acquired by LGT (Table 1); two of these are from candidate divisions and one is from a δ-proteobacterium. For all three of these fosmids there appears to have been a transfer of a large island of genes from a phylogenetically distant lineage. Specifically, we infer an α-proteobacterial island in b3cf12f09, a δ-proteobacterial island in b1dcf51c12 and an archaeal/β-proteobacterial island in b1bcf11d04 (Fig. 1). Note that the proportions of foreign genes identified here might not represent the proportion of foreign genes in the respective genomes that we have sampled, but rather the amount of LGT to be expected when sequencing environmental fosmid clones. For instance, in some genomes LGT might be enriched in certain variable parts of the genome. Indeed, the distribution of proteins that match COG categories was significantly different (P = 1.3e-13 in a χ² test) from what we observed for the end-sequencing of 'normal' fosmid clones (supplementary Fig. S1), the main difference being proportionally more J, K, U, F and H category sequences in the full fosmid sequences and more L, P, R and S category sequences among the end-sequences. When comparing the distributions of the broader COG groups (i.e. informational, metabolic, etc.), however, the two data sets were significantly different only when the poorly characterized categories (R, S) were included. If such genes are more frequently transferred than genes in the other categories, then we would be underestimating the level of LGT that would be expected when analysing metagenomic clones.
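The COG comparison above is a χ² test of homogeneity on a contingency table of category counts (full fosmid sequences versus end-sequences). A standard-library sketch follows; the counts in the usage lines are made up for illustration and are not the study's data, and no continuity correction is applied. The p-value uses the chi-square upper tail computed from the regularized incomplete gamma function.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Q(s, h) is the regularized upper incomplete gamma function with s = df / 2.
    # Start from Q(1, h) = exp(-h) (even df) or Q(1/2, h) = erfc(sqrt(h)) (odd df),
    # then step s up by one using Q(s + 1, h) = Q(s, h) + h**s * exp(-h) / Gamma(s + 1).
    if df % 2 == 0:
        s, q = 1.0, math.exp(-half)
    else:
        s, q = 0.5, math.erfc(math.sqrt(half))
    while s < df / 2.0:
        q += math.exp(s * math.log(half) - half - math.lgamma(s + 1.0))
        s += 1.0
    return q


def chi2_homogeneity(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test on an r x c table of counts; returns (stat, df, p)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("drop categories with zero total counts first")
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts in three COG categories (NOT the paper's data).
fosmid_counts = [30, 12, 8]
end_seq_counts = [20, 25, 15]
stat, df, p = chi2_homogeneity([fosmid_counts, end_seq_counts])
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.3g}")
```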
Interestingly, in b1bcf11d04 the probable vehicle of transfer for one of the acquired gene clusters could be identified. ORF6 encodes an acetyltransferase, and ORF8, ORF9 and ORF10 encode subunits of an acyl-CoA synthase: two α-subunits and one β-subunit. Phylogenetic analyses suggested that all four CDSs have been acquired by LGT, likely from a β-proteobacterium. The β-proteobacteria have in turn likely acquired the acyl-CoA synthase genes from Archaea (Fig. 3). In support of an archaeal origin of these genes, the acyl-CoA synthase in b1bcf11d04 has a domain organization similar to that of the acetyl-CoA synthase in Pyrococcus spp., with two subunits (Sanchez et al., 2000). Furthermore, these genes have been transferred multiple times, and the transfers involved all three domains of life [Fig. 3 (Andersson et al., 2003)]. These transferred CDSs are preceded by an integrase gene (ORF3), a transposase gene (ORF4) and an integrase/transposase gene (ORF5; COG2801, Tra5, which contains an integrase core domain; Table S7), which were probably responsible for transferring this cluster into this genome. The α-proteobacterial island in the b3cf12f09 clone encodes a wide range of different functions, and no typical mobile elements were identified. However, as this island extends to the 3′ end of the fosmid, mobile genes might be found further downstream. The first CDS of this island encodes a DnaJ-class chaperone (ORF29), which is truncated at the 5′ end. This pseudogene still shows 65% protein identity to a homologue in Magnetospirillum magnetotacticum (Table S3); hence it probably represents a very recent transfer (or rearrangement). Another possibility is that this fosmid is a chimera. However, the G + C content of the CDSs in the α-proteobacterial island (59.5% G + C) is very similar to that of the rest of the fosmid (59.6% G + C; supplementary Table S3). Also, further upstream there is a proteobacterial transposase
Fig. 3. Maximum Likelihood phylogeny of acetyl-CoA synthetase (ADP-forming) homologues estimated using PMBML (459 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF8 and ORF10 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Entamoeba histolytica. Numbers on branches refer to bootstrap support obtained using PMBML (bold), PUZZLEBOOT (plain text) and Neighbour-joining (italic). If all bootstrap values were above 70% this is indicated by a grey circle, while a black circle indicates that all three values were above 80%.
Table 1. Summary of phylogenetic analyses.
Columns: Fosmid; Phylogeny (23S rRNA, 16S rRNA, CDSs); LGT^a (phylogenetic trees, phylogenetic distribution or genome context, total); ORFans^b.
Fosmid b1dcf51a06
23S rRNA: No clear affiliation with existing sequences.
16S rRNA: Could not be amplified.
CDSs: Most CDSs have no or only a few significant matches in GenBank.
LGT^a: ORF4 clusters with Leptospira interrogans within a mixed clade; however, L. interrogans has several paralogues, and this gene appears to have been frequently transferred and is likely to be a transfer. ORF20 clusters with Methanosarcina within δ-proteobacteria. ORF19 clusters with Geobacter but is mostly found in methanogens. ORF17 and ORF18 have homologues in methanogens only.
LGT total: 4 CDSs (19% of the total) have been acquired by LGT.
ORFans^b: 33 (38)
Fosmid b1dcf13f01
23S rRNA: Clusters with Dehalococcoides ethenogenes; the Chloroflexus aurantiacus 23S rRNA sequence was of too poor quality to include in the tree.
CDSs: 7 of 10 CDSs (70%) with supported phylogenetic topologies agree with the 23S fragment. In addition, 6 CDSs only hit Chloroflexus aurantiacus.
LGT^a: Two CDSs have likely been acquired through LGT: one clusters with high support with Thermotoga maritima (ORF16), and one clusters within the eukaryotes (ORF25). ORF2 has significant homologues only in Crocosphaera watsonii.
LGT total: 3 CDSs (11% of total) have been acquired by LGT.
ORFans^b: 14 (5)
Fosmid b3cf12f09
23S rRNA: Candidate division OP8 bacterium.
16S rRNA: Candidate division OP8 bacterium.
CDSs: Most CDSs agree with the rRNA genes and do not cluster within any specific bacterial group.
LGT^a: Phylogenetic analysis suggests that 10 CDSs have likely been acquired by LGT; 8 of these have been acquired from an α-proteobacterium and are found linked. Three CDSs found linked to CDSs for which phylogenetic analyses suggest LGT have also likely been acquired by LGT. ORF16 is a transposase of proteobacterial origin and shows a lower G + C content than the rest of the fosmid. Twelve of the transferred CDSs (ORF29–41) are linked, and all appear to have been acquired from an α-proteobacterium.
LGT total: 13 CDSs (32% of total) have been acquired by LGT.
ORFans^b: 22 (9)
Fosmid b1bcf11f04
23S rRNA: Candidate division WS3 bacterium.
16S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage. Among these was the highly conserved DnaE gene.
LGT^a: Two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group.
LGT total: 2 CDSs (9% of total) have been acquired by LGT.
ORFans^b: 26 (14)
Fosmid b1bcf51c12
23S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs have no or only a few significant matches in GenBank.
LGT^a: ORF6–ORF11 are also found in b1bcf11h03, in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1bcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, however with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, however with no bootstrap support. It is likely that this CDS was also transferred as part of a δ-proteobacterial island.
LGT total: 8 CDSs (44% of total) have been acquired by LGT; one large 'island' of δ-proteobacterial origin.
ORFans^b: 22 (29)
Fosmid: b1bcf11h03
23S rRNA: δ-Proteobacterium
16S rRNA: –
CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium.
LGT: Six CDSs have likely been acquired by LGT. ORF8 clusters with Clostridium thermocellum and Treponema denticola; ORF18 is found separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp.; ORF23 is found in a mixed clade and appears to have been frequently transferred; ORF28 clusters with β-proteobacteria; ORF29 clusters with γ-proteobacteria; and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria. 6 CDSs (17% of total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12 as well as to the Chlorobium lineage.
ORFans: 6 (1%)
Table 1 cont.
Columns: Fosmid | Phylogeny: 23S rRNA, 16S rRNA, CDSs | LGTᵃ: phylogenetic trees, phylogenetic distribution or genome context, total | ORFansᵇ
Fosmid: b1bcf11d04
23S rRNA: δ-Proteobacterium
16S rRNA: –
CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment.
LGT: Six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has been transferred to other δ-proteobacteria as well. Three CDSs (ORF3–5), which encode an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have mainly homologues in Firmicutes and are neighbours of ORF19, which has also been acquired from Firmicutes. 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
ORFans: –
Fosmid: b1dcf13c08
rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni
CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA.
LGT: ORF4 clusters with Geobacter and Clostridium, and ORF23 does not have homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria. 3 CDSs (7% of total) have been acquired by LGT.
ORFans: 10 (3%)
Fosmid: b3cf12d07
rRNA: γ-Proteobacterium. Clusters within the γ-proteobacteria in LogDet distance trees, but at the base of the γ-proteobacteria and β-proteobacteria in the best maximum likelihood tree.
CDSs: Only 7 CDSs give supported phylogenies. Of these, 4 (57%) agree with the rRNA.
LGT: ORF7 clusters within the β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees, and that had only divergent homologues or no significant homologues in GenBank, may also have been acquired by LGT; in support of this, ORF26 encodes a transposase. 2 CDSs (8% of total) have been acquired by LGT; ORF16–ORF25 were not included in this estimate owing to limited evidence for their transfer.
ORFans: 23 (23%)
Fosmid: b1bcf1c04
23S rRNA: γ-Proteobacterium
16S rRNA: –
CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA.
LGT: Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within the β-proteobacteria. Three genes that show incongruent phylogenies, but with low bootstrap support, found close to ORF3 and ORF34 have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred. 5 CDSs (13% of total) have been acquired by LGT.
ORFans: 3 (1%)
Fosmid: b1bf11a01
23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA)
16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA)
CDSs: High degree of gene synteny compared with Thiobacillus denitrificans. 29 CDSs have their best BLAST match in Thiobacillus denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes.
LGT: One CDS, ORF30 (RsuA), clusters with γ-proteobacteria and has no homologue in Thiobacillus denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and Thiobacillus denitrificans. ORF29 has no significant homologues in proteobacteria. 4 CDSs (8% of total) have been acquired by LGT.
ORFans: 3 (2%)
Fosmid: b1bf110d03
23S rRNA: –
16S rRNA: A Flavobacteriaceae bacterium; among sequenced genomes, most closely related to Cytophaga hutchinsonii
CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment.
LGT: ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron. 3 CDSs (10% of total) have likely been acquired by LGT; the transposon was not included, as it has been transferred within the Bacteroides.
ORFans: 10 (3%)
ᵃOnly LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
ᵇORFans were classified as CDSs with no significant match in GenBank; matches to sequences in the environmental portion of GenBank were not considered. The number in parentheses is the proportion of protein coding DNA that has no match in GenBank.
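Footnote b defines two per-fosmid quantities: the ORFan count and the share of protein coding DNA without a GenBank match. The sketch below computes both from a list of CDS records. The `CDS` record type and its fields are hypothetical, the BLAST step that sets `has_genbank_match` is not shown, and it assumes the CDSs do not overlap (overlapping predictions were resolved during annotation), so lengths can simply be summed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CDS:
    """Hypothetical record for one predicted CDS on a fosmid."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    has_genbank_match: bool  # significant hit outside environmental GenBank


def orfan_summary(cdss: List[CDS]) -> Tuple[int, float]:
    """Return (number of ORFans, % of coding bases lying in ORFans).

    Assumes the CDSs do not overlap, so their lengths can be summed directly.
    """
    coding_bases = sum(c.end - c.start + 1 for c in cdss)
    orfan_bases = sum(c.end - c.start + 1 for c in cdss if not c.has_genbank_match)
    n_orfans = sum(1 for c in cdss if not c.has_genbank_match)
    percent = 100.0 * orfan_bases / coding_bases if coding_bases else 0.0
    return n_orfans, percent


if __name__ == "__main__":
    # Toy input, not data from this study.
    toy = [
        CDS("ORF1", 1, 900, True),
        CDS("ORF2", 901, 1200, False),
        CDS("ORF3", 1201, 3000, True),
    ]
    n, pct = orfan_summary(toy)
    print(f"{n} ({pct:.0f}%)")  # table-style cell: 1 (10%)
```

Formatting the result as "n (p%)" reproduces the layout of the ORFans column.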
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, because many of the CDSs found in the transposons of B. thetaiotaomicron are not present in the fragment we have sequenced, and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with this transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobic or microaerophilic (Symbiobacterium thermophilum), and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as they are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly
Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
related to the OmpA found in this gene cluster and was not included in the alignment.
We also identified some mobile genes that might be involved in biodegradation of pollutants by searching the Pfam database. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione-S-transferase (GST, ORF36) gene that was flanked by an acetyltransferase gene (ORF35) and a transporter (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bcf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS is found in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene, ORF30 (an S4 domain protein), suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
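The 80% identity reported for the inverted repeat compares one arm with the reverse complement of the other. A minimal sketch of that comparison follows. It assumes an ungapped comparison of equal-length arms (the alignment method is not stated) and 1-based, inclusive coordinates; the right arm, written 36693–36674 above because it reads on the reverse strand, would be passed as (36674, 36693). The toy sequence is invented for illustration.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat_identity(seq: str, left: Tuple[int, int],
                             right: Tuple[int, int]) -> float:
    """Percent identity between the left arm and the reverse complement of the right arm.

    Both arms are 1-based, inclusive (start, end) on the forward strand.
    The comparison is ungapped, so both arms must have the same length.
    """
    left_arm = seq[left[0] - 1:left[1]].upper()
    right_arm = reverse_complement(seq[right[0] - 1:right[1]]).upper()
    if not left_arm or len(left_arm) != len(right_arm):
        raise ValueError("arms must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(left_arm, right_arm))
    return 100.0 * matches / len(left_arm)


if __name__ == "__main__":
    # Toy sequence: the right arm is the reverse complement of the left arm
    # with two substitutions, so 8 of 10 positions match.
    toy = "AAACCCGGGT" + "NNNN" + "TGCCGGGTTT"
    print(inverted_repeat_identity(toy, (1, 10), (15, 24)))  # 80.0
```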
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs that showed a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–S12). ORF20 in the δ-proteobacterial clone b1bcf11h03 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show higher G + C content than the rest of the fosmid, with 66.4%, 65.7% and 64.7%, compared with 52.5% for the rest of the fosmid. All these CDSs
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these sequences formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content than the rest of the fosmid clone, at 25.7%. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
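The G + C screen described above can be sketched as follows. The sketch assumes that the "10%" cut-off means 10 percentage points relative to the G + C content of the whole fosmid; that reading is consistent with the examples in the text, since ORF20 (47.5% vs 36.6%) exceeds it, whereas ORF9 (25.7% vs 34.7%) and ORF26 (47.8% vs 56.9%), described only as marginally different, fall just short. Function names are illustrative.

```python
from typing import Dict, Tuple


def gc_percent(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def gc_outliers(fosmid: str, cdss: Dict[str, Tuple[int, int]],
                threshold: float = 10.0) -> Dict[str, float]:
    """Return {CDS name: G + C %} for CDSs deviating from the fosmid by >= threshold points.

    Coordinates are 1-based, inclusive (start, end) on the forward strand;
    strand does not matter because G + C content is identical on both strands.
    """
    background = gc_percent(fosmid)
    flagged = {}
    for name, (start, end) in cdss.items():
        gc = gc_percent(fosmid[start - 1:end])
        if abs(gc - background) >= threshold:
            flagged[name] = round(gc, 1)
    return flagged


if __name__ == "__main__":
    # Published values from the text: (CDS G + C %, fosmid G + C %).
    for gc, fos in [(47.5, 36.6), (25.7, 34.7), (47.8, 56.9)]:
        print(gc, fos, abs(gc - fos) >= 10.0)  # True, then False, then False
```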
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. These fosmid clones represent, to our knowledge, the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
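The text does not state which t-test variant was used. Purely as an illustration, the sketch below computes a two-sample Student's t statistic with pooled variance, as one might for per-clone ORFan proportions in the uncultivated versus cultivated groups; the group values in the example are invented. For a P-value, `scipy.stats.ttest_ind(a, b)` performs the same pooled-variance test by default.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Invented numbers for illustration only (e.g. % ORFans per clone).
    uncultivated = [6.0, 5.0, 4.0]
    cultivated = [3.0, 2.0, 1.0]
    t, df = pooled_t(uncultivated, cultivated)
    print(round(t, 3), df)  # 3.674 4
```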
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, in b1dcf51c12 half of the clone is occupied by two CDSs that have no significant matches in GenBank (ORF4) or only a single match (ORF5). Also, none of these CDSs had significant matches to domains in Pfam. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage are available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene, the exonuclease domain, and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus, suggesting that the candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype-ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own
genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described by Holoman and colleagues (1998). DNA was extracted following the protocol of Rondon and colleagues (2000), except that, instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the REAL Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). In order to obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After the vector was cleaned from the gel, it was cut with PmlI overnight to make a blunt site. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
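The adaptor sequence can be checked for the two sites the vector construction relies on. In the sketch below, the 27-bp I-CeuI recognition sequence and the PmlI site (CACGTG, cut blunt as CAC^GTG) are taken from standard enzyme listings rather than from this paper, so they should be confirmed against the supplier's documentation; the scan simply reports 1-based start positions of exact matches on either strand.

```python
from typing import Dict, List, Tuple

ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor quoted in the text

# Recognition sequences (assumption: as listed by enzyme suppliers; verify before use).
SITES = {
    "I-CeuI": "TAACTATAACGGTCCTAAGGTAGCGAA",
    "PmlI": "CACGTG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, sites: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (enzyme, strand, 1-based start) for every exact match on either strand."""
    hits = []
    for name, site in sites.items():
        probes = {"+": site}
        rc = reverse_complement(site)
        if rc != site:  # palindromic sites would otherwise be reported twice
            probes["-"] = rc
        for strand, probe in probes.items():
            pos = seq.find(probe)
            while pos != -1:
                hits.append((name, strand, pos + 1))
                pos = seq.find(probe, pos + 1)
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    for hit in find_sites(ADAPTOR, SITES):
        print(hit)
    # ('I-CeuI', '+', 3)
    # ('PmlI', '+', 30)
```

The I-CeuI site is followed immediately by the blunt PmlI site, which fits the two-step digestion (I-CeuI, then PmlI) used to prepare pCC1FosCeuI23S.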
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region (positions 1192–1342 in b1dcf13c08), we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where the identified CDSs had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If an
[Fig. 6 tree; bracketed clade label: "PolC + DinG fusion proteins, same domain structure as b1bcf11f04 ORF17"]
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3.
alternative CDS that did have a match in GenBank was obtained using ORF-finder, that CDS was selected. The tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
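The CDS curation rules above (discard predictions shorter than 100 bp; among overlapping predictions, keep the one with significant GenBank homologues) can be expressed as a small filter. This sketch assumes the BLAST result is already attached to each prediction as a boolean, and it breaks ties (both or neither overlapping CDS having a hit) by keeping the longer CDS, a choice the text does not specify. The `Prediction` record is hypothetical, and the ORF-finder/BLASTX fallback is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for a Glimmer-style CDS prediction."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    has_hit: bool  # significant GenBank homologue

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Prediction, b: Prediction) -> bool:
    return a.start <= b.end and b.start <= a.end


def filter_cdss(preds: Iterable[Prediction], min_len: int = 100) -> List[Prediction]:
    """Drop short predictions, then resolve overlaps in one pass ordered by start."""
    kept: List[Prediction] = []
    for p in sorted((p for p in preds if p.length >= min_len), key=lambda p: p.start):
        if kept and overlaps(kept[-1], p):
            # Prefer a GenBank hit; break ties by length (assumption).
            if (p.has_hit, p.length) > (kept[-1].has_hit, kept[-1].length):
                kept[-1] = p
        else:
            kept.append(p)
    return kept


if __name__ == "__main__":
    preds = [
        Prediction("a", 1, 90, True),        # 90 bp: too short, dropped
        Prediction("b", 100, 700, False),    # overlaps c and has no hit
        Prediction("c", 650, 1000, True),    # overlaps b and has a hit: kept
        Prediction("d", 1100, 1500, False),  # no overlap: kept
    ]
    print([p.name for p in filter_cdss(preds)])  # ['c', 'd']
```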
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and the 16S rRNA genes were carried out in PAUP* (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases.
Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were performed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a different phylogenetic grouping than that observed from the rRNA (with bootstrap support > 50%), the CDS was classified as acquired by LGT. It should be noted that we only classified as LGT transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. For some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but it should be noted that for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs, we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses, we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
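The two-step rule described above can be summarised as a small decision function. This is a minimal sketch under stated assumptions rather than the authors' code: it assumes each gene tree has already been reduced to the set of major bacterial groups in the clade that contains the fosmid CDS, together with that clade's bootstrap support. `TreePlacement` and `classify_cds` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Bootstrap support (%) that a conflicting grouping must exceed (from the Methods).
SUPPORT_THRESHOLD = 50.0


@dataclass(frozen=True)
class TreePlacement:
    """Hypothetical summary of where a fosmid CDS falls in one gene tree."""
    groups: FrozenSet[str]  # major bacterial groups in the clade containing the CDS
    bootstrap: float        # bootstrap support (%) for that clade


def classify_cds(rrna_group: str, nj: TreePlacement,
                 me: Optional[TreePlacement] = None) -> str:
    """Apply the two-step Neighbour-joining -> minimum evolution rule (sketch)."""
    # Step 1: the quick NJ tree places the CDS with its own rRNA group.
    if rrna_group in nj.groups:
        return "agrees with rRNA"
    # Step 2: a disagreement is re-examined with a minimum evolution tree
    # estimated from Maximum Likelihood (JTT + gamma) distances.
    if me is None:
        return "needs ME tree"
    if me.bootstrap <= SUPPORT_THRESHOLD:
        return "unresolved"  # conflicting grouping not supported strongly enough
    if rrna_group in me.groups:
        return "not LGT"     # clade still contains the CDS's own group
    if len(me.groups) > 1:
        return "LGT (mixed clade, donor unknown)"
    return "LGT"


if __name__ == "__main__":
    # Hypothetical example: a delta-proteobacterial fosmid CDS grouping with Firmicutes.
    nj = TreePlacement(frozenset({"Firmicutes"}), 70.0)
    me = TreePlacement(frozenset({"Firmicutes"}), 85.0)
    print(classify_cds("delta-proteobacteria", nj, me))  # LGT
```

Because groups are compared only at the level of major bacterial groups, within-group transfers are never flagged, mirroring the rule stated in the Methods.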
All sequences have been submitted to GenBank with accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2. [WWW document]. URL http://www.rna.icmb.utexas.edu.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with expect value < 10e-10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits (expect value < 10e-10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–S12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c08, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
Fig. 1. Overview of the sequenced fosmid clones. Yellow CDSs are suggested to have been acquired by LGT, and blue CDSs have no significant match in GenBank. A, α-proteobacteria; B, β-proteobacteria; D, δ-proteobacteria; E, ε-proteobacteria; G, γ-proteobacteria; C, Cyanobacteria; CB, Chlorobi-Bacteroidetes; F, Firmicutes; P, proteobacteria; CH, Chloroflexi; TD, Thermus-Deinococcus group; ACT, Actinobacteria; PL, Planctomycetes; SPIR, Spirochaetes; THER, Thermotogales; AQ, Aquificales; FUSO, Fusobacteria; ARCH, Archaea; EUK, Eukaryotes; ENV, environmental sequence; cluster robustly within a mixed clade in phylogenetic trees; –, no significant match in GenBank. Uppercase: supported by phylogenetic analysis. Lowercase: suggested by BLAST searches, as there was no supported phylogeny. The low-quality region in b1dcf13c08 (positions 1192–1342) is indicated by a black box. The orange shadings indicate LGT-CDSs that are found in more than one fosmid.
[Figure image not recoverable from the extracted text. Figure labels: ORFAN; "A conjugative transposon obtained from a Bacteroides bacterium". Clones shown, with rRNA affiliation: b1dcf51a06 (unknown); b1dcf13f01 (Chloroflexi); b3cf12f09 (candidate division OP8); b1bcf11f04 (candidate division WS3); b1dcf51c12 (candidate division WS3); b1bcf11h03 (δ-proteobacteria); b1bcf11d04 (δ-proteobacteria); b1dcf13c08 (ε-proteobacteria); b3cf12d07 (γ-proteobacteria); b1bcf11c04 (γ-proteobacteria); b1bf11a01 (β-proteobacteria); b1bf110d03 (Flavobacteriaceae).]
Fig. 2. rRNA phylogenies.
A. The minimum evolution tree estimated from LogDet distances of the 23S-tag from the CeuI-fosmids (984 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood topology (GTR + Γ + I) was similar, except that the δ-proteobacteria were paraphyletic, with the γ- and β-proteobacteria clustering within the δ-proteobacteria. Moreover, b1bcf11d04 fell at the bottom of this clade.
B. The minimum evolution tree estimated from LogDet distances of the 16S sequences (1243 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood (GTR + Γ + I) topology was identical. However, there were several differences in the backbone of the tree, with, for instance, Geobacter clustering with Firmicutes. The trees in both A and B were rooted by the Thermotoga maritima sequence.
C. The minimum evolution tree estimated from LogDet distances of the closest matches of the 16S fragment in b1bf110d03 (1046 positions in alignment). The Maximum Likelihood (GTR + Γ + I) topology was identical.
For all three trees, numbers on branches refer to bootstrap values from the minimum evolution analysis (italic) and from the Maximum Likelihood analysis (plain text). If both bootstrap values were above 70, this is indicated by a grey circle, while a black circle indicates that all three values were above 90.
[Fig. 2 tree images (panels A–C) not recoverable from the extracted text. Panel B marks the candidate division OP8 (b3cf12f09), candidate division WS3 (b1bcf11f04) and Betaproteobacteria (b1bf11a01) lineages; panel C places b1bf110d03 among Flavobacteriaceae, with one group labelled "isolated from estuarine and salt marsh sediments". Scale bars: 0.05 (A, B) and 0.01 (C) substitutions/site.]
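The rRNA trees in Fig. 2 were built from LogDet distances. To illustrate what that distance measures, the sketch below computes the paralinear form of the LogDet distance between two aligned nucleotide sequences from their 4 × 4 divergence matrix F of joint base frequencies, d = −¼[ln det F − ½(ln det Π_X + ln det Π_Y)], where Π_X and Π_Y are diagonal matrices of each sequence's base frequencies. This is a sketch under assumptions: PAUP*'s LogDet implementation may differ in details such as the handling of gaps, ambiguity codes or invariable sites, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def _determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def logdet_distance(x, y):
    """Paralinear/LogDet distance between two aligned DNA sequences (sketch)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    index = {b: i for i, b in enumerate(BASES)}
    counts = [[0.0] * 4 for _ in range(4)]
    n = 0
    for bx, by in zip(x.upper(), y.upper()):
        if bx in index and by in index:  # skip gaps and ambiguity codes
            counts[index[bx]][index[by]] += 1.0
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    f = [[c / n for c in row] for row in counts]               # divergence matrix F
    pi_x = [sum(row) for row in f]                              # base frequencies in x
    pi_y = [sum(f[i][j] for i in range(4)) for j in range(4)]   # base frequencies in y
    det_f = _determinant(f)
    if det_f <= 0.0 or min(pi_x) == 0.0 or min(pi_y) == 0.0:
        raise ValueError("LogDet distance undefined for these sequences")
    log_det_pi = sum(math.log(p) for p in pi_x) + sum(math.log(p) for p in pi_y)
    return -0.25 * (math.log(det_f) - 0.5 * log_det_pi)


if __name__ == "__main__":
    s1 = "AACCGGTT" * 3
    s2 = "C" + s1[1:]  # one A -> C substitution
    print(round(logdet_distance(s1, s2), 4))  # prints 0.0421
```

Identical sequences give a distance of zero, and the distance is symmetric in its two arguments; unlike simple p-distances, it does not assume that the two sequences share the same base composition.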
short to obtain reliable alignments, the CDS was found in a ‘mixed’ clade also containing genes from the same bacterial group, or the CDS was found outside its group but did not cluster with any specific lineage. For three of the clones, more than 30% of the CDSs have been acquired by LGT (Table 1); two of these are from candidate divisions and one is from a δ-proteobacterium. For all three of these fosmids, there appears to have been a transfer of a large island of genes from a phylogenetically distant lineage. Specifically, we infer an α-proteobacterial island in b3cf12f09, a δ-proteobacterial island in b1dcf51c12 and an archaeal β-proteobacterial island in b1bcf11d04 (Fig. 1). It should be noted that the proportions of foreign genes identified here might not represent the proportion of foreign genes in the respective genomes that we have sampled, but rather the amount of LGT to be expected when sequencing environmental fosmid clones. For instance, in some genomes LGT might be enriched in certain variable parts of the genome. Indeed, the distribution of proteins that match COG categories was significantly different (P = 1.3 × 10⁻¹³ in a χ²-test) from what we observed for the end-sequencing of ‘normal’ fosmid clones (supplemental Fig. S1), the main difference being proportionally more J, K, U, F and H category sequences in the full fosmid sequences and more L, P, R and S category sequences among the end-sequences. When comparing the distributions of different COG groups (i.e. informational, metabolism, etc.), however, the two data sets were significantly different only when including the poorly characterized categories (R, S). If such genes are more frequently transferred than the other categories, then we would be underestimating the level of LGT that would be expected when analysing metagenomic clones.
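The COG comparison above is a χ² test of homogeneity on a contingency table (rows: full fosmid sequences versus end-sequences; columns: COG categories). The sketch below computes Pearson's statistic and its degrees of freedom in plain Python. The counts in the example are invented for illustration and are not the data behind Fig. S1, and `chi_square_homogeneity` is a hypothetical name. A p-value can then be read from the χ² distribution, for example with `scipy.stats.chi2.sf(statistic, dof)` if SciPy is available.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if any(t == 0 for t in row_totals + col_totals):
        raise ValueError("drop empty rows/columns before testing")
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both data sets share one category distribution.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical counts for COG categories J, K, L, R and S.
    full_fosmids = [30, 25, 10, 15, 8]
    end_sequences = [20, 15, 30, 40, 25]
    stat, dof = chi_square_homogeneity([full_fosmids, end_sequences])
    print(f"chi2 = {stat:.2f}, dof = {dof}")
```

The χ² approximation is usually considered reliable only when the expected counts are not too small (a common rule of thumb is at least 5 per cell), so sparse COG categories may need to be pooled before testing, as in the grouped comparison described above.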
Interestingly, in b1bcf11d04 the transfer vector for one of the acquired gene clusters could be identified. ORF6 encodes an acetyl transferase gene, and ORF8, ORF9 and ORF10 encode subunits of an acyl-CoA synthase: two α-subunits and one β-subunit. Phylogenetic analyses suggested that all four CDSs have been acquired by LGT, likely from a β-proteobacterium. The β-proteobacteria have in turn likely acquired the acyl-CoA synthase genes from Archaea (Fig. 3). In support of the archaeal origin of these genes, the acyl-CoA synthase in b1bcf11d04 has a domain organization similar to that of the acetyl-CoA synthase in Pyrococcus spp., with two subunits (Sanchez et al., 2000). Furthermore, these genes have been transferred multiple times, and the transfers involved all three domains of life [Fig. 3 (Andersson et al., 2003)]. These transferred CDSs are preceded by one integrase gene (ORF3), a transposase gene (ORF4) and an integrase/transposase gene (ORF5; COG2801, Tra5, which contains an integrase core domain; Table S7), which were probably responsible for transferring this cluster into this genome.
The α-proteobacterial island in the b3cf12f09 clone encodes a wide range of different functions, and no typical mobile elements were identified. However, as this island extends to the 3′ end of the fosmid, mobile genes might be found further downstream. The first CDS of this island encodes a DnaJ-class chaperone (ORF29), which is truncated at the 5′ end. This pseudogene still shows 65% protein identity to a homologue in Magnetospirillum magnetotacticum (Table S3). Hence, this probably represents a very recent transfer (or rearrangement). Another possibility is that this fosmid might be a chimera. However, the G + C content of the CDSs in the α-proteobacterial island (59.5% G + C) is very similar to that of the rest of the fosmid (59.6% G + C; supplemental Table S3). Also, further upstream there is a proteobacterial transposase
Fig. 3. Maximum Likelihood phylogeny of acetyl-CoA synthetase (ADP-forming) homologues estimated using PMBML (459 positions in alignment). The sequences were obtained by BLAST searches of GenBank with the b1bcf11d04 ORF8 and ORF10 sequences; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Entamoeba histolytica. Numbers on branches refer to bootstrap support obtained using PMBML (bold), PUZZLEBOOT (plain text) and Neighbour-joining (italic). If all bootstrap values were above 70, this is indicated by a grey circle, while a black circle indicates that all three values were above 80.
[Fig. 3 tree image not recoverable from the extracted text; the b1bcf11d04 ORF8 and ORF10 sequences appear among the tip labels.]
Table 1. Summary of phylogenetic analyses. For each fosmid, the entries give the phylogeny inferred from the 23S rRNA, the 16S rRNA and the CDSs; the LGTᵃ evidence from phylogenetic trees and from phylogenetic distribution or genome context; the total LGT; and the ORFansᵇ.

b1dcf51a06
23S rRNA: No clear affiliation with existing sequences.
16S rRNA: Could not be amplified.
CDSs: Most CDSs have no or only a few significant matches in GenBank.
LGT evidence: ORF4 clusters with Leptospira interrogans within a mixed clade; however, L. interrogans has several paralogues, and this gene appears to have been frequently transferred and is likely to be a transfer. ORF20 clusters with Methanosarcina within δ-proteobacteria. ORF19 clusters with Geobacter but is mostly found in methanogens. ORF17 and ORF18 have homologues in methanogens only.
LGT total: 4 CDSs (19% of the total) have been acquired by LGT.
ORFans: 33 (38%).

b1dcf13f01
23S rRNA: Clusters with Dehalococcoides ethenogenes; the Chloroflexus aurantiacus 23S rRNA sequence was of too poor quality to include in the tree.
CDSs: 7 of 10 CDSs (70%) with supported phylogenetic topologies agree with the 23S fragment. In addition, 6 CDSs only hit Chloroflexus aurantiacus.
LGT evidence: Two CDSs have likely been acquired through LGT: one clusters with high support with Thermotoga maritima (ORF16) and one clusters within the eukaryotes (ORF25). ORF2 has significant homologues only in Crocosphaera watsonii.
LGT total: 3 CDSs (11% of total) have been acquired by LGT.
ORFans: 14 (5%).

b3cf12f09
23S rRNA: Candidate division OP8 bacterium.
16S rRNA: Candidate division OP8 bacterium.
CDSs: Most CDSs agree with the rRNA genes and do not cluster within any specific bacterial group.
LGT evidence: Phylogenetic analysis suggests that 10 CDSs have likely been acquired by LGT; 8 of these have been acquired from an α-proteobacterium and are found linked. Three CDSs found linked to CDSs for which phylogenetic analyses suggest LGT have also likely been acquired by LGT.
LGT total: 13 CDSs (32% of total) have been acquired by LGT. ORF16 is a transposase of proteobacterial origin and shows lower G + C content than the rest of the fosmid. Twelve of the transferred CDSs (ORF29–41) are linked, and all appear to have been acquired from an α-proteobacterium.
ORFans: 22 (9%).

b1bcf11f04
23S rRNA: Candidate division WS3 bacterium.
16S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage. Among these was the highly conserved DnaE gene.
LGT evidence: Two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group.
LGT total: 2 CDSs (9% of total) have been acquired by LGT.
ORFans: 26 (14%).
b1dcf51c12
23S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs have no or only a few significant matches in GenBank.
LGT evidence: ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1dcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, however with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, however with no bootstrap support. It is likely that this CDS was also transferred as part of a δ-proteobacterial island.
LGT total: 8 CDSs (44% of total) have been acquired by LGT; one large ‘island’ of δ-proteobacterial origin.
ORFans: 22 (29%).

b1bcf11h03
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium.
LGT evidence: Six CDSs have likely been acquired by LGT. ORF8 clusters with Clostridium thermocellum and Treponema denticola. ORF18 is found separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp. ORF23 is found in a mixed clade and appears to have been frequently transferred. ORF28 clusters with β-proteobacteria. ORF29 clusters with γ-proteobacteria, and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria.
LGT total: 6 CDSs (17% of total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12, as well as to the Chlorobium lineage.
ORFans: 6 (1%).
b1bcf11d04
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment.
LGT evidence: Six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has been transferred to other δ-proteobacteria as well. Three CDSs (ORF3–5), encoding an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and are neighbours of ORF19, which has also been acquired from Firmicutes.
LGT total: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
ORFans: –

b1dcf13c08
23S rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni.
CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA.
LGT evidence: ORF4 clusters with Geobacter and Clostridium. ORF23 does not have homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria.
LGT total: 3 CDSs (7% of total) have been acquired by LGT.
ORFans: 10 (3%).

b3cf12d07
23S rRNA: γ-Proteobacterium. Clusters within the γ-proteobacteria in LogDet distance trees, but at the base of the γ-proteobacteria and β-proteobacteria in the best Maximum Likelihood tree.
CDSs: Only 7 CDSs give supported phylogenies; of these, 4 (57%) agree with the rRNA.
LGT evidence: ORF7 clusters within β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees, and that had only divergent homologues or no significant homologues in GenBank, may also have been acquired by LGT. In support of this, ORF26 encodes a transposase.
LGT total: 2 CDSs (8% of total) have been acquired by LGT. ORF16–ORF25 were not included in the estimate owing to limited evidence for their transfer.
ORFans: 23 (23%).
b1bcf11c04
23S rRNA: γ-Proteobacterium.
16S rRNA: –
CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA.
LGT evidence: Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within β-proteobacteria. Three genes that show incongruent phylogenies, but with low bootstrap support, found close to ORF3 and ORF34 have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred.
LGT total: 5 CDSs (13% of total) have been acquired by LGT.
ORFans: 3 (1%).

b1bf11a01
23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA).
16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA).
CDSs: High degree of gene synteny compared with Thiobacillus denitrificans. 29 CDSs have their best BLAST match in Thiobacillus denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes.
LGT evidence: One CDS, ORF30 (RsuA), clusters with γ-proteobacteria and has no homologue in Thiobacillus denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and Thiobacillus denitrificans. ORF29 has no significant homologues in proteobacteria.
LGT total: 4 CDSs (8% of total) have been acquired by LGT.
ORFans: 3 (2%).

b1bf110d03
23S rRNA: –
16S rRNA: A Flavobacteriaceae bacterium; among sequenced genomes, most closely related to Cytophaga hutchinsonii.
CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment.
LGT evidence: ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron.
LGT total: 3 CDSs (10% of total) have likely been acquired by LGT. The transposon is not included, as it has been transferred within the Bacteroides.
ORFans: 10 (3%).
a. Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
b. ORFans were classified as CDSs with no significant match in GenBank. Matches to sequences in the environmental portion of GenBank were not considered. The proportion of protein-coding DNA that has no match in GenBank is given in parentheses.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
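The chimera check above rests on comparing the G + C content of the island's CDSs with that of the rest of the fosmid. A minimal sketch of such a comparison follows, pooling bases across CDSs rather than averaging per-gene values; the sequences are placeholders, not the actual b3cf12f09 CDSs, and `gc_counts` and `pooled_gc` are hypothetical names.

```python
def gc_counts(seq):
    """Return (G+C count, count of unambiguous A/C/G/T bases) for one sequence."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc, acgt


def pooled_gc(seqs):
    """G+C fraction pooled over several CDSs (gaps and ambiguity codes ignored)."""
    gc_total = acgt_total = 0
    for seq in seqs:
        gc, acgt = gc_counts(seq)
        gc_total += gc
        acgt_total += acgt
    if acgt_total == 0:
        raise ValueError("no unambiguous bases")
    return gc_total / acgt_total


if __name__ == "__main__":
    # Placeholder CDSs: an island (e.g. an ORF29-ORF41-like block) vs. the rest of a fosmid.
    island = ["ATGGCGCGCAAGCTGTAA", "ATGCCGGACGGCTGA"]
    rest = ["ATGAGCGCCCTGGCGTAG", "ATGGGCACCGCATAA"]
    diff = 100 * (pooled_gc(island) - pooled_gc(rest))
    print(f"island {100 * pooled_gc(island):.1f}% vs rest {100 * pooled_gc(rest):.1f}% "
          f"(difference {diff:+.1f} percentage points)")
```

Pooling weights each CDS by its length, so long genes dominate the estimate; averaging per-CDS values would instead give every gene equal weight.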
In the Flavobacteriaceae fosmid b1bf110d03, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon (because many of the CDSs found in the transposons in B. thetaiotaomicron are not present in the fragment we have sequenced), and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with this transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA and has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobes or, in the case of Symbiobacterium thermophilum, microaerophilic, and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is only distantly related to the OmpA found in this gene cluster and was not included in the alignment.

Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.

2022 C. L. Nesbø, Y. Boucher, M. Dlutek and W. F. Doolittle
We also identified, by searching the Pfam database, some mobile genes that might be involved in biodegradation of pollutants. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione-S-transferase (GST, ORF36) gene flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS falls in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene, ORF30 (an S4 domain protein), suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
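The 80% identity of the flanking inverted repeat can be checked by comparing one arm with the reverse complement of the other. The sketch below is a minimal, ungapped version of that comparison; the toy sequences are made up, and `fosmid_seq` in the comments is a hypothetical variable standing for the assembled b1bf11a01 sequence, which is not reproduced here.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_identity(left_arm: str, right_arm: str) -> float:
    """Percent identity between two arms of a putative inverted repeat.

    Both arms are taken from the same (top) strand, 5'->3'. In a perfect
    inverted repeat the right arm is the reverse complement of the left arm,
    so left_arm is compared with reverse_complement(right_arm) position by
    position. This is ungapped; arms with indels would need an alignment.
    """
    if len(left_arm) != len(right_arm):
        raise ValueError("arms must have equal length for an ungapped comparison")
    flipped = reverse_complement(right_arm).upper()
    matches = sum(a == b for a, b in zip(left_arm.upper(), flipped))
    return 100.0 * matches / len(left_arm)


# With a hypothetical fosmid_seq string, the 1-based inclusive coordinates
# 34021-34040 and 36674-36693 become 0-based slices:
#   left = fosmid_seq[34020:34040]; right = fosmid_seq[36673:36693]
left = "AAAAACCCCCGGGGGTTTTT"                       # toy 20-bp arm
right = reverse_complement("TTTTACCCCCGGGGGTTTTT")  # 4 mismatches vs. left
print(inverted_repeat_identity(left, right))        # 80.0
```

Writing the second coordinate pair in descending order (36693–36674), as the text does, signals that this arm reads on the opposite strand; taking the top-strand slice and reverse-complementing it is equivalent.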
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs with a G + C content more than 10% higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h3 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show higher G + C content than the rest of the fosmid (66.4%, 65.7% and 64.7%, compared with 52.5%). All these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40, a short ORFan in the ε-proteobacterial clone b1dcf13c08, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content (25.7%) than the rest of the fosmid clone. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.

Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three Chlamydiaceae sequences, which formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the rest of the alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
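The G + C screen described above amounts to comparing each CDS with its fosmid's average and flagging large deviations. A minimal sketch follows. We read "10%" as 10 percentage points with a strict threshold: the eight CDSs listed all deviate by more than 10 points, while ORF9 and ORF26 (about 9 points) are described as marginal. Both readings are assumptions, not a published procedure.

```python
def gc_content(seq: str) -> float:
    """Percent G + C; every character, including ambiguous bases, counts in the length."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_outliers(cds_gc: dict, fosmid_gc: float, threshold: float = 10.0) -> dict:
    """Return {CDS: deviation} for CDSs whose G + C differs from the fosmid
    average by more than `threshold` percentage points (strict '>' assumed)."""
    return {
        name: round(gc - fosmid_gc, 1)
        for name, gc in cds_gc.items()
        if abs(gc - fosmid_gc) > threshold
    }


# Values quoted above for the epsilon-proteobacterial clone b1dcf13c08.
print(gc_outliers({"ORF40": 22.2, "ORF9": 25.7}, fosmid_gc=34.7))  # {'ORF40': -12.5}
```

A fixed threshold like this only catches transfers that have not yet ameliorated toward the host composition, which is consistent with the small number of CDSs flagged here compared with the phylogenetic estimates.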
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein-coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank appear as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or with only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases (P = 0.02) with no match in GenBank (excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
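The text does not specify which form of t-test was used. Purely as an illustration, the sketch below computes a two-sample Student's t statistic with pooled variance for per-clone ORFan proportions. The equal-variance form is an assumption, and the input numbers are hypothetical, not the study's data.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical per-clone ORFan proportions, for illustration only.
uncultivated = [0.40, 0.35, 0.45, 0.30]
cultivated = [0.10, 0.05, 0.15, 0.12, 0.08, 0.10, 0.07, 0.09]
t, df = pooled_t(uncultivated, cultivated)
print(f"t = {t:.2f}, df = {df}")
```

The P-value comes from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(uncultivated, cultivated)` returns the same pooled-variance statistic together with a two-sided P-value. With only four uncultivated clones, the result is sensitive to any single clone.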
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or which have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. They might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, the predicted proteins in this fosmid tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes but also S. thermophilum and Chloroflexus aurantiacus, suggesting that the candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f4 CDS clusters with C. aurantiacus at the base of the clade. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and linking the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a link. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But the progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. A vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large-Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI after the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
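The adaptor described above combines an I-CeuI site with a blunt-cutting PmlI site. Because the I-CeuI site also occurs naturally within bacterial 23S rRNA genes, I-CeuI-cut environmental DNA ligated into this vector will carry a 23S rRNA end. The sketch below locates both sites in the adaptor by exact matching on either strand. It assumes the I-CeuI site is the stretch TAACTATAACGGTCCTAAGGTAGCGAA contained in the adaptor. Marshall and Lemieux (1992) report that I-CeuI recognizes a 19-bp sequence and the enzyme tolerates some variation, so exact string matching is a simplification.

```python
# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor sequence given above
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"       # assumed I-CeuI site (see text)
PMLI_SITE = "CACGTG"                              # PmlI site, blunt cut CAC^GTG


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str):
    """Return sorted (0-based start, strand) tuples for exact matches on either strand."""
    seq, site = seq.upper(), site.upper()
    queries = [("+", site)]
    rc = reverse_complement(site)
    if rc != site:  # palindromic sites such as CACGTG read the same on both strands
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        start = seq.find(query)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(query, start + 1)
    return sorted(hits)


print(find_sites(ADAPTOR, ICEUI_SITE))  # [(2, '+')]
print(find_sites(ADAPTOR, PMLI_SITE))   # [(29, '+')]
```

The I-CeuI site is asymmetric, so the reverse-strand search matters when scanning longer sequences such as end-reads from screened fosmids. The PmlI site is palindromic and is searched only once.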
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. A labelled clade marks the PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.

Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region (positions 1192–1342 in b1dcf13c08), we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where the identified CDSs had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and, finally, BLASTX searches. If ORF-finder gave an alternative CDS that did have a match in GenBank, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
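The first two CDS-calling rules described above are dropping calls shorter than 100 bp and, between overlapping calls, keeping the one with significant GenBank homologues. They can be expressed as a small filter. This is a sketch under stated assumptions: the `CDS` record and its `has_hit` flag are hypothetical stand-ins for Glimmer output joined to BLAST results. When neither of two overlapping calls has a hit, we simply keep the earlier one, which is our assumption. The ORF-finder and BLASTX follow-up steps are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CDS:
    name: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    has_hit: bool   # hypothetical flag: significant GenBank homologue found

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: CDS, b: CDS) -> bool:
    return a.start <= b.end and b.start <= a.end


def filter_cdss(cdss: List[CDS], min_len: int = 100) -> List[CDS]:
    """Drop short calls, then resolve overlaps in favour of CDSs with GenBank hits."""
    kept: List[CDS] = []
    for cds in sorted((c for c in cdss if c.length >= min_len), key=lambda c: c.start):
        clashes = [k for k in kept if overlaps(k, cds)]
        if not clashes:
            kept.append(cds)
        elif cds.has_hit and not any(k.has_hit for k in clashes):
            # Replace the hit-less call(s) with the CDS that has homologues.
            kept = [k for k in kept if not overlaps(k, cds)] + [cds]
        # Otherwise keep the earlier call (this tie-breaking rule is an assumption).
    return kept


calls = [CDS("a", 1, 90, False), CDS("b", 100, 600, False),
         CDS("c", 500, 900, True), CDS("d", 1000, 1500, False)]
print([c.name for c in filter_cdss(calls)])  # ['c', 'd']
```

Processing calls in order of start position keeps `kept` sorted and non-overlapping, so each new call only has to be checked against the calls already accepted.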
Phylogenetic analyses of the 1000-bp 23S rRNA fragment and of the 16S rRNA genes were carried out in PAUP* (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were constructed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with that of the rRNA, i.e. if the CDS clustered with a major bacterial group other than that of the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a phylogenetic grouping different from that of the rRNA (with bootstrap support >50%), the CDS was classified as acquired by LGT. It should be noted that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. In some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but for such phylogenies it is not possible to identify the donors and recipients. For some LGT-CDSs, we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML from the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses, we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
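The decision rule for calling a CDS laterally acquired can be summarized as a small function. This is a simplified sketch, not the pipeline itself. It assumes each tree has already been reduced to the major group that the CDS clusters with, or a hypothetical "mixed" label for multi-group clades, plus the bootstrap support for that placement. It builds no trees and ignores the Neighbour-joining pre-screen. Group names are illustrative strings.

```python
def classify_cds(rrna_group: str, cds_group: str, bootstrap: float,
                 min_support: float = 50.0) -> str:
    """Simplified version of the between-group LGT call described above.

    rrna_group: major bacterial group of the fosmid's rRNA phylotype
    cds_group:  major group the CDS clusters with, or "mixed" for a clade
                containing several groups (hypothetical label)
    bootstrap:  bootstrap support (%) for that placement
    """
    if cds_group == rrna_group:
        return "consistent"               # within-group transfers are not counted
    if bootstrap <= min_support:
        return "unresolved"               # conflicting but unsupported placement
    if cds_group == "mixed":
        return "LGT (donor unresolved)"   # frequent transfer; donor cannot be named
    return "LGT"


print(classify_cds("delta-proteobacteria", "mixed", 0))                 # unresolved
print(classify_cds("beta-proteobacteria", "gamma-proteobacteria", 87))  # LGT
```

Under this rule, ORF20 in b1bcf11h3, which falls in a mixed clade with no bootstrap support, ends up in the "unresolved" branch. This matches its exclusion from the LGT estimate.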
All sequences have been submitted to GenBank under accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, on environmental microbiology and on LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, WA, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic Bacterium porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Fig. S1. A. Number of BLAST hits with exp < 1.0e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG categories of the BLAST hits (exp < 1.0e−10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c08, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
Fig. 2. rRNA phylogenies.
A. The minimum evolution tree estimated from LogDet distances of the 23S-tag from the CeuI-fosmids (984 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood topology (GTR + G + I) was similar, except that the δ-proteobacteria were paraphyletic, with the γ- and β-proteobacteria clustering within the δ-proteobacteria. Moreover, b1bcf11d04 fell at the bottom of this clade.
B. The minimum evolution tree estimated from LogDet distances of the 16S sequences (1243 positions in alignment). For the sequences from the fosmid clones, the Maximum Likelihood (GTR + G + I) topology was identical. However, there were several differences in the backbone of the tree, with, for instance, Geobacter clustering with the Firmicutes. The trees in both A and B were rooted by the Thermotoga maritima sequence.
C. The minimum evolution tree estimated from LogDet distances of the closest matches of the 16S fragment in b1bf110d03 (1046 positions in alignment). The Maximum Likelihood (GTR + G + I) topology was identical.
For all three trees, numbers on branches refer to bootstrap values from the minimum evolution analysis (italic) and from the Maximum Likelihood analysis (plain text). If both bootstrap values were above 70, this is indicated by a grey circle, while a black circle indicates that both values were above 90.
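The trees in Fig. 2 were built from LogDet distances, which are designed to tolerate unequal base composition among lineages. As a rough illustration only, the sketch below computes one common form of the LogDet (paralinear) distance between two aligned DNA sequences. It is not necessarily the exact variant computed by the software used for Fig. 2, and the function names are illustrative.

```python
import math

BASES = "ACGT"


def _det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def logdet_distance(x: str, y: str) -> float:
    """LogDet/paralinear distance between two aligned DNA sequences.

    d = -1/4 * [ln det(F) - 1/2 * ln(det(Px) * det(Py))], where F is the 4 x 4
    matrix of joint base frequencies over sites where both sequences have
    A, C, G or T, and Px, Py are diagonal matrices of the base frequencies.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    index = {b: i for i, b in enumerate(BASES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(x.upper(), y.upper()):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
    n = sum(map(sum, counts))
    if n == 0:
        raise ValueError("no comparable sites")
    f = [[c / n for c in row] for row in counts]
    px = [sum(row) for row in f]        # base frequencies in x (row sums)
    py = [sum(col) for col in zip(*f)]  # base frequencies in y (column sums)
    det_f = _det(f)
    if det_f <= 0 or 0.0 in px or 0.0 in py:
        raise ValueError("distance undefined for these sequences")
    return -0.25 * (math.log(det_f)
                    - 0.5 * (math.log(math.prod(px)) + math.log(math.prod(py))))
```

Identical sequences give a distance of zero. The distance is symmetric because swapping the sequences only transposes F, which leaves its determinant unchanged.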
short to obtain reliable alignments, the CDS was found in a 'mixed' clade that also contained genes from the same bacterial group, or the CDS was found outside its group but did not cluster with any specific lineage. For three of the clones, more than 30% of the CDSs have been acquired by LGT (Table 1). Two of these are from candidate divisions and one is from a δ-proteobacterium. In all three of these fosmids, a large island of genes appears to have been transferred from a phylogenetically distant lineage. Specifically, we infer an α-proteobacterial island in b3cf12f09, a δ-proteobacterial island in b1dcf51c12 and an archaeal/β-proteobacterial island in b1bcf11d04 (Fig. 1). Note that the proportions of foreign genes identified here may not represent the proportion of foreign genes in the respective genomes we have sampled. Rather, they indicate the amount of LGT to be expected when sequencing environmental fosmid clones.
For instance, in some genomes LGT might be enriched in certain variable parts of the genome. Indeed, the distribution of proteins matching COG categories differed significantly (P = 1.3e−13 in a χ²-test) from the distribution we observed for the end-sequencing of 'normal' fosmid clones (supplemental Fig. S1). The main difference was that the full fosmid sequences contained proportionally more sequences in categories J, K, U, F and H, whereas the end-sequences contained more in categories L, P, R and S. However, when we compared the distributions of the COG groups (i.e. informational, metabolism, etc.), the two data sets were significantly different only if the poorly characterized categories (R, S) were included. If such genes are transferred more frequently than genes in the other categories, we would be underestimating the level of LGT to be expected when analysing metagenomic clones.
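The comparison above can be made concrete with a minimal sketch of a Pearson χ² test between two COG-category profiles, for example full fosmid CDSs versus fosmid end-sequences. An option collapses the one-letter categories into the standard COG functional groups first. The function names and the dictionary layout are assumptions for illustration. The p-value uses a closed-form chi-square tail for integer degrees of freedom, and no continuity correction is applied.

```python
import math

# Standard COG functional groups, keyed by one-letter category code.
COG_GROUPS = {
    "information storage and processing": set("JAKLB"),
    "cellular processes and signalling": set("DYVTMNZWUO"),
    "metabolism": set("CGEFHIPQ"),
    "poorly characterized": set("RS"),
}


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        q, k = math.exp(-half), 2             # Q(2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # Q(1) = erfc(sqrt(x/2))
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < dof:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def group_counts(counts: dict) -> dict:
    """Collapse {COG letter: count} into {functional group: count}."""
    grouped = {}
    for letter, n in counts.items():
        for group, letters in COG_GROUPS.items():
            if letter in letters:
                grouped[group] = grouped.get(group, 0) + n
    return grouped


def compare_profiles(counts_a: dict, counts_b: dict, grouped: bool = False):
    """Pearson chi-square test on a 2 x k table; returns (statistic, dof, p)."""
    if grouped:
        counts_a, counts_b = group_counts(counts_a), group_counts(counts_b)
    # Skip categories that are empty in both profiles (expected count would be 0).
    keys = sorted(k for k in set(counts_a) | set(counts_b)
                  if counts_a.get(k, 0) + counts_b.get(k, 0) > 0)
    table = [[counts_a.get(k, 0) for k in keys], [counts_b.get(k, 0) for k in keys]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(2) for j in range(len(keys)))
    dof = len(keys) - 1  # (2 - 1) * (k - 1)
    return stat, dof, chi2_sf(stat, dof)
```

You can call `compare_profiles(fosmid_counts, end_counts)` and then call it again with `grouped=True` to mirror the two comparisons described above. When expected counts are small the χ² approximation is unreliable, so sparse categories would normally be pooled or tested with an exact test.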
Interestingly, in b1bcf11d04 we could identify the transfer vector for one of the acquired gene clusters. ORF6 encodes an acetyl transferase, and ORF8, ORF9 and ORF10 encode subunits of an acyl-CoA synthase (two α-subunits and one β-subunit). Phylogenetic analyses suggested that all four CDSs have been acquired by LGT, likely from a β-proteobacterium. The β-proteobacteria, in turn, likely acquired the acyl-CoA synthase genes from Archaea (Fig. 3). Consistent with an archaeal origin, the acyl-CoA synthase in b1bcf11d04 has a domain organization similar to that of the two-subunit acetyl-CoA synthase in Pyrococcus spp. (Sanchez et al., 2000). Furthermore, these genes have been transferred multiple times, and the transfers involved all three domains of life [Fig. 3 (Andersson et al., 2003)]. These transferred CDSs are preceded by an integrase gene (ORF3), a transposase gene (ORF4) and an integrase/transposase gene (ORF5; COG2801 Tra5, which contains an integrase core domain; Table S7). These genes were probably responsible for transferring the cluster into this genome.
The α-proteobacterial island in clone b3cf12f09 encodes a wide range of functions, and we identified no typical mobile elements in it. However, the island extends to the 3′ end of the fosmid, so mobile genes might lie further downstream. The first CDS of this island encodes a DnaJ-class chaperone (ORF29) that is truncated at the 5′ end. This pseudogene still shows 65% protein identity to a homologue in Magnetospirillum magnetotacticum (Table S3), so it probably represents a very recent transfer (or rearrangement). Alternatively, the fosmid might be a chimera. However, the G + C content of the CDSs in the α-proteobacterial island (59.5% G + C) is very similar to that of the rest of the fosmid (59.6% G + C; supplemental Table S3). Also, further upstream there is a proteobacterial transposase
Fig. 3. Maximum Likelihood phylogeny of acetyl-CoA synthetase (ADP-forming) homologues, estimated using PMBML (459 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF8 and ORF10 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Entamoeba histolytica. Numbers on branches refer to bootstrap support obtained using PMBML (bold), PUZZLEBOOT (plain text) and Neighbour-joining (italic). If all bootstrap values were above 70, this is indicated by a grey circle, while a black circle indicates that all three values were above 80.
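The sequence-selection step in the figure captions has two parts: keep the 100 best BLAST matches, then trim groups of very similar sequences from the same species to one representative. The sketch below shows one way to do this. It assumes a fixed identity cutoff (95% here, a hypothetical value that the captions do not give) and compares exact species names only. Merging sister species, which the captions also mention, would require taxonomic information and is not modelled. The `Hit` record and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    species: str
    bitscore: float
    aligned: str  # this sequence's row in the multiple alignment ('-' = gap)


def percent_identity(a: str, b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def dereplicate(hits, top_n=100, min_identity=95.0):
    """Keep the top_n hits by bit score, then drop any hit that is at least
    min_identity percent identical to an already kept hit of the same species.
    Divergent paralogues from one species are therefore retained."""
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:top_n]
    kept = []
    for hit in ranked:
        redundant = any(
            k.species == hit.species
            and percent_identity(k.aligned, hit.aligned) >= min_identity
            for k in kept
        )
        if not redundant:
            kept.append(hit)
    return kept
```

Because hits are visited in order of decreasing bit score, the representative kept for each group is its best-scoring member.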
Table 1. Summary of phylogenetic analyses. For each fosmid the table gives: Phylogeny (23S rRNA; 16S rRNA; CDSs); LGT(a) (phylogenetic trees; phylogenetic distribution or genome context; total); ORFans(b).

b1dcf51a06
23S rRNA: No clear affiliation with existing sequences.
16S rRNA: Could not be amplified.
CDSs: Most CDSs have no, or only a few, significant matches in GenBank.
LGT, phylogenetic trees: ORF4 clusters with Leptospira interrogans within a mixed clade. However, L. interrogans has several paralogues, and this gene appears to have been frequently transferred, so it is likely to be a transfer. ORF20 clusters with Methanosarcina within the δ-proteobacteria.
LGT, phylogenetic distribution or genome context: ORF19 clusters with Geobacter but is mostly found in methanogens. ORF17 and ORF18 have homologues in methanogens only.
LGT, total: 4 CDSs (19% of the total) have been acquired by LGT.
ORFans: 33 (38)

b1dcf13f01
23S rRNA: Clusters with Dehalococcoides ethenogenes; Chloroflexus aurantiacus 23S rRNA sequence of too poor quality to include in the tree.
CDSs: 7 of 10 CDSs (70%) with supported phylogenetic topologies agree with the 23S fragment. In addition, 6 CDSs only hit Chloroflexus aurantiacus.
LGT, phylogenetic trees: Two CDSs have likely been acquired through LGT. One clusters with high support with Thermotoga maritima (ORF16), and one clusters within the eukaryotes (ORF25).
LGT, phylogenetic distribution or genome context: ORF2 has significant homologues only in Crocosphaera watsonii.
LGT, total: 3 CDSs (11% of total) have been acquired by LGT.
ORFans: 14 (5)

b3cf12f09
23S rRNA: Candidate division OP8 bacterium.
16S rRNA: Candidate division OP8 bacterium.
CDSs: Most CDSs agree with the rRNA genes and do not cluster within any specific bacterial group.
LGT, phylogenetic trees: Phylogenetic analysis suggests that 10 CDSs have likely been acquired by LGT. Eight of these have been acquired from an α-proteobacterium and are found linked.
LGT, phylogenetic distribution or genome context: Three CDSs found linked to CDSs for which phylogenetic analyses suggest LGT have also likely been acquired by LGT. ORF16 is a transposase of proteobacterial origin and has a lower G + C content than the rest of the fosmid. Twelve of the transferred CDSs (ORF29–41) are linked, and all appear to have been acquired from an α-proteobacterium.
LGT, total: 13 CDSs (32% of total) have been acquired by LGT.
ORFans: 22 (9)

b1bcf11f04
23S rRNA: Candidate division WS3 bacterium.
16S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage. Among these was the highly conserved DnaE gene.
LGT, phylogenetic trees: Two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group.
LGT, total: 2 CDSs (9% of total) have been acquired by LGT.
ORFans: 26 (14)

b1dcf51c12
23S rRNA: Candidate division WS3 bacterium.
CDSs: Most CDSs have no, or only a few, significant matches in GenBank.
LGT, phylogenetic trees: ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports the transfer of ORF7, ORF8 and ORF10 from a δ-proteobacterium to b1dcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, but with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within the Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, again with no bootstrap support. This CDS was likely also transferred as part of a δ-proteobacterial island.
LGT, phylogenetic distribution or genome context: One large 'island' of δ-proteobacterial origin.
LGT, total: 8 CDSs (44% of total) have been acquired by LGT.
ORFans: 22 (29)

b1bcf11h03
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium.
LGT, phylogenetic trees: Six CDSs have likely been acquired by LGT. ORF8 clusters with Clostridium thermocellum and Treponema denticola. ORF18 is separated from other proteobacteria in phylogenetic trees and clusters with Plasmodium spp. ORF23 is found in a mixed clade and appears to have been frequently transferred. ORF28 clusters with β-proteobacteria, ORF29 clusters with γ-proteobacteria, and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria.
LGT, phylogenetic distribution or genome context: ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12, as well as to the Chlorobium lineage.
LGT, total: 6 CDSs (17% of total) have been acquired by LGT.
ORFans: 6 (1)

b1bcf11d04
23S rRNA: δ-Proteobacterium.
16S rRNA: –
CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment.
LGT, phylogenetic trees: Phylogenetic analyses suggest that six CDSs have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12. This CDS has been transferred to other δ-proteobacteria as well.
LGT, phylogenetic distribution or genome context: Three CDSs (ORF3–5), encoding an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred together with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and are the neighbours of ORF19, which has also been acquired from Firmicutes.
LGT, total: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
ORFans: –

b1dcf13c08
23S rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni.
CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA.
LGT, phylogenetic trees: ORF4 clusters with Geobacter and Clostridium. ORF23 has no homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria.
LGT, total: 3 CDSs (7% of total) have been acquired by LGT.
ORFans: 10 (3)

b3cf12d07
23S rRNA: γ-Proteobacterium. Clusters within the γ-proteobacteria in LogDet distance trees, but at the base of the γ- and β-proteobacteria in the best Maximum Likelihood tree.
CDSs: Only 7 CDSs give supported phylogenies. Of these, 4 (57%) agree with the rRNA.
LGT, phylogenetic trees: ORF7 clusters within β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree.
LGT, phylogenetic distribution or genome context: Several additional CDSs (ORF16–ORF25) did not produce well-resolved trees, having only divergent homologues in GenBank or no significant homologues, and may also have been acquired by LGT. In support of this, ORF26 encodes a transposase.
LGT, total: 2 CDSs (8% of total) have been acquired by LGT. ORF16–ORF25 were not included in the estimate because of limited evidence for their transfer.
ORFans: 23 (23)

b1bcf11c04
23S rRNA: γ-Proteobacterium.
16S rRNA: –
CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA.
LGT, phylogenetic trees: Phylogenetic analyses show that two CDSs have been acquired by LGT. ORF3 is found in a mixed clade, while ORF30 clusters within β-proteobacteria.
LGT, phylogenetic distribution or genome context: Three genes found close to ORF3 and ORF34 show incongruent phylogenies, although with low bootstrap support, and have probably also been acquired by LGT. ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred.
LGT, total: 5 CDSs (13% of total) have been acquired by LGT.
ORFans: 3 (1)

b1bf11a01
23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity in 23S rRNA).
16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity in 16S rRNA).
CDSs: High degree of gene synteny with Thiobacillus denitrificans. 29 CDSs have their best BLAST match in Thiobacillus denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes.
LGT, phylogenetic trees: One CDS (ORF30, RsuA) clusters with γ-proteobacteria and has no homologue in Thiobacillus denitrificans.
LGT, phylogenetic distribution or genome context: Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and Thiobacillus denitrificans. ORF29 has no significant homologues in proteobacteria.
LGT, total: 4 CDSs (8% of total) have been acquired by LGT.
ORFans: 3 (2)

b1bf110d03
23S rRNA: –
16S rRNA: A Flavobacteriaceae bacterium. Among sequenced genomes, it is most closely related to Cytophaga hutchinsonii.
CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment.
LGT, phylogenetic trees: ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer.
LGT, phylogenetic distribution or genome context: ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron.
LGT, total: 3 CDSs (10% of total) have likely been acquired by LGT. The transposon is not included because it has been transferred within the Bacteroides.
ORFans: 10 (3)

a. Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or by clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
b. ORFans were defined as CDSs with no significant match in GenBank. Matches to sequences in the environmental portion of GenBank were not considered. The value in parentheses is the proportion of protein-coding DNA that has no match in GenBank.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island at the time of transfer.
In the Flavobacteriaceae fosmid b1bf110d03, we identified a large self-transmitting conjugative transposon (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences. This cluster is separate from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii. Their best (and in most cases only) matches were always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. We have probably captured only part of this transposon, because many of the CDSs found in the B. thetaiotaomicron transposons are absent from the fragment we sequenced. The 3′ CDSs in this fosmid clone (ORF28–ORF30) were probably also transferred along with the transposon. The B. thetaiotaomicron transposons also contain additional CDSs that may not be involved in transposon function (Xu et al., 2003). We note that we did not include the acquisition of this transposon in our LGT estimate, because it originated from the same major bacterial group as the fosmid clone.
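Statements such as "the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes" come from tallying the best BLAST hit of each CDS by taxon. A minimal sketch of that tally follows. It assumes the 12-column tabular BLAST output (BLAST+ `-outfmt 6`, or `-m 8` in legacy blastall) and a hypothetical `subject_to_taxon` lookup table. A real analysis would map subjects through a taxonomy database instead. The default E-value cutoff mirrors the one given for Fig. S1.

```python
from collections import Counter


def best_hits(lines, max_evalue=1e-10):
    """Best hit (highest bit score) per query from 12-column tabular BLAST output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def tally_taxa(best, subject_to_taxon):
    """Count best hits per taxon; subjects missing from the lookup are 'unassigned'."""
    return Counter(subject_to_taxon.get(s, "unassigned") for s in best.values())
```

Hits above the cutoff are discarded before the best hit is chosen, so a query with no significant match (an ORFan, in the sense of Table 1) simply does not appear in the result.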
Interestingly, one gene appears to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA and has a very patchy phylogenetic distribution. This suggests that it originated in one of the lineages that possess it and was then transferred to the others. All the organisms encoding this protein are anaerobic or microaerophilic (Symbiobacterium thermophilum), and all are found in environments similar to the one sampled here. Transferred genes are likely to confer a selective advantage in the environment where the organisms harbouring them live, so an ecological function for this fusA paralogue should be sought.
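Patchy distributions like this one are one of the two kinds of evidence counted as LGT in Table 1 (footnote a); the other is a supported, incongruent tree. The per-fosmid bookkeeping behind the percentages in Table 1 can be sketched as below. The `CDS` record and its flags are hypothetical inputs. They stand for the tree-by-tree and distribution-pattern judgements described in the text, which this sketch does not try to automate. The sketch also assumes that percentages are taken over all CDSs on the fosmid, including ORFans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CDS:
    name: str
    length_bp: int
    has_match: bool                          # significant non-environmental GenBank hit
    supported_tree: bool = False             # phylogeny with a supported topology
    agrees_with_rrna: Optional[bool] = None  # placement relative to the rRNA phylotype
    distribution_lgt: bool = False           # absent from own rRNA group, present elsewhere


def percent(part, whole):
    """Rounded percentage, or None when the denominator is zero."""
    return round(100 * part / whole) if whole else None


def summarize_fosmid(cdss):
    """Per-fosmid counts in the spirit of Table 1."""
    supported = [c for c in cdss if c.supported_tree]
    congruent = sum(1 for c in supported if c.agrees_with_rrna)
    # LGT is counted only with phylogenetic support or a clear distribution pattern.
    lgt = sum(1 for c in cdss
              if (c.supported_tree and c.agrees_with_rrna is False) or c.distribution_lgt)
    orfans = [c for c in cdss if not c.has_match]
    coding_bp = sum(c.length_bp for c in cdss)
    return {
        "supported": len(supported),
        "congruent": congruent,
        "pct_congruent": percent(congruent, len(supported)),
        "lgt": lgt,
        "pct_lgt": percent(lgt, len(cdss)),
        "pct_orfans": percent(len(orfans), len(cdss)),
        "pct_orfan_bp": percent(sum(c.length_bp for c in orfans), coding_bp),
    }
```

CDSs that fall in mixed clades or give unresolved trees are neither congruent nor counted as LGT. They still contribute to the denominator of the LGT percentage.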
Another set of genes found in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster occurs in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). Here the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it may be native to b1bcf11h03 (Fig. 5). The cluster also appears to have been transferred to Chlorobium tepidum: for all these genes except TonB (for which we could not make a reliable alignment), both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria. Robust phylogenies were obtained only for OmpA and TolB. However, the gene order is conserved in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter. This suggests that the entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably in two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above lies close to this gene cluster and may have been transferred as part of it. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly
Fig. 4. Maximum Likelihood phylogeny of fusA homologues, estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
related to the OmpA in this gene cluster and was not included in the alignment.
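The conserved-gene-order argument can be illustrated with a small helper. It finds the longest run of consecutive genes shared by two clones and also checks for the run in inverted orientation. The helper assumes that each gene has already been given an orthologue label. The gene labels in the check below are hypothetical and do not reproduce the actual order of the cluster. This is a simplified synteny comparison, not the procedure used in the study.

```python
def _longest_common_run(a, b):
    """Longest contiguous run of labels shared by lists a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_shared_block(order_a, order_b):
    """Longest block of consecutive genes shared by two gene orders.

    The block may appear reversed in order_b (an inverted segment).
    It is returned in the orientation of order_a.
    """
    forward = _longest_common_run(order_a, order_b)
    reverse = _longest_common_run(order_a, order_b[::-1])
    return forward if len(forward) >= len(reverse) else reverse
```

A long shared block between clones from unrelated lineages, such as the roughly 4-kb cluster discussed above, is the kind of signal that favours the transfer of a whole fragment over independent transfers of single genes.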
By searching the Pfam database, we also identified some mobile genes that might be involved in the biodegradation of pollutants. In one of the γ-proteobacterial fosmids, b1bcf11c04, we identified a glutathione-S-transferase gene (GST; ORF36) flanked by an acetyltransferase gene (ORF35) and a transporter (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Given that the fosmid originated from highly polluted marine sediments, these CDSs are good candidates for genes involved in the biodegradation of a xenobiotic compound. The b1bcf11c04 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1; Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. Supporting the acquisition of this CDS by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in the biodegradation of pollutants is among the CDSs transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004). This suggests that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. Note, however, that this CDS belongs to a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene, ORF30 (an S4 domain protein), suggests that it was acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, all these genes have likely been acquired by LGT. Notably, a short inverted repeat (80% identity) flanks these genes (positions 34 021–34 040 and 36 693–36 674).
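An inverted repeat like this can be checked by comparing the left arm with the reverse complement of the right arm. The right-arm coordinates above are written in descending order (36 693–36 674), which presumably marks a segment read on the opposite strand. The sketch assumes 1-based inclusive coordinates and treats descending coordinates as a reverse-strand segment. The function names are illustrative, and the test sequence is synthetic.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def segment(seq: str, start: int, end: int) -> str:
    """Extract a 1-based inclusive segment; start > end means the reverse strand."""
    if start <= end:
        return seq[start - 1:end]
    # Reverse complement of seq[end..start]
    return seq[end - 1:start].translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)
```

For the fosmid described above, the check would be `percent_identity(segment(fosmid_seq, 34021, 34040), segment(fosmid_seq, 36693, 36674))`, where `fosmid_seq` is a hypothetical variable holding the clone sequence. With 20-bp arms, 80% identity corresponds to 16 matching positions.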
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs with a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h03 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all have a higher G + C content (66.4%, 65.7% and 64.7%) than the rest of the fosmid (52.5%). All these CDSs
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues, estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from the Chlamydiaceae, because they formed a long, unstable branch in the tree. In addition, we removed some sequences that were considerably shorter than the rest of the alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
10
Agrobacterium tumefaciens Sinorhizobium meliloti
Brucella melitensis Mesorhizobium loti
Mesorhizobium sp BNC1 Helicobacter bizzozeronii
Bartonella henselae Rhodopseudomonas palustris Bradyrhizobium japonicum
Rhodobacter sphaeroidesSilicibacter sp TM1040
Rhodospirillum rubrum Caulobacter crescentus
Magnetospirillum gryphiswaldense Rickettsia typhi
Rickettsia sibirica Gluconobacter oxydans
Zymomonas mobilis Novosphingobium aromaticivorans
Novosphingobium aromaticivorans Magnetococcus sp MC-1
Myxococcus xanthusXanthomonas campestris
Desulfotalea psychrophila Wolinella succinogenes
Desulfotalea psychrophila Desulfovibrio vulgaris
Geobacter metallireducens Geobacter sulfurreducens
Geobacter metallireducens Geobacter sulfurreducens
Chlorobium tepidum b1bcf11h03ORF12
Bdellovibrio bacteriovorus b1dcf51c12ORF7
Psychrobacter sp 273-4 Acinetobacter sp ADP1
Microbulbifer degradans Pseudomonas syringae Pseudomonas aeruginosa
Rubrivivax gelatinosus Thiobacillus denitrificans Nitrosomonas europaea
Ralstonia solanacearum Ralstonia eutropha
Burkholderia fungorum Burkholderia cepacia
Burkholderia cepacia Burkholderia pseudomallei
Idiomarina loihiensisPhotobacterium profundum
Shewanella oneidensis Vibrio cholerae Vibrio vulnificus Vibrio parahaemolyticus
Haemophilus somnus Haemophilus influenzae
Pasteurella multocida Photorhabdus luminescens Yersinia pseudotuberculosis
Erwinia carotovora Salmonella enterica
Erwinia chrysanthemi
6155
79 61 83
7255
5467
71
52
65
5152
5474
82
52
73
528498 52
508992
8472 54
527383
698372
8783
77 92
52
LGT and phylogenetic assignment of metagenomic clones 2023
copy 2005 Society for Applied Microbiology and Blackwell Publishing Ltd Environmental Microbiology 7 2011ndash2026
cluster with γ-proteobacteria and might therefore repre-sent recent within γ-proteobacteria transfers ORF40 inthe isin-proteobacterial clone b1dcf13c08 a short ORFanhas a G + C content of 222 compared with 347 forthe complete clone In addition ORF9 another ORFan inb1dcf13c08 has a marginally lower G + C content com-pared with the rest of the fosmid clone with 257 Simi-larly ORF26 in the Chloroflexi clone b1dcf13f01 has aG + C content of 478 G + C compared with 569 forthe complete fosmid clone
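The G + C screen above flags a CDS whose G + C content departs from the fosmid average by the 10% threshold. The values reported for clone b1dcf13c08 suggest that the threshold is in percentage points: ORF40 (22.2% versus 34.7%) is reported as an outlier, whereas ORF9 (25.7%) is described only as marginally lower. The Python sketch below encodes that reading; treating the threshold as inclusive and counting only unambiguous A/C/G/T bases are assumptions, and the function names are hypothetical.

```python
# Sketch of a G + C outlier screen.
# Assumptions: threshold in percentage points (inclusive); only A/C/G/T counted.
from typing import Dict


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def gc_outliers(cds_gc: Dict[str, float], fosmid_gc: float,
                threshold: float = 10.0) -> Dict[str, float]:
    """Return CDSs whose G + C differs from the fosmid average by >= threshold points."""
    return {name: gc for name, gc in cds_gc.items()
            if abs(gc - fosmid_gc) >= threshold}


# Values reported above for clone b1dcf13c08 (complete clone: 34.7%).
print(gc_outliers({"ORF9": 25.7, "ORF40": 22.2}, 34.7))  # {'ORF40': 22.2}
```

In practice, cds_gc would be filled by calling gc_content on each CDS sequence and fosmid_gc by calling it on the whole insert.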
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases (P = 0.02) with no match in GenBank (excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
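The comparison above rests on a t-test of per-clone proportions. The paper does not state which variant was used; the sketch below assumes a classical two-sample Student's t-test with pooled variance and computes only the statistic and its degrees of freedom, using the Python standard library. The input values are hypothetical placeholders, not the study's per-clone data. A P-value could then be read from a t distribution, for example with scipy.stats.ttest_ind, whose default is the pooled-variance test.

```python
# Sketch: two-sample Student's t statistic with pooled variance
# (assumed variant; the paper only says "a t-test").
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical ORFan proportions per clone (placeholders, not study data).
uncultivated = [0.30, 0.25, 0.28, 0.22]
cultivated = [0.10, 0.08, 0.12, 0.05, 0.09]
t, df = pooled_t(uncultivated, cultivated)
print(f"t = {t:.2f}, df = {df}")
```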
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, in b1dcf51c12 half of the clone is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III α subunit homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene, the exonuclease domain, and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window onto a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the REAL Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). In order to obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After the vector was cleaned from the gel, it was cut with PmlI overnight to make a blunt site. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step in the CopyControl™ Fosmid Library Production Kit protocol.
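The adaptor sequence given above can be checked for its two functional sites with a simple motif search. The Python sketch below assumes the 26-bp I-CeuI recognition sequence TAACTATAACGGTCCTAAGGTAGCGA as commonly listed by enzyme suppliers (the text does not spell it out), and uses the PmlI site CACGTG, which is cut bluntly between CAC and GTG. Because the I-CeuI site is not palindromic, both strands are searched. The function name is hypothetical, and the sketch only locates sites; it does not model cleavage.

```python
# Sketch: locate the I-CeuI and PmlI sites in the cloning adaptor.
# Assumption: ICEUI_SITE is the recognition sequence as listed by suppliers.
from typing import List, Tuple

ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGA"  # assumed; not palindromic
PMLI_SITE = "CACGTG"  # palindromic; blunt cut CAC^GTG

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_site(seq: str, site: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of site on either strand."""
    seq, site = seq.upper(), site.upper()
    motifs = [("+", site)]
    rc = reverse_complement(site)
    if rc != site:  # a palindromic site would otherwise be reported twice
        motifs.append(("-", rc))
    hits = []
    for strand, motif in motifs:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


print(find_site(ADAPTOR, ICEUI_SITE))  # [(2, '+')]
print(find_site(ADAPTOR, PMLI_SITE))   # [(29, '+')]
```

The output shows the adaptor as a 2-bp leader, the I-CeuI site, one spacer base and the PmlI site, which is consistent with cutting the vector first with I-CeuI and then with PmlI to leave a blunt end.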
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17, estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. The clade labelled 'PolC + DinG fusion proteins' contains proteins with the same domain structure as b1bcf11f04 ORF17.
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region, position 1192–1342 in b1dcf13c08, we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where the identified CDSs had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If ORF-finder yielded an alternative CDS that did have a match in GenBank, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
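The two selection rules stated above (discard CDSs shorter than 100 bp; between two overlapping CDSs, keep the one with significant GenBank homologues) can be written as a small filter. This Python sketch covers only those two rules. The ORF-finder and BLASTX follow-up is not modelled, the has_hit flag stands in for the outcome of a BLASTP search, CDS names are assumed to be unique, and keeping both CDSs when neither or both have hits is an assumption, since the text does not address that case.

```python
# Sketch of the CDS filtering rules (length cut-off, overlap resolution).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CDS:
    name: str
    start: int  # 1-based, inclusive; assumes start <= end on either strand
    end: int
    has_hit: bool  # hypothetical flag: significant GenBank homologue found

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: CDS, b: CDS) -> bool:
    return a.start <= b.end and b.start <= a.end


def filter_cdss(cdss: List[CDS], min_len: int = 100) -> List[CDS]:
    """Drop short CDSs, then resolve overlaps in favour of the CDS with a hit."""
    kept = [c for c in cdss if c.length >= min_len]
    dropped = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            # Decide only when exactly one CDS of the pair has a hit (assumption).
            if overlaps(a, b) and a.has_hit != b.has_hit:
                dropped.add(b.name if a.has_hit else a.name)
    return [c for c in kept if c.name not in dropped]


calls = [
    CDS("ORF1", 1, 90, True),       # 90 bp: too short
    CDS("ORF2", 100, 600, True),
    CDS("ORF3", 550, 900, False),   # overlaps ORF2 and has no hit
    CDS("ORF4", 1000, 1500, False),
]
print([c.name for c in filter_cdss(calls)])  # ['ORF2', 'ORF4']
```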
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were constructed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a major bacterial group other than that of the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random-addition replicates]. If the trees supported a phylogenetic grouping different from that observed for the rRNA (with bootstrap support >50), the CDS was classified as acquired by LGT. It should be noted that we only classified transfers between bacterial groups or phyla as LGT, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. In some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but it should be noted that for such phylogenies it is not possible to identify the donors and recipients. For some LGT-CDSs, we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random-addition replicates. In the Maximum Likelihood bootstrap analyses, we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
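The classification rule described above can be condensed into a decision function. The Python sketch below is a simplified reading of the text, not the authors' code. It takes the major group assigned from the rRNA, the set of major groups represented in the clade in which the CDS falls, and that clade's bootstrap support. It treats support of 50 or less as unresolved, counts only between-group placements as LGT, and leaves unclassified the cases described as unresolvable elsewhere in the Results ('mixed' clades that also contain the rRNA group, and CDSs that cluster with no specific lineage). Applying the same support threshold to mixed clades is an assumption.

```python
# Simplified sketch of the between-group LGT classification rule.
from typing import AbstractSet, Optional


def classify_cds(rrna_group: str,
                 clade_groups: AbstractSet[str],
                 bootstrap: Optional[float],
                 min_support: float = 50.0) -> str:
    """Classify a CDS relative to its fosmid's rRNA phylotype.

    rrna_group   -- major bacterial group from the rRNA (e.g. "delta-proteobacteria")
    clade_groups -- major groups represented in the clade containing the CDS
    bootstrap    -- bootstrap support (%) for that clade, or None if unavailable
    """
    if bootstrap is None or bootstrap <= min_support:
        return "unresolved"           # no supported placement
    if not clade_groups:
        return "unresolved"           # outside its group, no specific lineage
    if clade_groups == {rrna_group}:
        return "agrees with rRNA"
    if rrna_group in clade_groups:
        return "unresolved"           # mixed clade that includes the rRNA group
    if len(clade_groups) > 1:
        return "LGT (donor unclear)"  # mixed clade of other groups only
    return "LGT"


print(classify_cds("delta-proteobacteria", {"beta-proteobacteria"}, 74))  # LGT
```

Because transfers within a group are ignored by design, the group labels must be at the level used in the text (for example α-, β-, γ- or δ-proteobacteria, or a phylum), not at the genus level.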
All sequences have been submitted to GenBank with Accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlömann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP: Phylogeny Inference Package. Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Fig. S1. A. Number of BLAST hits with exp. < 10e−10 to different taxonomic groups.
B. Distribution of G + C content of the sequences.
C. Distribution of the COG category of the BLAST hits (exp. < 10e−10).
Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–S12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com.
short to obtain reliable alignments, the CDS was found in a 'mixed' clade also containing genes from the same bacterial group, or the CDS was found outside its group but did not cluster with any specific lineage. For three of the clones, more than 30% of the CDSs have been acquired by LGT (Table 1); two of these are from candidate divisions and one is from a δ-proteobacterium. For all three of these fosmids there appears to have been a transfer of a large island of genes from a phylogenetically distant lineage. Specifically, we infer an α-proteobacterial island in b3cf12f09, a δ-proteobacterial island in b1dcf51c12 and an archaeal/β-proteobacterial island in b1bcf11d04 (Fig. 1). It should be noted that the proportions of foreign genes identified here might not represent the proportion of foreign genes in the respective genomes that we have sampled, but rather the amount of LGT to be expected when sequencing environmental fosmid clones. For instance, in some genomes LGT might be enriched in certain variable parts of the genome. Indeed, the distribution of proteins that match COG categories was significantly different (P = 1.3e−13 in a χ² test) from what we observed for the end-sequencing of 'normal' fosmid clones (supplementary Fig. S1), the main difference being proportionally more J, K, U, F and H category sequences in the full fosmid sequences and more L, P, R and S category sequences among the end-sequences. When comparing the distributions of the broader COG groups (i.e. informational, metabolic, etc.), however, the two data sets were significantly different only when the poorly characterized categories (R, S) were included. If such genes are more frequently transferred than genes in the other categories, then we would be underestimating the level of LGT that would be expected when analysing metagenomic clones.
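The comparison of COG category distributions between full fosmid sequences and end-sequences is a χ² test of homogeneity on a two-row contingency table. The standard-library Python sketch below computes the Pearson statistic and its degrees of freedom. The counts are invented placeholders rather than the data behind Fig. S1, and the function name is hypothetical. A P-value would come from the χ² distribution, for example via scipy.stats.chi2_contingency, which by default applies Yates' continuity correction only when there is one degree of freedom.

```python
# Sketch: Pearson chi-square test of homogeneity for a 2 x k table of counts.
from typing import Sequence, Tuple


def chi_square_homogeneity(row_a: Sequence[int],
                           row_b: Sequence[int]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for two count rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same categories")
    total_a, total_b = sum(row_a), sum(row_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each row needs at least one count")
    grand = total_a + total_b
    chi2, used = 0.0, 0
    for obs_a, obs_b in zip(row_a, row_b):
        column = obs_a + obs_b
        if column == 0:
            continue  # category absent from both data sets: skip it
        used += 1
        exp_a = total_a * column / grand  # expected count under homogeneity
        exp_b = total_b * column / grand
        chi2 += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return chi2, used - 1  # df = (2 - 1) * (k - 1)


# Hypothetical counts for COG categories J, K, L, R, S (placeholders only).
fosmids = [30, 25, 10, 15, 12]
end_seqs = [40, 30, 45, 60, 50]
print(chi_square_homogeneity(fosmids, end_seqs))
```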
Interestingly, in b1bcf11d04 the transfer vector for one of the acquired gene clusters could be identified. ORF6 encodes an acetyltransferase, and ORF8, ORF9 and ORF10 encode subunits of an acyl-CoA synthase: two α-subunits and one β-subunit. Phylogenetic analyses suggested that all four CDSs have been acquired by LGT, likely from a β-proteobacterium. The β-proteobacteria have in turn likely acquired the acyl-CoA synthase genes from Archaea (Fig. 3). In support of the archaeal origin of these genes, the acyl-CoA synthase in b1bcf11d04 has a domain organization similar to that of the acetyl-CoA synthase in Pyrococcus spp., with two subunits (Sanchez et al., 2000). Furthermore, these genes have been transferred multiple times, and the transfers involved all three domains of life [Fig. 3 (Andersson et al., 2003)]. These transferred CDSs are preceded by an integrase gene (ORF3), a transposase gene (ORF4) and an integrase/transposase gene (ORF5; COG2801, Tra5, which contains an integrase core domain; Table S7), which were probably responsible for transferring this cluster into this genome.
The α-proteobacterial island in the b3cf12f09 clone encodes a wide range of different functions, and no typical mobile elements were identified. However, as this island extends to the 3′ end of the fosmid, mobile genes might be found further downstream. The first CDS of this island encodes a DnaJ-class chaperone (ORF29), which is truncated at the 5′ end. This pseudogene still shows 65% protein identity to a homologue in Magnetospirillum magnetotacticum (Table S3). Hence, this probably represents a very recent transfer (or rearrangement). Another possibility is that this fosmid might be a chimera. However, the G + C content of the CDSs in the α-proteobacterial island (59.5% G + C) is very similar to that of the rest of the fosmid (59.6% G + C; supplementary Table S3). Also, further upstream there is a proteobacterial transposase
Fig. 3. Maximum Likelihood phylogeny of acetyl-CoA synthetase (ADP-forming) homologues estimated using PMBML (459 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF8 and ORF10 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Entamoeba histolytica. Numbers on branches refer to bootstrap support obtained using PMBML (bold), PUZZLEBOOT (plain text) and Neighbour-joining (italic). If all bootstrap values were above 70, this is indicated by a grey circle, while a black circle indicates that all three values were above 80.
Tab
le 1
S
umm
ary
of p
hylo
gene
tic a
naly
ses
Fos
mid
Phy
loge
ny
LGT
a
O
RFa
nsb
23S
rR
NA
16S
rR
NA
CD
Ss
Phy
loge
netic
tre
esP
hylo
gene
tic d
istr
ibut
ion
or g
enom
e co
ntex
tTo
tal
b1dc
f51
a06
No
clea
r af
filia
tion
with
exi
stin
gse
quen
ces
Cou
ld n
ot b
eam
plifi
ed
Mos
t C
DS
s ha
ve n
o or
only
a f
ew s
igni
fican
tm
atch
es in
Gen
Ban
kO
RF
4 cl
uste
rs w
ithLe
ptos
pira
inte
rrog
ans
with
in a
mix
ed c
lade
ho
wev
er
L in
terr
ogan
sha
s se
vera
l par
alog
ues
and
this
gen
e ap
pear
sto
hav
e be
en f
requ
ently
tran
sfer
red
and
islik
ely
to b
e a
tran
sfer
OR
F20
clu
ster
s w
ithM
etha
nosa
rcin
a w
ithin
δ-pr
oteo
bact
eria
O
RF
19cl
uste
rs w
ith G
eoba
cter
but
is m
ostly
foun
d in
met
hano
gens
OR
F17
and
OR
F18
have
hom
olog
ues
inM
etha
noge
ns o
nly
4 C
DS
s (1
9 o
f th
eto
tal)
hav
e b
een
acq
uir
ed b
y L
GT
33
(38
)
b1dc
f13
f01
Clu
ster
s w
ithD
ehal
ococ
coid
eset
heno
gene
sC
hlor
oflex
usau
rant
iacu
s 23
SrR
NA
seq
uenc
eof
too
poo
r qu
ality
to in
clud
e in
the
tree
7 of
10
CD
Ss
(70
) w
ithsu
ppor
ted
phyl
ogen
etic
topo
logi
es a
gree
with
23S
fra
gmen
t In
addi
tion
6 C
DS
s w
hich
only
hit
Chl
orofl
exus
aura
ntia
cus
Two
CD
Ss
have
like
lybe
en a
cqui
red
thro
ugh
LGT
One
clu
ster
s w
ithhi
gh s
uppo
rt w
ithT
herm
otog
a m
ariti
ma
(OR
F16
) an
d on
e cl
uste
rsw
ithin
the
euk
aryo
tes
(OR
F25
)
OR
F2
has
only
sign
ifica
ntho
mol
ogue
s in
Cro
cosp
haer
aw
atso
nii
3 C
DS
s (1
1 o
f to
tal)
hav
e b
een
ac
qu
ired
by
LG
T
14
(5
)
b3cf
12
f09
Can
dida
te d
ivis
ion
OP
8 ba
cter
ium
Can
dida
te d
ivis
ion
OP
8 ba
cter
ium
Mos
t C
DS
s ag
ree
with
the
rRN
A g
enes
and
do
not
clus
ter
with
in a
nysp
ecifi
c ba
cter
ial g
roup
Phy
loge
netic
ana
lysi
ssu
gges
ts t
hat
10 C
DS
sha
ve li
kely
bee
n ac
quire
dby
LG
T 8
of
thes
e ha
vebe
en a
cqui
red
from
an
α-pr
oteo
bact
eriu
man
d ar
e fo
und
linke
d
Thr
ee C
DS
s fo
und
linke
d to
CD
Ss
whe
reph
ylog
enet
ic a
naly
ses
sugg
est
LGT
hav
eal
so li
kely
bee
nac
quire
d by
LG
T
13 C
DS
s (3
2 o
fto
tal)
hav
e b
een
acq
uir
ed b
y L
GT
OR
F16
is a
tran
spos
ase
of
prot
eoba
cter
ial
orig
in
and
show
slo
wer
GC
con
tent
than
the
res
t of
the
fosm
id T
wel
ve o
fth
e tr
ansf
erre
dC
DS
s (O
RF
29ndash
41)
are
linke
d an
dal
l app
ear
to h
ave
been
acq
uire
dfr
om a
n α-
prot
eoba
cter
ium
22
(9
)
b1bcf11f04
rRNA (23S / 16S): candidate division WS3 bacterium / candidate division WS3 bacterium.
CDSs: most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage; among these is the highly conserved DnaE gene.
LGT evidence: two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group.
LGT total: 2 CDSs (9% of total) have been acquired by LGT.
ORFans: 26 (14%).
b1dcf51c12
rRNA (23S / 16S): candidate division WS3 bacterium.
CDSs: most CDSs have no, or only a few, significant matches in GenBank.
LGT evidence: ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1dcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, although with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within the Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, although with no bootstrap support; it is likely that this CDS was also transferred as part of a δ-proteobacterial island.
LGT total: 8 CDSs (44% of total) have been acquired by LGT, including one large 'island' of δ-proteobacterial origin.
ORFans: 22 (29%).
b1bcf11h03
rRNA (23S / 16S): δ-proteobacterium / –.
CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium.
LGT evidence: six CDSs have likely been acquired by LGT. ORF8 clusters with Clostridium thermocellum and Treponema denticola. ORF18 is separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp. ORF23 is found in a mixed clade and appears to have been frequently transferred. ORF28 clusters with β-proteobacteria, ORF29 clusters with γ-proteobacteria, and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria.
LGT total: 6 CDSs (17% of total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12, as well as to the Chlorobium lineage.
ORFans: 6 (1%).
Table 1 (cont.). Columns: Fosmid | Phylogeny (23S rRNA, 16S rRNA, CDSs) | LGT [a] (phylogenetic trees, phylogenetic distribution or genome context, total) | ORFans [b]
b1bcf11d04
rRNA (23S / 16S): δ-proteobacterium / –.
CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment.
LGT evidence: six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has been transferred to other δ-proteobacteria as well. Three CDSs (ORF3–ORF5) that encode an integrase and two transposases precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and are neighbours of ORF19, which has also been acquired from Firmicutes.
LGT total: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
ORFans: –.
b1dcf13c08
rRNA (23S / 16S): ε-proteobacterium, most closely related to Campylobacter jejuni.
CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA.
LGT evidence: ORF4 clusters with Geobacter and Clostridium, and ORF23 has no homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria.
LGT total: 3 CDSs (7% of total) have been acquired by LGT.
ORFans: 10 (3%).
b3cf12d07
rRNA (23S / 16S): γ-proteobacterium; clusters within the γ-proteobacteria in LogDet distance trees but at the base of the γ- and β-proteobacteria in the best maximum likelihood tree.
CDSs: only 7 CDSs give supported phylogenies; of these, 4 (57%) agree with the rRNA.
LGT evidence: ORF7 clusters within β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees, and that had only divergent homologues or no significant homologues in GenBank, may also have been acquired by LGT; in support of this, ORF26 encodes a transposase.
LGT total: 2 CDSs (8% of total) have been acquired by LGT. ORF16–ORF25 were not included in the estimate owing to limited evidence for their transfer.
ORFans: 23 (23%).
b1bcf11c04
rRNA (23S / 16S): γ-proteobacterium / –.
CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA.
LGT evidence: phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within β-proteobacteria. Three genes that show incongruent phylogenies but with low bootstrap support, found close to ORF3 and ORF34, have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred.
LGT total: 5 CDSs (13% of total) have been acquired by LGT.
ORFans: 3 (1%).
b1bf11a01
rRNA (23S / 16S): β-proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA) / β-proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA).
CDSs: high degree of gene synteny compared with Thiobacillus denitrificans; 29 CDSs have their best BLAST match in Thiobacillus denitrificans. Of the 28 CDSs that give statistically supported phylogenies, 27 (96%) agree with the rRNA genes.
LGT evidence: one CDS (ORF30, RsuA) clusters with γ-proteobacteria and has no homologue in Thiobacillus denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and Thiobacillus denitrificans. ORF29 has no significant homologues in proteobacteria.
LGT total: 4 CDSs (8% of total) have been acquired by LGT.
ORFans: 3 (2%).
b1bf11d10
rRNA (23S / 16S): – / a Flavobacteriaceae bacterium; among sequenced genomes, most closely related to Cytophaga hutchinsonii.
CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment.
LGT evidence: ORF5 and ORF10 have no close homologues in other Bacteroidetes, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroidetes. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron.
LGT total: 3 CDSs (10% of total) have likely been acquired by LGT. The transposon is not included, as it has been transferred within the Bacteroidetes.
ORFans: 10 (3%).
[a] Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
[b] ORFans were classified as CDSs with no significant match in GenBank; matches to sequences in the environmental portion of GenBank were not considered. The proportion of protein coding DNA that has no match in GenBank is given in parentheses.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, because many of the CDSs found in the B. thetaiotaomicron transposons are not present in the fragment we have sequenced, and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with this transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobes or microaerophilic (Symbiobacterium thermophilum), and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly
Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
[Fig. 4 tree: fusA homologues, including b1dcf51c12 ORF15 and b1bcf11d04 ORF19; bootstrap values shown at nodes]
related to the OmpA found in this gene cluster and was not included in the alignment.
We also identified some mobile genes that might be involved in biodegradation of pollutants by searching the Pfam database. In one of the γ-proteobacterial fosmids, b1bcf11c04, we identified a glutathione S-transferase (GST, ORF36) gene that was flanked by an acetyltransferase gene (ORF35) and a transporter (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bcf11c04 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS is found in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene ORF30, an S4 domain protein, suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses; however, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
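The inverted-repeat observation can be illustrated with a short sketch that compares one repeat arm with the reverse complement of the other. It assumes an ungapped comparison of equal-length arms and 1-based inclusive coordinates, reading a pair written high-to-low (as in 36693–36674) from the reverse strand; the helper names are hypothetical, and the toy sequence in the example is invented rather than taken from b1bf11a01. Both reported arms span 20 bp, so 80% identity corresponds to 16 matching positions.

```python
# Sketch: percent identity between the two arms of an inverted repeat.
# Assumptions (not from the paper): ungapped comparison, equal-length arms,
# 1-based inclusive coordinates, and start > end meaning "read the reverse strand".

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def segment(seq: str, start: int, end: int) -> str:
    """Extract a 1-based inclusive segment; start > end reads the reverse strand."""
    if start <= end:
        return seq[start - 1:end]
    return reverse_complement(seq[end - 1:start])


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    left_arm = "ACGTTGCAAC"  # toy 10-bp arm, invented for illustration
    toy = "GGG" + left_arm + "TTTT" + reverse_complement(left_arm) + "CCC"
    # Left arm at 4-13; right arm at 18-27, given high-to-low as in the text.
    print(percent_identity(segment(toy, 4, 13), segment(toy, 27, 18)))  # prints 100.0
```

Writing the second arm's coordinates high-to-low lets the same segment helper return it already in the orientation of the first arm, so the identity check is a plain position-by-position comparison.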
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs that showed a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–S12). ORF20 in the δ-proteobacterial clone b1bcf11h03 has a G + C content of 47.5% compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6% compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show higher G + C content than the rest of the fosmid (66.4%, 65.7% and 64.7% respectively, compared with 52.5%). All these CDSs
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these sequences formed a long unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
[Fig. 5 tree: OmpA homologues, including b1dcf51c12 ORF7 and b1bcf11h03 ORF12; bootstrap values shown at nodes]
cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2% compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content than the rest of the fosmid clone (25.7%). Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
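The G + C screen described above can be sketched in a few lines. The sketch assumes that "10% higher or lower" means an absolute difference of 10 percentage points from the whole-fosmid value (the flagged CDSs quoted in the text differ from their fosmids by 10.9 to 19.2 points) and that ambiguous bases are ignored; the function names are hypothetical. The example reuses the b1dcf13c08 values given in the text.

```python
# Sketch of a G + C content screen for candidate recent LGT.
# Assumption: "10% higher or lower" is read as an absolute difference of
# 10 percentage points from the whole-fosmid G + C content.


def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def flag_gc_outliers(cds_gc: dict[str, float], fosmid_gc: float,
                     threshold: float = 10.0) -> list[str]:
    """Names of CDSs whose G + C differs from the fosmid value by >= threshold points."""
    return sorted(name for name, gc in cds_gc.items()
                  if abs(gc - fosmid_gc) >= threshold)


if __name__ == "__main__":
    # G + C values for clone b1dcf13c08 as reported in the text.
    print(flag_gc_outliers({"ORF40": 22.2, "ORF9": 25.7}, fosmid_gc=34.7))  # prints ['ORF40']
```

Under this reading ORF9 (9.0 points below the clone average) falls just short of the cut-off, which matches the text's description of it as only marginally lower.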
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. These fosmid clones represent, to our knowledge, the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
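The comparison above rests on a two-sample t-test. The paper does not say which variant was used, so the sketch below assumes Welch's unequal-variance form and computes only the t statistic and the Welch–Satterthwaite degrees of freedom with the Python standard library; converting these into a P-value needs a t-distribution CDF (for example SciPy's ttest_ind with equal_var=False). The function name is hypothetical, and the values in the tests are toy numbers, not the per-clone proportions from Table 1.

```python
# Sketch: Welch's two-sample t statistic (an assumed variant; the paper only says "t-test").
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    se_a = variance(a) / len(a)  # squared standard error of the mean of a
    se_b = variance(b) / len(b)  # squared standard error of the mean of b
    if se_a + se_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df
```

Welch's form is shown because it does not assume equal variances in the two groups of clones, which is a safer default when the groups are small and of unequal size.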
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have no significant matches in GenBank (ORF4) or only a single match (ORF5). None of these CDSs had significant matches to domains in Pfam either. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage are available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–ORF9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–ORF13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain, and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as the photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus, suggesting that the candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide concerning the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when not precluding provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own
genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). In order to obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction from 23S rRNA to 5S rRNA for the present study. The vector for cloning in the direction from 23S rRNA to 16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to make a blunt site. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step in the CopyControl™ Fosmid Library Production Kit protocol.
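The adaptor sequence given above can be checked for its two engineered sites with a simple string search. This sketch assumes the standard recognition sequences, the 27-bp I-CeuI site TAACTATAACGGTCCTAAGGTAGCGAA and the PmlI site CACGTG (which PmlI cuts bluntly between CAC and GTG); these sequences and the helper names are supplied here rather than taken from the paper. Both strands are searched, so a site in reverse orientation would also be reported.

```python
# Sketch: locate the I-CeuI and PmlI sites in the cloning adaptor.
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor sequence from the text
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"        # I-CeuI recognition sequence (27 bp)
PMLI_SITE = "CACGTG"                               # PmlI recognition sequence; blunt cut CAC^GTG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based top-strand start positions of `site` found on either strand."""
    hits = set()
    for motif in (site, reverse_complement(site)):
        pos = seq.find(motif)
        while pos != -1:
            hits.add(pos)
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    print("I-CeuI:", find_sites(ADAPTOR, ICEUI_SITE))  # prints I-CeuI: [2]
    pmli = find_sites(ADAPTOR, PMLI_SITE)
    print("PmlI:", pmli)                               # prints PmlI: [29]
    blunt_cut = pmli[0] + 3                            # PmlI cuts after CAC
    print(ADAPTOR[pmli[0]:blunt_cut])                  # prints CAC
```

The I-CeuI site occupies adaptor positions 2 to 28 (0-based), and the PmlI site fills the last six bases, so the blunt end produced by PmlI lies three bases from the 3′ end of the adaptor.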
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region (positions 1192–1342 in b1dcf13c08), we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. In cases where CDSs were identified that had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by doing BLASTX searches. If an
[Fig. 6 tree: DinG domain of b1bcf11f04 ORF17 homologues; a bracket marks the PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17; bootstrap values shown at nodes]
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3.
alternative CDS that did have a match in GenBank was obtained using ORF-finder, then that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
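The CDS-filtering rules described above (discard calls shorter than 100 bp; of two overlapping calls, keep the one with significant GenBank homologues) can be expressed as a small post-processing step on gene-caller output. This is a sketch under stated assumptions: the Cds record and function names are hypothetical, coordinates are 1-based and inclusive with start <= end on either strand, ties between overlapping calls keep the upstream call, and the ORF-finder and BLASTX rescue steps are not modelled.

```python
# Sketch of the CDS selection rules; record fields and tie-breaking are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cds:
    name: str
    start: int  # 1-based, inclusive; start <= end regardless of strand
    end: int
    has_genbank_hit: bool  # significant BLASTP homologue in GenBank

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Cds, b: Cds) -> bool:
    """True if the two coordinate ranges share at least one base."""
    return a.start <= b.end and b.start <= a.end


def select_cds(calls: list[Cds], min_length: int = 100) -> list[Cds]:
    """Drop short calls, then resolve overlaps in favour of calls with GenBank hits."""
    kept = sorted((c for c in calls if c.length >= min_length), key=lambda c: c.start)
    selected: list[Cds] = []
    for cds in kept:
        if selected and overlaps(selected[-1], cds):
            # Swap only when the new call has a hit and the earlier one does not.
            if cds.has_genbank_hit and not selected[-1].has_genbank_hit:
                selected[-1] = cds
        else:
            selected.append(cds)
    return selected
```

Because the calls are processed in order of start position, a newly kept call can only overlap the most recently selected one, so a single comparison per call is enough.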
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were performed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a major bacterial group other than that of the rRNA, a minimum evolution tree (bootstrap analysis with 100 replicates and global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a phylogenetic grouping different from that observed for the rRNA (with bootstrap support >50%), the CDS was classified as acquired by LGT. It should be noted that we only classified transfers between bacterial groups or phyla as LGT, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. For some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but it should be noted that for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
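The between-group LGT rule described above can be expressed as a small decision function. This is a minimal sketch, not the authors' code: the `CdsPlacement` record, the group labels and the example bootstrap value are hypothetical, and it assumes each protein tree has already been reduced to the set of major bacterial groups in the clade containing the fosmid CDS, plus the bootstrap support for that clade.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CdsPlacement:
    """Hypothetical summary of where a fosmid CDS falls in its protein tree."""

    orf: str
    clade_groups: frozenset[str]  # major bacterial groups in the clade holding the CDS
    bootstrap: float              # bootstrap support (%) for that clade


def classify_cds(p: CdsPlacement, rrna_group: str, min_support: float = 50.0) -> str:
    """Apply a between-group LGT rule to one CDS placement.

    Groups are assumed to be coarse (e.g. 'alpha-proteobacteria' versus
    'gamma-proteobacteria'), so transfers within a group are never reported.
    """
    if p.bootstrap <= min_support:
        return "unsupported"        # too weak to call agreement or transfer
    if p.clade_groups == {rrna_group}:
        return "agrees"             # CDS phylogeny matches the rRNA phylotype
    if len(p.clade_groups) > 1:
        return "LGT (mixed clade)"  # transferred, but the donor cannot be identified
    return "LGT"                    # supported placement in another major group


# Example: ORF16 of the Chloroflexi clone groups with Thermotoga maritima.
# The group labels and the bootstrap value of 95 are illustrative only.
orf16 = CdsPlacement("ORF16", frozenset({"Thermotogae"}), 95.0)
print(classify_cds(orf16, "Chloroflexi"))  # LGT
```

Keeping the group vocabulary coarse (phylum or proteobacterial class) is what makes within-group transfers invisible to the rule, matching the restriction stated in the text.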
All sequences have been submitted to GenBank with Accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP: Phylogeny Inference Package. Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Fig. S1. A. Number of BLAST hits with exp < 10e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits (exp < 10e−10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
Table 1. Summary of phylogenetic analyses.
Columns: Fosmid; Phylogeny (23S rRNA, 16S rRNA); CDSs (phylogenetic trees/phylogenetic distribution or genome context); LGT (a), total; ORFans (b).
b1dcf51a06. 23S rRNA: No clear affiliation with existing sequences. 16S rRNA: Could not be amplified. CDSs: Most CDSs have no or only a few significant matches in GenBank. ORF4 clusters with Leptospira interrogans within a mixed clade; however, L. interrogans has several paralogues, and this gene appears to have been frequently transferred and is likely to be a transfer. ORF20 clusters with Methanosarcina within the δ-proteobacteria. ORF19 clusters with Geobacter but is mostly found in methanogens. ORF17 and ORF18 have homologues in methanogens only. LGT: 4 CDSs (19% of the total) have been acquired by LGT. ORFans: 33 (38).
b1dcf13f01. rRNA: Clusters with Dehalococcoides ethenogenes and Chloroflexus aurantiacus; the 23S rRNA sequence was of too poor quality to include in the tree. CDSs: 7 of 10 CDSs (70%) with supported phylogenetic topologies agree with the 23S fragment. In addition, 6 CDSs hit only Chloroflexus aurantiacus. Two CDSs have likely been acquired through LGT: one clusters with high support with Thermotoga maritima (ORF16) and one clusters within the eukaryotes (ORF25). ORF2 has significant homologues only in Crocosphaera watsonii. LGT: 3 CDSs (11% of total) have been acquired by LGT. ORFans: 14 (5).
b3cf12f09. 23S rRNA: Candidate division OP8 bacterium. 16S rRNA: Candidate division OP8 bacterium. CDSs: Most CDSs agree with the rRNA genes and do not cluster within any specific bacterial group. Phylogenetic analysis suggests that 10 CDSs have likely been acquired by LGT; 8 of these have been acquired from an α-proteobacterium and are found linked. Three further CDSs, linked to CDSs for which phylogenetic analyses suggest LGT, have also likely been acquired by LGT. LGT: 13 CDSs (32% of total) have been acquired by LGT. ORF16 is a transposase of proteobacterial origin and shows lower G + C content than the rest of the fosmid. Twelve of the transferred CDSs (ORF29–41) are linked, and all appear to have been acquired from an α-proteobacterium. ORFans: 22 (9).
b1bcf11f04. 23S rRNA: Candidate division WS3 bacterium. 16S rRNA: Candidate division WS3 bacterium. CDSs: Most CDSs agree with the rRNA and do not cluster with any specific bacterial lineage; among these was the highly conserved DnaE gene. Two CDSs (ORF14 and ORF15) cluster with sequences from the Chlorobi/Bacteroidetes group. LGT: 2 CDSs (9% of total) have been acquired by LGT. ORFans: 26 (14).
b1dcf51c12. rRNA: Candidate division WS3 bacterium. CDSs: Most CDSs have no or only a few significant matches in GenBank. ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1dcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, however with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within the Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, however with no bootstrap support; it is likely that this CDS was also transferred as part of a δ-proteobacterial island. LGT: 8 CDSs (44% of total) have been acquired by LGT; one large 'island' is of δ-proteobacterial origin. ORFans: 22 (29).
b1bcf11h03. 23S rRNA: δ-Proteobacterium. 16S rRNA: –. CDSs: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium. Six CDSs have likely been acquired by LGT: ORF8 clusters with Clostridium thermocellum and Treponema denticola; ORF18 is separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp.; ORF23 is found in a mixed clade and appears to have been frequently transferred; ORF28 clusters with β-proteobacteria; ORF29 clusters with γ-proteobacteria; and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria. LGT: 6 CDSs (17% of total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12, as well as to the Chlorobium lineage. ORFans: 6 (1).
b1bcf11d04. 23S rRNA: δ-Proteobacterium. 16S rRNA: –. CDSs: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment. Six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has been transferred to other δ-proteobacteria as well. Three CDSs (ORF3–5), encoding an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and are neighbours of ORF19, which has also been acquired from Firmicutes. LGT: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes. ORFans: –.
b1dcf13c08. rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni. CDSs: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA. ORF4 clusters with Geobacter and Clostridium, and ORF23 has no homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria. LGT: 3 CDSs (7% of total) have been acquired by LGT. ORFans: 10 (3).
b3cf12d07. 23S rRNA: γ-Proteobacterium. 16S rRNA: Clusters within the γ-proteobacteria in LogDet distance trees, but at the base of the γ-proteobacteria and β-proteobacteria in the best Maximum Likelihood tree. CDSs: Only 7 CDSs give supported phylogenies; of these, 4 (57%) agree with the rRNA. ORF7 clusters within β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees had only divergent homologues or no significant homologues in GenBank, and may also have been acquired by LGT; in support of this, ORF26 encodes a transposase. LGT: 2 CDSs (8% of total) have been acquired by LGT. ORF16–ORF25 were not included in the estimate owing to limited evidence for their transfer. ORFans: 23 (23).
b1bcf11c04. 23S rRNA: γ-Proteobacterium. 16S rRNA: –. CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA. Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within β-proteobacteria. Three genes that show incongruent phylogenies, but with low bootstrap support, found close to ORF3 and ORF34 have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred. LGT: 5 CDSs (13% of total) have been acquired by LGT. ORFans: 3 (1).
b1bf11a01. 23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA). 16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA). CDSs: High degree of gene synteny compared with Thiobacillus denitrificans; 29 CDSs have their best BLAST match in T. denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes. The exception, ORF30 (RsuA), clusters with γ-proteobacteria and has no homologue in T. denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and T. denitrificans. ORF29 has no significant homologues in proteobacteria. LGT: 4 CDSs (8% of total) have been acquired by LGT. ORFans: 3 (2).
b1bf110d03. 23S rRNA: –. 16S rRNA: A Flavobacteriaceae bacterium; among sequenced genomes, most closely related to Cytophaga hutchinsonii. CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment. ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron. LGT: 3 CDSs (10% of total) have likely been acquired by LGT; the transposon is not included, as it has been transferred within the Bacteroides. ORFans: 10 (3).
(a) Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
(b) ORFans were classified as CDSs with no significant match in GenBank; matches to sequences in the environmental portion of GenBank were not considered. The proportion of protein coding DNA that has no match in GenBank is given in parentheses.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (and in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon (many of the CDSs found in the transposons in B. thetaiotaomicron are not present in the fragment we have sequenced), and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with it. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobes (or, in the case of Symbiobacterium thermophilum, microaerophilic), and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is only distantly related to the OmpA found in this gene cluster and was not included in the alignment.
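The argument from conserved gene order can be checked mechanically by looking for the longest run of consecutive genes shared by two contigs. The sketch below is illustrative only: `longest_shared_block` is a hypothetical helper, genes are compared by name alone, the opposite orientation is approximated by reversing the gene list (strand labels are ignored), and the gene orders in the example are hypothetical rather than taken from b1dcf51c12 or C. tepidum.

```python
from __future__ import annotations


def longest_shared_block(a: list[str], b: list[str]) -> list[str]:
    """Longest run of consecutive genes found in the same order in both lists.

    The second list is tried in both orientations; reversing the list stands
    in for reading the other strand (gene strand labels are ignored).
    """
    best: list[str] = []
    for other in (b, b[::-1]):
        # Longest-common-substring dynamic programme over gene names:
        # run[j] = length of the shared run ending at a[i-1] and other[j-1].
        prev = [0] * (len(other) + 1)
        for i in range(1, len(a) + 1):
            run = [0] * (len(other) + 1)
            for j in range(1, len(other) + 1):
                if a[i - 1] == other[j - 1]:
                    run[j] = prev[j - 1] + 1
                    if run[j] > len(best):
                        best = a[i - run[j]:i]
            prev = run
    return best


# Hypothetical gene orders for two contigs (not taken from the sequenced clones).
contig_1 = ["orfX", "ompA", "tolB", "tonB", "exbD", "tolQ", "orfY"]
contig_2 = ["tolQ", "exbD", "tonB", "tolB", "ompA", "gyrB"]
print(longest_shared_block(contig_1, contig_2))
# ['ompA', 'tolB', 'tonB', 'exbD', 'tolQ']
```

A long shared block only shows that gene order is conserved; as in the text, it supports co-transfer of the whole fragment only when combined with phylogenies of at least some of its genes.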
We also identified, by searching the Pfam database, some mobile genes that might be involved in biodegradation of pollutants. In one of the γ-proteobacterial fosmids, b1bcf11c04, we identified a glutathione S-transferase (GST; ORF36) gene that was flanked by an acetyltransferase gene (ORF35) and a transporter (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bcf11c04 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1; Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS is found in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene ORF30, an S4 domain protein, suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses; however, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
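The flanking inverted repeat can be scored by comparing the first segment with the reverse complement of the second; the descending coordinates (36693–36674) indicate that the second copy is read on the opposite strand. This minimal sketch assumes 1-based inclusive coordinates and an ungapped comparison of equal-length segments. The text does not say how the 80% identity was computed, and the helper names and toy sequence are hypothetical.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def segment(seq: str, start: int, end: int) -> str:
    """Extract a 1-based inclusive segment.

    If start > end (as in 36693-36674), the segment is read on the opposite
    strand, i.e. the reverse complement of seq[end..start] is returned.
    """
    if start <= end:
        return seq[start - 1:end]
    return reverse_complement(seq[end - 1:start])


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity (%) between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Toy example: a 20-nt motif, a 30-nt spacer, then the motif's reverse complement.
left = "ACGTTGCAAGGCTTACCGAT"
toy = left + "A" * 30 + reverse_complement(left)
print(percent_identity(segment(toy, 1, 20), segment(toy, 70, 51)))  # 100.0
```

With a real fosmid sequence, the same call pattern would be `percent_identity(segment(seq, 34021, 34040), segment(seq, 36693, 36674))`; repeats containing indels would need a gapped alignment instead.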
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs that showed a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h03 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show higher G + C content than the rest of the fosmid, with 66.4%, 65.7% and 64.7%, compared with 52.5% for the rest of the fosmid. All these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content than the rest of the fosmid clone, at 25.7%. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these sequences formed a long unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
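The per-CDS G + C screen described in this section is straightforward to sketch. Here the 10% cutoff is interpreted as 10 percentage points relative to the G + C content of the whole fosmid (an assumption: the text does not say whether the difference is absolute or relative), ambiguous bases are ignored, and the function names and toy sequences are hypothetical.

```python
from __future__ import annotations


def gc_percent(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    counts = {base: s.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


def gc_outliers(cds_seqs: dict[str, str], fosmid_seq: str,
                threshold: float = 10.0) -> dict[str, float]:
    """Map each CDS whose G + C deviates from the fosmid by >= threshold
    percentage points to its signed deviation (CDS minus fosmid)."""
    background = gc_percent(fosmid_seq)
    deltas = {name: gc_percent(seq) - background for name, seq in cds_seqs.items()}
    return {name: d for name, d in deltas.items() if abs(d) >= threshold}


fosmid = "AT" * 40 + "GC" * 10  # toy background: 20% G + C
cdss = {"orfA": "GGGGCCCCAT", "orfB": "AAAATTTTGC"}
print(gc_outliers(cdss, fosmid))  # {'orfA': 60.0}
```

As the results above show, such a screen flags few CDSs and misses most transfers supported by phylogenies, so it is at best a complement to tree-based detection.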
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02), excluding the environmental part of GenBank, were significantly higher than in fosmid clones from lineages that have cultivated representatives.
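The text does not say which form of t-test was used or how the per-clone proportions were tabulated. As a hedged sketch, assuming Welch's unequal-variance test applied to per-clone proportions, the statistic and its degrees of freedom can be computed with the Python standard library. Converting t into a P value needs a t-distribution CDF (for example `scipy.stats.t.sf`), which is not shown here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-clone values (e.g. proportion of ORFans) for two groups of clones;
    each group needs at least two values.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical toy values, not data from this study.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 1))  # -3.674 4.0
```

Welch's variant is chosen here because the two groups of clones (uncultivated versus cultivated lineages) need not have equal variances; a pooled-variance Student's test is an equally plausible reading of the original analysis.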
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, in b1dcf51c12 half of the clone is occupied by two CDSs that have no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. They might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, the predicted proteins in this fosmid tend to be smaller than those observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1 we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS repeat-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to the DNA polymerase III subunit A homologues DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene, the exonuclease domain, and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as in S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype-ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the REAL Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the 23S rRNA–5S rRNA direction for the present study. A vector for cloning in the 23S rRNA–16S rRNA direction was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After the vector was cleaned from the gel, it was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI after the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
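The adaptor sequence given above can be checked in silico. The sketch below assumes the 27-bp I-CeuI recognition sequence TAACTATAACGGTCCTAAGGTAGCGAA as commonly listed by enzyme suppliers (Marshall and Lemieux, 1992, report that I-CeuI recognizes a 19-bp core), and the PmlI site CACGTG, which PmlI cuts bluntly (CAC^GTG). The function names are hypothetical. It reports where each site lies in the adaptor: two extra bases, then the I-CeuI site, then the PmlI site that supplies the blunt end.

```python
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"  # assumed full I-CeuI recognition sequence
PMLI_SITE = "CACGTG"                         # PmlI site, cut CAC^GTG (blunt)
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor as given in the text


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, site: str) -> list:
    """(strand, 0-based start) of every occurrence of `site` on either strand.

    Starts are given in top-strand coordinates; palindromic sites are reported once.
    """
    motifs = [("+", site)]
    if revcomp(site) != site:
        motifs.append(("-", revcomp(site)))
    hits = []
    for strand, motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(motif, start + 1)
    return hits


print(find_sites(ADAPTOR, ICEUI_SITE))  # [('+', 2)]
print(find_sites(ADAPTOR, PMLI_SITE))   # [('+', 29)]
```

Because the I-CeuI site is not palindromic, its orientation in the vector fixes which flank of the 23S rRNA gene is captured, which is how separate 23S–5S and 23S–16S cloning vectors can be made.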
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17, estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. [Tree graphic not reproduced; one labelled clade comprises the PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.]

Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region, positions 1192–1342 in b1dcf13c08, we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified with the run-glimmer2 script using the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where CDSs with no match in GenBank were identified, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If an alternative CDS that did have a match in GenBank was obtained using ORF-finder, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
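The two CDS-curation rules stated in the methods (discard CDSs shorter than 100 bp; when two predicted CDSs overlap, keep the one with significant GenBank homologues) can be sketched as a post-processing step on gene calls. This is an illustrative Python sketch, not the authors' pipeline: the `CDS` record, the `has_hit` flag (standing in for a BLASTP result) and the tie-breaking rule are assumptions, and the code does not run Glimmer, ORF-finder or BLAST.

```python
from dataclasses import dataclass


@dataclass
class CDS:
    name: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    has_hit: bool   # hypothetical flag: significant BLASTP hit in GenBank

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: CDS, b: CDS) -> bool:
    """True if the two CDS intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def filter_cdss(cdss: list, min_len: int = 100) -> list:
    """Drop short CDSs, then resolve overlaps in favour of CDSs with GenBank hits.

    Simplification: each CDS is compared only with the first kept CDS it overlaps;
    if neither or both have hits, the earlier call is kept (assumed tie-break).
    """
    long_enough = sorted((c for c in cdss if c.length >= min_len), key=lambda c: c.start)
    kept = []
    for cds in long_enough:
        clash = next((i for i, k in enumerate(kept) if overlaps(k, cds)), None)
        if clash is None:
            kept.append(cds)
        elif cds.has_hit and not kept[clash].has_hit:
            kept[clash] = cds
    return kept


calls = [
    CDS("orf1", 1, 90, True),      # 90 bp: removed by the length filter
    CDS("orf2", 100, 400, False),
    CDS("orf3", 350, 700, True),   # overlaps orf2 and has a hit, so it replaces orf2
    CDS("orf4", 800, 1100, False),
]
print([c.name for c in filter_cdss(calls)])  # ['orf3', 'orf4']
```

The ORF-finder/BLASTX rescue step for CDSs without GenBank matches is deliberately left out, since it depends on external searches.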
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and of the 16S rRNA genes were carried out in PAUP* (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01 we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were constructed for all CDSs with significant matches in BLASTP searches. If the phylogeny of a CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a major bacterial group other than that of the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If these trees supported a phylogenetic grouping different from that observed for the rRNA (with bootstrap support >50%), the CDS was classified as acquired by LGT. Note that we only classified transfers between bacterial groups or phyla as LGT, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. In some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, although for such phylogenies it is not possible to identify the donors and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we did only one random addition of sequences per bootstrap replicate.
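The decision rule used to score LGT can be written down compactly. This is a hedged Python sketch that assumes each CDS has already been placed in a protein tree (tree building itself is not reproduced). The group labels, the return strings and the name `classify_cds` are hypothetical; the >50% bootstrap criterion, the restriction to between-group transfers and the treatment of mixed clades follow the text.

```python
from typing import Optional


def classify_cds(rrna_group: str, cds_group: Optional[str], bootstrap: float,
                 mixed_clade: bool = False, threshold: float = 50.0) -> str:
    """Classify one fosmid CDS against the clone's rRNA phylotype.

    rrna_group:  major bacterial group assigned from the rRNA genes.
    cds_group:   major group the CDS clusters with in the protein tree (None if unplaced).
    bootstrap:   bootstrap support (%) for that placement.
    mixed_clade: True if the CDS falls in a clade mixing several major groups.
    """
    if cds_group is None or bootstrap <= threshold:
        return "unsupported"       # placement not supported above the threshold
    if mixed_clade or cds_group != rrna_group:
        return "LGT"               # between-group transfer; donor may be unidentifiable
    return "agrees with rRNA"      # within-group transfers are not scored as LGT


print(classify_cds("delta-proteobacteria", "Firmicutes", 87))  # LGT
```

Because groups are compared only at the level of major lineages, a transfer between two δ-proteobacteria, for example, is reported as agreement, mirroring the conservative scope of the LGT estimate.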
All sequences have been submitted to GenBank with accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes for Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with E-value < 1e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits (E-value < 1e−10). Black bars refer to end-sequences and grey bars to the sequenced fosmid clones.
Tables S1–S12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
Table 1. cont.

Columns: Fosmid; Phylogeny (23S rRNA, 16S rRNA); LGT^a (evidence from CDS phylogenetic trees and from phylogenetic distribution or genome context; total); ORFans^b.

b1dcf51c12
  23S rRNA: Candidate division WS3 bacterium
  LGT evidence: Most CDSs have no or only a few significant matches in GenBank. ORF6–ORF11 are also found in b1bcf11h03 in the same order, and phylogenetic analysis supports that ORF7, ORF8 and ORF10 were transferred from a δ-proteobacterium to b1dcf51c12. ORF10 and ORF11 also cluster with δ-proteobacteria, however with no bootstrap support. ORF9 has only one match in GenBank. ORF15 (fusA) clusters with Chlorobium tepidum within the Firmicutes. ORF12 has no homologue in b1bcf11h03 but does cluster with δ-proteobacteria, however with no bootstrap support; it is likely that this CDS was also transferred as part of a δ-proteobacterial island.
  LGT total: 8 CDSs (44% of total) have been acquired by LGT; one large 'island' of δ-proteobacterial origin.
  ORFans: 22 (29%)

b1bcf11h03
  23S rRNA: δ-Proteobacterium
  16S rRNA: –
  LGT evidence: 8 of 13 CDSs (57%) that give supported phylogenies agree with the fragment originating from a δ-proteobacterium. Six CDSs have likely been acquired by LGT: ORF8 clusters with Clostridium thermocellum and Treponema denticola; ORF18 is separated from other proteobacteria in phylogenetic trees, clustering with Plasmodium spp.; ORF23 is found in a mixed clade and appears to have been frequently transferred; ORF28 clusters with β-proteobacteria; ORF29 clusters with γ-proteobacteria; and ORF30 is found at the bottom of a clade that contains α-proteobacteria and Actinobacteria.
  LGT total: 6 CDSs (17% of total) have been acquired by LGT. ORF11–ORF16 have been transferred from an ancestor of b1bcf11h03 to b1dcf51c12 as well as to the Chlorobium lineage.
  ORFans: 6 (1%)

b1bcf11d04
  23S rRNA: δ-Proteobacterium
  16S rRNA: –
  LGT evidence: 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment. Six CDSs are suggested by phylogenetic analyses to have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12; this CDS has been transferred to other δ-proteobacteria as well. Three CDSs (ORF3–5), which encode an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and are neighbours of ORF19, which has also been acquired from Firmicutes.
  LGT total: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
  ORFans: –

b1dcf13c08
  23S rRNA: ε-Proteobacterium, most closely related to Campylobacter jejuni
  LGT evidence: 21 CDSs give supported phylogenies, and of these 19 (90%) agree with the rRNA. ORF4 clusters with Geobacter and Clostridium, and ORF23 has no homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria.
  LGT total: 3 CDSs (7% of total) have been acquired by LGT.
  ORFans: 10 (3%)

b3cf12d07
  23S rRNA: γ-Proteobacterium; clusters within the γ-proteobacteria in LogDet distance trees but at the base of the γ- and β-proteobacteria in the best Maximum Likelihood tree
  LGT evidence: Only 7 CDSs give supported phylogenies. Of these, 4 (57%) agree with the rRNA. ORF7 clusters within β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) that did not produce well-resolved trees, and that had only divergent homologues or no significant homologues in GenBank, may also have been acquired by LGT. In support of this, ORF26 encodes a transposase.
  LGT total: 2 CDSs (8% of total) have been acquired by LGT. ORF16–ORF25 were not included in this estimate because of limited evidence for their transfer.
  ORFans: 23 (23%)

b1bcf11c04
  23S rRNA: γ-Proteobacterium
  16S rRNA: –
  LGT evidence: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with the rRNA. Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within β-proteobacteria. Three genes that show incongruent phylogenies, but with low bootstrap support, and that are found close to ORF3 and ORF34 have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred.
  LGT total: 5 CDSs (13% of total) have been acquired by LGT.
  ORFans: 3 (1%)

b1bf11a01
  23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA)
  16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA)
  LGT evidence: High degree of gene synteny compared with T. denitrificans; 29 CDSs have their best BLAST match in T. denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes. One, ORF30 (RsuA), clusters with γ-proteobacteria and has no homologue in T. denitrificans. Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and T. denitrificans. ORF29 has no significant homologues in proteobacteria.
  LGT total: 4 CDSs (8% of total) have been acquired by LGT.
  ORFans: 3 (2%)

b1bf110d03
  23S rRNA: –
  16S rRNA: A Flavobacteriaceae bacterium; among sequenced genomes, most closely related to Cytophaga hutchinsonii
  LGT evidence: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment. ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer. ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron.
  LGT total: 3 CDSs (10% of total) have likely been acquired by LGT. The transposon is not included, as it has been transferred within the Bacteroides.
  ORFans: 10 (3%)

a. Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is given under 'LGT total'.
b. ORFans were classified as CDSs with no significant match in GenBank. Matches to sequences in the environmental portion of GenBank were not considered. The proportion of protein coding DNA that has no match in GenBank is given in parentheses.
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA gene and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon was acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, because many of the CDSs found in the transposons in B. thetaiotaomicron are not present in the fragment we sequenced, and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with this transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobic or microaerophilic (Symbiobacterium thermophilum), and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were only obtained for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly
Fig 4 Maximum Likelihood phylogeny fusA homologues estimated using PMBML (661 positions in alignment) The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank and the 100 best matches where retrieved and aligned Groups of very similar sequences from the same species or sister species where trimmed down to one sequence representative The tree was arbitrarily rooted by Aquifex aeolicus Results from bootstrap analyses are indicated as in Fig 3
10
Aquifex aeolicus Thermotoga maritima
Chlorobium tepidum b1dcf51c12ORF15
b1bcf11d04ORF19Desulfovibrio vulgaris
Desulfotalea psychrophila Magnetococcus sp MC-1
Geobacter sulfurreducens Geobacter metallireducens
Moorella thermoacetica Desulfitobacterium hafniense
Symbiobacterium thermophilum Chloroflexus aurantiacus
Dehalococcoides ethenogenesThermoanaerobacter tengcongensis
Clostridium thermocellumFusobacterium nucleatum
Clostridium perfringensClostridium tetani
Thermus thermophilus Rubrobacter xylanophilus
Mycoplasma penetransUreaplasma parvum
Geobacillus stearothermophilusExiguobacterium sp 255-15
Bacillus cereus Bacillus halodurans
Listeria monocytogenes Bacillus subtilis
Oceanobacillus iheyensis Staphylococcus aureus
Lactobacillus johnsonii Pediococcus pentosaceusLactobacillus plantarum
Enterococcus faecalisLactococcus lactis
Streptococcus mutans Streptococcus agalactiae
Moorella thermoacetica Symbiobacterium thermophilum
Thermoanaerobacter tengcongensis Clostridium thermocellum
Clostridium acetobutylicumClostridium perfringens
Clostridium tetani Chlorobium tepidum
Fusobacterium nucleatumThermobifida fusca
Desulfovibrio desulfuricansMagnetococcus sp MC-1
Geobacter sulfurreducensSynechococcus elongatus
Prochlorococcus marinus Synechococcus sp WH 8102
Thermosynechococcus elongatus Nostoc punctiforme
Synechocystis sp PCC 6803 Trichodesmium erythraeum
Spirulina platensis Campylobacter jejuni Helicobacter pylori Wolinella succinogenes
Legionella pneumophilaMethylococcus capsulatus
Coxiella burnetii Photorhabdus luminescens
Pasteurella multocida Shewanella oneidensis Photobacterium profundum Vibrio parahaemolyticusNeisseria meningitidis
Chromobacterium violaceum Bordetella parapertussis
Ralstonia metallidurans Bordetella bronchiseptica Burkholderia pseudomalleiRalstonia metallidurans
Azoarcus sp EbN1 Dechloromonas aromatica
Nitrosomonas europaea Thiobacillus denitrificans
66
57 65 55
61
5160
9072
80
86
88
6090
63
50 52 75 74
9094
50 68 74
78
53
7985
8481
72
53 9968
7790
70
2022 C L Nesboslash Y Boucher M Dlutek and W F Doolittle
copy 2005 Society for Applied Microbiology and Blackwell Publishing Ltd Environmental Microbiology 7 2011ndash2026
related to the OmpA found in this gene cluster and wasnot included in the alignment
We also identified some mobile genes that might be involved in biodegradation of pollutants by searching the Pfam database. In one of the γ-proteobacterial fosmids, b1bcf11c04, we identified a glutathione-S-transferase (GST; ORF36) gene that was flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Given that the fosmid originated from highly polluted marine sediments, these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bcf11c04 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01. ORF31 encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004). This suggests that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, this CDS is found in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene ORF30, an S4 domain protein, suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
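An identity value for an inverted repeat like the one above comes from comparing one arm with the reverse complement of the other. The coordinates follow the convention of the text, where a descending range (36693–36674) denotes the opposite strand. The sketch below shows that comparison under stated assumptions. The helper names (`revcomp`, `segment`, `percent_identity`) are hypothetical. Identity is taken as an ungapped, position-by-position match over equal-length arms. The fosmid sequence itself is not reproduced here.

```python
# Sketch (assumptions: 1-based inclusive coordinates, ungapped comparison,
# a descending range means "read on the opposite strand").
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def segment(seq: str, start: int, end: int) -> str:
    """Extract positions start..end (1-based, inclusive).

    If start > end, the segment is read on the opposite strand, i.e. the
    reverse complement of forward positions end..start is returned.
    """
    if start <= end:
        return seq[start - 1:end]
    return revcomp(seq[end - 1:start])


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("arms must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# With the fosmid sequence loaded as `fosmid`, the repeat in the text would be
# compared as: percent_identity(segment(fosmid, 34021, 34040),
#                               segment(fosmid, 36693, 36674))
```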
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these sequences formed a long unstable branch in the tree. Some sequences that were considerably shorter than the remaining alignment were removed as well. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs that showed a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h03 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show a higher G + C content than the rest of the fosmid (66.4%, 65.7% and 64.7%, compared with 52.5%). All these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08 is a short ORFan with a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a G + C content of 25.7%, marginally lower than the rest of the fosmid clone. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
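The G + C screen described above can be sketched in a few lines. This is a minimal sketch, not the authors' script. Two readings are assumptions. First, "10% higher or lower" is taken as a difference of at least 10 percentage points, which matches the flagged and marginal cases listed above. Second, only unambiguous A/C/G/T bases are counted. The function names are hypothetical. The example uses the G + C values given for clone b1dcf13c08 (ORF40, ORF9 and the whole clone).

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(cds_gc: dict[str, float], fosmid_gc: float,
                     threshold: float = 10.0) -> list[str]:
    """Names of CDSs whose G + C content differs from the fosmid value by at
    least `threshold` percentage points (assumed reading of '10%')."""
    return [name for name, gc in cds_gc.items()
            if abs(gc - fosmid_gc) >= threshold]


# Clone b1dcf13c08: ORF40 is flagged, while ORF9 (about 9 points lower) is not.
print(flag_gc_outliers({"ORF40": 22.2, "ORF9": 25.7}, fosmid_gc=34.7))  # ['ORF40']
```

A fixed threshold in percentage points is simple and easy to report. It ignores CDS length, however, and very short ORFans such as ORF1 and ORF40 have noisier G + C estimates.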
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs that have no significant matches in GenBank or only partial matches to known proteins (Fig. 1, Table 1). We compared these clones with fosmid clones from lineages that have cultivated representatives, excluding the environmental part of GenBank. A t-test showed that two proportions were significantly higher in the uncultivated-lineage clones: the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02).
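The comparison above is described only as "a t-test". The sketch below assumes a two-sample Student's t-test with pooled variance, applied to one proportion per fosmid in each group. The exact variant and inputs are not stated in the text. It computes the statistic with the Python standard library, and `pooled_t_test` is a hypothetical helper name. A p-value follows from the t distribution with the returned degrees of freedom. For example, `scipy.stats.ttest_ind(a, b)` runs this same pooled test by default (`equal_var=True`).

```python
import math
import statistics


def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance.

    a, b: one value per fosmid for each group (e.g. the fraction of CDSs
    that are ORFans). Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled estimate of the variance shared by both groups
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, na + nb - 2
```

With only four uncultivated-lineage clones, the result depends heavily on the normality assumption. A permutation test on the same per-fosmid values would be a reasonable cross-check.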
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs that have no match in GenBank or for which we can make no clear functional prediction. For instance, half of b1dcf51c12 is occupied by two CDSs: ORF4, which has no significant matches in GenBank, and ORF5, which has only a single match. Neither of these CDSs had significant matches to domains in Pfam. They might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage are available. The candidate division OP8 clone also contains a number of ORFans. In this fosmid, however, the predicted proteins tend to be smaller than those in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) and several smaller ORFans (ORF5, ORF7–9, ORF14). It also encodes CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, Pfam searches allow some functional prediction. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS). Phycobilisomes are the light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS lyase repeats. The proteins encoded by the PBS-repeat-containing CDSs in b1dcf51a06 may therefore have a function similar to that of the PBS lyase proteins in cyanobacteria. If so, this fosmid clone may have originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to homologues of the DNA polymerase III alpha subunit: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene, the exonuclease domain. This CDS is also fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly in Firmicutes, and also in S. thermophilum and Chloroflexus aurantiacus. This suggests that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, bootstrap analyses did not support the monophyly of this clade. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters with C. aurantiacus at the base of this clade. No non-fusion proteins are found in this clade, which suggests a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex. It remains a challenge to link what they reveal about the metabolic potential of an environment to the organismal lineages that may be present there. Here we have shown that rRNA-targeted cloning and phylogenetic analysis of CDSs can help make this link. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics). This makes phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges of a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But we have reason for confidence in the progress made in understanding our own genome, given that only 20 years ago the idea of sequencing it was not widely supported.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), with one change: after preparative pulsed-field gel electrophoresis, we cleaned the DNA using the GELase kit from Epicentre instead of electroeluting it.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the REAL Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the 23S rRNA–5S rRNA direction for the present study. We also constructed a vector for cloning in the 23S rRNA–16S rRNA direction (pCC1FosCeuI16S), which is available from the authors. The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. DNA was ligated into pCC1FosCeuI23S as described above, except that the DNA was cut overnight with I-CeuI after the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
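A quick sequence check can confirm that the adaptor carries both sites the construction depends on. The adaptor appears to consist of two bases (CG), a full I-CeuI recognition sequence and a PmlI site (CAC^GTG, which leaves blunt ends). This fits the description above of an I-CeuI site plus a blunt-end site opened with PmlI. The recognition sequences in the sketch are assumptions taken from commonly listed enzyme data, not from the text, and `find_sites` is a hypothetical helper.

```python
# Assumed recognition sequences (commonly listed values, not from the text)
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"  # I-CeuI homing endonuclease, 27 bp
PMLI_SITE = "CACGTG"                        # PmlI, cuts CAC^GTG (blunt ends)

# Adaptor ligated into pCC1Fos, as given in the text
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """0-based start positions of `site` on either strand of `seq`.

    A palindromic site (identical to its reverse complement) is reported
    once, on the '+' strand.
    """
    patterns = {site: "+"}
    rc = revcomp(site)
    if rc != site:
        patterns[rc] = "-"
    hits = []
    for pattern, strand in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


print(find_sites(ADAPTOR, ICEUI_SITE))  # [(2, '+')]
print(find_sites(ADAPTOR, PMLI_SITE))   # [(29, '+')]
```

The same function, run on a full vector or insert sequence, would show whether an extra I-CeuI or PmlI site is present. An extra site would interfere with the I-CeuI/PmlI cloning strategy.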
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17, estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. Label in the tree: PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.

Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). We were not able to obtain high-quality sequence for one low-quality region (positions 1192–1342 in b1dcf13c08). The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with its standard settings (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. Where two CDSs overlapped, we selected the one that had significant homologues in GenBank. For CDSs with no match in GenBank, we analysed the region with ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally with BLASTX searches. If ORF-finder produced an alternative CDS that did have a match in GenBank, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
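The two selection rules stated above can be sketched as a small post-processing step on Glimmer predictions. The rules are: discard CDSs shorter than 100 bp, and, for an overlapping pair, keep the one with a significant GenBank homologue. This sketch is not the authors' code. The `CDS` record and its `has_genbank_hit` flag are hypothetical, and coordinates are assumed normalised so that start <= end on both strands. The text does not say what happens when both or neither CDS of a pair has a homologue. Here the earlier CDS is kept, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CDS:
    name: str
    start: int             # 1-based, inclusive; start <= end on both strands
    end: int
    has_genbank_hit: bool  # significant homologue found in GenBank

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: CDS, b: CDS) -> bool:
    return a.start <= b.end and b.start <= a.end


def select_cdss(cdss: list[CDS], min_len: int = 100) -> list[CDS]:
    """Drop CDSs shorter than min_len bp, then resolve overlaps, preferring
    the CDS with a GenBank homologue (ties keep the earlier CDS: assumption)."""
    kept: list[CDS] = []
    for cds in sorted((c for c in cdss if c.length >= min_len),
                      key=lambda c: c.start):
        if kept and overlaps(kept[-1], cds):
            # Replace the earlier CDS only if this one has a homologue and it does not
            if cds.has_genbank_hit and not kept[-1].has_genbank_hit:
                kept[-1] = cds
            continue
        kept.append(cds)
    return kept
```

The ORF-finder and BLASTX rescue step for CDSs without GenBank matches is left out. That step depends on external searches that this sketch does not model.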
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and the 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances. Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). In both cases we used ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps.

Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were built for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a major bacterial group other than that of the rRNA, a second tree was estimated. This was a minimum evolution tree estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates], with bootstrap analysis (100 replicates with global rearrangements). If these trees supported a phylogenetic grouping different from that of the rRNA, with bootstrap support >50%, the CDS was classified as having been acquired by LGT. We classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria. No within-group transfers were included. In some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). We classified such CDSs as acquired by LGT, although the donor and recipients cannot be identified from these phylogenies. For some LGT CDSs, we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses, we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
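The final decision rule for calling a CDS laterally transferred can be written as a small function. The rule compares the CDS placement with the rRNA phylotype at the level of major bacterial groups, requires bootstrap support above 50%, and counts mixed-clade placements as LGT. This is a hedged sketch of that rule only. It assumes tree building (Neighbour-joining screen, then minimum evolution or Maximum Likelihood) has already produced, for each CDS, the major group it clusters with and the support for that placement. The `CDSPlacement` record and the group labels are hypothetical, and `None` is used here to mean "mixed clade".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CDSPlacement:
    name: str
    rrna_group: str           # major group of the fosmid's rRNA phylotype
    cds_group: Optional[str]  # major group the CDS clusters with; None = mixed clade
    bootstrap: float          # bootstrap support (%) for that placement


def classify(p: CDSPlacement, min_support: float = 50.0) -> str:
    """Apply the between-group LGT criterion (support must exceed min_support)."""
    if p.bootstrap <= min_support:
        return "unresolved"         # no supported topology
    if p.cds_group is None:
        return "LGT (mixed clade)"  # donor and recipient cannot be identified
    if p.cds_group != p.rrna_group:
        return "LGT"                # supported placement in another major group
    return "congruent"              # agrees at group level; within-group transfers not counted
```

Comparing labels at the group level is what excludes within-group transfers: a CDS moved between two γ-proteobacteria is reported as congruent.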
All sequences have been submitted to GenBank with Accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, on environmental microbiology and on LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP: Phylogenetic Analysis Using Parsimony (and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with exp < 10e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits with exp < 10e−10. Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
Columns: Fosmid | Phylogeny (23S rRNA; 16S rRNA) | LGT [a] (CDSs: phylogenetic trees; phylogenetic distribution or genome context; total) | ORFans [b]

b1bcf11d04
Phylogeny (23S rRNA; 16S rRNA): δ-Proteobacterium; –
LGT, CDSs (phylogenetic trees; phylogenetic distribution or genome context): 12 of 18 CDSs (67%) with supported phylogenetic topologies agree with a δ-proteobacterial origin of the fragment. Phylogenetic analyses suggest that six CDSs have been acquired by LGT. One of these transferred genes, the fusA homologue (ORF19), is also found in b1dcf51c12. This CDS has also been transferred to other δ-proteobacteria. Three CDSs (ORF3–5), encoding an integrase and two transposases, precede four of the LGT genes detected in the phylogenetic analysis. ORF7 was also likely transferred with ORF3–ORF10. ORF20 and ORF21 have homologues mainly in Firmicutes and lie next to ORF19, which has also been acquired from Firmicutes.
LGT, total: 12 CDSs (31% of total) have been acquired by LGT. Interestingly, this fosmid clone provides the transfer vector (the integrase and transposase) for 8 of the transferred genes.
ORFans: –

b1dcf13c08
Phylogeny (23S rRNA; 16S rRNA): ε-Proteobacterium most closely related to Campylobacter jejuni
LGT, CDSs (phylogenetic trees; phylogenetic distribution or genome context): 21 CDSs give supported phylogenies, and of these 19 (90%) agree with rRNA. ORF4 clusters with Geobacter and Clostridium. ORF23 has no homologues in ε-proteobacteria and clusters with γ- and β-proteobacteria. ORF24 does not give a supported tree but has probably also been transferred from γ- or β-proteobacteria.
LGT, total: 3 CDSs (7% of total) have been acquired by LGT.
ORFans: 10 (3)

b3cf12d07
Phylogeny (23S rRNA; 16S rRNA): γ-Proteobacterium. Clusters within the γ-proteobacteria in LogDet distance trees, but at the base of the γ-proteobacteria and β-proteobacteria in the best Maximum Likelihood tree.
LGT, CDSs (phylogenetic trees; phylogenetic distribution or genome context): Only 7 CDSs give supported phylogenies. Of these, 4 (57%) agree with rRNA. ORF7 clusters within β-proteobacteria. ORF15 has a patchy distribution and does not cluster with other proteobacteria in the phylogenetic tree. Several additional CDSs (ORF16–ORF25) did not produce well-resolved trees and had only divergent homologues or no significant homologues in GenBank. These CDSs may also have been acquired by LGT, which is supported by the fact that ORF26 encodes a transposase.
LGT, total: 2 CDSs (8% of total) have been acquired by LGT. ORF16–ORF25 were not included in the estimate because evidence for their transfer is limited.
ORFans: 23 (23)
2020 C L Nesboslash Y Boucher M Dlutek and W F Doolittle
copy 2005 Society for Applied Microbiology and Blackwell Publishing Ltd Environmental Microbiology 7 2011ndash2026
b1bc
f1c
04γ-
Pro
teob
acte
rium
ndash14
CD
Ss
give
sup
port
edph
ylog
enie
s an
d of
thes
e 13
(93
)
agre
ew
ith r
RN
A
Phy
loge
netic
ana
lyse
ssh
ow t
hat
two
CD
Ss
have
bee
n ac
quire
d by
LGT
OR
F3
is f
ound
in a
mix
ed c
lade
whi
leO
RF
30 c
lust
er w
ithin
β-
prot
eoba
cter
ia
Thr
ee g
enes
tha
t sh
owun
cong
ruen
tph
ylog
enie
s b
utw
ith lo
w b
oots
trap
supp
ort
foun
d cl
ose
to O
RF
3 an
d O
RF
34ha
ve p
roba
bly
also
been
acq
uire
d by
LGT
O
RF
5 cl
uste
rsw
ith β
-pro
teob
acte
ria
OR
F31
clu
ster
s w
ithδ-
prot
eoba
cter
ia
and
OR
F32
(G
ST
) cl
uste
rsw
ith a
γ-pr
oteo
bact
eriu
m
but
appe
ars
toha
ve b
een
freq
uent
lytr
ansf
erre
d
5 C
DS
s (1
3 o
f to
tal)
hav
e b
een
ac
qu
ired
by
LG
T
3 (
1)
b1bf
11
a01
β-P
rote
obac
teriu
m
clos
ely
rela
ted
toT
hiob
acill
usde
nitr
ifica
ns (
98
iden
tity
at 2
3S
rRN
A)
β-P
rote
obac
teriu
m
clos
ely
rela
ted
toT
hiob
acill
usde
nitr
ifica
ns(9
8 id
entit
yat
16S
rR
NA
)
Hig
h de
gree
of
gene
sy
nten
y co
mpa
red
with
Thi
obac
illus
de
nitr
ifica
ns
29 C
DS
sha
ve b
est
BLA
ST
mat
chin
Thi
obac
illus
de
nitr
ifica
ns 2
7 of
28
CD
Ss
(96
) th
at g
ive
stat
istic
ally
sup
port
edph
ylog
enie
s ag
ree
with
rR
NA
gen
es
One
OR
F30
(R
suA
)cl
uste
r w
ith γ
-pr
oteo
bact
eria
and
has
no
hom
olog
ue in
T
hiob
acill
us d
enitr
ifica
ns
Two
CD
Ss
(OR
F14
and
O
RF
31)
have
bee
n tr
ansf
erre
d to
bot
h fo
smid
an
d T
hiob
acill
us
deni
trifi
cans
OR
F29
has
no
sign
ifica
nt
hom
olog
ues
inpr
oteo
bact
eria
4 C
DS
s (8
o
f to
tal)
h
ave
bee
n
acq
uir
ed b
y L
GT
3 (
2)
b1bf
110
d03
ndashA
Fla
voba
cter
iace
aeba
cter
ium
am
ong
sequ
ence
dge
nom
es m
ost
clos
ely
rela
ted
toC
ytop
haga
hutc
hins
onii
16 o
f 18
(84
) C
DS
s w
ith
supp
orte
d ph
ylog
enet
icto
polo
gies
agr
ee w
ith16
S f
ragm
ent
OR
F5
and
OR
F10
hav
e no
cl
ose
hom
olog
ues
in
othe
r B
acte
roid
es a
ndph
ylog
enet
ic a
naly
sis
sugg
ests
fre
quen
ttr
ansf
er
OR
F4
has
no d
etec
tabl
eho
mol
ogue
s in
oth
er
Bac
tero
ides
A
tran
spos
on w
ith 8
C
DS
s lik
ely
acqu
ired
from
rel
ativ
e of
Bac
tero
ides
thet
aiot
aoim
icro
n
3 C
DS
s (1
0 o
f to
tal)
h
ave
likel
y b
een
acq
uir
ed b
y L
GT
The
tra
nspo
son
not
incl
uded
as
it ha
sbe
en t
rans
ferr
edw
ithin
the
B
acte
roid
es
10
(3
)
Fos
mid
Phy
loge
ny
LGT
a
O
RFa
nsb
23S
rR
NA
16S
rR
NA
CD
Ss
Phy
loge
netic
tre
esP
hylo
gene
tic d
istr
ibut
ion
or g
enom
e co
ntex
tTo
tal
a O
nly
LGT
eve
nts
invo
lvin
g th
e C
DS
fro
m t
he fo
smid
clo
ne a
naly
sed
was
cou
nted
and
onl
y w
hen
they
wer
e su
ppor
ted
by p
hylo
gene
tic a
naly
ses
or c
lear
phy
loge
netic
dis
trib
utio
n pa
ttern
s (i
e
the
gene
is n
ot p
rese
nt in
its
rRN
A g
roup
but
pre
sent
in s
ome
othe
r di
stin
ct b
acte
rial g
roup
) N
umbe
r of
CD
Ss
acqu
ired
by L
GT
is s
how
n in
bol
db
O
RFa
ns w
here
cla
ssifi
ed a
s C
DS
s w
ith n
o si
gnifi
cant
mat
ch in
Gen
Ban
k M
atch
es t
o se
quen
ces
in t
he e
nviro
nmen
tal p
ortio
n of
Gen
Ban
k w
ere
not
cons
ider
ed I
n pa
rent
hesi
s is
giv
en t
he
prop
ortio
n of
pro
tein
cod
ing
DN
A t
hat
has
no m
atch
in G
enB
ank
Tab
le 1
co
nt
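The ORFan column combines a count with a length-weighted proportion (footnote b). The sketch below shows one way to compute both for a single clone. It assumes each CDS already carries a flag saying whether it had a significant non-environmental GenBank match, and that the denominator is the total length of all CDSs in the clone. The `CDS` class and the example lengths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CDS:
    name: str
    length_bp: int
    has_genbank_hit: bool  # significant match outside the environmental portion of GenBank


def orfan_summary(cdss: list[CDS]) -> tuple[int, float]:
    """Return (number of ORFans, % of protein coding bases with no GenBank match)."""
    orfans = [c for c in cdss if not c.has_genbank_hit]
    coding_bp = sum(c.length_bp for c in cdss)
    orfan_bp = sum(c.length_bp for c in orfans)
    return len(orfans), (100.0 * orfan_bp / coding_bp if coding_bp else 0.0)


# Hypothetical clone with three CDSs, one of which is an ORFan.
clone = [CDS("ORF1", 900, True), CDS("ORF2", 300, False), CDS("ORF3", 1800, True)]
print(orfan_summary(clone))  # (1, 10.0)
```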
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, because many of the CDSs found in the transposons of B. thetaiotaomicron are not present in the fragment we have sequenced, and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with the transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobes or, in the case of Symbiobacterium thermophilum, microaerophilic, and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly
Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
related to the OmpA found in this gene cluster and was not included in the alignment.
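The argument above rests on conserved gene order across genomes. The sketch below shows the simplest form of that check: it tests whether a gene cluster appears as a contiguous block in another genome, in either orientation. The gene lists are hypothetical; the order in which the text names the genes is not necessarily their order on the fosmids. Real synteny analyses usually tolerate insertions and rearrangements, which this strict version does not.

```python
def order_conserved(cluster: list[str], genome: list[str]) -> bool:
    """True if `cluster` occurs as a contiguous block in `genome`, in either orientation."""
    k = len(cluster)
    reverse = cluster[::-1]
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == cluster or window == reverse:
            return True
    return False


# Hypothetical gene orders (not taken from the fosmids).
cluster = ["ompA", "tolB", "tonB", "exbD", "tolQ"]
genome_1 = ["hypA", "tolQ", "exbD", "tonB", "tolB", "ompA", "hypB"]  # reversed block
genome_2 = ["ompA", "tonB", "tolB", "exbD", "tolQ"]                  # rearranged
print(order_conserved(cluster, genome_1), order_conserved(cluster, genome_2))  # True False
```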
We also identified, by searching the Pfam database, some mobile genes that might be involved in the biodegradation of pollutants. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione-S-transferase (GST, ORF36) gene flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs are good candidates for genes involved in the biodegradation of a xenobiotic compound. The b1bf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in the biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS falls in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene ORF30, which encodes an S4 domain protein, suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
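The inverted repeat is reported as a percent identity between two 20-bp arms, with the second coordinate pair given in descending order (that is, read on the opposite strand). One way to score such a pair is to compare the left arm with the reverse complement of the right arm, position by position and without gaps. The sketch below does this. The sequences are made-up 20-bp examples rather than the fosmid sequence, and ungapped scoring is an assumption, since the text does not state how identity was computed.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_identity(left_arm: str, right_arm: str) -> float:
    """Ungapped % identity between the left arm and the reverse complement of the right arm.

    Both arms are given 5'->3' on the same (top) strand.
    """
    rc = reverse_complement(right_arm).upper()
    left = left_arm.upper()
    if len(left) != len(rc):
        raise ValueError("arms must be the same length for an ungapped comparison")
    matches = sum(a == b for a, b in zip(left, rc))
    return 100.0 * matches / len(left)


# Hypothetical 20-bp arms: the right arm is the reverse complement of a copy of
# the left arm carrying 4 substitutions, so 16 of 20 positions match.
left = "ATGCATGCATGCATGCATGC"
right = reverse_complement("TTGCTTGCTTGCTTGCATGC")
print(inverted_repeat_identity(left, right))  # 80.0
```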
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs with a G + C content 10% higher or lower than the average for the respective fosmid clone (see Tables S1–S12). ORF20 in the δ-proteobacterial clone b1bcf11h3 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show higher G + C content than the rest of the fosmid (66.4%, 65.7% and 64.7%, compared with 52.5% for the rest of the fosmid). All these CDSs
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from the Chlamydiaceae, as these sequences formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40, a short ORFan in the ε-proteobacterial clone b1dcf13c08, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content than the rest of the fosmid clone (25.7%). Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
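The screen described above compares each CDS with the G + C content of its whole fosmid. The sketch below is a minimal version of that screen. It reads the 10% cut-off as 10 percentage points, which matches the values reported (for example, 47.5% against 36.6%), and it counts only unambiguous A/C/G/T bases. The function names and toy sequences are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """G + C content (%) computed over unambiguous A/C/G/T bases only."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def flag_gc_outliers(cds_seqs: dict[str, str], fosmid_seq: str,
                     threshold: float = 10.0) -> list[str]:
    """Names of CDSs whose G + C differs from the whole fosmid by >= `threshold` points."""
    background = gc_percent(fosmid_seq)
    return [name for name, seq in cds_seqs.items()
            if abs(gc_percent(seq) - background) >= threshold]


# Toy data (hypothetical): a 40% G + C "fosmid" and two CDSs.
fosmid = "AT" * 30 + "GC" * 20          # 40/100 = 40.0% G + C
cdss = {"orfA": "GC" * 3 + "AT" * 2,    # 60.0% -> flagged
        "orfB": "GC" * 2 + "AT" * 3}    # 40.0% -> not flagged
print(flag_gc_outliers(cdss, fosmid))   # ['orfA']
```

Marginal cases such as ORF9 and ORF26 (about 9 points below their clones) fall just under this cut-off, which is why a strict threshold reports eight CDSs rather than ten.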
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or with only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as in S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f4 CDS clusters at the bottom of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own
genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the REAL Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After the vector had been cleaned from the gel, it was cut with PmlI overnight to make a blunt site. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
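The adaptor sequence can be checked for the two sites it is meant to supply. The sketch below scans it for an I-CeuI site and a PmlI site. The recognition sequences are assumptions not stated in the text: the 27-bp I-CeuI site commonly listed by enzyme suppliers (Marshall and Lemieux (1992) describe a 19-bp recognition sequence within it) and the PmlI site CAC^GTG, a blunt cutter. Only the top strand is searched. That is enough for palindromic PmlI, but a scan of a whole vector for non-palindromic I-CeuI would also need the reverse complement.

```python
# Assumed recognition sequences (not given in the text).
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"   # 27-bp site as listed by suppliers
PMLI_SITE = "CACGTG"                         # PmlI, cuts CAC^GTG (blunt)
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # from the Experimental procedures


def find_all(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) exact match of `site`."""
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


for name, site in (("I-CeuI", ICEUI_SITE), ("PmlI", PMLI_SITE)):
    print(name, find_all(ADAPTOR, site))
# I-CeuI [2]
# PmlI [29]
```

Read this way, the 35-nt adaptor is two leading bases, the I-CeuI site and, immediately downstream, the PmlI site that gives the blunt end used for ligation.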
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region (positions 1192–1342 in b1dcf13c08), we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where CDSs with no match in GenBank were identified, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If an
[Fig. 6 tree image not reproduced. A labelled clade marks the PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.]
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17, estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank; the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3.
alternative CDS that did have a match in GenBank was obtained using ORF-finder, then that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
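The two CDS-filtering rules above can be expressed compactly: drop calls shorter than 100 bp, then resolve overlaps in favour of the call with a GenBank homologue. The sketch below is one such reading. The `GeneCall` class and example coordinates are hypothetical. Ties (both or neither call with a hit) keep the call encountered first; this is an assumption, as the text does not say how such cases were resolved. The ORF-finder/BLASTX rescue step is not modelled.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    name: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    has_hit: bool   # significant GenBank homologue?

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: GeneCall, b: GeneCall) -> bool:
    return a.start <= b.end and b.start <= a.end


def select_cdss(calls: list[GeneCall], min_len: int = 100) -> list[GeneCall]:
    """Drop calls shorter than min_len; among overlapping calls prefer one with a GenBank hit."""
    selected: list[GeneCall] = []
    for call in sorted((c for c in calls if c.length >= min_len), key=lambda c: c.start):
        clashes = [s for s in selected if overlaps(s, call)]
        if not clashes:
            selected.append(call)
        elif call.has_hit and not any(s.has_hit for s in clashes):
            # Replace hit-less overlapping calls with the call that has a homologue.
            selected = [s for s in selected if s not in clashes]
            selected.append(call)
    return selected


calls = [GeneCall("g1", 1, 300, False), GeneCall("g2", 250, 900, True),
         GeneCall("g3", 950, 1020, True), GeneCall("g4", 1100, 1500, False)]
print([c.name for c in select_cdss(calls)])  # ['g2', 'g4']
```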
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and the 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were computed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with that of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a different phylogenetic grouping from that observed for the rRNA (with bootstrap support >50%), the CDS was classified as acquired by LGT. It should be noted that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. For some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases, we classified the CDS as acquired by LGT, but it should be noted that for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs, we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses, we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses, we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
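The final classification step described above can be summarised as a small decision rule. The sketch below models only that last step, not the tree building. It assumes that each fosmid CDS has already been placed in a clade with a known bootstrap value and a known set of major bacterial groups. The function name, labels and example group names are hypothetical. Because only major-group labels are compared, within-group transfers are ignored automatically, as in the text.

```python
def classify_cds(rrna_group: str, clade_groups: set[str], bootstrap: float,
                 min_support: float = 50.0) -> str:
    """Classify a fosmid CDS from the major groups in the clade it falls into.

    rrna_group   -- major group assigned to the fosmid from its rRNA genes
    clade_groups -- major groups represented in the clade containing the CDS
    bootstrap    -- bootstrap support (%) for that clade
    """
    if bootstrap <= min_support:        # the text requires support > 50%
        return "unresolved"
    if clade_groups == {rrna_group}:
        return "agrees with rRNA"       # within-group transfers are not counted
    if len(clade_groups) == 1:
        return "LGT"                    # clean grouping with another major group
    return "LGT (mixed clade, donor not identifiable)"


print(classify_cds("gamma-proteobacteria", {"beta-proteobacteria"}, 85.0))  # LGT
```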
All sequences have been submitted to GenBank with Accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We want to thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Fig. S1. A. Number of BLAST hits with exp. < 10e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits (exp. < 10e−10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
Table 1 cont.
Columns: Fosmid | Phylogeny: 23S rRNA; 16S rRNA; CDSs | LGT(a): phylogenetic trees; phylogenetic distribution or genome context; total | ORFans(b)

b1bcf11c04
  23S rRNA: γ-Proteobacterium
  16S rRNA: –
  CDSs: 14 CDSs give supported phylogenies, and of these 13 (93%) agree with rRNA.
  LGT, phylogenetic trees: Phylogenetic analyses show that two CDSs have been acquired by LGT: ORF3 is found in a mixed clade, while ORF30 clusters within the β-proteobacteria.
  LGT, phylogenetic distribution or genome context: Three genes that show incongruent phylogenies, but with low bootstrap support, are found close to ORF3 and ORF34 and have probably also been acquired by LGT: ORF5 clusters with β-proteobacteria, ORF31 clusters with δ-proteobacteria, and ORF32 (GST) clusters with a γ-proteobacterium but appears to have been frequently transferred.
  LGT, total: 5 CDSs (13% of total) have been acquired by LGT.
  ORFans: 3 (1%)

b1bf11a01
  23S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 23S rRNA)
  16S rRNA: β-Proteobacterium closely related to Thiobacillus denitrificans (98% identity at 16S rRNA)
  CDSs: High degree of gene synteny compared with Thiobacillus denitrificans; 29 CDSs have their best BLAST match in Thiobacillus denitrificans. 27 of 28 CDSs (96%) that give statistically supported phylogenies agree with the rRNA genes.
  LGT, phylogenetic trees: One CDS: ORF30 (RsuA) clusters with γ-proteobacteria and has no homologue in Thiobacillus denitrificans.
  LGT, phylogenetic distribution or genome context: Two CDSs (ORF14 and ORF31) have been transferred to both the fosmid and Thiobacillus denitrificans. ORF29 has no significant homologues in proteobacteria.
  LGT, total: 4 CDSs (8% of total) have been acquired by LGT.
  ORFans: 3 (2%)

b1bf110d03
  23S rRNA: –
  16S rRNA: A Flavobacteriaceae bacterium; among sequenced genomes, most closely related to Cytophaga hutchinsonii
  CDSs: 16 of 18 (84%) CDSs with supported phylogenetic topologies agree with the 16S fragment.
  LGT, phylogenetic trees: ORF5 and ORF10 have no close homologues in other Bacteroides, and phylogenetic analysis suggests frequent transfer.
  LGT, phylogenetic distribution or genome context: ORF4 has no detectable homologues in other Bacteroides. A transposon with 8 CDSs was likely acquired from a relative of Bacteroides thetaiotaomicron.
  LGT, total: 3 CDSs (10% of total) have likely been acquired by LGT. The transposon is not included, as it has been transferred within the Bacteroides.
  ORFans: 10 (3%)

a. Only LGT events involving the CDS from the fosmid clone analysed were counted, and only when they were supported by phylogenetic analyses or clear phylogenetic distribution patterns (i.e. the gene is not present in its rRNA group but is present in some other distinct bacterial group). The number of CDSs acquired by LGT is shown in bold.
b. ORFans were classified as CDSs with no significant match in GenBank; matches to sequences in the environmental portion of GenBank were not considered. The proportion of protein-coding DNA that has no match in GenBank is given in parentheses.
(ORF16) showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon has been acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon, because many of the CDSs found in the B. thetaiotaomicron transposons are not present in the fragment we have sequenced, and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with the transposon. Additional CDSs (possibly not involved in transposon function) were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA, and it has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the other lineages. One characteristic common to the organisms encoding this protein is that they are all anaerobic or microaerophilic (Symbiobacterium thermophilum), and they are all found in environments similar to the one sampled here. Transferred genes are likely to give a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case, the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably as two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is distantly related to the OmpA found in this gene cluster and was not included in the alignment.
Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
We also identified, by searching the Pfam database, some mobile genes that might be involved in the biodegradation of pollutants. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione S-transferase gene (GST, ORF36) flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxifying metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in the biodegradation of a xenobiotic compound. The b1bcf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in the biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS falls in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene, ORF30 (an S4 domain protein), suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses; however, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
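For an inverted repeat, one copy is compared with the reverse complement of the other, which is why the second coordinate range above is written in descending order. The Python sketch below computes such an identity. It assumes ungapped, equal-length copies, and the two 20-bp sequences are hypothetical placeholders constructed to give 16 of 20 matching bases (80%); they are not the actual fosmid sequence.

```python
"""Percent identity of an inverted-repeat pair (illustrative sketch).

Assumes ungapped, equal-length copies; LEFT_COPY and RIGHT_COPY are
hypothetical placeholders, not sequence from fosmid b1bf11a01.
"""

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat_identity(left: str, right: str) -> float:
    """Percent identity between `left` and the reverse complement of `right`.

    Both copies are given as read on the top strand, so a perfect inverted
    repeat satisfies left == reverse_complement(right).
    """
    if len(left) != len(right):
        raise ValueError("this sketch assumes equal-length, ungapped copies")
    target = reverse_complement(right)
    matches = sum(a == b for a, b in zip(left.upper(), target.upper()))
    return 100.0 * matches / len(left)


# Hypothetical 20-bp copies (the repeats in the text span 34021-34040 and 36693-36674).
LEFT_COPY = "ACGTTGCAAGGCTTACCGAT"
RIGHT_COPY = "ATCGCTAAGACTTGTAACGG"

if __name__ == "__main__":
    print(f"{inverted_repeat_identity(LEFT_COPY, RIGHT_COPY):.0f}% identity")
```

Real repeat copies may contain indels. In that case, a pairwise alignment of one copy against the reverse complement of the other would replace the position-by-position comparison used here.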
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these sequences formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs with a G + C content more than 10 percentage points above or below the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h3 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show a higher G + C content (66.4%, 65.7% and 64.7%) than the rest of the fosmid (52.5%). All these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40, a short ORFan in the ε-proteobacterial clone b1dcf13c08, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content (25.7%) than the rest of the fosmid clone. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
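The G + C screen described above can be sketched as a simple threshold on the deviation from the clone average. In this sketch, the threshold is read as percentage points and applied as a strict "more than 10". This reading is an assumption, although it is consistent with the reported values: ORF9 (9.0 points) and ORF26 (9.1 points) are described as only marginal.

```python
"""Flag CDSs whose G + C content deviates from the clone average (sketch).

Assumptions: the threshold is in percentage points and applied as a strict
"more than 10"; the clone average is supplied by the caller.
"""


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases in `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_outliers(cds_gc: dict[str, float], clone_gc: float,
                threshold: float = 10.0) -> dict[str, float]:
    """Return {CDS name: deviation in percentage points} for CDSs beyond threshold."""
    flagged = {}
    for name, gc in cds_gc.items():
        delta = gc - clone_gc
        if abs(delta) > threshold:
            flagged[name] = round(delta, 1)
    return flagged


if __name__ == "__main__":
    # G + C values (%) reported in the text for the epsilon-proteobacterial clone b1dcf13c08.
    print(gc_outliers({"ORF40": 22.2, "ORF9": 25.7}, clone_gc=34.7))
    # ORF40 (-12.5 points) is flagged; ORF9 (-9.0 points) falls just short.
```

In practice, gc_content would be applied to each CDS and to the whole fosmid sequence first. The sketch starts from the percentages already reported above.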
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein-coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
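The comparison above is a two-sample t-test on per-clone proportions. The text gives neither the per-clone values nor the t-test variant used. The sketch below therefore assumes Student's pooled-variance form and uses hypothetical proportions for four uncultivated and eight cultivated clones. It computes only the t statistic and degrees of freedom, using the standard library.

```python
"""Student's two-sample t statistic with pooled variance (sketch).

The per-clone proportions below are hypothetical; only the P values are
reported in the text, not the inputs or the exact t-test variant.
"""
from math import sqrt
from statistics import mean, variance


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for Student's pooled-variance t-test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Hypothetical ORFan proportions per clone: 4 uncultivated vs 8 cultivated.
    uncultivated = [0.30, 0.25, 0.35, 0.28]
    cultivated = [0.05, 0.08, 0.03, 0.06, 0.04, 0.07, 0.05, 0.02]
    t, df = pooled_t(uncultivated, cultivated)
    print(f"t = {t:.2f} with {df} degrees of freedom")
```

A one-sided P value is the upper tail of the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.t.sf(t, df)` returns it. Welch's unequal-variance test is a common alternative when the two groups differ markedly in spread.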
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of clone b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Moreover, neither of these CDSs had significant matches to Pfam domains. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, the predicted proteins in this fosmid tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1), several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes (mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus), suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the base of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex. Coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described by Holoman and colleagues (1998). DNA was extracted following the protocol of Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the 23S rRNA–5S rRNA direction for the present study. The vector for cloning in the 23S rRNA–16S rRNA direction was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large-Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to create a blunt site. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
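The adaptor sequence above can be checked for the two sites it is meant to introduce. This sketch assumes two recognition sequences, neither of which is given in the text, so both should be verified against the supplier's documentation:

- the 27-bp I-CeuI recognition sequence TAACTATAACGGTCCTAAGGTAGCGAA;
- the blunt-cutting PmlI site CAC^GTG.

Only the top strand is searched. This is sufficient for PmlI because its site is palindromic; for I-CeuI, the site is assumed to lie in the written orientation.

```python
"""Locate the I-CeuI and PmlI sites in the cloning adaptor (sketch).

ICEUI_SITE and PMLI_SITE are assumed recognition sequences, not values
taken from the text; verify them before relying on this check.
"""

ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor given in the text

ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"  # assumed 27-bp I-CeuI site
PMLI_SITE = "CACGTG"                        # assumed PmlI site (palindromic)
PMLI_CUT_OFFSET = 3                         # PmlI assumed to cut CAC^GTG (blunt)


def find_all(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def pmli_blunt_cut_positions(seq: str) -> list[int]:
    """Top-strand positions (number of bases before the cut) for each PmlI site."""
    return [pos + PMLI_CUT_OFFSET for pos in find_all(seq, PMLI_SITE)]


if __name__ == "__main__":
    print("I-CeuI site starts at", find_all(ADAPTOR, ICEUI_SITE))        # [2]
    print("PmlI site starts at", find_all(ADAPTOR, PMLI_SITE))           # [29]
    print("PmlI cuts after base", pmli_blunt_cut_positions(ADAPTOR))     # [32]
```

Under these assumptions, the adaptor reads as two leading bases, then the I-CeuI site, then the PmlI site. This arrangement is consistent with the described strategy: cut with I-CeuI, then with PmlI, to leave one I-CeuI end and one blunt end.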
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. [Clade label in figure: PolC + DinG fusion proteins, same domain structure as b1bcf11f04 ORF17.]
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region (positions 1192–1342 in b1dcf13c08), we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with its standard settings (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where identified CDSs had no match in GenBank, we analysed the region with ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally with BLASTX searches. If ORF-finder yielded an alternative CDS that did have a match in GenBank, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
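Two of the CDS curation rules described above can be sketched as a small filter:

- discard predictions shorter than 100 bp;
- when two predictions overlap, keep the one with a significant GenBank homologue.

The `Cds` record and its precomputed `has_homologue` flag are hypothetical. The tie-break used when neither or both overlapping CDSs have homologues (prefer the longer one) is also an assumption, since the text does not say how such cases were resolved. The ORF-finder/BLASTX rescue step is not modelled.

```python
"""Greedy CDS curation under the rules stated in the text (sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Cds:
    """Hypothetical record for one predicted CDS (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    has_homologue: bool  # significant BLASTP hit in GenBank (assumed precomputed)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Cds, b: Cds) -> bool:
    """True if the two coordinate ranges share at least one base."""
    return a.start <= b.end and b.start <= a.end


def curate(cdss: list[Cds], min_len: int = 100) -> list[Cds]:
    """Drop short CDSs, then resolve overlaps in favour of CDSs with homologues."""
    # CDSs with homologues are considered first; within each class, longer
    # CDSs win (this tie-break is an assumption, not taken from the paper).
    ordered = sorted(
        (c for c in cdss if c.length >= min_len),
        key=lambda c: (not c.has_homologue, -c.length),
    )
    kept: list[Cds] = []
    for cds in ordered:
        if not any(overlaps(cds, k) for k in kept):
            kept.append(cds)
    return sorted(kept, key=lambda c: c.start)


if __name__ == "__main__":
    predictions = [
        Cds("orfA", 1, 900, True),
        Cds("orfB", 850, 1500, False),   # overlaps orfA and has no homologue
        Cds("orfC", 1600, 1660, True),   # 61 bp, below the length cutoff
        Cds("orfD", 1700, 2400, False),
    ]
    print([c.name for c in curate(predictions)])  # ['orfA', 'orfD']
```

Gene finders often report short overlaps of a few bases between genuine neighbouring genes. A practical version might therefore tolerate small overlaps rather than treating any shared base as a conflict.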
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and the 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01, we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of the protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were constructed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with that of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (with bootstrap analysis: 100 replicates with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random-addition replicates]. If these trees supported a different phylogenetic grouping from that observed for the rRNA (with bootstrap support >50%), the CDS was classified as acquired by LGT. Note that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. For some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random-addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
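The two-step rule for calling a CDS laterally transferred can be summarised as a small decision function. This is a sketch, not the authors' code: tree building (Neighbour-joining, then minimum evolution from JTT + Γ distances) is assumed to happen elsewhere, and the record and field names (`CdsTreeResult`, `nj_group`, `me_group`, `me_bootstrap`) are hypothetical. Group labels are assumed to be at the level of major groups or phyla, so within-group transfers can never be counted; a CDS nested in a mixed clade can be given a label such as "mixed", which differs from every rRNA group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CdsTreeResult:
    """Hypothetical summary of the trees built for one fosmid CDS."""
    rrna_group: str                       # major group of the fosmid's rRNA phylotype
    nj_group: str                         # group the CDS joins in the Neighbour-joining tree
    me_group: Optional[str] = None        # group in the ME tree (built only if NJ disagrees)
    me_bootstrap: Optional[float] = None  # bootstrap support (%) for that ME grouping


def classify_cds(r: CdsTreeResult, min_support: float = 50.0) -> str:
    """Step 1: NJ screen. Step 2: ME confirmation with support above min_support."""
    # Step 1: the NJ tree agrees with the rRNA phylotype, so no further analysis.
    if r.nj_group == r.rrna_group:
        return "consistent"
    # Step 2: a disagreeing NJ tree must be followed up with a minimum evolution tree.
    if r.me_group is None or r.me_bootstrap is None:
        raise ValueError("NJ tree disagrees with rRNA; a step-2 ME result is required")
    # Only a different major group with bootstrap support above the cut-off counts as LGT.
    if r.me_group != r.rrna_group and r.me_bootstrap > min_support:
        return "LGT"
    return "not LGT"


print(classify_cds(CdsTreeResult("gamma-proteobacteria", "beta-proteobacteria",
                                 "beta-proteobacteria", 87.0)))
```

A CDS whose NJ placement disagrees with the rRNA but whose minimum evolution placement is weakly supported (≤50%) is not counted, which mirrors the conservative cut-off stated above.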
All sequences have been submitted to GenBank with Accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes for Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (Nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP: Phylogenetic Analysis Using Parsimony (and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with expect value < 10e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG categories of the BLAST hits (expect value < 10e−10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01, b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
(ORF16), showing that this lineage has indeed acquired proteobacterial genes. This CDS might have been part of the α-proteobacterial island upon transfer.
In the Flavobacteriaceae fosmid b1bf11d10, a large self-transmitting conjugative transposon was identified (Fig. 1). This transposon is inserted next to a tRNA gene and is similar in sequence and structure to the transposons found in Bacteroides thetaiotaomicron (Xu et al., 2003), Bacteroides fragilis (Kuwahara et al., 2004) and Porphyromonas gingivalis (Nelson et al., 2003). In the phylogenetic tree of the transposase gene (ORF21), the CDS from the fosmid falls into a cluster containing numerous B. thetaiotaomicron sequences, separated from the single Cytophaga hutchinsonii homologue detected among the 100 best BLAST hits. For the other CDSs that are clearly part of this transposon (ORF22–ORF27), we found no significant homologues in C. hutchinsonii, and the best (in most cases the only) match was always to B. thetaiotaomicron and P. gingivalis genes, suggesting that this transposon was acquired from the Bacteroidales lineage. It is likely that we have captured only part of this transposon (many of the CDSs found in the B. thetaiotaomicron transposons are not present in the fragment we sequenced) and that the 3′ CDSs in this fosmid clone (ORF28–ORF30) were also transferred along with it. Additional CDSs, possibly not involved in transposon function, were also present in the B. thetaiotaomicron transposons (Xu et al., 2003). We note that the acquisition of this transposon was not included in our LGT estimate, as it originated from the same major bacterial group as the fosmid clone.
Interestingly, one gene was found to have been transferred to two of the fosmids: the fusA paralogue in b1bcf11d04 and b1dcf51c12 (Figs 1 and 4). This protein appears to be a distant paralogue of fusA and has a very patchy phylogenetic distribution, suggesting that it originated in one of the lineages that possess it and was then transferred to the others. One characteristic common to the organisms encoding this protein is that they are all anaerobes (or, in the case of Symbiobacterium thermophilum, microaerophiles), and they are all found in environments similar to the one sampled here. Transferred genes are likely to confer a selective advantage in the environment where the organisms harbouring them live, and an ecological function for this fusA paralogue should be sought.
Another set of genes identified in two of the fosmid clones forms a cluster encoding outer membrane proteins and proteins involved in biopolymer transport (OmpA, TolB, TonB, ExbD, TolQ). This cluster is found in both the candidate division WS3 clone b1dcf51c12 and the δ-proteobacterial clone b1bcf11h03 (Fig. 1). In this case the gene cluster appears to have been transferred from a δ-proteobacterium to b1dcf51c12, while it might be native to b1bcf11h03 (Fig. 5). This gene cluster also appears to have been transferred to Chlorobium tepidum, as both b1dcf51c12 and C. tepidum cluster within the δ-proteobacteria for all these genes except TonB (for which we could not make a reliable alignment). Robust phylogenies were obtained only for OmpA and TolB. However, the conserved gene order in b1dcf51c12, C. tepidum, b1bcf11h03 and other δ-proteobacteria such as Geobacter suggests that this entire 4-kb fragment was transferred from a δ-proteobacterium to C. tepidum and to b1dcf51c12, probably in two separate events. Moreover, in b1dcf51c12 the fusA paralogue discussed above may have been transferred as part of this gene cluster, as the two are found close together in this clone. The second δ-proteobacterial fosmid clone, b1bcf11d04, also contains an OmpA homologue. However, this CDS is only distantly related to the OmpA found in this gene cluster and was not included in the alignment.

Fig. 4. Maximum Likelihood phylogeny of fusA homologues estimated using PMBML (661 positions in alignment). The sequences were obtained by blasting the b1bcf11d04 ORF19 and b1dcf51c12 ORF15 sequences against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Aquifex aeolicus. Results from bootstrap analyses are indicated as in Fig. 3.
By searching the Pfam database, we also identified some mobile genes that might be involved in the biodegradation of pollutants. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione S-transferase gene (GST, ORF36) flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxification metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs are good candidates for genes involved in the biodegradation of a xenobiotic compound. The b1bf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. Supporting acquisition of this CDS by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in the biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, this CDS falls in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene ORF30, which encodes an S4 domain protein, suggests that it was acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses; however, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
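The flanking inverted repeat is reported as two 20-bp arms (34021–34040, and 36693–36674 read on the opposite strand) with 80% identity. The text does not say how identity was computed; as a minimal sketch, assuming an ungapped comparison, the identity of an inverted repeat is the identity between one arm and the reverse complement of the other. The arm sequences below are hypothetical, chosen only so that 16 of 20 positions match.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat_identity(left_arm: str, right_arm: str) -> float:
    """Percent identity between the left arm and the reverse complement of the
    right arm, compared position by position without gaps (an assumption)."""
    rc = reverse_complement(right_arm).upper()
    left = left_arm.upper()
    if len(left) != len(rc):
        raise ValueError("arms must have equal length for an ungapped comparison")
    matches = sum(a == b for a, b in zip(left, rc))
    return 100.0 * matches / len(left)


# Hypothetical 20-bp arms (not the fosmid sequence): the right arm is the reverse
# complement of a copy of the left arm carrying four substitutions.
left_arm = "ACGTTGCAAGGCTTACCGAT"
right_arm = reverse_complement("ACGATGCAAGCCTTACGGAA")
print(inverted_repeat_identity(left_arm, right_arm))  # 80.0
```

Comparing against the reverse complement, rather than the arm as written, is what distinguishes an inverted repeat from a direct repeat.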
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs with a G + C content at least 10 percentage points higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h3 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show a higher G + C content (66.4%, 65.7% and 64.7%) than the rest of the fosmid (52.5%). All these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40, a short ORFan in the ε-proteobacterial clone b1dcf13c08, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content (25.7%) than the rest of the fosmid clone. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.

Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from Chlamydiaceae, as these formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the remaining alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3.
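The G + C screen above compares each CDS with the whole fosmid. A minimal sketch follows, assuming the cut-off is an absolute difference of 10 percentage points (which is consistent with the reported values: ORF20 differs by 10.9 points and was flagged, whereas ORF9 at −9.0 and ORF26 at −9.1 are described as only marginal). The function names, the 1-based coordinate dictionary and the toy sequence are hypothetical.

```python
from typing import Dict, Tuple


def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def flag_gc_outliers(fosmid: str,
                     cds_coords: Dict[str, Tuple[int, int]],
                     threshold: float = 10.0) -> Dict[str, float]:
    """Return {CDS name: deviation} for CDSs whose G + C differs from the whole
    fosmid by at least `threshold` percentage points (1-based, inclusive coordinates)."""
    baseline = gc_content(fosmid)
    flagged = {}
    for name, (start, end) in cds_coords.items():
        # Strand does not matter: a sequence and its reverse complement share G + C.
        deviation = gc_content(fosmid[start - 1:end]) - baseline
        if abs(deviation) >= threshold:
            flagged[name] = deviation
    return flagged


# Toy fosmid: 200 bp at 50% G + C followed by a 20-bp stretch at 100% G + C.
toy = "ATGC" * 50 + "GGCC" * 5
print(flag_gc_outliers(toy, {"orfA": (1, 200), "orfB": (201, 220)}))
```

Because the baseline includes the CDS being tested, a long atypical CDS pulls the average towards itself; for short ORFans like those listed above the effect is small.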
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank form independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or with only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
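The text does not state which form of t-test was used or give the per-clone proportions. As a sketch only, assuming Welch's unequal-variance test on per-clone ORFan proportions (four uncultivated-lineage clones against the rest), the statistic and its Welch–Satterthwaite degrees of freedom can be computed with the standard library; the P value then comes from a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which performs the whole test. The proportions in the usage line are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    se2_a = statistics.variance(a) / na  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical per-clone ORFan proportions, for illustration only.
uncultivated = [0.40, 0.35, 0.30, 0.45]
cultivated = [0.10, 0.05, 0.12, 0.08, 0.15, 0.07, 0.09, 0.11]
print(welch_t(uncultivated, cultivated))
```

With only four clones in one group, the Welch degrees of freedom can be small, which matters for the resulting P value.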
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of clone b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1, we can make some functional prediction based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to DNA polymerase III subunit A homologues: DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene (the exonuclease domain) and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as in S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6). However, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f4 CDS clusters at the base of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window onto a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol of Rondon and colleagues (2000), except that, instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. A vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI after the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
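The adaptor sequence can be checked for the two sites the vector needs. This is a sketch under stated assumptions: the I-CeuI recognition sequence used below is the 27-bp form commonly listed in enzyme catalogues (Marshall and Lemieux, 1992, define a 19-bp minimal site), and PmlI is taken to recognise CACGTG and cut CAC^GTG, giving blunt ends. The point illustrated is that the I-CeuI site is not palindromic while the PmlI site is, so a vector with one I-CeuI end and one blunt end accepts an insert cut with I-CeuI at one end and end-repaired at the other in a single orientation.

```python
from typing import List

ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor sequence given in the text
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"       # assumed I-CeuI recognition sequence
PMLI_SITE = "CACGTG"                              # assumed PmlI site (cut CAC^GTG, blunt)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


for name, site in [("I-CeuI", ICEUI_SITE), ("PmlI", PMLI_SITE)]:
    top = find_all(ADAPTOR, site)
    bottom = find_all(ADAPTOR, reverse_complement(site))
    palindromic = site == reverse_complement(site)
    print(f"{name}: starts {top}, reverse-complement starts {bottom}, palindromic={palindromic}")
```

In this adaptor the I-CeuI site starts at position 2 and the PmlI site immediately follows it at position 29 (0-based).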
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. (Figure label: PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.)

Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region, position 1192–1342 in b1dcf13c08, we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where the identified CDSs had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If an alternative CDS that did have a match in GenBank was obtained using ORF-finder, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
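The CDS-filtering rules described above (discard predictions shorter than 100 bp; when two predictions overlap, keep the one with significant GenBank homologues) can be sketched as follows. This is not the authors' pipeline. The `Cds` record, the `has_genbank_hit` flag and the tie-break (keep the longer CDS when the homologue test does not decide) are hypothetical, the greedy left-to-right pass is just one simple way to resolve overlaps, and treating any overlap as a conflict is also an assumption, since neighbouring genes often share a few bases.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cds:
    """Hypothetical record for one predicted CDS (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    has_genbank_hit: bool

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Cds, b: Cds) -> bool:
    return a.start <= b.end and b.start <= a.end


def rank(c: Cds) -> Tuple[bool, int]:
    # A GenBank homologue wins first; length is an assumed tie-breaker.
    return (c.has_genbank_hit, c.length)


def select_cds(predicted: List[Cds], min_len: int = 100) -> List[Cds]:
    """Drop CDSs shorter than min_len, then resolve overlaps greedily by rank."""
    kept: List[Cds] = []
    for cds in sorted((c for c in predicted if c.length >= min_len), key=lambda c: c.start):
        clashes = [k for k in kept if overlaps(k, cds)]
        # Keep the new CDS only if it outranks every kept CDS it overlaps.
        if all(rank(cds) > rank(k) for k in clashes):
            kept = [k for k in kept if k not in clashes] + [cds]
    return sorted(kept, key=lambda c: c.start)


predicted = [Cds("a", 1, 300, False), Cds("b", 100, 600, True),
             Cds("c", 700, 760, True), Cds("d", 800, 1200, False)]
print([c.name for c in select_cds(predicted)])  # ['b', 'd']
```

Here "c" is dropped for being only 61 bp long, and "b" replaces the overlapping "a" because only "b" has a GenBank homologue.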
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and 16S rRNA genes were carried out in PAUP* (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01 we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were performed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (with bootstrap analysis, 100 replicates, with global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random addition replicates]. If the trees supported a different phylogenetic grouping from that observed for the rRNA (with bootstrap support > 50%), the CDS was classified as acquired by LGT. It should be noted that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. For some of these trees, the CDS from the fosmid was found within a clade containing representatives from several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but it should be noted that for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
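The LGT call described above comes down to a few comparisons between the group a CDS clusters with and the rRNA phylotype of its fosmid. The Python sketch below is a minimal illustration, assuming the relevant tree has already been summarized as a group label, a bootstrap value and a flag for clades mixing several bacterial groups; these inputs and the function name are hypothetical. It encodes three points from the text: a disagreement must be supported by more than 50% bootstrap, only between-group disagreements count (group labels are assumed to be at the level the authors used, e.g. α- versus γ-proteobacteria), and a supported placement in a mixed clade is scored as LGT without identifying a donor.

```python
from typing import Optional

MIN_BOOTSTRAP = 50.0  # percent; a disagreement must exceed this support


def classify_cds(rrna_group: str,
                 cds_group: Optional[str],
                 bootstrap: float,
                 mixed_clade: bool = False) -> str:
    """Classify one CDS against its fosmid's rRNA phylotype.

    rrna_group  -- bacterial group or phylum of the rRNA phylotype
    cds_group   -- group the CDS clusters with (None: no usable homologues)
    bootstrap   -- bootstrap support (%) for that placement
    mixed_clade -- CDS falls in a clade mixing several bacterial groups
    """
    if cds_group is None:
        return "not analysed"
    disagrees = mixed_clade or cds_group != rrna_group
    if not disagrees:
        return "agrees with rRNA"
    if bootstrap <= MIN_BOOTSTRAP:
        return "not counted (weak support)"
    # Supported disagreement: a mixed clade leaves donor and recipient unknown.
    return "LGT (donor unknown)" if mixed_clade else "LGT"
```

For example, a CDS from a γ-proteobacterial fosmid that clusters with β-proteobacteria at 95% support is returned as "LGT", while an unsupported placement in a mixed clade is not counted, as for ORF20 of b1bcf11h3.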
All sequences have been submitted to GenBank with Accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2. [WWW document]. URL http://www.rna.icmb.utexas.edu.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, WA, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Fig. S1. A. Number of BLAST hits with exp. < 10e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits with exp. < 10e−10. Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
related to the OmpA found in this gene cluster and was not included in the alignment.
We also identified some mobile genes that might be involved in biodegradation of pollutants by searching the Pfam database. In one of the γ-proteobacterial fosmids, b1bcf11c4, we identified a glutathione S-transferase (GST; ORF36) gene flanked by an acetyltransferase gene (ORF35) and a transporter gene (ORF34). Eukaryotic GSTs are important in detoxification metabolism. Well-characterized bacterial GSTs (such as dichloromethane dehalogenase and 1,2-dichloroepoxyethane epoxidase), on the other hand, are catabolic enzymes that play an essential role in growth on various difficult-to-degrade chemicals (Vuilleumier and Pagni, 2002). Considering the environment the fosmid originated from (highly polluted marine sediments), these CDSs would be good candidates for genes involved in biodegradation of a xenobiotic compound. The b1bf11c4 GST gene clusters with a γ-proteobacterium (Acinetobacter sp. ADP1, Accession no. YP_046221). However, as observed by Vuilleumier and Pagni (2002), the phylogeny suggests that this gene has been frequently transferred. In support of this CDS having been acquired by LGT, its neighbour ORF34 clusters robustly within the β-proteobacteria, while ORF35 clusters with δ-proteobacteria (although with no bootstrap support).
Another gene that might be involved in biodegradation of pollutants was identified among the CDSs that have been transferred into the β-proteobacterial fosmid b1bf11a01: ORF31, which encodes a dienelactone hydrolase. Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho-cleavage pathway (Eulberg et al., 1998; Muller et al., 2004), suggesting that the bacterium from which this fragment originated might use chloroaromatic compounds as an energy source. However, it should be noted that this CDS is found in a cluster of CDSs from genome projects with no experimentally confirmed function. Again, this gene is flanked by other genes that have also been acquired by LGT. The phylogeny of the neighbouring gene ORF30, an S4 domain protein, suggests that it has been acquired from a γ-proteobacterium. The next gene upstream, ORF29, could not be used in phylogenetic analyses. However, this CDS has no match in its close relative T. denitrificans, and its best match was to a conserved membrane protein from Clostridium tetani (Table S11). Thus, it is likely that all these genes have been acquired by LGT. Notably, a short inverted repeat (80% identity) was found to flank these genes (positions 34021–34040 and 36693–36674).
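A percent identity for an inverted repeat compares one arm with the reverse complement of the other. The Python sketch below shows that comparison under two assumptions not stated in the text: coordinates are 1-based and inclusive, and the second range, written high-to-low, denotes the arm on the opposite strand. It uses a simple ungapped comparison, which may differ from however the authors actually aligned the two arms; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity (%) between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def inverted_repeat_identity(seq: str,
                             left: tuple[int, int],
                             right: tuple[int, int]) -> float:
    """Identity between the left arm and the reverse complement of the right arm.

    Coordinates are 1-based and inclusive, each given as (low, high).
    """
    left_arm = seq[left[0] - 1:left[1]]
    right_arm = seq[right[0] - 1:right[1]]
    return percent_identity(left_arm, revcomp(right_arm))
```

Applied to the assembled b1bf11a01 sequence (held in a hypothetical variable `fosmid_seq`), the call would be `inverted_repeat_identity(fosmid_seq, (34021, 34040), (36674, 36693))`; two 20 nt arms at 80% identity correspond to 16 matching positions.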
Fig. 5. Maximum Likelihood phylogeny of OmpA homologues estimated using PMBML (135 positions in alignment). The sequences were obtained by blasting the b1dcf51c12 ORF7 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. We also removed three sequences from the Chlamydiaceae, which formed a long, unstable branch in the tree, as well as some sequences that were considerably shorter than the rest of the alignment. The tree was arbitrarily rooted by Agrobacterium tumefaciens. Results from bootstrap analyses are indicated as in Fig. 3. [Tree not reproduced; its leaves include the fosmid CDSs b1dcf51c12 ORF7 and b1bcf11h03 ORF12 among proteobacterial and other bacterial homologues.]
Few laterally transferred CDSs identified by G + C content
Differences in G + C content are commonly used as an indication of recent LGT (Lawrence and Ochman, 1997). We identified only eight CDSs whose G + C content was more than 10 percentage points higher or lower than the average for the respective fosmid clone (see Tables S1–12). ORF20 in the δ-proteobacterial clone b1bcf11h3 has a G + C content of 47.5%, compared with 36.6% for the complete fosmid. This CDS clusters with Desulfovibrio vulgaris within a mixed clade with no bootstrap support and was not included in the LGT estimate for this fosmid. A very short ORFan (ORF1) in the candidate division OP8 clone b3cf12f09 has a G + C content of 43.6%, compared with 59.4% for the fosmid clone. In addition, the transposase (ORF16) and its neighbouring ORFan (ORF17) in the same clone have G + C contents of 46.3% and 40.2% respectively. ORF11, ORF13 and ORF14 in the γ-proteobacterial clone b3cf12d07 all show a higher G + C content than the rest of the fosmid (66.4%, 65.7% and 64.7%, compared with 52.5%). All three of these CDSs cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content (25.7%) than the rest of the fosmid clone. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
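The G + C screen above compares each CDS with its fosmid and flags deviations larger than 10 percentage points (the reading under which exactly the eight CDSs listed exceed the cut-off, while ORF9 and ORF26, at about 9 points, do not). A minimal Python sketch, assuming G + C is computed over unambiguous bases only and compared with the whole-fosmid value; the function names are hypothetical. The example calls reuse values reported in this paragraph.

```python
def gc_content(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def flag_gc_outliers(cds_gc: dict[str, float],
                     fosmid_gc: float,
                     threshold: float = 10.0) -> list[str]:
    """Names of CDSs whose G + C (%) differs from the fosmid's by more than
    `threshold` percentage points, in input order."""
    return [name for name, gc in cds_gc.items()
            if abs(gc - fosmid_gc) > threshold]


# Values reported above for clones b3cf12d07 and b1dcf13c08.
print(flag_gc_outliers({"ORF11": 66.4, "ORF13": 65.7, "ORF14": 64.7}, 52.5))  # ['ORF11', 'ORF13', 'ORF14']
print(flag_gc_outliers({"ORF40": 22.2, "ORF9": 25.7}, 34.7))  # ['ORF40']
```

A fixed absolute threshold is simple but insensitive in AT- or GC-extreme genomes, which is one reason compositional screens tend to miss ameliorated transfers (Lawrence and Ochman, 1997).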
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02; excluding the environmental part of GenBank) were significantly higher than in fosmid clones from lineages that have cultivated representatives.
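The text does not state which form of t-test was used. As an illustration only, the Python sketch below computes Welch's unequal-variance t statistic and its Welch–Satterthwaite degrees of freedom for two groups of per-fosmid proportions (for example ORFan fractions in the four uncultivated-lineage clones versus the others). The choice of Welch's variant and the function name are assumptions, and no values from the study are used.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of each mean (sample variance with n - 1).
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

A two-sided P value then follows from the t distribution with `df` degrees of freedom; with SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with its P value.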
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to Pfam domains. These CDSs might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, in this fosmid the predicted proteins tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9, ORF14) and CDSs with only single hits in GenBank (ORF6, ORF11–13) (Fig. 1). For ORF1 we can make some functional predictions based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to the DNA polymerase III subunit A homologues DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene, the exonuclease domain, and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes, as well as S. thermophilum and Chloroflexus aurantiacus, suggesting that candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins are all found in the same clade (Fig. 6); however, the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f4 CDS clusters at the base of the clade with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that, instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). In order to obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
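The adaptor sequence above carries both cloning sites, and a plain string search makes their layout visible. The Python sketch below assumes the 27 nt I-CeuI recognition sequence as commonly listed by NEB (TAACTATAACGGTCCTAAGGTAGCGAA); that sequence is not given in the text. The enzyme's natural site lies in 23S rRNA genes (Marshall and Lemieux, 1992), which is what targets the cloning to rRNA operons. Because the I-CeuI site is not palindromic, the sketch searches both strands to report its orientation, which presumably underlies the directional cloning (23S→5S versus 23S→16S) described in the text. PmlI recognizes CACGTG and cuts CAC^GTG, leaving blunt ends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed 27 nt I-CeuI recognition sequence (NEB listing); not given in the text.
ICEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"
PMLI_SITE = "CACGTG"  # PmlI cuts CAC^GTG, leaving blunt ends
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor sequence from the text


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def site_hits(seq: str, motif: str) -> dict[str, list[int]]:
    """Occurrences on the given strand (+) and on its complement (-)."""
    return {"+": find_all(seq, motif), "-": find_all(seq, revcomp(motif))}


print(site_hits(ADAPTOR, ICEUI_SITE))  # {'+': [2], '-': []}
print(site_hits(ADAPTOR, PMLI_SITE))   # {'+': [29], '-': [29]}
```

The palindromic PmlI site is reported on both strands at the same position, whereas the I-CeuI site appears only on the forward strand, immediately upstream of the blunt-end site.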
Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to > 8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region (positions 1192–1342 in b1dcf13c08) we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified using the run-glimmer2 script with the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where the identified CDSs had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally by BLASTX searches. If an
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. [Tree not reproduced; a bracketed clade marks the PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.]
Treusch AH Kletzin A Raddatz G Ochsenreiter TQuaiser A Meurer G et al (2004) Characterization oflarge-insert DNA libraries from soil for environmentalgenomic studies of Archaea Environ Microbiol 6 970ndash980
Veerassamy S Smith A and Tillier ER (2003) A transi-tion probability model for amino acid substitutions fromblocks J Comput Biol 10 997ndash1010
Vuilleumier S and Pagni M (2002) The elusive roles ofbacterial glutathione S-transferases new lessons fromgenomes Appl Microbiol Biotechnol 58 138ndash146
Xu J Bjursell MK Himrod J Deng S Carmichael LKChiang HC et al (2003) A genomic view of thehumanndashBacteroides thetaiotaomicron symbiosis Science299 2074ndash2076
Zhao KH Deng MG Zheng M Zhou M Parbel AStorf M et al (2000) Novel activity of a phycobiliproteinlyase both the attachment of phycocyanobilin and theisomerization to phycoviolobilin are catalyzed by the pro-teins PecE and PecF encoded by the phycoerythrocyaninoperon FEBS Lett 469 9ndash13
Supplementary material
The following supplementary material is available for thisarticle onlineFigure S1 A Number of BLAST hits with exp lt10 eminus10 todifferent taxonomic groupsB Distribution of G + C content of the sequencesC Distribution of the COG category of the BLAST hits explt10 eminus10Black bars refer to end-sequences and grey bars refer to thesequenced fosmid clonesTables S1ndash12 Annotation of b1dcf51a06 b1dcf13f01b3cf12f09 b1bcf11f04 b1dcf51c12 b1bcf11h03b1bcf11d04 b1dcf13c8 b3cf12d07 b1bcf11c04b1bf11a01 b1bf110d03
This material is available as part of the online article fromhttpwwwblackwell-synergycom
cluster with γ-proteobacteria and might therefore represent recent transfers within the γ-proteobacteria. ORF40 in the ε-proteobacterial clone b1dcf13c08, a short ORFan, has a G + C content of 22.2%, compared with 34.7% for the complete clone. In addition, ORF9, another ORFan in b1dcf13c08, has a marginally lower G + C content (25.7%) than the rest of the fosmid clone. Similarly, ORF26 in the Chloroflexi clone b1dcf13f01 has a G + C content of 47.8%, compared with 56.9% for the complete fosmid clone.
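The G + C comparison above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the toy sequences and the 5-percentage-point flagging threshold are hypothetical, chosen only to show how the composition of an ORF is compared with that of its whole fosmid clone.

```python
def gc_content(seq: str) -> float:
    """Return the G + C content of a DNA sequence as a percentage."""
    seq = seq.upper()
    # Count only unambiguous bases so that N runs do not dilute the value
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def flag_atypical_orfs(orfs: dict[str, str], clone_seq: str,
                       threshold: float = 5.0) -> dict[str, float]:
    """Return ORFs whose G + C deviates from the whole clone by more than
    `threshold` percentage points (a hypothetical cut-off), with the signed deviation."""
    clone_gc = gc_content(clone_seq)
    flagged = {}
    for name, seq in orfs.items():
        delta = gc_content(seq) - clone_gc
        if abs(delta) > threshold:
            flagged[name] = round(delta, 1)
    return flagged


# Toy sequences invented for illustration; they are not from the fosmids
clone = "ATGCGCGCATATGCGCATGC" * 5   # 60% G + C
orfs = {"orfA": "ATATATATGC" * 3,     # 20% G + C, flagged
        "orfB": "ATGCGCGCAT" * 3}     # 60% G + C, not flagged
print(flag_atypical_orfs(orfs, clone))  # -> {'orfA': -40.0}
```

Composition alone is only a hint: an atypical G + C content marks a candidate for recent acquisition, but ameliorated transfers converge on the host composition over time, which is why the study relies on phylogenies rather than composition to call LGT.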
The first protein coding sequences from uncultivated lineages
Four of the fosmids that we sequenced were from uncultivated lineages. To our knowledge, these fosmid clones represent the first protein coding sequences obtained from these major bacterial lineages. In agreement with their rRNA phylotype, most of the CDSs with homologues in GenBank are found as independent lineages in phylogenetic trees (Fig. 1, Table 1). These clones also contain several large CDSs with no significant matches in GenBank, or with only partial matches to known proteins (Fig. 1, Table 1). A t-test showed that both the proportion of ORFans (P = 0.002) and the proportion of coding bases with no match in GenBank (P = 0.02), excluding the environmental part of GenBank, were significantly higher than in fosmid clones from lineages that have cultivated representatives.
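The comparison above can be illustrated with a two-sample t statistic. The text does not say which t-test variant was used; the sketch below assumes Welch's unequal-variance test, and the per-clone proportions are invented placeholders, not the study's data. It uses only the standard library, so it reports the statistic and the Welch-Satterthwaite degrees of freedom; a P-value would come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-clone ORFan proportions (placeholders, not the published values)
uncultivated = [0.40, 0.35, 0.45, 0.38]
cultivated = [0.10, 0.15, 0.08, 0.12, 0.05, 0.11, 0.09, 0.14]
t, df = welch_t(uncultivated, cultivated)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Because proportions are bounded between 0 and 1, a rank-based or permutation test is a reasonable alternative when the number of clones per group is as small as it is here.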
The two candidate division WS3 clones, b1bcf11f04 and b1dcf51c12, contain several large CDSs for which we can make no clear functional prediction or that have no match in GenBank. For instance, half of b1dcf51c12 is occupied by two CDSs that have either no significant matches in GenBank (ORF4) or only a single match (ORF5). Neither of these CDSs had significant matches to domains in Pfam. They might represent lineage-specific proteins, and homologues may be identified when more sequences from this lineage become available. The candidate division OP8 clone also contains a number of ORFans; however, the predicted proteins in this fosmid tend to be smaller than those we observed in the two WS3 clones.
The b1dcf51a06 clone encodes a large ORFan (ORF1) as well as several smaller ORFans (ORF5, ORF7–9 and ORF14) and CDSs with only single hits in GenBank (ORF6 and ORF11–13) (Fig. 1). For ORF1 we can make some functional predictions based on Pfam searches. This protein contains a nucleoside diphosphate kinase domain, a fibronectin type III domain and a PBS lyase HEAT-like repeat (three repeat units). The PBS lyase repeat is responsible for specifically attaching particular phycobilins to apophycobiliprotein subunits in the phycobilisomes (PBS), which are the light-harvesting macromolecular complexes of cyanobacteria and red algae (Zhao et al., 2000). The phycobilins are open-chain tetrapyrrole chromophores that function as the photosynthetic light-harvesting pigments. Interestingly, two other CDSs, ORF15 and ORF16, also contain several PBS repeats. It is possible that the proteins encoded by the PBS-containing CDSs in b1dcf51a06 have a function similar to that of the PBS lyase proteins in cyanobacteria, and that this fosmid clone originated from a photosynthetic organism.
Among the CDSs that do have matches in GenBank are potential phylogenetic markers. The candidate division WS3 clone b1bcf11f04 contains two CDSs with similarity to the DNA polymerase III subunit A homologues DnaE and the Gram-positive-type PolC. In phylogenetic trees of both genes, the b1bcf11f04 homologue forms a separate lineage (Fig. 6). Conserved domain searches at NCBI showed that the PolC-like CDS is similar to only part of this gene, the exonuclease domain, and that it is fused to DinG, which encodes Rad3-related DNA helicases. Proteins with a similar domain architecture are found in several other bacterial genomes, mostly Firmicutes but also S. thermophilum and Chloroflexus aurantiacus, suggesting that the candidate division WS3 might be specifically related to one of these lineages. In phylogenetic trees of the DinG domain of these proteins, the fusion proteins all fall in the same clade (Fig. 6), although the monophyly of this clade was not supported by bootstrap analyses. In the Maximum Likelihood phylogeny, the b1bcf11f04 CDS clusters at the bottom of the clade together with C. aurantiacus. No non-fusion proteins are found in this clade, suggesting a single origin of this domain organization.
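The domain-architecture argument above amounts to grouping homologues by their ordered list of domains and asking which of them share the fused exonuclease + DinG arrangement. The sketch below assumes domain annotations are already available as ordered name lists (for example from Pfam or NCBI conserved domain searches); the protein identifiers, domain labels and their order are hypothetical placeholders, not real accessions.

```python
from collections import defaultdict


def group_by_architecture(annotations: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Group proteins by the ordered tuple of domain names they carry."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for protein, domains in annotations.items():
        groups[tuple(domains)].append(protein)
    return dict(groups)


# Hypothetical annotations; labels and domain order are placeholders
annotations = {
    "fosmid_ORF17": ["PolC_exonuclease", "DinG_helicase"],
    "firmicute_A": ["PolC_exonuclease", "DinG_helicase"],
    "proteobacterium_B": ["DinG_helicase"],
}
fusion = ("PolC_exonuclease", "DinG_helicase")
print(group_by_architecture(annotations)[fusion])  # -> ['fosmid_ORF17', 'firmicute_A']
```

Grouping by architecture only identifies candidates; whether the fusion arose once is then tested, as in the text, by checking that all fusion proteins, and no unfused ones, fall in a single clade of the DinG-domain tree.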
Summary
Metagenomic approaches play an increasing and highly visible role in microbial ecology. The data sets they generate are complex, and coupling the information they provide about the metabolic potential of an environment to the organismal lineages that may be present there remains a challenge. Here we have shown the utility of rRNA-targeted cloning and phylogenetic analysis of CDSs in making such a coupling. We also show that LGT, even when it does not preclude provisional assignment to lineages (taxonomy), will likely complicate the history of any lineage (phylogenetics), making phylotype–ecotype inferences provisional. Environmental metagenomic data open a window into a rich world of genetic interactions, some of which might be partially reconstructed as we have described here. The bioinformatic challenges associated with a complete metagenomic assessment of an environment as complex as Baltimore harbour sediment are daunting indeed. But progress in understanding our own genome, when only 20 years ago the notion of sequencing it was not widely supported, gives reason for confidence.
Experimental procedures
DNA was isolated from anaerobic sediments sampled from Baltimore harbour. The samples were a gift from Dr Joy Watts (Center of Marine Biotechnology, University of Maryland Biotechnology Institute) and were obtained as described in Holoman and colleagues (1998). DNA was extracted following the protocol in Rondon and colleagues (2000), except that instead of electroeluting the DNA after preparative pulsed-field gel electrophoresis, we cleaned it using the GELase kit from Epicentre.
The B1BF1 fosmid libraries were constructed using the CopyControl™ Fosmid Library Production Kit from Epicentre, following the manufacturer's protocol. Fosmid clones were miniprepped using either alkaline lysis with GeneMachine robotics (Genomic Solutions) or the R.E.A.L. Prep 96 Plasmid Kit (Qiagen). End-sequencing of miniprepped fosmid clones was performed using the DYEnamic™ ET Dye Terminator Kit (MegaBACE) and a MegaBACE™ 1000 (Amersham). Ten 96-well plates of prepped fosmids were screened using the I-CeuI homing endonuclease (NEB).
A fosmid vector containing an I-CeuI site and a blunt-end site was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos (Epicentre). To obtain as many CDSs as possible in our fosmid clones, we chose to clone in the direction 23S rRNA–5S rRNA for the present study. The vector for cloning in the direction 23S rRNA–16S rRNA was also constructed and is available from the authors (pCC1FosCeuI16S). The modified vector pCC1FosCeuI23S was prepared using the Large Construct Kit (Qiagen) and cut with I-CeuI overnight. After gel purification, the vector was cut with PmlI overnight to create a blunt end. The vector was then dephosphorylated using shrimp alkaline phosphatase (Amersham Biosciences), followed by phenol/chloroform extraction and ethanol precipitation. Ligation of DNA into pCC1FosCeuI23S was performed as described above, except that the DNA was cut overnight with I-CeuI following the end-repair step of the CopyControl™ Fosmid Library Production Kit protocol.
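The adaptor described above can be checked by scanning it for the two recognition sites the construction relies on. This sketch assumes the 27-bp I-CeuI site usually listed by enzyme suppliers (TAACTATAACGGTCCTAAGGTAGCGAA; Marshall and Lemieux (1992) describe a 19-bp recognition sequence) and the PmlI site CACGTG, which PmlI cuts in the middle (CAC^GTG) to leave blunt ends. The helper name `find_sites` is hypothetical.

```python
I_CEUI_SITE = "TAACTATAACGGTCCTAAGGTAGCGAA"  # assumed 27-bp I-CeuI site (supplier convention)
PMLI_SITE = "CACGTG"                          # PmlI site, cut CAC^GTG to give blunt ends
ADAPTOR = "CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG"  # adaptor sequence given in the text


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


print("I-CeuI:", find_sites(ADAPTOR, I_CEUI_SITE))  # -> I-CeuI: [2]
print("PmlI:", find_sites(ADAPTOR, PMLI_SITE))      # -> PmlI: [29]
# The adaptor reads as two leading nucleotides, the I-CeuI site, then the PmlI site
assert ADAPTOR == "CG" + I_CEUI_SITE + PMLI_SITE
```

Read this way, the cloning logic follows directly: I-CeuI opens the vector at a site that, in genomic DNA, occurs within 23S rRNA genes, while the adjacent PmlI site supplies the blunt end for the other side of the insert.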
Fig. 6. Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17, estimated using PMBML (517 positions in the alignment). The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank, and the 100 best matches were retrieved and aligned. Groups of very similar sequences from the same species or sister species were trimmed down to one representative sequence. The tree was arbitrarily rooted by Actinobacillus pleuropneumoniae. Results from bootstrap analyses are indicated as in Fig. 3. (Tree image not reproduced; a labelled group in the figure marks the PolC + DinG fusion proteins with the same domain structure as b1bcf11f04 ORF17.)

Subcloning of fosmid clones was performed using the TOPO® Shotgun Subcloning Kit (Invitrogen), and each fosmid was sequenced to >8× coverage. Low-quality regions and gaps were targeted by PCR (final coverage 8.2–14.3×). For one low-quality region, positions 1192–1342 in b1dcf13c08, we were not able to obtain high-quality sequence. The fosmid clones were assembled using Phred/Phrap. CDSs were identified with the run-glimmer2 script using the standard settings provided in this script (Delcher et al., 1999), and CDSs shorter than 100 bp were eliminated. If two overlapping CDSs were identified, we selected the one that had significant homologues in GenBank. Where the CDSs identified had no match in GenBank, we analysed the region using ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and finally with BLASTX searches. If ORF-finder yielded an alternative CDS that did have a match in GenBank, that CDS was selected. tRNAs were identified with tRNAscan-SE (Lowe and Eddy, 1997). The CDSs were annotated using BLASTP searches (Altschul et al., 1997) of GenBank at http://www.ncbi.nlm.nih.gov/BLAST and Pfam searches (Bateman et al., 2004) at http://www.sanger.ac.uk/Software/Pfam/search.shtml.
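The CDS selection rules in the Experimental procedures (drop gene calls shorter than 100 bp; when two calls overlap, keep the one with a significant GenBank homologue) can be expressed as a small filter. This is a minimal sketch assuming each gene call is already a record with coordinates and a flag saying whether BLASTP found a significant hit; the `CDS` class, the greedy processing order and the tie-break for overlaps where both or neither call has a hit (keep the longer one) are assumptions, and the ORF-finder/BLASTX rescue step is not modelled.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 100  # CDSs shorter than this are discarded, as in the text


@dataclass
class CDS:
    name: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    has_hit: bool   # significant GenBank homologue found?

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: CDS, b: CDS) -> bool:
    return a.start <= b.end and b.start <= a.end


def select_cdss(calls: list[CDS]) -> list[CDS]:
    """Drop short calls, then resolve overlaps in favour of calls with GenBank hits."""
    kept: list[CDS] = []
    # Visit calls with hits first, then longer calls first (assumed tie-break),
    # so that they win any overlap with a call visited later
    for cds in sorted(calls, key=lambda c: (not c.has_hit, -c.length)):
        if cds.length < MIN_LENGTH_BP:
            continue
        if not any(overlaps(cds, k) for k in kept):
            kept.append(cds)
    return sorted(kept, key=lambda c: c.start)


calls = [
    CDS("orf1", 1, 900, has_hit=False),
    CDS("orf2", 700, 1500, has_hit=True),   # overlaps orf1 and has a hit, so kept
    CDS("orf3", 1600, 1660, has_hit=True),  # 61 bp, so dropped
    CDS("orf4", 1700, 2300, has_hit=False),
]
print([c.name for c in select_cdss(calls)])  # -> ['orf2', 'orf4']
```

Real bacterial genes often overlap by a few base pairs at their ends, so a practical version would likely tolerate short overlaps rather than treating any shared base as a conflict.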
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and of the 16S rRNA genes were carried out in PAUP (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). Ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used in both cases.

Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01 we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps. Phylogenetic analysis of protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were constructed for all CDSs with significant matches in BLASTP searches. If the phylogeny of the CDS disagreed with the phylogeny of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (bootstrap analysis with 100 replicates and global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random-addition replicates]. If the trees supported a phylogenetic grouping different from that inferred from the rRNA (bootstrap support >50%), the CDS was classified as acquired by LGT. Note that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; no within-group transfers were included. In some of these trees, the CDS from the fosmid was found within a clade containing representatives of several different bacterial groups, suggesting frequent transfer of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, but for such phylogenies it is not possible to identify the donors and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML within the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random-addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we performed only one random addition of sequences per bootstrap replicate.
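The LGT decision rule described above can be written as a small function. This is a sketch under stated assumptions: each CDS tree has already been summarised as the major bacterial group of the clade in which the fosmid CDS falls, plus the bootstrap support for that placement. The group labels, the `MIXED` label for clades spanning several groups, the function name and the "unresolved" outcome for unsupported disagreements are hypothetical (the text does not say how those were counted); the 50% cut-off is the one given in the text.

```python
BOOTSTRAP_CUTOFF = 50.0  # percent; support must exceed this, as in the text
MIXED = "mixed"          # hypothetical label for clades containing several bacterial groups


def classify_cds(rrna_group: str, cds_group: str, bootstrap: float) -> str:
    """Classify a fosmid CDS as 'consistent', 'LGT' or 'unresolved'.

    rrna_group: major group inferred from the clone's rRNA (its phylotype).
    cds_group:  major group of the clade in which the CDS falls, or MIXED.
    bootstrap:  bootstrap support (%) for that placement.
    """
    if cds_group == rrna_group:
        # Agrees with the rRNA phylotype; within-group transfers are not counted as LGT
        return "consistent"
    if bootstrap > BOOTSTRAP_CUTOFF:
        # A supported placement in another group, or in a mixed clade, counts as LGT;
        # for mixed clades the donor and recipient cannot be identified
        return "LGT"
    return "unresolved"  # disagreement not supported by bootstrap (assumed category)


print(classify_cds("alpha-proteobacteria", "alpha-proteobacteria", 95))    # -> consistent
print(classify_cds("Bacteroidetes/Chlorobi", "gamma-proteobacteria", 78))  # -> LGT
print(classify_cds("Chloroflexi", "Firmicutes", 42))                       # -> unresolved
```

Keeping the rule as a pure function of (phylotype, placement, support) makes the classification reproducible across the many CDS trees, and makes it easy to see how the reported LGT fractions would shift under a stricter bootstrap cut-off.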
All sequences have been submitted to GenBank under accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, environmental microbiology and LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2. [WWW document]. URL http://www.rna.icmb.utexas.edu
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, WA, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP: Phylogenetic Analysis Using Parsimony (and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with E-value < 10⁻¹⁰ to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG category of the BLAST hits with E-value < 10⁻¹⁰. Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com
2024 C L Nesboslash Y Boucher M Dlutek and W F Doolittle
copy 2005 Society for Applied Microbiology and Blackwell Publishing Ltd Environmental Microbiology 7 2011ndash2026
genome when only 20 years ago the notion of sequenc-ing it was not widely supported gives reason forconfidence
Experimental procedures
DNA was isolated from anaerobic sediments sampled fromBaltimore harbour The samples were a gift from Dr Joy Watts(Center of Marine Biotechnology University of MarylandBiotechnology Institute) and were obtained as described inHoloman and colleagues (1998) DNA was extracted follow-ing the protocol in Rondon and colleagues (2000) except thatinstead of electroeluting the DNA after preparative pulsed-field gel electrophoresis we cleaned it using the GELase-kitfrom Epicentre
The B1BF1 fosmid libraries were constructed using theCopyControltrade Fosmid Library Production Kit from Epicentrefollowing the protocol of manufacturer Fosmid clones wereminipreped using either alkaline lysis with GeneMachinerobotics (Genomic Solutions) or the REAL Prep 96 Plas-mid Kit (Qiagen) End-sequencing of minipreped fosmidclones was performed using the DYEnamictrade ET Dye Termi-nator Kit (MegaBACE) and a MegaBACEtrade 1000 (Amer-sham) Ten 96-plates of preped fosmids were screened usingthe I-CeuI homing endonuclease (NEB)
A fosmid vector containing an I-CeuI site and a blunt-endsite was constructed by ligating the adaptor CGTAACTATAACGGTCCTAAGGTAGCGAACACGTG into pCC1Fos(Epicentre) In order to obtain as many CDSs as possible in
our fosmid clones we chose to clone in the direction 23SrRNAminus5S rRNA for our present study The vector for cloningin the direction 23S rRNAminus16S rRNA was also constructedand is available from the authors (pCC1FosCeuI16S) Themodified vector pCC1FosCeuI23S was prepared using theLarge Construct Kit (Qiagen) and cut with I-CeuI overnightAfter cleaning the vector from gel the vector was cut withPmlI overnight to make a blunt site The vector was thendephosphorylated using shrimp alkaline phosphatase(Amersham Biosciences) followed by phenolchloroformextraction and ethanol precipitation Ligation of DNA intopCC1FosCeuI23S was performed as described aboveexcept DNA was cut overnight with I-CeuI following the end-repair step in the CopyControltrade Fosmid Library ProductionKit protocol
Subcloning of fosmid clones was performed using theTOPOreg Shotgun Subcloning Kit (Invitrogen) and each fos-mid was sequenced to gt8 times coverage Low-quality regionsand gaps were targeted by PCR (final 82ndash143 times coverage)For one low-quality region we were not able to obtain high-quality sequence position 1192ndash1342 in b1dcf13c08 Thefosmid clones were assembled using PhredPhrap CDSswere identified using the run-glimmer2 script using the stan-dard settings provided in this script (Delcher et al 1999) andCDSs shorter than 100 bp were eliminated If two overlap-ping CDSs were identified we selected the one that hadsignificant homologues in GenBank In cases where CDSswhere idenitified that have no match in GenBank we analy-sed the region using ORF-finder (httpwwwncbinlmnihgovgorfgorfhtml) and finally by doing BLASTX searches If an
PolC + DinG fusion proteinssame domain structure as b1bcf11f04ORF17
10
Actinobacillus pleuropneumoniae
Yersinia pestis
Vibrio cholerae
Photobacterium profundum
Idiomarina loihiensis
Methylococcus capsulatus
Xanthomonas oryzae
62
876175
Polaromonas sp JS666
Thiobacillus denitrificans
71
Burkholderia cepacia Bordetella parapertussis
74
Methylobacillus flagellatusAzoarcus sp EbN1
Desulfotalea psychrophila Magnetococcus sp MC-1 61
53Gloeobacter violaceus
Propionibacterium acnes Mycobacterium avium
Corynebacterium diphtheriae
Nocardia farcinica 62 92100
Shewanella oneidensis
Vibrio cholerae
Photobacterium profundum
83
Xanthomonas axonopodis
Neisseria meningitidisProteus vulgaris Microbulbifer degradansAzotobacter vinelandii
Leptospira interrogans
51
Rhodopirellula baltica
6463
Fusobacterium nucleatum
59Treponema denticola
558960
Parachlamydia sp UWE25
Geobacter sulfurreducens
Geobacter metallireducens
b1bcf11f04ORF17Chloroflexus aurantiacus
Moorella thermoacetica
Desulfitobacterium hafniense5353
80
5269
61
Exiguobacterium sp 255-15
Symbiobacterium thermophilum
Bacillus halodurans
Geobacillus kaustophilus
Bacillus cereus Oceanobacillus iheyensis
Listeria monocytogenes Pediococcus pentosaceus
Bacillus licheniformis
Bacillus subtilis
Fig 6 Maximum Likelihood phylogeny of the DinG domain of homologues of b1bcf11f04 ORF17 estimated using PMBML (517 positions in alignment) The sequences were obtained by blasting the b1bcf11f04 ORF17 sequence against GenBank and the 100 best matches where retrieved and aligned Groups of very similar sequences from the same species or sister species where trimmed down to one sequence representative The tree was arbi-trarily rooted by Actinobacillus pleuropneumo-niae Results from bootstrap analyses are indicated as in Fig 3
LGT and phylogenetic assignment of metagenomic clones 2025
copy 2005 Society for Applied Microbiology and Blackwell Publishing Ltd Environmental Microbiology 7 2011ndash2026
alternative CDS was obtained using ORF-finder that did havea match in GenBank then that CDS was selected T-RNAswere identified with tRNAscan-SE (Lowe and Eddy 1997)The CDSs were annotated using BLASTP searches (Altschulet al 1997) of GenBank at httpwwwncbinlmnihgovBLAST and Pfam searches (Bateman et al 2004) at httpwwwsangeracukSoftwarePfamsearchshtml
Phylogenetic analyses of the 1000 bp 23S rRNA fragment and the 16S rRNA genes were carried out in PAUP* (Swofford, 2001). Minimum evolution trees were constructed using LogDet distances, and Maximum Likelihood trees were constructed using a general time-reversible model with gamma-distributed rates (four categories) and invariable sites (GTR + Γ + I). In both cases, ten random-addition cycles of the sequences and tree bisection and reconnection (TBR) branch swapping were used. Homologues of the CDSs in GenBank were identified and retrieved using BLASTP searches at http://www.ncbi.nlm.nih.gov/BLAST. For b1dcf13f01 we also searched the draft genome of C. aurantiacus at http://genome.jgi-psf.org/microbial. Initially, up to 100 significant matches were retrieved and aligned. Clusters of very similar sequences from the same or sister taxa were trimmed down to one representative sequence. We also removed sequences that were considerably shorter than the rest of the alignment, as well as sequences that were difficult to align. The alignments were edited by deleting regions with many or large gaps.

Phylogenetic analysis of the protein sequences (CDSs) was carried out in two steps. First, simple Neighbour-joining trees with bootstrap analyses were performed for all CDSs with significant matches in BLASTP searches. If the phylogeny of a CDS disagreed with that of the rRNA, i.e. if the CDS clustered with a different major bacterial group than the rRNA, a minimum evolution tree (bootstrap analysis with 100 replicates and global rearrangements) was estimated from Maximum Likelihood distances [JTT (Jones et al., 1992) + Γ, global rearrangements and 10 random-addition replicates]. If the trees supported a phylogenetic grouping different from that observed for the rRNA (bootstrap support > 50%), the CDS was classified as having been acquired by LGT. Note that we classified as LGT only transfers between bacterial groups or phyla, e.g. from α-proteobacteria to γ-proteobacteria, or from the Bacteroidetes/Chlorobi group to γ-proteobacteria; within-group transfers were not included. For some of these trees, the CDS from the fosmid fell within a clade containing representatives of several different bacterial groups, suggesting frequent transfers of the gene (see Table 1). In these cases we classified the CDS as acquired by LGT, although for such phylogenies it is not possible to identify the donor and recipients. For some LGT-CDSs we also constructed protein Maximum Likelihood phylogenies using PMBML (Veerassamy et al., 2003), a modified version of PROML from the PHYLIP package version 3.6a2 (Felsenstein, 2001). For these analyses we used a JTT + Γ model, global rearrangements and 10 random-addition replicates. In the Maximum Likelihood bootstrap analyses we did not use global rearrangements, and we used only one random addition of sequences per bootstrap replicate.
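The alignment clean-up and the between-group LGT criterion described above can be expressed as a small decision procedure. The Python sketch below is illustrative only and is not the authors' code: tree building (done here in PAUP* and PHYLIP) is out of scope, so each CDS is assumed to arrive with the set of major groups in its clade and that clade's bootstrap support already computed, and the two-step Neighbour-joining then minimum evolution screen is collapsed into a single comparison. The length and gap cut-offs are hypothetical, because the text says only "considerably shorter" and "many or large gaps"; the > 50% bootstrap threshold is the one stated above. The names `clean_alignment`, `Placement` and `classify` are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the Methods say only that "considerably shorter"
# sequences and regions "with many or large gaps" were removed.
MIN_LENGTH_FRACTION = 0.7  # keep sequences >= 70% of the longest (ungapped)
MAX_GAP_FRACTION = 0.5     # drop columns where more than half the rows are gaps
# Stated in the Methods: a conflicting grouping needs bootstrap support > 50%.
BOOTSTRAP_CUTOFF = 50.0


def clean_alignment(aln: dict[str, str]) -> dict[str, str]:
    """Remove much shorter sequences, then delete gap-rich columns."""
    lengths = {name: len(seq.replace("-", "")) for name, seq in aln.items()}
    longest = max(lengths.values())
    kept = {n: s for n, s in aln.items()
            if lengths[n] >= MIN_LENGTH_FRACTION * longest}
    width = len(next(iter(kept.values())))
    cols = [i for i in range(width)
            if sum(s[i] == "-" for s in kept.values()) / len(kept)
            <= MAX_GAP_FRACTION]
    return {n: "".join(s[i] for i in cols) for n, s in kept.items()}


@dataclass
class Placement:
    rrna_group: str         # major group of the fosmid's rRNA phylotype
    clade_groups: set[str]  # major groups in the clade holding the CDS
    bootstrap: float        # bootstrap support (%) for that clade


def classify(p: Placement) -> str:
    """Between-group LGT rule: compare the CDS clade with the rRNA phylotype."""
    if p.clade_groups == {p.rrna_group}:
        return "agrees with rRNA"
    if p.bootstrap <= BOOTSTRAP_CUTOFF:
        return "not LGT (weak support)"  # conflict is not supported
    if len(p.clade_groups) > 1:
        return "LGT, donor unknown"  # mixed clade suggests frequent transfer
    return "LGT"


if __name__ == "__main__":
    print(classify(Placement("Bacteroidetes/Chlorobi",
                             {"gamma-proteobacteria"}, 87.0)))  # LGT
```

Because groups are compared only at the level of major bacterial groups or phyla, within-group transfers (for example between two γ-proteobacteria) cannot be flagged by this rule, which mirrors the restriction stated in the Methods.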
All sequences have been submitted to GenBank under accession numbers AJ937675 and AJ937676 (rRNA operons) and AJ937760–AJ937771 (fosmid clones).
Acknowledgements
This work was supported by funds from the Canadian Institutes of Health Research (MOP 4467) and Genome Canada (Genome Atlantic). Sequencing was performed at the Genome Atlantic sequencing platform. We thank Dr Francisco E. Rodriguez Valera, Rebecca J. Case and Terence L. Marsh for invaluable discussions on the I-CeuI approach to obtaining rRNA-containing clones, on environmental microbiology and on LGT.
References
Aagaard, C., Awayez, M.J., and Garrett, R.A. (1997) Profile of the DNA recognition site of the archaeal homing endonuclease I-DmoI. Nucleic Acids Res 25: 1523–1530.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Andersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M., and Roger, A.J. (2003) Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 13: 94–104.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138–D141.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
Beja, O., Spudich, E.N., Spudich, J.L., Leclerc, M., and DeLong, E.F. (2001) Proteorhodopsin phototrophy in the ocean. Nature 411: 786–789.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., et al. (2002) The comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs [WWW document]. URL http://www.rna.icmb.utexas.edu. BMC Bioinformatics 3: 2.
Chevalier, B., Turmel, M., Lemieux, C., Monnat, R.J., Jr, and Stoddard, B.L. (2003) Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol Biol 329: 253–269.
de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., Karl, D.M., Heidelberg, J., and DeLong, E.F. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci USA 100: 12830–12835.
Delcher, A.L., Harmon, D., Kasif, S., White, O., and Salzberg, S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27: 4636–4641.
Dojka, M.A., Hugenholtz, P., Haack, S.K., and Pace, N.R. (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64: 3869–3877.
Eulberg, D., Kourbatova, E.M., Golovleva, L.A., and Schlomann, M. (1998) Evolutionary relationship between chlorocatechol catabolic enzymes from Rhodococcus opacus 1CP and their counterparts in proteobacteria: sequence divergence and functional convergence. J Bacteriol 180: 1082–1094.
Felsenstein, J. (2001) PHYLIP (Phylogeny Inference Package). Seattle, USA: Department of Genetics, University of Washington.
Holoman, T.R., Elberson, M.A., Cutter, L.A., May, H.D., and Sowers, K.R. (1998) Characterization of a defined 2,3,5,6-tetrachlorobiphenyl-ortho-dechlorinating microbial community by comparative sequence analysis of genes coding for 16S rRNA. Appl Environ Microbiol 64: 3359–3367.
Hugenholtz, P., Pitulle, C., Hershberger, K.L., and Pace, N.R. (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282.
Kuwahara, T., Yamashita, A., Hirakawa, H., Nakayama, H., Toh, H., Okada, N., et al. (2004) Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Proc Natl Acad Sci USA 101: 14919–14924.
Lawrence, J.G., and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44: 383–397.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Marshall, P., and Lemieux, C. (1992) The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res 20: 6401–6407.
Muller, T.A., Byrde, S.M., Werlen, C., van der Meer, J.R., and Kohler, H.P. (2004) Genetic analysis of phenoxyalkanoic acid degradation in Sphingomonas herbicidovorans MH. Appl Environ Microbiol 70: 6066–6075.
Nelson, K.E., Fleischmann, R.D., DeBoy, R.T., Paulsen, I.T., Fouts, D.E., Eisen, J.A., et al. (2003) Complete genome sequence of the oral pathogenic bacterium Porphyromonas gingivalis strain W83. J Bacteriol 185: 5591–5601.
Nesbø, C.L., and Doolittle, W.F. (2003) Active self-splicing group I introns in the 23S rRNA genes of hyperthermophilic bacteria derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100: 10806–10811.
Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525–552.
Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Sanchez, L.B., Galperin, M.Y., and Muller, M. (2000) Acetyl-CoA synthetase from the amitochondriate eukaryote Giardia lamblia belongs to the newly recognized superfamily of acyl-CoA synthetases (nucleoside diphosphate-forming). J Biol Chem 275: 5794–5803.
Suzuki, M.T., Preston, C.M., Beja, O., de la Torre, J.R., Steward, G.F., and DeLong, E.F. (2004) Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microb Ecol 48: 473–488.
Swofford, D.L. (2001) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA, USA: Sinauer Associates.
Treusch, A.H., Kletzin, A., Raddatz, G., Ochsenreiter, T., Quaiser, A., Meurer, G., et al. (2004) Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ Microbiol 6: 970–980.
Veerassamy, S., Smith, A., and Tillier, E.R. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997–1010.
Vuilleumier, S., and Pagni, M. (2002) The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58: 138–146.
Xu, J., Bjursell, M.K., Himrod, J., Deng, S., Carmichael, L.K., Chiang, H.C., et al. (2003) A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299: 2074–2076.
Zhao, K.H., Deng, M.G., Zheng, M., Zhou, M., Parbel, A., Storf, M., et al. (2000) Novel activity of a phycobiliprotein lyase: both the attachment of phycocyanobilin and the isomerization to phycoviolobilin are catalyzed by the proteins PecE and PecF encoded by the phycoerythrocyanin operon. FEBS Lett 469: 9–13.
Supplementary material
The following supplementary material is available for this article online:
Figure S1. A. Number of BLAST hits with exp < 10e−10 to different taxonomic groups. B. Distribution of G + C content of the sequences. C. Distribution of the COG categories of the BLAST hits (exp < 10e−10). Black bars refer to end-sequences and grey bars refer to the sequenced fosmid clones.
Tables S1–12. Annotation of b1dcf51a06, b1dcf13f01, b3cf12f09, b1bcf11f04, b1dcf51c12, b1bcf11h03, b1bcf11d04, b1dcf13c8, b3cf12d07, b1bcf11c04, b1bf11a01 and b1bf110d03.
This material is available as part of the online article from http://www.blackwell-synergy.com