LETTERS
Functional metagenomic profiling of nine biomes
Elizabeth A. Dinsdale1,5*, Robert A. Edwards1,2,3,6*, Dana Hall1, Florent Angly1,4, Mya Breitbart7, Jennifer M. Brulc8, Mike Furlan1, Christelle Desnues1†, Matthew Haynes1, Linlin Li1, Lauren McDaniel7, Mary Ann Moran10, Karen E. Nelson11, Christina Nilsson12, Robert Olson6, John Paul7, Beltran Rodriguez Brito1,4, Yijun Ruan12, Brandon K. Swan13, Rick Stevens6, David L. Valentine13, Rebecca Vega Thurber1, Linda Wegley1, Bryan A. White8,9 & Forest Rohwer1,2
Microbial activities shape the biogeochemistry of the planet1,2 and macroorganism health3. Determining the metabolic processes performed by microbes is important both for understanding and for manipulating ecosystems (for example, disruption of key processes that lead to disease, conservation of environmental services, and so on). Describing microbial function is hampered by the inability to culture most microbes and by high levels of genomic plasticity. Metagenomic approaches analyse microbial communities to determine the metabolic processes that are important for growth and survival in any given environment. Here we conduct a metagenomic comparison of almost 15 million sequences from 45 distinct microbiomes and, for the first time, 42 distinct viromes, and show that there are strongly discriminatory metabolic profiles across environments. Most of the functional diversity was maintained in all of the communities, but the relative occurrence of metabolisms varied, and the differences between metagenomes predicted the biogeochemical conditions of each environment. The magnitude of the microbial metabolic capabilities encoded by the viromes was extensive, suggesting that they serve as a repository for storing and sharing genes among their microbial hosts and influence global evolutionary and metabolic processes.
Genomic plasticity of microbes causes variations in the gene content of closely related strains4, making predictions of community metabolism on the basis of representative genomes and signature genes such as 16S ribosomal RNA unreliable. Although it seems that core genomes are relatively stable and shared among most individuals of the same species, parts of the genome (for example, prophages, CRISPRs, pathogenicity/ecological islands, ORFans) are hyper-variable5. Together, these two components make up the pan-genome4. Unlike the signature genes approach, metagenomic approaches analyse the complete genetic information of microbial and viral communities6,7. In this way, the relative abundances of all genes can be determined and used to generate a description of the functional potential of each community8–14.
Here we use a comparative metagenomic approach to statistically analyse the frequency distribution of 14,585,213 microbial and viral metagenomic sequences to elucidate the functional potential of nine biomes: subterranean (that is, mine samples); hypersaline ponds from solar salterns; marine; freshwater; coral-associated; microbialites (including stromatolites and thrombolites); aquaculture-fish-associated; terrestrial-animal-associated; and mosquito-associated (details in Supplementary Table 1 and Supplementary Fig. 1). Microbial and viral metagenomes (Supplementary Fig. 2 and Supplementary Table 2) were isolated and pyrosequenced. The sequences were compared to the 2007 SEED platform (http://www.theseed.org) using the BLASTX algorithm, and hits with an E-value of <0.001 were considered to be significant (Methods). A total of 1,040,665 sequences from the 45 microbial metagenomes and 541,979 sequences from the 42 viral metagenomes were significantly similar to functional genes within the SEED (Supplementary Table 1). The SEED arranges metabolic pathways into a hierarchical structure in which all of the genes required for a specific task are arranged into subsystems15. At the highest level of organization, the subsystems include both catabolic and anabolic functions (for example, DNA metabolism), and at the lowest levels the subsystems are specific pathways (for example, the synthesis pathway for thymidine).
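The significance filter and the per-subsystem tallies that feed Table 1 can be sketched in a few lines. This is a minimal illustration, not the SEED or metagenomics RAST code. It assumes BLASTX hits have already been parsed into hypothetical `(read_id, protein_id, e_value)` tuples and that a hypothetical dict maps each subsystem-curated protein to its top-level category. It also assumes each read is counted once, through its lowest-E-value subsystem hit.

```python
from collections import Counter

# Significance threshold stated in the text: BLASTX hits with E < 0.001.
E_VALUE_CUTOFF = 1e-3


def category_percentages(hits, protein_to_category):
    """Percentage of annotated reads falling in each subsystem category.

    hits: iterable of (read_id, protein_id, e_value) tuples (hypothetical
        input format, e.g. parsed from tabular BLASTX output).
    protein_to_category: dict mapping SEED-curated protein ids to a
        top-level subsystem category (hypothetical lookup table).
    """
    best = {}  # read_id -> (e_value, category) of its best usable hit
    for read_id, protein_id, e_value in hits:
        category = protein_to_category.get(protein_id)
        # Skip non-significant hits and proteins that are not in a subsystem.
        if category is None or e_value >= E_VALUE_CUTOFF:
            continue
        if read_id not in best or e_value < best[read_id][0]:
            best[read_id] = (e_value, category)

    counts = Counter(category for _, category in best.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    # Express counts as percentages so metagenomes of different sizes compare.
    return {cat: 100.0 * n / total for cat, n in counts.items()}
```

Counting each read once through its best hit is a design choice made for this sketch. The text does not say whether a read that matches several subsystems contributes to each of them.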
*These authors contributed equally to this work.
1Department of Biology, 2Center for Microbial Sciences, 3Department of Computer Sciences, and 4Computational Science Research Centre, San Diego State University, San Diego, California 92182, USA. 5School of Biological Sciences, Flinders University, Adelaide, South Australia 5042, Australia. 6Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA. 7University of South Florida, College of Marine Science, 140 7th Avenue South, St Petersburg, Florida 33701, USA. 8Department of Animal Sciences, and 9The Institute for Genomic Biology, University of Illinois, Urbana, Illinois 61801, USA. 10Department of Marine Sciences, University of Georgia, Athens, Georgia 30602, USA. 11The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. 12Genome Institute of Singapore, 60 Biopolis Street, 02-01, Genome, Singapore 138672, Singapore. 13Department of Earth Science, University of California Santa Barbara, Santa Barbara, California 93106, USA. †Present address: Unité des Rickettsies, CNRS-UMR6020, Faculté de Médecine, 13385 Marseille, France.
doi:10.1038/nature06810
Nature Publishing Group ©2008

Table 1 shows the relative abundances of sequences assigned to each major subsystem in the combined analysis of the microbiomes compared with the viromes. Over 30% of the identifiable genes in the microbiomes were associated with carbohydrate or protein metabolism. Respiration and photosynthesis subsystems accounted for an additional ~15% of the similarities. Subsystems responsible for nucleic acid metabolism and virulence were overrepresented in the viral fractions (Table 1), whereas respiration and photosynthesis genes were less frequent.

Table 1 | Mean percentage of sequences (± s.e.m.) similar to major metabolisms
Metabolic category | Microbial metagenomes | Viral metagenomes
Carbohydrates | 17.218 (± 0.648) | 14.353 (± 0.718)
Amino acids | 12.036 (± 0.491) | 10.132 (± 0.642)
Virulence | 9.788 (± 0.339) | 11.175 (± 0.508)
Protein metabolism | 9.123 (± 0.497) | 8.838 (± 0.522)
Respiration | 7.139 (± 1.285) | 3.718 (± 0.276)
Photosynthesis | 6.965 (± 2.148) | 1.984 (± 0.554)
Cofactors, vitamins, and so on | 5.411 (± 0.226) | 6.661 (± 0.393)
RNA metabolism | 3.971 (± 0.195) | 4.324 (± 0.387)
DNA metabolism | 3.970 (± 0.180) | 7.555 (± 0.943)
Nucleosides and nucleotides | 3.316 (± 0.149) | 7.666 (± 0.817)
Cell wall and capsule | 3.235 (± 0.223) | 5.098 (± 0.649)
Fatty acids and lipids | 3.095 (± 0.160) | 3.002 (± 0.242)
Membrane transport | 2.736 (± 0.158) | 2.371 (± 0.182)
Stress response | 2.599 (± 0.115) | 3.354 (± 0.326)
Aromatic compounds | 2.351 (± 0.175) | 2.550 (± 0.340)
Cell division and cell cycle | 1.791 (± 0.091) | 1.983 (± 0.212)
Nitrogen metabolism | 1.547 (± 0.070) | 1.135 (± 0.093)
Sulphur metabolism | 1.230 (± 0.102) | 1.302 (± 0.134)
Motility and chemotaxis | 1.022 (± 0.096) | 1.011 (± 0.083)
Phosphorus metabolism | 0.909 (± 0.080) | 1.319 (± 0.167)
Cell signalling | 0.885 (± 0.076) | 0.885 (± 0.072)
Potassium metabolism | 0.796 (± 0.048) | 0.846 (± 0.079)
Secondary metabolism | 0.159 (± 0.014) | 0.235 (± 0.047)
The functional diversity represented by the metagenomes approached its theoretical limit of 2.81 in all environments (Table 2), showing that most subsystems were represented in all of the samples. Only the coral-associated microbes showed a lower functional diversity, because they have fewer secondary metabolisms, virulence pathways, cell signalling pathways and membrane transport pathways. Because microbes associated with corals are taxonomically diverse11, this lower functional diversity suggests that functional reduction may have occurred in these communities, similar to microbes in other symbiotic relationships16.
Diversity is a function of both richness (that is, the number of metabolic processes) and evenness (that is, how evenly sequences are distributed among the metabolic processes in a sample). The evenness for the metagenomes was very low (<0.1; Table 2 and Supplementary Fig. 3), showing that there are a few dominant metabolisms in each environment. Differences in which metabolisms dominate suggest that each environment has a characteristic functional profile.
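The diversity and evenness values in Table 2 follow from the Shannon index described in Methods. The sketch below assumes base-10 logarithms. That choice is an inference from the stated limit of 2.81, which equals log10 of the 654 SEED subsystems (about 2.816). Evenness is computed as the Methods describe it (H′ divided by the number of subsystems present). Pielou's J (H′/log S) is included only for contrast, because the two measures differ. All function names are hypothetical.

```python
import math


def shannon_h(counts):
    """Shannon index H' = -sum(p_i * log10(p_i)) over non-empty categories.

    Base-10 logarithms are an assumption: log10(654) ~= 2.81 matches the
    theoretical limit quoted in the text.
    """
    total = sum(counts)
    return -sum((n / total) * math.log10(n / total) for n in counts if n > 0)


def max_shannon_h(num_categories):
    """H_max = log S, reached when all S categories are equally abundant."""
    return math.log10(num_categories)


def evenness_per_subsystem(counts):
    """Evenness as described in Methods: H' divided by the number of
    subsystems present in the sample."""
    present = sum(1 for n in counts if n > 0)
    return shannon_h(counts) / present


def pielou_j(counts):
    """Pielou's J = H' / log S, shown only for comparison."""
    present = sum(1 for n in counts if n > 0)
    if present < 2:
        raise ValueError("Pielou's J needs at least two non-empty categories")
    return shannon_h(counts) / max_shannon_h(present)
```

Dividing H′ by a category count gives very small numbers. As a purely illustrative example, H′ = 2.4 spread over 480 subsystems gives 2.4 / 480 = 0.005, which is the same order as the evenness values in Table 2.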
To test the hypothesis that each environment has a distinguishing metabolic profile, a canonical discriminant analysis (CDA) was conducted (Fig. 1). Most of the variance between the different environments (79.8% for the combined microbiomes and 69.9% for the viromes) was explained in this analysis, showing that metagenomes are highly predictive of metabolic potential within an ecosystem. In contrast, a recent analysis of 16S rRNA genes from multiple environments explained only about 10% of the variance17, suggesting that different ecosystems cannot be distinguished by their taxa.
The position of each metagenome in Fig. 1 reflects the frequency combination of sequences associated with each subsystem; the vectors indicate which metabolisms most strongly determined the distribution. Using these as clues, it is possible to determine which metabolisms are important for the organisms in that environment relative to other environments. For example, subsystems involved in respiration and protein metabolism placed the coral-associated microbes apart from the microbes found within terrestrial animals. This trend is visualized in Fig. 2, which shows that ~20% of the coral-associated microbial genes were involved in respiration, compared with only 3% in the microbiomes associated with terrestrial animals. The relatively high occurrence of respiration-associated genes in the coral-associated microbiomes reflects the diurnally fluctuating oxygen environment, which is supersaturated with oxygen in the day and essentially anaerobic at night18. In contrast, microbes living within the stable anaerobic alimentary tracts of terrestrial animals are less likely to experience selection for multiple respiration pathways.
Similarly, virulence genes were proportionally more abundant in the organism-associated microbes than in free-living microbes. These are the factors necessary to facilitate symbiotic relationships (mutualism, parasitism or commensalism; Fig. 2f–h). Another example of the predictive power of the metagenomes is the sulphur metabolism associated with aquaculture fish. In particular, two subsystems, alkanesulphonate and taurine metabolism, were overrepresented in fish-associated metagenomes (Supplementary Fig. 4). Alkanesulphonate metabolism is involved in the use of both inorganic and organic sulphur, such as taurine and aliphatic sulphonates19 (taurine is a sulphur-containing organic acid used to supplement aquaculture fish food20).
Table 2 | Mean functional diversity and evenness (± s.e.m.) of metagenomes, sampled from nine environments
Biome | Functional diversity (H′), microbial | Functional diversity (H′), viral | Functional evenness, microbial | Functional evenness, viral
Subterranean | 2.393 (± 0.030) | – | 0.005 (± 1.2 × 10⁻⁴) | –
Hypersaline | 2.361 (± 0.006) | 2.041 (± 0.021) | 0.005 (± 1.4 × 10⁻⁴) | 0.012 (± 5.6 × 10⁻⁴)
Marine | 2.313 (± 0.021) | 2.162 (± 0.026) | 0.005 (± 0.9 × 10⁻⁴) | 0.007 (± 4.0 × 10⁻⁴)
Freshwater | 2.430 (± 0.003) | 2.080 (± 0.034) | 0.005 (± 0.9 × 10⁻⁴) | 0.010 (± 6.7 × 10⁻⁴)
Coral | 1.733 (± 0.059) | 2.289 (± 0.023) | 0.009 (± 5.2 × 10⁻⁴) | 0.007 (± 1.1 × 10⁻⁴)
Microbialites | 2.408 (± 0.015) | 1.743 (± 0.115) | 0.005 (± 3.8 × 10⁻⁴) | 0.019 (± 6.9 × 10⁻³)
Fish | 2.447 (± 0.001) | 2.439 (± 3.1 × 10⁻⁴) | 0.005 (± 0.4 × 10⁻⁴) | 0.005 (± 0.7 × 10⁻⁴)
Terrestrial animals | 2.428 (± 0.006) | 2.016 (± 0.173) | 0.004 (± 0.1 × 10⁻⁴) | 0.017 (± 4.5 × 10⁻³)
Mosquito | – | 2.395 (± 0.015) | – | 0.004 (± 0.5 × 10⁻⁴)
There are no subterranean viral metagenomes and no mosquito microbial metagenomes.
[Figure 1 plot omitted. a, microbial metagenomes: canonical discriminant function 1 (48.0%) versus function 2 (31.9%); subsystem vectors for cell wall, virulence, membrane transport, stress, sulphur, signalling, motility, respiration and protein. b, viral metagenomes: canonical discriminant function 1 (38.9%) versus function 2 (31.0%); subsystem vectors for membrane transport, carbohydrates, fatty acids, secondary metabolites, phosphorus, virulence, cell division, DNA, potassium and motility. Symbols: subterranean, hypersaline, marine, freshwater, coral, microbialites, fish, terrestrial animals, mosquito.]
Figure 1 | Functional analysis of microbial and viral metagenomes. The CDA of the microbial (a) and viral (b) metagenomes showed that metabolic processes grouped these communities in the two-dimensional space described by canonical discriminant functions 1 and 2. The symbols represent the position of each metagenome, and the vectors represent the structural matrix for subsystems that were identified as influencing the separation of the metagenomes using the stepwise procedure. The length of the vectors represents the strength of influence of the particular metabolic process. The cross-validation scores for the microbial and viral metagenomes were 66.7 and 59.9%, respectively.
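The cross-validation scores above are informative only relative to the chance rate of one correct assignment in nine groups (about 11%). The following sketch illustrates leave-one-out scoring under a stated simplification. A nearest-centroid classifier on the subsystem percentages stands in for the CDA classification functions and does not reproduce them. Function names and the `(profile, biome)` input format are hypothetical.

```python
import math


def centroids(samples):
    """Mean profile per group from (profile, group) pairs."""
    grouped = {}
    for profile, group in samples:
        grouped.setdefault(group, []).append(profile)
    return {
        g: [sum(col) / len(col) for col in zip(*profiles)]
        for g, profiles in grouped.items()
    }


def nearest_group(profile, cents):
    """Group whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda g: math.dist(profile, cents[g]))


def loo_percent_correct(samples):
    """Leave-one-out: hide each sample's label, refit, then classify it."""
    correct = 0
    for i, (profile, group) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        if nearest_group(profile, centroids(training)) == group:
            correct += 1
    return 100.0 * correct / len(samples)


def chance_percent(num_groups):
    """Expected percentage correct when guessing among equally likely groups."""
    return 100.0 / num_groups
```

In this sketch, a held-out sample that is the only member of its group cannot be classified correctly, because its group has no centroid once it is removed. Small groups such as the two-sample biomes are therefore penalized.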
Together, these examples show that metagenomes predict important, emergent biological characters of the environments. By substituting environmental groups in multiple CDAs, the predictive nature of metagenomes was confirmed (Supplementary Figs 5 and 6).
Shifting of a metagenome away from its sister group in the CDA was also predictive of ecological differences. For example, one of the marine metagenomes (number 27, Supplementary Table 1) was positioned more negatively than the rest of the marine metagenomes (Fig. 1a). This sample was taken from waters that were unusually rich in nitrogen, phosphate and dissolved organic carbon21. The ability to determine subtle differences in metabolic potential will allow environmental changes to be detected at early stages of perturbation and previously unknown pathways for therapeutics to be identified.
The viromes are dominated by phage, which are expected to have similar lifestyles in every environment (infection, replication, host lysis and release of free virions). Phage have also been shown to move between environments22, which suggests that their metabolic profiles are similar in different ecosystems. In contrast, other studies have shown that phage carry 'specialization' genes23, including phosphate metabolism24 and cyanobacterial photosystems25, to manipulate host metabolisms associated with a particular ecosystem. Phage 'sample' their host's genetic material and incorporate extra pieces of DNA called MORONS26, suggesting that phage metagenomes may instead show distinctive profiles based on their environment. As shown in Figs 1b and 2, the viromes have highly predictive metabolic profiles (69.9% of the variance explained) that suggest enrichment for specific genes in different environments, and thus support the latter hypothesis.
Because phages and viruses are non-motile, the abundance of motility and chemotaxis proteins within the combined viral metagenomes was the most unexpected example of specialized metabolisms being carried within the viromes (Fig. 3). A total of 130 SEED-annotated motility and chemotaxis proteins (out of a possible 157) were present in the viromes. There was a non-random acquisition of these proteins by the viral community, as shown by the variation in relative abundances of these proteins between the microbial and viral metagenomes (Supplementary Table 3). In the viromes, flagellar biosynthesis protein FlhA, the chemotaxis response regulator proteins CheA and CheB and deacylases were overrepresented (Supplementary Table 3), whereas the twitching motility protein PilT, type II secretory pathways and GldJ were overrepresented in the microbiomes. The cheA and cheB genes within microbes work together to control flagellar motor switching rates27, but their role within the phage remains an outstanding question.
Essentially all of the functional diversity was represented in the viromes. Unlike their cellular hosts, most viruses must carry a specific amount of DNA to correctly pack their capsids (that is, viruses are not evolutionarily penalized for carrying 'extra' DNA). If the extra DNA confers a selective advantage (resulting in increased phage progeny), these genes are fixed in the phage genome; otherwise they will be lost. Because there are an estimated 10³¹ phages on the planet and they can move between environments, the potential reservoir of genes that can be transferred both locally and globally12 by phage is enormous28. As our research shows, there is little restriction on the types of genes carried by the viral community, suggesting that they influence a wide range of processes, including biogeochemical cycling, short-term adaptation and long-term evolution of microbes.
The low functional evenness measured for both microbial and viral metagenomes is even lower than the functional diversity calculated for individual bacterial genomes (Table 2 and Supplementary Fig. 3). This finding is diametrically opposed to the high taxonomic evenness reported for both microbial and viral communities2,12, ranging from 0.6 to 1 for human faecal and marine viruses9,12 and about 0.9 for soil microbes29. To resolve this apparent dilemma, we propose that the frequency of a gene encoding a particular metabolic function reflects its relative importance in an environment, and that genetic sweeps favour particular gene frequencies regardless of their taxonomic background. That is, rather than changing taxa, variation in gene content, presumably by means of horizontal gene transfer30 between sympatric microbes, is controlling gene distribution within an environment. The large amount of variation (~70%) explained by the functional analysis presented here supports this hypothesis.

[Figure 2 plot omitted: bar charts of the percentage of sequences (%) in each metabolic process for each biome. Microbial metagenomes: a, subterranean; b, hypersaline; c, marine; d, freshwater; e, coral; f, microbialites; g, fish; h, terrestrial animals. Viral metagenomes: i, mosquito; j, hypersaline; k, marine; l, freshwater; m, coral; n, microbialites; o, fish; p, terrestrial animals. Categories plotted: virulence, protein, respiration, cell wall, membrane transport, stress, motility, cell signalling and sulphur (microbial); carbohydrates, virulence, DNA metabolism, cell division, fatty acids, membrane transport, motility, phosphorus, potassium and secondary metabolites (viral).]
Figure 2 | A one-dimensional representation of the environmental metabolic profiles for the microbial and viral metagenomes sampled from the nine environments. Microbial metagenomes are shown in a–h, and viral metagenomes in i–p. Each bar represents the mean for each metabolic category. Sample sizes: subterranean, n = 2 (a); hypersaline, n = 9 (b); marine, n = 8 (c); freshwater, n = 4 (d); coral, n = 7 (e); microbialites, n = 3 (f); fish, n = 4 (g); terrestrial animals, n = 8 (h); mosquito, n = 3 (i); hypersaline, n = 12 (j); marine, n = 10 (k); freshwater, n = 4 (l); coral, n = 6 (m); microbialites, n = 3 (n); fish, n = 2 (o); terrestrial animals, n = 2 (p).

[Figure 3 plot omitted: percentage of sequences (%) per biome for microbial metagenomes (subterranean, hypersaline, marine, freshwater, coral, microbialites, fish, terrestrial animals) and viral metagenomes (hypersaline, marine, freshwater, coral, microbialites, fish, terrestrial animals, mosquito), in panels a–c.]
Figure 3 | A comparison of the distribution of sequences similar to motility and chemotaxis genes identified within the microbiomes (n = 43) and viromes (n = 41). Microbial metagenomes are shown on the left, and viral metagenomes on the right. Panels show the abundance of sequences identified within each of three fine-scale subsystems, as described by the SEED platform: flagellum (a), bacterial chemotaxis (b) and gliding motility (c).
METHODS SUMMARY
Samples for metagenomes were collected and fractionated using standard techniques, sequenced using pyrosequencing and compared to the functional genes in the SEED platform11,12 (Methods). All statistics were performed on the percentage of sequences showing similarities to known functions. For the CDA, sequences were grouped according to the SEED classification scheme, and the analysis was conducted on the principal metabolic functions. The CDA builds a model for group membership. A discriminant value is calculated for each metagenomic sample. This value is a linear combination of the response variables (metabolic processes) represented in the new dimensional space. These values are used to visualize group membership.
An advantage of the CDA is that it identifies which variables are driving the separation between the groups; it uses these to build the model and discards those that are not influential. Influential variables were identified by a stepwise method, using Wilks' lambda with P = 0.05, and the selection was confirmed with analysis of variance (ANOVA; Supplementary Table 4). The level of influence of each variable is provided by the structural matrix and can be visualized using an h-plot, in which the length of the line represents the level of influence. The CDA also performs a cross-validation analysis that estimates the likelihood of correctly classifying each sample. Cross-validation removes the predetermined grouping for each sample and uses the response variables to assign the individual sample to a group. Because the data were divided into nine predetermined groups (biomes), the proportion of samples correctly classified by chance alone is 11% (1/9). The percentage of correct classifications has to be substantially larger than this number for the metabolic processes to be useful for classifying the metagenomes into environments.
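The stepwise selection described above enters variables according to Wilks' lambda. For a single variable, lambda reduces to the within-group sum of squares divided by the total sum of squares, and it converts exactly to the one-way ANOVA F statistic. That equivalence is why ANOVA can corroborate the selection. The sketch below covers only this univariate case, for example ranking candidate subsystems before the first entry step. A full stepwise CDA uses partial lambdas conditioned on the variables already in the model and compares F against a P = 0.05 threshold; neither step is shown. All names are hypothetical.

```python
def wilks_lambda_1d(groups):
    """Univariate Wilks' lambda = SS_within / SS_total.

    groups: dict mapping group (biome) name -> list of values of one variable,
    e.g. the percentage of sequences in one subsystem for each metagenome.
    Smaller values mean the variable separates the groups better.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        ss_within += sum((v - mean) ** 2 for v in vals)
    return ss_within / ss_total


def f_statistic(lam, n_samples, n_groups):
    """Convert a univariate lambda to the one-way ANOVA F statistic."""
    return ((1.0 - lam) / lam) * ((n_samples - n_groups) / (n_groups - 1))


def rank_variables(data):
    """Order variables from most to least discriminating (lowest lambda first).

    data: dict mapping variable name -> {group name -> list of values}.
    """
    return sorted(data, key=lambda var: wilks_lambda_1d(data[var]))
```

The conversion holds because (1 − Λ)/Λ equals the ratio of the between-group to the within-group sum of squares. Scaling that ratio by the degrees of freedom gives F.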
Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.
Received 18 November 2007; accepted 6 February 2008. Published online 12 March 2008.
1. Newman, D. K. & Banfield, J. F. Geomicrobiology: how molecular-scale interactions underpin biogeochemical systems. Science 296, 1071–1076 (2002).
2. Prosser, J. I. et al. The role of ecological theory in microbial ecology. Nature Rev. Microbiol. 5, 384–392 (2007).
3. Ley, R. E., Turnbaugh, P. J., Klein, S. & Gordon, J. I. Microbial ecology: human gut microbes associated with obesity. Nature 444, 1022–1023 (2006).
4. Medini, D. et al. The microbial pan-genome. Curr. Opin. Genet. Dev. 15, 589–594 (2005).
5. Coleman, M. L. et al. Genomic islands and the ecology and evolution of Prochlorococcus. Science 311, 1768–1770 (2006).
6. DeLong, E. F. et al. Community genomics among stratified microbial assemblages in the ocean's interior. Science 311, 496–503 (2006).
7. Tringe, S. G. & Rubin, E. M. Metagenomics: DNA sequencing of environmental samples. Nature Rev. Genet. 6, 805–814 (2005).
8. Edwards, R. A. et al. Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7, 57 (2006).
9. Breitbart, M. et al. Metagenomic analyses of an uncultured viral community from human feces. J. Bacteriol. 185, 6220–6223 (2003).
10. Breitbart, M. et al. Genomic analysis of uncultured marine viral communities. Proc. Natl Acad. Sci. USA 99, 14250–14255 (2002).
11. Wegley, L., Breitbart, M., Edwards, R. A. & Rohwer, F. Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ. Microbiol. 9, 2707–2719 (2007).
12. Angly, F. et al. The marine viromes of four oceanic regions. PLoS Biol. 4, e368 (2006).
13. Breitbart, M. & Rohwer, F. Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. Biotechniques 39, 729–736 (2005).
14. Fierer, N. et al. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of Bacteria, Archaea, Fungi, and viruses in soil. Appl. Environ. Microbiol. 73, 7059–7066 (2007).
15. Overbeek, R. et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33, 5691–5702 (2005).
16. Mira, A., Ochman, H. & Moran, N. A. Deletional bias and the evolution of bacterial genomes. Trends Microbiol. 17, 589–596 (2001).
17. Lozupone, C. A. & Knight, R. Global patterns in bacterial diversity. Proc. Natl Acad. Sci. USA 104, 11436–11440 (2007).
18. Shashar, N., Cohen, Y. & Loya, Y. Extreme diel fluctuations of oxygen in diffusive boundary layers surrounding stony corals. Biol. Bull. 185, 455–461 (1993).
19. Iwanicka-Nowicka, R. et al. Regulation of sulfur assimilation pathways in Burkholderia cenocepacia: identification of transcription factors CysB and SsuR and their role in control of target genes. J. Bacteriol. 189, 1675–1688 (2007).
20. Aksnes, A., Hope, B., Hostmark, O. & Albrektsen, S. Inclusion of size fractionated fish hydrolysate in high plant protein diets for Atlantic cod, Gadus morhua. Aquaculture 261, 1102–1110 (2006).
21. Dinsdale, E. A. et al. Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE 3, e1584 (2008).
22. Sano, E., Carlson, S., Wegley, L. & Rohwer, F. Movement of virus between biomes. Appl. Environ. Microbiol. 70, 5842–5846 (2004).
23. Davis, B. M. & Waldor, K. in Mobile DNA II (eds Craig, N. L., Craigie, R., Gellert, M. & Lambowitz, A. M.) 1040–1055 (ASM, Washington DC, 2002).
24. Rohwer, F. et al. The complete genomic sequence of the marine phage Roseophage SIO1 shares homology with nonmarine phages. Limnol. Oceanogr. 45, 408–418 (2000).
25. Mann, N. et al. Marine ecosystems: bacterial photosynthesis genes in a virus. Nature 424, 741 (2003).
26. Hendrix, R. W., Smith, M. C. M., Burns, R. N. & Ford, M. E. Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc. Natl Acad. Sci. USA 96, 2192–2197 (1999).
27. Wadhams, G. H. & Armitage, J. P. Making sense of it all: bacterial chemotaxis. Nature Rev. Mol. Cell Biol. 5, 1024–1037 (2004).
28. Hendrix, R. W. Bacteriophage: evolution of the majority. Theor. Popul. Biol. 61, 471–480 (2002).
29. Dunbar, J., Ticknor, L. O. & Kuske, C. R. Assessment of microbial diversity in four southwestern United States soils by 16S rRNA gene terminal restriction fragment analysis. Appl. Environ. Microbiol. 66, 2943–2950 (2000).
30. Frigaard, N.-U., Martinez, A., Mincer, T. J. & DeLong, E. F. Proteorhodopsin lateral gene transfer between marine planktonic Bacteria and Archaea. Nature 439, 847–850 (2006).
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Acknowledgements This project was supported by the Gordon and Betty Moore Foundation Marine Microbial Initiative, National Science Foundation grants (F.R. and D.L.V.), a Department of Commerce ATP grant (F.R.), a National Research Initiative Competitive Grant from the USDA Cooperative State Research, Education and Extension Service (B.W.), the National Institute of Allergy and Infectious Diseases, the National Institutes of Health and the Department of Health and Human Services (R.S.).
Author Contributions E.A.D. conceptualized the project, conducted the CDA and wrote the manuscript. R.A.E., R.O. and R.S. performed the bioinformatics. D.H. conducted the non-parametric statistical analysis. F.R. oversaw most of the metagenomic projects. All other authors collected the metagenomes and provided comments on the manuscript.
Author Information The metagenomes used in this paper are freely available from the SEED platform and are being made accessible from CAMERA and the NCBI Short Read Archive. The accession numbers are shown in Supplementary Table 1. The NCBI genome project IDs used in this study are: 28619, 28613, 28611, 28609, 28607, 28605, 28603, 28601, 28599, 28597, 28469, 28467, 28465, 28463, 28461, 28459, 28457, 28455, 28453, 28451, 28449, 28447, 28445, 28443, 28441, 28439, 28437, 28435, 28433, 28431, 28429, 28427, 28425, 28423, 28421, 28419, 28417, 28415, 28413, 28411, 28409, 28407, 28405, 28403, 28401, 28395, 28393, 28391, 28389, 28387, 28385, 28383, 28381, 28379, 28377, 28375, 28373, 28371, 28361, 28359, 28357, 28355, 28353 and 28351. Reprints and permissions information is available at www.nature.com/reprints. Correspondence and requests for materials should be addressed to E.A.D. ([email protected]).
METHODS
Collection of the metagenomes. Metagenomic samples were collected and DNA was prepared by the different groups involved; each laboratory used slight modifications of the basic protocol. Samples came from widely dispersed locations or from separate organisms (Supplementary Fig. 1 and Supplementary Table 1). Metagenomes were collected to answer broad ecological questions, such as viral community dynamics in the lungs of healthy and cystic fibrosis patients and the microbial communities on coral reefs (Supplementary Table 1). Typically, the microbiome process starts by filtering samples onto 0.22-µm Sterivex filters, removing the filter membranes and extracting DNA using a bead-beating protocol (MoBio). In some samples, the DNA was amplified with Genomiphi (GE Healthcare Life Sciences) in six to eight 18-h reactions22,31. The reactions were pooled and purified using silica columns (Qiagen). The DNA was precipitated with ethanol and resuspended in water at a concentration of approximately 300 ng µl⁻¹. Microbial metagenomes capture Bacteria, Archaea and some small protists, as well as a few trapped virus-like particles (Supplementary Table 2).
The viruses in the small metagenomic fractions (that is, 0.22-µm filtrate treated with chloroform) were purified using caesium chloride (CsCl) step gradients to remove free DNA and any cellular material10,12. Viral samples were visually checked for microbial contamination using epifluorescence microscopy. Viral DNA was isolated using CTAB (cetyltrimethylammonium bromide) and 25:24:1 phenol:chloroform:isoamyl alcohol extractions and amplified using Genomiphi reactions. These reactions were pooled and purified using silica columns (Qiagen). The DNA was precipitated with ethanol and resuspended in water at a concentration of approximately 300 ng µl⁻¹. One viral metagenome (number 40, Supplementary Table 1) was prepared by concentrating a natural microbial sample and inducing it with mitomycin C. All metagenome libraries consisted of approximately 5 µg of DNA. The viral metagenomes contained viruses, phage and prophage, and, as expected, the proportion of phage and prophage was higher in these metagenomes than in the microbial fraction (Supplementary Table 2).
Sequencing and bioinformatics. Sequencing was performed using pyrosequen-
cing on Roche Applied Sciences and 454 Life Sciences GS20 platforms32 with a
practical limit of 105 bp. DNA sequences were analysed in the metagenomics
RAST pipeline—an open-access metagenome curation and analysis platform
(http://metagenomics.theseed.org/)33. First, sequences were screened to remove
exactly duplicated sequences—a known artefact of the pyrosequencing
approach. The sequences were compared to the SEED platform, which comprises
all known protein sequences, using the NCBI BLASTX algorithm on the NMPDR
compute cluster (Argonne National Laboratory; http://www.nmpdr.org/). The
SEED platform includes all available genome data, DNA and protein sequences,
and is supplemented with data from genome sequencing centres as available.
Every metagenome was compared to exactly the same data set using the same
BLAST parameters at the same time to ensure congruity of the data. Connections
between the metagenomes and the SEED subsystems were calculated by iden-
tifying matches to the SEED platform where the matched protein was curated tobe in a subsystem, and the expect value from the BLAST search was less than
0.001. The SEED subsystems are manually curated collections of proteins with
related functions and are available at http://www.theseed.org/. Simultaneously,
all sequences were compared to the 16S databases using BLASTN. The databases
were extracted from GreenGenes34, the Ribosomal Database Project35 and the
European Ribosomal Database Project36.
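The two filtering rules in this paragraph (drop exactly duplicated reads, then link a read to a subsystem only when its BLASTX match is a curated subsystem protein with E < 0.001) can be sketched in Python as below. This is a minimal illustration, not the metagenomics RAST implementation: the in-memory read dictionary, the `(read_id, protein_id, e_value)` hit tuples, the `protein_to_subsystem` map and the choice to keep only the best qualifying hit per read are all assumptions made for the example.

```python
from collections import Counter


def remove_exact_duplicates(reads):
    """Keep the first read of each exactly identical sequence.

    reads: dict of read_id -> nucleotide sequence (hypothetical input format).
    """
    seen = set()
    unique = {}
    for read_id, seq in reads.items():
        if seq not in seen:
            seen.add(seq)
            unique[read_id] = seq
    return unique


def assign_subsystems(hits, protein_to_subsystem, max_evalue=1e-3):
    """Count reads per subsystem.

    A hit qualifies only if its E value is below max_evalue and the matched
    protein is curated into a subsystem. Assumption: each read is counted once,
    via its lowest-E-value qualifying hit.
    """
    best = {}
    for read_id, protein, evalue in hits:
        if evalue >= max_evalue or protein not in protein_to_subsystem:
            continue  # fails the cutoff or protein is not in any subsystem
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (protein, evalue)
    return Counter(protein_to_subsystem[p] for p, _ in best.values())
```

Dividing each subsystem count by the metagenome's total would give the per-subsystem proportions on which the statistical analysis is based.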
Several metagenomes were constructed from environments that were likely to
contain DNA from other organisms such as humans, corals and mosquitoes. To
test and to remove contaminants, 20,000 sequences were chosen at random from
every metagenome and compared to the March 2006 build of the human genome
and the February 2003 build of the Anopheles gambiae genome (both downloaded
from http://genome.ucsc.edu/). The comparisons were performed using
BLASTN with an expect (E) value cutoff of 1 × 10⁻⁵. Every sample (including
the mosquito samples) had fewer than 1% of its sequences with significant
similarity to the A. gambiae genome, and only two samples had >5% of sequences
with similarity to the human genome. These two samples, from the human virome
studies, were compared in full and the human sequences excluded. To identify and
remove dinoflagellate sequences, such as those of Symbiodinium (a coral symbiont), a
custom database was created from the nucleotide and RNA (expressed sequence
tag) sequences in GenBank; all coral reef water and coral samples were analysed
as described above and dinoflagellate sequences were excluded.
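The contaminant screen amounts to estimating, from a random subsample, what fraction of a metagenome hits a host genome. The sketch below is a minimal illustration, assuming the BLASTN results have already been reduced to a best E value per read. The `hit_evalues` dictionary and the fixed random seed (used only for reproducibility) are hypothetical. The 20,000-read subsample and the 1 × 10⁻⁵ cutoff come from the text, as does the 5% level above which the two human-virome samples were screened in full.

```python
import random


def fraction_with_hits(read_ids, hit_evalues, sample_size=20000,
                       max_evalue=1e-5, seed=0):
    """Estimate the fraction of reads with a significant host-genome hit.

    read_ids: list of read identifiers in one metagenome.
    hit_evalues: dict of read_id -> best BLASTN E value against the host
    genome; reads with no hit are simply absent (hypothetical format).
    """
    if not read_ids:
        return 0.0
    rng = random.Random(seed)
    k = min(sample_size, len(read_ids))
    sample = rng.sample(read_ids, k)  # random subsample without replacement
    # Reads without a recorded hit default to E = 1.0, i.e. not significant.
    n_hits = sum(1 for r in sample if hit_evalues.get(r, 1.0) < max_evalue)
    return n_hits / k


def needs_full_screen(fraction, threshold=0.05):
    """Flag a metagenome for full comparison when the host fraction exceeds 5%."""
    return fraction > threshold
```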
Statistical analysis. Statistics were calculated on the proportions of sequences
within each subsystem, thus normalizing data across metagenomes and
removing differences in reaction efficiencies. Total numbers of sequences and
numbers of sequences that showed similarities to the SEED are provided in
Supplementary Table 1, and <11% of sequences were similar to functional
genes. The SEED platform housed 654 well-documented subsystems that were
used to calculate the Shannon index (H′). Maximum diversity occurs when every
functional category is present in equal numbers, thus Hmax = log S, where S is the
number of categories. Evenness is H′ divided by Hmax, that is, by the logarithm of
the number of subsystems in each sample (evenness ranges from 0 to 1, where 1 is
perfectly even). As a comparison to the metagenomic analyses, the diversity and
evenness were calculated for all 842 sequenced bacterial genomes. These
calculations were conducted on the number of genes within each subsystem, rather
than on the number of sequences that was used for the metagenomes
(Supplementary Fig. 3).
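A worked version of the diversity calculation, as a minimal Python sketch. It assumes natural logarithms. It also takes evenness as Pielou's J = H′/Hmax = H′/ln S, with S counted as the subsystems actually observed in a sample, since that normalization is what bounds evenness between 0 and 1. The input is a list of read counts per subsystem (for the sequenced genomes, gene counts per subsystem).

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero subsystem counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Evenness H'/Hmax with Hmax = ln S, S = number of observed subsystems.

    Returns 0.0 when fewer than two subsystems are observed (a convention
    chosen for this sketch, since ln 1 = 0 would divide by zero).
    """
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return 0.0
    return shannon_diversity(counts) / math.log(s)
```

For example, four subsystems with equal counts give H′ = ln 4 and an evenness of 1, whereas a sample dominated by one subsystem has evenness close to 0.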
To analyse the stability of the CDAs, an experiment was conducted in which
several of the metagenomic groups were removed and the analysis re-run. In the
first trial, the subterranean, fish and mosquito metagenomes were removed
(Supplementary Fig. 5). In the second trial, these metagenomes were re-added
and the hypersaline metagenomes removed (Supplementary Fig. 6). Multiple
trials were required because CDAs are sensitive to the number of samples (that is,
metagenomes) relative to the number of variables (that is, metabolic processes).
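Counting what each trial leaves for the CDA shows why the trials differ. The sketch below uses the per-biome numbers of microbial metagenomes counted from Supplementary Table S1. `n_variables`, the number of metabolic processes entered into the CDA, is a hypothetical parameter because the section does not state it. The sketch relies on the standard property that a discriminant analysis with g groups and p variables yields at most min(g − 1, p) canonical functions.

```python
# Microbial (M) metagenomes per biome, counted from Supplementary Table S1.
MICROBIAL_COUNTS = {
    "Subterranean": 2, "Hyper-saline": 9, "Marine": 8, "Freshwater": 4,
    "Coral": 7, "Microbialites": 3, "Fish": 4, "Terrestrial animals": 8,
}

# Biomes removed in the two stability trials described in the text.
# (There are no microbial mosquito metagenomes, so removing it is a no-op here.)
TRIALS = {
    "trial 1": {"Subterranean", "Fish", "Mosquito"},
    "trial 2": {"Hyper-saline"},
}


def summarize_trial(counts, removed, n_variables):
    """Report samples, groups and the CDA dimensionality left after removal."""
    kept = {biome: n for biome, n in counts.items() if biome not in removed}
    n_samples = sum(kept.values())
    n_groups = len(kept)
    return {
        "samples": n_samples,
        "groups": n_groups,
        "samples_per_variable": n_samples / n_variables,
        # A CDA yields at most min(groups - 1, variables) canonical functions.
        "max_canonical_functions": min(n_groups - 1, n_variables),
    }
```

With the 45 microbial metagenomes, trial 1 leaves 39 metagenomes in 6 biomes and trial 2 leaves 36 in 7 biomes.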
The data were further analysed using a non-parametric ANOVA, a Kruskal–
Wallis test and a median test, and the results compared to ensure that stable
results could be obtained (Supplementary Table 3). Environments driving the
variation were identified using Duncan comparisons (degrees of freedom were
set at 7).
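The following is a stdlib-only Python sketch of the Kruskal–Wallis H statistic, showing what the test computes. It uses average ranks for ties and omits the usual tie-correction factor. It is illustrative only; in practice a library routine such as `scipy.stats.kruskal`, which applies the tie correction and returns a p value, would normally be used. Each argument is one biome's list of proportions for a single metabolism.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with average ranks for ties (no tie correction)."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    # Sum ranks per group.
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    return (12 / (n * (n + 1))
            * sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
            - 3 * (n + 1))
```

For moderate group sizes, H is compared with a χ² distribution with k − 1 degrees of freedom, where k is the number of groups.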
All metagenomes were provided by authors of this manuscript. Further material,
including direct access to the data, is available at
http://www.theseed.org/DinsdaleSupplementalMaterial/. The NCBI genome project IDs used in this
study that were associated with previous publications are: 28369, 28367,
28365, 28363, 28349, 28347, 28345, 28343, 19145, 17771, 17769, 17767, 17765,
17635, 17633 and 17401.
31. Gunn, M. R. et al. A test of the efficacy of whole-genome amplification on DNA obtained from low-yield samples. Mol. Ecol. Notes 7, 393–399 (2007).
32. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
33. Meyer, F. et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinf. (submitted).
34. DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).
35. Cole, J. R. et al. The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res. 35, D169–D172 (2007).
36. Wuyts, J., Perriere, G. & de Peer, Y. V. The European ribosomal RNA database. Nucleic Acids Res. 32, D101–D103 (2004).
doi:10.1038/nature06810
Nature Publishing Group©2008
Functional Metagenomic Profiling of Nine Biomes
Elizabeth A. Dinsdale1,2*, Robert A. Edwards1,3,4,5, Dana Hall1, Florent Angly1,6, Mya
Breitbart7, Jennifer M. Brulc8, Mike Furlan1, Christelle Desnues1,9, Matthew Haynes1,
Linlin Li1, Lauren McDaniel7, Mary Ann Moran10, Karen E. Nelson11, Christina
Nilsson12, Robert Olson5, John Paul7, Beltran Rodriguez Brito1,6, Yijun Ruan12, Brandon
K. Swan13, Rick Stevens5, David L. Valentine13, Rebecca Vega Thurber1, Linda
Wegley1, Bryan A. White8,14, and Forest Rohwer1,3
1Department of Biology, San Diego State University, San Diego, CA 92182, USA
2School of Biological Sciences, Flinders University, Adelaide, SA 5042, Australia
3Center for Microbial Sciences, San Diego State University, San Diego, CA 92182, USA
4Department of Computer Sciences, San Diego State University, San Diego, CA 92182, USA
5Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
6Computational Science Research Centre, San Diego State University, San Diego, CA 92182-1245, USA
7University of South Florida, College of Marine Science, 140 7th Avenue S., St. Petersburg, FL 33701, USA
8Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA
9Current address: Unité des Rickettsies, CNRS-UMR 6020, Faculté de médecine, 13385 Marseille, France
10Department of Marine Sciences, University of Georgia, Athens, GA 30602, USA
11The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, MD 20850, USA
12Genome Institute of Singapore, 60 Biopolis Street, #02-01, Genome, Singapore 138672
13Department of Earth Science, University of California Santa Barbara, Santa Barbara, CA 93106, USA
14The Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
SUPPLEMENTARY INFORMATION
doi: 10.1038/nature06810
www.nature.com/nature
Supplementary information includes four tables presenting accession numbers and
descriptions of metagenomes, phage and prophage content of each metagenome, the
motility proteins present in the microbial and viral metagenomes and statistical
comparisons of the proportions of metabolic processes across the nine biomes. Six
figures provide information about the geographic separation of samples, diversity versus
sequence number, comparison of diversity between metagenomes and sequenced
whole bacterial genomes, fine-scale details of the sulfur metabolic processes, and
two experiments that show the stability of the CDA across multiple groupings.
Table S1. Metagenomes used in this manuscript, listed using the collector's description and biome assignment. All metagenomes were sequenced using 454 Life Sciences GS20 pyrosequencing. Simple statistics of the individual libraries, including number of sequences, BLAST hits and 16S rRNA genes, are provided. M = microbial library and V = viral library. The metagenomes used in this paper are freely available from the SEED platform and are being made accessible from CAMERA and the NCBI Short Read Archive when available. The accession numbers are shown, and further material and direct links to the data are available at http://www.theseed.org/DinsdaleSupplementalMaterial/.
ID | Name | SEED accession # | NCBI Genome project # | Type | Biome | # of sequences | # of BLAST hits | # of 16S
1 | Soudan Red | 4440281.3 | 17633 | M | Subterranean | 334,386 | 55,069 | 321
2 | Soudan Black | 4440282.3 | 17635 | M | Subterranean | 388,627 | 43,079 | 24
3 | Solar Salterns low Salinity San Diego | 4440437.3 | 28359 | M | Hyper-saline | 268,206 | 52,745 | 243
4 | Solar Salterns medium Salinity San Diego | 4440435.3 | 28377 | M | Hyper-saline | 38,929 | 10,151 | 41
5 | Solar Salterns medium Salinity San Diego | 4440434.3 | 28379 | M | Hyper-saline | 23,261 | 5,630 | 26
6 | Solar Salterns Plasmid component | 4440090.3 | 28443 | M | Hyper-saline | 111,431 | 19,365 | 129
7 | Solar Salterns medium salinity west California | 4440416.3 | 28449 | M | Hyper-saline | 8,062 | 770 | 3
8 | Solar Salterns high salinity west California | 4440419.3 | 28453 | M | Hyper-saline | 35,446 | 8,778 | 11
9 | Salton Sea | 4440329.3 | 28613 | M | Hyper-saline | 178,407 | 17,531 | 43
10 | Solar Salterns medium salinity west California | 4440425.3 | 28459 | M | Hyper-saline | 120,987 | 32,871 | 110
11 | Solar Salterns low salinity west California | 4440426.3 | 28461 | M | Hyper-saline | 34,296 | 3,754 | 26
12 | Solar Salterns medium salinity west California | 4440427.3 | 28463 | V | Hyper-saline | 39,943 | 414 |
13 | Solar Salterns medium salinity west California | 4440428.3 | 28465 | V | Hyper-saline | 58,735 | 1,822 |
14 | Solar Salterns high salinity West California | 4440421.3 | 28457 | V | Hyper-saline | 154,167 | 3,028 |
15 | Solar Salterns low salinity San Diego | 4440436.3 | 28353 | V | Hyper-saline | 268,534 | 6,920 |
16 | Solar Salterns low salinity San Diego | 4440432.3 | 28373 | V | Hyper-saline | 110,511 | 3,068 |
17 | Solar Salterns medium salinity west California | 4440431.3 | 28375 | V | Hyper-saline | 39,578 | 929 |
18 | Solar Salterns medium salinity West California | 4440417.3 | 28445 | V | Hyper-saline | 55,903 | 904 |
19 | Solar Salterns high salinity west California | 4440145.4 | 28447 | V | Hyper-saline | 47,587 | 2,601 |
20 | Solar Salterns high salinity west California | 4440144.4 | 28451 | V | Hyper-saline | 4,645 | 947 |
21 | Solar Salterns low salinity west California | 4440420.3 | 28455 | V | Hyper-saline | 62,685 | 11,369 |
22 | Salton Sea | 4440327.3 | 28613 | V | Hyper-saline | 55,787 | 926 |
23 | Salton Sea | 4440328.3 | 28613 | V | Hyper-saline | 29,970 | 454 |
24 | Line Is Kingman | 4440037.3 | 28343 | M | Marine | 188,445 | 11,309 | 6
25 | Line Is Christmas | 4440041.3 | 28347 | M | Marine | 227,542 | 11,574 | 18
26 | Line Is Palmyra | 4440039.3 | 28363 | M | Marine | 289,723 | 26,173 | 97
27 | Line Is Tabuaeran | 4440279.3 | 28367 | M | Marine | 290,844 | 12,631 | 100
28 | DMSP Treated | 4440364.3 | 19145 | M | Marine | 54,848 | 11,725 | 24
29 | DMSP Treated | 4440360.3 | 19145 | M | Marine | 50,313 | 7,198 | 52
30 | Vanillate Treated | 4440365.3 | 19145 | M | Marine | 12,446 | 1,720 | 48
31 | Vanillate Treated | 4440363.3 | 19145 | M | Marine | 33,773 | 6,610 | 7
32 | Marine GOM | 4440304.3 | 17765 | V | Marine | 263,908 | 28,878 |
33 | Marine BBC | 4440305.3 | 17767 | V | Marine | 416,456 | 20,770 |
34 | Marine Arctic | 4440306.3 | 17769 | V | Marine | 688,590 | 197,018 |
35 | Marine SAR | 4440322.3 | 17771 | V | Marine | 399,343 | 17,813 |
36 | Line Is Kingman | 4440036.3 | 28345 | V | Marine | 94,915 | 6,597 |
37 | Line Is Christmas | 4440038.3 | 28349 | V | Marine | 283,390 | 69,501 |
38 | Line Is Palmyra | 4440040.3 | 28365 | V | Marine | 320,397 | 9,608 |
39 | Line Is Tabuaeran | 4440280.3 | 28369 | V | Marine | 380,355 | 10,716 |
40 | Tampa Bay Mitomycin C induced | 4440102.3 | 28619 | V | Marine | 280,019 | 8,767 |
41 | Skan Bay | 4440330.3 | 28619 | V | Marine | 31,375 | 417 |
42 | Tilapia pond | 4440440.3 | 28387 | M | Freshwater | 381,076 | 58,596 | 177
43 | Healthy fish pond | 4440413.3 | 28405 | M | Freshwater | 63,978 | 8,911 | 48
44 | Healthy fish Prebead | 4440411.3 | 28407 | M | Freshwater | 44,094 | 6,937 | 32
45 | Tilapia pond 3 | 4440422.3 | 28603 | M | Freshwater | 67,612 | 10,549 | 71
46 | Tilapia pond 3 | 4440424.3 | 28601 | V | Freshwater | 267,640 | 9,055 |
47 | Healthy fish pond | 4440412.3 | 28409 | V | Freshwater | 60,319 | 1,152 |
48 | Healthy fish Prebead | 4440414.3 | 28411 | V | Freshwater | 67,988 | 1,739 |
49 | Tilapia pond | 4440439.3 | 28361 | V | Freshwater | 57,134 | 1,226 |
50 | Porites compressa time zero | 4440380.3 | 28427 | M | Coral | 53,473 | 2,560 | 0
51 | Porites compressa control | 4440378.3 | 28429 | M | Coral | 65,191 | 2,030 | 2
52 | Porites compressa temperature | 4440373.3 | 28431 | M | Coral | 61,356 | 1,359 | 13
53 | Porites compressa DOC | 4440372.3 | 28433 | M | Coral | 62,959 | 1,566 | 7
54 | Porites compressa pH | 4440379.3 | 28435 | M | Coral | 67,994 | 1,913 | 5
55 | Porites compressa Nutrient | 4440381.3 | 28437 | M | Coral | 65,008 | 3,258 | 11
56 | Porites astreoides | 4440319.3 | 28371 | M | Coral | 316,279 | 39,004 | 393
57 | Porites compressa time zero | 4440376.3 | 28415 | V | Coral | 39,270 | 2,772 |
58 | Porites compressa control | 4440374.3 | 28417 | V | Coral | 39,340 | 5,276 |
59 | Porites compressa DOC | 4440370.3 | 28421 | V | Coral | 35,680 | 2,410 |
60 | Porites compressa pH | 4440371.3 | 28423 | V | Coral | 50,364 | 2,710 |
61 | Porites compressa nutrients | 4440377.3 | 28425 | V | Coral | 34,433 | 2,338 |
62 | Porites compressa Temperature | 4440375.3 | 28419 | V | Coral | 39,036 | 2,141 |
63 | Rio Mesquites | 4440060.3 | 28351 | M | Microbialites | 124,694 | 21,374 | 10
64 | Highborne Cay | 4440061.3 | 28383 | M | Microbialites | 257,573 | 5,286 | 12
65 | Pozas Azules II | 4440067.3 | 28385 | M | Microbialites | 326,146 | 36,468 | 61
66 | Pozas Azules II | 4440320.3 | 28355 | V | Microbialites | 302,987 | 3,947 |
67 | Rio Mesquites | 4440321.3 | 28357 | V | Microbialites | 328,656 | 14,561 |
68 | Highborne Cay | 4440323.3 | 28381 | V | Microbialites | 150,223 | 3,063 |
69 | Healthy fish slime | 4440059.3 | 28393 | M | Fish | 66,066 | 15,686 | 68
70 | Morbid fish slime | 4440066.3 | 28395 | M | Fish | 82,442 | 20,635 | 147
71 | Healthy fish gut | 4440055.3 | 28389 | M | Fish | 51,498 | 16,377 | 63
72 | Morbid fish gut | 4440056.3 | 28391 | M | Fish | 60,311 | 17,996 | 91
73 | Healthy fish slime | 4440065.3 | 28401 | V | Fish | 61,476 | 9,051 |
74 | Morbid fish slime | 4440064.3 | 28403 | V | Fish | 60,111 | 13,826 |
75 | Cow rumens pool plankton | 4440357.3 | 28611 | M | Terrestrial Animals | 236,830 | 38,626 | 313
76 | Cow rumens 80F6 | 4440356.3 | 28605 | M | Terrestrial Animals | 178,713 | 29,989 | 240
77 | Cow rumens 640F6 | 4440355.3 | 28607 | M | Terrestrial Animals | 264,849 | 39,775 | 386
78 | Cow rumens 710 F | 4440387.3 | 28609 | M | Terrestrial Animals | 345,317 | 130,089 | 757
79 | Lean Mice | 4440324.3 | 17401 | M | Terrestrial Animals | 49,074 | 8,688 | 42
80 | Obese Mice | 4440325.3 | 17401 | M | Terrestrial Animals | 35,053 | 9,161 | 37
81 | Chicken cecum NCTC | 4440367.3 | 28599 | M | Terrestrial Animals | 237,940 | 49,256 | 451
82 | Chicken cecum Uninfected | 4440368.3 | 28597 | M | Terrestrial Animals | 294,682 | 83,912 | 533
83 | Lung sputum Cystic fibrosis patient | 4440441.3 | 28441 | V | Terrestrial Animals | 92,223 | 7,946 |
84 | Lung sputum Healthy | 4440442.4 | 28439 | V | Terrestrial Animals | 39,807 | 3,292 |
85 | Mosquito Oceanside Ca | 4440052.3 | 28413 | V | Mosquito | 340,098 | 97,269 |
86 | Mosquito San Diego | 4440053.3 | 28467 | V | Mosquito | 657,204 | 232,886 |
87 | Mosquito Mission Valley Ca | 4440054.3 | 28469 | V | Mosquito | 615,576 | 112,761 |
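Dividing the '# of BLAST hits' column by '# of sequences' turns it into a per-library hit rate. The short Python sketch below does this for three rows copied from Table S1. The helper name `percent_with_hits` is hypothetical and not part of any pipeline described here.

```python
# (name, # of sequences, # of BLAST hits) copied from three rows of Table S1.
ROWS = [
    ("Soudan Red (M)", 334_386, 55_069),
    ("Marine Arctic (V)", 688_590, 197_018),
    ("Solar Salterns medium salinity west California (V)", 39_943, 414),
]


def percent_with_hits(n_sequences, n_hits):
    """Percentage of sequences in a library that had a BLAST hit."""
    return 100.0 * n_hits / n_sequences


for name, n_seq, n_hits in ROWS:
    # Prints 16.5%, 28.6% and 1.0% for the three rows, respectively.
    print(f"{name}: {percent_with_hits(n_seq, n_hits):.1f}% of sequences had a BLAST hit")
```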
Table S2. The percent of phage and prophage sequences in the microbial and viral metagenomes. ns = no sample.
Biome | Microbial sample number | Microbial percent phage | Microbial percent prophage | Viral sample number | Viral percent phage | Viral percent prophage
Subterranean | 1 | 1.879 | 3.802 | | ns | ns
Subterranean | 2 | 1.838 | 3.638 | | ns | ns
Hyper-saline | 3 | 0.983 | 3.802 | 12 | 3.922 | 5.456
Hyper-saline | 4 | 0.000 | 3.595 | 13 | 8.861 | 3.927
Hyper-saline | 5 | 0.375 | 3.638 | 14 | 25.517 | 3.744
Hyper-saline | 6 | 0.557 | 3.802 | 15 | 14.463 | 3.554
Hyper-saline | 7 | 0.000 | 1.238 | 16 | 29.762 | 3.578
Hyper-saline | 8 | 1.695 | 2.779 | 17 | 34.884 | 4.940
Hyper-saline | 9 | 4.918 | 3.802 | 18 | 17.647 | 3.263
Hyper-saline | 10 | 1.286 | 3.802 | 19 | 4.545 | 4.341
Hyper-saline | 11 | 1.961 | 3.638 | 20 | 1.056 | 4.777
Hyper-saline | | ns | ns | 21 | 3.198 | 3.667
Hyper-saline | | ns | ns | 22 | 25.000 | 2.626
Hyper-saline | | ns | ns | 23 | 60.000 | 4.001
Marine | 24 | 0.589 | 3.638 | 32 | 1.051 | 3.474
Marine | 25 | 3.797 | 3.580 | 33 | 2.171 | 3.523
Marine | 26 | 1.073 | 3.762 | 34 | 0.351 | 3.802
Marine | 27 | 0.763 | 3.146 | 35 | 15.764 | 3.803
Marine | 28 | 0.727 | 3.720 | 36 | 3.243 | 2.655
Marine | 29 | 1.342 | 3.299 | 37 | 0.531 | 3.802
Marine | 30 | 0.478 | 3.746 | 38 | 11.189 | 3.864
Marine | 31 | 1.370 | 3.415 | 39 | 7.563 | 3.921
Marine | | ns | ns | 40 | 30.469 | 3.855
Marine | | ns | ns | 41 | 8.824 | 4.352
Freshwater | 42 | 6.759 | 3.802 | 46 | 41.176 | 3.185
Freshwater | 43 | 3.204 | 3.809 | 47 | 68.182 | 5.143
Freshwater | 44 | 3.472 | 4.032 | 48 | 50.000 | 4.628
Freshwater | 45 | 0.321 | 3.802 | 49 | 58.301 | 3.723
Coral | 50 | 5.797 | 3.575 | 57 | 2.602 | 3.503
Coral | 51 | 0.000 | 2.839 | 58 | 9.385 | 4.047
Coral | 52 | 30.864 | 3.786 | 59 | 2.871 | 3.903
Coral | 53 | 2.222 | 3.385 | 60 | 11.765 | 4.357
Coral | 54 | 2.941 | 4.504 | 61 | 4.348 | 3.602
Coral | 55 | 0.000 | 3.807 | 62 | 2.985 | 3.205
Coral | 56 | 0.472 | 3.712 | | ns | ns
Microbialites | 63 | 3.162 | 3.536 | 66 | 11.712 | 3.214
Microbialites | 64 | 9.063 | 3.192 | 67 | 92.548 | 4.178
Microbialites | 65 | 0.591 | 3.802 | 68 | 0.000 | 6.258
Fish | 69 | 1.467 | 3.645 | 73 | 0.628 | 3.707
Fish | 70 | 3.101 | 3.638 | 74 | 0.922 | 3.489
Fish | 71 | 0.949 | 3.638 | | ns | ns
Fish | 72 | 0.833 | 3.675 | | ns | ns
Terrestrial animals | 75 | 4.245 | 3.802 | 83 | 0.000 | 4.486
Terrestrial animals | 76 | 4.504 | 3.802 | 84 | 0.000 | 3.579
Terrestrial animals | 77 | 1.380 | 3.802 | | ns | ns
Terrestrial animals | 78 | 3.229 | 3.802 | | ns | ns
Terrestrial animals | 79 | 4.195 | 3.802 | | ns | ns
Terrestrial animals | 80 | 3.624 | 3.802 | | ns | ns
Terrestrial animals | 81 | 5.481 | 3.802 | | ns | ns
Terrestrial animals | 82 | 5.472 | 3.802 | | ns | ns
Mosquito | | ns | ns | 85 | 11.995 | 3.638
Mosquito | | ns | ns | 86 | 9.115 | 3.802
Mosquito | | ns | ns | 87 | 2.192 | 3.802
Table S3. The thirty most abundant motility and chemotaxis protein sequences found within the metagenomes, ordered with respect to the microbial metagenomes.
Motility protein | Microbial metagenomes | Viral metagenomes
Twitching motility protein PilT | 0.033 | 0.023
Methyl-accepting chemotaxis protein I | 0.029 | 0.033
Flagellar biosynthesis protein flhA | 0.025 | 0.089
Chemotaxis protein CheA | 0.018 | 0.059
Dipeptide-binding ABC transporter | 0.018 | 0.064
Type II secretory pathway | 0.017 | 0.008
Chemotaxis protein methyltransferase CheR | 0.016 | 0.026
GldJ | 0.015 | 0.005
Acetylornithine deacetylases | 0.015 | 0.076
Flagellum-specific ATP synthase fliI | 0.014 | 0.032
Flagellar motor rotation protein motB | 0.014 | 0.021
Flagellar hook-length control protein fliK | 0.013 | 0.033
Flagellar hook protein flgE | 0.010 | 0.014
Flagellar basal-body rod protein flgG | 0.010 | 0.027
Chemoreceptor signals to flagellar motor CheY | 0.010 | 0.012
Type 4 fimbrial biogenesis protein PilY1 | 0.010 | 0.022
Flagellar regulatory protein fleQ | 0.010 | 0.011
General secretion pathway protein E / ATPase PilB | 0.010 | 0.002
Flagellar motor rotation protein motA | 0.009 | 0.018
Flagellin protein flaA | 0.009 | 0.009
Chemotaxis response regulator CheB | 0.009 | 0.051
Aerotaxis sensor receptor protein | 0.008 | 0.016
Flagellar motor switch protein fliG | 0.008 | 0.014
Flagellar biosynthesis protein flhB | 0.008 | 0.030
Cell division protein ftsX | 0.007 | 0.008
Chemotaxis protein CheV | 0.007 | 0.012
Flagellar motor switch protein fliM | 0.007 | 0.015
Flagellar motor switch protein fliG | 0.007 | 0.009
Flagellar biosynthesis protein fliP | 0.006 | 0.015
Maltose/maltodextrin ABC transporter MalE | 0.006 | 0.042
Table S4. The variation for each metabolism identified for the microbial and viral communities across the nine biomes, using three statistical tests. The table includes the F value and P value and, where possible, the biome that was identified as showing differences for the particular metabolism.
Metabolism | Microbial ANOVA | Microbial Kruskal–Wallis | Microbial median test | Microbial Duncan | Viral ANOVA | Viral Kruskal–Wallis | Viral median test | Viral Duncan
Amino Acids | F=5.655, P<0.001 | F=22.01, P=0.003 | F=13.15, P=0.012 | Coral | F=1.743, P=0.132 | F=9.919, P=0.193 | F=10.84, P=0.064 |
Carbohydrates | F=4.965, P<0.001 | F=12.56, P=0.083 | F=18.35, P=0.226 | Coral | F=5.335, P<0.001 | F=20.17, P=0.005 | F=14.80, P=0.012 | Multiple
Cell Division & Cell Cycle | F=12.55, P<0.001 | F=29.79, P<0.001 | F=1.865, P=0.002 | Coral, Terrestrial animals, Microbialite | F=3.040, P=0.014 | F=17.47, P=0.015 | F=1.754, P=0.023 | Multiple
Cell Wall and Capsule | F=9.929, P<0.001 | F=34.78, P<0.001 | F=3.171, P<0.001 | Coral, Hyper-saline, Marine | F=0.875, P=0.536 | F=6.260, P=0.510 | F=3.562, P=0.339 |
Cofactors, Vitamins, etc | F=8.950, P<0.001 | F=26.66, P<0.001 | F=5.593, P<0.001 | Coral | F=1.266, P=0.296 | F=9.063, P=0.248 | F=6.147, P=0.692 |
DNA Metabolism | F=16.20, P<0.001 | F=35.33, P<0.001 | F=4.138, P<0.001 | Multiple | F=6.236, P<0.001 | F=26.70, P<0.001 | F=5.453, P=0.002 | Microbialite, Freshwater
Fatty Acids and Lipids | F=2.765, P=0.020 | F=18.101, P=0.012 | F=3.063, P=0.040 | Multiple | F=1.514, P=0.196 | F=10.75, P=0.150 | F=3.006, P=0.151 |
Membrane Transport | F=15.92, P<0.001 | F=29.99, P<0.001 | F=2.551, P<0.001 | Multiple | F=4.494, P=0.001 | F=14.95, P=0.037 | F=2.435, P=0.204 | Fish, mosquito
Aromatic Compounds | F=8.464, P<0.001 | F=22.43, P=0.002 | F=2.137, P=0.017 | Fish | F=2.225, P=0.056 | F=16.28, P=0.023 | F=1.834, P=0.020 | None obvious
Motility and Chemotaxis | F=3.517, P=0.005 | F=19.27, P=0.007 | F=0.858, P=0.007 | Fish, Subterranean | F=3.692, P=0.005 | F=15.26, P=0.033 | F=0.833, P=0.047 | Multiple
Nitrogen Metabolism | F=8.887, P<0.001 | F=26.28, P<0.001 | F=1.613, P=0.003 | Coral | F=2.252, P=0.054 | F=12.79, P=0.077 | F=1.137, P=0.057 |
Nucleosides, Nucleotides | F=6.949, P<0.001 | F=18.87, P=0.009 | F=3.424, P=0.014 | Coral | F=2.022, P=0.081 | F=17.58, P=0.014 | F=6.701, P=0.012 | None obvious
Phosphorus Metabolism | F=1.498, P=0.198 | F=15.65, P=0.029 | F=0.809, P=0.020 | | F=1.904, P=0.099 | F=11.50, P=0.118 | F=1.033, P=0.532 |
Photosynthesis | F=10.46, P<0.001 | F=29.49, P<0.001 | F=0.049, P=0.001 | Coral | F=1.722, P=0.137 | F=13.53, P=0.060 | F=0.050, P=0.074 |
Potassium metabolism | F=4.720, P=0.001 | F=20.37, P=0.005 | F=0.791, P=0.009 | Multiple | F=4.634, P=0.001 | F=17.35, P=0.015 | F=0.680, P=0.103 |
Protein Metabolism | F=6.814, P<0.001 | F=23.93, P=0.001 | F=9.316, P<0.001 | Multiple | F=1.631, P=0.160 | F=14.17, P=0.048 | F=8.448, P=0.074 |
Cell signaling | F=4.701, P=0.001 | F=21.06, P=0.004 | F=0.717, P=0.012 | Microbialite | F=2.346, P=0.046 | F=12.89, P=0.075 | F=0.734, P=0.115 |
Respiration | F=5.158, P<0.001 | F=26.00, P=0.001 | F=4.607, P=0.003 | Coral | F=3.633, P=0.005 | F=14.70, P=0.040 | F=3.669, P=0.052 | Multiple
RNA Metabolism | F=2.740, P=0.021 | F=19.41, P=0.007 | F=3.858, P=0.144 | | F=1.348, P=0.259 | F=8.769, P=0.270 | F=3.721, P=0.122 |
Secondary Metabolism | F=1.366, P=0.249 | F=13.47, P=0.061 | F=0.131, P=0.116 | | F=1.200, P=0.329 | F=10.65, P=0.154 | F=0.093, P=0.230 |
Stress Response | F=6.162, P<0.001 | F=23.40, P=0.001 | F=2.616, P=0.018 | Coral, Fish, Freshwater | F=1.878, P=0.104 | F=16.23, P=0.023 | F=3.133, P=0.033 |
Sulfur Metabolism | F=12.05, P<0.001 | F=28.86, P<0.001 | F=1.084, P=0.005 | Fish | F=2.290, P=0.050 | F=10.06, P=0.185 | F=1.079, P=0.327 |
Virulence | F=5.150, P<0.001 | F=30.79, P<0.001 | F=9.698, P=0.002 | Coral, Marine | F=3.953, P=0.003 | F=13.67, P=0.057 | F=10.65, P=0.208 | Microbialite
[Map: sampling sites plotted by biome (Subterranean, Marine, Hyper-saline, Freshwater, Coral, Microbialite, Fish, Terrestrial Animals, Mosquito), with the Equator, Tropic of Cancer and Arctic Circle marked across the Pacific and Atlantic Oceans.]
Figure S1. The sampling locations of the metagenomes; circles indicate
microbial and squares viral metagenomes. The number of metagenomes
collected at each site is given, except where only one metagenome per site was
taken.
Figure S2. Functional diversity of the a) microbial and b) viral metagenomes
quantified as a function of sequence number, suggesting that high functional
diversity is reached even at low sequence numbers. Note the different scales on the x-axes.
Figure S3. Comparison of mean (± s.e.m.) functional diversity and evenness
between microbial and viral metagenomes and all sequenced bacterial
genomes. Note the different scale on the y-axis.
Figure S4. The percent of sequences found within the sulfur metabolism
pathways in the microbial metagenomes. The overrepresentation of the a)
alkanesulfonates assimilation, b) alkanesulfonates utilization and c) taurine
utilization subsystems suggests the addition of an organic source of sulfur,
most likely taurine, whereas the subsystems involved in the utilization of
inorganic sulfur (d) were not overrepresented.
Figure S5. Canonical discriminant analysis of the a) microbial and b) viral
metagenomes on a reduced set of biomes (subterranean, fish and mosquito
metagenomes removed) to demonstrate the stability of the analysis and
variations in the influence of the potential metabolic processes between
environments.
Figure S6. Canonical discriminant analysis of the a) microbial and b) viral
metagenomes on a reduced set of biomes (hyper-saline biomes removed) to
demonstrate the stability of the analysis and variations in the influence of the
potential metabolic processes between environments.