LETTERS

Functional metagenomic profiling of nine biomes

Elizabeth A. Dinsdale1,5*, Robert A. Edwards1,2,3,6*, Dana Hall1, Florent Angly1,4, Mya Breitbart7, Jennifer M. Brulc8, Mike Furlan1, Christelle Desnues1†, Matthew Haynes1, Linlin Li1, Lauren McDaniel7, Mary Ann Moran10, Karen E. Nelson11, Christina Nilsson12, Robert Olson6, John Paul7, Beltran Rodriguez Brito1,4, Yijun Ruan12, Brandon K. Swan13, Rick Stevens6, David L. Valentine13, Rebecca Vega Thurber1, Linda Wegley1, Bryan A. White8,9 & Forest Rohwer1,2

Microbial activities shape the biogeochemistry of the planet1,2 and macroorganism health3. Determining the metabolic processes performed by microbes is important both for understanding and for manipulating ecosystems (for example, disruption of key processes that lead to disease, conservation of environmental services, and so on). Describing microbial function is hampered by the inability to culture most microbes and by high levels of genomic plasticity. Metagenomic approaches analyse microbial communities to determine the metabolic processes that are important for growth and survival in any given environment. Here we conduct a metagenomic comparison of almost 15 million sequences from 45 distinct microbiomes and, for the first time, 42 distinct viromes and show that there are strongly discriminatory metabolic profiles across environments. Most of the functional diversity was maintained in all of the communities, but the relative occurrence of metabolisms varied, and the differences between metagenomes predicted the biogeochemical conditions of each environment. The magnitude of the microbial metabolic capabilities encoded by the viromes was extensive, suggesting that they serve as a repository for storing and sharing genes among their microbial hosts and influence global evolutionary and metabolic processes.

Genomic plasticity of microbes causes variations in the gene content of closely related strains4, making predictions of community metabolism on the basis of representative genomes and signature genes such as 16S ribosomal RNA unreliable. Although it seems that core genomes are relatively stable and shared among most individuals of the same species, parts of the genome (for example, prophages, CRISPRs, pathogenicity/ecological islands, ORFans) are hyper-variable5. Together, these two components make up the pan-genome4. Unlike the signature genes approach, metagenomic approaches analyse the complete genetic information of microbial and viral communities6,7. In this way, the relative abundances of all genes can be determined and used to generate a description of the functional potential of each community8–14.

Here we use a comparative metagenomic approach to statistically analyse the frequency distribution of 14,585,213 microbial and viral metagenomic sequences to elucidate the functional potential of nine biomes including: subterranean (that is, mine samples); hypersaline ponds from solar salterns; marine; freshwater; coral-associated; microbialites (including stromatolites and thrombolites); aquaculture-fish-associated; terrestrial-animal-associated; and mosquito-associated (details in Supplementary Table 1 and Supplementary Fig. 1). Microbial and viral metagenomes (Supplementary Fig. 2 and Supplementary Table 2) were isolated and pyrosequenced. The sequences were compared to the 2007 SEED platform (http://www.theseed.org) using the BLASTX algorithm, and hits with an E-value of <0.001 were considered to be significant (Methods). A total of 1,040,665 sequences from the 45 microbial metagenomes and 541,979 sequences from the 42 viral metagenomes were significantly similar to functional genes within the SEED (Supplementary Table 1). The SEED arranges metabolic pathways into a hierarchical structure in which all of the genes required for a specific task are arranged into subsystems15. At the highest level of organization, the subsystems include both catabolic and anabolic functions (for example, DNA metabolism) and at the lowest levels the subsystems are specific pathways (for example, the synthesis pathway for thymidine).
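The filtering and tallying step described above can be sketched in Python. This is a minimal illustration and not the authors' pipeline: the hit tuples `(read_id, subsystem, e_value)`, the function name `subsystem_percentages`, the toy hits, and the choice to keep only the lowest E-value hit per read are all assumptions. Only the E-value cutoff of 0.001 comes from the text.

```python
from collections import Counter

E_VALUE_CUTOFF = 0.001  # significance threshold stated in the text


def subsystem_percentages(hits, cutoff=E_VALUE_CUTOFF):
    """Return the percentage of significant reads assigned to each subsystem.

    `hits` is an iterable of (read_id, subsystem, e_value) tuples; this
    layout is hypothetical. Each read counts once, using its best
    (lowest E-value) hit below the cutoff, which is an assumption.
    """
    best = {}  # read_id -> (subsystem, e_value) of the best significant hit
    for read_id, subsystem, e_value in hits:
        if e_value >= cutoff:
            continue  # not significant, ignore
        if read_id not in best or e_value < best[read_id][1]:
            best[read_id] = (subsystem, e_value)

    counts = Counter(subsystem for subsystem, _ in best.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy hits, invented for illustration only.
example_hits = [
    ("r1", "Carbohydrates", 1e-20),
    ("r1", "Respiration", 1e-5),   # weaker second hit for the same read
    ("r2", "Respiration", 1e-4),
    ("r3", "Virulence", 0.5),      # above the cutoff, dropped
    ("r4", "Carbohydrates", 1e-8),
]

if __name__ == "__main__":
    for name, pct in sorted(subsystem_percentages(example_hits).items()):
        print(f"{name}: {pct:.1f}%")
```

Working with percentages of significant reads, rather than raw counts, matches the statement in the Methods Summary that all statistics were performed on the percentage of sequences showing similarities to known functions.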

Table 1 shows the relative abundances of sequences assigned to each major subsystem in the combined analysis of the microbiomes compared with the viromes. Over 30% of the identifiable genes in the microbiomes were associated with carbohydrate or protein metabolism. Respiration and photosynthesis subsystems accounted for an additional ~15% of the similarities. Subsystems responsible for nucleic acid metabolism and virulence were overrepresented in the viral fractions (Table 1), whereas respiration and photosynthesis genes were less frequent.

*These authors contributed equally to this work.

1Department of Biology, 2Center for Microbial Sciences, 3Department of Computer Sciences, and 4Computational Science Research Centre, San Diego State University, San Diego, California 92182, USA. 5School of Biological Sciences, Flinders University, Adelaide, South Australia 5042, Australia. 6Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA. 7University of South Florida, College of Marine Science, 140 7th Avenue South, St Petersburg, Florida 33701, USA. 8Department of Animal Sciences, and 9The Institute for Genomic Biology, University of Illinois, Urbana, Illinois 61801, USA. 10Department of Marine Sciences, University of Georgia, Athens, 30602 Georgia, USA. 11The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. 12Genome Institute of Singapore, 60 Biopolis Street, 02-01, Genome, Singapore 138672, Singapore. 13Department of Earth Science, University of California Santa Barbara, Santa Barbara, California 93106, USA. †Present address: Unité des Rickettsies, CNRS-UMR 6020, Faculté de médecine, 13385 Marseille, France.

Table 1 | Mean percentage of sequences (± s.e.m.) similar to major metabolisms

Metabolic category               Microbial metagenomes   Viral metagenomes
Carbohydrates                    17.218 (± 0.648)        14.353 (± 0.718)
Amino acids                      12.036 (± 0.491)        10.132 (± 0.642)
Virulence                         9.788 (± 0.339)        11.175 (± 0.508)
Protein metabolism                9.123 (± 0.497)         8.838 (± 0.522)
Respiration                       7.139 (± 1.285)         3.718 (± 0.276)
Photosynthesis                    6.965 (± 2.148)         1.984 (± 0.554)
Cofactors, vitamins, and so on    5.411 (± 0.226)         6.661 (± 0.393)
RNA metabolism                    3.971 (± 0.195)         4.324 (± 0.387)
DNA metabolism                    3.970 (± 0.180)         7.555 (± 0.943)
Nucleosides and nucleotides       3.316 (± 0.149)         7.666 (± 0.817)
Cell wall and capsule             3.235 (± 0.223)         5.098 (± 0.649)
Fatty acids and lipids            3.095 (± 0.160)         3.002 (± 0.242)
Membrane transport                2.736 (± 0.158)         2.371 (± 0.182)
Stress response                   2.599 (± 0.115)         3.354 (± 0.326)
Aromatic compounds                2.351 (± 0.175)         2.550 (± 0.340)
Cell division and cell cycle      1.791 (± 0.091)         1.983 (± 0.212)
Nitrogen metabolism               1.547 (± 0.070)         1.135 (± 0.093)
Sulphur metabolism                1.230 (± 0.102)         1.302 (± 0.134)
Motility and chemotaxis           1.022 (± 0.096)         1.011 (± 0.083)
Phosphorus metabolism             0.909 (± 0.080)         1.319 (± 0.167)
Cell signalling                   0.885 (± 0.076)         0.885 (± 0.072)
Potassium metabolism              0.796 (± 0.048)         0.846 (± 0.079)
Secondary metabolism              0.159 (± 0.014)         0.235 (± 0.047)

doi:10.1038/nature06810

Nature Publishing Group ©2008

The functional diversity represented by the metagenomes approached its theoretical limit of 2.81 in all environments (Table 2), showing that most subsystems were represented in all of the samples. Only the coral-associated microbes showed a lower functional diversity; this is because they have fewer secondary metabolisms, virulence pathways, cell signalling pathways and membrane transport pathways. Because microbes associated with corals are taxonomically diverse11, functional reduction may have occurred in these communities, similar to microbes in other symbiotic relationships16.

Diversity is a function of both richness (that is, the number of metabolic processes) and evenness (that is, the relative abundance of a particular metabolic process in a sample). The evenness for the metagenomes was very low (<0.1; Table 2 and Supplementary Fig. 3), showing that there are a few dominant metabolisms in each environment. Differential dominant metabolisms suggest that there are characteristic functional profiles of the metagenomes.
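As a rough illustration of how a diversity value of this kind can be computed, the sketch below implements the Shannon index, H′ = −Σ p ln p, over the relative abundances of metabolic categories. It assumes that the H′ reported in Table 2 is a Shannon-type index with natural logarithms; this section does not spell out the formula, and the evenness measure used in the paper is not reproduced here. The function name and the toy inputs are hypothetical.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H' = -sum(p * ln p) over non-zero categories.

    `abundances` may be counts or percentages; they are normalised to
    proportions first. Natural logarithms are an assumption.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Toy profiles, invented for illustration only.
    even_profile = [25.0, 25.0, 25.0, 25.0]   # every category equally common
    skewed_profile = [85.0, 5.0, 5.0, 5.0]    # one dominant category
    print(shannon_diversity(even_profile))
    print(shannon_diversity(skewed_profile))
```

For a fixed number of categories the index is largest when all categories are equally abundant and falls as a few categories come to dominate, which is the interplay of richness and evenness described above.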

To test the hypothesis that each environment has a distinguishing metabolic profile, a canonical discriminant analysis (CDA) was conducted (Fig. 1). Most of the variance between the different environments (79.8% of the combined microbiome and 69.9% of the virome) was explained in this analysis, showing that metagenomes are highly predictive of metabolic potential within an ecosystem. In contrast, a recent analysis of 16S rRNA genes from multiple environments only explained about 10% of the variance17, suggesting that different ecosystems cannot be distinguished by their taxa.

The position of each metagenome in Fig. 1 reflects the frequency combination of sequences associated with each subsystem; the vectors indicate which metabolisms most strongly determined the distribution. Using these as clues, it is possible to determine which metabolisms are important for the organisms in that environment relative to other environments. For example, subsystems involved in respiration and protein metabolism placed the coral-associated microbes apart from the microbes found within terrestrial animals. This trend is visualized in Fig. 2, which shows that ~20% of the coral-associated microbial genes were involved in respiration, compared with only 3% in the microbiomes associated with terrestrial animals. The relatively high occurrence of respiration-associated genes in the coral-associated microbiomes reflects the diurnally fluctuating oxygen environment, which is supersaturated with oxygen in the day and essentially anaerobic at night18. In contrast, microbes living within the stable anaerobic alimentary tracts of terrestrial animals are less likely to experience selection for multiple respiration pathways.

Similarly, virulence genes were proportionally more abundant in the organism-associated microbes than in free-living microbes. These are the factors necessary to facilitate symbiotic relationships (mutualism, parasitism or commensalism; Fig. 2f–h). Another example of the predictive power of the metagenomes is the sulphur metabolisms associated with aquaculture fish. In particular, two subsystems (alkanesulphonate and taurine metabolism) were overrepresented in fish-associated metagenomes (Supplementary Fig. 4). Alkanesulphonates are involved in the use of both inorganic and organic sulphur, such as taurine and aliphatic sulphonates19 (taurine is a sulphur organic acid used to supplement aquaculture fish food20).

Table 2 | Mean functional diversity and evenness (± s.e.m.) of metagenomes, sampled from nine environments

                      Functional diversity (H′)                Functional evenness
Biome                 Microbial          Viral                 Microbial                 Viral
Subterranean          2.393 (± 0.030)    -                     0.005 (± 1.2 × 10^-4)     -
Hypersaline           2.361 (± 0.006)    2.041 (± 0.021)       0.005 (± 1.4 × 10^-4)     0.012 (± 5.6 × 10^-4)
Marine                2.313 (± 0.021)    2.162 (± 0.026)       0.005 (± 0.9 × 10^-4)     0.007 (± 4.0 × 10^-4)
Freshwater            2.430 (± 0.003)    2.080 (± 0.034)       0.005 (± 0.9 × 10^-4)     0.010 (± 6.7 × 10^-4)
Coral                 1.733 (± 0.059)    2.289 (± 0.023)       0.009 (± 5.2 × 10^-4)     0.007 (± 1.1 × 10^-4)
Microbialites         2.408 (± 0.015)    1.743 (± 0.115)       0.005 (± 3.8 × 10^-4)     0.019 (± 6.9 × 10^-3)
Fish                  2.447 (± 0.001)    2.439 (± 3.1 × 10^-4) 0.005 (± 0.4 × 10^-4)     0.005 (± 0.7 × 10^-4)
Terrestrial animals   2.428 (± 0.006)    2.016 (± 0.173)       0.004 (± 0.1 × 10^-4)     0.017 (± 4.5 × 10^-3)
Mosquito              -                  2.395 (± 0.015)       -                         0.004 (± 0.5 × 10^-4)

There are no subterranean viral metagenomes and no mosquito microbial metagenomes.

[Figure 1: two CDA scatter plots with influence vectors. Panel a (microbial): canonical discriminant function 1 (48.0%) against canonical discriminant function 2 (31.9%); vectors labelled Cell wall, Virulence, Membrane transport, Stress, Sulphur, Signalling, Motility, Respiration and Protein. Panel b (viral): canonical discriminant function 1 (38.9%) against canonical discriminant function 2 (31.0%); vectors labelled Membrane transport, Carbohydrates, Fatty acids, Secondary metabolites, Phosphorus, Virulence, Cell division, DNA, Potassium and Motility. Symbol key: Subterranean, Hypersaline, Marine, Freshwater, Coral, Microbialites, Fish, Terrestrial animals, Mosquito.]

Figure 1 | Functional analysis of microbial and viral metagenomes. The CDA of the microbial (a) and viral (b) metagenomes identified that the metabolic processes grouped these communities in the two-dimensional space described by canonical discriminant functions 1 and 2. The symbols represent the position of each metagenome and the vectors represent the structural matrix for subsystems that were identified as influencing the separation of the metagenomes using the stepwise procedure. The length of the vectors represents the strength of influence of the particular metabolic process. The cross-validation scores for the microbial and viral metagenomes were 66.7 and 59.9%, respectively.


Together, these examples show that metagenomes predict important, emergent biological characters of the environments. By substituting environmental groups in multiple CDAs, the predictive nature of metagenomes was confirmed (Supplementary Figs 5 and 6).

Shifting of a metagenome from its sister group in the CDA was also predictive of ecological differences. For example, one of the marine metagenomes (number 27 in Supplementary Table 1) was positioned more negatively than the rest of the marine metagenomes (Fig. 1a). This sample was taken from waters that were unusually rich in nitrogen, phosphate and dissolved organic carbon21. The ability to determine subtle differences in metabolic potential will allow the detection of environmental changes at early stages of perturbation and identify previously unknown pathways for therapeutics.

The viromes are dominated by phage, which are expected to have similar lifestyles in every environment (infection, replication, host lysis and release of free virions). Phage have also been shown to move between environments22, which suggests that their metabolic profiles are similar in different ecosystems. In contrast, other studies have shown that phage carry ‘specialization’ genes23, including phosphate metabolism24 and cyanobacterial photosystems25, to manipulate host metabolisms associated with a particular ecosystem. Phage ‘sample’ their host’s genetic material and incorporate extra pieces of DNA called MORONS26, suggesting that phage metagenomes may instead show distinctive profiles based on their environment. As shown in Figs 1b and 2, the viromes have highly predictive metabolic profiles that suggest enrichment for specific genes in different environments, and thus support the latter hypothesis (69.9% of the variance).

Because phages and viruses are non-motile, the abundance of motility and chemotaxis proteins within the combined viral metagenomes was the most unexpected example of specialized metabolisms being carried within the viromes (Fig. 3). A total of 130 SEED-annotated motility and chemotaxis proteins (out of a possible 157) were present in the viromes. There was a non-random acquisition of these proteins by the viral community, shown by the variation in relative abundances of these proteins between the microbial and viral metagenomes (Supplementary Table 3). In the viromes, flagellar biosynthesis protein FlhA, the chemotaxis response regulator proteins CheA and CheB and deacylases were overrepresented (Supplementary Table 3), whereas the twitching motility protein PilT, type II secretion pathways and GldJ were overrepresented in the microbiomes. cheA and cheB genes within microbes work together to control flagella motor switching rates27, but their role within the phage remains an outstanding question.

Essentially all of the functional diversity was represented in the viromes. Unlike their cellular hosts, most viruses must carry a specific amount of DNA to correctly pack their capsids (that is, viruses are not evolutionarily penalized for carrying ‘extra’ DNA). If there is a selective advantage of the extra DNA (resulting in increased phage progeny), these genes are fixed in the phage genome; otherwise they will be lost. Because there are an estimated 10^31 phages on the planet and they can move between environments, the potential reservoir of genes that can be transferred both locally and globally12 by phage is enormous28. As our research shows, there is little restriction to the types of genes carried by the viral community, suggesting that they influence a wide range of processes, including biogeochemical cycling, short-term adaptation and long-term evolution of microbes.

The low functional evenness measured for both microbial and viral metagenomes is even lower than the functional diversity calculated for individual bacterial genomes (Table 2 and Supplementary Fig. 3). This finding is diametrically opposed to the high taxonomic evenness reported for both microbial and viral communities2,12, ranging from 0.6 to 1 for human faecal and marine viruses9,12 and about 0.9 for soil microbes29. To resolve this apparent dilemma, we propose that the frequency of a gene encoding a particular metabolic function reflects its relative importance in an environment, and that genetic sweeps favour particular gene frequencies regardless of their taxonomical background. That is, rather than changing taxa, variation in gene content, presumably by means of horizontal gene transfer30 between sympatric microbes, is controlling gene distribution within an environment. The large amount of variation (~70%) explained by the functional analysis presented here supports this hypothesis.

[Figure 2: sixteen bar-chart panels. Microbial metagenomes: a Subterranean, b Hypersaline, c Marine, d Freshwater, e Coral, f Microbialites, g Fish, h Terrestrial animals. Viral metagenomes: i Mosquito, j Hypersaline, k Marine, l Freshwater, m Coral, n Microbialites, o Fish, p Terrestrial animals. x-axis: Metabolic processes (microbial panels: Virulence, Protein, Respiration, Cell wall, Membrane transport, Stress, Motility, Cell signalling, Sulphur; viral panels: Carbohydrates, Virulence, DNA metabolism, Cell division, Fatty acids, Membrane transport, Motility, Phosphorus, Potassium, Secondary metabolites). y-axis: Percentage of sequences (%), 0 to 25.]

Figure 2 | A one-dimensional representation of the environmental metabolic profiles for the microbial and viral metagenomes sampled from the nine environments. Microbial metagenomes are shown in a–h, and viral metagenomes are shown in i–p. Each bar represents the mean for each metabolic category. For subterranean, n = 2 (a); for hypersaline, n = 9 (b); for marine, n = 8 (c); for freshwater, n = 4 (d); for coral, n = 7 (e); for microbialites, n = 3 (f); for fish, n = 4 (g); for terrestrial animals, n = 8 (h); for mosquito, n = 3 (i); for hypersaline, n = 12 (j); for marine, n = 10 (k); for freshwater, n = 4 (l); for coral, n = 6 (m); for microbialites, n = 3 (n); for fish, n = 2 (o); and for terrestrial animals, n = 2 (p).

[Figure 3: three rows of paired bar charts, microbial metagenomes on the left and viral metagenomes on the right. x-axis: Biome (microbial: Subterranean, Hypersaline, Marine, Freshwater, Coral, Microbialites, Fish, Terrestrial animals; viral: Hypersaline, Marine, Freshwater, Coral, Microbialites, Fish, Terrestrial animals, Mosquito). y-axis: Percentage of sequences (%), 0 to 2.5. Panels: a, b, c.]

Figure 3 | A comparison of the distribution of sequences similar to motility and chemotaxis genes identified within the microbiomes (n = 43) and viromes (n = 41). Microbial metagenomes are shown on the left, and viral metagenomes are shown on the right. The abundance of sequences identified within each of three fine-scale subsystems including flagellum (a), bacterial chemotaxis (b) and gliding motility (c), as described by the SEED platform.

METHODS SUMMARY

Samples for metagenomes were collected and fractionated using standard techniques, sequenced using pyrosequencing and compared to the functional genes in the SEED platform11,12 (Methods). All statistics were performed on the percentage of sequences showing similarities to known functions. For the CDA, sequences were grouped according to the SEED classification scheme and the analysis was conducted on the principal metabolic functions. The CDA builds a model for group membership. A discriminative value is calculated for each metagenomic sample, which is a linear combination of the response variables (metabolic processes) represented in the new dimensional space. These values are used to visualize group membership.
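The linear combination described above can be written out explicitly. The notation below is an assumption introduced for illustration and does not come from the paper: x_{ij} is the percentage of sequences in metagenome i assigned to metabolic process j, b_{jk} is the coefficient of process j in canonical discriminant function k, b_{0k} is a constant, and p is the number of metabolic processes retained in the model.

```latex
D_{ik} = b_{0k} + \sum_{j=1}^{p} b_{jk}\, x_{ij}, \qquad k = 1, 2
```

Under this reading, each metagenome i is placed in a two-dimensional plot at the coordinates (D_{i1}, D_{i2}).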

An advantage of the CDA is that it identifies which variables are driving the separation between the groups; it uses these to build the model and discards those that are not influential. Identification of influential variables was conducted by a stepwise method, using Wilks' lambda with P = 0.05, and was confirmed with analysis of variance (ANOVA; Supplementary Table 4). The level of influence of each variable is provided by the structural matrix and can be visualized using an h-plot, in which the length of the line is representative of the level of influence. The CDA also performs a cross-validation analysis that identifies the likelihood of correctly classifying each sample. Cross validation removes the predetermined grouping for each sample and uses the response variables to align the individual sample to a group. Because the data were divided into nine predetermined groups (biomes), the proportion of samples correctly identified by chance alone is 11% (1/9). The percentage-correct classification has to be substantially larger than this number for the metabolic processes to be useful for classifying the metagenomes into environments.
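The comparison between cross-validated classification and the chance level can be illustrated with a short Python sketch. It is not the CDA used in the paper: a nearest-centroid classifier is swapped in to keep the example dependency-free, and the function names, the two-group toy data and the two-variable profiles are all hypothetical. Only the idea of leaving each sample out in turn and the 1/9 chance level for nine groups come from the text.

```python
def chance_level(n_groups):
    """Expected fraction classified correctly by chance with equal odds per group."""
    return 1.0 / n_groups


def nearest_centroid(train, sample):
    """Assign `sample` to the group whose mean profile is closest (squared Euclidean)."""
    sums, counts = {}, {}
    for label, vec in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((c - x) ** 2 for c, x in zip(centroid, sample))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, classify it from the rest, return fraction correct."""
    correct = 0
    for i, (label, vec) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid(train, vec) == label:
            correct += 1
    return correct / len(samples)


# Toy profiles (two made-up percentages per sample), invented for illustration.
toy_samples = [
    ("coral", [20.0, 9.0]),
    ("coral", [19.0, 10.0]),
    ("coral", [21.0, 8.5]),
    ("animal", [3.0, 12.0]),
    ("animal", [4.0, 11.0]),
    ("animal", [2.5, 12.5]),
]

if __name__ == "__main__":
    print(f"chance level for nine groups: {chance_level(9):.2f}")
    print(f"leave-one-out accuracy on toy data: {leave_one_out_accuracy(toy_samples):.2f}")
```

A classifier is informative only when its cross-validated accuracy is well above the chance level, which is the criterion stated above for the nine biomes.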

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

Received 18 November 2007; accepted 6 February 2008. Published online 12 March 2008.

1. Newman, D. K. & Banfield, J. F. Geomicrobiology: how molecular-scale interactions underpin biogeochemical systems. Science 296, 1071–1076 (2002).
2. Prosser, J. I. et al. The role of ecological theory in microbial ecology. Nature Rev. Microbiol. 5, 384–392 (2007).
3. Ley, R. E., Turnbaugh, P. J., Klein, S. & Gordon, J. I. Microbial ecology: human gut microbes associated with obesity. Nature 444, 1022–1023 (2006).
4. Medini, D. et al. The microbial pan-genome. Curr. Opin. Genet. Dev. 15, 589–594 (2005).
5. Coleman, M. L. et al. Genomic islands and the ecology and evolution of Prochlorococcus. Science 311, 1768–1770 (2006).
6. DeLong, E. F. et al. Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311, 496–503 (2006).
7. Tringe, S. G. & Rubin, E. M. Metagenomics: DNA sequencing of environmental samples. Nature Rev. Genet. 6, 805–814 (2005).
8. Edwards, R. A. et al. Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7, 57 (2006).
9. Breitbart, M. et al. Metagenomic analyses of an uncultured viral community from human feces. J. Bacteriol. 185, 6220–6223 (2003).
10. Breitbart, M. et al. Genomic analysis of uncultured marine viral communities. Proc. Natl Acad. Sci. USA 99, 14250–14255 (2002).
11. Wegley, L., Breitbart, M., Edwards, R. A. & Rohwer, F. Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ. Microbiol. 9, 2707–2719 (2007).
12. Angly, F. et al. The marine viromes of four oceanic regions. PLoS Biol. 4, e368 (2006).
13. Breitbart, M. & Rohwer, F. Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. Biotechniques 39, 729–736 (2005).
14. Fierer, N. et al. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of Bacteria, Archaea, Fungi, and viruses in soil. Appl. Environ. Microbiol. 73, 7059–7066 (2007).
15. Overbeek, R. et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33, 5691–5702 (2005).
16. Mira, A., Ochman, H. & Moran, N. A. Deletional bias and the evolution of bacterial genomes. Trends Microbiol. 17, 589–596 (2001).
17. Lozupone, C. A. & Knight, R. Global patterns in bacterial diversity. Proc. Natl Acad. Sci. USA 104, 11436–11440 (2007).
18. Shashar, N., Cohen, Y. & Loya, Y. Extreme diel fluctuations of oxygen in diffusive boundary layers surrounding stony corals. Biol. Bull. 185, 455–461 (1993).
19. Iwanicka-Nowicka, R. et al. Regulation of sulfur assimilation pathways in Burkholderia cenocepacia: identification of transcription factors CysB and SsuR and their role in control of target genes. J. Bacteriol. 189, 1675–1688 (2007).
20. Aksnes, A., Hope, B., Hostmark, O. & Albrektsen, S. Inclusion of size fractionated fish hydrolysate in high plant protein diets for Atlantic cod, Gadus morhua. Aquaculture 261, 1102–1110 (2006).
21. Dinsdale, E. A. et al. Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE 3, e1584 (2008).
22. Sano, E., Carlson, S., Wegley, L. & Rohwer, F. Movement of virus between biomes. Appl. Environ. Microbiol. 70, 5842–5846 (2004).
23. Davis, B. M. & Waldor, K. in Mobile DNA II (ed. Craig, N. L., Gragie, R., Gellert, M. & Lambowitz, A. M.) 1040–1055 (ASM, Washington DC, 2002).
24. Rohwer, F. et al. The complete genomic sequence of the marine phage Roseophage SIO1 shares homology with nonmarine phages. Limnol. Oceanogr. 45, 408–418 (2000).
25. Mann, N. et al. Marine ecosystems: bacterial photosynthesis genes in a virus. Nature 424, 741 (2003).
26. Hendrix, R. W., Smith, M. C. M., Burns, R. N. & Ford, M. E. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc. Natl Acad. Sci. USA 96, 2192–2197 (1999).
27. Wadhams, G. H. & Armitage, J. P. Making sense of it all: bacterial chemotaxis. Nature Rev. Mol. Cell Biol. 5, 1024–1037 (2004).
28. Hendrix, R. W. Bacteriophage: evolution of the majority. Theor. Popul. Biol. 61, 471–480 (2002).
29. Dunbar, J., Ticknor, L. O. & Kuske, C. R. Assessment of microbial diversity in four southwestern United States soils by 16S rRNA gene terminal restriction fragment analysis. Appl. Environ. Microbiol. 66, 2943–2950 (2000).
30. Frigaard, N.-U., Martinez, A., Mincer, T. J. & Delong, E. F. Proteorhodopsin lateral gene transfer between marine planktonic Bacteria and Archaea. Nature 439, 847–850 (2006).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements This project was supported by the Gordon and Betty Moore Foundation Marine Microbial Initiative, National Science Foundation grants (F.R. and D.L.V.), a Department of Commerce ATP grant (F.R.), a National Research Initiative Competitive Grant from the USDA Cooperative State Research, Education and Extension Service (B.W.), the National Institute of Allergy and Infectious Diseases, the National Institutes of Health and the Department of Health and Human Services (R.S.).

Author Contributions E.A.D. conceptualized the project, conducted the CDA and wrote the manuscript. R.A.E., R.O. and R.S. performed the bioinformatics. D.H. conducted the non-parametric statistical analysis. F.R. oversaw most of the metagenomic projects. All other authors collected the metagenomes and provided comments on the manuscript.

Author Information The metagenomes used in this paper are freely available from the SEED platform and are being made accessible from CAMERA and the NCBI Short Read Archive. The accession numbers are shown in Supplementary Table 1. The NCBI genome project IDs used in this study are: 28619, 28613, 28611, 28609, 28607, 28605, 28603, 28601, 28599, 28597, 28469, 28467, 28465, 28463, 28461, 28459, 28457, 28455, 28453, 28451, 28449, 28447, 28445, 28443, 28441, 28439, 28437, 28435, 28433, 28431, 28429, 28427, 28425, 28423, 28421, 28419, 28417, 28415, 28413, 28411, 28409, 28407, 28405, 28403, 28401, 28395, 28393, 28391, 28389, 28387, 28385, 28383, 28381, 28379, 28377, 28375, 28373, 28371, 28361, 28359, 28357, 28355, 28353 and 28351. Reprints and permissions information is available at www.nature.com/reprints. Correspondence and requests for materials should be addressed to E.A.D. ([email protected]).

LETTERS NATURE

Nature Publishing Group ©2008

METHODS

Collection of the metagenomes. Metagenomic samples were collected and DNA was prepared by the different groups involved; each laboratory used slight modifications of the basic protocol. Samples came from widely dispersed locations or from separate organisms (Supplementary Fig. 1 and Supplementary Table 1). Metagenomes were collected to answer broad ecological questions such as viral community dynamics in the lungs of healthy and cystic fibrosis patients and the microbial communities on coral reefs (Supplementary Table 1). Typically, the microbiome process starts by filtering samples onto 0.22 µm Sterivex filters, removing the filter membranes and extracting DNA using a bead-beating protocol (MoBio). In some samples, the DNA was amplified with Genomiphi (GE Healthcare Life Sciences) in six to eight 18-h reactions (refs 22, 31). The reactions were pooled and purified using silica columns (Qiagen). The DNA was precipitated with ethanol and resuspended in water at a concentration of approximately 300 ng µl⁻¹. Microbial metagenomes capture Bacteria, Archaea, some small protists as well as a few trapped viral-like particles (Supplementary Table 2).

The viruses in the small metagenomic fractions (that is, 0.22-µm filtrate treated with chloroform) were purified using caesium chloride (CsCl) step gradients to remove free DNA and any cellular material (refs 10, 12). Viral samples were visually checked for microbial contamination using epifluorescent microscopy. Viral DNA was isolated using CTAB (cetyltrimethylammonium bromide) and 25:24:1 phenol:chloroform:isoamyl alcohol mix extractions and amplified using Genomiphi reactions. These reactions were pooled and purified using silica columns (Qiagen). The DNA was precipitated with ethanol and resuspended in water at a concentration of approximately 300 ng µl⁻¹. One viral metagenome (number 40, Supplementary Table 1) was prepared by concentrating a natural microbial sample and inducing it with mitomycin C. All metagenome libraries consisted of approximately 5 µg of DNA. The viral metagenomes contained viruses, phage and prophage, and as expected the proportion of phage and prophage is higher in these metagenomes than in the microbial fraction (Supplementary Table 2).

Sequencing and bioinformatics. Sequencing was performed using pyrosequencing on Roche Applied Sciences and 454 Life Sciences GS20 platforms (ref. 32) with a practical limit of 105 bp. DNA sequences were analysed in the metagenomics RAST pipeline, an open-access metagenome curation and analysis platform (http://metagenomics.theseed.org/; ref. 33). First, sequences were screened to remove exactly duplicated sequences, a known artefact of the pyrosequencing approach. The sequences were compared to the SEED platform, which comprises all known protein sequences, using the NCBI BLASTX algorithm on the NMPDR compute cluster (Argonne National Laboratory; http://www.nmpdr.org/). The SEED platform includes all available genome data, DNA and protein sequences, and is supplemented with data from genome sequencing centres as available. Every metagenome was compared to exactly the same data set using the same BLAST parameters at the same time to ensure congruity of the data. Connections between the metagenomes and the SEED subsystems were calculated by identifying matches to the SEED platform where the matched protein was curated to be in a subsystem, and the expect value from the BLAST search was less than 0.001. The SEED subsystems are manually curated collections of proteins with related functions and are available at http://www.theseed.org/. Simultaneously, all sequences were compared to the 16S databases using BLASTN. The databases were extracted from GreenGenes (ref. 34), the Ribosomal Database Project (ref. 35) and the European Ribosomal Database Project (ref. 36).
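The two screening steps described above (removal of exactly duplicated reads, then assignment of reads to subsystems when a hit has an expect value below 0.001 and the matched protein is curated in a subsystem) can be sketched as follows. This is a minimal illustration and not the RAST pipeline code: the function names, the tuple layout of `reads` and `hits`, and the `protein_to_subsystem` lookup are hypothetical, and counting each read once through its lowest-E-value qualifying hit is an assumption that the text does not state.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-3  # the text requires an expect value below 0.001


def remove_exact_duplicates(reads):
    """Keep the first copy of every distinct read sequence, preserving order.

    `reads` is an iterable of (read_id, sequence) pairs (hypothetical layout).
    """
    seen = set()
    unique = []
    for read_id, seq in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((read_id, seq))
    return unique


def count_subsystem_hits(hits, protein_to_subsystem, cutoff=E_VALUE_CUTOFF):
    """Count reads per subsystem.

    `hits` is an iterable of (read_id, protein_id, e_value) triples.
    A hit qualifies if e_value < cutoff and the protein is in a subsystem.
    Assumption: each read is counted once, via its lowest-E-value qualifying hit.
    """
    best = {}
    for read_id, protein_id, e_value in hits:
        if e_value >= cutoff or protein_id not in protein_to_subsystem:
            continue
        if read_id not in best or e_value < best[read_id][1]:
            best[read_id] = (protein_id, e_value)
    return Counter(protein_to_subsystem[p] for p, _ in best.values())


# Toy data, invented purely for illustration.
reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTGA")]
hits = [
    ("r1", "p1", 1e-10),
    ("r1", "p2", 1e-4),
    ("r3", "p3", 0.5),
    ("r3", "p2", 5e-4),
    ("r4", "p9", 1e-20),
]
protein_to_subsystem = {"p1": "Sulfur Metabolism", "p2": "Respiration"}

unique_reads = remove_exact_duplicates(reads)
subsystem_counts = count_subsystem_hits(hits, protein_to_subsystem)
```

Keeping the subsystem lookup as a plain dictionary makes the curation requirement explicit: a strong hit to a protein outside any subsystem (read `r4` in the toy data) contributes nothing to the functional profile.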

Several metagenomes were constructed from environments that were likely to contain DNA from other organisms such as humans, corals and mosquitoes. To test for and remove contaminants, 20,000 sequences were chosen at random from every metagenome and compared to the March 2006 build of the human genome and the February 2003 build of the Anopheles gambiae genome (both downloaded from http://genome.ucsc.edu/). The comparisons were performed using BLASTN with an expect (E) value cutoff of 1 × 10⁻⁵. Every sample (including the mosquito samples) had less than 1% of its sequences with significant similarity to the A. gambiae genome, and only two samples had >5% of sequences with similarity to the human genome. These two samples, from the human virome studies, were compared in full and human sequences excluded. To identify and remove dinoflagellate sequences, such as Symbiodinium (a coral symbiont), a custom database was created from the nucleotide and RNA (expressed sequence tag) sequences in GenBank; all coral reef water and coral samples were analysed as described above and dinoflagellate sequences were excluded.
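The contamination check described above amounts to drawing a random subsample of reads and measuring what fraction has a significant hit to a host genome. The sketch below illustrates that logic only; the function name, the `contaminant_hits` dictionary (read identifier mapped to the best E value against the host genome), the fixed random seed and the treatment of the cutoff as inclusive are all assumptions made for the example.

```python
import random


def contamination_fraction(read_ids, contaminant_hits,
                           sample_size=20000, e_cutoff=1e-5, seed=0):
    """Estimate the fraction of reads with a significant host-genome hit.

    read_ids         : all read identifiers in one metagenome
    contaminant_hits : dict read_id -> best E value against the host genome
                       (reads without any hit are simply absent)
    sample_size      : number of reads drawn at random (20,000 in the text)
    e_cutoff         : E value cutoff (1e-5 in the text; inclusive here by assumption)
    """
    ids = list(read_ids)
    if not ids:
        return 0.0
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    subsample = ids if len(ids) <= sample_size else rng.sample(ids, sample_size)
    flagged = sum(
        1 for r in subsample
        if contaminant_hits.get(r, float("inf")) <= e_cutoff
    )
    return flagged / len(subsample)


# Toy data: 100 reads, 10 with strong host hits and 1 with a weak hit.
read_ids = [f"r{i}" for i in range(100)]
contaminant_hits = {f"r{i}": 1e-20 for i in range(10)}
contaminant_hits["r10"] = 1e-3  # above the cutoff, so not counted

fraction = contamination_fraction(read_ids, contaminant_hits)
needs_full_screen = fraction > 0.05  # the >5% threshold mentioned in the text
```

In the workflow described in the text, samples exceeding the threshold were then compared in full and the matching sequences excluded.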

Statistical analysis. Statistics were performed on the proportions of sequences within each subsystem, thus normalizing data across metagenomes and removing differences in reaction efficiencies. Total numbers of sequences and numbers of sequences that showed similarities to the SEED are provided in Supplementary Table 1, and approximately 11% of sequences were similar to functional genes. The SEED platform housed 654 well-documented subsystems that were used to calculate the Shannon index (H′). Maximum diversity occurs when every functional category is present in equal numbers, thus Hmax = log S, where S is the number of categories. Evenness is H′ divided by the number of subsystems in each sample (evenness ranges from 0 to 1, where 1 is completely even). As a comparison to the metagenomic analyses, the diversity and evenness were calculated for all 842 sequenced bacterial genomes. These calculations were conducted on the number of genes within each subsystem, rather than on the number of sequences that was used for the metagenomes (Supplementary Fig. 3).
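The diversity calculation above can be illustrated with a short sketch that converts per-subsystem sequence counts to proportions and computes the Shannon index H′ and Hmax = log S. The natural logarithm is assumed, because the text does not state the base. The evenness function shown is Pielou's J = H′/Hmax, a common formulation that reaches 1 for a perfectly even sample; it is included as an assumption for illustration and is not necessarily the exact normalization used in the study.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over categories with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def max_diversity(n_categories):
    """H_max = ln S, reached when all S categories are equally abundant."""
    return math.log(n_categories)


def pielou_evenness(counts):
    """Pielou's evenness J = H' / ln S over observed categories (assumed formulation)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is undefined for a single category; 0.0 by convention here
    return shannon_index(counts) / math.log(observed)


# Toy subsystem counts, invented for illustration.
even_sample = [10, 10, 10, 10]
skewed_sample = [97, 1, 1, 1]

h_even = shannon_index(even_sample)
j_even = pielou_evenness(even_sample)
j_skewed = pielou_evenness(skewed_sample)
```

Working on proportions rather than raw counts is what makes libraries of very different sizes comparable, which is the normalization the paragraph above describes.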

To analyse the stability of the CDAs, an experiment was conducted in which several of the metagenomic groups were removed and the analysis re-run. In the first trial, the subterranean, fish and mosquito metagenomes were removed (Supplementary Fig. 5). In the second trial, these metagenomes were re-added and the hypersaline metagenomes removed (Supplementary Fig. 6). Multiple trials were required because CDAs are sensitive to the number of samples (that is, metagenomes) relative to the number of variables (that is, metabolic processes). The data were further analysed using a non-parametric ANOVA, a Kruskal–Wallis test and a median test, and the results compared to ensure that stable results could be obtained (Supplementary Table 3). Environments driving the variation were identified using Duncan comparisons (degrees of freedom were set at 7).
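As an illustration of one of the non-parametric tests named above, the following sketch computes the tie-corrected Kruskal–Wallis H statistic for the proportions of one metabolic category grouped by biome. It is a generic textbook implementation written for this example, not the statistical software used in the study; the function name and toy values are hypothetical. A P value would additionally require the chi-square distribution with (number of groups − 1) degrees of freedom, which is omitted to keep the example dependency-free.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    `groups` is a list of lists; each inner list holds the values
    (for example, subsystem proportions) observed in one biome.
    """
    values = sorted(v for g in groups for v in g)
    n = len(values)

    # Assign every distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_term = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        return 0.0  # all observations identical: no evidence of a difference
    return h / correction


# Toy proportions for three biomes, invented for illustration.
h_no_ties = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_with_ties = kruskal_wallis_h([[1, 1], [2, 2]])
```

Because the statistic depends only on ranks, it is insensitive to the strongly skewed proportions that are typical when a few subsystems dominate a metagenome.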

All metagenomes were provided by authors of this manuscript. Further material, including direct access to the data, is available at http://www.theseed.org/DinsdaleSupplementalMaterial/. The NCBI genome project IDs used in this study that were associated with previous publications are: 28369, 28367, 28365, 28363, 28349, 28347, 28345, 28343, 19145, 17771, 17769, 17767, 17765, 17635, 17633 and 17401.

31. Gunn, M. R. et al. A test of the efficacy of whole-genome amplification on DNA obtained from low-yield samples. Mol. Ecol. Notes 7, 393–399 (2007).

32. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).

33. Meyer, F. et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinf. (submitted).

34. DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).

35. Cole, J. R. et al. The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res. 35, D169–D172 (2007).

36. Wuyts, J., Perriere, G. & de Peer, Y. V. The European ribosomal RNA database. Nucleic Acids Res. 32, D101–D103 (2004).

doi:10.1038/nature06810


SUPPLEMENTARY INFORMATION

Functional Metagenomic Profiling of Nine Biomes

Elizabeth A. Dinsdale1,2*, Robert A. Edwards1,3,4,5, Dana Hall1, Florent Angly1,6, Mya Breitbart7, Jennifer M. Brulc8, Mike Furlan1, Christelle Desnues1,9, Matthew Haynes1, Linlin Li1, Lauren McDaniel7, Mary Ann Moran10, Karen E. Nelson11, Christina Nilsson12, Robert Olson5, John Paul7, Beltran Rodriguez Brito1,6, Yijun Ruan12, Brandon K. Swan13, Rick Stevens5, David L. Valentine13, Rebecca Vega Thurber1, Linda Wegley1, Bryan A. White8,14 and Forest Rohwer1,3

1 Department of Biology, San Diego State University, San Diego, CA 92182, USA

2 School of Biological Sciences, Flinders University, Adelaide, SA 5042, Australia

3 Center for Microbial Sciences, San Diego State University, San Diego, CA 92182, USA

4 Department of Computer Sciences, San Diego State University, San Diego, CA 92182, USA

5 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA

6 Computational Science Research Centre, San Diego State University, San Diego, CA 92182-1245, USA

7 University of South Florida, College of Marine Science, 140 7th Avenue S., St. Petersburg, FL 33701, USA

8 Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA

9 Current address: Unité des Rickettsies, CNRS-UMR 6020, Faculté de médecine, 13385 Marseille, France

10 Department of Marine Sciences, University of Georgia, Athens, GA 30602, USA

11 The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, MD 20850, USA

12 Genome Institute of Singapore, 60 Biopolis Street, #02-01, Genome, Singapore 138672

13 Department of Earth Science, University of California Santa Barbara, Santa Barbara, CA 93106, USA

14 The Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA


Supplementary information includes four tables presenting accession numbers and descriptions of metagenomes, phage and prophage content of each metagenome, the motility proteins present in the microbial and viral metagenomes and statistical comparisons of the proportions of metabolic processes across the nine biomes. Six figures provide information about the geographic separation of samples, diversity versus sequence number, comparison of diversity between metagenomes and sequenced whole bacterial genomes, the fine-scale details about the sulfur metabolic processes, and two experiments that show the strength of the CDA across multiple groupings.


Table S1. Metagenomes used in this manuscript listed using collector’s description and biome assignment. All metagenomes were sequenced using 454 Life Science GS20 pyrosequencing. Simple statistics of the individual libraries, including number of sequences, blast hits and 16S rRNA genes are provided. M = microbial library and V = viral library. The metagenomes used in this paper are freely available from the SEED platform and are being made accessible from CAMERA and the NCBI Short Read Archive when available. The accession numbers are shown, and further material and direct links to the data are available at http://www.theseed.org/DinsdaleSupplementalMaterial/.

| ID | Name | SEED accession # | NCBI Genome project # | Type | Biome | # of Sequences | # of Blast hits | # of 16S |
|---|---|---|---|---|---|---|---|---|
| 1 | Soudan Red | 4440281.3 | 17633 | M | Subterranean | 334,386 | 55,069 | 321 |
| 2 | Soudan Black | 4440282.3 | 17635 | M | Subterranean | 388,627 | 43,079 | 24 |
| 3 | Solar Salterns low Salinity San Diego | 4440437.3 | 28359 | M | Hyper-saline | 268,206 | 52,745 | 243 |
| 4 | Solar Salterns medium Salinity San Diego | 4440435.3 | 28377 | M | Hyper-saline | 38,929 | 10,151 | 41 |
| 5 | Solar Salterns medium Salinity San Diego | 4440434.3 | 28379 | M | Hyper-saline | 23,261 | 5,630 | 26 |
| 6 | Solar Salterns Plasmid component | 4440090.3 | 28443 | M | Hyper-saline | 111,431 | 19,365 | 129 |
| 7 | Solar Salterns medium salinity west California | 4440416.3 | 28449 | M | Hyper-saline | 8,062 | 770 | 3 |
| 8 | Solar Salterns high salinity west California | 4440419.3 | 28453 | M | Hyper-saline | 35,446 | 8,778 | 11 |
| 9 | Salton Sea | 4440329.3 | 28613 | M | Hyper-saline | 178,407 | 17,531 | 43 |
| 10 | Solar Salterns medium salinity west California | 4440425.3 | 28459 | M | Hyper-saline | 120,987 | 32,871 | 110 |
| 11 | Solar Salterns low salinity west California | 4440426.3 | 28461 | M | Hyper-saline | 34,296 | 3,754 | 26 |
| 12 | Solar Salterns medium salinity west California | 4440427.3 | 28463 | V | Hyper-saline | 39,943 | 414 | |
| 13 | Solar Salterns medium salinity west California | 4440428.3 | 28465 | V | Hyper-saline | 58,735 | 1,822 | |
| 14 | Solar Salterns high salinity West California | 4440421.3 | 28457 | V | Hyper-saline | 154,167 | 3,028 | |
| 15 | Solar Salterns low salinity San Diego | 4440436.3 | 28353 | V | Hyper-saline | 268,534 | 6,920 | |
| 16 | Solar Salterns low salinity San Diego | 4440432.3 | 28373 | V | Hyper-saline | 110,511 | 3,068 | |
| 17 | Solar Salterns medium salinity west California | 4440431.3 | 28375 | V | Hyper-saline | 39,578 | 929 | |
| 18 | Solar Salterns medium salinity West California | 4440417.3 | 28445 | V | Hyper-saline | 55,903 | 904 | |
| 19 | Solar Salterns high salinity west California | 4440145.4 | 28447 | V | Hyper-saline | 47,587 | 2,601 | |
| 20 | Solar Salterns high salinity west California | 4440144.4 | 28451 | V | Hyper-saline | 4,645 | 947 | |
| 21 | Solar Salterns low salinity west California | 4440420.3 | 28455 | V | Hyper-saline | 62,685 | 11,369 | |
| 22 | Salton Sea | 4440327.3 | 28613 | V | Hyper-saline | 55,787 | 926 | |
| 23 | Salton Sea | 4440328.3 | 28613 | V | Hyper-saline | 29,970 | 454 | |
| 24 | Line Is Kingman | 4440037.3 | 28343 | M | Marine | 188,445 | 11,309 | 6 |
| 25 | Line Is Christmas | 4440041.3 | 28347 | M | Marine | 227,542 | 11,574 | 18 |
| 26 | Line Is Palmyra | 4440039.3 | 28363 | M | Marine | 289,723 | 26,173 | 97 |
| 27 | Line Is Tabuaeran | 4440279.3 | 28367 | M | Marine | 290,844 | 12,631 | 100 |
| 28 | DMSP Treated | 4440364.3 | 19145 | M | Marine | 54,848 | 11,725 | 24 |
| 29 | DMSP Treated | 4440360.3 | 19145 | M | Marine | 50,313 | 7,198 | 52 |
| 30 | Vanillate Treated | 4440365.3 | 19145 | M | Marine | 12,446 | 1,720 | 48 |
| 31 | Vanillate Treated | 4440363.3 | 19145 | M | Marine | 33,773 | 6,610 | 7 |
| 32 | Marine GOM | 4440304.3 | 17765 | V | Marine | 263,908 | 28,878 | |
| 33 | Marine BBC | 4440305.3 | 17767 | V | Marine | 416,456 | 20,770 | |
| 34 | Marine Arctic | 4440306.3 | 17769 | V | Marine | 688,590 | 197,018 | |
| 35 | Marine SAR | 4440322.3 | 17771 | V | Marine | 399,343 | 17,813 | |
| 36 | Line Is Kingman | 4440036.3 | 28345 | V | Marine | 94,915 | 6,597 | |
| 37 | Line Is Christmas | 4440038.3 | 28349 | V | Marine | 283,390 | 69,501 | |
| 38 | Line Is Palmyra | 4440040.3 | 28365 | V | Marine | 320,397 | 9,608 | |
| 39 | Line Is Tabuaeran | 4440280.3 | 28369 | V | Marine | 380,355 | 10,716 | |
| 40 | Tampa Bay Mitomycin C induced | 4440102.3 | 28619 | V | Marine | 280,019 | 8,767 | |
| 41 | Skan Bay | 4440330.3 | 28619 | V | Marine | 31,375 | 417 | |
| 42 | Tilapia pond | 4440440.3 | 28387 | M | Freshwater | 381,076 | 58,596 | 177 |
| 43 | Healthy fish pond | 4440413.3 | 28405 | M | Freshwater | 63,978 | 8,911 | 48 |
| 44 | Healthy fish Prebead | 4440411.3 | 28407 | M | Freshwater | 44,094 | 6,937 | 32 |
| 45 | Tilapia pond 3 | 4440422.3 | 28603 | M | Freshwater | 67,612 | 10,549 | 71 |
| 46 | Tilapia pond 3 | 4440424.3 | 28601 | V | Freshwater | 267,640 | 9,055 | |
| 47 | Healthy fish pond | 4440412.3 | 28409 | V | Freshwater | 60,319 | 1,152 | |
| 48 | Healthy fish Prebead | 4440414.3 | 28411 | V | Freshwater | 67,988 | 1,739 | |
| 49 | Tilapia pond | 4440439.3 | 28361 | V | Freshwater | 57,134 | 1,226 | |
| 50 | Porites compressa time zero | 4440380.3 | 28427 | M | Coral | 53,473 | 2,560 | 0 |
| 51 | Porites compressa control | 4440378.3 | 28429 | M | Coral | 65,191 | 2,030 | 2 |
| 52 | Porites compressa temperature | 4440373.3 | 28431 | M | Coral | 61,356 | 1,359 | 13 |
| 53 | Porites compressa DOC | 4440372.3 | 28433 | M | Coral | 62,959 | 1,566 | 7 |
| 54 | Porites compressa pH | 4440379.3 | 28435 | M | Coral | 67,994 | 1,913 | 5 |
| 55 | Porites compressa Nutrient | 4440381.3 | 28437 | M | Coral | 65,008 | 3,258 | 11 |
| 56 | Porites astreoides | 4440319.3 | 28371 | M | Coral | 316,279 | 39,004 | 393 |
| 57 | Porites compressa time zero | 4440376.3 | 28415 | V | Coral | 39,270 | 2,772 | |
| 58 | Porites compressa control | 4440374.3 | 28417 | V | Coral | 39,340 | 5,276 | |
| 59 | Porites compressa DOC | 4440370.3 | 28421 | V | Coral | 35,680 | 2,410 | |
| 60 | Porites compressa pH | 4440371.3 | 28423 | V | Coral | 50,364 | 2,710 | |
| 61 | Porites compressa nutrients | 4440377.3 | 28425 | V | Coral | 34,433 | 2,338 | |
| 62 | Porites compressa Temperature | 4440375.3 | 28419 | V | Coral | 39,036 | 2,141 | |
| 63 | Rio Mesquites | 4440060.3 | 28351 | M | Microbialites | 124,694 | 21,374 | 10 |
| 64 | Highborne Cay | 4440061.3 | 28383 | M | Microbialites | 257,573 | 5,286 | 12 |
| 65 | Pozas Azules II | 4440067.3 | 28385 | M | Microbialites | 326,146 | 36,468 | 61 |
| 66 | Pozas Azules II | 4440320.3 | 28355 | V | Microbialites | 302,987 | 3,947 | |
| 67 | Rio Mesquites | 4440321.3 | 28357 | V | Microbialites | 328,656 | 14,561 | |
| 68 | Highborne Cay | 4440323.3 | 28381 | V | Microbialites | 150,223 | 3,063 | |
| 69 | Healthy fish slime | 4440059.3 | 28393 | M | Fish | 66,066 | 15,686 | 68 |
| 70 | Morbid fish slime | 4440066.3 | 28395 | M | Fish | 82,442 | 20,635 | 147 |
| 71 | Healthy fish gut | 4440055.3 | 28389 | M | Fish | 51,498 | 16,377 | 63 |
| 72 | Morbid fish gut | 4440056.3 | 28391 | M | Fish | 60,311 | 17,996 | 91 |
| 73 | Healthy fish slime | 4440065.3 | 28401 | V | Fish | 61,476 | 9,051 | |
| 74 | Morbid fish slime | 4440064.3 | 28403 | V | Fish | 60,111 | 13,826 | |
| 75 | Cow rumens pool plankton | 4440357.3 | 28611 | M | Terrestrial Animals | 236,830 | 38,626 | 313 |
| 76 | Cow rumens 80F6 | 4440356.3 | 28605 | M | Terrestrial Animals | 178,713 | 29,989 | 240 |
| 77 | Cow rumens 640F6 | 4440355.3 | 28607 | M | Terrestrial Animals | 264,849 | 39,775 | 386 |
| 78 | Cow rumens 710 F | 4440387.3 | 28609 | M | Terrestrial Animals | 345,317 | 130,089 | 757 |
| 79 | Lean Mice | 4440324.3 | 17401 | M | Terrestrial Animals | 49,074 | 8,688 | 42 |
| 80 | Obese Mice | 4440325.3 | 17401 | M | Terrestrial Animals | 35,053 | 9,161 | 37 |
| 81 | Chicken cecum NCTC | 4440367.3 | 28599 | M | Terrestrial Animals | 237,940 | 49,256 | 451 |
| 82 | Chicken cecum Uninfected | 4440368.3 | 28597 | M | Terrestrial Animals | 294,682 | 83,912 | 533 |
| 83 | Lung sputum Cystic fibrosis patient | 4440441.3 | 28441 | V | Terrestrial Animals | 92,223 | 7,946 | |
| 84 | Lung sputum Healthy | 4440442.4 | 28439 | V | Terrestrial Animals | 39,807 | 3,292 | |
| 85 | Mosquito Oceanside Ca | 4440052.3 | 28413 | V | Mosquito | 340,098 | 97,269 | |
| 86 | Mosquito San Diego | 4440053.3 | 28467 | V | Mosquito | 657,204 | 232,886 | |
| 87 | Mosquito Mission Valley Ca | 4440054.3 | 28469 | V | Mosquito | 615,576 | 112,761 | |


Table S2. The percent of phage and prophage sequences in the microbial and viral metagenomes. ns = no sample.

| Type | Microbial sample number | Microbial percent phage | Microbial percent prophage | Viral sample number | Viral percent phage | Viral percent prophage |
|---|---|---|---|---|---|---|
| Subterranean | 1 | 1.879 | 3.802 | | ns | ns |
| Subterranean | 2 | 1.838 | 3.638 | | ns | ns |
| Hyper-saline | 3 | 0.983 | 3.802 | 12 | 3.922 | 5.456 |
| Hyper-saline | 4 | 0.000 | 3.595 | 13 | 8.861 | 3.927 |
| Hyper-saline | 5 | 0.375 | 3.638 | 14 | 25.517 | 3.744 |
| Hyper-saline | 6 | 0.557 | 3.802 | 15 | 14.463 | 3.554 |
| Hyper-saline | 7 | 0.000 | 1.238 | 16 | 29.762 | 3.578 |
| Hyper-saline | 8 | 1.695 | 2.779 | 17 | 34.884 | 4.940 |
| Hyper-saline | 9 | 4.918 | 3.802 | 18 | 17.647 | 3.263 |
| Hyper-saline | 10 | 1.286 | 3.802 | 19 | 4.545 | 4.341 |
| Hyper-saline | 11 | 1.961 | 3.638 | 20 | 1.056 | 4.777 |
| Hyper-saline | | ns | ns | 21 | 3.198 | 3.667 |
| Hyper-saline | | ns | ns | 22 | 25.000 | 2.626 |
| Hyper-saline | | ns | ns | 23 | 60.000 | 4.001 |
| Marine | 24 | 0.589 | 3.638 | 32 | 1.051 | 3.474 |
| Marine | 25 | 3.797 | 3.580 | 33 | 2.171 | 3.523 |
| Marine | 26 | 1.073 | 3.762 | 34 | 0.351 | 3.802 |
| Marine | 27 | 0.763 | 3.146 | 35 | 15.764 | 3.803 |
| Marine | 28 | 0.727 | 3.720 | 36 | 3.243 | 2.655 |
| Marine | 29 | 1.342 | 3.299 | 37 | 0.531 | 3.802 |
| Marine | 30 | 0.478 | 3.746 | 38 | 11.189 | 3.864 |
| Marine | 31 | 1.370 | 3.415 | 39 | 7.563 | 3.921 |
| Marine | | ns | ns | 40 | 30.469 | 3.855 |
| Marine | | ns | ns | 41 | 8.824 | 4.352 |
| Freshwater | 42 | 6.759 | 3.802 | 46 | 41.176 | 3.185 |
| Freshwater | 43 | 3.204 | 3.809 | 47 | 68.182 | 5.143 |
| Freshwater | 44 | 3.472 | 4.032 | 48 | 50.000 | 4.628 |
| Freshwater | 45 | 0.321 | 3.802 | 49 | 58.301 | 3.723 |
| Coral | 50 | 5.797 | 3.575 | 57 | 2.602 | 3.503 |
| Coral | 51 | 0.000 | 2.839 | 58 | 9.385 | 4.047 |
| Coral | 52 | 30.864 | 3.786 | 59 | 2.871 | 3.903 |
| Coral | 53 | 2.222 | 3.385 | 60 | 11.765 | 4.357 |
| Coral | 54 | 2.941 | 4.504 | 61 | 4.348 | 3.602 |
| Coral | 55 | 0.000 | 3.807 | 62 | 2.985 | 3.205 |
| Coral | 56 | 0.472 | 3.712 | | ns | ns |
| Microbialites | 63 | 3.162 | 3.536 | 66 | 11.712 | 3.214 |
| Microbialites | 64 | 9.063 | 3.192 | 67 | 92.548 | 4.178 |
| Microbialites | 65 | 0.591 | 3.802 | 68 | 0.000 | 6.258 |
| Fish | 69 | 1.467 | 3.645 | 73 | 0.628 | 3.707 |
| Fish | 70 | 3.101 | 3.638 | 74 | 0.922 | 3.489 |
| Fish | 71 | 0.949 | 3.638 | | ns | ns |
| Fish | 72 | 0.833 | 3.675 | | ns | ns |
| Terrestrial animals | 75 | 4.245 | 3.802 | 83 | 0.000 | 4.486 |
| Terrestrial animals | 76 | 4.504 | 3.802 | 84 | 0.000 | 3.579 |
| Terrestrial animals | 77 | 1.380 | 3.802 | | ns | ns |
| Terrestrial animals | 78 | 3.229 | 3.802 | | ns | ns |
| Terrestrial animals | 79 | 4.195 | 3.802 | | ns | ns |
| Terrestrial animals | 80 | 3.624 | 3.802 | | ns | ns |
| Terrestrial animals | 81 | 5.481 | 3.802 | | ns | ns |
| Terrestrial animals | 82 | 5.472 | 3.802 | | ns | ns |
| Mosquito | | ns | ns | 85 | 11.995 | 3.638 |
| Mosquito | | ns | ns | 86 | 9.115 | 3.802 |
| Mosquito | | ns | ns | 87 | 2.192 | 3.802 |


Table S3. The thirty most abundant motility and chemotaxis protein sequences found within the metagenomes, ordered with respect to the microbial metagenomes.

| Motility proteins | Microbial metagenomes | Viral metagenomes |
|---|---|---|
| Twitching motility protein PilT | 0.033 | 0.023 |
| Methyl-accepting chemotaxis protein I | 0.029 | 0.033 |
| Flagellar biosynthesis protein flhA | 0.025 | 0.089 |
| Chemotaxis protein CheA | 0.018 | 0.059 |
| Dipeptide-binding ABC transporter | 0.018 | 0.064 |
| Type II secretory pathway | 0.017 | 0.008 |
| Chemotaxis protein methyltransferase CheR | 0.016 | 0.026 |
| GldJ | 0.015 | 0.005 |
| Acetylornithine deacetylases | 0.015 | 0.076 |
| Flagellum-specific ATP synthase fliI | 0.014 | 0.032 |
| Flagellar motor rotation protein motB | 0.014 | 0.021 |
| Flagellar hook-length control protein fliK | 0.013 | 0.033 |
| Flagellar hook protein flgE | 0.010 | 0.014 |
| Flagellar basal-body rod protein flgG | 0.010 | 0.027 |
| Chemoreceptor signals to flagellar motor CheY | 0.010 | 0.012 |
| Type 4 fimbrial biogenesis protein PilY1 | 0.010 | 0.022 |
| Flagellar regulatory protein fleQ | 0.010 | 0.011 |
| General secretion pathway protein E / ATPase PilB | 0.010 | 0.002 |
| Flagellar motor rotation protein motA | 0.009 | 0.018 |
| Flagellin protein flaA | 0.009 | 0.009 |
| Chemotaxis response regulator CheB | 0.009 | 0.051 |
| Aerotaxis sensor receptor protein | 0.008 | 0.016 |
| Flagellar motor switch protein fliG | 0.008 | 0.014 |
| Flagellar biosynthesis protein flhB | 0.008 | 0.030 |
| Cell division protein ftsX | 0.007 | 0.008 |
| Chemotaxis protein CheV | 0.007 | 0.012 |
| Flagellar motor switch protein fliM | 0.007 | 0.015 |
| Flagellar motor switch protein fliG | 0.007 | 0.009 |
| Flagellar biosynthesis protein fliP | 0.006 | 0.015 |
| Maltose/maltodextrin ABC transporter MalE | 0.006 | 0.042 |


Tab

le S4. T

he variatio

n fo

r each m

etabo

lism id

entified

for th

e micro

bial an

d viral co

mm

un

ities across th

e nin

e

bio

mes, u

sing

three statistical tests. T

he tab

le inclu

des th

e F valu

e and

P valu

e and

wh

ere po

ssible th

e bio

me th

at

was id

entified

as sho

win

g d

ifferences fo

r the p

articular m

etabo

lism.

M

icrobial metagenom

es V

iral metagenom

es

Metabolism

A

NO

VA

K

rus/wal

Median

Duncan

AN

OV

A

Krus/w

al M

edium

Duncan

Am

ino Acids

F=5.655

P<0.001

F=22.01

P=0.003

F=13.15

P=0.012

Coral

F=1.743

P=0.132

F=9.919

P=0.193

F=10.84

P=0.064

Carbohydrates

F=4.965

P<0.001

F=12.56

P=0.083

F=18.35

P=0.226

Coral

F=5.335

P<0.001

F=20.17

P=0.005

F=14.80

P=0.012

Multiple

Cell D

ivision &

Cell C

ycle

F=12.55

P<0.001

F=29.79

P<0.001

F=1.865

P=0.002

Coral,

Terrestrial

animals.

Microbialite

F=3.040

P=0.014

F=17.47

P=0.015

F=1.754

P=0.023

Multiple

do

i: 10.10

38

/n

atu

re0

68

10 S

UP

PL

EM

EN

TA

RY

INF

OR

MA

TIO

N

www.nature.com

/nature21

Cell W

all and

Capsule

F=9.929

P<0.001

F=34.78

P<0.001

F=3.171

P<0.001

Coral,

Hyper-saline

Marine

F=0.875

P=0.536

F=6.260

P=0.510

F=3.562

P=0.339

Cofactors,

Vitam

ins, etc

F=8.950

P<0.001

F=26.66

P<0.001

F=5.593

P<0.001

Coral

F=1.266

P=0.296

F=9.063

P=0.248

F=6.147

P=0.692

DN

A M

etabolism

F=16.20

P<0.001

F=35.33

P<0.001

F=4.138

P<0.001

Multiple

F=6.236

P<0.001

F=26.70

P<0.001

F=5.453

P=0.002

Microbialite

Freshw

ater

Fatty A

cids and

Lipids

F=2.765

P=0.020

F=18.101

P=0.012

F=3.063

P=0.040

Multiple

F=1.514

P=0.196

F=10.75

P=0.150

F=3.006

P=0.151

Mem

brane

Transport

F=15.92

P<0.001

F=29.99

P<0.001

F=2.551

P<0.001

Multiple

F=4.494

P=0.001

F=14.95

P=0.037

F=2.435

P=0.204

Fish

mosquito

Arom

atic

Com

pounds

F=8.464

P<0.001

F=22.43

P=0.002

F=2.137

P=0.017

Fish

F=2.225

P=0.056

F=16.28

P=0.023

F=1.834

P=0.020

None obvious

Motility and

F=3.517

F=19.27

F=0.858

Fish

F=3.692

F=15.26

F=0.833

Multiple

do

i: 10.10

38

/n

atu

re0

68

10 S

UP

PL

EM

EN

TA

RY

INF

OR

MA

TIO

N

www.nature.com

/nature22

Chem

otaxis P

=0.005 P

=0.007 P

=0.007 S

ubterranean P

=0.005 P

=0.033 P

=0.047

Nitrogen Metabolism: F=8.887, P<0.001; F=26.28, P<0.001; F=1.613, P=0.003; Coral; F=2.252, P=0.054; F=12.79, P=0.077; F=1.137, P=0.057

Nucleosides, Nucleotides: F=6.949, P<0.001; F=18.87, P=0.009; F=3.424, P=0.014; Coral; F=2.022, P=0.081; F=17.58, P=0.014; F=6.701, P=0.012; None obvious

Phosphorus Metabolism: F=1.498, P=0.198; F=15.65, P=0.029; F=0.809, P=0.020; F=1.904, P=0.099; F=11.50, P=0.118; F=1.033, P=0.532

Photosynthesis: F=10.46, P<0.001; F=29.49, P<0.001; F=0.049, P=0.001; Coral; F=1.722, P=0.137; F=13.53, P=0.060; F=0.050, P=0.074

Potassium metabolism: F=4.720, P=0.001; F=20.37, P=0.005; F=0.791, P=0.009; Multiple; F=4.634, P=0.001; F=17.35, P=0.015; F=0.680, P=0.103

Protein Metabolism: F=6.814, P<0.001; F=23.93, P=0.001; F=9.316, P<0.001; Multiple; F=1.631, P=0.160; F=14.17, P=0.048; F=8.448, P=0.074

Cell signaling: F=4.701, P=0.001; F=21.06, P=0.004; F=0.717, P=0.012; Microbialite; F=2.346, P=0.046; F=12.89, P=0.075; F=0.734, P=0.115

Respiration: F=5.158, P<0.001; F=26.00, P=0.001; F=4.607, P=0.003; Coral; F=3.633, P=0.005; F=14.70, P=0.040; F=3.669, P=0.052; Multiple

RNA Metabolism: F=2.740, P=0.021; F=19.41, P=0.007; F=3.858, P=0.144; F=1.348, P=0.259; F=8.769, P=0.270; F=3.721, P=0.122

Secondary Metabolism: F=1.366, P=0.249; F=13.47, P=0.061; F=0.131, P=0.116; F=1.200, P=0.329; F=10.65, P=0.154; F=0.093, P=0.230

Stress Response: F=6.162, P<0.001; F=23.40, P=0.001; F=2.616, P=0.018; Coral, Fish, Freshwater; F=1.878, P=0.104; F=16.23, P=0.023; F=3.133, P=0.033

Sulfur Metabolism: F=12.05, P<0.001; F=28.86, P<0.001; F=1.084, P=0.005; Fish; F=2.290, P=0.050; F=10.06, P=0.185; F=1.079, P=0.327

Virulence: F=5.150, P<0.001; F=30.79, P<0.001; F=9.698, P=0.002; Coral, Marine; F=3.953, P=0.003; F=13.67, P=0.057; F=10.65, P=0.208; Microbialite


[Figure S1: world map of sampling sites. Legend: Subterranean, Marine, Hyper-saline, Freshwater, Coral, Microbialite, Fish, Terrestrial Animals, Mosquito.]

Figure S1. Sampling locations of the metagenomes. Circles indicate microbial metagenomes and squares indicate viral metagenomes. The number of metagenomes collected at each site is given, except where only one metagenome was taken per site.

doi: 10.1038/nature06810 SUPPLEMENTARY INFORMATION

www.nature.com/nature 25

Figure S2. Functional diversity of the a) microbial and b) viral metagenomes quantified as a function of sequence number, suggesting that high functional diversity is gained at low sequence number. Note the different scales on the x-axis.
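A curve of this kind can be approximated by rarefaction: repeatedly subsample a metagenome at increasing depths and record how many functional categories are observed. The sketch below is only an illustration under stated assumptions. It counts distinct categories (richness) rather than the diversity measure used for the figure, and the function name `rarefaction_curve` and the example annotations are hypothetical, not taken from the paper.

```python
import random


def rarefaction_curve(annotations, depths, seed=0):
    """Return (depth, distinct categories) pairs for random subsamples.

    annotations: one functional label per sequence (hypothetical input).
    depths: subsample sizes to evaluate, each no larger than len(annotations).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = []
    for depth in depths:
        if depth > len(annotations):
            raise ValueError("depth exceeds the number of sequences")
        # Draw sequences without replacement, then count distinct labels.
        subsample = rng.sample(annotations, depth)
        curve.append((depth, len(set(subsample))))
    return curve


# Hypothetical metagenome: 100 sequences assigned to three categories.
example_annotations = ["respiration"] * 50 + ["virulence"] * 30 + ["sulfur"] * 20
example_curve = rarefaction_curve(example_annotations, [1, 10, 50, 100])
```

If most categories already appear at the smallest depths, the curve flattens early, which is the pattern the caption describes.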


Figure S3. Comparison of mean (± s.e.m.) functional diversity and evenness between microbial and viral metagenomes and all sequenced bacterial genomes. Note the different scale on the y-axis.
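For readers who want to reproduce a comparison of this kind, the sketch below computes a diversity and an evenness value from per-category sequence counts. It assumes the Shannon index and Pielou's evenness, which this excerpt does not confirm as the measures behind the figure, and the counts shown are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over categories with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S), where S is the number of observed categories."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not informative with fewer than two categories
    return shannon_diversity(counts) / math.log(observed)


# Hypothetical counts of sequences assigned to four functional categories.
hypothetical_counts = [120, 80, 40, 10]
diversity = shannon_diversity(hypothetical_counts)
evenness = pielou_evenness(hypothetical_counts)
```

Evenness is bounded between 0 and 1, so it can be compared across samples with different numbers of categories, whereas the raw index grows with the number of categories observed.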


Figure S4. The percentage of sequences found within the sulfur metabolism pathways in the microbial metagenomes. The overrepresentation of the a) alkanesulfonates assimilation, b) alkanesulfonates utilization and c) taurine utilization subsystems suggests the addition of an organic source of sulfur, most likely taurine, whereas the subsystems involved in the utilization of inorganic sulfur (d) were not overrepresented.


Figure S5. Canonical discriminant analysis of the a) microbial and b) viral metagenomes on a reduced set of biomes (subterranean, fish and mosquito metagenomes removed) to demonstrate the stability of the analysis and variations in the influence of the potential metabolic processes between environments.


Figure S6. Canonical discriminant analysis of the a) microbial and b) viral metagenomes on a reduced set of biomes (hyper-saline biomes removed) to demonstrate the stability of the analysis and variations in the influence of the potential metabolic processes between environments.
