Date post: | 13-Jan-2015 |
Category: | Technology |
Upload: | gigascience-bgi-hong-kong |
Rick Stevens
Argonne National Laboratory
The University of Chicago
Institute for Computing in Science (ICiS)
2010 Summer Session, Snowbird, Utah
July 17-24 | Computational Methods and Terabase Metagenomics | J. Gilbert, F. Meyer, R. Stevens
Participants: 13 University, 9 Government and 3 Industry; 13 sessions
These discussions became the first meeting of the Earth Microbiome Project and enabled the definition of a working committee, an implementation group, and a three-year plan.
July 24-31 | Future of the Field | F. Streitz and A. White
Participants: 18 University, 4 Government and 6 Industry; 15 sessions
Steering committee members and a select group of participants met to assess the state-of-the-art in scientific computing and identified areas for future programs.
July 31-Aug. 7 | Optimization in Energy Systems | M. Anitescu and J. Meza
Participants: 16 University, 10 Government and 3 Industry; 24 sessions
Researchers from different areas discussed the major challenges facing the energy sector, and in particular, problems arising in optimization.
Aug. 7-14 | Integrating, Representing, and Reasoning over Human Knowledge | J. Evans, I. Foster, A. Rzhetsky
Participants: 18 University, 4 Government and 6 Industry; 16 sessions
Participants were encouraged to think broadly about opportunities for transformative changes in knowledge that may become possible as data, computing, and collaboration are harnessed at exceptionally large scales.
A Core Group Emerged
• Jack Gilbert
• Folker Meyer
• Rob Knight
• Jonathan Eisen
• Jed Fuhrman
• Janet Jansson
• Bin Hu
• Mark Bailey
• Rick Stevens
We need a new Idea
• Sequencing is getting cheap... VERY cheap
  – Terabase projects becoming increasingly feasible
• Diversity studies are limited by sampling depth
  – Need combination of breadth and depth
• Computing is scaling up to handle large data
  – Supercomputing capabilities will keep scaling for a while
• Interest in range of metagenomics questions
  – Thousands of uncoordinated studies
• Crowdsourcing of samples increasingly feasible
  – But how to agree on protocols?
EMP High-Level Concept
• Goal: A community effort to systematically characterize microbial life on Earth
• Strategy: combination of extremely deep metagenomic sequencing and very-large-scale horizontal surveys to refine our understanding of:
  – Global microbial diversity, dispersion and biogeography
  – Microbial community structure and dynamics
  – Microbial contributions to the global nutrient cycles
Big Science?
• Earth Microbiome Project
  – Map % fraction of microbiological habitats
  – Volume > 100x larger
  – > 1 PB of data
  – ~1M samples
  – > 100K new genomes
  – Millions of novel proteins
  – Largest reference collection of metagenomics, field guide to the microbial universe used by scientists for decades to come
• Sloan Digital Sky Survey (SDSS)
  – Mapped ¼ sky
  – Volume 100x larger
  – 15 TB data
  – Position/Brightness of > 100M objects
  – Distance to 100K quasars
  – New types of objects
  – The SDSS will be a new reference point, a field guide to the universe that will be used by scientists for decades to come.
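As a rough sense of scale, the two headline EMP figures (more than 1 PB of data, about 1M samples) can be turned into an average per-sample volume and compared with the 15 TB quoted for SDSS. This is back-of-the-envelope arithmetic only, assuming decimal units; because the slide says "> 1 PB", both results are lower bounds.

```python
# Back-of-the-envelope arithmetic using only the figures quoted on the slide.
# Decimal (SI) units are an assumption; the slide does not say which it uses.
PB = 10**15
TB = 10**12
GB = 10**9

emp_bytes = 1 * PB          # "> 1 PB of data" (lower bound)
emp_samples = 1_000_000     # "~1M samples"
sdss_bytes = 15 * TB        # "15 TB data"

bytes_per_sample = emp_bytes / emp_samples
volume_ratio = emp_bytes / sdss_bytes

print(f"{bytes_per_sample / GB:.1f} GB per sample on average")
print(f"EMP data volume is at least {volume_ratio:.0f}x the SDSS figure")
```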
But It's Not a Complete Parallel
• EMP will have distributed sampling
• EMP will have distributed sequencing
• EMP will have distributed analysis
• EMP will have common protocols
• EMP will have common standards
• EMP might have centralized archive of data
• EMP might have repository of samples
What is the EMP model?
• A framework of standard practices that enables massively comparable meta-analyses of independent projects
• A network-oriented organizational model to advance large-scale microbial ecology research: establishing and coordinating projects proposed by the community that can be advanced using the EMP framework of standards and access to partner Centers
Infrastructure for Coordination
Common standards for:
• Sampling -> Methods tailored to environment
  – Georeferenced metadata
• DNA Extraction -> MoBio kit
• Sequencing -> 515/806 for 16S, Illumina PE
• Analysis -> QIIME (16S), MG-RAST/IMG, etc.
Concept: begin with defined, open (though imperfect) protocols, bless with “EMP seal of approval” new protocols that show equivalence
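To make "georeferenced metadata" concrete, here is a minimal sketch of a sample record with basic validity checks. The field names and the `validate` helper are hypothetical and are not taken from any EMP specification; a real standard would define many more fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SampleRecord:
    """Hypothetical minimal georeferenced sample record (illustrative fields only)."""
    sample_id: str
    latitude: float        # decimal degrees, -90 to 90
    longitude: float       # decimal degrees, -180 to 180
    environment: str       # free-text environment label, e.g. "soil"
    collection_date: str   # ISO 8601 date, e.g. "2011-03-15"


def validate(record):
    """Return a list of problems found in the record (empty list means it passed)."""
    problems = []
    if not record.sample_id:
        problems.append("missing sample_id")
    if not -90 <= record.latitude <= 90:
        problems.append("latitude out of range")
    if not -180 <= record.longitude <= 180:
        problems.append("longitude out of range")
    if not record.environment:
        problems.append("missing environment")
    try:
        date.fromisoformat(record.collection_date)
    except ValueError:
        problems.append("collection_date is not an ISO 8601 date")
    return problems


good = SampleRecord("soil_0001", 41.7, -87.9, "soil", "2011-03-15")
bad = SampleRecord("", 123.0, -87.9, "soil", "15 March 2011")
print(validate(good))
print(validate(bad))
```

Checks like these are what let independently collected samples be compared later: a record that fails them cannot be placed on a map or a timeline.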
Why do we need the EMP?
• Microbial life is vast: 10^30 organisms on Earth, 10^6 – 10^9 or more species, massive gene/protein diversity
• Requires a systematic approach with a common framework
  – Reduce duplication, maximize coverage, improve comparability between studies
• Structures existing studies led by different PIs into clusters of Driving Projects
• EMP standard protocols allow much better comparability between projects
• Leverage community structures and crowd sourcing
EMP Pilot Projects
• High-Impact science targets
  – Large-scale survey projects to identify diversity hotspots and plan deeper studies
  – Small number of very deep demonstrations
  – Hypothesis-driven programmatic problems
• Technical targets to debug the EMP approach
  – Community sourcing with standard protocols
  – High levels of multiplexed sequencing
  – Environmental parameter characterization
  – Metadata and sample database
  – Analysis pipelines
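Multiplexed sequencing pools many samples in one run and separates them afterwards by a short barcode. The sketch below shows the basic idea only, under the simplifying assumptions that the barcode sits at the start of each read and must match exactly; the function name, barcodes and sample names are made up for illustration, and production tools also handle sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Assign each read to a sample by its leading barcode (exact match only).

    Reads whose barcode is not in the mapping go to the "unassigned" bin.
    The barcode is trimmed from the stored sequence.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Toy input: 4-base barcodes followed by a 4-base insert.
barcodes = {"ACGT": "soil_01", "TTAG": "ocean_07"}
reads = ["ACGTTTGA", "TTAGCCCA", "GGGGAAAA"]
print(demultiplex(reads, barcodes, barcode_length=4))
```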
Earth Microbiome Project: Attacking Basic Science Questions
• Coordination of community efforts to address long-standing issues in environmental microbiology
  – How much diversity is there, what is driving it and where do we find it?
  – Are there diversity hotspots?
  – Does microbial biogeography exist, and if so what patterns are present and can we predict them?
  – Are some taxa endemic, and if so how unique are they?
  – Does global dispersal happen, how much and between where, and is there support for the Baas Becking hypothesis?
  – Are the long tails of community distributions convergent in taxa?
  – Are rare taxa somewhere abundant?
  – How many places do we have to look to capture X diversity?
  – How do the patterns in microbial communities relate to macro-ecological patterns?
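The question "how many places do we have to look to capture X diversity?" is commonly approached with a taxon accumulation curve: add sites one at a time and count the distinct taxa seen so far. The sketch below is a minimal illustration with made-up data and hypothetical function names. It measures coverage against the taxa observed across all sites, not against true global diversity, which is unknown.

```python
import random


def accumulation_curve(site_taxa, seed=0):
    """Cumulative count of distinct taxa as sites are added in random order."""
    rng = random.Random(seed)
    order = list(site_taxa)
    rng.shuffle(order)
    seen = set()
    curve = []
    for site in order:
        seen |= set(site_taxa[site])
        curve.append(len(seen))
    return curve


def sites_needed(curve, fraction):
    """Smallest number of sites whose cumulative taxa reach the given
    fraction (0 to 1) of all taxa observed across the whole curve."""
    target = fraction * curve[-1]
    for n, count in enumerate(curve, start=1):
        if count >= target:
            return n
    return len(curve)


# Toy data: taxa observed at each site (labels are illustrative only).
site_taxa = {
    "site_A": {"t1", "t2", "t3"},
    "site_B": {"t2", "t3", "t4"},
    "site_C": {"t1", "t5"},
    "site_D": {"t3", "t6", "t7"},
}
curve = accumulation_curve(site_taxa)
print(curve)
print(sites_needed(curve, 0.8))
```

In practice the curve is averaged over many random site orderings; a curve that is still climbing at the last site suggests sampling has not saturated.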
Curtis and Sloan on Microbial Diversity
• Perhaps patterns in global microbial diversity affect community composition, stability and functionality at a local level.
• If, as we argue, diversity matters, then patterns in global diversity could have a substantial effect on studies that seek to link community function and structure, strategies for seeking new drugs, for probiotics, bioaugmentation or studies to determine the persistence of chemicals.
Curtis and Sloan, Current Opinion in Microbiology 2004, 7:221-226
Curtis and Sloan, Continued
• To understand a microbial system at a local level we will have to understand something of the metacommunity from which it is drawn.
• Moreover, we will have to correctly understand the relationship between random factors and deterministic factors.
Curtis and Sloan, Current Opinion in Microbiology 2004, 7:221-226
What can we learn from extremely Deep Sequencing?
Latitude, pH, Mineral Content, Rainfall, Mean Temperature, Insolation, etc.
Estimates of Global Diversity
NT/Nmax ~ 10 for soil
NT/Nmax ~ 4 for aquatic
Curtis, T.P. et al. (2002) Estimating prokaryotic diversity and its limits. Proc. Natl. Acad. Sci. U. S. A. 99, 10494–10499
Pedros-Alio 2006
Are Most microbial taxa rare? Possibly Inactive?
From Martiny et al 2006 “Microbial biogeography Review”
Does a microbial biogeography exist?
If yes can we map it?
How Cosmopolitan are Microbes?
From Martiny et al 2006 “Microbial biogeography Review”
Earth Microbiome Project: Attacking Programmatic Questions
• Improve understanding of microbial processes underlying the global carbon and nitrogen cycles
  – Support process model development and uncertainty analysis for DOE mission-critical environments (e.g. permafrost, oceans, subsurface)
  – Discovery of novel microbially mediated global carbon pathways
• Improve our understanding of community structure/diversity/productivity/stability relationships
  – Support community engineering and community design for targeted applications
• Search for novel biological functions relevant to bioprocessing, biofuels and bioremediation
  – Targeted search for organisms and communities containing DOE-relevant synthesis and degradation pathways
  – Novel pathway discovery
The abundance of prokaryotic carbon and other elements may be compared with the statement of Kluyver that about one-half of the “living protoplasm” on earth is microbial (2).
Because most of the plant biomass is made up of extracellular material such as cell walls and structural polymers, the protoplasmic biomass of prokaryotes probably far exceeds that of plants, and Kluyver’s well-accepted estimate is probably much too conservative.
Integrating Microbial Processes into Global Climate Models
Relative Metabolic Flux – Community Level Prediction
• Predicting the metabolome from metagenomics data!
• RMF returns a list of metabolites and whether those metabolites are more or less likely to be consumed or synthesized in one environment relative to another.
• When linked to Model-SEED, provides information relevant for ecologists
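The kind of output described above, a per-metabolite indication of relatively more synthesis or more consumption in one environment than another, can be illustrated with a toy score. This is not the RMF algorithm itself, only a sketch under simplifying assumptions: each enzyme has one reaction with listed consumed and produced metabolites, and the score is a sum of log2 abundance ratios. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def relative_turnover(abund_a, abund_b, reactions, pseudocount=1e-6):
    """Toy per-metabolite score comparing environment A with environment B.

    abund_a, abund_b: enzyme name -> relative gene abundance in each metagenome.
    reactions: enzyme name -> (consumed metabolites, produced metabolites).
    A positive score suggests relatively more synthesis in A than in B;
    a negative score suggests relatively more consumption in A than in B.
    """
    scores = defaultdict(float)
    for enzyme, (consumed, produced) in reactions.items():
        a = abund_a.get(enzyme, 0.0) + pseudocount
        b = abund_b.get(enzyme, 0.0) + pseudocount
        log_ratio = math.log2(a / b)
        for metabolite in consumed:
            scores[metabolite] -= log_ratio
        for metabolite in produced:
            scores[metabolite] += log_ratio
    return dict(scores)


# Illustrative input: one enzyme that is four times more abundant in A.
reactions = {"nitrogenase": (["N2"], ["NH3"])}
env_a = {"nitrogenase": 0.04}
env_b = {"nitrogenase": 0.01}
scores = relative_turnover(env_a, env_b, reactions)
print(scores)
```

The pseudocount only guards against division by zero when an enzyme is absent from one dataset; its value here is arbitrary.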
Integrating Microbial Metabolism into Soil Ecology Models
[Figure: workflow from metagenomic data to soil ecology models]
• Metagenomic data collection: collecting samples, sequencing, sequence fragments
• Assembly of most prevalent microbes into complete genomes
• Associating fragments to taxonomical groups
• Forming flux balance models of individual microbe metabolism
• Integrating these models into a flux balance community model
• Combining physiochemical descriptions of soil content and structure with microbial models in agent-based simulations
• Exchanges shown in the figure: biomass, soil nutrients, air and water, CO2 and organic matter
Unmapped World of Microbial Uses of Metals
EMP at the Right Time
• Leverages the availability of continued advances in sequencing capacity
  – Terabases to Petabases and beyond
• Evolution of sequencing center models
  – Push towards aggregation of projects (i.e. scale up)
• Community driven but coordinated
  – Open, real-time coordination, immediate data availability
• Novel approaches to address the scaling issues in sample collection and prep
  – Crowd-sourced samples, distributed prep?
• Targets both wide survey and deep sampling
  – “Mapping” followed by targeted attacks
EMP Products and Deliverables
• Metagenomics datasets from many thousands of environments with standardized metadata
• Georeferenced inventory of global microbial 16S sequences
• Reference genomes recovered from the shotgun metagenomics datasets
• Community structure profiles for many thousands of communities
• Microbial protein catalog capturing global protein and gene diversity
• Explore fundamental principles governing the distribution of global diversity.
  • Projects explore environmental gradients:
    • Temperature – Antarctica, Brazil, North America, Arctic Tundra, Hydrothermal vents
    • Light availability – Water columns in the Pacific and Atlantic from surface to the abyssal plain
    • pH – UK, North America, China biogeographic soils
    • Nutrients and O2 – Temperate Bog Lakes
• Determining whether everything has the potential to be everywhere.
  • Projects request deep 16S rRNA sequencing of representative samples:
    • Globally distributed soil samples from China, Australia, India, Argentina, Peru, USA and Antarctica
    • Globally distributed time series samples from English Channel, Barrier Reef in Australia, Bermudan North Atlantic, Temperate Pacific and Tropical Pacific
    • Zoo-animal microbiota from China, Chicago and San Diego
• Identify and model the role of microbial communities in carbon partitioning in different ecosystems.
  • Projects using deep shotgun metagenomics to explore modeled metabolomics:
    • Temporally and spatially distributed samples from the Gulf oil spill
    • Samples spanning the northern tundra belt from Canada, USA, Russia, Sweden
    • Water column and time-series samples from coastal and open ocean marine observatories
DP1 DP2 DP3 DP4
EMP Open Standards
• Multiple layers
• At the bottom, individual or consortium-led hypothesis-driven proposals
• Individual projects cluster into proposed Driver Projects (DPs)
• EMP standard protocols enable comparability across projects
What Does EMP Need?
Sample Preparation
Sequencing
Quality Assurance
EMP Community Sampling
Downstream Applications
Three rate limiters
• Sample collection and handling
• Prep-Sequencing-QA
• Analysis
Metagenome Datasets (1,000’s of Campaigns)
Environmental Parameters
16S/18S rRNA, Metagenomics, Metatranscriptomics
Annotation & Statistical Analysis
Genome Assembly
modelSEED & RMF
Characterization of Novel Proteins
Metametabolomics, GC/MS & NMR
Model Metabolome
Provision of targets for novel enzymes
Gap-filling for model
Earth Microbiome Project Potential Dataflows
EMP needs new kinds of interfaces to Sequencing workflows
• Large-scale community projects will by necessity develop internal tracking systems
  – Sampling, LIMS etc.
• Transacting with sequencing centers could be enhanced by interfacing between the internal/external tracking and LIMS systems
• Large-scale EMP pilots could help develop this
• Service partners will also need this type of interface
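One way to picture an interface between a project's internal tracking and a sequencing center's LIMS is a shared, machine-readable sample manifest that both sides can read and update. The sketch below is purely illustrative: the record fields, the status sequence and the JSON layout are assumptions, not an existing EMP or LIMS format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical status sequence a sample moves through.
STATUSES = ["collected", "extracted", "sequenced", "qa_passed", "analyzed"]


@dataclass
class TrackedSample:
    """Hypothetical tracking record shared between a project and a center."""
    project_id: str
    sample_id: str
    assay: str              # e.g. "16S" or "shotgun"
    status: str = "collected"


def advance(sample):
    """Move a sample to the next status; the final status is left unchanged."""
    position = STATUSES.index(sample.status)
    if position + 1 < len(STATUSES):
        sample.status = STATUSES[position + 1]
    return sample


def export_manifest(samples):
    """Serialize tracked samples to JSON for hand-off to another system."""
    return json.dumps([asdict(s) for s in samples], indent=2)


def import_manifest(text):
    """Rebuild tracked samples from a JSON manifest."""
    return [TrackedSample(**row) for row in json.loads(text)]


samples = [TrackedSample("DP1", "soil_0001", "16S"),
           TrackedSample("DP3", "tundra_0042", "shotgun")]
advance(samples[0])
manifest = export_manifest(samples)
print(manifest)
```

A plain-text manifest like this is easy for each side to validate and archive, which matters when many independent projects feed a few sequencing centers.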
What would change this strategy?
• Availability of “direct” interrogation of complex microbial environments
  – Geochemical environmental mapping (nm->um)
  – Environmental metabolomics and proteomics
  – Roving cellular-scale reporters and probes
• Dramatic improvements in microbial microcosm experimental capabilities
  – Artificial community construction
  – Time-dependent high-resolution measurements
Phases of EMP
Timeline
• 2011
  – Expert-Group consensus on EMP standards: sampling, extraction, sequencing, informatics
  – Building the Global Environmental Sample Database (GESD)
  – Pilot Project:
    • 10,000 samples acquired, extracted, sequenced and analyzed by five core centers (ANL, LBNL, UC-Boulder, JGI, and BGI).
• 2012 and beyond: Ongoing EMP
  – Biological Driver Projects “collect” individual science-driven sequencing proposals (e.g. JGI-CSP, BGI, ANL, etc.)
  – EMP acts as a conceptual framework to allow comparative analysis within and between Driver Projects.
Thanks to the EMP Leadership
• Jack Gilbert
• Folker Meyer
• Rob Knight
• Jonathan Eisen
• Jed Fuhrman
• Janet Jansson
• Bin Hu
• Mark Bailey
Argonne National Laboratory Institute for Genomic and Systems Biology