Date posted: 03-Jul-2015
Category: Science
Uploaded by: adina-chuang-howe
Is it time for a (community) effort towards a soil reference database?
Erick Cardenas, James Cole, Maude David, Aaron Garoutte, Adina Howe, Janet Jansson, Dave Myrold, James Tiedje, and you?
Modified version of slides will be available after presentation: http://www.slideshare.net/adinachuanghowe
The most important hands in soil microbiology
Significance of a soil-specific reference
• Need standardized resource to connect sequencing data at different levels
• Integrate sequencing data towards soil health and productivity
• Broadly enable “connecting the dots”
Genes
Organisms
Communities
Ecosystems
Soil metagenomic challenges
• The amount we know…
• Incredible microbial diversity
• Spatial heterogeneity
• Complex dynamics
• Lack of reference genomes (bacteria, archaea, fungal)
HUMAN MICROBIOME PROJECT
Lessons from HMP
• 2009 Goals:
– Take advantage of high-throughput technologies to characterize the human microbiome in a large number of samples
– Determine whether there are associations between changes in the microbiome and health/disease
– Provide a standardized data resource and new technological approaches to enable such studies to be undertaken broadly in the scientific community
HMP metagenomic challenges
Soil
• Incredible microbial diversity
• Spatial heterogeneity
• Complex dynamics
• Lack of reference genomes (bacteria, archaea, fungal)
HMP
• Microbial diversity
• Individual variation
• Complex host-associated dynamics
• Lack of reference genomes?
The HMP reference genome effort
• Add at least 900-3000 additional reference bacterial genome sequences to public database
• Thorough representation of domains and major body sites
Not only sequencing….but access to data
Currently, over 1000 bacterial genomes at various stages of sequencing
Tools: Opening doors broadly
MetaPhlAn, Nature Methods 9, 811-814 (2012)
Nature Reviews Genetics 15, 577-584 (2014)
Vital et al., mBio, Vol. 5, 2014
Another example: GEBA
Comparison of:
• rRNA tree of life
• genome sequence in the DSMZ culture collection
Are there any general benefits that come from this "phylogeny driven" approach?
Impact of “targeted” sequencing of improved references
Higher rate of discovery and characterization of new gene families
New ways to link distantly related homologs that would otherwise go undetected
Significant phylogenetic expansions of known protein families
Enrichment of genetic diversity
Can a similar strategy benefit soil studies?
What could we use it for?
• Target isolation and sequencing efforts; creation of a “most wanted” list
• Soil specific framework for larger scale sequencing and proteomic efforts to identify taxonomic and functional information
• Genome-centric investigation of soil genomes (e.g., distribution of shared genes among soil phyla); development of improved biomarkers for high throughput assays
• Providing data to tool developers to make bioinformatics/visualization easier for soil-specific studies
What are the challenges?
• How do we define a soil organism?
– Origin from soil?
– 16S rRNA gene sequence matches one from soil?
– What level of finishing is adequate?
What are the challenges?
• What is the most critical/practical metadata?
– Soil location
– Soil taxonomy
– Links to RefSeq IDs
– Is the strain available and where?
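One way to make the metadata question concrete is to write the candidate fields down as a record and check which ones are still empty for a given genome. The sketch below is only an illustration: the class name, field names, and the `missing_fields` helper are hypothetical and are not part of any existing database schema; the fields themselves mirror the four items listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoilGenomeMetadata:
    """Hypothetical per-genome record; field names are illustrative only."""
    organism: str
    soil_location: Optional[str] = None      # where the soil was sampled
    soil_taxonomy: Optional[str] = None      # soil classification of the sample
    refseq_ids: List[str] = field(default_factory=list)  # links to RefSeq records
    strain_available: Optional[bool] = None  # None means "not yet known"
    strain_source: Optional[str] = None      # culture collection or lab holding it


def missing_fields(entry: SoilGenomeMetadata) -> List[str]:
    """List the metadata fields a curator would still need to fill in."""
    missing = []
    if not entry.soil_location:
        missing.append("soil_location")
    if not entry.soil_taxonomy:
        missing.append("soil_taxonomy")
    if not entry.refseq_ids:
        missing.append("refseq_ids")
    if entry.strain_available is None:
        missing.append("strain_available")
    elif entry.strain_available and not entry.strain_source:
        # Availability was asserted, but not where to obtain the strain.
        missing.append("strain_source")
    return missing


# Placeholder values, invented purely for illustration.
entry = SoilGenomeMetadata(
    organism="Example isolate",
    soil_location="placeholder site",
    strain_available=True,
)
print(missing_fields(entry))
```

A check like this could also feed a tiered curation scheme, for example by ranking entries by how many fields remain open, though that is a design choice the community would have to agree on.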
What are the challenges?
• Who to include?
– Fungi! Archaea!
What are the challenges?
• Expert curators?
– You?
– Tiered hierarchy of curation level
Some initial efforts
RefSoil (2011)Erick Cardenas, Aaron Garoutte, Adina Howe, Jim Tiedje
Bacterial genomes were retrieved from the GOLD database , and , and those associated with soil habitats were selected
Manually curated to exclude obligate human pathogens and extremophiles
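The selection described above can be sketched as a simple filter. This is not the actual RefSoil procedure, which relied on manual curation; it only shows the shape of the rule (soil-associated habitat, minus curated exclusions). The record type, its field names, and the toy entries are all hypothetical, and the two exclusion flags are assumed to be set by a human curator rather than inferred automatically.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenomeRecord:
    # All field names are hypothetical; real database exports will differ.
    name: str
    habitat: str                   # free-text habitat annotation
    obligate_human_pathogen: bool  # assumed to be set by a curator
    extremophile: bool             # assumed to be set by a curator


def is_soil_candidate(rec: GenomeRecord) -> bool:
    """Keep records annotated with a soil habitat, minus curated exclusions."""
    soil_associated = "soil" in rec.habitat.lower()
    excluded = rec.obligate_human_pathogen or rec.extremophile
    return soil_associated and not excluded


def select_soil_genomes(records: List[GenomeRecord]) -> List[GenomeRecord]:
    """Apply the candidate rule to every record, preserving input order."""
    return [r for r in records if is_soil_candidate(r)]


# Toy records, invented for illustration only.
records = [
    GenomeRecord("Example strain A", "Soil, rhizosphere", False, False),
    GenomeRecord("Example strain B", "Host-associated", True, False),
    GenomeRecord("Example strain C", "Hot spring soil", False, True),
    GenomeRecord("Example strain D", "Marine", False, False),
]
selected = select_soil_genomes(records)
print([r.name for r in selected])
```

Matching on the word "soil" in a free-text habitat field is deliberately naive; it would likely miss organisms with sparse annotations, which is one reason a curated tier is discussed at all.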
Databases can be biased and redundant
Proteobacteria, 267
Firmicutes, 92
Actinobacteria, 75
Bacteroidetes, 12
Cyanobacteria, 7
Tenericutes, 5
Acidobacteria, 5
Other, 29
492 organisms, 19 phyla
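To see how uneven this set is, the counts above can be turned into percentages. The snippet below is a small arithmetic sketch using only the numbers listed on this slide; the variable and function names are illustrative.

```python
# Phylum counts exactly as listed above for the soil-associated genome set.
phylum_counts = {
    "Proteobacteria": 267,
    "Firmicutes": 92,
    "Actinobacteria": 75,
    "Bacteroidetes": 12,
    "Cyanobacteria": 7,
    "Tenericutes": 5,
    "Acidobacteria": 5,
    "Other": 29,
}


def phylum_shares(counts):
    """Return each phylum's share of the total as a percentage."""
    n = sum(counts.values())
    return {name: 100.0 * c / n for name, c in counts.items()}


total = sum(phylum_counts.values())
shares = phylum_shares(phylum_counts)

print(f"total genomes: {total}")
# Print phyla from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {phylum_counts[name]:4d}  {pct:5.1f}%")
```

The counts sum to 492. Proteobacteria alone account for about 54% of the genomes, the three largest phyla together for about 88%, and Acidobacteria for about 1%.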
NCBI Reference Genomes described as originating from soil
Proteobacteria
Actinobacteria
Firmicutes
Bacteroidetes
Cyanobacteria
Acidobacteria
Protein Models for Functions: FOAM Database
Nucl. Acids Res. (2014), doi: 10.1093/nar/gku702
Some Motivation
60 terrestrial NEON sites distributed across 20 ecoclimatic domains
Terrestrial scale streaming of lots of data including sequencing data for each site
If you’d like to contribute
• Join the breakout session Thursday evening (6-7 pm)
• Know someone with genomes or a database? Let us know. Want to contribute? Have an opinion? Have funding?
Adina Howe, [email protected]