Introduction MEGAN Metadata Pooling Datasets Summary & Conclusion
Pooling metagenomes in MEGAN based on environmental parameters
Hans-Joachim Ruscheweyh
Center for Bioinformatics, Tuebingen University
June 15, 2011
1 / 27 Hans-Joachim Ruscheweyh Pooling metagenomes
1 Introduction
  Metagenomics
  Unculturable Microbes
  Typical Metagenomic Samples
  Pipeline
2 MEGAN
  MEGAN Introduction
  Taxonomic & Functional Analysis
  Comparison Analysis
  PostgreSQL
3 Metadata
  What is Metadata?
  Using Metadata to pool Datasets
4 Pooling Datasets
  Basic Idea
  Combined Datasets
  MetaData Analyzer
5 Summary & Conclusion
Metagenomics
- The study of DNA of uncultured organisms
- More than 99% of all microbes cannot be cultured
- A genome is the entire genetic information of a single organism
- A metagenome is the entire genetic information of an assemblage of organisms
Typical Metagenomic Samples
- Human microbiome
- Soil samples
- Sea water samples
- Seabed samples
- Air samples
- Medical samples
- Ancient bones
Metagenomic Pipeline
Figure source: A primer on metagenomics; Wooley et al. (2010)
MEGAN Introduction
Interactive tool for metagenomic analysis - www-ab.informatik.uni-tuebingen.de/software/megan
Taxonomic Analysis
- Tree reflects the NCBI taxonomy
- Reads are compared against a reference database, e.g. NR
- Reads are mapped on the tree using the comparison results based on the LCA algorithm
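The LCA (lowest common ancestor) placement named on this slide can be illustrated with a minimal sketch. The taxonomy excerpt, the function names, and the rule "use every hit of a read" are assumptions for illustration only. A real analysis would first filter hits by score, and this is not MEGAN's actual implementation.

```python
# Toy taxonomy as child -> parent links (hypothetical excerpt, not the real NCBI tree).
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Vibrio cholerae": "Gammaproteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Pelagibacter ubique": "Alphaproteobacteria",
}


def path_to_root(taxon):
    """Return the list of taxa from `taxon` up to the root, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node that is an ancestor of (or equal to) every taxon."""
    paths = [path_to_root(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first taxon, the first shared node is the deepest one.
    return next(node for node in paths[0] if node in shared)


def assign_read(hit_taxa):
    """Place a read on the tree given the taxa of its database hits (assumed pre-filtered)."""
    return lowest_common_ancestor(hit_taxa) if hit_taxa else "unassigned"
```

A read whose hits all fall into one species stays at that species, while a read with hits in distantly related species moves up to a higher node: in this toy tree, `assign_read(["Escherichia coli", "Pelagibacter ubique"])` yields `"Proteobacteria"`.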
Functional Analysis - SEED
- The tree contains the nodes of the SEED classification
- Reads are mapped on to the SEED classification
www.theSEED.org
Functional Analysis - KEGG
KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010)
http://www.genome.jp/kegg/
Comparing Datasets
- Based on the (normalized) number of reads assigned to each node
- Each color denotes a dataset
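Normalization matters here because datasets of different sizes are otherwise not comparable node by node. The sketch below scales every dataset to the size of the smallest one. That scaling rule, the function name, and the counts are assumptions for illustration, and the slide does not state which rule MEGAN applies.

```python
def normalize_counts(datasets):
    """Scale each dataset's per-node read counts to the smallest dataset's total.

    Assumes every dataset contains at least one assigned read.
    """
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    target = min(totals.values())
    return {
        name: {node: count * target / totals[name] for node, count in counts.items()}
        for name, counts in datasets.items()
    }


# Hypothetical read counts per taxonomy node.
counts = {
    "January_2PM": {"Bacteria": 600, "Archaea": 200},
    "August_4AM": {"Bacteria": 300, "Archaea": 100},
}
normalized = normalize_counts(counts)
```

In this example both datasets have the same relative composition, so after normalization their per-node values coincide even though the raw counts differ by a factor of two.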
DB Extension - PostgreSQL
- MEGAN communicates with a PostgreSQL database
- Many datasets are available in one database instance
- Many users can operate on the same database instance
- This avoids redundancy on often large datasets
http://www.postgresql.org/
What is Metadata?
Metadata are, for example, environmental parameters recorded together with the actual metagenomic sample, e.g. collection date, gender, health status, ...
Dataset        Month     Salinity  Ammonia
January_2PM    January   33.3      0.0
January_10PM   January   34.2      0.0
August_4AM     August    33.3      0.14
August_10AM    August    32.1      0.06
Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of the seasonal and diel temporal variation; Gilbert et al. (2010)
January_2PM, January_10PM: Month ∈ {Dec, Jan, Feb} → Winter
August_4AM, August_10AM: Month ∈ {Jun, Jul, Aug} → Summer
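The pooling rule on this slide amounts to evaluating a condition on each dataset's metadata and grouping the datasets that satisfy it. A minimal sketch follows, using the four metadata rows shown on the "What is Metadata?" slide. The function name `pool` and the dictionary layout are assumptions for illustration, not MEGAN's interface.

```python
WINTER = {"December", "January", "February"}
SUMMER = {"June", "July", "August"}

# Metadata rows as given on the "What is Metadata?" slide.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}


def pool(metadata, predicate):
    """Return the names of all datasets whose metadata row satisfies `predicate`."""
    return [name for name, row in metadata.items() if predicate(row)]


winter = pool(METADATA, lambda row: row["Month"] in WINTER)
summer = pool(METADATA, lambda row: row["Month"] in SUMMER)

# Conditions can be combined with and / or / not, e.g. summer samples with high ammonia.
summer_high_ammonia = pool(
    METADATA, lambda row: row["Month"] in SUMMER and row["Ammonia"] > 0.1
)
```

Here `winter` contains the two January datasets, `summer` the two August datasets, and `summer_high_ammonia` only `August_4AM`.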
Basic Idea
- Create two new datasets (winter, summer) from the four BLAST files
- Problems:
  - Doubles space consumption
  - Is time inefficient
- Idea: Use database technology to avoid redundancy, save time and space
Primary & Combined Datasets in the Database
- A primary dataset is a dataset created from the original BLAST output and the reads file
- A combined dataset is created from primary datasets
- A combined dataset is created by using:
  - References to read and match data of the primary datasets
  - Optionally also the classification data of the primary datasets
- Hence, a combined dataset can be created in a time- and space-efficient way
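The reference-based idea can be sketched with a small relational schema in which a combined dataset stores only links to its primary datasets. The table and column names below are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so that the example is self-contained. It is not the schema MEGAN uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a PostgreSQL instance
conn.executescript("""
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT UNIQUE, is_combined INTEGER);
    CREATE TABLE reads (id INTEGER PRIMARY KEY,
                        dataset_id INTEGER REFERENCES dataset(id),
                        sequence TEXT);
    -- A combined dataset stores only links to primary datasets, never the reads themselves.
    CREATE TABLE combined_member (combined_id INTEGER REFERENCES dataset(id),
                                  primary_id INTEGER REFERENCES dataset(id));
""")


def add_primary(name, sequences):
    """Store a primary dataset together with its reads; return its id."""
    cur = conn.execute("INSERT INTO dataset (name, is_combined) VALUES (?, 0)", (name,))
    conn.executemany("INSERT INTO reads (dataset_id, sequence) VALUES (?, ?)",
                     [(cur.lastrowid, s) for s in sequences])
    return cur.lastrowid


def add_combined(name, primary_ids):
    """Store a combined dataset as a set of references; return its id."""
    cur = conn.execute("INSERT INTO dataset (name, is_combined) VALUES (?, 1)", (name,))
    conn.executemany("INSERT INTO combined_member VALUES (?, ?)",
                     [(cur.lastrowid, p) for p in primary_ids])
    return cur.lastrowid


def reads_of_combined(combined_id):
    """Resolve the references of a combined dataset to the underlying reads."""
    rows = conn.execute(
        "SELECT r.sequence FROM combined_member m "
        "JOIN reads r ON r.dataset_id = m.primary_id "
        "WHERE m.combined_id = ? ORDER BY r.id", (combined_id,))
    return [seq for (seq,) in rows]


jan_2pm = add_primary("January_2PM", ["ACGT", "GGCA"])   # made-up reads
jan_10pm = add_primary("January_10PM", ["TTAG"])
winter = add_combined("Winter", [jan_2pm, jan_10pm])
```

Creating `Winter` adds two small rows to `combined_member` and leaves the `reads` table unchanged, which is why the extra cost of a combined dataset stays small compared with copying the read and match data.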
Creating Combined Datasets in MEGAN
Analysis
- Input: 8 primary datasets. Altogether ~100,000 reads, ~4 million matches, ~4.5 GB space
- It takes ~50 minutes to load these datasets into the database
- Three combined datasets (winter, spring, summer) are created
- Their creation takes ~30 seconds and needs ~40 MB additional space
- Alternatively, combined datasets can be created on-the-fly. This takes less than a second and needs no additional space
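To put these figures in proportion, the short calculation below relates the cost of the three stored combined datasets to the cost of the primary datasets. All inputs are the approximate values quoted above. The conversion 1 GB = 1000 MB is an assumption.

```python
# Approximate figures quoted on the slide.
primary_space_mb = 4.5 * 1000   # ~4.5 GB, assuming 1 GB = 1000 MB
primary_load_s = 50 * 60        # ~50 minutes
combined_space_mb = 40          # ~40 MB for the three combined datasets
combined_create_s = 30          # ~30 seconds

space_overhead = combined_space_mb / primary_space_mb
time_overhead = combined_create_s / primary_load_s

print(f"extra space: {space_overhead:.1%}")
print(f"extra time:  {time_overhead:.1%}")
```

Under these assumptions the stored combined datasets cost roughly 1% of the space and 1% of the loading time of the primary data.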
Comparing all Datasets
Comparing by Season
Summary & Conclusion
- MEGAN communicates with a PostgreSQL database
- This gives the user access to many datasets
- Many users can work on the database simultaneously
- Primary datasets can be pooled to create combined datasets
- The MetaData Analyzer allows one to create combined datasets by applying boolean expressions to the assigned metadata
- This technique is highly space and time efficient
MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan
Integrative analysis of environmental sequences using MEGAN4, Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted 2011
Thanks go to Daniel Huson, Suparna Mitra, Nico Weber, Stephan Schuster
Thank you for your attention!