Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Date post: 05-Dec-2014
Upload: gigascience-bgi-hong-kong
Description:
Hans-Joachim Ruscheweyh's talk from the 1st Earth Microbiome Project meeting in Shenzhen
Transcript
Page 1: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Introduction MEGAN Metadata Pooling Datasets Summary & Conclusion

Pooling metagenomes in MEGAN based on environmental parameters

Hans-Joachim Ruscheweyh

Center for Bioinformatics, Tuebingen University

June 15, 2011

1 / 27 Hans-Joachim Ruscheweyh Pooling metagenomes

Page 2: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

1 Introduction
    Metagenomics
    Unculturable Microbes
    Typical Metagenomic Samples
    Pipeline
2 MEGAN
    MEGAN Introduction
    Taxonomic & Functional Analysis
    Comparison Analysis
    PostgreSQL
3 Metadata
    What is Metadata?
    Using Metadata to pool Datasets
4 Pooling Datasets
    Basic Idea
    Combined Datasets
    MetaData Analyzer
5 Summary & Conclusion

Page 4: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Metagenomics

The study of DNA of uncultured organisms
> 99% of all microbes cannot be cultured
A genome is the entire genetic information of a single organism
A metagenome is the entire genetic information of an assemblage of organisms

Page 5: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Typical Metagenomic Samples

Human microbiome
Soil samples
Sea water samples
Seabed samples
Air samples
Medical samples
Ancient bones

Page 6: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Metagenomic Pipeline

A primer on metagenomics; Wooley et al. (2010)

Page 8: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

MEGAN Introduction

Interactive tool for metagenomic analysis - www-ab.informatik.uni-tuebingen.de/software/megan

Page 9: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Taxonomic Analysis

Tree reflects the NCBI taxonomy
Reads are compared against a reference database, e.g. NR
Reads are mapped onto the tree using the comparison results, based on the LCA algorithm
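
The LCA (lowest common ancestor) placement named on this slide can be illustrated with a minimal sketch. It assumes a toy taxonomy stored as a child-to-parent map and, for each read, the set of taxa it matched in the reference database. The names PARENT, lineage and lca are illustrative and are not taken from MEGAN, and the sketch ignores any score filtering a real tool would apply before placing a read.

```python
# Toy taxonomy as child -> parent; "root" has no parent. Illustrative only.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Vibrio": "Gammaproteobacteria",
    "Cyanobacteria": "Bacteria",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lca(taxa):
    """Lowest common ancestor of all taxa that one read matched."""
    lineages = [lineage(t) for t in taxa]
    common = "root"
    # Walk down level by level until the lineages disagree.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


# A read hitting two genera of one class is placed on that class;
# a read with hits in different phyla moves up to their shared ancestor.
same_class = lca({"Escherichia", "Vibrio"})
different_phyla = lca({"Escherichia", "Cyanobacteria"})
single_hit = lca({"Vibrio"})
```

The more the hits of a read disagree, the higher in the tree the read ends up, which is why ambiguous reads collect on inner nodes rather than on leaves.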

Page 10: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Functional Analysis - SEED

The tree contains the nodes of the SEED classification
Reads are mapped onto the SEED classification

www.theSEED.org

Page 11: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Functional Analysis - KEGG

KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010)
http://www.genome.jp/kegg/

Page 12: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Comparing Datasets

Based on the (normalized) number of reads assigned to each node
Each color represents a dataset
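
The slide does not say how the read counts are normalized. One common choice, assumed here, is to scale every dataset down to the size of the smallest one so that per-node counts become comparable. The function name normalize and the example counts are made up for illustration.

```python
def normalize(counts_by_dataset):
    """Scale per-node read counts so every dataset has the same total.

    The target total is the size of the smallest dataset (an assumption;
    the slide only says "normalized").
    """
    totals = {name: sum(c.values()) for name, c in counts_by_dataset.items()}
    target = min(totals.values())
    return {
        name: {node: count * target / totals[name] for node, count in counts.items()}
        for name, counts in counts_by_dataset.items()
    }


# Hypothetical read counts per taxonomy node for two datasets.
counts = {
    "A": {"Bacteria": 800, "Archaea": 200},
    "B": {"Bacteria": 300, "Archaea": 100},
}
normalized = normalize(counts)
```

After scaling, both datasets sum to 400 reads, so a difference at a node reflects a difference in proportion rather than in sequencing depth.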

Page 13: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

DB Extension - PostgreSQL

MEGAN communicates with a PostgreSQL database
Many datasets are available in one database instance
Many users can operate on the same database instance
This avoids redundancy on often large datasets

http://www.postgresql.org/

Page 15: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

What is Metadata?

Metadata are, for example, environmental parameters recorded together with the actual metagenomic sample, e.g. collection date, gender, health status, ...

Dataset         Month     Salinity   Ammonia
January_2PM     January   33.3       0.0
January_10PM    January   34.2       0.0
August_4AM      August    33.3       0.14
August_10AM     August    32.1       0.06

Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of the seasonal and diel temporal variation; Gilbert et al. (2010)

Page 16: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

January_2PM, January_10PM: Month ∈ {Dec, Jan, Feb} → Winter
August_4AM, August_10AM: Month ∈ {Jun, Jul, Aug} → Summer
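
The grouping on this slide amounts to evaluating a Boolean condition on each dataset's metadata. The following sketch uses the four example datasets and their values from the previous slide; the helper pool and the predicate style are illustrative assumptions, not MEGAN's interface.

```python
# Metadata of the four example datasets, as listed on the previous slide.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}

WINTER = {"December", "January", "February"}
SUMMER = {"June", "July", "August"}


def pool(metadata, predicate):
    """Names of all datasets whose metadata row satisfies the predicate."""
    return sorted(name for name, row in metadata.items() if predicate(row))


winter = pool(METADATA, lambda row: row["Month"] in WINTER)
summer = pool(METADATA, lambda row: row["Month"] in SUMMER)

# Conditions on several parameters can be combined with and/or/not.
summer_high_ammonia = pool(
    METADATA, lambda row: row["Month"] in SUMMER and row["Ammonia"] > 0.1
)
```

Each resulting list names the primary datasets that would make up one pooled dataset.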

Page 18: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Basic Idea

Create two new datasets (winter, summer) from the four BLAST files
Problems:
    Doubles space consumption
    Is time inefficient
Idea:
    Use database technology to avoid redundancy, save time and space

Page 19: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Primary & Combined Datasets in the Database

A primary dataset is a dataset created from the original BLAST output and the reads file
A combined dataset is created from primary datasets
A combined dataset is created by using:
    References to read and match data of the primary datasets
    Optionally also the classification data of the primary datasets
Hence, a combined dataset can be created in a time- and space-efficient way
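
The idea of building a combined dataset from references rather than copies can be sketched with a small relational schema. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The table and column names (dataset, reads, combined_member) are hypothetical and do not reflect MEGAN's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT UNIQUE, kind TEXT);
    CREATE TABLE reads (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER REFERENCES dataset(id),
        taxon TEXT
    );
    -- A combined dataset only records which primary datasets it refers to.
    CREATE TABLE combined_member (
        combined_id INTEGER REFERENCES dataset(id),
        primary_id INTEGER REFERENCES dataset(id)
    );
    """
)


def add_dataset(name, kind):
    """Insert a dataset row and return its id."""
    cur = con.execute("INSERT INTO dataset (name, kind) VALUES (?, ?)", (name, kind))
    return cur.lastrowid


# Two primary datasets with a few made-up read assignments.
jan_2pm = add_dataset("January_2PM", "primary")
jan_10pm = add_dataset("January_10PM", "primary")
con.executemany(
    "INSERT INTO reads (dataset_id, taxon) VALUES (?, ?)",
    [(jan_2pm, "Bacteria"), (jan_2pm, "Archaea"), (jan_10pm, "Bacteria")],
)

# The combined dataset adds one row per member; no read is copied.
winter = add_dataset("Winter", "combined")
con.executemany(
    "INSERT INTO combined_member VALUES (?, ?)",
    [(winter, jan_2pm), (winter, jan_10pm)],
)


def counts_of_combined(combined_id):
    """Per-taxon read counts, resolved through the member references."""
    rows = con.execute(
        """
        SELECT r.taxon, COUNT(*)
        FROM combined_member AS m
        JOIN reads AS r ON r.dataset_id = m.primary_id
        WHERE m.combined_id = ?
        GROUP BY r.taxon
        """,
        (combined_id,),
    )
    return dict(rows.fetchall())


winter_counts = counts_of_combined(winter)
total_reads_stored = con.execute("SELECT COUNT(*) FROM reads").fetchone()[0]
```

Creating the pooled dataset costs two small rows here, while the reads table stays unchanged, which is the source of the space and time savings the slide describes.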

Page 20: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Creating Combined Datasets in MEGAN

Page 26: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Analysis

Input: 8 primary datasets, altogether ~100,000 reads, ~4 million matches, ~4.5 GB space
It takes ~50 minutes to load these datasets into the database
Three combined datasets (winter, spring, summer) are created
Their creation takes ~30 seconds and needs ~40 MB additional space
Alternatively, combined datasets can be created on-the-fly. This takes less than a second and needs no additional space
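
To put these figures in proportion, the short calculation below relates the cost of the combined datasets to the cost of the primary datasets. It uses only the approximate numbers from the slide and assumes 1 GB = 1000 MB. Comparing creation time against loading time is an illustrative choice, on the assumption that building pooled datasets from the BLAST files would cost about as much as loading the primaries.

```python
# Approximate figures quoted on the slide.
primary_space_mb = 4.5 * 1000  # ~4.5 GB, taking 1 GB = 1000 MB
combined_space_mb = 40         # ~40 MB for the three combined datasets
load_seconds = 50 * 60         # ~50 minutes to load the primary datasets
combine_seconds = 30           # ~30 seconds to create the combined datasets

space_overhead = combined_space_mb / primary_space_mb
time_ratio = load_seconds / combine_seconds

print(f"additional space: {space_overhead:.1%} of the primary data")
print(f"creation is about {time_ratio:.0f}x faster than loading")
```

Under these assumptions the combined datasets add less than 1% to the stored data and are created about 100 times faster than the primaries are loaded.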

Page 27: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Comparing all Datasets

Page 28: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Comparing by Season

Page 32: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

Summary & Conclusion

MEGAN communicates with a PostgreSQL database
This gives the user access to many datasets
Many users can work on the database simultaneously
Primary datasets can be pooled to create combined datasets
The MetaData Analyzer allows one to create combined datasets by applying Boolean expressions to the assigned metadata
This technique is highly space and time efficient

Page 33: Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan

Integrative analysis of environmental sequences using MEGAN4, Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted 2011

Thanks go to Daniel Huson, Suparna Mitra, Nico Weber, Stephan Schuster

Thank you for your attention!

