Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformatics for Biological...

Post on 05-Dec-2014

Course: Bioinformatics for Biological Researchers (2014). Session: 3.1 - Introduction to Metagenomics. Applications, Approaches and Tools. Statistics and Bioinformatics Unit (UEB) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.

Hospital Universitari Vall d’Hebron
Institut de Recerca - VHIR

Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)

Bioinformatics for Biological Researchers

http://eib.stat.ub.edu/2014BBR

Ferran Briansó
ferran.brianso@vhir.org

28/05/2014

INTRODUCTION TO METAGENOMICS

PRESENTATION OUTLINE

1. Introduction

2. Applications

3. Basic Concepts

4. Approaches & Workflows

    1. Whole Genome Shotgun

    2. 16S/ITS Community Surveys

5. Analysis Tools

    1. MEGAN

    2. Mothur

    3. Qiime

    4. Axiome & CloVR

    5. MG-RAST

6. More resources

1 INTRODUCTION

Introduction | Metagenomics definition

First use of the term metagenome, referencing the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome.

Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998). "Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products". Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9. PMID 9818143

Chen, K.; Pachter, L. (2005). "Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities". PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024

Current definition: “The application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.”

Introduction | Historical context

Source: http://howcoolismyresear.ch/#metagenomics

Introduction | Basic purpose

2 APPLICATIONS

Applications | What metagenomics can do

● Global Impacts. The role of microbes is critical in maintaining atmospheric balances, as they are

    ● the main photosynthetic agents

    ● responsible for the generation and consumption of greenhouse gases

    ● involved at all levels in ecosystems and trophic chains

● Bioremediation. Cleaning up environmental contamination, such as

    ● the waste from water treatment facilities

    ● gasoline leaks on land or oil spills in the oceans

    ● toxic chemicals

● Bioenergy. We are harnessing microbial power in order to produce

    ● ethanol (from cellulose), hydrogen, methane, butanol...

● Smart Farming. Microbes help our crops by

    ● the “suppressive soil” phenomenon (buffer effect against disease-causing organisms)

    ● soil enrichment and regeneration

● The World Within. Studying the human microbiome may lead to valuable new tools and guidelines in

    ● human and animal nutrition

    ● better understanding of complex diseases (obesity, cancer, asthma...)

    ● drug discovery

    ● preventative medicine

Grice E.A. & Segre J.A. (2012) The Human Microbiome: Our Second Genome, Annu. Rev. Genomics Human Genet. 13, 151-170

Applications | Mapping the Human Microbiome

3 BASIC CONCEPTS

Concepts | Trimming

● Trimming is the pre-processing step of cleaning sequence data from automated DNA sequencers (removing primers, multiplexing barcodes...) prior to sequence assembly and other downstream uses.
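As a minimal sketch of the trimming step, assume every read starts with a sample barcode followed by a forward primer. The barcode and primer strings below are invented for illustration and do not come from any real kit; real trimmers also tolerate mismatches and use base qualities, which this sketch ignores.

```python
from typing import Optional, Tuple

# Hypothetical barcodes and primer, for illustration only.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
PRIMER = "GTGCCAGC"


def trim_read(read: str) -> Optional[Tuple[str, str]]:
    """Return (sample, trimmed_sequence), or None if no barcode+primer matches."""
    for barcode, sample in BARCODES.items():
        prefix = barcode + PRIMER
        if read.startswith(prefix):
            # Drop the barcode and primer, keep only the biological sequence.
            return sample, read[len(prefix):]
    return None


if __name__ == "__main__":
    print(trim_read("ACGTGTGCCAGCTTAGGC"))  # ('sample_A', 'TTAGGC')
```

Reads that return None would normally be set aside rather than passed on to assembly or binning.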

● Binning is the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).

● OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study. OTUs are typically delimited with a percent sequence similarity threshold that determines whether microbes are classified within the same OTU or different OTUs.

Concepts | Binning, OTUs

http://shuixia100.weebly.com/1/post/2011/12/mothur-tutorial-1.html / Wikipedia: Biological classification
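The similarity-threshold idea behind OTUs can be sketched as a greedy clustering. This is only an illustration under strong assumptions: sequences are already aligned to the same length, identity is the plain fraction of matching positions, and the 0.97 default is a commonly used cut-off rather than something prescribed by the slides. The function names are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each sequence to the first OTU whose seed is similar enough."""
    otus: List[List[str]] = []
    for seq in seqs:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:  # otu[0] is the seed sequence
                otu.append(seq)
                break
        else:
            otus.append([seq])  # no seed matched: this sequence starts a new OTU
    return otus


if __name__ == "__main__":
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
    print(len(greedy_otus(reads, threshold=0.80)))  # looser threshold, fewer OTUs
    print(len(greedy_otus(reads, threshold=0.97)))  # stricter threshold, more OTUs
```

Lowering the threshold merges more reads into each OTU, which is why the chosen cut-off effectively sets the taxonomic resolution of the study.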

● Chimeras: Artificial sequences formed during PCR amplification. The majority of them are believed to arise from incomplete extension. During subsequent cycles of PCR, a partially extended strand can bind to a template derived from a different but similar sequence. This then acts as a primer that is extended to form a chimeric sequence (Smith et al. 2010, Thompson et al., 2002, Meyerhans et al., 1990, Judo et al., 1998, Odelberg, 1995). A chimeric template is created during one round, then amplified by subsequent rounds to produce chimeric amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.

Concepts | Chimeras

Haas B.J. et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons, Genome Res. 21: 494-504.
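To make the two-parent nature of a chimera concrete, here is a toy detector. It assumes the query and all reference sequences are aligned to the same length, and it flags a query only when "left part from one reference, right part from another" explains it with strictly fewer mismatches than any single reference. The function names are hypothetical, and dedicated chimera checkers rely on far more careful alignment and scoring than this sketch.

```python
from typing import Dict, Optional, Tuple


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_chimera_parents(
    query: str, references: Dict[str, str]
) -> Optional[Tuple[str, str, int]]:
    """Return (left_parent, right_parent, breakpoint), or None if not chimeric."""
    # Cost of the best explanation that uses a single reference.
    best_cost = min(mismatches(query, ref) for ref in references.values())
    best = None
    for bp in range(1, len(query)):
        for left_name, left in references.items():
            for right_name, right in references.items():
                if left_name == right_name:
                    continue
                cost = (mismatches(query[:bp], left[:bp])
                        + mismatches(query[bp:], right[bp:]))
                if cost < best_cost:  # strictly better than anything seen so far
                    best_cost = cost
                    best = (left_name, right_name, bp)
    return best


if __name__ == "__main__":
    refs = {"A": "AAAAAAAAAA", "B": "CCCCCCCCCC"}
    print(find_chimera_parents("AAAAACCCCC", refs))
    print(find_chimera_parents("AAAAAAAAAA", refs))
```

The first query is reported with parents A and B and a breakpoint at position 5, while the second matches reference A exactly and is not flagged.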

● Alpha diversity: the diversity within a particular area or ecosystem; expressed by the number of species (i.e., species richness) in that ecosystem, or by one or more diversity indices.

● Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of species change between the ecosystems.

● Gamma diversity: a measure of the overall diversity within a large region. Geographic-scale species diversity according to Hunter (2002:448).

Concepts | Diversities

Zinger L. et al. (2012) Two decades of describing the unseen majority of aquatic microbial diversity, Molecular Ecology 21, 1878–1896.
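The three diversity levels can be illustrated with a small sketch over OTU count tables. The choices here are assumptions made for the example: alpha diversity is shown as richness and the Shannon index, beta diversity as Jaccard dissimilarity on presence/absence (one of many possible turnover measures), and gamma diversity as pooled richness across all samples. Sample and OTU names are invented.

```python
import math
from typing import Dict, Iterable

Counts = Dict[str, int]  # OTU name -> number of reads


def richness(sample: Counts) -> int:
    """Alpha diversity as the number of OTUs observed in one sample."""
    return sum(1 for c in sample.values() if c > 0)


def shannon(sample: Counts) -> float:
    """Alpha diversity as the Shannon index (natural logarithm)."""
    total = sum(sample.values())
    return -sum((c / total) * math.log(c / total)
                for c in sample.values() if c > 0)


def jaccard_dissimilarity(a: Counts, b: Counts) -> float:
    """Beta diversity as the share of OTUs not common to both samples."""
    present_a = {k for k, c in a.items() if c > 0}
    present_b = {k for k, c in b.items() if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def gamma_richness(samples: Iterable[Counts]) -> int:
    """Gamma diversity as the number of OTUs observed across all samples."""
    pooled = set()
    for sample in samples:
        pooled.update(k for k, c in sample.items() if c > 0)
    return len(pooled)


if __name__ == "__main__":
    pond = {"otu1": 10, "otu2": 10}
    lake = {"otu2": 5, "otu3": 5}
    print(richness(pond), round(shannon(pond), 3))
    print(round(jaccard_dissimilarity(pond, lake), 3))
    print(gamma_richness([pond, lake]))
```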

Concepts | Diversity measurement issues

Zhou J. et al. (2013) Random Sampling Process Leads to Overestimation of β-Diversity of Microbial Communities, mBio 4(3):e00324-13. doi:10.1128/mBio.00324-13.

“Diversity can virtually never be measured directly, rather it must be estimated or inferred from available data. Our estimates are anchored in the sample itself.”

Magurran (Ed.), Biological Diversity, Oxford U.P. 2010. Ch. 16 Microbial Diversity and Ecology

● Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. This curve is a plot of the number of species as a function of the number of samples.

Concepts | Rarefaction

[Figure: rarefaction curves. Curve annotations: “most or all species have been sampled”; “species rich habitat, only a small fraction has been sampled”; “this habitat has not been exhaustively sampled”.]

Wooley J.C. et al. (2010) A Primer on Metagenomics, PLoS Computational Biology 6 (2) e1000667
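A rarefaction curve can be computed without random subsampling by using the classical expected-richness formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the count of OTU i, N the total number of reads and n the subsample size. The sketch below assumes this analytical form (the slides do not say which method was used) and hypothetical function names.

```python
from math import comb
from typing import List, Tuple


def rarefied_richness(counts: List[int], depth: int) -> float:
    """Expected number of OTUs seen when drawing `depth` reads without replacement."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = comb(total, depth)
    # For each OTU: 1 minus the probability that none of its reads is drawn.
    return sum(1 - comb(total - c, depth) / all_draws for c in counts if c > 0)


def rarefaction_curve(counts: List[int], step: int = 1) -> List[Tuple[int, float]]:
    """Points (depth, expected richness) from 0 up to the full sample."""
    total = sum(counts)
    return [(n, rarefied_richness(counts, n)) for n in range(0, total + 1, step)]


if __name__ == "__main__":
    for depth, expected in rarefaction_curve([5, 3, 2], step=2):
        print(depth, round(expected, 3))
```

A curve that flattens before the full depth suggests most OTUs have been sampled; a curve still rising at the last point suggests the habitat has not been exhaustively sampled.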

Concepts | Diversity indices (α diversity)

Mozzarella project, Michele Iacono http://www.science.gov/topicpages/w/water+buffalo+mozzarella.html

Other indices: berger_parker_d, brillouin_d, dominance, doubles, esty_ci, fisher_alpha, gini_index, goods_coverage, margalef, mcintosh_d, mcintosh_e, menhinick, osd, simpson_reciprocal, robbins, singles, strong...
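A few of the listed indices are simple enough to sketch directly from a vector of OTU counts. The dictionary keys below mirror the labels on the slide; the formulas are the usual textbook definitions, which is an assumption about what each label means in the tool that produced the list.

```python
from typing import Dict, List


def alpha_indices(counts: List[int]) -> Dict[str, float]:
    """Compute a handful of alpha-diversity indices from OTU counts."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    if total == 0:
        raise ValueError("at least one read is required")
    singles = sum(1 for c in observed if c == 1)  # OTUs seen exactly once
    doubles = sum(1 for c in observed if c == 2)  # OTUs seen exactly twice
    dominance = sum((c / total) ** 2 for c in observed)
    return {
        "singles": singles,
        "doubles": doubles,
        # Estimated share of the community represented by non-singleton OTUs.
        "goods_coverage": 1 - singles / total,
        # Relative abundance of the most abundant OTU.
        "berger_parker_d": max(observed) / total,
        "dominance": dominance,
        "simpson_reciprocal": 1 / dominance,
    }


if __name__ == "__main__":
    for name, value in alpha_indices([4, 3, 1, 1, 1]).items():
        print(name, round(value, 3))
```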

Concepts | Compositional similarity (β diversity)

Mozzarella project, Michele Iacono http://www.science.gov/topicpages/w/water+buffalo+mozzarella.html

Heat map
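A compositional-similarity heat map is typically drawn from a pairwise distance matrix between samples. The sketch below builds such a matrix with Bray-Curtis dissimilarity, which is an assumed choice of metric (the slide does not name one); sample and OTU names are invented, and plotting is left out.

```python
from typing import Dict, List

Counts = Dict[str, int]  # OTU name -> number of reads


def bray_curtis(a: Counts, b: Counts) -> float:
    """Bray-Curtis dissimilarity: 0 for identical counts, 1 for no shared OTUs."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 1 - 2 * shared / total


def distance_matrix(samples: Dict[str, Counts]) -> List[List[float]]:
    """Square matrix of pairwise dissimilarities, in the samples' insertion order."""
    names = list(samples)
    return [[bray_curtis(samples[r], samples[c]) for c in names] for r in names]


if __name__ == "__main__":
    samples = {
        "s1": {"x": 2, "y": 2},
        "s2": {"x": 4},
        "s3": {"z": 4},
    }
    for name, row in zip(samples, distance_matrix(samples)):
        print(name, [round(v, 2) for v in row])
```

The same matrix is the usual input for ordination (PCoA) and hierarchical clustering (UPGMA).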

● Metadata, reads, fasta/fastq files, counts, OTU tables/networks, .biom files, PCoA, p-values, diversity metrics, robustness, scores, jackknifed, clustering, UPGMA, trees, bootstrap, Bi-Plots, ...

Concepts | Summary

4 APPROACHES & WORKFLOWS

Workflows | Microbial ecology approaches

Grice E.A. & Segre J.A. (2012) The Human Microbiome: Our Second Genome, Annu. Rev. Genomics Human Genet. 13, 151-170

Workflows | Overview

Sample collection

DNA extraction and preparation

Sequencing

Analysis

Experimental design

Sample Quality Controls

Sequence Quality Controls

Biological interpretation

4.1 WGS Metagenomics

Workflows | Whole Genome Shotgun (WGS)

Sven-Eric Schelhorn https://bioinf.mpi-inf.mpg.de/homepage/research.php?&account=sven

Surya Saha & Magdalen Lindeberg http://www.slideshare.net/suryasaha/surya-saha-metagenomics-tools

4.2 16S/ITS Metagenomics

Workflows | 16S/ITS Community Surveys

Surya Saha & Magdalen Lindeberg http://www.slideshare.net/suryasaha/surya-saha-metagenomics-tools

5 METAGENOMICS TOOLS

Tools | “The great quest”

Tools | MEGAN

http://ab.inf.uni-tuebingen.de/software/megan5/

Tools | Mothur

http://www.mothur.org/wiki/Main_Page / Kevin R. Theis (Michigan State University)

Tools | Qiime

Tools | Axiome

http://neufeld.github.io/AXIOME

Tools | CloVR

http://clovr.org

Tools | MG-RAST

http://metagenomics.anl.gov/

6 MORE RESOURCES

More resources, courses...

Resources & Projects:

MEGAN DB http://www.megan-db.org/megan-db/ (MEtaGenomics ANalysis)

CAMERA http://camera.calit2.net/ (community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis)

MG-RAST Search http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeSearch

IMG http://img.jgi.doe.gov/ (Integrated Microbial Genomes and metagenomes)

MetaBioME http://metasystems.riken.jp/metabiome/ (Comprehensive Metagenomic BioMining Engine)

BOLD http://www.boldsystems.org/ (Barcode of Life Data Systems)

GOS Expedition http://www.jcvi.org/cms/research/projects/gos/overview (Global Ocean Sampling)

...

Courses:

EBI http://www.ebi.ac.uk/training/course/metagenomics2014

EMBO http://cymeandcystidium.com/?tag=metagenomics

Coursera https://www.coursera.org/course/genomescience

... and a lot of seminars and workshops everywhere

Thanks for your attention

and also thanks to Josep Gregori (VHIR, ROCHE) for providing some materials
