
Lecture 01: Introduction - Statistical analysis of Microbiome Count Data

Date post: 18-Aug-2020

Paul J. McMurdie, Second Genome, South San Francisco, CA, USA (formerly: Statistics Department, Stanford University)

Lecture 01: Introduction - Statistical analysis of Microbiome Count Data

1

Goals for the day.

• Unit I - morning - Interactive/Exploratory Analyses
  - Alpha/Beta Diversity
  - Distances
  - Transformations
  - Ordination (Dimensional Reduction Techniques)
  - Permutational Multivariate ANOVA testing (adonis)

• Lunch

• Unit II - afternoon
  - Multiple Testing, Mixture Models

• We can't cover everything in one day!
  - An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/

Link to NIOO Statistics-Day Materials

2

What is a microbiome? The totality of microbes in a defined environment, especially their genomes and their interactions with each other and with the surrounding environment.

• A population of a single species/strain is a culture; cultures are extremely rare outside the lab (some infections are an exception)

• A microbiome is a mixed population of different microbial species (a microbial ecosystem)

3

Methods for characterizing microbiomes

• Universal gene census (amplicon)

• Shotgun metagenome sequencing

• Transcriptomics (shotgun mRNA)

• Proteomics (protein fragments)

• Metabolomics (excreted chemicals)

[Figure axis label: Number of Species Counted]

4


Data Analysis Tools of the Trade: R - RStudio - Rmarkdown

Data analysis should be:

• Modular - save time, protect against mistakes, share

• Repeatable - confirmable by others

• Flexible - analysis not predefined

• There is no “write paper” button

• Interactive - answers create new questions

Recommendation: R - RStudio - Rmarkdown (and Bioconductor for this crowd). Lab exercises today will be Rmarkdown based, and I encourage you to save your work this way.

5

Our Goal with Collaborators: Reproducible analysis workflow with R-markdown

Microbiome data + phyloseq + ggplot2 + etc., written up in source.Rmd:

# Main title
This is an [R Markdown](my.link.com) document of my recent analysis.

## Subsection: some code
Here is some import code, etc.

```{r}
# Load the packages used in this analysis
library("phyloseq")
library("ggplot2")
# Import OTU data from a BIOM file, then plot alpha diversity (richness)
physeq = import_biom("datafile.biom")
plot_richness(physeq)
```

knitr::knit2html() renders source.Rmd into a complete HTML5 document: markdown (code + console) + figures.

Better Reproducibility

6

phyloseq: Microbiome data heterogeneity and processing

Paul J. McMurdie, Statistics Department & CEHG, Stanford University, with contributions from Prof. Susan Holmes

http://joey711.github.io/phyloseq/

7

phyloseq Home Page

http://joey711.github.io/phyloseq/

8


phyloseq data structure & API

http://joey711.github.io/phyloseq/

Experiment data in a phyloseq object, created by the import functions or the phyloseq() constructor:

• OTU Abundance: otu_table (matrix, via as)

• Sample Variables: sample_data (data.frame, via as)

• Taxonomy Table: taxonomyTable, accessed as tax_table (matrix, via as)

• Phylogenetic Tree: phylo class from the ape package (read.tree, read.nexus, read_tree), accessed as phy_tree

• Reference Seq. (optional): XStringSet from the Biostrings package (DNAStringSet, RNAStringSet, AAStringSet), accessed as refseq

Slots: otu_table, sam_data, tax_table, phy_tree, refseq

Accessors: get_taxa, get_samples, get_variable, nsamples, ntaxa, rank_names, sample_names, sample_sums, sample_variables, taxa_names, taxa_sums

Processors: filter_taxa, merge_phyloseq, merge_samples, merge_taxa, prune_samples, prune_taxa, subset_taxa, subset_samples, tip_glom, tax_glom

9
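As a rough illustration of what a few of the accessors and processors above return, here is a base-R sketch (not phyloseq itself) on a plain count matrix, assuming taxa are rows and samples are columns. The matrix `otu` and its row/column names are hypothetical toy data; the comments name the phyloseq call each line is analogous to.

```r
# Hypothetical toy OTU table: taxa as rows, samples as columns
# (phyloseq's otu_table records this orientation via taxa_are_rows).
otu <- matrix(
  c(10, 0, 3,
     0, 0, 0,
     5, 2, 8,
     1, 7, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3))
)

n_samples <- ncol(otu)      # analogous to nsamples(physeq)
n_taxa    <- nrow(otu)      # analogous to ntaxa(physeq)
samp_tot  <- colSums(otu)   # analogous to sample_sums(physeq): reads per sample
taxa_tot  <- rowSums(otu)   # analogous to taxa_sums(physeq): reads per taxon

# Keep only taxa observed at least once, in the spirit of
# prune_taxa(taxa_sums(physeq) > 0, physeq)
otu_pruned <- otu[taxa_tot > 0, , drop = FALSE]

print(samp_tot)
print(rownames(otu_pruned))
```

With a real phyloseq object you would call the accessors directly; the sketch only shows that they are simple margins of the abundance table.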

phyloseq work flow

• Input: OTU cluster output, sample data

• Import (raw phyloseq object): import_biom, import_mothur, import_pyrotagger, import_qiime, import_RDP

• Preprocessing (processed phyloseq object): filter_taxa, filterfun_sample, genefilter_sample, prune_taxa, prune_samples, subset_taxa, subset_samples, transform_sample_counts

• Summary / Exploratory Graphics, Direct Plots: plot_tree, plot_richness, plot_bar, plot_network, plot_heatmap, plot_ordination (supported by distance and ordinate)

• Inference, Testing: bootstrap, permutation tests, regression, discriminant analysis, multiple testing, gap statistic, clustering, procrustes

10
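Two common preprocessing steps from the list above are converting counts to within-sample proportions (the idea behind `transform_sample_counts(physeq, function(x) x / sum(x))`) and dropping rarely observed taxa (the idea behind `filter_taxa(physeq, f, prune = TRUE)`). The base-R sketch below mimics both on a hypothetical toy matrix with taxa as rows; the threshold of more than 3 counts in more than 20% of samples is an illustrative assumption, not a recommended default.

```r
# Hypothetical toy OTU table: taxa as rows, samples as columns.
otu <- matrix(
  c(40, 10,  0,  5,
     2,  1,  3,  0,
    58, 89, 97, 95),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("S", 1:4))
)

# Relative abundance: divide each sample (column) by its own total,
# mirroring transform_sample_counts(physeq, function(x) x / sum(x)).
rel <- apply(otu, 2, function(x) x / sum(x))

# Taxon filter: keep taxa whose count exceeds 3 in more than 20% of
# samples, mirroring filter_taxa(physeq, f, prune = TRUE) with this f.
keep <- apply(otu, 1, function(x) sum(x > 3) > 0.2 * length(x))
otu_filtered <- otu[keep, , drop = FALSE]

print(round(rel, 2))
print(rownames(otu_filtered))
```

Here the filter is applied to raw counts; applying it after the transformation would require a threshold on proportions instead.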

phyloseq graphics

[Figure panels; SampleType levels: Feces, Freshwater, Freshwater (creek), Mock, Ocean, Sediment (estuary), Skin, Soil, Tongue]

• plot_ordination(): NMDS, wUF (axes NMDS1, NMDS2; colored by SampleType)

• plot_heatmap(): bray-curtis, NMDS (SampleType by OTU; Abundance scale 1 to 10000)

• plot_network(): Enterotype data, bray-curtis, max.dist=0.25 (legends: SeqTech = Illumina, Pyro454, Sanger; Enterotype = 1, 2, 3)

• plot_tree(): Bacteroidetes-only, merged samples, tip_glom=0.1 (tips labeled by genus; legends: Abundance, SampleType, Order = Bacteroidales, Flavobacteriales, Sphingobacteriales)

• plot_bar(): Bacteroidetes-only (Abundance by SampleType, filled by Family)

• plot_richness(): S.obs, S.chao1, S.ACE (Number of OTUs vs. Human Associated Samples, FALSE/TRUE; colored by SampleType)

11
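The plot_richness() panel reports observed richness (S.obs) alongside estimators such as S.chao1. As a rough base-R illustration of what Chao1 does, here is the classic formula, which extrapolates unseen OTUs from the numbers of singletons (F1) and doubletons (F2). The function name `chao1_classic` and the counts are hypothetical; phyloseq's estimate_richness() relies on vegan, which may use a bias-corrected variant, so its numbers can differ slightly from this sketch.

```r
# Classic Chao1 richness estimate for one sample's OTU counts:
#   S_chao1 = S_obs + F1^2 / (2 * F2)
# F1 = number of singleton OTUs, F2 = number of doubleton OTUs.
# When F2 == 0, fall back to the bias-corrected form F1 * (F1 - 1) / 2.
chao1_classic <- function(counts) {
  counts <- counts[counts > 0]   # only OTUs actually observed
  s_obs <- length(counts)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  if (f2 > 0) {
    s_obs + f1^2 / (2 * f2)
  } else {
    s_obs + f1 * (f1 - 1) / 2
  }
}

x <- c(10, 5, 1, 1, 1, 2, 2, 0)   # hypothetical counts for one sample
chao1_classic(x)                  # 7 observed + 3^2 / (2 * 2) = 9.25
```

The gap between S.obs and S.chao1 grows with the number of singletons, which is why the estimator is sensitive to sequencing errors that inflate rare OTUs.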

phyloseq graphics: plot_ordination()

[Figure panels, correspondence analysis axes CA1 vs. CA2; samples colored by SampleType (Feces, Freshwater, Freshwater (creek), Mock, Ocean, Sediment (estuary), Skin, Soil, Tongue), taxa colored by Class (Bacteroidia, Flavobacteria, Sphingobacteria)]

• Samples Only; type="samples"

• Taxa Only; type="taxa"

• Biplot; type="biplot"

• Split Plot; type="split"

• Also available: type="scree"

12
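The panels above use correspondence analysis. A different but widely used ordination, also offered by phyloseq's ordinate() (method = "PCoA", distance = "bray"), is principal coordinates analysis on Bray-Curtis dissimilarities. The base-R sketch below computes Bray-Curtis by hand and then runs classical MDS with stats::cmdscale(); the toy matrix and the helper `bray_curtis` are hypothetical, and this is not what produced the CA panels.

```r
# Hypothetical toy OTU table: taxa as rows, samples as columns.
otu <- matrix(
  c(1, 3, 0, 4,
    2, 2, 5, 0,
    3, 1, 5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("S", 1:4))
)

# Bray-Curtis dissimilarity between two count vectors:
#   BC(a, b) = sum(|a - b|) / sum(a + b)
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Pairwise sample-by-sample dissimilarity matrix.
n <- ncol(otu)
bc <- matrix(0, n, n, dimnames = list(colnames(otu), colnames(otu)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(otu[, i], otu[, j])
  }
}

# Principal coordinates analysis (classical MDS) on the dissimilarities,
# keeping two axes for a 2-D scatter plot of samples.
pcoa <- cmdscale(as.dist(bc), k = 2)

print(round(bc, 3))
print(round(pcoa, 3))
```

Bray-Curtis is not a Euclidean distance, so PCoA can yield negative eigenvalues; with real data it is worth inspecting a scree plot before interpreting the first two axes.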


http://joey711.github.io/phyloseq/tutorials-index

13

