Rapid taxonomic classification of complex consortia of environmental rDNA using a microarray.
CEB - ESD - LBNL
Todd DeSantis, Sonya Murray, Jordan Moberg, Gary Andersen
Carol Stone (DSTL, U.K.)
What bugs are in my sample?
The ponderings of a toddler
Why must Mom confiscate my “Hello Kitty” blanket on laundry day?
Will the swings be wet at the park?
How will this sausage impact the diversity in my lower G.I. bacterial community?
Will I inhale any archaeal microorganisms when I visit the hot springs?
Gianna DeSantis
• Every discarded water sample, geological core, or spent air filter is lost data.
• But who wants to do all the work?
– Culture? Anaerobes? Non-cultivable? Safety?
– Analysis of nucleic acids isolated from environment
• Must classify or sort heterogeneous nucleic acids into bins.
– Restriction Fragment Length Polymorphisms (RFLP)
– Single Stranded Conformation Polymorphisms (SSCP)
– Temp/Denat Gradient Gel Electrophoresis (T/DGGE)
– Sequencing
» Provides taxonomic nomenclature
» Estimates the relative abundance
» Need to create, clone, & process hundreds of samples
• Can we create a simple, comprehensive (quantitative?) microbial test?
Project Overview
• Goal
– Create a single microarray capable of detecting and categorizing the bacteria in a complex sample.
• Approach
– GeneChip targeted at 16S rDNA sequence variations to distinguish taxa.
The Ribosome
rDNA
rRNA (functional molecule)
LSU
SSU (16S or 18S)
• Foundations:
– Maintain the largest 16S gene library (~83,000).
– Cluster sequences into taxa (~8,000).
– Create algorithm for picking probes for each taxon (~500,000).
cctagcatgCattctgcata  MATCH
cctagcatgGattctgcata  MISMATCH
Build custom Affymetrix GeneChip
• Massive parallelism – up to 2 million probes upon a 1.28 cm² glass surface.
• Identification of multiple species in a mixed population.
• Pair each probe with a “mismatch” control probe.
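The match/mismatch pairing can be sketched in a few lines. This is a minimal illustration, assuming the control probe is made by replacing the central base with its complement; the slide shows only one example (C to G), so the general rule and the function name `mismatch_probe` are assumptions, not the chip's documented design.

```python
# Complement lookup that preserves the case of the input base.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def mismatch_probe(perfect_match: str) -> str:
    """Return a control probe whose central base is swapped for its complement.

    Assumption: the "central" base is at index (len - 1) // 2, i.e. the 10th
    base of a 20-mer or the 13th base of a 25-mer.
    """
    centre = (len(perfect_match) - 1) // 2
    swapped = perfect_match[centre].translate(COMPLEMENT)
    return perfect_match[:centre] + swapped + perfect_match[centre + 1:]


pm = "cctagcatgCattctgcata"  # perfect-match example from the slide
mm = mismatch_probe(pm)
print(pm, "MATCH")
print(mm, "MISMATCH")
```

Applied to the slide's example, the sketch reproduces the mismatch sequence shown there.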
General Protocol
Samples: Air, Soil, Feces, Blood, Water
rRNA
gDNA
Universal 16S rDNA PCR
Array: contains probes adhered to glass surface in grid pattern.
Overview of Sample Preparation using a Modified Affymetrix Protocol
• Extract genomic DNA
• PCR amplify DNA
• Fractionate DNA
• End-label with biotin
• Hybridize
[Figure labels: 18 µ; ACGGTCGA]
Progress in Miniaturization
GeneChip 3000 scanner
16S Microarray
500,000 Probe 16S array (DOE 16S Chip)
Parameter            Frankia   Clostridium
Positive fraction    1.00      0.64
Average difference   3720      625
[Figure: PM and MM probe intensities for Frankia sp. str. G48 (axis 0 to 5000) and Clostridium butyricum (axis 0 to 4000)]
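The two probe-set parameters reported for Frankia and Clostridium can be illustrated as follows. This is a simplified sketch, assuming "positive fraction" means the share of probe pairs where PM exceeds MM and "average difference" means the mean of PM minus MM; the slide does not define either term, and the intensities below are hypothetical, not taken from the figure.

```python
from statistics import mean


def probe_set_summary(pm, mm):
    """Summarise one probe set from paired PM and MM intensities.

    Assumed definitions (not confirmed by the slide):
      positive fraction  = share of pairs with PM > MM
      average difference = mean of (PM - MM) over all pairs
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    positive_fraction = sum(d > 0 for d in diffs) / len(diffs)
    average_difference = mean(diffs)
    return positive_fraction, average_difference


# Hypothetical intensities for a four-pair probe set.
pm = [4100, 3900, 4500, 3600]
mm = [400, 500, 300, 3700]
fraction, avg_diff = probe_set_summary(pm, mm)
print(f"positive fraction = {fraction:.2f}, average difference = {avg_diff:.0f}")
```

A probe set whose pairs are nearly all positive with a large average difference would correspond to a confident call such as the Frankia column.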
High confidence bacterial species detection with 500K probe microarray:
- Greater microbial diversity in EPA samples
[Figure: BASIS and EPA samples (127 OTUs displayed)]
...CGTAAAGCTCTGTCTTTGGGGAAGATAATGACGGTACCCAAGGAGGAAGCCACGGCTAACT... C. perf. str.CPN50
................................................................... C. perf. resistant
................................................................... Clostridium sp. AB&J
................................................................... clone p-4636-2Wa2
................................................................... C. perf. A
................................................................... C. perf rrnA
................................................................... C. perf rrnE
.................................T................................. C. perf rrnD
................................................................... C. perf rrnC
................................................................... C. perf rrnB
................................................................... C. perf rrnF
................................................................... C. perf rrnG
................................................................... C. perf str.13a
................................................................... C. perf str.13b
................................................................... C. perf rrnH
................................................................... C. perf rrnI
................................................................... C. perf rrnJ
................................................................... clone OI1612
................................................................... C. perf. B
................................................................... Swine manure 37-3
................................................................... Swine manure 37-4
TAAAGCTCTGTCTTTGGGGAAGATA
tacccaaggaggaagccacggctaa
AAAGCTCTGTCTTTGGGGAAGATAA
AAGCTCTGTCTTTGGGGAAGATAAT
AGCTCTGTCTTTGGGGAAGATAATG
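One way to think about choosing probes from an alignment like this is to slide a 25-base window along the reference and keep only windows that are identical in every member of the cluster. The sketch below is an illustration under that assumption, not the probe-picking algorithm used for the chip; `conserved_probes` and the single-substitution `VARIANT` are hypothetical.

```python
def conserved_probes(sequences, length=25):
    """Return every window of the first sequence that is identical,
    at the same alignment position, in all other sequences."""
    reference = sequences[0]
    keep = []
    for start in range(len(reference) - length + 1):
        window = reference[start:start + length]
        if all(s[start:start + length] == window for s in sequences[1:]):
            keep.append(window)
    return keep


# Reference region as shown for C. perf. str. CPN50.
REFERENCE = "CGTAAAGCTCTGTCTTTGGGGAAGATAATGACGGTACCCAAGGAGGAAGCCACGGCTAACT"
# Hypothetical cluster member with one substitution (position chosen for illustration).
VARIANT = REFERENCE[:30] + "T" + REFERENCE[31:]

probes = conserved_probes([REFERENCE, VARIANT])
for probe in probes:
    print(probe)
```

Windows that overlap the substituted position are rejected, so the surviving candidates sit on either side of it, which is the same pattern as the five probes listed above.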
Bacteria
CFB
Cyan
Proteo
Gram +
High G+C
Bacil-Strep
Clostridium
C.BOTULINUM_SUBGROUP
C.THERMOBUTYRICUM_SUBGROUP
C.BARATI_SUBGROUP
C.CADAVERIS
C.ALGIDICARNIS
C.PERFRINGENS
C.AURANTIBUTYRICUM
C. BUTYRICUM
16S rDNA: 27 to 1492
Probes 5 - 8: 420 to 469
Ave Diff = 1891
C. perfringens probe set identified in EPA sample 22 (N.Y.C., Spring)
-- ARCHAEA (1)
---- EURYARCHAEOTA (1.1)
------ METHANOMICROBACTERIA_AND_RELATIVES (1.1.3)
-------- HALOPHILIC_ARCHAEA (1.1.3.4)
---------- NTC.AMYLOLYTICUS_SUBGROUP (1.1.3.4.6)
(Natronococcus) 1.1.3.4.6.32228 OTU 6 seqs
prokMSA_id:649 AB012049 str. MSP1
prokMSA_id:650 AB012052 str. MSP11
prokMSA_id:653 AB012055 str. MSP16
prokMSA_id:654 AB012056 str. MSP17
prokMSA_id:655 AB012057 str. MSP22
prokMSA_id:656 AB012058 str. MSP23
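The dotted codes encode the path from domain down to OTU, so a lineage can be recovered from the code alone. A minimal sketch, assuming each dot-separated prefix names one ancestor; the `RANK_NAMES` lookup holds only the five names shown on this slide, and the `lineage` helper is hypothetical.

```python
# Names taken from the hierarchy shown above; any other code is unknown here.
RANK_NAMES = {
    "1": "ARCHAEA",
    "1.1": "EURYARCHAEOTA",
    "1.1.3": "METHANOMICROBACTERIA_AND_RELATIVES",
    "1.1.3.4": "HALOPHILIC_ARCHAEA",
    "1.1.3.4.6": "NTC.AMYLOLYTICUS_SUBGROUP",
}


def lineage(code):
    """Expand a dotted taxon code into the codes of all its ancestors plus itself."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


for depth, node in enumerate(lineage("1.1.3.4.6.32228"), start=1):
    print("--" * depth, RANK_NAMES.get(node, "OTU"), f"({node})")
```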
Developing Microarray Experiment Browser
[Figure: signal (a.u.), axis 0 to 700, per probe set for beta-proteobacteria taxa (434 β-proteobacteria taxa)]
What’s in a “blank” sample?
Clean filter: extracted, amplified, fragmented, labeled, hybridized
Example of detection with the β-proteobacteria sub-division
2250306000000.030_st Acidobacterium capsulatum+
2280208040000.041_st Bordetella pertussis+
2280209041300.025_st Acidovorax avenae+
2280210040000.040_st Burkholderia graminis+
2280210060000.007_st Burkholderia cepacia+
2280211000000.071_st Oxalobacter formigenes+
2280299000000.001_st Ralstonia solanacearum+
2280308990000.001_st Stenotrophomonas maltophilia+
Many of the detected taxa also contain “Environmental clone” members.
Testing the effects of variation in hybridization temperature
All 3 conditions allowed spikes to be detected.
Hybridization at 48 deg C reported the fewest additional taxa.
Create synthetic community of 16S amplicons.
A. capsulatum
Y. pestis
C. jejuni
S. aureus
B. anthracis
S. typhi
B. melitensis
R. prowazekii
B. subtilis
E. coli
Separately culture, gDNA-extract, and PCR-amplify.
Combine PCR products.
Probe Set 45 hyb 48 hyb 50 hyb Members
2130500000000.006 _st+ + + RB25 (Nitrospina)
2131100000000.015 _st+ MB1228 (Nitrospina)
2131200000000.011 _st+ RB40 (Nitrospina)
2250306000000.030 _st+ + + Acidobacterium capsulatum
2280106990000.001 _st+ + + Brucella melitensis
2280108010199.001 _st+ + + Roseobacter litoralis
2280108050500.003 _st+ + + Rickettsia prowazekii
2280110990000.001 _st+ Sphingomonas yanoikuyae
2280205060000.005 _st+ Zoogloea DhA-35
2280209041400.009 _st+ + + Aquaspirillum metamorphum
2280320040000.039 _st+ + + Shewanella woodyi
2280323990000.001 _st+ + + Vibrio vulnificus
2280324020000.008 _st+ + + Aeromonas hydrophila
2280327040200.012 _st+ Erwinia amylovora
2280327050000.007 _st+ + + Klebsiella oxytoca
2280327070000.005 _st + Pantoea agglomerans
2280327090000.001 _st+ + Erwinia carotovora
2280327100000.011 _st+ Serratia marcescens
2280327120000.010 _st + Serratia proteamaculans
2280327140500.003 _st+ + + Yersinia pestis
2280327990000.002 _st+ + + E. coli, S. typhi
2280503040000.010 _st+ + + Campylobacter jejuni
2300107040000.027 _st+ + + R18 (Actinobacteria)
2300710090000.020 _st+ + Bacillus firmus
2300711050000.020 _st+ + + Bacillus subtilis
2300712010000.013 _st+ + + Staphylococcus aureus
2300712040000.004 _st+ + + Bacillus anthracis
2300717030000.017 _st+ + + Col7 (Lactobacilli)
2300721990000.001 _st+ + + Streptococcus
C. jejuni probe sets 45 deg C 48 deg C 50 deg C
2280503040000.010d_st 3015 2219 1927
2280503040000.010c_st 3011 2303 2126
2280503040000.010b_st 3166 2397 2548
2280503040000.010_st 3162 2320 2291
mean: 3088 2309 2223
standard deviation: 87 73 263
coefficient of variation: 0.03 0.03 0.12
Reproducibility
Probes for C. jejuni tiled in 4 areas
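The reproducibility figures for the four C. jejuni probe sets can be recomputed from the table. A minimal sketch, assuming the standard deviation is the sample standard deviation (n - 1 denominator) and the coefficient of variation is standard deviation divided by mean; under those assumptions the results agree with the slide to within rounding.

```python
from statistics import mean, stdev

# Signals of the four C. jejuni probe sets (d, c, b, unsuffixed) per temperature.
SIGNALS = {
    "45 deg C": [3015, 3011, 3166, 3162],
    "48 deg C": [2219, 2303, 2397, 2320],
    "50 deg C": [1927, 2126, 2548, 2291],
}


def replicate_stats(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation, n - 1 denominator
    return m, sd, sd / m


for label, values in SIGNALS.items():
    m, sd, cv = replicate_stats(values)
    print(f"{label}: mean={m:.1f} sd={sd:.0f} cv={cv:.2f}")
```

The larger coefficient of variation at 50 deg C comes mostly from the spread between the 1927 and 2548 readings.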
[Figure: Columbus, OH. Bar chart of values per probe set (axis 0 to 2000), series wk18 and wk19]
Quantitative Considerations
- Spike environmental samples with defined amplicons.
- Calibrate
- Pearson’s corr coefficient was significant (df=18).
- 95% confidence intervals plotted.
Figure 2 - Calibration Plot
y = 0.9207x + 10.504, R = 0.974
[Plot: log2 HybScore (axis 9 to 19) against log2 Concentration (pM) (axis 0 to 8). Series: Spike-in rDNA, Environmental rDNA, Spike-in Regression, 95% Confidence Limits]
Environmental community is measured with confidence intervals.
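Reading a concentration off the calibration line amounts to inverting the spike-in regression. The sketch below assumes, from the Figure 2 axis labels, that x is log2 concentration in pM and y is log2 HybScore; the slope and intercept are the fitted values from the plot, and the function names are hypothetical.

```python
import math

SLOPE = 0.9207      # fitted slope from the Figure 2 regression
INTERCEPT = 10.504  # fitted intercept from the Figure 2 regression


def log2_hybscore(concentration_pm):
    """Predicted log2 HybScore for a given amplicon concentration (pM)."""
    return SLOPE * math.log2(concentration_pm) + INTERCEPT


def concentration_pm(log2_score):
    """Invert the calibration line: estimated concentration (pM) from log2 HybScore."""
    return 2 ** ((log2_score - INTERCEPT) / SLOPE)


score = log2_hybscore(8.0)
print(f"8 pM -> log2 HybScore {score:.3f} -> {concentration_pm(score):.2f} pM")
```

Because the fit is on a log2 scale, a one-unit change in log2 HybScore corresponds to a bit more than a two-fold change in estimated concentration.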
Figure 3 - Concentration of Environmental SSU Amplicons
[Bar chart: rDNA Concentration (pM), axis 0 to 140, by taxon]
Taxa:
Clostridium thermobutyricum
Streptococcus anginosus
Bacillus racemilacticus
Pseudomonas sp.
symbiont of Solemya velum
Clostridium limosum+
Eurotiales (Aspergillus+)
Bartonella+
Staphylococcus delphini+
Vibrio parahaemolyticus+
Pasteurella sp.
Heterotextus alpinus
Streptomyces
Staphylococcus cohnii+
Propionibacterium lymphophilum
Leucostoma persoonii
Conf Interval: $\text{Conc} \pm \dfrac{t \cdot \text{RSE}}{b}\sqrt{\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(Y - y)^2}{b^2 (n-1) s_x^2}}$
b = slope from regression
Y = mean of 6 replicate measurements
m = number of repeat measurements = 6
n = number of points used for calibration = 20
y = mean of the HybScores for the 20 points used for calibration
t = critical value obtained from t-table for 18 d.f. for 95% = 1.734
RSE = residual standard error of calibration points = 0.56
s_x = standard deviation of the conc. for the 20 points used for calibration
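The half-width of the interval can be computed directly from the listed quantities. This sketch assumes the usual inverse-prediction form, (t * RSE / b) multiplied by the square root of 1/m + 1/n + (Y - y)^2 / (b^2 (n - 1) sx^2), with n = 20 calibration points; the values passed for Y, y and sx are hypothetical because the slide does not give them, and the result is in the units of the regression x-axis.

```python
import math


def ci_half_width(Y, y_bar, s_x, b=0.9207, rse=0.56, t=1.734, m=6, n=20):
    """Half-width of the calibration confidence interval.

    b, rse, t, m and n default to the values stated on the slide;
    Y (mean replicate HybScore), y_bar (mean calibration HybScore) and
    s_x (standard deviation of calibration concentrations) must be supplied.
    """
    spread = (Y - y_bar) ** 2 / (b ** 2 * (n - 1) * s_x ** 2)
    return (t * rse / b) * math.sqrt(1 / m + 1 / n + spread)


# Hypothetical inputs, for illustration only.
narrow = ci_half_width(Y=14.0, y_bar=14.0, s_x=2.0)
wide = ci_half_width(Y=18.0, y_bar=14.0, s_x=2.0)
print(f"at the calibration mean: +/- {narrow:.3f}")
print(f"far from the mean:       +/- {wide:.3f}")
```

The interval is narrowest when the measured HybScore equals the calibration mean and widens as the measurement moves away from it.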
Current projects
- Netherlands soil bioremediation tracking
- BioWatch – DHS metropolitan air microbe survey
- G.I. community comparison
- Root-soil interface
- Does polymerase brand affect perceived community?
PCR Enzyme Comparison
[Gel image: Takara PCR and Applied Bio PCR, each with Biowatch gDNA, positive (+) and negative (neg) control lanes]
Takara (gDNA pool)
Total ng DNA in PCR bands: 4437.58
Total ng DNA in PCR reactions (+smear): 7266.95
Ratio of bands:total DNA: 0.61
Applied Bio. (gDNA pool)
Total ng DNA in PCR bands: 546.73
Total ng DNA in PCR reactions (+smear): 8149.28
Ratio of bands:total DNA: 0.07
Lane     Measured     Source                       ng in 5 µL   ng in 45 µL
Lane 1   Band         biowatch pool (Tak)          76.3         686.9
Lane 2   Band         biowatch pool (Tak)          59.3         533.8
Lane 3   Band         biowatch pool (Tak)          80.5         724.9
Lane 4   Band         biowatch pool (Tak)          99.2         892.8
Lane 5   Band         biowatch pool (Tak)          104.3        938.5
Lane 6   Band         pos control 1pg (Tak)        73.4         660.7
Lane 1   Band+Smear   biowatch pool (Tak)          151.7        1365.3
Lane 2   Band+Smear   biowatch pool (Tak)          146.8        1321.5
Lane 3   Band+Smear   biowatch pool (Tak)          182.0        1638.4
Lane 4   Band+Smear   biowatch pool (Tak)          161.3        1451.6
Lane 5   Band+Smear   biowatch pool (Tak)          165.6        1490.1
Lane 6   Band+Smear   pos control 1pg (Tak)        86.0         774.1
Lane 9   Band         biowatch pool (AB)           14.1         127.0
Lane 10  Band         biowatch pool (AB)           13.3         119.3
Lane 11  Band         biowatch pool (AB)           7.8          70.4
Lane 12  Band         biowatch pool (AB)           11.4         102.3
Lane 13  Band         biowatch pool (AB)           14.2         127.7
Lane 14  Band         pos control 1pg (AB)         8.3          74.4
Lane 9   Band+Smear   biowatch pool (AB)           160.6        1445.1
Lane 10  Band+Smear   biowatch pool (AB)           184.8        1663.6
Lane 11  Band+Smear   biowatch pool (AB)           185.0        1665.2
Lane 12  Band+Smear   biowatch pool (AB)           185.5        1669.9
Lane 13  Band+Smear   biowatch pool (AB)           189.5        1705.5
Lane 14  Band+Smear   pos control 1pg (AB)         132.3        1190.3
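The band-to-total ratio for each enzyme can be recomputed from the per-lane values (ng in 45 µL). This sketch sums only the five biowatch pool lanes for each enzyme and leaves out the positive-control lanes, which is an assumption about how the summary was intended. On that basis the Applied Bio figures match the summary, while the Takara band total on the slide (4437.58) equals the five pool lanes plus the positive-control lane, so the pool-only Takara ratio comes out lower than the 0.61 shown.

```python
# ng of DNA in 45 µL for the five biowatch pool lanes of each enzyme.
TAKARA_BAND = [686.9, 533.8, 724.9, 892.8, 938.5]
TAKARA_BAND_SMEAR = [1365.3, 1321.5, 1638.4, 1451.6, 1490.1]
AB_BAND = [127.0, 119.3, 70.4, 102.3, 127.7]
AB_BAND_SMEAR = [1445.1, 1663.6, 1665.2, 1669.9, 1705.5]


def band_fraction(band, band_plus_smear):
    """Share of total amplified DNA that sits in the specific band."""
    return sum(band) / sum(band_plus_smear)


takara = band_fraction(TAKARA_BAND, TAKARA_BAND_SMEAR)
applied_bio = band_fraction(AB_BAND, AB_BAND_SMEAR)
print(f"Takara:      {takara:.2f}")
print(f"Applied Bio: {applied_bio:.2f}")
```

Either way, the Takara reactions put a much larger share of product into the specific band than the Applied Bio reactions.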
Summary
The rDNA microarray is able to rapidly quantify and taxonomically classify complex consortia of rDNA amplicons.
You can collect data on over 8,000 taxa simultaneously.
Acknowledgements
• Gary Andersen – Group Leader
• Carol Stone – UK sample collection, hybridization
• Sonya Murray – sample preparation, hybridizations, manuscript composition
• Jordan Moberg – sample preparation, sample tracking database, parameter optimization