Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray.


CEB - ESD - LBNL: Todd DeSantis, Sonya Murray, Jordan Moberg, Gary Andersen

Carol Stone (DSTL, U.K.)

What bugs are in my sample?

The ponderings of a toddler

Why must Mom confiscate my “Hello Kitty” blanket on laundry day?

Will the swings be wet at the park?

How will this sausage impact the diversity in my lower G.I. bacterial community?

Will I inhale any archaeal microorganisms when I visit the hot springs?

Gianna DeSantis

• Every discarded water sample, geological core, or spent air filter is lost data.

• But who wants to do all the work?
  – Culture? Anaerobes? Non-cultivable? Safety?
  – Analysis of nucleic acids isolated from the environment

• Must classify or sort heterogeneous nucleic acids into bins.
  – Restriction Fragment Length Polymorphisms (RFLP)
  – Single Stranded Conformation Polymorphisms (SSCP)
  – Temperature/Denaturing Gradient Gel Electrophoresis (T/DGGE)
  – Sequencing
      » Provides taxonomic nomenclature
      » Estimates the relative abundance
      » Need to create, clone, & process hundreds of samples

• Can we create a simple, quantitative, comprehensive microbial test?

Outline

• Goals

• Experimental Approach

• Organization of rDNA sequences into taxa (CASCADE-P)

• Assigning sets of probes for each taxon

• Using 16S GeneChip for quantitative aerosol analysis

Project Overview

• Goal
  – Create a single microarray capable of detecting and quantifying bacterial and/or archaeal organisms in a complex sample.

• Approach
  – Combinatorial power of multiple probes for sequence-specific hybridization

16S rRNA gene (16S rDNA)

• Used to identify and classify organisms by gene sequence variations.

• Variations have been used in the design of DNA probes for the detection of:
  – taxonomic domains, divisions, groups …
  – specific organisms

The Ribosome

[Figure: rDNA encodes rRNA (the functional molecule); the ribosome has a large subunit (LSU) and a small subunit (SSU, 16S or 18S).]

The Ribosome

• Folded secondary structure

• Essential functional component

• Conserved spans
  – structure must be retained for viability
  – targeted for universal/group-specific PCR primers and probes

• Variable regions
  – spans not fundamental to the folded structure
  – receive less pressure from natural selection
  – probed for genus and species level discrimination

What could be amplified?

• Universal 16S PCR primers yield a complex population of amplicons.

• Must define the targets to consider as the Potential Amplicon Set.

[Figure: SSU rDNA (5' to 3') showing primers pA and 1492R, Ccomp, positions 1390 and 1507, and the variable region interrogated on the chip.]

20-base DNA signature segments on the chip form a probe set. The sample reacts only with complementary signature sequences on the chip.

The first-generation rDNA array uses an 85-base highly variable region of ribosomal DNA.
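A minimal Python sketch of the probe-set idea, assuming each probe is the reverse complement of a 20-base target window and treating "reacts" as an exact perfect match (real hybridization also tolerates some mismatches). The helper names and any toy sequences are hypothetical.

```python
# Sketch: tile 20-base probes across a target region and test which ones a
# sample would perfectly complement. Exact string matching stands in for
# hybridization (an assumption; real duplexes tolerate some mismatches).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(region: str, k: int = 20) -> list[str]:
    """Hypothetical helper: one probe per k-base window of the target region."""
    return [reverse_complement(region[i:i + k]) for i in range(len(region) - k + 1)]


def hybridizing_probes(sample: str, probes: list[str]) -> list[int]:
    """Indices of probes whose target (the probe's reverse complement) occurs in the sample."""
    return [i for i, p in enumerate(probes) if reverse_complement(p) in sample]
```

A sample containing the whole region lights up every probe in the set, while an unrelated sample lights up none; partial overlaps light up only the windows they contain.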

http://greengenes.llnl.gov/ 16S

• Comprehensive Aligned Sequence Construction for Automated Design of Effective Probes

• Igor Dubosarskiy – Java implementations

• Tim Harsch – RDBMS consultations

• Lisa Corsetti – Apache module management

• Kevin Melissare – Graphics

2.30.9.2.10

5th Level: C.ACETOBUTYLICUM_SUBGROUP

4th Level: C.BOTULINUM_GROUP

3rd Level: CLOSTRIDIUM_AND_RELATIVES

2nd Level: GRAM_POSITIVE_BACTERIA

1st Level: BACTERIA

Clostridium collagenovorans DSM 3089 (T)
Clostridium sardiniensis ATCC 33455 (T)
Clostridium acetobutylicum ATCC 824 (T)
Clostridium acetobutylicum DSM 792 (T)
Clostridium acetobutylicum ATCC 824 (T)
Clostridium acetobutylicum NCDO 1712
Clostridium acetobutylicum DSM 1731

2.28.3.27.2

5th Level: ESCHERICHIA_SUBGROUP

4th Level: ENTERICS_AND_RELATIVES (Group)

3rd Level: GAMMA_SUBDIVISION

2nd Level: PROTEOBACTERIA

1st Level: BACTERIA

U85138 clone ACK-SA7
AE000452 Escherichia coli str. K-12
Er.trachep Erwinia tracheiphila LMG 2906 (T)
E.coliK12 Escherichia coli [gene=rrnG gene]
Haf.alvei3 Hafnia alvei
S.tymuriu3 Salmonella typhimurium str. Stm1
Shi.boydii Shigella boydii
AF084835 str. KN4
S.enterit4 Salmonella enteritidis str. SE22
S.ptyphi6 Salmonella paratyphi
S.typhi3 Salmonella typhi str. St111
S.bovismrb Salmonella bovis morbificans Sbm1
Alt.agrlyt Alterococcus agarolyticus str. ADT3
Shi.flxne2 Shigella flexneri ATCC 29903 (T)

Hierarchical Phylocodes
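Both examples have five dot-separated fields and five named levels, which suggests that a phylocode's prefixes name its enclosing taxa. A minimal sketch under that assumption; the helper functions are hypothetical and the level names are copied from the 2.30.9.2.10 example.

```python
# Sketch, assuming each dot-separated field of a phylocode selects one level
# (1st = domain ... 5th = subgroup), so prefixes name the enclosing taxa.

def ancestors(phylocode: str) -> list[str]:
    """Phylocodes of every level, from the 1st level down to the full code."""
    parts = phylocode.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def shared_level(a: str, b: str) -> int:
    """Deepest level (1-based) at which two phylocodes share a taxon; 0 if none."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth


# Level names from the Clostridium example (2.30.9.2.10).
LEVELS = {
    "2": "BACTERIA",
    "2.30": "GRAM_POSITIVE_BACTERIA",
    "2.30.9": "CLOSTRIDIUM_AND_RELATIVES",
    "2.30.9.2": "C.BOTULINUM_GROUP",
    "2.30.9.2.10": "C.ACETOBUTYLICUM_SUBGROUP",
}

if __name__ == "__main__":
    for code in ancestors("2.30.9.2.10"):
        print(code, LEVELS[code])
    # The Escherichia subgroup (2.28.3.27.2) shares only the 1st level, BACTERIA.
    print(shared_level("2.30.9.2.10", "2.28.3.27.2"))
```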

Chip Taxa

• Avoid groupings based on historical nomenclature.

• Sequence-dependent classification by transitive similarity clustering.

• Each sequence must end up in exactly 1 taxon.

if x R y and y R z, then x R z
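Treating the similarity relation R as transitive means the taxa are the connected components of the pairwise similarity graph, so every sequence lands in exactly one taxon. A minimal union-find sketch; the `similar` predicate is a hypothetical stand-in for the actual similarity criterion.

```python
# Sketch: transitive similarity clustering with union-find. If x~y and y~z,
# then x, y and z share a cluster, and every item is in exactly one cluster.

from itertools import combinations
from typing import Callable, Hashable, Iterable


def transitive_clusters(items: Iterable[Hashable],
                        similar: Callable[[Hashable, Hashable], bool]) -> list[set]:
    items = list(items)
    parent = {x: x for x in items}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if similar(a, b):
            parent[find(a)] = find(b)

    groups: dict = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

For example, with probe hits A = {1, 2}, B = {2, 3}, C = {3}, D = {9} and similarity defined as sharing any probe, A and C share nothing directly yet both join B's cluster, while D stays alone.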

Assigning Probes for GeneChip Microarray

• Select probe sets for each taxon

• Ideal probe
  – Present in all sequences of the taxon
  – Not present outside the taxon
  – Unable to cross-hybridize with sequences in other taxa

• Ideal mismatch control probe
  – Unable to cross-hybridize to any sequence
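The first two ideal-probe criteria can be checked directly if cross-hybridization is approximated by exact matching. A minimal sketch, assuming a probe is identified by the k-base target it complements (k = 20 by default); the function names are hypothetical, and the mismatch-control criterion is not covered here.

```python
# Sketch: find "ideal" probe targets for one taxon, i.e. k-mers present in
# every sequence of the taxon and absent from every sequence outside it.
# Exact substring matching stands in for (cross-)hybridization (assumption).

def kmers(seq: str, k: int = 20) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ideal_probe_targets(taxon: list[str], others: list[str], k: int = 20) -> set[str]:
    # Present in all sequences of the taxon ...
    shared = set.intersection(*(kmers(s, k) for s in taxon))
    # ... and not present in any sequence outside the taxon.
    outside = set().union(*(kmers(s, k) for s in others))
    return shared - outside
```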

Finding groupings

[Figure: grid of 16S sequences A to O (rows) against probes 1 to 24 (columns).]

Consider A – O to be 16S sequences.

Consider 1 – 24 to be probes already embedded on the chip.

First, associate all available probes with all available sequences.

Let probe similarities drive sequence groupings.
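Associating every probe with every sequence amounts to building a sequence-by-probe incidence matrix; the number of probes two rows share is then one simple measure of probe similarity. A minimal sketch, assuming a probe "associates" with a sequence when its target occurs there exactly; the function names and toy data are hypothetical.

```python
# Sketch: the sequence-by-probe incidence matrix behind the grouping figure.
# Row = 16S sequence (A, B, ...); column = chip probe (1, 2, ...). A cell is
# True when the probe's target occurs in the sequence (exact match, assumed).

def incidence_matrix(seqs: dict[str, str], probe_targets: list[str]) -> dict[str, list[bool]]:
    return {name: [t in s for t in probe_targets] for name, s in seqs.items()}


def shared_probe_counts(matrix: dict[str, list[bool]]) -> dict[tuple[str, str], int]:
    """Number of probes each pair of sequences has in common."""
    names = sorted(matrix)
    counts = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            counts[(a, b)] = sum(x and y for x, y in zip(matrix[a], matrix[b]))
    return counts
```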



Progressive Transitive Clustering

[Figure: Count of solved clusters with each cycle's parameters. Count (log scale, 1 to 10,000) versus cycle; series: Total Clusters, Solved Clusters, uGBpplock, uPWppsep.]

DEFINE: upp (useful probe pair): a PM,MM pair where the 20-mer PM complements all intra-cluster sequences AND the central 16-mer of the PM does not complement any extra-cluster sequence AND the central 16-mer of the MM does not complement any sequence. Probe pairs are reassessed whenever the sequence clusters are altered.

nGBupp: number of upps for a cluster; these probe pairs globally differentiate the cluster from all other sequences.

L: the value of nGBupp that must be met for a cluster to be locked.

nPWuppA: number of useful probe pairs that pair-wise differentiate clustA from clustB

nPWuppB: number of useful probe pairs that pair-wise differentiate clustB from clustA

m: the value of nPWupp that must be met to inhibit two clusters from merging.
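The definitions above translate fairly directly into code. A minimal sketch, assuming uppercase ACGT sequences, distinct sequences, exact substring matching for "complements", and an MM probe formed by complementing the 10th base of the 20-mer (as in the MATCH/MISMATCH example in this section); all function names are hypothetical.

```python
# Sketch of the upp definitions. "Complements" is approximated by exact
# substring matching of the probe's 20-base target (an assumption).

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mm_of(pm: str) -> str:
    """Mismatch control: complement the central (10th) base of a 20-mer."""
    return pm[:9] + COMP[pm[9]] + pm[10:]


def core16(probe: str) -> str:
    """Central 16-mer of a 20-mer (drop two bases at each end)."""
    return probe[2:18]


def upps(cluster: list[str], outside: list[str], universe: list[str]) -> set[str]:
    """PM targets that form useful probe pairs for `cluster` versus `outside`."""
    first = cluster[0]  # any PM present in all members must occur in the first
    candidates = {first[i:i + 20] for i in range(len(first) - 19)}
    return {
        pm for pm in candidates
        if all(pm in s for s in cluster)                        # PM in every member
        and not any(core16(pm) in s for s in outside)          # PM core absent outside
        and not any(core16(mm_of(pm)) in s for s in universe)  # MM core absent everywhere
    }


def n_gb_upp(cluster: list[str], all_seqs: list[str]) -> int:
    """nGBupp: upps separating the cluster from every other sequence."""
    outside = [s for s in all_seqs if s not in cluster]
    return len(upps(cluster, outside, all_seqs))


def n_pw_upp(clust_a: list[str], clust_b: list[str], all_seqs: list[str]) -> int:
    """nPWuppA: upps that differentiate clustA from clustB specifically."""
    return len(upps(clust_a, clust_b, all_seqs))
```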

FOR L (11 .. 4) DO
    FOR m (1 .. 10) DO
        Determine nGBupp for each cluster;
        Lock all clusters where nGBupp ≥ L;
        Pair-wise compare non-locked clusters (clustA, clustB);
            UNLESS (nPWuppA ≥ m AND nPWuppB ≥ m)
                Merge sequences of clustA and clustB into one cluster;
            END UNLESS
    END FOR
    Uncluster non-locked clusters;
END FOR
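A runnable Python sketch of the loop above. It follows the pseudocode's structure, but several details are assumptions rather than the published implementation: clusters start as singletons, sequences are distinct uppercase ACGT strings, "complements" is exact substring matching, the MM probe complements the 10th base, merging is made transitive with union-find, and upp counts are recomputed once per m-cycle rather than after every merge.

```python
# Sketch of progressive transitive clustering (L from 11 down to 4, m from 1 to 10).

from itertools import combinations

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def n_upp(cluster, outside, universe):
    """Count useful probe pairs (20-mer PM plus central-base MM) for a cluster."""
    first = cluster[0]
    count = 0
    for pm in {first[i:i + 20] for i in range(len(first) - 19)}:
        mm_core = (pm[:9] + COMP[pm[9]] + pm[10:])[2:18]
        if (all(pm in s for s in cluster)
                and not any(pm[2:18] in s for s in outside)
                and not any(mm_core in s for s in universe)):
            count += 1
    return count


def merge_transitively(clusters, should_merge):
    """Union-find merge: if A~B and B~C, then A, B and C become one cluster."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if should_merge(clusters[i], clusters[j]):
            parent[find(i)] = find(j)
    merged = {}
    for i, c in enumerate(clusters):
        merged.setdefault(find(i), []).extend(c)
    return list(merged.values())


def progressive_clustering(seqs, l_values=range(11, 3, -1), m_values=range(1, 11)):
    """Return (locked clusters, leftover sequences as singleton clusters)."""
    locked, clusters = [], [[s] for s in seqs]
    for L in l_values:
        for m in m_values:
            still_open = []
            for c in clusters:  # lock every cluster with nGBupp >= L
                outside = [s for s in seqs if s not in c]
                if n_upp(c, outside, seqs) >= L:
                    locked.append(c)
                else:
                    still_open.append(c)
            # Merge unless both clusters have >= m pair-wise upps against each other.
            clusters = merge_transitively(
                still_open,
                lambda a, b: not (n_upp(a, b, seqs) >= m and n_upp(b, a, seqs) >= m))
        clusters = [[s] for c in clusters for s in c]  # uncluster non-locked clusters
    return locked, clusters
```

With the toy input `progressive_clustering(["ACGT" * 6, "T" * 24])`, the first sequence has 4 global upps and locks once L reaches 4, while the second has only 1 and is returned unclustered.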

650 clusters found

cctagcatgCattctgcata   MATCH

cctagcatgGattctgcata   MISMATCH

Approach: Custom Affymetrix GeneChip

• Massive parallelism – up to 500,000 probes in a 1.28 cm² array

• Identification of multiple species in a mixed population

• Single-nucleotide mismatch resolution

General Protocol

[Figure: samples (air, soil, feces, blood, water); rRNA or gDNA; universal 16S rDNA PCR. The chip contains probes adhered to a glass surface in a grid pattern (50 µm × 50 µm features).]

PCR Amplify DNA » Fractionate DNA » Biotin End-label » Hybridize

Locating Hybridization Events

Parameter            Frankia   Clostridium
Positive fraction    1.00      0.64
Average difference   3720      625

[Figure: PM and MM probe images for Frankia sp. str. G48 and Clostridium butyricum.]
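Both summary parameters in the table can be computed from paired PM/MM intensities. A minimal sketch, assuming the Affymetrix-style convention that the average difference is the mean of PM minus MM over a probe set, and that a pair counts as positive when PM exceeds MM by a ratio and a difference threshold; both thresholds here are hypothetical placeholders, since the exact scoring criteria are not given.

```python
# Sketch: probe-set summary statistics from paired PM/MM intensities.
# The positivity thresholds (min_ratio, min_diff) are hypothetical.

from statistics import mean


def probe_set_summary(pm: list[float], mm: list[float],
                      min_ratio: float = 1.3, min_diff: float = 100.0) -> tuple[float, float]:
    """Return (positive fraction, average difference) for one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    positive = [p > min_ratio * q and p - q > min_diff for p, q in zip(pm, mm)]
    positive_fraction = sum(positive) / len(positive)
    average_difference = mean(p - q for p, q in zip(pm, mm))
    return positive_fraction, average_difference
```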

Can the chip detect more than one analyte?

Combinatorial scoring of “probe sets” can categorize mixed samples.

OTU                % pos pairs
2.30.7.12.1.013*   100
2.30.7.12.1.014    46 – 57
2.30.7.12.1.015    54 – 61
2.30.7.12.1.016    39 – 54
2.30.7.12.1.017    18
2.30.7.12.2.002    11
2.30.7.12.2.003    14
2.30.7.12.2.005    14 – 32
2.30.7.12.2.006    18 – 32
2.30.7.12.2.007    21 – 25
2.30.7.12.2.008    14 – 29
2.30.7.12.3.001    7 – 25
2.30.7.12.3.002    8
2.30.7.12.3.003    4
2.30.7.12.3.004    7 – 11
2.30.7.12.3.005    4 – 14
2.30.7.12.3.006    11
2.30.7.12.3.007    14 – 29
2.30.7.12.3.008    7
2.30.7.12.3.009    4 – 11
2.30.7.12.3.010    0 – 4
2.30.7.12.4.001    21 – 36
2.30.7.12.4.004*   100
2.30.7.12.4.005    0 – 11
2.30.7.12.4.006    29 – 54
2.30.7.12.4.007    11 – 14
2.30.7.12.4.008    11

S. aureus spike

B. anthracis spike



Percent of probe pairs scored positive for each probe set in the Staphylococcus group.

Hybridization results from a spike-in experiment done in triplicate.
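The table reports, for each probe set, the percentage of positive probe pairs across the triplicate hybridizations, as one value or as a low-to-high range. A minimal sketch of that summary, plus a presence call that requires every replicate to reach a threshold; the 90% default and the function names are hypothetical, not the criterion used in this experiment.

```python
# Sketch: summarize % positive probe pairs across replicate chips and call a
# probe set (OTU) present only if every replicate meets a threshold.
# The 90% default threshold is a hypothetical choice.

def percent_positive(flags: list[bool]) -> float:
    return 100.0 * sum(flags) / len(flags)


def summarize(replicates: list[list[bool]], threshold: float = 90.0) -> tuple[str, bool]:
    """Return a table-style label (e.g. "46 - 57" or "100") and a presence call."""
    pcts = [round(percent_positive(r)) for r in replicates]
    lo, hi = min(pcts), max(pcts)
    label = str(lo) if lo == hi else f"{lo} - {hi}"
    return label, lo >= threshold
```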

Sonya Murray

Aubree Hubbel


analyte?analyte?

Combinatorial Combinatorial scoring of “Probe scoring of “Probe Sets” are able to Sets” are able to categorize mixed categorize mixed

samples.samples.

Application Example

• Does air filter sample processing affect detection?
  – Method 1
      • Wash particles from filter with SDS
      • Digest particles with lysozyme
      • Purify DNA using Qiagen kit
  – Method 2
      • Pulverize filter and particles with bead mill, SDS, P:C:ISA
      • Purify DNA using MoBio kit and Sephacryl column

Bead beating allowed greater diversity to be detected.

Quantitative Analysis

• Could the concentration of each amplicon in a sample be measured by fluorescence intensity?

• Experimental setup for 20-point Latin square calibration (spike concentrations in pM):

Experiment Oc.oenos Fer.nod Sap.grand M.neuro H2O Environmental amplicons*

1 5 13 31 74 No Yes

2 13 31 74 143 No Yes

3 31 74 143 5 No Yes

4 74 143 5 13 No Yes

5 143 5 13 31 No Yes

6 0 0 0 0 Yes Yes

* 18 µL of products from 30-cycle universal 16S PCR of gDNA extracted from a U.K. air sample.
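The spike-in design is a cyclic Latin square: each experiment shifts the five concentration levels by one position, so across experiments 1 to 5 every spike is measured once at every level, giving 5 × 4 = 20 calibration points. A minimal sketch that regenerates the table rows; the function name is hypothetical.

```python
# Sketch: regenerate the 20-point cyclic Latin square of spike concentrations.

LEVELS_PM = [5, 13, 31, 74, 143]  # concentration levels from the table (pM)
SPIKES = ["Oc.oenos", "Fer.nod", "Sap.grand", "M.neuro"]


def latin_square(levels: list[int], n_spikes: int) -> list[list[int]]:
    """Row i is experiment i + 1; spike j receives levels[(i + j) % len(levels)]."""
    n = len(levels)
    return [[levels[(i + j) % n] for j in range(n_spikes)] for i in range(n)]


if __name__ == "__main__":
    for i, row in enumerate(latin_square(LEVELS_PM, len(SPIKES)), start=1):
        print(i, *row)
```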

SPIKE CONCENTRATION (pM in Hybridization Solution)

Sonya Murray

Carol Stone

Experiment Oo Fn Sg Mn

1 5 (5474) 13 (16069) 31 (31805) 74 (124732)

2 13 (7885) 31 (61185) 74 (81107) 143 (115237)

3 31 (58912) 74 (70317) 143 (98235) 5 (8759)

4 74 (101803) 143 (69529) 5 (7789) 13 (11530)

5 143 (149869) 5 (4534) 13 (16228) 31 (56103)

6 n.a. n.a. n.a. n.a.

Final concentration of spike in hybridization in pM. Values in parentheses are the resulting hybridization signal in arbitrary units (a.u.) obtained from the Latin square experiments. All spikes were added to 18 µL of products of 30-cycle universal SSU PCR of gDNA extracted from air samples using Method 2.

Log2 transformed

Linear least-squares regression

Pearson's correlation coefficient was significant (df = 18)

95% confidence intervals calculated according to the National Measurement System's Valid Analytical Measurement Programme (VAM)
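The calibration steps (log2 transform, least-squares fit, Pearson correlation, residual standard error with 18 degrees of freedom) can be recomputed from the 20 spike-in points tabulated above. A minimal standard-library sketch; because it treats the signal values in parentheses as the HybScore (an assumption), its fitted slope and intercept need not match the published line exactly.

```python
# Sketch: log2-log2 calibration fit on the Latin square spike-in data.
# Assumption: the signal values in parentheses serve as the HybScore.

import math

# (concentration in pM, signal in a.u.) for the 20 calibration points.
POINTS = [
    (5, 5474), (13, 16069), (31, 31805), (74, 124732),
    (13, 7885), (31, 61185), (74, 81107), (143, 115237),
    (31, 58912), (74, 70317), (143, 98235), (5, 8759),
    (74, 101803), (143, 69529), (5, 7789), (13, 11530),
    (143, 149869), (5, 4534), (13, 16228), (31, 56103),
]


def fit(points):
    """Least-squares line through (log2 conc, log2 signal): slope, intercept, r, RSE."""
    xs = [math.log2(c) for c, _ in points]
    ys = [math.log2(s) for _, s in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson's correlation coefficient
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    rse = math.sqrt(rss / (n - 2))  # n - 2 = 18 degrees of freedom
    return slope, intercept, r, rse
```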

Figure 2 - Calibration Plot: log2 HybScore versus log2 concentration (pM) for spike-in rDNA and environmental rDNA, with the spike-in regression line (y = 0.9207x + 10.504, R = 0.974) and 95% confidence limits.

Environmental community is measured with confidence intervals.

Figure 3 - Concentration of Environmental SSU Amplicons: rDNA concentration (pM, axis 0 to 140) for the taxa Clostridium thermobutyricum, Streptococcus anginosus, Bacillus racemilacticus, Pseudomonas sp., symbiont of Solemya velum, Clostridium limosum+, Eurotiales (Aspergillus+), Bartonella+, Staphylococcus delphini+, Vibrio parahaemolyticus+, Pasteurella sp., Heterotextus alpinus, Streptomyces, Staphylococcus cohnii+, Propionibacterium lymphophilum, and Leucostoma persoonii.

Confidence interval:

$$\mathrm{Conc} \pm \frac{t \cdot \mathrm{RSE}}{b} \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(Y - y)^2}{b^2 (n-1) s_x^2}}$$

b = slope from regression

Y = mean of 6 replicate measurements

m = number of repeat measurements = 6

n = number of calibration points = 20

y = mean of the HybScores for the 20 points used for calibration

t = critical value from the t-table for 18 d.f. at 95% (one-tailed) = 1.734

RSE = residual standard error of calibration points = 0.56

s_x = standard deviation of the concentrations for the 20 points used for calibration
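The interval above is the standard inverse-prediction interval for a linear calibration. A minimal sketch that applies it to any set of calibration points, assuming xs and ys are already in the fit's units (here log2 concentration and log2 HybScore); the function name is hypothetical, and t is passed in rather than looked up.

```python
# Sketch: inverse-prediction confidence interval for a calibration line,
# Conc ± (t·RSE/b)·sqrt(1/m + 1/n + (Y − y)² / (b²(n − 1)s_x²)).

import math
from statistics import mean, stdev


def inverse_prediction(xs, ys, y_obs_mean, m, t):
    """Estimate x for a mean response of m replicates; return (x_hat, low, high).

    t: critical t value for n - 2 degrees of freedom (1.734 in the slide).
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rse = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    sx = stdev(xs)  # sample SD, so (n - 1) * sx**2 equals sxx
    x_hat = (y_obs_mean - a) / b
    half = (t * rse / abs(b)) * math.sqrt(
        1 / m + 1 / n + (y_obs_mean - my) ** 2 / (b ** 2 * (n - 1) * sx ** 2))
    return x_hat, x_hat - half, x_hat + half
```

If the intervals in Figure 3 were computed in log2 units (an assumption), `2 ** low` and `2 ** high` would convert the bounds back to pM.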

Summary

The SSU microarray was able to rapidly quantify and taxonomically classify a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins.

Acknowledgements

• Gary Andersen – group leader

• Carol Stone – sample collection, hybridization

• Sonya Murray – hybridizations