Post on 14-Jul-2015
Nate Olson
Genome Scale Measurements
Biosystems and Biomaterials Division
Microbial Genomics @ NIST
● Example Microbial and Genomic Programs
  ○ ERCC spike-in controls
  ○ Genome In A Bottle
  ○ Biothreat Detection
● Three Microbial Genomics Projects
  ○ Genomic Purity
  ○ SNP Method Evaluation
  ○ Genomic Reference Materials
Talk Overview
Disclaimer
Opinions expressed in this paper are the authors’ and do not necessarily reflect the policies and views of DHS, NIST, or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendations or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
Microbial Genomics @ NIST
External RNA Control Consortium
Advancing Genomic and Biothreat Detection Metrology
Biothreat Detection
Genome In A Bottle Consortium
External RNA Control Consortium
Is the apparent difference between samples biological or an artifact?
External RNA Control Consortium
Spike-in controls to assess technical performance
External RNA Control Consortium
https://github.com/usnistgov/erccdashboard
ERCC Dashboard facilitates RM use
Is this genetic mutation real or an artifact?
Genome In A Bottle Consortium
http://genomeinabottle.org/
Genome In A Bottle
RM to challenge measurement process
http://www.bioplanet.com/gcat
Genome In A Bottle
High confidence variants used for algorithm benchmarking
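As a rough illustration of benchmarking against a high-confidence call set, the comparison can be pictured as a set operation. This is a minimal sketch under stated assumptions, not the Genome In A Bottle or GCAT procedure: the function name and the (chromosome, position, ref, alt) tuples are hypothetical, and a real comparison would also need variant normalization and restriction to confidently characterized regions.

```python
def compare_calls(truth, calls):
    """Count agreements and disagreements between a truth set and a call set.

    Both arguments are iterables of hashable variant records; here each
    record is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    truth, calls = set(truth), set(calls)
    return {
        "tp": len(truth & calls),  # called and present in the truth set
        "fp": len(calls - truth),  # called but absent from the truth set
        "fn": len(truth - calls),  # in the truth set but not called
    }


# Toy records invented for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
calls = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr2", 90, "T", "C")}
result = compare_calls(truth, calls)
print(result)
```

Exact tuple matching is the simplest possible choice; it undercounts agreement whenever the same variant is represented differently in the two sets.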
Biothreat Detection
Is this suspicious powder a biothreat agent, or a false alarm?
Biothreat Detection
Surrogate material to support first responder training
Biothreat Detection
Engineered yeast as surrogate for biothreat agents
Address measurement challenges with reference materials and documentary standards.
Summary
Microbial Genomics
Microbial Sample Characterization
● Genomic Purity - DHS
● Evaluating SNP calling methods - DHS
● Microbial Genomic RMs - FDA
Microbial Genomics Purity
Challenge: Identify low levels of contaminants without knowing their identity.
Genomic Purity
Approach: Taxonomic read classification paired with NGS
Experimental Design
Genomic Purity
● Seven Organisms
  ○ Bacillus anthracis
  ○ Escherichia coli O157:H7
  ○ Francisella tularensis
  ○ Pseudomonas aeruginosa
  ○ Salmonella enterica
  ○ Staphylococcus aureus
  ○ Yersinia pestis
● Simulated Datasets
  ○ Illumina error profile
  ○ 250 paired-end reads
  ○ 20X coverage
● 336 spiked datasets
  ○ Pairwise combinations
  ○ Contaminant concentrations 5% to 2.5 x 10^-4 % of cells
● Pathoscope used for read classification (http://sourceforge.net/projects/pathoscope/)
  ○ Database - GenBank bacterial genomes
In-Silico Experiment
Contaminant
Genomic Purity
Only Contaminant and Sample Genus Identified
Detected contaminants down to 5.0 x 10^-4 % of cells
● Sensitivity Dependent on
  ○ Relative size of the sample and contaminant genomes
  ○ Genetic similarity of sample and contaminant to other organisms in the database
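The dependence on relative genome size can be sketched with a little arithmetic: if reads are sampled in proportion to DNA content, a contaminant with a smaller genome contributes fewer reads than its share of cells suggests. The snippet below is a back-of-the-envelope sketch, not the analysis behind these slides. It assumes one genome copy per cell and unbiased sequencing, and the genome sizes are placeholder values, not measurements of the organisms listed earlier.

```python
def contaminant_read_fraction(cell_fraction, contaminant_genome_bp, sample_genome_bp):
    """Expected fraction of reads that come from the contaminant.

    cell_fraction is the contaminant's share of cells as a fraction
    (5% of cells -> 0.05). Assumes reads are drawn in proportion to
    cells x genome size.
    """
    contaminant_dna = cell_fraction * contaminant_genome_bp
    sample_dna = (1.0 - cell_fraction) * sample_genome_bp
    return contaminant_dna / (contaminant_dna + sample_dna)


# Placeholder genome sizes (assumptions for illustration only).
small_genome_bp = 2_000_000
large_genome_bp = 5_000_000

same_size = contaminant_read_fraction(0.05, large_genome_bp, large_genome_bp)
small_contaminant = contaminant_read_fraction(0.05, small_genome_bp, large_genome_bp)
print(f"equal genome sizes:  {same_size:.4f}")
print(f"smaller contaminant: {small_contaminant:.4f}")
```

With equal genome sizes the read fraction equals the cell fraction; a contaminant with the smaller genome falls below it, which is one reason a fixed cell concentration can be easier or harder to detect depending on the pair.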
Genomic Purity
Able to Detect Contaminants at less than 1% Cell Concentrations for Most Pairwise Comparisons
Genomic Purity
Conclusions
● Next-generation sequencing in conjunction with read classification algorithms can be used to assess sample purity
● Achieved
  ○ Genus-level classification specificity
  ○ Sensitivity ranging from 5% to 2.5 x 10^-4 %
● Future work includes further validation of the method using real mixtures
Genomic Purity
SNP Method Evaluation
Challenge: Defining confidence in sample identification
SNP Method Evaluation
Approach: Whole genome (SNP) sample identification
SNP Method Evaluation Requirements
1. Reference with known truth
  ○ Genomic DNA
  ○ Data
    ■ real vs. simulated
2. Performance metrics
3. Replicates for assessing uncertainty
  ○ multiple sequencing runs
  ○ bootstrap replicates
  ○ multiple reference genomes
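Of the replicate types listed above, bootstrap replicates are the easiest to sketch in code: resample the observed outcomes with replacement and look at the spread of the recomputed statistic. The example below is a minimal percentile-bootstrap sketch, not the tooling described in this talk. The function name, the per-site 0/1 outcomes, and the choice of the mean as the statistic are all assumptions made for illustration.

```python
import random


def bootstrap_interval(values, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = max(0, int(round((alpha / 2) * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-site outcomes: 1 = call agrees with truth, 0 = it does not.
outcomes = [1] * 18 + [0] * 2
low, high = bootstrap_interval(outcomes, mean)
print(f"observed agreement: {mean(outcomes):.2f}")
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
```

Resampling sites captures only sampling variability within one dataset; variation between sequencing runs or reference genomes would need the other replicate types in the list.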
SNP Method Evaluation
Truth Tables can be static or dynamic
SNP Method Evaluation
Truth Table Values Used to Calculate Performance Metrics
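The step from truth table values to performance metrics can be sketched as follows. The slides do not say which metrics were used, so the four below (sensitivity, specificity, precision, false discovery rate) are standard definitions chosen as an assumption, and the counts are invented for illustration.

```python
def performance_metrics(tp, fp, fn, tn):
    """Standard metrics from truth table counts.

    tp/fp/fn/tn are true positive, false positive, false negative and
    true negative counts. Undefined ratios are returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),           # true positive rate
        "specificity": ratio(tn, tn + fp),           # true negative rate
        "precision": ratio(tp, tp + fp),             # positive predictive value
        "false_discovery_rate": ratio(fp, tp + fp),  # 1 - precision
    }


# Invented counts for illustration only.
metrics = performance_metrics(tp=90, fp=5, fn=10, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For whole-genome SNP calling the true negative count is usually dominated by invariant sites, so specificity tends to sit near 1 and precision is often the more informative number.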
SNP Method Evaluation
Quality Score Algorithm
Conclusions
● Three requirements for evaluation
  ○ Reference with known truth
  ○ Performance metrics
  ○ Replicates for uncertainty assessment
● Working to develop tools for implementing these requirements
● Application of these requirements will help to establish confidence in SNP-based sample identification
SNP Method Evaluation
Genomic Reference Materials
Development of Microbial Reference Materials
Strains selected based on public health relevance and GC content
Genomic Reference Materials
Orthogonal Methods used to Characterize Genome Structure, Sequence, and Purity
Genomic Reference Materials
● Microbial genomic reference materials characterized
  ○ Genome Structure
  ○ Sequence
  ○ Purity
  ○ Stability
● Material and data will help validate pathogen detection assays as well as sequencing and bioinformatic workflows.
Genomic Reference Materials
Conclusions
Developing a measurement infrastructure to support genome-based characterization of microbial samples.
Microbial Genomics @ NIST
Summary
● Biosystems and Biomaterials Division
● Genome Scale Measurements
Genomic Purity
● Justin Zook
● Nancy Lin
Microbial Genomic Reference Materials
● Marc Salit
● Justin Zook
● Steven Lund
● Scott Jackson
● Marc Allard and others at FDA
● Heike Sichtig at the FDA
SNP Calling Method Evaluation
● NIST
  ○ Jayne Morrow
  ○ Justin Zook
  ○ Steven Lund
  ○ Nancy Lin
  ○ Marc Salit
● Northern Arizona University
  ○ Jim Schrup
  ○ Becky Coleman
  ○ Jason Sahl
  ○ Paul Keim
● University of New Hampshire
  ○ Jeff Foster
This work was supported by the Department of Homeland Security (DHS) Science and Technology Directorate under the Interagency Agreement HSHQPM-12-X-00078 with NIST and by two interagency agreements with the FDA.
Acknowledgements