Genomic Reference Materials @ The National Institute of Standards and Technology (NIST)
Scott A. Jackson, Group Leader
Complex Microbial Systems
11-20-2015
NIST: Who are we today?
“Industry’s National Laboratory”: partnering/serving industry to help maintain US leadership in science and technology products
Department of Commerce: developing standards to support international trade and commerce
The National Metrology Institute: working toward global harmonization and traceability to the SI
The FDA has also been active in addressing other regulatory issues surrounding personalized medicine. Along with authorizing the Illumina technology for marketing, the FDA recognized the need for reference materials and methods that would permit performance assessment. As a result, the FDA collaborated with the National Institute of Standards and Technology (NIST) to develop reference materials consisting of whole human genome DNA, together with the best possible sequence interpretation of such genomes.
The FDA based its decision to grant marketing authorization for the Illumina instrument platform and reagents on their demonstrated accuracy across numerous genomic segments, spanning 19 human chromosomes. Precision and reproducibility across instruments, users, days, and reagent lots were also demonstrated.
Justin Zook, Marc Salit
NIST Microbial Genomic Reference Materials
Microbial Genomic Reference Materials at NIST
Opinions expressed in this paper are the authors' and do not necessarily reflect the policies and views of NIST or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this paper only to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose. Official contribution of NIST; not subject to copyright in the USA.
Background Material Selection Characterization
Traceability:
How do I gain confidence that my result is correct?
Why do I care if my answer is right?
High Stakes Decisions
Material Selection and Acquisition
Strain Selection
RM Production
Produced by local vendor
For each strain:
• pure culture
• single batch of DNA
• ~1500 vials
• 3 μg per vial
RM Characterization
Experimental Design
* OpGen Optical Genome Mapping
Characterized Properties
• Genome Assembly
• Base Level Purity
• Genomic Contaminants
• DNA Stability
Genome Assembly
Chin et al. 2013
Assembly Validation: OpGen Optical Mapping
"Optical mapping" by Fong Chun Chan and Kendric Wang http://commons.wikimedia.org/wiki/File:Optical_mapping.jpg#/media/File:Optical_mapping.jpg
Genome Assembly Confirmation: OpGen Optical Mapping
High Confidence Assembly
High Confidence Assembly
*in progress
Characterized Properties
• Genome Assembly
• Base Level Purity
  ○ Single Base Homogeneity
  ○ Vial-to-vial Homogeneity
• Genomic Contaminants
• DNA Stability
Base Level Purity: Methods
Sequencing Reads
Calculate Purity
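The slide does not give the exact purity definition. A minimal sketch, assuming purity at a position is the fraction of aligned read bases that agree with the most common base there; `position_purity` and the example pileup column are hypothetical names and data, not part of the NIST pipeline.

```python
from collections import Counter

def position_purity(bases):
    """Fraction of read bases at one position that match the most common base.

    `bases` is a string of base calls from the reads covering the position
    (an assumed input format). Calls other than A, C, G, T are ignored.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return None  # no usable coverage at this position
    return max(counts.values()) / total

# Hypothetical pileup column: 98 reads support A, 2 reads support G
column = "A" * 98 + "G" * 2
print(position_purity(column))
```

A position covered only by ambiguous calls returns `None` rather than a purity value, so it can be excluded from downstream thresholds.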
Base Level Purity: Results MG001
● 19 positions out of 4.8 Mb have purity values less than 0.98 for both platforms
● 5 positions with purity less than 0.95
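One way to find positions that fall below a purity threshold on both platforms is sketched below. It assumes per-position purity values are already available as dictionaries keyed by genome position; the function name and the example values are hypothetical, and only the 0.98 and 0.95 thresholds come from the slide.

```python
def low_purity_both(purity_a, purity_b, threshold=0.98):
    """Positions whose purity is below `threshold` on both platforms."""
    shared = purity_a.keys() & purity_b.keys()
    return sorted(
        pos for pos in shared
        if purity_a[pos] < threshold and purity_b[pos] < threshold
    )

# Hypothetical per-position purity values from two sequencing platforms
platform_a = {101: 0.999, 2050: 0.97, 3300: 0.94, 4100: 0.975}
platform_b = {101: 0.998, 2050: 0.96, 3300: 0.93, 4100: 0.995}

print(low_purity_both(platform_a, platform_b))        # [2050, 3300]
print(low_purity_both(platform_a, platform_b, 0.95))  # [3300]
```

Requiring agreement between platforms filters out positions that look impure on only one platform, which are more likely to reflect platform-specific bias than real variation in the material.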
Base Level Purity: Results MG001
Base Level Purity: Conclusions
● Low genomic diversity within RM lot
  ○ 19 out of 4.8 Mb low purity (95-98%) values for both platforms
● Purity variability due to:
  ○ Platform specific biases
  ○ Run-to-run variability
  ○ Bioinformatic errors
  ○ NOT material vial-to-vial heterogeneity
Characterized Properties
• Genome Assembly
• Base Level Purity
• Genomic Contaminants
  ○ Presence of contaminant DNA
• DNA Stability
* Think metagenomic analysis of data from a pure isolate
Genomic Contaminants: Methods
Taxonomic Read Assignment
Hong et al. 2014 Microbiome
Genomic Contamination: Results MG001
Contaminants most likely NOT from RM
Genomic Contaminants: Conclusions
Likely contaminant sources
• Sequencing reagents
• Bioinformatic errors
Fit for purpose
• 99.995% minimum genomic purity
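A genomic purity figure of this kind can be illustrated with a short calculation. The sketch below assumes purity is the share of taxonomically assigned reads that belong to the expected genus; the slides do not state the exact formula, and the genus labels and read counts are hypothetical.

```python
def genomic_purity(read_assignments, target_genus):
    """Share of taxonomically assigned reads that belong to the target genus."""
    total = sum(read_assignments.values())
    if total == 0:
        raise ValueError("no assigned reads")
    return read_assignments.get(target_genus, 0) / total

# Hypothetical read counts per genus from a taxonomic read classifier
counts = {"GenusTarget": 1_999_950, "GenusX": 30, "GenusY": 20}

purity = genomic_purity(counts, "GenusTarget")
print(f"{purity:.4%}")
print(purity >= 0.99995)  # compare against the 99.995% figure on the slide
```

Because reads attributed to contaminants may come from reagents or classification errors rather than the material itself, a value computed this way is better read as a lower bound on purity.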
DNA Stability
DNA Stability: Methods
[Figure: gel images of DNA after 37℃ and 4℃ treatments. Lanes: Control, Ladder, 2 weeks, 8 weeks. Size markers: 194 kb, 48.5 kb, 6.5 kb.]
Automated Gel Image Processing
DNA Stability: Conclusions
● Stable at -20℃ and 4℃
● Don’t store your DNA at 37℃
Pipeline for Evaluating Prokaryotic References
Computational Reproducibility
• Code
• Environment
• Data
Computational Reproducibility
Bioinformatic Pipeline
https://github.com/usnistgov/pepr
https://hub.docker.com/r/natedolson/pepr/
https://hub.docker.com/r/natedolson/docker-pathoscope/
Data Analysis and Reporting
https://github.com/usnistgov/peprr
Code
Sequence data NIH SRA Bioproject PRJNA252728
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA252728
Conclusions
● Microbial genomic RMs characterized:
  ○ Genome Assembly
  ○ Base Level Purity
  ○ Genomic Contaminants
● RMs and associated data will help validate sequencing and bioinformatic processes.
● Reproducible and transparent characterization
Acknowledgements
Nate Olson
• Microbial Genomic Reference Materials
• Microbiologist
• Bioinformaticist - PEPR Pipeline
• UMD PhD Student
Acknowledgements
FDA:
○ Heike Sichtig
○ Marc Allard
○ Tim Muruvanda
○ Shashi Sharma
○ Nagarajan Thirunavukkarasu
NIST:
○ Nate Olson
○ Marc Salit
○ Justin Zook
○ Scott Jackson
○ Nancy Lin
○ Jenny McDaniel
○ Lindsay Vang
○ David Catoe
○ Steven Lund
This work was supported by the Department of Homeland Security (DHS) Science and Technology Directorate under the Interagency Agreement HSHQPM-12-X-00078 with NIST and by two interagency agreements with the FDA.
A Mixed Pathogen DNA Reference Material for NGS-Based Pathogen Detection
An Unbiased Approach for Pathogen Detection: Shotgun Metagenomics via Next-Gen Sequencing
Clinical Sample → Total DNA → Shotgun Library → Metagenomic Sequence Data → Bioinformatically Search for Pathogen-Specific Signatures
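The final search step can be pictured with a toy example. The sketch below is not the method used in this work: it assumes a "signature" is simply a known pathogen sequence and counts reads that share at least one exact k-mer with it. All sequences, names, and the choice of k are hypothetical, and it ignores reverse complements and sequencing errors, which real tools handle through alignment or more robust indexing.

```python
def kmers(seq, k):
    """Set of all k-length substrings of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def count_signature_hits(reads, signature, k=8):
    """Number of reads sharing at least one exact k-mer with the signature."""
    signature_kmers = kmers(signature, k)
    return sum(1 for read in reads if kmers(read, k) & signature_kmers)

# Hypothetical pathogen-specific signature and metagenomic reads
signature = "ATGGCTAGCTAGGATCCGATCGTA"
reads = [
    "TTTTGCTAGCTAGGATTTTT",  # overlaps the signature
    "CCCCCCCCCCCCCCCCCCCC",  # unrelated
    "ggatccgatcgtaAAAA",     # overlaps the signature (lower case input)
]

print(count_signature_hits(reads, signature))  # 2
```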
There is a Need for Standards for NGS-Based Pathogen Detection Assays/Devices
• Industry, Academics, and Government Regulators (FDA) have expressed a need for standards for the purpose of validating biothreat/pathogen detection devices
• Primary users/adopters of these standards would be device developers and research laboratories who wish to assess analytical sensitivity, specificity and relative performance of their DNA sequence-based pathogen detection device/assay.
Pool
Source: Pathogen #1, Pathogen #2, Pathogen #3, Pathogen #4, Pathogen #5, Pathogen #6, Human DNA
Abundance*: 10^-1, 10^1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6
Abundance* is genome copy number relative to human reference DNA
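Abundance defined as relative genome copy number depends on both DNA mass and genome size. A minimal sketch of the arithmetic, assuming the common approximation of about 650 g/mol per double-stranded base pair; the genome sizes, masses, and function names below are hypothetical and are not the values used to prepare the pool.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one double-stranded base pair

def genome_copies(mass_ng, genome_length_bp):
    """Approximate number of genome copies in a given mass of double-stranded DNA."""
    return mass_ng * 1e-9 * AVOGADRO / (genome_length_bp * BP_MASS_G_PER_MOL)

def mass_for_relative_abundance(target_ratio, human_mass_ng,
                                human_genome_bp, pathogen_genome_bp):
    """Pathogen DNA mass (ng) giving `target_ratio` pathogen genome copies
    per human genome copy."""
    return target_ratio * human_mass_ng * pathogen_genome_bp / human_genome_bp

# Hypothetical inputs: 1000 ng human DNA, 3.1 Gb human genome, 4.8 Mb pathogen genome
human_mass_ng = 1000.0
human_bp = 3.1e9
pathogen_bp = 4.8e6

pathogen_mass_ng = mass_for_relative_abundance(1e-3, human_mass_ng, human_bp, pathogen_bp)
ratio = genome_copies(pathogen_mass_ng, pathogen_bp) / genome_copies(human_mass_ng, human_bp)
print(pathogen_mass_ng, ratio)
```

Because a bacterial genome is far smaller than the human genome, equal copy numbers correspond to very unequal DNA masses, which is why the ratio is expressed in copies rather than in mass.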
[Figure: Relative Genome Abundance, axis from 1.0E-07 to 1.0E+01. Series: Expected Copy Number, qPCR Copy Number, NGS Copy Number.]
Targeted (PCR)-Based Detection and NGS-Based Metagenomic Detection
Quantitative and Qualitative
NIST-FDA Pathogen Detection Workshop
Questions?
External RNA Controls
• RNA Spike-Ins to provide confidence in gene expression experiments
• Serves both Microarray-based and RNA-Seq (NGS)-Based Technologies
NIST Standard Reference Material (SRM) 2374: DNA Sequence Library for External RNA Controls
Marc Salit, Sarah Munro
The ERCC Standard Reference Material has Been Widely Adopted
Sarah Munro, Scott Pine, Marc Salit
GC Bias
Short Read Data: Assembly Validation
Walker et al. 2014
http://www.broadinstitute.org/software/pilon/
Results MG001
Optical Mapping: Assembly Validation
Characterized Properties
• Genome Assembly
  ○ Overall chromosome structure
• Base Level Purity
• Genomic Contaminants
• DNA Stability
Assembly Validation
Conclusions:
● Closed genome assembly from long read data
● Assembly confirmation with orthogonal methods
● Evaluation of candidate errors may require additional analysis
Characterized Properties
• Genome Assembly
• Base Level Purity
  ○ Strain Diversity: within lot
  ○ Homogeneity: vial-to-vial
• Genomic Contaminants
• DNA Stability
Two-Pronged Approach
1. NIST microbial genomic RMs
2. Methods allowing USERS to evaluate in-house materials
Shotgun Genome Sequencing
Modified From Loman et al. 2012 Nature Reviews Microbiology 10(9)