Whole genome sequencing (WGS) - there’s a new tool in town
Henrik Hasman DTU - Food
Welcome to the NGS world
TODAY
Welcome
Introduction to Next Generation Sequencing
DNA purification (Hands-on)
Lunch (Sandwiches – Hands-on)
DNA quantification for NGS (Hands-on)
Coffee… and… Library preparation (“at the movies”)
Running the MiSeq (Show’n tell)
Computer exercises (Hands-on)
Goodbye
EU RL workshop on WGS
Advanced NGS on antimicrobial resistant bacteria
This one-day training course combines hands-on and theoretical teaching focused on NGS. It aims to introduce relevant tools so that attendees can prepare for the challenges of genomic techniques. You will learn how to prepare genomic DNA from a bacterial culture for DNA sequencing and receive a theoretical introduction to NGS, including library preparation and running the MiSeq DNA sequencer. Finally, you will perform an exercise on the analysis of NGS data for species identification, antimicrobial resistance gene detection and plasmid typing.
DNA sequencing
Applied Biosystems (ABI) Genetic analyser “First Generation” Sequencing machine (capillary Sanger sequencing)
Second generation sequencing
Illumina HiSeq/GAII systems High throughput systems
454 Life Sciences (Roche) First Next Generation Sequencing machine
Illumina MiSeq system Medium throughput system
Ion Torrent PGM system Low/medium throughput system
Second generation sequencing machines
Workflow today at the clinical laboratory
Workflow with WGS at the clinical laboratory
Didelot et al, 2012.
Rough assembly and compression
Raw DNA sequences
Gene finding Comparison
Identification
Fine assembly
What is already known? Pathogenicity islands Virulence genes Resistance genes MLST type
Google maps like view
• Reports Outbreaks
Summary of: What it is? Has it been seen before? How we can fight/treat? What is new/unusual?
Server side
Client side
What is novel? Vaccine targets Virulence genes Resistance genes SNPs
Illumina MiSeq system Medium throughput system
MiSeq Workflow
Analysis tools
DNA purification
Gram +
DNA barcoding
Library
Tutorial on MiSeq workflow
MiSeq Sequencing Chemistry: ca. 20 min http://support.illumina.com/training/courses/MiSeq_Sequencing_Chemistry/index.html?iframe
DNA purification EasyDNA from Invitrogen
Qubit DNA quantification
http://www.youtube.com/watch?v=6HtnVUHMX_8
Questions?
Then “to the dungeons”
MiSeq Workflow
Analysis tools
DNA purification
Gram +
DNA barcoding
Library
Normal Illumina workflow
Video on Sample preparation
http://www.youtube.com/watch?v=fs1A_Ik7Smo
Simplified protocol
Nextera XT sample prep video
http://www.youtube.com/watch?v=ectVoRJ-6HU
Manual: http://supportres.illumina.com/documents/myillumina/900851dc-01cf-4b70-9e95-d590531c5bd4/nextera_xt_sample_preparation_guide_15031942_c.pdf
Nextera XT tutorial
Nextera DNA Sample Prep. Kit: ca. 20 min http://support.illumina.com/training/courses/Nextera_Sample_Prep_Kits/index.html?iframe
Nextera XT library workflow
Adapters added by PCR
Index (barcode)
Multiplexing DNA samples
Illumina MiSeq system Medium throughput system
Multiplexing 18 E. coli
24 S. aureus
Multiplexing with Nextera XT
Library building
Library preparation movies
http://support.illumina.com/training/sequencing_training.ilmn (Login might be required)
How many bacteria in a library?
• 16-18 genomes of around 5-6 Mb: E. coli, Klebsiella, Salmonella
• 24 genomes of around 3 Mb: Enterococcus, Staphylococcus, Campylobacter
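The numbers above follow from dividing the output of one run between the genomes in the library. A minimal sketch of that arithmetic is given here. The run yield of 5 Gb and the target coverage of 50x are placeholder assumptions chosen for illustration only; they are not taken from the slides, so substitute the values that apply to your own kit and run.

```python
def expected_coverage(run_yield_bases: int, n_samples: int, genome_size_bases: int) -> float:
    """Average fold coverage per genome if the run yield is shared evenly."""
    if n_samples <= 0 or genome_size_bases <= 0:
        raise ValueError("n_samples and genome_size_bases must be positive")
    return run_yield_bases / (n_samples * genome_size_bases)


def max_samples(run_yield_bases: int, genome_size_bases: int, target_coverage: int) -> int:
    """Largest number of genomes that still reaches the target coverage."""
    return run_yield_bases // (genome_size_bases * target_coverage)


# ASSUMPTION: hypothetical run yield, not a figure from the workshop material.
ASSUMED_RUN_YIELD = 5_000_000_000

examples = [
    ("18 genomes of 5.5 Mb", 18, 5_500_000),
    ("24 genomes of 3 Mb", 24, 3_000_000),
]
for label, n_samples, genome_size in examples:
    cov = expected_coverage(ASSUMED_RUN_YIELD, n_samples, genome_size)
    print(f"{label}: about {cov:.0f}x coverage per genome")
```

In practice the number of samples is also capped by the number of available index (barcode) combinations, which this sketch does not model.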
The MiSeq principle
http://www.youtube.com/watch?v=l99aKKHcxC4
To the MiSeq
Illumina MiSeq system Medium throughput system
NGS output
Huge numbers of small fragments (35-500 bp)
Data analysis of WGS data
Assembled data
Raw reads Single end Paired end
De novo Assembly
Commercial software
Web tool
http://cge.cbs.dtu.dk/services/Assembler/
Reference vs. de novo assembly
Known genome
Reference assembly
De novo assembly Smaller fragments (Unknown order)
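The difference between the two approaches can be illustrated with a toy example. This is only a sketch on error-free reads: real assemblers work with sequencing errors, repeats and millions of reads, and use far more sophisticated algorithms. The reference sequence, the reads and all function names are made up for the illustration.

```python
def place_on_reference(reference: str, reads: list) -> list:
    """Reference assembly (toy): locate each read on a known genome by exact match."""
    placed = []
    for read in reads:
        pos = reference.find(read)
        if pos != -1:
            placed.append((pos, read))
    return sorted(placed)


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """De novo assembly (toy): repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs have unknown order
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up data for illustration.
reference = "ATGGCGTACGTTAGC"
reads = ["GTTAGC", "ATGGCGTA", "GTACGTTA"]

print(place_on_reference(reference, reads))  # reads ordered along the known genome
print(greedy_assemble(reads))                # contigs built without any reference
```

With a known genome the order of the reads comes from the reference; without one, the order has to be inferred from the overlaps between the fragments themselves.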
Reference vs. de novo assembly
Data analysis of WGS data
Assembled data
Species confirmation
Resistance genes
MLST
Virulence genes
Plasmids
Epidemiological markers
MLST
NGS: Illumina, PacBio, 454, ...
Resistance genes
Virulence genes
Allele 1, Allele 2, Allele 3, Allele 4, Allele 5 } ST
Resistance gene profile
Assembly pipeline
List of genes (100% or >95%) Theoretical resistance phenotype
Species ID Outbreak strain SNP* based typing
*SNP – Single Nucleotide Polymorphism (extreme MLST)
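The step from a set of allele numbers to a sequence type (ST) is a table lookup, which can be sketched as below. Everything in this example is hypothetical: the locus names, the allele numbers and the ST values are invented for illustration, and the five loci simply mirror the five alleles drawn on the slide. Real schemes are maintained in curated databases.

```python
from typing import Dict, Optional

# HYPOTHETICAL scheme: locus names, allele numbers and STs are invented.
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5")
ST_TABLE = {
    (1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3): 2,
    (4, 2, 7, 1, 3): 3,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for an allele profile, or None if the combination is not in the table."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return ST_TABLE.get(profile)


isolate = {"locus1": 4, "locus2": 2, "locus3": 7, "locus4": 1, "locus5": 3}
print(sequence_type(isolate))

# A profile that is not in the table would represent a new, unassigned combination.
novel = dict(isolate, locus5=9)
print(sequence_type(novel))
```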
1G bases 2-6 Mb
AAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAA AAAATAAAAAAAAAAA AATAAATAATAATAAA
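The four sequences above can be compared position by position to count SNP differences. The sketch below assumes the sequences are already aligned and of equal length; the names seq1 to seq4 are labels added here for convenience, and positions are reported 0-based.

```python
from itertools import combinations


def snp_positions(a: str, b: str) -> list:
    """0-based positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def pairwise_snp_distances(seqs: dict) -> dict:
    """Number of SNP differences for every pair of named sequences."""
    return {
        (name1, name2): len(snp_positions(s1, s2))
        for (name1, s1), (name2, s2) in combinations(seqs.items(), 2)
    }


# The four 16-base sequences from the slide.
sequences = {
    "seq1": "AAAAAAAAAAAAAAAA",
    "seq2": "AAAAAAAAAAAAAAAA",
    "seq3": "AAAATAAAAAAAAAAA",
    "seq4": "AATAAATAATAATAAA",
}

for pair, distance in pairwise_snp_distances(sequences).items():
    print(pair, distance)
```

Identical sequences have distance 0, while more distantly related isolates accumulate more differences; a SNP tree is built from distances of this kind.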
Data analysis of WGS data
Assembled data
http://cge.cbs.dtu.dk/services/all.php
Data analysis of WGS data
Assembled data
Species confirmation
KmerFinder
Description: Breaks the genome into small 16-mers (k=16) and scans a database of complete genomes for the best match.
Output: A sorted list of the number of best-matching 16-mers in a given complete genome (hits in sequence).

SpeciesFinder
Description: Identifies the 16S rRNA gene in the genome and compares it to a database of 16S sequences.
Output: The best hit. A “TRUE” value means a perfect hit; a “FAIL” value means a close match.
http://cge.cbs.dtu.dk/services/SpeciesFinder/ http://cge.cbs.dtu.dk/services/KmerFinder/
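The k-mer idea can be sketched in a few lines. This is a toy illustration of counting shared k-mers and sorting the references by score, not the KmerFinder implementation; the sequences, the species names and the small k used in the example are made up so that the example fits on a slide.

```python
def kmers(sequence: str, k: int = 16) -> set:
    """All distinct substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_references(query: str, references: dict, k: int = 16) -> list:
    """Sort reference genomes by the number of k-mers they share with the query."""
    query_kmers = kmers(query, k)
    scores = {
        name: len(query_kmers & kmers(genome, k))
        for name, genome in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy database; real genomes are millions of bases long.
references = {
    "species_A": "ATGGCGTACGTTAGCATCGA",
    "species_B": "TTTTCCCCGGGGAAAATTTT",
}
query = "GGCGTACGTTAGC"

# k=4 only because the toy sequences are short; the slide describes k=16.
for name, shared in rank_references(query, references, k=4):
    print(name, shared)
```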
Example: KmerFinder
Example: ResFinder
VTEC O104:H4 outbreak strain
For publication in: Journal of Antimicrobial Chemotherapy
Genotyping using whole-genome sequencing is a realistic alternative to surveillance based on phenotypic antimicrobial susceptibility testing
Ea Zankari, Henrik Hasman, Rolf Sommer Kaas, Anne Mette Seyfarth, Yvonne Agersø, Ole Lund, Mette Voldby Larsen, Frank M. Aarestrup
200 isolates – reduced to 197 (Salmonella, E. coli, E. faecium, E. faecalis) 3,051 individual susceptibility tests
Table 2. Overview of resistance genes detected in the isolates by ResFinder with an ID ≥ 98.0%
No. of isolates (%)*
Resistance gene S. Typhimurium (n = 49) E. coli (n = 48) E. faecalis (n = 50) E. faecium (n = 50)
Aminoglycoside str 0 (0) 0 (0) 5 (10.0) 1 (2.0)
ant(6)-Ia 0 (0) 0 (0) 18 (36.0) 18 (36.0)
ant(6')-Ii 0 (0) 0 (0) 0 (0) 49 (98.0)
aph(3')-Ia 2 (4.1) 2 (4.2) 0 (0) 0 (0)
aph(3')-Ic 2 (4.1) 0 (0) 0 (0) 0 (0)
aph(3')-III 0 (0) 0 (0) 17 (34.0) 10 (20.0)
aac(6')-aph(2'') 0 (0) 0 (0) 10 (20.0) 0 (0)
strA/strB 19 (38.8) 10 (20.8) 0 (0) 0 (0)
aadA1 5 (10.2) 19 (39.6) 0 (0) 0 (0)
aadA2 2 (4.1) 4 (8.3) 0 (0) 0 (0)
aadA4 0 (0) 1 (2.1) 0 (0) 0 (0)
aadA13 1 (2.0) 0 (0) 0 (0) 0 (0)
Beta-lactam pbp5 0 (0) 0 (0) 0 (0) 49 (98.0)
blaTEM-1 21 (42.9) 12 (25.0) 0 (0) 0 (0)
blaTEM-117 0 (0) 1 (2.1) 0 (0) 0 (0)
blaCTX-M-14 0 (0) 1 (2.1) 0 (0) 0 (0)
blaCARB-2 2 (4.1) 0 (0) 0 (0) 0 (0)
MLS erm(B) 0 (0) 1 (2.1) 25 (50.0) 15 (30.0)
lsa(A) 0 (0) 0 (0) 50 (100.0) 0 (0)
lnu(B) 0 (0) 0 (0) 11 (22.0) 15 (30.0)
msr(C) 0 (0) 0 (0) 0 (0) 44 (88.0)
mph(A) 0 (0) 1 (2.1) 0 (0) 0 (0)
Phenicol catA1 0 (0) 2 (4.2) 0 (0) 0 (0)
floR 2 (4.1) 0 (0) 0 (0) 0 (0)
cmlA1 0 (0) 3 (6.3) 0 (0) 0 (0)
cat(pC194) 0 (0) 0 (0) 0 (0) 1 (2.0)
Sulphonamide sul1 9 (18.4) 8 (16.7) 0 (0) 0 (0)
sul2 20 (40.8) 7 (14.6) 0 (0) 0 (0)
sul3 0 (0) 3 (6.3) 0 (0) 0 (0)
Tetracycline tet(A) 1 (2.0) 11 (22.9) 0 (0) 0 (0)
tet(B) 19 (38.8) 4 (8.3) 0 (0) 0 (0)
tet(G) 2 (4.1) 0 (0) 0 (0) 0 (0)
tet(M) 0 (0) 0 (0) 34 (68.0) 27 (54.0)
tet(L) 0 (0) 0 (0) 24 (48.0) 5 (10.0)
tet(S) 0 (0) 0 (0) 0 (0) 1 (2.0)
tet(O) 0 (0) 0 (0) 0 (0) 1 (2.0)
Trimethoprim dfrA1 1 (2.0) 9 (18.8) 0 (0) 0 (0)
dfrA12 0 (0) 2 (4.2) 0 (0) 0 (0)
dfrA14 1 (2.0) 0 (0) 0 (0) 0 (0)
dfrA21 0 (0) 1 (2.1) 0 (0) 0 (0)
dfrD 0 (0) 0 (0) 0 (0) 1 (2.0)
dfrG 0 (0) 0 (0) 17 (34.0) 0 (0)
Glycopeptide Van-A 0 (0) 0 (0) 0 (0) 1 (2.0)
MLS, macrolide-lincosamide-streptogramin B. *The percentage of isolates carrying each resistance gene was determined by dividing the number of isolates harbouring the gene by the total number of isolates (per species).
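The percentages in Table 2 can be recomputed from the counts with the rule given in the footnote. A short worked example, using three entries from the table (the function name is chosen here for illustration):

```python
def percent_with_gene(n_isolates_with_gene: int, n_isolates_total: int) -> float:
    """Share of isolates of one species that carry the gene, in per cent (one decimal)."""
    return round(100 * n_isolates_with_gene / n_isolates_total, 1)


# Counts taken from Table 2.
print("blaTEM-1, S. Typhimurium:", percent_with_gene(21, 49))
print("blaTEM-1, E. coli:", percent_with_gene(12, 48))
print("aadA1, E. coli:", percent_with_gene(19, 48))
```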
Gene blaTEM-1 blaCTX-M-14 blaCARB-2
Salmonella 21 (42.9%) 0 (0%) 1 (2.1%)
E. coli 12 (25.0%) 1 (2.1%) 2 (4.1%)
First test: 99.2% concordance
                        Phenotypic resistant    Phenotypic susceptible
Predicted resistant     475                     7
Predicted susceptible   16                      2553

After retest: 99.8% concordance
                        Phenotypic resistant    Phenotypic susceptible
Predicted resistant     475                     7
Predicted susceptible   0                       2569

Spectinomycin in E. coli
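The concordance figures can be checked directly from the four cell counts: concordance is the share of tests in which the predicted and the phenotypic result agree. A minimal sketch, with argument names chosen here for illustration:

```python
def concordance(both_resistant: int, discordant_a: int, discordant_b: int, both_susceptible: int) -> float:
    """Percentage of tests where the genotypic prediction and the phenotype agree."""
    total = both_resistant + discordant_a + discordant_b + both_susceptible
    return 100 * (both_resistant + both_susceptible) / total


# Counts from the two 2x2 tables on the slide (3,051 tests in total).
first_test = concordance(475, 7, 16, 2553)
after_retest = concordance(475, 7, 0, 2569)

print(f"First test:   {first_test:.1f}%")
print(f"After retest: {after_retest:.1f}%")
```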
Example: VirulenceFinder
Plasmid markers
Gram negative plasmids
Gram positive plasmids
Identity thresholds: 100%, 98%, 95%, 90%, 85%
Input types: assembled genome/contigs; 454 (single end or paired end reads); Illumina (single end or paired end reads); Ion Torrent; SOLiD (single end, paired end or mate pair reads)
incF plasmid
PlasmidFinder
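The effect of the identity threshold can be sketched as follows. This is an illustration of the principle only, not how the web tool computes its hits: the marker names and identity values are hypothetical, and identity is calculated here by simple position-wise comparison of two equal-length sequences.

```python
def percent_identity(hit: str, reference: str) -> float:
    """Position-wise identity of two equal-length sequences, in per cent."""
    if len(hit) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(hit, reference))
    return 100 * matches / len(reference)


def filter_hits(hits: dict, threshold: float) -> list:
    """Names of the markers whose identity is at or above the threshold."""
    return [name for name, identity in hits.items() if identity >= threshold]


# HYPOTHETICAL hits: marker names and identities are invented for illustration.
hits = {"marker_A": 100.0, "marker_B": 96.5, "marker_C": 88.0}

for threshold in (100, 98, 95, 90, 85):
    print(f"{threshold}%:", filter_hits(hits, threshold))
```

A lower threshold reports more distant variants of a marker, at the cost of more uncertain matches.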
Workflow with WGS at the clinical laboratory
Modified from Didelot et al., 2012.
4-6 hours
E. coli in Urine samples A: ATCC 8739 reference
_d = Direct sequencing on urine _i = sequencing of isolate from urine
SNP tree (figure); tip labels: ST409, ST409, ST597, ST597, ST227, ST227
Strain KmerFinder SpeciesFinder ResFinder VirulenceFinder PlasmidFinder
C751
24_26
2007-1-12488
E64
Skejby2
https://dl.dropboxusercontent.com/u/51020933/EURL.zip