NGS: current and future platforms eLibrary by author © ESCMID

transcript

11/13/18

NGS: current and future platforms

Natacha Couto

Department of Medical Microbiology

University Medical Center Groningen, RUG

Groningen, 21-24th October 2018

Disclosure slide for speaker at further training events

(Potential) conflict of interest None

Potentially relevant company relationships in connection with

Nanopore, Qiagen– no personal benefits

Sponsorship or research funding Nanopore, Qiagen, Roche


Lab design

Sample collection → DNA/RNA extraction → Library preparation → Sequencing → Data analysis → Results interpretation

Different types of samples (tissues, fluids, etc.).

Human DNA/RNA removal. Microbial post-enrichment.

Fragmentation. Random pre-amplification. Target enrichment.

Sequencing equipment. Read length.

Taxonomy. ARG detection. Phylogeny.

Sensitivity. PPV. Limits of detection.

Where: at the Microbiology lab, not in our core facility

• Need for Speed

• Infectious samples

• Customized protocols

• Building experience and capacity

• Gaining knowledge

Ion Torrent MiSeq 1 MiSeq 2 NextSeq

4x MinION 1x MinIT

NGS in the Medical Microbiology laboratory


How do I choose the best platform for my metagenomics experiments? Million dollar question…

How do I choose the best platform for my metagenomics experiments?

• It depends on what you want to do!

• Where are you performing your sequencing? In a well-equipped diagnostic lab or in remote areas?

• Do you want to sequence DNA or RNA metagenomes? Or both?

• Do you want to perform taxonomic classification only or do you wish to obtain

whole genomes with a high accuracy?

• Do you want to detect low frequency variants (e.g. viruses)?

• What is the time available for an answer?

• Do you want to look at the host-response?


Three major sequencing categories

•Sequencing by synthesis

•Sequencing by ligation

•Nanopore sequencing

Sequencing by synthesis


Sequencing by synthesis

The process of sequencing DNA using a DNA polymerase and a primer to create a new complementary DNA strand (the same natural process by which DNA is copied before cell division).

Sequencing platforms

Manufacturer                      Amplification     Detection     Chemistry
Sanger                            Single molecule   Optical       Sequencing by synthesis
Roche 454 (GS FLX)                Clonal            Optical       Sequencing by synthesis
Helicos BioSciences (Heliscope)   Single molecule   Optical       Sequencing by synthesis
Ion Torrent (ThermoFisher)        Clonal            Solid state   Sequencing by synthesis
Illumina                          Clonal            Optical       Sequencing by synthesis
PacBio                            Single molecule   Optical       Sequencing by synthesis
GeneReader (Qiagen)               Clonal            Optical       Sequencing by synthesis
GenapSys (GENIUS)                 Clonal            Solid state   Sequencing by synthesis

Levy and Myers. Annu. Rev. Genomics Hum. Genet. 2016, 17: 95-115.


1st Step – Attachment to the flow cell

Adapter

DNA fragment

Dense lawn of primers

Adapter

Adapters

Prepare genomic DNA sample

Randomly fragment genomic DNA and ligate adapters to both ends of the fragments.

Attach DNA to surface

Bind single-stranded fragments randomly to the inside surface of the flow cell channels.

Mardis. Annu. Rev. Genomics Hum. Genet. 2008, 9: 387-402.


2nd Step – Cluster generation

Attached

Bridge amplification

Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.

Denature the double-stranded molecules

Nucleotides

Optimal cluster densities

Illumina Technical Note – Optimizing Cluster Density on Illumina Sequencing Systems.


Clustering

Blue boxes: raw cluster density range.

Green boxes: %PF cluster density range.

Red line: median cluster density values.

Clustering

• Cluster density is critical because it influences run quality, reads

passing filter, Q30 scores, and total data output.

• Underclustering maintains high data quality but leads to lower data output.

• Overclustering leads to poor run performance, lower Q30 scores, possible introduction of sequencing artifacts and, in the end, lower

quality data output.


How does overclustering affect sequencing data?

• Overclustering acts on sequencing data in the following ways:

• Lower Q30 Scores—Due to overloaded signal intensities, the ratio of base

intensity to background for each base is decreased. This decrease often results in ambiguity during base calling, and leads to a decrease in data quality.

• Lower Clusters Passing Filter—The percentage of clusters passing filter (%PF) is an indication of signal purity from each cluster. Overclustered flow cells typically have higher numbers of overlapping clusters. This leads to poor template generation, which then causes a decrease in the %PF metric.

How does overclustering affect sequencing data?

• Lower Data Output—Reduced yield (gigabases [Gb] per flow cell) is a by product of lower %PF.

• Inaccurate Demultiplexing—Index reads usually have low diversity by design, which can lead to poor base calling. Overclustering exacerbates the potential

for poor base calling, which in turn, can lead to demultiplexing failures.

• Complete Run Failure—In cases of extreme overclustering, focusing can fail and the run may terminate at any cycle.

Illumina Technical Note – Optimizing Cluster Density on Illumina Sequencing Systems.


MetaNet

• Metagenomics for clinical microbiology

• Organize ring test and proficiency testing trials (EQA, QC)

• Develop or improve databases for pathogens, host genes and known pathogen-host relations

MetaNet

1st Ring Trial

• For this first test run, 1 positive control, 1 negative control, and 6

mock samples were extracted in Salt Lake City (RNA and DNA).

• Extracted nucleic acids were shipped to Groningen and then sent to Copenhagen, Münster, and Tübingen.

MetaNet


Run QC

A. Cluster density and total yield.

B. Proportion of clusters passing filter (CI PF), bases ≥Q30 (Q30), and the error rates based on the phiX control (Error Rate).

MetaNet study - kindly provided by Robert Schlaberg.


Organism detection in mock samples

MetaNet study - kindly provided by Robert Schlaberg.

First chemistry cycle: determine first base

To initiate the first sequencing cycle, add all four labeled reversible terminators, primers, and DNA polymerase enzyme to the flow cell.

Image of first chemistry cycle

After laser excitation, capture the image of emitted fluorescence from each cluster on the flow cell. Record the identity of the first base for each cluster.

Sequence read over multiple chemistry cycles

Repeat cycles of sequencing to determine the sequence of bases in a given fragment a single base at a time.

Before initiating the next chemistry cycle

The blocked 3' terminus and the fluorophore from each incorporated base are removed.

GCTGA...

• After clustering is completed, all of the

reverse strands are washed off the flow cell,

leaving only forward strands.

• Primers attach to the forward strands, and the DNA polymerase adds fluorescently tagged nucleotides to the growing DNA strand. Only one base is added per round.

• A reversible terminator is on every nucleotide

to prevent multiple additions in one round.

• Using color chemistry, each of the four bases has a unique emission, and after each round,

the machine records which base was added.
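The cycle described in the bullets above can be sketched in a few lines of Python. This is only a toy model, not how any instrument software works: the function name `sequence_by_synthesis` is hypothetical, the template is read in the order the polymerase traverses it, and strand orientation is ignored.

```python
# Toy model of sequencing by synthesis with reversible terminators.
# Assumption: one error-free incorporation per cycle; names are illustrative only.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Return the bases recorded after `cycles` rounds, one base per round."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # The reversible terminator allows exactly one incorporation per round.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging step: the label identifies which base was just added.
        read.append(incorporated)
        # Before the next round, the 3' block and the fluorophore are removed.
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_synthesis("CGACT", 5))
```

With the template `CGACT` the recorded read is `GCTGA`, the example read shown on the slide.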

3rd Step – Synthesis of DNA


Optical detection

• In four-channel SBS, bases are identified using four different

fluorescent dyes for each base and four images per sequencing cycle.

• With four-channel sequencing, every sequencing cycle requires four dyes and four images to determine the DNA sequence.

• The MiSeq™ and HiSeq™ systems use four-channel chemistry.

Illumina Technical Note – Illumina CMOS Chip and One-Channel SBS Chemistry.


Optical detection

• Around 2014, Illumina changed from a 4-channel SBS method to a 2-channel SBS method.

• Images are taken using red and green filter bands (2 images per cycle, instead of 4).

• This accelerated sequencing and data processing times, while delivering the same quality and accuracy.

• The MiniSeq™, NextSeq™, and NovaSeq™ systems use two-channel chemistry.

Illumina Technical Note – Illumina CMOS Chip and One-Channel SBS Chemistry.

Thymines are labeled with a green fluorophore, cytosines are labeled with a red fluorophore, and adenines are labeled with both red and green fluorophores. Guanines are permanently dark (unlabeled).
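The two-channel colour code above maps directly onto a small lookup. The sketch below assumes idealised, noise-free red and green signals and treats guanine as the base that gives no signal in either image; the function name `call_base` is hypothetical.

```python
# Minimal sketch of two-channel base calling.
# Assumption: each cluster yields a clean on/off signal in the red and green images.

def call_base(red: bool, green: bool) -> str:
    """Map the red/green signal pair of one cluster to a base."""
    if red and green:
        return "A"  # adenine carries both fluorophores
    if red:
        return "C"  # cytosine: red only
    if green:
        return "T"  # thymine: green only
    return "G"      # no signal in either image


def call_read(signals: list[tuple[bool, bool]]) -> str:
    """Call one base per cycle from (red, green) pairs."""
    return "".join(call_base(red, green) for red, green in signals)


if __name__ == "__main__":
    cycles = [(False, False), (True, False), (False, True), (False, False), (True, True)]
    print(call_read(cycles))
```

Two images per cycle are enough because two on/off signals give four combinations, one per base.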


Optical detection

• In 2017, Illumina combined SBS chemistry with CMOS technology to

deliver one-channel sequencing chemistry.

• Using a CMOS sensor embedded in the consumable is a simple and fast detection method.

• The iSeq™ 100 system uses one-channel chemistry.


Experiment 1

• It has been shown that different Illumina sequencers may have

different sequencing errors that might influence the final sequences.

• Therefore, here we evaluated the use of shotgun metagenomics and bioinformatics analyses to type DENV directly from sera and plasma samples.

• To optimize the workflow, we evaluated the effect of: i) the DNase I treatment to decrease the human DNA background; ii) two different library preparation methods and iii) two sequencing platforms, on the sequence data quality.

Schirmer et al. BMC Bioinf. 2016, 17: 125; Lizarazo et al. Under review 2018.

MiSeq vs NextSeq

Table 2. Sequence quality of the 4 runs performed using two different library preparation kits and two sequencing platforms.

Platform  Library prep  Raw density (K/mm2)  %PF    %≥Q30  Total reads   Total reads (PF)  Yield (Gbp)
MiSeq     NXT           1,082 ± 38           86.12  82.96  40,332,330    34,734,340        5.45
MiSeq     TS            869 ± 16*            91.51  92.95  32,919,292    30,123,438        4.59
NextSeq   NXT           22 ± 4*              98.82  96.50  38,684,052    37,613,244        2.29
NextSeq   TS            179 ± 4              89.75  84.08  114,917,472   103,134,945       41.66

Abbreviations: Gbp, giga base pair; PF, passing filter; Q30, quality score with base call accuracy of 99.9% (1 incorrect base in 1000 base calls); NXT, Nextera XT library prep; TS, TruSeq v2 RNA library prep.

*Raw density was under the optimal range.

Lizarazo et al. Under review 2018.
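The Q30 definition in the table footnote follows from the standard Phred relation Q = -10 · log10(P), where P is the probability of an incorrect base call. A short sketch of the conversion is given below; the function names are illustrative and not taken from any sequencing software.

```python
import math


def error_probability(q: float) -> float:
    """Probability of a wrong base call for Phred quality score q."""
    return 10 ** (-q / 10)


def phred_quality(p: float) -> float:
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def accuracy_percent(q: float) -> float:
    """Base call accuracy in percent for Phred quality score q."""
    return 100 * (1 - error_probability(q))


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: 1 error in {round(1 / error_probability(q))} calls, "
              f"accuracy {accuracy_percent(q):.2f}%")
```

For Q30 this gives an error probability of 0.001, that is 1 incorrect base in 1000 base calls, in agreement with the footnote.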


MiSeq vs NextSeq

Table 3. Comparison of the effect of different library preparation methods and sequencing platforms in the proportion of DENV reads.

Platform  Library prep  Samples  Avg. number of reads  Avg. mapped  Avg. unmapped  Avg. mapped  Avg. proportion of DENV mapped*  Avg. coverage (fold)  Avg. assembled consensus length  Avg. mapped consensus length
MiSeq     NXT           5        2,792,515             2,042,971    749,544        469,950      63%                              5,603                 7,667                            10,683
MiSeq     TS            5        1,396,358             1,182,726    213,632        136,998      64%                              1,867                 10,202                           10,680
NextSeq   NXT           12       677,692               155,878      521,814        175,820      34%                              2,498                 4,960                            9,543
NextSeq   TS            12       6,179,416             4,054,813    2,124,603      1,547,456    73%                              21,327                10,347                           10,483

*Using the number of unmapped reads as denominator.

Abbreviations: bp, base pair; NXT, Nextera XT library prep; TS, TruSeq v2 RNA library prep.
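As a quick check of the footnote, the DENV proportions in Table 3 can be recomputed from the second "mapped" column divided by the "unmapped" column. The sketch below hard-codes those two columns; the dictionary layout and function name are illustrative, and the first two rows are taken to be the MiSeq runs, which is an assumption because that row label did not survive in the transcript.

```python
# (platform, library prep) -> (average unmapped reads, average reads mapped after that step)
# Assumption: the first two rows belong to the MiSeq.
TABLE_3 = {
    ("MiSeq", "NXT"): (749_544, 469_950),
    ("MiSeq", "TS"): (213_632, 136_998),
    ("NextSeq", "NXT"): (521_814, 175_820),
    ("NextSeq", "TS"): (2_124_603, 1_547_456),
}


def denv_proportion(unmapped: int, denv_mapped: int) -> float:
    """Percentage of DENV-mapped reads, with unmapped reads as denominator."""
    if unmapped <= 0:
        raise ValueError("unmapped read count must be positive")
    return 100.0 * denv_mapped / unmapped


if __name__ == "__main__":
    for (platform, prep), (unmapped, mapped) in TABLE_3.items():
        print(f"{platform} {prep}: {denv_proportion(unmapped, mapped):.0f}%")
```

Rounded to whole percentages, the four values reproduce the 63%, 64%, 34% and 73% reported in the table.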

Conclusions

• No significant differences were found between equipment.

• The NextSeq allowed the multiplexing of more samples.

• The cost per sample was slightly higher (86€) with the MiSeq, compared to the NextSeq (70€).

Advantages

• High accuracy

• High throughput (depending on the system)

Disadvantages

• Expensive platforms (depending on the system)

• Long run time

• Short read length

• Unable to detect large structural variations


Ion Torrent/Ion PGM Technology

• Each nucleotide floods the chip sequentially, one type at a time.

• When a nucleotide is added to the DNA template and is

then incorporated into a strand of DNA, a hydrogen ion (H+)

is released.

• The charge from that ion changes the pH of the solution,

which can be detected by the ion sensor.

• The sequencer calls the base, going directly from chemical

information to digital information.

www.thermofisher.com

• If the next nucleotide that floods the chip is not a match,

there will be no voltage change recorded and no base will

be called.

• If there are two identical bases on the DNA strand, the

voltage will be double, and the chip will record two identical

bases.
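The flow logic described above (no signal means no base, a doubled signal means two identical bases) can be sketched as follows. This is a simplified illustration: the flow order `TACG`, the normalised signal values and the function name `call_flows` are assumptions, and real base callers model signal decay and noise.

```python
# Toy flow-space base caller for pH-based sequencing.
# Assumption: signals are normalised so that 1.0 corresponds to one incorporated base.

def call_flows(signals: list[float], flow_order: str = "TACG") -> str:
    """Convert one signal per nucleotide flow into a base sequence."""
    read = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations:
        # 0 = no match, 1 = one base, 2 = two identical bases, and so on.
        count = max(0, round(signal))
        read.append(base * count)
    return "".join(read)


if __name__ == "__main__":
    # Flows: T, A, C, G, T, A
    print(call_flows([1.02, 0.05, 1.96, 0.0, 0.1, 1.1]))
```

Because homopolymer length is inferred from signal magnitude, a small measurement error on a long run of identical bases changes the called length, which is consistent with the homopolymer errors listed among the disadvantages.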

Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition

• The authors successfully sequenced 26 marine microbial genomes, and two

marine microbial metagenomes using the Ion Torrent platform on the Merchant

Yacht Hanse Explorer.

• Onboard sequence assembly, annotation, and analysis enabled the authors to investigate the role of the microbes in the coral reef ecology of these islands and atolls.

Lim et al. PeerJ 2014, 2:e520.

Advantages

• Very fast

• Highly scalable

• Different chips available

• Low cost

Disadvantages

• High sequencing error rate (namely in homopolymeric regions)


PacBio

1. Each nucleotide contains a specific fluorescent dye.

2. Once the correct nucleotide is added to the sequence by the DNA polymerase, it emits light at a

specific wavelength.

3. The detector inside the zero-mode waveguide (ZMW, nanophotonic visualization chamber), captures

the light emitted.

Study 1

• The objective was to investigate the utility of Pacific Biosciences

circular consensus sequencing (CCS) reads for metagenomic projects.

• They compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community.
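The idea behind a consensus read can be illustrated with a per-position majority vote over several passes of the same molecule. This is a deliberately simplified sketch, not the PacBio algorithm: it assumes the passes are already aligned and of equal length, whereas real CCS generation works on raw subreads with alignment and a probabilistic error model. The function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


if __name__ == "__main__":
    passes = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC", "ACGTCC"]
    print(majority_consensus(passes))
```

In this example each pass except the third contains one error, yet the vote recovers `ACGTAC`, because random errors rarely hit the same position in most passes.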


www.nature.com/scientificreports

Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

J. A. Frank, Y. Pan, A. Tooming-Klunderud, V. G. H. Eijsink, A. C. McHardy, A. J. Nederbragt & P. B. Pope

[...] function within microbial communities. Here we investigate the utility of Pacific Biosciences long and [...] SMRT cells produced approximately [...] Mb of CCS reads from a biogas reactor microbiome sample that averaged [...] nt in length and [...]% accuracy. CCS data assembly generated a comparative number of large contigs greater than [...] kb, to those assembled from a ~[...]x larger HiSeq dataset (~[...] Gb) produced from the same sample (i.e. approximately [...]% of total contigs). Hybrid assemblies produced significant enhancements in taxonomic binning and genome reconstruction of two dominant [...]


Frank et al. Sci Rep 2016, 6: 25373.


Improved metagenome assemblies and

taxonomic binning using long-read circular

consensus sequence data

• The proportion of total DNA binned in

the major phyla (A) represented in

the Link_ADI microbiome was similar

for both PacBio CCS and HiSeq contigs.

• Differences between the sequencing

methods were evident at a species

level where some abundant species

assembled and binned better with PacBio, whereas others produced

better results with HiSeq data (B).

• The incorporation of PacBio data

produced significant enhancements

in taxonomic binning and genome reconstruction of two dominant

phylotypes, which assembled and

binned poorly using HiSeq data alone.

Frank et al. Sci Rep 2016, 6: 25373.

Advantages

• Long reads

• Enables detection of structural variants (e.g. gene duplications)

• Very fast (real time sequencing)

Disadvantages

• High sequencing error rate

• Very expensive

• Lower throughput compared to other platforms, especially when increasing the number of passes (DNA polymerase saturation)


Sequencing by ligation

Sequencing by ligation

The process of sequencing DNA using a DNA ligase to

create a new complementary DNA strand.


SOLiD (ThermoFisher) Clonal Optical Sequencing by ligation

Dover (Polonator) Clonal Optical Sequencing by ligation

Complete Genomics Clonal Optical Sequencing by ligation

Sequencing using a nanopore


Oxford Nanopore Single molecule Nanopore Nanopore

Roche Genia Single molecule Nanopore Nanopore

Quantum Biosystems Single molecule Nanogate Nanogate

Sequencing with a nanopore

• A nanopore is a pore of nanometer size.

• It can be divided into three categories:

• Biological – also called transmembrane protein channels, usually inserted into a substrate (membrane). Well-defined and highly reproducible nanopore size and structure.

• Solid-state – synthetic nanopores. They have several advantages over their biological counterparts, such as chemical, thermal, and mechanical stability, size adjustability, and integration.

• Hybrid – a mixture of both, taking advantage of the features of biological and solid-state nanopores.

Feng et al. Gen. Prot. Bioinf. 2015, 13: 4-16.


Biological nanopores

[Figure: top and side views, with dimensions in nm, of three biological nanopores: alpha-hemolysin (S. aureus), MspA porin (M. smegmatis), and the connector channel of bacteriophage phi29.]

Oxford Nanopore

• The detection principle is based on monitoring the ionic current passing through the nanopore as a voltage is applied across the membrane.

Challenges/Hurdles

• Oxford Nanopore data (e.g. from the MinION) have a higher error rate than data from other sequencing platforms.

• It is improving, but it is still not at the level of short-read sequencing.

Ashton et al. Nat. Biotechnol. 2015, 33: 296-300; George et al. Microb. Genom. 2017, 3, 1-8.


Real-time sequencing

• We obtained results after 10 minutes.

• However, you need the right equipment, for example the MinIT (Nanopore).

• You also need bioinformatic skills.

NGS for Tuberculosis

• Routine full characterization of Mycobacterium tuberculosis (TB) is

culture-based, taking many weeks.

• Whole-genome sequencing (WGS) can generate antibiotic susceptibility profiles to inform treatment, augmented with strain information for global surveillance.

• Such data could be transformative if provided at or near the point of care.

Votintseva et al. J. Clin. Microbiol. 2017, 55(5):1285–98.


NGS for Tuberculosis

• Applied directly to clinical samples.

• Initial evaluation with Illumina, followed by Nanopore sequencing.

• With Illumina MiSeq/MiniSeq, the workflow from patient sample to results could be completed in 44 h/16 h at a reagent cost of £96/£198 per sample.

• For Nanopore, the estimated turnaround time to detection of resistance was 7.5 h (full profile 5 h later).

• Antibiotic susceptibility predictions were fully concordant.

Votintseva et al. J. Clin. Microbiol. 2017, 55(5):1285–98.

• They designed an adapter to the short, highly conserved termini of the influenza A virus genome to target the (-) sense RNA into a protein nanopore on the Oxford Nanopore MinION sequencing platform.

• The researchers used total RNA extracted from the allantoic fluid of influenza rA/Puerto Rico/8/1934 (H1N1) virus infected chicken eggs (EID50 6.8 × 10^9).

• They demonstrated successful sequencing of the coding complete influenza A virus genome with 100% nucleotide coverage, 99% consensus identity, and 99% of reads mapped to influenza A virus.


www.nature.com/scientificreports

Direct RNA Sequencing of the Coding Complete Influenza A Virus Genome

Matthew W. Keller, Benjamin L. Rambo-Martin, Malania M. Wilson, Callie A. Ridenour, Samuel S. Shepard, Thomas J. Stark, Elizabeth B. Neuhaus, Vivien G. Dugan, David E. Wentworth & John R. Barnes

For the first time, a coding complete genome of an RNA virus has been sequenced in its original form. Previously, RNA was sequenced by the chemical degradation of radiolabeled RNA, a difficult method that produced only short sequences. Instead, RNA has usually been sequenced indirectly by copying it into cDNA, which is often amplified to dsDNA by PCR and subsequently analyzed using a variety of DNA sequencing methods. We designed an adapter to short highly conserved termini of the influenza A virus genome to target the (-) sense RNA into a protein nanopore on the Oxford Nanopore MinION sequencing platform. Utilizing this method with total RNA extracted from the allantoic fluid of influenza rA/Puerto Rico/8/1934 (H1N1) virus infected chicken eggs (EID50 6.8 × 10^9), we demonstrate successful sequencing of the coding complete influenza A virus genome with 100% nucleotide coverage, 99% consensus identity, and 99% of reads mapped to influenza A virus. By utilizing the same methodology one can redesign the adapter in order to expand the targets to include viral mRNA and (+) sense cRNA, which are essential to the viral life cycle, or other pathogens. This approach also has the potential to identify and quantify splice variants and base modifications, which are not practically measurable with current methods.

Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC), Atlanta, Georgia, USA. Matthew W. Keller and Benjamin L. Rambo-Martin contributed equally. Correspondence and requests for materials should be addressed to J.R.B. (email: fzq @cdc.gov)

Received: 23 April 2018

Accepted: 5 September 2018

Published online: 26 September 2018


Keller et al. Sci. Rep. 2018, 8:14408 .

Results

RNA calibration strand: enolase II mRNA.

Sequencing RNA from crude versus purified influenza rA/Puerto Rico/8/1934 (H1N1) virus.

Experiment 3

• Comparison between Illumina sequencing and Nanopore sequencing.

• So far we have compared 3 metagenomes using both technologies.


Advantages

• Very portable

• Very fast (real time sequencing)

• Direct RNA sequencing is possible

• Very long reads

• Enables detection of structural variants (e.g. gene duplications)

Disadvantages

• High sequencing error

• Chemistry is constantly changing

Summary

• It is not an easy choice.

• It really depends on what you want to do and where you want to do it.

• Keep in mind that none of the technologies is perfect and they all


Any questions?
