Next Generation Sequencing Simon Rasmussen Associate Professor Center for Biological Sequence analysis Technical University of Denmark 2016

Preprocessing and SNP calling Natasja S. Ehlers, PhD student Center for Biological Sequence Analysis Functional Human Variation Group

Second generation sequencing

454 Illumina

SOLiD Ion Torrent (PGM)

10^6-10^9

90% market share

Library preparation

1. Create library molecules

2. Amplification (PCR)

3. Massively parallel sequencing

DNA from extract

Fragment & polish DNA

Adapters

Library molecule
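
The path from extracted DNA to library molecules can be sketched in a few lines of Python. This is only a toy model under stated assumptions: fragmentation is done at fixed intervals (real shearing is random), polishing is ignored, and the adapter strings and the function name `make_library` are hypothetical placeholders, not real adapter sequences.

```python
def make_library(dna, fragment_len, adapter_5, adapter_3):
    """Cut dna into consecutive fragments and flank each with adapters.

    Fixed-length cutting is a simplification of random fragmentation.
    """
    fragments = [dna[i:i + fragment_len] for i in range(0, len(dna), fragment_len)]
    return [adapter_5 + fragment + adapter_3 for fragment in fragments]


# Hypothetical lower-case adapters so they stand out from the insert.
library = make_library("ACGTACGTAC", fragment_len=4, adapter_5="aa", adapter_3="tt")
for molecule in library:
    print(molecule)
```

Each printed string is one library molecule: adapter, DNA insert, adapter.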

Amplification and immobilization

Emulsion PCR (454, SOLiD, Ion Torrent): Water, oil, beads, one DNA template/droplet

Bridge PCR (Illumina): One DNA template/cluster, primers on surface, grow by bridging primers

Metzker, Nat. Rev. Genet. 2010

Bridge PCR

Fluorescence detection

[Figure 2 from Metzker, Nature Reviews Genetics, vol. 11, p. 36 (January 2010): Four-colour and one-colour cyclic reversible termination methods]

a: Illumina/Solexa, reversible terminators. Incorporate all four nucleotides, each labelled with a different dye; wash, four-colour imaging; cleave dye and terminating groups, wash; repeat cycles.

b: Four-colour images highlighting the sequencing data from two clonally amplified templates.

c: Helicos BioSciences, reversible terminators. Each cycle, add a different dye-labelled dNTP (incorporate single, dye-labelled nucleotides); wash, one-colour imaging; cleave dye and inhibiting groups, cap, wash; repeat cycles.

d: One-colour images highlighting the sequencing data from two single-molecule templates.

Caption: The four-colour cyclic reversible termination (CRT) method uses Illumina/Solexa's 3'-O-azidomethyl reversible terminator chemistry on solid-phase-amplified template clusters (shown as single templates for illustrative purposes). Following imaging, a cleavage step removes the fluorescent dyes and regenerates the 3'-OH group using the reducing agent tris(2-carboxyethyl)phosphine (TCEP). Unlike Illumina/Solexa's terminators, the Helicos Virtual Terminators are labelled with the same dye and dispensed individually in a predetermined order, analogous to a single-nucleotide addition method. Following total internal reflection fluorescence imaging, a cleavage step removes the fluorescent dye and inhibitory groups using TCEP to permit the addition of the next Cy5-2'-deoxyribonucleoside triphosphate (dNTP) analogue. The free sulphhydryl groups are then capped with iodoacetamide before the next nucleotide addition (step not shown).

Illumina - Cyclic reversible termination

Add all four dNTPs, each labelled with a different dye

Create four-color image

Cleave dye and terminating group, then repeat with the next cycle

454 - Pyrosequencing

Load template beads into wells

Flow one dNTP across wells

Polymerase incorporates nucleotide

Release of PPi leads to light

Imaging, next dNTP

Metzker, Nat. Rev. Genet. 2010
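
The pyrosequencing steps above can be mimicked with a small flowgram sketch. It is a simplified model, not the instrument's base caller: the function name `flowgram`, the cyclic flow order `TACG` and the noise-free integer signal are all assumptions made for illustration. The input is the strand being synthesized, and each flow reports how many bases were incorporated, which is what the light intensity is meant to encode.

```python
def flowgram(read, flow_order="TACG", n_flows=8):
    """Return one (nucleotide, signal) pair per flow.

    signal is the number of identical bases incorporated in that flow,
    i.e. the homopolymer length; 0 means no incorporation and no light.
    """
    position = 0
    flows = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = 0
        # The polymerase keeps incorporating while the next base matches.
        while position < len(read) and read[position] == nucleotide:
            run_length += 1
            position += 1
        flows.append((nucleotide, run_length))
    return flows


print(flowgram("TTTACGG", n_flows=4))
# [('T', 3), ('A', 1), ('C', 1), ('G', 2)]
```

A run of three T's gives a single flow with a signal of 3 rather than three separate events. In a real instrument the signal is noisy, so long runs (say 7 versus 8 bases) are hard to tell apart, which is the source of the indel errors in homopolymers.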

2G: Imaging handout

[Handout figure: Figure 2 of Metzker (2010), four-colour and one-colour cyclic reversible termination methods (a: Illumina/Solexa; c: Helicos BioSciences)]

Illumina 1:________

Illumina 2:________

454: _______________________________________________

Metzker, NatGen Rev. 2010

2G: Imaging handout - answers

Quality of base call deteriorates after many cycles

Homopolymer runs are problematic and give rise to indels

Metzker, NatGen Rev. 2010

Illumina: Quality deterioration

Can you think of why?

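Efficiency of incorporation:

• Polymerase incorporation of base

• Enzyme that cleaves the dye

Neither step is 100% efficient, so in every cycle some molecules in a cluster fall out of phase and the signal becomes noisier.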
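
A back-of-the-envelope model makes the effect concrete. Assume, purely for illustration, that every molecule in a cluster completes a cycle correctly with a fixed probability (the value 0.995 below is hypothetical, not a measured figure) and that a molecule that falls behind never catches up. The fraction still in phase after n cycles is then that probability raised to the power n.

```python
def in_phase_fraction(efficiency, cycles):
    """Fraction of molecules in a cluster still in phase after `cycles` cycles.

    Assumes independent cycles with a constant per-cycle efficiency and
    ignores molecules that run ahead (pre-phasing).
    """
    return efficiency ** cycles


for cycles in (1, 50, 100, 150):
    fraction = in_phase_fraction(0.995, cycles)  # 0.995 is a hypothetical efficiency
    print(f"cycle {cycles:3d}: {fraction:.2f} of the cluster in phase")
```

Even with 99.5% efficiency per cycle, only about 60% of the molecules are still in phase after 100 cycles in this model, so the signal from the correct base shrinks relative to the background as the read gets longer.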

NextSeq/HiSeq3000/4000

• Chemistry is based on 2 dyes rather than 4 (as before)

• T (red), C (green), A (both) and G (none = “dark”)

• Faster processing rate and cheaper reagents

• Slightly increases error rate

• Problem with G stretches because G is not dyed
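
The two-dye scheme can be written as a small lookup table. This is a sketch of the encoding listed above, not the vendor's base-calling software; the names `TWO_DYE_CODE`, `call_base` and `call_read` are hypothetical, and the signals are idealized to on/off values.

```python
# (red signal, green signal) -> base, following the scheme above.
TWO_DYE_CODE = {
    (True, False): "T",   # red only
    (False, True): "C",   # green only
    (True, True): "A",    # both dyes
    (False, False): "G",  # no signal ("dark")
}


def call_base(red, green):
    """Call one base from idealized on/off red and green signals."""
    return TWO_DYE_CODE[(red, green)]


def call_read(signals):
    """Call a read from a list of (red, green) pairs, one per cycle."""
    return "".join(call_base(red, green) for red, green in signals)


print(call_read([(True, False), (False, True), (True, True), (False, False)]))  # TCAG
print(call_read([(False, False)] * 5))  # GGGGG
```

The second call shows why G stretches are a problem: a cluster that gives no signal at all, for whatever reason, is indistinguishable from a run of G's.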

Ion Torrent

Similar principle to 454

Library: Emulsion PCR

Based on semiconductors

Detection is based on changes in H+ ions (pH)

SOLiD

SOLiD example

@SRR349943.1 solid0420_20100825_FRAG/1
T30212302300330212112223121002232332112002222302010
+
!=:369A?:.<9=.-5=%3-:6%3&<2%(169%,0.3%&'&(&.'%%%&&,
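
The last line of the record holds one quality character per position of the sequence line. A minimal sketch of how such a string is usually turned into numbers, assuming the common Phred+33 encoding (the offset is an assumption; the slide does not state it) and using the hypothetical helper names `phred_scores` and `error_probability`:

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


def error_probability(score):
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-score / 10)


sequence = "T30212302300330212112223121002232332112002222302010"
quality = "!=:369A?:.<9=.-5=%3-:6%3&<2%(169%,0.3%&'&(&.'%%%&&,"

scores = phred_scores(quality)
print(len(sequence), len(scores))
print(scores[:4])
print(error_probability(20))
```

Under this assumption the leading "!" gives a score of 0 for the primer base T, and the scores drop towards the end of the read.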

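Double-base encoding (colorspace)

Low error rate

Errors propagate: each color encodes a pair of adjacent bases, so a single wrong color changes every downstream base when the read is decoded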

AAGT...
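
Decoding a colorspace read can be sketched as follows. The sketch relies on the standard property of the double-base code that, with A=0, C=1, G=2, T=3, the color of two adjacent bases equals the XOR of their codes; the function name `decode_colorspace` is hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(read):
    """Decode a colorspace read such as 'T3021' into bases.

    The first character is the known primer base; every following digit
    is the color of the transition from the previous base to the next.
    """
    code = BASE_TO_CODE[read[0]]
    bases = []
    for color in read[1:]:
        code ^= int(color)  # next base = previous base XOR color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


print(decode_colorspace("T3021"))  # AAGT, the start of the example read
print(decode_colorspace("T3121"))  # ACTG, one wrong color (0 -> 1)
```

The first call reproduces the AAGT shown above. In the second call a single miscalled color changes not one base but every base after it, which is why colorspace reads are normally aligned in colorspace rather than decoded first.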

Complete Genomics

ssDNA -> DNA nanoballs (DNBs)

Use silicon chips with sticky spots

Place DNBs into each spot

Sequence using ligase and fluorescently labeled probes

You can't buy the machine - Acquired by BGI - delayed - Only humans!

3rd generation

Helicos Pacific Biosciences Oxford Nanopore

No amplification (PCR introduces bias!)

Simple sample preparation

Pacific Biosciences

Slowed-down DNA polymerase, measure light emission

Long reads > 10 kb, high error rate (but random)
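
Why random errors are less harmful than systematic ones can be shown with a toy consensus. The sketch below assumes the reads are already aligned, have equal length and contain only substitutions (a strong simplification, since real long-read errors also include insertions and deletions); the reads and the function name `consensus` are made up for illustration.

```python
from collections import Counter


def consensus(aligned_reads):
    """Column-wise majority vote over equally long, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Three hypothetical reads of the same molecule, each with one error
# at a different position.
reads = [
    "ACGTACGA",  # error at the last base
    "ACCTACGT",  # error at the third base
    "TCGTACGT",  # error at the first base
]
print(consensus(reads))  # ACGTACGT
```

Because the errors fall at different positions, each one is outvoted by the other two reads. A systematic error would occur at the same position in every read and survive the vote.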

Oxford Nanopore

Nano-scale pores, with a current across

Drag a stretch of DNA through the pore

Measure change in current (pentamers)

Long reads (up to 200 kb), currently ~5 kb, some systematic errors
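
The pentamer idea can be illustrated by listing the overlapping 5-mers that pass through the pore as a strand moves one base at a time. This is only a sketch of the bookkeeping: the function name `kmers` is hypothetical, and no current levels are modelled because the slide gives none.

```python
def kmers(sequence, k=5):
    """All overlapping k-mers of a sequence, in the order they would be read."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("ACGTACG"))  # ['ACGTA', 'CGTAC', 'GTACG']
print(4 ** 5)            # number of distinct pentamers: 1024
```

Each current measurement therefore has to be assigned to one of 1024 possible pentamers rather than to one of four bases, and neighbouring measurements share four of their five bases.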

Synthetic Long Reads (2nd gen)

• Illumina Synthetic long-read sequencing (Moleculo)

• 10X Genomics

• Based on Illumina sequencing

• Using barcode system to create linked reads / read clouds
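
The barcode idea can be sketched as a simple grouping step: short reads that carry the same barcode are collected into one "read cloud" because they come from the same long fragment (or the same small pool of fragments). The barcodes, read sequences and the function name `group_by_barcode` below are hypothetical.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Group (barcode, sequence) pairs into read clouds keyed by barcode."""
    clouds = defaultdict(list)
    for barcode, sequence in reads:
        clouds[barcode].append(sequence)
    return dict(clouds)


# Hypothetical short reads, each tagged with the barcode of its origin.
reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "GTACGG"),
    ("BC01", "CGGATT"),
]
for barcode, cloud in group_by_barcode(reads).items():
    print(barcode, cloud)
```

Downstream tools can then treat each cloud as evidence that its reads lie close together on the genome, even though every individual read is short.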

[Figure from Goodwin et al., Nature Reviews Genetics, vol. 17, p. 344 (June 2016): A, real-time long-read sequencing; B, synthetic long-read sequencing]

A Real-time long-read sequencing

Aa Pacific Biosciences

• SMRTbell template: two hairpin adapters allow continuous circular sequencing

• ZMW wells: sites where sequencing takes place

• Labelled nucleotides: all four dNTPs are labelled and available for incorporation

• Modified polymerase: as a nucleotide is incorporated by the polymerase, a camera records the emitted light

• PacBio output: a camera records the changing colours from all ZMWs; each colour change corresponds to one base

Ab Oxford Nanopore Technologies

• Leader-hairpin template: the leader sequence interacts with the pore and a motor protein to direct DNA; a hairpin allows for bidirectional sequencing

• Alpha-hemolysin: a large biological pore capable of sensing DNA

• Current: passes through the pore and is modulated as DNA passes through

• ONT output (squiggles): each current shift as DNA translocates through the pore corresponds to a particular k-mer (plot: mean signal (pA) against time (seconds))

B Synthetic long-read sequencing

Ba Illumina

• DNA fragment: DNA is fragmented and selected to ~10 kb

• ~3,000 molecules per well

• Enzymatic cleavage: DNA is barcoded and fragmented to ~350 bp

• Barcodes: DNA from the same well shares the same barcode

• Pooling: DNA from each well is pooled and undergoes a standard library preparation

• Sequencing: DNA is sequenced on a standard short-read sequencer

Bb 10X Genomics

• Emulsion PCR: arbitrarily long DNA is mixed with beads loaded with barcoded primers, enzyme and dNTPs

• GEMs: each micelle has 1 barcode out of 750,000

• Amplification: long fragments are amplified such that the product is a barcoded fragment ~350 bp

• Pooling: the emulsion is broken and DNA is pooled, then it undergoes a standard library preparation

• Linked reads: all reads from the same GEM derive from the long fragment, thus they are linked

• Reads are dispersed across the long fragment and no GEM achieves full coverage of a fragment

• Stacking of linked reads from the same loci achieves continuous coverage

Machine overview - I

Company/technology            Current machine, key characteristics
454                           GS FLX+, 700-800 bp length, 1M reads
Illumina                      HiSeq4000/HiSeqX Ten, 100-150 bp length, up to 2-4B reads
SOLiD (Life)                  5500XL, 50/75 bp length, 1.5B reads
PGM (Life)                    Ion Proton, 200/400 bp length, 80M reads
Complete Genomics (BGISEQ)    BGISEQ-500, 50-100 bp, 200 Gb in total
PacBio                        Sequel, 8-12 kb length, 350k reads
Oxford Nanopore               GridION, up to 200 kb, >100k reads
Illumina synthetic            up to 100 kb synthetic length, $1000 per Gb
10X Genomics                  up to 100 kb synthetic length, +$500 per sample

Excellent overview in Goodwin et al., Nature Reviews (2016)
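
The read lengths and read counts in the table translate into total yield and expected coverage with simple arithmetic: yield = number of reads x read length, and coverage = yield / genome size. The sketch below plugs in upper-end numbers from the table; the helper names and the round human genome size of 3 Gb are assumptions for illustration.

```python
def yield_gb(n_reads, read_length_bp):
    """Total sequenced bases in gigabases (Gb)."""
    return n_reads * read_length_bp / 1e9


def coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return n_reads * read_length_bp / genome_size_bp


HUMAN_GENOME_BP = 3_000_000_000  # assumption: roughly 3 Gb

# Illumina, upper end of the table: 4B reads of 150 bp
print(yield_gb(4_000_000_000, 150))                   # 600.0
print(coverage(4_000_000_000, 150, HUMAN_GENOME_BP))  # 200.0

# PacBio: 350k reads of about 10 kb
print(yield_gb(350_000, 10_000))                      # 3.5
```

The comparison shows the trade-off in the table: the long-read run yields far fewer bases in total, but each read spans thousands of bases.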

Machine overview - II: Benchtop machines

Company/technology    Current machine, key characteristics
454                   GS Junior, 400-500 bp length, 100k reads
Illumina              MiSeq, 300 bp length, 50M reads
PGM (Life)            Ion Torrent, 400 bp length, 5M reads
Oxford Nanopore       MinION, up to 200 kb, 100k reads

Excellent overview in Goodwin et al., Nature Reviews (2016)

More NGS material

• Elaine Mardis on NGS technology

• Youtube has many more …

• Excellent review on NGS technologies: Goodwin et al., Nature Reviews (2016)

• On Campusnet together with many other papers!

