Post on 20-Aug-2018

Error Correction and Assembly of Oxford Nanopore Sequencing

James Gurtowski

Assembly Complexity

[Figure: segments A, B, and C separated by copies of a repeat R, in the order A R B R C R]

The advantages of SMRT sequencing

Roberts, RJ, Carneiro, MO, Schatz, MC (2013) Genome Biology. 14:405

Oxford Nanopore MinION

•  Thumb drive sized sequencer powered over USB

•  Senses DNA by measuring changes to ion flow

•  Reads both DNA strands (2D)

Nanopore Basecalling

Basecalling currently performed at Amazon with frequent updates to algorithm

Our Data - Yeast W303

Best: 446Mb

Mean Read Length: ~6kb (10kb shear)

New Flowcells: 7kb Avg Read Length

[Figure: Oxford Flowcell Yields. Flowcell yield (megabases) and read length (bp) by date, for R6, R7, and R7.3]

Nanopore Alignments

Alignment Statistics (BLASTN)

•  Mean alignment length at ~7kbp (shearing targeted 10kbp)

•  255k reads align (64%)

•  174x coverage

•  68% mean identity
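The identity figures on these slides are alignment percent identities. As a minimal sketch of how such a number is computed, the function below counts identical columns over all alignment columns, gaps included, which is how BLASTN reports it. The function name and the toy alignment are illustrative assumptions, not data from the talk.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical columns divided by all alignment columns (gaps included), times 100."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical toy alignment: 10 columns, 1 mismatch, 2 gap columns, 7 matches.
query = "ACGTTAC-GA"
subject = "ACCTT-CGGA"
print(percent_identity(query, subject))
```

Because gap columns count in the denominator, insertions and deletions lower the identity just as mismatches do.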

Nanopore Accuracy

Alignment Quality (BLASTN)

Of reads that align, average ~65% identity

“2D base-calling” improves to ~77% identity

[Figure: histogram of percent identity (40 to 100) against frequency for template, complement, and two-directions reads; annotated 65% AVG ID and 77% AVG ID]

ONT Read Alignments

[Figure: percentage of read aligned and percentage of reads in length bin, against read length (0 to 160 kb); reads classed as full length alignments, partial alignments, or unaligned]

Nanopore Alignment Summary

64% of the data map using BLASTN

Long Read Correction Algorithms

PacBioToCA & ECTools

Hybrid Error Correction

Koren, Schatz, et al (2012) Nature Biotechnology. 30:693–700

HGAP & Quiver

LR-only Correction & Polishing

Chin et al (2013) Nature Methods. 10:563–569

PBJelly

Gap Filling and Assembly Upgrade

English et al (2012) PLOS One. 7(11): e47768

[Figure axis: Long Read Coverage, from < 5x to > 50x]

NanoCorr: Nanopore-Illumina Hybrid Error Correction

1.  BLAST Miseq reads to all raw Oxford Nanopore reads

2.  Select non-repetitive alignments

○  First pass scans to remove “contained” alignments

○  Second pass uses Dynamic Programming (LIS) to select set of high-identity alignments with minimal overlaps

3.  Compute consensus of each Oxford Nanopore read

○  Currently using PacBio’s pbdagcon

https://github.com/jgurtowski/nanocorr
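Step 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not NanoCorr's actual code: the `Aln` record, the `max_overlap` threshold, and the scoring (identity times length, minus overlapping bases) are all hypothetical choices made for the example. The first function removes contained alignments; the second runs an LIS-style dynamic program over alignments sorted by position on the nanopore read.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Aln:
    start: int       # start on the nanopore read (0-based, inclusive)
    end: int         # end on the nanopore read (exclusive)
    identity: float  # fraction of identical columns, in [0, 1]


def drop_contained(alns: List[Aln]) -> List[Aln]:
    """First pass: remove alignments whose interval lies inside another one."""
    # Sort by start ascending, then end descending, so a containing
    # alignment always comes before the alignments it contains.
    ordered = sorted(alns, key=lambda a: (a.start, -a.end))
    kept: List[Aln] = []
    max_end = -1
    for a in ordered:
        if a.end > max_end:
            kept.append(a)
            max_end = a.end
    return kept


def weight(a: Aln) -> float:
    """Assumed score: expected number of matching bases."""
    return a.identity * (a.end - a.start)


def select_chain(alns: List[Aln], max_overlap: int = 50) -> List[Aln]:
    """Second pass: LIS-style DP picking the best-scoring chain of alignments."""
    alns = sorted(alns, key=lambda a: (a.start, a.end))
    n = len(alns)
    if n == 0:
        return []
    best = [weight(a) for a in alns]  # best chain score ending at each alignment
    prev = [-1] * n                   # back-pointers for chain recovery
    for j in range(n):
        for i in range(j):
            overlap = max(0, alns[i].end - alns[j].start)
            advances = alns[i].start < alns[j].start and alns[i].end < alns[j].end
            if advances and overlap <= max_overlap:
                candidate = best[i] + weight(alns[j]) - overlap
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i
    j = max(range(n), key=best.__getitem__)
    chain: List[Aln] = []
    while j != -1:
        chain.append(alns[j])
        j = prev[j]
    return chain[::-1]


# Hypothetical short-read alignments on one nanopore read.
example = [
    Aln(0, 300, 0.95),
    Aln(50, 250, 0.99),   # contained in the first alignment
    Aln(100, 450, 0.70),  # overlaps its neighbours too heavily
    Aln(280, 600, 0.90),
    Aln(590, 900, 0.92),
]
chain = select_chain(drop_contained(example))
print([(a.start, a.end) for a in chain])
```

The quadratic DP is enough for a sketch; the containment pass here ignores identity, so a contained alignment is dropped even if it matches better than its container.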

Post Correction Identity

[Figure: histogram of percent identity (40 to 100) against frequency for uncorrected and corrected reads]

Uncorrected mean: 68%

Corrected mean: 97%

ONT vs Illumina Assembly

Oxford N50 : 585kb

Illumina N50 : 58kb

ONT Assembly Completeness

[Figure: frequency (log scale, 1 to 10000) of genomic features by average length in bp, for S288C, Nanopore, and MiSeq: CDS (1282 bp), gene (1344 bp), rRNA (1393 bp), gene cassette (2951 bp), transposable element (3201 bp), telomere (4396 bp), LTR retrotransposon (5836 bp)]

Long Read Assembly

S288C Reference sequence

•  12.1Mbp; 16 chromosomes + mitochondria; N50: 924kbp

Oxford Nanopore: 28x corrected reads > 7kb, NanoCorr + Celera Assembler

•  95 non-redundant contigs

•  N50: 585kbp, >99.78% id

Illumina MiSeq: 30x, 300bp PE (Flashed), Celera Assembler

•  6953 non-redundant contigs

•  N50: 59kb, >99.9% id

Pacific Biosciences: 25x corrected reads > 10kb, HGAP + Celera Assembler

•  21 non-redundant contigs

•  N50: 811kb, >99.8% id
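The comparison above rests on N50: the contig length L such that contigs of length L or longer hold at least half of the assembled bases. A minimal sketch of the calculation follows; the contig lengths are made-up values for illustration, not the assemblies from the talk.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the full sum always reaches the total")


# Hypothetical contig lengths (bp): total 290, half 145, reached at the 2nd contig.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
```

A few long contigs dominate the statistic, which is why an assembly with 95 contigs can have an N50 ten times that of one with thousands of short contigs.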

E. coli K12 Single Contig Assembly with MinION

Sequencing data from: Joshua Quick, Aaron R Quinlan and Nicholas J Loman. A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer.

Single Contig Assembly

99.99% Identity (Pilon polishing)

E. coli Error Correction with Nanocorr

[Figure: histogram of percent identity (40 to 100) against number of reads (frequency) for uncorrected and corrected reads]

Nanocorr Correction Results

145x Oxford Nanopore X 35x MiSeq

Future of Oxford Nanopore

Acknowledgements

Michael Schatz

Dick McCombie

Sara Goodwin

Schatz Lab

Oxford Nanopore Sequencing and de novo Assembly of a Eukaryotic Genome

Sara Goodwin, James Gurtowski, Scott Ethe-Sayers, Panchajanya Deshpande, Michael Schatz, W Richard McCombie

doi: http://dx.doi.org/10.1101/013490