Error Correction and Assembly of Oxford Nanopore Sequencing
James Gurtowski
Assembly Complexity
[Figure: assembly graph diagrams for the sequence A R B R C R, in which the repeat R separates segments A, B, and C]
The advantages of SMRT sequencing Roberts, RJ, Carneiro, MO, Schatz, MC (2013) Genome Biology. 14:405
Oxford Nanopore MinION
• Thumb drive sized sequencer powered over USB
• Senses DNA by measuring changes to ion flow
• Reads both DNA Strands (2D)
Nanopore Basecalling
Basecalling currently performed at Amazon with frequent updates to algorithm
Our Data - Yeast W303
Best: 446Mb
Mean Read Length: ~6kb (10kb shear)
New Flowcells: 7kb Avg Read Length
[Figure: Oxford Flowcell Yields. Flowcell yield (megabases, 0 to 500) and read length (bp, 0 to 16000) by date for R6, R7, and R7.3 flowcells]
Nanopore Alignments
Alignment Statistics (BLASTN)
Mean alignment length at ~7kbp
Shearing targeted 10kbp
255k reads align (64%)
174x coverage
7kb mean alignment length
68% mean identity
[Figure: histogram of frequency (0 to 14000) by percent identity (40 to 100) for template, complement, and two-directions reads]
Nanopore Accuracy
Alignment Quality (BLASTN)
Of reads that align, average ~65% identity
“2D base-calling” improves to ~77% identity
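The identity figures above can be read as matching columns divided by total alignment columns. The slides do not state the exact formula used, so the sketch below assumes that definition; the function name and the gapped-string input format are hypothetical, chosen only for illustration.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of a pairwise alignment given as two gapped strings.

    Assumption: identity = matching columns / all alignment columns,
    with gap columns ("-") counted as mismatches.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned strings must have the same length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, r in zip(aligned_query, aligned_ref)
        if q == r and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment: 10 columns, one gap and one substitution.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
```

Under this definition, insertions and deletions lower identity just as substitutions do, so it is sensitive to the indel errors that long raw reads tend to carry.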
ONT Read Alignments
[Figure: percentage of read aligned (10 to 100) and percentage of reads in length bin (0 to 1) versus read length (0 to 160 kb), split into full length alignments, partial alignments, and unaligned reads]
Nanopore Alignment Summary
64% of the data maps using BLASTN
Long Read Correction Algorithms
PacBioToCA & ECTools
Hybrid Error Correction
Koren, Schatz, et al (2012) Nature Biotechnology. 30:693–700
HGAP & Quiver
LR-only Correction & Polishing
Chin et al (2013) Nature Methods. 10:563–569
PBJelly
Gap Filling and Assembly Upgrade
English et al (2012) PLOS One. 7(11): e47768
[Figure axis: Long Read Coverage, from < 5x to > 50x]
NanoCorr: Nanopore-Illumina Hybrid Error Correction
1. BLAST MiSeq reads to all raw Oxford Nanopore reads
2. Select non-repetitive alignments
○ First pass scans to remove “contained” alignments
○ Second pass uses Dynamic Programming (LIS) to select set of high-identity alignments with minimal overlaps
3. Compute consensus of each Oxford Nanopore read
○ Currently using PacBio’s pbdagcon
https://github.com/jgurtowski/nanocorr
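The two selection passes in step 2 can be sketched as follows. This is a minimal illustration, not the code in the nanocorr repository: the `Aln` record, the identity-weighted score, the overlap penalty, and the `max_overlap` parameter are all assumptions made for the example. The second pass is a quadratic chaining DP in the style of a weighted longest increasing subsequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Aln:
    """Hypothetical record: one short-read alignment on a nanopore read."""
    start: int        # start on the nanopore read (0-based, inclusive)
    end: int          # end on the nanopore read (exclusive)
    identity: float   # percent identity of the alignment


def remove_contained(alns: List[Aln]) -> List[Aln]:
    """First pass: drop alignments whose interval lies inside another's."""
    kept: List[Aln] = []
    max_end = -1
    # Sort by start, longest first on ties, so a containing interval
    # is always visited before anything it contains.
    for a in sorted(alns, key=lambda a: (a.start, -a.end)):
        if a.end > max_end:
            kept.append(a)
            max_end = a.end
    return kept


def select_chain(alns: List[Aln], max_overlap: int = 20) -> List[Aln]:
    """Second pass: LIS-style DP picking a left-to-right chain.

    Assumed score: aligned length weighted by identity, minus the
    number of bases shared with the previous alignment in the chain.
    """
    alns = sorted(alns, key=lambda a: (a.start, a.end))
    if not alns:
        return []

    def score(a: Aln) -> float:
        return (a.end - a.start) * a.identity / 100.0

    best = [score(a) for a in alns]   # best chain score ending at i
    prev = [-1] * len(alns)           # back-pointer for traceback
    for i, a in enumerate(alns):
        for j in range(i):
            b = alns[j]
            overlap = max(0, b.end - a.start)
            if b.start < a.start and b.end < a.end and overlap <= max_overlap:
                cand = best[j] + score(a) - overlap
                if cand > best[i]:
                    best[i], prev[i] = cand, j

    # Trace back from the highest-scoring chain end.
    i = max(range(len(alns)), key=best.__getitem__)
    chain: List[Aln] = []
    while i != -1:
        chain.append(alns[i])
        i = prev[i]
    return chain[::-1]


toy = [Aln(0, 100, 90.0), Aln(10, 50, 95.0), Aln(90, 200, 85.0),
       Aln(95, 300, 80.0), Aln(290, 400, 92.0)]
chain = select_chain(remove_contained(toy))
```

In the toy input, the alignment at 10 to 50 is contained in the one at 0 to 100 and is dropped in the first pass; the second pass then prefers the chain that covers the most of the read with small overlaps. The selected alignments would then be handed to the consensus step.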
Post Correction Identity
[Figure: histogram of frequency (0 to 90000) by percent identity (40 to 100) for uncorrected reads (mean: 68%) and corrected reads (mean: 97%)]
ONT vs Illumina Assembly
Oxford N50 : 585kb
Illumina N50 : 58kb
ONT Assembly Completeness
[Figure: frequency (log scale, 1 to 10000) of genomic features by average length in bp for S288C, Nanopore, and MiSeq: CDS (1282 bp), gene (1344 bp), rRNA (1393 bp), gene cassette (2951 bp), transposable element (3201 bp), telomere (4396 bp), LTR retrotransposon (5836 bp)]
Long Read Assembly
S288C Reference sequence
• 12.1Mbp; 16 chromosomes + mitochondria; N50: 924kbp
Oxford Nanopore
28x corrected reads > 7kb
NanoCorr + Celera Assembler
• 95 non-redundant contigs
• N50: 585kbp >99.78% id
Illumina MiSeq
30x, 300bp PE (Flashed)
Celera Assembler
• 6953 non-redundant contigs
• N50: 59kb >99.9% id
Pacific Biosciences
25x corrected reads > 10kb
HGAP + Celera Assembler
• 21 non-redundant contigs
• N50: 811kb >99.8% id
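The N50 values compared above follow the usual definition: the contig length at which contigs of that length or longer hold at least half of the assembled bases. A minimal sketch, assuming N50 is computed against the total assembly size (the slides do not say whether the reference genome size was used instead); the function name is illustrative.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths, or 0 if there are none."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to
        # at least half of the total assembled bases.
        if 2 * running >= total:
            return length
    return 0


toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because the statistic depends only on the sorted lengths, a handful of long contigs can dominate it, which is why contig count and identity are reported alongside N50 for each assembly.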
E. coli K12
Single Contig Assembly with MinION
Sequencing Data From:
A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer
Joshua Quick, Aaron R Quinlan and Nicholas J Loman
Single Contig Assembly
99.99% Identity (Pilon polishing)
[Figure: E. coli Error Correction with NanoCorr. Histogram of number of reads (frequency, 0 to 35000) by percent identity (40 to 100) for uncorrected and corrected reads]
NanoCorr Correction Results
145x Oxford Nanopore X 35x MiSeq
Future of Oxford Nanopore
Acknowledgements
Michael Schatz
Dick McCombie
Sara Goodwin
Schatz Lab
Oxford Nanopore Sequencing and de novo Assembly of a Eukaryotic Genome
Sara Goodwin, James Gurtowski, Scott Ethe-Sayers, Panchajanya Deshpande, Michael Schatz, W Richard McCombie
doi: http://dx.doi.org/10.1101/013490