Date post: | 14-Dec-2015 |
Upload: | payton-starkes |
MCB3895-004 Lecture #15, Oct 23/14
De novo assemblies using PacBio
PacBio
• Long read sequencing technology
• High error rate (~13%) threw people at first
  • What would this be good for?
• Scaffolding was an early focus
• Also correcting reads using Illumina data
  • (now obsolete)
HGAP
• "Hierarchical Genome Assembly Process"
1. Preassembly - corrects longest reads by mapping shorter reads to them, quality trims
2. Assembly - OLC approach
3. Polishing - Quiver software derives consensus from mapped reads, uses to correct assembly
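To make the idea of "deriving a consensus from mapped reads" concrete, here is a minimal Python sketch of a per-column majority vote over reads that are already aligned. This is a deliberately simplified stand-in and not Quiver's actual algorithm; the function name `majority_consensus` and the toy reads are assumptions made up for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the per-column majority base of equal-length aligned reads.

    '-' marks a gap. Columns where the gap is the majority are dropped.
    This is a toy majority vote, NOT the model Quiver uses.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Hypothetical aligned reads, each with at most one random error.
reads = [
    "ACGTTAGC",
    "ACGTAAGC",  # substitution at column 4 (0-based)
    "ACG-TAGC",  # deletion at column 3
    "ACGTTAGC",
    "ACGTTCGC",  # substitution at column 5
]
print(majority_consensus(reads))
```

Because the errors fall at different positions in different reads, each column is still dominated by the correct base, which is why high coverage can compensate for a high per-read error rate.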
Results
• My test gave an impressive 1 contig!
  • High ~60X coverage, tame dataset
• Known problem: still some SNP errors
  • Can run Quiver again
    1. Import assembly as a reference sequence
    2. Perform reference mapping using same reads vs. new reference
    3. Will output a new consensus fasta file incorporating the variants it finds
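The "~60X coverage" figure above is just total sequenced bases divided by genome size. A minimal Python sketch of that arithmetic follows; the read count, read length, and the rounded 4.6 Mb E. coli genome size are illustrative assumptions, not values from the test dataset.

```python
def mean_coverage(read_lengths, genome_size):
    """Mean depth of coverage = total bases sequenced / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Assumption: E. coli genome rounded to 4.6 Mb.
ECOLI_GENOME_SIZE = 4_600_000

# Hypothetical dataset: 46,000 reads of 6,000 bp each.
lengths = [6000] * 46000
print(mean_coverage(lengths, ECOLI_GENOME_SIZE))  # 60.0
```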
PacBio chemistries
• PacBio has continually updated both its polymerases and detection chemistry
• Current test data uses P4-C2 chemistry
• P5-C3 gave slightly better length, maybe a bit more error
• Fastq available for this E. coli: SRR1284073
• Brand new: P6-C4
P6-C4
• As per last week
• 10-15kb read N50
• Slightly better accuracy?
• http://blog.pacificbiosciences.com/2014/10/new-chemistry-boosts-average-read.html
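For reference, read N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal Python sketch of the calculation is below; the function name `n50` and the example read lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    Sort lengths from longest to shortest and return the length at which
    the running total first reaches half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical read lengths in bp (total = 55,000).
example_lengths = [15000, 12000, 10000, 8000, 5000, 3000, 2000]
print(n50(example_lengths))  # 10000
```

The same function works for contig N50 when given contig lengths instead of read lengths.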
Other options: hybrid assembly
• It is possible to combine multiple data types
• Goal: cover the respective strengths of each
  • (of course, could confound too!)
• SPAdes is one of the most flexible assemblers in this regard
• Must have some Illumina
• Will accept corrected or uncorrected PacBio reads (and many more, including Oxford Nanopore)
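As a rough sketch of what a hybrid run looks like, the Python snippet below only builds (and prints) a SPAdes command line that combines paired-end Illumina reads with uncorrected PacBio reads. The `-1`, `-2`, `--pacbio` and `-o` options are standard SPAdes options; all file and directory names are hypothetical placeholders, and the helper `build_spades_command` is made up for illustration. Check the SPAdes manual for how corrected reads should be supplied.

```python
import shlex


def build_spades_command(illumina_r1, illumina_r2, pacbio_reads, out_dir):
    """Build (but do not run) a SPAdes hybrid assembly command."""
    return [
        "spades.py",
        "-1", illumina_r1,         # Illumina forward reads
        "-2", illumina_r2,         # Illumina reverse reads
        "--pacbio", pacbio_reads,  # uncorrected PacBio reads
        "-o", out_dir,             # output directory
    ]


# Hypothetical file names, for illustration only.
cmd = build_spades_command(
    "illumina_R1.fastq",
    "illumina_R2.fastq",
    "pacbio_P4C2.fastq",
    "spades_hybrid",
)
print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to paste the exact line into a lab notebook before running it.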
Assignment #7
• Create 2 E. coli assemblies using PacBio data
  • Use P4-C2 alone and HGAP
  • Use Illumina + P5-C3 uncorrected
  • Use Illumina + P4-C2 uncorrected
  • Use Illumina + P4-C2 corrected
  • Multiple Quiver steps to correct
  • Some other option!
• Hand in:
  • 2 genome assemblies
  • Lab notebook file detailing exact commands