Date post: | 29-Jun-2015 |
Category: |
Technology |
Upload: | sharmaanimesh |
CQNCer
DNA Sequencing and Assembly
Animesh Sharma
About
• Sequencing? Assembly?
  – Break
    • Primer walking
    • Shotgun
  – Read
  – Assemble
• Why? (Genome: The Autobiography of a Species in 23 Chapters, Matt Ridley)
• Book of life
  – Alphabet
  – Structure
  – Grammar
Sequencing (Let's Read)
Past … present
• 2',3'-dideoxynucleotide triphosphate chain-termination method – Sanger et al.
  – 4-lane ddNTP, PAGE, AR/UV, 1977
  – Lanes A T G C => ? ? ? BTW, protein by Edman, 1949!
• Dye-primer sequencing – Smith et al.
  – 5' end fluorescently labeled primer, 1986
• Dye-terminator sequencing – Takumi et al.
  – Uniquely labelled chain terminators -> Get off wells!, 1997
  – 3700 capillary sequencing machine – ABI, 1998
Present
• Still the common theme
  – Sequencing by synthesis
• Reversible terminator methods
  – Solexa -> Illumina
    • Add, block, detect, unblock
  – Helicos
    • Skips amplification
  – 454 LS -> Roche
    • Pyrophosphates, luciferin
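
The pyrophosphate/luciferin readout can be illustrated with a toy decoder. In pyrosequencing each nucleotide is flowed in turn, and the light signal is roughly proportional to how many copies of that base were incorporated; the resulting list of intensities is called a flowgram. The sketch below is only an illustration: the function name `decode_flowgram`, the cyclic flow order `TACG`, the example intensities, and the round-to-nearest rule are all assumptions, not the instrument's actual base caller.

```python
def decode_flowgram(intensities, flow_order="TACG"):
    """Turn per-flow light intensities into a base sequence.

    Assumption: each intensity is close to the homopolymer length,
    so rounding to the nearest integer gives the number of bases.
    """
    bases = []
    for i, signal in enumerate(intensities):
        base = flow_order[i % len(flow_order)]  # nucleotide flowed in this step
        count = int(signal + 0.5)               # round non-negative signal to nearest
        bases.append(base * count)              # zero signal adds nothing
    return "".join(bases)


# Hypothetical flowgram covering two cycles of T, A, C, G
example = [1.02, 0.05, 2.10, 0.90, 0.10, 1.10, 0.00, 2.95]
print(decode_flowgram(example))  # TCCGAGGG
```

Long homopolymers are where this naive rounding fails first, since a signal of 5.4 versus 5.6 is hard to tell apart; that is one reason instrument-specific error models matter.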
Something different
• Sequencing by ligation
  – SOLiD, polony method
    • Fixed-length oligonucleotide hybridization
• Sequencing by hybridization
  – Sequencing by ligation in microarray
• MS based
  – Chain termination in a mass spec setting
• Ideas for "$10,000 (US) per genome"
  – Labeled DNA polymerase in nanopores
  – EM on nucleotides labeled with heavier elements
  – You got one? Contact http://en.wikipedia.org/wiki/Archon_X_Prize :)
Assembly
● Find ‘Shortest common superstring’
  – Alignment, seek overlap (largest), merge, repeat
● PHRAP, CAP3, TIGR Assembler
  – Could work on long Sanger reads with meaningful qual files
  – For ‘shorter’ reads ...
● SSAKE, VCAKE and SHARCGS
  – Seed and grow, replace with largest
● VELVET, EULER-SR: graph based
  – k-mers from reads as vertices, edges from their overlaps, find Eulerian path
● MIRA – can utilize 454 traces
● NEWBLER – flowgram, instrument-specific error model
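
The "seek overlap (largest), merge, repeat" loop can be sketched as a greedy approximation to the shortest common superstring. This is a minimal sketch on error-free reads from one strand; the function names, the `min_len` cutoff, and the toy reads are assumptions for illustration, and none of the assemblers named above works exactly this way (they also handle quality values, errors, and reverse complements).

```python
from itertools import permutations


def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(reads):
    """Remove duplicates and reads that are substrings of another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != s and r in s for s in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair with the largest overlap until none is left."""
    reads = drop_contained(reads)
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_len)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            break  # no remaining overlaps: what is left are separate contigs
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])  # merge on the shared region
        reads = drop_contained(reads)
    return reads


# Hypothetical reads sampled from the toy sequence ATGGCGTACCTGA
print(greedy_assemble(["GCGTACC", "TACCTGA", "ATGGCGT"]))  # ['ATGGCGTACCTGA']
```

The all-pairs comparison is quadratic in the number of reads, and the greedy choice can collapse repeats, which is part of why short-read assemblers moved to graph formulations.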
Comparing assemblers
● Data
  – Resequenced C. jejuni strain 81116 with 454 and closure with traditional methods
    ● (*arnoud@bbsrc, Institute of Food Research, Norwich, United Kingdom)
  – One run (10X coverage)
    ● 120389 sequences with 123 mean length
● Assembled using CAP3, MIRA, PHRAP and NEWBLER
● Mapped to reference genome
* Journal of Bacteriology, November 2007, p. 8402-8403, Vol. 189, No. 22, 0021-9193/07, doi:10.1128/JB.01404-07
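
The "10X coverage" figure can be checked with the usual estimate, coverage = (number of reads × mean read length) / genome size. The read count and mean length below come from the slide; the genome size of about 1.6 Mb for C. jejuni is an approximate value assumed here, not a number given in the slides.

```python
NUM_READS = 120389           # from the slide
MEAN_READ_LENGTH = 123       # from the slide (bases)
GENOME_SIZE_BP = 1_600_000   # assumption: approximate C. jejuni genome size


def coverage(num_reads, mean_read_length, genome_size):
    """Average number of times each genome position is sequenced."""
    return num_reads * mean_read_length / genome_size


total_bases = NUM_READS * MEAN_READ_LENGTH
print(total_bases)  # 14807847
print(round(coverage(NUM_READS, MEAN_READ_LENGTH, GENOME_SIZE_BP), 1))  # 9.3
```

Under that assumed genome size the run comes out a little over 9X, consistent with the rounded 10X quoted above.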
Results
● Newbler
– lower number of contigs
– better mean contig length
– higher match percent to reference genome
                     MIRA    PHRAP   CAP3    NEWBLER
Total contigs        5081    3440    10173   2061
Misassembly?         5       2       3       4
Mean contig length   386.1   545.5   253.5   772.9
Match percent        96.73   95.72   96.59   97.92
Contig length histogram
Contigs mapped to reference genome
SOURCE BLUE (GENOME)
PHRAP RED (264)
CAP3 YELLOW (4)
MIRA GREEN (47)
NEWBLER BLUE-GREEN (275)
THANKS!