CQNCER

Date post: 29-Jun-2015
Upload: sharmaanimesh
Description:
DNA sequencing and assembly
Transcript
Page 1: CQNCER

CQNCer

DNA Sequencing and Assembly

Animesh Sharma

Page 2: CQNCER

About

• Sequencing? Assembly?
  – Break
    • Primer Walking
    • ShotGun
  – Read
  – Assemble
• Why (Genome: The Autobiography of a Species in 23 Chapters - Matt Ridley)
  • Book of life
    – Alphabet
    – Structure
    – Grammar

Page 3: CQNCER

Sequencing (Let's Read)
Past … present

• 2′,3′-dideoxynucleotide triphosphate (ddNTP) chain-termination method – Sanger et al.
  – 4-lane ddNTP, PAGE, AR/UV, 1977
  – [Gel diagram: bands in lanes A, T, G, C] => ? ? ?
  – BTW, protein by Edman, 1949!
• Dye-primer sequencing – Smith et al.
  – 5’ end fluorescently labeled primer, 1986
• Dye-terminator sequencing – Takumi et al.
  – Uniquely labeled chain terminators -> Get off Wells!, 1997
  – 3700 capillary sequencing machine – ABI, 1998
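The "=> ? ? ?" puzzle on the four-lane gel can be sketched in a few lines of Python. This is a minimal illustration, not the historical protocol: the function names `simulate_lanes` and `read_gel` are hypothetical, and it assumes an idealized gel in which every fragment length shows up as exactly one band in exactly one lane.

```python
def simulate_lanes(synthesized):
    """Idealized four-lane gel for a newly synthesized strand.

    Each lane holds the lengths of the fragments that terminated with
    that lane's ddNTP (lengths counted from the end of the primer).
    """
    lanes = {base: [] for base in "ATGC"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from shortest to longest fragment."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"two lanes show a band at length {length}")
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    gel = simulate_lanes("GATTACA")
    print(gel)            # bands per lane
    print(read_gel(gel))  # sequence of the synthesized strand
```

The string read off the gel is the synthesized strand, so it is the complement of the template that was copied.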

Page 4: CQNCER

Present

• Still the common theme
  – Sequencing by synthesis
• Reversible terminator methods
  – Solexa -> Illumina
    • Add, block, detect, unblock
  – Helicos
    • Skips amplification
  – 454 LS -> Roche
    • Pyrophosphates, Luciferin
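To make the pyrophosphate/luciferin idea concrete, here is a rough Python sketch of how a pyrosequencing flowgram relates to a sequence. It assumes an idealized instrument where the light signal of each nucleotide flow equals the length of the homopolymer run it extends; the cyclic flow order `TACG` and the function names `flowgram` and `call_bases` are assumptions made for illustration.

```python
def flowgram(seq, flow_order="TACG", max_flows=10_000):
    """Ideal signal per flow: the number of bases incorporated.

    One nucleotide is flowed at a time; a run of n identical bases
    releases n pyrophosphates, so the signal is n (0 if nothing fits).
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(seq):
        if flow >= max_flows:
            raise ValueError("sequence contains a base not in flow_order")
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Round each (possibly noisy) signal to a homopolymer length."""
    return "".join(base * round(signal) for base, signal in signals)


if __name__ == "__main__":
    ideal = flowgram("TTAG")
    print(ideal)
    noisy = [("T", 1.9), ("A", 1.1), ("C", 0.1), ("G", 0.8)]
    print(call_bases(noisy))
```

Rounding is where long homopolymers become error-prone: telling a signal of 7 from a signal of 8 is much harder than telling 1 from 2.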

Page 5: CQNCER

Something different

• Sequencing by ligation
  – SOLiD, polony method
    • fixed length oligonucleotide hybridization
• Sequencing by hybridization
  – Sequencing by ligation in Microarray
• MS based
  – chain-termination in Mass spec setting
• Ideas for "$10,000 (US) per genome"
  – Labeled DNA polymerase in nanopores
  – EM on nucleotide labeled with heavier elements
  – You got one? Contact http://en.wikipedia.org/wiki/Archon_X_Prize :)

Page 6: CQNCER

Assembly

● Find ‘Shortest common superstring’
  – Alignment, seek overlap (largest), merge, repeat
● PHRAP, CAP3, TIGR assembler
  – Could work on Sanger long read with meaningful qual files
  – For ‘shorter’ reads ...
● SSAKE, VCAKE and SHARCGS
  – Seed and grow, replace with largest
● VELVET, EULER-SR
  – graph based: reads as vertices, edges from overlap region of read, find Eulerian path
● MIRA – can utilize 454 trace
● NEWBLER – flowgram, instrument specific error model
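The "seek overlap (largest), merge, repeat" loop behind the shortest-common-superstring view can be sketched as a toy greedy assembler. This is only an illustration under strong assumptions: error-free reads, exact suffix/prefix matches, a single strand, and no quality values. It is not how PHRAP, CAP3 or any other tool listed here is implemented, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads fully contained in another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTGCA", "TGCAATG"]
    print(greedy_assemble(reads))
```

The all-pairs search is quadratic in the number of reads at every step, and greedy merging can collapse repeats, which is part of why short-read data pushed assemblers toward seed-and-grow and graph-based designs.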

Page 7: CQNCER

Comparing assemblers

● Data
  – Resequenced C. jejuni strain 81116 with 454 and closure with traditional methods
    ● (*arnoud@bbsrc, Institute of Food Research, Norwich, United Kingdom)
  – One run (10X coverage)
    ● 120389 sequences with 123 mean length
● Assembled using CAP3, MIRA, PHRAP and NEWBLER
● Mapped to reference genome

* Journal of Bacteriology, November 2007, p. 8402-8403, Vol. 189, No. 22, 0021-9193/07, doi:10.1128/JB.01404-07
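The quoted coverage can be sanity-checked with the usual relation coverage = (number of reads × mean read length) / genome size. The read count and mean length come from the slide; the genome size of about 1.6 Mb is an assumption added here for illustration, as is the function name `fold_coverage`.

```python
def fold_coverage(num_reads, mean_read_length, genome_size):
    """Average number of reads covering each genome position."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


NUM_READS = 120389               # from the slide
MEAN_READ_LENGTH = 123           # from the slide
ASSUMED_GENOME_SIZE = 1_600_000  # assumption: roughly 1.6 Mb, not from the slides

if __name__ == "__main__":
    total_bases = NUM_READS * MEAN_READ_LENGTH
    depth = fold_coverage(NUM_READS, MEAN_READ_LENGTH, ASSUMED_GENOME_SIZE)
    print(f"total bases sequenced: {total_bases}")
    print(f"estimated coverage: {depth:.1f}X")
```

Under that assumed genome size the estimate lands a little above 9X, in line with the "10X" figure quoted for the run.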

Page 8: CQNCER

Results

● Newbler

– lower number of contigs

– better mean length

– higher match percent to reference genome

                    MIRA     PHRAP    CAP3     NEWBLER
TOTAL_CONTIGS       5081     3440     10173    2061
MISASSEMBLY?        5        2        3        4
MEANCONTIG:LENGTH   386.1    545.5    253.5    772.9
MATCH_PERCENT       96.73    95.72    96.59    97.92
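One quick way to read the table is to multiply contig count by mean contig length, which gives the total number of bases each assembler placed into contigs. The sketch below does that arithmetic; the implied totals are derived here and were not reported on the slides, and the names `RESULTS` and `implied_total_bases` are made up for this example.

```python
# Values copied from the results table: (total contigs, mean contig length).
RESULTS = {
    "MIRA": (5081, 386.1),
    "PHRAP": (3440, 545.5),
    "CAP3": (10173, 253.5),
    "NEWBLER": (2061, 772.9),
}


def implied_total_bases(results):
    """Total assembled bases implied by count x mean length."""
    return {name: count * mean for name, (count, mean) in results.items()}


if __name__ == "__main__":
    totals = implied_total_bases(RESULTS)
    for name, total in sorted(totals.items(), key=lambda item: item[1]):
        print(f"{name:8s} {total:12.0f}")
```

A total well above the genome size would suggest redundant or overlapping contigs, so the assembler with the fewest, longest contigs also tends to have the least redundancy.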

Page 9: CQNCER

Contig length histogram

Page 10: CQNCER

Contigs mapped to reference genome

SOURCE BLUE (GENOME)
PHRAP RED (264)
CAP3 YELLOW (4)
MIRA GREEN (47)
NEWBLER BLUE-GREEN (275)

Page 11: CQNCER

THANKS!