1/26/15
1
Genome mapping
Genome sequencing
Next Gen sequencing
Cytogenetic Band 5-10 Mb
YACs ~1 Mb
BACs ~150 Kb
STS mapping
fingerprint mapping
Human Genome Genetic Map
“Sequence-ready” BAC map
Genome mapping
Genome sequencing
Next Gen sequencing
1977 - 2003
SCIENCE VOL. 274 25 OCTOBER 1996
11 DECEMBER 1998 VOL 282 SCIENCE
SCIENCE VOL 287 24 MARCH 2000
15 February 2001
3 Gb “draft” sequence
NATURE | VOL 421 | 6 FEBRUARY 2003
3,000 Mbp “finished”
F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463–5467
Detection of Fluorescently Tagged DNA
Output to Computer
Optical Detection System
Eric Green, NHGRI
[Figure labels: G G A A A T T T C C C C]
5’----- ACTAGTCCCATGdd 3’
5’----- ACTAGTCCCATdd 3’
5’----- ACTAGTCCCAdd 3’
5’----- ACTAGTCCCdd 3’
5’----- ACTAGTCCdd 3’
5’----- ACTAGTCdd 3’
5’----- ACTAGTdd 3’
5’----- ACTAGdd 3’
5’----- ACTAdd 3’
5’----- ACTdd 3’
5’----- ACdd 3’
5’----- Add 3’
(* = fluorescent tag on the terminating base)
5’TTACGATGCGGAATGACGAAT*
5’TTACGAT*
5’TTACGATGCGGAAT*
5’TTACGATGC*
5’TTACGATGCGGAATGAC*
5’TTACGATGCGGAATGACGAATC*
5’TTACGATGCG*
5’TTACGATGCGG*
5’TTACGATG*
5’TTACGATGCGGAATG*
5’TTACGATGCGGAATGACG*
5’TTACGATGCGGAA*
5’TTACGA*
5’TTACGATGCGGA*
5’TTACGATGCGGAATGA*
5’TTACGATGCGGAATGACGA*
5’TTACGATGCGGAATGACGAA*
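The mixed pool of terminated fragments above can be read back into a sequence by sorting the fragments by length and taking the last base of each one, which is what size separation plus dye detection achieves. The following is a minimal sketch of that idea only; the function name `read_ladder` is a hypothetical helper, and the fragment list is copied from the slide with the label marks stripped.

```python
# Fragments from the slide, in the same mixed order, without the dye marks.
fragments = [
    "TTACGATGCGGAATGACGAAT", "TTACGAT", "TTACGATGCGGAAT", "TTACGATGC",
    "TTACGATGCGGAATGAC", "TTACGATGCGGAATGACGAATC", "TTACGATGCG",
    "TTACGATGCGG", "TTACGATG", "TTACGATGCGGAATG", "TTACGATGCGGAATGACG",
    "TTACGATGCGGAA", "TTACGA", "TTACGATGCGGA", "TTACGATGCGGAATGA",
    "TTACGATGCGGAATGACGA", "TTACGATGCGGAATGACGAA",
]


def read_ladder(frags):
    """Sort fragments by length (as a gel or capillary would) and
    read the terminating base of each one, shortest first."""
    ordered = sorted(frags, key=len)
    return "".join(f[-1] for f in ordered)


print(read_ladder(fragments))
```

The read starts at the shortest fragment in the pool, so bases closer to the primer than that fragment's end are not recovered by this sketch.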
Fluorescent DNA Sequencing Data
Eric Green, NHGRI
http://www3.appliedbiosystems.com
http://www.cas.vanderbilt.edu/bsci111a/sequence-analysis/tab-a-complete-trace.gif
http://www.phrap.com/phred/
Ewing B et al. and Ewing B, Green P. Genome Res. 1998; 8:175-85 (PMID: 9521921) and 8:186-194 (PMID: 9521922)
quantifying sequence accuracy
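Phred expresses per-base accuracy as a quality value Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below converts in both directions; the function names are hypothetical and the sample scores are only illustrative values of the kind found in a quality record.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a phred quality value."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality value implied by an error probability."""
    return -10 * math.log10(p)


# Illustrative scores (assumed sample, not a full read).
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

On this scale Q20 corresponds to one expected error in 100 bases and Q30 to one in 1,000.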
>gnl|ti|2 name:G10P69425RG9.T0
10 15 9 7 7 7 4 4 4 4 9 4 0 4 0 4 4 6 6 6 6 7 7 7 6 6 4 6 6 4 0 4 6 4 4 4 6 4 0 4 6 6 4 4 0 4 6 8 12 12 8 6 4 0 4 8 6 6 6 8 8 7 7 7 9 15 15 25 28 28 33 33 33 34 34 36 36 33 30 30 26 18 18 9 7 7 12 18 18 24 24 23 23 21 21 25 26 26 26 26 26 24 33 34 24 24 24 26 26 25 23 23 20 20 20 20 33 33 40 40 26 26 26 26 30 26 38 38 38 45 45 30 33 30 30 23 23 26 26 26 26 28 45 45 45 45 45 45 41 41 41 45 45 45 37 37 40 37 37 37 37 37 37 45 45 49 49 49 49 42 34 34 34 34 34 34 42 42 42 42 42 37 37 37 40 45 23 25 21 28 28 30 45 49 45 42 40 42 42 42 42 42 42 42 42 42 33 33 33 35 35 35 42 42 42 42 40 33 33 25 22 18 23 21 23 23 42 45 51 51 42 40 42 37 37 41 51 51 51 51 51 51 39 42 30 30 30 33 33 35 40 42 42 39 39 39 39 39 39 39 51 41 43 41 40 40 33 28 28 28 29 28 33 35 35 33 33 39 41 41 45 45 45 45 49 42 42 45 45 40 42 45 45 45 49 51 51 51 51 45 45 42 42 42 37 45 30 30 30 45 45 51 45 45 45 41 41 51 45 39 32 30 30 30 30 34 45 45 45 40 40 40 42 42 42 51 51 45 45 45 41 41 39 51 51 49 49 45 45 22 22 22 36 36 39 42 42 42 42 42 42 51 51 51 51 51 51 51 51 51 51 51 49 42 35 35 35 35 35 35 45 40 40 40 42 42 42 49 45 45 51 51 45 45 49 49 45 45 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 49 49 45 45 39 39 51 51 51 51 45 41 41 41 45 45 45 45 45 51 49 49 45 45 45 45 41 41 45 51 51 51 51 51 51 51 37 33 33 33 33 33 37 45 45 45 43 41 41 40 37 33 33 33 33 33 33 40 40 37 37 37 45 41 45 45 49 49 49 45 49 49 49 45 45 41 41 41 41 45 45 49 49 49 45 45 45 45 42 38 37 37 36 34 45 49 49 49 45 40 40 40 40 40 37 37 37 45 45 45 34 34 34 34 34
F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463–5467
Fred Sanger 2001 Nature Med. 7:267-8
“It is a great source of joy to me that the dideoxy method is still the basic technique used. It was perhaps the climax of my career and makes me feel that all our previous studies on sequences with their successes and failures were not only enjoyable but also a worthwhile contribution to the future of medicine.”
Cytogenetic Band 5-10 Mb
YACs ~1 Mb
BACs ~150 Kb
fragment
“shotgun” clones
~2 Kb or ~10 Kb
sequence
ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA
AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC
TTCAGCTGGAATCGAATTCATCGGT
assemble
“contigs”
“finishing”
finished sequence
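The three example reads on this slide overlap end to end, which is what lets them be assembled into one contig. Below is a minimal greedy-overlap sketch using those reads; it is an illustration under simplifying assumptions (error-free reads, one strand, exact suffix-prefix matches, an arbitrary minimum overlap of 5 bases), not how production assemblers work, and the function names are hypothetical.

```python
def overlap(a, b, min_len=5):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=5):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]
    return contigs


# Reads copied from the slide.
reads = [
    "ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA",
    "AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC",
    "TTCAGCTGGAATCGAATTCATCGGT",
]
contigs = greedy_assemble(reads)
print(len(contigs), len(contigs[0]))
```

Exact-match greedy merging breaks down on repeated sequence, where two different regions share the same overlap.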
Problems with the shotgun approach
T.A. Brown, GENOMES 2, BIOS Scientific Publishers Ltd, 2002
GATC GATC
“scaffold”
10, 50 kb inserts
Published by AAAS. J. C. Venter et al., Science 291, 1304-1351 (2001)
whole-human genome shotgun assembly
perfect 2X coverage
random 2X coverage
Waterston RH, Lander ES, Sulston JE (2002) On the sequencing of the human genome. PNAS 99: 3712-16; PNAS 100: 3022-3
50% of the assembled sequence lies in contigs of length N50 or greater
Expectation for 7X WGS 30 kb HGP 7X WGS mouse assembly ~24 kb
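The N50 definition above can be computed directly: sort the contigs from longest to shortest and walk down the list until the running total reaches half of the assembled bases. A minimal sketch follows; the contig lengths are made-up illustrative numbers, not data from any assembly cited here.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half
    of the total assembled sequence."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in kb, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Because N50 is weighted by bases rather than by contig count, many tiny contigs barely move it, while a few long ones dominate.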
N50 = ~86 Kb (82 Kb HGP)
N50 = ~3.6 Mb (2.3 Mb HGP)
5.1X coverage
2X “shred” of BACs
hybrid WGS and hierarchical sequencing
Green ED (2001) Strategies for the Sequencing of Complex Genomes. Nature Reviews Genetics 2: 573. PMID: 11483982
chromosomes
BACs 2, 10, 50 kb fragments
How many sequence reads do we need?
Lander & Waterman, GENOMICS 2, 231-239 (1988)
$P(k;\lambda) = \dfrac{\lambda^{k} e^{-\lambda}}{k!}$
k = # of events = # of times a given base is sequenced
λ = mean # of events = average sequence coverage
Example
• Average coverage (λ) = 5x
• Probability a given base is sequenced exactly 10 (k) times is $P(10;5) = \dfrac{5^{10} e^{-5}}{10!} = 0.018$, or ~2% of bases will have exactly 10x coverage.
• If you sequence at 10x coverage, how much of the genome will be sequenced at least 5 times?
= 1 - probability a base is sequenced < 5 times
= 1 - [P(0;10) + P(1;10) + P(2;10) + P(3;10) + P(4;10)] = 0.97
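The two worked numbers on this slide can be checked with a few lines of code. This is a small sketch of the Poisson coverage model using only the standard library; the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(k; lam): probability a base is sequenced exactly k times
    when the average coverage is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def prob_at_least(k_min, lam):
    """Probability a base is sequenced k_min or more times."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


print(round(poisson_pmf(10, 5), 3))    # exactly 10x at 5x average coverage
print(round(prob_at_least(5, 10), 2))  # at least 5x at 10x average coverage
print(poisson_pmf(0, 10))              # fraction of bases never sequenced at 10x
```

The k = 0 term is e^-λ, so the expected fraction of the genome left unsequenced falls off exponentially as coverage increases.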
http://www.genome.ou.edu/LanderWatermanTables1_2_3.htm
“Large-scale genome sequence processing” by M Kasahara & S Morishita
Relationship of sequence coverage and contig length
(Θ = fraction of clone overlap needed)
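The relationship between coverage and contig length can be sketched from the Lander & Waterman (1988) model, in which the expected contig length is L[(e^(cσ) - 1)/c + (1 - σ)] and the expected number of contigs is N e^(-cσ), with c the coverage and σ = 1 - Θ. The code below is an illustration only: the read length of 500 bp and Θ = 0.1 are assumed values chosen for the example and do not come from the slide, and the function names are hypothetical.

```python
import math


def expected_contig_length(read_len, coverage, theta):
    """Lander-Waterman expected contig length in bases.
    theta is the fraction of a read that must overlap to detect a join."""
    sigma = 1.0 - theta
    return read_len * ((math.exp(coverage * sigma) - 1.0) / coverage
                       + (1.0 - sigma))


def expected_num_contigs(num_reads, coverage, theta):
    """Lander-Waterman expected number of contigs."""
    sigma = 1.0 - theta
    return num_reads * math.exp(-coverage * sigma)


# Assumed illustrative parameters: 500 bp reads, theta = 0.1.
for c in (1, 2, 5, 7, 10):
    print(c, round(expected_contig_length(500, c, 0.1)))
```

The model assumes reads land uniformly at random and ignores repeats, so real assemblies tend to fall short of these expectations.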