Date posted: 19-Jan-2016
Uploaded by: lucinda-brittany-foster
1
PCR AND DNA SEQUENCING
MBG-487
Işık G. Yuluğ
2
Polymerase Chain Reaction (PCR)
DNA melting
Primer annealing
DNA elongation
Nobel Prize in Chemistry 1993, at age 48
Kary Mullis (invented PCR in 1983)
3
Exponential nature of PCR amplification
4
PCR
Every cycle results in a doubling of the number of DNA strands present
After the first few cycles, most of the product DNA strands made are the same length as the distance between the primers
The result is a dramatic amplification of the DNA that lies between the primers. The amount of amplification is 2 raised to the power n, where n is the number of cycles performed. After 20 cycles this gives approximately a 1-million-fold amplification (2^20 ≈ 1.05 × 10^6); after 40 cycles the amplification would be about 1 × 10^12 (2^40 ≈ 1.1 × 10^12).
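The 2^n arithmetic above can be checked with a short Python sketch. The `efficiency` parameter is an assumption added for illustration (it is not on the slides): 1.0 reproduces ideal doubling, and smaller values model cycles that do not fully double the product.

```python
def amplification_factor(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification after a given number of PCR cycles.

    efficiency = 1.0 is ideal doubling (2 ** cycles); a value below 1.0 is a
    hypothetical per-cycle efficiency for reactions that do not fully double.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency in [0, 1]")
    return (1.0 + efficiency) ** cycles


for n in (20, 30, 40):
    print(f"{n} cycles: {amplification_factor(n):.2e}-fold")
```

With ideal doubling this prints 1.05e+06, 1.07e+09 and 1.10e+12, matching the "about a million" and "about 10^12" figures in the text.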
5
6
Try for equal Tm for both primers
7
Avoid primer dimer formation
Marginally problematic primer
8
Use software to avoid such problems
9
Typical PCR gel (every PCR should be gel-verified)
10
Optimizing PCR protocols
While PCR is a very powerful technique, it is often not possible to achieve optimal results without optimizing the protocol
Critical PCR parameters:
- Concentration of DNA template, nucleotides, divalent cations (especially Mg2+) and polymerase
- Error rate of the polymerase (Taq, Vent exo, Pfu)
- Primer design
PCR can be very tricky
11
Primer design
General notes on primer design in PCR
Perhaps the most critical parameter for successful PCR is the design of primers
Primer selection
Critical variables are:
- primer length
- melting temperature (Tm)
- specificity
- complementary primer sequences
- G/C content
- 3’-end sequence
Primer length
- specificity and the temperature of annealing are at least partly dependent on primer length
- oligonucleotides between 20 and 30 (up to 50) bases long are highly sequence-specific
- annealing efficiency decreases with primer length: in general, the longer the primer, the less efficient the annealing
- however, primers should not be too short, because specificity decreases
12
Primer design
Specificity
Primer specificity is at least partly dependent on primer length: there are many more unique 24-base oligos than 15-base oligos
The expected number of random occurrences P of a given sequence of length n in a sequence of length m is:
$$P = (m - n + 1)\left(\tfrac{1}{4}\right)^{n}$$
Example: the mtDNA genome has about 20,000 bases; the expected number of random occurrences of a sequence of length n is:
n     P_n
5     19.52
10    1.91 × 10^-2
15    1.86 × 10^-5
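The table can be reproduced with a few lines of Python. As in the formula, this sketch assumes all four bases are equally likely and independent, and it counts matches on one strand only; real genomes have biased base composition, so treat the numbers as rough estimates.

```python
def expected_random_matches(m: int, n: int) -> float:
    """Expected number of random occurrences of one specific n-base sequence
    in an m-base target: (m - n + 1) * (1/4) ** n.

    Assumes equal, independent base frequencies and a single strand.
    """
    if n > m:
        return 0.0
    return (m - n + 1) * 0.25 ** n


# Table rows for a 20,000-base target, plus a typical 20-base primer.
for n in (5, 10, 15, 20):
    print(n, f"{expected_random_matches(20_000, n):.3g}")
```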
13
Primer design
Complementary primer sequences
- primers need to be designed with absolutely no intra-primer homology beyond 3 base pairs. If a primer has such a region of self-homology, “snap back” can occur
- another related danger is inter-primer homology: partial homology in the middle regions of two primers can interfere with hybridization. If the homology should occur at the 3' end of either primer, primer dimer formation will occur
G/C content
- ideally a primer should have a near-random mix of nucleotides and a GC content of about 50%
- there should be no poly-G or poly-C stretches, which can promote non-specific annealing
3'-end sequence
- the 3' terminal position in PCR primers is essential for the control of mis-priming
- inclusion of a G or C residue at the 3' end of primers helps to ensure correct binding (stronger hydrogen bonding of G/C residues)
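The rules above can be turned into a rough automated screen. This is a minimal sketch, not a replacement for primer-design software: the 40-60% GC window, the 4-base poly-G/poly-C limit and the 4-base window used for the homology checks are illustrative thresholds chosen here, and the homology test ignores loop size and pairing energy.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G + C bases in the primer."""
    if not seq:
        raise ValueError("empty primer")
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def has_self_complementarity(seq: str, max_len: int = 3) -> bool:
    """True if any stretch of max_len + 1 bases can pair (antiparallel) with
    another part of the same primer; a crude proxy for snap-back and self-dimers."""
    s = seq.upper()
    k = max_len + 1
    rc = reverse_complement(s)
    return any(s[i:i + k] in rc for i in range(len(s) - k + 1))


def three_prime_dimer(primer_a: str, primer_b: str, k: int = 4) -> bool:
    """True if the last k bases (3' end) of primer_a can pair with primer_b."""
    tail = primer_a.upper()[-k:]
    return reverse_complement(tail) in primer_b.upper()


def check_primer(seq: str) -> list[str]:
    """Return warnings based on the G/C-content, poly-G/C, 3'-end and homology rules."""
    s = seq.upper()
    warnings = []
    gc = gc_content(s)
    if not 0.40 <= gc <= 0.60:  # illustrative window around 50% GC
        warnings.append(f"GC content {gc:.0%} is far from 50%")
    if re.search(r"G{4,}|C{4,}", s):  # illustrative run-length limit
        warnings.append("poly-G or poly-C stretch")
    if s[-1] not in "GC":
        warnings.append("3' end is not G or C")
    if has_self_complementarity(s):
        warnings.append("intra-primer homology longer than 3 bases")
    return warnings


print(check_primer("AACGAACGAACGAACGAACG"))  # []
print(check_primer("GGGGCCCC"))
```

The `three_prime_dimer` helper covers the inter-primer case: it asks whether the 3' end of one primer can base-pair with the other primer.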
14
Primer design
Melting temperature (Tm)
- the relationship between annealing temperature and melting temperature is one of the “Black Boxes” of PCR
- a general rule of thumb is to use an annealing temperature that is 5°C lower than the melting temperature
- the goal should be to design a primer with an annealing temperature of at least 50°C
- the melting temperatures of oligos are most accurately calculated using nearest-neighbor thermodynamic calculations with the formula:
$$T_m = \frac{\Delta H}{\Delta S + R\ln(c/4)} - 273.15\ ^{\circ}\mathrm{C} + 16.6\log_{10}[\mathrm{K}^+]$$
(ΔH is the enthalpy and ΔS the entropy of helix formation, R is the molar gas constant and c is the primer concentration)
- a good working approximation of this value can be calculated using the Wallace formula:
$$T_m = 4\,(\#G + \#C) + 2\,(\#A + \#T)\ ^{\circ}\mathrm{C}$$
- both of the primers should be designed such that they have similar melting temperatures.
If primers are mismatched in terms of Tm, amplification will be less efficient or may not
work: the primer with the higher Tm will mis-prime at lower temperatures; the primer with
the lower Tm may not work at higher temperatures.
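A small Python sketch of the Wallace approximation combined with the 5 °C rule of thumb. Using the lower of the two primer Tm values to pick the annealing temperature, and the 5 °C threshold for flagging a Tm mismatch, are assumptions for illustration. The nearest-neighbor formula is not implemented here because it needs tabulated ΔH/ΔS parameters.

```python
def wallace_tm(primer: str) -> int:
    """Wallace approximation: Tm = 4 x (#G + #C) + 2 x (#A + #T), in degrees C."""
    p = primer.upper()
    return 4 * (p.count("G") + p.count("C")) + 2 * (p.count("A") + p.count("T"))


def annealing_plan(fwd: str, rev: str, offset: float = 5.0, max_diff: float = 5.0) -> dict:
    """Suggest an annealing temperature `offset` degC below the lower primer Tm,
    and flag primer pairs whose Tm values differ by more than `max_diff` degC."""
    tm_f, tm_r = wallace_tm(fwd), wallace_tm(rev)
    return {
        "tm_fwd": tm_f,
        "tm_rev": tm_r,
        "annealing": min(tm_f, tm_r) - offset,
        "tm_mismatch": abs(tm_f - tm_r) > max_diff,
    }


print(annealing_plan("AACGAACGAACGAACGAACG", "CAGACAGACAGACAGACAGC"))
```

For these two (hypothetical) primers the Wallace Tm values are 60 °C and 62 °C, so the sketch suggests annealing at 55.0 °C and does not flag a mismatch.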
15
Fidelity of PCR is often an issue
16
Enzymes with proofreading activity
17
18
If complete copies are amplified
19
20
21
LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
PRIMER ANNEALING TEMPERATURE:
Increase in temperature: Increases specificity of primer annealing by destabilizing base pair mismatches.
Decrease in temperature: Increases the sensitivity (and yield) of the reaction by stabilizing base pairing.
22
LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
DNA POLYMERASE:
Enzyme concentration: Enzyme concentrations affect the sensitivity and specificity; too little enzyme produces insufficient product and too much enzyme decreases specificity.
Type of DNA polymerase: Taq enzyme is the most efficient enzyme, but it also has the highest error rate; in contrast, Pfu has a lower error rate but synthesizes the least amount of product.
23
LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
MAGNESIUM CONCENTRATION:
Varying [MgCl2]: low MgCl2 increases specificity; high MgCl2 stabilizes primer annealing and increases sensitivity, but can also decrease primer specificity.
24
LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
CYCLE PARAMETERS:
Denaturation temperature: Elevated denaturation temperature can increase sensitivity by allowing complete template denaturation, especially of G+C-rich targets; however, Taq polymerase activity decreases rapidly above 93 °C.
Duration time of primer extension: Longer primer extension times increase sensitivity in long distance PCR.
Cycle number: Assay sensitivity is determined by both the efficiency of the enzyme reaction and the initial number of DNA target molecules; it may be necessary to increase the cycle number beyond 35 if the reaction contains fewer than 10^3 initial target molecules.
25
26
27
Non-specific PCR and how to improve it
[Gel image; lanes: PCR only, 5% DMSO, DMSO + GLY, marker, increased Mg concentration]
28
PCR enzymes
Taq DNA polymerase, the first thermostable enzyme used for PCR, is still the most popular.
-- has high processivity and is the least expensive choice
-- generates PCR products with single A overhangs on the 3´-ends
(suitable for TOPO cloning)
“Topo” cloning system (Invitrogen)
Half-life at 95 °C is 1.6 hours
29
The technology behind TOPO Cloning
• The key to TOPO Cloning is the enzyme DNA topoisomerase I, which functions both as a restriction enzyme and as a ligase.
• Its biological role is to cleave and rejoin DNA during replication.
• Vaccinia virus topoisomerase I specifically recognizes the pentameric sequence 5’-(C/T)CCTT-3’ and forms a covalent bond with the phosphate group of the 3’ thymidine.
• It cleaves one DNA strand, enabling the DNA to unwind.
• The enzyme then religates the ends of the cleaved strand and releases itself from the DNA.
• To harness the religating activity of topoisomerase, TOPO vectors are provided linearized with topoisomerase I covalently bound to each 3’ phosphate.
• This enables the vectors to readily ligate DNA sequences with compatible ends.
• In only 5 minutes at room temperature, the ligation is complete and ready for transformation into E. coli.
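The recognition step can be illustrated by scanning a sequence for the pentamer. This is only a string-search sketch, assuming the 5’-(C/T)CCTT-3’ site given above is the whole recognition requirement; it says nothing about cleavage efficiency, and the function name is hypothetical.

```python
import re

# Site from the text: 5'-(C/T)CCTT-3'. On the opposite strand the same site
# appears as 5'-AAGG(G/A)-3' when read along the given (top) strand.
TOP_STRAND_SITE = re.compile(r"(?=([CT]CCTT))")
BOTTOM_STRAND_SITE = re.compile(r"(?=(AAGG[AG]))")


def find_topo_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions (on the given strand) of the pentameric site.

    The lookahead patterns let overlapping sites be reported separately.
    """
    s = seq.upper()
    return {
        "top": [m.start() for m in TOP_STRAND_SITE.finditer(s)],
        "bottom": [m.start() for m in BOTTOM_STRAND_SITE.finditer(s)],
    }


print(find_topo_sites("AACCCTTGG"))  # {'top': [2], 'bottom': []}
```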
30
From Invitrogen
31
Tth polymerase
Thermus thermophilus strain HB8.
RNA-dependent DNA polymerase (reverse transcriptase) activity in the presence of Mn2+ ions.
DNA-dependent DNA polymerase activity in the presence of Mg2+ ions.
The fragment should ideally be smaller than 1 kb.
32
Pfu polymerase
Proofreading (high-fidelity) DNA polymerase from Pyrococcus furiosus: approx. 1 error per 2,000,000 nucleotides.
In comparison, Taq DNA polymerase makes approx. 1 error per 10,000 nucleotides.
Can tolerate temperatures exceeding 95 °C, enabling it to PCR-amplify GC-rich targets.
More expensive
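To see why these error rates matter, the per-nucleotide rates above can be plugged into a commonly used simplification: the expected number of errors per product molecule is roughly error rate × product length × number of template doublings. The Poisson step that turns this into a fraction of mutated products, and the 1 kb / 20-doubling example, are assumptions for illustration.

```python
import math


def mutated_fraction(error_rate: float, length_bp: int, doublings: int) -> float:
    """Approximate fraction of final PCR products with at least one error.

    Simplified model (an assumption): errors arise independently at each
    position in each doubling, so the number of errors per molecule is
    Poisson with mean error_rate * length_bp * doublings.
    """
    mean_errors = error_rate * length_bp * doublings
    return 1.0 - math.exp(-mean_errors)


# Per-nucleotide error rates quoted above.
RATES = {"Taq": 1 / 10_000, "Pfu": 1 / 2_000_000}

for enzyme, rate in RATES.items():
    frac = mutated_fraction(rate, length_bp=1_000, doublings=20)
    print(f"{enzyme}: ~{frac:.1%} of 1 kb products carry an error")
```

Under this model roughly 86% of 1 kb Taq products would carry at least one error, versus about 1% for Pfu.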
33
Vent (from Thermococcus litoralis)
also known as Tli polymerase
Very thermostable: half-life at 95 °C is approximately 7 hours
Vent error rate is intermediate between Taq and Pfu:
2-5 × 10^-5 errors/bp
3'→5' exonuclease activity is present
Other polymerases:
Deep Vent (Pyrococcus species GB-D) (New England Biolabs)
New England Biolabs claims its fidelity is equal to or greater than that of Vent.
Replinase (Thermus flavus) 1.03 × 10^-4 errors/base
34
Long-Range PCR
Use of two polymerases:
a non-proofreading polymerase, Taq, is the main polymerase in the reaction;
a proofreading polymerase (3'→5' exo), Pwo, is present at a lower concentration.
PCR products of 22-24 kb have been achieved with Qiagen and Eppendorf PCR mixes
Taq + Pwo (Pyrococcus woesei)
Pwo is very stable: 2 h at 100 °C
35
DNA SEQUENCING
36
DNA sequencing: Importance
• Basic blueprint for life
• Gene and protein
– Function
– Structure
– Evolution
• Genome-based diseases ("inborn errors of metabolism")
– Genetic disorders
– Genetic predispositions to infection
– Diagnostics
– Therapies
37
DNA sequencing methodologies: 1977!
• Maxam-Gilbert – base modification by general and specific chemicals
– depurination or depyrimidination
– single-strand excision
– not amenable to automation
• Sanger
– DNA replication
– substitution of substrate with chain-terminator chemical
– more efficient
– Automation *
38
Maxam-Gilbert ‘chemical’ method
39
“bio” based methods
• Sanger
• dideoxynucleotides
40
DNA chemistry
41
DNA biochemistry: replication fork
42
SEQUENCING: (Sanger method)
Sanger method:
Frederick Sanger (Nobel prize 1980 with Paul Berg and Walter Gilbert)
43
DNA replication: biochemistry
[Figure: nucleotide structures (phosphate groups, sugar, purine or pyrimidine base, 3'-OH) with the 5' and 3' ends labeled]
44
Dideoxynucleotide blocks chain elongation
45
DNA sequencing: Sanger-II
[Figure: the same nucleotide structures with H in place of the 3'-OH (dideoxynucleotide), the basis of the chain termination method]
46
Sanger method
47
Methods of sequence visualization:
1. Labeled primer
2. Labeled DNA chain (randomly)
3. Labeled terminators
48
Labelled nucleotide (radioactively)
50
Applied Biosystems Inc. has designed an automated method that combines PCR and the actual sequencing reaction
<http://www.utmb.edu/proch/servo3.htm>
51
DNA sequencing: chemistry
52
DNA sequencing: in practice
template + polymerase +
1: dCTP, dTTP, dGTP, dATP, ddATP, primer
2: dCTP, dTTP, dGTP, dATP, ddGTP, primer
3: dCTP, dTTP, dGTP, dATP, ddTTP, primer
4: dCTP, dTTP, dGTP, dATP, ddCTP, primer
extension
electrophoresis
[Figure: paired strands with base pairs A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T]
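The four-reaction logic can be sketched in Python: each ddNTP reaction yields fragments that end wherever that base is incorporated, and reading the four lanes from the shortest band upward recovers the newly synthesized strand. The sketch assumes every possible termination position is represented and ignores band intensity and compressions; the function names and the template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def new_strand(template_3to5: str) -> str:
    """Strand synthesized from the primer, for a template written 3'->5'."""
    return template_3to5.upper().translate(COMPLEMENT)


def sanger_lanes(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths (bases added after the primer) in each ddNTP reaction."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)  # a ddNTP here terminates the chain at this length
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from the shortest fragment (bottom of the gel) to the longest."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes(new_strand("AGGTACC"))
print(lanes)            # {'A': [4], 'C': [2, 3], 'G': [6, 7], 'T': [1, 5]}
print(read_gel(lanes))  # TCCATGG
```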
53
DNA sequencing: upgrade, second iteration, terminator-label
• Disadvantages of primer labels:
– four reactions
– tedious
– limited to certain regions (custom oligos), or
– limited to cloned inserts behind ‘universal’ priming sites
• Advantages:
• Solution:
– fluorescent dye terminators
54
DNA sequencing: chemistry
template + polymerase +
dCTP, dTTP, dGTP, dATP
ddATP, ddGTP, ddTTP, ddCTP
extension
electrophoresis
[Figure: paired strands with base pairs A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T]
55
DNA sequencing: photochemistry
56
DNA sequencing: Computation
57
DNA sequencing: Computation
58
Nucleotides for Sequencing
• Standard nucleotides (A,T,C, G)
• Modified versions of these nucleotides
– Labeled so they fluoresce
– Structurally different so that they stop DNA synthesis when they are added to a strand
59
Reaction Mixture
• Copies of DNA to be sequenced
• Primer
• DNA polymerase
• Standard nucleotides
• Modified nucleotides
60
Reactions Proceed
• Nucleotides are assembled to create complementary strands
• When a modified nucleotide is included, synthesis stops
• Result is millions of tagged copies of varying length
61
Recording the Sequence
T C C A T G G A C C
T C C A T G G A C
T C C A T G G A
T C C A T G G
T C C A T G
T C C A T
T C C A
T C C
T C
T
[Figure: the fragments migrate through an electrophoresis gel and pass through a laser beam after moving through the gel; the recorded read is T C C A T G G A C C A]
• DNA is placed on the gel
• Fragments move off the gel in size order and pass through a laser beam
• The color each fragment fluoresces is recorded on a printout
62
DNA Sequencing
Goal:
Find the complete sequence of A, C, G, T’s in DNA
Challenge:
There is no machine that takes long DNA as an input, and gives the complete sequence as output
Can only sequence ~500 letters at a time
63
DNA sequencing – vectors
[Figure: DNA is sheared ("shake") into DNA fragments, which are inserted into a vector (circular genome: bacterium, plasmid) at a known location (restriction site)]
64
Different types of vectors
Vector | Size of insert (bp) | Notes
Plasmid | 2,000-10,000 | Can control the size
Cosmid | 40,000 |
BAC (Bacterial Artificial Chromosome) | 70,000-300,000 |
YAC (Yeast Artificial Chromosome) | > 300,000 | Not used much recently
65
DNA sequencing – gel electrophoresis
1. Start at primer (restriction site)
2. Grow DNA chain
3. Include dideoxynucleosides (modified a, c, g, t)
4. Stops reaction at all possible points
5. Separate products by length, using gel electrophoresis
66
Electrophoresis diagrams
67
Challenging to read answer
70
Reading an electropherogram
1. Filtering
2. Smoothing
3. Correction for length compressions
4. A method for calling the letters – PHRED
PHRED – PHil’s Read EDitor (by Phil Green)
Based on dynamic programming
Several better methods exist, but labs are reluctant to change
71
Output of PHRAP: a read
A read: 500-700 nucleotides
A C G A A T C A G … A
16 18 21 23 25 15 28 30 32 … 21
Quality scores: $Q = -10\log_{10}\mathrm{Prob}(\mathrm{Error})$
Reads can be obtained from the leftmost and rightmost ends of the insert
Double-barreled sequencing: both leftmost and rightmost ends are sequenced
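The quality-score definition converts directly between Q and the probability of a base-calling error. A minimal Python sketch using the example bases and scores from this slide; the expected-error sum and the Q20 cutoff are illustrative additions.

```python
import math


def phred_quality(p_error: float) -> float:
    """Q = -10 * log10(P(error))."""
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Inverse relation: P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


read = "ACGAATCAG"
quals = [16, 18, 21, 23, 25, 15, 28, 30, 32]

# Sum of per-base error probabilities = expected number of wrong calls.
expected_errors = sum(error_probability(q) for q in quals)
low_quality = [(i, b, q) for i, (b, q) in enumerate(zip(read, quals)) if q < 20]
print(f"expected errors in read: {expected_errors:.3f}")
print("bases below Q20:", low_quality)
```

Q30 corresponds to a 1-in-1,000 chance that a call is wrong and Q20 to 1 in 100.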
72
Method to sequence longer regions
Cut the genomic segment many times at random (Shotgun)
Get one or two reads (~500 bp each) from each segment
73
Reconstructing the Sequence
(Fragment Assembly)
Cover region with ~7-fold redundancy (7X)
Overlap reads and extend to reconstruct the original genomic region
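The overlap-and-extend idea can be illustrated with a greedy merge: repeatedly join the two reads with the longest suffix/prefix overlap. This is a toy sketch assuming error-free reads from one strand and no repeats; it is not how PHRAP or other production assemblers work in detail, and the 3-base minimum overlap is arbitrary.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads by largest overlap until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # remove exact duplicates, keep order
    # Reads contained in a longer read add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["CCTGCCGGAA", "ATTAGACCTG", "GCCGGAATAC"]))
# ['ATTAGACCTGCCGGAATAC']
```

Repeats are exactly what breaks this greedy rule: a repeated stretch creates a long but false overlap that gets merged first.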
74
Definition of Coverage
Length of genomic segment: L
Number of reads: n
Length of each read: l
Definition: Coverage $C = nl/L$
How much coverage is enough?
Lander-Waterman model: assuming a uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
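The coverage definition and the Lander-Waterman estimate can be checked numerically. The sketch assumes the simplest form of the model (reads start uniformly at random, and any overlap, however short, is detected), under which a fraction e^-C of the genome is uncovered and the expected number of gaps is about n·e^-C; the function names are hypothetical.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """C = n * l / L."""
    return n_reads * read_len / genome_len


def lander_waterman(n_reads: int, read_len: int, genome_len: int) -> dict:
    """Simplest Lander-Waterman estimates (no minimum detectable overlap)."""
    c = coverage(n_reads, read_len, genome_len)
    return {
        "coverage": c,
        "fraction_uncovered": math.exp(-c),
        "expected_gaps": n_reads * math.exp(-c),
    }


# 1 Mb segment, 500 bp reads, 10X coverage -> n = 20,000 reads.
stats = lander_waterman(20_000, 500, 1_000_000)
print(stats)
```

The expected number of gaps comes out at about 0.91, i.e. roughly one gapped region per 1,000,000 nucleotides at C = 10, in line with the figure above.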
75
Challenges with Fragment Assembly
• Sequencing errors: ~1-2% of bases are wrong
• Repeats
[Figure: false overlap due to a repeat]
76
Repeats
Bacterial genomes: 5%
Mammals: 50%
Repeat types:
• Low-complexity DNA (e.g. ATATATATACATA…)
• Microsatellite repeats (a1…ak)^N, where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
• Transposons
– SINE (Short Interspersed Nuclear Elements), e.g. Alu: ~300 bp long, 10^6 copies
– LINE (Long Interspersed Nuclear Elements): ~500-5,000 bp long, 200,000 copies
– LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV
• Gene families: genes duplicate and then diverge (paralogs)
• Recent duplications: ~100,000 bp long, very similar copies
77
Strategies for whole-genome sequencing
1. Hierarchical – Clone-by-clone: yeast, worm, human
i. Break genome into many long fragments
ii. Map each long fragment onto the genome
iii. Sequence each fragment with shotgun
2. Online version of (1) – Walking: rice genome
i. Break genome into many long fragments
ii. Start sequencing each fragment with shotgun
iii. Construct map as you go
3. Whole Genome Shotgun: fly, human, mouse, rat, fugu
One large shotgun pass on the whole genome
Hierarchical Sequencing
79
Hierarchical Sequencing Strategy
1. Obtain a large collection of BAC clones
2. Map them onto the genome (Physical Mapping)
3. Select a minimum tiling path
4. Sequence each clone in the path with shotgun
5. Assemble
6. Put everything together
[Figure: a BAC clone, the map, and the genome]
80
Methods of physical mapping
Goal:
• Map the clones relative to one another
• Use the map to select a minimal tiling set of clones to sequence
Methods:
• Hybridization
• Digestion
81
1. Hybridization
Short words, the probes, attach to complementary words
1. Construct many probes p1, p2, …, pn
2. Treat each clone Ci with all probes
3. Record all attachments (Ci, pj)
4. Clones X and Y to which the same words attach overlap
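Step 4 amounts to a set intersection. This sketch assumes the hybridization results are already recorded as a clone-to-probes table; the clone and probe names and the `min_shared` threshold are hypothetical, and in practice probes that fall in repeats can create false overlaps.

```python
from itertools import combinations


def overlapping_clones(hits: dict[str, set[str]], min_shared: int = 1) -> list[tuple[str, str, set[str]]]:
    """Clone pairs that hybridize to at least min_shared common probes."""
    pairs = []
    for a, b in combinations(sorted(hits), 2):
        shared = hits[a] & hits[b]
        if len(shared) >= min_shared:
            pairs.append((a, b, shared))
    return pairs


# Hypothetical hybridization results: clone C_i -> probes p_j that attached.
hits = {"C1": {"p1", "p2"}, "C2": {"p2", "p3"}, "C3": {"p3", "p4"}, "C4": {"p5"}}
for a, b, shared in overlapping_clones(hits):
    print(a, b, sorted(shared))
```

Here C1 overlaps C2 (shared p2) and C2 overlaps C3 (shared p3), which orders the three clones C1-C2-C3; C4 shares no probe and stays unplaced.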
82
2. Digestion
Restriction enzymes cut DNA where specific words appear
1. Cut each clone separately with an enzyme
2. Run fragments on a gel and measure their lengths
3. Clones Ca, Cb that share fragments of lengths { li, lj, lk } overlap
Double digestion: cut with enzyme A, enzyme B, then enzymes A + B
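The digestion fingerprint can be simulated from sequence. The sketch cuts only the given strand at each recognition site (EcoRI G^AATTC and BamHI G^GATCC serve as example enzymes), reports fragment lengths, and compares two digests by the lengths they share; passing both enzymes at once models the A + B double digest. Function names and sequences are illustrative.

```python
from collections import Counter

ECORI = ("GAATTC", 1)  # G^AATTC: cut after the first base of the site
BAMHI = ("GGATCC", 1)  # G^GATCC


def digest(seq: str, enzymes: list[tuple[str, int]]) -> list[int]:
    """Fragment lengths (in order along seq) after cutting with all given enzymes."""
    s = seq.upper()
    cuts = set()
    for site, offset in enzymes:
        pos = s.find(site)
        while pos != -1:
            cuts.add(pos + offset)
            pos = s.find(site, pos + 1)
    bounds = [0] + sorted(cuts) + [len(s)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def shared_lengths(frags_a: list[int], frags_b: list[int]) -> list[int]:
    """Fragment lengths present in both digests (counting multiplicity)."""
    return sorted((Counter(frags_a) & Counter(frags_b)).elements())


clone = "AAGAATTCAAGGATCCAA"
print(digest(clone, [ECORI]))         # [3, 15]    enzyme A
print(digest(clone, [BAMHI]))         # [11, 7]    enzyme B
print(digest(clone, [ECORI, BAMHI]))  # [3, 8, 7]  enzymes A + B
```

Real gels only resolve lengths approximately, so shared lengths are evidence of overlap rather than proof.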
Online Clone-by-clone
The Walking Method
84
The Walking Method
1. Build a very redundant library of BACs with sequenced clone-ends (cheap to build)
2. Sequence some “seed” clones
3. “Walk” from seeds using clone-ends to pick library clones that extend left & right
85
Walking: An Example
86
Advantages & Disadvantages of Hierarchical Sequencing
Hierarchical Sequencing
– ADV. Easy assembly
– DIS. Build library & physical map; redundant sequencing
Whole Genome Shotgun (WGS)
– ADV. No mapping, no redundant sequencing
– DIS. Difficult to assemble and resolve repeats
The Walking method – motivation
Sequence the genome clone-by-clone without a physical map
The only costs involved are:
– Library of end-sequenced clones (cheap)
– Sequencing
87
Walking off a Single Seed
• Low redundant sequencing
• Many sequential steps
88
Walking off a single clone is impractical
Cycle time to process one clone: 1-2 months
1. Grow clone
2. Prepare & shear DNA
3. Prepare shotgun library & perform shotgun
4. Assemble in a computer
5. Close remaining gaps
A mammalian genome would need 15,000 walking steps!
89
Walking off several seeds in parallel
• Few sequential steps
• Additional redundant sequencing
In general, can sequence a genome in ~5 walking steps, with <20% redundant sequencing
[Figure panels: efficient vs. inefficient]
90
Using Two Libraries
Solution: Use a second library of small clones
Most inefficiency comes from closing a small ocean with a much larger clone
Whole-Genome Shotgun Sequencing
92
Whole Genome Shotgun Sequencing
cut many times at random
genome
forward-reverse paired reads
plasmids (2 – 10 Kbp)
cosmids (40 Kbp) known dist
~500 bp~500 bp