PCR and DNA Sequencing (MBG-487), Işık G. Yuluğ

Post on 19-Jan-2016


1

PCR AND DNA SEQUENCING

MBG-487

Işık G. Yuluğ

2

Polymerase Chain Reaction (PCR)

DNA melting

Primer annealing

DNA elongation

Nobel Prize in Chemistry 1993, at age 48

Kary Mullis (invented PCR in 1983)

3

Exponential nature of PCR amplification

4

PCR

Every cycle doubles the number of DNA strands present

After the first few cycles, most of the product DNA strands made are the same length as the distance between the primers

The result is a dramatic amplification of the DNA that lies between the primers. The amplification is 2 raised to the power n, where n is the number of cycles performed. After 20 cycles, this gives approximately 1 million-fold amplification. After 40 cycles the amplification would be about 1 x 10^12.
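The 2^n rule can be checked with a few lines of Python. This is a minimal sketch: the function name and the optional `efficiency` parameter are illustrative assumptions, not part of the slides; `efficiency = 1.0` reproduces the ideal doubling described above.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification after a number of PCR cycles.

    efficiency = 1.0 is perfect doubling (the 2^n of the slide);
    lower values are a hypothetical way to model imperfect cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (20, 30, 40):
        print(f"{n} cycles: {fold_amplification(n):.3g}-fold")
```

With perfect doubling, 20 cycles give 1,048,576-fold and 40 cycles about 1.1 x 10^12-fold, in line with the rounded figures on the slide.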

5

6

Try for equal Tm for both primers

7

Avoid primer dimer formation

Marginally problematic primer

8

Use software to avoid such problems

9

Typical PCR gel (every PCR should be verified on a gel)

10

Optimizing PCR protocols

While PCR is a very powerful technique, often enough it is not possible to achieve optimum results without optimizing the protocol

Critical PCR parameters:

- Concentration of DNA template, nucleotides, divalent cations (especially Mg2+) and polymerase

- Error rate of the polymerase (Taq, Vent exo, Pfu)

- Primer design

PCR can be very tricky

11

Primer design

General notes on primer design in PCR

Perhaps the most critical parameter for successful PCR is the design of primers

Primer selection

Critical variables are:

- primer length

- melting temperature (Tm)

- specificity

- complementary primer sequences

- G/C content

- 3’-end sequence

Primer length

- specificity and the temperature of annealing are at least partly dependent on primer length

- oligonucleotides between 20 and 30 (50) bases are highly sequence specific

- annealing efficiency is inversely related to primer length: in general, the longer the primer, the less efficient the annealing

- the primers should not be too short as specificity decreases

12

Primer design

Specificity

Primer specificity is at least partly dependent on primer length: there are many more unique 24-base oligos than there are 15-base oligos

Probability that a sequence of length n will occur randomly in a sequence of length m is:

P = (m – n + 1) x (¼)^n

Example: the mtDNA genome has about 20,000 bases, the probability of randomly finding sequences of length n is:

n     Pn
5     19.52
10    1.91 x 10^-2
15    1.86 x 10^-5
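The table values can be reproduced directly from the formula. The sketch below is only a check of the arithmetic; the function name is an assumption. Note that for n = 5 the value exceeds 1, so the quantity is better read as the expected number of chance occurrences than as a probability.

```python
def expected_random_hits(n: int, m: int) -> float:
    """(m - n + 1) * (1/4)^n: expected chance occurrences of an n-mer in m bases."""
    if n < 1 or n > m:
        raise ValueError("need 1 <= n <= m")
    return (m - n + 1) * 0.25 ** n


MTDNA_LENGTH = 20_000  # approximate genome size used on the slide

if __name__ == "__main__":
    for n in (5, 10, 15, 20, 24):
        print(f"n = {n:2d}  P = {expected_random_hits(n, MTDNA_LENGTH):.3g}")
```

Extending the loop to n = 20 or 24 shows how quickly chance matches become negligible at typical primer lengths.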

13

Primer design

Complementary primer sequences

- primers need to be designed with absolutely no intra-primer homology beyond 3 base pairs. If a primer has such a region of self-homology, “snap back” can occur

- another related danger is inter-primer homology: partial homology in the middle regions of two primers can interfere with hybridization. If the homology should occur at the 3' end of either primer, primer dimer formation will occur

G/C content

- ideally a primer should have a near-random mix of nucleotides, with about 50% GC content

- there should be no PolyG or PolyC stretches that can promote non-specific annealing

3’-end sequence

- the 3' terminal position in PCR primers is essential for the control of mis-priming

- inclusion of a G or C residue at the 3' end of primers helps to ensure correct binding (stronger hydrogen bonding of G/C residues)
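The G/C and 3’-end guidelines above are easy to screen for automatically. The following is a minimal sketch, not a replacement for primer-design software: the function names are hypothetical, and the run length of 4 used to flag a poly-G or poly-C stretch is an assumed threshold that the slides do not specify.

```python
import re


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.5 is the stated ideal)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_gc_clamp(primer: str) -> bool:
    """True if the 3' terminal base (last character) is G or C."""
    return primer.upper()[-1] in "GC"


def has_poly_gc_run(primer: str, run_length: int = 4) -> bool:
    """True if the primer contains a run of G or of C at least run_length long.

    run_length = 4 is an assumed cutoff, chosen only for illustration.
    """
    pattern = "G{%d,}|C{%d,}" % (run_length, run_length)
    return re.search(pattern, primer.upper()) is not None


if __name__ == "__main__":
    primer = "ATGCGTACCTGAAGCTAGTC"  # made-up 20-mer, written 5' to 3'
    print(gc_fraction(primer), has_gc_clamp(primer), has_poly_gc_run(primer))
```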

14

Primer design

Melting temperature (Tm)

- the relationship between annealing temperature and melting temperature is one of the “Black Boxes” of PCR

- a general rule-of-thumb is to use an annealing temperature that is 5°C lower than the melting temperature

- the goal should be to design a primer with an annealing temperature of at least 50°C

- the melting temperatures of oligos are most accurately calculated using nearest-neighbor thermodynamic calculations with the formula:

Tm = H / [S + R ln(c/4)] – 273.15 °C + 16.6 log10 [K+]

(H is the enthalpy, S is the entropy for helix formation, R is the molar gas constant and c is the concentration of primer)

- a good working approximation of this value can be calculated using the Wallace formula:

Tm = 4x (#C+#G) + 2x (#A+#T) °C

- both of the primers should be designed such that they have similar melting temperatures. If primers are mismatched in terms of Tm, amplification will be less efficient or may not work: the primer with the higher Tm will mis-prime at lower temperatures; the primer with the lower Tm may not work at higher temperatures.
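As a worked example of the Wallace formula and the two rules of thumb on this slide (similar Tm for both primers, annealing about 5°C below Tm), here is a small sketch. The function names are assumptions, and basing the annealing temperature on the lower of the two Tm values is one possible reading of the rule, not something the slides state.

```python
def wallace_tm(primer: str) -> int:
    """Wallace approximation: Tm = 4 x (#G + #C) + 2 x (#A + #T), in degrees C."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


def tm_difference(forward: str, reverse: str) -> int:
    """Absolute Tm mismatch between the two primers of a pair."""
    return abs(wallace_tm(forward) - wallace_tm(reverse))


def suggested_annealing(forward: str, reverse: str, offset: int = 5) -> int:
    """Annealing temperature `offset` degrees below the lower Tm (assumed reading)."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


if __name__ == "__main__":
    fwd, rev = "ATGCGTACCTGAAGCTAGTC", "GGATCCTTAGCAATCGGTAC"  # made-up primers
    print(wallace_tm(fwd), wallace_tm(rev), tm_difference(fwd, rev))
    print(suggested_annealing(fwd, rev))
```

The Wallace value is only a rough estimate for short oligos; the nearest-neighbor calculation remains the more accurate method.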

15

Fidelity of PCR is often an issue

16

Proof-reading activity enzymes

17

18

If complete copies are amplified

19

20

21

LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED

PRIMER ANNEALING TEMPERATURE:

Increase in temperature: Increases specificity of primer annealing by destabilizing base pair mismatches.

Decrease in temperature: Increases the sensitivity (and yield) of the reaction by stabilizing base pairing.

22

LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED

DNA POLYMERASE:

Enzyme concentration: Enzyme concentrations affect the sensitivity and specificity; too little enzyme produces insufficient product and too much enzyme decreases specificity.

Type of DNA polymerase: Taq is the most efficient enzyme but also has the highest error rate; in contrast, Pfu has a lower error rate but synthesizes the least amount of product.

23

LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED

MAGNESIUM CONCENTRATION:

Varying the [MgCl2]: Low MgCl2 increases specificity, high MgCl2 stabilizes primer annealing and increases sensitivity, but can also decrease primer specificity.

24

LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED

CYCLE PARAMETERS:

Denaturation temperature: Elevated denaturation temperature can increase sensitivity by allowing complete template denaturation, especially of G+C rich targets; however, Taq polymerase activity decreases rapidly above 93°C.

Duration time of primer extension: Longer primer extension times increase sensitivity in long distance PCR.

Cycle number: Assay sensitivity is determined by both the efficiency of the enzyme reaction and the initial number of DNA target molecules; it should be necessary to increase the cycle number beyond 35 if the reaction contains <10^3 initial target molecules.

25

26

27

Non-specific PCR and how to improve it

[Gel lanes: PCR alone; 5% DMSO; DMSO + GLY; marker; increased Mg concentration]

28

PCR enzymes

Taq DNA polymerase, the first enzyme used for PCR, is still the most popular.

-- has high processivity and is the least expensive choice

-- generates PCR products with single A overhangs on the 3´-ends

(Suitable for TOPO-cloning)

“Topo” cloning system (Invitrogen)

Half-life at 95°C is 1.6 hours

29

The technology behind TOPO Cloning

• The key to TOPO Cloning is the enzyme, DNA topoisomerase I, which functions both as a restriction enzyme and as a ligase.

• Its biological role is to cleave and rejoin DNA during replication.

• Vaccinia virus topoisomerase I specifically recognizes the pentameric sequence 5’-(C/T)CCTT-3’ and forms a covalent bond with the phosphate group of the 3’ thymidine.

• It cleaves one DNA strand, enabling the DNA to unwind.

• The enzyme then religates the ends of the cleaved strand and releases itself from the DNA.

• To harness the religating activity of topoisomerase, TOPO vectors are provided linearized with topoisomerase I covalently bound to each 3’ phosphate.

• This enables the vectors to readily ligate DNA sequences with compatible ends.

• In only 5 minutes at room temperature, the ligation is complete and ready for transformation into E. coli.

30

From Invitrogen

31

Tth polymerase

Thermus thermophilus strain HB8.

RNA-dependent DNA-polymerase activity in the presence of Mn2+ ions.

DNA-dependent DNA-polymerase activity in the presence of Mg2+ ions.

The fragment should ideally be smaller than 1 kb.


32

Pfu polymerase

Proofreading (high-fidelity) DNA polymerase from Pyrococcus furiosus.

Makes approximately 1 error per 2,000,000 nucleotides.

In comparison, Taq DNA polymerase makes approximately 1 error per 10,000 nucleotides.

Can tolerate temperatures exceeding 95°C, enabling it to PCR-amplify GC-rich targets.

More expensive
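To make the two error rates concrete, the sketch below estimates how many errors to expect when a fragment is copied once. It assumes errors are independent and equally likely at every position, which is a simplification; the constant and function names are illustrative, and the 1 kb fragment length is just an example.

```python
TAQ_ERROR_RATE = 1 / 10_000      # errors per nucleotide, from the slide
PFU_ERROR_RATE = 1 / 2_000_000   # errors per nucleotide, from the slide


def expected_errors(length_nt: int, error_rate: float) -> float:
    """Expected number of errors in one copy of a fragment of length_nt."""
    return length_nt * error_rate


def error_free_fraction(length_nt: int, error_rate: float) -> float:
    """Probability that a single copy of the fragment contains no error."""
    return (1.0 - error_rate) ** length_nt


if __name__ == "__main__":
    for name, rate in (("Taq", TAQ_ERROR_RATE), ("Pfu", PFU_ERROR_RATE)):
        print(name, expected_errors(1000, rate), error_free_fraction(1000, rate))
```

These figures are per copying event; errors accumulate over successive cycles, so the fraction of error-free product at the end of a PCR is lower.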

33

Vent (From Thermococcus litoralis)

also known as Tli polymerase

Very thermostable: half-life at 95°C is approximately 7 hours

Vent error rate is intermediate between Taq and Pfu.

2-5 x 10^-5 errors/bp

3'->5' exonuclease activity present

Other polymerases:

Deep Vent (Pyrococcus species GB-D) (New England Biolabs)

New England Biolabs claims fidelity is equal to or greater than that of Vent.

Replinase (Thermus flavus) 1.03 x 10^-4 errors/base

34

Long-Range PCR

Use of two polymerases:

a non-proofreading polymerase (Taq) is the main polymerase in the reaction,

a proofreading polymerase (3' to 5' exo), Pwo, is present at a lower concentration.

22-24 kb PCR products achieved with Qiagen and Eppendorf PCR mixes

Taq + Pwo (Pyrococcus woesei);

Pwo is very stable, 2 hrs at 100°C

35

DNA SEQUENCING

36

DNA sequencing: Importance

• Basic blueprint for life
• Gene and protein
– Function
– Structure
– Evolution
• Genome-based diseases: “inborn errors of metabolism”
– Genetic disorders
– Genetic predispositions to infection
– Diagnostics
– Therapies

37

DNA sequencing methodologies: 1977!

• Maxam-Gilbert
– base modification by general and specific chemicals.
– depurination or depyrimidination.
– single-strand excision.
– not amenable to automation

• Sanger
– DNA replication.
– substitution of substrate with chain-terminator chemical.
– more efficient
– Automation *

38

Maxam-Gilbert ‘chemical’ method

39

“bio” based methods

• Sanger

• dideoxynucleotides

40

DNA chemistry

41

DNA biochemistry: replication fork

42

SEQUENCING: (Sanger method)

Sanger method:

Frederick Sanger (Nobel prize 1980 with Paul Berg and Walter Gilbert)

43

DNA replication: biochemistry

[Figure: nucleotide chain, 5’ to 3’, with purine or pyrimidine bases joined through phosphate groups; the last sugar carries a free 3’-OH]

44

Dideoxynucleotide blocks chain elongation

45

DNA sequencing: Sanger-II

[Figure: nucleotide chain with purine or pyrimidine bases joined through phosphate groups; the last sugar carries H instead of the 3’-OH (chain termination method)]

46

Sanger method

47

Methods of sequence visualization:

1. Labeled primer

2. Labelled DNA chain (randomly)

3. Labeled terminators

48

Labelled nucleotide (radioactively)

49

Fluorescent DNA labeling with BigDye

50

Applied Biosystems Inc. has designed an automated method that combines PCR and the actual sequencing

<http://www.utmb.edu/proch/servo3.htm>

51

DNA sequencing: chemistry


52

DNA sequencing: in practice

template + polymerase + primer, in four separate reactions:

1. dCTP dTTP dGTP dATP + ddATP
2. dCTP dTTP dGTP dATP + ddGTP
3. dCTP dTTP dGTP dATP + ddTTP
4. dCTP dTTP dGTP dATP + ddCTP

extension

electrophoresis

A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T
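The logic of the four-reaction method can be sketched in code. This is an idealized illustration, assuming every possible termination point is represented and lengths are resolved perfectly; the function names are hypothetical, and the template string is the first base of each pair in the diagram above, taken in the order in which the polymerase copies it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template: str) -> str:
    """New strand in synthesis order: the complement of each template base."""
    return "".join(COMPLEMENT[base] for base in template.upper())


def lane_fragment_lengths(new_strand: str) -> dict:
    """Fragment lengths seen in each ddNTP reaction (one gel lane per base).

    A fragment of length k appears in the lane of the base at position k,
    because that reaction's dideoxynucleotide terminates the chain there.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence from shortest to longest fragment across the lanes."""
    base_by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(base_by_length[n] for n in sorted(base_by_length))


if __name__ == "__main__":
    new_strand = synthesized_strand("AGATCTGGAGTTCTGA")
    lanes = lane_fragment_lengths(new_strand)
    print(lanes)
    print(read_gel(lanes))
```

Reading the lanes in order of increasing fragment length recovers the newly synthesized strand, which is what the gel readout does.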

53

DNA sequencing: upgrade, second iteration, terminator-label

• Disadvantages of primer-labels:
– four reactions
– tedious
– limited to certain regions, custom oligos or
– limited to cloned inserts behind ‘universal’ priming sites.
• Advantages:
• Solution:
– fluorescent dye terminators

54

DNA sequencing: chemistry

template + polymerase +

dCTP dTTP dGTP dATP

ddATP ddGTP ddTTP ddCTP

extension

electrophoresis

A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T

55

DNA sequencing: photochemistry

56

DNA sequencing: Computation

57

DNA sequencing: Computation

58

Nucleotides for Sequencing

• Standard nucleotides (A,T,C, G)

• Modified versions of these nucleotides – Labeled so they fluoresce

– Structurally different so that they stop DNA synthesis when they are added to a strand

59

Reaction Mixture

• Copies of DNA to be sequenced

• Primer

• DNA polymerase

• Standard nucleotides

• Modified nucleotides

60

Reactions Proceed

• Nucleotides are assembled to create complementary strands

• When a modified nucleotide is included, synthesis stops

• Result is millions of tagged copies of varying length

61

Recording the Sequence

T C C A T G G A C C
T C C A T G G A C
T C C A T G G A
T C C A T G G
T C C A T G
T C C A T
T C C A
T C C
T C
T

[Figure labels: electrophoresis gel; one of the many fragments of DNA migrating through the gel; one of the DNA fragments passing through a laser beam after moving through the gel]

T C C A T G G A C C A

• DNA is placed on gel

• Fragments move off gel in size order; pass through laser beam

• The color each fragment fluoresces is recorded on a printout

62

DNA Sequencing

Goal:

Find the complete sequence of A, C, G, T’s in DNA

Challenge:

There is no machine that takes long DNA as an input, and gives the complete sequence as output

Can only sequence ~500 letters at a time

63

DNA sequencing – vectors

[Figure: DNA → shake → DNA fragments; fragment + vector (circular genome: bacterium, plasmid), inserted at a known location (restriction site)]

64

Different types of vectors

VECTOR                                  Size of insert (bp)
Plasmid                                 2,000-10,000 (can control the size)
Cosmid                                  40,000
BAC (Bacterial Artificial Chromosome)   70,000-300,000
YAC (Yeast Artificial Chromosome)       > 300,000 (not used much recently)

65

DNA sequencing – gel electrophoresis

1. Start at primer (restriction site)

2. Grow DNA chain

3. Include dideoxynucleoside (modified a, c, g, t)

4. Stops reaction at all possible points

5. Separate products by length, using gel electrophoresis

66

Electrophoresis diagrams

67

Challenging to read answer

68

Challenging to read answer

69

Challenging to read answer

70

Reading an electropherogram

1. Filtering
2. Smoothing
3. Correction for length compressions
4. A method for calling the letters – PHRED

PHRED – PHil’s Read EDitor (by Phil Green)

Based on dynamic programming

Several better methods exist, but labs are reluctant to change

71

Output of PHRED: a read

A read: 500-700 nucleotides

A  C  G  A  A  T  C  A  G  …  A
16 18 21 23 25 15 28 30 32 …  21

Quality scores: -10 x log10 Prob(Error)

Reads can be obtained from the leftmost and rightmost ends of the insert

Double-barreled sequencing: both leftmost & rightmost ends are sequenced
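The quality-score definition above converts in both directions with one line each. A minimal sketch follows; the function names are assumptions, and the example scores are the ones shown for the read on this slide.

```python
import math


def phred_to_error_prob(quality: float) -> float:
    """Invert Q = -10 * log10(P): error probability for a quality score."""
    return 10 ** (-quality / 10)


def error_prob_to_phred(prob: float) -> float:
    """Quality score Q = -10 * log10(P) for an error probability P."""
    return -10 * math.log10(prob)


if __name__ == "__main__":
    for q in (16, 18, 21, 23, 25, 15, 28, 30, 32):
        print(f"Q{q}: P(error) = {phred_to_error_prob(q):.4f}")
```

A score of 20 therefore corresponds to a 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1,000.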

72

Method to sequence longer regions

cut many times at random (Shotgun)

genomic segment

Get one or two reads from each segment

~500 bp ~500 bp

73

Reconstructing the Sequence

(Fragment Assembly)

Cover region with ~7-fold redundancy (7X)

Overlap reads and extend to reconstruct the original genomic region

reads
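The "overlap reads and extend" step can be illustrated for a single pair of reads. This is a deliberately simple sketch that requires exact matches and ignores sequencing errors and repeats, which real assemblers must handle; the function names and the minimum overlap of 3 are assumptions made for the example.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least min_overlap bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Extend `left` with the non-overlapping part of `right`, or None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


if __name__ == "__main__":
    print(merge_reads("ACGTTGCA", "TGCAGGAT"))
```

Repeating this pairwise extension over many reads, always taking the best overlap, is the basic idea behind greedy fragment assembly.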

74

Definition of Coverage

Length of genomic segment: L
Number of reads: n
Length of each read: l

Definition: Coverage C = n l / L

How much coverage is enough?

Lander-Waterman model: assuming uniform distribution of reads, C=10 results in 1 gapped region per 1,000,000 nucleotides
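The coverage definition and the Lander-Waterman figure can be checked numerically. The sketch below uses the simplified estimate that the expected number of gaps is about n x e^(-C), ignoring the minimum overlap needed to detect that two reads join; that simplification, the function names, and the 500-base read length (taken from the "~500 letters" figure earlier in the deck) are assumptions.

```python
import math


def coverage(num_reads: int, read_length: int, region_length: int) -> float:
    """C = n * l / L."""
    return num_reads * read_length / region_length


def reads_needed(target_coverage: float, read_length: int, region_length: int) -> int:
    """Number of reads required to reach the target coverage."""
    return math.ceil(target_coverage * region_length / read_length)


def expected_gaps(num_reads: int, cov: float) -> float:
    """Simplified Lander-Waterman estimate of the number of gaps: n * e^(-C)."""
    return num_reads * math.exp(-cov)


if __name__ == "__main__":
    n = reads_needed(10, 500, 1_000_000)
    c = coverage(n, 500, 1_000_000)
    print(n, c, expected_gaps(n, c))
```

With 500-base reads, C = 10 over 1,000,000 nucleotides needs 20,000 reads and leaves roughly one gap, matching the figure on the slide.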

75

Challenges with Fragment Assembly

• Sequencing errors: ~1-2% of bases are wrong

• Repeats

false overlap due to repeat

76

Repeats

Bacterial genomes: 5%
Mammals: 50%

Repeat types:

• Low-Complexity DNA (e.g. ATATATATACATA…)

• Microsatellite repeats (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)

• Transposons
– SINE (Short Interspersed Nuclear Elements), e.g., ALU: ~300-long, 10^6 copies
– LINE (Long Interspersed Nuclear Elements): ~500-5,000-long, 200,000 copies
– LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV

• Gene Families: genes duplicate & then diverge (paralogs)

• Recent duplications: ~100,000-long, very similar copies

77

Strategies for whole-genome sequencing

1. Hierarchical – Clone-by-clone (yeast, worm, human)
i. Break genome into many long fragments
ii. Map each long fragment onto the genome
iii. Sequence each fragment with shotgun

2. Online version of (1) – Walking (rice genome)
i. Break genome into many long fragments
ii. Start sequencing each fragment with shotgun
iii. Construct map as you go

3. Whole Genome Shotgun (fly, human, mouse, rat, fugu)
One large shotgun pass on the whole genome

Hierarchical Sequencing

79

Hierarchical Sequencing Strategy

1. Obtain a large collection of BAC clones
2. Map them onto the genome (Physical Mapping)
3. Select a minimum tiling path
4. Sequence each clone in the path with shotgun
5. Assemble
6. Put everything together

[Figure labels: a BAC clone; map; genome]

80

Methods of physical mapping

Goal:

• Map the clones relative to one another
• Use the map to select a minimal tiling set of clones to sequence

Methods:

• Hybridization
• Digestion

81

1. Hybridization

Short words, the probes, attach to complementary words

1. Construct many probes p1, p2, …, pn

2. Treat each clone Ci with all probes

3. Record all attachments (Ci, pj)

4. If the same words attach to clones X and Y, the clones overlap

[Figure labels: probes p1 … pn]
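The hybridization procedure amounts to grouping clones by the probes they bind. Here is a minimal sketch with made-up clone and probe names; it assumes every probe occurs only once in the genome, so a probe inside a repeat would produce a false overlap.

```python
from itertools import combinations


def overlapping_clones(attachments):
    """Infer clone overlaps from recorded (clone, probe) attachments.

    Returns a dict mapping each pair of clones that share at least one
    probe to the set of probes they share.
    """
    probes_by_clone = {}
    for clone, probe in attachments:
        probes_by_clone.setdefault(clone, set()).add(probe)

    overlaps = {}
    for a, b in combinations(sorted(probes_by_clone), 2):
        shared = probes_by_clone[a] & probes_by_clone[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


if __name__ == "__main__":
    recorded = [("C1", "p1"), ("C1", "p2"), ("C2", "p2"), ("C2", "p3"), ("C3", "p4")]
    print(overlapping_clones(recorded))
```

In this toy data set only C1 and C2 share a probe (p2), so they are the only pair inferred to overlap.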

82

2. Digestion

Restriction enzymes cut DNA where specific words appear

1. Cut each clone separately with an enzyme
2. Run fragments on a gel and measure length
3. Clones Ca, Cb that share fragments of length { li, lj, lk } overlap

Double digestion: cut with enzyme A, enzyme B, then enzymes A + B

Online Clone-by-clone

The Walking Method

84

The Walking Method

1. Build a very redundant library of BACs with sequenced clone-ends (cheap to build)

2. Sequence some “seed” clones

3. “Walk” from seeds using clone-ends to pick library clones that extend left & right

85

Walking: An Example

86

Advantages & Disadvantages of Hierarchical Sequencing

Hierarchical Sequencing
– ADV. Easy assembly
– DIS. Build library & physical map; redundant sequencing

Whole Genome Shotgun (WGS)
– ADV. No mapping, no redundant sequencing
– DIS. Difficult to assemble and resolve repeats

The Walking method – motivation

Sequence the genome clone-by-clone without a physical map

The only costs involved are:
– Library of end-sequenced clones (cheap)
– Sequencing

87

Walking off a Single Seed

• Low redundant sequencing

• Many sequential steps

88

Walking off a single clone is impractical

Cycle time to process one clone: 1-2 months

1. Grow clone
2. Prepare & shear DNA
3. Prepare shotgun library & perform shotgun
4. Assemble in a computer
5. Close remaining gaps

A mammalian genome would need 15,000 walking steps!
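A quick calculation shows why a single seed is impractical. The genome size (3 billion bp) and the net extension per step (200,000 bp, roughly one BAC) are assumed values chosen because they reproduce the 15,000 steps quoted above; the slides do not state them. The function names are illustrative.

```python
import math


def walking_steps(genome_size_bp: int, extension_per_step_bp: int) -> int:
    """Sequential steps needed to walk across the genome from one seed."""
    return math.ceil(genome_size_bp / extension_per_step_bp)


def total_years(steps: int, months_per_step: float) -> float:
    """Total time when each step must wait for the previous one to finish."""
    return steps * months_per_step / 12


if __name__ == "__main__":
    steps = walking_steps(3_000_000_000, 200_000)  # assumed sizes
    print(steps)
    print(total_years(steps, 1), total_years(steps, 2))
```

At 1-2 months per step, a strictly sequential walk would take on the order of 1,250 to 2,500 years, which is the motivation for walking from many seeds in parallel.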

89

Walking off several seeds in parallel

• Few sequential steps

• Additional redundant sequencing

In general, can sequence a genome in ~5 walking steps, with <20% redundant sequencing

[Figure labels: Efficient; Inefficient]

90

Using Two Libraries

Solution: Use a second library of small clones

Most inefficiency comes from closing a small ocean with a much larger clone

Whole-Genome Shotgun Sequencing

92

Whole Genome Shotgun Sequencing

cut many times at random

genome

forward-reverse paired reads (~500 bp each), a known distance apart

plasmids (2 – 10 Kbp)

cosmids (40 Kbp)