Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Page 1: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Thanks to: DARPA & DOE-GtL

Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, Xeotron/Invitrogen

For more info see: arep.med.harvard.edu

1-Feb-2005 9:15-10 MITRE

Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Page 2: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Synthetic - homologous recombination

testing of DNA motifs

RNA Ratio (motif- to wild type) for each flanking gene:

1.3  2.4 (1.3 in argR)
1.1  1.3
0.7  2.5
0.2  1.4
1.4  3.5

Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Page 3: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Synthetic Genomes & Proteomes. Why?

• Test or engineer cis-DNA/RNA-elements
• Access to any protein (complex) including post-transcriptional modifications
• Affinity agents for the above
• Protein design, vaccines, solubility screens
• Utility of molecular biology DNA -- RNA -- Protein in vitro "kits" (e.g. PCR -- T7 -- Roche)

Toward these goals, design a chassis:
• 115 kbp genome. 150 genes.
• Nearly all 3D structures known.
• Comprehensive functional data.

Page 4: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

(PURE) translation utility

Removing tRNA-synthetases, translational release-factors, RNases & proteases

Selection of scFvs [antibodies] specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods. 284:147

Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353

High level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568

Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5.

Page 5: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

in vitro genetic codes

[Figure: mRNA (5' to 3') and codon table (second base U, C, A, G) showing codons assigned to mS, yU and eU; peptide labels fM, T, N, V, E. Plot of 3H-E dpm (0 to 3500) versus time (30 to 80 min); product label: fM yU mS eU E.]

Forster, et al. (2003) PNAS 100:6353-7

80% average yield per unnatural coupling.

eU = 2-amino-4-pentenoic acid
yU = 2-amino-4-pentynoic acid
mS = O-methylserine
gS = O-GlcNAc–serine
bK = biotinyl-lysine

Page 6: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Escherichia coli | Mycoplasma | 3D structure
Coliphage φ29 DNA polymerase | + | +
Coliphage P1 Cre recombinase | - | +
>Coliphage Lox/Cre recombinase site | - | +
Coliphage T7 RNA polymerase | + | +
>Coliphage T7 RNA polymerase initiation site | + | +
>Coliphage T7 RNA polymerase termination site | + | +
RNase P RNA | + | -
RNase P protein | + | +
>RNase P site/RNA primer for DNA polymerase | + | +
Small subunit 16S ribosomal RNA | + | +
All 21 small subunit ribosomal proteins (1-21) | + except 1,21 | +
Large subunit 5S ribosomal RNA | + | +
Large subunit 23S ribosomal RNA | + | +
Large subunit 23S rRNA G2445>m2G methylase: unknown | ? | -
Large subunit 23S rRNA U2449>dihydroU synthetase: unknown | ? | -
Large subunit 23S rRNA U2457>pseudoU synthetase | ? | -
Large subunit 23S rRNA C2498>Cm methylase: unknown | ? | -
Large subunit 23S rRNA A2503>m2A methylase: unknown | ? | -
Large subunit 23S rRNA U2504>pseudoU synthetase | ? | -
All 33 large subunit ribosomal proteins (1-7,9-11,13-25,27-36) | + except 25, 30 | +
Translational initiation factor 1 | + | +
Translational initiation factor 2 | + | +
Translational initiation factor 3 | + | +
Translational elongation factor Tu | + | +
Translational elongation factor Ts | + | +
Translational elongation factor G | + | +
Translational release factor 1 | + | +
Translational release factor 2 | - | +
Translational release factor Gln methylase | + | +
Translational release factor 3 | - | +
Ribosome recycling factor | + | +
33/45 Transfer RNAs (see Fig. 2) | 29/33 | +
tRNA(I) C34>lysidine synthetase | ? | +
tRNA(R) A34>I deaminase | ? | +
tRNA(ASV) U34>cmo5U (=V) synthetase: unknown | - | -
tRNA(R) U34>2sU Cys desulfurase | - | +
tRNA(R) nm5U34 methylase | ? | +
tRNA(R) U34>cmnm5U GTPase | ? | +
tRNA(R) U34>cmnm5U synthetase | ? | +
tRNA(R) cmnm5U34>nm5U,mnm5U synthetase | ? | -
tRNA(R) G37 N1-methylase | + | +
tRNA(RNIKM) A37>t6A N6-threonylcarbamoyl-A synthetase: unknown | + | -
tRNA(CLFSWY) A37>i6A synthetase | - | +
tRNA(CLFSWY) i6A37>s2i6A(ms2i6A) synthetase | - | +
All 22 aminoacyl-tRNA synthetase subunits (20 enzymes) | + except G subunit, Q | + except G subunit
Met-tRNA formyltransferase | + | +
Chaperonin DnaK | + | +
Chaperonin GroEL | + | +
Chaperonin GroES | + | +

Total genes = 150

Forster & Church

Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome respectively)

Page 7: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Up to 760K Oligos/Chip

18 Mbp for $700 raw (6-18K genes)

Oligos/chip | Source | Chemistry | People
<1K | Oxamer | Electrolytic acid/base |
8K | Atactic/Xeotron/Invitrogen | Photo-Generated Acid | Sheng, Zhou, Gulari, Gao (U. Houston)
24K | Agilent | Ink-jet standard reagents |
48K | Febit | |
100K | Metrigen | |
380K | Nimblegen | Photolabile 5' protection | Nuwaysir, Smith, Albert

Tian, Gong, Church
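
The cost figures on this slide can be turned into a per-base estimate. The sketch below is only arithmetic on the numbers quoted above (18 Mbp, $700, 6-18K genes); the function names are illustrative and the even split of sequence across genes is an assumption.

```python
# Figures quoted on the slide.
TOTAL_BP = 18_000_000      # 18 Mbp of raw oligo sequence per chip
CHIP_COST_USD = 700.0      # raw cost per chip
GENES_LOW, GENES_HIGH = 6_000, 18_000


def cost_per_bp(total_bp: int, cost_usd: float) -> float:
    """Raw synthesis cost in dollars per base pair."""
    return cost_usd / total_bp


def bp_per_gene(total_bp: int, n_genes: int) -> float:
    """Average raw sequence per gene, assuming an even split (an assumption)."""
    return total_bp / n_genes


if __name__ == "__main__":
    print(f"raw cost: ${cost_per_bp(TOTAL_BP, CHIP_COST_USD):.2e} per bp")
    print(f"bp per gene: {bp_per_gene(TOTAL_BP, GENES_HIGH):.0f} "
          f"to {bp_per_gene(TOTAL_BP, GENES_LOW):.0f}")
```

Under these numbers the raw cost is a few thousandths of a cent per base, and 6-18K genes corresponds to roughly 1 to 3 kb of raw sequence per gene.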

Page 8: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Improve DNA Synthesis Cost

Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!

Solution: Amplify the oligos then release them.

10 50 10 => ss-70-mer (chip)

20-mer PCR primers with restriction sites at the 50mer junctions

Tian, Gong, Sheng , Zhou, Gulari, Gao, Church

=> ds-90-mer

=> ds-50-mer
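
A rough sense of why amplification is needed: the sketch below uses only the molecule counts from this slide (1e6 versus 1e12). It assumes equal reaction volumes, a simple second-order rate law, and ideal doubling in each PCR cycle; the function names are illustrative.

```python
import math

CHIP_MOLECULES = 1e6     # molecules per oligo from chip synthesis (slide figure)
USUAL_MOLECULES = 1e12   # molecules per oligo from conventional synthesis (slide figure)


def bimolecular_slowdown(dilution_factor: float) -> float:
    """The rate of A + B -> AB scales with [A][B], so diluting both
    partners by the same factor slows the reaction by its square."""
    return dilution_factor ** 2


def pcr_cycles_needed(start: float, target: float) -> int:
    """Cycles needed to reach `target` copies, assuming ideal doubling."""
    return math.ceil(math.log2(target / start))


if __name__ == "__main__":
    dilution = USUAL_MOLECULES / CHIP_MOLECULES
    print(f"concentration drop: {dilution:.0e}-fold")
    print(f"assembly rate drop: {bimolecular_slowdown(dilution):.0e}-fold")
    print(f"ideal PCR cycles to recover: "
          f"{pcr_cycles_needed(CHIP_MOLECULES, USUAL_MOLECULES)}")
```

With ideal doubling, about 20 cycles make up the million-fold shortfall; real PCR efficiency is lower, so this is a lower bound.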

Page 9: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Improve DNA Synthesis Accuracy via mismatch selection

Tian & Church

Other mismatch methods: MutS (&H,L)

Page 10: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Genome assembly

Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates
3. 15kb to 5Mb by homologous recombination (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters.

Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.

50, 75, 125, 225, 425, 825, … 100*2^(n-1)
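
The length series above is consistent with joining two equal-length fragments per round through a shared 25-base overlap, so each round gives 2*L - 25. That overlap is inferred from the numbers, not stated on the slide, and the slide's 100*2^(n-1) reads as an approximation of the same growth. A minimal sketch under that assumption:

```python
from typing import List


def assembly_lengths(start: int = 50, overlap: int = 25, rounds: int = 5) -> List[int]:
    """Fragment length after each pairwise overlap-extension round.
    The 25-base overlap is an assumption inferred from the slide's series."""
    lengths = [start]
    for _ in range(rounds):
        lengths.append(2 * lengths[-1] - overlap)
    return lengths


def rounds_to_reach(target: int, start: int = 50, overlap: int = 25) -> int:
    """Number of doubling rounds until the fragment is at least `target` long."""
    length, rounds = start, 0
    while length < target:
        length = 2 * length - overlap
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(assembly_lengths())        # [50, 75, 125, 225, 425, 825]
    print(rounds_to_reach(15_000))   # rounds to reach a 15 kb piece
```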

Page 11: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

All 30S-Ribosomal-protein DNAs (codon re-optimized)

Tian, Gong, Sheng , Zhou, Gulari, Gao, Church

[Gel figure labels: 1.7 kb; 0.3 kb; s19, 0.3 kb]

Nimblegen 95K chip

Atactic <4K chip

Page 12: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Improving synthesis accuracy

Method | Bp/error
Chip assembly only | 160
Hybridization-selection | 1,400
MutS-gel-shift | 10,000
MutHLS cleavage | 30,000 (10X better than PCR)

Tian & Church 2004
Carr & Jacobson 2004
Smith & Modrich 1997
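
To see what these bp/error figures mean for a finished construct, the sketch below estimates the fraction of error-free molecules for a given length. It assumes errors are independent and uniformly distributed along the sequence, which is a simplification; the function names are illustrative.

```python
# Bp/error values from the table on this slide.
BP_PER_ERROR = {
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "MutHLS cleavage": 30_000,
}


def p_error_free(length_bp: int, bp_per_error: float) -> float:
    """Probability that a construct of `length_bp` carries no error,
    assuming independent errors at a rate of 1/bp_per_error per base."""
    return (1.0 - 1.0 / bp_per_error) ** length_bp


def clones_to_screen(length_bp: int, bp_per_error: float) -> float:
    """Expected number of clones examined before finding a perfect one."""
    return 1.0 / p_error_free(length_bp, bp_per_error)


if __name__ == "__main__":
    for method, rate in BP_PER_ERROR.items():
        p = p_error_free(1_000, rate)
        print(f"{method:26s} 1 kb error-free fraction: {p:.4f}")
```

For a 1 kb gene, chip assembly alone leaves well under 1% of molecules perfect, while the MutHLS figure leaves most of them perfect, which is why accuracy improvement reduces the number of intermediates that must be checked.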

Page 13: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Extreme mRNA makeover for protein expression in vitro

RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.

RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.

Solution: Iteratively resynthesize all mRNAs with less mRNA structure.

Tian & Church

Western blot based on His-tags. Lanes: 20w, 20m, 17w, 17m, 16w, 16m (W: wild-type; M: modified). Marker: 10 kd.

Page 14: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Safety Proposals

Church, G.M. A synthetic biohazard non-proliferation proposal. http://arep.med.harvard.edu/SBP/Church_Biohazard04c.doc (2004)

1. Monitor oligo synthesis via expansion of Controlled substances, Select Agents, &/or Recombinant DNA

2. Computational tools for the above

3. System modeling checks for synthetic biology projects

4. Multi-auxotroph, novel genetic code for the host genome, prevents functional transfer of DNA to other cells.
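
As one illustration of what "computational tools" for monitoring oligo orders could look like, the sketch below flags oligos that share an exact k-mer, on either strand, with any sequence on a watchlist. This is a toy under stated assumptions and not a description of any deployed screening system: the watchlist sequence is made up, exact k-mer matching is a stand-in for real sequence alignment, and all names are hypothetical.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_oligos(oligos: Iterable[str], watchlist: Iterable[str], k: int = 12) -> List[int]:
    """Indices of oligos sharing at least one k-mer (either strand)
    with any watchlist sequence."""
    watched: Set[str] = set()
    for w in watchlist:
        watched |= kmers(w, k) | kmers(reverse_complement(w), k)
    return [i for i, o in enumerate(oligos) if kmers(o, k) & watched]


if __name__ == "__main__":
    watch = "ATGGCTAGCTAGGATCCGATCGATTACG"   # made-up watchlist entry
    order = [
        "TTTT" + watch[5:20] + "AAAA",       # contains a forward-strand match
        reverse_complement(watch[3:18]),     # matches the reverse strand
        "ACACACACACACACACACAC",              # unrelated
    ]
    print(flag_oligos(order, [watch]))
```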

Page 15: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Why sequence?

• Synthetic biology & laboratory selections
• Pathogen "weather map", biowarfare sensors
• Cancer: mutation sets for individual clones, loss-of-heterozygosity
• RNA splicing & chromatin modification patterns
• Antibodies or "aptamers" for any protein
• B & T-cell receptor diversity: Temporal profiling, clinical
• Preventative medicine & genotype–phenotype associations
• Cell-lineage during development
• Phylogenetic footprinting, biodiversity

Shendure et al. 2004 Nature Rev Gen 5, 335.

Page 16: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Personal genomics & cancer therapy

Mutations G719S, L858R, Del746ELREA in red.

EGFR Mutations in lung cancer: correlation with clinical response to gefitinib [Iressa] therapy. Paez, … Meyerson (Apr 2004) Science 304: 1497

Lynch … Haber, N Engl J Med. (Apr 2004) 350:2129.

Pao … Mardis, Wilson, Varmus H, PNAS (Aug 2004) 101:13306-11.

Dulbecco R. (1986) A turning point in cancer research: sequencing the human genome. Science 231:1055-6.

Page 17: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Why 'single molecule' sequencing?

(1) Single-cells: Preimplantation (PGD), uncultivatable

(2) Co-occurrence on a molecule, complex, cell RNA splice-forms & DNA haplotypes

(3) Cost: $1K-100K "personal genomes"
http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html

(4) Precision: Counting 10^9 RNA tags (to reduce variance) (~5e5 RNAs per human cell)

Tags at fixed costs:
EST | 5e3
SAGE | 5e4
MPSS | 5e6
Polony-FISSeq (polymerase colony) | 5e9 (goal)
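
Why more tags reduce variance can be sketched with simple counting statistics. The code below assumes tag counts are Poisson distributed and that a transcript present at one copy per cell makes up 1 in 5e5 of the RNA pool (the slide's per-cell figure); both the model and the names are assumptions for illustration.

```python
import math

RNAS_PER_CELL = 5e5   # from the slide

# Tags obtainable at fixed cost, from the slide.
TAGS = {"EST": 5e3, "SAGE": 5e4, "MPSS": 5e6, "Polony-FISSeq (goal)": 5e9}


def expected_count(total_tags: float, copies_per_cell: float = 1.0,
                   rnas_per_cell: float = RNAS_PER_CELL) -> float:
    """Expected number of tags from a transcript at the given abundance."""
    return total_tags * copies_per_cell / rnas_per_cell


def poisson_cv(mean_count: float) -> float:
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)


if __name__ == "__main__":
    for method, n in TAGS.items():
        mean = expected_count(n)
        print(f"{method:22s} expected tags: {mean:10.2f}  CV: {poisson_cv(mean):.3f}")
```

Under these assumptions a one-copy-per-cell transcript is usually not seen at all with 5e3 or 5e4 tags, while 5e9 tags give about 1e4 counts and a sampling error near 1%.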

Page 18: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

CD44 Exon Combinatorics (Zhu & Shendure)

• Alternatively Spliced Cell Adhesion Molecule
• Specific variable exons are up-or-down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…

Zhu,J, et al. Science. 301:836-8.

Page 19: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.

EXON PATTERN | Eph4 | Eph4bDD | TOTAL | Eph4 FRATIO | LSTP-PV
------------7-8-9-10 | 609 | 764 | 1373 | 1.17 | 1E-4
--------------8-9-10 | 320 | 390 | 710 | 1.13 | 3E-2
----------6-7-8-9-10 | 431 | 251 | 682 | -1.85 | 4E-18
------4-5-6-7-8-9-10 | 218 | 216 | 434 | -1.08 | 2E-1
----------------9-10 | 68 | 143 | 211 | 1.96 | 7E-7
--------5-6-7-8-9-10 | 86 | 39 | 125 | -2.37 | 2E-6
----3-4-5-6-7-8-9-10 | 40 | 56 | 96 | 1.30 | 9E-2
------4-5---7-8-9-10 | 16 | 74 | 90 | 4.30 | 2E-9
--2-3-4-5-6-7-8-9-10 | 44 | 28 | 72 | -1.69 | 1E-2
1-2-3-4-5-6-7-8-9-10 | 22 | 5 | 27 | -4.73 | 3E-4
--------5---7-8-9-10 | 5 | 19 | 24 | 3.53 | 3E-3
----3-4-5---7-8-9-10 | 1 | 15 | 16 | 13.95 | 4E-4
--2-3-4-5---7-8-9-10 | 1 | 10 | 11 | 9.30 | 5E-3

Eph4 = murine mammary epithelial cell line

Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)

CD44 RNA isoforms
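
The fold ratios in the isoform table can be approximately reproduced from the raw counts. The sketch below assumes the ratio is the isoform's fraction in Eph4bDD divided by its fraction in Eph4, written as a negative reciprocal when the isoform decreases; that definition is inferred from the numbers and is not stated on the slide. Totals here use only the 13 isoforms listed, so values differ slightly from the slide, and the p-value column is not recomputed.

```python
from typing import Dict, Tuple

# (Eph4 count, Eph4bDD count) per exon pattern, from the slide.
COUNTS: Dict[str, Tuple[int, int]] = {
    "7-8-9-10": (609, 764),
    "8-9-10": (320, 390),
    "6-7-8-9-10": (431, 251),
    "4-5-6-7-8-9-10": (218, 216),
    "9-10": (68, 143),
    "5-6-7-8-9-10": (86, 39),
    "3-4-5-6-7-8-9-10": (40, 56),
    "4-5-7-8-9-10": (16, 74),
    "2-3-4-5-6-7-8-9-10": (44, 28),
    "1-2-3-4-5-6-7-8-9-10": (22, 5),
    "5-7-8-9-10": (5, 19),
    "3-4-5-7-8-9-10": (1, 15),
    "2-3-4-5-7-8-9-10": (1, 10),
}


def signed_fold(a: int, b: int, total_a: int, total_b: int) -> float:
    """Fold change of fraction b/total_b over a/total_a; decreases are
    reported as the negative reciprocal (assumed convention)."""
    ratio = (b / total_b) / (a / total_a)
    return ratio if ratio >= 1.0 else -1.0 / ratio


def fold_ratios(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Signed fold ratio for every isoform, normalized by column totals."""
    total_a = sum(a for a, _ in counts.values())
    total_b = sum(b for _, b in counts.values())
    return {p: signed_fold(a, b, total_a, total_b) for p, (a, b) in counts.items()}


if __name__ == "__main__":
    for pattern, fold in fold_ratios(COUNTS).items():
        print(f"{pattern:22s} {fold:7.2f}")
```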

Page 20: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Chromosome-wide haplotyping

[Figure: Human Chr. 7 (150 Mb). Markers IL6-3572 : A and CD36-4366 : A/T, 60 Mb apart. Haplotype labels A..A and A..T; counts shown: 73, 3, 1.]

Page 21: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Convergence on non-electrophoretic tag-sequencing methods?

Tag length (bp):
EST | >400
SAGE | 14-26
MPSS | 20
454 | 100
Polony-Seq | 26 (2-ends)
Ronaghi

• Single-molecule vs. amplified single molecule
• Array vs. bead packing vs. random
• Rapid scans vs. long scans (chemically limited, 454)
• Number of immobilized primers:
  0: Chetverin'97 "Molecular Colonies"
  1: Mitra'99 > Agencourt "Bead Polonies"
  2: Kawashima'88, Adams'97 > Lynx/Solexa: "Clusters"

http://arep.med.harvard.edu/Polonator/Plone.htm

Page 22: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Bead Polony Sequencing Pipeline

In vitro libraries via paired tag manipulation

Bead polonies via emulsion PCR

[Dre03]

Monolayered immobilization in acrylamide

Enrichment of amplified beads

SOFTWARE

Images → Tag Sequences

Tag Sequences → Genome

FISSEQ or “wobble” sequencing

Epifluorescence Scope with Integrated Flow Cell

Page 23: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Polony Fluorescent In Situ Sequencing Libraries

Greg Porreca, Abraham Rosenbaum

[Figure labels: 1 to 100 kb genomic fragments; L, M, R; PCR bead; sequencing primers; selector bead]

2x20bp after MmeI (BceAI, AcuI)

Dressman et al PNAS 2003 emulsion

Page 24: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Cleavable dNTP-Fluorophore (& terminators)

Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65

Reduce or photo-cleave

Page 25: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Polony-FISSeq: up to 2 billion beads/slide

Cy5 primer (570nm) ; Cy3 dNTP (666nm)

Jay Shendure

Self Organizing Monolayer

Page 26: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Polony FISSeq Stats

• # of bases sequenced (total) 23,703,953
• # bases sequenced (unique) 73
• Avg fold coverage 324,711 X
• Pixels used per bead (analysis) ~3.6
• Read Length per primer 14-15 bp
• Insertions 0.5%
• Deletions 0.7%
• Substitutions (raw) 4e-5
• Throughput: 360,000 bp/min

Current capillary sequencing 1400 bp/min (600X speed/cost ratio, ~$5K/1X)

(This may omit: PCR, homopolymer, context errors)

Shendure
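
Two of the statistics above can be checked by direct arithmetic. The sketch uses only numbers from this slide; note that the raw throughput ratio it computes is a speed ratio only, whereas the slide's 600X figure is a combined speed/cost ratio that cannot be derived from the numbers given.

```python
TOTAL_BASES = 23_703_953         # bases sequenced (total)
UNIQUE_BASES = 73                # bases sequenced (unique)
POLONY_BP_PER_MIN = 360_000      # Polony FISSeq throughput
CAPILLARY_BP_PER_MIN = 1_400     # capillary sequencing throughput


def fold_coverage(total_bases: int, unique_bases: int) -> float:
    """Average number of times each unique base was read."""
    return total_bases / unique_bases


def speed_ratio(fast_bp_per_min: float, slow_bp_per_min: float) -> float:
    """Raw throughput ratio between two methods (speed only, not cost)."""
    return fast_bp_per_min / slow_bp_per_min


if __name__ == "__main__":
    print(f"fold coverage: {fold_coverage(TOTAL_BASES, UNIQUE_BASES):,.0f} X")
    print(f"speed ratio:   {speed_ratio(POLONY_BP_PER_MIN, CAPILLARY_BP_PER_MIN):.0f} X")
```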

Page 27: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

High accuracy special case: homopolymers (e.g. AAA, CC, etc.)

• Use "compressed" tags, ACG = ACCG = ACCCG
• Quantitate incorporation
• Reversible terminators
• FRET between adjacent 3' bases
• Wobble sequencing

All five of these work.

• Maintenance of amplification fidelity using linear amplification from initial genomic fragment
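
The "compressed" tag idea from the first bullet can be sketched in a few lines: every run of identical bases is collapsed to a single base, so ACG, ACCG and ACCCG all map to the same tag. The function names are illustrative; the run-length helper shows the information that is discarded by compression and that "quantitate incorporation" would have to recover.

```python
from itertools import groupby
from typing import List, Tuple


def compress_homopolymers(seq: str) -> str:
    """Collapse each run of identical bases to one base."""
    return "".join(base for base, _ in groupby(seq))


def run_lengths(seq: str) -> List[Tuple[str, int]]:
    """(base, run length) pairs; the lengths are what compression drops."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


if __name__ == "__main__":
    for tag in ("ACG", "ACCG", "ACCCG"):
        print(tag, "->", compress_homopolymers(tag), run_lengths(tag))
```

A reference genome can be compressed the same way before matching, at the cost of some loss in tag uniqueness.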

Page 28: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Degenerate (aka “wobble”) sequencing

“single tipped” vs “double tipped”

length of anchoring sequence

natural vs. universal nucleotides (i.e. deoxyinosine)

single fluor vs. four-color fluor mixtures of dNTPs for extensions

Sequenase vs Klenow vs BST

Exonuclease stripping vs heat stripping

CTAGCGAGCTAG NNNNNNNN A
CTAGCGAGCTAG NNNNNNNN G
CTAGCGAGCTAG NNNNNNNN C
CTAGCGAGCTAG NNNNNNNN T

anchor | degenerate | “tip”
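
The primer layout shown above (anchor, degenerate stretch, one-base tip) can be generated programmatically. The sketch below rebuilds the four "single tipped" primers from the slide's anchor sequence; the function names and the idea of counting pool complexity as 4^n are illustrative additions.

```python
from typing import List

ANCHOR = "CTAGCGAGCTAG"   # anchor sequence shown on the slide


def wobble_primers(anchor: str = ANCHOR, n_degenerate: int = 8,
                   tips: str = "AGCT") -> List[str]:
    """One primer per tip base: anchor + N-stretch + tip."""
    return [anchor + "N" * n_degenerate + tip for tip in tips]


def pool_complexity(n_degenerate: int) -> int:
    """Number of distinct sequences in a fully degenerate N-stretch."""
    return 4 ** n_degenerate


if __name__ == "__main__":
    for primer in wobble_primers():
        print(primer)
    print("sequences per pool:", pool_complexity(8))
```

Varying `n_degenerate` moves the tip along the template, which is how successive positions would be queried under this reading of the slide.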

Page 29: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Wobble vs Simple base-extension

1/4 vs 2.5/4 base/cycle

>8 vs 14-200 base reads

3e-3 vs 4e-5 non-homopolymer errors

3e-3 vs 1e-1 homopolymer errors

40' per cycle, 60 hr per 20 cycles
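
The per-cycle yields above translate into cycle counts and run times for a given read length. The sketch below takes the 1/4 and 2.5/4 base/cycle figures and the 40 minutes per cycle at face value and assumes the same cycle time for both methods; it does not attempt to reproduce the slide's "60 hr per 20 cycles" figure. Function names are illustrative.

```python
import math

WOBBLE_BASES_PER_CYCLE = 1 / 4      # from the slide
SIMPLE_BASES_PER_CYCLE = 2.5 / 4    # from the slide
MINUTES_PER_CYCLE = 40              # from the slide (40')


def cycles_for_read(read_length: int, bases_per_cycle: float) -> int:
    """Cycles needed to obtain `read_length` bases at the given yield."""
    return math.ceil(read_length / bases_per_cycle)


def run_hours(cycles: int, minutes_per_cycle: float = MINUTES_PER_CYCLE) -> float:
    """Total run time in hours for the given number of cycles."""
    return cycles * minutes_per_cycle / 60.0


if __name__ == "__main__":
    for name, rate in (("wobble", WOBBLE_BASES_PER_CYCLE),
                       ("simple base-extension", SIMPLE_BASES_PER_CYCLE)):
        n = cycles_for_read(20, rate)
        print(f"{name:22s} {n:3d} cycles, {run_hours(n):5.1f} hr for a 20-base read")
```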

Page 30: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Sequencing single molecules

Ecosystem studies need single-cell amplification because of multiple chromosomes (& RNAs) per cell. Many cells are hard to grow. Microbes exchange genome subsets.

(Even an 80% genome coverage is better than 100 kb BACs)

Many input molecules required to sequence one molecule vs. one molecule sufficient to sequence via many copies of it.

Page 31: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring

Single cell sequencing

φ29 real-time amplification

No template control

Affymetrix quantitation of independent amplifications
