Date post: | 02-Apr-2015 |
Upload: | kaley-settle |
1
Use a circular template to get redundant reads and so more accuracy.
Pacific Biosciences
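Why redundant reads of a circular template raise accuracy can be sketched with a per-position majority vote over repeated passes. This is a toy illustration only: it assumes the passes are already aligned and of equal length, the example sequences are made up, and the function name `consensus` is hypothetical. Real consensus calling also has to handle insertions and deletions.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote consensus over equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and aligned to equal length")
    # For each column, keep the most frequently observed base.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))

# Five made-up passes over the same circular template; two carry one error each.
passes = [
    "ACGTTAGC",
    "ACGATAGC",  # error at index 3
    "ACGTTAGC",
    "ACCTTAGC",  # error at index 2
    "ACGTTAGC",
]
result = consensus(passes)
print(result)
```

Because the errors fall at different positions in different passes, each one is outvoted by the other passes.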
2
DNA methylation detection by bisulfite conversion
3
Detection of methylated adenine in Pacific Biosciences (SMRT) sequencing
4
IPD ratio = ratio of average interpulse durations (meth/non-meth)
Template position
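The IPD ratio plotted against template position can be computed as in the sketch below. All numbers are invented for illustration, and the function name `ipd_ratios` and the cutoff of 2.0 are assumptions, not values from the slide.

```python
def ipd_ratios(meth_ipds, control_ipds):
    """Per-position ratio: mean IPD on methylated template / mean IPD on control."""
    ratios = []
    for meth, ctrl in zip(meth_ipds, control_ipds):
        ratios.append((sum(meth) / len(meth)) / (sum(ctrl) / len(ctrl)))
    return ratios

# Invented interpulse durations (seconds), one list of observations per position.
meth_ipds = [[0.2, 0.4], [1.5, 2.5], [0.3, 0.3]]
control_ipds = [[0.3, 0.3], [0.4, 0.6], [0.3, 0.3]]

ratios = ipd_ratios(meth_ipds, control_ipds)
THRESHOLD = 2.0  # hypothetical cutoff for calling a modified base
candidates = [pos for pos, r in enumerate(ratios) if r > THRESHOLD]
print(ratios, candidates)
```

A position whose ratio stands well above 1 is a candidate for a modified base such as methylated adenine.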
5
Pacific Biosciences
• 50,000 ZMWs (Aug., 2011), and density may climb
• Long reads (e.g., full molecules to determine full length splicing isoforms)
• Direct RNA sequencing possible.
• DNA methylation detectable
6
Agilent SureSelect RNA Target Enrichment
Capture a subgenomic region of interest for economy and speed of sequencing:
E.g.,
the entire exome (all exons w/o introns or intergenic regions)
hundreds of cancer genes
a particular genomic locus
Alternative: hybridize to a custom microarray.
Agilent
7
Nimblegen (Roche) subgenomic DNA capture options: Beads or microarrays
8
Targeted Capture and Next-Generation Sequencing Identifies C9orf75, encoding Taperin, as the Mutated Gene in Nonsyndromic Deafness DFNB79
Rehman et al. American Journal of Human Genetics 86, 378–388, 2010
Some results using DNA capture for subgenomic sequencing
9
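Methylated CpG in DS DNA: ----CmpG---> <---GpCm---
Unmethylated C: ----CpG---- > (Na bisulfite, heat: deamination of cytosine to uracil) > ----UpG---- > (PCR) > ----TpG----> <----ApC----
Methylated C: ----CmpG---- > (Na bisulfite, heat: no change) > ----CmpG---- > (PCR) > ----CpG----> <----GpC----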
All NON-methylated Cs changed to T. Sequence and compare to deduce the methylated C’s
Detection of methylated C (~all in CpG dinucleotides)
DS DNA
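The "sequence and compare" step can be sketched in silico as follows. This is a minimal illustration for one strand only; the reference sequence, the methylated positions, and the function names are all made up for the example.

```python
def bisulfite_convert(seq, methylated_positions):
    """Sequence read after bisulfite + PCR: unmethylated C -> T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylated(reference, converted):
    """Positions where a reference C is still read as C after conversion."""
    return {
        i
        for i, (ref_base, conv_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and conv_base == "C"
    }

reference = "ACGTCCGATCGA"   # made-up sequence with CpGs at indices 1, 5 and 9
methylated = {1, 9}          # hypothetical methylation state
converted = bisulfite_convert(reference, methylated)
called = call_methylated(reference, converted)
print(converted, sorted(called))
```

Comparing the converted read with the untreated reference recovers exactly the Cs that were protected by methylation.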
10
DEEP SEQUENCING (Next generation sequencing, High throughput sequencing, Massively parallel sequencing) applications:
Human genome re-sequencing (mutations, SNPs, haplotypes, disease associations, personalized medicine)
Tumor genome sequencing
Microbial flora sequencing (microbiome, viruses)
Metagenomic sequencing (without cell culturing)
RNA sequencing (RNAseq; gene expression levels, miRNAs, lncRNAs, splicing isoforms)
Chromatin structure (ChIP-seq; histone modifications, nucleosome positioning)
Epigenetic modifications (DNA CpG methylation and hydroxymethylation)
Transcription kinetics (GROseq; nascent RNA, BrdU pulse labeled RNA)
High throughput genetics (QUEPASA; cis-acting regulatory motif discovery)
Drug discovery (bar-coded organic molecule libraries) [Mannocci PNAS paper]
11
Ke et al. and Chasin, Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 2011, 21: 1360-1374.
Order an equal mixture of all 4 bases at these 6 positions
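An equal mixture of all 4 bases at 6 positions yields 4^6 = 4096 different hexamers. A short sketch that enumerates them (the variable names are arbitrary):

```python
from itertools import product

# Every possible 6-mer over the four bases, in alphabetical order.
hexamers = ["".join(bases) for bases in product("ACGT", repeat=6)]

print(len(hexamers))   # 4 ** 6
print(hexamers[:3])
```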
12
Quantifying extensive phenotypic arrays from sequence arrays (= QUEPASA)
13
Rank   6-mer    ESRseq score (~ -1 to +1)
1      AGAAGA    1.0339
2      GAAGAT    0.9918
3      GACGTC    0.9836
4      GAAGAC    0.9642
5      TCGTCG    0.9517
6      TGAAGA    0.9434
7      CAAGAA    0.9219
8      CGTCGA    0.8853
:      :
4086   TAGATA   -0.8609
4087   AGGTAG   -0.8713
4088   CGTCGC    0.8850
4089   CTTAAA   -0.8786
4090   CCTTTA   -0.8812
4091   GCAAGA    0.8911
4092   TAGTTA   -0.8933
4093   TCGCCG    0.9113
4094   CCAGCA   -0.8942
4093   CTAGTA   -0.9251
4094   TAGTAG   -0.9383
4095   TAGGTA   -0.9965
4096   CTTTTA   -1.0610
Best exonic splicing enhancers
Worst exonic splicing enhancers = best exonic splicing silencers
-
-
-
14
Composite exon (from ~100,000)
Constitutive exons
Alternative exons
Pseudo exons
15
Experiment: 1 1 1 2 2 1+2 2 2 1 2
Sequence of 36                          Quality code
CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA    abU^Vaa`a\aaa]aWaTNZ`aa`Q][TE[UaP_U]
TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA    a`P^Wa`[`Wa^`X_X_XWVa^NSP]_]S^X_T\X^
CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA    aTa`^b``baaaa^aab^YaTQLOHIa`^a``TX]]
TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA    I_`aaaa`aaaaaaa_a_^[KZIGIGZ`U`\^P^^`
CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA    aY_\abb[T\abaaa`a`bZ[HXXIZa_`_LGMS[`
TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA    aba]^aa_a]`aa]_]`XWSMFGGIPX[P]X`V_Y^
TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA    a_^a^aa`aYaaa_aY`Y_^[I]VY\`]V]R\W]VV
TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA    XZababa`aZaaaaaYaYXX`baa``\\TaUa\aW`
2 nt barcode (TA or CG)
Constant regions (peculiar to our expt.)
Variable region
Barcoding allows multiplexing of several or many experiments at once (in one channel of a sequencer) -> economy. Here, two biological replicates.
What the data looks like:
Error
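Sorting such reads by barcode and pulling out the variable region might look like the sketch below. The read layout (2-nt barcode, 18-nt constant region, 6-nt variable region, 10-nt constant region) is inferred from the eight reads on this slide and is an assumption, as are the function name and the policy of discarding any read with a mismatch in a constant region. Quality codes are ignored here.

```python
from collections import Counter

# Layout inferred from the slide's reads (an assumption):
# 2-nt barcode + 18-nt constant + 6-nt variable + 10-nt constant = 36 nt.
LEFT_CONSTANT = "CACTGTGCTGGAGCTCCC"
RIGHT_CONSTANT = "AACTCTAGAA"
BARCODES = ("TA", "CG")

def parse_read(read):
    """Return (barcode, hexamer), or None if the read fails any check."""
    if len(read) != 36:
        return None
    barcode, left = read[:2], read[2:20]
    hexamer, right = read[20:26], read[26:]
    if barcode not in BARCODES:
        return None
    if left != LEFT_CONSTANT or right != RIGHT_CONSTANT:
        return None  # mismatch in a constant region: treat as a sequencing error
    return barcode, hexamer

reads = [
    "CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA",
    "TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA",
]

counts = Counter()
rejected = []
for read in reads:
    parsed = parse_read(read)
    if parsed is None:
        rejected.append(read)
    else:
        counts[parsed] += 1

for (barcode, hexamer), n in sorted(counts.items()):
    print(barcode, hexamer, n)
print("rejected:", len(rejected))
```

Counting hexamers per barcode, before and after selection, is what turns the sequence array into a phenotypic array.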
16
Next generation methods for high throughput genetic analysis:
Use custom oligo libraries to construct minigene libraries (40,000, up to 60 nt long):
E.g., for saturation mutagenesis to identify all exonic bases contributing to splicing (or transcription or polyadenylation, ...)
Use bar codes to detect sequences missing from the selected molecules
E.g., Nat Biotechnol. 2009 27:1173-5. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Patwardhan RP, Lee C, Litvin O, Young DL, Pe'er D, Shendure J.
Long (200-mer) synthetic oligo library
17
OUTLINE OF LECTURE TOPICS COMING UP
Expression and manipulation of transgenes in the laboratory
• In vitro mutagenesis to isolate variants of your protein/gene with desirable properties
  – Single base mutations
  – Deletions
  – Overlap extension PCR
  – Cassette mutagenesis
• To study the protein: Express your transgene
  – Usually in E. coli, for speed, economy
  – Expression in eukaryotic hosts
  – Drive it with a promoter/enhancer
  – Purify it via a protein tag
  – Cleave it to get the pure protein
• Explore protein-protein interaction
  • Co-immunoprecipitation (co-IP) from extracts
  • 2-hybrid formation
  • surface plasmon resonance
  • FRET (Fluorescence resonance energy transfer)
  • Complementation readout
18
PCR
fragment subsequent cloning in a plasmid
(or not, the PCR product itself can be used in many ways, e.g., transfection)
Cut with RE 1 and 2
Ligate into similarly cut vector
RS1 RS2
RS1 RS2
Site-directed mutagenesis by overlap extension PCR
1 2
Strachan and Read, Human Mol. Genet. 3, p. 148
19
Original sequence coding for, e.g., a transcription enhancer region
Cassette mutagenesis = random mutagenesis but in a limited region:
1) by error-prone PCR
------*--------*--*-**---------------*-----------*--*-------*------------------------*-*-*------------*------------*--
----------------------------------------------------------------------------------------------------------------------
Cut in primer sites and clone upstream of a reporter protein sequence.
Pick colonies
Analyze phenotypes
Sequence
PCR the fragment with a high concentration of Taq polymerase and Mn2+ instead of Mg2+ -> errors
20
Original enhancer sequence
-*------------------------*-*-*------------*------------*--------*--------*--*-**---------------*-----------*--*------
----------------------------------------------------------------------------------------------------------------------
Buy 2 doped oligos; anneal
OK for up to ~80 nt.
Clone upstream of a reporter.
Doping = e.g., 90% G, 3.3% A, 3.3% C, 3.3% T at each position (here, a position that is G in the original)
Pick colonies
Analyze phenotypes
Sequence
Cassette mutagenesis = random mutagenesis but in a limited region:
2) by “doped” synthesis Target = e.g., an enhancer element
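The doping level sets how many mutations each clone is expected to carry. A rough calculation, assuming every position is doped independently with a 10% chance of a non-original base and taking an 80-nt oligo as the example (both numbers come from the slide's "e.g." values, and the function name is hypothetical):

```python
from math import comb

def mutation_count_pmf(length, p_mut, k):
    """Binomial probability that a doped oligo carries exactly k mutations."""
    return comb(length, k) * p_mut**k * (1 - p_mut) ** (length - k)

LENGTH = 80   # nt, the slide's upper limit for doped oligos
P_MUT = 0.1   # 90% original base, ~10% one of the other three

expected_mutations = LENGTH * P_MUT
p_unmutated = mutation_count_pmf(LENGTH, P_MUT, 0)
p_one_to_three = sum(mutation_count_pmf(LENGTH, P_MUT, k) for k in range(1, 4))

print(f"expected mutations per oligo: {expected_mutations:.1f}")
print(f"fraction with no mutation:    {p_unmutated:.2e}")
print(f"fraction with 1-3 mutations:  {p_one_to_three:.3f}")
```

Under these assumptions almost every clone is mutated, typically at several positions, so a lighter doping level would be the choice if mostly single mutants are wanted.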
21
E. coli as a host
• PROs: Easy, flexible, high tech, fast, cheap; but problems
• CONs
• Folding (can misfold)
• Sorting within the cell -> can form inclusion bodies
• Purification -- endotoxins
• Modifications -- not done (glycosylation, phosphorylation, etc.)
• Modifications:
  • Glycoproteins
  • Acylation: acetylation, myristoylation
  • Methylation (arg, lys)
  • Phosphorylation (ser, thr, tyr)
  • Sulfation (tyr)
  • Prenylation (farnesyl, geranylgeranyl on cys)
  • Vitamin C-Dependent Modifications (hydroxylation of proline and lysine)
  • Vitamin K-Dependent Modifications (gamma carboxylation of glu)
  • Selenoproteins (seleno-cys tRNA at UGA stop)
E. coli expression vectors
Promoter examples:
1) Lac promoter (with operator)-YFG, + lac repressor (I gene): Induce expression by inactivation of the lac repressor with IPTG or lactose
2) As above but with a hybrid Tac promoter (tryptophan operon + lac operon): Stronger. Use the Iq mutant of the lac I gene, which produces high levels of the lac repressor.
Expression regulatable over several orders of magnitude.
3) BAD promoter-YFG. Arabinose utilization operon. Inducible by arabinose via the endogenous araC gene for a transcriptional activator. Background levels driven down by including glucose.
4) Phage T7 promoter-YFG. Vector carries gene for T7 polymerase, under control of the lac promoter. Add IPTG or lactose to induce T7 polymerase and thence YFG.
IPTG = isopropylthiogalactoside (non-metabolizable inducer)
YFG = your favorite gene
23
Myristoylation – myristic acid to N-terminal glycine alpha amino group
Anchors protein to membrane.
24
Lysine epsilon amino group modifications
mono methyl, dimethyl also
Well-studied in histones, microtubules
25
Via seleno-cys tRNA at a UGA nonsense codon
Sequence context dictates efficiency.
26
Gamma carboxylation of glutamic acid
Binds calcium, used in coagulation proteins
27
Some alternative hosts
• Yeasts (Saccharomyces, Pichia)
• Insect cells with baculovirus vectors
• Mammalian cells in culture (later)
• Whole organisms (mice, goats, corn) (not discussed)
• In vitro (cell-free), for analysis only, not preparatively (good for radiolabeled proteins, discussed later)
Some popular yeast promoters
ARS = autonomously replicating sequence element
Selectable marker
ori
http://biochemie.web.med.uni-muenchen.de/Yeast_Biol/04 Yeast Molecular Techniques.pdf
29
Yeast Expression Vector (example)
2μ = 2 micron plasmid
2 mu seq features: yeast ori
oriE = bacterial ori
Ampr = bacterial selection
LEU2, e.g. = Leu biosynthesis for yeast selection
Saccharomyces cerevisiae (baker’s yeast)
oriE
Your favorite gene (Yfg)
LEU2
Ampr
GAPD term’n
GAPD prom
Complementation of an auxotrophy can be used instead of drug-resistance
Auxotrophy = state of a mutant in a biosynthetic pathway resulting in a requirement for a nutrient
GAPD = the enzyme glyceraldehyde-3-phosphate dehydrogenase
For growth in E. coli
Got this far
31
Genomic DNA
HIS4 mutation-
Yeast - genomic integration via homologous recombination
HIS4
gfY
pt Vector DNA
Functional HIS4 gene
Defective HIS4 gene
Yfg
tp
Genomic DNA
32
Double recombination Yeast (integration in Pichia pastoris)
AOX1 gene (~ 30% of total protein)
Genomic DNA
AOX1p
Yfg
AOX1t HIS4 3’AOX1
Genomic DNA
HIS4
Yfg
AOX1p
AOX1t
3’AOX1
Vector DNA
P. pastoris
- tight control
- methanol induced (AOX1)
- large scale production (gram quantities)
Alcohol oxidase gene
Expression in mammalian cells
Lab examples of immortal cell lines:
HEK293   Human embryonic kidney (high transfection efficiency)
HeLa     Human cervical carcinoma (historical, low RNase)
CHO      Chinese hamster ovary (hardy, diploid DNA content, mutants)
Cos      Monkey cells with SV40 replication proteins (-> high transgene copies)
3T3      Mouse or human exhibiting ~regulated (normal-like) growth
+ various others, many differentiated to different degrees, e.g.:
BHK      Baby hamster kidney
HepG2    Human hepatoma
GH3      Rat pituitary cells
PC12     Rat neuronal-like tumor cells
MCF7     Human breast cancer
HT1080   Human fibroblastic cells with near diploid karyotype
IPS      Induced pluripotent stem cells
and:
Primary cells cultured with a limited lifetime. E.g., MEF = mouse embryonic fibroblasts, HDF = human diploid fibroblasts
Common in industry:
NS1      mAbs                               Mouse plasma cell tumor cells
Vero     vaccines                           African green monkey cells
CHO      mAbs, other therapeutic proteins   Chinese hamster ovary cells
PER.C6   mAbs, other therapeutic proteins   Human retinal cells
Mammalian cell expression
Generalized gene structure for mammalian expression:
cDNA gene
Mam. prom.
polyA site
intron
5’UTR
3’UTR
Intron is optional but a good idea
Popular mammalian cell promoters
• SV40 large T Ag (Simian Virus 40)
• RSV LTR (Rous sarcoma virus)
• MMTV (steroid inducible) (Mouse mammary tumor virus)
• HSV TK (low expression) (Herpes simplex virus)
• Metallothionein (metal inducible, Cd++)
• CMV early (Cytomegalovirus)
• Actin
• EIF2alpha
• Engineered inducible / repressible: tet, ecdysone, glucocorticoid (tet = tetracycline)
Engineered regulated expression:
Tetracycline-responsive promoters
Tet-OFF (add tet -> shut off)
tTA cDNA
tTA = tet activator fusion protein:
tetR = tet repressor (original role)
tetR domain
VP16 transcription activation domain
No tet.
Binds tet operator (multiple copies)
(if tet not also bound)
tetR domain
Tetracycline (tet), or, better, doxycycline (dox)
active
not active
CMV prom.
polyA site
tTA gene must be in cell (permanent transfection, integrated):
Tet-OFF
Tet-OFF
(Bujold et al.)
Allosteric change in conformation
VP16 transcription activation domain
MIN. CMV prom. your favorite gene
polyA site
Multiple tet operator elements
MIN. CMV prom. your favorite gene
polyA site
tetR domain
VP16 tc’n act’n domain
not active
little transcription (2%?, bkgd)
Doxycycline present:
MIN. CMV prom. your favorite gene
polyA site
active
Plenty of transcription
No doxycycline:
tetR domain
VP16 tc’n act’n domain
RNA pol
Tet-OFF, cont.
Tetracycline-responsive promoters
Tet-ON (add tet -> turn on gene)
tTA cDNA
tetR domain
VP16 tc’n act’n domain
tetR domain
VP16 tc’n act’n domain
Tetracycline (tet), or, better, doxycycline (dox)
active
not active
Full CMV prom.
polyA site
Different fusion protein: Does NOT bind tet operator
(if tet not bound)
Tet-ON
Must be in cell (permanent transfection, integrated): commercially available (293, CHO) or do-it-yourself
MIN. CMV prom. your favorite gene
polyA site
Multiple tet operator elements
MIN. CMV prom. your favorite gene
polyA site
active
Doxycycline absent:
MIN. CMV prom. your favorite gene
polyA site
active
Plenty of transcription (> 50X)
Add dox:
tetR domain
VP16 tc’n act’n domain
RNA pol II
Tet-ON
tetR domain
VP16 tc’n act’n domain
not active
little transcription (bkgd.)
doxycycline
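The opposite responses of the two systems to doxycycline can be summarized as a small truth table. The sketch below is a deliberately binary simplification that ignores the background transcription noted on the slides; the function name `yfg_transcribed` is hypothetical.

```python
def yfg_transcribed(system, dox_present):
    """Binary model: is 'your favorite gene' transcribed above background?"""
    if system == "Tet-OFF":
        # The tTA fusion binds the tet operators only when no dox is bound.
        return not dox_present
    if system == "Tet-ON":
        # The Tet-ON fusion binds the tet operators only when dox is bound.
        return dox_present
    raise ValueError(f"unknown system: {system}")

for system in ("Tet-OFF", "Tet-ON"):
    for dox in (False, True):
        state = "active" if yfg_transcribed(system, dox) else "not active"
        print(f"{system:8s} dox={'yes' if dox else 'no ':3s} -> {state}")
```

In both systems the VP16 domain only activates the minimal CMV promoter when the tetR domain is docked on the tet operator elements; the systems differ only in whether dox permits or prevents that docking.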