Chapter 13
The Genetic Code and Transcription
Flow of Genetic Information in Cells
• Central dogma of molecular biology
– DNA replicates
• replication
– RNA is transcribed from a DNA template
• transcription
– mRNA templates are translated into proteins by ribosomes
• translation
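The transcription and translation steps above can be traced on a toy sequence. This is a minimal Python sketch, assuming the standard codon table (stored compactly with bases in U, C, A, G order) and a made-up 12-nucleotide template strand; the function names are illustrative only.

```python
# Standard genetic code, codons ordered U, C, A, G at each position ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Base pairing used when RNA is copied from a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') complementary to a DNA template read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


template = "TACAAACCGATT"     # hypothetical template strand, written 3'->5'
mrna = transcribe(template)   # "AUGUUUGGCUAA"
protein = translate(mrna)     # "MFG" (Met-Phe-Gly)
```

The one-letter amino acid codes are a convenience; the sketch ignores ribosome binding and initiation context entirely.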
Information Flow in Cells
• Transcription
• Translation
• Genetic code
Genetic Code
• Linear order of ribonucleotide bases derived from complementary DNA
• Codons call for amino acids and are triplets of nucleotides
• Codons are unambiguous and nonoverlapping
• Degenerate
• Code includes initiation and termination codons
• No internal punctuation
• Code is universal (with minor exceptions…)
Early Studies…
• 1950s
– Became clear that mRNA serves as an intermediate in transferring genetic information from nucleus to cytoplasm
• No direct DNA participation in translation
– Code thought to be overlapping to allow 4 nucleotides to call for 20 amino acids
• 1961
– Jacob and Monod postulated the existence of mRNAs
Triplet Code
• Sidney Brenner (early 1960s)
– Postulated triplet code on theoretical grounds (4^1 = 4, 4^2 = 16, 4^3 = 64)
• Crick, Barnett, Brenner and Watts-Tobin
– Deletion/insertion mutants in T4 rII locus
• Caused all subsequent amino acids to be wrong
• Reversions by subsequent mutations involved one insertion and one deletion, 3 insertions or 3 deletions
– 3 (not 2) seemed to be the key multiple
– Triplet code…
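Brenner's counting argument can be checked in a few lines. This is a small sketch; the function name and defaults are illustrative.

```python
def min_codon_length(num_bases=4, num_amino_acids=20):
    """Smallest n for which num_bases**n codons can cover all amino acids."""
    n = 1
    while num_bases ** n < num_amino_acids:
        n += 1
    return n


# Number of distinct codons for codon lengths 1, 2 and 3.
combinations = {n: 4 ** n for n in (1, 2, 3)}   # {1: 4, 2: 16, 3: 64}
```

Doublets give only 16 combinations, fewer than 20 amino acids, so three is the smallest workable codon length.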
Triplet Code
• Frameshifts by single deletions or insertions
• Single nucleotide insertions compensate for single nucleotide deletions
• Two single nucleotide insertions still give frameshift
• A total of 3 added or deleted nucleotides leaves the code in frame
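The frameshift logic above can be reproduced on a short made-up RNA. This sketch only splits sequences into triplets; the sequence and the positions chosen for insertions and deletions are arbitrary illustrations.

```python
def codons(seq):
    """Split a sequence into complete triplets, reading from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, i):
    """Remove the single nucleotide at index i."""
    return seq[:i] + seq[i + 1:]


def insert(seq, i, bases):
    """Insert one or more nucleotides before index i."""
    return seq[:i] + bases + seq[i:]


original = "AUGGCUGAAUUUCACGGUAAA"      # AUG GCU GAA UUU CAC GGU AAA

one_deletion = delete(original, 4)      # every downstream triplet changes
deletion_plus_insertion = insert(one_deletion, 9, "C")   # frame restored downstream
two_insertions = insert(original, 3, "CC")               # still out of frame
three_insertions = insert(original, 3, "CCC")            # one extra codon, frame kept
```

Comparing `codons(original)` with `codons(...)` of each variant shows that only changes totalling a multiple of three restore the downstream triplets.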
Nonoverlapping Code
• Brenner (early 1960s)
– Theoretical considerations make overlapping code highly unlikely
• Only certain amino acids could follow certain other amino acids, since the first two nucleotides of the second codon would already be determined
– But the known amino acid sequences of proteins available at the time showed this was clearly not true
• Effects of single nucleotide insertions/deletions argue against overlapping code
– In an overlapping code they would not affect all subsequent amino acids in the protein
• Crick
– Overlapping code unlikely due to physical constraints during translation (and predicts adapter by hydrogen bonding…tRNAs)
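Brenner's constraint can be made concrete. The sketch below assumes a fully overlapping triplet code in which each codon starts one nucleotide after the previous one; the names are illustrative.

```python
from itertools import product

BASES = "ACGU"
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def next_codons_overlapping(codon):
    """Codons that can follow `codon` when successive codons share two bases."""
    return [codon[1:] + base for base in BASES]


# Number of possible (codon, next codon) pairs under each model.
overlapping_pairs = sum(len(next_codons_overlapping(c)) for c in ALL_CODONS)
nonoverlapping_pairs = len(ALL_CODONS) ** 2
```

Under the overlapping model each codon has only 4 possible successors (256 pairs in total) instead of 64 (4096 pairs), which is why the model predicts restricted amino acid neighbors.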
More from Francis Crick
• Adapter hypothesis
• Predicted no internal punctuation on basis of genetic data available
• Initially proposed that only 20 of 64 possible codons specify amino acids
– Can’t always be correct…
– Insertion/deletion data suggested that all/most codons could be translated so he changed his opinion
Code is Degenerate
• Degenerate
– Amino acids may be encoded by more than one codon
• Amino acids have up to 6 different codons
Marshall Nirenberg and Heinrich Matthaei
• 1961
• Cell-free translation system
• In 1961 mRNAs not yet isolated
• Polynucleotide phosphorylase
– Can make synthetic ribonucleotide chains
• Normally degrades mRNAs but in high rNDP concentrations reaction runs in reverse
– Can make homopolymers
• (A)n, (U)n, (C)n, (G)n
– Controlled mixtures
• High A, low C give predictable triplets (AAA, AAC, ACA and CAA)
Polynucleotide Phosphorylase
• Can synthesize ribonucleotide polymers from rNDPs
• Normally degrades mRNAs by phosphorolysis in the cell
Translation of Ribonucleotide Homopolymers In Vitro
Working Out the Codons
• Translate homopolymers
• Translate mixed (2 nucleotide) polymers
– Calculate theoretical codon frequency
– Determine amino acid frequency in peptides
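The theoretical codon frequency for a random copolymer is the product of the base fractions at the three positions. This sketch assumes a 5:1 ratio of A to C purely for illustration; the ratio is not taken from the slides.

```python
from fractions import Fraction
from itertools import product


def triplet_frequencies(base_fractions):
    """Expected frequency of each triplet in a randomly assembled copolymer."""
    freqs = {}
    for triplet in product(base_fractions, repeat=3):
        freq = Fraction(1)
        for base in triplet:
            freq *= base_fractions[base]   # positions are independent
        freqs["".join(triplet)] = freq
    return freqs


# Assumed mixture: 5 parts A to 1 part C.
mix = {"A": Fraction(5, 6), "C": Fraction(1, 6)}
freqs = triplet_frequencies(mix)
# AAA is the most frequent triplet; AAC, ACA and CAA share the next level.
```

Matching these expected frequencies against the measured amino acid frequencies in the peptide gives the base composition of a codon, though not the order of its bases.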
Copolymer Experiment
Triplet Binding Assays
• Nirenberg and Philip Leder
– 1964
• Synthesize trinucleotides of known sequence
• Ribosomes can bind mRNA as short as 3 nucleotides under proper conditions
– Form complex with tRNA
• Codon of mRNA binds to anticodon of tRNA
• This approach by several laboratories eventually assigns 50 of 64 codons
Triplet Binding Assay
Translation of Repeating Copolymers
• Gobind Khorana, 1960s
– Synthesized long RNAs with dinucleotide, trinucleotide or tetranucleotide repeats
– For (UG)n there are only two possible triplets
• UGU and GUG: cysteine and valine in peptide
Results of Synthetic Copolymer Experiments
• (GAUA)n and (GUAA)n experiments suggested some codons do not translate to amino acids
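The triplets available in a repeating copolymer can be listed mechanically. This is a small sketch; the function name is illustrative, and the stop codon set is the one given in the code summary.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def triplets_in_repeat(unit):
    """All distinct triplets readable, in any frame, from a long (unit)n RNA."""
    stretch = unit * 3                     # long enough to cover every offset
    return {stretch[i:i + 3] for i in range(len(unit))}


ug = triplets_in_repeat("UG")              # {"UGU", "GUG"}
gaua = triplets_in_repeat("GAUA")          # contains UAG
guaa = triplets_in_repeat("GUAA")          # contains UAA
```

Both tetranucleotide repeats contain a stop triplet in every reading frame, consistent with the observation that some codons do not translate to amino acids.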
Genetic Code Summary
• 64 codons
– 61 call for amino acids (1 to 6 each)
• degenerate
• 3 triplets are stop signals
– UAA, UAG, UGA
• AUG is for methionine and also is the initiation codon
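The numbers in this summary can be recomputed from the standard codon table. In this sketch the table is stored as a compact 64-character string with bases in U, C, A, G order, which is an encoding convenience and not something from the slides.

```python
from collections import Counter

# Standard genetic code, codons ordered U, C, A, G at each position ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

sense_codons = sum(codons_per_amino_acid.values())
six_fold = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 6)
one_fold = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)
```

Leucine, arginine and serine each have six codons; methionine and tryptophan have one each.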
The Code
• In the format suggested by Crick…
More Crick Hypotheses
• Organized codons into now accepted chart format
• Noticed nearly all degeneracy was in the 3rd position of the codon
• Proposed wobble hypothesis in 1966
– First two nucleotides more critical
– Base pairing by 3rd nucleotide of codon (to anticodon of tRNA) could be less constrained
• Wobble pairing…
Wobble Pairing
• 1st position of anticodon with 3rd position of codon
• 1st position of anticodon allowed to pair with up to 3 different bases in 3rd position of codon
• Inosine base…
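The pairing rules can be written out as a lookup. The table below uses the pairings commonly given for Crick's wobble hypothesis (they are not listed on the slide): C pairs with G, A with U, U with A or G, G with C or U, and inosine (I) with U, C or A. The function name is illustrative.

```python
# Strict pairing for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Relaxed pairing: 1st anticodon base (5') -> allowed 3rd codon bases.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read_by(anticodon):
    """Codons (5'->3') that an anticodon (written 5'->3') can pair with."""
    first, second, third = anticodon
    # The strands are antiparallel: anticodon base 3 pairs with codon base 1.
    return [
        WATSON_CRICK[third] + WATSON_CRICK[second] + wobble_base
        for wobble_base in WOBBLE[first]
    ]


alanine_codons = codons_read_by("IGC")         # ["GCU", "GCC", "GCA"]
phenylalanine_codons = codons_read_by("GAA")   # ["UUC", "UUU"]
```

One inosine-containing anticodon covers three of the four alanine codons, which is how wobble reduces the number of tRNAs needed.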
An “Ordered” Code
• Codons for a particular amino acid generally grouped
• Amino acids with similar properties (hydrophobic, positive charge, etc.) often share at least the 2nd position nucleotide of the codon
• Buffers the effect of mutations…
Initiation
• Nearly all protein coding sequences begin with AUG
– Initiator codon
– Rarely GUG
– Recognized by special initiator tRNA carrying methionine
• All polypeptides begin with methionine
– N-formylmethionine in prokaryotes
Termination
• 3 termination codons
– UAA, UAG, UGA
– Often called nonsense codons
– Ochre (UAA), amber (UAG), umber or opal (UGA)
• Not recognized by “normal” tRNAs
– Recognized by special proteins called release factors
– Mutations in tRNA gene in anticodon region can produce “suppressor” tRNAs that suppress stop signals (and therefore nonsense mutations)
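Initiation at AUG and termination at a stop codon together define an open reading frame. This is a minimal sketch, assuming the standard codon table and made-up mRNAs; it takes the first AUG it finds and ignores ribosome binding sites, release factors and suppressor tRNAs.

```python
# Standard genetic code, codons ordered U, C, A, G at each position ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no initiator codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # UAA, UAG or UGA ends the peptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


peptide = translate_orf("GGAUGUUUGGCUAAGG")   # "MFG"
```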
Confirming the Code
• 1972, Walter Fiers
• MS2 bacteriophage
– RNA chromosome, 3 genes, 3500 nucleotides
• Compared amino acid sequence of coat protein and the gene (RNA) that encoded it
– Agreed with predicted translation
– AUG start, UAA UAG double stop codons
– Note RNA sequence compared, DNA could not yet be sequenced…
Nearly Universal
• Up to 1978 considered universal
– Humans and E. coli use same basic code, as do all other species
• 1979 noted that coding properties of human and yeast mtDNA genes not quite the same as predicted
• In general exceptions “simplify” the code
– Reduce number of tRNAs required for translation in mitochondria (only 22 encoded there)
Overlapping Genes
• Genetic code is nonoverlapping
• But genes sometimes overlap…
– Generally for only a short distance because of constraints placed on both peptide sequences
– Mostly in viruses, some bacterial genes
– Optimizes the use of DNA coding space
• (b) shows relative positions of 7 φX174 genes
Transcription
• Synthesis of RNA from a DNA template
• Suggestive observations that RNA is an informational intermediate between DNA and the site of protein synthesis
– DNA in chromosomes in nucleus, proteins made by ribosomes in cytoplasm
– RNA synthesized in nucleus and is chemically similar to DNA
– After synthesis, much RNA migrates to cytoplasm
– Amount of RNA in cell is generally reflective of the level/amount of protein synthesis
Evidence for mRNAs
• Volkin, 1956 and 1958
– ³²P, T2 and T7 phage, E. coli
• Added ³²P to culture medium as cells were infected with bacteriophage
– Newly made (radioactive) RNA matched composition of phage DNA, not original E. coli RNA
Elliot Volkin’s Results
Brenner, Jacob and Meselson
• Are individual ribosomes specific for a single protein?
• Heavy isotope-labeled E. coli ribosomes, 1961
– Determined that preinfection ribosomes synthesized proteins from phage genes not present in the cell when the ribosomes were themselves synthesized
– Consistent with a DNA-derived mRNA being translated by a generic ribosome
• Jacob and Monod model proposed in 1961
RNA Polymerase
• Discovered in 1959
• n(rNTP) + DNA + RNAP → (rNMP)n + n(PPi)
• Nucleotides linked by 5’ to 3’ phosphodiester linkages and made 5’→3’
– Pyrophosphate subsequently cleaved by pyrophosphatase
• E. coli holoenzyme composed of α2ββ’σ
– Core enzyme (α2ββ’) synthesizes RNA while σ recognizes promoter sequence
– E. coli has only one type of RNA polymerase
E. coli Promoters
• Initial step of RNA synthesis is template binding by sigma factor at the promoter
– Promoters are transcription start regions
– Sigma binds to 60 bp of DNA about 40 nucleotides upstream and 20 downstream from the actual transcription start point
– Promoters can be strong or weak
• Initiation as often as every 1-2 seconds (strong), down to once per 20-30 minutes (weak)
Consensus Sequences
• Conserved sequences found 10 and 35 nucleotides upstream of the transcription start point (+1)
– Called –10 (Pribnow box, TATA box) and –35 sequences (TTGACA)
– Cis-acting elements
• Bound by trans-acting factors
– Common E. coli factor is σ70
– Others are σ32, σ54, σS and σE, and have different –10 and –35 sequences
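Consensus elements can be located by scoring each window of a sequence against the consensus. The sketch below uses the –35 consensus TTGACA from the slide and assumes TATAAT for the –10 (Pribnow) box, the commonly cited σ70 consensus that the slide does not spell out. The promoter sequence is invented, written as the nontemplate strand 5'→3'.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"   # assumed consensus for the Pribnow box


def best_match(sequence, consensus):
    """Return (start, mismatches) of the window that best matches `consensus`."""
    best = None
    for start in range(len(sequence) - len(consensus) + 1):
        window = sequence[start:start + len(consensus)]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)    # keeps the leftmost best window
    return best


# Hypothetical promoter: both boxes embedded in G/C-only filler.
spacer = "GGCC" * 4 + "G"                 # 17 bp between the two boxes
promoter = "GGCC" + MINUS_35 + spacer + MINUS_10 + "GGCCGGC" + "A"

pos_35, mismatches_35 = best_match(promoter, MINUS_35)
pos_10, mismatches_10 = best_match(promoter, MINUS_10)
spacing = pos_10 - (pos_35 + len(MINUS_35))
```

Real promoters rarely match the consensus exactly, so counting mismatches rather than requiring an exact match is the more useful measure; fewer mismatches generally go with stronger promoters.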
Phases of Transcription
• Initiation
– No primer required, but sigma is
– Short ~8-mer oligonucleotide synthesized
– Can be abortive
• Elongation
– Sigma lost from holoenzyme, core enzyme synthesizes 5’ to 3’
– 50 nucleotides/sec at 37 degrees Celsius
• Termination
– Hairpin structure in RNA
– Rho-dependent or independent
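The elongation rate gives a quick estimate of how long a transcript takes to make. This is a back-of-the-envelope sketch; the 3,000-nucleotide transcript is a hypothetical length, and pausing and initiation time are ignored.

```python
ELONGATION_RATE_NT_PER_SEC = 50   # rate quoted above for 37 degrees Celsius


def transcription_time_seconds(length_nt, rate=ELONGATION_RATE_NT_PER_SEC):
    """Time to elongate a transcript of `length_nt` nucleotides at a constant rate."""
    return length_nt / rate


seconds = transcription_time_seconds(3000)   # 60.0 seconds
```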
Prokaryotic Transcription
mRNA Molecules
• Can be polycistronic in prokaryotes
– Operons lead to mRNAs with multiple genes
– mRNAs can associate with ribosomes and begin translation before transcription is completed
• Transcription and translation are said to be coupled
• mRNAs are monocistronic in eukaryotes
Transcription in Eukaryotes
• Transcription in the nucleus, translation in cytoplasm
• Chromatin must be uncoiled and DNA made accessible to RNA polymerase
– Chromatin remodeling
• Initiation involves a more complex set of interactions between cis-acting elements and trans-acting factors
– Initiation factors, enhancers
– Elements can be within or downstream from gene
Eukaryotic Pre-mRNA Processing
• Addition of cap to 5’ end
• PolyA tail to most 3’ ends
• Initial RNA molecules called primary transcripts or hnRNAs (that form hnRNPs)
– Perhaps as few as 25% of hnRNAs converted to mRNAs
– Involves splicing out of intron-derived sequences (vs. exons) from transcript
Eukaryotic RNA Polymerases
• Eukaryotes have 3 different RNA polymerases
– Each specialized for production of particular types of RNA
Transcriptional Initiation in Eukaryotes
• RNAP II transcribes pre-mRNAs (hnRNAs)
– In yeast has 12 subunits/polypeptides
– Regulated by trans-acting factors and core-promoter, promoter (includes elements in addition to the core promoter element) and enhancer elements
– Core promoter element is called the Goldberg-Hogness or TATA box
• Common consensus is TATAAAA
• Similar to E. coli –10 but located –25 to –30
Other Promoter Elements
• CAAT box
– Consensus GGCCAATCT
– Generally upstream of TATA and commonly within 100 bp
– Distance and orientation may vary
– Basal element
• GC box
– Properties generally similar to CAAT
• Enhancer element
– Act at great distances upstream and downstream
– Associated with very strong promoters
Transcription Factors
• Factors are proteins
• General transcription factors
– Required for all RNAP II mediated transcription
– Bind to basal elements
– Required for polymerase binding, do not “turn gene on/off”
• Specific transcription factors
– Involved in regulating on/off, specific for gene or subset of genes
RNAP II Transcription Factors
• Designated TFIIA, TFIIB, etc.
– Complex, TFIID has 10 polypeptides
• One is TBP or TATA-binding protein
• Once TFIID binds, at least 7 other general transcription factors bind to form the pre-initiation complex, which then binds RNAP II
Eukaryotic Transcription
• Yeast model: Roger Kornberg (Arthur’s son)
– Two large subunits, 10 others, 500 kDa
– Has positively charged cleft to bind DNA, which clamps around DNA
– Initial interaction/synthesis is unstable and process often aborts by 11-mer
– If it proceeds beyond this point, will continue until termination
• Terminator causes clamp to “open” and complex dissociates
• Structure is conserved in the human enzyme, and 9 of 10 subunits are conserved in RNAP I and RNAP III
Eukaryotic RNA Processing
• hnRNA is converted to mRNA
• Posttranscriptional modifications
– 7-methyl guanosine nucleotide cap attached 5’ to 5’ to the 5’ end of transcript, 2’ of terminal sugar(s) also methylated
• May be essential for transport out of the nucleus and protect from 5’ exonuclease attack
– PolyA added to 3’ end
• About 200-250 nucleotides by polyA polymerase
• Signal is AAUAAA (actually for nuclease cleavage to produce mature 3’ end of hnRNA)
• Without polyA transcript is degraded
Intervening Sequences in Eukaryotic Genes
• Intervening sequences, split genes
– Introns
– Exons
• Discovered when genomic DNA hybridized to mRNAs or cDNAs
– Heteroduplex had loop outs
• Common to most genes
– More (can be >50) and larger in higher eukaryotes (genes can be 5X or more the size of the mRNA, dystrophin mRNA is 1% of gene)
– Locations conserved, sizes/sequences not
Heteroduplex With “Loop Outs”
Genes With Introns
Genes and mRNA Sizes
RNA Splicing
• Commonly involves ribozymes
– snRNAs, snRNPs
• Some RNAs are self-splicing
– Tetrahymena rRNA transcripts, group I introns
– Thomas Cech, 1982
Group I Introns
• Self-splicing
• Catalytic activity in the intron itself
• Requires a guanosine nucleoside or nucleotide for hydroxyl group
• Involves two nucleophilic attacks, transesterifications
• Found in ciliate rRNA transcripts and some organelle mRNA and tRNA primary transcripts
Spliceosomes and Nuclear Splicing
• Nuclear introns can be up to 20 K nucleotides
• Nuclear introns commonly begin (5’, donor sequence) with GU and end with AG (3’, acceptor sequence)
• Splicing carried out by large spliceosome complex (40S in yeast, 60S in mammals)
– Small nuclear RNAs (snRNAs)/small nuclear ribonucleoproteins (snRNPs or snurps)
• U1, U2, U3, U4, U5, U6 (rich in uridine bases)
Spliceosome Mechanism
• U1 complementary to 5’ site
• Two transesterifications
• Hydroxyl comes from adenylate residue at branch site bound by U2
• Branch site “attacks” 5’ end of intron
• Free end of exon attacks 3’ end of intron, releasing intron as a lariat structure
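The net result of the two transesterifications is that the intron is removed and the exons are joined. The sketch below models only that outcome, checking the GU…AG boundary rule; the sequences and coordinates are invented, and the branch site chemistry and lariat are not modeled.

```python
def splice(pre_mrna, intron_start, intron_end):
    """Remove pre_mrna[intron_start:intron_end]; return (mRNA, intron)."""
    intron = pre_mrna[intron_start:intron_end]
    # Nuclear introns commonly begin with GU (donor) and end with AG (acceptor).
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GU...AG rule")
    return pre_mrna[:intron_start] + pre_mrna[intron_end:], intron


# Hypothetical two-exon transcript.
exon1 = "AUGGCU"
intron = "GUAAGUCCUACUAACCCUUUCAG"
exon2 = "GAAUAA"
pre_mrna = exon1 + intron + exon2

mrna, removed = splice(pre_mrna, len(exon1), len(exon1) + len(intron))
```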
RNA Editing
• Change in the nucleotide sequence of a pre-mRNA prior to translation
– Actual final sequence not found in DNA
– Can be substitution editing (change base) or insertion/deletion editing
• Classic example: Trypanosoma genus
– Some RNAs (e.g. cox) have 60% of their total transcript added after transcription completed
• Uridines
• gRNAs (guide RNAs) provide complementary template
• Mammalian example
– Long and short forms of apolipoprotein B
• CAA to UAA
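The apolipoprotein B example is substitution editing: a C to U change turns the glutamine codon CAA into the stop codon UAA, so the edited transcript yields a shorter protein. This sketch shows the effect on an invented 18-nucleotide mRNA, assuming the standard codon table; it is not the real apoB sequence.

```python
# Standard genetic code, codons ordered U, C, A, G at each position ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def edit_c_to_u(mrna, position):
    """Substitution editing: replace the C at `position` with U."""
    if mrna[position] != "C":
        raise ValueError("only a C can be edited to U")
    return mrna[:position] + "U" + mrna[position + 1:]


unedited = "AUGGCUCAAGGUUUUUAA"        # hypothetical transcript with CAA at codon 3
edited = edit_c_to_u(unedited, 6)      # CAA -> UAA

long_form = translate(unedited)        # "MAQGF"
short_form = translate(edited)         # "MA"
```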
Transcription and Translation
• Coupled in prokaryotes
• Not coupled in eukaryotes
• Visualized by electron microscopy
Gene Amplification
• Temporary increase in the copy number of certain genes
– rDNA in amphibian oocytes
– Drug resistance in eukaryotic cell cultures
– Also occurs in some/many cancers
• Prostate cancer DMs and HSRs