On the origin of the genetic code
Eörs Szathmáry
Eötvös University, Dept of Plant Systematics, Ecology and Theoretical Biology
Centre for the Conceptual Foundations of Science, Munich
To the memory of Sergei Rodin
• Evolutionary scientist named to Susumu
Ohno Chair in Theoretical Biology
A major transition
• Novel way of using genetic information
• Division of labour between nucleic acids and proteins (replication, storage AND catalysis): molecular „germ” and „soma”
• Replicability and enzymatic function disturb each other
• Origin likely to comprise some idiosyncratic steps
Unambiguous and degenerate
The structure of the genetic code
• Amino acids in the same column of the genetic code are more related to each other physicochemically
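The column pattern can be checked against the standard code. The sketch below builds the standard codon table (written with U instead of T, codons enumerated in UCAG order) and groups amino acids by the central base. The names `CODON_TABLE`, `column` and `degeneracy` are illustrative and do not come from the talk.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Unambiguous: each codon maps to exactly one amino acid (or stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def column(second_base):
    """Amino acids whose codons carry the given central base."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[1] == second_base}


def degeneracy():
    """Number of codons assigned to each amino acid (degenerate if > 1)."""
    counts = {}
    for aa in CODON_TABLE.values():
        counts[aa] = counts.get(aa, 0) + 1
    return counts


for base in BASES:
    print(base, "".join(sorted(column(base))))
```

The U column contains only Phe, Leu, Ile, Met and Val, all hydrophobic, which is the kind of clustering the slide refers to.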
Central nucleotide and amino acid
properties
Carl Woese
Constraints on codon reshuffling
for statistical investigations
Significance of some patterns
„The genetic
code is one in
a million” for
polarity
(Freeland and
Hurst)
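A rough sketch of how such a significance test can be set up, under stated assumptions: amino acids are reshuffled among the existing synonymous codon blocks (the reshuffling constraint), and each code is scored by the mean squared property change over single-base substitutions. Kyte-Doolittle hydropathy is swapped in here for the polarity measure used by Freeland and Hurst, so the fraction returned is not their published figure. All function names are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy: an assumed stand-in for the polarity scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def error_cost(code):
    """Mean squared hydropathy change over single-base substitutions
    between sense codons (substitutions to or from stop are skipped)."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                n += 1
    return total / n


def shuffled_code(rng):
    """Permute the 20 amino acids among the synonymous blocks; stops stay put."""
    aas = sorted(set(AMINO_ACIDS) - {"*"})
    permuted = aas[:]
    rng.shuffle(permuted)
    relabel = dict(zip(aas, permuted))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in CODE.items()}


def fraction_better(samples=1000, seed=0):
    """Fraction of shuffled codes with a lower error cost than the natural code."""
    rng = random.Random(seed)
    natural = error_cost(CODE)
    better = sum(error_cost(shuffled_code(rng)) < natural for _ in range(samples))
    return better / samples


print(fraction_better(samples=200))
```

A small fraction would indicate that few alternative codes buffer substitution errors better than the natural one; the exact value depends on the property scale, the cost function and the sample size.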
Amino acid biosynthesis in E. coli
Biosynthetic relationships
Tzei-Fei Wong
Biosynthesis and amino acid chemistry
BOTH have shaped the code
• The code within the codons (Taylor &
Coates, 1989): first letter correlates with
biosynthesis, second letter with chemistry
• Szathmáry, E. & Zintzaras, E. (1992) A
statistical test of hypotheses on the
organization and origin of the genetic
code. J. Mol. Evol. 35, 185-189.
The RNA world may have preceded
the RNA-protein world
• Easy optimisation (with limits)
• Many artificial ribozymes (BUT no replicase)
• Coenzymes
• Ribozyme doing peptidyl transfer during protein
synthesis in ribosomes
• Amino acyl-tRNA synthetases are NOT the most
ancient proteins
• 20 residues are better than 4 in catalysis
Possibility for an experimental test
of the role of RNA stereochemistry
• Szathmáry, E. (1990) Towards the evolution of ribozymes. Nature 344, 115.
• Generate aptamers against different amino acids: see whether there is specific binding at all
• Search for codonic or anticodonic sequence accumulation in the binding sites
• Draw conclusions
Important ribozyme activities for the
emergence of translation
The fascinating work of Michael
Yarus
• Consistently carrying out
the research programme
for amino acid binding
aptamers
• Looking for increasingly
statistically significant
results
• Trying to put it into the
context of evolution
The minimal GCCU/GUGGC
ribozyme system
• The smallest ribozyme that carries out a complex group transfer is the sequence GUGGC-3’, acting to aminoacylate GCCU-3’ (and host a manifold of further reactions) in the presence of substrate PheAMP.
Tryptophan-binding aptamers
The smallest tryptophan binder
• The anticodon is CCA
• 13 fully conserved
nucleotides (26 bits of
information)
• Selective among
hydrophobic changes
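The two numbers on this slide can be reproduced with a few lines. This is a minimal sketch assuming each conserved position is fully specified among four equally likely bases (2 bits per nucleotide), and using the standard tryptophan codon UGG; the helper names are illustrative.

```python
import math

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def information_bits(n_conserved, alphabet_size=4):
    """Information in n fully conserved positions, assuming equiprobable bases."""
    return n_conserved * math.log2(alphabet_size)


def reverse_complement(rna):
    """Antiparallel Watson-Crick partner of an RNA string, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


print(information_bits(13))       # 26.0
print(reverse_complement("UGG"))  # CCA
```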
The force of aptamer selection
(Yarus)
• Using recent sequences for 337 independent
binding sites directed to 8 amino acids and
containing 18,551 nucleotides in all, we show a
highly robust connection between amino acids and
cognate coding triplets within their RNA binding
sites.
• The apparent probability (P) that cognate triplets
around these sites are unrelated to binding sites is
P ≈ 5.3 × 10⁻⁴⁵ for codons overall,
and P ≈ 2.1 × 10⁻⁴⁶ for cognate
anticodons.
Yarus aptamer codon/anticodon
table (2009)
Forces may have changed in strength
But what was the initial advantage?
• Evolution has no foresight
• Should confer some immediate advantage
• Concept of exaptation (preadaptation)
• Coded protein enzymes as culmination of a protracted phase of evolution
• Origin of the genetic code and protein synthesis are not necessarily the same thing
• Evolution is opportunistic
My reservation against uncoded
translation and statistical proteins
• Very wasteful process, initial selective advantage is unclear relative to the high cost of the machinery
• It is like proposing a scenario for the origin of language with long sentences but no meaning, where even word-meaning pairs are statistical!
• They usually completely ignore Yarus’s results!
Replicability and enzyme action are
in conflict
An independent catalytic alphabet is a cool
idea—provided you can get to it
Coding coenzyme handle (CCH)
hypothesis for the origin of the genetic
code (1990, 1993,…)
• This mechanism
works only if binding
between the kissing
hairpins follows the
unambiguous, but
degenerate principle of
the current genetic
code
Piecemeal vocabulary extension
• Amino acids are added and utilised one by
one
• No vicious error feedback as long as amino
acids are not involved (at the beginning) in
the functioning of synthetase ribozymes
• Coding precedes translation
Why indirect binding through base-pairing?
• N number of amino acids
• M number of metabolic enzymes
• If metabolic ribozymes specifically and directly bind amino acid cofactors, then 2 * M functionalities
• THE SITE FOR AMINO ACID BINDING WOULD BLOCK THE AA’S SPECIFIC GROUPS: NO GOOD FOR CATALYSIS
• In contrast, only N specific synthetases are needed IF AMINO ACIDS ARE CHARGED TO SPECIFIC HANDLES!
• If M >> N, then choose synthetases
• Bind cofactors by their handles through base pairing (cheap)
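One reading of this counting argument can be written out as arithmetic. The sketch assumes that direct binding costs each metabolic ribozyme two specific functionalities (catalysis plus amino acid binding), that the handle scheme needs one synthetase per amino acid, and that base pairing to a handle is free. The function names and the numbers in the usage lines are hypothetical.

```python
def direct_binding_cost(m_enzymes):
    """Each ribozyme needs a catalytic site and a specific amino acid site."""
    return 2 * m_enzymes


def handle_cost(m_enzymes, n_amino_acids):
    """M catalytic sites plus one synthetase per amino acid (assumed);
    binding the handle by base pairing is counted as free."""
    return m_enzymes + n_amino_acids


# Hypothetical example: 100 metabolic ribozymes, 20 amino acids.
print(direct_binding_cost(100))   # 200
print(handle_cost(100, 20))       # 120
```

Under these assumptions the handle scheme is cheaper whenever N < M, which is the "If M >> N, then choose synthetases" condition.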
Cofactor use by aptamers
CCH generates a prediction
• Missed previously
• Footprints of the evolution for catalytic
potential should be found in codon
clustering
• Kun et al. (2008) In: M. Barbieri (ed.)
Codes of Life. Springer, Berlin.
Amino acid catalytic propensities
• Joint work with Kun, Pongor and Jordán (2008)
Significance of some patterns
Catalytic propensity and properties
Highest catalytic and β-turn
propensities
Substitution connectivity based on
the BLOSUM matrix
A minimalist enzyme
• Chorismate
mutase built
of 9 amino
acids only
Chiral histidine selection by D-ribose RNA (Yarus)
• The invariant choice of L-amino acids and D-ribose RNA for biological translation requires explanation.
• Chiral choice using mixed, equimolar D-ribose RNAs having 15, 18,
21, 27, 35, and 45 contiguous randomized nucleotides was analyzed.
• These are used for simultaneous affinity selection of the smallest bound
and eluted RNAs using equal amounts of L- and D-His immobilized on
an achiral glass support, with racemic histidine elution.
• The most prevalent/smallest RNA sites are reproducibly and repeatedly
selected and there is a four- to sixfold greater abundance of L-histidine
sites. RNA’s chiral D-ribose therefore yields a more frequent fit to L-
histidine.
• Thus, if D-ribose RNA were first chosen biologically, translational L-
His usage could have followed.
Transfer RNAs with complementary anticodons:
Could they reflect early evolution of
discriminative genetic code adaptors? (1993)
• With regard to the anticodon loop and stem of pairs of
consensus tRNAs, complementary distances were
considerably less than direct distances, i.e., antiparallel
pairing invariably yielded fewer mismatches than direct
pairing.
• Each pair of pre-tRNAs with complementary anticodons
should have been almost identical with each other except for
their three central bases.
• The above situation appears to have dictated the early
establishment of direct links between anticodons and the
type of amino acids with which tRNAs are to be charged.
The presence of codon-anticodon
pairs in the acceptor stem of
tRNAs (1996)
• In pairs of consensus tRNAs with complementary
anticodons, their bases at the 2nd position of the
acceptor stem were also complementary.
• Accordingly, inverse complementarity was also
evident at the 71st position of the acceptor stem.
• The parallelism is especially impressive for the
pairs of tRNAs recognized by aminoacyl-tRNA
synthetases (aaRS) from the opposite classes.
Two types of aminoacyl-tRNA synthetases
could be originally encoded by
complementary strands of the same nucleic
acid (1995)
The growth of the
adaptor molecule
The first possible tetrad
ALL members of the first tetrad are
metabolically important
[Figure: first tetrad of complementary anticodons: Gly (GCC), Ala (GGC), Asp (GUC), Val (GAC); labels: CU, GA]
• RNA synthesis (Gly,
Asp)
• Coenzyme A synthesis
(Val, Asp, Ala)
• Asp is also
catalytically important
Complementary anticodons and
parallel expansion into the catalytic
and structural worlds
• As a rule, pairs of complementary triplets encode
the functionally very different amino acids, most
often those with a high catalytic propensity (His,
Asp, Glu, Lys, Arg) contrasted with those with a
low catalytic but high structural (beta sheet
building) propensity (Val, Ile, Leu, Phe, Ala)
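The rule can be explored by pairing every codon with its reverse complement in the standard code. The sketch below lists pairs in which one member encodes an amino acid from the high-catalytic set named above and the other an amino acid from the structural set; the set names and `contrasted_pairs` are illustrative, and the sketch only enumerates pairs without testing their statistical significance.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
CATALYTIC = set("HDEKR")    # His, Asp, Glu, Lys, Arg
STRUCTURAL = set("VILFA")   # Val, Ile, Leu, Phe, Ala


def reverse_complement(triplet):
    """Antiparallel Watson-Crick partner of a triplet, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(triplet))


def contrasted_pairs():
    """(codon, aa, partner, partner_aa) with a catalytic/structural contrast."""
    pairs = []
    for codon, aa in CODE.items():
        partner = reverse_complement(codon)
        other = CODE[partner]
        if aa in CATALYTIC and other in STRUCTURAL:
            pairs.append((codon, aa, partner, other))
    return sorted(pairs)


for codon, aa, partner, other in contrasted_pairs():
    print(f"{codon} ({aa})  <->  {partner} ({other})")
```

For example, GAC (Asp) pairs with GUC (Val), and CAC (His) pairs with GUG (Val).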
Second tetrad, catalytic expansion,
and the formation of the anticodon
loop?
[Figure: second tetrad of complementary anticodons: Arg (GCG), Ala (CGC), His (GUG), Val (CAC); labels: CU, GA]
Experimentation with catalytic mini
RNAs (Yarus, 2010)
• The small ribozyme initially trans-phenylalanylates a partially complementary 4-nt RNA selectively at its terminal 2’-ribose hydroxyl using PheAMP.
• The initial 2’ Phe-RNA product can be elaborated into multiple peptidyl-RNAs.
Protein buildup on RNA scaffolds
• Shrinking RNA cores
• Selection for peptidyl-transferase activity
• Initially, proteins were strongly associated
with RNAs
• Could not fold by themselves
Proteins from pieces (Lupas,
2003)
Lupas’ conclusion
• The peptides forming these building blocks would
not in themselves have had the ability to fold, but
would have emerged as cofactors supporting RNA-
based replication and catalysis (the 'RNA world').
• Their association into larger structures and
eventual fusion into polypeptide chains would have
allowed them to become independent of their RNA
scaffold, leading to the evolution of a novel type of
macromolecule: the folded protein.
Ribosomal proteins cannot fold by
themselves (Lupas)
Thanks for your attention!
An ancient genetic code at the
anticodon?
• In eubacteria, a paralog of glutamyl-tRNA
synthetase, which lacks the tRNA-binding
domain, was found to aminoacylate tRNA(Asp) not
on the 3’-hydroxyl group of the acceptor stem but
on a cyclopentene diol of the modified nucleoside
queuosine present at the wobble position of
anticodon loop.
• This modified nucleoside might be a relic of an
ancient code.
Amino acids in tRNA modifications
• At positions 34 and 37 of tRNA
Modified queuosine