
Relationships Among Isoacceptor tRNAs Seems to Support the Coevolution Theory of the Origin of the Genetic Code

M.B. Chaley,1 E.V. Korotkov,1 D.A. Phoenix2

1 Center "Bioengineering," Russian Academy of Sciences, 60-letiya Oktyabrya Prospect, 7/1, 117312 Moscow, Russia
2 University of Central Lancashire, Department of Applied Biology, Preston PR1 2HE, UK

Received: 4 May 1998 / Accepted: 11 July 1998

Abstract. A new method for looking at relationships between nucleotide sequences has been used to analyze divergence both within and between the families of isoaccepting tRNA sets. A dendrogram of the relationships between 21 tRNA sets with different amino acid specificities is presented as the result of the analysis. Methionine initiator tRNAs are included as a separate set. The dendrogram has been interpreted with respect to the final stage of the evolutionary pathway, the development of highly specific tRNAs from ambiguous molecular adaptors. The location of the sets on the dendrogram was therefore analyzed in relation to hypotheses on the origin of the genetic code: the coevolution theory, the physicochemical hypothesis, and the hypothesis of ambiguity reduction of the genetic code. Pairs of 16 sets of isoacceptor tRNAs, whose amino acids are in biosynthetic relationships, occupied contiguous positions on the dendrogram, thus supporting the coevolution theory of the genetic code.

Key words: Comparative analysis; tRNA; Molecular evolution; Origin of the genetic code

Introduction

Transfer RNAs have been central to the development of the genetic code into its current state and have therefore become the focus of discussions on the main reason for its establishment (Szathmary and Zintzaras 1992; Di Giulio 1994, 1995). The discussions have revolved around the biosynthetic relationships between amino acids (Wong 1975; Taylor and Coates 1989) and their physical and chemical properties (Sonneborn 1965; Woese et al. 1966a, b). Because of the importance of tRNAs as adaptors for the genetic code, they have been chosen to investigate its origin (Fitch and Upper 1987). In spite of structural conservatism, transfer RNAs (tRNAs) contain enough sites of information (52 of the total length of 76 bases) for comparative analysis (Eigen et al. 1989), and indeed tRNA sequences have been popular objects for such analysis. The first phylogenetic trees of isoaccepting tRNAs showed the divergence of tRNA sequences among different species (LaRue et al. 1979). The construction of a phylogenetic tree for the VAL and GLY superfamilies of tRNAs has shown that tRNA divergence followed the divergence of anticodons and later the divergence of species. The evolution of transfer RNAs has been considered in detail by Cedergren et al. (1981). By comparing known tRNAs and their genes, the conservative cloverleaf structural elements in each isoacceptor tRNA family and archetypical features in all families have been revealed (Nicoghosian et al. 1987). A newer method has suggested that, at the moment of archaebacteria and eubacteria division, the level of divergence between the tRNA sequences was already equal to one-third of the current level of tRNA divergence (Eigen et al. 1989). This method of statistical geometry in sequence space has been offered as a more reliable means of revealing hidden kinship between nucleotide sequences (Eigen et al. 1988, 1989), but any new method which provides insight into earlier relationships between tRNA species and the evolutionary divergence of these sequences is of importance. This paper describes a method for studying relationships between nucleotide sequences and gives insight into the evolutionary divergence of tRNA families and their relationship with the genetic code.

Correspondence to: M.B. Chaley; e-mail: [email protected]

J Mol Evol (1999) 48:168–177

© Springer-Verlag New York Inc. 1999

Materials and Methods

The sequences were taken from a compilation of tRNA sequences and tRNA genes from the EMBL database (Steinberg et al. 1993). The sequences were aligned according to the standard cloverleaf structure. The modified RNA bases were replaced by their original unmodified forms. The tRNA sequences whose anticodons imply binding to known exceptions to the universal genetic code (Osawa et al. 1992) were excluded from the comparative analysis. According to the mechanisms of "codon reassignment" (Osawa and Jukes 1989) and "codon swapping" (Szathmary 1991), these exceptions are due to later events in genome evolution rather than to the original correspondence established between codons (anticodons) and amino acids. tRNAs of cellular organelles were also excluded from the analysis, based on the symbiotic theory of organelle origin (Alberts et al. 1995). It is reasonable to assume that the evolution of organelle tRNAs followed its own course and should be studied separately. This is supported by the fact that mitochondrial tRNAs often have nonstandard secondary structures (Cedergren 1982) and show deviations from the universal code (Osawa et al. 1992). The accession codes for the sequences used in the present analysis are given in the Appendix.

Isoacceptor tRNAs and their genes were put into distinct sets according to their amino acid specificities. Elongator and initiator methionine tRNAs were also divided into separate sets. Table 1 gives the number of sequences in each set and the distribution of anticodons within the sets. It is noticeable that the anticodons span the whole table of amino acid codons. Table 2 shows the distribution of the sequences among viruses, archaebacteria, eubacteria, protista, plants, and animals (viruses, archaebacteria, and eubacteria are represented separately from protista). As can be seen, in most cases each set contains representatives from all kingdoms. The method developed here for observing the relationship between nucleotide sequences can be described as follows.

Generalization of Relationships Between the Nucleotide Sequences

Table 1. The number of investigated isoacceptor tRNAs and tRNA genes(a)

| Amino acid | Number of isoaccepting tRNAs | Anticodon | Number of tRNAs with identical anticodons | Amino acid codon |
| --- | --- | --- | --- | --- |
| Ala | 46 | UGC | 33 | GCA |
| | | IGC | 8 | GCU |
| | | CGC | 3 | GCG |
| | | GGC | 2 | GCG |
| Arg | 40 | ICG | 16 | CGU |
| | | CCG | 5 | CGG |
| | | UCG | 2 | CGA |
| | | GCG | 2 | CGC |
| | | UCU | 11 | AGA |
| | | CCU | 4 | AGG |
| Asn | 28 | GUU | 28 | AAC |
| | | - | - | AAU |
| Asp | 26 | GUC | 26 | GAC |
| | | - | - | GAU |
| Cys | 10 | GCA | 10 | UGC |
| | | - | - | UGU |
| Gln | 25 | UUG | 15 | CAA |
| | | CUG | 10 | CAG |
| Glu | 34 | UUC | 21 | GAA |
| | | CUC | 13 | GAG |
| Gly | 46 | GCC | 21 | GGC |
| | | UCC | 16 | GGA |
| | | CCC | 9 | GGG |
| | | - | - | GGU |
| His | 23 | GUG | 23 | CAC |
| | | - | - | CAU |
| Ile | 30 | GAU | 12 | AUC |
| | | CAU | 10 | - |
| | | IAU | 6 | AUU |
| | | UAU | 2 | AUA |
| Leu | 61 | CAG | 18 | CUG |
| | | UAG | 12 | CUA |
| | | IAG | 5 | CUU |
| | | GAG | 2 | CUC |
| | | UAA | 12 | UUA |
| | | CAA | 12 | UUG |
| Lys | 42 | UUU | 22 | AAA |
| | | CUU | 20 | AAG |
| Met | 25 | CAU | 25 | AUG |
| Metf | 46 | CAU | 46 | AUG |
| Phe | 39 | GAA | 39 | UUC |
| | | - | - | UUU |
| Pro | 42 | UGG | 24 | CCA |
| | | CGG | 7 | CCG |
| | | IGG | 7 | CCU |
| | | GGG | 4 | CCC |
| Ser | 62 | UGA | 18 | UCA |
| | | IGA | 15 | UCU |
| | | CGA | 14 | UCG |
| | | GGA | 5 | UCC |
| | | GCU | 10 | AGC |
| | | - | - | AGU |
| Thr | 34 | UGU | 14 | ACA |
| | | GGU | 9 | ACC |
| | | CGU | 6 | ACG |
| | | IGU | 5 | ACU |
| Trp | 20 | CCA | 18 | UGG |
| | | UCA | 2 | - |
| Tyr | 35 | GUA | 34 | UAC |
| | | CUA | 1 | - |
| | | - | - | UAU |
| Val | 42 | IAC | 14 | GUU |
| | | UAC | 12 | GUA |
| | | GAC | 8 | GUC |
| | | CAC | 8 | GUG |

(a) Distribution of anticodons corresponding to the codons of amino acids. Anticodons and codons are represented in the 5′ to 3′ direction. If the I base is shown at the first position of the anticodon, it means that the A base was present in the original tRNA gene sequence.

(1) The nucleotide sequences of tRNAs and tRNA genes were compared in pairs within each set and between the sets of different amino acid specificities. The similarity measure obtained by pairwise comparison was twice the value of the mutual information measure (2I), as defined in Eq. (1).

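$$2I = 2\left(\sum_{i=1}^{4}\sum_{j=1}^{4} m_{ij}\ln m_{ij} - \sum_{i=1}^{4} m_{i}\ln m_{i} - \sum_{j=1}^{4} m_{j}\ln m_{j} + L\ln L\right) \tag{1}$$

$$L = \sum_{i=1}^{4} m_{i} = \sum_{j=1}^{4} m_{j} \tag{2}$$

$$m_{i} = \sum_{j=1}^{4} m_{ij}, \qquad m_{j} = \sum_{i=1}^{4} m_{ij} \tag{3}$$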

Here $m_{ij}$ are the elements of the base coincidence matrix, denoted M. Matrix M is a 4 × 4 matrix which contains the counts of all coincidences among the A, G, U, and C bases when two tRNAs are compared. It has been shown that 2I has a $\chi^2$ distribution with nine degrees of freedom (Kullback 1958). This permits the statistical significance of any similarity to be ascertained.
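The calculation of 2I from a pair of aligned sequences can be sketched in a few lines. This is only an illustration of Eq. (1), not the authors' program: the function names are hypothetical, the convention 0·ln 0 = 0 is assumed for empty cells, and positions holding anything other than A, G, U, or C (for example alignment gaps) are simply ignored.

```python
import math
from collections import Counter

BASES = "AGUC"

def coincidence_matrix(seq_x, seq_y):
    """Count base coincidences m_ij between two aligned sequences of equal length."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter(zip(seq_x, seq_y))
    # Pairs involving symbols outside BASES are not copied into the matrix.
    return [[counts[(a, b)] for b in BASES] for a in BASES]

def xlogx(value):
    """Return value * ln(value), with the assumed convention 0 * ln(0) = 0."""
    return value * math.log(value) if value > 0 else 0.0

def two_i(matrix):
    """Twice the mutual information of a 4 x 4 coincidence matrix, as in Eq. (1)."""
    row_sums = [sum(row) for row in matrix]        # m_i
    col_sums = [sum(col) for col in zip(*matrix)]  # m_j
    total = sum(row_sums)                          # L
    cells = sum(xlogx(m) for row in matrix for m in row)
    return 2.0 * (cells - sum(map(xlogx, row_sums))
                  - sum(map(xlogx, col_sums)) + xlogx(total))

# Two identical toy sequences give the maximum value for their composition.
identical = two_i(coincidence_matrix("AGUCAGUC", "AGUCAGUC"))
# Sequences whose bases pair up with no association give zero.
unrelated = two_i(coincidence_matrix("AAGG", "AGAG"))
```

For the identical toy pair the result equals 2L·ln 4 with L = 8, which is the largest value a sequence with uniform base composition can reach; the second pair shows that unassociated bases contribute nothing.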

Then, a matrix $R_{XZ}$ was formed containing measures of the similarity between tRNA pairs for each pair of sets (X and Z). This was filled according to the results of the comparative analysis, with the 2I values forming the elements of the $R_{XZ}$ matrix.

(2) As shown in Table 1, the sets with different amino acid specificities contain unequal numbers of sequences. Consequently, the $R_{XZ}$ matrices for pairwise comparison of the sets had different dimensions. The similarities between tRNA sets were analyzed via integration to allow a comparison independent of the dimensions of the $R_{XZ}$ matrices. The distribution of the 2I value was considered over the following 12 intervals:

$$[0, 20),\ [20, 25),\ \ldots,\ [60, 65),\ [65, 70),\ [70, +\infty) \tag{4}$$

The experimentally determined frequencies of the 2I value over the intervals are denoted $f_n$, $n = 1, \ldots, 12$. Theoretical probabilities $e_n$ that the 2I value belongs to each interval were calculated by integration of the probability density of the $\chi^2$ distribution with 9 df. The measure of an interrelationship between two sets of tRNAs, i.e., the deviation of the comparison matrix $R_{XZ}$ from a matrix showing accidental alignments, was given by

$$F_{XZ} = K_{XZ}\left(\sum_{n=1}^{r} f_{n}\ln f_{n} - \sum_{n=1}^{r} f_{n}\ln e_{n}\right) \tag{5}$$

Here $K_{XZ}$ is the product of the column and row numbers of the matrix $R_{XZ}$ corresponding to the pair of tRNA sets X and Z, and r is the number of discrete intervals of the 2I value, with $r = 12$ in this case.
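Step (2) can be illustrated with a short sketch that bins a list of 2I values over the 12 intervals, obtains the chance probabilities from the chi-square distribution with 9 df, and evaluates the F measure. Several things here are assumptions rather than details taken from the paper: the function names are hypothetical, $f_n$ is read as a relative frequency, $K_{XZ}$ is taken as the number of 2I values supplied (the flattened comparison matrix), empty intervals contribute zero, and the tail probability uses the standard closed form for odd degrees of freedom so that no external library is needed.

```python
import math

# Interval edges: [0,20), [20,25), ..., [65,70), [70,+inf)
EDGES = [0.0] + [20.0 + 5.0 * k for k in range(11)] + [math.inf]

def chi2_sf_9df(x):
    """Upper-tail probability of chi-square with 9 df (closed form for odd df)."""
    if x <= 0:
        return 1.0
    if math.isinf(x):
        return 0.0
    # Terms x^(j - 1/2) / (1 * 3 * ... * (2j - 1)) for j = 1..4
    series = sum(x ** (j - 0.5) / d for j, d in zip(range(1, 5), (1, 3, 15, 105)))
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series)

def expected_probabilities():
    """Chance probabilities e_n that a 2I value falls in each interval."""
    tails = [chi2_sf_9df(edge) for edge in EDGES]
    return [tails[n] - tails[n + 1] for n in range(len(EDGES) - 1)]

def observed_frequencies(two_i_values):
    """Relative frequencies f_n of the observed 2I values over the intervals."""
    counts = [0] * (len(EDGES) - 1)
    for value in two_i_values:
        for n in range(len(counts)):
            if EDGES[n] <= value < EDGES[n + 1]:
                counts[n] += 1
                break
    return [c / len(two_i_values) for c in counts]

def f_xz(two_i_values):
    """K_XZ * sum_n f_n * ln(f_n / e_n), skipping empty intervals."""
    k_xz = len(two_i_values)  # assumed: one 2I value per matrix element
    f = observed_frequencies(two_i_values)
    e = expected_probabilities()
    return k_xz * sum(fn * math.log(fn / en) for fn, en in zip(f, e) if fn > 0)

# Toy inputs: ten comparisons that look like chance, ten that do not.
chance_like = f_xz([5.0] * 10)
strongly_related = f_xz([80.0] * 10)
```

Differencing tail probabilities instead of cumulative probabilities keeps the very small values for the upper intervals from being lost to rounding.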

(3) It has been shown that the $2F_{XZ}$ value is distributed as $\chi^2$ with $(r - 1)$ degrees of freedom (Kullback 1958). This allows the introduction of $G_{XZ}$ as a measure of the interrelationship between tRNA sets. This measure is independent of the dimension of the comparison matrix $R_{XZ}$:

$$G_{XZ} = \frac{2F_{XZ}}{K_{XZ}}\,K_{0} \tag{6}$$

Here $K_0$ is a mean volume for the comparison matrix among all pairs of tRNA sets. We consider $K_0 = 1296$ in further calculations.

(4) To represent relationships between the sets of isoacceptor tRNAs more adequately, the $\chi^2$ distribution with 11 df was transformed to the standard normal distribution by the formula

$$N_{XZ} = (2G_{XZ})^{1/2} - (2r - 1)^{1/2} \tag{7}$$
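Steps (3) and (4) amount to two one-line transformations, sketched below. The sketch assumes that Eq. (6) divides $2F_{XZ}$ by $K_{XZ}$ and multiplies by $K_0$, which is the reading that makes the measure independent of matrix size; the function names and the input numbers are hypothetical.

```python
import math

K0 = 1296  # mean comparison-matrix volume quoted in the text
R = 12     # number of 2I intervals

def g_xz(f_value, k_xz, k0=K0):
    """Rescale 2 * F_XZ from K_XZ comparisons to the mean volume K0 (assumed form of Eq. 6)."""
    return 2.0 * f_value / k_xz * k0

def n_xz(g_value, r=R):
    """Eq. (7): (2G)^(1/2) - (2r - 1)^(1/2), an approximate standard normal argument."""
    return math.sqrt(2.0 * g_value) - math.sqrt(2.0 * r - 1.0)

# Hypothetical example: the same per-comparison deviation observed in a
# small and in a large comparison matrix maps to the same G and N.
small = n_xz(g_xz(10.0, 100))
large = n_xz(g_xz(20.0, 200))
```

Because F grows in proportion to the number of comparisons, dividing by $K_{XZ}$ removes the size effect and multiplying by $K_0$ puts every pair of sets on a common scale before the normal transformation.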

(5) Finally, the strength of the relationship between a pair of tRNA sets has been represented as an argument of the standard normal distribution. All such relationships have been placed in the matrix N(x,z), a similarity matrix for the 21 sets of isoacceptor tRNAs. The N(x,z) matrix is given in Table 3. Using the well-known algorithm for cluster analysis by "nearest neighbor" (Duda and Hart 1973), a dendrogram based on the N(x,z) similarity matrix has been constructed. In constructing the dendrogram, the arguments of the normal distribution were expressed to three significant figures. Each set was drawn as a separate branch. Each branch of the dendrogram was continued horizontally to the mark equal to the measure of the similarity between the sequences within one set (see Fig. 1).
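Nearest-neighbor (single-link) clustering on a similarity matrix can be sketched as follows. This is a naive illustration rather than the authors' implementation: the function name is hypothetical, and only four sets are used, with the off-diagonal similarities for Asn, Ile, Lys, and Met copied from Table 3.

```python
def single_link_merges(names, similarity):
    """Repeatedly join the two clusters whose closest members are most similar.

    Returns a list of (cluster_a, cluster_b, level) in merge order.
    """
    clusters = [frozenset([name]) for name in names]

    def link(a, b):
        # Single link: the highest similarity over all cross-cluster pairs.
        return max(similarity[frozenset((x, y))] for x in a for y in b)

    history = []
    while len(clusters) > 1:
        pairs = [(link(a, b), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        level, i, j = max(pairs)
        history.append((sorted(clusters[i]), sorted(clusters[j]), level))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Off-diagonal values from Table 3 for four of the 21 sets.
SIM = {
    frozenset(("Asn", "Lys")): 279.54,
    frozenset(("Ile", "Met")): 267.98,
    frozenset(("Ile", "Lys")): 267.13,
    frozenset(("Lys", "Met")): 265.93,
    frozenset(("Asn", "Ile")): 259.85,
    frozenset(("Asn", "Met")): 229.02,
}
history = single_link_merges(["Asn", "Ile", "Lys", "Met"], SIM)
```

On this subset Asn joins Lys first, Ile joins Met second, and the two pairs are then linked through the Ile–Lys similarity, which mirrors the (ASN,LYS) and (ILE,MET) groupings discussed in the text.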

Results and Discussion

Using mutual information [the 2I value, introduced by formula (1)] as a basic measure of similarity between sequence pairs allows phylogenetic reconstruction without the need to assume a constant rate of mutation in all branches of a phylogenetic tree. This approach does not focus on the number of differing nucleotides between the sequences, since relationships between them are measured by summarizing all possible nucleotide coincidences. This provides greater insight into the similarity of nucleotide sequences than the commonly used approach of representing similarity in terms of percentage homology. Further, this method allows a statistical measure to be assigned to each pairwise comparison of groups of sequences for different isoacceptor tRNAs, independent of the groups' size [formulas (5)–(7)].

A dendrogram showing the interrelationships among the tRNA sets with different amino acid specificities is shown in Fig. 1. The interrelationship values are given as arguments of a normal distribution ranging from 195 to 330 units. Arguments of a normal distribution in this interval correspond to probabilities of accidentally establishing relationships between the set pairs of less than $10^{-8}$. A difference between the interrelationship values of even one unit leads to a difference in the probabilities of more than one order of magnitude. It can be seen that the sets of lysine and elongator methionine tRNAs contain the most closely related sequences.

Table 2. Numbers of representatives of all kingdoms of living organisms for each set of isoaccepting tRNAs and their genes. The order of the sets from left to right follows the dendrogram in Fig. 1

| Kingdom | Gln | His | Pro | Ser | Leu | Gly | Glu | Asp | Cys | Trp | Arg | Val | Ala | Metf | Tyr | Thr | Met | Ile | Phe | Asn | Lys |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Viruses | 2 | 1 | 4 | 3 | 2 | 2 | - | 1 | - | 1 | 1 | 1 | 2 | 1 | - | 2 | 2 | 2 | - | 1 | 1 |
| Archaebacteria | 3 | 4 | 6 | 7 | 11 | 8 | 5 | 4 | 2 | 3 | 5 | 7 | 15 | 7 | 3 | 7 | 4 | 4 | 3 | 5 | 5 |
| Eubacteria | 8 | 8 | 16 | 17 | 28 | 21 | 10 | 8 | 5 | 7 | 15 | 12 | 17 | 13 | 9 | 16 | 10 | 17 | 11 | 10 | 11 |
| Protista | 6 | 5 | 3 | 14 | 6 | 1 | 6 | 4 | 1 | 3 | 10 | 7 | 5 | 8 | 6 | 5 | 2 | 4 | 6 | 5 | 8 |
| Plants | - | 1 | 3 | 9 | 1 | 4 | 3 | 1 | - | 3 | 1 | 2 | 1 | 5 | 10 | - | 3 | 1 | 6 | 2 | 1 |
| Animals | 6 | 4 | 10 | 12 | 13 | 10 | 10 | 8 | 2 | 3 | 8 | 13 | 6 | 12 | 7 | 4 | 4 | 2 | 13 | 5 | 16 |

Two processes have accompanied the development of current isoacceptor tRNA sets. The first is the evolutionary differentiation of molecular adaptors into sets of high-specificity molecules which concentrate amino acids. The second process is that of species evolution, which would have influenced the levels of interrelationships both within and across the sets. Usually, by constructing a common ancestral sequence, the influence of species evolution is reduced to a minimum in the sequence phylogeny (Fitch and Upper 1987; Di Giulio 1995). Then trees of ancestral sequences can be constructed in accordance with the hypothesis under investigation, but statistical testing of phylogenetic hypotheses is complicated by the vast number of possible trees with the same number of taxa. The number of unrooted trees of n taxa is $\prod_{i=1}^{n-2} (2i - 1)$ (Fitch and Margoliash 1968). The approach used in the present work avoids the need for the reconstruction of an ancestral sequence for each set of isoacceptor tRNAs. Nevertheless, the analysis given below provides evidence that the generalization of interrelationships across the nucleotide sequences smooths the influence of species evolution. First, consider Table 2. The 21 tRNA sets are shown, along with the number of tRNA representatives from each kingdom: viruses, archaebacteria, eubacteria, protista (excluding the preceding three groups), plants, and animals. The tRNA sets in Table 2 go from left to right as they are arranged in Fig. 1. There is no obvious relationship between the number of representatives from each kingdom and the arrangement of the tRNA sets in the dendrogram. This may imply that the order of the tRNA sets in the dendrogram reflects mainly the evolutionary process which formed tRNAs of precise amino acid specificities. Second, in Fig. 1, it can be observed that the lower the strength of the internal relationship between tRNAs in a set, the less the isoacceptor tRNA family is related to other tRNA sets. Naturally, the more ancient the sequences of the isoacceptor tRNAs, the more divergence they show between themselves. Furthermore, their relationships with other isoacceptor tRNAs, which were formed later, are weaker. It may therefore be supposed that the dendrogram in Fig. 1 correctly reflects the order of tRNA evolution.
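To give a sense of how quickly the number of candidate trees grows, the product of (2i − 1) for i = 1, …, n − 2 quoted above can be evaluated directly. The function name below is hypothetical.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating trees: product of (2i - 1) for i = 1..n-2."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for i in range(1, n_taxa - 1):
        count *= 2 * i - 1
    return count

small_cases = [unrooted_tree_count(n) for n in (3, 4, 5, 6)]
trees_for_21_sets = unrooted_tree_count(21)
```

For 21 taxa, the number of tRNA sets analyzed here, the count is 8,200,794,532,637,891,559,375 (about 8 × 10^21), which illustrates why exhaustive statistical testing over all trees is impractical.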

Being the molecular adaptors between the genetic code and the amino acid sequences of proteins, transfer RNAs must surely bear some traces of the evolutionary structuring of the genetic code. The dendrogram of interrelations between the sets of isoacceptor tRNAs is therefore supplemented with a scheme-table of amino acid codons in Fig. 2. Moving from top to bottom, specific codons in Fig. 2 repeat the arrangement of tRNA sets in Fig. 1 from left to right. The codons of amino acids in which the first two bases differ only by transitions are distinguished by separate boxes. The first column in the scheme-table in Fig. 2 includes the first two bases of the codons; the second column contains the third codon base. Amino acid names are in the third column. Where a codon might previously have been assigned to another amino acid, the name of that amino acid is shown in parentheses. Capital letters C, P, and R in the fourth column mark the clustering of amino acid codons (which is the same as for the sets of isoaccepting tRNAs in Fig. 1) in accordance with the three main hypotheses on the origin of the genetic code. "C" corresponds to the coevolution hypothesis (Wong 1975); "P," the physicochemical hypothesis (Woese et al. 1966a); and "R," the hypothesis of ambiguity reduction of the genetic code (Fitch and Upper 1987). Leucine codons are listed to the right of the scheme-table in Fig. 2 because transfer RNAs of leucine and serine, according to the dendrogram in Fig. 1 (see also Table 3), have the same value for relationships between themselves and with other isoacceptor tRNA sets. Though clustering of the tRNA^Leu and tRNA^Ser sets is not in agreement with any of the three hypotheses mentioned above, it is quite natural from the point of view of a united box of PyPyN codons. The stop codon UGA is also listed at the right of the scheme-table to allow more convenient analysis of amino acid codon clustering. The box containing codons for tyrosine and two stop codons is similarly displaced to emphasize its role in interruption of the APyN codon cluster.

Fig. 1. A dendrogram constructed on the N(x,z) matrix (Table 3) of generalized interrelationships between the isoacceptor tRNA sets. The method of clustering is nearest neighbor (single-link). Values of internal and external relationships are the arguments of the standard normal distribution. The horizontal growth of each branch is terminated on the level of the generalized interrelationship value within the isoacceptor tRNA set. The values of internal relationships are present on the same line together with the set's name. See text for further explanation.

Fig. 2. A scheme-table in which amino acid codons run from top to bottom in the same order as the sets of isoacceptor tRNAs in Fig. 1 run from left to right. Codons and names of amino acids occupy the first three columns of the table. If a codon might previously have been assigned to another amino acid, the name of that amino acid is shown in parentheses. Boxes divided by arrows include amino acid codons which differ by transitions. Separate columns of capital letters denote the codon clustering in accordance with the coevolution hypothesis (C), the physicochemical hypothesis (P), and the hypothesis of ambiguity reduction of the genetic code (R). Each column represents one cluster. Leucine and UGA stop codons are shown to the right to simplify the clustering pattern of the other codons. However, placing the UAN codon box at the right emphasizes the interruption of the APyN codon box assimilation. See text for discussion.

Table 3. A matrix of generalized interrelationships among 21 isoacceptor tRNA sets (methionine tRNAs are divided into initiator and elongator sets)(a)

| | Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile | Leu |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ala | 309.88 | 205.09 | 178.26 | 182.26 | 128.68 | 158.99 | 145.60 | 212.71 | 144.35 | 255.42 | 132.52 |
| Arg | | 275.27 | 233.60 | 180.99 | 165.03 | 181.95 | 163.50 | 200.78 | 164.86 | 229.58 | 156.90 |
| Asn | | | 309.07 | 198.29 | 158.84 | 171.54 | 168.12 | 166.39 | 181.72 | 259.85 | 139.03 |
| Asp | | | | 281.85 | 123.82 | 166.44 | 250.90 | 218.58 | 195.09 | 167.84 | 102.65 |
| Cys | | | | | 242.13 | 162.57 | 95.75 | 162.59 | 165.12 | 154.21 | 170.80 |
| Gln | | | | | | 238.28 | 184.47 | 152.38 | 187.21 | 173.56 | 164.99 |
| Glu | | | | | | | 267.34 | 197.96 | 188.74 | 146.46 | 127.12 |
| Gly | | | | | | | | 279.17 | 195.96 | 180.66 | 168.83 |
| His | | | | | | | | | 254.56 | 146.92 | 179.36 |
| Ile | | | | | | | | | | 315.68 | 152.99 |
| Leu | | | | | | | | | | | 267.15 |

Table 3. Extended

| | Lys | Met | Metf | Phe | Pro | Ser | Thr | Trp | Tyr | Val |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ala | 207.40 | 217.93 | 167.45 | 208.24 | 204.51 | 157.82 | 237.14 | 193.47 | 172.40 | 246.08 |
| Arg | 243.60 | 206.67 | 173.25 | 209.12 | 177.09 | 183.37 | 218.61 | 234.08 | 184.36 | 191.45 |
| Asn | 279.54 | 229.02 | 208.98 | 235.77 | 148.77 | 176.52 | 245.75 | 189.20 | 230.61 | 186.77 |
| Asp | 192.77 | 161.91 | 132.97 | 143.33 | 174.62 | 129.64 | 173.74 | 171.56 | 177.46 | 207.00 |
| Cys | 163.59 | 153.65 | 152.81 | 222.13 | 134.82 | 206.65 | 178.91 | 185.83 | 207.70 | 134.55 |
| Gln | 178.85 | 168.76 | 171.85 | 136.95 | 180.93 | 162.66 | 156.86 | 195.20 | 175.28 | 159.29 |
| Glu | 188.76 | 159.24 | 140.00 | 103.02 | 162.35 | 121.84 | 152.75 | 139.67 | 135.66 | 180.87 |
| Gly | 207.07 | 177.25 | 132.75 | 213.38 | 170.27 | 154.08 | 204.67 | 200.37 | 162.27 | 209.18 |
| His | 169.65 | 150.21 | 115.10 | 195.59 | 162.47 | 154.65 | 168.24 | 147.48 | 185.61 | 142.99 |
| Ile | 267.13 | 267.98 | 220.36 | 237.94 | 187.23 | 172.91 | 250.04 | 209.18 | 216.17 | 236.10 |
| Leu | 185.41 | 198.10 | 190.66 | 206.78 | 163.25 | 207.22 | 170.82 | 183.85 | 178.09 | 129.66 |
| Lys | 330.13 | 265.93 | 193.56 | 270.76 | 173.70 | 188.91 | 266.15 | 224.22 | 226.85 | 186.20 |
| Met | | 274.11 | 261.75 | 240.18 | 174.73 | 182.88 | 236.72 | 215.17 | 198.20 | 210.98 |
| Metf | | | 330.48 | 194.17 | 163.57 | 162.78 | 176.86 | 172.86 | 170.92 | 172.92 |
| Phe | | | | 322.67 | 152.78 | 188.16 | 240.11 | 205.93 | 262.64 | 212.26 |
| Pro | | | | | 254.34 | 140.71 | 175.85 | 169.61 | 144.90 | 192.30 |
| Ser | | | | | | 258.54 | 189.90 | 203.89 | 190.90 | 145.90 |
| Thr | | | | | | | 278.57 | 203.10 | 200.59 | 204.13 |
| Trp | | | | | | | | 263.61 | 198.08 | 177.20 |
| Tyr | | | | | | | | | 282.64 | 159.62 |
| Val | | | | | | | | | | 274.44 |

(a) The interrelationship values are given as the arguments for a standard normal distribution. The matrix is symmetric about its main diagonal, so only the upper triangle is shown.

The codon clustering in Fig. 2 is now considered with respect to the hypotheses on the origin of the genetic code. The hypothesis of coevolution (Wong 1975, 1981; Taylor and Coates 1989) implies that, through the mediation of ancestral tRNA molecules, amino acids which are biosynthetically linked via a precursor-product relationship could allow the precursor to pass some of its codons to the product. In this case transfer RNAs whose amino acids are related via biosynthetic conversions would show a higher correlation between themselves than with other tRNAs. Groups of amino acid codons which are clustered in Fig. 2 in accordance with the coevolution hypothesis are indicated by C and include (GLN,HIS,PRO), (SER,GLY), (ASP,GLU), (TRP,CYS), (ALA,VAL), (THR,ILE,MET), and (ASN,LYS). Data on biosynthetic relationships were taken from Wong (1975) and Taylor and Coates (1989). The codons of 16 of the 20 amino acids are clustered in agreement with the coevolution theory.

The physicochemical hypothesis (Sonneborn 1965; Woese et al. 1966a; Jungck 1978; Weber and Lacey 1978; Lacey et al. 1992) puts forward the physicochemical properties of amino acids as the basis for the structuring of the genetic code. In this case, it is expected that tRNAs of similar amino acids would show a higher correlation than tRNAs of amino acids with different physicochemical properties. Many studies point to polar distances between amino acids as a basic factor involved in the arrangement of amino acids over the genetic code (Woese et al. 1966a, b; Di Giulio 1989; Haig and Hurst 1991; Goldman 1993). The absolute difference between amino acid polarity values was therefore chosen as the distance measure between two amino acids. The values of amino acid polarities were taken from Woese et al. (1966a). Pairs of tRNA sets which seem to cluster according to the minimum differences in polarity values of amino acids are denoted P and include (GLN,HIS), (SER,GLY), (ASP,GLU), (CYS,GLU), and (ASN,LYS). In Fig. 2 codon clusters of 9 of the 20 amino acids fit the physicochemical hypothesis on the origin of the genetic code.

The hypothesis of ambiguity reduction of the genetic code (Fitch and Upper 1987) supposes that ancestral molecular adaptors were initially unable to distinguish between amino acid codons and only later developed the ability to differentiate between purines and pyrimidines in either the first or the second codon base position. Finally, tRNAs developed a sensitivity to both kinds of purine and pyrimidine. Following Di Giulio (1995) we took the second codon position as a marker because of the correlation between amino acid physicochemical properties and base position (Nelsestuen 1978; Wolfenden et al. 1979; Sjostrom and Wold 1985). The clustering pattern of isoacceptor tRNA sets in Fig. 1 which supports the ambiguity reduction of the genetic code is best seen in Fig. 2, where the codons of different amino acids with the same second base are nearest neighbors. In this case only UCN were considered as serine codons, since codons AGU and AGC could have been captured by serine much later. The phenylalanine codons UUU and UUC were not considered in this case because UUN codons might have been contested between phenylalanine and leucine. Groups of codons of amino acids supporting this hypothesis are denoted R and include (GLN,HIS), (PRO,SER), (ASP,GLU), (CYS,TRP,ARG), (ILE,MET), and (ASN,LYS). Thirteen of the twenty amino acids show clustering which is in agreement with the hypothesis.

Figure 2 shows that of the 20 amino acids, 16 are clustered in accordance with the coevolution hypothesis, 13 in accordance with the hypothesis of ambiguity reduction of the genetic code, and 9 according to the physicochemical hypothesis of the origin of the genetic code, i.e., 80, 65, and 45%, respectively. The results therefore imply that the greatest contribution to the structuring of the genetic code was from relationships between precursor and product amino acids and support earlier conclusions in favor of the coevolution theory on the origin of the genetic code (Di Giulio 1994, 1995). It is worth noting, though, that the clustering pattern according to the coevolution hypothesis overlaps with the patterns of the other two hypotheses by about 50%.
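The three tallies can be checked by counting the distinct amino acids in the cluster lists given in the text for each hypothesis. The snippet below is only a bookkeeping aid with hypothetical names; the groups themselves are copied from the preceding paragraphs.

```python
CLUSTERS = {
    "coevolution": [("GLN", "HIS", "PRO"), ("SER", "GLY"), ("ASP", "GLU"),
                    ("TRP", "CYS"), ("ALA", "VAL"), ("THR", "ILE", "MET"),
                    ("ASN", "LYS")],
    "ambiguity reduction": [("GLN", "HIS"), ("PRO", "SER"), ("ASP", "GLU"),
                            ("CYS", "TRP", "ARG"), ("ILE", "MET"), ("ASN", "LYS")],
    "physicochemical": [("GLN", "HIS"), ("SER", "GLY"), ("ASP", "GLU"),
                        ("CYS", "GLU"), ("ASN", "LYS")],
}

def coverage(groups, total=20):
    """Distinct amino acids covered by the groups, and that count as a percentage."""
    members = {aa for group in groups for aa in group}
    return len(members), 100.0 * len(members) / total

summary = {name: coverage(groups) for name, groups in CLUSTERS.items()}
```

Counting distinct members matters for the physicochemical list, where GLU appears in two pairs and is counted once.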

The separate boxes in Fig. 2 link amino acid codons in which the first two bases are either identical or connected by transitions, so, moving from the top to the bottom of the table, PyPuN→PyPyN→PuPuN→PyPuN→PuPyN→PyPuN→PuPyN→PyPyN→PuPuN. The exchange between Pu and Py in the second codon position led us to suppose an ordered assimilation of codons by ancestral tRNAs during the evolution of tRNAs. If highly specific adaptors arose for certain codons, the nucleotide triplets complementary to these codons were given priority to form their own specific transfer RNAs. Though the mechanisms by which this could occur are not obvious, some speculation is possible. The hypercycle theory (Eigen and Schuster 1979) postulates that the ancestors of modern transfer RNAs emerged as short symmetrical double helices about 73 bases or more long, whose complementary strands served as pre-tRNAs with complementary anticodons (Rodin et al. 1993). In that case, further improvement of the adaptors to complementary codons would have proceeded in parallel and under close competition. Unfortunately, such a model of ancestral adaptors is not in agreement with the pre-tRNA model which follows from the genomic tag model (Maizels and Weiner 1993, 1994). This model defines the upper half of the modern tRNA (the acceptor stem and TΨC arm) as the more ancient structural domain, with the lower half (the dihydrouracil arm and anticodon arm) arising later. This view of tRNA evolution is currently dominant (Schimmel and de Pouplana 1995). Nevertheless, we believe that it is possible to combine the pre-tRNA model of Rodin et al. (1993) with the contemporary idea of tRNA consisting of two structural domains.

For many codons Fig. 2 seems to allow for sequential assimilation of codons with specific adaptors. For example, leucine codons (UUG, CUG) are complementary to glutamine codons (CAA, CAG), and the high-specificity leucine tRNAs were established soon after the glutamine tRNAs. This could also be true of the complementary codon pairs GAG–Glu and CUC–Leu, GGG–Gly and CCC–Pro, and many others. If such a development occurred, then the break in the assimilation of PuPyN codons and the transfer to tyrosine codons must be explained (Fig. 2). The formation of specific adaptors for PuPyN codons started from the methionine codon AUG, and development could have been prolonged owing to the additional demands placed on the methionine initiator tRNA by binding with the initiating factor and the small ribosomal subunit. At this point the tyrosine codon UAC was already distinguished as complementary to the valine codon GUA and consequently had priority to form its own specific adaptors. Owing to the prolonged pause in assimilation of PuPyN codons, the formation of high-specificity adaptors for tyrosine codons was complete before that of methionine initiator tRNAs.
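The codon complementarity invoked in this paragraph can be checked mechanically. The sketch below assumes strict Watson-Crick pairing (A with U, G with C, no wobble) read in antiparallel orientation, so that the partner of a codon is its reverse complement. The function name and the small lookup table are ours, introduced for illustration; the table is a subset of the standard genetic code limited to the codons mentioned in the text.

```python
# Watson-Crick pairing for RNA bases (wobble pairing is deliberately ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Subset of the standard genetic code covering only the codons discussed here.
AMINO_ACID = {
    "UUG": "Leu", "CUG": "Leu", "CUC": "Leu",
    "CAA": "Gln", "CAG": "Gln",
    "GAG": "Glu", "GGG": "Gly", "CCC": "Pro",
    "UAC": "Tyr", "GUA": "Val",
}


def reverse_complement(codon):
    """Return the antiparallel complementary triplet, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


for codon in ("UUG", "CUG", "CUC", "GGG", "UAC"):
    partner = reverse_complement(codon)
    print(f"{codon} ({AMINO_ACID[codon]}) <-> {partner} ({AMINO_ACID[partner]})")
```

Because complementarity is symmetric, applying `reverse_complement` twice returns the original codon, which gives a quick sanity check on any pair added to the list.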

In conclusion, the results obtained here for transfer RNAs support the theory of coevolution, with biosynthetic links between amino acids as the dominant factor in shaping the structure of the genetic code. For 80% of amino acids their isoacceptor tRNAs were found grouped according to the biosynthetic pathways of the accepted amino acids. Our results also demonstrate the significance of physicochemical properties (namely, polarity) in establishing biosynthetic relationships since clusters of isoacceptor tRNAs which support both the physicochemical and the coevolution hypotheses coincided in 50% of cases. Some support was also obtained for the ambiguity reduction hypothesis.

Acknowledgment. We thank Dr. M. Di Giulio for his critical and helpful remarks.

References

Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD (1995) Molecular biology of the cell, 3rd ed. Garland, New York, pp 17–21

Cedergren RJ (1982) An evaluation of mitochondrial tRNA gene evolution and its relation to the genetic code. Can J Biochem 60:475–479

Cedergren RJ, LaRue B, Sankoff D, Lapalme G, Grosjean H (1980) Convergence and minimal mutation criteria for evaluating early events in tRNA evolution. Proc Natl Acad Sci USA 77:2791–2793

Cedergren RJ, Sankoff D, LaRue B, Grosjean H (1981) The evolving tRNA molecule. CRC Crit Rev Biochem 11:35–104

Crick FHC (1966) Codon-anticodon pairing: The wobble hypothesis. J Mol Biol 19:548–555

Crick FHC (1968) Origin of the genetic code. J Mol Biol 38:376–379

Di Giulio M (1989) The extension reached by the minimization of the polarity distances during the evolution of the genetic code. J Mol Evol 29:288–293

Di Giulio M (1994) The phylogeny of tRNA molecules and the origin of the genetic code. Origins Life 24:425–434

Di Giulio M (1995) The phylogeny of tRNAs seems to confirm the predictions of the coevolution theory of the origin of the genetic code. Origins Life 4:1–16

Duda RO, Hart PE (1973) Pattern classification and scene analysis. John Wiley & Sons, New York London Sydney Toronto, p 253

Eigen M, Schuster P (1979) Hypercycle: A principle of natural self-organization. Springer, New York

Eigen M, Winkler-Oswatitsch R, Dress A (1988) Statistical geometry in sequence space: A method of quantitative comparative sequence analysis. Proc Natl Acad Sci USA 85:5913–5917

Eigen M, Lindemann BF, Tietze M, Winkler-Oswatitsch R, Dress A, Von Haeseler A (1989) How old is the genetic code? Statistical geometry of tRNA provides an answer. Science 244:673–679

Fitch WM, Margoliash E (1968) The construction of phylogenetic trees. Brookhaven Symp Biol 21:217

Fitch WM, Upper K (1987) The phylogeny of tRNA sequences provides evidence for ambiguity reduction in the origin of the genetic code. Cold Spring Harbor Symp Quant Biol 52:759–767

Goldman N (1993) Further results on error minimization in the genetic code. J Mol Evol 37:662–664

Haig D, Hurst LD (1991) A quantitative measure of error minimization in the genetic code. J Mol Evol 33:412–417

Hendry LB, Bransome ED Jr, Petersheim M (1981) Are there structural analogies between amino acids and nucleic acids? Origins Life 11:203–221

Jungck JR (1978) The genetic code as a periodic table. J Mol Evol 11:211–224

Koroljuk VS, Portenko NI, Skorochod AV, Turbin AF (1985) The handbook of the theory of probability and mathematical statistics. Nauka, Moscow, p 125

Kullback S (1958) Information theory and statistics. John Wiley and Sons, New York, p 125

Lacey JC Jr, Mullins DW Jr (1983) Experimental studies related to the origin of the genetic code and the process of protein synthesis: A review. Origins Life 13:3–42

Lacey JC Jr, Wickramasinghe NSMD, Cook GW (1992) Experimental studies on the origin of the genetic code and the process of protein synthesis: A review. Origins Life 22:243–275

LaRue B, Cedergren RJ, Sankoff D, Grosjean H (1979) Evolution of methionine initiator and phenylalanine transfer RNAs. J Mol Evol 14:287–300

Maizels N, Weiner AM (1993) The genomic tag hypothesis: Modern viruses as molecular fossils of ancient strategies for genomic replication. In: Gesteland RE, Atkins JF (eds) The RNA World. Cold Spring Harbor Laboratory Press, Plainview, NY, pp 577–602

Maizels N, Weiner AM (1994) Phylogeny from function: evidence from the molecular fossil record that tRNA originated in replication, not translation. Proc Natl Acad Sci USA 91:6729–6734

Nelsestuen GL (1978) Amino acid directed nucleic acid synthesis: Possible mechanism in origin of life. J Mol Evol 11:109–120

Nicoghosian K, Bigras M, Sankoff D, Cedergren R (1987) Archetypical features in tRNA families. J Mol Evol 26:341–346

Osawa S, Jukes TH (1989) Codon reassignment (codon capture) in evolution. J Mol Evol 28:271–278

Osawa S, Jukes TH, Watanabe K, Muto A (1992) Recent evidence for evolution of the genetic code. Microbiol Rev 56:229–264

Rodin S, Ohno S, Rodin A (1993) Transfer RNAs with complementary anticodons: Could they reflect early evolution of discriminative genetic code adaptors? Proc Natl Acad Sci USA 90:4723–4727

Schimmel P, de Populana LR (1995) Transfer RNA: From minihelix to genetic code. Cell 81:983–986

Sjostrom M, Wold S (1985) A multivariate study of the relationship between the genetic code and the physical-chemical properties of amino acids. J Mol Evol 22:272–277

Sonneborn TM (1965) In: Bryson V, Vogel HJ (eds) Evolving genes and proteins. Academic Press, New York, pp 377–397

Steinberg S, Misch A, Sprinzl M (1993) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 21:3011–3015

Szathmary E (1991) Codon swapping as a possible evolutionary mechanism. J Mol Evol 32:178–182

Szathmary E, Zintzaras E (1992) A statistical test of hypotheses on the organization and origin of the genetic code. J Mol Evol 35:185–189

Taylor FJR, Coates D (1989) The code within the codons. BioSystems 22:177–187

Weber AL, Lacey JC Jr (1978) Genetic code correlations: Amino acids and their anticodon nucleotides. J Mol Evol 11:199–210

Woese CR, Dugre DH, Saxinger WC, Dugre SA (1966a) The molecular basis for the genetic code. Proc Natl Acad Sci USA 55:966–974

Woese CR, Dugre SA, Kando M, Saxinger WS (1966b) On the fundamental nature and evolution of the genetic code. Cold Spring Harbor Symp Quant Biol 31:720–736

Wolfenden RV, Cullis PM, Southgate CCB (1979) Water, protein folding, and the genetic code. Science 206:575–577

Wong JT-F (1975) A co-evolution theory of the genetic code. Proc Natl Acad Sci USA 72:1909–1912

Wong JT-F (1981) Coevolution of genetic code and amino acids biosynthesis. Trends Biochem Sci 6:33–36

Appendix

Fig. A1. The accession codes of the tRNA sequences and tRNA genes from Steinberg et al. (1993).
