Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 8121-8125, September 1991
Evolution

Evolution and relatedness in two aminoacyl-tRNA synthetase families

GLENN M. NAGEL* AND RUSSELL F. DOOLITTLE
Center for Molecular Genetics, M-034, University of California, San Diego, La Jolla, CA 92093

Contributed by Russell F. Doolittle, June 24, 1991

ABSTRACT Sequence segments of about 140 amino acids in length, each containing a selected consensus region, were used in alignments of the aminoacyl-tRNA synthetases with the aim of discerning their evolutionary relationships. In all cases tested, enzymes specific for the same amino acid from a variety of organisms grouped together, reinforcing the supposition that the aminoacyl-tRNA synthetases are very ancient enzymes that evolved to include the full complement of 20 amino acids long before the divergence leading to prokaryotes and eukaryotes. The enzymes are divided into two mutually exclusive groups that appear to have evolved from independent roots. Group I, for which two sequence segments were analyzed, contains the enzymes specific for glutamic acid, glutamine, tryptophan, tyrosine, valine, leucine, isoleucine, methionine, and arginine. Group II enzymes include those activating threonine, proline, serine, lysine, aspartic acid, asparagine, histidine, alanine, glycine, and phenylalanine. Both groups contain a spectrum of amino acid types, suggesting the possibility that each could have once supported an independent system for protein synthesis. Within each group, enzymes specific for chemically similar amino acids tend to cluster together, indicating that a major theme of synthetase evolution involved the adaptation of binding sites to accommodate related amino acids with subsequent specialization to a single amino acid. In a few cases, however, synthetases activating dissimilar amino acids are grouped together.
Aminoacyl-tRNA synthetases catalyze the esterification or "charging" of a single amino acid to its cognate tRNA; thus, 20 such enzymes, one enzyme specific for each amino acid found in proteins, constitute a minimum set for protein biosynthesis. The evolution and structural relatedness of these enzymes have been a subject of intense interest for many years (for review, see refs. 1-3). Because they appear to participate universally in protein synthesis, the origins of these "activating" enzymes must be very ancient (4) and studies of their divergence may shed significant light onto the development of the genetic code and its expression. In addition, the structural basis for nucleic acid-protein recognition and the manner in which enzymes have come to activate only a single amino acid and cognate tRNA while excluding a large number of structurally similar molecules are subjects of great interest. As the sequences of more and more of the aminoacyl-tRNA synthetases became available, we undertook a study of the interrelationships within the more than 50 reported sequences with the aims of tracing their evolution and of identifying more conserved sequence segments that are presumably essential to their biological function.

Because these enzymes catalyze the same overall reaction and utilize a common strategy for chemical activation of their amino acids, aminoacyl-AMP being formed at the expense of ATP, it has long been supposed that all these enzymes had a common ancestral root, even though large differences in polypeptide chain length (303-1104 amino acids) and quaternary structure (α, α2, α4, and α2β2) appeared to argue for more diversity. Initially, studies of primary sequence similarities yielded only limited regions of clear relatedness. The "HIGH" (His-Ile-Gly-His) consensus or signature sequence was identified early in a small group of enzymes (5), but other enzymes appeared to lack this motif.
In addition, the "KMSKS" (Lys-Met-Ser-Lys-Ser) sequence was revealed (6, 7) in another group that overlapped significantly in membership with the HIGH enzymes. The HIGH region has been implicated in amino acid activation and the KMSKS region has been implicated in "docking" the acceptor stem of tRNA and/or transferring amino acid to the 3' end of tRNA in these enzymes (7-11).

More recently a third consensus sequence denoted "GLER" (Gly-Leu-Glu-Arg) was observed (12, 13). Because the enzymes containing this sequence formed a group apparently exclusive of the earlier group, it was hypothesized (13) that two separate evolutionary families of the synthetases exist. The first x-ray crystallographic structure of an enzyme from the GLER group (14), seryl-tRNA synthetase, SerRS,† supported this view, there being virtually no structural similarity between SerRS and enzymes of the HIGH group MetRS (15), TyrRS (16), and GlnRS (17). Our own studies, which were in progress when these reports appeared, are in agreement with the existence of two separate groups, designated here simply as group I and group II. The work described in this paper identifies sequence segments in members of each group and applies standard computer methods to align the sequences and to discern their evolutionary relationships.

MATERIALS AND METHODS

Amino Acid Sequences. The amino acid sequences used in this study either were taken from the National Biomedical Research Foundation sequence collection (Version 21), translated from DNA sequences in GenBank (Release 61) or EMBL (Release 18 on compact disc, April 1989), or were entered by us directly from the original literature. The latter included GluRS from Rhizobium meliloti (18), LysRS from yeast (19), and yeast GlnRS (20).

Programs. Sequence alignments were made by the progressive method (21). Phylogenetic trees were constructed from multiple sequence alignments by a matrix procedure (22) and by a nearest-neighbor character analysis (23).
The program for the latter method is called PAPA (parsimony after progressive alignment). Best trees were defined as those with the lowest percent standard deviations and no negative branch lengths. The most similar portions of various pairs of sequences were identified with a program called INSPECT (24).

*Permanent address: Department of Chemistry and Biochemistry, California State University, Fullerton, CA 92634.

†Abbreviations: Aminoacyl-tRNA synthetases specific for a given amino acid have been abbreviated by the convention employing the three-letter designation for the amino acid followed by RS; e.g., SerRS denotes seryl-tRNA synthetase, etc. Where it was necessary to include a designation of the biological source of the enzyme, the abbreviation was modified to include the single-letter symbol for the amino acid along with ec for Escherichia coli, bs for Bacillus stearothermophilus, rm for Rhizobium meliloti, and yc (yeast) for Saccharomyces cerevisiae; e.g., E. coli seryl-tRNA synthetase is Sec, etc.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

RESULTS

The primary sequence segments we have used in our analysis are depicted diagrammatically in Fig. 1. In some instances, it was necessary to delete strings of 5-43 amino acids to obtain correct alignments. Group I enzymes contain two conserved sequences that served as identifiers. The N-terminal segment includes the HIGH sequence and the C-terminal region is characterized by the KMSKS sequence. The large enzymes specific for valine, isoleucine, and leucine also include a central region. MetRS, the next most closely related enzyme, contains a shortened version of this region, whereas in the smallest enzymes (e.g., TrpRS and TyrRS), it is absent altogether. The size of the smaller enzymes, therefore, limits the sequence length that could be utilized in alignments of group I.
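As a hedged illustration of how the two group I identifiers can be located in a sequence, the sketch below scans short fragments for HIGH-like and KMSKS matches. The fragments are copied from the Qec and Mec rows of Fig. 2 and the Wec row of Fig. 3 with alignment gaps removed. The relaxed pattern `H[ILV]GH` and the helper name `find_motifs` are assumptions made for this example; they are not the authors' procedure or programs.

```python
import re

# Fragments copied from the alignment figures, gaps removed.
FRAGMENTS = {
    "Qec_N": "TVHTRFPPEPNGYLHIGHAKSICLNFGIAQDYKG",
    "Mec_N": "ILVTCALPYANGSIHLGHMLEHIQADVWVRYQRM",
    "Wec_C": "PKSGARVMSLLEPTKKMSKSDDNRNNVIGLLE",
}

# Assumed patterns: the HIGH signature is relaxed at its second position
# so that variants such as HLGH are also reported.
MOTIFS = {
    "HIGH": re.compile(r"H[ILV]GH"),
    "KMSKS": re.compile(r"KMSKS"),
}


def find_motifs(sequence):
    """Return (motif name, 0-based offset, matched text) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(), match.group()))
    return hits


if __name__ == "__main__":
    for label, fragment in FRAGMENTS.items():
        print(label, find_motifs(fragment))
```

Offsets are positions within the fragment, not residue numbers in the full-length protein.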

In this analysis, our attention has focused primarily on the enzymes of bacterial origin and from yeast cytoplasm, excluding for the moment sequences from organelles and from higher eukaryotes. Restricting the data set in this way simplified the analysis not only by reducing the number of sequences but also by eliminating some sequences, particularly those of mitochondrial origin, which were more variant and sometimes contained strings with little or no sequence similarity to related enzymes. In every case, however, enzymes specific for the same amino acid could be readily aligned with one another after being suitably trimmed, regardless of source. As a general rule, these enzymes clustered together and were more similar to each other than they were to enzymes specific for any other amino acid.

A complete sequence alignment for the N-terminal segments of 16 group I enzymes, specific for nine amino acids, is shown in Fig. 2. In applying the progressive alignment procedure (22), care was taken to optimize the alignment scores by exploring alternate orders of sequence input into the DFALIGN program and by judicious trimming of regions to be aligned. The output from the SCORE and DFALIGN programs was particularly useful in this regard and, in the final result after several iterations, the input and output order of sequences to and from DFALIGN was the same.

For enzymes specific for the same amino acid, the percent identity varied from 64% (Yec vs. Ybs) to 28% (Iec vs. Iyc). For enzymes specific for different amino acids, values ranged from 29% (Vyc vs. Lec) to 7% (Wbs vs. Mec) identical. Because these sequences are quite divergent, we applied a number of internal checks for consistency. First, we calculated distance scores by two independent approaches, matrix and PAPA, with the aim of finding agreement with regard to both the branching order and relative branch lengths. As an additional means of tracing the evolution of these proteins, we applied an identical and independent analysis to the C-terminal sequence segments (Fig. 3). The statistical properties (percent identities and distance scores) of the aligned sequences were remarkably like those of the N-terminal region. (See the two trees obtained by the PAPA analysis in Fig. 5 A and B for the N-terminal and C-terminal segments, respectively.) The main difference between the two trees was the relative branching of leucine and isoleucine.

The same approach was used for group II (14 enzymes specific for 10 amino acids) except that only a single sequence segment was analyzed. The range of sequence identities in the aligned regions (Fig. 4) is very similar to that found for the group I enzymes. For enzymes specific for the same amino acid, the observed range was 50% (Tec vs. Tyc) to 31% (Sec vs. Syc). For enzymes charging different amino acids, the range was 33% (Nec vs. Dyc) to 6% (Aec vs. Syc and Nec vs. Sec). The matrix and PAPA methods for calculating evolutionary distances were in agreement (Fig. 5C). Both methods yielded the same branching order with no negative branches.
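The agreement check between two tree-building methods can be sketched in a few lines. The example below treats each tree as nested tuples of leaf labels and calls two trees concordant when they contain the same set of clades. This is only one possible way to compare branching order, under the simplifying assumption that both trees are rooted. The toy topologies are invented for illustration and are not the trees of Fig. 5, and the function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree of nested tuples."""
    if isinstance(tree, str):
        # A leaf contributes its own label and no internal clade.
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    # Every internal node defines one clade: the leaves beneath it.
    found.add(leaves)
    return leaves, found


def same_branching_order(tree_a, tree_b):
    """True when both trees group their leaves into identical clades."""
    return clades(tree_a)[1] == clades(tree_b)[1]


# Invented toy topologies (not results from the paper).
tree_from_matrix = ((("Vec", "Vyc"), "Iec"), "Lec")
tree_from_parsimony = ("Lec", ("Iec", ("Vyc", "Vec")))
tree_with_swap = ((("Vec", "Vyc"), "Lec"), "Iec")

if __name__ == "__main__":
    print(same_branching_order(tree_from_matrix, tree_from_parsimony))
    print(same_branching_order(tree_from_matrix, tree_with_swap))
```

The order in which children are written does not matter, so two drawings of the same tree compare as equal, while swapping the leucine and isoleucine branches changes the clade set.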

[Fig. 1 schematic. Panel A: Group I. Panel B: Group II.]

FIG. 1. Aminoacyl-tRNA synthetase sequence segments used in alignments of group I (A) and group II (B) enzymes. Segments are shown as bold lines with the N- and C-terminal limits of each denoted above. Enzymes are abbreviated using the single-letter designation for the specific amino acid activated followed by two lowercase letters for the organism. For the phenylalanine- and glycine-specific enzymes, "S" indicates that the sequences of the small subunits were used. Only bacterial and yeast cytoplasmic sequences were used in this analysis. For Erm, amino acids 74-81 were removed prior to alignment of the N-terminal segments of group I. Similarly, the following residues were removed from the C-terminal segments of group I enzymes: from Yec, residues 251-262; from Ybs, residues 247-258; from Vec, residues 573-609; from Vyc, residues 722-758; from Iyc, residues 666-671; and from Lec, residues 573-615.
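The labelling convention and the trimming described in this legend can be made concrete with a short sketch. It assumes 1-based, inclusive residue numbering, which is the usual convention for protein coordinates but is not stated in the text. The function names, the placeholder sequence, and the segment limits used in the demonstration are hypothetical; only the deleted range 74-81 for Erm comes from the legend.

```python
AMINO_ACIDS = {
    "A": "alanine", "D": "aspartic acid", "E": "glutamic acid",
    "F": "phenylalanine", "G": "glycine", "H": "histidine",
    "I": "isoleucine", "K": "lysine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine", "R": "arginine",
    "S": "serine", "T": "threonine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine",
}
ORGANISMS = {
    "ec": "Escherichia coli",
    "bs": "Bacillus stearothermophilus",
    "rm": "Rhizobium meliloti",
    "yc": "Saccharomyces cerevisiae",
}


def decode_label(label):
    """Split a label such as 'Erm' or 'FSec' into its parts."""
    amino_acid = AMINO_ACIDS[label[0]]
    # A four-character label with 'S' in second place marks a small subunit.
    small_subunit = len(label) == 4 and label[1] == "S"
    organism = ORGANISMS[label[-2:]]
    return amino_acid, organism, small_subunit


def extract_segment(sequence, start, end, deletions=()):
    """Cut residues start..end (1-based, inclusive), skipping deleted ranges."""
    kept = []
    for position in range(start, end + 1):
        if any(low <= position <= high for low, high in deletions):
            continue
        kept.append(sequence[position - 1])
    return "".join(kept)


if __name__ == "__main__":
    # Placeholder sequence, not a real protein; limits 5-141 are illustrative.
    placeholder = "".join(chr(ord("A") + i % 26) for i in range(150))
    segment = extract_segment(placeholder, 5, 141, deletions=[(74, 81)])
    print(decode_label("Erm"), len(segment))
```

Keeping the deletions as explicit coordinate ranges, rather than editing the sequence by hand, makes the trimming reproducible when alternative alignments are tried.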


Yec ..ALYCGFDPTADS LHLGHL VP LLCLKRFQQAGHKPVALVGGATGLIGDPSFKAA ERKLNTEETVQE WV DK IRK
Ybs ..TLYCGFDPTADS LHIGHL AT ILTMRRFQQAGHRPIALVGGATGLIGDPSGKKS ERTLNAKETVEA WS AR IKE
Wec ..IVFSGAQPSGE LTIGNY MG ALRQWVKMQDDYHCIYCIVDQHAITVRQDAQKL RKAILDTLALYL AC GI DPE
Wbs ..TIFSGIQPSGV ITIGNY IG ALRQFVELQHEYNCYFCIVDQHAITVWQDPHEL RQNIRRLAALYL AV GI DPT
Eec ..KIKTRFAPSPTGYLHVGGA RT ALYSWLFARNHGGEFVLRIEDTDLE RSTPEAI EAIMDGMNWLSL EW DE GPY
Erm ..AVRVRIAPSPTGEPHVGTA YI ALFNYLFAKKHGGKFILRIEDTDAT RSTPEFE KKVLDALKWCGL EW SE$GPY
Qec ..TVHTRFPPEPNGYLHIGHA KS ICLNFGIAQDYKGQCNLRFDDTNPV KEDIEYV ESIKNDVEWLGF HW SG NVR
Qyc ..KVRTRFPPEPNGYLHIGHS KA IMVNFGYAKYHNGTCYLRFDDTNPE KEAPEYF ESIKRMVSWLGFKPW KIT
Vec ..FCIMIPPPNVTGSLHMGHAFQQT IMDTMIRYQRMQGKNTLWQVGTDHAGIATQMVV ERKIAAEEGKTRHDY GA EAF
Vyc ..FCIPAPPPNVTGALHIGHALTIA IQDSLIRYNRMKGKTVLFLPGFDHAGIATQSVV EKQIWAKDRKTRHDY GR EAF
Iec ..FILHDGPPYANGSIHIGHSVNKI LKDIIVKSKGLSGYDSPYVPGWDCHGLPIELKV EQEYGKPGEKF TA AEF
Iyc ..FSFFDGPPFATGTPHYGHILAST IKDIVPRYATMTGHHVERRFGWDTHGVPIEHII DKKLGITGKDDVFKY GL ENY
Lec ..YYCLSMLPYPSGRLHMGHVRNYT IGDVIARYQHMLGKNVLQPIGWDAFGLPAEGAA VKNNTAPAPWT
Mec ..ILVTCALPYANGSIHLGHMLEHI QADVWVRYQRMRGHEVNFICADDAHGTPIMLKA QQLGITPEQMI
Myc ..ILITSALPYVNNVPHLGNIIGSVLSADIFARYCKGRNYNALFICGTDEYGTATETKA LEEGVTPRQLC
Rec ..IVVDYSAPNVAKEMHVGHLRSTI IGDAAVRTLEFLGHKVIRANHVGDWGTQFGMLIAWLEKQQQENAGEMELADLEGF YRD

Yec QVAPF LDFDCGENSAI AANNYDWFGNMNVLT FLRDIGKHFSVNQMINKEAVKQRLNREDQGISFTEFSYNLL..
Ybs QLGRF LDFEADGNPAK IKNNYDWIGPLDVIT FLRDVGKHFSVNYMMAKESVQSRI ETGISFTEFSYMML..
Wec KSTIF VQSHVPEHAQLGWALNCYTYFGELSRMT QFKDKSARYAENINAGLFDYPVLM AADILL YGTNLV..
Wbs QATLF IQSEVPAHAQAAWMLQCIVYIGELERMT QFKEKSAG KEAVSAGLLTYPPLM AADILL YNTDIV..
Eec YQTKR FDRYNAVIDQMLEEGTAYKCYCSKERLE ALREEQMAKGEKPRYDGRCRHSH EHHA DDEPC..
Erm RQSDR KDIYKPYVEKIVANGHGFRCFCTPERLE QMREAQRAAGKPPKYDGLCLSLS AEEV TSRVD..
Qec YSSDY FDQLHAYAIELINKGLAYVDELTPEQIR EYRGTLTQ PGKN SPYRDR SVEENLALFEKMRA..
Qyc YSSDY FDELYRLAEVLIKNGKAYVCHCTAEEIK RGRGIKEDGTPGGERYACKHRDQ SIEQNLQEFRDMRD..
Vec IDKIWEWKAESGGTITRQMRRLGNSVDWDGERFTMDEGLSNAVKEVFVRLYKEDLIYRGK RLVNWDPKLRTAIS..
Vyc VGKVWEWKEEYHSRIKNQIQKLGASYDWSREAFTLSPELTKSVEEAFVRLHDEGVIYRAS RLVINWSVKLNTAIS..
Iec RAKCREYAATQVDGQRKDFIRLGVLGDISHPYLTMDFKTEANIIRALGKIIGNGHLHKGA KPVHWCVDCRSALA..
Iyc NNECRSIVMTYASDWRKTIGRLGRWIDFDNDYKTMYPSFMESTWWAFKQLHEKGQVYRGF KVMPYSTGLTTPLS..
Lec YDNIAYMKNQLKMLGFGYDWSRELATCTPEYYRWEQKFFTELYKKGLVYKKT SAVNWCPNDQTVLA..
Mec GEMSQEHQTDFAGFNISYDNYHSTHSE ENRQLSELIYSRLKENGFIKNRT ISQLYDPEKGMFLP..
Myc DKYHKIHSDVYKWFQIGFDYFGRTTTD KQTEIAQHIFTKLNCNGYLEEQS ?MKQLYCPVHNSYLA..
Rec AKKHYDEDEEFAERARNYVVKLQSGDEYFREMWRKLVDITMTQNQITYDRLNVTLTRDDVM GESLYNPMLPGIVA..

FIG. 2. Multiple sequence alignment of the N-terminal sequence segments of group I enzymes. Precise locations of each sequence segment are shown in Fig. 1. The $ symbol corresponds to the deleted string described in the legend to Fig. 1.
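Percent identities such as those quoted in Results are computed from aligned rows like the ones in this figure. The sketch below assumes one common definition: identical positions divided by the positions at which neither row has a gap. The paper does not state which denominator was used, so this is an assumption, as is the function name. The example rows are the first 18 residues of the Yec and Ybs rows of Fig. 2, so the value obtained applies only to that short stretch and not to the whole segment.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row is gapped."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(row_a, row_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# First 18 aligned residues of the Yec and Ybs rows in Fig. 2.
YEC_START = "ALYCGFDPTADSLHLGHL"
YBS_START = "TLYCGFDPTADSLHIGHL"

if __name__ == "__main__":
    print(round(percent_identity(YEC_START, YBS_START), 1))
```

Sixteen of the 18 columns match in this stretch. A conserved stretch like this one scores well above the average for the full segment.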

DISCUSSION

In view of the observation that the use of 20 aminoacyl-tRNA synthetases appears to be universal, it is logical to conclude that the basic evolution of this part of the protein synthetic apparatus was completed and the full complement of amino acids utilized prior to the time when the major groups of present-day organisms diverged. This conclusion is supported by our observation that synthetases specific for the same amino acid, but from diverse organisms, are more similar to each other than they are to enzymes specific for any other amino acid. Thus, the data place the branching of the prokaryotic and eukaryotic lineages later in time than the branching and specialization of enzymes with regard to particular amino acids.

In working with divergent sequences such as those examined here, one is concerned about finding the "right" alignments. Although we cannot be certain of having succeeded in this quest, it can be stated that many things are right about the alignments and the resulting trees presented herein. (i) All alignments were based on recognizable consensus sequences. (ii) Care was taken to employ sequence segments sufficiently large to yield statistically significant data but short enough to make visual inspection and analysis possible. This allowed us, in some cases, to locate homologous regions that initially escaped detection by computer analysis as a result of radical insertions or deletions. (iii) Because these alignments are sensitive to the relative order in which sequences are submitted to the progressive alignment method, many alternative combinations were tried and the process

Ybs ..ARAFGLTIPLVTKA DGTKFGKTESGTIWLDKEKT$DDRD VIRYLKYFTFL SKEEIEALEQELREAPEKRAAQKTLAEEV
Yec ..NQVFGLTVPLITKA DGTKFGKTEGGAVWLDPKKT$ADAD VYRFLKFFTFM SIEEINALEEEDKNSGKAPRAQYVLAEQV
Wec ..PKSGARVMSLLEPT KKMSKSDDNRNNVIGLLE DPKS VVKKIK RAVT DSDEPPVVRYDVQNKAGVSNLLDILSAVT
Wbs ..PKVGARIMSLVDPT KKMSKSDPNPKAYITLLD DAKT IEKKIK SAVT DSEG TIRYDKEAKPGISNLLNIYSTLS
Eec ..PVPVYAHVSMINGD DGKKLSKRH GAVSVMQYRD DGYL PEALLNYL VRLGWSHGDQEIFTREEMIKYFTLNAVSKSASAFN
Erm ..EPPVFMHLSLMRNA DKSKLSKRK NPTSISYYTA LGYL PEALMNFLGLFFIQIAEGEELLTMEELAEKFDPENLSKAGAIFD
Qec ..VHPRQYEFSRL NL EYTVMSKRK LNLLVTDKHV EGWD DPRMPTISGLR RRGYTAASIREFCKRIGVTKQDNTIE
Qyc ..FRPAQREYGRL NI TGTVLSKRK IAQLVDEKFV RGWD DPRLFTLEAIR RRGVPPGAILSFINTLGVTTSTTNIQ
Vec ..PFHTVYMTGLIRDD EGQKMSKSK G NVIDPLDM VDGI$GTDALRFTLAALA STGRDINWDMKRLEGYRNFCNKLWNASR
Vyc ..PFKEVFCHSLVRDA QGRKMSKSL G NVIDPLDV ITGI$GTDAMRFALCAYT TGGRDINLDILRVEGYRKFCNKIYQATK
Iec ..PYCQVLTHGFTVDG QGRKMSKSI G NTVSPQDV MNKL GADILRLWVASTDY TGQMAVSDEILKR AADSY RRIRNTAR
Iyc ..PYKNVIVSGIVLAA DGRKMSKSL K NYPDPSIV LNKY GADALRLYLINSPVLKAESLKFKEEGVKEVVSKVL LPWWNSFK
Lec ..PAKQLLCQGMVLAD$GMSKMSKSK N NGIDPQVM VERY GADTVRLFMMFAS PADMTLEWQESGVEGANRFLKRVWK LV
Mec ..KPSNLFVHGYV TV NGAKMSKSR G TFIKASTW LNHF DADSLRYYYTAKLSSRIDDIDLNLEDFVQRVNADIVNKV VN
Myc ..MLHHLNTTEYL QY ENGKFSKSR GVGVFGNNAQ DSGI SPSVWRYYLASVRP ESSDSHFSWDDFVARNNSELLANL GN
Rec ..VPLEHHMFGMMLGK DGKPF KTR AGGTVKLADL LDEA LERARRLVAEKNPDMPADELE KLANAVGIGAVKY AD

Ybs TK LVHGEEALRQAIRISEALFSGDIANLTAAEIEQGFKDVPSFVHEGG DVPLVELLVSAGISPSKRQA..
Yec TR LVHGEEGLQAAKRITECLFSGSLSALSEADFEQLAQDGVPMVEMEK GADLMQALVDSELQPSRGQA..
Wec GQ SIPELEKQFEGKMYG HLKGEVADAVSGMLTELQERYHRFRNDEAFLQQVMKDGAEKASAHASRTLK..
Wbs GQ SIEELERQYEGKGYG VFKADLAQVVIETLRPIQERYHHWMESEE LDRVLDEGAEKANRVASEMVR..
Eec TD KLLWLNHHYI NALP PEYVATHLQWHIEQ ENIDTRNGPQLADLVKLLGERCKTLKEM AQSCRY..
Erm IQ KLDWLNARWIREKLSEEEFAARVLAWAMDN E RLKEGLKLSQTRISKLGEL PDLAAF..
Qec MA SLESCIREDLNENAPRAMAVIDPVKLVIENYQGE GEMVTMPNHPNKPEMGSRQVPFSGEIWIDRADF..
Qyc VV RFESAVRKYLEDTTPRLMFVLDPVEWVDNLSDDYEELATIPYRPGTPEFGERTVPFTNKFYIERSDF..
Vec FV LMNTEG QDCGFNGGEMTLSLADRWILAEFNQTIKAYREALDSFRFDIAAGIL YEFTWNQFCDWYL..
Vyc FA LMRLGDDYQPPATEGLSGNESLVEKWILHKLTETSKIVNEALDKRDFLTSTSSI YEF WYLICDVYI..
Iec FL LANLNGFD PAKDMVKPEEMVVLDRWAVGCAKAAQEDILKAYEAYDFHEVVQRL MRFCSVEMVSFYL..
Iyc FLSLKKMSNIDFQYDDSVKSDN VMDRWILASMQSLVQFIHEEMGQYKLYTVVPKL LNFID ELTNWYI..
Lec YE HTAKGDVAALNVDALTENQKAL RRDVH KTIAKVTDDIGRRQTFNTAIAAI MELMNKLAKAPTD..
Mec LA SRNAGFINKRFDGVLAS ELADPQLYK TF TDAAEVIGEAWESREFGKAVREI MALADLANRYV..
Myc FV NRLIKFVNAKYNGVVPKFDPKKVSNYD GLVKDINEILSNYVKEMELGHERRGLEIAMSLSARGNQFL..
Rec LS KNRTTDYIFDWDNMLAFEG NTAPYMQYAYTRVLSVFRKAEIDEEQLAAAPVI IREDREAQLAARLL..

FIG. 3. Multiple sequence alignment of the C-terminal sequence segments of group I enzymes. Precise locations of each segment are shown in Fig. 1. The $ symbols correspond to the deleted strings described in the legend to Fig. 1.


Tec ..EAPGMVFWHN DG WTIFRELEVF VRSKLKEYQYQEVKGPFMMDRVLW EKTGHWDNYKDAMF TT SSENR EYC IKPMNC
Tyc ..MSPGSCFWLP HG TRIYNTLVDL LRTEYRKRGYEEVITPNMYNSKLW ETSGHWANYKENMF TF EVEKE TFG LKPMNC
Pec ..LASGLYTWLP TG VRVLKKVENI VREEMNNAGAIEVSMPVVQPADLW QESGRWEQYGPELL RFVDRGER PFV LGPTHE
Sec ..TGSRFVVMKG QI ARMHRALSQFMLDLHTEQHGYSENYVPYLVNQDTL YGTGQLPKFAGDLFHTRPLEEEADTSNYA LIPTAE
Syc ..CGHRGYFFRN YG VFLNQALINYGLQFLAAK GYIPLQAPVMMNKELM SKTAQPSEFDEELY KVIDGEDE KY LIATSE
Kec ..ESIGIHVEKS WGLGRIVTEIFEEVAEAHLIQPTFI TEYPAEVSPLAR RNDVNPEITDRFEF FIGGREIGNGFSELNDAED
Kyc ..VDNKLECPPP LTNARMLDKLVGELEDTC INPTFI FGHPQMMSPLAK YSRDQPGLCERFEV FVATKEICNAYTELNDPFD
Dyc ..RAAGKEIGD F EDLSTENEKFLGKLVRDKYDTDFYILDKFPLEIRPFYT MPDPANPKYSNSY DFFMRG E EI LSGAQR
Nec ..ENCGRKFENPVYWGVDLSSEHERYLAE EHFKAPV VVKNYPKDIKAFY MRLNEDGKTVAAM DVLAPGIG EI IGGSQR
Hec ..LDSKN PEVQALLNDAPAL GDYLDEESREHFAGLCKLLES AGI AY TVNQRL VRGL DYYNRTVFEW
Hyc ..LNGSL KEIHAVLSADANITSNEKAKQGLDDIATLMKYTEA FDIDSFISFDLSL ARGL DYYTGLIYEV
FSec ..IAPGRVYRN DYDQTHTPMFHQMEGLIVDTNISFTNL KGTLHDFLRNFFEEDLQIRFRPSYF PF TEPSAE
GSec ..ATDGR YGE NPNRLQHYYQFQVVIKPSPDNIQELYLGSLKELGMD PTIHDIRFVED NW ENPTLG
Aec ..RLWVTVYESD DEAYEIWEKEVGIPRERIIRINDNKGAPYASGNFWRMGGTGPCDPCTEIF YDHG DHIWGGPPGS

Tec PGHVQIFNQGLKS YRDLPLRMAEFGSCHRNEPSG SLHGLMRVRGFTQDDAHIFCTEEQIRD..
Tyc PGHCLMFKSRERS YRELPWRVADFGVIHRNEFSG ALSGLTRVRRFQQDDAHIFCTHDQIES..
Pec EVITDLIRNELSS YKQLPLNFYQIQTKFRDEVRP RF GVMRSREFLMKDAYSFHTSQESLQ..
Sec VPLTNLVRGEIID EDDLPIKKMTAHTPCFRSEAGSYGRDTRGLIRMHQFDKVEMVQIVRPEDSMA..
Syc QPISAYHSGEWFEKPQEQLPIHYVGYSSCFRREAGSHGKDAWGVFRVHAFEKIEQFVITEPEKSWE..
Kec QAQRFLDQVAAKDAGDDEAMFYDEDYVTALEHGLPPTAGLGIGIDRMVMLF TNSHTIRDVILFPA..
Kyc QRARFEEQARQKDQGDDEAQLVDETFCNALEYGLPPTGGWGCGIDRLAMFL TDSNTIRGVLLFPT..
Dyc IHDHALLQERMKAHGLSPEDPGLKDYCDGFSYGCPPHAGGGIGLERVVMFY LDLKNIRRASLFPR..
Nec EERLDVLDERMLEMGLNKED YWWYRDLRRYGTVPHSGFGLGFERLIAYV TGVQNVRDVIPFPR..
Hec VTNSLGSQGTVCAGGRYDGLVEQL GGRAT PAVGFAMGLERLVLLV QAVNPEFKADPVVD..
Hyc VTSAFVGVGSIAAGGRYDNLVNMFSEASGKKSTQI PCVGISFGVERIFSLIKQRINSSTTIKPTAT..
FSec VDVMGKNGKWLEVLGCGMVHPNVLRNV GIDPEVYSGFAFGMGMERLTMLR YGVTDLRSFFENDL..
GSec AWGLGWEV WLNGM EVTQFTYFQQVGGLECKPVTG EITYGLERLAMYI QGVDSVYDLVWSDG..
Aec PEEDGDRYIEIWNIVFMQFNRQA DGTMEPLPKPSVDTGMGLERIAAVL QHVNSNYDIDLFRT..

FIG. 4. Multiple sequence alignment of group II enzymes. Precise locations of each sequence segment are shown in Fig. 1.

was refined to cluster closely related sequences. (iv) Each set of aligned sequences was analyzed by two independent methods to yield trees, and we sought alignments that yielded, ideally, the same evolutionary relationships when analyzed by both methods. For both groups I and II, the alignments reported here satisfied, with the few exceptions noted, these criteria of consistency in evolutionary order. Since there were two sequence segments used for the group I enzymes, we imposed the additional criterion that the four trees that could be drawn for each pair of alignments (completed with the same order of sequence input) be the same.

The enzymes are divided into two groups that appear to be exclusive. Although there is clear homology within each group, all attempts to uncover significant sequence relationships between members of the two groups have proven unsuccessful. Similar conclusions have been reported by others (13). Thus, the data argue that the two groups have undergone a kind of convergent evolution with regard to function in forming the present-day set of enzymes.

It is interesting to note that each group contains a rather full complement of chemically diverse amino acids, including acidic, basic, hydrophobic, and hydrophilic representatives. This finding is consistent with each group having arisen independently in two archaic protein synthetic apparatuses, each using a more restricted set of amino acids. Such a situation could have existed in a single organism or in two distinct biological environments that later merged during the course of evolution. If the two did not coevolve, one group must have come first with the other being recruited to supplement the pool of available amino acids. The data provide no reason to believe that one group is the more ancient. Distance scores and percent identity cover a remarkably similar range within each group. Although proteins certainly may evolve at different rates, these data suggest a coordinate development.

[Fig. 5 trees. Panel A: Group I (HIGH). Panel B: Group I (KMSKS). Panel C: Group II (GLER).]

FIG. 5. Trees showing evolutionary relationships among group I (A and B) and group II (C) enzymes. Branching order and distance scores are based on the alignment shown in Fig. 2 for A, Fig. 3 for B, and Fig. 4 for C. Distance scores shown were derived from the parsimony-based PAPA program (23).


Synthetase pairs specific for the acidic amino acids and their amides, glutamic acid/glutamine and aspartic acid/asparagine (25), cluster closely, even though the two pairs fall in different groups (group I and group II, respectively). The data indicate that the derivation of amidated amino acids from their corresponding acids (or vice versa) is a relatively recent addition to protein synthesis. Similarly, the data point to the more recent radiation of the aliphatic amino acids (valine, leucine, isoleucine, and methionine), a cluster first identified by Heck and Hatfield (26). The synthetases specific for the aliphatic amino acids are more closely related in terms of percent identity to one another than members of any other cluster, save aspartic acid and asparagine.

In most cases, closely related enzymes recognize amino acids that are chemically similar to one another. Examples already noted include the two acid amide clusters and the aliphatic cluster. In addition, the tryptophan and tyrosine enzymes, although they apparently were among the first to diverge, are closely related, as are the enzymes for the hydroxylated amino acids serine and threonine. The close link between the charged and/or polar amino acids aspartic acid, asparagine, and lysine (27, 28) is an additional example, as is the relationship between the enzymes recognizing the small amino acids glycine and alanine. These data argue that radiation occurred by adapting binding sites to chemically similar amino acids. It is possible that, as particular enzymes evolved, an early form may have charged initially a tRNA or a group of tRNAs with two or more amino acids. As long as amino acids were sufficiently similar in chemical properties, this ambiguity may have been tolerable during early stages of life. As protein synthesis became more complex and the requirements for particular amino acids became more stringent, however, one can envision evolutionary pressure for the aminoacyl-tRNA synthetases to select particular tRNA-amino acid pairs. Thus, a primordial AspRS gene may have undergone duplication and mutation such that there existed both an AspRS and an AsnRS charging the same set of tRNA molecules with either amino acid. Refinements in specificity may have led to a parallel evolution in cognate tRNA and/or synthetase structure such that separate aspartic acid- and asparagine-accepting tRNA families were established. This scenario predicts that relationships similar to those seen here for the synthetase enzymes may be seen in tRNA structures (29) as well.

Not all related enzymes recognize amino acids that are chemically similar. The cluster of histidine and alanine and the inclusion of a proline-specific enzyme in the cluster of threonine and serine are cases in point, although in neither case are these amino acids vastly dissimilar. The placement of ArgRS in the aliphatic cluster, however, is surprising. By analogy to group II, one might have expected a closer relationship to the glutamic acid- and glutamine-specific enzymes. ArgRS is an outlier in this cluster, however, suggesting either that the large size of the amino acid side chain may be the common theme in this group or that the ArgRS cannot be assigned reliably to any subgroup. The data also suggest an early divergence leading to enzymes activating either charged or neutral side chains. Enzymes specific for the four aliphatic amino acids appear to have proliferated subsequent to this separation. The cluster containing the small subunits of the glycine- and phenylalanine-specific enzymes also stands out as a major example of distinctly different amino acids being charged by similar enzymes (21% identical in this region). It is known that these are the only aminoacyl-tRNA synthetases possessing an α2β2 subunit structure and that the enzymes have a number of other physical and immunological properties in common as well (30). The sequence similarities shown here underscore the close evolutionary relationship between these enzymes and also show clearly that the α2β2 enzymes did not evolve independently of the other aminoacyl-tRNA synthetases.

The sequence for Cec has appeared (31, 32). It clearly belongs to group I, bringing to 10 the number in each group. The cysteine-specific enzyme most closely resembles the methionine-specific enzyme.

We thank Da-Fei Feng for many helpful discussions. G.M.N. gratefully acknowledges a sabbatical supplement award for molecular studies of evolution from the Alfred P. Sloan Foundation. This work was supported by National Institutes of Health Grant GM34434.

1. Schimmel, P. (1987) Annu. Rev. Biochem. 56, 125-158.
2. Schimmel, P. & Soll, D. (1979) Annu. Rev. Biochem. 48, 601-648.
3. Burbaum, J. J., Starzyk, R. M. & Schimmel, P. (1990) Proteins: Struct. Funct. Genet. 7, 99-111.
4. Doolittle, R. F. (1979) in The Proteins, eds. Neurath, H. & Hill, R. L. (Academic, New York), pp. 1-118.
5. Webster, T. A., Tsai, H., Kula, M., Mackie, G. & Schimmel, P. (1984) Science 226, 1315-1317.
6. Houtondji, C., Dessen, P. & Blanquet, S. (1986) Biochimie 68, 1071-1078.
7. Houtondji, C., Lederer, F., Dessen, P. & Blanquet, S. (1986) Biochemistry 25, 16-21.
8. Jasin, M., Regan, L. & Schimmel, P. (1983) Nature (London) 306, 441-447.
9. Leatherbarrow, A. J., Fersht, A. R. & Winter, G. (1985) Proc. Natl. Acad. Sci. USA 82, 7840-7844.
10. Blow, D., Bhat, T. N., Metcalfe, A., Risler, J. L., Brunie, S. & Zelwer, C. (1983) J. Mol. Biol. 171, 571-576.
11. Schimmel, P. (1991) Trends Biochem. Sci. 16, 1-3.
12. Jacobo-Molina, A., Peterson, R. & Yang, D. C. H. (1989) J. Biol. Chem. 264, 16608-16612.
13. Eriani, G., Delarue, M., Poch, O., Gangloff, J. & Moras, D. (1990) Nature (London) 347, 203-206.
14. Cusack, S., Berthet-Colominas, C., Hartlein, M., Nassar, N. & Leberman, R. (1990) Nature (London) 347, 249-255.
15. Bhat, T. N., Blow, D. M., Brick, P. & Syborg, J. (1982) J. Mol. Biol. 158, 699-709.
16. Zelwer, C., Risler, J. L. & Brunie, S. (1982) J. Mol. Biol. 155, 63-81.
17. Rould, M. A., Perona, J. J., Soll, D. & Steitz, T. A. (1990) Science 246, 1135-1142.
18. Laberge, S., Gagnon, Y., Bordeleau, L. M. & LaPointe, J. (1989) J. Bacteriol. 171, 3926-3932.
19. Mirande, N. & Walker, J. P. (1988) J. Biol. Chem. 263, 18443-18451.
20. Ludmerer, S. W. & Schimmel, P. (1987) J. Biol. Chem. 262, 10801-10806.
21. Feng, D. F. & Doolittle, R. F. (1987) J. Mol. Evol. 25, 351-360.
22. Feng, D. F. & Doolittle, R. F. (1990) Methods Enzymol. 183, 375-387.
23. Doolittle, R. F. & Feng, D. F. (1990) Methods Enzymol. 183, 659-669.
24. Doolittle, R. F. (1987) in Of Urfs and Orfs (University Science Books, Mill Valley, CA), pp. 26-28.
25. Anselme, J. & Hartlein, M. (1989) Gene 84, 481-485.
26. Heck, J. D. & Hatfield, G. W. (1988) J. Biol. Chem. 263, 868-877.
27. Gample, A. & Tzagoloff, A. (1989) Proc. Natl. Acad. Sci. USA 86, 6023-6027.
28. Leveque, F., Plateau, P., Dessen, P. & Blanquet, S. (1990) Nucleic Acids Res. 18, 305-312.
29. Fitch, W. M. & Upper, K. (1987) Cold Spring Harbor Symp. Quant. Biol. 52, 759-767.
30. Nagel, G. M., Johnson, M. S., Rynd, J., Petrella, E. & Weber, B. H. (1988) Arch. Biochem. Biophys. 262, 409-415.
31. Hou, Y.-M., Shiba, K., Mottes, C. & Schimmel, P. (1991) Proc. Natl. Acad. Sci. USA 88, 976-980.
32. Eriani, G., Dirheimer, G. & Gangloff, J. (1991) Nucleic Acids Res. 19, 265-269.

