Ribonucleotide Reductases: Divergent Evolution of an Ancient Enzyme

Eduard Torrents,1 Patrick Aloy,2 Isidre Gibert,1 Francisco Rodríguez-Trelles3

1 Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Bacterial Molecular Genetics group, Universitat Autònoma de Barcelona, 08193-Bellaterra, Barcelona, Spain
2 EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
3 Instituto de Investigaciones Agrobiológicas de Galicia, Consejo Superior de Investigaciones Científicas, Avenida de Vigo s/n, Apartado 122, 15780-Santiago de Compostela, Spain

Received: 5 October 2001 / Accepted: 25 January 2002

Abstract. Ribonucleotide reductases (RNRs) are uniquely responsible for converting nucleotides to deoxynucleotides in all dividing cells. The three known classes of RNRs operate through a free radical mechanism but differ in the way in which the protein radical is generated. Class I enzymes depend on oxygen for radical generation, class II uses adenosylcobalamin, and the anaerobic class III requires S-adenosylmethionine and an iron–sulfur cluster. Despite their metabolic prominence, the evolutionary origin and relationships between these enzymes remain elusive. This gap in RNR knowledge can, to a major extent, be attributed to the fact that different RNR classes exhibit greatly diverged polypeptide chains, rendering homology assessments inconclusive. Evolutionary studies of RNRs conducted until now have focused on comparison of the amino acid sequence of the proteins, without considering how they fold in space. The present study is an attempt to understand the evolutionary history of RNRs taking into account their three-dimensional structure. We first infer the structural alignment by superposing the equivalent stretches of the three-dimensional structures of representatives of each family. We then use the structural alignment to guide the alignment of all publicly available RNR sequences. Our results support the hypothesis that the three RNR classes diverged from a common ancestor currently represented by the anaerobic class III. Also, lateral transfer appears to have played a significant role in the evolution of this protein family.

Key words: nrd; Ribonucleotide reductase (RNR); RNR evolution; Pyruvate formate lyase; Structural alignment; Lateral transfer; Phylogeny

Introduction

Ribonucleotide reductases (RNRs) are a family of structurally complex enzymes that play an essential role in all living organisms: they catalyze the conversion of the four common nucleotides to deoxynucleotides essential for DNA replication and repair. The three known classes of RNRs use free radical chemistry for catalysis but rely on different metallocofactors for the initiation of the radical reduction process, each exhibiting a different behavior toward oxygen (Sjöberg 1997). Class I RNRs require oxygen to produce a tyrosyl radical by a diferric iron center and can therefore function only under aerobic conditions. They consist of two homodimeric proteins, in Escherichia coli called NrdA (α2) and NrdB (β2), arranged as a heterotetramer (α2β2). The tyrosyl radical is located in the β2 polypeptide. Based on sequence identity and allosteric properties, class I RNRs are subdivided into classes Ia and Ib, encoded, respectively, by the nrdAB and nrdEF genes. Class II RNRs use adenosylcobalamin (AdoCbl) in a radical generation process not affected by oxygen and can therefore work in aerobic or anaerobic environments. Class II RNRs are mostly α2 homodimers encoded by the nrdJ genes. Class III RNRs use S-adenosylmethionine (SAM) and a small activating protein, NrdG, to generate a stable glycyl radical. Structurally they are (α2–β2), with subunits encoded by the nrdDG genes. Recent studies have established a connection between class III RNR and the pyruvate formate-lyase (PFL) system: as in the PFL system, the NrdG protein of class III RNR acts as an activase in the generation of the glycyl radical of NrdD (Tamarit et al. 1999; Torrents 2001). The NrdG activase harbors one (4Fe–4S) cluster per polypeptide chain. The glycyl radical, located at the C-terminal end of NrdD, is sensitive to oxygen; hence class III RNRs can function only under strictly anaerobic conditions.

Correspondence to: F. Rodríguez-Trelles; email: [email protected]

J Mol Evol (2002) 55:138–152
DOI: 10.1007/s00239-002-2311-7

© Springer-Verlag New York Inc. 2002

RNRs share a complex pattern of allosteric regulation (Jordan and Reichard 1998). The binding site for substrates and allosteric effectors is located on the large α-polypeptide. The substrate specificity of the catalytic site for a given ribonucleotide is determined by the binding of specific deoxyribonucleoside triphosphates (dATP, dTTP, dGTP) or ATP to an allosteric site termed the specificity site. Class Ia and III and some class II RNRs contain an extra allosteric site (referred to as the activity site) which activates or inhibits the overall activity of the enzyme, with ATP and dATP acting as enhancer and inhibitor, respectively (reviewed by Jordan and Reichard 1998; Stubbe and van der Donk 1998).

The diversity of RNR metallocofactors may seem to indicate that the three RNR classes arose independently. However, the facts that the ribonucleotide reduction pathway in which they are involved occurs in all modern organisms studied so far and that the different RNR classes exhibit very similar catalytic mechanisms (Reichard 1997; Stubbe et al. 2001) suggest that RNRs originated from a single ancestral form, prior to the divergence of archaebacteria, eubacteria, and eukaryotes. Indeed, as they are essential for the production of the building blocks for DNA synthesis, RNRs are hypothesized to be involved in the transition from the RNA to the "DNA world" (Reichard 1993, 1997; Freeland et al. 1999). If the RNRs had originated from each other by duplication in the last universal common ancestor (LUCA), it would follow that classes II and III RNRs (and also Ib) were lost in eukaryotes, because they are not present in this life domain (but see below). Yet despite the metabolic prominence of RNRs, this and other fundamental questions remain to be answered. Such questions are which modern class of RNR represents the ancestral enzyme (see Stubbe et al. 2001) and the shape of RNR tree topologies in connection with accepted organismal phylogenies. Moreover, RNRs have been largely neglected in efforts to determine the functional content of the LUCA (see Kyrpides et al. 1999). To a great extent, this gap in our understanding of the evolution of RNRs can be attributed to the fact that these enzymes have very different polypeptide chains, such that homology assessments based on conventional alignment strategies have been inconclusive.

Generally, protein structure is better conserved than primary sequence (Chothia and Lesk 1986; Murzin et al. 1995; Patthy 1999). Proteins are broadly classified as pertaining to a given structural superfamily if evidence for homology becomes apparent after structural alignment. In turn, structural information becomes invaluable for identifying key active residues (Russell 1998; Aloy et al. 2001a), binding sites, and surfaces (Russell et al. 1998), which can guide primary sequence alignment, ultimately allowing more detailed evolutionary analyses. The present study is an attempt to cast light on the evolutionary history of RNRs from the three-dimensional (3D) structure of the protein. We focus on the large α-polypeptide, which comprises the active and allosteric regulation sites, common to the three RNRs. First, we identified a consensus RNR structure by superposition of individual RNR structures. Then we used this consensus for alignment of the primary sequences.

Materials and Methods

Sequence Retrieval and Alignment

Amino acid sequences of RNRs were retrieved from the GenBank, EMBL, and PIR databases (Release 65, December 2000) with the BLAST (version 2.1) sequence similarity search tool. BLAST probing of DNA and protein databases was performed with the Blastp and tBlastn programs (Altschul et al. 1997). Accession numbers for the sequences are listed in Table 1.

For alignment of the amino acid primary sequences we adopted the following strategy: first, we aligned the sequences using the ClustalX version 1.81 (Thompson et al. 1997) program with the default gap opening and extension penalties. Then we adjusted the alignments by eye using GeneDoc version 2.6.001 (Nicholas and Nicholas 1997). Visual fine-tuning of the alignment was conducted taking into account (i) structure-based alignments (Logan et al. 1999; Uhlin and Eklund 1994), (ii) identified conserved putative allosteric binding regions (Eriksson et al. 1997), and (iii) the alignment of the RNR family hosted in the Pfam database (Bateman et al. 2000). Sequences belonging to each RNR class were first aligned separately. Then the resulting three multiple alignments were aligned with each other. After removal of all gaps and ambiguities, the length of the alignment was 236 residues.
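The final filtering step described above (dropping every alignment column that contains a gap or an ambiguous residue) can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure, which relied on manual editing in GeneDoc; the function name `strip_gapped_columns`, the set of symbols treated as gaps or ambiguities, and the toy sequences are all assumptions.

```python
def strip_gapped_columns(alignment, bad_symbols="-.Xx?"):
    """Return a copy of the alignment without columns holding a gap or ambiguity.

    `alignment` maps a sequence name to its aligned sequence (equal lengths).
    `bad_symbols` is an assumed set of gap/ambiguity characters.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    names = list(alignment)
    # Transpose rows (sequences) into columns (alignment sites).
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(ch in bad_symbols for ch in col)]
    # Transpose back: residue i of each kept column belongs to sequence i.
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Hypothetical toy alignment, not real RNR sequences.
toy = {"classI": "MK-LVA", "classII": "MKTLXA", "classIII": "MRSLVA"}
clean = strip_gapped_columns(toy)
```

Only sites present and unambiguous in every sequence survive, which is why the usable alignment (236 residues in the study) is much shorter than any single RNR polypeptide.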

The crystallographic structures of PFL (2pfl), the NrdA subunit of RNR class Ia (1r1r), and the NrdD subunit of anaerobic RNR class III (1b8b) were obtained from the Brookhaven Protein Data Bank (Berman et al. 2000; http://www.rcsb.org/pdb/) and converted to protein sequences using the STAMP package (Russell and Barton 1992; http://barton.ebi.ac.uk/manuals/stamp.html). The domain definition and evolutionary families adopted for the analysis of the protein sequences were those of the Structural Classification of Proteins (SCOP release 1.53) database (Murzin et al. 1995; http://scop.mrc-lmb.cam.ac.uk/) and the Protein Families Database of Alignments and HMMs (Pfam release 6) database (Bateman et al. 2000; http://www.sanger.ac.uk/Software/Pfam/).


Table 1. Occurrence of ribonucleotide reductases in species in this study (a)

No. | Species | Metabolism | Sequencing status | RNR class | Accession no.

Archaebacteria
Euryarchaeota
1 | Aeropyrum pernix | AE | G | II | AP000063
2 | Archaeoglobus fulgidus | AN | G | II | AE000782
3 | Halobacterium sp. | AE | G | Ia, II | AE005120, AE005073
4 | Methanobacterium thermoautotrophicum | AN | G | II, III | AE000666
5 | Methanococcus jannaschii | AN | G | III | U67527
6 | Pyrococcus abyssi | AN | G | II, III | NC_000868
7 | Pyrococcus furiosus | AN | g | II, III | U78098, TIGR UG
8 | Pyrococcus horikoshii | AN | G | II, III | NC_000961
9 | Thermoplasma acidophilum | AE | G | II | AL445067

Eubacteria
Aquificales
10 | Aquifex aeolicus | AN | G | Ia | AE000657
Cyanobacteria
11 | Synechocystis sp. | AE | G | Ia | P74240
Cytophaga/Flexibacter/Bacteroides group
12 | Porphyromonas gingivalis | | g | II, III | TIGR UG
Green nonsulfur bacteria
13 | Chloroflexus | AN | | II | Personal communication
Green sulfur bacteria
14 | Chlorobium tepidum | AE | G | II | TIGR UG
High-G+C Gram-positive
15 | Corynebacterium ammoniagenes | AE | S | Ib | Y09572
16 | Corynebacterium diphtheriae | AE | g | Ib (2E) | TIGR UG
17 | Corynebacterium nephridii | AE | S | | Personal communication
18 | Mycobacterium avium | AE | g | Ib, II | TIGR UG
19 | Mycobacterium bovis | AE | g | Ib, II | TIGR UG
20 | Mycobacterium leprae | AE | G | Ib, II | AL583923, AL583924
21 | Mycobacterium tuberculosis | AE | G | Ib (2F), II | P50640, A70933
22 | Streptomyces clavuligerus | AE | S | II | AJ224870
23 | Streptomyces coelicolor | AE | g | II | T35125
Low-G+C Gram-positive
24 | Bacillus anthracis | | g | Ib, II | TIGR UG
25 | Bacillus halodurans | AE | G | Ia, II | BAB04220.1, BAB06529.1
26 | Bacillus megaterium | AE | | II | Personal communication
27 | Bacillus stearothermophilus | | g | Ia, II | TIGR UG
28 | Bacillus subtilis | AE | G | Ib (2×) | P50620
29 | Clostridium acetobutylicum | AN | g | Ia, II, III | NC 003030
30 | Clostridium difficile | AN | g | Ia, II, III | TIGR UG
31 | Enterococcus faecalis | FA | g | Ib, III | TIGR UG
32 | Lactobacillus leichmannii | FA | S | II | L20047
33 | Lactococcus lactis | FA | S | Ib, III | AE006332, U73336
34 | Mycoplasma gallisepticum | FAN | g | Ib | AF152114
35 | Mycoplasma genitalium | FAN | G | Ib | P47473
36 | Mycoplasma pneumoniae | FAN | G | Ib | U00089
37 | Staphylococcus aureus | FA | G | Ib, III | AP003138, AP003131
38 | Streptococcus mutans | FA | G | Ib, III | TIGR UG
39 | Streptococcus pneumoniae | FA | G | Ib, III | TIGR UG
40 | Streptococcus pyogenes | FA | g | Ib (2×), III | AE004092
Planctomyces/Chlamydia group
41 | Chlamydia pneumoniae | AE | G | Ia | NC_000922
42 | Chlamydia trachomatis | AE | G | Ia | NC_000117
Proteobacteria (α subdivision)
43 | Caulobacter crescentus | | g | Ia | AE005862
44 | Mesorhizobium loti | FA | G | II | NC_002678
45 | Rhodobacter capsulatus | FA | g | II, III | R. capsulatus genome project
46 | Rhodobacter sphaeroides | FA | g | II | TIGR UG
Proteobacteria (β subdivision)
47 | Neisseria gonorrhoeae | AE | g | Ia | B81101
48 | Neisseria meningitidis | AE | G | Ia | AL162756
49 | Ralstonia eutropha | AE | S | III | AJ012479
Proteobacteria (δ and ε subdivisions)
50 | Campylobacter jejuni | AE | G | Ia | AL139074
51 | Geobacter sulfurreducens | | g | II | TIGR UG
52 | Helicobacter pylori | AE | G | Ia | P55982
53 | Rickettsia prowazekii | AE | G | Ia | C71655
Proteobacteria (γ subdivision)
54 | Actinobacillus actinomycetemcom | FA | g | Ia, III | TIGR UG
55 | Buchnera sp. | | G | Ia | AP00118
56 | Escherichia coli | FA | G | Ia, Ib, III | X06999, P39452, P28903
57 | Haemophilus influenzae | FA | G | Ia, III | P43754, A64047
58 | Pasteurella multocida | FA | G | Ia, III | Nc_002663
59 | Pseudomonas aeruginosa | FA | G | Ia, II, III | AE004545, AE004962, AE004618
60 | Pseudomonas stutzeri | FA | | Ia, II, III | Personal communication
61 | Salmonella typhimurium | FA | g | Ia, II, III | X7948, X73226, AF242390
62 | Shewanella putrefaciens | FA | g | III | TIGR UG
63 | Thiobacillus ferrooxidans | AE | g | Ia, III | TIGR UG
64 | Vibrio cholerae | FA | G | Ia, III | AE82223, D82452
65 | Xylella fastidiosa | | G | Ia | C82710
66 | Yersinia pestis | FA | G | Ia, Ib, III | TIGR UG
Spirochaetes
67 | Borrelia burgdorferi | AE | G | Ia | AE00783
68 | Treponema pallidum | AE | G | Ia | AE000520
Thermotogales
69 | Thermotoga maritima | AN | G | II | Y12877
Thermus/Deinococcus group
70 | Deinococcus radiodurans | AE | G | Ib, II | AE001826, D75281

Eubacteria viruses
71 | Mycobacteriophage L5 | | G | II | S30995
72 | Phage T4 | | G | Ia, III | AF158101
73 | Roseophage S101 | | G | II | NC002519

Eukaryotes
Apicomplexa
74 | Plasmodium falciparum | AE | G | Ia | AF205580
75 | Cryptosporidium parvum | AE | g | Ia | AF043243
Euglenozoa
76 | Trypanosoma brucei | AE | g | Ia | 015909
Animals
77 | Caenorhabditis elegans | AE | G | Ia | Q03604
78 | Danio rerio | AE | g | Ia | U57964
79 | Homo sapiens | AE | G | Ia | P23921
80 | Mus musculus | AE | g | Ia | P07742
Fungi
81 | Schizosaccharomyces pombe | AE | g | Ia | P36602
82 | Neurospora crassa | AE | g | Ia | AF171697
83 | Saccharomyces cerevisiae | AE | G | Ia (2×) | P21524, P21672
84 | Candida albicans | AE | g | Ia | AJ390500
Plants
85 | Arabidopsis thaliana | AE | G | Ia | T51813
86 | Nicotiana tabacum | AE | g | Ia | Y10862

Eukaryote viruses
87 | African swine fever virus | AE | S | Ia | NP_042739
88 | Bovine herpes virus | AE | S | Ia | P50646
89 | Epstein Barr virus A | AE | S | Ia | P03190
90 | Herpes virus | AE | S | Ia | P08543
91 | Orgyia pseudotsugata virus | AE | S | Ia | U75930
92 | Pseudorabies virus | AE | S | Ia | P50643
93 | Spodoptera exigua nucleopolyhedrovirus | AE | S | Ia | NC_002169.1
94 | Vaccinia virus | AE | S | Ia | P20503
95 | Varicella virus | AE | S | Ia | P32984
96 | Varicella zoster | AE | S | Ia | P09248

(a) AE, aerobe; AN, anaerobe; FA, facultative aerobe; FAN, facultative anaerobe; G/g, gene sequence available (complete/partial); S, gene sequence known; TIGR UG, TIGR unfinished genomes.


Structural Alignments and P3d-Value Calculation

Alignment of the 3D structures was performed with the STAMP package for protein structure alignment and superimposition. All the alignments were double-checked and, where necessary, manually edited to prevent erroneous results. Superposition and calculation of the structurally equivalent regions across the three proteins were also performed by means of the STAMP package.

Following structural alignment, one has to measure the probability that the observed sequence identity occurred by chance. For this purpose Murzin (1993) derived a P3d value based on the tendency of buried residues to be hydrophobic and exposed residues to be hydrophilic. It was originally applied to the cystatin–monellin similarity, where an evolutionary relationship was inferred from a P3d value of ∼10^-3. More recently, Aloy et al. (2001b) applied it to the search for links between sequence and structure spaces.

Phylogenetic Analyses

We adopted a model-based maximum likelihood approach of phylogenetic inference (e.g., Yang et al. 1995; Huelsenbeck and Crandall 1997; Rodríguez-Trelles et al. 1999, 2000). We first built a protein distance matrix using a simple Poisson model and used this matrix to recover a tree with the neighbor-joining (NJ) algorithm. The tree topology obtained in this manner was then used as a working hypothesis for model fitting using the likelihood ratio test. Amino acid substitution models used in this study are all special cases of the model of Yang et al. (1998); this model is based on the matrix of Jones et al. (1992) with amino acid frequencies set as free parameters (referred to as JTT-F). Variation of substitution rates across sites is accommodated in the substitution models using the discrete-gamma approximation of Yang (1996a) with shape parameter α (setting eight equally probable categories of rates to approximate the continuous gamma distribution, referred to as dG models). The value of α is inversely related to the extent of rate variation among sites (Yang 1996a). The transition probability matrices of the models and details on parameter estimation are given by Yang (2000).
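As an illustration of the first step, the Poisson-corrected distance between two aligned protein sequences is commonly computed as d = −ln(1 − p), where p is the proportion of compared sites at which the residues differ. The Python sketch below assumes that standard formula; the function name `poisson_distance`, the treatment of gaps, and the example sequences are hypothetical and are not taken from the study.

```python
import math


def poisson_distance(seq_a, seq_b, gap_symbols="-."):
    """Poisson-corrected amino acid distance d = -ln(1 - p) for two aligned sequences.

    Sites where either sequence has a gap are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_symbols or b in gap_symbols:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p >= 1.0:
        return math.inf  # distance is undefined (saturated) when every site differs
    return -math.log(1.0 - p)


# Hypothetical ten-residue fragments differing at one site (p = 0.1).
d = poisson_distance("ACDEFGHIKL", "ACDEFGHIKV")
```

A matrix of such pairwise values is the input that a neighbor-joining implementation would take to produce the starting topology.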

Likelihood ratio tests are applied for contrasting several hypotheses of interest. For a given tree topology (i.e., Fig. 3), a model (H1) containing p free parameters and with likelihood L1 fits the data significantly better than a nested submodel (H0) with q = p − n free parameters (i.e., n restrictions) and likelihood L0 if the deviance D = −2 log Λ = −2(log L0 − log L1), where Λ = L0/L1, falls in the rejection region of a χ² distribution with n degrees of freedom (Yang 1996b). We use several starting values in the iterations to guard against the possible existence of multiple local optima. These analyses are conducted with the program DAMBE (Xia 2000) and the CODEML program from the PAML version 3.0b package (Yang 2000).
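The test can be written out as a short calculation. The sketch below is a generic likelihood ratio test in Python, not the DAMBE or CODEML implementation: it takes the two maximized log-likelihoods and the number of restrictions n, forms the deviance D = 2(log L1 − log L0), and evaluates the upper tail of the χ² distribution with a standard recurrence for the regularized incomplete gamma function. The function names and the numerical log-likelihoods are assumptions chosen for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)            # df = 2: Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # df = 1: Q(1/2, y) = erfc(sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    for _ in range((df - 1) // 2):
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(log_l0, log_l1, n_restrictions):
    """Return (deviance, p-value) for nested models H0 (restricted) and H1 (general)."""
    deviance = 2.0 * (log_l1 - log_l0)
    return deviance, chi2_sf(deviance, n_restrictions)


# Hypothetical log-likelihoods: H1 improves the fit by 5 log units with 2 extra parameters.
deviance, p_value = likelihood_ratio_test(-1000.0, -995.0, 2)
```

H0 would be rejected at the 5% level whenever the returned p-value falls below 0.05.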

The model found to describe satisfactorily the amino acid replacement process in the RNR gene region is used as a hypothesis for phylogenetic reconstruction by distance methods. The estimate of α that we use in distance computation is that obtained simultaneously by the joint likelihood comparison of all sequences in the first stage, which can be considered the most reliable (Yang 1996a). Statistical support for nodes of the NJ trees is assessed using 50% majority-rule consensus trees compiled from 1000 bootstrap replications (Felsenstein 1985).
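The resampling behind such bootstrap support values can be sketched as follows: each replicate is built by drawing alignment columns with replacement until the original alignment length is reached, and a tree is then inferred from every replicate. The Python below covers only the resampling step; the function names, the fixed seed, and the toy alignment are assumptions, and tree building and consensus construction are left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are applied to every sequence so sites stay aligned.
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate pseudo-replicate alignments from one seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment; a real analysis would use the full RNR alignment.
toy = {"seqA": "ACDE", "seqB": "ACDF", "seqC": "GCDF"}
replicates = bootstrap_replicates(toy, n_replicates=10, seed=1)
```

A node's support is then the fraction of replicate trees in which the corresponding grouping appears, and groupings found in at least half of them enter the 50% majority-rule consensus.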

Results

Distribution of the nrd Classes Across the Tree of Life

Table 1 compiles available information on the distribution of the RNR classes across the life domains archaebacteria, eubacteria, and eukaryotes, as defined by Woese et al. (1990). Data on the presence/absence of RNR classes are continuously growing with genome sequencing projects. The three main conclusions are as follows. (i) All three RNR classes are represented in archaebacteria and eubacteria. All archaebacteria have RNR classes II and III, except Archaeoglobus and Thermoplasma, which have only class II, and Methanococcus, which has only class III. In contrast, class I RNR is represented in only a single archaebacterial species, Halobacterium sp., which also has class II (see Table 1). Eubacteria exhibit the greatest diversity of RNR classes within one single genome. In this life domain the three RNR classes occur simultaneously in the genomes of Pseudomonas aeruginosa, Clostridium acetobutylicum, and Clostridium difficile. In the remaining species it is possible to observe every conceivable pairwise combination of RNR classes (I/II, I/III, and II/III). In Gram-positive eubacteria, the majority of low-G+C content species have a class Ib–III combination; high-G+C content species, however, have almost exclusively class II, sometimes in conjunction with class Ib. In Gram-negative eubacteria the Proteobacteria group exhibits the most diverse spectrum of RNR class combinations: for example, E. coli has the two types of class I RNR (i.e., Ia and Ib) and class III RNR (see Table 1). (ii) Known nrd sequences from eukaryotes all belong to the oxygen-dependent class Ia. Class II (i.e., dependent on AdoCbl) activity has, however, been detected in crude extracts of the alga Euglena gracilis and the primitive fungus Pithomyces chartarum (Gleason 1970; Stutzenberger 1974). The corresponding sequences for the nrd genes of these organisms have not yet been obtained, but the reported activities represent the first evidence suggesting that eukaryotes can have RNR classes other than Ia. (iii) Viruses have RNR classes similar to those of their hosts. However, viral sequences show distinctive features in regions of the polypeptide chain involved in the activity site of the enzyme (Berglund 1972; Nikas et al. 1986).

The RNR classes present in a given genome reflect the environment inhabited by the organism with regard to the presence/absence of oxygen: aerobes have RNR class I (which requires oxygen for functioning; e.g., the eubacterium Treponema pallidum and all eukaryotes, which have RNR class Ia), class II (which is oxygen independent, e.g., the archaebacterium Aeropyrum pernix), or both (e.g., the eubacterium Mycobacterium tuberculosis and the archaebacterium Halobacterium sp.). On the other hand, anaerobes have RNR class III (which is inactivated by oxygen, e.g., the archaebacterium Methanococcus jannaschii), class II (e.g., the eubacterium Thermotoga maritima), or both (e.g., the eubacterium Porphyromonas gingivalis and the archaebacterium Methanobacterium thermoautotrophicum). However, the specific RNR class combination present in a given aerobe (i.e., RNR I, RNR II, or both) or anaerobe (i.e., RNR II, RNR III, or both) does not follow a discernible pattern. This absence of a pattern might be best illustrated by the genera Corynebacterium and Bacillus. Both genera comprise a group of closely related species, which, however, differ in their RNR class(es). Additional examples are the γ-proteobacterium P. aeruginosa, seemingly an aerobe that has RNR III (in addition to classes I and II), and the Gram-positive low-G+C C. acetobutylicum, which has RNR I (in addition to classes II and III) despite being a strict anaerobe.

Primary Sequence Alignment and Comparison of Three-Dimensional Structures Indicate that RNR Classes are Distantly Homologous

In principle, homology of structure is better conserved than homology of sequence (Chothia and Lesk 1986; Patthy 1999). Therefore remote homology of the RNR classes might be better reflected by comparison of their 3D structures. The 3D structures have been determined for two of the three RNRs and are available from the Protein Data Bank: the E. coli NrdA subunit of RNR class Ia [denoted 1r1r (Uhlin and Eklund 1994)] and the T4-phage NrdD subunit of RNR class III [denoted 1b8b (Logan et al. 1999)]. In addition, the crystal structure of the E. coli pyruvate formate-lyase (Becker et al. 1999) has been determined. This glycyl radical enzyme exhibits the same glycyl radical chemistry as the class III RNR, as well as most of its structural features, and is therefore considered here. A class II structure is not publicly available, but Stubbe et al. (2001) have reported that it exhibits the same 3D configuration.

Figure 1 represents the structural superposition of the RNR classes I (i.e., 1r1r) and III (1b8b) and PFL (2pfl). The three model structures have the same fold: a scaffold consisting of a 10-stranded α/β-barrel which accommodates a hairpin loop inside. The degree of structural equivalence (stretches in helix'n'strand representation) is highest in the hydrophobic core of the proteins (corresponding to the active center of the enzymes). Most of the differences are confined to the external loops. The details of the structural superposition are listed in Table 2. Similarity is greater between 1b8b and 2pfl [3.1-Å root mean square deviation (RMSD) in 451 equivalent residues] than between 1r1r (RNR I) and either 1b8b (3.6 Å/331) or 2pfl (3.7 Å/345). According to Murzin's (1993) criterion the degree of structural similarity between 1b8b and 2pfl is significant (P3d value, <10^-3). Apparently, these two proteins shared a common ancestor. When either of these two structures is aligned with 1r1r, the similarity is not significant (P3d value, >10^-3). However, the functionally important residues align nicely, always occupying equivalent regions. This kind of pattern has proven to be sufficient to assign two proteins to the same superfamily in cases like ours, where other functional features are also consistent (i.e., the RMSD value; see Table 2).
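For readers unfamiliar with the RMSD values quoted above, the quantity is the square root of the mean squared distance between paired atoms (typically Cα atoms) of the structurally equivalent residues once the structures have been superposed. The Python sketch below assumes the coordinates are already superposed (the superposition itself, done in the study with STAMP, is not reproduced); the function name and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Assumes both coordinate sets are already superposed and listed in equivalent order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms for two equivalent atom pairs.
value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)])
```

Because the value is averaged over the equivalent residues only, an RMSD should always be read together with the number of residues it covers, as in the 3.1 Å/451 notation used in the text.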

As in the case of many other remote homologues (see Holm and Sander 1997), a comparison of the aligned primary amino acid sequences of the three RNR classes reflects a low pairwise overall similarity. Average within-class similarity values are 38, 35, and 36% for RNR classes I, II, and III, respectively. Interclass average global similarity is highest between class I and class II (25%) and substantially lower (≈10%) between RNR class I and RNR class III or between class II and class III. Figures 2A–C show that conservation of primary sequence concentrates along short stretches encompassing the allosteric binding specificity (Fig. 2A) and activity (Fig. 2B) sites and the active site (Fig. 2C), which are critical to maintain the fold and function of the enzymes.
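One way to compute a pairwise similarity percentage of this kind is sketched below in Python. It uses the amino acid similarity groups given in the Fig. 2 legend (hydrophobic L, I, A, V, M, F, and W; R and K; D and E) and counts a site as similar when the two residues are identical or fall in the same group. How the authors actually computed their percentages is not stated, so the scoring rule, the function names, and the example fragments are assumptions.

```python
# Similarity groups as listed in the Fig. 2 legend.
SIMILARITY_GROUPS = [set("LIAVMFW"), set("RK"), set("DE")]


def residues_similar(a, b):
    """True if two residues are identical or belong to the same similarity group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def percent_similarity(seq_a, seq_b, gap="-"):
    """Percentage of ungapped aligned sites at which the residues are similar."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    similar = sum(1 for a, b in pairs if residues_similar(a, b))
    return 100.0 * similar / len(pairs)


# Hypothetical aligned fragments: three similar sites out of four compared.
score = percent_similarity("LKDG-", "IRESA")
```

Averaging such scores over all sequence pairs within a class or between two classes would give within-class and interclass figures comparable in kind to those reported above.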

In all nonviral ribonucleotide reductases investigated so far, specificity toward the substrate is achieved basically in the same fashion (Reichard 1997). Therefore, it is not surprising that the binding residues for the allosteric effector in the specificity site (Fig. 2A) are almost identical for RNR classes I and II. In class III this region also exhibits an increased proportion of conserved residues (especially hydrophobic amino acids; Fig. 2A); although they do not match those given above, which were inferred from the X-ray structure of class Ia (Eriksson 1997), the fact that they are conserved indicates that they likely control substrate specificity in NrdD as well.

Fig. 1. Molscript diagram (Kraulis 1991) showing the structural superposition of the NrdA (1r1r), NrdD (1b8b), and PFL (2pfl) subunits. The structurally equivalent regions are shown in helix'n'strand representation. The nonequivalent segments are shown in Cα trace. The catalytic cysteines are represented as spacefill residues.

Table 2. RMSDs and protein length (a)

     | 1b8b          | 2pfl          | 1r1r
1b8b | 589 res       |               |
2pfl | 3.1 Å/451 res | 759 res       |
1r1r | 3.6 Å/331 res | 3.7 Å/345 res | 748 res

(a) The number of residues for each protein is given on the diagonal. Below the diagonal are given the RMSDs and the length of the structurally equivalent regions.

Fig. 2. Alignment of eight representatives of RNR classes I, II, and III for the regions involved in binding the effectors at the (A) specificity site, (B) activity site, and (C) active site. Amino acid similarity groups are as follows: hydrophobic residues L, I, A, V, M, F, and W; R and K; and D and E. Black and gray shading denotes sites showing 87 and 75% of the residues similar, respectively. Downward arrowheads indicate residues involved in (or close to) effector and substrate binding (allosteric and active site) in the 3D structure of class Ia of E. coli. Upward arrowheads in C indicate residues corresponding to the active site in the 3D structure of class III of phage T4. The asterisk indicates the cysteine responsible for initiating the reduction in all three RNR classes. ESCOL, Escherichia coli; CAEELE, Caenorhabditis elegans; LACLAC, Lactococcus lactis; PYRFUR, Pyrococcus furiosus; MYCTUB, Mycobacterium tuberculosis; METJAN, Methanococcus jannaschii; EB, eubacteria; EK, eukaryotes; AR, archaebacteria; 1A, class Ia RNR; 1B, class Ib RNR; 2, class II RNR; 3, class III RNR.

Figure 2B shows the alignment of the N-terminal region for the three RNR classes. There is a conserved motif (V–x–KRDG–x(9)–KI–x(3)–I) that comprises the residues involved in binding the nucleotides at the overall activity site. From the alignment and data on the enzymatic properties of the RNRs, two situations can be distinguished: (i) RNR classes Ia and III and some class II proteins, which exhibit the conserved motif of the activity site; and (ii) the remaining class II proteins and RNR class Ib, which lack the first 50 residues of the N-terminal end of the protein and thus do not have the activity site (Eliasson 1999).
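A motif written in this notation can be searched for in a sequence with a regular expression. The sketch below is only an illustration: it assumes that "x" stands for any residue, that "x(n)" stands for n arbitrary residues, and that runs of capital letters such as KRDG are literal consecutive residues. The helper name `motif_to_regex` and the test sequence are hypothetical, and plain hyphens replace the typeset dashes.

```python
import re

def motif_to_regex(motif):
    """Translate a dash-separated motif such as 'V-x-KRDG-x(9)-KI-x(3)-I' into a regex."""
    parts = []
    for element in motif.split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if gap:
            # 'x' is one arbitrary residue; 'x(n)' is n arbitrary residues.
            count = gap.group(1)
            parts.append("." if count is None else ".{%s}" % count)
        else:
            # Anything else is taken as literal, consecutive residues.
            parts.append(re.escape(element))
    return re.compile("".join(parts))

activity_site = motif_to_regex("V-x-KRDG-x(9)-KI-x(3)-I")

# Hypothetical sequence fragment, constructed only to contain the motif.
fragment = "MNLLVIKRDGSSEPFDPEKKIHRAIKAA"
match = activity_site.search(fragment)
print(activity_site.pattern, match.start() if match else None)
```

A strict pattern of this kind finds only exact matches; sequences that carry conservative substitutions within the motif would need a more permissive pattern or a profile-based search.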

In the active site (Fig. 2C) only one residue, the transient radical cysteine (Cys439 in E. coli class Ia, Cys408 in L. leichmannii class II, and Cys290 in T4-phage class III), is conserved throughout the three RNR classes. This cysteine is involved in substrate activation by hydrogen abstraction from the C3′ of the substrate. Two additional cysteine residues are also highly conserved, at least across classes I and II: Cys225 in E. coli class I, which corresponds to Cys119 in L. leichmannii class II and Cys79 in T4-phage class III; and Cys462 in E. coli class I, which corresponds to Cys419 in L. leichmannii class II but becomes Asn311 in T4-phage class III. Although these two residues are not identical across the three classes, they occupy the same position in the 3D structures, and both are required for the reduction of the ribose moiety. The area surrounding the transient radical cysteine exhibits the highest degree of similarity in the alignment (Fig. 2C).

The rest of the alignment (data not shown) reflects the peculiarities of each RNR class. Classes I and II are more similar to each other than to class III in the C-terminal moiety. In this region the first two classes contain two cysteines that putatively function as a hydrogen acceptor for the reductive system (Stubbe and van der Donk 1998). In addition, class II exhibits an extra domain in the C-terminal part of the protein with a conserved motif [D–x–H–x–G–x(14)–V–x–x–G–x(35)–S–x–V–x(36)–G (Tollinger 1998)], which binds the adenosylcobalamin required to generate the radical. In class III, the domain containing the glycyl radical is located near the active center in the crystal structure. Specifically, the glycyl radical occupies the same position as the tyrosine side chains (Tyr730 and Tyr731) of E. coli class I, which are involved in the radical transfer pathway.

Phylogenetic Inferences

Table 3 lists the values of the log-likelihood ratio statistic for the models. Log-likelihood values were derived assuming the topology obtained with the NJ algorithm based on the Poisson distances between pairs of amino acid sequences, separately for each RNR class (I, Ia, Ib, II, and III) and for all three RNR classes pooled. dG models (i.e., Poisson + dG and JTT-F + dG) always yield higher likelihood scores than their uniform counterparts (i.e., Poisson and JTT-F; not shown), meaning that variation among sites in the rate of amino acid substitution is a significant feature of the nrd data (see Table 3). Also, using an empirical matrix of substitution rates with the amino acid frequencies set as free parameters (the JTT-F + dG model of Yang et al. 1998) yields greater likelihood scores than assuming an equal rate of change between any two amino acids (the Poisson + dG model; see Table 3). Table 3 also lists the estimates of the shape parameter α of the discrete-gamma distribution obtained from each model. The Poisson + dG model produces systematically higher values of α than the JTT-F + dG model (i.e., 1.99 vs 1.54, 1.91 vs 1.50, 1.26 vs 1.01, 1.75 vs 1.13, 2.31 vs 1.73, and 2.78 vs 2.22 for nrd I, Ia, Ib, II, III, and all classes combined, respectively; see Table 3), thus underestimating the extent of the substitution rate variation from site to site. As inferred from the JTT-F + dG model, nrd classes Ib and III show the lowest (1.01) and the highest (1.73) values of α, respectively. When the sequences of all three RNR classes are pooled, the value of α increases (2.22; Table 3), because of a reduction in the proportion of invariable sites in the alignment. It should be noted, however, that the α values for the RNR classes listed in Table 3 were obtained from different data sets, involving different species and numbers of sequences, so they cannot be compared directly.

Table 3. Results of the likelihood ratio test carried out on the RNR amino acid sequences used in this study^a

                                  RNR class
H0:H1                        df   I          Ia         Ib         II         III        All combined
                                  (56, 406)  (44, 419)  (12, 676)  (22, 412)  (21, 473)  (99, 237)

Poisson:Poisson + dG         1    1339.9     1077.0     474.1      673.1      318.9      1127.9
  α                               1.99       1.91       1.26       1.75       2.31       2.78

Poisson + dG:JTT-F + dG      19   5148.9     4096.1     2210.2     1909.5     1912.7     4539.7
  α                               1.54       1.50       1.01       1.13       1.73       2.22

^a In each row, the null hypothesis (H0) is compared with an alternative hypothesis (H1). Log-likelihood values were obtained assuming the NJ topology based on the Poisson distance using the complete RNR sequence data set. All values of the likelihood ratio test statistic (−2 log Λ) are significant (p < 10⁻⁶). In parentheses are given the number of sequences and the length of the alignment for each RNR class. Poisson + dG and JTT-F + dG are the Poisson and Yang et al. (1998) models (see Materials and Methods), assuming discrete-gamma (with shape parameter α)-distributed rates at sites.

Figure 3 shows the unrooted NJ tree based on the Poisson + dG pairwise amino acid distances, using the ungapped alignment of all the RNR sequences in this study. The value of α used for distance calculation is that obtained by joint likelihood comparison of all sequences using the JTT-F + dG model (i.e., α = 2.22; see Table 3). The following general patterns are observed. (i) The three RNR classes form three separate clusters, highly supported statistically [bootstrap values are 73 for RNR class I and 100 for RNR class III, i.e., above the 70% criterion of Hillis and Bull (1993)]. Under the premise that they share a common ancestor, this observation suggests that the three RNR classes evolved independently from each other before the diversification of the tree of life. Within RNR class I, class Ib forms a cluster clearly separated from class Ia (class Ib splits from within class Ia, but this branching is not significant). Class Ia groups eukaryotes apart from eubacteria and divides the eubacteria into two separate clusters. Overall, species relationships are less well defined for class Ia than for class Ib, suggesting that class Ia has diverged more rapidly. (ii) RNR classes I and II are more closely related to each other than to class III. (iii) Sequences from eukaryotic viruses form a cluster which splits before the diversification of the RNR class Ia of their eukaryote hosts. In addition, RNR class II of Lactobacillus leichmannii forms a strongly supported clade with the bacteriophages mycobacteriophage L5 and roseophage SIO1 (bootstrap value 94; Fig. 3). Apparently, this clade split before the diversification of the remaining RNR class II sequences. (iv) Overall, the phylogenetic relationships among the sequences reconstructed from each nrd class do not match the accepted phylogenetic relationships among the species. Thus, for example, RNR class Ia places eukaryotes (and eukaryote viruses) together but cannot resolve the base of the eukaryote tree, and it intermixes different bacterial lineages. Particularly noticeable is the positioning of the archaebacterium Halobacterium, which forms a strongly supported cluster with the eubacterial lineages of P. aeruginosa and Chlamydia (bootstrap value 98; Fig. 3). In addition, the phylogenetic trees yielded by the different RNR classes are not congruent with each other. Topological inconsistency between trees is not caused by differences in the set of outgroups used for the rooting of each RNR class (i.e., RNR classes I, II, and III are rooted using RNR classes II and III, I and III, and I and II, respectively), because the NJ trees obtained for each RNR class separately are basically identical to their counterparts in the global tree shown in Fig. 3 (results not shown). Apparently, the three RNRs evolve in a fashion beyond the constraints imposed by the history of the cellular lineages that contain their coding genes.
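To make the distance correction concrete, the sketch below contrasts the Poisson distance with a gamma-corrected Poisson distance for a given proportion p of differing sites. It assumes the standard closed-form expressions d = −ln(1 − p) and d = α[(1 − p)^(−1/α) − 1]; the phylogenetic software used for the analyses may implement the correction differently, and the function names and the two short aligned sequences are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing residues over the ungapped aligned columns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(p):
    """Poisson-corrected distance, assuming equal rates at all sites."""
    return -math.log(1.0 - p)

def gamma_distance(p, alpha):
    """Poisson distance corrected for gamma-distributed rates (shape alpha)."""
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

# Hypothetical aligned fragments; the gapped column is skipped.
p = p_distance("MKVLA-GT", "MRVLAQGS")
print(p, poisson_distance(p), gamma_distance(p, 2.22))
```

The smaller the value of α, the stronger the rate heterogeneity and the larger the corrected distance for the same p; as α grows large, the gamma-corrected distance approaches the plain Poisson distance.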

Discussion

RNRs are highly plastic proteins at the level of their primary sequence (similarity between classes ranges from 10 to 25%) yet, as we show here, their structures are strikingly conserved (see Fig. 1). In addition, all maintain the same protein radical chemistry. Our study suggests that all modern RNRs (classes I, II, and III) share a common ancestor that lived before the diversification of the tree of life. The different RNRs arose by duplication, followed by divergence with acquisition of new adaptations (i.e., regarding their function and how they were affected by oxygen). Taking into account what is currently known about the function and distribution of RNRs across the tree of life, together with the results of the present study, we next try to trace back some prominent episodes in the evolutionary history of this highly heterogeneous protein family.

Fig. 3. Neighbor-joining (NJ) tree based on the Poisson + dG (α = 2.25; see Table 3) distance for the RNR amino acid data. Gray squares represent archaea; gray circles, eubacteria; gray diamonds, eukaryotes; inverted black triangles, eubacteria viruses; upright black triangles, eukaryota viruses. Prt-β, proteobacteria beta subdivision; Prt-γ, proteobacteria gamma subdivision; G+, Gram-positive bacteria; EA, euryarchaeota; Prt-α, proteobacteria alpha subdivision; Chl, Chlamydia group; Prt-δ, proteobacteria delta and epsilon subdivision; Aqu, aquificales; Spi, spirochaetes; TD, Thermus/Deinococcus group; CFB, cytophaga/flexibacter/bacteroides group; The, thermotogales; GS, green sulfur bacteria. Branch lengths are proportional to the scale, given as substitutions per nucleotide. Percentage bootstrap values (based on 1000 pseudo-replications) are given on the nodes for trees A–C.

Our first question concerns which modern RNR represents the ancestral enzyme. Figure 4 depicts present-day relationships among RNR classes that are expected from nine duplication scenarios (a trivial scenario, that the three classes derived from each other at the same time, i.e., the star tree, is not depicted). To discriminate among these hypothetical scenarios we need to know the direction of character evolution along the tree shown in Fig. 3. Yet this tree is an unrooted tree (i.e., a network). Deciding the polarity of this network is complicated because there are no outgroups (i.e., external sequences that can be used as a reference) available. In addition, different RNR classes are represented by different sets of taxa, which can affect the location of the root by alternative methods such as the midpoint-rooting method. The positioning of the root by this technique would be more reliable if the different RNR classes were represented in the same species. Therefore we limit our analysis to C. acetobutylicum and P. aeruginosa, the only species carrying RNRs from each of the three classes. Figure 5 shows the NJ topology obtained for the C. acetobutylicum and P. aeruginosa RNR data subset using the Poisson + dG model (α = 2.5; obtained with the JTT + dG model assuming the topology in Fig. 3), with the position of the root inferred by the midpoint-rooting method. This method assumes that the most divergent taxa in the phylogeny evolve at equal rates. To check this premise we tested the somewhat more restrictive assumption that the rate of evolution is the same for all taxa (i.e., the global clock assumption). The likelihood ratio test was performed using the JTT-F + dG model assuming the same topology as in Fig. 5 (i.e., rooted along branch 2-5). Relaxing the global clock assumption does not lead to a significant improvement of the likelihood score (−2 log Λ = 12.9; 5 df, p > 0.01). Moreover, placing the root along branch 2-5 yields a likelihood score which is significantly higher than the likelihood scores that result from positioning the root along branch 2-3 or 2-4 [−2 log Λ = 123.2; 5 df, p < 10⁻⁶ in both cases, by the RELL test of Kishino et al. (1990)], which reinforces the hypothesis that branch 2-5 is the true location of the root. Consequently, we can rule out scenarios b, c, e, f, g, and h in Fig. 4 and are left with three alternative hypotheses for the origin of the RNRs: scenario a assumes that RNR class I is the most primitive class, from which RNR class III originated first, followed by class II; scenario d assumes that RNR class II is the ancestral class, from which RNR class III originated first, followed by class I; and scenario g assumes that RNR class III is the ancestral class, from which either class I or class II originated. The relative likelihood of these three hypotheses is discussed below in connection with available data on structural and functional properties of the enzymes and what is known about the conditions of the primitive earth.
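The logic of midpoint rooting can be sketched in a few lines: find the two most distant tips and place the root halfway along the path that joins them. The code below is a minimal illustration under that definition, not the procedure of any particular phylogenetics package; the tree, its node names, and its branch lengths are hypothetical and do not reproduce Fig. 5.

```python
def farthest_path(tree, start):
    """Return (distance, path) to the node farthest from start in a weighted tree."""
    best = (0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, path)
        for neighbor, length in tree[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, dist + length, path + [neighbor]))
    return best

def midpoint_root(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (u, v, offset): the root lies on branch u-v,
    `offset` branch-length units away from u.
    """
    any_node = next(iter(tree))
    _, path = farthest_path(tree, any_node)       # reaches one end of the longest path
    total, path = farthest_path(tree, path[-1])   # the full longest path
    half, walked = total / 2.0, 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("tree has no branches")

# Hypothetical unrooted tree: four tips (A-D) joined through two internal nodes.
tree = {
    "A": {"n1": 1.0},
    "B": {"n1": 3.0},
    "C": {"n2": 1.0},
    "D": {"n2": 4.0},
    "n1": {"A": 1.0, "B": 3.0, "n2": 2.0},
    "n2": {"n1": 2.0, "C": 1.0, "D": 4.0},
}
print(midpoint_root(tree))
```

In this toy tree the longest path runs between tips B and D, so the root falls on the internal branch. The result is meaningful only if the two most divergent lineages have evolved at similar rates, which is the premise tested in the text.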

Because nodes 3, 4, and 5 in Fig. 5 are simultaneous in time, under a constant evolutionary rate the branches leading from them to the terminal nodes should be of the same length. Yet this is clearly not the case (e.g., branch 2-3 is nearly three times longer than branch 2-4; see Fig. 5). We have conducted a likelihood ratio test of the null hypothesis that branches 2-3 and 2-4 are equal using the JTT-F + dG model and the topology shown in Fig. 5 without the global clock assumption [test carried out with the HYPHY program (Kosakovsky Pond and Muse 2000)]. The unconstrained model fits better than the model constraining the two branches to be equal, although the improvement is only marginally significant (−2 log Λ = 3.38; 1 df, p ≈ 0.06), suggesting that the evolutionary distance between Clostridium and Pseudomonas is shorter when measured from RNR class I than when estimated from RNR class II or III. This result is consistent with our above inference that the tree shown in Fig. 5 fits the global clock assumption, because increases and decreases in the rate of evolution of each RNR protein could have canceled each other out in the long term. There are at least two hypotheses to explain why class I appears to be evolving more slowly. (i) A larger proportion of the amino acid sites of this protein might be subject to purifying selection. Yet there is no obvious reason why RNR class I should behave differently than RNR classes II and III. (ii) The gene encoding this protein could have arrived in the genome of Clostridium by lateral transfer from Pseudomonas, or the other way around.
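The significance levels quoted for the likelihood ratio tests can be checked with a short calculation. The sketch below assumes the usual asymptotic approximation in which −2 log Λ follows a chi-square distribution with the stated degrees of freedom; it does not apply to the RELL-based comparison. The function name `chi2_sf` is hypothetical and is not part of PAML or HYPHY.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Statistics and degrees of freedom as reported in the text and in Table 3.
for label, stat, df in [
    ("global clock", 12.9, 5),
    ("branches 2-3 vs 2-4", 3.38, 1),
    ("Poisson vs Poisson + dG, class III", 318.9, 1),
]:
    print(label, chi2_sf(stat, df))
```

Under this approximation the clock test gives a p-value of about 0.02 and the branch-length test a p-value of about 0.07, in line with the bounds quoted above, whereas the Table 3 statistic is far below 10⁻⁶.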

There is consensus that, for lateral transfer to occur, first, the gene to be transferred should confer a selective advantage on the recipient species; and/or, second, a strong selective environment favoring the growth and survival of the species containing the transferred gene should exist (see Gupta 1998). Considering the relevance of their metabolic role, together with the different susceptibilities of RNR classes to oxygen, it seems reasonable to assume that at least the second condition was a factor contributing to lateral RNR transfer during early evolution. Besides the unexpected closeness between class I of Clostridium and class I of Pseudomonas discussed above, lateral gene transfer (plus loss of the phylogenetic signal) could account for the tangle-like configuration of the global tree shown in Fig. 3. Specifically, horizontal transmission might be most apparent from several statistically highly supported, yet unexpected relationships in this tree, for example, the position of the archaebacterium Halobacterium sp. close to the eubacteria Pseudomonas and Chlamydia in class I RNR (see Fig. 3). In fact, lateral transfer from eubacteria has been hypothesized to explain the occurrence of ribulose-1,5-bisphosphate carboxylase (rbcL) in the halobacterium Haloferax mediterranei [(Rawal et al. 1988); note that rbcL is a very complex enzyme which is absent in all primitive archaebacteria]. As we have pointed out, so far Halobacterium sp. is the only archaebacterium known to contain RNR class I (note that other aerobic archaebacteria such as Thermoplasma acidophilum and Pyrodictium carry solely class II). If we accept that Halobacterium sp. obtained its class I RNR by lateral transfer from a eubacterium [instead of vertically from the LUCA; note that Halobacterium clusters with Pseudomonas in RNR class II as well, although this grouping is not significant (see Fig. 3)], it might be the case that archaebacteria never had class I RNR.

Put together, the considerations above lead to a hypothesis for the origin and subsequent evolution of the different RNR classes. Based on the Earth’s geological history, the conditions under which the earliest organisms evolved were anaerobic. Because of its anaerobic function, RNR class III most likely is the primitive form from which all others derived. However, class II RNR can function either with or without oxygen; hence class II might alternatively be the ancestral state rather than a feature that evolved later, e.g., as an adaptation to oxygenic environments. This alternative scenario is, however, quite unlikely, for the following reasons. (i) AdoCbl, the molecule that activates class II RNR, is conspicuously more complex structurally than S-adenosylmethionine (SAM), the activator (together with an extra activator protein) of RNR class III; in fact, SAM has been dubbed the poor man’s AdoCbl (Frey 1993). Moreover, AdoCbl biosynthesis involves many more steps and enzymes (Roth 1996) than that of SAM, whose production is comparatively straightforward. (ii) RNR III contains an Fe–S cluster, in protein NrdG, for the generation of its glycyl radical. It is widely held that iron–sulfur clusters are among the most ancient, ubiquitous, and functionally diverse classes of biological prosthetic groups (Beinert 1997). It has even been suggested that all chemical conversions of the primordial metabolism might have occurred on the surface of iron–sulfur minerals (Huber and Wachtershauser 1997). (iii) RNR III uses formate as the external reductant (Torrents 2001). This is a much simpler compound than the thioredoxin and glutaredoxin systems used by class I and II RNRs. (iv) As we have shown (see Fig. 1 and Table 2), class III RNR greatly resembles pyruvate formate-lyase (PFL). PFL catalyzes a key step in anaerobic energy metabolism and should have appeared early during the evolution of life (Reichard 1997). An enzyme similar to present-day class III reductase could then have usurped the radical mechanism of a primitive form of PFL to evolve into a ribonucleotide reductase. Indeed, given the apparent structural similarity between RNR III and PFL (see also Logan 1999), the possibility that class III RNR evolved directly from a duplication of the pyruvate formate-lyase seems quite likely. (v) Table 1 shows that an anaerobic microorganism can live equally well with class II or class III RNR (e.g., Chlorobium tepidum and Methanococcus jannaschii; note that for these two species the whole genome has been sequenced, such that the possibility that they could have additional, unknown nrd genes can be safely ruled out). Keeping this in mind, it is unclear what advantage the origin of an anaerobic enzyme (i.e., class III) could have provided if another enzyme performing equally well already existed under those conditions (i.e., class II). In principle, it makes more sense that class II originated from class III, allowing the exploitation of increasingly aerobic environments. It seems likely, therefore, that RNR class III represents the primitive RNR class.

Fig. 4. Nine hypotheses for the sequence of duplications (a trivial hypothesis, that the three classes derived from each other simultaneously, i.e., the star phylogeny, is not represented).

Considering the long branches preceding the diversification of each RNR in Fig. 3, the duplication of the ancestral RNR III should have occurred very early in evolution. Free of functional constraints, the new copy could have evolved rapidly into class II. This process could have occurred by a few changes in the C-terminal end of the protein, involving loss of the glycyl radical, with concomitant acquisition of affinity for AdoCbl (i.e., an oxygen-independent radical generator). Released from the limitations imposed by oxygen, this new RNR class would have allowed colonization of existing oxygenic environments. There is good evidence that in Earth’s early atmosphere traces of oxygen existed as a consequence of the photolysis of water, which could have resulted in localized oxygen oases (Kasting 1993). Both RNR classes coexisted until the LUCA split into archaebacteria and eubacteria. The second duplication took place later in the eubacterial lineage and resulted in the evolution of class I from class II RNR. This would explain why current archaebacteria seem to be devoid of class I RNR (except Halobacterium, which appears to have obtained its class I RNR by lateral transfer; see above). It seems very likely that the emergence of RNR class I met a selectively favorable environment, because it freed the synthesis of deoxyribonucleotides from the requirement for AdoCbl. Note that AdoCbl synthesis is restricted to some eubacteria and archaebacteria; animals and some protists require AdoCbl but cannot produce it, and plants and fungi neither synthesize nor use AdoCbl (Roth et al. 1996). In a later step, fusion of an RNR class I-carrying eubacterium with an archaebacterium (see Martin et al. 2001) would have resulted in a eukaryotic cell containing the three RNR classes. Classes II and III degenerated in most eukaryotes, with the concomitant development of aerobic metabolism. Also, since all eukaryote class I RNRs belong to type Ia, it seems most plausible that class Ib RNR originated from class Ia in eubacteria after the origin of the eukaryotic cell. This sequence is also supported by the observation that class Ib is the only RNR lacking one of the two allosteric sites.

Acknowledgments. F.R.-T. has received support from the Ministerio de Educación y Cultura (Spain; Contrato Ramón y Cajal). E.T. and I.G. were supported by a grant from the Spanish Dirección General de Enseñanza Superior e Investigación Científica (PB97-0196). We would like to express our gratitude to Dr. Albert Jordan for revising the manuscript and to Prof. Peter Reichard for helpful, stimulating discussions and revising the manuscript.

Fig. 5. The molecular clock maximum likelihood tree of the relationships among the three nrd classes from Clostridium acetobutylicum and Pseudomonas aeruginosa. Internal node numbers are shown in gray boxes.


References

Aloy P, Querol E, Aviles FX, Sternberg MJE (2001a) Automated structure-based prediction of functional sites in proteins: Application to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311:395–408

Aloy P, Oliva B, Querol E, Aviles FX, Russell RB (2001b) Structural similarity to link sequence space. New potential superfamilies and implications for structural genomics. Protein Sci 11:1101–1116

Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller M, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL (2000) The Pfam protein families database. Nucleic Acids Res 28:263–266

Becker A, Fritz-Wolf K, Kabsch W, Knappe J, Schultz S, Volker Wagner AF (1999) Structure and mechanism of the glycyl radical enzyme pyruvate formate-lyase. Nature Struct Biol 6:969–975

Beinert H, Holm RH, Munck E (1997) Iron-sulfur clusters: Nature’s modular, multipurpose structures. Science 277:653–659

Berglund O (1972) Ribonucleoside diphosphate reductase induced by bacteriophage T4: Allosteric regulation of substrate specificity and catalytic activity. J Biol Chem 247:7276–7281

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242

Chothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5:823–826

Eliasson R, Pontis E, Jordan A, Reichard P (1999) Allosteric control of three B12-dependent (class II) ribonucleotide reductases. Implications for the evolution of ribonucleotide reduction. J Biol Chem 274:7182–7189

Eriksson M, Uhlin U, Ramaswamy S, Ekberg M, Regnstrom K, Sjoberg B-M, Eklund H (1997) Binding of allosteric effectors to ribonucleotide reductase protein R1: Reduction of active-site cysteines promotes substrate binding. Structure 5:1077–1092

Felsenstein J (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791

Freeland SJ, Knight RD, Landweber L (1999) Do proteins predate DNA? Science 286:690–692

Frey PA (1993) Lysine 2,3-aminomutase: Is adenosylmethionine a poor man’s adenosylcobalamin? FASEB J 7:662–670

Gleason FK, Hogenkamp HP (1970) Ribonucleotide reductase from Euglena gracilis, a deoxyadenosylcobalamin-dependent enzyme. J Biol Chem 245:4894–4899

Gupta RS (1998) Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol Mol Biol Rev 62:1435–1491

Holm L, Sander C (1997) Decision support system for the evolutionary classification of protein structures. ISMB 5:240–246

Hillis DM, Bull JJ (1993) An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol 42:182–192

Huber C, Wachtershauser G (1997) Activated acetic acid by carbon fixation on (Fe, Ni)S under primordial conditions. Science 276:245–247

Huelsenbeck JP, Crandall KA (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu Rev Ecol Syst 28:437–466

Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. CABIOS 8:275–282

Jordan A, Reichard P (1998) Ribonucleotide reductases. Annu Rev Biochem 67:71–98

Kasting JF (1993) Earth’s early atmosphere. Science 259:920–926

Kishino H, Miyata T, Hasegawa M (1990) Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol 31:151–160

Kosakovsky Pond SL, Muse SV (2000) HYPHY: Hypothesis testing using phylogenies (kernel beta 0.71). University of Arizona (distributed by the authors)

Kraulis PJ (1991) MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J Appl Cryst 24:946–950

Kyrpides N, Overbeek R, Ouzounis C (1999) Universal protein families and the functional content of the last universal common ancestor. J Mol Evol 49:413–423

Logan DT, Andersson J, Sjoberg B-M, Nordlund P (1999) A glycyl radical site in the crystal structure of a class III ribonucleotide reductase. Science 283:1499–1504

Martin W, Hoffmeister M, Rotte C, Henze K (2001) An overview of endosymbiotic models for the origins of eukaryotes, their ATP-producing organelles (mitochondria and hydrogenosomes), and their heterotrophic lifestyle. Biol Chem 382:1521–1539

Murzin AG (1993) Sweet-tasting protein monellin is related to the cystatin family of thiol proteinase inhibitors. J Mol Biol 230:689–694

Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540

Nicholas KB Jr, Nicholas HB (1997) GeneDoc: A tool for editing and annotating multiple sequence alignments, version 2.6.2001 (distributed by the authors)

Nikas I, McLauchlan J, Davison AJ, Taylor WR, Clements JB (1986) Structural features of ribonucleotide reductase. Proteins 1:376–384

Patthy L (1999) Protein evolution. Blackwell Science, Malden, MA

Rawal N, Kelkar SM, Altekar W (1988) Ribulose 1,5-bisphosphate dependent CO2 fixation in the halophilic archaebacterium, Halobacterium mediterranei. Biochem Biophys Res Commun 156:451–456

Reichard P (1993) From RNA to DNA, why so many ribonucleotide reductases? Science 260:1773–1777

Reichard P (1997) The evolution of ribonucleotide reduction. TIBS 22:81–85

Rodríguez-Trelles F, Tarrío R, Ayala FJ (1999) Molecular evolution and phylogeny of the Drosophila saltans species group inferred from the Xdh gene. Mol Phylogenet Evol 13:110–121

Rodríguez-Trelles F, Alarcon L, Fontdevila A (2000) Molecular evolution and phylogeny of the buzzatii complex (Drosophila repleta group): A maximum-likelihood approach. Mol Biol Evol 17:1112–1122

Roth JR, Lawrence JG, Bobik TA (1996) Cobalamin (coenzyme B12): Synthesis and biological significance. Annu Rev Microbiol 50:137–181

Russell RB (1998) Detection of protein three-dimensional side-chain patterns: New examples of convergent evolution. J Mol Biol 279:1211–1227

Russell RB, Barton GJ (1992) Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins 14:309–323

Russell RB, Sasieni PD, Sternberg MJE (1998) Supersites within superfolds. Binding site similarity in the absence of homology. J Mol Biol 282:903–918

Sjoberg B-M (1997) Ribonucleotide reductases: A group of enzymes with different metallosites and a similar reaction mechanism. Struct Bond 88:139–173

Stubbe J, van der Donk WA (1998) Protein radicals in enzyme catalysis. Chem Rev 98:705–762

Stubbe J, Ge J, Yee CS (2001) The evolution of ribonucleotide reduction revisited. TIBS 26:93–99

Stutzenberger F (1974) Ribonucleotide reductase of Pithomyces chartarum: Requirement for B12 coenzyme. J Gen Microbiol 81:501–503

Tamarit J, Mulliez E, Meier C, Trautwein A, Fontecave M (1999) The anaerobic ribonucleotide reductase from Escherichia coli. The small protein is an activating enzyme containing a [4Fe-4S](2+) center. J Biol Chem 274:31291–31296

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL-X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876–4882

Tollinger M, Konrat R, Hilbert BH, Marsh ENG, Krautler B (1998) How a protein prepares for B12-binding subunit of glutamate mutase from Clostridium tetanomorphum. Structure 6:1021–1033

Torrents E, Eliasson R, Wolpher H, Graslund A, Reichard P (2001) The ribonucleotide reductase from Lactococcus lactis. Interactions between the two proteins NrdD and NrdG. J Biol Chem 276:33488–33494

Uhlin U, Eklund H (1994) Structure of ribonucleotide reductase protein R1. Nature 370:533–539

Woese CR, Kandler O, Wheelis ML (1990) Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria and Eucarya. Proc Natl Acad Sci USA 87:4576–4579

Xia X (2000) Data analysis in molecular biology and evolution. Department of Ecology and Biodiversity, University of Hong Kong, Hong Kong

Yang Z (1996a) The among-site rate variation and its impact on phylogenetic analyses. TREE 11:367–372

Yang Z (1996b) Maximum likelihood models for combined analyses of multiple sequence data. J Mol Evol 42:587–596

Yang Z, Lauder IJ, Lin HJ (1995) Molecular evolution of the hepatitis B virus genome. J Mol Evol 41:587–596

Yang Z, Nielsen R, Hasegawa M (1998) Models of amino acid substitution and applications to mitochondrial DNA evolution. Mol Biol Evol 15:1600–1611

Yang Z (2000) Phylogenetic analysis by maximum likelihood (PAML), version 3.0a. University College London, London
