T. Satyanarayana et al. (eds.), Thermophilic Microbes in Environmental and Industrial Biotechnology: Biotechnology of Thermophiles, DOI 10.1007/978-94-007-5899-5_12, © Springer Science+Business Media Dordrecht 2013
Abstract
Elucidation of the origin and the early evolution of life is fundamental to our understanding of ancient living systems and of the ancient global environment where early life evolved. A number of molecular phylogenetic trees have been constructed by comparing homologous gene sequences.
In this chapter, we review universal trees constructed from different types of genetic information. The tree topology differs depending on the type of gene analyzed as well as on the method used. The root of the universal tree most likely lies between the bacterial branch and the common ancestor of Archaea and Eucarya. However, it remains possible that the root lies within the bacterial branches.
The monophyly of Archaea remains controversial. Although the rRNA tree supports it, trees with other topologies have also been reported. Whether Eucarya originated within or outside the archaeal branch has yet to be conclusively resolved.
The growth temperature of the ancient organism has long interested many scientists. Depending on the report, theoretical studies have suggested a mesophilic, thermophilic, or hyperthermophilic origin of life. Experimental tests of the effect of ancestral amino acid residues, introduced individually or in combination, suggested a hyperthermophilic origin of life. However, we cannot entirely rule out an artifact arising from the method used to estimate the ancestral sequences possessed by the ancestral organisms.
Keywords
1432-1 Horinouchi, Hachioji-shi, Tokyo 192-0392, Japan
Chapter 12
Comparative Genomics of Thermophilic Bacteria and Archaea
Satoshi Akanuma, Shin-ichi Yokobori, and Akihiko Yamagishi
12.1 Introduction
Elucidation of the origin and the early evolution of life is fundamental to our understanding of ancient living systems and of the ancient global environment where early life evolved. Extant genes are evolutionary descendants of ancient genes. Consequently, information on the traits of ancient genes is embedded in the sequences of extant genes. A number of molecular phylogenetic trees have been constructed by comparing homologous gene sequences. However, the information used in constructing these phylogenetic trees has been limited, and the topologies of the trees depend largely on the genes analyzed. In this chapter, we first review phylogenetic trees built in several different ways and their possible interpretations. We then discuss the nature of the last universal common ancestor predicted from such phylogenetic analyses. Finally, we introduce several studies in which ancient proteins were reconstructed by combining computational prediction and experimental resurrection of
1998).
12.2 Topology of Universal Trees
In this section, we review the points to be considered in obtaining the true universal tree, as well as the genes used to construct it.
12.2.1 Ribosomal RNA Gene Trees
used for phylogenetic analyses. All living organisms contain rRNAs, which are the main components of the ribosome, the machinery of protein synthesis. Although ribosomal RNA genes are often present in multiple copies, the copies are almost identical within an organism (see Hillis and Dixon 1991 -ism as well as isolated ones have been extensively analyzed (Barns et al. 1994; Ward et al. 1990rRNA (gene) sequences and consequently suggested the monophyly of each of Bacteria, Archaea, and Eucarya (Woese et al. 1990).
the basal position of Archaea and Bacteria (e.g., Woese et al. 1990; Stetter 2006), suggesting the (hyper)thermophilic ancestry of Archaea and Bacteria (Woese et al. 1990; Stetter 2006 1998). However, hyperthermophilic and thermophilic
2002). Because varied nucleotide compositions among operational taxonomic units
1990; Hasegawa and Hashimoto 1993), it could be that placing hyperthermophilic and thermophilic organisms at the basal position of Archaea and of Bacteria is the
than those of other organisms (see Woese’s tree: Woese et al. 1990evolutionary rate among taxa can also produce an unreliable phylogenetic tree (e.g.,
1998group and slow-evolving taxa form another group in a phylogenetic tree. Thus, fast-
Fast-evolving taxa tend to be placed near the basal position of the phylogenetic
genes generally show faster evolutionary rates than bacterial and archaeal genes do (e.g., the tree reported in Woese et al. (1990)). Therefore, it is difficult to determine the precise phylogenetic position of Eucarya in the universal tree.
Organisms with a parasitic lifestyle often show accelerated evolutionary rates. If an evolutionary model in which the evolutionary pattern and rate are invariable among sites and across time is used, the substitution rate may be underestimated for fast-evolving sites and branches and overestimated for invariant or slow-evolving sites and branches. Accordingly, fast-evolving taxa tend to be placed near the basal position of the tree
Nanoarchaeum equitans, a parasite of another archaeon, is the only member of Nanoarchaeota and is often placed as the basal group of Archaea (Huber et al. 2002; Waters et al. 2003). Because N. equitans has a long branch in the archaeal tree, the basal position of N. equitans
12.2.2 Protein Gene Trees
Many protein-coding genes have also been used to build universal trees. Trees of elongation factors (EFs; see below for more details) suggested that each of Archaea, Bacteria, and Eucarya is monophyletic (Iwabe et al. 1989; Baldauf et al. 1996). The monophyly of each of the three groups was also suggested by analyses of RNA polymerase sequences (Iwabe et al. 1991) and of ribosomal protein sequences
2010 2010) suggested that Archaea and Bacteria tend to show similar phylogenetic trends based on about 100 universal trees.
12.2.3 Genome Trees
The increasing number of complete genome sequences (see public databases such as gen-
information. To obtain a reliable phylogenetic tree from genome-level information, all of the genes analyzed must be orthologous. However, it is not easy to judge
if the protein genes are orthologous or not. For example, archaeal elongation factor 1α (aEF-1α) is generally regarded as the ortholog of eukaryotic EF-1α (eEF-1α). However, aEF-1α is also functionally similar to eukaryotic release factor 3 and HBS1, which are paralogs of eEF-1α (Saito et al. 2010). Can we regard aEF-1α as the ortholog of eEF-1α? This question is not easy to answer. We need to remember, however, that aEF-1α might be under selection pressures different from those acting on eEF-1α
-tributes to the ribosomal recycling in the termination process of translation (Zavialov et al. 2005that show different characteristics (Suematsu et al. 2010share the same origin, evolutionary constraints (e.g., rate of substitution, invariable residues) are expected to be different if the roles of these proteins are different.
absent from the analysis. Certain bacterial species have the ability to undergo natural transformation (e.g., Thermus spp., Claverys et al. 2009; Koyama et al. 1986; Hidaka et al. 1994). Horizontal gene transfer events occurred frequently during the early evolution of Bacteria and Archaea. For instance, 24% of the protein genes in the Thermotoga maritima genome are likely to be descendants of archaeal genes (Nelson et al. 1999). Horizontal gene transfer between Eucarya and Bacteria may also have occurred during eukaryote evolution, for example in Tunicata (or Urochordata), one of the three subphyla of Chordata. Tunicates are the only multicellular animals that produce cellulose, and recent studies suggested that the common ancestor of tunicates might have acquired bacterial cellulose synthase genes (Sagane et al. 2010).
The number of genes suitable for phylogenetic analysis is limited. Only 31 protein gene families were used in the analysis by Ciccarelli et al. (2006), most of them members of protein families related to translation and transcription. They reported an unrooted tree including Archaea, Bacteria, and Eucarya. In this tree, Bacteria can be divided into three groups: the basal group is represented by Firmicutes, including Bacilli, Clostridia, and Mycoplasmatales; the second group includes Actinobacteria and Bacteroidetes; and the third group consists of Proteobacteria, Cyanobacteria, the Deinococcus-Thermus group, and the thermophilic Thermotogales and Aquificalessuggests that the common ancestor of Bacteria was not (hyper)thermophilic although Firmicutes include thermophilic species (it should be noted that the root cannot be
Harris et al. (2003) proposed a different type of tree obtained from genome data. They analyzed 80 gene clusters conserved throughout the three domains and
A problem for genome-based phylogenetic analyses using primary sequences is how to select the genes (regions) to be analyzed. Although more than 1,000 com-
the functions of several tens of percent of the predicted protein genes are unknown. As mentioned above, only tens of protein genes were used in the phylogenetic analyses by Ciccarelli et al. (2006) and Harris et al. (2003).
12.2.4 Other Approaches
Other approaches to reconstructing the universal tree have also been reported. Wang et al. (2007) used protein structures to infer relationships among organisms. They compared the presence or absence of each protein structure (or protein structure family) among species whose complete genome sequences are known, and estimated the relationships among species by counting the number of protein structures shared between them. In their (unrooted) tree, Eucarya and Archaea each appear as monophyletic groups and do not seem to form a single group together.
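The counting step can be illustrated with a small sketch. It assumes that each species is represented simply as the set of structure families detected in its genome and scores a pair of species by the number of families they share. The species names and family labels are hypothetical toy data, and this raw shared count is an illustrative choice, not necessarily the exact measure used by Wang et al. (2007).

```python
from itertools import combinations

# Hypothetical presence/absence profiles: the set of protein structure
# families detected in each species' genome (toy labels, not real data).
profiles = {
    "species_A": {"f1", "f2", "f3", "f4"},
    "species_B": {"f1", "f2", "f3", "f5"},
    "species_C": {"f1", "f6", "f7"},
}


def shared_counts(profiles):
    """Return {(sp1, sp2): number of structure families present in both}."""
    return {
        (a, b): len(profiles[a] & profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def closest_pair(profiles):
    """Return the pair of species sharing the most structure families."""
    counts = shared_counts(profiles)
    return max(counts, key=counts.get)


print(shared_counts(profiles))
print(closest_pair(profiles))
```

Raw shared counts grow with proteome size, so a practical analysis would normalize them (for example, by the size of the smaller set) before building a tree from them.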
Another approach is to use evolutionary events that themselves indicate the direction of evolution. The presence or absence of a retrotransposon at certain loci has been used for phylogenetic analyses (e.g., Shimamura et al. 1997), because a retrotransposon is first transcribed and its RNA is then reverse-transcribed and inserted at a new position in the genome. If two species share the same retrotransposon sequence at the same locus, they are likely to have diverged after the retrotransposon was inserted there. However, phylogenetic markers of this kind are rare.
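The logic of retrotransposon markers can be sketched as a compatibility check. Assuming that an insertion is never precisely lost and never arises independently at the same locus, the species carrying an insertion at a given locus form a clade, and any two such clades must be either nested or disjoint. The locus and species names below are hypothetical.

```python
from itertools import combinations

# Hypothetical presence (True) / absence (False) of a retrotransposon
# insertion at each locus. Toy data for illustration only.
insertions = {
    "locus1": {"sp1": True, "sp2": True, "sp3": True, "sp4": False},
    "locus2": {"sp1": True, "sp2": True, "sp3": False, "sp4": False},
    "locus3": {"sp1": False, "sp2": True, "sp3": True, "sp4": False},
}


def clade_for(locus_states):
    """Species sharing the insertion; under the no-loss, no-homoplasy
    assumption they diverged after the insertion and form a clade."""
    return frozenset(sp for sp, present in locus_states.items() if present)


def compatible(c1, c2):
    """Two clades on the same tree must be nested or disjoint."""
    return c1 <= c2 or c2 <= c1 or not (c1 & c2)


def conflicts(insertions):
    """Return pairs of loci whose implied clades cannot both be true."""
    clades = {loc: clade_for(states) for loc, states in insertions.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(clades), 2)
        if not compatible(clades[a], clades[b])
    ]


print(conflicts(insertions))
```

A reported conflict, such as the one between locus2 and locus3 in this toy data, means that at least one of the stated assumptions fails for those loci.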
12.3 Placing the Root on the Universal Tree
To locate the root of the tree of extant organisms, composite trees of paralogous genes (proteins) that were presumably duplicated before the age of the Commonote, the last universal common ancestor, have been constructed. The Commonote has been placed on different branches depending on the type of gene and the analytical method used.
12.3.1 The Root on the Bacterial Branch
The root is most often placed between Bacteria and the common ancestor of Archaea and Eucarya. Iwabe et al. (1989) reconstructed the composite tree of EF-Tu/EF-1of translation, the date of diversification to EF-Tu/EF-1assumed to be before the age of the Commonote. Therefore, the Commonote is expected to be located on the branch connecting the EF-Tu/EF-1α
EF-1α clade. In turn, the EF-Tu/EF-1EF-2. In the trees of Iwabe et al. (1989), Archaea is the sister group of Eucarya, and Bacteria is the sister group of both. A similar result was reported using a larger dataset of these proteins (Baldauf et al. 1996 1989) also reported the close relationship between Archaea and Eucarya based on the H+
However, in the case of the H+
1993 2006), 1990) adopted the position
of the Commonote on the bacterial branch of the small subunit ribosomal RNA tree and proposed the three domains of life: Archaea, Bacteria, and Eucarya.
Brown and Doolittle (1995) also reconstructed a composite tree for three closely related aminoacyl-tRNA synthetases: valyl-, leucyl-, and isoleucyl-tRNA synthetases. From the isoleucyl-tRNA synthetase part of the tree, Brown and Doolittle (1995) supported the Archaea/Eucarya clade. After that report, however, Brown et al. (2003) noticed that two types of bacterial isoleucyl-tRNA synthetase exist. One of them is resistant to mupirocin and is found only in a limited lineage of bacteria. Brown et al. (2003) suggested that the mupirocin-resistant bacterial isoleucyl-tRNA synthetase arose independently of the mupirocin-sensitive bacterial enzyme, and that the eukaryotic isoleucyl-tRNA synthetase may have originated from the mupirocin-resistant type. The earlier work of Brown and Doolittle (1995) did not include mupirocin-resistant isoleucyl-tRNA synthetases. Therefore, at least from the isoleucyl-tRNA synthetase data, we cannot conclude that the Archaea/Eucarya clade is monophyletic. On the other hand, analyses of other aminoacyl-tRNA synthetases (seryl-, tyrosyl-, and tryptophanyl-tRNA synthetases) suggested relationships among Archaea, Bacteria, and Eucarya similar to those in the tree of Iwabe et al. (1989) (Kollman and Doolittle 2000), although the threonyl-tRNA synthetase data did not.
When we use nucleotide or amino acid sequences to reconstruct molecular phylogenetic trees, we have to distinguish orthologs from paralogs. Orthologs share common ancestry and the same function in biological processes, whereas paralogs share common ancestry but have different functions. For example, human elongation factor 1α (EF-1α) and chimpanzee EF-1α are orthologs, since they share ancestry and the same biological function in translation. In contrast, EF-1α and mitochondrial EF-Tu are paralogs: although both proteins act in the elongation step of translation, delivering aminoacyl-tRNAs to the A site of the ribosome, eukaryotic EF-1α works in the cytoplasm, whereas mitochondrial EF-Tu works in mitochondria. When reconstructing a species tree, accidentally including paralogous genes (proteins) may lead to a wrong tree. However, discriminating orthologs from paralogs is not straightforward (see the discussion of aEF-1α above).
12.3.2 The Commonote as a Member of Bacteria
12.3.2.1 Cavalier-Smith’s Hypothesis
Several recent studies have suggested that the Commonote was a member of Bacteria; in other words, they place the root within the bacterial branches. Cavalier-Smith (2002, 2006a, b, 2010), for example, has suggested that the Commonote belongs to Eobacteria. In his hypothesis, the oldest extant lineage is
Eobacteria. In his terminology, Eobacteria include Chloroflexi. Negibacteria (overlapping with gram-negative bacteria), which have a two-layered surface membrane, include Eobacteria and Glycobacteria (comprising Cyanobacteria, Proteobacteria, and others). In Cavalier-Smith’s hypothesis, Eobacteria are the oldest group. In his view, Posibacteria (overlapping with gram-positive bacteria), which have a single-layered membrane, originated from Negibacteria (Glycobacteria), and the common ancestor of eukaryotes and archaebacteria may have arisen from Posibacteria.
His hypothesis on the evolution of life rests on several observations, one being the structure of the membranes surrounding cells. Most of the others likewise concern the presence or absence of certain structures (proteins and other high-molecular-weight molecules) in the cell. Cavalier-Smith divided life into two classes: two-membrane organisms and single-membrane organisms (Cavalier-Smith 2002, 2006a, b, 2010). In addition, he proposed that evolution proceeded from two-membrane organisms to single-membrane organisms. This
Smith 2001).
The hypothesis of Cavalier-Smith on the early evolution of life and on the evolution of Bacteria, Archaea, and Eucarya depends on the various lines of evidence he gathered. In general, however, it is very difficult to determine the evolutionary direction of traits. Although the discussion of Cavalier-Smith (2001, 2002, 2006a, b, 2010) is fruitful for research on the early evolution of life, his standpoint is very different from that of others, most of whom base their conclusions on molecular phylogenetic analyses. Only if we accept his obcell hypothesis can the direction of evolution of traits, and hence the direction of evolution of life, be accepted.
12.3.2.2 Lake’s Hypothesis
living organisms based on the indel analyses of various pairs of protein genes
et al. 2007 2008, 2009). They also delineated the early evolution of 2004). They suggested a possible close relationship between Archaea and Firmicutes (in particular Bacilli), based on indel analyses (the absence or presence of residues in well-conserved regions). They then placed the root of all life between Actinobacteria and Clostridia 2008, 2009). In other words, Actinobacteria and Clostridia
Commonote was a gram-positive bacterium similar to Actinobacteria and/or Clostridia. This conclusion is opposite to that of Cavalier-Smith (2002, 2006a, b, 2010) on the early evolution of life, although both conclusions suggest that the Commonote was a bacterium rather than an archaeon. We need to note
(e.g., Valas and Bourne 2009).
12.4 Are Archaea Monophyletic?
Another issue that remains to be answered is whether Eucarya is a subgroup of Archaea 1990) suggested
that Eucarya and Archaea are distinct monophyletic groups.
12.4.1 On the Origin of Eucarya
Determining the origin of the eukaryotic cell is challenging. Substantial evidence has accumulated for the bacterial origin of mitochondria and plastids (chloroplasts). Molecular phylogenetic analyses have suggested that the mitochondrion is derived from Alphaproteobacteria (e.g., Andersson et al. 1998) and the plastid from Cyanobacteria (e.g., Rodríguez-Ezpeleta et al. 2005). Therefore, early eukaryotic cells incorporated bacterial genes through the mitochondrial and plastid symbioses. Because mitochondria are the organelles responsible for respiration and related metabolism, and plastids are responsible for photosynthesis, many eukaryotic metabolic genes are descendants of early bacterial genes.
Little evidence is available regarding the origin of the cytoplasm and nucleus. The transcription and translation systems of Eucarya are more similar to those of Archaea than to those of Bacteria (e.g., Werner 2007 2009). Therefore, Eucarya are thought to be relatives of Archaea rather than of Bacteria. However, there are dozens of hypotheses on the origin of the nucleus (see the review by Martin 2005), some of which are reviewed in the following sections.
12.4.2 Eocyte Hypothesis
1984; 1992). Eocytes are one of the subgroups of Archaea, which include
some groups in Crenarchaeota. They analyzed the phylogeny using indel traits. Because insertion/deletion events are thought to occur much less frequently than base substitutions during evolution, parallel evolution does not seriously affect such phylogenetic analyses. Accordingly, an insertion/deletion shared between orthologous sequences can be a good phylogenetic signal. By comparing the existence of indels in the EF-Tu/EF-1concluded that the Eocytes (Crenarchaeota) are the sister group of Eucarya.
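Why a shared indel is informative can be illustrated by extracting gap patterns from an alignment. The sketch below assumes a precomputed multiple alignment in which "-" marks a gap, records each run of gap columns, and reports runs found in two or more (but not all) taxa. The sequences and taxon names are hypothetical toy data, not the EF-Tu/EF-1α alignment itself.

```python
from collections import defaultdict

# Hypothetical aligned sequences ("-" = gap); toy data for illustration.
alignment = {
    "taxon1": "MKVLAG---DEF",
    "taxon2": "MKVLAG---DEY",
    "taxon3": "MKILAGSNNDEF",
    "taxon4": "MRVLAGSNQDEF",
}


def gap_runs(seq):
    """Return (start, end) column intervals (end exclusive) of each gap run."""
    runs, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def shared_indels(alignment):
    """Map each gap interval to the taxa lacking residues there, keeping
    only intervals shared by two or more (but not all) taxa."""
    taxa_by_run = defaultdict(set)
    for taxon, seq in alignment.items():
        for run in gap_runs(seq):
            taxa_by_run[run].add(taxon)
    n = len(alignment)
    return {run: taxa for run, taxa in taxa_by_run.items() if 1 < len(taxa) < n}


print(shared_indels(alignment))
```

The alignment alone cannot say whether columns 6 to 8 reflect a deletion in taxon1 and taxon2 or an insertion in taxon3 and taxon4; deciding the polarity requires comparison with an outgroup.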
Recent phylogenetic analyses of combined large and small subunit ribosomal RNA genes, as well as of concatenated protein genes, supported the Eocyte hypothesis (Cox et al. 2008; Foster et al. 2009). These analyses adopted a method that allows the nucleotide composition to vary through time. The evolutionary rates
of ribosomal RNA genes are accelerated in the eukaryotic lineage (Cox et al. 2008). Therefore, we cannot rule out the possibility that Eucarya appeared as a group distinct from Archaea in Woese’s tree because of long-branch attraction (Woese 1987; Woese et al. 1990). In the case of Woese’s molecular phylogenetic
1969) was used to estimate evolutionary distances. This is the simplest substitution model for nucleotide sequences; because it assumes that all substitutions occur at the same rate, the contribution of transversions to sequence evolution may have been overestimated. In addition, owing to the limited analytical techniques of the time, Woese’s original tree did not take into account rate heterogeneity among sites and branches.
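The distance correction discussed above can be made concrete. Assuming that the 1969 model referred to is the Jukes-Cantor model (the authors' names are missing from this copy of the text), its distance for an observed proportion p of differing sites is d = -(3/4) ln(1 - 4p/3). A common extension that allows gamma-distributed rates among sites with shape parameter α is d = (3/4) α [(1 - 4p/3)^(-1/α) - 1]. The sketch evaluates both formulas for a hypothetical p.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor (1969) distance: d = -(3/4) ln(1 - (4/3) p).
    Assumes every substitution (transition or transversion) has the same rate."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for a finite JC69 distance")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_gamma_distance(p, alpha):
    """JC69 distance with gamma-distributed rates among sites (shape alpha):
    d = (3/4) * alpha * [(1 - (4/3) p) ** (-1/alpha) - 1].
    A smaller alpha means stronger rate heterogeneity among sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = 0.30  # hypothetical proportion of differing sites
print(round(jc69_distance(p), 4))
print(round(jc69_gamma_distance(p, alpha=0.5), 4))
```

For p = 0.30 this gives about 0.383 substitutions per site with uniform rates and about 0.667 with α = 0.5, illustrating how ignoring among-site rate variation underestimates divergence. As α grows large, the second formula converges to the first.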
12.4.3 Other Hypotheses on the Archaeal Origin of Eucarya
Euryarchaeota is also a candidate for the closest relative of Eucarya. Martin and his -
chondria and hydrogenosomes (Martin and Müller 1998). According to Martin and Müller (1998 -biosis and the origin of eukaryotic cells. In agreement with this hypothesis, several molecular studies proposed that Eucarya are the closest relatives of methanogens (e.g., Sandman et al. 1990).
It has been argued that Thermoplasmatales or their close relatives were the hosts of eukaryotic cells (e.g., Searcy et al. 1978; Margulis 1996). Thermoplasmatales lack a cell wall (Darland et al. 1970) and therefore could have been good hosts for intracellular symbiosis. Currently, there is little molecular evidence that directly supports a close relationship between Eucarya and Thermoplasmatales 2007; see also Shimizu et al. 2007). In addition, the MreB of Thermoplasma (and Archaeoglobus), a bacterial/archaeal homolog of actin, is more closely related to eukaryotic actin than to the MreB of methanogens (Hara et al. 2007), although the direct ancestor of eukaryotic actin may be different.
The universal tree has been used to obtain information regarding the origin of 2008) analyzed eukaryotic protein genes that were descendants of archaeal genes and found that most of them were the sister group of all archaeal orthologs. The result suggests that the Archaea and Eucarya form different
et al. (2007) reported that eukaryotic genes show high affinity to Alphaproteobacteria, Cyanobacteria, and Thermoplasmatales. If the affinity to Alphaproteobacteria reflects the mitochondrial origin of those genes and the affinity to Cyanobacteria reflects their plastid origin, the nuclear genes of Eucarya may be most closely related to those of Thermoplasmatales.
Recently, Kelly et al. (2011) suggested a thaumarchaeal origin of Eucarya based on the presence or absence of protein gene families. They also suggested that methanogens retain ancestral characteristics of Archaea.
12.5 Was the Commonote a Thermophile?
12.5.1 Theoretical Analyses on the Growth Temperature of Commonote
The growth temperature of the ancient organism has long been a topic that has 2010). In a well-accepted phylogenetic
are represented by the deepest and shortest branches (Woese 1987; Achenbach-Richter et al. 1988). Based on this observation, Stetter argued that the common ancestors of Archaea and of Bacteria were likely hyperthermophilic (Stetter 2006).
The Commonote is also parsimoniously thought to have been thermophilic. However, it cannot be ruled out that the most ancestral organism lived in a
Warwicker 2007).
positive supercoils into circular DNAs in vitro (Kikuchi and Asai 1984). Although the precise role of reverse gyrase in vivo is still unknown, the facts that the protein is found only in thermophiles and that all known hyperthermophiles contain it suggest an essential role of reverse gyrase in the adaptation of life to very high temperatures (Forterre 2002). Indeed, a reverse gyrase knockout mutant of Thermococcus kodakaraensis can grow at 90 °C but not at 93 °C, a temperature at which wild-type T. kodakaraensis grows (Atomi et al. 2004). Therefore, although reverse gyrase is not an absolute requirement, the emergence of this enzyme was crucial in the origin of hyperthermophiles. An important structural feature of reverse gyrase is that the protein is composed of two unrelated domains, a topoisomerase domain and a helicase domain (Declais et al. 2000). These two domains evidently could not have been fused into a single-chain reverse gyrase molecule before the topoisomerase and helicase families diverged. Therefore, assuming that reverse gyrase is an essential protein for hyperthermophilic organisms (Heine and Chandra 2009), the primitive microorganisms could not have been hyperthermophilic. In addition, eukaryotic type I DNA topoisomerase interacts with helicases in vivo, suggesting that type I topoisomerases and helicases originated and evolved independently in mesophiles or thermophiles and were later recruited in hyperthermophiles (Forterre 1996). This argument suggests that hyperthermophiles descended from less thermophilic organisms, but it does not preclude the idea that reverse gyrase evolved before the appearance of the last universal common ancestor.
1999) established a model of sequence evolution and estimated the
-fore concluded that the Commonote was likely a mesophile. However, a different
same genome data set by using maximum parsimony and then claimed that the last 2001).
the hypothesis that the last universal common ancestor was a hyperthermophile 2003a, b).
Ancestral amino acid compositions have also been computed. Brooks et al. estimated the amino acid composition of a set of proteins postulated to have existed in the last universal common ancestor using an expectation-maximization method (Brooks et al. 2004). The calculated composition of this protein set was more similar to the observed composition of the same set in extant thermophilic species than to that in extant mesophilic species. They therefore concluded that the Commonote lived in a thermophilic environment.
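The final comparison in that argument can be sketched as follows: compute amino acid compositions and ask whether an ancestral composition lies closer to a thermophilic or to a mesophilic reference. This is a minimal illustration only. It does not reproduce the expectation-maximization estimation of Brooks et al. (2004), the Euclidean distance is an assumption of this sketch, and all sequences below are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs):
    """Fraction of each of the 20 standard amino acids across the sequences."""
    counts = Counter(aa for s in seqs for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def distance(c1, c2):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((c1[aa] - c2[aa]) ** 2 for aa in AMINO_ACIDS))


def closer_reference(ancestral, thermo, meso):
    """Report which reference composition the ancestral one resembles more."""
    if distance(ancestral, thermo) < distance(ancestral, meso):
        return "thermophilic"
    return "mesophilic"


# Hypothetical toy sequences, for illustration only.
thermo_ref = composition(["KEIVRKLEEK", "EIKRVLEKEY"])
meso_ref = composition(["SQNTAGSQDT", "NSGTQASDNT"])
ancestral = composition(["KEVIRKLEQK"])

print(closer_reference(ancestral, thermo_ref, meso_ref))
```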
Becerra et al. focused on the evolution of protein disulfide oxidoreductases and drew implications for the thermostability of proteins in the Commonote. Their results imply that a disulfide oxidoreductase sequence was absent from the genome of the last universal common ancestor, suggesting a non-thermophilic ancestry (Becerra et al. 2007). However, it should be noted that disulfide bond formation is not necessarily required for the high thermostability of thermophilic proteins. Indeed, a number of thermophilic and hyperthermophilic proteins lack cysteine residues or contain only a few (Cambillau and Claverie 2000).
Recently, Boussau et al. conducted computational analyses of both rRNA and protein sequences (Boussau et al. 2008). The results suggested that the Commonote was mesophilic and that its descendants subsequently and independently adapted to high temperature, giving rise to thermophilic ancestors of Bacteria and of Archaea-Eucarya, possibly in response to a climate change on the early Earth.
Thus, a number of theoretical studies have addressed the growth temperature of the last universal common ancestor, but these studies have remained inferential owing to the lack of empirical testing. In the next section, we describe experimental studies, performed in the authors’ laboratory, that assessed whether the Commonote was thermophilic.
12.5.2 Experimental Testing if the Commonote Was (Hyper)thermophilic
Ancestral sequences of a particular protein can be inferred by comparison with extant homologous protein sequences (Messier and Stewart 1997; Bielawski and
2003; Thornton 2004). We have developed an experimental approach to assess the antiquity of hyperthermophilic organisms using an inferred amino acid sequence of a protein postulated to have existed in the Commonote. In this approach, inferred ancestral residues are introduced into an extant protein, and the thermal stabilities of the resulting mutant proteins are then examined. If the Commonote was thermophilic, the mutant proteins, each of which contains one or a few inferred
ancestral residues, are expected to show a trend toward enhanced thermal stability compared with the wild-type protein.
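To make the idea of inferring an ancestral residue concrete, the sketch below infers the residue at a single alignment position at the root of a small rooted tree using Fitch's small-parsimony algorithm. This method is a stand-in chosen for brevity; it is not necessarily the reconstruction method used in the studies cited here, and the tree and residues are hypothetical.

```python
def fitch(tree, leaf_states):
    """Fitch small parsimony on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d")).
    leaf_states: {leaf name: residue at one alignment position}.
    Returns (set of most-parsimonious residues at the root, minimum changes).
    """
    if isinstance(tree, str):  # a leaf: its observed residue
        return {leaf_states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, leaf_states)
    s2, c2 = fitch(right, leaf_states)
    common = s1 & s2
    if common:  # children can agree: no extra substitution needed
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disagreement costs one substitution


# Hypothetical tree and residues at one alignment column (toy data).
tree = ((("thermo1", "thermo2"), "thermo3"), ("meso1", "meso2"))
column = {"thermo1": "I", "thermo2": "I", "thermo3": "V", "meso1": "I", "meso2": "L"}

root_states, changes = fitch(tree, column)
print(sorted(root_states), changes)
```

When the root set contains more than one residue, parsimony alone cannot choose among them. In this toy column the root is unambiguously I, which is the residue one would introduce into an extant protein that carries a different residue at that position.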
Miyazaki et al. inferred an ancestral amino acid sequence of 3-isopropylmalate
-thermophilic archaeon, Sulfolobus tokodaii (Miyazaki et al. 2001). When the thermal stabilities of the resulting mutant proteins were investigated by measuring the residual activity after heat treatment and the change in 222-nm ellipticity upon thermal unfolding, at least five of the seven ancestral mutants tested showed thermal
extremely thermophilic bacterium, Thermus thermophilus, was also used as another model protein for the experimental test. Watanabe et al. designed 12 ancestral mutants, each containing an ancestral amino acid residue postulated to have been present in the common ancestor of Bacteria and Archaea (Watanabe et al. 2006). When the thermal stabilities of the designed mutants were compared to the wild-
T. thermophilus, at least 6 of the 12 ancestral mutants exhibited enhanced thermal stability. A similar trend was observed when we constructed ancestral mutant proteins of isocitrate dehydrogenase (ICDH) from the extremely thermophilic archaeon Caldococcus noboribetus (Iwabata et al. 2005): at least four of the five ancestral mutants, each containing an ancestral amino acid residue, were more thermostable than the wild-type ICDH. Thus, ancestral amino acid residues tend to increase the thermostability of metabolic proteins from thermophilic and even hyperthermophilic organisms. These results provide experimental evidence for the existence of extremely thermostable proteins in the last universal common ancestor, supporting the hypothesis that the Commonote was a hyperthermophile.
A similar experiment was also performed using a protein involved in the translation system of T. thermophilus (Shimizu et al. 2007proteins involved in the translation system often show the same topology as the rRNA tree, which is frequently used in phylogenetic analysis. The function of the translation system is universal, because all organisms on Earth have one. Furthermore, aminoacyl-tRNA synthetases must be primordial proteins that emerged early in evolution. Therefore, the evolution of an aminoacyl-tRNA synthetase is likely to have coincided with the evolution of its host organisms. In addition, probably because mutations in aminoacyl-tRNA synthetases would affect the survival of the organism, the sequences of these proteins are well conserved. It is therefore unlikely that functional modification and horizontal transfer of these genes have been frequent during evolution. Thus, it is advantageous to use an aminoacyl-tRNA synthetase for phylogenetic analysis. Shimizu et al. deduced a possible ancestral
-hood tree of 2 2007). An individual or pairs of the
T. thermophilus, and the thermal stabilities of the resulting eight mutant proteins were evaluated by monitoring the change in ellipticity at 222 nm upon thermal unfolding. As a result,
the Commonote possessed extremely thermophilic translation enzymes. This result is again compatible with a hyperthermophilic common ancestor. However, as discussed below, it cannot be fully excluded that the observed trend toward enhanced thermostability of the mutant proteins is an artifact of the ancestral design method (Williams et al. 2006).
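Thermal stabilities of this kind are typically read off a melting curve, here the ellipticity at 222 nm recorded while the temperature is raised. The sketch below estimates the unfolding midpoint from such a curve under a deliberately simplified assumption: the first and last points are treated as flat native and unfolded baselines, whereas real analyses usually fit sloping baselines and a two-state model. The function names and the synthetic curve are hypothetical and are not data from Shimizu et al.

```python
import math


def fraction_unfolded(signal: list[float]) -> list[float]:
    """Normalize a melting curve to 0 (first point) .. 1 (last point).

    Simplification: the first and last points stand in for flat native and
    unfolded baselines.
    """
    start, end = signal[0], signal[-1]
    return [(s - start) / (end - start) for s in signal]


def midpoint_temperature(temps: list[float], signal: list[float]) -> float:
    """Temperature at which the fraction unfolded first reaches 0.5 (linear interpolation)."""
    f = fraction_unfolded(signal)
    for i in range(1, len(temps)):
        if f[i - 1] < 0.5 <= f[i]:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f[i - 1]) * (t1 - t0) / (f[i] - f[i - 1])
    raise ValueError("curve never crosses the unfolding midpoint")


# Synthetic two-state curve (hypothetical values): the 222-nm signal rises
# from -20 (native) to -5 (unfolded) with a midpoint at 70 degrees C.
temps = [float(t) for t in range(30, 111)]
theta = [-20 + 15 / (1 + math.exp((70 - t) / 2)) for t in temps]
print(round(midpoint_temperature(temps, theta), 1))  # 70.0
```

Because the normalization divides by the total signal change, the same code works whether the signal increases or decreases on unfolding.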
As described above, introducing ancestral residues further enhanced the thermostability of proteins involved in a metabolic pathway or in the translation system of (hyper)thermophiles in roughly 50 to 80% of cases. The ancestral design method is therefore a useful technique for designing mutant enzymes with higher thermostability, and it relies only on the primary amino acid sequences of homologous proteins. We also found that the extent to which an introduced ancestral residue enhances the thermostability of a mutant is directly correlated with the degree to which resi-
2010).
The consensus approach is a very similar way to improve the thermal stability of a protein using a multiple sequence alignment of homologous proteins. This method is based on the hypothesis that, at a given position in a multiple sequence alignment of homologous proteins, the most frequently occurring amino acid contributes more to the thermostability of the protein than less frequently occurring amino acids do. In 1994, Steipe et al. first rationalized the feasibility of this approach using statistical thermodynamics (Steipe et al. 1994). They analyzed the amino acid
basis of their design is that randomly occurring mutations are often destabilizing and therefore tend to destabilize proteins in the absence of selection pressure. During actual evolution, however, mutations that reduced stability below the level needed to maintain the protein’s specific tertiary structure have rarely been selected. Consequently, the frequency of a given residue in the multiple sequence alignment of a protein correlates with the contribution of that amino acid to protein stability. Hence, the most frequent amino acid at any position among homologous immunoglobulin
than that with an amino acid rarely seen in the homologous sequences. They calculated a statistical free energy from the frequency of occurrence of a particular amino acid at a given site and designed proteins with specific amino acid residue substitu-
the consensus approach was used to enhance the stability of other proteins: for example, … 2000, 2002), the SH3 domain from the yeast actin-binding protein 1 (Rath and Davidson 2000), and the DNA-binding domain of the tumor suppressor p53 protein (Nikolova et al. 1998 … 1999). The consensus design concept was also applied to improve the thermal stability of chorismate mutase from Escherichia coli by using artificially generated functional protein sequences selected from binary-patterned libraries (Jackel et al. 2010).
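The two ingredients of the consensus approach, picking the most frequent residue in each alignment column and converting residue frequencies into a statistical free energy, can be sketched as follows. The Boltzmann-like form ΔΔG = −RT ln(f_residue / f_consensus) is written in the spirit of Steipe et al.; the exact normalization, the function names, and the toy alignment are assumptions for illustration, and gapped alignments and pseudocounts are not handled.

```python
import math
from collections import Counter

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def consensus_sequence(alignment: list[str]) -> str:
    """Most frequent residue in each column of an ungapped alignment.

    Ties go to the residue encountered first in the column (Counter.most_common order).
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def statistical_ddg(column: str, residue: str, temperature: float = 298.15) -> float:
    """Statistical free energy of `residue` relative to the column's consensus residue.

    Assumed form: ddG = -RT ln(f_residue / f_consensus). Positive values mean the
    residue is predicted to be less stabilizing than the consensus residue.
    """
    counts = Counter(column)
    if counts[residue] == 0:
        raise ValueError(f"{residue!r} does not occur in this column")
    f_res = counts[residue] / len(column)
    f_cons = counts.most_common(1)[0][1] / len(column)
    return -R_KCAL * temperature * math.log(f_res / f_cons)


# Hypothetical toy alignment of four homologous sequences
alignment = ["MKVL", "MKIL", "MRVL", "LKVA"]
print(consensus_sequence(alignment))           # MKVL
print(round(statistical_ddg("MMML", "L"), 2))  # 0.65 (kcal/mol)
```

In this picture, replacing a rare residue with the consensus residue is predicted to lower the free energy by the computed amount, which is why consensus and ancestral residues, when they coincide, are hard to tell apart experimentally.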
The consensus design approach and the ancestral design approach frequently result in the same residue substitutions, because consensus residues often originated from ancestral residues. Therefore, it remains unclear whether the enhanced thermostability of proteins carrying an ancestral amino acid can be ascribed to the antiquity of the residue or to its statistical free energy. To clarify why ancestral mutations tended to improve protein stability, Watanabe et al. … and ICDHs designed to date (Watanabe et al. 2006). In the authors’ laboratory, the ther-
Bacillus subtilis and Saccharomyces cerevisiae had been improved by an evolutionary molecular engineering technique consisting of random mutagenesis and selection (Akanuma et al. 1998, 1999; Tamakoshi et al. 2001). Some of the mutants isolated by evolutionary engineering carry an ancestral residue at the mutated site, so these thermostabilizing ancestral amino acids found in the experimental evolution were also included in the analysis. From the viewpoint of the consensus approach, Watanabe et al. classified the ancestral mutations into two groups: dominant ancestral residues and minor ancestral residues (Watanabe et al. 2006). Dominant residues are those that occupy a given site most frequently in the amino acid sequence alignment of
residue does not coincide with the ancestral residue at a site, the ancestral residue was designated a minor ancestral residue. Of the 15 mutants with a dominant ancestral residue, ten showed improved thermal stability. Similarly, four of the six mutants with a minor ancestral residue showed improved thermal stability. Because the rate of stabilization by an introduced ancestral residue did not depend on whether the residue was dominant, the stabilizing effect of the ancestral residues cannot be attributed to their being consensus residues, that is, to statistical free energy. However, the data analyzed are limited and therefore insufficient to conclude that the increased stability of mutant proteins carrying an ancestral amino acid is related only to the inherent nature of ancestral sequences. Very recently, we predicted the sequence at the deepest node of a phylogenetic tree composed of 16 gyrase B subunit sequences, and the predicted protein was then synthesized and characterized (Akanuma et al. 2011). … of the reconstructed gyrase B is more thermally stable than a corresponding sequence containing the most frequently occurring amino acids among the 16 gyrases. The thermal stability of the designed protein is likely due in part to the antiquity of some of the inferred residues. However, it is also possible that the ancestral design algorithm simply corrected for erroneous residues that could otherwise have entered the reconstructed sequence because only a limited number of homologous amino acid sequences were used (Akanuma et al. 2011). Further evidence is therefore required to conclude that the results of our experimental tests really support the (hyper)thermophilic ancestor hypothesis.
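The claim that the stabilization rate does not depend on whether the ancestral residue is dominant can be checked directly on the counts above: 10 of 15 and 4 of 6 are both two-thirds. As an illustration only (the chapter reports the rates and does not state that a significance test was performed), the sketch below implements an exact two-sided Fisher test with integer arithmetic on the corresponding 2×2 table.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Exact two-sided Fisher test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x: int) -> int:
        # Number of ways to obtain x in the top-left cell with the margins fixed
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return Fraction(extreme, total)


# Rows: dominant vs minor ancestral residue; columns: stabilized vs not stabilized
print(Fraction(10, 15), Fraction(4, 6))        # 2/3 2/3
print(fisher_exact_two_sided(10, 5, 4, 2))     # 1
```

A p-value of 1 simply restates that the two observed rates are identical; with only 21 mutants, the test also has little power to detect a real difference, which is consistent with the caution expressed in the text.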
12.6 Computer Prediction and Experimental Reconstruction of Ancient Proteins
Information about the ancient environment of Earth is often obtained from fossil records. In contrast, no tangible remnants of the primitive protein forms hosted by ancient organisms that lived more than 3,500 million years ago are preserved
(Schopf 1993). However, in addition to the growing database of homologous protein sequences provided by currently available genome information, recent advances in phylogenetic analysis and whole-gene synthesis techniques have made it possible to reconstruct genes encoding ancient proteins in the laboratory. Therefore, predicting ancestral protein sequences and characterizing the properties of the reconstructed proteins is one of the most powerful means available for studying
examples of resurrection experiments are discussed in greater detail in an excellent review by Thornton (2004).
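Predicting an ancestral sequence amounts to inferring the residue at an internal node of a phylogenetic tree, site by site. One of the simplest ways to do this is maximum parsimony, one of the methods compared by Williams et al.; the sketch below implements the bottom-up pass of Fitch's small-parsimony algorithm on a rooted binary tree and reports the set of equally parsimonious residues at the root. It is not necessarily the procedure used in the studies discussed here, and the tree and sequences are hypothetical.

```python
from typing import Union

# A rooted binary tree: a leaf is an aligned sequence (str), an internal node is a
# (left, right) tuple. Hypothetical toy data, not sequences from the cited studies.
Tree = Union[str, tuple]


def column(tree: Tree, i: int) -> Tree:
    """Project a tree of aligned sequences onto alignment column i."""
    if isinstance(tree, str):
        return tree[i]
    return tuple(column(child, i) for child in tree)


def fitch(tree: Tree) -> tuple[set[str], int]:
    """Fitch bottom-up pass: candidate states at this node and minimum number of changes."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_root(tree: Tree, length: int) -> tuple[list[set[str]], int]:
    """Root state sets for every site and the total parsimony score."""
    states, total = [], 0
    for i in range(length):
        site_states, cost = fitch(column(tree, i))
        states.append(site_states)
        total += cost
    return states, total


tree = (("MKV", "MKI"), ("MRV", "LRV"))
print(parsimony_root(tree, 3))
```

For this toy tree the root is unambiguous at the first and third sites (M and V) but ambiguous at the second (K or R). Such ties, and the choice of one residue where the data cannot decide, are one source of the reconstruction uncertainty discussed below.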
The empirical reconstruction of ancient proteins was used as a novel tool for improving our knowledge of environmental temperatures experienced by ancient
et al. 2003, 2008). They estimated the growth temperature of the common ancestor of Bacteria based on the concept that the denaturation temperature of a protein
experiment, they reported that the common ancestor of Bacteria was thermophilic, rather than hyperthermophilic or mesophilic. However, it is well known that a single random substitution, insertion, or deletion can drastically decrease the thermal stability of a protein. Because any such error in the reconstructed sequence could have lowered the stability of the resurrected proteins, it cannot be ruled out that the common ancestor of Bacteria was a hyperthermophilic organism. Conversely, Williams et al. have pointed out that inaccurate estimation of ancestral amino acids can lead to overestimation of the thermostability and other properties of ancestral proteins (Williams et al. 2006). To assess the reliability of the properties of ancestral proteins reconstructed by various methods, they performed an evolution simulation of
thermodynamic properties of the true ancestral sequences with those of ancestral sequences inferred by maximum parsimony, maximum likelihood, and Bayesian inference. They found that, although maximum parsimony and maximum likelihood predict ancestral amino acids effectively, reconstructions by these two methods tend to overestimate thermodynamic stability. In contrast, Bayesian inference sometimes predicts less probable ancestral amino acids, but it is a more reliable guide to ancestral thermodynamic properties. Nevertheless, the use of incorrect evolutionary models remains a concern even when Bayesian inference is used. It is therefore important to keep in mind that no reconstruction method predicts ancestral amino acid residues perfectly. Thus, although phylogenetic reconstruction of ancestral protein sequences is a powerful way to study the early evolution of life, any conclusion drawn from such studies depends largely on the accuracy of the reconstructed sequences.
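Why picking the single most probable residue at every site can overestimate stability is easiest to see in a toy model. The sketch below is a simplified illustration of the bias described by Williams et al., not a reproduction of their simulation. It assumes that sites contribute additively and independently to stability and that the most probable residue at each site is also the more stabilizing one; all posterior probabilities and stability contributions are hypothetical. Under these assumptions the maximum-likelihood-style (MAP) sequence is more stable than the average sequence drawn from the posterior, which is what Bayesian sampling estimates.

```python
import random
from statistics import mean

# Hypothetical per-site posteriors for a 3-residue ancestral segment.
# Each entry: (residue, posterior probability, assumed stability contribution).
# Assumptions: additive, independent sites; the most probable residue is also the
# more stabilizing one (consensus-like behavior).
SITES = [
    [("A", 0.60, 1.0), ("G", 0.40, 0.2)],
    [("L", 0.70, 1.5), ("V", 0.30, 0.9)],
    [("E", 0.55, 0.8), ("K", 0.45, 0.5)],
]


def map_stability(sites) -> float:
    """Stability of the single most probable (ML/MAP-style) reconstruction."""
    return sum(max(site, key=lambda r: r[1])[2] for site in sites)


def posterior_mean_stability(sites) -> float:
    """Expected stability when each site is drawn from its posterior."""
    return sum(p * dg for site in sites for _, p, dg in site)


def sampled_stabilities(sites, n: int, seed: int = 0) -> list[float]:
    """Bayesian-style sampling: draw whole sequences from the per-site posteriors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        total = 0.0
        for site in sites:
            _, _, dg = rng.choices(site, weights=[p for _, p, _ in site])[0]
            total += dg
        draws.append(total)
    return draws


print(round(map_stability(SITES), 3))             # 3.3
print(round(posterior_mean_stability(SITES), 3))  # 2.665
print(round(mean(sampled_stabilities(SITES, 10_000)), 2))  # close to 2.665
```

If the second assumption is dropped, so that the most probable residue is not systematically the more stabilizing one, the gap between the MAP value and the posterior mean need not favor the MAP sequence; the direction of the bias therefore rests on that assumption.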
Similar resurrection experiments have also been applied to eukaryotic proteins: ancestral reconstruction has been used to understand the evolution of ethanol production and consumption in yeast (Thomson et al. 2005) and the evolutionary trajectory of changes in the substrate specificity of hormone receptors (Bridgham et al. 2006; Ortlund et al. 2007). Thus, ancestral reconstruction is now a common technique for studying the molecular evolution of genes, proteins, and life.
12.7 Conclusions
In this chapter, we have reviewed the universal trees constructed from different types of genetic information. The tree topology differed depending on the type of gene analyzed and on the method used. The root of the universal tree is most likely placed between the bacterial branch and the common ancestor of Archaea and Eucarya. However, it is possible that the root lies within the bacterial branches.
The monophyly of Archaea remains controversial. Although the rRNA tree suggests monophyly, other tree topologies have also been reported. Whether the Eucarya originated within or outside the archaeal branch has yet to be conclusively determined.
The growth temperature of the ancient organism has long interested many scientists. Theoretical studies have suggested mesophilic, thermophilic, or hyperthermophilic origins of life, depending on the report. Experimental tests analyzing the effects of individual ancestral amino acid residues, or combinations of them, suggested a hyperthermophilic origin of life. However, we cannot completely exclude the possibility that this result is an artifact of the method used to estimate the ancestral sequences of the ancestral organisms.
References
Cavalier-Smith T (2006b) Biol Direct 1:19
Hasegawa M, Hashimoto T (1993) Nature 361:23
Jukes TH, Cantor CR (1969) In: Munro HN (ed) Mammalian protein metabolism. Academic,
Martin W (2005) Archaebacteria (archaea) and the origin of the eukaryotic nucleus. Curr Opin
Zavialov AV, Hauryliuk VV, Ehrenberg M (2005) Splitting of the posttermination ribosome into